## Probability and Statistics - Linear Regression Diagnostics: Heteroscedasticity and Autocorrelation

### I. Linear Regression Model Diagnostics

• The error term has (conditional) mean zero: ${\rm E}\left(\varepsilon_i\right)=0,$
• The error terms all have the same variance (homoscedasticity): ${\rm var}\left(\varepsilon_i\right)=\sigma^2<\infty,$
• The error terms are not autocorrelated (no autocorrelation): ${\rm cov}\left(\varepsilon_i,\varepsilon_j\right)=0,\ i\not=j$

[Note] On the zero-mean condition

The mean of the residuals will always be zero provided that there is a constant term in the regression. Without a constant term,

• $R^2$ (ESS/TSS) could be negative;
• the slope coefficient estimate could be biased.
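As a quick check of this note, here is a minimal sketch on made-up data. The five data points and all variable names are illustrative assumptions, not taken from any real data set. It computes $R^2$ as $1 - RSS/TSS$, which is the form that can go negative when the constant is omitted.

```python
# Toy data lying exactly on y = 11 - x (an assumption for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 9.0, 8.0, 7.0, 6.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
tss = sum((yi - y_bar) ** 2 for yi in y)

# Regression through the origin: y = b * x + e, with b = sum(x*y) / sum(x^2).
b_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
resid_origin = [yi - b_origin * xi for xi, yi in zip(x, y)]
mean_resid_origin = sum(resid_origin) / n
r2_origin = 1.0 - sum(e ** 2 for e in resid_origin) / tss

# Regression with a constant: y = b0 + b1 * x + e.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
resid_const = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mean_resid_const = sum(resid_const) / n

print(b_origin, mean_resid_origin, r2_origin)  # 2.0 2.0 -10.0
print(b0, b1, mean_resid_const)                # 11.0 -1.0 0.0
```

Without the constant, the fitted slope has the wrong sign (2 instead of -1), the residuals average 2 rather than 0, and $R^2$ is -10.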

### II. Detection of Heteroscedasticity

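• White test (most commonly used): assumes the error variance is related to the regressors or to second-order combinations of them (squares and cross products).
• Breusch-Pagan (BP)/Godfrey test (most general): assumes the error variance may be related to other variables.
• Goldfeld-Quandt test (more cumbersome, and applicable only to cross-sectional data): assumes the error variance is related to one particular variable.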

#### 1. Breusch-Pagan (BP)/Godfrey Heteroscedasticity Test Procedure

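1. Estimate the regression $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k + u$ by OLS and obtain the squared OLS residuals $\hat u^2$.
2. Estimate the regression $\hat u^2 = \delta_0 + \delta_1x_1 + \delta_2x_2 + \cdots + \delta_kx_k + error$ and obtain its R-squared, $R_{\hat u^2}^2$.
3. Compute the F or LM statistic and the corresponding p-value (the former uses the $F_{k,n-k-1}$ distribution, the latter the $\chi_k^2$ distribution). If the p-value is small enough, that is, below the chosen significance level, reject the null hypothesis of homoscedasticity.
• The F statistic can be written as: $F = \frac{R_{\hat u^2}^2 /k}{ (1-R_{\hat u^2}^2) / (n - k - 1)}$
• The LM statistic can be written as: $LM = n \times R_{\hat u^2}^2$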
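The three steps can be sketched with NumPy on simulated data. The data-generating process (error standard deviation proportional to $x_1$), the seed, and the helper name `ols_r2` are assumptions made for illustration only. P-values are left out; they would come from the $F_{k,n-k-1}$ and $\chi_k^2$ distributions.

```python
import numpy as np

def ols_r2(X, y):
    """Fit y on X by least squares (X must already contain a constant column).

    Returns the residuals and the R-squared of the fit.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return resid, r2

# Simulated data (assumption): the error standard deviation grows with x1.
rng = np.random.default_rng(0)
n, k = 500, 2
x = rng.uniform(1.0, 5.0, size=(n, k))
u = rng.normal(size=n) * x[:, 0]
y = 1.0 + 2.0 * x[:, 0] - 1.0 * x[:, 1] + u

X = np.column_stack([np.ones(n), x])

# Step 1: OLS on the original model, keep the residuals.
resid, _ = ols_r2(X, y)

# Step 2: regress the squared residuals on the same regressors.
_, r2_aux = ols_r2(X, resid ** 2)

# Step 3: F and LM statistics from the auxiliary R-squared.
f_stat = (r2_aux / k) / ((1.0 - r2_aux) / (n - k - 1))
lm_stat = n * r2_aux

print(f"F = {f_stat:.2f}, LM = {lm_stat:.2f}")
```

With $k = 2$, the 5% critical value of $\chi_2^2$ is about 5.99, so an LM statistic well above that rejects homoscedasticity.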

#### 2. White Heteroscedasticity Test Procedure

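• Estimate the model $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k + u$ by OLS. Obtain the OLS residuals and fitted values, then compute the squared residuals $\hat u^2$ and the squared fitted values $\hat y^2$.
• Run the regression $\hat u^2 = \delta_0 + \delta_1\hat y + \delta_2\hat y^2 + error$ and obtain its R-squared, $R_{\hat u^2}^2$.
• Compute the F or LM statistic and its p-value. Because this auxiliary regression has only two regressors ($\hat y$ and $\hat y^2$), the former uses the $F_{2,n-3}$ distribution and the latter the $\chi_2^2$ distribution.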

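[Get a feel for it] The White test regression with three independent variables (general form, with all squares and cross products)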

$\hat u^2 = \delta_0 + \delta_1x_1 + \delta_2x_2 + \delta_3x_3 + \delta_4x_1^2 + \delta_5x_2^2 + \delta_6x_3^2 + \delta_7x_1x_2 + \delta_8x_1x_3 + \delta_9x_2x_3 + error$
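The nine-regressor auxiliary regression for three independent variables can be built mechanically. The sketch below uses NumPy on simulated data. The data-generating process, the seed, and the helper name `fit` are illustrative assumptions. It reports $LM = n \times R_{\hat u^2}^2$, which in this general form is compared with a $\chi^2$ distribution whose degrees of freedom equal the number of auxiliary regressors (9 here).

```python
import itertools
import numpy as np

def fit(X, y):
    """Least-squares fit of y on X (constant column included in X)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return resid, r2

# Simulated data (assumption): error standard deviation proportional to x1.
rng = np.random.default_rng(1)
n, k = 500, 3
x = rng.uniform(1.0, 5.0, size=(n, k))
u = rng.normal(size=n) * x[:, 0]
y = 1.0 + x @ np.array([2.0, -1.0, 0.5]) + u

# Original regression and its residuals.
X = np.column_stack([np.ones(n), x])
resid, _ = fit(X, y)

# Second-order terms: x1^2, x1*x2, x1*x3, x2^2, x2*x3, x3^2.
second_order = [
    x[:, i] * x[:, j]
    for i, j in itertools.combinations_with_replacement(range(k), 2)
]

# Auxiliary design: constant, 3 levels, 3 squares, 3 cross products.
Z = np.column_stack([np.ones(n), x] + second_order)
q = Z.shape[1] - 1  # number of auxiliary regressors, 9 when k = 3

_, r2_aux = fit(Z, resid ** 2)
lm_stat = n * r2_aux
f_stat = (r2_aux / q) / ((1.0 - r2_aux) / (n - q - 1))

print(f"q = {q}, F = {f_stat:.2f}, LM = {lm_stat:.2f}")
```

The number of auxiliary regressors grows quickly with $k$, which is the practical reason for the shorter variant based on $\hat y$ and $\hat y^2$.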

#### 3. Goldfeld–Quandt (GQ) Heteroscedasticity Test Procedure

1. Split the total sample of length T into two sub-samples of length T1 and T2.
2. The null hypothesis : $H_0 : \sigma_1^2 = \sigma_2^2$
3. GQ test statistic : $GQ = \frac{s_1^2}{s_2^2}$

[Get a feel for it] A drawback of the GQ test

The choice of where to split the sample is usually arbitrary and may crucially affect the outcome of the test.
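A minimal NumPy sketch of the GQ procedure follows. Here $s_1^2$ and $s_2^2$ are taken to be the residual variances from separate OLS regressions on the two sub-samples. The simulated data, the mid-point split, the helper name `resid_variance`, and the choice to label the suspected high-variance sub-sample as sample 1 are all assumptions for illustration.

```python
import numpy as np

def resid_variance(x_sub, y_sub):
    """Residual variance s^2 = RSS / (T_i - k) from a simple regression with constant."""
    X = np.column_stack([np.ones(len(x_sub)), x_sub])
    beta, *_ = np.linalg.lstsq(X, y_sub, rcond=None)
    resid = y_sub - X @ beta
    return (resid @ resid) / (len(y_sub) - X.shape[1])

# Simulated data (assumption): error standard deviation proportional to x.
rng = np.random.default_rng(2)
T = 200
x = np.sort(rng.uniform(1.0, 5.0, size=T))  # order observations by x
y = 1.0 + 2.0 * x + rng.normal(size=T) * x

# Step 1: split the ordered sample into two halves (an arbitrary choice).
split = T // 2
T1, T2 = T - split, split

# Steps 2-3: estimate each sub-sample separately and form the variance ratio.
s1_sq = resid_variance(x[split:], y[split:])  # high-x half
s2_sq = resid_variance(x[:split], y[:split])  # low-x half
gq = s1_sq / s2_sq

print(f"s1^2 = {s1_sq:.2f}, s2^2 = {s2_sq:.2f}, GQ = {gq:.2f}")
```

Under $H_0$ the ratio follows an $F_{T_1-k,\,T_2-k}$ distribution, where $k$ counts the estimated coefficients in each sub-sample regression (2 here). Moving `split` changes the statistic, which is exactly the drawback noted above.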

### III. Detection of Autocorrelation

#### 1. Durbin-Watson Autocorrelation Test

If $e_t$ is the residual associated with the observation at time $t$, then the test statistic is

$d = {\sum_{t=2}^T (e_t - e_{t-1})^2 \over {\sum_{t=1}^T e_t^2}}$, where T is the number of observations.

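Since $d$ is approximately equal to $2(1 - r)$, where $r$ is the sample autocorrelation of the residuals, $d = 2$ indicates no autocorrelation. The statistic lies between 0 and 4: values well below 2 point to positive autocorrelation, and values well above 2 point to negative autocorrelation.

DW has two critical values, an upper critical value ($d_U$) and a lower critical value ($d_L$). Between them lies an intermediate region where the test is inconclusive and $H_0$ can be neither rejected nor retained.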
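The statistic is simple enough to compute by hand. The sketch below uses plain Python on two made-up residual series. The function names and the numbers are illustrative assumptions. It also computes $2(1 - r)$ so the approximation can be compared with the exact value.

```python
def durbin_watson(e):
    """Durbin-Watson d: sum of squared first differences over sum of squares."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

def lag1_autocorr(e):
    """Sample first-order autocorrelation r of the residuals."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

# Made-up residuals (assumption): runs of same-signed values, i.e. positive autocorrelation.
e_pos = [0.5, 0.8, 0.6, 0.9, -0.2, -0.7, -0.5, -0.9, 0.1, 0.4]
# Made-up residuals (assumption): sign flips every period, i.e. negative autocorrelation.
e_neg = [1.0, -1.0, 1.0, -1.0]

d_pos = durbin_watson(e_pos)
d_neg = durbin_watson(e_neg)
approx_pos = 2.0 * (1.0 - lag1_autocorr(e_pos))

print(d_pos, approx_pos)  # d is below 2 for the positively autocorrelated series
print(d_neg)              # 3.0
```

The gap between $d$ and $2(1 - r)$ comes from the first and last residuals: $d = 2(1 - r) - (e_1^2 + e_T^2)/\sum_{t=1}^T e_t^2$, so the approximation improves as $T$ grows.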

#### 2. Breusch-Godfrey LM Test

The Breusch-Godfrey LM test (Serial Correlation LM Test) uses the residuals to test for autocorrelation at any lag up to order $p$.

Consider a linear regression of any form, for example

$Y_t = \alpha_0+ \alpha_1 X_{t,1} + \alpha_2 X_{t,2} + u_t \,$

where the residuals might follow an $AR(p)$ autoregressive scheme, as follows:

$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \cdots + \rho_p u_{t-p} + \varepsilon_t. \,$

The original regression model is first fitted by ordinary least squares to obtain a set of sample residuals $\hat{u}_t$.

Breusch and Godfrey proved that, if the following auxiliary regression model is fitted

$\hat{u}_t = \alpha_0 + \alpha_1 X_{t,1} + \alpha_2 X_{t,2} + \rho_1 \hat{u}_{t-1} + \rho_2 \hat{u}_{t-2} + \cdots + \rho_p \hat{u}_{t-p} + \varepsilon_t \,$

and if the usual $R^2$ statistic is calculated for this model, then the following asymptotic approximation can be used for the distribution of the test statistic

$n R^2\,\sim\,\chi^2_p, \,$

when the null hypothesis ${H_0: \lbrace \rho_i = 0 \text{ for all } i \rbrace }$ holds (that is, there is no serial correlation of any order up to p). Here n is the number of data-points available for the second regression, that for $\hat{u}_t$,

$n=T-p, \,$

where $T$ is the number of observations in the basic series. Note that the value of n depends on the number of lags of the error term (p).
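The procedure can be sketched with NumPy on simulated data. The AR(1) error process with coefficient 0.6, the seed, and the variable names are assumptions for illustration. The auxiliary regression drops the first $p$ observations, so it uses $n = T - p$ data points as described above.

```python
import numpy as np

# Simulated data (assumption): two regressors and AR(1) errors with rho = 0.6.
rng = np.random.default_rng(3)
T, p = 300, 2
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
eps = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + eps[t]
y = 1.0 + 0.5 * x1 - 0.3 * x2 + u

# Fit the original regression by OLS and keep the residuals u_hat.
X = np.column_stack([np.ones(T), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta

# Lagged residuals: column j-1 holds u_hat_{t-j} for t = p, ..., T-1.
lags = np.column_stack([u_hat[p - j : T - j] for j in range(1, p + 1)])

# Auxiliary regression: u_hat_t on the original regressors and the p lags.
Z = np.column_stack([X[p:], lags])
dep = u_hat[p:]
gamma, *_ = np.linalg.lstsq(Z, dep, rcond=None)
aux_resid = dep - Z @ gamma
r2_aux = 1.0 - (aux_resid @ aux_resid) / np.sum((dep - dep.mean()) ** 2)

# Test statistic n * R^2 with n = T - p, compared with chi-square(p).
n = T - p
lm_stat = n * r2_aux

print(f"n = {n}, R^2 = {r2_aux:.3f}, LM = {lm_stat:.2f}")
```

With $p = 2$, the 5% critical value of $\chi^2_2$ is about 5.99, so a statistic far above that rejects the null of no serial correlation up to order 2.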

