8  Regression Analysis

8.1 Data Collection

Many datasets are provided via the WWW:

  • Excel/CSV files provided by organisations (Bundesbank, ECB, Statistisches Bundesamt, Eurostat, …)
  • Application programming interface (API): e.g. the FRED database
  • Data scraping (extracting data from HTML code using R or Python)

CSV (Comma-separated values) is the most common format

Check data for missing values and errors

Tidy data format (variables in columns, obs. in rows)

Compute descriptive statistics (mean, std.dev, min/max, distribution)

Report sufficient info on the data source (for replication)
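
A minimal sketch of these steps in Python (pandas), assuming a hypothetical local file `data.csv` in tidy format:

```python
import pandas as pd

# load a CSV file (file and column names are hypothetical)
df = pd.read_csv("data.csv")

# check for missing values and obvious errors
print(df.isna().sum())    # number of NAs per column
print(df.dtypes)          # numeric columns read as 'object' often signal errors

# descriptive statistics: mean, std. dev., min/max, quartiles
print(df.describe())
```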

8.2 Data Preparation

Assess the quality of the data source

Transform text into numerical values (dummy variables)

Plausibility checks / descriptive statistics

The data set may contain missing values (‘NA’, dots, blanks)

Few NAs: just ignore them (the affected rows are dropped)

When many observations would be lost: imputation (replace NAs by estimated values)

a) Multiple Imputation: Assume that $x_{k,t}$ is missing. For the available observations run the regression

$$ x_{k,t} = \gamma_0 + \sum_{j=1}^{k-1} \gamma_j x_{j,t} + \epsilon_t $$

$\Rightarrow$ replace the missing values by $\hat{x}_{k,t}$.

For missing values in more regressors: iterative approach

A maximum-likelihood (ML) approach is available for efficient imputation
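
A minimal numpy sketch of the regression-based imputation above, assuming only column $k$ contains missing values (with several affected regressors, iterate as described):

```python
import numpy as np

def impute_by_regression(X, k):
    """Replace NAs in column k by fitted values from an OLS regression
    of x_k on the remaining columns, estimated on the complete rows."""
    miss = np.isnan(X[:, k])
    Z = np.column_stack([np.ones(len(X)), np.delete(X, k, axis=1)])
    gamma, *_ = np.linalg.lstsq(Z[~miss], X[~miss, k], rcond=None)
    X[miss, k] = Z[miss] @ gamma          # missing values replaced by x_hat
    return X
```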

8.3 OLS estimator

OLS: ordinary least-squares estimator

$$ b = \underset{\beta}{\operatorname{argmin}} \left\{ (y - X\beta)^\prime (y - X\beta) \right\} $$

yields the least-squares estimator:

$$ b = (X^\prime X)^{-1} X^\prime y $$


Unbiased estimator for $\sigma^2$ (note that $X^\prime e = 0$):

$$ s^2 = \frac{1}{N - K} (y - Xb)^\prime (y - Xb) $$

Maximum-Likelihood (ML) estimator

Log-likelihood function assuming normal distribution:

$$ \begin{aligned} \ell(\beta, \sigma^2) &= \ln L(\beta, \sigma^2) = \ln \left[ \prod_{i=1}^{N} f(u_i) \right] \\ &= -\frac{N}{2} \ln 2\pi - \frac{N}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} (y - X\beta)^\prime (y - X\beta) \end{aligned} $$

The ML and OLS estimators of $\beta$ are identical under normality

ML estimator for $\sigma^2$:

$$ \tilde{\sigma}^2 = \frac{1}{N} (y - Xb)^\prime (y - Xb) $$
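
The closed-form expressions above translate directly into code; a minimal numpy sketch:

```python
import numpy as np

def ols(y, X):
    """OLS coefficients, unbiased s^2 and the ML variance estimator."""
    N, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)   # b = (X'X)^{-1} X'y
    e = y - X @ b                           # residuals (X'e = 0)
    s2 = (e @ e) / (N - K)                  # unbiased estimator of sigma^2
    sigma2_ml = (e @ e) / N                 # ML estimator of sigma^2
    return b, e, s2, sigma2_ml
```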

Goodness of fit:

$$ R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS} = 1 - \frac{e^\prime e}{y^\prime y - N\bar{y}^2} = r^2_{xy} $$

where the last equality ($R^2 = r^2_{xy}$) refers to the simple regression model


adjusted $R^2$:

$$ \bar{R}^2 = 1 - \frac{e^\prime e/(N - K)}{(y^\prime y - N\bar{y}^2)/(N - 1)} $$

8.4 Properties of the OLS estimator

a) Expectation [note that $b = \beta + \underbrace{(X'X)^{-1}X'u}_{\text{estimation error}}$]

$$ \begin{aligned} E(b) &= \beta \\ E(s^2) &= \sigma^2 \\ E(\tilde{\sigma}^2) &= \sigma^2 (N - K)/N \end{aligned} $$


b) Distribution, assuming $u \sim \mathcal{N}(0, \sigma^2 I_N)$

$$ b \sim \mathcal{N}(\beta, \Sigma_b), \quad \Sigma_b = \sigma^2 (X'X)^{-1} $$

$$ \frac{(N-K)}{\sigma^2}\, s^2 \sim \chi^2_{N-K} $$

c) Efficiency

$b$ is BLUE (best linear unbiased estimator)

under normality: $b$ and $s^2$ are MVUE (minimum variance unbiased estimators)
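
A small Monte Carlo sketch (numpy, hypothetical parameter values) illustrating $E(b) = \beta$ and $\Sigma_b = \sigma^2 (X'X)^{-1}$ for a fixed design matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta, sigma = 100, np.array([1.0, 0.5]), 1.0
X = np.column_stack([np.ones(N), rng.normal(size=N)])   # fixed regressors

draws = np.empty((5000, 2))
for r in range(5000):
    y = X @ beta + rng.normal(0.0, sigma, size=N)
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(draws.mean(axis=0))   # close to beta (unbiasedness)
print(np.cov(draws.T))      # close to sigma^2 (X'X)^{-1}
```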

8.5 Testing Hypotheses

Significance level or size of a test (probability of a Type I error):

$$ P(|t_k| \geq c_{\alpha/2} \mid \beta = \beta^0) = \alpha^* $$

where $\alpha$ is the nominal size and $\alpha^*$ is the actual size

a test is unbiased (controls the size) if $\alpha^* = \alpha$

a test is asymptotically valid if $\alpha^* \rightarrow \alpha$ for $N \rightarrow \infty$

Power of the test (one minus the type II error):

$$ P(|t_k| \geq c_{\alpha/2} \mid \beta = \beta^1) = \pi(\beta^1) $$

a test is consistent if

$$ \pi(\beta^1) \rightarrow 1 \quad \text{for all} \quad \beta^1 \neq \beta^0 $$

The conventional significance level is $\alpha = 0.05$ for a moderate sample size ($N \in [50, 500]$, say)


a test is uniformly most powerful (UMP) if

$$ \pi(\beta) \geq \pi^*(\beta) \quad \text{for all} \quad \beta \neq \beta^0 $$

where $\pi^*(\beta)$ denotes the power function of any other unbiased test statistic.

$\Rightarrow$ The one-sided t-test is UMP, but in many cases no UMP test exists.

The $p$-value (or marginal significance level) is defined as

$$ \text{p-value} = P(t_k \geq \bar{t}_k \mid \beta = \beta^0) = 1 - F_0(\bar{t}_k) $$

that is, the probability of observing a value of the statistic larger than the observed value $\bar{t}_k$.

Under the null hypothesis the $p$-value is uniformly distributed on $[0, 1]$. Since it is a random variable, it is NOT the probability that the null hypothesis is correct.


Testing general linear hypotheses on $\beta$

$J$ linear hypotheses on $\beta$ are represented by

$$ H_0 : \quad R\beta = q $$

where $R$ is $J \times K$ and $q$ is $J \times 1$

Wald statistic: under $H_0$,

$$ Rb - q \sim \mathcal{N}\left(0,\; \sigma^2 R(X'X)^{-1}R' \right) $$

if $\sigma^2$ is known:

$$ \frac{1}{\sigma^2} (Rb - q)' [R(X'X)^{-1}R']^{-1} (Rb - q) \sim \chi^2_J $$

if $\sigma^2$ is replaced by $s^2$:

$$ \begin{aligned} F &= \frac{1}{Js^2} (Rb - q)' [R(X'X)^{-1}R']^{-1} (Rb - q) = \frac{N - K}{J}\; \frac{e_r'e_r - e'e}{e'e} \\ &\sim \frac{\chi^2_J/J}{\chi^2_{N-K}/(N - K)} \equiv F^J_{N-K} \end{aligned} $$

where $e_r$ denotes the residuals of the restricted regression
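
A numpy sketch of the $F$ statistic via the restricted least-squares estimator; the restricted formula $b_r = b - (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(Rb - q)$ is the standard textbook result, not taken from this text:

```python
import numpy as np
from scipy import stats

def f_test(y, X, R, q):
    """F test of H0: R beta = q, from restricted vs. unrestricted SSR."""
    N, K = X.shape
    J = R.shape[0]
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    XtX_inv = np.linalg.inv(X.T @ X)
    b_r = b - XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, R @ b - q)
    e_r = y - X @ b_r                       # restricted residuals
    F = (N - K) / J * ((e_r @ e_r) - (e @ e)) / (e @ e)
    return F, stats.f.sf(F, J, N - K)       # statistic and p-value
```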


Alternatives to the F statistic

Generalized LR test:

$$ GLR = 2 \left( \ell(\hat{\theta}) - \ell(\hat{\theta}_r) \right) = N \left( \log e_r' e_r - \log e'e \right) \sim \chi^2_J $$

$\Rightarrow$ a first-order Taylor expansion yields the Wald/F statistic

LM (score) test: Define the “score vector” as

$$ s(\hat{\theta}_r) = \left. \frac{\partial \log L(\theta)}{\partial \theta} \right|_{\theta=\hat{\theta}_r} = \frac{1}{\hat{\sigma}^2_r} X' e_r $$

The LM test statistic is given by

$$ \text{LM} = s(\hat{\theta}_r)' I(\hat{\theta}_r)^{-1} s(\hat{\theta}_r) \sim \chi^2_J $$

where $I(\hat{\theta}_r)$ is some estimate of the information matrix

In the regression context the LM statistic can be obtained by testing $\gamma = 0$ in the auxiliary regression

$$ 1 = \gamma' s_i(\hat{\theta}_r) + \nu_i $$

$\Rightarrow$ uncentered $R^2$: $R^2_u = \bar{s}' \left( \sum s_i s_i' \right)^{-1} \bar{s}$ and $N \cdot R^2_u \sim \chi^2_J$



8.5.1 Specification tests


a) Test for Heteroskedasticity (Breusch-Pagan / Koenker)

variance function: $\sigma^2_i = \alpha_0 + z_i' \alpha$

since $E(\hat{u}^2_i) \approx \sigma^2_i$, estimate the regression

$$ \hat{u}^2_i = \alpha_0 + z_i' \alpha + \nu_i $$

$\Rightarrow$ $F$ or LM test statistic for $H_0$: $\alpha = 0$

in practice $z_i = x_i$, possibly including cross-products and squares of the regressors (White test)

robust (White) standard errors: replace the invalid formula $Var(b) = \sigma^2 (X'X)^{-1}$ by the estimator

$$ \widehat{Var}(b) = (X'X)^{-1} \left( \sum_{i=1}^{N} \hat{u}^2_i\, x_i x_i' \right) (X'X)^{-1} $$
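
A minimal sketch of the LM version of the test (auxiliary regression of $\hat{u}^2_i$ on $z_i$, $\text{LM} = N \cdot R^2$), assuming residuals `e` and a regressor matrix `Z` without intercept:

```python
import numpy as np
from scipy import stats

def breusch_pagan(e, Z):
    """LM = N * R^2 from regressing squared residuals on z_i (plus intercept)."""
    N = len(e)
    W = np.column_stack([np.ones(N), Z])
    u2 = e**2
    a, *_ = np.linalg.lstsq(W, u2, rcond=None)
    resid = u2 - W @ a
    r2 = 1 - (resid @ resid) / ((u2 - u2.mean()) @ (u2 - u2.mean()))
    lm = N * r2
    return lm, stats.chi2.sf(lm, Z.shape[1])
```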

b) Tests for Autocorrelation

(i) Durbin-Watson test:

$$ dw = \frac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{T} \hat{u}^2_t} \approx 2(1 - \hat{\rho}) $$

Problem: the distribution depends on $X$ $\Rightarrow$ uncertainty range (inconclusive region)

(ii) Breusch-Godfrey test:

$$ u_t = \rho_1 u_{t-1} + \dots + \rho_m u_{t-m} + v_t $$

replace $u_t$ by $\hat{u}_t$, include $x_t$ to control for the estimation error in $\hat{u}_t$, and test $H_0$: $\rho_1 = \dots = \rho_m = 0$

(iii) Box-Pierce test:

$$ Q_m = T \sum_{j=1}^{m} \hat{\rho}_j^2 \stackrel{a}{\sim} \chi^2_m $$

a test of autocorrelation up to lag order $m$
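
A numpy sketch computing $Q_m$ from the estimated residual autocorrelations:

```python
import numpy as np
from scipy import stats

def box_pierce(e, m):
    """Q_m = T * sum of squared residual autocorrelations up to lag m."""
    T = len(e)
    e = e - e.mean()
    denom = e @ e
    rho = np.array([(e[j:] @ e[:-j]) / denom for j in range(1, m + 1)])
    Q = T * (rho @ rho)
    return Q, stats.chi2.sf(Q, m)
```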


HAC standard errors:

Heteroskedasticity and Autocorrelation Consistent standard errors (Newey/West 1987)

standard errors that account for autocorrelation up to lag $h$ (truncation lag)

“Rule of thumb” for choosing $h$ (e.g. EViews/Gretl):

$$ h = \operatorname{int}\left[ 4 (T/100)^{2/9} \right] $$
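
A short statsmodels sketch combining the rule of thumb with Newey-West standard errors (simulated AR(1) errors; all parameter values hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):                        # AR(1) errors
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + u

h = int(4 * (T / 100) ** (2 / 9))            # rule-of-thumb truncation lag
res = sm.OLS(y, sm.add_constant(x)).fit(cov_type="HAC", cov_kwds={"maxlags": h})
print(h, res.bse)                            # HAC (Newey-West) standard errors
```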


Relationship between autocorrelation and dynamic models:

Inserting $u_t = \rho u_{t-1} + v_t$ into the model $y_t = \beta' x_t + u_t$ yields

$$ y_t = \rho y_{t-1} + \beta' x_t - \underbrace{\rho \beta'}_{\gamma} x_{t-1} + v_t $$

$\Rightarrow$ Common factor restriction: $\gamma = -\beta\rho$


c) Test for Normality

The asymptotic properties of the OLS estimator do not depend on the validity of the normality assumption

Deviations from the normal distribution are only relevant in very small samples

Outliers may be modeled by mixture distributions

Tests for normality are very sensitive to outliers

Under the null hypothesis, $E(u^3_i) = 0$ and $E(u^4_i) = 3\sigma^4$

Jarque-Bera test statistic:

$$ JB = N \left[ \frac{1}{6} \hat{m}_3^2 + \frac{1}{24} (\hat{m}_4 - 3)^2 \right] \stackrel{d}{\to} \chi^2_2 $$

where

$$ \hat{m}_3 = \frac{1}{N \hat{\sigma}^3} \sum_{i=1}^{N} \hat{u}^3_i \qquad \hat{m}_4 = \frac{1}{N \hat{\sigma}^4} \sum_{i=1}^{N} \hat{u}^4_i $$
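
A direct numpy implementation of the JB statistic:

```python
import numpy as np
from scipy import stats

def jarque_bera(e):
    """JB = N [ m3^2 / 6 + (m4 - 3)^2 / 24 ], asymptotically chi^2(2)."""
    N = len(e)
    e = e - e.mean()
    sigma = np.sqrt((e**2).mean())
    m3 = (e**3).mean() / sigma**3            # skewness
    m4 = (e**4).mean() / sigma**4            # kurtosis
    JB = N * (m3**2 / 6 + (m4 - 3)**2 / 24)
    return JB, stats.chi2.sf(JB, 2)
```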

Other tests: $\chi^2$ and Kolmogorov-Smirnov test



8.6 Nonlinear regression models

a) Polynomial regression

including squares, cubes, and higher powers of the regressors:

$$ Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_p X_i^p + u_i $$

where $p$ is the degree of the polynomial (typically $p = 2$)

Interpretation (for $p = 2$):

$$ \begin{aligned} \frac{\partial Y}{\partial X} &= \beta_1 + 2\beta_2 X \\ \Rightarrow \Delta Y &\approx (\beta_1 + 2\beta_2 X)\, \Delta X \\ \text{exact: } \Delta Y &= \beta_1 \Delta X + \beta_2 (X + \Delta X)^2 - \beta_2 X^2 \\ &= (\beta_1 + 2\beta_2 X)\, \Delta X + \beta_2 (\Delta X)^2 \end{aligned} $$

$\Rightarrow$ the effect on $Y$ depends on the level of $X$

for small changes in $X$ the derivative provides a good approximation


Computing standard errors for the nonlinear effect:

Method 1:

$$ \begin{aligned} \text{s.e.}\left( \Delta \hat{Y} \right) &= \sqrt{ \text{var}(b_1) + 4X^2\, \text{var}(b_2) + 4X\, \text{cov}(b_1, b_2) } \\ &= |\Delta \hat{Y}| / \sqrt{F} \end{aligned} $$

where $F$ is the $F$ statistic for the test of $E(\Delta \hat{Y}_i) = \beta_1 + 2X\beta_2 = 0$ (for a unit change $\Delta X = 1$)


Method 2:

$$ Y_i = \beta_0 + \underbrace{(\beta_1 + 2X\beta_2)}_{\beta^*_1} X_i + \beta_2 \underbrace{\left(1 - \frac{2X}{X_i}\right) X_i^2}_{X^*_i} + u_i $$

Run the regression $Y_i = \beta_0 + \beta^*_1 X_i + \beta_2 X^*_i + u_i$ and apply the t-test of $\beta^*_1 = 0$

Confidence intervals for the effect are obtained as $\Delta \hat{Y} \pm z_{\alpha/2} \cdot \text{s.e.}(\Delta \hat{Y})$ or $b^*_1 \pm z_{\alpha/2} \cdot \text{s.e.}(b^*_1)$
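
A numpy sketch of Method 1, computing the effect at a level $x_0$ together with its delta-method standard error (function and variable names are illustrative):

```python
import numpy as np

def quad_effect(y, x, x0, dx=1.0):
    """Effect of a change dx at level x0 in Y = b0 + b1 X + b2 X^2 + u,
    together with its standard error (Method 1)."""
    N = len(y)
    X = np.column_stack([np.ones(N), x, x**2])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    V = (e @ e) / (N - 3) * np.linalg.inv(X.T @ X)   # s^2 (X'X)^{-1}
    effect = (b[1] + 2 * b[2] * x0) * dx
    g = np.array([0.0, 1.0, 2 * x0]) * dx            # gradient w.r.t. (b0, b1, b2)
    return effect, np.sqrt(g @ V @ g)
```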


b) Logarithmic transformation

Three possible specifications:

$$ \begin{aligned} \text{log-linear: } && \log(Y_i) &= \beta_0 + \beta_1 X_i + u_i \\ \text{linear-log: } && Y_i &= \beta_0 + \beta_1 \log(X_i) + u_i \\ \text{log-log: } && \log(Y_i) &= \beta_0 + \beta_1 \log(X_i) + u_i \end{aligned} $$

Note that in the log-linear model

$$ \beta_1 = \frac{d \log(Y)}{d X} = \underbrace{\frac{1}{Y}}_{\text{outer}} \cdot \underbrace{\frac{d Y}{d X}}_{\text{inner}} = \frac{d Y / Y}{d X} $$

where $dY/Y$ indicates the relative change

In a similar manner it can be shown that for the log-log model $\beta_1 = (dY/Y)/(dX/X)$ is the elasticity

Note that the derivative refers to a small change. Exact:

$$ \frac{Y_1 - Y_0}{Y_0} = e^{\beta_1 \Delta X} - 1 $$

where $\log(Y_0) = \beta_0 + \beta_1 X$ and $\log(Y_1) = \beta_0 + \beta_1 (X + \Delta X)$.

For small $\Delta X$ we have $(Y_1 - Y_0)/Y_0 \approx \beta_1 \Delta X$
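
A quick numerical check of the approximation (hypothetical values):

```python
import numpy as np

beta1, dX = 0.05, 2.0
print(beta1 * dX)               # approximate relative change: 0.100
print(np.exp(beta1 * dX) - 1)   # exact relative change: about 0.105
```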


c) Interaction effects

Interaction terms are products of regressors:

$$ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i $$

where $X_{1i}, X_{2i}$ may be discrete or continuous

Note that we can also write the model with an interaction term as

$$ Y_i = \beta_0 + \beta_1 X_{1i} + \underbrace{(\beta_2 + \beta_3 X_{1i})}_{\text{effect depends on } X_{1i}} X_{2i} + u_i $$

If $X_{2i}$ is discrete (a dummy variable), the coefficient of $X_{1i}$ differs between $X_{2i} = 1$ and $X_{2i} = 0$

The standard error of the effect of $X_{2i}$ depends on $X_{1i}$:

$$ Y_i = \beta_0 + \beta_1 X_{1i} + \beta^*_2 X_{2i} + \beta_3 (X_{1i} - \overline{X}_{1i}) X_{2i} + u_i $$

where $\beta^*_2 = \beta_2 + \beta_3 \overline{X}_{1i}$ and $\overline{X}_{1i}$ is a fixed value of $X_{1i}$.


d) Nonlinear least-squares (NLS)

Assume a nonlinear relationship between $Y_i$ and $X_i$ in which the parameters enter nonlinearly

$$ Y_i = f(X_i, \beta) + u_i $$

Example:

$$ f(X_i, \beta) = \beta_1 + \beta_2 X_i^{\beta_3} $$

Assuming i.i.d. normally distributed errors, the maximum likelihood principle results in minimizing the sum of squared residuals:

$$ SSR(\beta) = \sum_{i=1}^{N} \left( Y_i - f(X_i, \beta) \right)^2 $$

The SSR can be minimized using iterative algorithms (e.g. the Gauss-Newton method)

The Gauss-Newton method requires the first derivative of the function $f(X_i, \beta)$ with respect to $\beta$.
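
A scipy sketch of NLS for the example above (simulated data, hypothetical parameter values); note that `scipy.optimize.curve_fit` minimizes the SSR with a Levenberg-Marquardt type algorithm, a damped variant of Gauss-Newton:

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, b1, b2, b3):
    """Nonlinear mean function f(X, beta) = b1 + b2 * X^b3."""
    return b1 + b2 * x**b3

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=200)
y = f(x, 1.0, 2.0, 0.5) + rng.normal(scale=0.5, size=200)

beta_hat, cov = curve_fit(f, x, y, p0=[1.0, 1.0, 1.0])
print(beta_hat)                  # NLS estimates of (b1, b2, b3)
print(np.sqrt(np.diag(cov)))     # asymptotic standard errors
```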