8  Regression Analysis

8.1 Data Collection

Many datasets are provided via the WWW:

  • Excel/CSV files provided by an organisation (Bundesbank, ECB, Statistisches Bundesamt, Eurostat …)
  • Application programming interface (API): e.g. the FRED database
  • Data scraping (extract data from HTML code using R or Python)

CSV (Comma-separated values) is the most common format

Check the data for missing values and errors

Tidy data format (variables in columns, obs. in rows)

Compute descriptive statistics (mean, std.dev, min/max, distribution)

Report sufficient info on the data source (for replication)
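
A minimal Python sketch of these checks, assuming pandas; the file name "data.csv" and its columns are placeholders:

```python
# Minimal sketch: load a CSV, check for missing values, compute descriptive stats.
# "data.csv" is a placeholder file name.
import pandas as pd

df = pd.read_csv("data.csv")   # CSV is the most common format
print(df.shape)                # tidy format: observations in rows, variables in columns
print(df.isna().sum())         # missing values per variable
print(df.describe())           # mean, std. dev., min/max, quartiles
```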

8.2 Data Preparation

Assess the quality of the data source

Transform text into numerical values (dummy variables)

Plausibility checks / descriptive statistics

The data set may contain missing values (coded as ‘NA’, dots, or blanks)

Few NAs: just ignore them (the affected rows will be dropped)

When many observations would be lost: imputation (replace NAs by estimated values)

a) Multiple Imputation: Assume that x_{k,t} is missing. For available observations run the regression

x_{k,t} = \gamma_0 + \sum_{j=1}^{k-1} \gamma_j x_{j,t} + \epsilon_t \Rightarrow replace the missing values by \hat{x}_{k,t}.

For missing values in several regressors: use an iterative approach

A maximum-likelihood approach is available for efficient imputation
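
A minimal sketch of the regression-based imputation step for a single regressor, assuming pandas/numpy; the function name and column arguments are hypothetical (target is the column with NAs, predictors a list of the other regressors):

```python
# Sketch: replace NAs in one column by fitted values from a regression on the
# other regressors (single imputation step; column names are hypothetical).
import numpy as np
import pandas as pd

def impute_by_regression(df, target, predictors):
    complete = df.dropna(subset=[target] + predictors)
    X = np.column_stack([np.ones(len(complete)), complete[predictors].to_numpy()])
    gamma, *_ = np.linalg.lstsq(X, complete[target].to_numpy(), rcond=None)

    miss = df[target].isna() & df[predictors].notna().all(axis=1)
    X_miss = np.column_stack([np.ones(miss.sum()), df.loc[miss, predictors].to_numpy()])
    df.loc[miss, target] = X_miss @ gamma   # replace missing values by the fitted x_hat
    return df
```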

8.3 OLS estimator

OLS: ordinary least-squares estimator

b = \underset{\beta}{\text{argmin}} \left\{ (y - X\beta)^\prime (y - X\beta) \right\}

yields the least-squares estimator:

b = \color{red} {(X^\prime X)^{-1} X^\prime y}


Unbiased estimator for \sigma^2: (note that X' e = 0)

s^2 = \frac{1}{N - K} (y - Xb)^\prime (y - Xb)

Maximum-Likelihood (ML) estimator

Log-likelihood function assuming normal distribution:

\begin{align*} \ell(\beta, \sigma^2) &= \ln L(\beta, \sigma^2) = \ln \left[ \prod_{i=1}^{N} f(u_i) \right] \\ &= -\frac{N}{2} \ln 2\pi -\frac{N}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \color{blue}{(y - X\beta)^\prime (y - X\beta)} \end{align*}

The ML and OLS estimators of \beta are identical under normality

ML estimator for \sigma^2:

\tilde{\sigma}^2 = \frac{1}{N} (y - Xb)^\prime (y - Xb)

Goodness of fit:

\color{blue}{R^2} \color{black}{ = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS} =} \quad \color{blue}{1 - \frac{e^\prime e}{y^\prime y - N\bar{y}^2}} \quad \color{black}{=} \quad \color{red}{r^2_{xy}}


adjusted R^2:

\bar{R}^2 = 1 - \frac{e^\prime e/(N - K)} {(y^\prime y - N\bar{y}^2)/(N - 1)}
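
A numpy sketch of the estimators above (b, s^2, R^2 and adjusted R^2), assuming X already contains a constant:

```python
# Sketch of the OLS formulas: b = (X'X)^{-1} X'y, s^2, R^2 and adjusted R^2.
import numpy as np

def ols(y, X):
    N, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares estimator
    e = y - X @ b                           # residuals (X'e = 0)
    s2 = (e @ e) / (N - K)                  # unbiased estimator of sigma^2
    tss = y @ y - N * y.mean() ** 2         # y'y - N * ybar^2
    r2 = 1 - (e @ e) / tss
    r2_adj = 1 - (e @ e / (N - K)) / (tss / (N - 1))
    return b, s2, r2, r2_adj
```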

8.4 Properties of the OLS estimator

a) Expectation \quad [note that b = \color{red}{\beta} \color{black}{+ \underbrace{(X'X)^{-1}X'u}_{\color{blue}{\text{estimation error}}}}]

\begin{align*} & \color{blue}{E(b) = \beta} & \\ & E(s^2) = \sigma^2 & \\ & E(\tilde{\sigma}^2) = \sigma^2 (N - K)/N & \end{align*}


b) Distribution \quad assuming u \sim \mathcal{N}(0, \sigma^2 I_N)

\color{blue}{b \sim \mathcal{N}(\beta, \Sigma_b)}\color{black}{, \quad \Sigma_b = \sigma^2 (X'X)^{-1}}

\frac{(N-K)\, s^2}{\sigma^2} \sim \chi^2_{N-K}

c) Efficiency

b is BLUE

under normality: b and s^2 are MVUE
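
A small Monte Carlo sketch illustrating E(b) = \beta and the \chi^2_{N-K} result for s^2; the design and parameter values are made up for illustration:

```python
# Sketch: simulate the model repeatedly and check E(b) = beta and that
# (N-K) s^2 / sigma^2 has mean N - K (the mean of a chi^2_{N-K} variable).
import numpy as np

rng = np.random.default_rng(0)
N, K, sigma2 = 100, 3, 2.0
beta = np.array([1.0, 0.5, -0.3])
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])

b_draws, q_draws = [], []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=N)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = (e @ e) / (N - K)
    b_draws.append(b)
    q_draws.append((N - K) * s2 / sigma2)

print(np.mean(b_draws, axis=0))   # close to beta (unbiasedness)
print(np.mean(q_draws), N - K)    # close to N - K
```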

8.5 Testing Hypotheses

Significance level or size of a test (Type I error)

P(|t_k| \geq c_{\alpha/2} | \color{red}{\beta = \beta_0}\color{black}{) = \alpha^*}

where \color{red}{\alpha} is the nominal size and \color{blue}{\alpha^*} is the actual size

a test is exact (controls the size) if \alpha^* = \alpha

a test is asymptotically valid if \alpha^* \rightarrow \alpha for N \rightarrow \infty

Power of the test (one minus the probability of a type II error): P(|t_k| \geq c_{\alpha/2} | \color{red}{\beta = \beta^1}\color{black}{) = \pi(\beta^1)}

a test is consistent if

\pi(\beta^1) \rightarrow 1 \quad \text{for all} \quad \beta^1 \neq \beta_0

The conventional significance level is \color{red}{\alpha = 0.05} for a moderate sample size (N \in [50, 500], say)


a test is uniformly most powerful (UMP) if \color{red}{\pi(\beta) \geq \pi^*(\beta)} \quad \color{black}{\text{for all} \quad \beta \neq \beta^0}, where \pi^*(\beta) denotes the power function of any other unbiased test.

\Rightarrow The one-sided t-test is UMP but in many cases there does not exist a UMP test.

The p-value (or marginal significance level) is defined as

\text{p-value} = P(t_k \geq \bar{t}_k | \beta = \beta^0) = 1 - F_0(\bar{t}_k)

that is, the probability of observing a value of the test statistic at least as large as the observed value \bar{t}_k.

Under the null hypothesis the p-value is uniformly distributed on [0, 1]. Since it is a random variable, it is NOT a probability; in particular, it is not the probability that the null hypothesis is correct.
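
A small sketch of the p-value computation for an observed t statistic, assuming scipy; the numerical values are illustrative only:

```python
# Sketch: p-value of an observed t statistic under H0, using the t distribution
# with N - K degrees of freedom (values are illustrative).
from scipy import stats

t_obs, dof = 2.1, 96
p_one = stats.t.sf(t_obs, df=dof)            # 1 - F_0(t_obs), as defined above
p_two = 2 * stats.t.sf(abs(t_obs), df=dof)   # two-sided version for |t_k|
print(p_one, p_two)                          # reject at level alpha if p-value < alpha
```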


Testing general linear hypotheses on \beta

J linear hypotheses on \beta represented by

H_0 : \quad \color{blue}{R\beta = q}\color{black}{, \quad R: J \times K, \; q: J \times 1}

Wald statistic: under H_0, \quad Rb - q \sim \mathcal{N}\left(0, \sigma^2 R(X'X)^{-1}R' \right)

if \sigma^2 is known:

\frac{1}{\sigma^2} (Rb - q)' [R(X'X)^{-1}R']^{-1} (Rb - q) \sim \chi^2_J

if \sigma^2 is replaced by s^2:

\begin{align*} F &= \frac{1}{Js^2} (Rb - q)' [R(X'X)^{-1}R']^{-1} (Rb - q) = \frac{N - K}{J}\; \color{blue}{\frac{(e_r'e_r - e'e)}{e'e}} \\ &\sim \frac{\chi^2_J/J}{\chi^2_{N-K}/(N - K)} \equiv \color{red}{F^J_{N-K}} \end{align*}
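
A numpy/scipy sketch of the F statistic in its Wald form, with the p-value from the F(J, N-K) distribution; the inputs y, X, R, q are placeholders:

```python
# Sketch: F test of H0: R beta = q,
# F = (Rb - q)' [R (X'X)^{-1} R']^{-1} (Rb - q) / (J s^2).
import numpy as np
from scipy import stats

def f_test(y, X, R, q):
    N, K = X.shape
    J = R.shape[0]
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = (e @ e) / (N - K)
    V = np.linalg.inv(X.T @ X)
    d = R @ b - q
    F = d @ np.linalg.solve(R @ V @ R.T, d) / (J * s2)
    return F, stats.f.sf(F, J, N - K)   # statistic and p-value
```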


Alternatives to the F statistic

Generalized LR test: GLR = 2 \left( \ell(\hat{\theta}) - \ell(\hat{\theta_r}) \right) = N (\log e'_r e_r - \log e'e) \sim \chi^2_J

\Rightarrow first order Taylor expansion yields the Wald/F statistic

LM (score) test: Define the “score vector” as s(\hat{\theta}_r) = \left. \frac{\partial \log L(\theta)}{\partial \theta} \right|_{\theta=\hat{\theta}_r} = \frac{1}{\hat{\sigma}^2_r} X' e_r

The LM test statistic is given by \text{LM} = s(\hat{\theta_r})' I(\hat{\theta_r})^{-1} s(\hat{\theta_r}) \sim \chi^2_J

where I(\hat{\theta_r}) is some estimate of the information matrix

In the regression context the LM statistic can be obtained by testing \gamma = 0 in the auxiliary regression 1 = \gamma' s_i(\hat{\theta}_r) + \nu_i \Rightarrow uncentered R^2: R^2_u = N\,\bar{s}' \left(\sum_i s_i s'_i\right)^{-1} \bar{s} and N \cdot R^2_u \stackrel{a}{\sim} \chi^2_J
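
A numpy sketch of the N \cdot R^2_u form, assuming a matrix of per-observation scores evaluated at the restricted estimates is available; the function name is hypothetical:

```python
# Sketch: LM statistic as N times the uncentered R^2 of regressing a vector of
# ones on the per-observation scores s_i (OPG form).
import numpy as np

def lm_statistic(scores):            # scores: N x p matrix, row i = s_i(theta_r_hat)'
    N = scores.shape[0]
    ones = np.ones(N)
    g, *_ = np.linalg.lstsq(scores, ones, rcond=None)
    fitted = scores @ g
    r2_u = (fitted @ fitted) / N     # uncentered R^2
    return N * r2_u                  # asymptotically chi^2_J under H0
```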



8.5.1 Specification tests


a) Test for Heteroskedasticity (Breusch-Pagan / Koenker)

variance function: \sigma^2_i = \alpha_0 + \color{red}{z'_i \alpha}

since E(\hat{u}^2_i) \approx \sigma^2_i, estimate the regression \hat{u}^2_i = \alpha_0 + z'_i \alpha + \nu_i \Rightarrow F or LM test statistic for H_0: \color{red}{\alpha = 0}

in practice z_i = x_i, often augmented with cross-products and squares of the regressors (White test)
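
A numpy/scipy sketch of the auxiliary regression in its LM form (N times the R^2 of regressing \hat{u}^2_i on z_i), one common implementation of the Breusch-Pagan/Koenker test:

```python
# Sketch: regress squared OLS residuals on z_i and use N * R^2 as an LM
# statistic, asymptotically chi^2 with dim(z) degrees of freedom under H0.
import numpy as np
from scipy import stats

def breusch_pagan(e, Z):             # e: OLS residuals, Z: N x q matrix of z_i'
    N = len(e)
    W = np.column_stack([np.ones(N), Z])
    g, *_ = np.linalg.lstsq(W, e**2, rcond=None)
    v = e**2 - W @ g
    r2 = 1 - (v @ v) / np.sum((e**2 - np.mean(e**2))**2)
    LM = N * r2
    return LM, stats.chi2.sf(LM, Z.shape[1])
```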

robust (White) standard errors: replace the invalid formula Var(b) = \sigma^2(X'X)^{-1} by the estimator \widehat{Var}(b) = (X'X)^{-1} \left( \sum_{i=1}^{n} \color{red}{\hat{u}^2_i}\color{black}{ x_i x'_i} \right) (X'X)^{-1}
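
A numpy sketch of this sandwich estimator (HC0/White form), assuming X contains a constant:

```python
# Sketch: heteroskedasticity-robust (White) covariance matrix
# (X'X)^{-1} (sum_i u_hat_i^2 x_i x_i') (X'X)^{-1} and the implied standard errors.
import numpy as np

def white_se(y, X):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * (e**2)[:, None]).T @ X    # sum_i u_hat_i^2 x_i x_i'
    V = XtX_inv @ meat @ XtX_inv          # sandwich estimator
    return b, np.sqrt(np.diag(V))
```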

b) Tests for Autocorrelation

(i) Durbin-Watson test: dw = \frac{ \sum_{t=2}^{N} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{N} \hat{u}^2_t} \approx 2(1 - \color{red}{\hat{\rho}}\color{black}{)}. Problem: the distribution depends on X \Rightarrow only bounds with an inconclusive region

(ii) Breusch-Godfrey Test: u_t = \color{blue}{\rho_1 u_{t-1} + \dots + \rho_m u_{t-m}} \color{black}{+ v_t}

replace u_t by \hat{u}_t, include x_t to control for the estimation error in u_t, and test H_0: \color{red}{\rho_1 = \dots = \rho_m = 0}

(iii) Box-Pierce test: Q_m = T \sum_{j=1}^{m} \color{red}{\hat{\rho}_j}\color{black}{^2 \stackrel{a}{\sim} \chi^2_m}, a test for autocorrelation up to lag order m
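
A numpy/scipy sketch of the Durbin-Watson and Box-Pierce statistics computed from OLS residuals (the Breusch-Godfrey test would instead add lagged residuals and x_t to an auxiliary regression):

```python
# Sketch: Durbin-Watson statistic and Box-Pierce Q statistic from residuals.
import numpy as np
from scipy import stats

def durbin_watson(e):
    return np.sum(np.diff(e)**2) / np.sum(e**2)   # approx. 2 * (1 - rho_hat)

def box_pierce(e, m):
    T = len(e)
    u = e - e.mean()
    rho = np.array([np.sum(u[j:] * u[:-j]) for j in range(1, m + 1)]) / np.sum(u**2)
    Q = T * np.sum(rho**2)
    return Q, stats.chi2.sf(Q, m)                 # asymptotically chi^2_m under H0
```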


HAC standard errors:

Heteroskedasticity and Autocorrelation Consistent standard errors (Newey/West 1987)

standard errors that account for autocorrelation up to lag h (truncation lag)

“Rule of thumb” for choosing h (e.g. EViews/Gretl): h = int[4 (T/100)^{2/9}]
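
The rule of thumb as a one-line computation (T = 200 is illustrative):

```python
# Sketch: Newey-West truncation lag rule of thumb h = int[4 (T/100)^{2/9}].
T = 200
h = int(4 * (T / 100) ** (2 / 9))
print(h)   # 4 for T = 200
```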


Relationship between autocorrelation and dynamic models:

Substituting u_t = \rho u_{t-1} + v_t into y_t = \beta' x_t + u_t yields

y_t = \rho y_{t-1} + \beta' x_t \underbrace{- \rho \beta'}_{\gamma'} x_{t-1} + v_t \quad \Rightarrow \quad \text{Common factor restriction: } \gamma = -\rho\beta


Test for normality

The asymptotic properties of the OLS estimator do not depend on the validity of the normality assumption

Deviations from the normal distribution are only relevant in very small samples

Outliers may be modeled by mixing distributions

Tests for normality are very sensitive to outliers

Under the null hypothesis E(u^3_i) = 0 and E(u^4_i) = 3\sigma^4

Jarque-Bera test statistic: JB = T \left[ \color{blue}{\frac{1}{6} \hat{m}_3^2} \color{black}{ + } \color{red}{\frac{1}{24} (\hat{m}_4 - 3)^2} \right] \stackrel{d}{\to} \chi^2_2

where \hat{m}_3 = \frac{1}{T \hat{\sigma}^3} \sum_{t=1}^{T} \hat{u}^3_t \quad \quad \hat{m}_4 = \frac{1}{T\hat{\sigma}^4} \sum_{t=1}^{T} \hat{u}^4_t

Other tests: \chi^2 and Kolmogorov-Smirnov Test
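
A numpy/scipy sketch of the Jarque-Bera statistic using the moment estimates defined above:

```python
# Sketch: Jarque-Bera test from residuals, JB = T * [m3^2/6 + (m4 - 3)^2/24].
import numpy as np
from scipy import stats

def jarque_bera(e):
    T = len(e)
    u = e - e.mean()
    sigma = np.sqrt(np.mean(u**2))
    m3 = np.mean(u**3) / sigma**3          # skewness estimate
    m4 = np.mean(u**4) / sigma**4          # kurtosis estimate
    JB = T * (m3**2 / 6 + (m4 - 3)**2 / 24)
    return JB, stats.chi2.sf(JB, 2)        # asymptotically chi^2_2 under H0
```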



8.6 Nonlinear regression models

a) Polynomial regression

including squares, cubic etc. transformations of the regressors: Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_p X_i^p + u_i

where p is the degree of the polynomial (typically p = 2)

Interpretation (for p = 2)

\begin{align*} \frac{\partial Y}{\partial X} &= \beta_1 + 2\beta_2X \\ \Rightarrow \Delta Y &\approx (\beta_1 + 2\beta_2X) \color{red}{\Delta X} \\ \text{exact: } \Delta Y &= \beta_1\Delta X + \beta_2(X + \Delta X)^2 - \beta_2X^2 \\ &= (\beta_1 + 2\beta_2X) \color{red}{\Delta X} \color{black}{+ \beta_2(\Delta X)^2} \end{align*}

\Rightarrow the effect on Y depends on the level of X

for small changes in X the derivative provides a good approximation


Computing standard errors for the nonlinear effect:

Method 1:

\begin{align*} \text{s.e.}\left( \Delta \hat{Y} \right) &= \sqrt{ \text{var}(b_1) + 4X^2 \text{var}(b_2) + 4X \text{cov}(b_1, b_2) } \\ &= |\Delta \hat{Y}| / \sqrt{F} \end{align*}

where F is the F statistic for the test of E(\Delta \hat{Y}) = \beta_1 + 2X\beta_2 = 0 (the effect for \Delta X = 1)


Method 2:

Y_i = \beta_0 + \underbrace{(\beta_1 + 2X\beta_2)}_{\beta^*_1} X_i + \beta_2 \underbrace{ \left(1 - 2\frac{X}{X_i}\right)X^2_i}_{X^*_i} + u_i

Regression Y_i = \beta_0 + \beta^*_1 X_i + \beta^*_2 X^*_i + u_i and t-test of \beta^*_1 = 0

Confidence intervals for the effect are obtained as \Delta \hat{Y} \pm z_{\alpha/2} \cdot \text{s.e.}(\Delta \hat{Y}) or b^*_1 \pm z_{\alpha/2} \cdot \text{s.e.}(b^*_1)
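
A numpy sketch of Method 1: the marginal effect \beta_1 + 2\beta_2 X and its delta-method standard error, given coefficient estimates and their covariance matrix (argument names are hypothetical):

```python
# Sketch: effect of a change dX at level X in the quadratic model and its
# standard error; b = (b0, b1, b2), V = estimated covariance matrix of b.
import numpy as np

def quadratic_effect(b, V, X, dX=1.0):
    effect = (b[1] + 2 * b[2] * X) * dX                      # approximate Delta Y
    var = V[1, 1] + 4 * X**2 * V[2, 2] + 4 * X * V[1, 2]     # var(b1 + 2X b2)
    se = np.sqrt(var) * dX
    return effect, se, (effect - 1.96 * se, effect + 1.96 * se)  # 95% CI
```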


Logarithmic transformation

Three possible specifications:

\begin{align*} \text{log-linear: } & & \color{blue}{\log} \color{black}{(Y_i)} & = \beta_0 + \beta_1X_i + u_i \\ \text{linear-log: } & & Y_i & = \beta_0 + \beta_1\color{blue}{\log} \color{black}{(X_i)} + u_i \\ \text{log-log: } & & \color{blue}{\log}\color{black}{(Y_i)} & = \beta_0 + \beta_1\color{blue}{\log}\color{black}{(X_i)} + u_i \end{align*}

Note that in the log-linear model

\beta_1 = \frac{d \log(Y)}{d X} = \underbrace{\frac{1}{Y}}_{outer} \cdot \underbrace{\frac{d Y}{d X}}_{inner} = \frac{d Y/Y}{d X} where dY/Y indicates the relative change

In a similar manner it can be shown that for the log-log model \beta_1 = (dY/Y)/(dX/X) is the elasticity

Note that the derivative refers to a small change. Exact:

\frac{Y_1 - Y_0}{Y_0} = e^{\beta_1 \Delta X} - 1

where log(Y_0) = \beta_0 + \beta_1X and log(Y_1) = \beta_0 + \beta_1(X + \Delta X).

For small \Delta X we have (Y_1-Y_0)/Y_0 \approx \beta_1\Delta X
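
A short sketch comparing the exact and approximate relative change in the log-linear model (parameter values are illustrative):

```python
# Sketch: exact relative change exp(beta_1 * dX) - 1 vs. approximation beta_1 * dX.
import numpy as np

beta1, dX = 0.05, 2.0
exact = np.exp(beta1 * dX) - 1   # (Y1 - Y0) / Y0
approx = beta1 * dX              # small-change approximation
print(exact, approx)             # about 0.105 vs. 0.100
```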


Interaction effects

Interaction terms are products of regressors:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i where X_{1i}, X_{2i} may be discrete or continuous

Note that we can also write the model with interaction term as

Y_i = \beta_0 + \beta_1 X_{1i} + \underbrace{ \left( \color{red}{\beta_2 + \beta_3 X_{1i}} \right)}_{\text{effect depends on } X_{1i}} X_{2i} + u_i

If X_{2i} is discrete (a dummy), the coefficient on X_{1i} differs between X_{2i} = 1 (\beta_1 + \beta_3) and X_{2i} = 0 (\beta_1)

The standard error of the effect of X_{2i} depends on the level of X_{1i}:

Y_i = \beta_0 + \beta_1 X_{1i} + \color{red}{\beta^*_2}\color{black}{ X_{2i} } + \beta_3 \color{blue}{(X_{1i} - \overline{X}_{1}) X_{2i}} \color{black}{ + u_i} where \beta^*_2 = \beta_2 + \beta_3 \overline{X}_{1} and \overline{X}_{1} is a fixed value of X_{1i}.
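
A small simulated sketch of this re-centering trick: after replacing X_{1i} by (X_{1i} - \overline{X}_1) in the interaction term, the coefficient on X_{2i} equals the effect \beta_2 + \beta_3 \overline{X}_1 at the chosen value, so its usual standard error applies (data and parameter values are made up for illustration):

```python
# Sketch: re-centered interaction regression; the coefficient on x2 estimates
# beta_2 + beta_3 * x1_bar (here 0.8 + 0.3 * 1.0 = 1.1 by construction).
import numpy as np

rng = np.random.default_rng(1)
N = 200
x1 = rng.normal(size=N)
x2 = rng.binomial(1, 0.5, size=N)
y = 1 + 0.5 * x1 + 0.8 * x2 + 0.3 * x1 * x2 + rng.normal(size=N)

x1_bar = 1.0                                            # evaluation point for X1
X = np.column_stack([np.ones(N), x1, x2, (x1 - x1_bar) * x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b[2])                                             # estimate of beta_2 + beta_3 * x1_bar
```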


Nonlinear least-squares (NLS)

Assume a nonlinear relationship between Y_i and X_i where the parameters enter nonlinearly

Y_i = f(X_i,\beta) + u_i

Example:

f(X_i, \beta) = \beta_1 + \beta_2 X^{\color{red}{\beta_3} }_i

Assuming i.i.d. normally distributed errors, the maximum likelihood principle results in minimizing the sum of squared residuals:

SSR(\beta) = \sum_{i=1}^{n} \left( Y_i - f(X_i, \beta) \right)^2

The SSR can be minimized using iterative algorithms (e.g. the Gauss-Newton method)

The Gauss-Newton method requires the first derivative of the function f(X_i,\beta) with respect to \beta.
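
A numpy sketch of a few Gauss-Newton steps for the example f(X, \beta) = \beta_1 + \beta_2 X^{\beta_3} (a hand-rolled illustration; in practice routines such as scipy.optimize.curve_fit are used). The Jacobian holds the required first derivatives of f with respect to \beta; X > 0 is assumed:

```python
# Sketch: Gauss-Newton iterations for NLS with f(X, b) = b1 + b2 * X**b3.
import numpy as np

def f(X, b):
    return b[0] + b[1] * X**b[2]

def jac(X, b):
    # derivatives of f with respect to beta_1, beta_2, beta_3 (requires X > 0)
    return np.column_stack([np.ones_like(X), X**b[2], b[1] * X**b[2] * np.log(X)])

def gauss_newton(y, X, b0, steps=50):
    b = np.asarray(b0, dtype=float)
    for _ in range(steps):
        r = y - f(X, b)                             # current residuals
        J = jac(X, b)
        b = b + np.linalg.solve(J.T @ J, J.T @ r)   # Gauss-Newton update
    return b                                        # approximately minimizes SSR(beta)
```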