Data Collection
Many datasets are provided via the WWW:
Excel/CSV files provided by organisations (Bundesbank, ECB, Statistisches Bundesamt, Eurostat …)
Application programming interface (API): e.g. the FRED database
Web scraping (extracting data from HTML code using R or Python)
CSV (Comma-separated values) is the most common format
Checking data for missing values and errors
Tidy data format (variables in columns, obs. in rows)
Compute descriptive statistics (mean, std.dev, min/max, distribution)
Report sufficient info on the data source (for replication)
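A minimal pandas sketch of these steps (the file name and column names are hypothetical placeholders):

```python
import pandas as pd

# hypothetical CSV file, e.g. downloaded from a statistical office's website
df = pd.read_csv("interest_rates.csv", na_values=["NA", ".", ""])

# check for missing values and obvious errors
print(df.isna().sum())     # number of NAs per column
print(df.describe())       # mean, std. dev., min/max, quartiles

# tidy format: one variable per column, one observation per row
df = df.rename(columns={"DATE": "date", "VALUE": "rate"})
```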
Data Preparation
Assess the quality of the data source
Transform text into numerical values (dummy variables)
Plausibility checks / descriptive statistics
the data set may contain missing values ('NA', dots, blanks)
if there are only a few NAs: just ignore them (the affected rows are dropped)
if many observations would be lost: imputation (replace the NAs by estimated values)
a) Multiple imputation: Assume that $x_{k,t}$ is missing. For the available observations run the regression
$$x_{k,t} = \gamma_0 + \sum_{j=1}^{k-1} \gamma_j x_{j,t} + \epsilon_t$$
$\Rightarrow$ replace the missing values by $\hat{x}_{k,t}$.
For missing values in several regressors: iterative approach
A maximum-likelihood approach is available for efficient imputation
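The regression imputation described in a) could be sketched as follows, assuming a pandas DataFrame with user-supplied column names (`target` and `predictors` are placeholders); this is a single-pass version, not a full multiple-imputation procedure:

```python
import pandas as pd
import statsmodels.api as sm

def impute_by_regression(df, target, predictors):
    """Replace NAs in `target` by fitted values from an OLS regression
    on `predictors`, estimated on the complete cases."""
    complete = df.dropna(subset=[target] + predictors)
    X = sm.add_constant(complete[predictors])
    fit = sm.OLS(complete[target], X).fit()

    # rows where the target is missing but all predictors are observed
    missing = df[target].isna() & df[predictors].notna().all(axis=1)
    X_miss = sm.add_constant(df.loc[missing, predictors], has_constant="add")
    df.loc[missing, target] = fit.predict(X_miss)
    return df
```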
OLS estimator
OLS: ordinary least-squares estimator
$$b = \underset{\beta}{\operatorname{argmin}} \left\{ (y - X\beta)'(y - X\beta) \right\}$$
yields the least-squares estimator:
$$b = (X'X)^{-1} X'y$$
Unbiased estimator for $\sigma^2$ (note that $X'e = 0$):
$$s^2 = \frac{1}{N - K}\,(y - Xb)'(y - Xb)$$
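A minimal NumPy sketch of the OLS formulas above, with simulated data for illustration (all names are illustrative):

```python
import numpy as np

def ols(y, X):
    """OLS estimator b = (X'X)^{-1} X'y and unbiased variance estimate s^2."""
    N, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)   # numerically preferable to an explicit inverse
    e = y - X @ b                           # residuals, X'e = 0 by construction
    s2 = (e @ e) / (N - K)
    return b, s2

# illustration with simulated data (true beta = [1, 0.5])
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=100)
b, s2 = ols(y, X)
```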
Maximum-Likelihood (ML) estimator
Log-likelihood function assuming normal distribution:
$$\ell(\beta, \sigma^2) = \ln L(\beta, \sigma^2) = \ln\left[\prod_{i=1}^{N} f(u_i)\right] = -\frac{N}{2}\ln 2\pi - \frac{N}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\,(y - X\beta)'(y - X\beta)$$
The ML and OLS estimators of $\beta$ are identical under normality
ML estimator for $\sigma^2$:
$$\tilde{\sigma}^2 = \frac{1}{N}\,(y - Xb)'(y - Xb)$$
Goodness of fit:
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS} = 1 - \frac{e'e}{y'y - N\bar{y}^2} = r^2_{xy}$$
adjusted $R^2$:
$$\bar{R}^2 = 1 - \frac{e'e/(N - K)}{(y'y - N\bar{y}^2)/(N - 1)}$$
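Continuing the NumPy sketch above, $R^2$ and $\bar{R}^2$ could be computed as follows (a sketch, assuming `y`, `X`, `b` from the OLS code):

```python
import numpy as np

def r_squared(y, X, b):
    """R^2 = 1 - e'e / (y'y - N*ybar^2) and the adjusted version."""
    N, K = X.shape
    e = y - X @ b
    tss = y @ y - N * y.mean() ** 2
    ssr = e @ e
    r2 = 1 - ssr / tss
    r2_adj = 1 - (ssr / (N - K)) / (tss / (N - 1))
    return r2, r2_adj
```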
Properties of the OLS estimator
a) Expectation [note that $b = \beta + \underbrace{(X'X)^{-1}X'u}_{\text{estimation error}}$]
$$E(b) = \beta, \qquad E(s^2) = \sigma^2, \qquad E(\tilde{\sigma}^2) = \sigma^2 (N - K)/N$$
b) Distribution, assuming $u \sim \mathcal{N}(0, \sigma^2 I_N)$:
$$b \sim \mathcal{N}(\beta, \Sigma_b), \qquad \Sigma_b = \sigma^2 (X'X)^{-1}$$
$$\frac{(N-K)}{\sigma^2}\, s^2 \sim \chi^2_{N-K}$$
c) Efficiency
$b$ is BLUE (best linear unbiased estimator)
under normality: $b$ and $s^2$ are MVUE (minimum variance unbiased estimators)
Testing Hypotheses
Significance level or size of a test (Type I error)
$$P(|t_k| \geq c_{\alpha/2} \mid \beta = \beta^0) = \alpha^*$$
where $\alpha$ is the nominal size and $\alpha^*$ is the actual size
a test is unbiased (controls the size) if $\alpha^* = \alpha$
a test is asymptotically valid if $\alpha^* \rightarrow \alpha$ for $N \rightarrow \infty$
1 $-$ type II error, or power of the test:
$$P(|t_k| \geq c_{\alpha/2} \mid \beta = \beta^1) = \pi(\beta^1)$$
a test is consistent if
$$\pi(\beta^1) \rightarrow 1 \quad \text{for all} \quad \beta^1 \neq \beta^0$$
The conventional significance level is $\alpha = 0.05$ for a moderate sample size ($N \in [50, 500]$, say)
a test is uniformly most powerful (UMP) if
$$\pi(\beta) \geq \pi^*(\beta) \quad \text{for all} \quad \beta \neq \beta^0$$
where $\pi^*(\beta)$ denotes the power function of any other unbiased test statistic.
$\Rightarrow$ The one-sided t-test is UMP, but in many cases a UMP test does not exist.
The $p$-value (or marginal significance level) is defined as
$$\text{p-value} = P(t_k \geq \bar{t}_k \mid \beta = \beta^0) = 1 - F_0(\bar{t}_k)$$
that is, the probability of observing a value of the test statistic larger than the observed value $\bar{t}_k$.
Under the null hypothesis the $p$-value is uniformly distributed on [0, 1]. Since it is a random variable, it is NOT a probability (that the null hypothesis is correct).
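A small illustration of the two-sided p-value of a t test, assuming an observed statistic and degrees of freedom (the numbers are purely illustrative):

```python
from scipy import stats

t_k = 2.1          # observed t statistic (illustrative value)
df = 96            # N - K degrees of freedom

# two-sided p-value: P(|t| >= |t_k|) under H0
p_value = 2 * (1 - stats.t.cdf(abs(t_k), df))
reject = p_value < 0.05   # conventional 5% level
```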
Testing general linear hypotheses on $\beta$
$J$ linear hypotheses on $\beta$ represented by
$$H_0: \quad R\beta = q, \qquad (J \times 1)$$
Wald statistic:
$$Rb - q \sim \mathcal{N}\left(0,\; \sigma^2 R(X'X)^{-1}R'\right)$$
if $\sigma^2$ is known:
$$\frac{1}{\sigma^2}\,(Rb - q)'\left[R(X'X)^{-1}R'\right]^{-1}(Rb - q) \sim \chi^2_J$$
if $\sigma^2$ is replaced by $s^2$:
$$F = \frac{1}{Js^2}\,(Rb - q)'\left[R(X'X)^{-1}R'\right]^{-1}(Rb - q) = \frac{N - K}{J}\,\frac{e_r'e_r - e'e}{e'e} \sim \frac{\chi^2_J/J}{\chi^2_{N-K}/(N - K)} \equiv F^J_{N-K}$$
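A sketch of the F statistic for $H_0: R\beta = q$, assuming the OLS quantities ($b$, $s^2$, $(X'X)^{-1}$) have already been computed; the function name is illustrative:

```python
import numpy as np
from scipy import stats

def wald_F(b, s2, XtX_inv, R, q, N, K):
    """F statistic for H0: R beta = q (J linear restrictions)."""
    J = R.shape[0]
    d = R @ b - q
    V = s2 * R @ XtX_inv @ R.T                 # estimated Cov(Rb - q)
    F = d @ np.linalg.solve(V, d) / J
    p_value = stats.f.sf(F, J, N - K)          # upper-tail probability of F(J, N-K)
    return F, p_value
```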
Alternatives to the F statistic
Generalized LR test:
$$GLR = 2\left(\ell(\hat{\theta}) - \ell(\hat{\theta}_r)\right) = N\left(\log e_r'e_r - \log e'e\right) \sim \chi^2_J$$
$\Rightarrow$ a first-order Taylor expansion yields the Wald/F statistic
LM (score) test: define the "score vector" as
$$s(\hat{\theta}_r) = \left.\frac{\partial \log L(\theta)}{\partial \theta}\right|_{\theta=\hat{\theta}_r} = \frac{1}{\hat{\sigma}^2_r}\, X'e_r$$
The LM test statistic is given by
$$\text{LM} = s(\hat{\theta}_r)'\, I(\hat{\theta}_r)^{-1}\, s(\hat{\theta}_r) \sim \chi^2_J$$
where $I(\hat{\theta}_r)$ is some estimate of the information matrix
In the regression context the LM statistic can be obtained from testing $\gamma = 0$ in the auxiliary regression
$$1 = \gamma'\, s_i(\hat{\theta}_r) + \nu_i$$
$\Rightarrow$ uncentered $R^2$: $R^2_u = \bar{s}'\left(\sum s_i s_i'\right)^{-1}\bar{s}$, and $N \cdot R^2_u \sim \chi^2_J$
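A sketch of the $N \cdot R^2_u$ version of the LM statistic, assuming a matrix `S` whose rows are the score contributions $s_i(\hat{\theta}_r)$ evaluated at the restricted estimates:

```python
import numpy as np

def lm_test(S):
    """LM statistic N * R_u^2 from regressing a vector of ones on the
    score contributions S (N x dim(theta)); R_u^2 is the uncentered R^2."""
    N = S.shape[0]
    ones = np.ones(N)
    g, *_ = np.linalg.lstsq(S, ones, rcond=None)    # auxiliary regression
    fitted = S @ g
    r2_u = (fitted @ fitted) / (ones @ ones)        # uncentered R^2
    return N * r2_u                                  # compare with chi2(J) critical value
```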
Specification tests
a) Test for Heteroskedasticity (Breusch-Pagan / Koenker)
variance function: $\sigma^2_i = \alpha_0 + z_i'\alpha$
since $E(\hat{u}^2_i) \approx \sigma^2_i$, estimate the regression
$$\hat{u}^2_i = \alpha_0 + z_i'\alpha + \nu_i$$
$\Rightarrow$ $F$ or $LM$ test statistic for $H_0$: $\alpha = 0$
in practice $z_i = x_i$, but also cross-products and squares of the regressors (White test)
robust (White) standard errors: replace the invalid formula $Var(b) = \sigma^2(X'X)^{-1}$ by the estimator
$$\widehat{Var}(b) = (X'X)^{-1}\left(\sum_{i=1}^{N} \hat{u}^2_i\, x_i x_i'\right)(X'X)^{-1}$$
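A sketch using statsmodels, assuming `y` and `X` (including a constant) from the earlier examples; `het_breuschpagan` runs the auxiliary regression of the squared residuals, and `cov_type="HC0"` gives the White covariance estimator:

```python
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

fit = sm.OLS(y, X).fit()

# Breusch-Pagan: regress squared residuals on z_i (here z_i = x_i)
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(fit.resid, X)

# heteroskedasticity-robust (White) standard errors
fit_robust = sm.OLS(y, X).fit(cov_type="HC0")
print(fit_robust.bse)
```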
b) Tests for Autocorrelation
(i) Durbin-Watson test:
$$dw = \frac{\sum_{t=2}^{N} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{N} \hat{u}^2_t} \approx 2(1 - \hat{\rho})$$
Problem: the distribution depends on $X$ $\Rightarrow$ uncertainty range
(ii) Breusch-Godfrey test: $u_t = \rho_1 u_{t-1} + \dots + \rho_m u_{t-m} + v_t$
replace $u_t$ by $\hat{u}_t$, include $x_t$ to control for the estimation error in $\hat{u}_t$, and test $H_0$: $\rho_1 = \dots = \rho_m = 0$
(iii) Box-Pierce test:
$$Q_m = T \sum_{j=1}^{m} \hat{\rho}^2_j \stackrel{a}{\sim} \chi^2_m$$
a test of autocorrelation up to lag order $m$
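All three tests are available in statsmodels; a sketch assuming `fit` is the results object of a time-series OLS regression and $m = 4$ is an illustrative lag order:

```python
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey, acorr_ljungbox

dw = durbin_watson(fit.resid)                          # approx. 2(1 - rho_hat)

# Breusch-Godfrey test up to lag m = 4
bg_lm, bg_pval, bg_f, bg_fpval = acorr_breusch_godfrey(fit, nlags=4)

# Ljung-Box and Box-Pierce statistics up to lag m = 4
lb = acorr_ljungbox(fit.resid, lags=[4], boxpierce=True)
```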
HAC standard errors:
Heteroskedasticity and Autocorrelation Consistent standard errors (Newey/West 1987)
standard errors that account for autocorrelation up to lag $h$ (truncation lag)
"Rule of thumb" for choosing $h$ (e.g. EViews/Gretl):
$$h = \operatorname{int}\left[4\,(T/100)^{2/9}\right]$$
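A sketch of HAC (Newey-West) standard errors in statsmodels, assuming `y` and `X` from a time-series regression; the truncation lag follows the rule of thumb above:

```python
import statsmodels.api as sm

T = len(y)                                   # sample size (time series)
h = int(4 * (T / 100) ** (2 / 9))            # rule-of-thumb truncation lag

fit_hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": h})
print(fit_hac.bse)                           # Newey-West (HAC) standard errors
```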
Relationship between autocorrelation and dynamic models:
Inserting $u_t = \rho u_{t-1} + v_t$ into $y_t = \beta'x_t + u_t$ yields
$$y_t = \rho y_{t-1} + \beta'x_t + \underbrace{(-\rho\beta')}_{\gamma'}\, x_{t-1} + v_t$$
$\Rightarrow$ common factor restriction: $\gamma = -\rho\beta$
Test for normality
The asymptotic properties of the OLS estimator do not depend on the validity of the normality assumption
Deviations from the normal distribution are only relevant in very small samples
Outliers may be modeled by mixing distributions
Tests for normality are very sensitive to outliers
Under the null hypothesis $E(u^3_i) = 0$ and $E(u^4_i) = 3\sigma^4$
Jarque-Bera test statistic:
$$JB = N\left[\frac{1}{6}\,\hat{m}_3^2 + \frac{1}{24}\,(\hat{m}_4 - 3)^2\right] \stackrel{d}{\to} \chi^2_2$$
where
$$\hat{m}_3 = \frac{1}{N\hat{\sigma}^3}\sum_{i=1}^{N} \hat{u}^3_i, \qquad \hat{m}_4 = \frac{1}{N\hat{\sigma}^4}\sum_{i=1}^{N} \hat{u}^4_i$$
Other tests: $\chi^2$ test and Kolmogorov-Smirnov test
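A sketch of the Jarque-Bera statistic computed by hand and via SciPy, assuming `fit.resid` contains the OLS residuals from an earlier example:

```python
import numpy as np
from scipy import stats

u = fit.resid                                # residuals from the earlier OLS fit (assumed)
N = len(u)
sigma = np.sqrt((u @ u) / N)
m3 = np.mean(u ** 3) / sigma ** 3            # standardized third moment (skewness)
m4 = np.mean(u ** 4) / sigma ** 4            # standardized fourth moment (kurtosis)

JB = N * (m3 ** 2 / 6 + (m4 - 3) ** 2 / 24)
p_value = 1 - stats.chi2.cdf(JB, df=2)

# built-in version for comparison
jb_stat, jb_pval = stats.jarque_bera(u)
```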
Nonlinear regression models
a) Polynomial regression
including square, cubic, etc. transformations of the regressors:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_p X_i^p + u_i$$
where $p$ is the degree of the polynomial (typically $p = 2$)
Interpretation (for $p = 2$):
$$\frac{\partial Y}{\partial X} = \beta_1 + 2\beta_2 X \quad\Rightarrow\quad \Delta Y \approx (\beta_1 + 2\beta_2 X)\,\Delta X$$
exact:
$$\Delta Y = \beta_1\Delta X + \beta_2(X + \Delta X)^2 - \beta_2 X^2 = (\beta_1 + 2\beta_2 X)\,\Delta X + \beta_2(\Delta X)^2$$
$\Rightarrow$ the effect on $Y$ depends on the level of $X$
for small changes in $X$ the derivative provides a good approximation
Computing standard errors for the nonlinear effect:
Method 1:
$$\text{s.e.}\left(\Delta\hat{Y}\right) = \sqrt{\operatorname{var}(b_1) + 4X^2\operatorname{var}(b_2) + 4X\operatorname{cov}(b_1, b_2)} = |\Delta\hat{Y}|/\sqrt{F}$$
where $F$ is the $F$ statistic for the test of $E(\Delta\hat{Y}) = \beta_1 + 2X\beta_2 = 0$
Method 2:
$$Y_i = \beta_0 + \underbrace{(\beta_1 + 2X\beta_2)}_{\beta^*_1} X_i + \beta_2 \underbrace{\left(1 - 2\frac{X}{X_i}\right)X^2_i}_{X^*_i} + u_i$$
Regression $Y_i = \beta_0 + \beta^*_1 X_i + \beta^*_2 X^*_i + u_i$ and t-test of $\beta^*_1 = 0$
Confidence intervals for the effect are obtained as $\Delta\hat{Y} \pm z_{\alpha/2}\cdot\text{s.e.}(\Delta\hat{Y})$ or, analogously, $b^*_1 \pm z_{\alpha/2}\cdot\text{s.e.}(b^*_1)$
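A sketch of Method 1 (delta method) for the quadratic model, assuming hypothetical data arrays `X1` and `Y`; the effect is evaluated at the sample mean of `X1`:

```python
import numpy as np
import statsmodels.api as sm

# quadratic regression Y = b0 + b1*X + b2*X^2 + u
Z = sm.add_constant(np.column_stack([X1, X1 ** 2]))
fit_q = sm.OLS(Y, Z).fit()
b1, b2 = fit_q.params[1], fit_q.params[2]
V = np.asarray(fit_q.cov_params())

X0 = X1.mean()                       # evaluate the marginal effect at X = X0
effect = b1 + 2 * b2 * X0            # dY/dX at X0
se = np.sqrt(V[1, 1] + 4 * X0 ** 2 * V[2, 2] + 4 * X0 * V[1, 2])
ci = (effect - 1.96 * se, effect + 1.96 * se)
```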
Logarithmic transformation
Three possible specifications:
log-linear: $\log(Y_i) = \beta_0 + \beta_1 X_i + u_i$
linear-log: $Y_i = \beta_0 + \beta_1 \log(X_i) + u_i$
log-log: $\log(Y_i) = \beta_0 + \beta_1 \log(X_i) + u_i$
Note that in the log-linear model
$$\beta_1 = \frac{d\log(Y)}{dX} = \underbrace{\frac{1}{Y}}_{\text{outer}} \cdot \underbrace{\frac{dY}{dX}}_{\text{inner}} = \frac{dY/Y}{dX}$$
where $dY/Y$ indicates the relative change
In a similar manner it can be shown that for the log-log model $\beta_1 = (dY/Y)/(dX/X)$ is the elasticity
Note that the derivative refers to a small change. Exact:
$$\frac{Y_1 - Y_0}{Y_0} = e^{\beta_1 \Delta X} - 1$$
where $\log(Y_0) = \beta_0 + \beta_1 X$ and $\log(Y_1) = \beta_0 + \beta_1(X + \Delta X)$.
For small $\Delta X$ we have $(Y_1 - Y_0)/Y_0 \approx \beta_1\Delta X$
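A small numerical illustration of the approximation error in the log-linear model (the coefficient value is purely illustrative):

```python
import numpy as np

beta1 = 0.05          # illustrative slope in a log-linear model
dX = 1                # change in X

approx = beta1 * dX                 # approximate relative change: beta1 * dX
exact = np.exp(beta1 * dX) - 1      # exact relative change: e^(beta1*dX) - 1
# 0.05 vs. approx. 0.0513: the approximation is good for small beta1*dX
```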
Interaction effects
Interaction terms are products of regressors:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i$$
where $X_{1i}$, $X_{2i}$ may be discrete or continuous
Note that we can also write the model with interaction term as
$$Y_i = \beta_0 + \beta_1 X_{1i} + \underbrace{(\beta_2 + \beta_3 X_{1i})}_{\text{effect depends on } X_{1i}}\, X_{2i} + u_i$$
If $X_{2i}$ is discrete (a dummy), then the coefficient on $X_{1i}$ differs between $X_{2i} = 1$ and $X_{2i} = 0$
The effect of $X_{2i}$ and its standard error also depend on $X_{1i}$:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta^*_2 X_{2i} + \beta_3 (X_{1i} - \overline{X}_{1i}) X_{2i} + u_i$$
where $\beta^*_2 = \beta_2 + \beta_3 \overline{X}_{1i}$ and $\overline{X}_{1i}$ is a fixed value of $X_{1i}$.
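A sketch of this recentering trick, assuming hypothetical data arrays `X1`, `X2`, `Y`; the coefficient on $X_{2i}$ then directly gives the effect at $X_{1i} = \overline{X}_{1i}$ together with its standard error:

```python
import numpy as np
import statsmodels.api as sm

# interaction model evaluated at the fixed value x1bar
x1bar = X1.mean()
Z = sm.add_constant(np.column_stack([X1, X2, (X1 - x1bar) * X2]))
fit_i = sm.OLS(Y, Z).fit()

# coefficient on X2 is beta2* = beta2 + beta3*x1bar,
# i.e. the effect of X2 at X1 = x1bar, with its standard error
effect, se = fit_i.params[2], fit_i.bse[2]
```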
Nonlinear least-squares (NLS)
Assume a nonlinear relationship between $Y_i$ and $X_i$ where the parameters enter nonlinearly
$$Y_i = f(X_i, \beta) + u_i$$
Example:
$$f(X_i, \beta) = \beta_1 + \beta_2 X_i^{\beta_3}, \qquad \text{i.e.} \quad Y_i = \beta_1 + \beta_2 X_i^{\beta_3} + u_i$$
Assuming i.i.d. normally distributed errors, the maximum likelihood principle results in minimizing the sum of squared residuals:
$$SSR(\beta) = \sum_{i=1}^{N} \left( Y_i - f(X_i, \beta) \right)^2$$
The SSR can be minimized using iterative algorithms (e.g. the Gauss-Newton method)
The Gauss-Newton method requires the first derivative of the function $f(X_i, \beta)$ with respect to $\beta$.
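A sketch of NLS estimation for the example above using `scipy.optimize.curve_fit`, which minimizes the SSR with a Levenberg-Marquardt type algorithm (a damped variant of Gauss-Newton); the data are simulated for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, b1, b2, b3):
    """Nonlinear mean function f(X, beta) = b1 + b2 * x**b3."""
    return b1 + b2 * x ** b3

# simulated data (true beta = [1, 2, 0.5])
rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=200)
y = f(x, 1.0, 2.0, 0.5) + rng.normal(scale=0.2, size=200)

# iterative minimization of SSR(beta), starting from p0
beta_hat, cov = curve_fit(f, x, y, p0=[0.5, 1.0, 1.0])
```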