10 Limited Dependent Variables
Binary choice models
Choice based on Utility
\begin{align*} U_{i0} &= x'_i \gamma_0 + \epsilon_{i0} \\ U_{i1} &= x'_i \gamma_1 + \epsilon_{i1} \end{align*}
where
\begin{align*} U_{ij} &: \text{utility due to the choice of j} \\ x_i &: \text{variables characterizing the individual i} \end{align*}
Decision rule:
\begin{align*} y^*_i &= U_{i1} - U_{i0} = \begin{cases} \color{red}{> 0} & \Rightarrow \text{ choose 1} \\ \leq 0 & \Rightarrow \text{ choose 0} \end{cases} \\ y^*_i &= x'_i \color{blue}{(\gamma_1 - \gamma_0)} \color{black}{+ \epsilon_{i1} - \epsilon_{i0}} \\ &= x'_i \color{blue}{\beta} \color{black}{+ \varepsilon_i} \end{align*}
where \varepsilon_i = \epsilon_{i1} - \epsilon_{i0}
10.1 Linear probability model
y^*_i in the binary choice model is typically not observed. What we observe is:
y_i = \begin{cases} 1 & \text{for } y^*_i > 0, \\ 0 & \text{for } y^*_i \leq 0. \end{cases}
Assuming that the probability function is linear, we have
\begin{align*} E(y_i | x_i) &= \color{blue}{P(y_i = 1 | x_i)} \color{black}{\cdot 1 + } \color{red}{P(y_i = 0 | x_i)} \color{black}{\cdot 0} \\ &= x'_i \beta \end{align*}
In this case we can estimate the linear regression:
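y_i = x'_i \beta + u_i
A linear probability function is rather unrealistic: the fitted probabilities x'_i \widehat{\beta} are not restricted to [0, 1], and linearity of P(y_i = 1 | x_i) implies that \varepsilon_i is uniformly distributed (see the distribution function F in Section 10.2).
The errors u_i are heteroskedastic, since Var(u_i | x_i) = x'_i \beta (1 - x'_i \beta) depends on x_i. Heteroskedasticity-robust standard errors are required.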
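The following is a minimal sketch of the linear probability model, assuming a single regressor and hypothetical simulated data whose true response probability \Phi(1.5 x_i) is not linear. OLS gives the LPM, the slope receives White (HC0) heteroskedasticity-robust standard errors, and some fitted "probabilities" fall outside [0, 1].

```python
import math
import random

def lpm_ols(x, y):
    """OLS of y on (1, x); returns intercept, slope and the HC0 (White) robust s.e. of the slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # HC0 variance of the slope: sum((x_i - x_bar)^2 e_i^2) / sxx^2
    var_b1 = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    return b0, b1, math.sqrt(var_b1)

def norm_cdf(z):
    # Standard normal distribution function Phi(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical data: the true P(y_i = 1 | x_i) = Phi(1.5 x_i) is not linear in x_i.
random.seed(0)
x = [random.uniform(-3.0, 3.0) for _ in range(2000)]
y = [1 if random.random() < norm_cdf(1.5 * xi) else 0 for xi in x]

b0, b1, se_b1 = lpm_ols(x, y)
fitted = [b0 + b1 * xi for xi in x]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, robust s.e.(b1) = {se_b1:.4f}")
print(f"range of fitted 'probabilities': [{min(fitted):.3f}, {max(fitted):.3f}]")
```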
10.2 Probit/Logit models
Consider the binary choice model with
\begin{align*} P(y_i = 1) &= P(\varepsilon_i > -x'_i \beta) \\ &= 1 - F(-x'_i \beta) \end{align*}
where F(\cdot) denotes the distribution function of \varepsilon_i
It follows that
\begin{align*} E(y_i | x_i) &= 1 - F(-x'_i \beta)\\ &= F(x'_i \beta) \text{ if the distribution is symmetric} \end{align*}
Nonlinear regression model:
\begin{align*} y_i &= \color{blue}{E(y_i | x_i)} \color{black}{+ u_i} \\ &= \color{red}{F(x'_i \beta)} \color{black}{+ u_i \quad \text{for symmetric distributions}} \end{align*}
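The error u_i is a centered Bernoulli (binomial with one trial) variable: it equals 1 - p_i with probability p_i and -p_i with probability 1 - p_i, where p_i = F(x'_i \beta)
Estimation by maximum likelihood (similar to nonlinear regression)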
Popular distributions:
\begin{align*} F &\sim \color{red}{\text{normal} \color{black}{\text{ distribution}}} \\ &\sim \color{blue}{\text{logistic} \color{black}{\text{ distribution}}} \end{align*}
Choice of the Distribution:
- Usually no information about the distribution
- Referring to the central limit theorem
- Practical reasons
- Specification tests
- Nonparametric estimation
Normal distribution (“Probit”):
F \equiv \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-u^2 / 2} \, du
Logistic distribution (“Logit”):
F \equiv L(z) = \frac{1}{1 + e^{-z}}
Both distributions are symmetric:
1 - F(-z) = F(z) and therefore y_i = \color{blue}{F(x'_i\beta)} \color{black}{+ u_i}
Both distributions are very similar once the logistic distribution (variance \pi^2/3) is rescaled to unit variance: \Phi(z) \approx L\left(\frac{\pi}{\sqrt{3}}z\right)
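A quick numerical check of this approximation can be sketched as follows. The evaluation grid z \in [-4, 4] with step 0.01 is an arbitrary choice for illustration.

```python
import math

def norm_cdf(z):
    # Standard normal distribution function Phi(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    # Logistic distribution function L(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-z))

scale = math.pi / math.sqrt(3.0)               # the logistic variance is pi^2 / 3
grid = [i / 100.0 for i in range(-400, 401)]   # z in [-4, 4]
max_gap = max(abs(norm_cdf(z) - logistic_cdf(scale * z)) for z in grid)
print(f"max |Phi(z) - L(pi z / sqrt(3))| on [-4, 4]: {max_gap:.4f}")
```

The largest gap is only about 0.02. This is why Logit coefficients are, as a rough rule, about \pi/\sqrt{3} \approx 1.8 times the corresponding Probit coefficients.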
Marginal probability effect
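partial effect of x_i on P(y_i = 1 | x_i) = F(x'_i \beta):
MPE_i = \frac{\partial F(x'_i \beta)}{\partial x_i} = \color{red}{f(x'_i \beta)} \color{black}{ \beta}, where f = F' is the density (f = \phi for the Probit, f = L(1 - L) for the Logit) \Rightarrow effect depends on the level of x_i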
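The formula MPE_i = f(x'_i \beta) \beta can be sketched as follows. The coefficient vector is hypothetical, and the same \beta is plugged into both densities only to compare the shapes; estimated Probit and Logit coefficients differ in scale.

```python
import math

def normal_pdf(z):
    # phi(z), the Probit density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def logistic_pdf(z):
    # L'(z) = L(z) (1 - L(z)), the Logit density
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1.0 - p)

def marginal_effects(x, beta, pdf):
    """MPE_i = f(x_i' beta) * beta; the entry belonging to the constant has no meaning."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    return [pdf(index) * bk for bk in beta]

beta = [-1.0, 0.5]  # hypothetical (constant, slope)
for level in (2.0, 6.0):
    x_i = [1.0, level]  # regressors (1, x)
    probit_mpe = marginal_effects(x_i, beta, normal_pdf)[1]
    logit_mpe = marginal_effects(x_i, beta, logistic_pdf)[1]
    print(f"x = {level}: probit MPE = {probit_mpe:.4f}, logit MPE = {logit_mpe:.4f}")
```

At x = 2 the index is 0, where both densities peak. At x = 6 the same slope coefficient implies a much smaller effect.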
Maximum likelihood (ML) estimator
log-likelihood function for a symmetric distribution:
\log L(\beta) = \sum_{i=1}^{N} \Big[ \color{blue}{y_i} \color{red}{ \log F(x'_i \beta)}\color{black}{ + } \color{blue}{(1 - y_i)} \color{red}{\log[1 - F(x'_i \beta)]} \color{black}{\Big]}
Differentiation with respect to \beta yields the first order condition:
s(\widehat{\beta}) = \sum_{i=1}^{N} \frac{e_i f(x'_i \widehat{\beta})}{F(x'_i \widehat{\beta})(1 - F(x'_i \widehat{\beta}))} x_i = 0
where e_i = y_i - F(x'_i \widehat{\beta})
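Nonlinear system of K equations: solved with an iterative algorithm (e.g. Newton-Raphson)
The estimator is equivalent to weighted nonlinear least squares with heteroskedastic errors (weights 1/[F(x'_i\beta)(1 - F(x'_i\beta))])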
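The ML estimator for the Logit can be sketched with Newton-Raphson, assuming regressors (1, x_i) and hypothetical simulated data with true \beta = (-0.5, 1). For the Logit, f = F(1 - F), so the first order condition reduces to \sum_i e_i x_i = 0, and minus the Hessian is \sum_i F(1 - F) x_i x'_i.

```python
import math
import random

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_ml(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for a Logit with regressors (1, x_i); returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = logistic_cdf(b0 + b1 * xi)
            e = yi - p                      # e_i = y_i - F(x_i' b)
            s0 += e                         # Logit: f / (F (1 - F)) = 1, so s = sum e_i x_i
            s1 += e * xi
            w = p * (1.0 - p)               # minus the Hessian is sum w_i x_i x_i'
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * s0 - h01 * s1) / det    # step = (-Hessian)^{-1} * score
        d1 = (h00 * s1 - h01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Hypothetical data generated from a Logit with beta = (-0.5, 1.0).
random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
y = [1 if random.random() < logistic_cdf(-0.5 + xi) else 0 for xi in x]

b0_hat, b1_hat = logit_ml(x, y)
# Score s(beta_hat) from the first order condition; it should be numerically zero.
score = [sum(yi - logistic_cdf(b0_hat + b1_hat * xi) for xi, yi in zip(x, y)),
         sum((yi - logistic_cdf(b0_hat + b1_hat * xi)) * xi for xi, yi in zip(x, y))]
print(b0_hat, b1_hat, score)
```

For the Logit the Hessian does not depend on y_i, so Newton-Raphson coincides with Fisher scoring here.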
Goodness of fit
(i) McFadden R^2:
\text{MF-}R^2 = 1 - \frac{\log L(\widehat{\beta})}{\log L(\beta = 0)}
(ii) forecasting y_i: Let
\widehat{y}_i = \begin{cases} 1 & \text{if } \color{blue}{F(x'_i \widehat{\beta}) > 0.5} \color{black}{\text{ or } x'_i \widehat{\beta} > 0,} \\ 0 & \text{otherwise} \end{cases}
Frequency of wrong forecasts:
\frac{n_{01} + n_{10}}{n} = \frac{\sum_{i=1}^{n} (y_i - \widehat{y_i})^2}{n}
\Rightarrow R^2 based on the number of wrong forecasts
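Both measures can be sketched from the fitted probabilities F(x'_i \widehat{\beta}). The outcomes and probabilities below are hypothetical. Following the formula above, \log L(\beta = 0) uses F(0) = 0.5 for every observation.

```python
import math

def mcfadden_r2(y, p):
    """MF-R^2 = 1 - log L(beta_hat) / log L(beta = 0), with p_i = F(x_i' beta_hat)."""
    loglik = sum(yi * math.log(pr) + (1 - yi) * math.log(1.0 - pr) for yi, pr in zip(y, p))
    loglik0 = len(y) * math.log(0.5)  # beta = 0 gives F(0) = 0.5 for a symmetric F
    return 1.0 - loglik / loglik0

def wrong_forecast_share(y, p):
    """(n_01 + n_10) / n with the forecast y_hat_i = 1 if p_i > 0.5."""
    yhat = [1 if pr > 0.5 else 0 for pr in p]
    return sum((yi - yh) ** 2 for yi, yh in zip(y, yhat)) / len(y)

y = [1, 0, 1, 1, 0, 0]                  # hypothetical outcomes
p = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7]      # hypothetical fitted probabilities
r2 = mcfadden_r2(y, p)
wrong = wrong_forecast_share(y, p)
print(f"MF-R^2 = {r2:.4f}, share of wrong forecasts = {wrong:.4f}")
```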
10.3 Classification
Let F_i denote the estimated probability for y_i = 1. The optimal assignment to the unknown alternatives {0, 1} (minimizing the expected number of misclassifications when both types of error are equally costly) is \widehat{y_i} = 1 if F_i > 0.5.
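This classification rule works poorly if F_i is small. Assume that x_i \sim \text{U}[0, 1], u_i \sim N(0, 1) and
y^*_i = -2 + 2x_i + u_i
Then the unconditional probability for y_i = 1 is about 0.2. However, since x'_i\beta = -2 + 2x_i \leq 0, i.e. F_i \leq 0.5, for all x_i, no y_i = 1 is predicted in the sample!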
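A simulation of this example can be sketched as follows, assuming u_i \sim N(0, 1) and, as a simplification, using the true probabilities F_i = \Phi(-2 + 2x_i) instead of estimates. Under these assumptions the unconditional probability is \int_0^1 \Phi(2x - 2)\,dx \approx 0.195.

```python
import math
import random

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(2)
n = 100_000
x = [random.random() for _ in range(n)]                        # x_i ~ U[0, 1)
y = [1 if -2.0 + 2.0 * xi + random.gauss(0.0, 1.0) > 0 else 0 for xi in x]
prob = [norm_cdf(-2.0 + 2.0 * xi) for xi in x]                  # true F_i, used in place of estimates

share_of_ones = sum(y) / n                                      # close to 0.195 in large samples
predicted_ones = sum(1 for pr in prob if pr > 0.5)              # index <= 0, so F_i <= 0.5 always
error_rate = sum(1 for yi, pr in zip(y, prob) if (pr > 0.5) != (yi == 1)) / n
print(share_of_ones, predicted_ones, error_rate)
```

Every observed one is misclassified, so the error rate equals the share of ones.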
One may instead calibrate the threshold \tau such that the number of predicted ones equals the number of observed ones:
\sum_{i=1}^{n} 1(\widehat{F_i} > \tau) = \sum_{i=1}^{n} y_i
\Rightarrow match the unconditional probabilities.
Trade-off between the two types of misclassification
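The calibration rule \sum_i 1(\widehat{F_i} > \tau) = \sum_i y_i and the resulting trade-off can be sketched on the same hypothetical setting as the example above (x_i \sim U[0, 1), u_i \sim N(0, 1), true F_i used as estimates). The helper names are illustrative, and the sketch assumes there are no ties among the F_i.

```python
import math
import random

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def calibrated_threshold(prob, y):
    """tau with #{i: F_i > tau} = sum_i y_i (assumes no ties among the F_i)."""
    m = sum(y)
    ranked = sorted(prob, reverse=True)
    return ranked[m]  # exactly m probabilities lie strictly above this value

def error_rates(y, yhat):
    """(share of ones classified as 0, share of zeros classified as 1)."""
    ones = sum(y)
    fnr = sum(yi * (1 - yh) for yi, yh in zip(y, yhat)) / ones
    fpr = sum((1 - yi) * yh for yi, yh in zip(y, yhat)) / (len(y) - ones)
    return fnr, fpr

random.seed(3)
n = 20_000
x = [random.random() for _ in range(n)]
y = [1 if -2.0 + 2.0 * xi + random.gauss(0.0, 1.0) > 0 else 0 for xi in x]
prob = [norm_cdf(-2.0 + 2.0 * xi) for xi in x]

tau = calibrated_threshold(prob, y)
yhat_half = [1 if pr > 0.5 else 0 for pr in prob]
yhat_cal = [1 if pr > tau else 0 for pr in prob]
rates_half = error_rates(y, yhat_half)
rates_cal = error_rates(y, yhat_cal)
print(f"calibrated tau = {tau:.3f}")
print("tau = 0.5:      FN rate %.3f, FP rate %.3f" % rates_half)
print("calibrated tau: FN rate %.3f, FP rate %.3f" % rates_cal)
```

Lowering \tau from 0.5 to the calibrated value reduces the share of missed ones at the cost of some false ones.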
Useful tool: the ROC curve (true positive rate vs. false positive rate). If \tau is decreased \rightarrow more ONEs are predicted; these can be correct or false detections.
A classifier A is uniformly better than a classifier B if the ROC curve of A lies everywhere above the ROC curve of B
\Rightarrow maximize the area under the ROC curve (AUC)
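The ROC curve and its area can be sketched by sweeping the threshold over the distinct estimated probabilities. The data are hypothetical, and this sketch classifies \widehat{y_i} = 1 if the score is at least the threshold. The trapezoidal area equals the share of (one, zero) pairs in which the one receives the higher score (ties count 1/2), which gives a convenient cross-check.

```python
def roc_points(y, score):
    """ROC points (FP rate, TP rate) for y_hat = 1 if score >= t, t from largest to smallest score."""
    pos = sum(y)
    neg = len(y) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(score), reverse=True):
        tp = sum(1 for yi, s in zip(y, score) if s >= t and yi == 1)
        fp = sum(1 for yi, s in zip(y, score) if s >= t and yi == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    # Area under the piecewise linear ROC curve
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_pairs(y, score):
    """Share of (one, zero) pairs ranked correctly; ties count 1/2."""
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 0, 1, 0, 0]
score = [0.9, 0.8, 0.7, 0.6, 0.55, 0.2]   # hypothetical estimated probabilities F_i
pts = roc_points(y, score)
print(pts)
print(auc_trapezoid(pts), auc_pairs(y, score))
```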
The target of the Probit/Logit estimator is P(y_i = 1) = F(x'_i\beta). The optimal estimator of the probability coincides with the efficient estimator of \beta.
The classification problem seeks an “optimal” estimator for y_i in the form of the indicator \widehat{y_i} by minimizing some combination of the error rates:
\begin{align*} \text{False Negative} &= \sum_{i} y_i (1 - \widehat{y_i}) / \sum_{i} y_i \quad\text{and} \\ \text{False Positive} &= \sum_{i} (1 - y_i) \widehat{y_i} / \sum_{i} (1 - y_i) \end{align*}
Note that, since F is strictly increasing, F(x'_i \beta) > \tau is equivalent to x'_i \beta > \tau^* with \tau^* = F^{-1}(\tau). \Rightarrow for a given index x'_i \beta, the distribution is not relevant for classification
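The equivalence of the two threshold rules, together with the two error rates, can be sketched for the logistic case, where F^{-1}(\tau) = \log(\tau / (1 - \tau)). The fitted indices and outcomes are hypothetical. Here y_i (1 - \widehat{y_i}) counts ones classified as zero (false negatives), and (1 - y_i)\widehat{y_i} counts zeros classified as one (false positives).

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_quantile(tau):
    # F^{-1}(tau) for the logistic distribution
    return math.log(tau / (1.0 - tau))

def error_rates(y, yhat):
    """(false-negative rate, false-positive rate)."""
    fnr = sum(yi * (1 - yh) for yi, yh in zip(y, yhat)) / sum(y)
    fpr = sum((1 - yi) * yh for yi, yh in zip(y, yhat)) / sum(1 - yi for yi in y)
    return fnr, fpr

index = [-1.5, -0.3, 0.2, 0.9, 2.0, -0.1]   # hypothetical fitted indices x_i' beta
y = [0, 1, 0, 1, 1, 0]
tau = 0.4
yhat_prob = [1 if logistic_cdf(v) > tau else 0 for v in index]          # F(x_i' beta) > tau
yhat_index = [1 if v > logistic_quantile(tau) else 0 for v in index]    # x_i' beta > tau*
fnr, fpr = error_rates(y, yhat_index)
print(yhat_prob == yhat_index, fnr, fpr)
```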
Support vector classifier: with y_i \in \{0, 1\} mapped to 2y_i - 1 \in \{-1, 1\}, maximize M subject to:
\begin{align*} &(2y_i - 1)(x'_i \beta) \geq M(1 - \xi_i) \\ &\xi_i \geq 0, \quad \sum \xi_i \leq C \\ &\beta' \beta = 1 \end{align*}
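The sketch below does not solve the constrained problem directly. It uses the standard equivalent penalized (hinge-loss) form: with w = \beta / M, it minimizes \frac{\lambda}{2} w'w + \frac{1}{n}\sum_i \max(0, 1 - (2y_i - 1) x'_i w) by full-batch subgradient descent with step 1/(\lambda t). The tuning value \lambda = 0.1, the iteration count and the separable data are hypothetical choices. The constant is part of x_i and is therefore penalized, in line with \beta'\beta = 1 above.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def train_linear_svc(X, y, lam=0.1, iters=500):
    """Full-batch subgradient descent with step 1 / (lam * t) on
    lam / 2 * w'w + mean_i max(0, 1 - (2 y_i - 1) x_i'w)."""
    n, k = len(X), len(X[0])
    w = [0.0] * k
    for t in range(1, iters + 1):
        grad = [lam * wj for wj in w]              # gradient of the penalty term
        for xi, yi in zip(X, y):
            sign = 2 * yi - 1                      # map {0, 1} to {-1, +1}
            if sign * dot(xi, w) < 1.0:            # hinge loss active for this observation
                for j in range(k):
                    grad[j] -= sign * xi[j] / n
        step = 1.0 / (lam * t)
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w

# Hypothetical separable data: class 1 iff z1 + z2 > 0, with a gap around the boundary.
random.seed(4)
X, y = [], []
while len(X) < 200:
    z1, z2 = random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)
    if abs(z1 + z2) < 0.2:
        continue
    X.append([1.0, z1, z2])                        # x_i includes the constant
    y.append(1 if z1 + z2 > 0 else 0)

w = train_linear_svc(X, y)
norm_w = math.sqrt(dot(w, w))
beta = [wj / norm_w for wj in w]                   # rescaled so that beta'beta = 1
M = 1.0 / norm_w                                   # margin under the reparametrization w = beta / M
accuracy = sum((dot(xi, beta) > 0) == (yi == 1) for xi, yi in zip(X, y)) / len(X)
print(beta, M, accuracy)
```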
10.4 Sample selection model
\begin{align*} \color{blue}{\text{Regression model:}} \quad &y_i = x'_{1i}\beta_1 + u_{1i} \quad \color{blue}{\text{if } h_i = 1} \\ \color{red}{\text{Selection rule:}} \quad &h^*_i = x'_{2i}\beta_2 + u_{2i} \quad \text{with } E(u^2_{2i}) = 1 \\ \\ &h_i = \begin{cases} 1 & \text{if } h^*_i > 0 \quad \color{red}{\text{observed}} \\ 0 & \text{otherwise} \quad \color{red}{\text{not observed}} \end{cases} \end{align*}
Equivalent to the Tobit model if:
x_{1i} = x_{2i}, \quad \beta_1 / \sigma = \beta_2, \quad u_{1i} / \sigma = u_{2i}
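Assuming that (u_{1i}, u_{2i}) are jointly normally distributed, the mean of the truncated joint density gives
E(y_i | \color{red}{y_i \text{ observed}}\color{black}{) = x'_{1i}\beta_1 +} \color{red}{\varrho\sigma} \color{black}{\lambda_i}
where \sigma^2 = E(u^2_{1i}), \varrho = E(u_{1i}u_{2i})/\sigma is the correlation between u_{1i} and u_{2i}, and \lambda_i is the inverse Mills ratio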
\lambda_i = \frac{\phi(x'_{2i}\beta_2)}{\Phi(x'_{2i}\beta_2)}
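The selection-bias term E(u_{1i} | h_i = 1) = \varrho\sigma\lambda_i can be checked by simulation, assuming joint normality. The sketch draws u_{2i} \sim N(0, 1) and u_{1i} = \sigma(\varrho u_{2i} + \sqrt{1 - \varrho^2}\, e_i), so that E(u_{1i}u_{2i})/\sigma = \varrho. The values \sigma = 2, \varrho = 0.6 and x'_{2i}\beta_2 = 0.5 are hypothetical.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(5)
sigma, rho, index2 = 2.0, 0.6, 0.5        # hypothetical sigma, rho and x_2i' beta_2
n = 200_000
selected = []
for _ in range(n):
    u2 = random.gauss(0.0, 1.0)
    u1 = sigma * (rho * u2 + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0))
    if index2 + u2 > 0:                    # h_i = 1: y_i is observed
        selected.append(u1)

lam = norm_pdf(index2) / norm_cdf(index2)  # inverse Mills ratio lambda_i
simulated = sum(selected) / len(selected)  # E(u_1i | h_i = 1) by simulation
theory = rho * sigma * lam                 # rho * sigma * lambda_i
print(simulated, theory)
```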
Heckman estimator
First step: Probit estimator
\tilde{y}^*_i = x'_i \tilde{\beta} + u_i
where \tilde{\beta} = \beta / \sigma and
h_i = \begin{cases} 1 & \text{if } \tilde{y}^*_i > 0 \\ 0 & \text{otherwise} \end{cases}
Second step: augmented regression:
\lambda_i = \frac{\phi(x'_i \tilde{\beta})}{\Phi(x'_i \tilde{\beta})}
and
y_i = x'_i \beta + \sigma \color{red}{\hat{\lambda}_i} \color{black}{ + \nu_i \quad \text{for the observations with } y^*_i > 0}
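The usual OLS standard errors of the second step are biased (\hat{\lambda}_i is an estimated regressor and the errors \nu_i are heteroskedastic); corrected standard errors are required
Alternatively, the model can be estimated by maximum likelihood (under joint normality)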
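The two-step procedure for the Tobit case can be sketched on hypothetical simulated data: y^*_i = 0.5 + x_i + u_i with \sigma = 1, and y_i = \max(y^*_i, 0). Fitting the Probit by Fisher scoring and solving the small linear systems by Gaussian elimination are implementation choices of this sketch. The second-step coefficient on \hat{\lambda}_i estimates \sigma, and only point estimates are computed, since the naive second-step standard errors are not valid.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def solve(A, b):
    """Gaussian elimination with partial pivoting for the small system A x = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def probit(X, h, iters=50, tol=1e-8):
    """Probit ML by Fisher scoring: score sum (h - F) f / (F (1 - F)) x, information sum f^2 / (F (1 - F)) x x'."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, hi in zip(X, h):
            z = dot(xi, beta)
            F = min(max(norm_cdf(z), 1e-12), 1.0 - 1e-12)
            f = norm_pdf(z)
            g = f / (F * (1.0 - F))
            for a in range(k):
                score[a] += (hi - F) * g * xi[a]
                for b in range(k):
                    info[a][b] += f * g * xi[a] * xi[b]
        step = solve(info, score)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def ols(X, y):
    k = len(X[0])
    XtX = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)] for a in range(k)]
    Xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

# Hypothetical Tobit data: y* = 0.5 + 1.0 x + u, u ~ N(0, 1), y = max(y*, 0).
random.seed(6)
n = 20_000
xs = [random.gauss(0.0, 2.0) for _ in range(n)]
y = [max(0.5 + xi + random.gauss(0.0, 1.0), 0.0) for xi in xs]
X = [[1.0, xi] for xi in xs]
h = [1 if yi > 0 else 0 for yi in y]

beta_tilde = probit(X, h)                      # first step: estimates beta / sigma

X2, y2 = [], []
for xi, yi in zip(X, y):
    if yi > 0:                                 # second step uses the observations with y_i > 0
        z = dot(xi, beta_tilde)
        X2.append(xi + [norm_pdf(z) / norm_cdf(z)])   # append lambda_hat_i
        y2.append(yi)
coef = ols(X2, y2)                             # (constant, slope, sigma)
print(beta_tilde, coef)
```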