The Stambaugh Bias in Panel Predictive Regressions

Erik Hjalmarsson^*

NOTE: International Finance Discussion Papers are preliminary materials circulated to stimulate discussion and critical comment. References in publications to International Finance Discussion Papers (other than an acknowledgment that the writer has had access to unpublished material) should be cleared with the author or authors. Recent IFDPs are available on the Web at http://www.federalreserve.gov/pubs/ifdp/. This paper can be downloaded without charge from the Social Science Research Network electronic library at http://www.ssrn.com/.

Abstract:

This paper analyzes predictive regressions in a panel data setting. The standard fixed effects estimator suffers from a small sample bias, which is the analogue of the Stambaugh bias in time-series predictive regressions. Monte Carlo evidence shows that the bias and resulting size distortions can be severe. A new bias-corrected estimator is proposed, which is shown to work well in finite samples and to lead to approximately normally distributed t-statistics. Overall, the results show that the econometric issues associated with predictive regressions when using time-series data to a large extent also carry over to the panel case. The results are illustrated with an application to predictability in international stock indices.

Keywords: Panel data, pooled regression, predictive regression, stock return predictability

JEL classification: C22, C23, G1

1 Introduction

Predictive regressions are important tools for evaluating and testing economic models. Although tests of stock return predictability, and the related market efficiency hypothesis, are probably the most common application, many rational expectations models can be tested in a similar manner (Mankiw and Shapiro, 1986). Traditionally, forecasting regressions have been evaluated in time-series frameworks. However, with the increased availability of data, in particular international financial and macroeconomic data, it becomes natural to extend the single time-series framework to a panel data setting; for instance, Cohen et al. (2003) and Polk et al. (2006) rely on predictive panel data regressions in some of their analyses.

It is well known that the apparently simple linear regression model most often used for evaluating predictability in fact raises some very tough econometric issues. The high degree of persistence found in many predictor variables, such as the earnings- or dividend-price ratios in the prototypical stock return forecasting regression, is at the root of most econometric problems associated with predictive regressions. The near persistence of the regressors, coupled with a strong contemporaneous correlation between the innovations in the regressor and the regressand, causes standard OLS estimates to suffer from a small sample bias and normal tests to have the wrong size; this is the so-called Stambaugh (1999) bias in predictive regressions.

In the panel case, with pooled regressions, it turns out that as long as no fixed effects are included, the pooled estimator is unbiased. However, once one allows for fixed effects in the pooled regression, an analogue of the Stambaugh bias is also present in the panel case. This result can be understood in light of the representation of the bias in predictive regressions derived by Stambaugh. Under the assumption that the predictor variable follows an auto-regressive process, he shows that the bias in the OLS estimate of the slope coefficient in the predictive regression is a function of the bias of the OLS estimate of the auto-regressive coefficient in the predictor variable. It is well known that the bias in the OLS estimator of auto-regressive coefficients is more severe if an intercept is included in the regression equation. Therefore, in the time-series case, the Stambaugh bias is less severe if no intercept is included in the predictive regression. This is, of course, mostly of theoretical interest since in almost all empirical applications an intercept is required. The same idea holds in the panel case; but, rather than differentiating between the case of intercept or no intercept, the relevant cases are now a common intercept or individual intercepts, i.e. fixed effects.

In this paper, I propose a simple bias correction to the fixed effects estimator in pooled predictive regressions. I also derive a representation analogous to that of the time-series Stambaugh bias: the asymptotic bias in the fixed effects estimator in the predictive regression can be expressed as a function of the bias in the pooled fixed effects estimator of the auto-regressive coefficient in the predictor variable. The results in this paper complement those of Hjalmarsson (2007), which also studies the bias in the fixed effects estimator but does not explicitly analyze the connection with the Stambaugh bias in time-series regressions or the direct bias correction procedure suggested here.

The bias-corrected estimator is straightforward to implement. The key parameter on which the bias depends is the auto-regressive root in the regressor variable. The practical implementation of the bias-corrected fixed effects estimator is therefore facilitated by the fact that even though the fixed effects estimator of the auto-regressive coefficient is biased, an alternative unbiased and consistent estimator is readily available. Since the bias-corrected estimator is approximately asymptotically normal, it becomes trivial to perform inference on the slope parameter.

Simulation results show that the bias-corrected estimator works well in finite samples. These simulations also show the importance of controlling for the bias in the panel case. The average rejection rates for the test corresponding to the standard fixed effects estimator exceed 75 percent in some cases, under the null hypothesis of no predictability, and for a nominal five percent test.

As an illustration of the methods derived in this paper, I test for stock return predictability in an international panel of returns from 18 different stock indices using the corresponding dividend- and earnings-price ratios, as well as the book-to-market values, as predictors. The empirical results clearly illustrate the theoretical results in the paper. Based on the results from the standard fixed effects estimator, the evidence in favour of return predictability is very strong, using any of the three predictor variables. However, when using the robust methods developed here, the evidence disappears almost completely. Thus, both the simulation results and the empirical results clearly show that the Stambaugh bias is at least as important in panel regressions as it is in time-series regressions.

The rest of the paper is organized as follows. Section 2 outlines the panel model and shows the Stambaugh bias in panel predictive regressions. Section 3 describes the bias-corrected estimator and Section 4 illustrates the small sample properties of this estimator, as well as those of the pooled estimator without fixed effects. Section 5 shows the results from the empirical application to stock return predictability and offers a brief conclusion. The Appendix outlines the derivations of the main results.

2 The Stambaugh bias

2.1 Model and assumptions

Consider a panel model with dependent variable $y_{i,t}$, $i=1,\ldots,n$, $t=1,\ldots,T$, and corresponding regressor $x_{i,t}$. Here, $n$ represents the cross-sectional dimension (e.g. firm or country) and $T$ represents the time-series dimension. The behavior of $y_{i,t}$ and $x_{i,t}$ is modelled as follows,

$\displaystyle y_{i,t}$ $\displaystyle =\alpha_{i}+\beta x_{i,t-1}+u_{i,t},$	(1)
$\displaystyle x_{i,t}$ $\displaystyle =\gamma_{i}+\rho x_{i,t-1}+v_{i,t},$	(2)

where $\rho=1+c/T$ . The auto-regressive root of the regressor is parameterized as being local-to-unity, which captures the near unit-root behavior of many predictor variables but is less restrictive than a pure unit-root assumption (e.g. Cavanagh et al. 1995, and Campbell and Yogo, 2006). The model can be seen as a panel analogue of the time-series models studied by Mankiw and Shapiro (1986), Cavanagh et al. (1995), Stambaugh (1999), Lewellen (2004), and Campbell and Yogo (2006), among others.
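As a minimal sketch of the data-generating process in equations (1) and (2), the following Python function simulates one panel. Several choices are assumptions made only for illustration and are not taken from the text: the name `simulate_panel` and its arguments, a zero initial value $x_{i,0}=0$, zero intercepts $\alpha_{i}=\gamma_{i}=0$, and Gaussian innovations with unit variances and correlation `delta`.

```python
import numpy as np

def simulate_panel(n, T, beta, c, delta, rng):
    """Simulate y (n x T) and x (n x (T+1)) from equations (1)-(2).

    Illustrative assumptions: x_{i,0} = 0, alpha_i = gamma_i = 0, and
    Gaussian innovations with unit variances and correlation delta.
    """
    rho = 1.0 + c / T                      # local-to-unity root
    u = rng.standard_normal((n, T))        # innovations to y
    e = rng.standard_normal((n, T))
    v = delta * u + np.sqrt(1.0 - delta**2) * e   # corr(u, v) = delta
    x = np.zeros((n, T + 1))
    for t in range(T):
        # column t+1 holds x_{i,t+1}; v[:, t] is its innovation
        x[:, t + 1] = rho * x[:, t] + v[:, t]
    y = beta * x[:, :-1] + u               # y_{i,t} depends on x_{i,t-1}
    return y, x

y, x = simulate_panel(n=20, T=100, beta=0.0, c=-10.0, delta=-0.7,
                      rng=np.random.default_rng(0))
print(y.shape, x.shape)
```

The lagged regressor used in equation (1) is then `x[:, :-1]`, which lines up column by column with `y`.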

The innovation processes are assumed to be martingale difference sequences with finite fourth-order moments, and the regressor $x_{i,t}$ is generally endogenous in the sense that $u_{i,t}$ and $v_{i,t}$ are contemporaneously correlated. That is, let $w_{i,t}=\left( u_{i,t} ,v_{i,t}\right) ^{\prime}$ and let $\mathcal{F}_{t}=\left\{ \left. w_{i,s}\right\vert s\leq t,i=1,...,n\right\}$ be the filtration generated by the innovation processes. Then, for all $i$ and $t$, $E\left[ \left. w_{i,t}\right\vert \mathcal{F}_{t-1}\right] =0$, $E\left[ \left. w_{i,t}w_{i,t}^{\prime}\right\vert \mathcal{F}_{t-1}\right] =\Omega_{i}=\left[ \left( \omega_{11i},\omega_{12i}\right) ,\left( \omega_{12i},\omega_{22i}\right) \right] ^{\prime}$ and $\Omega =\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^{n}\Omega_{i}$. Finally, it is assumed that the innovations are cross-sectionally independent.¹

Let $J_{i}\left( r\right)$ denote the limiting process of the scaled regressor $x_{i,t}$ . That is, as $T\rightarrow\infty$ , $\frac{x_{i,t=\left[ Tr\right] }}{\sqrt{T}}\Rightarrow J_{i}\left( r\right)$ , where $J_{i}\left( r\right)$ , defined in the Appendix, is the standard asymptotic process for a near unit-root variable. Also, let $\underline{J}_{i}=J_{i} -\int_{0}^{1}J_{i}$ be the demeaned version of $J_{i}$ and let $\Omega _{xx}\equiv E\left[ \int_{0}^{1}J_{i}^{2}\right]$ , and $\underline{\Omega }_{xx}\equiv E\left[ \int_{0}^{1}\underline{J}_{i}^{2}\right]$ .

Following the work of Phillips and Moon (1999), results for the panel estimators are derived using sequential limits, which implies first keeping $n$ fixed and letting $T$ go to infinity, and then letting $n$ go to infinity. Such sequential convergence is denoted $\left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$.²

2.2 The bias in the fixed effects estimator

Let $\tilde{y}_{i,t}=y_{i,t}-\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}y_{i,t}$ denote the overall demeaned data and let $\underline{y}_{i,t}=y_{i,t}-\frac {1}{T}\sum_{t=1}^{T}y_{i,t}$ denote the time-series demeaned data. Define $\tilde{x}_{i,t}$ and $\underline{x}_{i,t}$ analogously. The pooled estimator of $\beta$ when there are no individual effects, i.e. when $\alpha_{i} \equiv\alpha$ for all $i$, is given by

$\displaystyle \hat{\beta}_{Pool}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\tilde{x}_{i,t-1}^{2}\right) ^{-1}\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\tilde{y}_{i,t}\tilde{x}_{i,t-1}\right) .$	(3)

The fixed effects estimator, allowing for individual effects, is given by

$\displaystyle \hat{\beta}_{FE}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\underline{x}_{i,t-1}^{2}\right) ^{-1}\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\underline{y}_{i,t}\underline{x}_{i,t-1}\right) .$	(4)

As shown by Hjalmarsson (2007), and outlined in the Appendix, as $\left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$ ,

$\displaystyle \sqrt{n}T\left( \hat{\beta}_{Pool}-\beta\right) \Rightarrow N\left( 0,\omega_{11}\Omega_{xx}^{-1}\right) ,$	(5)

under the assumption that $\alpha_{i}\equiv\alpha$ for all $i$, whereas

$\displaystyle T\left( \hat{\beta}_{FE}-\beta\right) \rightarrow_{p}-\omega_{12}\left( \int_{0}^{1}\int_{0}^{r}e^{\left( r-s\right) c}dsdr\right) \underline{\Omega}_{xx}^{-1},$	(6)

whenever $\omega_{12}\neq0$ .³ Thus, the estimator without individual effects is asymptotically unbiased and normally distributed; summing up over the cross-section in the pooled estimator eliminates the usual near unit-root asymptotic distributions found in the time-series case. The fixed effects estimator, on the other hand, suffers from a second order bias; in practice, this means that the estimator will exhibit a small sample bias and test statistics will not have standard distributions.

The intuition behind these results is that when pooling the data, independent cross-sectional information dilutes the endogeneity effects and thus potentially alleviates the bias effects seen in the time-series case; persistent regressors that are exogenous do not cause any inferential issues. This intuition holds when no individual intercepts are allowed in the specification. The bias in the fixed effects estimation arises because the time-series demeaning induces a correlation between the innovation processes $u_{i,t}$ and the demeaned regressors $\underline{x}_{i,t-1}$; intuitively, this happens because information available after time $t-1$ is used in the demeaning of $x_{i,t-1}$.⁴
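The two estimators in equations (3) and (4) differ only in how the data are demeaned. The sketch below makes that explicit; the function name `pooled_and_fe` and the array layout (rows are units, columns are time periods, with `x_lag` holding $x_{i,t-1}$ aligned with `y`) are assumptions made for illustration.

```python
import numpy as np

def pooled_and_fe(y, x_lag):
    """Return (beta_pool, beta_fe) from equations (3) and (4).

    y and x_lag are (n, T) arrays; x_lag[:, t] is the regressor
    dated one period before y[:, t].
    """
    # Equation (3): subtract the overall mean (common intercept).
    y_til = y - y.mean()
    x_til = x_lag - x_lag.mean()
    beta_pool = (y_til * x_til).sum() / (x_til**2).sum()

    # Equation (4): subtract each unit's own time-series mean (fixed effects).
    y_dm = y - y.mean(axis=1, keepdims=True)
    x_dm = x_lag - x_lag.mean(axis=1, keepdims=True)
    beta_fe = (y_dm * x_dm).sum() / (x_dm**2).sum()
    return beta_pool, beta_fe

# Noise-free check: with unit-specific intercepts only the fixed effects
# estimator is guaranteed to recover the slope exactly.
rng = np.random.default_rng(0)
x_lag = rng.standard_normal((4, 30)).cumsum(axis=1)
alpha = np.array([[1.0], [-2.0], [0.5], [3.0]])
print(pooled_and_fe(alpha + 0.5 * x_lag, x_lag))
```

The time-series mean of `x_lag` in the second step uses observations dated after $t-1$, which is the source of the correlation with $u_{i,t}$ described above.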

2.3 An alternative representation of the fixed effects bias

In the case of a predictive time-series regression, Stambaugh (1999) shows that the bias in the OLS estimator of the slope coefficient $\beta$ in equation (1) is a function of the bias in the OLS estimator of the coefficient $\rho$ in equation (2). Here we derive an analogous result for the fixed effects estimator in the panel case.

Note that $-\int_{0}^{1}\left( \int_{0}^{r}e^{\left( r-s\right) c}ds\right) dr=-\left. \left( e^{c}-c-1\right) \right/ c^{2}$ and let $\theta\left( c\right) \equiv-\left. \left( e^{c}-c-1\right) \right/ c^{2}$ . The limiting bias of $\hat{\beta}_{FE}$ is thus given by $T^{-1}\left( \left. \omega_{12}\theta\left( c\right) \right/ \underline{\Omega}_{xx}\right) \,$ . Let the fixed effects estimator of $\rho$ be given by

$\displaystyle \hat{\rho}_{FE}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\underline{x}_{i,t-1}^{2}\right) ^{-1}\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\underline{x}_{i,t}\underline{x}_{i,t-1}\right) .$	(7)

As shown in Moon and Phillips (2000), the bias in $\hat{\rho}_{FE}$ is in fact equal to $T^{-1}\left( \left. \omega_{22}\theta\left( c\right) \right/ \underline{\Omega}_{xx}\right)$ . Thus, the limiting bias in the pooled fixed effects estimator of $\beta$ can be written as a function of the limiting bias in the fixed effects estimator of the auto-regressive parameter $\rho$ . That is,

$\displaystyle \underset{\left( T,n\rightarrow\infty\right) _{\operatorname{seq}}}{\text{p-}\lim}T\left( \hat{\beta}_{FE}-\beta\right) =\underset{\left( T,n\rightarrow\infty\right) _{\operatorname{seq}}}{\text{p-}\lim}\frac{\omega_{12}}{\omega_{22}}T\left( \hat{\rho}_{FE}-\rho\right) .$	(8)

This is the analogue of the expression for the bias in the time-series estimator of $\beta$ given by Stambaugh (1999). The Stambaugh bias thus carries over directly to pooled regressions, once fixed effects are included.

In the time-series case, it is well known that standard least squares estimates of auto-regressive coefficients close to unity are much more biased when there is an intercept included in the regression. These effects also carry over to a predictive regression with persistent regressors; if no intercept is included in the regression, the Stambaugh bias will be much smaller. Of course, an intercept is required in almost all time-series applications. The panel data therefore gets us halfway: if only a common intercept is included in the pooled regression, the resulting estimator is well behaved, but once individual intercepts are included the bias shows up in the panel case as well.
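For reference, the closed form of $\theta\left( c\right)$ used in this section follows from two elementary integrations. The limit as $c\rightarrow0$ and the sign remark are added here as checks and are not stated in the text.

$\displaystyle \int_{0}^{r}e^{\left( r-s\right) c}ds=\frac{e^{rc}-1}{c},$

$\displaystyle \int_{0}^{1}\frac{e^{rc}-1}{c}dr=\frac{e^{c}-1}{c^{2}}-\frac{1}{c}=\frac{e^{c}-c-1}{c^{2}},$

$\displaystyle \theta\left( c\right) =-\frac{e^{c}-c-1}{c^{2}}\rightarrow-\frac{1}{2}\text{ as }c\rightarrow0.$

Since $e^{c}>1+c$ for all $c\neq0$, $\theta\left( c\right)$ is negative for every $c$. The limiting bias $T^{-1}\left( \left. \omega_{12}\theta\left( c\right) \right/ \underline{\Omega}_{xx}\right)$ of $\hat{\beta}_{FE}$ therefore has the opposite sign to $\omega_{12}$: a negative innovation covariance produces an upward bias.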

3 A bias-corrected estimator

3.1 The infeasible estimator

For a known $c$, a bias-corrected fixed effects estimator is given by

$\displaystyle \hat{\beta}_{FE}^{+}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\underline{x}_{i,t-1}^{2}\right) ^{-1}\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\underline{y}_{i,t}\underline{x}_{i,t-1}-nT\omega_{12}\theta\left( c\right) \right) .$	(9)

As shown in the Appendix, as $\left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$ ,

$\displaystyle \sqrt{n}T\left( \hat{\beta}_{FE}^{+}-\beta\right) \Rightarrow N\left( 0,\omega_{11}\underline{\Omega}_{xx}^{-1}-\left( \omega_{12}\theta\left( c\right) \right) ^{2}\underline{\Omega}_{xx}^{-2}\right) .$	(10)

Thus, the bias-corrected estimator $\hat{\beta}_{FE}^{+}$ is asymptotically normally distributed and converges at the same $\sqrt{n}T$-rate as the standard pooled estimator.
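A minimal sketch of the infeasible estimator in equation (9), taking $c$ and $\omega_{12}$ as known inputs. The function names `theta` and `beta_fe_corrected` and the array layout are assumptions made for illustration; the series branch for very small $|c|$ is a numerical safeguard added here, not part of the paper.

```python
import numpy as np

def theta(c):
    """theta(c) = -(e^c - c - 1) / c^2, with limit -1/2 as c -> 0."""
    if abs(c) < 1e-4:
        return -0.5 - c / 6.0          # first terms of the Taylor series
    return -(np.expm1(c) - c) / c**2   # expm1 avoids cancellation in e^c - 1

def beta_fe_corrected(y, x_lag, omega12, c):
    """Equation (9): fixed effects estimator with the bias term removed.

    y and x_lag are (n, T) arrays with x_lag[:, t] dated one period
    before y[:, t]; omega12 and c are treated as known.
    """
    n, T = y.shape
    y_dm = y - y.mean(axis=1, keepdims=True)
    x_dm = x_lag - x_lag.mean(axis=1, keepdims=True)
    numerator = (y_dm * x_dm).sum() - n * T * omega12 * theta(c)
    return numerator / (x_dm**2).sum()

print(theta(-10.0))
```

With $\omega_{12}=0$ the correction term vanishes and the function returns the standard fixed effects estimate.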

3.2 The feasible estimator

In order to implement $\hat{\beta}_{FE}^{+}$ in practice, estimates of $c$ and $\omega_{12}$ are required. The parameter $\omega_{12}$ is the average covariance between the error terms $u_{i,t}$ and $v_{i,t}$ and can be estimated by averaging the estimates of the individual covariances $\omega_{12i}$; estimates of $\omega_{12i}$ can be formed using fitted residuals from either the pooled or time-series estimates of equations (1) and (2).⁵ In practice, the implementation of $\hat{\beta}_{FE}^{+}$ will not be very sensitive to the exact way of estimating $\omega_{12}$. Rather, the crucial parameter is $c$, which is more difficult to estimate consistently.

In the time-series case, consistent estimation of $c$ is not possible. That is, $\rho$ can be estimated consistently, but not with enough precision to identify $c=T\left( \rho-1\right)$. This is also the reason why the Stambaugh bias is difficult to correct in practice in time-series regressions; for instance, given the lack of precise knowledge of $\rho$, Lewellen (2004) suggests a bias correction that leads to conservative tests by imposing a maximum value on the bias under the assumption that $\rho\leq1$. In the panel data case, it is possible to estimate $c$ consistently.

As discussed previously, the pooled estimate of $\rho$ is biased when including fixed effects. This bias naturally carries over to the estimator of $c=T\left( \rho-1\right)$ as well. However, as discussed in Moon and Phillips (2000), even when there are fixed effects in equation (2), a consistent estimator of $c$ is obtained by simply using the plain pooled estimator without any demeaning of the variables. The estimator of $\rho$ with fixed effects is biased for reasons similar to those of the fixed effects estimator of $\beta$; by not demeaning the data, the bias is no longer present. Intuitively, the fixed effects, or intercepts, in equation (2) can be ignored in the estimation of $\rho$ because when the root $\rho=1+c/T$ is close to unity, there is enough variation in $x_{i,t}$ that these intercepts are of negligible importance. Therefore, let $\hat{\rho}_{Pool}$ be the plain pooled estimator of $\rho$,

$\displaystyle \hat{\rho}_{Pool}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t-1}^{2}\right) ^{-1}\left( \sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}x_{i,t-1}\right) ,$	(11)

and define the corresponding estimator of $c$ as $\hat{c}=T\left( \hat{\rho}_{Pool}-1\right)$. Moon and Phillips (2000) show that this estimator of $c$ is consistent; again, observe that the data used in estimating $c$ are not time-series demeaned and that demeaning the data in the time-series dimension will lead to a bias in the estimator. A feasible version of $\hat{\beta}_{FE}^{+}$ is thus given by substituting $\omega_{12}\theta\left( c\right)$ with $\hat{\omega}_{12}\theta\left( \hat{c}\right)$ in equation (9).

Formally, the asymptotic normality of $\hat{\beta}_{FE}^{+}$ is shown only for the infeasible version of the estimator, which is based on the true (unknown) value of $c$. Although it is outside the scope of this paper to derive the exact limiting distribution of the feasible version of the estimator, the simulation results below show that inference based on the assumption of normality works well also in this case.
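The feasible procedure can be sketched end to end as follows. This is an illustration under stated assumptions rather than the paper's exact implementation: the names `theta` and `feasible_beta_fe_plus` are hypothetical, $\hat{\omega}_{12}$ is formed from unit-by-unit time-series residuals of equation (1) and demeaned pooled residuals of equation (2), and the demonstration data use assumed values ($n=400$, $T=100$, $c=-10$, innovation correlation $-0.9$, $\beta=0$, zero intercepts and zero initial values).

```python
import numpy as np

def theta(c):
    """theta(c) = -(e^c - c - 1) / c^2, with limit -1/2 as c -> 0."""
    if abs(c) < 1e-4:
        return -0.5 - c / 6.0
    return -(np.expm1(c) - c) / c**2

def feasible_beta_fe_plus(y, x):
    """y: (n, T) array of y_{i,t}; x: (n, T+1) array of x_{i,0}, ..., x_{i,T}."""
    n, T = y.shape
    x_lag, x_cur = x[:, :-1], x[:, 1:]
    # Equation (11): plain pooled estimator of rho, no demeaning.
    rho_pool = (x_cur * x_lag).sum() / (x_lag**2).sum()
    c_hat = T * (rho_pool - 1.0)
    # Standard fixed effects estimator, equation (4).
    y_dm = y - y.mean(axis=1, keepdims=True)
    x_dm = x_lag - x_lag.mean(axis=1, keepdims=True)
    sxx = (x_dm**2).sum()
    beta_fe = (y_dm * x_dm).sum() / sxx
    # Residuals: time-series OLS per unit for (1), pooled fit for (2).
    b_i = (y_dm * x_dm).sum(axis=1, keepdims=True) / (x_dm**2).sum(axis=1, keepdims=True)
    u_hat = y_dm - b_i * x_dm
    v_hat = x_cur - rho_pool * x_lag
    v_hat = v_hat - v_hat.mean(axis=1, keepdims=True)
    omega12_hat = (u_hat * v_hat).mean()   # average of the unit covariances
    # Equation (9) with estimated quantities plugged in.
    beta_plus = beta_fe - n * T * omega12_hat * theta(c_hat) / sxx
    return beta_fe, beta_plus, c_hat, omega12_hat

# Demonstration on simulated data with beta = 0 (all values are assumptions).
rng = np.random.default_rng(0)
n, T, c, delta = 400, 100, -10.0, -0.9
u = rng.standard_normal((n, T))
v = delta * u + np.sqrt(1.0 - delta**2) * rng.standard_normal((n, T))
x = np.zeros((n, T + 1))
for t in range(T):
    x[:, t + 1] = (1.0 + c / T) * x[:, t] + v[:, t]
beta_fe, beta_plus, c_hat, omega12_hat = feasible_beta_fe_plus(u, x)
print(beta_fe, beta_plus, c_hat, omega12_hat)
```

Because $\theta\left( \hat{c}\right)$ is always negative, the correction lowers the estimate whenever $\hat{\omega}_{12}$ is negative.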

3.3 Practical inference

Finally, in order to perform feasible inference using either $\hat{\beta }_{Pool}$ or $\hat{\beta}_{FE}^{+}$ , one only needs estimates of $\Omega_{xx}$ , $\underline{\Omega}_{xx}$ , and $\omega_{11}$ . Natural estimators of $\Omega_{xx}$ and $\underline{\Omega}_{xx}$ are given by $\hat{\Omega} _{xx}=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^{2}}\sum_{t=1}^{T}\tilde{x} _{i,t-1}^{2}$ and $\underline{\hat{\Omega}}_{xx}=\frac{1}{n}\sum_{i=1} ^{n}\frac{1}{T^{2}}\sum_{t=1}^{T}\underline{x}_{i,t-1}^{2}$ , respectively. Let $\hat{u}_{i,t}$ be the fitted residuals and $\hat{\omega}_{11}=\frac{1}{n} \sum_{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}\hat{u}_{i,t}^{2}$ .⁶ An estimate of the variance of $\hat{\beta}_{Pool}$ is thus given by $\hat{\omega}_{11}\hat{\Omega}_{xx}^{-1}$ .

Similarly, the natural estimator of $\underline{\Phi}_{ux}$ is given by $\underline{\hat{\Phi}}_{ux}^{+}=\hat{\omega}_{11}\underline{\hat{\Omega} }_{xx}-\left( \hat{\omega}_{12}\theta\left( \hat{c}\right) \right) ^{2}$ . However, this estimator of $\underline{\Phi}_{ux}$ suffers from the drawback that it is not necessarily positive. Furthermore, subtracting off the term coming from the bias correction of the estimator, without controlling for the possibility that the feasible bias correction induces additional variance in the estimator of $\beta$ through the sampling error in $\hat{c}$ , may lead to too low an estimate of the variance of the feasible version of $\hat{\beta }_{FE}^{+}$ . That is, as mentioned above, the exact limiting distribution of the feasible estimator is unknown, and it therefore seems reasonable to use an estimator that is more robust. Thus, I propose to use the estimator $\underline{\hat{\Phi}}_{ux}=\hat{\omega}_{11}\underline{\hat{\Omega}}_{xx}$ , and estimate the variance of $\hat{\beta}_{FE}^{+}$ by $\hat{\omega} _{11}\underline{\hat{\Omega}}_{xx}^{-1}$ ; this will result in a more conservative, i.e. larger, estimate of the variance. In both the pooled and fixed effects cases, therefore, standard estimators can be used to estimate the variance of the estimators.

Since the distributions of $\hat{\beta}_{Pool}$ and $\hat{\beta}_{FE}^{+}$ are (approximately) asymptotically normal, implementing tests on the slope coefficient becomes trivial. The $t$-statistic for the estimator $\hat{\beta}_{Pool}$, for instance, will satisfy

$\displaystyle t_{Pool}=\frac{\hat{\beta}_{Pool}-\beta_{0}}{\sqrt{\left. \hat{\omega}_{11}\hat{\Omega}_{xx}^{-1}\right/ \left( nT^{2}\right) }}\Rightarrow N\left( 0,1\right) .$	(12)

The statistic $t_{FE}^{+}$ corresponding to $\hat{\beta}_{FE}^{+}$ is constructed in an analogous manner using $\underline{\hat{\Omega}}_{xx}$ instead of $\hat{\Omega}_{xx}$ . In the simulations and empirical illustrations below, we also consider the properties of the statistic corresponding to the standard fixed effects estimator, $t_{FE}$ , which again is identical to $t_{Pool}$ with $\hat{\Omega}_{xx}$ replaced by $\underline{\hat{\Omega}} _{xx}$ ; given the above discussion, $t_{FE}$ will not be standard normally distributed unless $\omega_{12}=0$ . However, inference using $\hat{\beta} _{FE}$ and $t_{FE}$ under the normality assumption provides a useful illustration of the biases that occur if one ignores the issues resulting from the endogeneity and persistence of the regressor.
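The test statistic in equation (12) and its fixed effects analogues share one formula and differ only in which centered regressor enters. A minimal sketch, with the function name `t_stat` and its arguments assumed for illustration:

```python
import numpy as np

def t_stat(beta_hat, beta0, resid, x_centered):
    """t-statistic of the form in equation (12).

    resid is the (n, T) array of fitted residuals and x_centered the (n, T)
    array of lagged regressors, demeaned overall for t_Pool or demeaned
    unit by unit for t_FE and t_FE^+.
    """
    n, T = resid.shape
    omega11_hat = (resid**2).mean()                     # (1/n) sum_i (1/T) sum_t u_hat^2
    omega_xx_hat = (x_centered**2).sum() / (n * T**2)   # (1/n) sum_i (1/T^2) sum_t x^2
    se = np.sqrt(omega11_hat / omega_xx_hat / (n * T**2))
    return (beta_hat - beta0) / se

# Small hand-checkable example: the standard error works out to 0.5.
resid = np.ones((2, 3))
x_centered = np.array([[1.0, 0.0, -1.0], [1.0, 0.0, -1.0]])
print(t_stat(1.0, 0.0, resid, x_centered))
```

Since $\hat{\Omega}_{xx}nT^{2}$ equals the sum of squared centered regressors, the standard error reduces to the familiar $\sqrt{\hat{\omega}_{11}/\sum_{i}\sum_{t}x_{i,t-1}^{2}}$ computed on the centered data.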

4 Simulation evidence

To evaluate the small sample properties of the panel data estimators proposed in this paper, a Monte Carlo study is performed. In the first experiment, the properties of the point estimates are considered. Equations (1) and (2) are simulated for the case with a single regressor. The innovations $\left( u_{i,t},v_{i,t}\right)$ are drawn from normal distributions with mean zero, unit variance, and correlations $\delta =0,-0.4,-0.7,$ and . The slope parameter $\beta$ is set equal to and the local-to-unity parameter is set to . The sample size is given by . The small value of $\beta$ is chosen in order to reflect the fact that most forecasting regressions are used to test a null of $\beta=0$ , and any plausible alternative is often close to zero. The intercepts $\alpha_{i}$ are all set equal to zero. All results are based on 10,000 repetitions.

Three different estimators are considered: the pooled estimator with no fixed effects, $\hat{\beta}_{Pool}$ , the fixed effects estimator, $\hat{\beta}_{FE}$ , and the bias-corrected fixed effects estimator, $\hat{\beta}_{FE}^{+}$ . The bias-correction term in the estimator $\hat{\beta}_{FE}^{+}$ is estimated by $\hat{\omega}_{12}\theta\left( \hat{c}\right)$ , where $\hat{c}$ is the panel estimate of the local-to-unity parameter and $\hat{\omega}_{12}$ is estimated as $n^{-1}\sum_{i=1}^{n}\hat{\omega}_{12i}$ with $\hat{\omega} _{12i}$ the covariance between the residuals from a time-series estimation of equation (1) and the residuals from the pooled estimation of equation (2).⁷

The results are shown in Figure 1. $\hat{\beta}_{Pool}$ and $\hat{\beta}_{FE}^{+}$ are virtually unbiased whereas $\hat{\beta}_{FE}$ exhibits a rather substantial bias when the absolute value of the correlation $\delta$ is large. The bias-corrected estimator, $\hat{\beta}_{FE}^{+}$ , has a slightly less peaked distribution than the standard pooled estimator, $\hat{\beta}_{Pool}$ , but overall the bias correction appears to produce good point estimates.

The second part of the Monte Carlo study concerns the size and power of the pooled tests. The same setup as above is used, but, in order to calculate the power of the tests, the slope coefficient $\beta$ now varies between and . The tests are evaluated under the assumption that the limiting distributions are standard normal; i.e. the null is rejected for absolute test values greater than $1.96$. Panel A in Table 1 shows the average sizes of the nominal five percent tests under the null hypothesis of $\beta=0$ for the two-sided tests corresponding to the three different estimators considered above. Figure 2 shows the average rejection rates of the five percent two-sided tests, evaluating a null of $\beta=0$ for different values of the true $\beta$; that is, the power curves of the tests. Again, the results are based on 10,000 repetitions.

Apart from the test based on the standard fixed effects estimator, the other two tests perform very well in terms of size, with actual rejection rates very close to five percent in the nominal five percent test. Table 1 and the power curves in Figure 2 clearly show the effects of the second order bias in the fixed effects estimator. The test based on the bias-corrected fixed effects estimator has similar power properties to the test based on the standard pooled estimator.

In practice, the assumption that $\rho$ is identical for all $i$, i.e. that the regressors all have the same persistence, may seem restrictive. I therefore briefly analyze the robustness of the bias-correction method proposed here to deviations from this assumption. In particular, the same size simulations as those reported in Panel A of Table 1 are shown in Panel B when the individual local-to-unity parameters, $c_{i}$, are drawn from a uniform distribution with support $\left[ -20,-2\right]$. As is seen, the results are very similar, and the bias correction appears fairly robust to this generalization. The standard pooled estimator should not be affected, since the assumption of a common parameter is not needed in deriving its asymptotic result.

In summary, the simulation evidence shows the importance of controlling for the bias arising from fitting individual intercepts in the pooled regression. The bias correction of the fixed effects estimator appears to work well, producing nearly unbiased results and correctly sized tests with good power. In cases where individual effects are not present, the pooled estimator performs well also when the regressors are highly endogenous, as the theory would predict.
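A scaled-down version of the size experiment can be sketched as below. It is not a replication of Table 1: the design values ($n=20$, $T=100$, $c=-10$, innovation correlation $-0.9$, 200 repetitions), the zero intercepts and initial values, the use of pooled fixed effects residuals, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def theta(c):
    """theta(c) = -(e^c - c - 1) / c^2, with limit -1/2 as c -> 0."""
    if abs(c) < 1e-4:
        return -0.5 - c / 6.0
    return -(np.expm1(c) - c) / c**2

def one_draw(n, T, c, delta, rng):
    """Return (|t_FE|, |t_FE^+|) for one panel simulated under beta = 0."""
    rho = 1.0 + c / T
    u = rng.standard_normal((n, T))
    v = delta * u + np.sqrt(1.0 - delta**2) * rng.standard_normal((n, T))
    x = np.zeros((n, T + 1))
    for t in range(T):
        x[:, t + 1] = rho * x[:, t] + v[:, t]
    y = u                                        # beta = 0 and alpha_i = 0
    x_lag, x_cur = x[:, :-1], x[:, 1:]
    x_dm = x_lag - x_lag.mean(axis=1, keepdims=True)
    y_dm = y - y.mean(axis=1, keepdims=True)
    sxx = (x_dm**2).sum()
    beta_fe = (y_dm * x_dm).sum() / sxx
    rho_pool = (x_cur * x_lag).sum() / (x_lag**2).sum()
    c_hat = T * (rho_pool - 1.0)
    u_hat = y_dm - beta_fe * x_dm                # pooled FE residuals (assumption)
    v_hat = x_cur - rho_pool * x_lag
    v_hat = v_hat - v_hat.mean(axis=1, keepdims=True)
    omega12_hat = (u_hat * v_hat).mean()
    beta_plus = beta_fe - n * T * omega12_hat * theta(c_hat) / sxx
    se = np.sqrt((u_hat**2).mean() / sxx)        # conservative variance estimate
    return abs(beta_fe / se), abs(beta_plus / se)

def rejection_rates(n=20, T=100, c=-10.0, delta=-0.9, reps=200, seed=0):
    """Share of draws in which each two-sided 5 percent test rejects beta = 0."""
    rng = np.random.default_rng(seed)
    t_abs = np.array([one_draw(n, T, c, delta, rng) for _ in range(reps)])
    return (t_abs > 1.96).mean(axis=0)

rej_fe, rej_plus = rejection_rates()
print(f"rejection rate, t_FE: {rej_fe:.3f}; t_FE+: {rej_plus:.3f}")
```

With strongly negative innovation correlation, the uncorrected test should reject the true null far more often than the corrected one, in line with the pattern described in this section.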

5 Empirical illustration and conclusion

To illustrate the methods developed in this paper, I consider the question of stock return predictability in an international data set. The data are obtained from the MSCI database and consist of a panel of total returns for stock markets in 18 different countries and three corresponding forecasting variables: the dividend- and earnings-price ratios as well as the book-to-market values. With varying success, all three of these variables have been used extensively in tests of stock return predictability for U.S. data (e.g. Lewellen, 2004, and Campbell and Yogo, 2006), and to a lesser degree in international data (e.g. Ang and Bekaert, 2007). All three of these forecasting variables are highly persistent, and since they are all valuation ratios, their innovations are likely to be highly correlated with the innovations to the returns process. The data are on a monthly basis and the returns data span the period 1970.1 to 2002.12, though not all forecasting variables or all countries are available for this whole time-period. In particular, I have data for stock indices in the following countries: Australia, Austria, Belgium, Canada, Denmark, France, Germany, Hong Kong, Italy, Japan, the Netherlands, Norway, Singapore, Spain, Sweden, Switzerland, the UK, and the USA.⁸ The dividend price ratio $\left( d-p\right)$ is available for all countries except Hong Kong and for the entire sample period from 1970.1 onwards. The earnings price ratio $\left( e-p\right)$ is available for all countries except Italy and Switzerland, from 1974.12 onwards. The book-to-market value $\left( b-p\right)$ is available for all countries from 1974.12 onwards. All returns are expressed in U.S. dollars, and the dependent variable in the predictive regressions is given by the excess return over the 1-month U.S. T-bill rate. Finally, all data are log-transformed.

The results from the pooled forecasting regressions are shown in Table 2. The estimates of $c$ and the correlation between the innovations in the returns and predictor processes show that the forecasting variables are clearly near unit-root processes and highly endogenous. The standard pooled fixed effects estimator, $\hat{\beta}_{FE}$, delivers highly significant estimates and clearly rejects the null hypothesis of no predictability. Given the high persistence and endogeneity found in the data, however, these results are likely to be upward biased. As seen from the estimates based on the bias-corrected fixed effects estimator, $\hat{\beta}_{FE}^{+}$, significance disappears when controlling for the bias induced by the time-series demeaning in the fixed effects estimator; $\hat{\beta}_{FE}^{+}$ is implemented in a manner identical to that described in the simulation section above. Overall, the case for stock return predictability in this international data set, using any of the three predictor variables, must be considered very weak.

These results are in line with the extensive study of stock return predictability in international data by Hjalmarsson (2007), which suggests that the predictive power of valuation ratios is typically weak in international data. Ang and Bekaert (2007) also find in a smaller international sample, covering France, Germany, the UK and the USA over a similar sample period, that the predictive ability of the dividend-price ratio is very weak in all four of these countries.⁹

The empirical results illustrate well the difficulties of performing inference in regressions with persistent and endogenous variables, and that these difficulties also prevail when a panel of data, rather than a single time-series, is available. Indeed, judging by the vast difference between the estimates and test statistics resulting from the standard fixed effects estimator and those from the robust estimators, it is clear that the bias effects can be as large in panel estimations as in time-series regressions.

Appendix A: The asymptotic properties of $\hat{\beta}_{Pool}$ , $\hat{\beta}_{FE}$ , and $\hat{\beta}_{FE}^{+}$

Hjalmarsson (2007) derives the asymptotic properties of $\hat{\beta}_{Pool}$ and $\hat{\beta}_{FE}$ in a similar setting to the one considered here, although he does not consider the bias-corrected estimator $\hat{\beta}_{FE}^{+}$. The following derivations therefore largely restate those found in Hjalmarsson (2007).

Given the conditions on $u_{i,t}$ and $v_{i,t}$, as $T\rightarrow\infty$, $\frac{1}{\sqrt{T}}\sum_{t=1}^{\left[ Tr\right] }w_{i,t}\Rightarrow B_{i}\left( r\right) =BM\left( \Omega_{i}\right) \left( r\right)$, where $B_{i}\left( \cdot\right) =\left( B_{1,i}\left( \cdot\right) ,B_{2,i}\left( \cdot\right) \right) ^{\prime}$ denotes a two-dimensional Brownian motion. As $T\rightarrow\infty$, $\frac{x_{i,t=\left[ Tr\right] }}{\sqrt{T}}\Rightarrow J_{i}\left( r\right)$, where $J_{i}\left( r\right) =\int_{0}^{r}e^{\left( r-s\right) c}dB_{2,i}\left( s\right)$. Analogous results hold for the time-series demeaned data, $\underline{x}_{i,t}$, with $J_{i}$ replaced by $\underline{J}_{i}$.

First note that $\frac{\tilde{x}_{i,t=\left[ Tr\right] }}{\sqrt{T}} =\frac{x_{i,t=\left[ Tr\right] }}{\sqrt{T}}+O_{p}\left( \frac{1}{\sqrt{n}}\right) \Rightarrow J_{i}\left( r\right)$, so that the overall demeaning has no asymptotic effects. By standard results, as $T\rightarrow\infty$, $\frac{1}{T}\sum _{t=1}^{T}u_{i,t}\tilde{x}_{i,t-1}\Rightarrow\int_{0}^{1}dB_{1,i}J_{i}$ and $\frac{1}{T^{2}}\sum_{t=1}^{T}\tilde{x}_{i,t-1}^{2}\Rightarrow\int_{0} ^{1}J_{i}^{2}$. Since $B_{1,i}$ and $J_{i}$ are iid across $i$ and $E\left[ \int_{0}^{1}dB_{1,i}J_{i}\right] =0$, it follows that as $n\rightarrow\infty$, $\frac{1}{n}\sum_{i=1}^{n}\int_{0}^{1}J_{i}^{2}\rightarrow_{p}\Omega_{xx}$ and $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\int_{0}^{1}dB_{1,i}J_{i}\Rightarrow N\left( 0,\Phi_{ux}\right)$, where $\Phi_{ux}\equiv E\left[ \left( \int_{0}^{1}dB_{1,i}J_{i}\right) ^{2}\right]$, by the weak law of large numbers and the central limit theorem, respectively. Thus, as $\left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$, $\frac{1}{n}\sum _{i=1}^{n}\frac{1}{T^{2}}\sum_{t=1}^{T}\tilde{x}_{i,t-1}^{2}\rightarrow _{p}\Omega_{xx}$ and $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{1}{T}\sum _{t=1}^{T}u_{i,t}\tilde{x}_{i,t-1}\Rightarrow N\left( 0,\Phi_{ux}\right)$. It follows that $\sqrt{n}T\left( \hat{\beta}_{Pool}-\beta\right) \Rightarrow N\left( 0,\Phi_{ux}\Omega_{xx}^{-2}\right)$. By the Itô isometry, $\Phi_{ux}=\omega_{11}E\left[ \int_{0}^{1}J_{i}^{2}\right] =\omega _{11}\Omega_{xx}$ and thus $\Phi_{ux}\Omega_{xx}^{-2}=\omega_{11}\Omega _{xx}^{-1}$.

Similarly, $\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^{2}}\sum_{t=1}^{T} \underline{x}_{i,t-1}^{2}\rightarrow_{p}\underline{\Omega}_{xx}$, as $\left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$. However, simple calculations yield that $E\left[ \int_{0}^{1}dB_{1,i}\underline{J} _{i}\right] =-\omega_{12}\left( \int_{0}^{1}\int_{0}^{r}e^{\left( r-s\right) c}dsdr\right) \neq0$, and it follows that $\frac{1}{n}\sum _{i=1}^{n}\frac{1}{T}\sum_{t=1}^{T}u_{i,t}\underline{x}_{i,t-1}\rightarrow _{p}-\omega_{12}\left( \int_{0}^{1}\int_{0}^{r}e^{\left( r-s\right) c}dsdr\right)$, as $\left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$. Thus, $T\left( \hat{\beta}_{FE}-\beta\right) \rightarrow_{p}-\omega_{12}\left( \int_{0}^{1}\int_{0}^{r}e^{\left( r-s\right) c}dsdr\right) \underline{\Omega}_{xx}^{-1}$ as $\left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$.
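As a supplementary worked step that is not part of the original derivation, the double integral in the bias term can be evaluated in closed form. The sketch below assumes $c\neq0$ and treats the unit-root case as a limit.

```latex
\begin{align*}
\int_{0}^{1}\int_{0}^{r}e^{(r-s)c}\,ds\,dr
  &= \int_{0}^{1}\frac{e^{rc}-1}{c}\,dr
   = \frac{e^{c}-1-c}{c^{2}}, \qquad c\neq 0, \\
\lim_{c\rightarrow 0}\frac{e^{c}-1-c}{c^{2}} &= \frac{1}{2}.
\end{align*}
```

Since $e^{c}>1+c$ for all $c\neq0$, the integral is strictly positive, so the limiting bias of $\hat{\beta}_{FE}$ has the opposite sign to $\omega_{12}$. For example, at $c=-10$ the integral is approximately $0.09$.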

By removing the mean of the term $\int_{0}^{1}dB_{1,i}\underline{J}_{i}$ in the bias-corrected estimator $\hat{\beta}_{FE}^{+}$ , the central limit theorem once more applies and, by the same arguments as for $\hat{\beta}_{Pool}$ , $\sqrt{n}T\left( \hat{\beta}_{FE}^{+}-\beta\right) \Rightarrow N\left( 0,\underline{\Phi}_{ux}\underline{\Omega}_{xx}^{-2}\right)$ where $\underline{\Phi}_{ux}=E\left[ \left( \int_{0}^{1}dB_{1,i}\underline{J} _{i}-E\left[ \int_{0}^{1}dB_{1,i}\underline{J}_{i}\right] \right) ^{2}\right] =E\left[ \left( \int_{0}^{1}dB_{1,i}\underline{J}_{i} -\omega_{12}\theta\left( c\right) \right) ^{2}\right]$ . By the Itô isometry, it follows that $\underline{\Phi}_{ux}=\omega_{11}\underline{\Omega }_{xx}-\left( \omega_{12}\theta\left( c\right) \right) ^{2}$ and $\underline{\Phi}_{ux}\underline{\Omega}_{xx}^{-2}=\omega_{11}\underline {\Omega}_{xx}^{-1}-\left( \omega_{12}\theta\left( c\right) \right) ^{2}\underline{\Omega}_{xx}^{-2}$ .
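The limits above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the author's code: it assumes the data generating process $y_{i,t}=\beta x_{i,t-1}+u_{i,t}$, $x_{i,t}=\left( 1+c/T\right) x_{i,t-1}+v_{i,t}$ with unit-variance innovations (so that $\omega_{12}=\delta$), sets the individual intercepts to zero, and uses the true values of $c$ and $\omega_{12}$ in the correction rather than estimates. The form of the corrected estimator is reconstructed from the statement that the mean $\omega_{12}\theta\left( c\right)$ is removed from the numerator, and the function names (`theta`, `simulate_panel`, `estimators`, `monte_carlo`) are hypothetical.

```python
import numpy as np


def theta(c):
    """theta(c) = -int_0^1 int_0^r exp((r-s)c) ds dr, evaluated in closed form."""
    if abs(c) < 1e-8:
        return -0.5  # limit as c -> 0
    return -(np.exp(c) - 1.0 - c) / c**2


def simulate_panel(n, T, c, delta, beta, rng):
    """Assumed DGP: y_it = beta*x_{i,t-1} + u_it, x_it = (1 + c/T)*x_{i,t-1} + v_it."""
    rho = 1.0 + c / T
    u = rng.standard_normal((n, T))
    # v has unit variance and correlation delta with u, so omega_12 = delta
    v = delta * u + np.sqrt(1.0 - delta**2) * rng.standard_normal((n, T))
    x = np.zeros((n, T + 1))  # initial condition x_{i,0} = 0
    for t in range(1, T + 1):
        x[:, t] = rho * x[:, t - 1] + v[:, t - 1]
    x_lag = x[:, :-1]  # x_{i,t-1} for t = 1..T
    y = beta * x_lag + u  # individual intercepts set to zero (assumption)
    return y, x_lag


def estimators(y, x_lag, c, omega12):
    """Return (pooled, fixed effects, bias-corrected fixed effects) estimates."""
    n, T = y.shape
    # Pooled estimator: data demeaned by the overall panel mean
    xt, yt = x_lag - x_lag.mean(), y - y.mean()
    b_pool = np.sum(yt * xt) / np.sum(xt**2)
    # Fixed effects estimator: data demeaned within each time series
    xd = x_lag - x_lag.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    num, den = np.sum(yd * xd), np.sum(xd**2)
    b_fe = num / den
    # Corrected estimator: remove the mean n*T*omega12*theta(c) of the numerator
    b_fe_plus = (num - n * T * omega12 * theta(c)) / den
    return b_pool, b_fe, b_fe_plus


def monte_carlo(n=20, T=100, c=-10.0, delta=-0.95, beta=0.0, reps=500, seed=0):
    """Average estimation error of each estimator across simulated panels."""
    rng = np.random.default_rng(seed)
    draws = np.array([
        estimators(*simulate_panel(n, T, c, delta, beta, rng), c, delta)
        for _ in range(reps)
    ])
    bias = draws.mean(axis=0) - beta
    return {"pool": bias[0], "fe": bias[1], "fe_plus": bias[2]}


if __name__ == "__main__":
    print(monte_carlo())
```

With a strongly negative $\delta$, the average error of the fixed effects estimator should be clearly positive, while the pooled and corrected estimators should be centered close to the true $\beta$. In practice $c$ and $\omega_{12}$ would have to be estimated, which this sketch deliberately sidesteps.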

References

Ang, A., and G. Bekaert, 2007. Stock Return Predictability: Is it There? Review of Financial Studies 20, 651-707.

Campbell, J.Y., and M. Yogo, 2006. Efficient Tests of Stock Return Predictability, Journal of Financial Economics 81, 27-60.

Cavanagh, C., G. Elliott, and J. Stock, 1995. Inference in Models with Nearly Integrated Regressors, Econometric Theory 11, 1131-1147.

Cohen, R., C. Polk, and T. Vuolteenaho, 2003. The Value Spread, Journal of Finance 58, 609-641.

Hjalmarsson, E., 2007. Predicting Global Stock Returns, Working Paper, Federal Reserve Board.

Lewellen, J., 2004. Predicting Returns with Financial Ratios, Journal of Financial Economics 74, 209-235.

Mankiw, N.G., and M.D. Shapiro, 1986. Do We Reject Too Often? Small Sample Properties of Tests of Rational Expectations Models, Economics Letters 20, 139-145.

Moon, H.R., and P.C.B. Phillips, 2000. Estimation of Autoregressive Roots near Unity using Panel Data, Econometric Theory 16, 927-998.

Phillips, P.C.B., and H.R. Moon, 1999. Linear Regression Limit Theory for Nonstationary Panel Data, Econometrica 67, 1057-1111.

Polk, C., S. Thompson, and T. Vuolteenaho, 2006. Cross-Sectional Forecasts of the Equity Premium, Journal of Financial Economics 81, 101-141.

Stambaugh, R., 1999. Predictive Regressions, Journal of Financial Economics 54, 375-421.

Thompson, S.B., 2006. Simple Formulas for Standard Errors that Cluster by Both Firm and Time, Working Paper, Harvard University.

Table 1: Size Results from the Monte Carlo Study - Panel A ()

The table shows the average rejection rates under the null of $\beta=0$ for the two-sided tests corresponding to the respective estimators; the nominal size of the tests is 5 percent. The differing values of $\delta$ are given in the top row of the table and the results are based on repetitions. The sample size used is and . In Panel A, the local-to-unity parameter, $c$, is set equal to . In Panel B, separate local-to-unity parameters, $c_{i}$, are drawn for each $i$ from a uniform distribution with support $[-20,-2]$.

Estimator	$\delta=0.0$	$\delta=-0.4$	$\delta=-0.7$	$\delta =-0.95$
$\hat{\beta}_{Pool}$	0.050	0.051	0.054	0.050
$\hat{\beta}_{FE}$	0.052	0.211	0.546	0.807
$\hat{\beta}_{FE}^{+}$	0.054	0.052	0.056	0.054

Table 1: Size Results from the Monte Carlo Study - Panel B ( $c_{i}\sim U\left[ -20,-2\right]$ )

Estimator	$\delta=0.0$	$\delta=-0.4$	$\delta=-0.7$	$\delta =-0.95$
$\hat{\beta}_{Pool}$	0.053	0.051	0.053	0.053
$\hat{\beta}_{FE}$	0.056	0.150	0.362	0.584
$\hat{\beta}_{FE}^{+}$	0.056	0.054	0.059	0.064

Table 2: Results from the Empirical Regressions

The table shows the point estimates and corresponding $t$-statistics (in parentheses) from the pooled regressions of excess stock returns onto either the dividend-price ratio ($d-p$), the earnings-price ratio ($e-p$), or the book-to-market value ($b-p$). The first column indicates which of the three forecasting variables is used, and the second and third columns give the size of the panel used in the regression, that is, the number of cross-sectional units $n$ and the number of time-series observations $T$. The next two columns give the results for the standard fixed effects estimator and the bias-corrected fixed effects estimator, respectively. The final two columns give the estimate of the local-to-unity parameter in the regressors and the average correlation between the innovations to the returns and the regressors, respectively.

Variable	$n$	$T$	$\hat{\beta}_{FE}$	$\hat{\beta}_{FE}^{+}$	$\hat{c}_{pool}$	$\hat{\delta}$
d - p	17	396	0.007 (3.840)	-0.002 (-1.254)	-0.004	-0.771
e - p	16	337	0.011 (4.924)	0.000 (-0.089)	0.091	-0.697
b - p	18	337	0.008 (4.016)	0.002 (0.948)	-1.538	-0.835

Figure 1: Estimation results from the Monte Carlo study

The graphs show the kernel density estimates of the estimated slope coefficients, for samples with and . The solid lines, labeled Pooled in the legend, show the results for the standard pooled estimator without individual intercepts, $\hat{\beta}_{Pool}$ ; the long dashed lines, labeled Fixed Effects, show the results for the standard fixed effects estimator, $\hat{\beta}_{FE}$ ; the dotted lines, labeled Bias Corrected FE, show the results for the bias-corrected fixed effects estimator, $\hat{\beta}_{FE}^{+}$ . All results are based on 10,000 repetitions.

Figure 2: Power Results from the Monte Carlo Study

The graphs show the average rejection rates for a two-sided 5 percent test of the null hypothesis of $\beta=0$, for samples with , and . The $x$-axis shows the true value of the parameter $\beta$, and the $y$-axis indicates the average rejection rate. The solid lines, labeled Pooled, give the results for the test corresponding to the standard pooled estimator without individual intercepts, $\hat{\beta}_{Pool}$; the long dashed lines, labeled Fixed Effects, show the results for the test corresponding to the standard fixed effects estimator, $\hat{\beta}_{FE}$; the dotted lines, labeled Bias Corrected FE, show the results for the test corresponding to the bias-corrected fixed effects estimator, $\hat{\beta}_{FE}^{+}$. The flat lines indicate the 5% rejection rate. All results are based on 10,000 repetitions.

Footnotes

* Helpful comments have been provided by Lennart Hjalmarsson, Randi Hjalmarsson, and George Korniotis, as well as seminar participants at the European Summer Meeting of the Econometric Society in Vienna. Tel.: +1-202-452-2426; fax: +1-202-263-4850; email: [email protected]. The views in this paper are solely the responsibility of the author and should not be interpreted as reflecting the views of the Board of Governors of the Federal Reserve System or of any other person associated with the Federal Reserve System.

1. In order to highlight the effects of the Stambaugh bias in panel regressions, the effects of cross-sectional dependence are not considered. In certain applications it may be desirable to allow for clustering of the errors either across time for a given individual $i$, or across individuals (i.e. cross-sectional correlation). As shown by Thompson (2006), it is straightforward to construct standard error estimators that control for such clustering across both time and individuals. His framework could easily be used in the current context and the details are omitted.

2. Subject to potential rate restrictions, such as $n/T\rightarrow0$, these results can generally be shown to hold as $T$ and $n$ go to infinity jointly; technical proofs of such joint convergence are not pursued in the current study, however.

3. In the special case of $\omega_{12}=0$, it follows easily that $\hat{\beta}_{FE}$ is also asymptotically normally distributed with convergence rate $\sqrt{n}T$.

4. Polk et al. (2006) make the same conjecture regarding inference in pooled predictive regressions, namely that independent cross-sectional information dilutes the endogeneity effects, but do not recognize that this intuition fails in the presence of fixed effects. Their regressor is nearly exogenous, however, and their empirical conclusions should therefore still be fairly accurate.

5. Recall that although both the time-series and pooled fixed effects estimators of $\beta$ and $\rho$ are generally biased in finite samples, they are still consistent estimators. Estimators of the covariance $\omega_{12i}$ based on the fitted residuals will therefore be consistent.

6. Alternatively, the estimator of $\omega_{11}$ could be replaced by a robust variance estimator to allow for heteroskedasticity in the error terms.

7. As mentioned before, the exact estimation procedure for the $\omega_{12i}$s does not play a crucial role in the properties of $\hat{\beta}_{FE}^{+}$.

8. Hong Kong is, of course, not a country.

9. Ang and Bekaert (2007) do argue that the dividend-price ratio has some predictive ability when considered jointly with the short interest rate.
