
Predicting Global Stock Returns*

Erik Hjalmarsson

NOTE: International Finance Discussion Papers are preliminary materials circulated to stimulate discussion and critical comment. References in publications to International Finance Discussion Papers (other than an acknowledgment that the writer has had access to unpublished material) should be cleared with the author or authors. Recent IFDPs are available on the Web at http://www.federalreserve.gov/pubs/ifdp/. This paper can be downloaded without charge from the Social Science Research Network electronic library at http://www.ssrn.com/.


Abstract:

I test for stock return predictability in the largest and most comprehensive data set analyzed so far, using four common forecasting variables: the dividend- and earnings-price ratios, the short interest rate, and the term spread. The data contain over 20,000 monthly observations from 40 international markets, including 24 developed and 16 emerging economies. In addition, I develop new methods for predictive regressions with panel data. Inference based on the standard fixed effects estimator is shown to suffer from severe size distortions in the typical stock return regression, and an alternative robust estimator is proposed. The empirical results indicate that the short interest rate and the term spread are fairly robust predictors of stock returns in developed markets. In contrast, no strong or consistent evidence of predictability is found when considering the earnings- and dividend-price ratios as predictors.

Keywords: Cross-sectional dependence, panel data, pooled regression, predictive regression, stock return predictability

JEL classification: C22, C23, G12, G15



I  Introduction

Our empirical knowledge regarding the predictability of stock returns by variables such as the dividend-price ratio has been subject to constant updating over time. Early work by Fama and French (1988, 1989) and Campbell and Shiller (1988) concluded that there is generally strong evidence of predictability. Recent studies that use more robust econometric methods, such as Campbell and Yogo (2006) and Lewellen (2004), still find evidence of predictability, but their results are much less conclusive than the earlier studies.

Despite the mixed evidence and uncertainty regarding stock return predictability, there have been surprisingly few attempts at furthering our understanding by using data other than that of the U.S. stock-market. Since the predictable component of stock returns must be small, if indeed one does exist, there seems to be little chance of reaching a decisive conclusion using U.S. data alone, which effectively provides only one time-series at the market level. There has, of course, been some analysis of predictability in international stock returns, but many of the results are based on relatively small data sets and non-robust econometric methods.1 In addition, most international results are based only on individual time-series regressions and very little analysis has been conducted with pooled panel data regressions. Yet, it is well known that pooling the data may lead to more powerful methods, which is particularly relevant when studying stock return predictability since any predictable component will always be small relative to the overall variance in the returns process.

The aim of this paper is twofold. First, by considering a large global data set, I provide the most comprehensive picture of stock return predictability to date. The data contain over 20,000 monthly observations from 40 countries, including markets in 24 developed economies.2 The longest data series is for the U.K. stock-market and dates back to 1836 while data for eight other markets date back to before 1935. Second, I develop and apply new results for pooled forecasting regressions, utilizing the panel structure of the data.

Since an international data set of stock returns and forecasting variables provides a panel, the theory part of this paper analyzes econometric inference in predictive regressions in a panel data setting, when the regressors are nearly persistent and endogenous.3 As is well known (Stambaugh (1999)), OLS inference in the corresponding time-series predictive regressions is generally biased and various bias and size correction procedures have been proposed.

In the panel case, it turns out that the pooled estimator is unbiased as long as no fixed effects are included. The intuition behind this is that when pooling the data, independent cross-sectional information dilutes the endogeneity effects that cause the Stambaugh bias in the time-series case. That is, the Stambaugh bias only arises when the predictors are both persistent and endogenous; by pooling the data, the endogeneity is, in a sense, removed, and hence also the bias. Furthermore, the standard pooled estimator has an asymptotically normal distribution and normal inference can therefore be performed.

The intuition just described for the standard pooled estimator no longer holds when fixed effects are allowed for, and the asymptotic properties of the pooled estimator with fixed effects are very different from those of the pooled estimator with a common intercept. The time-series demeaning of the data, which is implicit in a fixed effects estimation, causes the fixed effects estimator to suffer from a second order bias that invalidates inference from standard test statistics. When demeaning each time-series in the panel, information after time t is used to form the time t regressor, and information before time t is used to form the time t returns. This induces a correlation between the lagged value of the demeaned regressor and the error term in the forecasting equation, which gives rise to the second order bias in the fixed effects estimator. Thus, in contrast to the case with a common intercept, the regressors no longer act as if they were exogenous. To correct for this bias, I develop an estimator based on the idea of recursive demeaning (e.g. Moon and Phillips (2000), and Sul et al. (2005)). By using information only after time $ t$ in the demeaning of the returns and the non-demeaned regressor as an instrument, the distortive effects arising from standard demeaning are eliminated.

The overall conclusion from the theoretical results and the supporting Monte Carlo simulations is that, in the typical panel data case with fixed effects, persistent and endogenous regressors will cause standard inference to be biased. While this result is well established in the time-series case (e.g. Stambaugh (1999)), the results in this paper show that equal caution is required when working with panel data.

In the empirical analysis, I conduct time-series regressions for individual countries as well as pooled regressions. In both types of analyses, I estimate regressions for four of the most commonly used forecasting variables: the dividend- and earnings-price ratios, the short interest rate, and the term spread. In the pooled regressions, countries are either all grouped together in a global panel or split up into groups of developed and emerging markets.

The results indicate that the short interest rate and the term spread are both fairly robust predictors of (excess) stock returns in developed markets. The null of no predictability is clearly rejected in the pooled regressions for developed markets as well as in a number of individual time-series regressions. These results are generally in line with those found by Campbell and Yogo (2006) with U.S. data and with the limited international results of Ang and Bekaert (2007). In contrast to the interest rate variables, no strong or consistent evidence of predictability is found when considering the earnings- and dividend-price ratios as predictors. In particular, neither predictor yields any consistent predictive power for the developed markets and, as seen in plots of the regression coefficient over time, this is especially true for the dividend-price ratio.

The rest of the paper is organized as follows. Sections II and III describe the empirical model and derive the main asymptotic properties of the pooled estimators. The finite sample properties of the procedures developed in this paper are analyzed through Monte Carlo experiments in Section IV. The data are described in Section V and the empirical results, including out-of-sample exercises, are provided in Section VI. Section VII concludes and technical assumptions and proofs are found in the Appendix.


II  Pooled Estimation in Predictive Regressions

II.A  Model and Assumptions

Consider a panel model with dependent variables $ y_{i,t}$, $ i=1,...,n$, $ t=1,...,T$, and the corresponding vector of regressors, $ x_{i,t}$, where $ x_{i,t}$ is an $ m\times1$ vector. In this paper, $ y_{i,t}$ is the stock return in country $ i$, and $ x_{i,t}$ are the corresponding predictor variables. The behavior of $ y_{i,t}$ and $ x_{i,t}$ is modelled as follows:

$\displaystyle y_{i,t}=\alpha_{i}+\beta_{i}^{\prime}x_{i,t-1}+\gamma_{i}^{\prime}f_{t}+u_{i,t},$ (1)
$\displaystyle x_{i,t}=x_{i,t}^{0}+\Gamma_{i}^{\prime}z_{t},$ (2)
$\displaystyle x_{i,t}^{0}=A_{i}x_{i,t-1}^{0}+v_{i,t},$ (3)
$\displaystyle z_{t}=A_{g}z_{t-1}+g_{t}.$ (4)

That is, stock returns $ y_{i,t}$ are a function of the past values of the predictor variables plus two factors representing country specific $ \left( u_{i,t}\right) $ and global $ \left( f_{t}\right) $ innovations. In the typical time-series predictive regression using, for instance, aggregate U.S. data, these two error terms are generally not distinguishable, and in terms of econometric inference, it makes no difference whether the shocks are U.S. specific or global in some sense. However, when pooling data from several countries, it becomes important to control for whether innovations to returns are due to country specific shocks or shocks that are common to all countries in the sample. Intuitively, if one ignores the presence of common factors in the error terms, the total amount of (independent) variation in the pooled data is overstated, and the econometric inference will be biased.

The vector of predictor variables, $ x_{i,t}$, is also assumed to be the sum of country specific $ \left( x_{i,t}^{0}\right) $ and global $ \left( z_{t}\right) $ terms. Both $ x_{i,t}^{0}$ and $ z_{t}$ follow $ AR\left( 1\right) $ processes. More precisely, the auto-regressive roots of both of these processes are parameterized as being local-to-unity, such that $ A_{i}=I+C_{i}/T$ and $ A_{g}=I+C_{g}/T$, where both $ A_{i}$ and $ A_{g}$ are $ m\times m$ matrices. This captures the near unit-root, or highly persistent, behavior of many predictor variables, but is less restrictive than a pure unit-root assumption. The near unit-root construction, where the autoregressive root drifts closer to unity as the sample size increases, is used as a tool to enable an asymptotic analysis where the persistence in the data remains large relative to the sample size, even when the sample size increases to infinity. That is, if the auto-regressive roots are treated as fixed and strictly less than unity, then as the sample size grows, the regressors will behave as strictly stationary processes asymptotically, and the standard first order asymptotic results will not provide a good guide to the actual small sample properties of the model. If the roots are exactly equal to unity, the usual unit-root asymptotics apply to the model, but this is clearly a restrictive assumption for most potential predictor variables. Instead, by using the near unit-root construction, the effects from the high persistence in the regressor will appear also in the asymptotic results, but without imposing the strict assumption of a unit root.
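
As a small numerical illustration of the local-to-unity parameterization, the following Python sketch computes the scalar root $ A=1+C/T$ and the half-life of a shock expressed as a fraction of the sample length. The value $ C=-10$, the sample sizes, and the function names are arbitrary choices made for this illustration and are not taken from the paper. The root moves toward one as $ T$ grows, while the half-life stays a roughly constant share of $ T$, which is the sense in which the persistence remains large relative to the sample size.

```python
import math


def local_to_unity_root(c, T):
    """Scalar AR(1) root A = 1 + C/T (hypothetical helper for illustration)."""
    return 1.0 + c / T


def half_life_share(c, T):
    """Half-life of a shock, ln(0.5)/ln(A), divided by T. Requires 0 < A < 1."""
    a = local_to_unity_root(c, T)
    return math.log(0.5) / math.log(a) / T


# Illustrative values only: C = -10 and three sample sizes.
for T in (100, 1000, 10000):
    print(T, local_to_unity_root(-10.0, T), half_life_share(-10.0, T))
```

With a fixed root strictly below one, the same share would instead shrink toward zero as $ T$ grows, which is the stationary case described above.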

Finally, the regressors $ x_{i,t}$ can be endogenous in the sense that $ u_{i,t}$ and $ v_{i,t}$ are contemporaneously correlated; $ f_{t}$ and $ g_{t}$ may be contemporaneously correlated as well, and can, in fact, be identical. The model specification is completed in Appendix A with some additional formal assumptions. Unless otherwise noted, all variables appearing in the asymptotic distributions derived below are defined in Appendix A.
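
To make the data generating process in equations (1)-(4) concrete, the sketch below simulates a panel with a scalar regressor. Everything beyond the structure of (1)-(4) is an assumption made for illustration: Gaussian shocks with unit variance, a common local-to-unity parameter for all countries, zero initial values, constant loadings on the global factor, and $ f_{t}$ and $ g_{t}$ set to be identical (which the model permits). The function name and its arguments are hypothetical.

```python
import numpy as np


def simulate_panel(n, T, c=-5.0, c_g=-5.0, beta=0.0, rho=-0.9,
                   gamma_scale=0.0, Gamma_scale=0.0, seed=0):
    """Simulate equations (1)-(4) with a scalar regressor (illustrative sketch).

    Returns y with shape (n, T), where column t-1 holds y_{i,t}, and
    x with shape (n, T+1), where column t holds x_{i,t} and x_{i,0} = 0.
    """
    rng = np.random.default_rng(seed)
    a_i = 1.0 + c / T      # A_i = I + C_i / T, same C for all i (assumption)
    a_g = 1.0 + c_g / T    # A_g = I + C_g / T

    # Country-specific shocks: corr(u_{i,t}, v_{i,t}) = rho, both unit variance.
    u = rng.standard_normal((n, T))
    v = rho * u + np.sqrt(1.0 - rho**2) * rng.standard_normal((n, T))

    # Global shocks: f_t and g_t are taken to be identical here (assumption).
    g = rng.standard_normal(T)
    f = g.copy()

    alpha = rng.standard_normal(n)        # individual intercepts
    gamma = gamma_scale * np.ones(n)      # loadings of returns on f_t
    Gamma = Gamma_scale * np.ones(n)      # loadings of regressors on z_t

    x0 = np.zeros((n, T + 1))
    z = np.zeros(T + 1)
    for t in range(1, T + 1):
        x0[:, t] = a_i * x0[:, t - 1] + v[:, t - 1]   # equation (3)
        z[t] = a_g * z[t - 1] + g[t - 1]              # equation (4)

    x = x0 + Gamma[:, None] * z[None, :]              # equation (2)
    y = (alpha[:, None] + beta * x[:, :-1]
         + gamma[:, None] * f[None, :] + u)           # equation (1)
    return y, x


y, x = simulate_panel(n=5, T=50)
print(y.shape, x.shape)
```

Setting `gamma_scale` and `Gamma_scale` to zero switches off the common factors, which corresponds to the case $ \gamma_{i}\equiv0$ and $ \Gamma_{i}\equiv0$.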

II.B  Motivations for Pooling

II.B.1  Practical and Econometric Considerations

The theoretical part of this paper analyzes the pooled estimation of the slope coefficient in equation (1). That is, by pooling data from several countries, an estimate of a joint slope coefficient $ \beta$ is obtained. If the individual slope coefficients are all identical, such that $ \beta _{i}=\beta$ for all $ i=1,...,n$, the pooled estimator will converge to this common parameter. In addition, the pooled estimator can either impose a common intercept $ \alpha$, or allow for individual intercepts, or fixed effects, $ \alpha_{i}$. When the restrictions $ \beta_{i}=\beta$, and potentially $ \alpha_{i}=\alpha$, hold for all $ i$, pooling the data should lead to more precise estimates than time-series estimation of each individual $ \beta_{i}$.

When the slope coefficients $ \beta_{i}$ are not all identical, pooled estimation may still be useful. In this case, the pooled estimator will converge to a well-defined average slope coefficient. The pooled estimate, and related tests, thus makes a statement about the average predictive relationship in the panel, which provides a useful tool for interpreting and understanding the empirical results, especially if the individual time-series regressions deliver mixed results. Furthermore, and as importantly, the pooled estimate may in some respects provide at least as good an estimate of $ \beta_{i}$ for a given $ i$, by providing a possibly less noisy estimate than the time-series one. That is, if the $ \beta_{i}s$ are not identical, the pooled estimator will generally not provide an unbiased estimate for a given $ \beta_{i}$, but in a bias-variance trade-off it may still dominate the time-series estimate of $ \beta_{i}$. This bias-variance trade-off is illustrated by out-of-sample forecasts at the end of this paper where it is shown that the forecasts based on the pooled estimator often dominate those based on the time-series estimates.

II.B.2  Economic Rationale

Is it likely, from the perspective of economic theory, that the $ \alpha_{i}s $ and $ \beta_{i}s$ are identical across $ i$? That is, can one justify pooling the data from an economic perspective, and if so, should fixed effects be included?

Consider first the question of fixed effects. Under the null of no predictability, such that $ \beta_{i}=0$ for all $ i$, the restriction of $ \alpha_{i}=\alpha$ for all $ i$ imposes the same expected excess return in all countries. Although it is very difficult to obtain precise estimates of average returns, detailed empirical studies such as Jorion and Goetzmann (1999) strongly suggest that the equity premium varies across countries. In addition, if an international world CAPM applies, identical $ \alpha_{i}s$ in the absence of predictability imply that the world CAPM beta for each country is identical. The restriction of identical CAPM betas is strongly rejected in previous studies such as Ferson and Harvey (1994), who report international CAPM betas in the range from $ 0.4$ to $ 1.3$, and Harvey (1995), who shows that the world CAPM betas for some emerging markets are negative. Although the world CAPM does not offer a complete model of stock returns, it does capture a sizeable amount of the variation in international stock returns (Ferson and Harvey, 1994). Model predictions that strongly contradict it, such as identical CAPM betas for all countries, should thus be seen as a warning sign of misspecification. Therefore, given the importance of having a model that is correctly specified under the null hypothesis, fixed effects should generally be included.4

In order to understand the economic constraints that are imposed by identical $ \beta_{i}s$, one needs to analyze a model that implies predictability in stock returns. Menzly et al. (2004) explicitly analyze cross-sectional differences in time-series return predictability. They use an external habit model similar to Campbell and Cochrane (1999) and show that the dividend-price ratio predicts excess stock returns. The slope coefficient in this predictive regression varies across assets as a function of the properties of the assets' cash-flow share of overall income; in an international asset pricing framework, with integrated markets, each country portfolio can be viewed as an individual asset, as in the international CAPM. The model in Menzly et al. (2004) thus implies that, in general, the slope coefficients $ \beta_{i}$ in the predictive regression in equation (1) may not be identical across $ i$. However, the model says little about how dispersed the slope coefficients actually are in practice. That is, even though it is unlikely that the $ \beta_{i}s$ are identical across countries or assets, as is true for most parameters that may be estimated in economics or finance, what is of primary importance for the empirical scope of this paper is whether they are similar enough that it may be beneficial from an econometric point of view to treat them as equal.5 From this empirical perspective, the implications of the Menzly et al. (2004) model are essentially silent, and it is unlikely that other models of return predictability would deliver any stronger practical implications.

The results in the current paper suggest that pooling the data, and thus imposing a common slope coefficient, is in fact quite often empirically justified in the sense that the null hypothesis of a common slope coefficient can often not be rejected in formal statistical tests and the forecasts based on the pooled estimates often tend to outperform those based on the individual time-series estimates in out-of-sample exercises. Thus, even though economic theory does not generally predict that the $ \beta_{i}s$ are all identical, it cannot be a priori rejected that they are similar enough for there to be benefits from pooling the data, which ties back to the discussion on the practical motivations in the section above.

In general, it is quite reasonable to conjecture that countries that share many common characteristics are more likely to have similar predictability patterns than those that do not. One of the most natural splits along these lines in international data is to distinguish between developed and emerging markets. Previous literature, such as Harvey (1995), also shows that emerging markets tend to have different return characteristics than developed markets, and different patterns of predictability. To the extent that stock markets in different countries are more likely to have similar predictability if they are priced globally in integrated financial markets, rather than locally in segregated markets, the group of developed markets is also likely to better satisfy this requirement. The empirical analysis separately analyzes developed and emerging market panels, and includes a test of slope homogeneity that shows that these two groups of countries appear more homogeneous than all countries combined.

II.C  Pooled Estimation

To understand the basic properties of pooled estimators of a common slope coefficient $ \beta$ in equation (1), it is instructive to start with analyzing the case when there are no common factors in the data. That is, let $ \gamma_{i}\equiv0$ and $ \Gamma_{i}\equiv0$, for all $ i$. This assumption will be maintained throughout the remainder of Section II and the effects of common factors are analyzed in Section III. Unless otherwise noted, it is assumed that the slope coefficients $ \beta_{i}$ are identical and equal to $ \beta$ for all $ i$.

II.C.1  The Standard Pooled Estimator without Fixed Effects

To estimate the parameter $ \beta$, consider first the traditional pooled estimator when there are no individual effects, i.e. when $ \alpha_{i} \equiv\alpha$ for all $ i$. Although the previous discussion strongly suggested the use of individual intercepts in the international analysis performed in the current paper, there may be other cases when a common intercept can be justified. In addition, a comparison of the pooled estimator with and without fixed effects highlights some important differences and helps form an understanding of the effects of pooling the data. The pooled estimator with a common intercept is given by

$\displaystyle \hat{\beta}_{Pool}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\tilde{x}_{i,t-1} \tilde{x}_{i,t-1}^{\prime}\right) ^{-1}\left( \sum_{i=1}^{n}\sum_{t=1} ^{T}\tilde{y}_{i,t}\tilde{x}_{i,t-1}\right) ,$ (5)

where

$\displaystyle \tilde{y}_{i,t}=y_{i,t}-\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}y_{i,t},$ and $\displaystyle \tilde{x}_{i,t}=x_{i,t}-\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}.$ (6)

Following the work of Phillips and Moon (1999), asymptotic results for the panel estimators are derived using sequential limits, which implies first keeping the cross-sectional dimension, $ n$, fixed and letting the time-series dimension, $ T$, go to infinity, and then letting $ n$ go to infinity. Such sequential convergence is denoted $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$.6 As mentioned before, the definitions of the variables that appear in the theorems and derivations below are all found in Appendix A, unless otherwise noted.



Theorem 1:  With $ \gamma_{i}\equiv0$, $ \Gamma_{i}\equiv0$, $ \alpha _{i}\equiv\alpha$, and $ \beta_{i}\equiv\beta$ for all $ i$, as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}},$

$\displaystyle \sqrt{n}T\left( \hat{\beta}_{Pool}-\beta\right) \Rightarrow N\left( 0,\Omega_{xx}^{-1}\Phi_{ux}\Omega_{xx}^{-1}\right) .$ (7)


The pooled estimator of $ \beta$ is thus asymptotically normally distributed; summing up over the cross-section eliminates the usual near unit-root asymptotic distributions found in the time-series case. The rate of convergence is also faster in the pooled case $ \left( \sqrt{n}T\right) $ compared to the time-series case $ \left( T\right) $, which again is a result of the additional cross-sectional information. The limiting distribution depends on $ \Omega_{xx}$ and $ \Phi_{ux}$ and, in order to perform inference, estimates of these parameters are required. Let $ \hat{u}_{i,t}=\tilde{y}_{i,t}-\hat{\beta}_{Pool}^{\prime}\tilde{x}_{i,t-1}$, $ \hat{\Phi}_{ux}=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^{2}}\sum_{t=1}^{T}\sum_{s=1}^{T}\left( \hat{u}_{i,t}\tilde{x}_{i,t-1}\right) \left( \hat{u}_{i,s}\tilde{x}_{i,s-1}\right) ^{\prime}$, and $ \hat{\Omega}_{xx}=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^{2}}\sum_{t=1}^{T}\tilde{x}_{i,t-1}\tilde{x}_{i,t-1}^{\prime}$. The estimator $ \hat{\Phi}_{ux}$ is thus the panel equivalent of HAC (heteroskedasticity and auto-correlation consistent) estimators for long-run variances.

Standard tests can now be performed. For instance, the null hypothesis $ \beta_{\left( k\right) }=\beta_{\left( k\right) }^{0}$, for some $ k=1,...,m$, where $ \beta=\left( \beta_{\left( 1\right) },...,\beta_{\left( m\right) }\right) ^{\prime}$, can be tested using a $ t$-test. Let $ \hat{\Sigma}=\hat{\Omega}_{xx}^{-1}\hat{\Phi}_{ux}\hat{\Omega}_{xx}^{-1}$. Using the results derived above, it follows easily that under the null hypothesis,

$\displaystyle t_{k}=\frac{\hat{\beta}_{\left( k\right) ,Pool}-\beta_{\left( k\right) }^{0}}{\sqrt{\left. a^{\prime}\hat{\Sigma}a\right/ \left( nT^{2}\right) } }\Rightarrow N\left( 0,1\right) ,$ (8)

as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$, where $ a$ is an $ m\times1$ vector with the $ k$'th component equal to one and zero elsewhere, and $ \hat{\beta}_{\left( k\right) ,Pool}$ is the $ k$'th component of $ \hat{\beta}_{Pool}$. More general linear hypotheses can be evaluated using a Wald test.
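
The estimator in equation (5), the variance estimators $ \hat{\Phi}_{ux}$ and $ \hat{\Omega}_{xx}$, and the standard error in the denominator of equation (8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name and array layout are hypothetical, the panel is balanced, and the lagged regressor array is demeaned by its own grand mean. The double sum over $ t$ and $ s$ in $ \hat{\Phi}_{ux}$ is computed as the outer product of each country's summed score.

```python
import numpy as np


def pooled_estimator(y, xlag):
    """Pooled estimator with a common intercept, equations (5)-(8).

    y:    array (n, T), entry [i, t-1] holds y_{i,t}
    xlag: array (n, T, m), entry [i, t-1, :] holds x_{i,t-1}
    Returns (beta_hat, standard_errors), each of length m.
    """
    n, T, m = xlag.shape
    # Equation (6): subtract the grand mean over all i and t.
    y_til = y - y.mean()
    x_til = xlag - xlag.mean(axis=(0, 1))

    # Equation (5).
    Sxx = np.einsum('itj,itk->jk', x_til, x_til)
    Sxy = np.einsum('itj,it->j', x_til, y_til)
    beta_hat = np.linalg.solve(Sxx, Sxy)

    # Residuals and per-country score sums: sum_t u_hat_{i,t} x_til_{i,t-1}.
    u_hat = y_til - x_til @ beta_hat
    scores = np.einsum('it,itj->ij', u_hat, x_til)        # shape (n, m)

    # Phi_hat: (1/n) sum_i (1/T^2) (sum_t score)(sum_s score)'.
    Phi_hat = scores.T @ scores / (n * T**2)
    Omega_hat = Sxx / (n * T**2)
    Omega_inv = np.linalg.inv(Omega_hat)
    Sigma_hat = Omega_inv @ Phi_hat @ Omega_inv

    # Denominator of equation (8): sqrt(a' Sigma_hat a / (n T^2)).
    se = np.sqrt(np.diag(Sigma_hat) / (n * T**2))
    return beta_hat, se


# Usage on simulated random-walk regressors (illustrative values only).
rng = np.random.default_rng(0)
xlag_demo = np.cumsum(rng.standard_normal((8, 60, 2)), axis=1)
y_demo = 2.0 + xlag_demo @ np.array([0.3, -0.1]) + rng.standard_normal((8, 60))
beta_demo, se_demo = pooled_estimator(y_demo, xlag_demo)
t_stats = beta_demo / se_demo   # t-statistics for the null beta_(k) = 0
```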

II.C.2  Fixed Effects

Let $ \underline{y}_{i,t}$ and $ \underline{x}_{i,t}$ denote the time-series demeaned data. That is, $ \underline{x}_{i,t}=x_{i,t}-\frac{1}{T}\sum_{s=1}^{T}x_{i,s-1}$ and $ \underline{y}_{i,t}=y_{i,t}-\frac{1}{T}\sum_{s=1}^{T}y_{i,s}$. The fixed effects pooled estimator, which allows for individual intercepts, is then given by

$\displaystyle \hat{\beta}_{FE}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\underline{x} _{i,t-1}\underline{x}_{i,t-1}^{\prime}\right) ^{-1}\left( \sum_{i=1}^{n} \sum_{t=1}^{T}\underline{y}_{i,t}\underline{x}_{i,t-1}\right) ,$ (9)

and

$\displaystyle \hat{\beta}_{FE}-\beta=\left( \frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^{2}} \sum_{t=1}^{T}\underline{x}_{i,t-1}\underline{x}_{i,t-1}^{\prime}\right) ^{-1}\left( \frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^{2}}\sum_{t=1}^{T} \underline{u}_{i,t}\underline{x}_{i,t-1}\right) .$ (10)

Clearly, the estimator is still consistent. Its asymptotic distribution, however, will be affected by the demeaning. For fixed $ n$, as $ T\rightarrow \infty$,

$\displaystyle T\left( \hat{\beta}_{FE}-\beta\right) \Rightarrow\left( \frac{1}{n} \sum_{i=1}^{n}\int_{0}^{1}\underline{J}_{i}\underline{J}_{i}^{\prime}\right) ^{-1}\left( \frac{1}{n}\sum_{i=1}^{n}\int_{0}^{1}dB_{1,i}\underline{J} _{i}\right) ,$ (11)

where $ \underline{J}_{i}$ and $ dB_{1,i}$ are the limiting processes of $ \underline{x}_{i,t}$ and $ u_{i,t}$, respectively, as defined in Appendix A; the limiting process for $ v_{i,t}$ is denoted $ dB_{2,i}$. Let $ \omega_{21}=\lim_{n\rightarrow\infty}n^{-1}\sum\omega_{21i}$ denote the average covariance vector between $ u_{i,t}$ and $ v_{i,t}$, and observe that

$\displaystyle E\left[ \int_{0}^{1}dB_{1,i}\underline{J}_{i}\right] =E\left[ \int_{0}^{1}dB_{1,i}\left( r\right) J_{i}\left( r\right) -\int_{0}^{1}dB_{1,i}\left( s\right) \int_{0}^{1}J_{i}\left( r\right) dr\right]$
$\displaystyle =-\int_{0}^{1}\int_{0}^{1}\int_{0}^{r}E\left[ e^{\left( r-s\right) C_{i}}\right] E\left[ dB_{1,i}\left( s\right) dB_{2,i}\left( q\right) \right] dsdr$
$\displaystyle =-\left( \int_{0}^{1}\int_{0}^{r}E\left[ e^{\left( r-s\right) C_{i}}\right] dsdr\right) \omega_{21},$ (12)

which is different from zero whenever $ \omega_{21}\neq0$. Thus, as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$,

$\displaystyle T\left( \hat{\beta}_{FE}-\beta\right) \rightarrow_{p}-\underline{\Omega }_{xx}^{-1}\left( \int_{0}^{1}\int_{0}^{r}E\left[ e^{\left( r-s\right) C_{i}}\right] dsdr\right) \omega_{21},$ (13)

and the estimator suffers from a second order bias from the demeaning process.7
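
The second order bias in equation (13) can be made visible in a small simulation. The sketch below is an illustration under assumptions chosen here, not the Monte Carlo design of Section IV: a scalar regressor with an exact unit root ($ C_{i}=0$), Gaussian shocks with unit variances so that $ \omega_{21}$ equals the correlation `rho`, a true slope of zero, and a true common intercept of zero so that both estimators are correctly specified. With $ \omega_{21}<0$, the sign in equation (13) implies an upward bias for the fixed effects estimator, while the estimator with a common intercept should be centered much closer to zero.

```python
import numpy as np


def simulate(n, T, rho, rng):
    """Returns under the null (beta = 0) and lagged unit-root regressors."""
    u = rng.standard_normal((n, T))
    v = rho * u + np.sqrt(1.0 - rho**2) * rng.standard_normal((n, T))
    # x_{i,0} = 0 and x_{i,t} = x_{i,t-1} + v_{i,t}; keep x_{i,0..T-1} as lags.
    x = np.concatenate([np.zeros((n, 1)), np.cumsum(v, axis=1)], axis=1)
    return u, x[:, :-1]


def pooled_common_intercept(y, xlag):
    """Scalar version of equation (5): grand-mean demeaning."""
    xd = xlag - xlag.mean()
    yd = y - y.mean()
    return (xd * yd).sum() / (xd**2).sum()


def fixed_effects(y, xlag):
    """Scalar version of equation (9): time-series demeaning per country."""
    xd = xlag - xlag.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return (xd * yd).sum() / (xd**2).sum()


def monte_carlo(reps=200, n=20, T=100, rho=-0.9, seed=0):
    """Average estimation error of both estimators when the true beta is 0."""
    rng = np.random.default_rng(seed)
    pool, fe = [], []
    for _ in range(reps):
        y, xlag = simulate(n, T, rho, rng)
        pool.append(pooled_common_intercept(y, xlag))
        fe.append(fixed_effects(y, xlag))
    return float(np.mean(pool)), float(np.mean(fe))


mean_pool, mean_fe = monte_carlo()
print("mean pooled estimate:", mean_pool)
print("mean fixed effects estimate:", mean_fe)
```

Because the fixed effects estimator is invariant to the individual intercepts, adding country-specific $ \alpha_{i}$ to the simulated returns would leave its estimates unchanged.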

The differences in sample properties between the standard pooled estimator with a common intercept and the fixed effects estimator are rather striking. Mechanically, the standard pooled estimator works well because for each $ i$, the terms in the numerator of the estimator have mean zero and are independently distributed across $ i$. As they are summed up over $ n$, the central limit theorem applies and an asymptotically normally distributed estimator is obtained. More intuitively, when pooling the data, independent cross-sectional information dilutes the endogeneity effects that cause the Stambaugh (1999) bias in the time-series case. The same result does not hold for the fixed effects estimator because the numerator terms no longer have a zero mean as a consequence of the time-series demeaning of the data, which leads to a correlation between the innovation processes $ u_{i,t}$ and the demeaned regressors $ \underline{x}_{i,t-1}$ whenever the regressor is endogenous. Thus, unlike in the case with a common intercept, the pooling does not remove the endogeneity effects and the estimator suffers from a second order bias.

More generally, from the perspective of panel data econometrics, the natural way of understanding the detrimental impact of fixed effects is to view them as an instance of the incidental parameter problem, which was originally raised by Neyman and Scott (1948) and discussed in a panel data context by Nickell (1981). That is, as the panel grows larger asymptotically, the number of (incidental) fixed effects that need to be estimated also goes to infinity, as the cross-sectional dimension grows. Thus, although more and more data becomes available asymptotically, the number of parameters to estimate also increases. In the traditional (dynamic) panel setup studied by Nickell (1981), where $ T$ is fixed as $ n\rightarrow\infty$, inclusion of fixed effects causes the standard estimator of the slope coefficient to become inconsistent. Here, where both $ n$ and $ T$ tend to infinity, the fixed effects estimator remains consistent but with a second order bias.

II.D  Recursive Demeaning

The second order bias in the fixed effects estimator arises because the demeaning process induces a correlation between the innovation processes $ u_{i,t}$ and the demeaned regressors $ \underline{x}_{i,t-1}$. Intuitively, $ u_{i,t}$ and $ \underline{x}_{i,t-1}$ are correlated because, in the demeaning of $ x_{i,t-1}$, information available after time $ t-1$ is used. Or, equivalently, because in the demeaning of the dependent variable, $ y_{i,t}$, information before time $ t$ is used. One solution is therefore to use recursive demeaning of $ x_{i,t}$ and $ y_{i,t}$ (e.g. Moon and Phillips (2000) and Sul et al. (2005)). In particular, I will consider a `forward demeaned' equation. That is, define

$\displaystyle \underline{y}_{i,t}^{dd}=y_{i,t}-\frac{1}{T-t+1}\sum_{s=t}^{T}y_{i,s},$ and  $\displaystyle \underline{x}_{i,t}^{dd}=x_{i,t}-\frac{1}{T-t+1}\sum_{s=t} ^{T}x_{i,s}.$ (14)

Observe that

$\displaystyle \underline{y}_{i,t}^{dd}=y_{i,t}-\frac{1}{T-t+1}\sum_{s=t}^{T}y_{i,s} =\beta^{\prime}\left( x_{i,t-1}-\frac{1}{T-t+1}\sum_{s=t}^{T}x_{i,s-1} \right) +u_{i,t}-\frac{1}{T-t+1}\sum_{s=t}^{T}u_{i,s}=\beta^{\prime }\underline{x}_{i,t-1}^{dd}+\underline{u}_{i,t}^{dd}, $

and consider the following pooled estimator, using the recursively demeaned data,

$\displaystyle \hat{\beta}_{RD}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\underline{x} _{i,t-1}^{dd}x_{i,t-1}^{\prime}\right) ^{-1}\left( \sum_{i=1}^{n}\sum _{t=1}^{T}\underline{y}_{i,t}^{dd}x_{i,t-1}\right) .$ (15)

In $ \hat{\beta}_{RD}$, the non-demeaned regressors $ x_{i,t-1}$ are used as instruments, and the dependent variable, $ \underline{y}_{i,t}^{dd}$, is formed using data dated only after time $ t$. Since $ \underline{u}_{i,t}^{dd}$ and $ x_{i,t-1}$ are now independent of each other, unlike $ \underline{u}_{i,t}$ and $ \underline{x}_{i,t-1}$, the estimator $ \hat{\beta}_{RD}$ will not suffer from the same second order bias as the standard fixed effects estimator. This is stated formally in the following theorem.



Theorem 2:  With $ \gamma_{i}\equiv0$, $ \Gamma_{i}\equiv0$, and $ \beta _{i}\equiv\beta$ for all $ i$, as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}},$
$\displaystyle \sqrt{n}T\left( \hat{\beta}_{RD}-\beta\right) \Rightarrow N\left( 0,\left( \underline{\Omega}_{xx}^{RD}\right) ^{-1}\underline{\Phi}_{ux}^{RD}\left( \underline{\Omega}_{xx}^{RD}\right) ^{-1}\right) ,$ (16)
where $ \underline{\Phi}_{ux}^{RD}$ and $ \underline{\Omega}_{xx}^{RD}$ are defined in the proof of the theorem.


To perform inference, let $ \hat{u}_{i,t}^{dd}=\underline{y}_{i,t}^{dd}-\hat{\beta}_{RD}^{\prime}\underline{x}_{i,t-1}^{dd}$, $ \underline{\hat{\Phi}}_{ux}^{RD}=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^{2}}\sum_{t=1}^{T}\sum_{s=1}^{T}\left( \hat{u}_{i,t}^{dd}x_{i,t-1}\right) \left( \hat{u}_{i,s}^{dd}x_{i,s-1}\right) ^{\prime}$, and $ \underline{\hat{\Omega}}_{xx}^{RD}=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^{2}}\sum_{t=1}^{T}\underline{x}_{i,t-1}^{dd}x_{i,t-1}^{\prime}$. The $ t$-test and Wald test based on $ \underline{\hat{\Phi}}_{ux}^{RD}$ and $ \underline{\hat{\Omega}}_{xx}^{RD}$ will satisfy the usual properties. Observe that the forward demeaning of the data introduces a moving average component in the returns process, which is reflected in the limiting distribution derived in the proof of Theorem 2. The variance-covariance matrix estimator that was just proposed automatically accounts for this by calculating the long-run variance using the forward demeaned residuals and the panel equivalent of a HAC estimator.
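
A minimal Python sketch of the forward demeaning in equation (14), the estimator in equation (15), and the variance estimators just described is given below for the scalar case ($ m=1$). The function names and array layout are hypothetical. Following the display after equation (14), the lagged regressor series is forward demeaned over the same window $ s=t,...,T$ as the returns. Note that the denominator of the estimator is a cross-product of demeaned and non-demeaned regressors rather than a sum of squares, so it is not guaranteed to be positive in a given sample.

```python
import numpy as np


def forward_demean(a):
    """Subtract from a[..., t] the mean of a[..., t:] along the last axis."""
    T = a.shape[-1]
    # Reverse cumulative sum: entry t holds the sum of a[..., t:].
    tail_sums = np.cumsum(a[..., ::-1], axis=-1)[..., ::-1]
    counts = np.arange(T, 0, -1)          # T - t + 1 terms for t = 1, ..., T
    return a - tail_sums / counts


def rd_estimator(y, xlag):
    """Recursive demeaning estimator, scalar regressor (illustrative sketch).

    y:    array (n, T), entry [i, t-1] holds y_{i,t}
    xlag: array (n, T), entry [i, t-1] holds x_{i,t-1}
    Returns (beta_hat, standard_error).
    """
    n, T = y.shape
    y_dd = forward_demean(y)
    x_dd = forward_demean(xlag)

    # Equation (15): the non-demeaned x_{i,t-1} acts as the instrument.
    denom = (x_dd * xlag).sum()
    beta_hat = (y_dd * xlag).sum() / denom

    # Forward demeaned residuals and per-country score sums.
    u_dd = y_dd - beta_hat * x_dd
    scores = (u_dd * xlag).sum(axis=1)                # shape (n,)

    Phi_hat = (scores**2).sum() / (n * T**2)
    Omega_hat = denom / (n * T**2)
    Sigma_hat = Phi_hat / Omega_hat**2
    se = np.sqrt(Sigma_hat / (n * T**2))
    return beta_hat, se


print(forward_demean(np.array([1.0, 2.0, 3.0])))
```

A $ t$-statistic for the null $ \beta=\beta^{0}$ would then be formed as `(beta_hat - beta0) / se`, in analogy with equation (8).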

The recursive demeaning procedure gives up some efficiency by relying on a somewhat inefficient method for demeaning the data. However, there are no clear-cut alternatives in the general case when the autoregressive roots $ C_{i}$ (or equivalently, $ A_{i}$) are unknown. If the $ C_{i}s$ were known, the bias term in equation (13) could be directly estimated and a bias-corrected fixed effects estimator could be constructed. More ambitiously, for known $ C_{i}s$, a panel version of fully modified estimation could be considered, as suggested by Phillips and Moon (1999) in the pure unit-root case. However, although such procedures are likely more efficient than the recursive demeaning proposed here, they are not feasible in practice since the $ C_{i}s$ are unknown.8

II.E  Relaxing the Pooling Assumption

II.E.1  Properties of the Pooled Estimators when the $ \beta_{i}s$ Are Not Identical

So far, the focus has been on the problems raised by fixed effects. However, it is also possible that the slope coefficients $ \beta_{i}$ may vary across $ i$. In this section, I therefore discuss the properties of the pooled estimator when the $ \beta_{i}s$ are not identical.9 To start with, suppose $ \beta_{i} =\beta+\theta_{i}$, where $ \left\{ \theta_{i}\right\} _{i=1}^{n}$ is $ iid$ with mean zero.



Theorem 3:   Let $ \gamma_{i}\equiv0$, $ \Gamma_{i}\equiv0$, and $ \beta_{i}=\beta+\theta_{i}$ for all $ i$.
      (a) If $ \theta_{i}$ is orthogonal to $ x_{i,t}$ for all $ i$ and $ t$, as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}},$
$\displaystyle \sqrt{n}\left( \hat{\beta}_{FE}-\beta\right) \Rightarrow N\left( 0,\underline{\Omega}_{xx}^{-1}\underline{\Phi}_{xx}^{\theta}\underline{\Omega }_{xx}^{-1}\right) ,$ (17)
where $ \underline{\Phi}_{xx}^{\theta}$ is defined in the proof of the theorem.
      (b) If $ \theta_{i}$ is not orthogonal to $ x_{i,t}$, as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}},$
$\displaystyle \hat{\beta}_{FE}\rightarrow_{p}\beta+\underline{\Omega}_{xx}^{-1}E\left[ \left( \int_{0}^{1}\underline{J}_{i}\underline{J}_{i}^{\prime}\right) \theta_{i}\right] .$ (18)
Analogous results also hold in the case without fixed effects.

In the case where the distribution of the slope coefficients is independent of the regressors, it follows that the pooled estimator converges to the average parameter $ \beta\equiv E\left[ \beta_{i}\right] $. The rate of convergence is much slower than in the homogenous case, however, and the fixed effects estimator no longer suffers from a small sample bias. These results stem from the fact that when the $ \beta_{i}s$ are non-identical, the residuals in the regression are now given by $ \theta_{i}^{\prime}x_{i,t-1}+u_{i,t}$. Since $ x_{i,t-1}$ is a near integrated process, it will dominate the asymptotic properties of the residuals, and will therefore slow down the rate of convergence and also render the second order bias term in the fixed effects estimator irrelevant. However, when the deviations $ \theta_{i}$ are small, the second order bias term is still a concern. Results from Monte Carlo simulations, which are not presented here, show that for most potentially relevant values of $ \beta$ and $ \theta_{i}$ in a stock return predictability context, the bias in the fixed effects estimator is still highly relevant. Likewise, when the deviations $ \theta_{i}$ are small, the slow-down in the rate of convergence will not be as drastic as in Theorem 3.10

If the $ \theta_{i}s$ are correlated with the regressors, the pooled estimator does not converge to the average slope coefficient $ \beta$. However, as discussed at length in Phillips and Moon (1999, 2000), the average of the individual parameters $ \beta_{i}$ is not necessarily the natural way of defining an average relationship between $ y_{i,t}$ and $ x_{i,t-1}$. Phillips and Moon note that in a framework with persistent variables, one can define the individual regression coefficients $ \beta_{i} $ as $ \beta_{i} =\Omega_{xx,i}^{-1}\Omega_{yx,i}$, where $ \Omega_{xx,i}$ is the long-run variance for $ x_{i,t}$ and $ \Omega_{yx,i}$ is the long-run covariance between $ y_{i,t}$ and $ x_{i,t-1}$. They then define the long-run average relationship between $ y_{i,t}$ and $ x_{i,t-1}$ as $ \beta_{LRA}\equiv E\left[ \Omega_{xx,i}\right] ^{-1}E\left[ \Omega_{yx,i}\right] $ , rather than $ E\left[ \Omega_{xx,i}^{-1}\Omega_{yx,i}\right] =\beta$, and show that the pooled estimator, with or without fixed effects, will converge to $ \beta _{LRA}$ under very general conditions; in the special case when $ \beta _{i}=\beta+\theta_{i}$ and $ \theta_{i}$ is independent of $ x_{i}$, it follows that $ \beta_{LRA}=\beta$. Thus, $ \hat{\beta}_{FE}$ and $ \hat{\beta}_{Pool}$ converge to a well defined average relationship under very general circumstances, although not necessarily to $ \beta=\lim_{n\rightarrow\infty }\frac{1}{n}\sum_{i=1}^{n}\beta_{i}$ .
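The distinction between $ \beta_{LRA}$ and the average slope can be seen in a small numerical example. The sketch below uses a hypothetical two-unit panel with a scalar regressor; the numbers are purely illustrative and are chosen so that the unit with the more variable regressor also has the larger slope.

```python
import numpy as np

# Hypothetical two-unit panel (scalar regressor); all values are illustrative.
omega_xx = np.array([1.0, 4.0])   # long-run variances of x_i
beta_i = np.array([0.1, 0.3])     # unit slopes, larger where x_i is more variable
omega_yx = omega_xx * beta_i      # implied by beta_i = omega_yx / omega_xx

beta_avg = beta_i.mean()                      # E[beta_i]
beta_lra = omega_yx.mean() / omega_xx.mean()  # E[omega_xx]^{-1} E[omega_yx]
print(beta_avg, beta_lra)
```

Here the average slope is about 0.20 while the long-run average coefficient is about 0.26, since units with larger long-run variance receive more weight. If the slopes were unrelated to the variances, the two quantities would coincide in expectation.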

II.E.2  A Test of Slope Homogeneity

The analysis above shows that the pooled estimators are robust to deviations from the assumption of homogenous slope coefficients, and will converge to a well-defined average coefficient when the $ \beta_{i}s$ are non-identical. In many cases, it is still of interest, however, to evaluate whether the slope coefficients are in fact all equal.

I adopt a version of a test originally proposed by Swamy (1970) and further developed by Pesaran (2007). The basic idea is to analyze a weighted sum of squared differences between the unrestricted time-series estimates of the individual $ \beta_{i}s$ and the fixed effects pooled estimate, which imposes a common slope coefficient.

Define the following weighted fixed effects estimator,

$\displaystyle \hat{\beta}_{WFE}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\frac{\underline {x}_{i,t-1}\underline{x}_{i,t-1}^{\prime}}{\hat{\omega}_{11i}}\right) ^{-1}\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\frac{\underline{y}_{i,t} \underline{x}_{i,t-1}}{\hat{\omega}_{11i}}\right) ,$ (19)

where $ \hat{\omega}_{11i}$ is an estimate of the variance of $ u_{i,t}$ $ \left( \omega_{11i}\right) $; the standardization by $ \omega_{11i}$ leads to a natural reduction in nuisance parameters in the asymptotic distribution of the test statistic below. Further, let

$\displaystyle S_{\beta}=\sum_{i=1}^{n}\left( \hat{\beta}_{i}-\hat{\beta}_{WFE}\right) ^{\prime}\left( \sum_{t=1}^{T}\frac{\underline{x}_{i,t-1}\underline {x}_{i,t-1}^{\prime}}{\hat{\omega}_{11i}}\right) \left( \hat{\beta}_{i} -\hat{\beta}_{WFE}\right) ,$ (20)

where $ \hat{\beta}_{i}$ is the OLS estimate of the slope coefficient for country $ i$.



Theorem 4:   With $ \gamma_{i}\equiv0$, $ \Gamma_{i}\equiv0$, and under $ H_{0}:\beta_{i}=\beta$ for all $ i$, as $ \left( T,n\rightarrow \infty\right) _{\operatorname{seq}}$,
$\displaystyle \Delta_{\beta}=\sqrt{n}\left( \frac{\frac{1}{n}S_{\beta}-\mu_{Z}}{\sigma_{Z} }\right) \Rightarrow N\left( 0,1\right) .$ (21)
where $ \mu_{Z}$ and $ \sigma_{Z}$ are defined in the proof of the theorem.


Given $ \mu_{Z}$ and $ \sigma_{Z}$, $ \Delta_{\beta}$ provides an asymptotically normally distributed test of slope homogeneity. Unfortunately, $ \mu_{Z}$ and $ \sigma_{Z}$ are functions of the unknown nuisance parameters $ \left\{ C_{i}\right\} _{i=1}^{n}$; they are also functions of the average correlation $ \left( \delta\right) $ between the innovations $ u_{i,t}$ and $ v_{i,t}$, but this value can easily be estimated.

Through simulations, it is easy to show that $ \mu_{Z}$ changes fairly slowly with the values of the $ C_{i}s$, whereas $ \sigma_{Z}$ can vary substantially with small changes in the $ C_{i}s$. In order to obtain a feasible test with approximately correct size, I therefore propose to use $ \mu_{Z}$, evaluated for a common value of $ C_{i}=\tilde{C}$ for all $ i$, where $ \tilde{C}$ is given by the average of the median unbiased estimates of each $ C_{i}$. As originally shown by Stock (1991), median unbiased, although inconsistent, estimates of each $ C_{i}$ can be obtained by inverting a unit-root test statistic. Further, $ \sigma_{Z}$ is replaced by an empirical estimate that is consistent under the null hypothesis of $ \beta_{i}=\beta$ for all $ i$. Write $ S_{\beta}\equiv\sum_{i=1}^{n}Z_{i,n,T}$ where $ Z_{i,n,T}$ represents the expression in (20). From the proof of Theorem 4, an estimate of $ \sigma_{Z}$ is obtained by calculating the sample standard deviation of $ Z_{i,n,T}$. Under the alternative, when the $ \beta_{i}s$ are not all identical, this estimate will be upward biased for $ \sigma_{Z}$, and some power will therefore be lost. However, given a lack of knowledge of the $ C_{i}s$, and the strong dependence of $ \sigma_{Z}$ on the values of the $ C_{i}s$, this seems like a preferable approach.

In terms of practical implementation, the median unbiased estimates for $ C_{i}$ are obtained by inverting the DF-GLS unit-root test statistic, as described in detail in Campbell and Yogo (2006). In the case when $ x_{i,t}$ is a vector, the same procedure can be applied to each of the component processes of $ x_{i,t}$ and, with the extra restriction that $ C_{i}$ is a diagonal matrix, one can proceed exactly as in the scalar case. The variances $ \left( \omega_{11i}\right) $ of $ u_{i,t}$ and the average correlation $ \left( \delta\right) $ between $ u_{i,t}$ and $ v_{i,t}$ are estimated from the residuals of the time-series regressions of equations (1) and (2). The values for $ \mu_{Z}$ are obtained by direct simulation of the asymptotic expression given in the proof of Theorem 4; these values are available from the author upon request. The null hypothesis is rejected for large positive values of $ \Delta_{\beta}$; e.g. a five percent test would reject for values larger than 1.65.
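The mechanics of equations (19)-(21) can be sketched as follows for a scalar regressor. This is an illustration under stated assumptions rather than the author's code: $ \hat{\omega}_{11i}$ is taken to be the OLS residual variance with a $ T-2$ degrees-of-freedom divisor, and the centering constant `mu_z` must be supplied by the user, since it comes from a simulation of the asymptotic expression that is not reproduced here. The function name `slope_homogeneity_stat` is hypothetical.

```python
import numpy as np

def slope_homogeneity_stat(y, x_lag, mu_z):
    """Return unit slopes, the weighted FE slope, the Z_i terms, and Delta_beta."""
    n, T = y.shape
    yd = y - y.mean(axis=1, keepdims=True)          # time-series demeaned returns
    xd = x_lag - x_lag.mean(axis=1, keepdims=True)  # time-series demeaned regressor
    sxx = np.sum(xd * xd, axis=1)
    sxy = np.sum(xd * yd, axis=1)
    beta_i = sxy / sxx                              # unit-by-unit OLS slopes
    resid = yd - beta_i[:, None] * xd
    omega11 = np.sum(resid**2, axis=1) / (T - 2)    # assumed estimator of var(u_it)
    beta_wfe = np.sum(sxy / omega11) / np.sum(sxx / omega11)   # equation (19)
    z = (beta_i - beta_wfe) ** 2 * sxx / omega11    # summands of S_beta in (20)
    # sigma_Z is replaced by the sample standard deviation of the Z_i terms.
    delta = np.sqrt(n) * (z.mean() - mu_z) / z.std(ddof=1)     # equation (21)
    return beta_i, beta_wfe, z, delta
```

With a value of `mu_z` obtained for the estimated $ \tilde{C}$ and $ \delta$, homogeneity would be rejected at the five percent level when the returned `delta` exceeds 1.65.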


III  Cross-Sectional Dependence

III.A  The Effects of Common Factors

I now return to the general setup with common factors in the data. The following theorem summarizes the asymptotic properties of both the standard pooled estimator and the fixed effects estimator when there are common factors. Again, the $ \beta_{i}s$ are assumed to be identical unless otherwise noted.



Theorem 5:   (a) With $ \alpha_{i}\equiv\alpha$ and $ \beta_{i}\equiv\beta$ for all $ i$, as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}},$
$\displaystyle T\left( \hat{\beta}_{Pool}-\beta\right) \Rightarrow\left( \Omega _{xx}+\Omega_{zz}\right) ^{-1}\left[ \int_{0}^{1}\left( \gamma^{\prime }dB_{f}\right) \left( \Gamma^{\prime}J_{g}\right) \right] .$ (22)
(b) With $ \beta_{i}\equiv\beta$ for all $ i$, as $ \left( T,n\rightarrow \infty\right) _{\operatorname{seq}},$
$\displaystyle T\left( \hat{\beta}_{FE}-\beta\right) \Rightarrow\left( \underline{\Omega }_{xx}+\underline{\Omega}_{zz}\right) ^{-1}\left[ \int_{0}^{1}\left( \gamma^{\prime}dB_{f}\right) \left( \Gamma^{\prime}\underline{J}_{g}\right) -\left( \int_{0}^{1}\int_{0}^{r}E\left[ e^{\left( r-s\right) C_{i} }\right] dsdr\right) \omega_{21}\right] .$ (23)

Thus, in the presence of the general factor structure outlined in Section II, the standard pooled estimator exhibits a non-standard limiting distribution, although it is still consistent; standard tests can therefore not be used. Similarly, the limiting behavior of the fixed effects estimator is determined by the bias term arising from the time-series demeaning of the data, as well as an additional term that stems from the common factors in the data. Note that the term $ \int_{0}^{1}\left( \gamma^{\prime}dB_{f}\right) \left( \Gamma^{\prime}J_{g}\right) $ is random and can take on both negative and positive values. Thus, correcting for it will have an ambiguous effect on the outcome of the estimation and test results.

III.B  Robust Estimators

Based on the methods of Pesaran (2006), I propose an estimator that is more robust to cross-sectional dependence in the data. Pesaran's (2006) idea is to project the data onto the space orthogonal to the common factors, thereby removing the cross-sectional dependence from the data used in the estimation. However, since the factors are not observed in practice, an indirect approach is required. Pesaran suggests using the cross-sectional means of the dependent and independent variable as proxies for the common factors. A similar approach is adopted below, but only the cross-sectional means of the regressors are used to control for the common factors. This is done because of the different orders of integration between the error terms and the regressors. For $ \beta\neq0$, the stochastic behavior of $ y_{i,t}$ is dominated by that of $ x_{i,t-1}$, and the matrix $ T^{-1/2}\left( \bar{y}_{\cdot,t},\bar{x} _{\cdot,t-1}\right) ^{\prime}$ would be asymptotically singular.

Thus, consider the following estimator of $ \beta$,

$\displaystyle \hat{\beta}_{Pool}^{+}=\left( \sum_{i=1}^{n}\mathbf{X}_{i,-1}^{\prime }\mathbf{M}_{\mathbf{\bar{H}}}\mathbf{X}_{i,-1}\right) ^{-1}\left( \sum_{i=1}^{n}\mathbf{X}_{i,-1}^{\prime}\mathbf{M}_{\mathbf{\bar{H}} }\mathbf{Y}_{i}\right)$ (24)

where $ \mathbf{Y}_{i}$ denotes the $ T\times1$ matrix of the observations for the dependent variable and $ \mathbf{X}_{i}$ the $ T\times m$ matrix of regressor observations. $ \mathbf{M}_{\mathbf{\bar{H}}}=\mathbf{I} -\mathbf{\bar{H}}\left( \mathbf{\bar{H}}^{\prime}\mathbf{\bar{H}}\right) ^{-1}\mathbf{\bar{H}}^{\prime}$ is a $ T\times T$ matrix and $ \mathbf{\bar{H}}$ is the $ T\times m$ matrix of observations of $ \bar{H}_{t}=\frac{1}{n} \sum_{i=1}^{n}x_{i,t-1}=\bar{x}_{\cdot,t-1}.$

The estimator $ \hat{\beta}_{Pool}^{+}$ is obtained by applying the pooled estimator to the residuals from a projection of the original data onto the cross-sectional averages of the regressors. The intuition behind this is that the cross-sectional average of $ x_{i,t}$ is close to the common stochastic trend $ z_{t}$, since the cross-sectional averages of the cross-sectionally independent data may be expected to be close to zero; the projection onto the complement of the cross-sectional means will therefore remove the effects of the common factors in the regressors. As is shown in the proof of the following theorem, it is sufficient to remove the factors from the regressors (and not from the innovations to the regressand) in order to achieve a mixed normal distribution.



Theorem 6:   With $ \alpha_{i}\equiv\alpha$ and $ \beta_{i}\equiv\beta$ for all $ i$, as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$,
$\displaystyle \sqrt{n}T\left( \hat{\beta}_{Pool}^{+}-\beta\right) \Rightarrow MN\left( 0,\Omega_{xx}^{-1}\left( \Phi_{ux}+\Phi_{fx}\right) \Omega_{xx}^{-1}\right) .$ (25)
where $ MN\left( \cdot\right) $ denotes a mixed normal distribution.

The estimator $ \hat{\beta}_{Pool}^{+}$ thus achieves a $ \sqrt{n}T-$convergence rate and an asymptotic mixed normal distribution. The mixed normality in this case arises from the common factors, which leads to a mixed normal distribution rather than the normal distribution seen above in the no common factors case. That is, the limiting distribution is effectively a normal distribution with a random variance-covariance matrix that is a function of the common shocks; conditional on the realization of the common factors, the distribution is thus normal. For practical purposes, the mixed normal distribution allows for standard inference in that the $ t-$tests and Wald tests will have asymptotically standard distributions. Allowing for fixed effects in the arguments, it is easy to show the following result.
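The reason mixed normality still permits standard inference can be checked with a short simulation. The sketch below is only an illustration of the general property: the chi-square distribution for the random variance is an arbitrary choice standing in for the dependence on the common shocks, and it is not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 200_000

# Random variance (illustrative choice), then a normal draw conditional on it.
v = rng.chisquare(df=3, size=n_draws)
x = np.sqrt(v) * rng.standard_normal(n_draws)  # mixed normal: x | v ~ N(0, v)
t = x / np.sqrt(v)                             # studentized by the realized variance

def excess_kurtosis(a):
    """Sample excess kurtosis; zero for a normal distribution."""
    a = a - a.mean()
    return np.mean(a**4) / np.mean(a**2) ** 2 - 3.0

print(excess_kurtosis(x), excess_kurtosis(t))
```

The unconditional draws `x` are fat-tailed, whereas the studentized draws `t` behave like standard normal variables, which is why $ t-$tests and Wald tests retain their usual limiting distributions.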



Corollary 1:   Let $ \widehat{\mathbf{X}}_{i,-1}=\mathbf{M} _{\mathbf{\bar{H}}}\mathbf{X}_{i,-1}$ and $ \widehat{\mathbf{Y}}_{i} =\mathbf{M}_{\mathbf{\bar{H}}}\mathbf{Y}_{i}$ and define
$\displaystyle \hat{\beta}_{FE}^{+}=\left( \sum_{i=1}^{n}\underline{\widehat{\mathbf{X}} }_{i,-1}^{\prime}\underline{\widehat{\mathbf{X}}}_{i,-1}\right) ^{-1}\left( \sum_{i=1}^{n}\underline{\widehat{\mathbf{X}}}_{i,-1}^{\prime}\underline {\widehat{\mathbf{Y}}}_{i}\right) ,$ (26)
and
$\displaystyle \hat{\beta}_{RD}^{+}=\left( \sum_{i=1}^{n}\widehat{\mathbf{X}}_{i,-1} ^{dd\prime}\widehat{\mathbf{X}}_{i,-1}\right) ^{-1}\left( \sum_{i=1} ^{n}\widehat{\mathbf{X}}_{i,-1}^{\prime}\widehat{\mathbf{Y}}_{i}^{dd}\right)$ (27)
where $ \underline{\widehat{\mathbf{X}}}_{i,-1}$ and $ \underline{\widehat{\mathbf{Y}}}_{i}$ represent the time-series demeaned versions of $ \widehat{\mathbf{X}}_{i,-1}$ and $ \widehat{\mathbf{Y}}_{i}$, and $ \widehat{\mathbf{X}}_{i,-1}^{dd}$ and $ \widehat{\mathbf{Y}}_{i}^{dd}$ are the recursively demeaned variables. Then, with $ \beta_{i}\equiv\beta$ for all $ i$, as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$,
$\displaystyle T\left( \hat{\beta}_{FE}^{+}-\beta\right) \rightarrow_{p}-\underline{\Omega }_{xx}^{-1}\left( \int_{0}^{1}\int_{0}^{r}e^{\left( r-s\right) C_{i} }dsdr\right) \omega_{21},$ (28)
and
$\displaystyle \sqrt{n}T\left( \hat{\beta}_{RD}^{+}-\beta\right) \Rightarrow MN\left( 0,\left( \underline{\Omega}_{xx}^{RD}\right) ^{-1}\left( \underline{\Phi }_{ux}^{RD}+\underline{\Phi}_{fx}^{RD}\right) \left( \underline{\Omega} _{xx}^{RD}\right) ^{-1}\right) ,$ (29)
where $ \underline{\Phi}_{fx}^{RD}$ is defined analogously with $ \Phi_{fx}$ and $ \underline{\Phi}_{ux}^{RD}$.

$ \hat{\beta}_{Pool}^{+}$ and $ \hat{\beta}_{RD}^{+}$ thus provide pooled estimators for predictive regressions that are asymptotically mixed normally distributed in the presence of common factors and with the allowance for fixed effects in the latter. Standard $ t-$tests and Wald tests can therefore be used; by simply using the defactored data, the variance-covariance matrix can be estimated in a manner analogous to that described above for the no common factors case. The practical implementation of these estimators is thus very simple: Premultiply the data by $ \mathbf{M}_{\mathbf{\bar{H}}}$, and use the resulting variables in the original estimation procedures.
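The defactoring step can be sketched for a single regressor as follows. This is a minimal illustration assuming a balanced panel stored as $ n\times T$ arrays; the names `defactor` and `beta_pool_plus` are hypothetical.

```python
import numpy as np

def defactor(a, x_lag):
    """Project each unit's series in a (n, T) off the cross-sectional mean of x_lag."""
    h = x_lag.mean(axis=0)[:, None]               # (T, 1): H-bar for a scalar regressor
    m = np.eye(h.shape[0]) - h @ h.T / (h.T @ h)  # M = I - H (H'H)^{-1} H'
    return a @ m                                  # M is symmetric, so row i is (M a_i)'

def beta_pool_plus(y, x_lag):
    """Pooled estimator of equation (24) applied to the defactored data."""
    x_hat = defactor(x_lag, x_lag)
    y_hat = defactor(y, x_lag)
    # M is idempotent, so sum_i X_i' M Y_i equals sum_i (M X_i)' (M Y_i).
    return np.sum(x_hat * y_hat) / np.sum(x_hat * x_hat)
```

The fixed effects and recursively demeaned versions follow by passing the defactored arrays to the corresponding demeaning and estimation routines.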

As shown in the simulations below, the $ \Delta_{\beta}-$test of slope homogeneity appears robust to the presence of factors, unlike the pooled $ t-$tests, and I do not attempt to modify it to control for common factors.


IV  Finite Sample Evidence

IV.A  No Cross-Sectional Dependence

To evaluate the small sample properties of the panel data estimators proposed in this paper, a Monte Carlo study is performed. In particular, I focus on the size and power properties of the pooled $ t-$tests. Equations (1) and (3) are simulated for the case with a single regressor. The innovations $ \left( u_{i,t},v_{i,t}\right) $ are drawn from normal distributions with mean zero, unit variance, and correlations δ = 0, -0.4, -0.7, and -0.95; there is no cross-sectional dependence. The local-to-unity parameters $ C_{i}$ are drawn from a uniform distribution with support [-20, -2]. In analyzing the power properties, the slope coefficient β varies between -0.05 and 0.05, and is identical for all i. The sample size is given by T = 100, n = 20. The intercepts $ \alpha_{i}$ are normally distributed with mean and standard deviation equal to 0.005. All results are based on 10,000 repetitions. The $ t-$test based on the fixed effects estimator using standard demeaning, $ \hat{\beta}_{FE}$, and that based on the recursively demeaned pooled estimator, $ \hat{\beta}_{RD}$, are considered. Throughout the simulation study, the normal distribution is used to determine significance; i.e. the null is rejected for absolute test values greater than 1.96.
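One draw from this design can be generated along the following lines. The sketch assumes that the return equation is $ y_{i,t}=\alpha_{i}+\beta x_{i,t-1}+u_{i,t}$ and that the regressor follows an AR(1) with root $ 1+C_{i}/T$, started at zero; these forms are defined earlier in the paper, and the initial condition and the name `simulate_panel` are assumptions made here for illustration.

```python
import numpy as np

def simulate_panel(n=20, T=100, beta=0.0, delta=-0.95, rng=None):
    """Simulate one panel with a single near-integrated regressor and no common factors."""
    rng = np.random.default_rng() if rng is None else rng
    c = rng.uniform(-20.0, -2.0, size=n)      # local-to-unity parameters C_i
    a = 1.0 + c / T                           # assumed autoregressive roots
    alpha = rng.normal(0.005, 0.005, size=n)  # intercepts
    cov = np.array([[1.0, delta], [delta, 1.0]])
    e = rng.multivariate_normal(np.zeros(2), cov, size=(n, T + 1))
    u, v = e[..., 0], e[..., 1]
    x = np.zeros((n, T + 1))                  # x_{i,0} = 0 (assumed initial condition)
    for t in range(1, T + 1):
        x[:, t] = a * x[:, t - 1] + v[:, t]
    x_lag = x[:, :-1]                         # x_{i,t-1} for t = 1, ..., T
    y = alpha[:, None] + beta * x_lag + u[:, 1:]
    return y, x_lag, u[:, 1:], v[:, 1:]
```

Repeating the draw many times and recording how often each $ t-$statistic exceeds 1.96 in absolute value gives the rejection rates of the kind reported in Table 1.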

Panel A in Table 1 shows the average rejection rates for the nominal five percent two-sided t-tests under the null hypothesis of β = 0. Panels A1 and A2 in Figure 1 show the corresponding power curves of the tests for the cases of δ = 0 and δ = -0.95. Table 1 and the power curves in Figure 1 clearly show the effects of the second-order bias in the fixed effects estimator; the test based on the standard fixed effects estimator severely over-rejects under the null hypothesis for δ = -0.95. The test based on the recursively demeaned estimator has rejection rates close to the nominal size under the null, while maintaining decent power properties.11

IV.B  Common Factors

In this section, I repeat the Monte Carlo experiment above for the case when there is a common factor in the innovations. In particular, equations (1)-(4) are now simulated with a single regressor and a single common factor $ f_{t}=g_{t}$, drawn from a standard normal distribution. The factor loadings, $ \gamma_{i}$ and $ \Gamma_{i}$, are also normally distributed with means of minus one and plus one, respectively, and standard deviations equal to $ 2^{-1/2}$ in both cases. The innovations in the returns and regressor processes are formed as $ 2^{-1/2}\left( \gamma_{i}^{\prime }f_{t}+u_{i,t}\right) $ and $ 2^{-1/2}\left( \Gamma_{i}^{\prime}f_{t} +v_{i,t}\right) $, respectively, where $ \left( u_{i,t},v_{i,t}\right) $ are drawn from standard normal distributions; the scaling by $ 2^{-1/2}$ is performed in order to achieve an approximate unit variance in the innovations, which enables easier comparison with the cross-sectionally independent case. As before, the correlation between $ u_{i,t}$ and $ v_{i,t}$ is set to δ = 0, -0.4, -0.7, and -0.95.
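The factor-driven innovations can be generated as in the sketch below, which follows the description above for a single common factor; the function name `factor_innovations` is hypothetical. Since $ E\left[ \gamma_{i}^{2}\right] =1+1/2$, the unconditional variance of the scaled innovations works out to $ \left( 1.5+1\right) /2=1.25$, which is the sense in which the unit variance is only approximate.

```python
import numpy as np

def factor_innovations(n=20, T=100, delta=-0.95, rng=None):
    """Return (n, T) arrays of innovations to returns and to the regressor."""
    rng = np.random.default_rng() if rng is None else rng
    s = 2.0 ** -0.5
    f = rng.standard_normal(T)                # common factor f_t = g_t
    gamma = rng.normal(-1.0, s, size=n)       # loadings in the return innovations
    big_gamma = rng.normal(1.0, s, size=n)    # loadings in the regressor innovations
    cov = np.array([[1.0, delta], [delta, 1.0]])
    e = rng.multivariate_normal(np.zeros(2), cov, size=(n, T))
    u, v = e[..., 0], e[..., 1]
    ret_innov = s * (gamma[:, None] * f[None, :] + u)
    reg_innov = s * (big_gamma[:, None] * f[None, :] + v)
    return ret_innov, reg_innov
```

Because the loadings have opposite signs on average, the common factor induces additional negative correlation between the two innovation series as well as dependence across units.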

The results are shown in Panels B and C of Table 1 and in Panels B1, B2, C1, and C2 in Figure 1. Panel B in Table 1 and Panels B1 and B2 in Figure 1 show the outcomes of the Monte Carlo experiments when the model generated with common factors is estimated using the estimators $ \hat{\beta}_{FE}$ and $ \hat{\beta}_{RD}$, which do not control for cross-sectional dependence. It is clear that when the common factors are ignored in the estimation process, the actual size of the corresponding $ t-$tests is very far from the nominal size of $ 5$ percent, with rejection rates above $ 30$ percent under the null.

Panel C in Table 1 and Panels C1 and C2 in Figure 1 show the same results for the estimators $ \hat{\beta}_{FE}^{+}$ and $ \hat{\beta }_{RD}^{+}$, which do control for the common factors. The $ t-$test based on the recursively demeaned data, $ \hat{\beta}_{RD}^{+}$, now possesses good size and power properties. As before, the standard fixed effects estimator exhibits a finite sample bias and extremely poor size properties.

IV.C  Test of Slope Homogeneity

The final set of simulations analyzes the finite sample properties of the $ \Delta_{\beta}-$test of slope homogeneity. The setup is identical to that used above, with the exception of the slope coefficients. Three different scenarios are considered. In the first, the null hypothesis is imposed and $ \beta_{i}=\beta=0$. In the two other cases, the slope coefficients exhibit heterogeneity and are generated as $ \beta_{i}=\beta+\beta\theta_{i}$, where $ \theta_{i}$ is a standard normal random variable, independently distributed across $ i$, and $ \beta$ is equal to $ 0.05$ and $ 0.1$, respectively. The cases with and without common factors are considered, although, as described previously, no adjustment to the test is made when there are common factors. The test is evaluated as a one-sided test with a nominal size of five percent; i.e. the null is rejected when $ \Delta_{\beta}$ is greater than 1.65.

Panel A of Table 2 shows the results without any common factors and Panel B shows the results with common factors. Under the null, the test is somewhat undersized, and marginally more so when there are common factors. Unlike the $ t-$tests analyzed above, the test of slope homogeneity is thus not particularly sensitive to common factors in the data. Under the first alternative, with $ \beta=0.05$, the power of the test is around 45 percent without common factors, and around 35 percent with common factors. Under the second alternative, with $ \beta=0.1$, the power rises to above 90 percent in both cases. The relatively low power under the first alternative reflects the fact that it is difficult to distinguish between such small absolute differences between the $ \beta_{i}s$, even though the relative differences are reasonably large; one should keep in mind that the test compares across the cross-section of the data, which in the simulations only amounts to $ n=20$ observations. Nevertheless, the test can serve as a useful diagnostic of panel homogeneity.


V  Data Description

All of the data come from the Global Financial Data database and are on a monthly frequency. Total returns, including direct returns from dividends, on market wide indices in 40 countries were obtained, as well as the corresponding dividend- and earnings-price ratios and measures of the short and long interest rates.

With the exception of Spain, the dividend-price ratio data is available over the same sample period as the total stock returns. The other predictor variables, however, are typically not available over the whole sample of total stock returns. Due to the two world wars, France, Germany, Japan, and the U.K. have some years during which no observations are available. Further, Spain's total returns data start in 1940, but no dividends data is available during 1968-1983. Thus, in the time-series analysis, separate regressions are fitted for each sample period for these five countries, and in the pooled estimation separate intercepts are estimated. In Table 3, which presents the pooled results, the row listing the number of 'countries' in each panel can therefore include more than one count of some countries.

As is conventional in the literature, the dividend-price ratio is defined as the sum of dividends during the past year divided by the current price and the earnings-price ratio is defined as the latest 12 months of available earnings divided by the current price. Short interest rate measures come from Global Financial Data and use rates on 3-month T-bills when available or, otherwise, private discount rates or interbank rates. The long rate is measured by the yield on long-term government bonds. When available, a 10-year bond is used; otherwise, I use that with the closest maturity to 10 years. The term spread is defined as the log difference between the long and short rates. Excess stock returns are defined as the return on stocks, in the local currency, over the local short rate. This provides the international analogue of the typical forecasting regressions estimated for U.S. data. All regressions are run at the one-month frequency using log-transformed variables with the log excess returns over the domestic short rate as the dependent variable.

Countries are pooled into a global panel, as well as developed and emerging stock market panels, according to the MSCI classifications.12


VI  Empirical Results

In the empirical analysis, I conduct pooled regressions as well as time-series regressions for individual countries. The results from the pooled regressions and summaries of the time-series results are presented in Table 3. The time-series results for individual countries are given in Table 4. Each table contains multiple panels, which correspond to the different forecasting variables. For the pooled regressions, results from both the standard fixed effects estimator, $ \hat{\beta}_{FE}$, and the corresponding $ t-$statistic, $ t_{FE}$, as well as the estimator using recursively demeaned data, $ \hat{\beta}_{RD}$ and $ t_{RD}$, are documented. Separate results are shown for the case when common factors are controlled for and when they are not. As discussed at length above, the $ t_{FE}-$test is not robust to the endogeneity and persistence of the regressors, but provides an interesting illustration of the potential pitfalls of not addressing these issues. The short interest rate and the term spread are generally less endogenous and inference based on the fixed effects estimator for these two variables will be fairly accurate (see also the discussion on this topic in Campbell and Yogo (2006)); however, the fixed effects $ t-$test will tend to greatly over-reject the null for the dividend- and earnings-price ratios. In Table 3, significant results at the one-sided five percent level based on robust test-statistics from which proper inference can be drawn, i.e. the $ t-$statistic corresponding to $ \hat{\beta}_{RD}^{+}$, are indicated with a $ ^{\ast}$.13

The results from the individual time-series regressions, shown in Table 4, are presented in a similar manner to the pooled regressions. Since normal inference based on the OLS $ t-$statistic will generally be biased, inference based on a robust 90 percent confidence interval for $ \beta_{i}$, using the methods of Campbell and Yogo (2006), is also provided. If viewed as a test, this confidence interval can be seen as a five percent one-sided test and a rejection of the null hypothesis of no predictability is indicated with a $ ^{\ast}$ next to the coefficient estimate; for brevity, the actual confidence intervals are not shown. In Table 3, the number of individual time-series regressions that yield significant coefficients according to the Campbell and Yogo test is indicated in the column labeled CY $ _{\text{sig}}$.

VI.A  Testing for Slope Homogeneity

Before considering the empirical results for the different predictor variables, it is useful to briefly analyze the homogeneity of the slope coefficients in the pooled predictive regressions. Table 3 shows the outcome of the $ \Delta_{\beta}-$test of slope homogeneity; it is a one-sided test, and a value greater than 1.65 indicates that the null hypothesis of $ \beta_{i}=\beta$ for all $ i$ is rejected at the five percent level. As is seen, slope homogeneity can always be rejected for the global panel. For the developed panel, homogeneity is only rejected in the dividend-price ratio regression using the full sample, which spans a much larger range of time than the other panels since the dividend-price ratio is available further back than any of the other predictor variables; when data before 1950 is dropped, slope homogeneity can no longer be rejected. For the emerging market panels, the null of slope homogeneity is only rejected in the regression with the short interest rate. These results thus support the notion that countries within the groups of developed and emerging markets tend to be more homogenous in terms of predictability than countries across these groups.

As shown previously, the pooled analysis is valid also when the slope coefficients are not homogenous. However, the estimates from the global panels, and in a couple of instances from the developed and emerging panels, are best interpreted as average relationships, and the corresponding tests as tests of whether there is predictability on average in the data.

VI.B  The Earnings-Price Ratio

The results for the earnings-price ratio are presented in Panel A of Tables 3 and 4. There is minimal evidence of a positive predictive relationship. Specifically, pooling the data at either the global or developed market levels does not yield a significant coefficient, regardless of whether one controls for common factors; however, there is evidence of a predictive relationship when pooling at the emerging market level and controlling for common factors. To ensure that the developed market results are not driven by the longer earnings-price ratio time-series available for the U.K. and the U.S., I also estimate these pooled regressions when restricting the sample to observations after 1950. The individual country time-series results confirm the lack of evidence of a predictive relationship in the pooled regressions. In particular, in the post-1950 sample, only four of the 38 time-series regressions (Argentina, Jordan, South Africa, and the U.K.) yield any significant coefficients.

There is thus rather weak evidence that the earnings-price ratio predicts stock returns; the majority of evidence that does exist is for emerging economies. It is noteworthy that the null of no predictability would have been rejected in all of the pooled regressions if one relied on non-robust methods that fail to control for the endogeneity and persistence of the regressors, as well as common factors in the data. Controlling for common factors appears to be of potentially great importance. It is interesting to note that doing so does not necessarily weaken the results.

VI.C  The Dividend-Price Ratio

Panel B in Table 3 shows the results from pooled regressions with the dividend-price ratio as the regressor. The results are generally somewhat stronger than for the earnings-price ratio. Specifically, when controlling for common factors, the coefficient is significant when pooling both at the post-1950 global level and the developed market level, as well as in the full sample emerging panel. The overall picture depicted by the individual time-series regressions shown in Panel B of Table 4, however, is still fairly weak, although evidence of predictability is observed for post-1950 Australia, Chile, post-WWII Japan, Jordan, Mexico, Taiwan, the U.K., and the post-1950 U.S. The time-series evidence thus presents no clear pattern of predictability, and the evidence that exists is distributed fairly equally between developed and emerging markets. As in the case of the earnings-price ratio, the null of no predictability would often have been rejected when using non-robust tests.

VI.D  The Short Interest Rate

In light of the empirical evidence of a predictive relationship seen in U.S. data, one would expect there to be a negative relationship between the current short rate and future stock returns. The data used in all interest rate regressions are restricted to start in 1952 or after, following the convention used in studies with U.S. data.14 The pooled results for the short interest rate are presented in Panel C of Table 3.

The null of no predictability is strongly rejected in the pooled sample of developed markets. In contrast to this strongly significant negative relationship, the pooled relationship in the emerging markets is not significant. Given the rather capricious character of interest rates in many emerging economies (e.g., Argentina), I focus strictly on the developed market results. As seen in Panel C of Table 4, this finding of predictability is supported by the results of the individual time-series regressions for the developed markets. In particular, a significant predictive relationship is found in eight out of 23 developed markets: Canada, Germany, the Netherlands, New Zealand, Portugal, Spain, Switzerland, and the U.S. A closer look at the individual country results further strengthens this pattern: the estimates for 15 of the developed markets are more than one standard deviation away from zero, and the slope coefficient estimate is negative for 20 of the 23 countries.
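
As a rough back-of-the-envelope check on why these counts are suggestive, one can ask how often they would arise if there were no predictability at all. The sketch below rests on strong simplifying assumptions that are not made in the paper: each country's estimate is treated as an independent, symmetric, standard normal draw under the null. Given the emphasis placed on common factors, independence across countries is unlikely to hold, so the numbers are illustrative only and not a formal test.

```python
from math import comb
from statistics import NormalDist

n_countries = 23

# Chance that a standard normal estimate lands more than one s.d. from zero.
p_beyond_one_sd = 2 * (1 - NormalDist().cdf(1.0))

# Expected number of such estimates among 23 countries; compare with the 15 observed.
expected_beyond_one_sd = n_countries * p_beyond_one_sd

# Sign test: chance of 20 or more negative signs out of 23 when each sign is a
# fair coin flip, independently across countries (an assumption).
p_at_least_20_negative = (
    sum(comb(n_countries, k) for k in range(20, n_countries + 1)) / 2 ** n_countries
)
```

Under these assumptions, roughly seven of the 23 estimates would be expected to lie more than one standard deviation from zero, and 20 or more negative signs would occur with probability 1/4096.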

VI.E  The Term Spread

Based on the U.S. experience, one would expect there to be a positive predictive relationship, if any, between the term spread and stock returns. As in the case of the short interest rate, I find a significant predictive relationship, here a positive one, only in developed markets. As shown in Panel D of Table 3, there is strong evidence of a predictive relationship when pooling the developed economies. As this relationship is not evident for the emerging markets, I once again focus on the results for the developed markets. As seen in Panel D of Table 4, this finding of predictability is supported strongly by the results of the individual time-series regressions for the developed markets. For 10 of 23 individual time-series regressions, there is a positive and significant predictive relationship: Canada, France, Germany, Italy, the Netherlands, New Zealand, Norway, Spain, Switzerland, and the U.S. Furthermore, 14 countries have a coefficient that is more than one standard deviation from zero.

VI.F  Stability over Time

How robust are these patterns of predictability to different sample periods? To analyze this, I consider pooled regressions with expanding windows of observations for the developed markets. I focus on the developed panel since the time-series in that panel typically have longer samples available. A new country is added to the expanding window regression when there are five years of observations available; no 'old' observations are ever dropped from the estimation window and the estimates at each point in time are thus based on all observations available up to that date.15 Confidence intervals, with a nominal 90 percent coverage rate, are calculated in a manner analogous to the test statistics shown in Table 3, based on the normal distribution. The left column of Figure 2 shows the results from using the standard fixed effects estimator, without controlling for common factors. The right column shows the results from the estimator using recursively demeaned data and controlling for common factors. The confidence intervals in the left column are thus typically biased and will generally not have an actual coverage rate of 90 percent; however, these results further illustrate the importance of controlling for endogeneity and common factors.
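
The mechanics of the expanding window can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator used for the right column of Figure 2: it applies the standard fixed effects (within) estimator with naive standard errors, which is precisely the case in which the intervals are said to be biased, and it neither demeans recursively nor controls for common factors. The function name, the array layout, and the simulated data are all hypothetical; the threshold of 60 observations assumes monthly data, so that it corresponds to five years.

```python
import numpy as np
from statistics import NormalDist


def expanding_fe_estimates(y, x, min_obs=60, coverage=0.90):
    """Expanding-window pooled fixed effects estimates with normal-based intervals.

    y, x: (T, N) arrays with NaN where a country has no data. Row t of y is
    assumed to hold the return that row t of x is meant to predict.
    Returns a (T, 3) array of (estimate, lower bound, upper bound).
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.645 for 90 percent
    n_periods, n_countries = y.shape
    valid = ~np.isnan(y) & ~np.isnan(x)
    out = []
    for t in range(n_periods):
        pieces = []
        for i in range(n_countries):
            mask = valid[: t + 1, i]
            if mask.sum() < min_obs:  # country enters only after five years of data
                continue
            yi = y[: t + 1, i][mask]
            xi = x[: t + 1, i][mask]
            # Country-specific demeaning removes the fixed effect.
            pieces.append((yi - yi.mean(), xi - xi.mean()))
        if not pieces:
            out.append((np.nan, np.nan, np.nan))
            continue
        yd = np.concatenate([p[0] for p in pieces])
        xd = np.concatenate([p[1] for p in pieces])
        beta = xd @ yd / (xd @ xd)
        resid = yd - beta * xd
        dof = len(yd) - len(pieces) - 1
        se = np.sqrt(resid @ resid / dof / (xd @ xd))  # naive, non-robust
        out.append((beta, beta - z * se, beta + z * se))
    return np.array(out)


# Simulated example: three countries, the third with a later start date.
rng = np.random.default_rng(0)
T, N = 120, 3
x = rng.standard_normal((T, N))
y = 0.5 * x + 0.1 * rng.standard_normal((T, N))
x[:40, 2] = np.nan
y[:40, 2] = np.nan
est = expanding_fe_estimates(y, x)
```

Because no observations are ever dropped, each row of `est` uses all data available up to that date, and the third country only starts contributing once it has accumulated 60 observations of its own.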

The results presented in Figure 2 mostly reflect those discussed above based on the complete sample. However, the results for the dividend-price ratio $ \left( d-p\right) $, which generally appeared somewhat stronger than those for the earnings-price ratio $ \left( e-p\right) $, now appear very weak when viewed over time. Overall, the support for any stable predictive ability in either of these two variables is very weak. The term spread $ \left( y-r_{s}\right) $ coefficient fluctuates around zero until the late 1970s, after which the lower bound of the confidence interval typically hovers above zero, a