
Predictive Regressions with Panel Data

Erik Hjalmarsson1
Federal Reserve Board

NOTE: International Finance Discussion Papers are preliminary materials circulated to stimulate discussion and critical comment. References in publications to International Finance Discussion Papers (other than an acknowledgment that the writer has had access to unpublished material) should be cleared with the author or authors. Recent IFDPs are available on the Web at http://www.federalreserve.gov/pubs/ifdp/. This paper can be downloaded without charge from the Social Science Research Network electronic library at http://www.ssrn.com/.


Abstract:

This paper analyzes panel data inference in predictive regressions with endogenous and nearly persistent regressors. The standard fixed effects estimator is shown to suffer from a second order bias; analytical results, as well as Monte Carlo evidence, show that the bias and resulting size distortions can be severe. New estimators, based on recursive demeaning as well as direct bias correction, are proposed and methods for dealing with cross sectional dependence in the form of common factors are also developed. Overall, the results show that the econometric issues associated with predictive regressions when using time-series data to a large extent also carry over to the panel case. However, practical solutions are more readily available when using panel data. The results are illustrated with an application to predictability in international stock indices.

Keywords: Cross-sectional dependence; Panel data; Pooled regression; Predictive regression; Stock return predictability.

JEL classification: C22, C23, G1.


1 Introduction

Predictive regressions are important tools for evaluating and testing economic models. Although tests of stock return predictability, and the related market efficiency hypothesis, are probably the most common application, many rational expectations models can be tested in a similar manner (Mankiw and Shapiro, 1986). Traditionally, forecasting regressions have been evaluated in time-series frameworks. However, with the increased availability of data, in particular international financial and macroeconomic data, it becomes natural to extend the single time-series framework to a panel data setting.

It has gradually been discovered that the apparently simple linear regression model most often used for evaluating predictability in fact raises some very tough econometric issues. The high degree of persistence found in many predictor variables, such as the earnings- or dividend-price ratios in the prototypical stock return forecasting regression, is at the root of most econometric problems associated with predictive regressions. The near persistence of the regressors, coupled with a strong contemporaneous correlation between the innovations in the regressor and the regressand, causes standard OLS estimates to be inefficient and normal $ t-$tests to have the wrong size. If the regressor is a unit-root process, the predictive regression becomes a cointegrating relationship and well established methods for dealing with endogenous regressors can be used. However, if the regressor is not a pure unit-root process, but rather a so-called near-unit-root process, standard cointegration methods can yield misleading results (cf. Cavanagh et al., 1995, and Elliott, 1998).2

In this paper, I analyze econometric inference in predictive regressions in a panel data setting, when the regressors are nearly persistent and endogenous. The main contributions are the derivations of the asymptotic properties of pooled estimators in forecasting equations and the proposal of new procedures to deal with the bias effects arising from the persistence and endogeneity of the regressors. New results for controlling for the effects of common factors in panels are also derived. The methods developed in the paper are used to test for stock-return predictability in a panel of international stock returns.

By pooling the data, the econometric issues encountered in the time-series case can, to some extent, be dealt with more easily. Intuitively, persistent regressors cause no problems when they are exogenous. When pooling the data, independent cross-sectional information dilutes the endogeneity effects, and thus potentially alleviates the bias effects seen in the time-series case. This intuition holds when no individual intercepts, or fixed effects, are allowed in the specification. In this case, the standard pooled estimator has an asymptotically normal distribution; the summing up over the cross-section in the pooled estimator eliminates the usual near unit-root asymptotic distributions found in the time-series case. It follows immediately that test statistics have standard distributions and normal inference can be performed.

However, when fixed effects are allowed for, the asymptotic properties of the pooled estimator change. The time-series demeaning of the data, which is implicit in a fixed effects estimation, causes the fixed effects estimator to suffer from a second order bias that invalidates inference from standard test-statistics. To correct for this bias, I develop an estimator based on the idea of recursive demeaning (e.g. Moon and Phillips, 2000, and Sul et al., 2005). When demeaning each time-series in the panel, information after time $ t$ is used to form the time $ t$ regressor; this induces a correlation between the lagged value of the demeaned regressor, used in the estimation of the predictive regression, and the error term in the forecasting equation, which gives rise to the second order bias in the fixed effects estimator. By using information only up till time $ t$ in the demeaning of the regressor and only information after time $ t$ in the demeaning of the dependent variable, the distortive effects arising from standard demeaning are eliminated. The estimator based on recursively demeaned data is shown to have an asymptotically normal distribution and standard inference can again be performed.

Although the estimator based on recursive demeaning is asymptotically normally distributed, it gives up some efficiency by disregarding parts of the data in the demeaning process. An alternative approach to control for the bias in the standard fixed effects estimator is to directly estimate the bias term and subtract it from the original estimator. Monte Carlo simulations show that such a correction works very well in practice, and produces unbiased estimators as well as correctly sized tests with good power. This bias corrected fixed effects estimator thus provides a simple and relatively efficient way of dealing with the bias and size distortions induced by the near persistence and endogeneity of the forecasting variables.

The overall conclusion from the theoretical results and the supporting Monte Carlo simulations is that, in the typical panel data case, persistent and endogenous regressors will cause standard inference to be biased. In the time-series case, this result is well established and the results in this paper show that equal caution is required when working with panel data. However, unlike the time-series case, bias correction methods can be implemented in a relatively straightforward manner and normally distributed test-statistics can be achieved. Unbiased point estimates are also easily calculated, in contrast to the time-series case where the popular Bonferroni bound methods lead to correctly sized tests but not unique unbiased point estimates (Cavanagh et al., 1995, and Campbell and Yogo, 2005).

Another econometric issue that arises in the analysis of panels of financial or macroeconomic data is the potential presence of common factors. The standard panel data assumption of cross-sectional independence is often too restrictive, and I show how the framework in this paper can be extended to a setting where common factors are present in the data. These methods follow the work of Pesaran (2006) and extend his methods to a setting with nearly integrated regressors.

As an illustration of the methods derived in this paper, I consider the classical issue of stock return predictability. I use an international panel of returns from 18 different stock indices and the corresponding dividend- and earnings-price ratios, as well as the book-to-market values. The empirical results from the forecasting regressions with stock returns illustrate well the theoretical results derived in the paper. Based on the results from the standard fixed effects estimator, the evidence in favour of return predictability is very strong, using any of the three predictor variables. However, when using the robust methods developed here, the evidence disappears almost completely.

The rest of the paper is organized as follows. Section 2 describes the model, while Sections 3 and 4 derive the main asymptotic properties of the pooled estimators. The finite sample properties of the procedures developed in this paper are analyzed through Monte Carlo experiments in Section 5. Some generalizations of the econometric model are considered in Section 6, and Section 7 contains the empirical application to stock return predictability. Section 8 concludes and technical proofs are found in the appendix.

Following the work of Phillips and Moon (1999), results for the panel estimators are derived using sequential limits, which usually implies first keeping the cross-sectional dimension, $ n$, fixed and letting the time-series dimension, $ T$, go to infinity, and then letting $ n$ go to infinity. Such sequential convergence is denoted $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$. Subject to potential rate restrictions, such as $ n/T\rightarrow0$, these results can generally be shown to hold as $ n$ and $ T$ go to infinity jointly, denoted $ \left( n,T\rightarrow\infty\right) $, by showing that the sufficient conditions in Phillips and Moon (1999) are satisfied; proofs of such joint convergence are not pursued here, however. Otherwise, standard notation is used. $ BM\left( \Omega\right) $ denotes a Brownian motion with covariance matrix $ \Omega$, $ \Rightarrow$ signifies weak convergence, and $ \rightarrow_{p}$ denotes convergence in probability.

2 Model and assumptions

2.1 The data generating process

Consider a panel model with dependent variables $ y_{i,t}$, $ i=1,...,n$, $ t=1,...,T$, and the corresponding vector of regressors, $ x_{i,t}$, where $ x_{i,t}$ is an $ m\times1$ vector. The behavior of $ y_{i,t}$ and $ x_{i,t}$ is modelled as follows,

$\displaystyle y_{i,t}$ $\displaystyle =\alpha_{i}+\beta^{\prime}x_{i,t-1}+\gamma_{i}^{\prime} f_{t}+u_{i,t},$ (1)
$\displaystyle x_{i,t}$ $\displaystyle =Ax_{i,t-1}+\Gamma_{i}^{\prime}f_{t}+v_{i,t},$ (2)

where $ A=I+C/T$ is an $ m\times m$ matrix, and $ f_{t}$ is a $ k\times1$ vector capturing common factors in the error terms. The factor loadings $ \gamma_{i}$ $ \left( k\times1\right) $ and $ \Gamma_{i}$ $ \left( k\times m\right) $ are treated as random coefficients, as specified below.

This model is a panel analogue of the time-series models studied by Mankiw and Shapiro (1986), Cavanagh et al. (1995), Stambaugh (1999), Jansson and Moreira (2004), Lewellen (2004), and Campbell and Yogo (2005).

Assumption 1   (Innovation processes) Let $ w_{i,t}=\left( u_{i,t} ,v_{i,t},f_{t}\right) ^{\prime}$ and $ \mathcal{F}_{t}=\left\{ \left. w_{i,s}\right\vert s\leq t,i=1,...,n\right\} $ be the filtration generated by the innovation processes. Then, for all $ i=1,...,n$ and $ t=1,...,T$,

1. $ E\left[ \left. w_{i,t}\right\vert \mathcal{F}_{t-1}\right] =0.$

2. $ E\left[ \left. w_{i,t}w_{i,t}^{\prime}\right\vert \mathcal{F} _{t-1}\right] =\Omega_{i}=\left[ \left( \Omega_{uv,i},0\right) ,\left( 0,\Omega_{f}\right) \right] ^{\prime}$ where $ \Omega_{uv,i}=\left[ \left( \omega_{11i},\omega_{12i}\right) ,\left( \omega_{21i},\Omega_{22i}\right) \right] ^{\prime}$ and

$ \Omega=\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^{n}\Omega_{i}$ .

3. $ E\left[ u_{i,t}^{4}\right] <\infty$, $ E\left[ \left\vert \left\vert v_{i,t}\right\vert \right\vert ^{4}\right] <\infty,$ and $ E\left[ \left\vert \left\vert f_{t}\right\vert \right\vert ^{4}\right] <\infty.$

4. $ E\left[ \left( u_{i,t},v_{i,t}\right) \left( u_{j,s},v_{j,s}\right) ^{\prime}\right] =0$ for all $ t,s$ and $ i\neq j$.

Assumption 2   (Factor loadings) The coefficients $ \gamma_{i}$ and $ \Gamma_{i}$ are $ iid$ across $ i$ and independently distributed of the specific errors, $ u_{j,t}$ and $ v_{j,t}$, and the common factors $ f_{t}$, for all $ i,j,$ and $ t$, with fixed means $ \gamma$ and $ \Gamma$, and finite variances.
Assumption 3   (Rank condition) Rank $ \left( \Gamma\right) =k$.

Assumption 1 specifies that the innovation processes follow a martingale difference sequence (mds) with finite fourth moments. The regressor can be endogenous in the sense that $ u_{i,t}$ and $ v_{i,t}$ may be contemporaneously correlated. The common factors, $ f_{t}$, are assumed independent of the specific error components, and Assumption 2 specifies that the factor loadings are distributed independently of other random variables in the model. At the expense of some extra notation, the model could allow for a more general time-series structure in the innovation process $ v_{i,t}$; the results in the paper would carry through with virtually no changes. The mds assumption for the errors in the dependent variables, $ u_{i,t}$, is standard in predictive regressions, and is often based on some orthogonality condition from an underlying rational expectations model. For instance, in financial forecasting regressions the mds assumption is motivated by the efficient markets hypothesis. The rank condition in Assumption 3 is used for identification in the estimation procedures that control for the common factors. It essentially states that all information regarding the common factors in the data can potentially be recovered from the innovation processes of the regressors. This condition turns out to be less restrictive than it seems, since it is the factors in the regressor errors that play the key role in the asymptotic properties of the pooled estimators. That is, a common factor that is only present in the dependent variable will not affect the analysis in any fundamental way. This is analyzed in more detail later in the paper.
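To make the data generating process in (1)-(2) concrete, the following is a minimal simulation sketch rather than the design used in the paper's Monte Carlo section. It assumes a scalar regressor ($ m=1$), no common factors ($ \gamma_{i}\equiv0$, $ \Gamma_{i}\equiv0$), Gaussian innovations with unit variances, and a zero initial value $ x_{i,0}=0$; the function name `simulate_panel` and the default parameter values are hypothetical.

```python
import numpy as np

def simulate_panel(n, T, beta=0.0, c=-5.0, rho=-0.9, alpha=None, seed=0):
    """Simulate y[i,t] = alpha[i] + beta*x[i,t-1] + u[i,t] and
    x[i,t] = (1 + c/T)*x[i,t-1] + v[i,t], with corr(u, v) = rho.

    Returns y of shape (n, T) and x of shape (n, T+1); x[:, 0] is x_{i,0}.
    """
    rng = np.random.default_rng(seed)
    a = 1.0 + c / T
    # (u, v) are iid over i and t, so they form a martingale difference sequence.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    w = rng.multivariate_normal(np.zeros(2), cov, size=(n, T))
    u, v = w[..., 0], w[..., 1]
    x = np.zeros((n, T + 1))
    for t in range(1, T + 1):
        x[:, t] = a * x[:, t - 1] + v[:, t - 1]
    if alpha is None:
        alpha = np.zeros(n)
    # Column j of y is paired with the lagged regressor in column j of x.
    y = alpha[:, None] + beta * x[:, :-1] + u
    return y, x
```

The lagged regressors are then `x[:, :-1]` and the current values `x[:, 1:]`; a negative `rho` mimics the endogeneity typical of valuation ratios in return regressions.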

It is also assumed that all the time-series in the panel share the same auto-regressive root, $ A=I+C/T$. This assumption is imposed due to the presence of the common factors in the regressors, which would make it difficult to allow for heterogeneous persistence. The effects of relaxing this assumption are briefly discussed later in the paper. Given a common auto-regressive root, the $ x_{i,t}$ process can be expressed in a convenient component form,

$\displaystyle x_{i,t}=x_{i,t}^{0}+\Gamma_{i}^{\prime}z_{t},$  $\displaystyle x_{i,t} ^{0}=Ax_{i,t-1}^{0}+v_{i,t},$  $\displaystyle z_{t}=Az_{t-1}+f_{t}.$ (3)

Under Assumption 1, by standard arguments (Phillips and Solo, 1992), $ \frac{1}{\sqrt{T}}\sum_{t=1}^{\left[ Tr\right] }w_{i,t}\Rightarrow B_{i}\left( r\right) =BM\left( \Omega_{i}\right) \left( r\right) $ , where $ B_{i}\left( \cdot\right) =\left( B_{1,i}\left( \cdot\right) ,B_{2,i}\left( \cdot\right) ,B_{f}\left( \cdot\right) \right) ^{\prime}$ denotes a $ \left( 1+m+k\right) $-dimensional Brownian motion. Further, by the results in Phillips (1987, 1988), it follows that as $ T\rightarrow\infty$, $ \frac{x_{i,t} }{\sqrt{T}}=\frac{x_{i,t}^{0}}{\sqrt{T}}+\Gamma_{i}^{\prime}\frac{z_{t} }{\sqrt{T}}\Rightarrow J_{i}\left( r\right) +\Gamma_{i}^{\prime}J_{f}\left( r\right) $ , where $ J_{i}\left( r\right) =\int_{0}^{r}e^{\left( r-s\right) C}dB_{2,i}\left( s\right) $ and $ J_{f}\left( r\right) =\int_{0} ^{r}e^{\left( r-s\right) C}dB_{f}\left( s\right) $ . Analogous results hold for the time-series demeaned data, $ \underline{x}_{i,t}=x_{i,t}-\frac{1} {T}\sum_{t=1}^{T}x_{i,t}$, with $ J_{i}$ replaced by $ \underline{J}_{i} =J_{i}-\int_{0}^{1}J_{i}$; when there is no risk of confusion, the dependence of $ J_{i},J_{f}$ and $ B_{i}$ on $ r$ will be suppressed. The following lemma summarizes the key asymptotic results used in the paper.

Lemma 1   Under Assumptions 1-2, as $ \left( T,n\rightarrow \infty\right) _{\operatorname{seq}},$

(a) $ n^{-1/2}\sum_{i=1}^{n}T^{-1}\sum_{t=1}^{T}u_{i,t}x_{i,t-1}\Rightarrow MN\left( 0,\Phi_{ux}+\Phi_{uz}\right) $ where $ \Phi_{ux}\equiv E\left[ \left( \int_{0}^{1}dB_{1,i}J_{i}\right) \left( \int_{0}^{1}dB_{1,i} J_{i}\right) ^{\prime}\right] $ , $ \Phi_{uz}\equiv\Gamma^{\prime}E\left[ \left. \left( \int_{0}^{1}dB_{1,i}J_{f}\right) \left( \int_{0}^{1} dB_{1,i}J_{f}\right) ^{\prime}\right\vert \mathcal{C}\right] \Gamma$ , and $ \mathcal{C}$ is the $ \sigma-$field generated by $ \left\{ f_{t}\right\} _{t=1}^{\infty}$.

(b) $ n^{-1}\sum_{i=1}^{n}T^{-1}\sum_{t=1}^{T}\gamma_{i}^{\prime}f_{t} x_{i,t-1}\Rightarrow\int_{0}^{1}\left( \gamma^{\prime}dB_{f}\right) \left( \Gamma^{\prime}J_{f}\right) .$

(c) $ n^{-1}\sum_{i=1}^{n}T^{-2}\sum_{t=1}^{T}x_{i,t}x_{i,t}^{\prime }\Rightarrow\Omega_{xx}+\Omega_{zz}$ where $ \Omega_{xx}\equiv E\left[ \int_{0}^{1}J_{i}J_{i}^{\prime}\right] $ and $ \Omega_{zz}\equiv\Gamma ^{\prime}\left( \int_{0}^{1}J_{f}J_{f}^{\prime}\right) \Gamma.$

(d) $ n^{-1}\sum_{i=1}^{n}T^{-1}\sum_{t=1}^{T}u_{i,t}\underline{x} _{i,t-1}\rightarrow_{p}-\left( \int_{0}^{1}\int_{0}^{r}E\left[ e^{\left( r-s\right) C}\right] dsdr\right) \omega_{21}.$

(e) $ n^{-1}\sum_{i=1}^{n}T^{-1}\sum_{t=1}^{T}\gamma_{i}^{\prime}\underline {f}_{t}\underline{x}_{i,t-1}\Rightarrow\int_{0}^{1}\left( \gamma^{\prime }dB_{f}\right) \left( \Gamma^{\prime}\underline{J}_{f}\right) .$

(f) $ n^{-1}\sum_{i=1}^{n}T^{-2}\sum_{t=1}^{T}\underline{x}_{i,t}\underline {x}_{i,t}^{\prime}\Rightarrow\underline{\Omega}_{xx}+\underline{\Omega}_{zz} $ where $ \underline{\Omega}_{xx}\equiv E\left[ \int_{0}^{1}\underline{J} _{i}\underline{J}_{i}^{\prime}\right] $ and $ \underline{\Omega}_{zz} \equiv\Gamma^{\prime}\left( \int_{0}^{1}\underline{J}_{f}\underline{J} _{f}^{\prime}\right) \Gamma.$

3 Cross-sectional independence

3.1 The standard pooled estimator

To understand the basic properties of the pooled estimator of $ \beta$, it is instructive to start with analyzing the case when there are no common factors in the data. That is, let $ \gamma_{i}\equiv0$ and $ \Gamma_{i}\equiv0$, for all $ i$. To estimate the parameter $ \beta$ consider first the traditional pooled estimator when there are no individual effects, i.e. when $ \alpha_{i}\equiv0$ for all $ i$.3 The pooled estimator is given by

$\displaystyle \hat{\beta}_{Pool}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t-1} x_{i,t-1}^{\prime}\right) ^{-1}\left( \sum_{i=1}^{n}\sum_{t=1}^{T} y_{i,t}x_{i,t-1}\right) ,$ (4)

and the following theorem gives its asymptotic properties.
Theorem 1   Under Assumptions 1 and 2, with $ \gamma_{i}\equiv0$, $ \Gamma_{i}\equiv0$, and $ \alpha_{i}\equiv0$ for all $ i$, as $ \left( T,n\rightarrow \infty\right) _{\operatorname{seq}},$
$\displaystyle \sqrt{n}T\left( \hat{\beta}_{Pool}-\beta\right) \Rightarrow N\left( 0,\Omega_{xx}^{-1}\Phi_{ux}\Omega_{xx}^{-1}\right) .$ (5)

The pooled estimator of $ \beta$ is thus asymptotically normally distributed and the limiting distribution depends on $ \Omega_{xx}$ and $ \Phi_{ux}$. To perform inference, estimates of these parameters are required. Let $ \hat {u}_{i,t}=y_{i,t}-\hat{\beta}_{Pool}^{\prime}x_{i,t-1}$, $ \hat{\Phi}_{ux}=\frac{1} {n}\sum_{i=1}^{n}\frac{1}{T^{2}}\sum_{t=1}^{T}\sum_{s=1}^{T}\left( \hat {u}_{i,t}x_{i,t-1}\right) \left( \hat{u}_{i,s}x_{i,s-1}\right) ^{\prime}$ , and $ \hat{\Omega}_{xx}=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^{2}}\sum_{t=1} ^{T}x_{i,t-1}x_{i,t-1}^{\prime}$ . The estimator $ \hat{\Phi}_{ux}$ is thus the panel equivalent of HAC estimators for long-run variances.

Standard tests can now be performed. For instance, the null hypothesis $ \beta_{k}=\beta_{k,0}$, for some $ k$, can be tested using a $ t-$test. Let $ \hat{\Sigma}=\hat{\Omega}_{xx}^{-1}\hat{\Phi}_{ux}\hat{\Omega}_{xx}^{-1}$ . Using the results derived above, it follows easily that under the null-hypothesis,

$\displaystyle t_{k}=\frac{\sqrt{n}T\left( \hat{\beta}_{k,Pool}-\beta_{k,0}\right) }{\sqrt{a^{\prime}\hat{\Sigma}a} }\Rightarrow N\left( 0,1\right) ,$ (6)

in sequential limits, as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$, where $ a$ is an $ m\times1$ vector with the $ k$'th component equal to one and zero elsewhere. More general linear hypotheses can be evaluated using a Wald test.

Thus, when there are no fixed effects in the pooled regression, inference in the panel case becomes trivial since the pooled estimator is asymptotically normally distributed. This is in contrast with the time-series case where the OLS estimator has a non-normal asymptotic distribution which depends on unknown nuisance parameters.
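The pooled estimator in (4) and the associated $ t-$test can be sketched in a few lines for the scalar case $ m=1$. This is an illustrative sketch under that assumption, not code from the paper; the function name `pooled_tstat` and the array layout are hypothetical. Because $ \hat{\Omega}_{xx}$ and $ \hat{\Phi}_{ux}$ estimate the limiting variance of $ \sqrt{n}T\left( \hat{\beta}_{Pool}-\beta\right) $ in (5), the statistic is scaled by $ \sqrt{n}T$.

```python
import numpy as np

def pooled_tstat(y, x_lag, beta0=0.0):
    """Pooled estimator without intercepts and its t-statistic (scalar regressor).

    y, x_lag: arrays of shape (n, T); x_lag[i, j] is the lagged regressor
    paired with y[i, j].
    """
    n, T = y.shape
    beta_hat = np.sum(x_lag * y) / np.sum(x_lag ** 2)
    u_hat = y - beta_hat * x_lag
    # Omega_xx_hat: cross-sectional average of T^{-2} * sum_t x_{i,t-1}^2.
    omega_xx = np.mean(np.sum(x_lag ** 2, axis=1) / T ** 2)
    # Phi_ux_hat: the double sum over (t, s) equals the squared time-series sum.
    phi_ux = np.mean(np.sum(u_hat * x_lag, axis=1) ** 2 / T ** 2)
    sigma = phi_ux / omega_xx ** 2
    t_stat = np.sqrt(n) * T * (beta_hat - beta0) / np.sqrt(sigma)
    return beta_hat, t_stat
```

Under the conditions of Theorem 1, `t_stat` would be compared with standard normal critical values.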

3.2 Fixed effects

In the above analysis, the individual intercepts $ \alpha_{i}$ were all assumed to be equal to zero. This section considers the effects on the pooled estimator when the $ \alpha_{i}s$ are no longer zero and are allowed to vary across the panel.

Let $ \underline{y}_{i,t}$ and $ \underline{x}_{i,t}$ denote the time-series demeaned data. That is, $ \underline{x}_{i,t}=x_{i,t}-\frac{1}{T}\sum_{t=1} ^{T}x_{i,t-1}$ and $ \underline{y}_{i,t}=y_{i,t}-\frac{1}{T}\sum_{t=1} ^{T}y_{i,t}$. The fixed effects pooled estimator, which allows for individual intercepts, is then given by

$\displaystyle \hat{\beta}_{FE}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\underline{x} _{i,t-1}\underline{x}_{i,t-1}^{\prime}\right) ^{-1}\left( \sum_{i=1}^{n} \sum_{t=1}^{T}\underline{y}_{i,t}\underline{x}_{i,t-1}\right) ,$ (7)

and
$\displaystyle \hat{\beta}_{FE}-\beta=\left( \frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^{2}} \sum_{t=1}^{T}\underline{x}_{i,t-1}\underline{x}_{i,t-1}^{\prime}\right) ^{-1}\left( \frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^{2}}\sum_{t=1}^{T} \underline{u}_{i,t}\underline{x}_{i,t-1}\right) .$ (8)

Clearly, the estimator is still consistent. Its asymptotic distribution, however, will be affected by the demeaning. For fixed $ n$, as $ T\rightarrow\infty$,
$\displaystyle T\left( \hat{\beta}_{FE}-\beta\right) \Rightarrow\left( \frac{1}{n} \sum_{i=1}^{n}\int_{0}^{1}\underline{J}_{i}\underline{J}_{i}^{\prime}\right) ^{-1}\left( \frac{1}{n}\sum_{i=1}^{n}\int_{0}^{1}dB_{1,i}\underline{J} _{i}\right) .$ (9)

Let $ \omega_{21}=\lim_{n\rightarrow\infty}n^{-1}\sum\omega_{21i}$, and observe that
$\displaystyle E\left[ \int_{0}^{1}dB_{1,i}\underline{J}_{i}\right]$ $\displaystyle =E\left[ \int _{0}^{1}dB_{1,i}\left( r\right) J_{i}\left( r\right) -\int_{0}^{1} dB_{1,i}\left( s\right) \int_{0}^{1}J_{i}\left( r\right) dr\right]$ (10)
  $\displaystyle =-\int_{0}^{1}\int_{0}^{1}\int_{0}^{r}e^{\left( r-q\right) C}E\left[ dB_{1,i}\left( s\right) dB_{2,i}\left( q\right) \right] dsdr$    
  $\displaystyle =-\left( \int_{0}^{1}\int_{0}^{r}e^{\left( r-s\right) C}dsdr\right) \omega_{21},$ (11)

which is different from zero whenever $ \omega_{21}\neq0$. Thus, as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$,
$\displaystyle T\left( \hat{\beta}_{FE}-\beta\right) \rightarrow_{p}-\underline{\Omega }_{xx}^{-1}\left( \int_{0}^{1}\int_{0}^{r}e^{\left( r-s\right) C}dsdr\right) \omega_{21},$ (12)

and the estimator suffers from a second order bias from the demeaning process.
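For a scalar $ C=c$, the double integral in (11)-(12) has the closed form $ \left( e^{c}-1-c\right) /c^{2}$, with limit $ 1/2$ as $ c\rightarrow0$. The sketch below evaluates this expression and computes the fixed effects estimator in (7) for a scalar regressor; the function names are hypothetical and the code is only meant to illustrate the formulas above.

```python
import numpy as np

def bias_integral(c):
    """Closed form of int_0^1 int_0^r exp((r - s) * c) ds dr for scalar c."""
    if abs(c) < 1e-6:
        return 0.5 + c / 6.0  # first-order Taylor expansion around c = 0
    return (np.exp(c) - 1.0 - c) / c ** 2

def fixed_effects(y, x_lag):
    """Pooled estimator after time-series demeaning of each unit, as in (7).

    y, x_lag: arrays of shape (n, T); x_lag[i, j] is paired with y[i, j].
    """
    yd = y - y.mean(axis=1, keepdims=True)
    xd = x_lag - x_lag.mean(axis=1, keepdims=True)
    return np.sum(xd * yd) / np.sum(xd ** 2)
```

As a worked case, in the scalar unit-root model ($ c=0$) with unit innovation variance in the regressor, the integral equals $ 1/2$ and $ \underline{\Omega}_{xx}=1/2-1/3=1/6$, so the limit in (12) is $ -3\omega_{21}$; a negative $ \omega_{21}$ thus produces an upward bias.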

3.3 Recursive demeaning

The second order bias arises because the demeaning process induces a correlation between the innovation processes $ u_{i,t}$ and the demeaned regressors $ \underline{x}_{i,t-1}$.4 Intuitively, $ u_{i,t}$ and $ \underline{x}_{i,t-1}$ are correlated because, in the demeaning of $ x_{i,t-1}$, information available after time $ t-1$ is used. At the expense of some efficiency, one solution is therefore to use recursive demeaning of $ x_{i,t}$ and $ y_{i,t}$ (e.g. Moon and Phillips, 2000, and Sul et al., 2005). That is, define

$\displaystyle \underline{x}_{i,t}^{d}=x_{i,t}-\frac{1}{t}\sum_{s=1}^{t}x_{i,s},$  $\displaystyle \underline{y}_{i,t}^{dd}=y_{i,t}-\frac{1}{T-t}\sum_{s=t}^{T} y_{i,s},$ and  $\displaystyle \underline{x}_{i,t}^{dd}=x_{i,t}-\frac{1} {T-t}\sum_{s=t}^{T}x_{i,s}.$ (13)

The process $ \underline{x}_{i,t-1}^{d}$ now only relies on information up till time $ t-1$, and $ \underline{y}_{i,t}^{dd}$ only depends on information from $ t$ to $ T$; the recursive demeaning will not induce a correlation between $ u_{i,t}$ and $ \underline{x}_{i,t-1}^{d}$. The process $ \underline{x} _{i,t}^{dd}$ is used to properly balance the estimator, as shown below. By the continuous mapping theorem, as $ T\rightarrow\infty$ for a fixed $ i$,
$\displaystyle \frac{\underline{x}_{i,t}^{d}}{\sqrt{T}}=\frac{x_{i,t}}{\sqrt{T}}-\left( \frac{t}{T}\right) ^{-1}\frac{1}{T}\sum_{s=1}^{t}\frac{x_{i,s}}{\sqrt{T} }\Rightarrow J_{i}\left( r\right) -r^{-1}\int_{0}^{r}J_{i}\left( u\right) du=\underline{J}_{i}^{d}\left( r\right) ,$ (14)

and
$\displaystyle \frac{\underline{x}_{i,t}^{dd}}{\sqrt{T}}=\frac{x_{i,t}}{\sqrt{T}}-\left( \frac{T-t}{T}\right) ^{-1}\frac{1}{T}\sum_{s=t}^{T}\frac{x_{i,s}}{\sqrt{T} }\Rightarrow J_{i}\left( r\right) -\left( 1-r\right) ^{-1}\int_{r} ^{1}J_{i}\left( u\right) du=\underline{J}_{i}^{dd}\left( r\right) .$ (15)

Consider the following pooled estimator, using the recursively demeaned data,
$\displaystyle \hat{\beta}_{RD}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\underline{x} _{i,t-1}^{dd}\underline{x}_{i,t-1}^{d\prime}\right) ^{-1}\left( \sum _{i=1}^{n}\sum_{t=1}^{T}\underline{y}_{i,t}^{dd}\underline{x}_{i,t-1} ^{d}\right) .$ (16)

Theorem 2   Under Assumptions 1 and 2, as $ \left( T,n\rightarrow \infty\right) _{\operatorname{seq}},$
$\displaystyle \sqrt{n}T\left( \hat{\beta}_{RD}-\beta\right) \Rightarrow N\left( 0,\left( \underline{\Omega}_{xx}^{RD}\right) ^{-1}\underline{\Phi}_{ux}^{RD}\left( \underline{\Omega}_{xx}^{RD}\right) ^{-1}\right) ,$ (17)

where $ \underline{\Phi}_{ux}^{RD}=E\left[ \left( \int_{0}^{1}dB_{1,i} ^{dd}\underline{J}_{i}^{d}\right) \left( \int_{0}^{1}dB_{1,i}^{dd} \underline{J}_{i}^{d}\right) ^{\prime}\right] $ , $ \underline{\Omega} _{xx}^{RD}=E\left[ \int_{0}^{1}\underline{J}_{i}^{dd}\underline{J} _{i}^{d\prime}\right] $ , and $ dB_{1,i}^{dd}\left( r\right) =dB_{1,i}\left( r\right) -\left( 1-r\right) ^{-1}\left( B_{1,i}\left( 1\right) -B_{1,i}\left( r\right) \right) $ .

To perform inference, let $ \hat{u}_{i,t}^{dd}=\underline{y}_{i,t}^{dd} -\hat{\beta}_{RD}^{\prime}\underline{x}_{i,t-1}^{dd}$ , $ \underline{\hat{\Phi}} _{ux}^{RD}=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^{2}}\sum_{t=1}^{T}\sum _{s=1}^{T}\left( \hat{u}_{i,t}^{dd}\underline{x}_{i,t-1}^{d}\right) \left( \hat{u}_{i,s}^{dd}\underline{x}_{i,s-1}^{d}\right) ^{\prime}$ , and $ \underline{\hat{\Omega}}_{xx}^{RD}=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^{2} }\sum_{t=1}^{T}\underline{x}_{i,t-1}^{dd}\underline{x}_{i,t-1}^{d\prime}$ . The $ t-$test and Wald-test based on $ \underline{\hat{\Phi}}_{ux}^{RD}$ and $ \underline{\hat{\Omega}}_{xx}^{RD}$ will satisfy the usual properties and the results follow in the same manner as above.
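The recursive demeaning in (13) and the estimator in (16) can be sketched as follows for a scalar regressor. This is a hedged illustration, not the paper's implementation: the function names are hypothetical, and each recursive mean is divided by the exact number of terms it contains, which is an assumption about how the end points are treated and differs from the $ 1/\left( T-t\right) $ weights in (13) only by terms that vanish asymptotically.

```python
import numpy as np

def backward_demean(z):
    """z[:, j] minus the mean of z[:, 0..j]: uses information up to j only."""
    counts = np.arange(1, z.shape[1] + 1)
    return z - np.cumsum(z, axis=1) / counts

def forward_demean(z):
    """z[:, j] minus the mean of z[:, j..end]: uses information from j onward."""
    counts = np.arange(z.shape[1], 0, -1)
    rev_cumsum = np.cumsum(z[:, ::-1], axis=1)[:, ::-1]
    return z - rev_cumsum / counts

def recursive_demeaning(y, x_lag, beta0=0.0):
    """Recursively demeaned pooled estimator and t-statistic (scalar regressor).

    y, x_lag: arrays of shape (n, T); x_lag[i, j] is paired with y[i, j].
    """
    n, T = y.shape
    x_d = backward_demean(x_lag)    # regressor used as the instrument-like term
    x_dd = forward_demean(x_lag)    # balances the forward-demeaned y
    y_dd = forward_demean(y)        # individual intercepts cancel here
    beta_hat = np.sum(y_dd * x_d) / np.sum(x_dd * x_d)
    u_dd = y_dd - beta_hat * x_dd
    omega = np.mean(np.sum(x_dd * x_d, axis=1) / T ** 2)
    phi = np.mean(np.sum(u_dd * x_d, axis=1) ** 2 / T ** 2)
    t_stat = np.sqrt(n) * T * (beta_hat - beta0) / np.sqrt(phi / omega ** 2)
    return beta_hat, t_stat
```

Forward demeaning both $ y_{i,t}$ and $ x_{i,t-1}$ over the same window removes $ \alpha_{i}$ exactly, which is why $ \underline{x}_{i,t-1}^{dd}$ rather than $ \underline{x}_{i,t-1}^{d}$ appears in the first factor of (16).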

3.4 Direct bias correction

The estimator $ \hat{\beta}_{RD}$ gives up some efficiency through the recursive demeaning process. A more efficient approach would be to directly estimate the bias term in (12) and subtract it from the standard pooled estimator. A simple bias-corrected estimator is given by

$\displaystyle \hat{\beta}_{FE}^{+}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}\underline{x} _{i,t-1}\underline{x}_{i,t-1}^{\prime}\right) ^{-1}\left( \sum_{i=1}^{n} \sum_{t=1}^{T}\underline{y}_{i,t}\underline{x}_{i,t-1}+nT\left( \int_{0} ^{1}\int_{0}^{r}e^{\left( r-s\right) \hat{C}}dsdr\right) \hat{\omega} _{21}\right) .$ (18)

Provided that $ \omega_{21}$ and $ C$ are consistently estimated, it follows easily that as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$,
$\displaystyle \sqrt{n}T\left( \hat{\beta}_{FE}^{+}-\beta\right) \Rightarrow N\left( 0,\underline{\Omega}_{xx}^{-1}\underline{\Phi}_{ux}\underline{\Omega} _{xx}^{-1}\right) ,$ (19)

where $ \underline{\Phi}_{ux}$ is defined in an analogous manner to $ \underline{\Omega}_{xx}$, and estimates of $ \underline{\Phi}_{ux}$ and $ \underline{\Omega}_{xx}$ are obtained in an identical manner as before.

An estimate of $ \omega_{21}$ is easy to obtain by averaging the estimates of $ \omega_{21i}$ obtained from time-series regressions. As discussed below, it is possible to consistently estimate $ C$ using panel data, unlike in the time-series case.
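A minimal sketch of the direct bias correction for a scalar regressor is given below. It removes the limiting bias in (12), $ -\underline{\Omega}_{xx}^{-1}\left( \int_{0}^{1}\int_{0}^{r}e^{\left( r-s\right) C}dsdr\right) \omega_{21}/T$, from the fixed effects estimator. Several choices here are assumptions made for illustration: the function name `bias_corrected_fe` is hypothetical, an estimate `c_hat` of $ C$ is taken as given, and $ \omega_{21}$ is estimated by the average sample covariance between the fixed effects residuals and the quasi-differenced regressor, rather than from separate time-series regressions for each unit.

```python
import numpy as np

def bias_corrected_fe(y, x, c_hat):
    """Bias-corrected fixed effects estimator (scalar regressor).

    y: (n, T) array; x: (n, T+1) array holding x_{i,0}, ..., x_{i,T};
    c_hat: estimate of the local-to-unity parameter C.
    Returns (beta_plus, beta_fe, omega_21_hat).
    """
    n, T = y.shape
    x_lag, x_cur = x[:, :-1], x[:, 1:]
    yd = y - y.mean(axis=1, keepdims=True)
    xd = x_lag - x_lag.mean(axis=1, keepdims=True)
    sxx = np.sum(xd ** 2)
    beta_fe = np.sum(xd * yd) / sxx
    # Residuals from the predictive regression and the regressor equation.
    u_hat = yd - beta_fe * xd                      # zero mean within each unit
    v_hat = x_cur - (1.0 + c_hat / T) * x_lag
    omega_21 = np.mean(u_hat * v_hat)
    # Closed form of the double integral for scalar c (limit 1/2 at c = 0).
    if abs(c_hat) < 1e-6:
        integral = 0.5
    else:
        integral = (np.exp(c_hat) - 1.0 - c_hat) / c_hat ** 2
    # Adding n*T*integral*omega_21 to the numerator offsets the bias in (12).
    beta_plus = beta_fe + n * T * integral * omega_21 / sxx
    return beta_plus, beta_fe, omega_21
```

With a negative $ \omega_{21}$, as is typical when valuation ratios are used to forecast returns, the correction lowers the fixed effects estimate.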

3.5 Estimation of $ C$

The estimator proposed above relies on an estimate of $ C$. Moon and Phillips (2000) show how $ C$ can be consistently estimated in equation (2) when $ C$ is a scalar. The diagonal of $ C$ can, of course, be estimated by the individual univariate estimates; thus, if one restricts $ C$ to be diagonal, then an estimate of a matrix $ C$ can be obtained using the methods of Moon and Phillips (2000).

Consider the case of a scalar $ C$, and assume that $ x_{i,t}$ is generated according to equation (2). Noting that $ A=1+\frac{C}{T},$ it is natural to consider estimators of the form $ \hat{C}=T\left( \hat{A}-1\right) $. The pooled estimator of $ A$ is given by,

$\displaystyle \hat{A}=\left( \sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t-1}^{2}\right) ^{-1}\left( \sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}x_{i,t-1}\right) , $
and the corresponding pooled estimator of $ C$ is $ \hat{C}=T\left( \hat{A}-1\right) $. Moon and Phillips (2000) show that this estimator of $ C$ is consistent in the absence of cross-sectional dependence. Observe that the data used in estimating $ C$ is not time-series demeaned; demeaning the data in the time-series dimension will lead to a bias in the estimator.
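For a scalar regressor, the pooled estimator $ \hat{C}=T\left( \hat{A}-1\right) $ amounts to a single pooled autoregression on the raw (not demeaned) data. The sketch below is illustrative only; the function name `estimate_c` and the array layout are hypothetical.

```python
import numpy as np

def estimate_c(x):
    """Pooled estimate of the local-to-unity parameter C (scalar regressor).

    x: (n, T+1) array holding x_{i,0}, ..., x_{i,T}. The data are deliberately
    not time-series demeaned, since demeaning would bias the estimator.
    """
    T = x.shape[1] - 1
    x_lag, x_cur = x[:, :-1], x[:, 1:]
    a_hat = np.sum(x_cur * x_lag) / np.sum(x_lag ** 2)
    return T * (a_hat - 1.0)
```

For a diagonal matrix $ C$, the same function could be applied to each regressor separately.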

It is beyond the scope of this paper to formally consider the effects of common factors in the data on the estimator $ \hat{C}$. However, Monte Carlo simulations not reported in the paper indicate that it also remains unbiased in the presence of common factors.

4 Cross-sectional dependence

4.1 The effects of common factors

I now return to the general setup with common factors in the data. The following theorem summarizes the asymptotic properties of the standard pooled estimator, as well as the fixed effects estimator, when there are common factors.

Theorem 3 (a)   Under Assumptions 1-2, with $ \alpha_{i}\equiv0$, as $ \left( T,n\rightarrow \infty\right) _{\operatorname{seq}},$
$\displaystyle T\left( \hat{\beta}_{Pool}-\beta\right) \Rightarrow\left( \Omega _{xx}+\Omega_{zz}\right) ^{-1}\left[ \int_{0}^{1}\left( \gamma^{\prime }dB_{f}\right) \left( \Gamma^{\prime}J_{f}\right) \right] .$ (20)

(b) Under Assumptions 1-2, as $ \left( T,n\rightarrow \infty\right) _{\operatorname{seq}},$

$\displaystyle T\left( \hat{\beta}_{FE}-\beta\right) \Rightarrow\left( \underline{\Omega }_{xx}+\underline{\Omega}_{zz}\right) ^{-1}\left[ \int_{0}^{1}\left( \gamma^{\prime}dB_{f}\right) \left( \Gamma^{\prime}\underline{J}_{f}\right) -\left( \int_{0}^{1}\int_{0}^{r}e^{\left( r-s\right) C}dsdr\right) \omega_{21}\right] .$ (21)

Thus, in the presence of the general factor structure outlined in Assumptions 1 and 2, the standard pooled estimator exhibits a non-standard limiting distribution, although it is still consistent; standard tests can therefore not be used. Similarly, the limiting behavior of the fixed effects estimator is determined by the bias term arising from the time-series demeaning of the data, as well as an additional term that stems from the common factors in the data.

4.2 Robust estimators

Based on the methods of Pesaran (2006), I propose an estimator that is more robust to cross-sectional dependence in the data. Write the model in matrix form,

$\displaystyle \underset{T\times1}{\mathbf{Y}_{i}}$ $\displaystyle =\underset{T\times m}{\mathbf{X} _{i,-1}}\underset{m\times1}{\beta}+\underset{T\times k}{\mathbf{f}} \underset{k\times1}{\gamma_{i}}+\underset{T\times1}{\mathbf{u}_{i}},$ (22)
$\displaystyle \underset{T\times m}{\mathbf{X}_{i}}$ $\displaystyle =\underset{T\times m}{\mathbf{X} _{i,-1}}\underset{m\times m}{A}+\underset{T\times k}{\mathbf{f}} \underset{k\times m}{\Gamma_{i}}+\underset{T\times m}{\mathbf{v}_{i}},$ (23)

where $ \mathbf{Y}_{i}$ denotes the $ T\times1$ matrix of the observations for the dependent variable and $ \mathbf{X}_{i}$ the $ T\times m$ matrix of regressor observations. The $ T\times k$ matrix $ \mathbf{f}$ denotes the unobserved common factors.

The idea of Pesaran (2006) is to project the data onto the space orthogonal to the common factors, thereby removing the cross sectional dependence from the data used in the estimation. However, since the factors in $ f_{t}$ are not observed in practice, an indirect approach is required.

Consider the following estimator of $ \beta$,

$\displaystyle \tilde{\beta}_{Pool}=\left( \sum_{i=1}^{n}\mathbf{X}_{i,-1}^{\prime }\mathbf{M}_{\mathbf{\bar{H}}}\mathbf{X}_{i,-1}\right) ^{-1}\left( \sum_{i=1}^{n}\mathbf{X}_{i,-1}^{\prime}\mathbf{M}_{\mathbf{\bar{H}} }\mathbf{Y}_{i}\right)$ (24)

where $ \mathbf{M}_{\mathbf{\bar{H}}}=\mathbf{I}-\mathbf{\bar{H}}\left( \mathbf{\bar{H}}^{\prime}\mathbf{\bar{H}}\right) ^{-1}\mathbf{\bar{H} }^{\prime}$ is a $ T\times T$ matrix and $ \mathbf{\bar{H}}$ is the $ T\times2m $ matrix of observations of $ \bar{H}_{t}$, where
$\displaystyle \bar{H}_{t}=\frac{1}{n}\sum_{i=1}^{n}H_{i,t}=\frac{1}{n}\sum_{i=1}^{n}\left( \begin{array}[c]{c} \Delta_{C}x_{i,t}\\ x_{i,t-1} \end{array} \right) =\left( \begin{array}[c]{c} \Delta_{C}\bar{x}_{\cdot,t}\\ \bar{x}_{\cdot,t-1} \end{array} \right) .$ (25)

Here, $ \Delta_{C}$ denotes the quasi-differencing operator and $ \Delta _{C}x_{i,t}=x_{i,t}-\left( I+C/T\right) x_{i,t-1}=\Gamma_{i}^{\prime} f_{t}+v_{i,t}$ . Since $ \mathbf{X}_{i}=\mathbf{X}_{i}^{0}+\mathbf{Z}\Gamma_{i} $, it follows that
$\displaystyle \mathbf{\bar{H}}\mathbf{=}\left( \begin{array}[c]{cc} \mathbf{\Delta}_{C}\mathbf{\bar{X}} & \mathbf{\bar{X}}_{\cdot,-1} \end{array} \right) =\left( \begin{array}[c]{cc} \mathbf{f}\bar{\Gamma}+\mathbf{\bar{v}} & \mathbf{\bar{X}}_{\cdot,-1} ^{0}+\mathbf{Z}_{-1}\bar{\Gamma} \end{array} \right) .$ (26)

The estimator $ \tilde{\beta}_{Pool}$ is thus obtained by applying the pooled estimator to the residuals from a projection of the original data onto the cross-sectional averages of the lagged regressors and of the innovations in the regressors. The intuition behind this is that the cross-sectional averages of $ \Delta_{C}x_{i,t}$ and $ x_{i,t-1}$ are close to the innovations in the common factors, $ f_{t}$, and the common stochastic trend, $ z_{t-1}$, respectively (each scaled by the average loading $ \bar{\Gamma}$), since the cross-sectional averages of the cross-sectionally independent components, $ \mathbf{\bar{v}}$ and $ \mathbf{\bar{X}}_{\cdot,-1}^{0}$ in equation (26), may be expected to be close to zero. In practice, a panel estimate of $ C$ is needed to quasi-difference the data, but since this does not affect the asymptotic properties of the estimator, it is ignored in the analysis below.
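The construction in equations (24) and (25) can be sketched numerically. The following Python snippet is a minimal illustration for a single regressor ($ m=1$) that treats $ C$ as known. The function names `quasi_difference`, `annihilator`, and `defactored_pooled`, as well as the array layout (time along rows, cross-sectional units along columns), are assumptions made for this sketch and do not come from the paper.

```python
import numpy as np

def quasi_difference(x, C):
    """Return x_t - (1 + C/T) x_{t-1} for a (T+1, n) array of levels x_0..x_T."""
    T = x.shape[0] - 1
    return x[1:] - (1.0 + C / T) * x[:-1]

def annihilator(H):
    """M_H = I - H (H'H)^{-1} H', computed via the pseudo-inverse of H."""
    return np.eye(H.shape[0]) - H @ np.linalg.pinv(H)

def defactored_pooled(y, x, C):
    """Equation (24) for a scalar regressor.

    y : (T, n) array of y_{i,1..T}
    x : (T+1, n) array of x_{i,0..T}
    """
    x_lag = x[:-1]
    dx = quasi_difference(x, C)
    # H_bar stacks the cross-sectional means of the quasi-differences and lags
    H_bar = np.column_stack([dx.mean(axis=1), x_lag.mean(axis=1)])
    M = annihilator(H_bar)
    Mx, My = M @ x_lag, M @ y
    # M is symmetric and idempotent, so x'My = (Mx)'(My), summed over units
    return np.sum(Mx * My) / np.sum(Mx * Mx)
```

Because $ \mathbf{M}_{\mathbf{\bar{H}}}$ is the same for every unit, it is formed once and applied to the whole $ T\times n$ data matrix rather than unit by unit.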

To better understand how the above estimator works, let $ \mathbf{Q}=\left( \begin{array}[c]{cc} \mathbf{f}\Gamma & \mathbf{Z}_{-1}\Gamma \end{array} \right) $ and $ \mathbf{M}_{\mathbf{Q}}=\mathbf{I}-\mathbf{Q}\left( \mathbf{Q}^{\prime}\mathbf{Q}\right) ^{-1}\mathbf{Q}^{\prime}$. Also, let $ \mathbf{G}=\left( \begin{array}[c]{cc} \mathbf{f} & \mathbf{Z}_{-1} \end{array} \right) $ and $ \mathbf{M}_{\mathbf{G}}=\mathbf{I}-\mathbf{G}\left( \mathbf{G}^{\prime}\mathbf{G}\right) ^{-1}\mathbf{G}^{\prime}$. Observe that

$\displaystyle \sqrt{n}T\left( \tilde{\beta}_{Pool}-\beta\right) =\left( \frac{1}{n} \sum_{i=1}^{n}\frac{\mathbf{X}_{i,-1}^{\prime}\mathbf{M}_{\mathbf{\bar{H}} }\mathbf{X}_{i,-1}}{T^{2}}\right) ^{-1}\left[ \left( \frac{1}{\sqrt{n}} \sum_{i=1}^{n}\frac{\mathbf{X}_{i,-1}^{\prime}\mathbf{M}_{\mathbf{\bar{H}} }\mathbf{f}\gamma_{i}}{T}\right) +\left( \frac{1}{\sqrt{n}}\sum_{i=1} ^{n}\frac{\mathbf{X}_{i,-1}^{\prime}\mathbf{M}_{\mathbf{\bar{H}}} \mathbf{u}_{i}}{T}\right) \right] .$ (27)

Since $ \mathbf{f\subset G}$, $ \mathbf{M}_{\mathbf{G}}\mathbf{f=0}$, and under the rank condition on $ \Gamma$ in Assumption 3, it follows that $ \mathbf{M}_{\mathbf{Q}}\mathbf{f}=\mathbf{M}_{\mathbf{G}}\mathbf{f}$ . Also, since $ \mathbf{Z}_{-1}\mathbf{\subset G}$, $ \mathbf{M}_{\mathbf{G}} \mathbf{X}_{i,-1}=\mathbf{X}_{i,-1}^{0}$ . Thus, to the extent that $ \mathbf{M}_{\mathbf{\bar{H}}}$ is close to $ \mathbf{M}_{\mathbf{Q}}$, the projection onto the complement of the cross-sectional means will remove most of the effects of the common factors in the data. The following theorem states this formally.
Theorem 4   Under Assumptions 1-3, with $ \alpha_{i}\equiv0$, as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$, with $ n/T\rightarrow0$,
$\displaystyle \sqrt{n}T\left( \tilde{\beta}_{Pool}-\beta\right) \Rightarrow MN\left( 0,\Omega_{xx}^{-1}E\left[ \left. \left( \int_{0}^{1}J_{i\cdot f\Gamma }dB_{1,i}\right) \left( \int_{0}^{1}J_{i\cdot f\Gamma}dB_{1,i}\right) ^{\prime}\right\vert \mathcal{C}\right] \Omega_{xx}^{-1}\right) ,$ (28)

where $ J_{i\cdot f\Gamma}=J_{i}-\left( \int_{0}^{1}J_{i}J_{f}^{\prime} \Gamma\right) \left( \Gamma^{\prime}\int_{0}^{1}J_{f}J_{f}^{\prime} \Gamma\right) ^{-1}\left( \int_{0}^{1}\Gamma^{\prime}J_{f}\right) $ is the residual from the orthogonal projection of $ J_{i}$ onto $ \Gamma^{\prime}J_{f}$.

The estimator $ \tilde{\beta}_{Pool}$ thus achieves a $ \sqrt{n}T-$convergence rate and an asymptotic mixed normal distribution. The mixed normality arises from the common factors: the limiting variance in equation (28) is conditional on $ \mathcal{C}$ and is therefore random, whereas the limit is normal in the case above with no common factors. A similar result is noted in Jin (2004), and Andrews (2005) provides an extensive discussion of convergence with common shocks. This theorem also extends the results for stationary data in Pesaran (2006) to the nearly integrated case, although using a somewhat different definition of the $ \bar{H}_{t}$ matrix. Allowing for fixed effects in the arguments, it is easy to show the following result.

Corollary 1   Let
$\displaystyle \tilde{\beta}_{FE}=\left( \sum_{i=1}^{n}\underline{\mathbf{X}}_{i,-1} ^{\prime}\mathbf{M}_{\mathbf{\bar{H}}}\underline{\mathbf{X}}_{i,-1}\right) ^{-1}\left( \sum_{i=1}^{n}\underline{\mathbf{X}}_{i,-1}^{\prime} \mathbf{M}_{\mathbf{\bar{H}}}\underline{\mathbf{Y}}_{i}\right) ,$ (29)

and
$\displaystyle \tilde{\beta}_{FE}^{+}=\left( \sum_{i=1}^{n}\underline{\mathbf{X}} _{i,-1}^{\prime}\mathbf{M}_{\mathbf{\bar{H}}}\underline{\mathbf{X}} _{i,-1}\right) ^{-1}\left( \sum_{i=1}^{n}\underline{\mathbf{X}} _{i,-1}^{\prime}\mathbf{M}_{\mathbf{\bar{H}}}\underline{\mathbf{Y}} _{i}-nT\left( \int_{0}^{1}\int_{0}^{r}e^{\left( r-s\right) \hat{C} }dsdr\right) \hat{\omega}_{21}\right)$ (30)

where $ \underline{\mathbf{X}}_{i,-1}$ and $ \underline {\mathbf{Y}}_{i}$ represent the time-series demeaned data. Then, under Assumptions 1-3, as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$, with $ n/T\rightarrow0$,
$\displaystyle T\left( \tilde{\beta}_{FE}-\beta\right) \rightarrow_{p}-\underline{\Omega }_{xx}^{-1}\left( \int_{0}^{1}\int_{0}^{r}e^{\left( r-s\right) C}dsdr\right) \omega_{21},$ (31)

and
$\displaystyle \sqrt{n}T\left( \tilde{\beta}_{FE}^{+}-\beta\right) \Rightarrow MN\left( 0,\underline{\Omega}_{xx}^{-1}E\left[ \left. \left( \int_{0}^{1} \underline{J}_{i\cdot f\Gamma}dB_{1,i}\right) \left( \int_{0}^{1} \underline{J}_{i\cdot f\Gamma}dB_{1,i}\right) ^{\prime}\right\vert \mathcal{C}\right] \underline{\Omega}_{xx}^{-1}\right) .$ (32)

The fixed effects transformation thus has an identical bias effect on the estimator that controls for common factors, and the bias can be corrected in an identical manner. Similarly, it could also be shown that the estimator based on recursive demeaning of the defactored data is asymptotically mixed normally distributed, although these results are omitted here. $ \tilde{\beta}_{Pool}$ and $ \tilde{\beta}_{FE}^{+}$ thus provide pooled estimators for predictive regressions that are asymptotically mixed normally distributed in the presence of common factors and with the allowance for fixed effects. Standard $ t-$tests and Wald-tests can therefore be used; the variance-covariance matrix of $ \tilde{\beta}_{FE}^{+}$ or $ \tilde{\beta}_{Pool}$ can be estimated in a manner analogous to that described above for the case with no common factors, by simply using the defactored data. The practical implementation of $ \tilde{\beta}_{FE}^{+}$, or $ \tilde{\beta}_{Pool}$, is thus very simple: premultiply the data by $ \mathbf{M}_{\mathbf{\bar{H}}}$, and use the resulting variables in the original procedures for $ \hat{\beta}_{FE}^{+}$ and $ \hat{\beta}_{Pool}$. Note that this also automatically facilitates the correct estimation of $ \omega_{21}$, which now represents the correlation between the cross-sectionally independent errors, $ u_{i,t}$ and $ v_{i,t}$, rather than the correlation between the total innovation processes in the regressors and regressand.
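As an illustration of the defactoring step with fixed effects, the following Python sketch implements equation (29) literally for a single regressor: the data are first demeaned over time and then premultiplied by $ \mathbf{M}_{\mathbf{\bar{H}}}$. The function name `defactored_fe` and the array layout are assumptions made for this sketch. The bias correction in equation (30) is deliberately left out, since it requires the estimates $ \hat{C}$ and $ \hat{\omega}_{21}$.

```python
import numpy as np

def defactored_fe(y, x_lag, H_bar):
    """Equation (29) for a scalar regressor (no bias correction).

    y, x_lag : (T, n) arrays of y_{i,t} and x_{i,t-1}
    H_bar    : (T, k) array of cross-sectional averages, as in equation (25)
    """
    # Time-series demeaning, unit by unit (the underlined variables)
    y_dm = y - y.mean(axis=0)
    x_dm = x_lag - x_lag.mean(axis=0)
    # Projection onto the orthogonal complement of the columns of H_bar
    M = np.eye(H_bar.shape[0]) - H_bar @ np.linalg.pinv(H_bar)
    Mx, My = M @ x_dm, M @ y_dm
    return np.sum(Mx * My) / np.sum(Mx * Mx)
```

When the innovations $ u_{i,t}$ and $ v_{i,t}$ are correlated, this uncorrected estimator inherits the second-order bias in equation (31), so the sketch should be read as a building block rather than a final estimator.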

4.3 More general factor structures

The key assumption in deriving the results in Theorem 4 was the rank condition in Assumption 3. It essentially allows for the whole factor structure to be revealed through the innovations in the regressor variables alone. While this is a convenient assumption, since it does not require first stage estimates of the innovations in the regressand, it is also potentially limiting since it effectively restricts the factors in the dependent variable to be a subset of the factors in the regressors. However, it is easy to show that additional factors in the regressand do not fundamentally alter the above results.

Consider the following model, which is a generalization of the original model in equations (1) and (2),

$\displaystyle y_{i,t}$ $\displaystyle =\alpha_{i}+\beta^{\prime}x_{i,t-1}+\gamma_{i}^{\prime}f_{t} +\eta_{i}^{\prime}g_{t}+u_{i,t},$ (33)
$\displaystyle x_{i,t}$ $\displaystyle =Ax_{i,t-1}+\Gamma_{i}^{\prime}f_{t}+v_{i,t},$ (34)

where $ g_{t}$ is a $ l\times1$ vector of additional common factors that only appear in the dependent variable, and $ \eta_{i}$ are the corresponding factor loadings. Assume that $ g_{t}$ and $ \eta_{i}$ are independent of all other random elements in the model and satisfy the same conditions as $ f_{t}$ and $ \gamma_{i}$.
Corollary 2   Suppose the data is generated according to equations (33) and (34) with $ \alpha_{i}\equiv0$, and that Assumptions 1-3 hold. Then, as $ \left( T,n\rightarrow\infty\right) _{\operatorname{seq}}$, with $ n/T\rightarrow0$,
$\displaystyle \sqrt{n}T\left( \tilde{\beta}_{Pool}-\beta\right) \Rightarrow MN\left( 0,\Omega_{xx}^{-1}E\left[ \left. \left( \int_{0}^{1}J_{i\cdot f\Gamma }\left( dB_{1,i}+\eta^{\prime}dB_{g}\right) \right) \left( \int_{0} ^{1}J_{i\cdot f\Gamma}\left( dB_{1,i}+\eta^{\prime}dB_{g}\right) \right) ^{\prime}\right\vert \mathcal{C}\right] \Omega_{xx}^{-1}\right) ,$ (35)

where $ B_{g}$ is the Brownian motion such that $ \frac{1}{\sqrt{T}}\sum _{t=1}^{\left[ Tr\right] }g_{t}\Rightarrow B_{g}\left( r\right) $ as $ T\rightarrow\infty$, and $ \mathcal{C}$ now represents the $ \sigma-$field generated by $ \left\{ f_{t},g_{t}\right\} _{t=1}^{\infty}$.

The limiting distribution changes but remains mixed normal and inference can be performed in an identical manner using standard test-statistics. The procedures derived in the previous subsection are thus robust to additional factors in the dependent variable.

5 Finite sample evidence

5.1 No cross-sectional dependence

To evaluate the small sample properties of the panel data estimators proposed in this paper, a Monte Carlo study is performed. In the first experiment, the properties of the point estimates are considered. Equations (1) and (2) are simulated for the case with a single regressor. The innovations $ \left( u_{i,t},v_{i,t}\right) $ are drawn from normal distributions with mean zero, unit variance, and correlations $ \delta =0,-0.4,-0.7,$ and $ -0.95$; there is no cross-sectional dependence. The slope parameter $ \beta$ is set equal to $ 0.05$ and the local-to-unity parameter $ C$ is set to $ -10$. The sample size is given by $ T=100,$ $ n=20$. The small value of $ \beta$ is chosen in order to reflect the fact that most forecasting regressions are used to test a null of $ \beta=0$, and any plausible alternative is often close to zero. The intercepts $ \alpha_{i}$ are set equal to zero but individual effects are still fitted for all the estimators, except the standard pooled one, in order to evaluate the second-order bias effects arising from demeaning. All results are based on 10,000 repetitions.
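The simulation design just described can be sketched as follows. This Python snippet is an illustrative reconstruction, not the code used for the paper: the function names, the initial condition $ x_{i,0}=0$, and the way the correlated innovations are generated (a linear combination of independent normals) are assumptions made for the sketch. The autoregressive root is set to $ 1+C/T$, as implied by the quasi-differencing operator.

```python
import numpy as np

def simulate_panel(T=100, n=20, beta=0.05, C=-10.0, delta=-0.95, rng=None):
    """Simulate equations (1) and (2) with one regressor and alpha_i = 0."""
    rng = np.random.default_rng() if rng is None else rng
    rho = 1.0 + C / T                      # local-to-unity root
    u = rng.standard_normal((T, n))
    e = rng.standard_normal((T, n))
    v = delta * u + np.sqrt(1.0 - delta**2) * e   # unit variance, corr(u, v) = delta
    x = np.zeros((T + 1, n))               # assumed initial condition x_{i,0} = 0
    for t in range(T):
        x[t + 1] = rho * x[t] + v[t]
    y = beta * x[:-1] + u
    return y, x

def pooled_estimate(y, x_lag):
    """Pooled OLS without individual intercepts."""
    return np.sum(x_lag * y) / np.sum(x_lag * x_lag)

def fe_estimate(y, x_lag):
    """Pooled OLS after standard time-series demeaning (fixed effects)."""
    y_dm = y - y.mean(axis=0)
    x_dm = x_lag - x_lag.mean(axis=0)
    return np.sum(x_dm * y_dm) / np.sum(x_dm * x_dm)
```

Averaging `fe_estimate` and `pooled_estimate` over repeated draws of `simulate_panel` with a large negative `delta` gives a quick check of the second-order bias induced by demeaning.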

Four different estimators are considered: the pooled estimator with no fixed effects, $ \hat{\beta}_{Pool}$, the fixed effects estimator using standard demeaning, $ \hat{\beta}_{FE}$, the recursively demeaned pooled estimator, $ \hat{\beta}_{RD}\,$, and the bias corrected estimator $ \hat{\beta}_{FE}^{+}$. The bias correction in the estimator $ \hat{\beta}_{FE}^{+}$ is estimated by $ -\left( \int_{0}^{1}\int_{0}^{r}e^{\left( r-s\right) \hat{C}}dsdr\right) \hat{\omega}_{21}$ , where $ \hat{C}$ is the panel estimate of the local-to-unity parameter and $ \hat{\omega}_{21}$ is estimated as $ n^{-1} \sum_{i=1}^{n}\hat{\omega}_{21i}$ with $ \hat{\omega}_{21i}$ the covariance between the residuals from a time-series estimation of equation (1) and the quasi-differenced regressors, $ \Delta_{\hat{C}}x_{i,t}$. In general, the standard pooled estimator does not work well when the $ \alpha_{i}$ differ across $ i$, but it is included here for comparison.
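For a scalar $ C$, the double integral in the bias term has the closed form $ \left( e^{C}-1-C\right) /C^{2}$, which tends to $ 1/2$ as $ C\rightarrow0$. The Python sketch below uses this closed form and computes $ \hat{\omega}_{21}$ as described above. The function names are hypothetical, $ \hat{C}$ is taken as given, and fitting an intercept in the unit-by-unit time-series regression is an assumption based on the form of equation (1).

```python
import numpy as np

def double_integral(C):
    """int_0^1 int_0^r exp((r - s) C) ds dr for scalar C."""
    if abs(C) < 1e-6:
        return 0.5 + C / 6.0               # leading terms of the series at C = 0
    return (np.expm1(C) - C) / C**2        # (e^C - 1 - C) / C^2

def omega21_hat(y, x, C_hat):
    """Average over units of cov(time-series residual, quasi-differenced x).

    y : (T, n) array of y_{i,1..T};  x : (T+1, n) array of x_{i,0..T}
    """
    T, n = y.shape
    dx = x[1:] - (1.0 + C_hat / T) * x[:-1]
    covs = []
    for i in range(n):
        Z = np.column_stack([np.ones(T), x[:-1, i]])   # intercept and lagged regressor
        coef = np.linalg.lstsq(Z, y[:, i], rcond=None)[0]
        resid = y[:, i] - Z @ coef
        covs.append(np.mean((resid - resid.mean()) * (dx[:, i] - dx[:, i].mean())))
    return float(np.mean(covs))

def bias_term(C_hat, omega21):
    """Estimated bias term: -(double integral at C_hat) * omega21."""
    return -double_integral(C_hat) * omega21
```

With $ C=-10$ the integral is approximately $ 0.09$, so a negative $ \hat{\omega}_{21}$ yields a positive bias term.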

The results are shown in Figure 1. All estimators, except $ \hat{\beta}_{FE}$, are virtually unbiased. The estimator $ \hat{\beta}_{FE}$, which uses standard demeaning to account for individual effects, exhibits a rather substantial bias when the absolute value of the correlation $ \delta$ is large. The recursively demeaned estimator, $ \hat{\beta}_{RD}$, suffers from a lack of efficiency, but it is well centered around the true value.

The second part of the Monte Carlo study concerns the size and power of the pooled $ t-$tests. The same setup as above is used but, in order to calculate the power of the tests, the slope coefficient $ \beta$ now varies between $ -0.05$ and $ 0.05$. Figure 2 shows the average rejection rates of the $ 5$ percent two-sided $ t-$tests, evaluating a null of $ \beta=0$; that is, the power curves of the tests. Panel A in Table 1 shows the average sizes of the nominal $ 5$ percent tests under the null hypothesis of $ \beta=0$ for the two-sided $ t-$tests corresponding to the four different estimators considered above. Again, the results are based on 10,000 repetitions.

Apart from the test based on the standard fixed effects estimator, all tests perform well in terms of size, although they all tend to over-reject the null hypothesis somewhat. Table 1 and the power curves in Figure 2 clearly show the effects of the second order bias in the fixed effects estimator. The three other tests all exhibit decent power properties, although the test based on $ \hat{\beta}_{RD}$ has lower power than the test based on the bias corrected estimator.

In summary, the simulation evidence shows the importance of controlling for the second order bias arising from fitting individual intercepts in the pooled regression; the estimator based on recursive demeaning appears to do well and results in test-statistics with correct size and decent power properties. The bias correction of the fixed effects estimator also appears to work well, producing nearly unbiased results and correctly sized tests with good power. Overall, the simulations confirm the analytical results previously derived.

5.2 Common factors

In this section, I repeat the Monte Carlo experiments above, with the exception that there is now a common factor in the innovations. In particular, equations (1) and (2) are now simulated with a single regressor and a single common factor $ f_{t}$, drawn from a standard normal distribution. The factor loadings, $ \gamma_{i}$ and $ \Gamma_{i}$, are also normally distributed with means of minus one and plus one, respectively, and standard deviations equal to $ 2^{-1/2}$ in both cases. The innovations in the returns and regressor processes are formed as $ 2^{-1/2}\left( \gamma _{i}^{\prime}f_{t}+u_{i,t}\right) $ and $ 2^{-1/2}\left( \Gamma_{i}^{\prime }f_{t}+v_{i,t}\right) $, respectively, where $ \left( u_{i,t},v_{i,t}\right) $ are drawn from standard normal distributions; the scaling by $ 2^{-1/2}$ is performed in order to achieve an approximate unit variance in the innovations which enables easier comparison with the cross-sectionally independent case. As before, the correlation between $ u_{i,t}$ and $ v_{i,t}$ is set to $ \delta =0,-0.4,-0.7,$ and $ -0.95$. Note that $ \delta$ no longer represents the overall correlation between the innovations, but rather that between the cross-sectionally independent parts of the innovations. In addition, I allow $ \alpha_{i}$ to vary across $ i$ according to a normal distribution with mean and standard deviation equal to $ 0.05$. Otherwise, the setup is identical to that used in the case with no common factors. Again, all results are based on $ 10,000$ repetitions.
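The data generating process with a common factor can be sketched as below. This Python snippet is an illustrative reconstruction under stated assumptions: the function name `simulate_factor_panel`, the initial condition $ x_{i,0}=0$, and the construction of the correlated pair $ \left( u_{i,t},v_{i,t}\right) $ from independent normals are choices made for the sketch rather than details taken from the paper.

```python
import numpy as np

def simulate_factor_panel(T=100, n=20, beta=0.05, C=-10.0, delta=-0.95, rng=None):
    """Simulate the panel with a single common factor in both innovations."""
    rng = np.random.default_rng() if rng is None else rng
    rho = 1.0 + C / T
    f = rng.standard_normal(T)                       # common factor f_t
    gamma = rng.normal(-1.0, np.sqrt(0.5), n)        # loadings in the regressand
    Gamma = rng.normal(1.0, np.sqrt(0.5), n)         # loadings in the regressor
    alpha = rng.normal(0.05, 0.05, n)                # individual intercepts
    u = rng.standard_normal((T, n))
    e = rng.standard_normal((T, n))
    v = delta * u + np.sqrt(1.0 - delta**2) * e      # corr(u, v) = delta
    # Total innovations, scaled by 2^{-1/2} for roughly unit variance
    eps_y = (gamma * f[:, None] + u) / np.sqrt(2.0)
    eps_x = (Gamma * f[:, None] + v) / np.sqrt(2.0)
    x = np.zeros((T + 1, n))                         # assumed x_{i,0} = 0
    for t in range(T):
        x[t + 1] = rho * x[t] + eps_x[t]
    y = alpha + beta * x[:-1] + eps_y
    return y, x
```

The cross-sectional dependence shows up directly in the cross-sectional mean of the quasi-differenced regressor, whose variance does not shrink towards zero as $ n$ grows.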

The results are shown in Panels B and C of Table 1 and in Figures 3-6. Panel B in Table 1 and Figures 3 and 4 show the outcomes of the Monte Carlo experiments when the model generated with common factors is estimated using the standard estimators $ \hat{\beta}_{FE},\hat{\beta}_{RD}$, and $ \hat{\beta}_{FE}^{+}$, which do not control for cross-sectional dependence; since $ \alpha_{i}$ now varies across the panel, the standard pooled estimator without fixed effects is not considered. Figure 3 shows that $ \hat{\beta}_{RD}$ and $ \hat{\beta}_{FE}^{+}$ are still fairly unbiased, in accordance with the asymptotic result in Theorem 3, but they are much more variable than in the case with no common factors and exhibit a non-normal distribution. The real downside from not controlling for the cross-sectional dependence is seen in the size of the $ t-$tests displayed in Panel B of Table 1 and in the power curves in Figure 4. It is clear that when the common factors are ignored in the estimation process, the actual size of the corresponding $ t-$tests is very far from the nominal size of $ 5$ percent, with rejection rates between $ 30$ and $ 40$ percent under the null.

Panel C in Table 1 and Figures 5 and 6 show the same results for the estimators $ \tilde{\beta}_{FE},\tilde{\beta}_{RD}$, and