
Uncertainty Over Models and Data: The Rise and Fall of American Inflation

Seth Pruitt1

NOTE: International Finance Discussion Papers are preliminary materials circulated to stimulate discussion and critical comment. References in publications to International Finance Discussion Papers (other than an acknowledgment that the writer has had access to unpublished material) should be cleared with the author or authors. Recent IFDPs are available on the Web at http://www.federalreserve.gov/pubs/ifdp/. This paper can be downloaded without charge from the Social Science Research Network electronic library at http://www.ssrn.com/.


Abstract:

An economic agent who is uncertain of her model updates her beliefs in response to the data. The updating is sensitive to measurement error which, in many cases of macroeconomic interest, is apparent from the process of data revision. I make this point through simple illustrations and then analyze a recent model of the Federal Reserve's role in U.S. inflation. The existing model succeeds at fitting inflation to optimal policy, but fails to link inflation to the economic trade-off at the heart of the story. I modify the model to account for data uncertainty and find that doing so ameliorates the existing problems. This suggests that the Fed's model uncertainty is largely overestimated when data uncertainty is ignored. Consequently, there is now an explanation for the rise and fall in inflation: the concurrent rise and fall in the perceived Phillips curve trade-off.

Keywords: Data uncertainty, data revisions, real-time data, optimal control, parameter uncertainty, learning, Extended Kalman filter, Markov chain Monte Carlo

JEL classification: E01, E58


1.  Introduction

A great deal of research has gone towards identifying the causes of the large swings in inflation in the United States between 1970 and 1985. One strand of literature advances the view that evolving government responses had an important role in these events. Clarida, Gali, and Gertler (2000) provides evidence of different U.S. monetary policy responses over different parts of the postwar era. Boivin (2006) elaborates on this finding using a time-varying Taylor rule estimated on real-time data, as Primiceri (2005) does in a DSGE model using revised data, characterizing the response without explaining its evolution. Romer and Romer (1990) and Owyang and Ramey (2004) suggest that the changing response might be explained by changing Fed objectives. On the other hand, Sargent (1999) suggests that Federal Reserve beliefs alone, evolving in response to data, can explain the rise and fall of inflation.2

This last explanation relies on the idea that agents learn about their economic environment. In such a model, agents' prediction errors update their beliefs, represented as the parameters to their model of their world. Obviously, these prediction errors are based on the observed data. Orphanides (2001) notes that researchers should carefully attribute to agents the data actually observable at the time of decision-making.3 Because some data is revised, real-time data available to the agent may be different than the revised data available to the researcher. The point of my paper is that agents' data uncertainty, resulting from their knowledge that real-time data is revised, affects the evolution of their beliefs because they know it can affect their prediction errors. Therefore researchers miscalculate the volatility of these beliefs, with a tendency towards overestimation, by ignoring observable data uncertainty.

I explain this point through simple frameworks that have two main ingredients. First, the agent has model uncertainty represented as a parameter shock or error that explicitly allows the agent to change their beliefs in response to observed data. Second, the agent has data uncertainty represented as an error that causes data to be mismeasured. The conclusion from this analysis is that the researcher's estimate of the agent's model uncertainty is biased by ignoring existing data uncertainty, and that this bias is positive in many simple cases and more complicated cases subject to empirically-relevant restrictions.

I then take an example from the aforementioned literature in order to understand the effect of this bias in practice. As application I use the framework in Sargent, Williams, and Zha (2006), which follows the main idea of Sargent (1999). The Federal Reserve optimally controls inflation in light of unchanging unemployment and inflation targets; however, the Fed is uncertain of its economic model and thus lets its beliefs evolve in response to new data.4 Hence, Sargent, Williams, and Zha (2006) aim to explain the great American inflation as optimal policy given changing estimates of the Phillips curve. However, the model suffers from three key problems. First, the Fed's unemployment rate forecasts, the basis for setting inflation, are very inaccurate, much more so than the Greenbook forecasts they should mimic. Second, the Fed's estimated model uncertainty is very large, which undermines the plausibility that the Fed believed in its estimated Phillips curve enough to use it as the basis for policy. Third, the model explains the rise in inflation between 1973 and 1975, but does not give a good reason for the drastic fall in inflation between 1980 and 1984.

I adapt the Fed's estimation problem to account for the observable fact that macroeconomic data is revised.5 By accounting for data uncertainty, on which we have actual evidence, the model explains inflation without the aforementioned problems. The Fed is confident of its model and makes accurate unemployment forecasts that resemble Greenbook forecasts. Importantly, the model now predicts a sharp drop in the Phillips trade-off between 1980 and 1984, leading to the concomitant drastic drop in inflation.

The economic difference from Sargent, Williams, and Zha (2006) is that here the Fed is sluggish to change its beliefs in response to real-time data because this data might be revised. Since at least as far back as Zellner (1958) - who admonished readers to be "careful" with "provisional" data - the reality of data uncertainty has been clearly recognized.6 What has been less clear is the impact this data uncertainty may hold for the purposes of modeling. My paper shows that data uncertainty can have a large effect in models where agents change their beliefs in response to incoming data.

In addition to the aforementioned studies on the great inflation, my paper joins literature highlighting the impact of real-time data on economic behavior. This is made possible by the pioneering work of Croushore and Stark (2001) on the Philadelphia Fed's Real-Time Data Set for Macroeconomists, and the St. Louis Fed's ensuing ArchivaL Federal Reserve Economic Data (ALFRED) collection. In early work, Oh and Waldman (1990) used data revisions to identify the effects of real-time macro announcements on future economic activity, highlighting how data mismeasurement itself can influence agents' behavior. Ghysels, Swanson, and Callan (2002) extended this idea by using multiple data vintages to fit policy rules, some of which are adaptively estimated, to forecast the Fed Funds rate. These papers suggest that agents' behavior is connected to the real-time data they actually saw when making decisions. Within a DSGE framework, Aruoba (2004) analyzed the welfare consequences of a one-period signal extraction problem motivated by data uncertainty. Many recent papers by central bank researchers are evidence that macroeconomic data revisions remain significant to policy decisions even today; see Cunningham, Jeffery, Kapetanios, and Labhard (2007) and references therein.

Explicitly modeling data uncertainty is similar in spirit to the macroeconomic learning literature, where the main idea is that agents are themselves econometricians. Whereas much of that literature focuses on the problem of learning parameters - for instance see Evans and Honkapohja (2001) or Orphanides and Williams (2006) - I focus attention on the problem of learning the true economic state measured by imperfect real-time data. In addressing the problematic implications in Sargent, Williams, and Zha (2006), I join concurrent work in Carboni and Ellison (2007) and Carboni and Ellison (2008). The first paper changes the Fed's objective to include the goal of accurate unemployment forecasts, resulting in a new policy rule sensitive to the accuracy of parameter estimates; the second paper changes the unemployment forecast estimation to target Greenbook forecasts. In particular, the second paper achieves success at addressing the problems regarding unemployment rate forecasts and large model uncertainty.7 However, neither paper addresses the missing explanation for the rise and fall of inflation.

The paper is organized as follows. Section 2 provides a simple explanation of the impact of data uncertainty when there exists model uncertainty. Section 3 evaluates the impact of data uncertainty in practice by applying it to the Sargent, Williams, and Zha (2006) framework. Section 4 presents the estimation results of both the model without data uncertainty and the model including data uncertainty; it is apparent that the latter remedies the problems of the former, and Section 5 explains why. Section 6 concludes.


2.  Why Data Uncertainty Matters

Suppose an agent forms a forecast of an economic quantity $ y$. The agent's model maps observed data and a parameter vector into this prediction. Model uncertainty is the situation where realized data and past prediction errors, filtered through the agent's economic model, may change the parameter vector going forward. Data uncertainty is the situation where data is measured with error that is perhaps, but not necessarily, observable after the fact. The main question here is: if the agent is uncertain of both the model and the data, what is the impact of the researcher ignoring data uncertainty?

To answer this, I consider a few instances of prediction error decomposition, which is the machinery through which agents change their models. The intuition is straightforward: when the agent's prediction is correct, there is nothing to change about her beliefs.8 When such is the case, the agent has nothing to learn since her model's performance cannot be improved.

When the prediction error is nonzero, the agent has incentive to evaluate her model and learn from the error. At this point, the agent needs to understand why the prediction error is nonzero: is it due to unpredictable additive shocks, to mismeasured data, or to the model (the parameters)? We will see that data uncertainty is not negligible when the economic agent is uncertain of her own model.

In the following, $ \epsilon_{i}$ is a shock on the $ i$ parameter, $ \varepsilon_{j}$ is an error on the $ j$ data element, and it is assumed that

$\displaystyle \mathbb{E}\left( \epsilon_{i}\right) =\mathbb{E}\left( \varepsilon _{j}\right) =0 \: , \mathbb{V}\mathrm{ar}\left( \epsilon_{i}\right) =\sigma_{i}^{2}>0 \: , \mathbb{V}\mathrm{ar}\left( \varepsilon_{j}\right) =\sigma_{j}^{2}>0 \: , \: \mathbb{E}\left( \epsilon_{i} \varepsilon _{j}\right) =0 \quad\forall i,j $

I assume that the data and parameter predictions are almost surely nonzero. The parameter shock can be interpreted as actual variation in the parameters, as considered by, for instance, Cooley and Prescott (1976), Primiceri (2005), and Sargent, Williams, and Zha (2006). Alternatively, the shock can be interpreted as a model error allowed for in each period by the agent, just as an empirical model used by the researcher includes an error. I will consider linear predictors calculated under mean squared error loss.

One parameter on mismeasured data

Suppose that the agent believes that

$\displaystyle y = \left( b+\epsilon_{b}\right) \left( x+\varepsilon_{x}\right) $

We call $ b$ the parameter prediction and $ x$ the data prediction. The agent's prediction of $ y$ is

$\displaystyle \hat{y} = bx $

and her forecast error $ \left( y-\hat{y}\right) $ is

$\displaystyle b\varepsilon_{x} + x\epsilon_{b} + \epsilon_{b}\varepsilon_{x} $

When the forecast error is zero the agent trivially decomposes this to imply that both the parameter shock and the data error are zero because their unconditional means are zero and there is no more than this information. When the forecast error is nonzero, the decomposition is nontrivial but simple. The prediction of each shock's contribution depends on the size of the forecast error and the average size of the shocks (supposed by the agent). This decomposition is obtained from the forecast error by noting

$\displaystyle \mathbb{V}\mathrm{ar}\left( y-\hat{y}\right) = b^{2}\sigma_{x}^{2} + x^{2}\sigma_{b}^{2} \implies\left\{ \begin{array}[c]{ccc} \hat{\epsilon}_{b} & = & \frac{x\sigma_{b}^{2}}{x^{2}\sigma_{b}^{2}+b^{2}\sigma_{x}^{2}}\left( y-\hat{y}\right) \\ \hat{\varepsilon}_{x} & = & \frac{b\sigma_{x}^{2}}{x^{2}\sigma_{b}^{2}+b^{2}\sigma_{x}^{2}}\left( y-\hat{y}\right) \end{array} \right\}$

which are well-known linear prediction equations.
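These projection weights can be checked by simulation. A minimal sketch with illustrative parameter values (not estimates from the paper): the linear prediction of $ \epsilon_{b}$ given the forecast error has population coefficient $ x\sigma_{b}^{2}/(x^{2}\sigma_{b}^{2}+b^{2}\sigma_{x}^{2})$, up to the second-order contribution of the $ \epsilon_{b}\varepsilon_{x}$ term.

```python
import numpy as np

# Monte Carlo sketch of the projection weights (illustrative values only).
rng = np.random.default_rng(0)
b, x = 2.0, 3.0                      # parameter and data predictions
sigma_b, sigma_x = 0.5, 0.4          # shock and data-error standard deviations
n = 200_000

eps_b = rng.normal(0.0, sigma_b, n)  # parameter shock
eps_x = rng.normal(0.0, sigma_x, n)  # data error
err = b * eps_x + x * eps_b + eps_b * eps_x   # forecast error y - y_hat

# Population projection coefficient of eps_b on the forecast error,
# ignoring the second-order eps_b*eps_x contribution to Var(err):
coef_theory = x * sigma_b**2 / (x**2 * sigma_b**2 + b**2 * sigma_x**2)
coef_mc = np.cov(eps_b, err)[0, 1] / np.var(err)
```

The simulated coefficient differs from `coef_theory` only by sampling noise and the small $ \sigma_{b}^{2}\sigma_{x}^{2}$ term in the error variance.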

Suppose the researcher thinks that $ \varepsilon_{x}=0$ almost surely. This implies that $ \sigma_{x}^{2}=0$ and therefore the researcher deduces from the linear prediction equations an estimator $ \tilde{\sigma}_{b}^{2}$ of how large on average the agent regards the parameter shock to be

$\displaystyle \tilde{\sigma}_{b}^{2} = \frac{\mathbb{V}\mathrm{ar}\left( y-\hat{y}\right) }{x^{2}}$

Said another way, the researcher would use the RHS of the above equation to uncover the $ \sigma_{b}^{2}$ supposedly believed by the agent.

However, since the agent holds data uncertainty, this estimator is actually

$\displaystyle \tilde{\sigma}_{b}^{2} = \sigma_{b}^{2} + \frac{b^{2}}{x^{2}}\sigma_{x}^{2} $

which means that

$\displaystyle \mathrm{Bias}\left( \tilde{\sigma}_{b}^{2}\right) = \frac{b^{2}}{x^{2}} \sigma_{x}^{2}$ (2.1)

Therefore the researcher's estimate of the agent's $ \sigma_{b}$, the shock size unit, is biased upwards. The bias depends on the researcher's data and parameter prediction and the actual variance of the data error.
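The upward bias in (2.1) can be illustrated with a short simulation; the numbers below are arbitrary illustrative values, not calibrated to anything in the paper.

```python
import numpy as np

# Illustrative simulation of (2.1): a researcher who ignores the data error
# attributes all forecast-error variance to the parameter shock.
rng = np.random.default_rng(1)
b, x = 2.0, 3.0
sigma_b, sigma_x = 0.5, 0.4          # true shock sizes (arbitrary values)
n = 500_000

eps_b = rng.normal(0.0, sigma_b, n)
eps_x = rng.normal(0.0, sigma_x, n)
err = (b + eps_b) * (x + eps_x) - b * x       # y - y_hat

sigma_b2_tilde = np.var(err) / x**2           # researcher's estimator
bias_theory = (b / x) ** 2 * sigma_x**2       # equation (2.1)
```

The simulated `sigma_b2_tilde` exceeds the true `sigma_b**2` by approximately `bias_theory`, plus the small second-order term from $ \epsilon_{b}\varepsilon_{x}$.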

Two parameters, one on mismeasured data

Suppose that the agent believes that

$\displaystyle y = \left( b+\epsilon_{b}\right) \left( x+\varepsilon_{x}\right) + \left( d+\epsilon_{d}\right) $

We call $ d$ a parameter prediction.9 The agent's prediction of $ y$ is again linear and the forecast error follows naturally.

Again, suppose the researcher thinks that $ \varepsilon_{x}=0$ almost surely. Let us suppose $ k\sigma_{b} =\sigma_{d} \: , \: k>0$ so $ \sigma_{b}$ is the shock size unit being measured. The researcher then calculates

$\displaystyle \tilde{\sigma}_{b}^{2} = \frac{\mathbb{V}\mathrm{ar}\left( y-\hat{y}\right) }{x^{2} + k^{2} + 2xk\rho_{bd}}$

where $ \rho_{bd}$ is the correlation between $ \epsilon_{b}$ and $ \epsilon_{d}$ perceived by the researcher.

However, actually

$\displaystyle \mathrm{Bias}\left( \tilde{\sigma}_{b} ^{2}\right) = \frac{b^{2}\sigma_{x}^{2}}{x^{2} + k^{2} + 2xk\rho_{bd}}$ (2.2)

The numerator is almost surely positive; so is the denominator, as the following shows.

To show:

$\displaystyle {x^{2} + k^{2} + 2xk\rho_{bd}} > 0 $

Proof: Suppose the contrary. Then $ x^{2}+k^{2} < -2xk\rho_{bd}$ and note both sides are positive. Squaring and dividing by $ x^{2}k^{2}$ gives $ \frac{x^{2}}{k^{2}} + 2 + \frac{k^{2}}{x^{2}} < 4\rho_{bd}^{2} $. WLOG, let $ x^{2}/k^{2}=1+\delta$ for $ \delta\geq0$; the left-hand side is then $ (2+\delta)^{2}/(1+\delta)$, so $ 1+\frac{\delta^{2}}{4(1+\delta)} < \rho_{bd}^{2} $. But $ \rho_{bd}\in[-1,1]$. $ \blacksquare$
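The claim can also be seen from the algebraic identity $ x^{2}+k^{2}+2xk\rho = (x+k\rho)^{2} + k^{2}(1-\rho^{2})$, a sum of squares. A quick numerical spot-check with illustrative random draws:

```python
import numpy as np

# Spot-check of x^2 + k^2 + 2*x*k*rho >= 0 via the sum-of-squares identity
# x^2 + k^2 + 2*x*k*rho = (x + k*rho)^2 + k^2*(1 - rho^2).
rng = np.random.default_rng(2)
x = rng.uniform(-5.0, 5.0, 10_000)
k = rng.uniform(0.01, 5.0, 10_000)      # k > 0 as assumed in the text
rho = rng.uniform(-1.0, 1.0, 10_000)

q = x**2 + k**2 + 2 * x * k * rho
q_identity = (x + k * rho) ** 2 + k**2 * (1 - rho**2)
```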

Therefore the researcher's estimate of the average size of the agent's parameter shocks is biased upwards.

One parameter on mismeasured data, mismeasured target

Suppose that the agent believes that

$\displaystyle y - \varepsilon_{y} = \left( b+\epsilon_{b}\right) \left( x+\varepsilon _{x}\right) $

The agent's prediction of $ y$ and forecast error follow naturally. If the researcher thinks $ \varepsilon_{y} = \varepsilon_{x} = 0$ almost surely, then

$\displaystyle \mathrm{Bias}\left( \tilde{\sigma}_{b}^{2}\right) = \frac{b^{2}\sigma _{x}^{2} + \sigma_{y}^{2} + 2b\rho_{xy}\sigma_{x}\sigma_{y}}{x^{2}}$ (2.3)

where $ \rho_{xy}$ is the true correlation between $ \varepsilon_{x}$ and $ \varepsilon_{y}$. The RHS is almost surely positive. The proof is the same as above, noting that one can write $ \sigma_{y}=k\sigma_{x}$ for some $ k>0$. Therefore the researcher's estimate of the shock size unit is biased upwards.

Two parameters on mismeasured data

Suppose that the agent believes that

$\displaystyle y = \left( b+\epsilon_{b}\right) \left( x+\varepsilon_{x}\right) + \left( c+\epsilon_{c}\right) \left( z+\varepsilon_{z}\right) $

Let $ k\sigma_{b}=\sigma_{c}$ and $ m\sigma_{x}=\sigma_{z}$ for $ k,m>0$ so $ \sigma_{b}$ is the shock size unit being measured. Then

$\displaystyle \mathrm{Bias}\left( \tilde{\sigma}_{b}^{2}\right) = \sigma_{x}^{2} \frac{b^{2} + c^{2}m^{2} + 2bcm\rho_{xz}}{x^{2} + k^{2}z^{2} + 2kxz\rho_{bc}}$ (2.4)

This bias is positive, following the proof above for both the numerator and denominator.

Three parameters, two on mismeasured data, mismeasured target

Suppose that the agent believes that

$\displaystyle y - \varepsilon_{y} = \left( b+\epsilon_{b}\right) \left( x+\varepsilon _{x}\right) + \left( c+\epsilon_{c}\right) \left( z+\varepsilon _{z}\right) + \left( d+\epsilon_{d}\right) $

The agent's prediction of $ y$ and forecast error follow naturally. For the researcher who thinks $ \varepsilon_{y}=\varepsilon_{x}=\varepsilon_{z}=0$ almost surely, let $ k\sigma_{b}=\sigma_{c}$, $ m\sigma_{b}=\sigma_{d}$, $ n\sigma_{x}=\sigma_{z}$, and $ p\sigma_{x}=\sigma_{y}$ for $ k,m,n,p>0$, so $ \sigma_{b}$ is the shock size unit being measured. Then

$\displaystyle \mathrm{Bias}\left( \tilde{\sigma}_{b}^{2}\right) = \sigma_{x}^{2} \frac{b^{2} + c^{2}n^{2} + p^{2} + 2bcn\rho_{xz} + 2bp\rho_{xy} + 2cnp\rho_{zy}}{x^{2} + z^{2}k^{2} + m^{2} + 2kxz\rho_{bc} + 2mx\rho_{bd} + 2kmz\rho_{cd} }$ (2.5)

Without further restrictions, this bias can be any real number.10 Anticipating the model application below, consider the following empirically-relevant restriction: let the data predictions be nonnegative, the revision errors be uncorrelated, and the parameter shocks be nonnegatively correlated. In this case the bias is positive.
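A quick numerical spot-check of that restriction, with illustrative random draws (setting $ \rho_{xz}=\rho_{xy}=\rho_{zy}=0$ for the uncorrelated revision errors):

```python
import numpy as np

# Spot-check of the restriction for (2.5): nonnegative data predictions
# (x, z >= 0), uncorrelated revision errors (rho_xz = rho_xy = rho_zy = 0),
# and nonnegatively correlated parameter shocks.  All draws are illustrative.
rng = np.random.default_rng(3)
size = 10_000
b, c, p = rng.uniform(-5.0, 5.0, (3, size))
x, z = rng.uniform(0.0, 5.0, (2, size))
k, m, n = rng.uniform(0.01, 5.0, (3, size))
rho_bc, rho_bd, rho_cd = rng.uniform(0.0, 1.0, (3, size))

num = b**2 + (c * n) ** 2 + p**2     # revision-error correlations set to zero
den = (x**2 + (z * k) ** 2 + m**2
       + 2 * k * x * z * rho_bc + 2 * m * x * rho_bd + 2 * k * m * z * rho_cd)
```

Under the stated restriction every term of the denominator is nonnegative and the numerator is a sum of squares, so the bias ratio is positive.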

Discussion

These simple examples capture the intuition that a researcher's estimate of an agent's model uncertainty is biased by ignoring data uncertainty. The bias is a function of the true variances of the parameter shocks and data errors, the true correlation between data errors, and the perceived correlation between parameter shocks. In the simplest cases, this bias is positive. In more complicated cases, the bias may be anything, but under a particular restriction given above the bias is positive. The point of this section is that data uncertainty cannot necessarily be ignored without consequence. To evaluate the scale of the consequence in practice, we turn to the following application.


3.  Federal Reserve Model

The question of why inflation rose and fell so dramatically in the United States between 1973 and 1984 has received much attention. One strand of research emphasizes the evolution of the Federal Reserve's beliefs over this time period. Representing Fed beliefs as parameters in its model, Sargent, Williams, and Zha (2006) reverse engineers an optimal control framework that explains the time path of inflation very well. In light of the previous section, it is worth asking if ignored data uncertainty has an impact on the model's estimates. We will see in Section 4 that the clear answer to this question is: yes, very much.

3.1  Model

Sargent, Williams, and Zha (2006) assume that the Federal Reserve's economic model can be usefully approximated by a Phillips curve with time-varying parameters. By specifying that the Fed believes the parameters follow a random walk we introduce persistent model uncertainty, as discussed in Cooley and Prescott (1976) and Primiceri (2005). In this context, inflation is the solution to an optimal control problem with a law of motion that changes according to the evolution of filtered parameter estimates. The Federal Reserve's objective remains the same while new data alter its best estimate of the effects of its actions, represented by the Phillips curve trade-off.

The direct effect of Fed activity is the rate of inflation in the economy - it sets inflation up to some exogenous shock beyond its control. This control shock could be thought of as unpredictable market reaction to Fed policy. Therefore the annual inflation rate $ \pi_{t}$ is

$\displaystyle \pi_{t} = x_{t-1} + \frac{1}{\zeta_{1}}\omega_{1t}$ (3.1)

where $ x_{t-1}$ is the part of inflation controllable by the Federal Reserve using information through time $ t-1$, and $ \omega_{1t} \sim iid(0,1)$ is the exogenous control shock.

The Fed uses a Phillips curve to understand the relationship between unemployment and inflation. However, the Fed is always uncertain of its estimated model; one way to capture this is to assume the parameters follow a random walk:

$\displaystyle u_{t}$ $\displaystyle = \boldsymbol{\alpha}^{\prime}_{t-1} \left( \begin{array}[c]{c} \pi_{t}\\ \pi_{t-1}\\ u_{t-1}\\ \pi_{t-2}\\ u_{t-2}\\ 1 \end{array} \right) + \frac{1}{\zeta_{2}} \omega_{2t} \equiv\boldsymbol{\alpha}^{\prime }_{t-1} \boldsymbol{\Phi}_{t} + \frac{1}{\zeta_{2}} \omega_{2t}$ (3.2)
$\displaystyle \boldsymbol{\alpha}_{t}$ $\displaystyle = \boldsymbol{\alpha}_{t-1} + \boldsymbol{\Lambda }_{t}$ (3.3)

where $ \omega_{2t} \sim iid(0,1)$ and $ \boldsymbol{\Lambda}_{t}$ is a vector with $ \mathbb{E}(\boldsymbol{\Lambda}_{t}) = 0$, $ \mathbb{E} (\boldsymbol{\Lambda}_{t}\boldsymbol{\Lambda}_{t}^{\prime})=\boldsymbol{V}$ , and $ \mathbb{E}(\boldsymbol{\Lambda}_{t} \omega_{2t})=\boldsymbol{0}$ .

The Fed estimates the relationship between unemployment and inflation because it has inflation and unemployment targets. The objective function, which Sargent (1999) calls the Phelps problem, is written

$\displaystyle \min_{ \{x_{t-1+j}\}_{j=0}^{\infty}} \hat{\mathbb{E} }_{t} \sum_{j=0}^{\infty}\delta^{j} \Big( (\pi_{t+j} - \pi^{*})^{2} + \lambda(u_{t+j} - u^{*})^{2} \Big)$ (3.4)

where $ \delta\in(0,1)$ is a time discount factor, $ \lambda> 0$ gives the Fed's relative weighting of its two objectives, and $ \pi^{*},u^{*} \geq0$ are inflation and unemployment targets. $ \hat{\mathbb{E}}$ is expectation with respect to the probability model formed by equations (3.1), (3.2), and (3.3). Because the parameters follow a random walk whose steps are independent of everything else, the Fed's estimate of $ \boldsymbol{\alpha}_{t-1}$ is also its estimate of $ \boldsymbol{\alpha}_{t+j},\: \forall j \geq0$. Hence, the time $ t$ solution to the dynamic programming problem {(3.4) s.t. (3.1), (3.2), (3.3)} is found after plugging the time $ t$ estimate of $ \boldsymbol{\alpha}_{t-1}$ into the law of motion (3.2) for all $ j \geq0$. I set parameters in line with Sargent, Williams, and Zha (2006): $ \delta= .9936$, $ \lambda= 1$, $ \pi^{*} =2$, $ u^{*}=1$. They note that the results are unaffected by letting $ u^{*}$ be closer to typical "natural unemployment" rates and I have confirmed that this is indeed the case for both the model with and without data uncertainty.
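Under this plug-in logic, each period's policy solves a discounted linear-quadratic control problem with the current parameter estimate treated as fixed. A generic sketch of such a solver via Riccati iteration, assuming a state-space form $ s_{t+1}=As_{t}+Bx_{t}+\text{shock}$ with period loss $ s^{\prime}Qs+x^{\prime}Rx$ (these matrices are placeholders, not the paper's Phillips-curve mapping):

```python
import numpy as np

# Generic discounted LQ regulator solved by Riccati iteration, illustrating
# the certainty-equivalence step: plug the current parameter estimate into
# the law of motion, then solve for the feedback rule.
def solve_dlqr(A, B, Q, R, delta, tol=1e-10, max_iter=10_000):
    """Minimize E sum_t delta^t (s'Qs + x'Rx) s.t. s' = A s + B x + shock.
    Returns the value matrix P and feedback F, with policy x = -F s."""
    P = Q.copy()
    for _ in range(max_iter):
        BtPA = B.T @ P @ A
        P_new = (Q + delta * A.T @ P @ A
                 - delta**2 * BtPA.T @ np.linalg.solve(R + delta * B.T @ P @ B, BtPA))
        if np.max(np.abs(P_new - P)) < tol:
            P = P_new
            break
        P = P_new
    F = delta * np.linalg.solve(R + delta * B.T @ P @ B, B.T @ P @ A)
    return P, F
```

With scalar $ A=B=Q=R=1$ and $ \delta=0.95$, the fixed point of the iteration solves $ 0.95p^{2}-0.9p-1=0$, which is easy to verify by hand.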

3.2  Without Data Uncertainty

If the Fed holds no data uncertainty, then the model is completed by assuming that the Fed observes the true values of both inflation and unemployment each period.11 If this is the case, the Fed estimates the relationships (3.2) and (3.3) through a linear filtering problem whose solution is given by the Kalman filter. The notation $ \sigma(\cdot)$ denotes the information set ($ \sigma $-algebra) formed by random variables within the parentheses. Let $ \mathbb{E}(\boldsymbol{\alpha}_{t}\vert\mathcal{I}_{s}) \equiv\boldsymbol{a} _{t\vert s}$ and $ \mathbb{V}ar\left( \boldsymbol{\alpha}_{t}\vert\mathcal{I} _{s}\right) \equiv\boldsymbol{P}_{t\vert s}$ for $ \mathcal{I}_{s} \equiv \sigma(u_{1},\pi_{1},\ldots,u_{s},\pi_{s})$ . Given initial conditions $ \boldsymbol{a}_{1\vert 0}$ and $ \boldsymbol{P}_{1\vert 0}$, the Kalman updating occurs using the formulae:

$\displaystyle \boldsymbol{a}_{t+1\vert t}$ $\displaystyle = \boldsymbol{a}_{t\vert t-1} + \frac{\boldsymbol{P} _{t\vert t-1}\boldsymbol{\Phi}_{t}(u_{t} - \boldsymbol{\Phi}^{\prime}_{t} \boldsymbol{a}_{t\vert t-1})}{(\frac{1}{\zeta_{2}})^{2} + \boldsymbol{\Phi} ^{\prime}_{t} \boldsymbol{P}_{t\vert t-1} \boldsymbol{\Phi}_{t}}$ (3.5)
$\displaystyle \boldsymbol{P}_{t+1\vert t}$ $\displaystyle = \boldsymbol{P}_{t\vert t-1} - \frac{\boldsymbol{P} _{t\vert t-1} \boldsymbol{\Phi}_{t} \boldsymbol{\Phi}^{\prime}_{t} \boldsymbol{P} _{t\vert t-1}}{(\frac{1}{\zeta_{2}})^{2} + \boldsymbol{\Phi}^{\prime}_{t} \boldsymbol{P}_{t\vert t-1} \boldsymbol{\Phi}_{t}} + \boldsymbol{V}$ (3.6)
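The updating formulae (3.5)-(3.6) can be transcribed directly; a minimal sketch (function name and inputs are illustrative, not from the paper):

```python
import numpy as np

# Direct transcription of (3.5)-(3.6): scalar observation u_t with
# regressor vector Phi_t and random-walk parameter vector alpha_t.
def kalman_step(a, P, Phi, u, inv_zeta2, V):
    """One Kalman update for the random-walk-parameter regression.
    a, P: prior mean/covariance of alpha; Phi: regressors; u: observed target;
    inv_zeta2: observation noise s.d. (1/zeta_2); V: parameter-shock covariance."""
    err = u - Phi @ a                            # prediction error
    denom = inv_zeta2**2 + Phi @ P @ Phi         # forecast-error variance
    a_new = a + (P @ Phi) * err / denom          # (3.5)
    P_new = P - np.outer(P @ Phi, Phi @ P) / denom + V   # (3.6)
    return a_new, P_new
```

Note the role of $ \boldsymbol{V}$: adding it each period keeps $ \boldsymbol{P}$ from collapsing to zero, so the filter never stops responding to new data.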

3.3  Including Data Uncertainty

The modification I make to the model is motivated by the following. Policy-makers base decisions on their model of the economy. Data revisions alter the statistics that inform the policy-makers' model. Hence, the existence of revisions implies that a savvy policy-maker associates some uncertainty to the latest observations of the most recent data vintage. I propose that the policy-maker optimizes accordingly and discipline my analysis with actual evidence on the characteristics of that data uncertainty.12

Graphs of the data show that while the unemployment rate experiences revisions, the CPI inflation rate actually does not. Therefore I only model revisions on unemployment and assume 12-month-ended inflation rates are not subject to revision.13 Hence, for each month's true unemployment rate $ u_{t}$ we have: a preliminary observation made in month $ t$ denoted $ u_{t}^{0}$; the next observation made in month $ t+1$ denoted $ u_{t}^{1}$; the observation made in month $ t+2$ denoted $ u_{t}^{2}$; and so on. Define the revision or revision error as

$\displaystyle \nu_{t+i}^{i} = u_{t}^{i} - u_{t}^{i-1} \quad, \quad i>0 $

To reduce the problem's dimension, not every revision explicitly enters the model; a final vintage horizon $ f=72$ months is chosen.14 Therefore, for each $ u_{t}$ I assume that revisions are possible one month, two months, three months, and 72 months later.15 By grouping all revisions past the third into the final revision, we are relabeling the sum $ \sum_{k=4}^{f} \nu_{t+k}^{k}$ as simply $ \nu_{t+f} ^{f}$. This assumption reduces the dimension of the (augmented) state vector and has no effect on the results.

Therefore the unemployment measurement vector is

$\displaystyle \left[ \begin{array}[c]{c} u_{t}^{0}\\ u_{t-1}^{1}\\ u_{t-2}^{2}\\ u_{t-3}^{3}\\ u_{t-f}^{f} \end{array} \right] = \left[ \begin{array}[c]{l} u_{t} - \nu_{t+f}^{f} - \nu_{t+3}^{3} - \nu_{t+2}^{2} - \nu_{t+1}^{1}\\ u_{t-1} - \nu_{t-1+f}^{f} - \nu_{t-1+3}^{3} - \nu_{t-1+2}^{2}\\ u_{t-2} - \nu_{t-2+f}^{f} - \nu_{t-2+3}^{3}\\ u_{t-3} - \nu_{t-3+f}^{f}\\ u_{t-f} \end{array} \right]$ (3.7)
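The vintage bookkeeping in (3.7) amounts to each release equaling the truth minus all revisions not yet published. A tiny sketch with $ f=4$ rather than the paper's $ f=72$, and made-up numbers:

```python
# Sketch of the vintage bookkeeping with f = 4 instead of 72.
# u_true and the revisions nu[i] are made-up illustrative numbers.
u_true = 5.0
nu = {1: 0.10, 2: -0.05, 3: 0.02, 4: 0.01}   # nu[i] released i months after t

def vintage(i, f=4):
    """Observation of u_t made i months after t: truth minus outstanding revisions."""
    return u_true - sum(nu[j] for j in range(i + 1, f + 1))
```

Here `vintage(0)` is the preliminary release $ u_{t}^{0}$, `vintage(4)` equals the truth, and successive differences recover the revisions (up to floating point), e.g. `vintage(2) - vintage(1)` gives `nu[2]`.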

In any modification that accounts for both data revision and time-varying parameters, both the latent economic quantities ($ u_{t}$) and the time-varying parameters ( $ \boldsymbol{\alpha}_{t}$) are state variables. Therefore, the state transition equation is nonlinear:

$\displaystyle \boldsymbol{\beta}_{t} = \boldsymbol{g}_{t}(\boldsymbol{\beta}_{t-1}, \boldsymbol{\eta}_{t})$ (3.8)

Ignoring trivial equations that merely shift the position of elements from one state vector to the next, (3.8) is the compact notation for the system of equations

$\displaystyle \pi_{t}$ $\displaystyle = x_{t-1}(\cdot) + \frac{1}{\zeta_{1}} \omega_{1t}$ (3.9)
$\displaystyle u_{t}$ $\displaystyle = \boldsymbol{\alpha}^{\prime}_{t-1} \boldsymbol{\Phi}_{t} + \frac{1}{\zeta_{2}} \omega_{2t}$ (3.10)
$\displaystyle \boldsymbol{\alpha}_{t}$ $\displaystyle = \boldsymbol{\alpha}_{t-1} + \boldsymbol{\Lambda }_{t}$ (3.11)
$\displaystyle \nu_{t+2}^{2}$ $\displaystyle = \tilde{\nu}_{t}^{2}$    
$\displaystyle \nu_{t+3}^{3}$ $\displaystyle = \tilde{\nu}_{t}^{3}$    
$\displaystyle \nu_{t+f}^{f}$ $\displaystyle = \tilde{\nu}_{t}^{f}$    

where

\begin{displaymath} \left( \begin{array}[c]{l} \tilde{\nu}_{t}^{2}\\ \tilde{\nu}_{t}^{3}\\ \tilde{\nu}_{t}^{f} \end{array}\right) \sim iid (\boldsymbol{0}, \boldsymbol{\tilde{V}}) \end{displaymath}

(3.9), (3.10), and (3.11) repeat (3.1), (3.2), and (3.3), respectively. Note that $ x$ has been written as a function in (3.9) in order to point out that it will be a policy rule depending on the best estimate of $ \boldsymbol{\beta }_{t}$. Furthermore, (3.9) is an element of the vector $ \boldsymbol{\Phi}_{t}$ in (3.10). Hence, (3.9) and (3.10) demonstrate the nonlinear parts of the transition equation.

Turning to the observation equation, the observed vector is

$\displaystyle \boldsymbol{y}_{t} = (\pi_{t} , u_{t}^{0} , u_{t-1}^{1} , u_{t-2}^{2} , u_{t-3}^{3} , u_{t-f}^{f} )^{\prime} $

so that the state is measured

$\displaystyle \boldsymbol{y}_{t}$ $\displaystyle = \boldsymbol{H} \boldsymbol{\beta}_{t} + \boldsymbol{\varepsilon}_{t} = \boldsymbol{H} \boldsymbol{\beta}_{t} + \left( \begin{array}[c]{c} 0\\ \tilde{\nu}_{t}^{1}\\ \boldsymbol{0} \end{array} \right)$ (3.12)

where $ \tilde{\nu}_{t}^{1} \sim i.i.d. \left( 0,(1/\zeta_{\varepsilon} )^{2}\right) $ . $ \boldsymbol{H}$ gives the linear combinations written in (3.7).

3.4  Estimation

While (3.12) is linear, (3.8) is nonlinear. The Extended Kalman filter approximates the state space model using a Taylor expansion about the linear prediction of the state, as suggested by Anderson and Moore (1979) and following Tanizaki (1996). Considering (3.9)-(3.11), the second-order expansion represents the function exactly; however, I have found little difference in practice between the first-order and second-order expansions and use the former since it does not require computing second derivatives numerically. Having a method of approximating the optimal predictions and updates that relies only on matrix multiplication makes the entire estimation procedure computationally reasonable. In the interest of exposition, discussion of the Extended Kalman filter and the likelihood is put in Appendix A.II.
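The first-order EKF prediction step can be sketched as follows; for brevity the Jacobian is taken by forward differences here, and the transition function `g` is a generic stand-in rather than the model's (3.8).

```python
import numpy as np

# First-order EKF prediction: linearize the transition function g around the
# current state estimate, then propagate the mean and covariance.
def numerical_jacobian(g, beta, h=1e-6):
    """Forward-difference Jacobian of g at beta."""
    g0 = g(beta)
    J = np.zeros((g0.size, beta.size))
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = h
        J[:, j] = (g(beta + step) - g0) / h
    return J

def ekf_predict(g, beta, P, Q):
    """Predicted state mean g(beta) and covariance G P G' + Q."""
    G = numerical_jacobian(g, beta)
    return g(beta), G @ P @ G.T + Q
```

The update step then applies the linear Kalman formulae to the linearized observation, which is why only matrix multiplications are needed.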

The parameter estimated is

$\displaystyle \boldsymbol{\Psi} \equiv\left( \zeta_{1},\text{vech}\left( \text{Chol} \left( \boldsymbol{V}\right) \right) ^{\prime},\text{vech}\left( \text{Chol}\left( \boldsymbol{P}_{1\vert 0}\right) \right) ^{\prime}\right) ^{\prime} $

where Chol$ (\boldsymbol{A})$ is the Cholesky factor of a positive definite matrix $ \boldsymbol{A}$, such that Chol$ (\boldsymbol{A})$Chol$ (\boldsymbol{A})^{\prime}= \boldsymbol{A}$.16
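This vech(Chol(·)) parameterization can be sketched as a pack/unpack pair: sampling the unconstrained vector and rebuilding $ V=LL^{\prime}$ keeps every draw positive (semi)definite. Function names are illustrative:

```python
import numpy as np

# Sketch of the vech(Chol(V)) parameterization used for the covariance
# matrices in Psi: optimize/sample the free Cholesky elements, not V itself.
def pack(V):
    """vech of the lower-triangular Cholesky factor of V."""
    L = np.linalg.cholesky(V)
    return L[np.tril_indices_from(L)]

def unpack(theta, n):
    """Rebuild V = L L' from the vech of its Cholesky factor."""
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = theta
    return L @ L.T
```

Any real-valued `theta` maps back to a valid covariance matrix, which is what makes unconstrained MCMC proposals on $ \boldsymbol{\Psi}$ workable.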

Because of $ \boldsymbol{\Psi}$'s large dimension, I follow Sargent, Williams, and Zha (2006) and use a Bayesian empirical method discussed in Appendix A.IV. This procedure assumes the shocks are Gaussian. From the simulated posterior distribution I report medians as my point estimates and quantiles as probability intervals for the parameters. Following Sargent, Williams, and Zha (2006), $ \boldsymbol{a}_{1\vert 0}$ is set to a regression estimate from presample data and $ \zeta_{2}$ is set to 59, indicating the Fed believes its Phillips curve, if the parameters were known, would deliver forecasts with one-tenth the RMSE of naive random walk forecasts; see Appendix A.I for a detailed discussion.17

Given a prior $ p(\boldsymbol{\Psi})$ and the likelihood $ \mathcal{L} (\mathcal{Y}_{T}\vert\boldsymbol{\Psi})$ (in Appendix A.II), the posterior distribution is

$\displaystyle p(\boldsymbol{\Psi}\vert\mathcal{Y}_{T}) \propto\mathcal{L}(\mathcal{Y} _{T}\vert\boldsymbol{\Psi})p(\boldsymbol{\Psi})$ (3.13)

I sample from (3.13) using a Metropolis algorithm with random walk proposals (cf. Robert and Casella (2004)).18
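A minimal random-walk Metropolis sketch of this sampler, targeting a standard normal in place of the posterior (3.13); all names and tuning values are illustrative:

```python
import numpy as np

# Random-walk Metropolis: propose psi' = psi + scale * N(0, I), accept with
# probability min(1, posterior(psi') / posterior(psi)).
def rw_metropolis(log_post, psi0, scale, n_draws, rng):
    draws = np.empty((n_draws, psi0.size))
    psi, lp = psi0, log_post(psi0)
    for i in range(n_draws):
        prop = psi + scale * rng.standard_normal(psi.size)  # random-walk proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:            # accept/reject
            psi, lp = prop, lp_prop
        draws[i] = psi
    return draws

rng = np.random.default_rng(0)
# Toy target: standard normal log density (up to a constant).
draws = rw_metropolis(lambda p: -0.5 * p @ p, np.zeros(1), 1.0, 50_000, rng)
```

Discarding an initial block of draws as burn-in, as done in the estimation below, leaves a sample whose moments approximate the target's.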


4.  Empirical Results

The outline of this section is as follows. The first subsection presents estimation results for the model without data uncertainty, which are essentially identical to those in Sargent, Williams, and Zha (2006); I describe three problems that emerge from these results. The next subsection explains the importance of disciplining my model with the observed characteristics of data revisions; the significance of this discipline is straightforward once one understands the problems that must be addressed. The last subsection presents estimation results for the model including data uncertainty, which fixes the aforementioned problems.

4.1  Without Data Uncertainty

Data on inflation and the civilian unemployment rate for ages 16 and older come from the BEA and the BLS, respectively, as reported in December 2003.19 The data begin in 1960 and I end the sample at December 1995.20 Table 1 shows estimates from 75,000 draws, derived from 100,000 MCMC iterations with the first 25% discarded as burn-in so that, for practical purposes, the Markov chain has converged to its ergodic distribution.21 The estimates in Table 1 are virtually identical to those of Sargent, Williams, and Zha (2006).22

Figure 1 shows the predicted inflation control choices: the Fed chooses to set inflation high in the two high-inflation episodes of the mid 1970s and early 1980s. With $ \zeta_{1}$ estimated at about 4.24, the standard deviation of $ \omega_{1t}$ is around 0.24, reflecting the Fed's belief that it has rather tight control of inflation.

The Philips curve beliefs $ a_{t-1\vert t-1}$ are used to forecast the next month's unemployment rate for any inflation control setting. According to the model, the Fed sets inflation with this forecast in mind. Therefore an important aspect of the model-predicted Fed beliefs is what they deliver in terms of unemployment forecasts; these are plotted in Figure 2.23 These forecasts have some negative bias and a root mean squared error (RMSE) of 3.3 percentage points. Roughly speaking, every four months the Fed expects its month-ahead unemployment rate forecasts to be off by 3.3 percentage points.

An important aspect of these unemployment rate forecasts is how well they explain actual Federal Reserve unemployment rate forecasts, evidence of which comes from the Greenbook forecasts over this time span.24 An appropriate statistical test of the similarity between the model-predicted forecasts and the Greenbook forecasts is that of Diebold and Mariano (1995). Their statistic $ S_{1}$ gives a two-sided test of the null hypothesis that the model's unemployment forecasts have accuracy equal to the Greenbook forecasts, and $ S_{1}$ has an asymptotic standard normal distribution. Here $ \vert\hat{S}_{1}\vert = 2.5044$, so we reject the null of equal forecasting accuracy at the 95% level. In terms of accuracy, then, the model-predicted forecasts are statistically different from the actual Greenbook forecasts.
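For concreteness, the $ S_{1}$ statistic can be computed from two forecast-error series as below. The error series here are synthetic stand-ins for the model-predicted and Greenbook errors, and the lag-truncation choice follows the usual one-step-ahead convention; both are my assumptions, not the paper's exact implementation.

```python
import numpy as np

def diebold_mariano(e1, e2, h=1):
    """S1 statistic: t-ratio of the mean loss differential d_t = e1_t^2 - e2_t^2,
    scaled by a long-run variance with h-1 autocovariance terms (none when h=1)."""
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2
    T, dbar = d.size, d.mean()
    lrv = np.sum((d - dbar) ** 2) / T
    for k in range(1, h):
        lrv += 2 * np.sum((d[k:] - dbar) * (d[:-k] - dbar)) / T
    return dbar / np.sqrt(lrv / T)

# Synthetic one-step-ahead error series; under the null of equal accuracy,
# S1 is asymptotically standard normal, so |S1| > 1.96 rejects at the 95% level.
rng = np.random.default_rng(1)
e_model = rng.normal(0.0, 1.0, 200)
e_greenbook = rng.normal(0.0, 1.0, 200)
S1 = diebold_mariano(e_model, e_greenbook)
```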

Given the large value of $ \zeta_{2}$ (equivalently, the small variance of $ \omega_{2t}$), volatile and inaccurate unemployment forecasts are evidence of greatly varying parameters. Hence the estimate of the parameter shocks' covariance matrix is large. In particular, notice that the estimated $ V^{(6,6)}$ implies the Fed believes that every month the Philips curve's intercept experiences an i.i.d. shock with a standard deviation of about 5 unemployment-rate points. This means, approximately, that the Fed believes there is a 30% chance each month that the natural rate of unemployment jumps by 5 percentage points or more in either direction, even if inflation is kept at target. In other words, the Fed believes its model of the world is wildly unstable.
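The 30% figure is just a normal tail probability: with a Gaussian intercept shock of standard deviation 5, the chance of a move of 5 points or more in either direction can be checked as follows.

```python
import math

sigma, jump = 5.0, 5.0
# P(|X| >= jump) for X ~ N(0, sigma^2), computed via the error function.
p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(jump / (sigma * math.sqrt(2.0)))))
print(round(p, 3))  # 0.317, i.e. roughly the 30% chance cited in the text
```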

These results render the optimal policy story very problematic. On the one hand, the estimates imply the Fed regarded its unemployment forecasting tool as subject to large unpredictable shifts that cause its forecasts to be erratic and inaccurate. On the other hand, the model's main point is that the Fed's estimated Philips trade-off was the motivation for inflationary policy in hopes of lowering unemployment. Given the large estimate of $ \boldsymbol{V}$ and the Philips curve's poor forecasting performance, it is implausible that the Fed believed its Philips curve enough to take on the pain of high inflation in hopes of decreasing unemployment. Moreover, even if the Fed did believe its estimated Philips curve enough to use it as a basis for policy, we will see below that the evolution of the Philips curve trade-off does not explain inflation's rise and fall.

4.2  Observing Revision Errors

At this point it is clear that my modification introduces a new shock called the data revision error. Moreover, note that data revision errors and time-varying parameter shocks both affect the Fed's unemployment forecast performance: the Fed would expect these forecasts to be poor if the data is imprecisely observed or the time-varying parameters are jumping around a lot. This idea emerges in the Fed's filtering problem that decomposes unemployment forecast errors into predicted parameter shocks and predicted data revision errors.25 Thus, by simply introducing another shock into the model, we might soak up some of the force driving the previous section's results, where large forecast errors went hand-in-hand with large parameter shocks. However, we would be doing so by simply adding another latent variable, and the degree to which the forecast errors were attributed to parameter shocks versus data revision errors would be mostly due to our priors on their respective sizes.

I avoid this problem by specifying the properties of the data revision errors instead of estimating them. The variances and covariances of the revision errors ( $ \boldsymbol{\tilde{V}},\zeta_{\varepsilon}$) are set to values independently estimated on the data; essentially, there is no correlation between revision errors, and the largest revision error by far is the final one, whose standard deviation is about 0.11.26 This is tantamount to imposing that the Fed knows the data uncertainty surrounding the unemployment rate in the same way it knows its own objective function parameters. This assumption is reasonable since data revisions are observable and their statistical properties are easily estimated. This way, model estimation does not deliver results that necessarily drive down the size of parameter shocks by attributing an exaggerated role to data uncertainty. Instead, the model takes as given and observable the characteristics and realizations of the data revision errors, and estimates the size and time path of the parameter shocks in light of these data.
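Because vintages are observable, these moments can be measured directly: the revision error for each month is the finally revised value minus the real-time value, and its sample standard deviation is the quantity imposed on the model. The sketch below illustrates the calculation on synthetic vintages; the paper's numbers come from the actual real-time series, and the series here are fabricated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "final" unemployment-rate series and a noisy first-release vintage;
# the 0.11 measurement s.d. mimics the magnitude reported in the text.
final_u = 5.0 + np.cumsum(rng.normal(0.0, 0.1, 300))
first_release = final_u + rng.normal(0.0, 0.11, 300)
revision_error = final_u - first_release
sd_hat = revision_error.std(ddof=1)   # sample estimate of the revision-error s.d.
```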

4.3  Including Data Uncertainty

Multiple vintages of real-time data on inflation and the civilian unemployment rate for ages 16 and older come from the ALFRED archive maintained by the Federal Reserve Bank of St. Louis, downloaded in 2007.27 Table 2 reports estimates from 700,000 retained MCMC draws: two separate runs of 400,000 iterations with different initial conditions, with the first 50,000 of each run discarded as burn-in.

Figure 3 shows that the Fed's inflation control explains the rise and fall of American inflation. The estimated standard deviation of the inflation control shock is about 0.45, and the fit of the model-predicted inflation to actual inflation is a little poorer than in Section 4.1 and Sargent, Williams, and Zha (2006). However, since the point of the model is to explain low-frequency movements in inflation (the Great Inflation), I argue that this deterioration in high-frequency fit is not a problem.

Turning now to the Fed's unemployment rate forecasts in Figure 4, we find a very different picture from the model without data uncertainty. The Fed's forecasts are considerably more accurate than before, with no bias and an RMSE of 0.24 percentage points.28

Again, since the model forecasts are intended to predict the Fed's actual forecasts, we can directly compare them to Greenbook unemployment rate forecasts. Using the Diebold and Mariano (1995) test of equal accuracy between model forecasts and Greenbook forecasts, $ \vert\hat{S}_{1}\vert = 0.4858$, so we cannot reject the hypothesis that the model forecasts and Greenbook forecasts are equally accurate. Therefore, in terms of accuracy, the model-predicted unemployment rate forecasts are statistically indistinguishable from Fed unemployment rate forecasts.

The estimate of $ \boldsymbol{V}$ in Table 2 is much smaller than before. For instance, the estimated $ V^{(6,6)}$ implies that the Philips curve's intercept has a monthly shock with a standard deviation of about $ \frac{1}{5}$ of a point, as opposed to the 5 points estimated with data uncertainty left out.29 Roughly speaking, the shocks are smaller by a factor of about 8 for the inflation parameters, 5 for the unemployment parameters, and 20 for the constant. The Fed is now confident that its Philips curve is not hopelessly unstable, and it is now plausible that the Fed believes (3.2) and sets inflation in light of an existing Philips curve trade-off.

Let us turn now to the predicted evolution of the Fed's beliefs about the Philips curve trade-off. As seen in the top panel of Figure 5, the trade-off in Sargent, Williams, and Zha (2006) experiences a large jump between 1973 and 1975, which explains the great rise in inflation over those years. Thereafter the trade-off shows a gradual decline, with no sharp activity around the disinflation of the early 1980s.30 But this does not bear out the main story, which is that the evolution of the Philips curve trade-off led to the rise and fall of inflation. In the model without data uncertainty, the dramatic fall of inflation from 14.4% to 2.2% between 1980 and 1984 occurs without any concurrent sharp change in the Fed's beliefs.

On the other hand, consider the bottom panel, predicted by the model including data uncertainty. I predict a drastic drop in the Philips curve trade-off starting around 1980. As this perceived trade-off falls by 98% off its peak, inflation falls by more than 80% off its peak, and both bottom out in 1984. Moreover, the lull in inflation in the late 1970s corresponds to the fall in the negative Philips curve relationship starting in 1975.31 Thus, the model including data uncertainty describes a strong and consistent connection between the Fed's inflation control and the Fed's beliefs about the Philips curve trade-off.

In sum, we have explicitly allowed for some data uncertainty that the Fed might have held about the unemployment situation. Since these data are revised, it seems natural to suppose the Fed would not regard real-time observations as exact. By doing so, we drastically but constructively change the results in Sargent, Williams, and Zha (2006). Now the model reverse-engineers a plausible story in which the Fed's evolving Philips curve beliefs were directly tied to its control of inflation.


5.  Why Data Uncertainty Matters Here

An obvious question is, how can such small data revision errors cause such large changes in the model performance and estimates? The answer lies in understanding that the errors are not small in comparison to the unemployment parameter shocks.

In standard deviation terms, the data errors are about three times larger than the shocks to the unemployment parameters. Moreover, by ignoring data uncertainty the researcher perceives the correlations between unemployment parameter shocks and the other parameter shocks as predominantly positive - see Table 1. This situation is reminiscent of the last example in Section 2 under the particular restriction that data predictions are nonnegative, data errors are uncorrelated, and parameter shocks are nonnegatively correlated. Hence we might expect the researcher's estimated parameter shock size unit to be biased upwards, and indeed this is the case. Figure 6 shows the bias in estimates of the parameter shock size unit. This bias - which, recall, is important for decomposing the forecast error into the part contributed by parameter shocks - is time-varying because it depends on the value of each period's data and time-varying parameter predictions. During the sample period, the bias is always greater than 10% and attains values as high as 146%.

In essence, the overestimate of the unemployment parameters' shocks leads the model to overestimate the shocks to the inflation and constant parameters in order to explain inflation, and these greatly varying parameters degrade the unemployment rate forecast performance. The explanation is as follows. By ignoring the Fed's data uncertainty, we overestimate the size of the shocks to the parameters on unemployment in the Philips curve (3.2). Because of this, the unemployment beliefs are adjusted too much in response to an unemployment rate forecast error. Recall that these parameters are key ingredients of the optimal policy rule governing inflation. But since inflation over this period is rather persistent, ceteris paribus, next period's optimal policy rule should be (to fit the data) somewhat near this period's policy rule. Therefore the size of the shocks to the parameters on inflation and the constant is overestimated: this allows them, too, to shift a lot from period to period in order to smooth out the path of the policy rule. This is clearly seen in the time path of the Philips curve constant, plotted in Figure 7, which moves around a lot in order to fit the policy rule to actual inflation. The constant-term belief moves much more sluggishly when data uncertainty is explicitly modeled.

This example suggests that ignored data uncertainty can seriously influence a model's results. In particular, estimates of the agent's model uncertainty are positively biased. For this bias, the size of the data uncertainty relative to the relevant model uncertainty - here the size of shocks to the parameters on unemployment - is important. Moreover, the process of fitting the model to the data can exacerbate the issue.


6.  Conclusion

This paper analyzes the effects of data uncertainty in models with agents who update their beliefs. Ignoring data uncertainty can seriously bias estimates of agents' model uncertainty. After illustrating this point through simple examples, I apply it to the more sophisticated framework of Sargent, Williams, and Zha (2006). I show that accounting for data uncertainty remedies problems having to do with overestimates of the Fed's model uncertainty and the dissimilarity of model-predicted unemployment rate expectations to actual Greenbook evidence. The Federal Reserve is slower to change its beliefs in response to new data when those data may be revised. Once this is the case, the model predicts that the inflation of the 1970s and 1980s can be strongly tied to evolving beliefs about the Philips curve trade-off between inflation and unemployment.

Table 1.  Parameter Estimates, Without Data Uncertainty - Panel 1: Information on $ \zeta_{1}$.

$ \zeta_{1}$: 4.24 (3.96, 4.47)

Table 1.  Parameter Estimates, Without Data Uncertainty - Panel 2: $ \boldsymbol{V}$: Standard Deviations and Correlations

        Column 1   Column 2   Column 3   Column 4   Column 5   Column 6
Row 1     0.2871    -0.9517     0.1860     0.9708    -0.1256    -0.2835
Row 2                0.2844      0.002    -0.9976     0.2953       0.47
Row 3                           0.1723     0.0426     0.9442     0.8187
Row 4                                      0.1769     -0.256    -0.4346
Row 5                                                 0.2274     0.8712
Row 6                                                            5.0817

Notes: At the median of the posterior distribution. For the top panel, $ \zeta_1^2$ is the precision of $ \omega_{1t}$, the additive shock to the Fed's inflation control; 95% probability intervals are in parentheses. The bottom array is composed as follows: the main-diagonal entries are the square roots of the main diagonal of $ \boldsymbol{V}$; the off-diagonal entries are the correlations derived from $ \boldsymbol{V}$; $ \boldsymbol{V}$ is the covariance matrix of the shock $ \boldsymbol{\Lambda}_t$ to the time-varying parameters $ \boldsymbol{\alpha}_t$. The vector $ \boldsymbol{\Phi}_t = (\pi_t, \pi_{t-1}, u_{t-1}, \pi_{t-2}, u_{t-2}, 1)'$ multiplies $ \boldsymbol{\alpha}_t$.

Table 2.  Parameter Estimates, Including Data Uncertainty - Panel 1: Information on $ \zeta_{1}$.

$ \zeta_{1}$: 2.23 (2.21, 2.25)

Table 2.  Parameter Estimates, Including Data Uncertainty - Panel 2: $ \boldsymbol{V}$: Standard Deviations and Correlations

        Column 1   Column 2   Column 3   Column 4   Column 5   Column 6
Row 1     0.0192     0.1749     -0.523     0.5294     0.3461    -0.8742
Row 2                0.0227    -0.2719      -0.69      0.329    -0.1155
Row 3                           0.0478    -0.1357    -0.9727     0.2646
Row 4                                      0.0242    -0.0012    -0.3808
Row 5                                                 0.0436    -0.0682
Row 6                                                            0.1953

Notes: At the median of the posterior distribution. For the top panel, $ \zeta_1^2$ is the precision of $ \omega_{1t}$, the additive shock to the Fed's inflation control; 95% probability intervals are in parentheses. The bottom array is composed as follows: the main-diagonal entries are the square roots of the main diagonal of $ \boldsymbol{V}$; the off-diagonal entries are the correlations derived from $ \boldsymbol{V}$; $ \boldsymbol{V}$ is the covariance matrix of the shock $ \boldsymbol{\Lambda}_t$ to the time-varying parameters $ \boldsymbol{\alpha}_t$. The vector $ \boldsymbol{\Phi}_t = (\pi_t, \pi_{t-1}, u_{t-1}, \pi_{t-2}, u_{t-2}, 1)'$ multiplies $ \boldsymbol{\alpha}_t$.

Figure 1.  Actual vs. Fed Control, Without Data Uncertainty

Notes: Top panel: actual CPI inflation and model-predicted Fed inflation control. Bottom panel: prediction errors, which are quite small. NBER recessions shaded. Figures 1 and 3 are on the same scale.

Figure 2.  Actual Unemployment vs. Fed Forecasts, Without Data Uncertainty

Notes: Top panel: step-ahead unemployment forecasts from the Philips curve (3.2), using the Fed's inflation setting and actual unemployment and inflation data. Bottom panel: predicted forecast errors, which are quite large. NBER recessions shaded. Figures 2 and 4 are on the same scale.

Figure 3.  Actual Inflation vs. Fed Control, Including Data Uncertainty

Notes: Top panel: actual CPI inflation and model-predicted Fed inflation control. Bottom panel: prediction errors, which are quite small. NBER recessions shaded. Figures 1 and 3 are on the same scale.

Figure 4.  Actual Unemployment vs. Fed Forecast, Including Data Uncertainty

Notes: Top panel: step-ahead unemployment forecasts from the Philips curve (3.2), using the Fed's inflation setting and actual unemployment and inflation data. Bottom panel: predicted forecast errors, which are quite small. NBER recessions shaded. Figures 2 and 4 are on the same scale.

Figure 5.  Evolution of the Philips Curve Trade-Off

Notes: Top panel: time series of Philips curve inflation response estimates from the model without data uncertainty, showing a sharp decrease around 1973 and then a steady upward trend crossing zero around 1990. Bottom panel: time series of Philips curve inflation response estimates from the model including data uncertainty; the trade-off closely mirrors the low-frequency movement of inflation. NBER recessions shaded.

Figure 6.  Percentage Bias of Estimated Parameter Shock Size

Notes: At each point in time, the bias is determined by the inflation and unemployment predicted values, the perceived correlation and relative size of parameter shocks from the model without data uncertainty, and the actual relative size of unemployment errors to unemployment parameter shocks from the model including data uncertainty. The plot resembles the inflation series, with large peaks; the bias is always greater than 10%, with a maximum of about 146%. NBER recessions shaded.

Figure 7.  Evolution of the Philips Curve Constant

Notes: Left panel: time series of Philips curve constant estimates from the model without data uncertainty, showing large variation between 5 and 25. Right panel: time series of Philips curve constant estimates from the model including data uncertainty, showing slow movement between 2 and 5.5. NBER recessions shaded.


References

Anderson, B. D. O., and J. B. Moore (1979): Optimal Filtering. Prentice-Hall.

Andrews, D. W. K. (1991): "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, 59(3), 817-58.

Aruoba, S. B. (2004): "Data Uncertainty in General Equilibrium," Computing in Economics and Finance 2004 131, Society for Computational Economics.

Boivin, J. (2006): "Has US Monetary Policy Changed? Evidence from Drifting Coefficients and Real-Time Data," Journal of Money, Credit, and Banking, 38(5), 1149-1173.

Carboni, G., and M. Ellison (2007): "Learning and the Great Inflation," Discussion paper, University of Warwick.

---- (2008): "The Great Inflation and the Greenbook," Discussion paper, University of Oxford.

Clarida, R., J. Gali, and M. Gertler (2000): "Monetary Policy Rules And Macroeconomic Stability: Evidence And Some Theory," The Quarterly Journal of Economics, 115(1), 147-180.

Cooley, T. F., and E. C. Prescott (1976): "Estimation in the Presence of Stochastic Parameter Variation," Econometrica, 44(1), 167-84.

Croushore, D., and T. Stark (2001): "A real-time data set for macroeconomists," Journal of Econometrics, 105(1), 111-130.

Cunningham, A., C. Jeffery, G. Kapetanios, and V. Labhard (2007): "A State Space Approach To The Policymaker's Data Uncertainty Problem," Money Macro and Finance (MMF) Research Group Conference 2006 168, Money Macro and Finance Research Group.

Diebold, F. X., and R. S. Mariano (1995): "Comparing Predictive Accuracy," Journal of Business & Economic Statistics, 13(3), 253-63.

Elliott, G., I. Komunjer, and A. Timmermann (2005): "Estimation and Testing of Forecast Rationality under Flexible Loss," Review of Economic Studies, 72(4), 1107-1125.

Evans, G. W., and S. Honkapohja (2001): Learning and Expectations in Macroeconomics. Princeton University Press.

Friedman, M. (1968): "The Role of Monetary Policy," The American Economic Review, 58(1), 1-17.

Ghysels, E., N. R. Swanson, and M. Callan (2002): "Monetary Policy Rules with Model and Data Uncertainty," Southern Economic Journal, (69), 239-265.

Harvey, A. C. (1989): Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.

Howrey, E. P. (1978): "The Use of Preliminary Data in Econometric Forecasting," The Review of Economics and Statistics, 60(2), 193-200.

Nason, J. M., and G. W. Smith (forthcoming): "The New Keynesian Phillips Curve: Lessons from Single-Equation Econometric Estimation," Economic Quarterly.

Oh, S., and M. Waldman (1990): "The Macroeconomic Effects of False Announcements," The Quarterly Journal of Economics, 105(4), 1017-34.

Orphanides, A. (2001): "Monetary Policy Rules Based on Real-Time Data," American Economic Review, 91(4), 964-985.

Orphanides, A., and J. C. Williams (2006): "Monetary Policy with Imperfect Knowledge," Journal of the European Economic Association, 4(2-3), 366-375.

Owyang, M. T., and G. Ramey (2004): "Regime switching and monetary policy measurement," Journal of Monetary Economics, 51(8), 1577-1597.

Primiceri, G. E. (2005): "Time Varying Structural Vector Autoregressions and Monetary Policy," Review of Economic Studies, 72(3), 821-852.

Raftery, A. E., and S. M. Lewis (1992): "How Many Iterations in the Gibbs Sampler?," in Bayesian Statistics 4. Oxford University Press.

Robert, C. P., and G. Casella (2004): Monte Carlo Statistical Methods. Springer, 2 edn.

Romer, C. D., and D. H. Romer (1990): "Does Monetary Policy Matter? A New Test in the Spirit of Friedman and Schwartz," NBER Working Papers 2966, National Bureau of Economic Research, Inc.

Runkle, D. E. (1998): "Revisionist history: how data revisions distort economic policy research," Quarterly Review, (Fall), 3-12.

Sargent, T. J. (1999): The Conquest of American Inflation. Princeton University Press.

Sargent, T. J., N. Williams, and T. Zha (2006): "Shocks and Government Beliefs: The Rise and Fall of American Inflation," American Economic Review, 96(4), 1193-1224.

Sims, C. (2007): "Evidence for passive monetary policy pre-1979, comments," Discussion paper, Federal Reserve Bank of Cleveland conference.

Sims, C. A. (2006): "Improving Monetary Models," Discussion paper.

Tanizaki, H. (1996): Nonlinear Filters: Estimation and Applications. Springer-Verlag, 2 edn.

Watson, M. W., and R. F. Engle (1983): "Alternative Algorithms for the Estimation of Dynamic Factor, MIMIC, and Varying Coefficient Regression Models," Journal of Econometrics, 23, 385-400.

Zellner, A. (1958): "A Statistical Analysis of Provisional Estimates of Gross National Product and its Components, of Selected National Income Components, and of Personal Saving," Journal of the American Statistical Association, 53(281), 54-65.


Appendix

A.I  Federal Reserve Model Comparison

I have dropped Sargent, Williams, and Zha (2006)'s "true" Lucas natural-rate Philips curve from my model presentation. The estimation of the Sargent, Williams, and Zha (2006) Lucas natural-rate Philips curve does not affect the estimates of the belief-formation parameters apart from $ \zeta_{2}$, because the relevant priors are independent and, otherwise, the likelihood does not tie the two together. This relates to Carboni and Ellison (2008)'s statements that including the "true" curve in the overall model gives an 'informational gain [that] is low' and that the belief-formation results 'are robust to embedding' the model of Section 3.1 'in a "true" model.'

Note that $ \zeta_{2}$ is unidentified in the model without data uncertainty. Sargent, Williams, and Zha (2006) deals with this problem by normalizing it such that $ \frac{1}{\zeta_{2}}$ is one-tenth the standard deviation of the shock in the Lucas natural-rate equation. In practice, that paper's assumption implies that $ \zeta_{2} = 59.7108$, which means the Fed thinks that, if it knew the parameters, the Philips curve would forecast unemployment up to an exogenous error with standard deviation $ \frac{1}{59.7108}=0.0167$. That paper describes the assumption as implying that the Fed believes "that the standard deviation of [its] regression error is smaller by a factor of ten than the standard deviation of exogenous unemployment shocks."

Since the Philips curve is used by the Fed to forecast the effects of its policy on unemployment, consider a very common forecasting benchmark: the random-walk unemployment forecast. I can make an assumption relative to random-walk unemployment forecast errors that is similar to Sargent, Williams, and Zha (2006)'s assumption relative to Lucas natural-rate Philips curve shocks. This assumption is: "the Fed believes that the standard deviation of its Philips curve forecast error is smaller by a factor of ten than the standard deviation of random-walk unemployment forecast errors." In practice this implies that $ \zeta_{2}=54.9753$, since the standard deviation of random-walk unemployment forecast errors is 0.1819. Hence I arrive at a choice for $ \zeta_{2}$ that is similar to Sargent, Williams, and Zha (2006)'s, but without the Lucas natural-rate Philips curve.
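Both normalizations reduce to the same arithmetic on the benchmark standard deviations; as a check, using the values quoted in this appendix:

```python
# zeta_2 is the reciprocal of the Fed's perceived forecast-error s.d., which each
# normalization sets to one-tenth of a benchmark s.d.
sd_lucas = 1.0 / 35.6538 ** 0.5      # s.d. implied by SWZ's precision estimate
zeta2_swz = 1.0 / (sd_lucas / 10.0)  # = 10 * sqrt(35.6538)
sd_rw = 0.1819                       # s.d. of random-walk unemployment forecast errors
zeta2_rw = 1.0 / (sd_rw / 10.0)      # = 10 / 0.1819
print(round(zeta2_swz, 4), round(zeta2_rw, 4))  # 59.7108 54.9753
```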

To ensure that my results are robust to these factors, I have done the following. I have reproduced the results of Sargent, Williams, and Zha (2006)'s entire model. I have estimated the Section 3.2 model while setting $ \zeta_{2}=59.7108$ (calculated from Sargent, Williams, and Zha (2006)'s estimate of 35.6538 as the precision of the shock to the Lucas natural-rate Philips curve). And I have estimated the Section 3.2 model by using $ \zeta_{2}=54.9753$ based on assumed Fed beliefs as to its forecasting ability relative to random-walk forecasts. The results (parameter estimates, beliefs, inflation choices, and unemployment forecasts) are virtually the same.32 In order to provide a clear comparison to previous literature, I simply set $ \zeta_{2}=59$.

As an aside, $ \zeta_{2}$ is identified in the model including data uncertainty: the effect of $ \boldsymbol{V}$ on the first derivative of the optimal control policy rule allows that parameter to be pinned down apart from $ \zeta_{2}$, eliminating the identification problem that exists in the totally linear model without data uncertainty. However, I do not estimate this parameter, so as to keep the comparison of my results to Sargent, Williams, and Zha (2006) as clear as possible.

A.II  Extended Kalman Filter and Likelihood

Extended Kalman Filter

To implement the filter, we first approximate the state space model by first-order expansions. Let $ \mathbb{E}(\boldsymbol{\beta}_{t}\vert\mathcal{Y}_{s}) \equiv\boldsymbol{b}_{t\vert s}$ and $ \mathbb{V}ar\left( \boldsymbol{\beta} _{t}\vert\mathcal{Y}_{s}\right) \equiv\Sigma_{t\vert s}$ for $ \mathcal{Y}_{s} \equiv\sigma(\boldsymbol{y}_{s}, \boldsymbol{y}_{s-1},\ldots)$ . The expansion of (3.12) about $ (\boldsymbol{\beta}_{t} ,\boldsymbol{\varepsilon}_{t}) = (\boldsymbol{b}_{t\vert t-1},0)$ is exact:

$\displaystyle \boldsymbol{h}_{t}(\boldsymbol{\beta}_{t},\boldsymbol{\varepsilon}_{t}) = \boldsymbol{H} \boldsymbol{b}_{t\vert t-1} + \boldsymbol{H}(\boldsymbol{\beta} _{t}-\boldsymbol{b}_{t\vert t-1}) + \boldsymbol{\varepsilon}_{t}$ (A.1)

The expansion of (3.8) about $ (\boldsymbol{\beta} _{t-1},\boldsymbol{\eta}_{t}) = (\boldsymbol{b}_{t-1\vert t-1},0)$ is approximate:

$\displaystyle \boldsymbol{g}_{t}(\boldsymbol{\beta}_{t-1},\boldsymbol{\eta}_{t}) \approx\boldsymbol{g}_{t\vert t-1} + \boldsymbol{T}_{t\vert t-1}(\boldsymbol{\beta}_{t} - \boldsymbol{b}_{t-1\vert t-1}) + \boldsymbol{R}_{t\vert t-1} \boldsymbol{\eta} _{t}$ (A.2)

where

$\displaystyle \boldsymbol{g}_{t\vert t-1}$ $\displaystyle = \boldsymbol{g}_{t}(\boldsymbol{b}_{t-1\vert t-1},0)$
$\displaystyle \boldsymbol{T}_{t\vert t-1}$ $\displaystyle = \frac{\partial\boldsymbol{g}_{t} (\boldsymbol{\beta}_{t-1},\boldsymbol{\eta}_{t})}{\partial\boldsymbol{\beta }_{t-1}^{\prime} }\bigg\vert _{(\boldsymbol{b}_{t-1\vert t-1},0)}$
$\displaystyle \boldsymbol{R}_{t\vert t-1}$ $\displaystyle = \frac{\partial\boldsymbol{g}_{t} (\boldsymbol{\beta}_{t-1},\boldsymbol{\eta}_{t})}{\partial\boldsymbol{\eta }_{t}^{\prime} }\bigg\vert _{(\boldsymbol{b}_{t-1\vert t-1},0)}$

I motivate the derivation of optimal prediction and updating for the approximating system (A.1) and (A.2) by assuming Gaussian shocks, as in Howrey (1978), Watson and Engle (1983), and Harvey (1989).33 In this case, the relevant conditional expectations have the known forms given below. In particular, I assume

$\displaystyle \boldsymbol{\eta}_{t} \sim iid \: \mathcal{N}(\boldsymbol{0},\boldsymbol{Q}), \quad\quad\boldsymbol{\varepsilon}_{t} \sim iid \: \mathcal{N}(\boldsymbol{0} ,\boldsymbol{N}), \quad\quad\boldsymbol{\eta}_{t} \bot\boldsymbol{\varepsilon }_{\tau}, \forall t,\tau $

The Extended Kalman Filtering equations are

$\displaystyle \boldsymbol{b}_{t\vert t-1}$ $\displaystyle = \boldsymbol{g}_{t\vert t-1}$ (A.3)
$\displaystyle \boldsymbol{\Sigma}_{t\vert t-1}$ $\displaystyle = \boldsymbol{T}_{t\vert t-1} \boldsymbol{\Sigma }_{t-1\vert t-1} \boldsymbol{T}_{t\vert t-1}^{\prime}+ \boldsymbol{R}_{t\vert t-1} \boldsymbol{Q} \boldsymbol{R}_{t\vert t-1}^{\prime}$ (A.4)
$\displaystyle \boldsymbol{y}_{t\vert t-1}$ $\displaystyle = \boldsymbol{H} \boldsymbol{b}_{t\vert t-1}$ (A.5)
$\displaystyle \boldsymbol{F}_{t\vert t-1}$ $\displaystyle = \boldsymbol{H} \boldsymbol{\Sigma}_{t\vert t-1} \boldsymbol{H}^{\prime}+ \boldsymbol{N}$ (A.6)
$\displaystyle \boldsymbol{M}_{t\vert t-1}$ $\displaystyle = \boldsymbol{H} \boldsymbol{\Sigma}_{t\vert t-1}$ (A.7)
$\displaystyle \boldsymbol{K}_{t}$ $\displaystyle = \boldsymbol{M}_{t\vert t-1}^{\prime}\boldsymbol{F} _{t\vert t-1}^{-1}$ (A.8)
$\displaystyle \boldsymbol{\Sigma}_{t\vert t}$ $\displaystyle = \boldsymbol{\Sigma}_{t\vert t-1} - \boldsymbol{K} _{t} \boldsymbol{F}_{t\vert t-1} \boldsymbol{K}_{t}^{\prime}$ (A.9)
$\displaystyle \boldsymbol{b}_{t\vert t}$ $\displaystyle = \boldsymbol{b}_{t\vert t-1} + \boldsymbol{K}_{t} (\boldsymbol{y}_{t} - \boldsymbol{y}_{t\vert t-1} )$ (A.10)

Conditional on the data $ \mathcal{Y}_{T}$, parameters $ \{\boldsymbol{G} ,\boldsymbol{H},\boldsymbol{Q},\boldsymbol{N}\}$ , and initial conditions $ \{\boldsymbol{b}_{1\vert},\boldsymbol{\Sigma}_{1\vert}\}$, the sequences of left-hand-side variables in (A.3)-(A.10) are found by matrix multiplication.
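As an illustration, one prediction-update iteration of (A.3)-(A.10) can be sketched in Python. This is a minimal sketch, not the paper's code: it assumes the Jacobians $ \boldsymbol{T}_{t\vert t-1}$ and $ \boldsymbol{R}_{t\vert t-1}$ have already been evaluated at $ (\boldsymbol{b}_{t-1\vert t-1},0)$, and the function and argument names are mine.

```python
import numpy as np

def ekf_step(b_prev, Sigma_prev, y_t, g_t, T_t, R_t, H, Q, N):
    """One Extended Kalman Filter iteration, following (A.3)-(A.10).

    b_prev, Sigma_prev : filtered mean b_{t-1|t-1} and variance Sigma_{t-1|t-1}
    g_t  : callable returning g_t(b_{t-1|t-1}, 0), the state prediction
    T_t, R_t : Jacobians of g_t w.r.t. state and shock, evaluated at (b_prev, 0)
    H    : linear measurement matrix; Q, N : shock covariances
    """
    # Prediction (A.3)-(A.4)
    b_pred = g_t(b_prev)
    Sigma_pred = T_t @ Sigma_prev @ T_t.T + R_t @ Q @ R_t.T
    # Measurement prediction (A.5)-(A.7)
    y_pred = H @ b_pred
    F = H @ Sigma_pred @ H.T + N
    M = H @ Sigma_pred
    # Gain and update (A.8)-(A.10)
    K = M.T @ np.linalg.inv(F)
    Sigma_filt = Sigma_pred - K @ F @ K.T
    b_filt = b_pred + K @ (y_t - y_pred)
    return b_filt, Sigma_filt, y_pred, F
```

Iterating this function over $ t=1,\ldots,T$ produces the sequences described above.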

$ \boldsymbol{Q}$ is a block diagonal matrix composed of $ \boldsymbol{V}, (1/\zeta_{1})^{2}, (1/\zeta_{2})^{2}$, and $ \boldsymbol{\tilde{V}}$, with zeros elsewhere. $ \boldsymbol{N}$ has only one nonzero entry: $ N^{(2,2)} =(1/\zeta_{\varepsilon})^{2}$. $ \boldsymbol{b}_{1\vert}$ consists of the true values of the revisions and economic variables, along with the estimate of the time-varying parameters from the model without data uncertainty for the starting time period of the model including data uncertainty.

$ \boldsymbol{\Sigma}_{1\vert}$ is a block diagonal matrix composed of $ \boldsymbol{P}_{1\vert}$ and the remaining elements specifying the initial uncertainty over the revisions and unemployment values. The off-diagonal elements are set to zero and the relevant diagonal elements are set to $ \tilde{V}^{(3,3)}$.

Likelihood

The likelihood is

$\displaystyle \mathcal{L}(\mathcal{Y}_{T}\vert\boldsymbol{\Psi}) = \prod^{T}_{t=1} \left\vert \boldsymbol{F}_{t\vert t-1} \right\vert ^{-1/2} \exp\left\{ -\frac{1}{2} (\boldsymbol{y} _{t} - \boldsymbol{y}_{t\vert t-1})^{\prime}\boldsymbol{F} _{t\vert t-1}^{-1} (\boldsymbol{y}_{t} - \boldsymbol{y}_{t\vert t-1})\right\} $
where the $ \boldsymbol{F}_{t\vert t-1}$ come from (A.6).
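In practice the product is evaluated in logs for numerical stability, using the prediction errors and prediction-error variances produced by the filter. A minimal sketch (the function name is mine, and the constant $ (2\pi)^{-n/2}$ is dropped as in the expression above):

```python
import numpy as np

def log_likelihood(pred_errors, F_seq):
    """Gaussian log-likelihood (up to an additive constant) from filter output.

    pred_errors : sequence of one-step prediction errors y_t - y_{t|t-1}
    F_seq       : sequence of prediction-error covariances F_{t|t-1} from (A.6)
    """
    ll = 0.0
    for v, F in zip(pred_errors, F_seq):
        _, logdet = np.linalg.slogdet(F)
        ll += -0.5 * logdet - 0.5 * v @ np.linalg.solve(F, v)
    return ll
```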

A.III  Greenbook Forecasts

The main issue in comparing the model's unemployment forecasts to the Greenbook forecasts is the difference in observation frequency. The model forecasts the monthly unemployment rate one month ahead. The Greenbook forecasts quarterly unemployment rates and is released on an irregular schedule: Greenbook forecasts appear monthly through the 1970s, but into the 1980s and 1990s they appear roughly every other month. I take the following steps to make the comparison.

First, I form a quarterly unemployment rate series as the average of the unemployment rate over the three underlying months. It is against this series that both sets of forecasts produce forecast errors.

Second, I form a quarterly model-forecast series as the average of the step-ahead forecasts for the three underlying months. That is, the model's quarterly unemployment forecast for quarter $ q$ composed of months $ m_{1},m_{2},m_{3}$ is

$\displaystyle \frac{1}{3}(u_{m_{1}\vert m_{1}-1} + u_{m_{2}\vert m_{1}} + u_{m_{3}\vert m_{2}}) $

where $ u_{j\vert j-1}$ is the forecast made at time $ j-1$ pertaining to time $ j$.

Third, I form the quarterly Greenbook-forecast series as an average of all the forecasts made the month before or anytime during a quarter. That is, the Greenbook quarterly forecast for $ q$ composed of months $ m_{1},m_{2},m_{3}$ and immediately preceded by month $ m_{0}$ is

$\displaystyle \frac{1}{n_{obs}}(gb_{m_{0}} + gb_{m_{1}} + gb_{m_{2}} + gb_{m_{3}}) $

where $ gb_{j}$ is the Greenbook forecast for quarter $ q$ published in month $ j$. Not all four of these forecasts exist for every quarter; in that case only the observed forecasts are summed and $ n_{obs}$ is the number of forecasts observed.
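The second and third steps are simple averaging with an adjustment for unobserved releases. A minimal sketch (the function names and the NaN convention for marking months without a Greenbook release are mine):

```python
import numpy as np

def quarterly_model_forecast(u_step_ahead):
    """Average the model's three one-step-ahead monthly forecasts
    (u_{m1|m1-1}, u_{m2|m1}, u_{m3|m2}) into a quarterly forecast."""
    return sum(u_step_ahead) / 3.0

def quarterly_greenbook(gb_forecasts):
    """Average the Greenbook forecasts for a quarter over the months
    m0, m1, m2, m3 in which a forecast was actually published (np.nan
    marks no release), so n_obs adjusts to the number observed."""
    vals = [g for g in gb_forecasts if not np.isnan(g)]
    return sum(vals) / len(vals) if vals else np.nan
```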

The Diebold and Mariano (1995) statistic $ S_{1}$ compares forecast error series $ \{e_{it}\}$ and $ \{e_{jt}\}$:

$\displaystyle S_{1} = \frac{\overline{d}}{\sqrt{\frac{2\pi\widehat{f_{d}(0)}}{T}}} $

where

$\displaystyle \overline{d} = \frac{1}{T}\sum_{t=1}^{T} \left( e_{it}-e_{jt}\right) $

and I take $ \widehat{f_{d}(0)}$ to be the Andrews (1991) quadratic-spectral HAC estimator. The errors under consideration run from 1970 through 1995, so that $ T=104$. The forecast errors from the Greenbook and the two models are graphed in Figure A.1 (note the different vertical scales).
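The statistic can be computed as follows. This sketch substitutes a Bartlett (Newey-West) long-run variance estimate for the Andrews (1991) quadratic-spectral HAC estimator used in the paper, and the function and argument names are mine:

```python
import numpy as np

def dm_stat(e_i, e_j, L=4):
    """Diebold-Mariano equal-accuracy statistic S_1 for two forecast-error
    series, using a Bartlett-kernel estimate of 2*pi*f_d(0), i.e. the
    long-run variance of the loss differential d_t = e_it - e_jt."""
    d = np.asarray(e_i) - np.asarray(e_j)      # loss differential
    T = d.size
    d_bar = d.mean()
    u = d - d_bar
    lrv = u @ u / T                            # lag-0 autocovariance
    for k in range(1, L + 1):
        w = 1.0 - k / (L + 1.0)                # Bartlett weights
        lrv += 2.0 * w * (u[k:] @ u[:-k]) / T
    return d_bar / np.sqrt(lrv / T)
```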

Figure A.1  Forecast Errors Entering Equal-Accuracy Test

Figure A.1: Left panel - time series of forecast errors from the model without data uncertainty together with Greenbook unemployment rate forecast errors. The former attain values as large as 6 and -6, while the latter never exceed 1 in magnitude. Right panel - time series of forecast errors from the model including data uncertainty together with Greenbook unemployment rate forecast errors. The two series look to have the same magnitude.

Table A.1: Remaining Parameter Estimates - Panel 1: Model Without Data Uncertainty - $ \boldsymbol{P}_{1\vert}$: standard deviations and correlations.

        Column 1  Column 2  Column 3  Column 4  Column 5  Column 6
Row 1     0.3273    0.9875    0.9187   -0.9953   -0.6626   -0.9959
Row 2               0.4412    0.9237   -0.9978   -0.6624   -0.9972
Row 3                         0.0748   -0.9182   -0.9134   -0.9191
Row 4                                   0.7743    0.6545    0.9996
Row 5                                             0.0424    0.6515
Row 6                                                       0.3111

Table A.1: Remaining Parameter Estimates - Panel 2: Model including Data Uncertainty - $ \boldsymbol{P}_{1\vert}$: standard deviations and correlations.

        Column 1  Column 2  Column 3  Column 4  Column 5  Column 6
Row 1      1.573   -0.2426   -0.3637   -0.2986   -0.4698    0.3426
Row 2               0.4379   -0.6326   -0.8529    0.6846   -0.1535
Row 3                         0.7839    0.8357   -0.5750    0.0095
Row 4                                   0.4790   -0.4403   -0.0264
Row 5                                             0.5759   -0.5757
Row 6                                                       0.2482

Notes: The arrays are constructed as follows: the main diagonal contains the square roots of the main diagonal of $ \boldsymbol{P}_{1\vert}$; the off-diagonal elements are the correlations derived from $ \boldsymbol{P}_{1\vert}$. $ \boldsymbol{P}_{1\vert}$ is the Fed's initial step-ahead uncertainty over the initial Philips curve parameter estimate.

A.IV  MCMC Implementation and Robustness

Priors

The prior for $ \boldsymbol{\Psi}$ is multivariate normal with a non-zero mean and a diagonal covariance matrix - so equivalently, the priors for each parameter are independent normals. The exact specifications are listed below where $ \boldsymbol{\varphi} \equiv\left( \text{vech}\left( \text{Chol} \left( \boldsymbol{V}\right) \right) ^{\prime},\text{vech}\left( \text{Chol}\left( \boldsymbol{P}_{1\vert}\right) \right) \right) ^{\prime}$ following the notation of Sargent, Williams, and Zha (2006):

$ \boldsymbol{\zeta_{1}}$: $ \mathcal{N}\left( 5,\frac{5^{2}}{3^{2} }\right) $.
$ \boldsymbol{\varphi}$: Follows Sargent, Williams, and Zha (2006). For each element on the diagonal of Chol$ \left( \boldsymbol{V}\right) $ or Chol$ \left( \boldsymbol{P}_{1\vert}\right) $ the prior is $ \mathcal{N} \left( 0,5^{2}\times0.5\right) $; for the elements off the diagonal, it is $ \mathcal{N}\left( 0,2.5^{2}\times0.5\right) $.
Convergence of the MCMC

To address the convergence of the MCMC algorithm to its posterior distribution, I computed the number of iterations required to estimate the 0.025 quantile with a precision of 0.02 and probability level of 0.950 using the method of Raftery and Lewis (1992). For each chain (with different initial conditions) the maximum of these across $ \boldsymbol{\Psi}$ was below the $ 3.5\times10^{5}$ iterations taken from each chain, suggesting that mixing the two chains (after burn-in) yields satisfactory precision.

Metropolis Algorithm

An important part of the MCMC algorithm sampling from the posterior in a reasonable number of iterations is the covariance matrix of the proposal random step in the Metropolis algorithm. The Metropolis algorithm is

  1. Given $ \boldsymbol{\Psi}^{\text{previous}}$, propose a new value
    $\displaystyle \boldsymbol{\Psi}^{\text{proposal}} = \boldsymbol{\Psi}^{\text{previous}} + \boldsymbol{\xi} $

    where $ \boldsymbol{\xi}$ is normal with mean zero and covariance matrix $ c\boldsymbol{\Sigma}_{\xi}$

  2. Compute
    $\displaystyle q = \min\left\{ \frac{p\left( \boldsymbol{\Psi}^{\text{proposal}} \vert\mathcal{Y}_{T}\right) }{p\left( \boldsymbol{\Psi}^{\text{previous} }\vert\mathcal{Y}_{T}\right) },1 \right\} $
  3. Randomly draw $ w \sim U(0,1)$
  4. If $ w\leq q$, accept $ \boldsymbol{\Psi}^{\text{proposal}}$ as current draw; otherwise, set $ \boldsymbol{\Psi}^{\text{previous}}$ as the current draw
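The steps above can be sketched as a minimal random-walk Metropolis sampler. The acceptance test is done in logs, which is equivalent to steps 2-4; the function and argument names are mine:

```python
import numpy as np

def metropolis(log_post, psi0, Sigma_xi, c=1.0, n_iter=10_000, seed=0):
    """Random-walk Metropolis sampler following steps 1-4 above.

    log_post : function returning log p(Psi | Y_T) up to a constant
    Sigma_xi : proposal covariance; c scales the step size
    """
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(c * Sigma_xi)
    psi = np.asarray(psi0, dtype=float)
    lp = log_post(psi)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = psi + chol @ rng.standard_normal(psi.size)  # step 1
        lp_prop = log_post(proposal)
        # steps 2-4 in logs: accept iff log(w) <= log posterior ratio
        if np.log(rng.uniform()) <= lp_prop - lp:
            psi, lp = proposal, lp_prop
            accepted += 1
        draws.append(psi.copy())
    return np.array(draws), accepted / n_iter
```

Run on a standard normal log-posterior, the chain's draws should have mean near 0 and standard deviation near 1.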

Given the manner in which all parameters affect the optimal policy, I arrived at the proposal covariance matrix $ \boldsymbol{\Sigma}_{\xi}$ as follows. The MCMC was started using the covariance matrix for $ \boldsymbol{\varphi}$ numerically solved for as described in Appendix D of Sargent, Williams, and Zha (2006), together with the prior covariance terms given above for all other elements of $ \boldsymbol{\Psi}$. For tens of thousands of iterations based on one initial condition, I considered only the elements of the MCMC chain at which a proposal had been accepted. From these chain elements I calculated the sample covariance matrix of the successful proposal shocks and set $ \boldsymbol{\Sigma}_{\xi}$ equal to it. I tried different initial conditions and took the weighted average of the Cholesky factors of the resulting sample covariance matrices. The tuning parameter $ c$ was adjusted to achieve an acceptance rate of around 25-35% during the first 20,000 iterations; after that it was left unadjusted, since continual chain-dependent adjustment of the Metropolis step size can negate the ergodicity upon which MCMC methods are based (see Robert and Casella (2004)).

Remaining $ y_{t\vert t-1}$ forecasts

Figure A.2 shows the remaining measurement equation predictions: current, 1-lag, 2-lag, and 72-lag unemployment report forecasts.

A.IV.1  Using Current Vintage CPI Inflation and Unemployment Rate Data

The plots in Figure A.3 are produced using the current vintage of CPI inflation data (made available by those authors). The $ \boldsymbol{V}$ and $ \boldsymbol{P}_{1\vert}$ estimates are close to those reported in Table 1 for the model without data uncertainty while the estimate of $ \zeta_{1}$ is about one-half as large - they are available upon request.

A.IV.2  Using Preliminary Inflation and Unemployment Rate Data

The plots of Figure A.4 are produced by instead using preliminary CPI inflation and unemployment rate data. The $ \boldsymbol{V}$ and $ \boldsymbol{P}_{1\vert}$ estimates are close to those reported in Table 1 for the model without data uncertainty while the estimate of $ \zeta_{1}$ is about one-third as large - they are available upon request.

Figure A.2  Fed Unemployment Forecasts

Figure A.2: Four panels show, clockwise from top right, filtered forecasts of current unemployment rate, first lag of unemployment rate, 72nd lag of unemployment rate (when unemployment is assumed to be observed without error), and second lag of unemployment rate. There is very little discrepancy between any of these series.

Notes: nowcasts and one-step, two-step, and 72-step past-casts. NBER recessions are shaded. These figures are all on the same scale, but not on the scale of Figures 2 and 4.

Figure A.3  Fed Inflation Control and Unemployment Forecasts, Current Vintage CPI.

Figure A.3: The left and right panel show figures almost identical to Figures 1 and 2, respectively.

Notes: using current vintage seasonally-adjusted CPI inflation data. NBER recessions are shaded.

Figure A.4  Fed Inflation Control and Unemployment Forecasts, Preliminary CPI

Figure A.4: The left and right panel show figures very similar to Figures 1 and 2, respectively, with the following differences. The left panel shows more discrepancy between actual inflation and the model-predicted inflation than is present in Figure 1. The right panel shows slightly smaller forecast errors than seen in Figure 2.

Notes: using preliminary CPI inflation data (non-seasonally-adjusted). NBER recessions are shaded.

Figure A.5  Fed Inflation Control and Unemployment Forecasts, Modified Likelihood

Figure A.5: The left panel shows model-predicted inflation versus actual inflation. The two series bear very little resemblance to one another. Around 1973, the model-predicted inflation becomes negative and has high frequency movements, very different from the smoothly rising actual inflation. The right panel shows model-forecasts of the unemployment rate that are similar to the actual unemployment rate.

Notes: using the model without data uncertainty and a modified likelihood placing more weight on the unemployment rate forecasts. NBER recessions are shaded.

A.IV.3  Modified Likelihood

It has been suggested by readers of earlier drafts of this paper that the results may stem not so much from the data revisions as from the modified likelihood function that takes account of revisions. This modification has the unemployment forecasts enter the likelihood, which might by itself do most of the "smoothing" that is evident.

Doing this, we indeed see smoother Fed unemployment forecasts in the right panel of Figure A.5. As expected, the likelihood penalizes unemployment forecast errors and delivers far smaller ones. However, to accomplish this the time-varying Philips curve estimates are such that inflation is far from target far too often, as the left panel shows.

Figure A.6  Maximum Modulus Eigenvalue, Time-Varying Parameter Model

Figure A.6: Shows a line that is rather flat around 0.8 for the whole time period, always lying below the plotted line at 1.

Notes: maximum modulus of the eigenvalues of the characteristic polynomial of (3.2) coming from filtered time-varying parameter estimates in Section 3.3. NBER recessions are shaded.

A.V  Autoregressive Stability

One issue that has been raised, for instance by Sims (2007), with the estimated beliefs coming from Sargent, Williams, and Zha (2006) is that they imply autoregressive instability for more than a dozen months around 1973. The model including data uncertainty addresses this "problem" - see Figure A.6. This section explains what exactly this issue is and also argues that it is not problematic for the purposes of using the Philips curve as a state transition equation.

Consider the autoregressive structure of the Philips curve the government is estimating over time. The estimates $ \boldsymbol{a}_{t-1\vert t-1}$ determine the perceived transition law for the Fed's optimal control problem.

Strictly speaking, unemployment is not an explosive process: it must take on values in [0,100]. Nonetheless, its high persistence, especially during the 1970s and 1980s, certainly makes it appear not "very" covariance-stationary. Consider estimating a bivariate VAR on the final data ($ f$ periods after the preliminary data) over rolling windows of varying sizes; Figure A.7 shows this for windows of 2, 3, 4, or 5 years.34 If the Fed were concerned with breaks, a rolling-window estimation procedure would be a straightforward way of picking them up. Notice that the eigenvalue condition is close to the unstable region most of the time. Hence, without positing a more sophisticated mechanism of Fed belief formation, we see that simple estimation methods would also have given evidence of unemployment instability.
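The stability check behind Figures A.6 and A.7 is the maximum eigenvalue modulus of the VAR's companion matrix. A minimal sketch, assuming the lag coefficient matrices have already been estimated for each rolling window (the function name is mine):

```python
import numpy as np

def max_modulus(A_list):
    """Largest eigenvalue modulus of a VAR's companion matrix.

    A_list holds the lag coefficient matrices A_1, ..., A_p of an
    n-variable VAR; the process is explosive when the result exceeds 1.
    """
    n, p = A_list[0].shape[0], len(A_list)
    companion = np.zeros((n * p, n * p))
    companion[:n, :] = np.hstack(A_list)       # top block row: [A_1 ... A_p]
    companion[n:, :-n] = np.eye(n * (p - 1))   # identity blocks below
    return np.abs(np.linalg.eigvals(companion)).max()
```

For a univariate AR(1) with coefficient 0.8 this returns 0.8; applied to each rolling-window VAR estimate, it traces out the lines plotted in Figure A.7.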

Figure A.7  Maximum Modulus Eigenvalue, Bivariate VAR

Figure A.7: Each of the four panels correspond to a rolling VAR using (clockwise from top right) 24, 36, 60, and 48 months of data. In all cases, the plotted line goes above and below 1 over the sample.

Notes: maximum modulus of the eigenvalues of the characteristic polynomial of (3.2) coming from the bivariate VAR, aligned so that the maximum modulus appears at the time when it could be estimated with data at least $ f$ months old. NBER recessions are shaded.

Nonetheless, it is not clear that autoregressive instability would have led the Fed to reject (3.2) as unemployment's dynamic structure for the purposes of setting optimal policy, given the accurate one-step-ahead unemployment predictions it yields (in the case where data uncertainty is acknowledged). To the optimal controller, what matters about forecasts is their accuracy, not the description of the world they engender. The Fed had little reason to adjust its Philips curve just on the basis of these forecast errors. So the Fed may have done well to forecast unemployment using a rule which implied unemployment was not stationary, if the rule performed better.35


Footnotes

1.  First version: February 2007. I thank Jim Hamilton for his insightful advice. Thanks also to Alina Barnett, Gray Calhoun, Martin Ellison, Margie Flavin, Nir Jaimovich, Garey Ramey, Valerie Ramey, Chris Sims, Allan Timmermann, Harald Uhlig, Chris Wignall, and Tao Zha for helpful discussions; to seminar participants at UCSD, Georgetown, HEC Montreal, Texas A&M, Penn State, UCSC, Cambridge University Learning and Macroeconomic Policy Conference, Philadelphia Fed, Kansas City Fed, Atlanta Fed, and the Federal Reserve Board; and to Tao Zha for kindly providing computer code and data. This paper was previously circulated as part of the deprecated "When Data Revisions Matter to Macroeconomics." Division of International Finance, mail to: [email protected]. This paper reflects the views and opinions of the author and does not reflect the views or opinions of the Federal Reserve Board, the Federal Reserve System, or their respective staffs. Return to text

2.  Orphanides (2006) and Sims (2006) also argue that policy is sensitive to model uncertainty, and Nason and Smith (forthcoming) finds instability in the Philips curve relationship since the 1950s. Return to text

3.  Runkle (1998) made a similar point around the same time. Return to text

4.  Model uncertainty in this case is represented by the variance of shocks to time-varying parameters. Return to text

5.  Note that my model reduces to Sargent, Williams, and Zha (2006)'s model when one assumes data is seen without error and, therefore, later vintages of data are identical to the first report. Return to text

6.  In fact, data uncertainty was recognized by Burns and Mitchell, who revised their business cycle indicators as data revisions came in, and considered many macroeconomic variables for that reason. Return to text

7.  In contrast to Carboni and Ellison (2008), the model in this paper produces unemployment forecasts that are statistically indistinguishable from the Greenbook forecasts without using those Greenbook forecasts as data. Return to text

8.  This follows from the assumptions: the agent has an economic loss function, the agent's prediction is rational with respect to that loss function, and the loss function is minimized when the prediction error is zero. See Elliott, Komunjer, and Timmermann (2005) for more extensive discussion. Return to text

9.  We could instead label $ d/x_{0}$ a parameter prediction, $ \epsilon_{d}/x_{0}$ a parameter shock, and $ x_{0}$ be some data measured without error that is multiplied by $ (d/x_{0}+\epsilon _{d}/x_{0})$, and the idea here would be the same. Likewise if $ d=0$ and $ \epsilon_{d}$ is simply an additive error term in which case it might be natural to assume $ \rho_{bd}=0$Return to text

10.  For instance: if all correlations are zero, it is positive; if $ \rho_{bc} =\rho_{bd}=\rho_{cd}=0$, $ \rho_{xz}=\rho_{xy}=\rho_{yz}=-1$, and $ b=c=n=p=1$, it is negative. Return to text

11.  One might suppose that an Orphanides (2001)-type critique might matter here: this critique would say that the model should use preliminary inflation and unemployment rate data, but would not allow the Fed's filtering problem to recognize the fact that data is revised. In fact, this does not substantially change the problems I discuss below - see Appendix A.IV.2. Return to text

12.  As will be apparent, the key aspect of data uncertainty in this model is that the data is observed with error: one could then disregard data revisions and instead simply assume data measurement error. I explicitly consider data revisions for two reasons. One, data revisions are evidence that motivate the idea that data is measured with error; two, data revisions have demonstrable statistical properties that discipline data uncertainty's role. This second point is made clearer in Section 4.2. Return to text

13.  Some in the Federal Reserve system have noted that policy-makers hold more uncertainty over the inflation rate than the unemployment rate. I am extremely sympathetic to this criticism, but continue to ignore inflation rate uncertainty for the following reason: I only have real-time data on the CPI, which is not revised. Some part of policy-makers' inflation rate uncertainty has to do with the theoretical issues concerning inflation, including the issue of Core versus non-Core measures, which are distinct from the data uncertainty I focus on here. Nevertheless, it should be noted that PCE and GDP deflator inflation are subject to revision. Return to text

14.  The results presented below are not sensitive to this. In fact, changing $ f$ to 36, which accords with the horizon of historical values included in the Greenbook, delivers similar results while increasing $ f$ does very little. It is only a small $ f$ such as 12 or below - where most revisions are ignored and so data is not very uncertain - that affects the results, pushing them back towards the model that ignores data uncertainty altogether. It is unimportant to the results whether $ u_{t}^{f}=u_{t}$ or $ u_{t}^{f} =u_{t}+error$Return to text

15.  The first revision allows for inference on the preliminary report of unemployment. The middle two revisions allow for inference on the latent value of the two lags of unemployment that enter the policy rule emerging from the dynamic programming problem. The final possible revision, which is the largest, acknowledges that data revision happens even after two months. Return to text

16.  A key point that $ \boldsymbol{\tilde{V}}$ and $ \zeta_{\varepsilon}$ are not estimated is addressed below in Section 4.2. Return to text

17.  The parameter $ \zeta_{2}$ is unidentified in the model without data uncertainty and so Sargent, Williams, and Zha (2006) normalize $ \zeta_{2}$ such that $ \frac{1}{\zeta_{2}}$ is one-tenth the standard deviation of the shock in an additional equation for the "true DGP" of the unemployment rate. As Carboni and Ellison (2008) discuss, apart from this choice the "true DGP" equation does not influence the belief-generating mechanism I have described; thus, it is not part of my model. In the interest of making a clear comparison to its results, I take the value of $ \zeta_{2}$ from Sargent, Williams, and Zha (2006). Return to text

18.  I must use an accept/reject simulation technique because, due to the effects of $ \boldsymbol{\Psi}$ on the whole sequence of forecasts, the form of (3.13) is not known. Further details are in Appendix A.IV. Return to text

19.  The inflation rate data, following Sargent, Williams, and Zha (2006), is the annual rate of change of the seasonally-adjusted Personal Consumption Expenditure chain price index from the BEA, as reported in December 2003. I use PCE inflation here in the interest of making a direct comparison although I must use real-time CPI inflation for the model including data uncertainty. Estimates of the model without data uncertainty using current vintage non-seasonally-adjusted or seasonally-adjusted CPI inflation are very similar and so this does not drive the difference between the models' estimates; see Appendix A.IV.1. Return to text

20.  The results are the same if the data runs through December 2003; I end the sample at 1996 in order to ensure comparability between all the models I have estimated for robustness (one of which assumes final data is seen 10 years after the fact) and because I consider the model to be descriptive of Fed beliefs only through the 1980s. Return to text

21.  Sargent, Williams, and Zha (2006)'s results come from a sequence of 50,000 draws with an unspecified burn-in interval. Return to text

22.  They also accord with results in Carboni and Ellison (2008) who also estimate this model. Appendix A.IV contains the estimates of $ \boldsymbol{P}_{1\vert}$ - this parameter is far less important than $ \boldsymbol{V}$ since the influence of $ \boldsymbol{P}_{1\vert}$ on the gain (which imputes shocks from forecast errors) dies out within a dozen or so periods, while $ \boldsymbol{V}$ dictates the steady-state gain. Return to text

23.  The information displayed in Figure 2 is implied by Sargent, Williams, and Zha (2006)'s Figure 7. The difference is that in their Figure 7 the inflation control is replaced by the "Ramsey" inflation rate policy of 2, the graph is shifted downward by the estimate of the true natural rate of unemployment, and the parameter estimate used for time $ t$ is the updated estimate $ \boldsymbol{a} _{t\vert t}$ (which is identically $ \boldsymbol{a}_{t+1\vert t}$). Return to text

24.  Appendix A.III discusses these forecasts and provides more details on the test statistic. Return to text

25.  I am ignoring the role of the additive shock $ \omega_{2t}$ because its variance is set so small and, in any case, is identical across the models. Return to text

26.  The matrix $ \boldsymbol{\tilde{V}}$ is diagonal with main diagonal (0.0007, 0.0004, 0.0121)' and $ \zeta_{\varepsilon}$ is 45.273. Return to text

27.  I use non-seasonally-adjusted CPI inflation because real-time PCE inflation is not available for the time span under consideration. If we assume current vintage PCE inflation data was known in real-time and underwent no revisions, the results presented here are virtually identical. Return to text

28.  The other measurement equation forecasts are pictured in Appendix A.IV. Return to text

29.  It might be suggested that one could adopt part of the model without data uncertainty ad hoc and include unemployment forecast errors as part of a modified version of an estimation procedure resembling the model with data uncertainty. Doing so might by itself lead to smoother TVP estimates and better fitting inflation and unemployment forecasts. Looking into this, I found that while the unemployment forecasts errors do improve, the fit of the model-predicted inflation control deteriorates greatly; see Appendix A.IV.3. This is somewhat similar to, but without the success of, what Carboni and Ellison (2008) does. Return to text

30.  The estimated trade-off evolution in Carboni and Ellison (2007) and Carboni and Ellison (2008) is qualitatively similar, the latter with a much smaller magnitude but a very similar shape. Return to text

31.  Over the sample period there is a strong correlation of $ -0.75$ between the low-frequency fluctuations of the trade-off and inflation. Low-frequency fluctuations are defined as those with a period between 1 and 10 years, as estimated using the bandpass filter. For the model without data uncertainty this correlation is $ -0.14$. Return to text

32.  The value of log-likelihood (multiplied by the prior) for my reproduction of Sargent, Williams, and Zha (2006) is 555.0586, comparable to the 564.92 they find. Return to text

33.  Technically, I must assume that the vector $ \boldsymbol{\eta}_{t}$ appearing in (A.2) is Gaussian; assuming a Gaussian $ \boldsymbol{\eta}_{t}$ for the general nonlinear case (3.8) does not assure that the shock in the first-order expansion would be Gaussian. Return to text

34.  The window sizes are chosen with Friedman (1968) page 11 in mind: "[T]here is always a temporary trade-off between inflation and unemployment...I can at most venture a personal judgment, based on some examination of the historical evidence, that the initial effects of a higher and unanticipated rate of inflation last for something like two to five years." Return to text

35.  For a related point, see the modeling motivation given in Primiceri (2005). Return to text



Last update: January 14, 2009