The Federal Reserve Board eagle logo links to home page

On the Application of Automatic Differentiation to the Likelihood Function for Dynamic General Equilibrium Models*

Houtan Bastani and Luca Guerrieri**

NOTE: International Finance Discussion Papers are preliminary materials circulated to stimulate discussion and critical comment. References in publications to International Finance Discussion Papers (other than an acknowledgment that the writer has had access to unpublished material) should be cleared with the author or authors. Recent IFDPs are available on the Web at http://www.federalreserve.gov/pubs/ifdp/. This paper can be downloaded without charge from the Social Science Research Network electronic library at http://www.ssrn.com/.


Abstract:

A key application of automatic differentiation (AD) is to facilitate numerical optimization problems. Such problems are at the core of many estimation techniques, including maximum likelihood. As one of the first applications of AD in the field of economics, we used Tapenade to construct derivatives for the likelihood function of any linear or linearized general equilibrium model solved under the assumption of rational expectations. We view our main contribution as providing an important check on finite-difference (FD) numerical derivatives. We also construct Monte Carlo experiments to compare maximum-likelihood estimates obtained with and without the aid of automatic derivatives. We find that the convergence rate of our optimization algorithm can increase substantially when we use AD derivatives.

Keywords: General equilibrium models, Kalman filter, maximum likelihood

JEL classification: C52, C61, C63


1  Introduction

While applications of automatic differentiation have spread across many different disciplines, they have remained less common in the field of economics.1 Based on the successes reported in facilitating optimization exercises in other disciplines, we deployed AD techniques to assist with the estimation of dynamic general equilibrium (DGE) models. These models are becoming a standard tool that central banks use to inform monetary policy decisions. However, the estimation of these models is complicated by the many parameters of interest. Thus, typically, the optimization method of choice makes use of derivatives. However, the complexity of the models does not afford a closed-form representation for the likelihood function. Finite-difference methods have been the standard practice to obtain numerical derivatives in this context. Using Tapenade (see [Hascoët 2004], [Hascoët et al. 2005], [Hascoët et al. 2005]), we constructed derivatives for a general formulation of the likelihood function, which takes as essential input the linear representation of the model's conditions for an equilibrium.

The programming task was complicated by the fact that the numerical solution of a DGE model under rational expectations relies on fairly complex algorithms.2 We use Lapack routines for the implementation of the solution algorithm. In turn, our top Lapack routines make use of a large number of Blas routines. A byproduct of our project has been the implementation of numerous AD derivatives of the double precision subset of Blas routines. Table 1 lists the routines involved.

In the remainder of this paper, Section 2 lays out the general structure of a DGE model and describes our approach to setting up the model's likelihood function. Section 3 outlines the step we took to implement the AD derivatives and how we built confidence in our results. Section 4 gives an example of a DGE model that we used to construct Monte Carlo experiments to compare maximum-likelihood estimates that rely, alternatively, on AD or FD derivatives, reported in Section 5. Section 6 concludes.

2  General Model Description and Estimation Strategy

The class of DGE models that is the focus of this paper take the general form:

\begin{displaymath} H(\theta) \left(\begin{array}{c} E_t X_{t+1} \ X_t \ X_{t-1}\end{array}\right) = 0. \end{displaymath} (1)

In the equation above, $H$ is a matrix whose entries are a function of the structural parameter vector $\theta$, while $X_t$ is a vector of the model's variables (including the stochastic innovations to the shock processes). The term $E_t$ is an expectation operator, conditional on information available at time $t$ and the model's structure as in equation 1. Notice that allowing for only one lead and one lag of $X_t$ in the above equation implies no loss of generality.

The model's solution takes the form:

\begin{displaymath} X_t = S(H(\theta)) X_{t-1}, \end{displaymath} (2)

thus, given knowledge of the model's variables at time $t-1$, a solution determines the model's variables at time $t$ uniquely. The entries of the matrix $S$ are themselves functions of the matrix $H$ and, in turn, of the parameter vector $\theta$.

Partitioning $X_t$ such that $X_t = \left(\begin{array}{c} x_t \ \epsilon_t \end{array} \right)$ , where $\epsilon_t$ is a collection of all the innovations to the exogenous shock processes (and possibly rearranging the system) it is convenient to rewrite the model's solution as

\begin{displaymath} x_t = A(H(\theta)) x_{t-1} + B(H(\theta)) \epsilon_t. \end{displaymath} (3)

Again, the entries in the matrices $A$ and $B$ are fundamentally functions of the parameter vector $\theta$. Given a subset of the entries in $x_t$ as observable, call these entries $y_t$, the state-space representation of the system takes the form:
$\displaystyle x_t = A(H(\theta)) x_{t-1} + B(H(\theta)) \epsilon_t$     (4)
$\displaystyle y_t = C x_t$     (5)

Without loss of generality, we restrict the matrix $C$ to be a selector matrix, which picks the relevant entries of $x_t$. Using the Kalman Filter recursions, we can express the likelihood function for the model as:
\begin{displaymath} L = L(A(\theta), B(\theta),C,y_{t-h},...,y_{t}) \end{displaymath} (6)

where $y_{t-h}$ and $y_t$ are respectively the first and last observation points available.

The routines we developed, given an input $H(\theta)$, produce the derivative of the likelihood function with respect to the structural parameters, $\frac{\partial L}{\partial \theta}$, and as an intermediate product, $\frac{\partial A}{\partial \theta}$, the derivative of the model's reduced-form parameters with respect to the structural parameters.

3  Implementing AD Derivatives

To obtain AD derivatives of the likelihood function, we used Tapenade in tangent mode. Tapenade required limited manual intervention on our part. This is remarkable given that the code to be differentiated consisted of approximately 80 subroutines for a total of over 17,000 lines of code. The derivative-augmented code produced by Tapenade covers approximately 25,000 lines (the original code has a size of 554 kilobytes and the differentiated code is 784 kilobytes in size).

Recoding became necessary when the Lapack or Blas routines we called did not explicitly declare the sizes of the arguments in the calling structure and instead allowed for arbitrary sizing (possibly exceeding the storage requirements). A more limited recoding was required when we encountered the use of ``GOTO'' statements in the Fortran 77 code of the Blas library, which Tapenade could not process.

More substantially, two of the decompositions involved in the model solution, the real Schur decomposition and the singular-value decomposition, are not always unique. Parametric restrictions of the models we tested could ensure uniqueness of these decompositions. In those cases, we verified that AD derivatives obtained through Tapenade satisfied some basic properties of the decompositions that we derived analytically, but our test failed whenever we relaxed those parametric restrictions to allow for more general model specifications

In particular, we relied on the Lapack routine DGEESX to implement the real Schur decomposition. For a given real matrix $E$, this decomposition produces a unitary matrix $X$, such that $T=X^H E X$ is quasitriangular. Given $\frac{\partial{E}}{\partial\theta}$, we need that the derivative $\frac{\partial X}{\partial \theta}$ satisfy $\frac{\partial T}{\partial \theta} =\frac{\partial X^H}{\partial \theta} E X + X^H\frac{\partial{E}}{\partial\theta}X + X^HE\frac{\partial X}{\partial \theta}$ , where $\frac{\partial T}{\partial \theta}$ is itself quasitriangular. This property failed to be met by our AD derivatives when our choice of E implied a non-unique Schur decomposition. To obviate this problem, we substituted the AD derivative for the DGEESX routine with the analytical derivative of the Schur decomposition as outlined in [Anderson 1987].

Similarly, the singular value decomposition, implemented through the DGESVD routine in the Lapack library, given a real matrix $E$, produces unitary matrices $U$ and $V$ and a diagonal matrix $D$, such that $E = UDV^T$. Given $\frac{\partial E}{\partial \theta}$, it can be shown that $U^T\frac{\partial E}{\partial \theta}V = U^T \frac{\partial U }{\partial \theta }D + \frac{\partial D}{\partial \theta} + D\frac{\partial V}{\partial \theta}V$ , where $\frac{\partial D}{\theta}$ is diagonal and $ U^T \frac{\partial U }{\partial \theta }$ and $\frac{\partial V}{\partial \theta}V$ are both antisymmetric. Our AD derivative of the routine DGESVD failed to satisfy this property when the matrix $E$ had repeated singular values (making the decomposition non-unique). We substituted our AD derivative with the analytical derivative derived by [Papadopoulo and Lourakis 2000].

To test the derivative of the likelihood function, we used a two-pronged approach. For special cases of our model that could be simplified enough as to yield a closed-form analytical solution, we computed analytical derivatives and found them in agreement with our AD derivatives, accounting for numerical imprecision. To test the derivatives for more complex models that we could not solve analytically, we relied on comparisons with centered FD derivatives. Generally with a step size of $10^{-8}$ we found broad agreement between our AD derivatives and FD derivatives. Plotting AD and FD side by side, and varying the value at which the derivatives were evaluated, we noticed that the FD derivatives appeared noisier than the AD derivatives. We quantify the ``noise'' we observed in an example below.

4  Example Application

As a first application of our derivatives, we consider a real business cycle model augmented with sticky prices and sticky wages, as well as several real rigidities, following the work of [Smets and Wouters 2003]. Below, we give a brief description of the optimization problems solved by agents in the model, which allows us to interpret the parameters estimated in the Monte Carlo exercises that follow.

There is a continuum of households of measure 1, indexed by $h$, whose objective is to maximize a discounted stream of utility according to the following setup:

\begin{eqnarray*} && \max_{[C_t(h), W_t(h), I_t(h), K_{t+1}(h), B_{t+1}(h)]} E_t \sum^{\infty}_{j=0} \beta^j \left( U(C_{t+j}(h),C_{t+j-1}(h)) \right. \ && \left. + V(L_{t+j}(h)) \right) + \beta^j\lambda_{t+j}(h) \left[ \Pi_t(h) + T_{t+j}(h) + (1-\tau_{Lt}) W_{t+j}(h)L_{t+j}(h) \right. \ && \left. +(1-\tau_{Kt})R_{kt+j}K_{t+j}(h) -\frac{1}{2}\psi_I P_{t+j} \frac{ \left( I_{t+j}(h) - I_{t+j-1}(h) \right)^2}{I_{t+j-1}(h)} \right. \ && \left. -P_{t+j} C_{t+j}(h) - P_{t+j} I_{t+j}(h) - \int_s \psi_{t+j+1, t+j}B_{t+j+1}(h) + B_{t+j}(h) \right] \ && +\beta^j Q_{t+j}(h) \left[ (1-\delta) K_{t+j}(h) + I_{t+j} (h) -K_{t+j+1}(h) \right]. \end{eqnarray*}

The utility function depends on consumption $C_t(h)$ and labor supplied $L_t(h)$. The parameter $\beta$ is a discount factor for future utility. Households choose streams for consumption $C_t(h)$, wages $W_t(h)$, investment $I_t(h)$, capital $K_{t+1}(h)$ and bond holdings $B_{t+1}(h)$, subject to the budget constraint, whose Lagrangian multiplier is $\lambda_t(h)$, capital accumulation equation, whose Lagrangian multiplier is $Q_t(h)$, and the labor demand schedule $L_t(h)=L_t \left( \frac{W_t(h)}{W_t} \right)^{-\frac{1+\theta_w}{\theta_w}}$ . Households rent to firms (described below) both capital, at the rental rate $R_{Kt}$, and labor at the rental rate $W_t(h)$, subject to labor taxes at the rate $\tau_{Lt}$ and to capital taxes at the rate $\tau_{Kt}$. There are quadratic adjustment costs for investment, governed by the parameter $\psi_I$, and capital depreciates at a per-period rate $\delta$. We introduce Calvo-type contracts for wages following [Erceg et al. 2000]. According to these contracts, the ability to reset wages for a household $h$ in any period $t$ follows a Poisson distribution. A household is allowed to reset wages with probability $1-\xi_w$. If the wage is not reset, it is updated according to $W_{t+j}(h)=W_t(h) \pi^j$ (where $\pi$ is the steady-state inflation rate), as in [Yun 1996]. Finally, $T_t(h)$ and $\Pi_t(h)$ represent, respectively, net lump-sum transfers from the government and an aliquot share of the profits of firms.

In the production sector, we have a standard Dixit-Stiglitz setup with nominal rigidities. Competitive final producers aggregate intermediate products for resale. Their production function is

$\displaystyle Y_t= \left[ \int_0^1 Y_t(f)^{\frac{1}{1+\theta_p}} \right]^{1+\theta_p}$     (7)

and from the zero profit condition the price for final goods is

$\displaystyle P_{t}= \left[ \int_0^1 P_t(f)^{-\frac{1}{\theta_p}} \right] ^{-\theta_p}.$     (8)

where $P_t(f)$ is the price for a unit of output for the intermediate firm $f$.

Intermediate firms are monopolistically competitive. There is complete mobility of capital and labor across firms. Their production technology is given by

\begin{displaymath} Y_{t}(f) = A_t K_{t}(f)^{\alpha}L^d_t(f)^{1-\alpha}. \end{displaymath} (9)

Intermediate firms take input prices as given. $L^d_{t}(f)$, which enters the intermediate firms' production function, is an aggregate over the skills supplied by each household, and takes the form $L^d_{t}(f)=\left(\int_h L_t(h)^{\frac{1}{1+\theta_w}}\right)^{1+\theta_w}$ . $A_t$ is the technology level and evolves according to an autoregressive (AR) process:

\begin{displaymath} A_t - A= \rho_A \left(A_{t-1}-A \right) + \epsilon_{At}, \end{displaymath} (10)

where $\epsilon_{At}$ is an $iid$ innovation with standard deviation $\sigma_A$, and $A$ is the steady-state level for technology. Intermediate firms set their prices $P_t(f)$ according to Calvo-type contracts with reset probabilities $1-\psi_P$. When prices are not reset, they are updated according to $P_{t+j}{t}(f)=P_t(f) \pi^j$.

Finally, the government sector sets a nominal risk-free interest rate according to the reaction function:

$\displaystyle i_t = {\frac{\pi}{\beta} -1} + \gamma_\pi ( \pi_t - \pi) + \gamma_Y (log(Y_t) - log(Y_{t-1}) + \epsilon_{it},$     (11)

where inflation $\pi_t \equiv \frac{P_t}{P_{t-1}}$, and $\epsilon_{it}$ is itself an AR process of order 1. For this process, we denote the AR coefficient with $\rho_i$; the stochastic innovation is iid with standard deviation $\sigma_i$. Notice that, in this setting, households are Ricardian, hence the time-profile of net lump-sum transfers is not distortionary. We assume that these transfers are set according to:

\begin{displaymath} \tau_{Lt} W_t L_t + \tau_{Kt}R_{Kt}K_t = G_t + T_t. \end{displaymath} (12)

Labor taxes, $\tau_{Lt}$, and capital taxes, $\tau_{Kt}$, follow exogenous AR processes

\begin{displaymath} \tau_{Lt} - \tau_L = \rho_{L} \left(\tau_{Lt-1} - \tau_L \right)+ \epsilon_{Lt}, \end{displaymath} (13)


\begin{displaymath} \tau_{Kt} - \tau_K = \rho_{K} \left(\tau_{Kt-1} - \tau_K \right)+ \epsilon_{Kt}, \end{displaymath} (14)

as does Government spending (expressed as a share of output)

\begin{displaymath} \frac{G_t}{Y_t} - \frac{G}{Y}= \rho_{G} \left(\frac{G_{t-1}}{Y_{t-1}}-\frac{G}{Y} \right) + \epsilon_{Gt}. \end{displaymath} (15)

In the equations above, the exogenous innovations $\epsilon_{Lt}, \epsilon_{Kt}, \epsilon_{Gt}$ are $iid$ with standard deviations $\sigma_{L}$, $\sigma_{Kt}$, and $\sigma_G$, respectively. The parameters $\tau_L$, $\tau_K$, and $\frac{G}{Y}$, without a time subscript, denote steady-state levels.

The calibration strategy follows [Erceg et al. 2005] and parameter values are reported in Table 2. By linearizing the necessary conditions for the solution of the model, we can express them in the format of Equation (1).

5  Monte Carlo Results

Using the model described in Section 4 as the data-generating process, we set up a Monte Carlo experiment to compare maximum-likelihood estimates obtained through two different optimization methods. One of the methods relies on our AD derivative of the model's likelihood function. The alternative method, uses a two-point, centered, finite-difference approximation to the derivative.

In setting up the likelihood function, we limit our choices for the observed variables in the vector $y_t$ of equation (5) to four series, namely: growth rate of output $\log(Y_t) -log(Y_{t-1})$, price inflation $\pi_t$, wage inflation $\omega_t \equiv \frac{W_t}{W_{t-1}}$, and the policy interest rate $i_t$. For each Monte Carlo sample, we generate 200 observations, equivalent to 50 years of data given our quarterly calibration, a sample length often used in empirical studies. We attempt to estimate the parameters $\rho_i$, $\sigma_i$, governing the exogenous shock process for the interest rate reaction function; $\psi_P$, $\psi_W$, the Calvo contract parameters for wages and prices; and $\gamma_{\pi}$, and $\gamma_Y$ the weights in the monetary policy reaction function for inflation and activity. In the estimation exercises, we kept the remaining parameters at their values in the data-generating process as detailed in Table 2. We considered 1,000 Monte Carlo samples.3 The two experiments described below differ only insofar as we chose two different initialization points for the optimization routines we used to maximize the likelihood function.

Figure 1 shows the sampling distribution for the parameter estimates from our Monte Carlo exercise when we initialize the optimization routine at the true parameter values used in the data-generating process. The black bars in the various panels denote the estimates that rely on AD derivatives, while the white bars denote the estimates obtained with FD derivatives. The optimization algorithm converged for all of the 1,000 Monte Carlo samples.4We verified that the optimization routine did move away from the initial point towards higher likelihood values, so that clustering of the estimates around the truth do not merely reflect the initialization point. For our experiment, the figure makes clear that when the optimization algorithm is initiated at the true value for the parameters of interest, reliance on FD derivatives minimally affects the maximum-likelihood estimates for those parameters.5

Of course, the true value of the parameters do not necessarily coincide with the ML parameter estimates for small samples. Yet, it is unrealistic to assume that a researcher would happen on such good starting values. Figure 2 reports the sampling distribution of estimates obtained when we initialize the optimization algorithm at arbitrary values for the parameters being estimated, away from their true values. For the estimates reported in Figure 2, we chose $\rho_i = 0.6$, $\sigma_i = 0.4$, $\psi_P=.5$, $\psi_W = 0.5$, $\gamma_{\pi} = 3$, $\gamma_{Y}=0.15$. The bars in Figure 2 show the frequency of estimates in a given range as a percentage of the 1,000 experiments we performed. We excluded results for which our optimization algorithm failed to converge. The figure makes clear that the convergence rate is much higher when using AD derivatives (47.2% instead of 28.3% for FD derivatives). Moreover, it is also remarkable that the higher convergence rate is not accompanied by a deterioration of the estimates (the increased height of the black bars in the figure is proportional to that of the white bars).

To quantify the difference between AD and FD derivatives of the likelihood function for one of our samples, we varied the parameters we estimated one at a time. Figure 3 shows the percentage difference in the magnitude of the AD and FD derivatives for $\rho_i$ and $\sigma_i$. We discretized the ranges shown using a grid of 1,000 equally spaced points. The differences are generally small percentage-wise, although, on occasion, they spike up, or creep up as we move away from the true value, as in the case of $\sigma_i$. For the other parameters we estimated, we did not observe differences in the magnitudes of the AD and FD derivatives larger than $10^{-4}$ over ranges consistent with the existence of a rational expectations equilibrium for our model.

6  Conclusion

Given that the approximation error for a first derivative of the likelihood function of a DGE model computed through FD methods depends on the size of the second derivative, which itself is subject to approximation error, we view having an independent check in the form of automatic derivatives as a major contribution of our work. As an example application, we showed that AD derivatives can facilitate the computation of maximum-likelihood estimates for the parameters of a DGE model.

Bibliography

Anderson, G. (1987). A Procedure for Differentiating Perfect-Foresight-Model Reduced-Form Coefficients. Journal of Economic Dynamics and Control 11, 465-81.

Anderson, G. and G. Moore (1985). A Linear Algebraic Procedure for Solving Linear Perfect Foresight Models. Economic Letters 17, 247-52.

C. H. Bischof, H. M. Bücher, B. L. (2002). Automatic Differentiation for Computational Finance. In B. R. E. J. Kontoghiorghes and S. Siokos (Eds.), Computational Methods in Decision-Making. springer.

Erceg, C. J., L. Guerrieri, and C. Gust (2005). Can Long-Run Restrictions Identify Technology Shocks? Journal of the European Economic Association 3(6), 1237-1278.

Erceg, C. J., D. W. Henderson, and A. T. Levin (2000). Optimal monetary policy with staggered wage and price contracts. Journal of Monetary Economics 46(2), 281-313.

Giles, M. (2007). Monte Carlo Evaluation of Sensitivities in Computational Finance. In E. A. Lipitakis (Ed.), HERCMA – The 8th Hellenic European Research on Computer Mathematics and its Applications Conference.

Hascoët, L. (2004). TAPENADE: a tool for Automatic Differentiation of programs. In Proceedings of 4$^{th}$ European Congress on Computational Methods, ECCOMAS'2004, Jyvaskyla, Finland.

Hascoët, L., R.-M. Greborio, and V. Pascual (2005). Computing Adjoints by Automatic Differentiation with TAPENADE. Springer. Forthcoming.

Hascoët, L., V. Pascual, and D. Dervieux (2005). Automatic Differentiation with TAPENADE. Springer. Forthcoming.

M. Giles, P. G. (2006). Smoking Adjoints: Fast Monte Carlo Greeks. Risk Magazine 19, 88-92.

Papadopoulo, T. and M. I. A. Lourakis (2000). Estimating the Jacobian of the Singular Value Decomposition: Theory and Applications. Lecture Notes in Computer Science 1842/2000, 554-570.

Smets, F. and R. Wouters (2003). An Estimated Dynamic Stochastic General Equilibrium Model of the Euro Area. Journal of the European Economic Association 1(5), 1123-1175.

Yun, T. (1996). Nominal price rigidity, money supply endogeneity, and business cycles. Journal of Monetary Economics 37, 345-370.

Table 1:  List of library functions

Blas Functions

daxpy.f, dcopy.f, ddot.f, dgemm.f, dgemv.f, dger.f, dnrm2.f, drot.f, dscal.f, dswap.f, dtrmm.f, dtrmv.f, dtrsm.f

Lapack Functions dgebak.f, dgebal.f, dgeesx.f, dgehd2.f, dgehrd.f, dgeqp3.f, dgeqr2.f, dgeqrf.f, dgesv.f, dgetf2.f, dgetrf.f, dgetrs.f, dhseqr.f, dlacn2.f, dlacpy.f, dladiv.f, dlaexc.f, dlahqr.f, dlahr2.f, dlaln2.f, dlange.f, dlanv2.f, dlapy2.f, dlaqp2.f, dlaqps.f, dlaqr0.f, dlaqr1.f, dlaqr2.f, dlaqr3.f, dlaqr4.f, dlaqr5.f, dlarfb.f, dlarf.f, dlarfg.f, dlarft.f, dlarfx.f, dlartg.f, dlascl.f, dlaset.f, dlassq.f, dlaswp.f, dlasy2.f, dorg2r.f, dorghr.f, dorgqr.f, dorm2r.f, dormqr.f, dtrexc.f, dtrsen.f, dtrsyl.f, dtrtrs.f

Table 2a:  Calibration - Parameters Governing Households' and Firms' Behavior

ParameterUsed to Determine
$\beta = 0.997 $ discount factor
$\tau_L = 0.28 $ steady state labor tax rate
$\psi_P = 0.75 $ Calvo price parameter
$\delta = 0.025$ depreciation rate
$\phi_I = 3$ investment adj. cost
$\tau_K = 0$ steady state capital tax rate
$\psi_W = 0.75 $ Calvo wage parameter

Table 2b:  Calibration - Monetary Policy Reaction Function

ParameterUsed to Determine
$ \gamma_{\pi} = 1.5$ inflation weight
$\gamma_Y = 0.5$ output weight

Table 2c:  Calibration - Exogenous Processes

Parameter

Used to Determine

$\rho_L = 0.98$ AR(1) Coefficient - labor tax rate
$\rho_K = 0.97$ AR(1) Coefficient - capital tax rate
$\rho_G = 0.98$ AR(1) Coefficient - govt spending
$\rho_i = 0.95$ AR(1) Coefficient - monetary policy
$\rho_A = 0.95$ AR(1) Coefficient - technology
$\sigma_L = 3.88$ Standard Deviation - labor tax rate innovation
$\sigma_K = 0.80$ Standard Deviation - capital tax innovation
$\sigma_G = 0.30$ Standard Deviation - govt spending innovation
$\sigma_i = 0.11$ Standard Deviation - monetary policy innovation
$\sigma_A = 0.94$ Standard Deviation - labor tax innovation

Figure 1:  Sampling Distribution of Parameter Estimates; the Initial Guesses Concided with the True Values in the Data-Generating Process

Data for Figure 1 immediately follows

Data for Figure 1

variable
true parameter value
bin
finite differences (%)
automatic differentiation (%)
\rho_i
0.95
0.942808
0.3
0.3
\rho_i
0.95
0.943721
0.6
0.6
\rho_i
0.95
0.944633
0.9
0.9
\rho_i
0.95
0.945545
3.4
3.4
\rho_i
0.95
0.946458
4.8
4.8
\rho_i
0.95
0.94737
7.8
7.8
\rho_i
0.95
0.948283
11.2
11.2
\rho_i
0.95
0.949195
14.2
14.2
\rho_i
0.95
0.950108
15.9
15.9
\rho_i
0.95
0.95102
12.3
12.3
\rho_i
0.95
0.951932
10.5
10.5
\rho_i
0.95
0.952845
7.6
7.6
\rho_i
0.95
0.953757
5.2
5.2
\rho_i
0.95
0.95467
3.7
3.7
\rho_i
0.95
0.955582
1.4
1.4
\rho_i
0.95
0.956495
0.2
0.2
\sigma_i
0.1125
0.089507
0.1
0.1
\sigma_i
0.1125
0.092423
0.1
0.1
\sigma_i
0.1125
0.095339
0.2
0.2
\sigma_i
0.1125
0.098255
1
1
\sigma_i
0.1125
0.101171
3.8
3.8
\sigma_i
0.1125
0.104087
9.9
9.9
\sigma_i
0.1125
0.107003
11.9
11.9
\sigma_i
0.1125
0.109919
16.2
16.2
\sigma_i
0.1125
0.112835
18.4
18.4
\sigma_i
0.1125
0.115751
16
16
\sigma_i
0.1125
0.118667
10.5
10.5
\sigma_i
0.1125
0.121582
7.2
7.2
\sigma_i
0.1125
0.124498
3
3
\sigma_i
0.1125
0.127414
1.3
1.3
\sigma_i
0.1125
0.13033
0.2
0.2
\sigma_i
0.1125
0.133246
0.2
0.2
\gamma_{\pi}
1.5
1.461714
0.1
0.1
\gamma_{\pi}
1.5
1.466688
0.1
0.1
\gamma_{\pi}
1.5
1.471662
0.7
0.7
\gamma_{\pi}
1.5
1.476636
2.3
2.3
\gamma_{\pi}
1.5
1.48161
4.6
4.6
\gamma_{\pi}
1.5
1.486584
8.5
8.5
\gamma_{\pi}
1.5
1.491557
15.2
15.2
\gamma_{\pi}
1.5
1.496531
15.3
15.3
\gamma_{\pi}
1.5
1.501505
15.3
15.3
\gamma_{\pi}
1.5
1.506479
13.7
13.7
\gamma_{\pi}
1.5
1.511453
12.4
12.4
\gamma_{\pi}
1.5
1.516427
6
6
\gamma_{\pi}
1.5
1.5214
3.1
3.1
\gamma_{\pi}
1.5
1.526374
1.7
1.7
\gamma_{\pi}
1.5
1.531348
0.8
0.8
\gamma_{\pi}
1.5
1.536322
0.2
0.2
\gamma_Y
0.5
0.483826
0.1
0.1
\gamma_Y
0.5
0.485893
0.4
0.4
\gamma_Y
0.5
0.487959
0.4
0.4
\gamma_Y
0.5
0.490026
2.4
2.4
\gamma_Y
0.5
0.492093
4
4
\gamma_Y
0.5
0.494159
8.4
8.4
\gamma_Y
0.5
0.496226
13
13
\gamma_Y
0.5
0.498293
16.8
16.8
\gamma_Y
0.5
0.500359
17.4
17.4
\gamma_Y
0.5
0.502426
14.4
14.4
\gamma_Y
0.5
0.504493
10
10
\gamma_Y
0.5
0.506559
6.2
6.2
\gamma_Y
0.5
0.508626
3.9
3.9
\gamma_Y
0.5
0.510693
1.6
1.6
\gamma_Y
0.5
0.51276
0.7
0.7
\gamma_Y
0.5
0.514826
0.3
0.3
\psi_P
0.75
0.718549
0.1
0.1
\psi_P
0.75
0.722793
0.2
0.2
\psi_P
0.75
0.727037
0.6
0.6
\psi_P
0.75
0.731281
1.8
1.8
\psi_P
0.75
0.735526
5.5
5.5
\psi_P
0.75
0.73977
9.4
9.4
\psi_P
0.75
0.744014
13.6
13.6
\psi_P
0.75
0.748258
17.6
17.6
\psi_P
0.75
0.752502
18
18
\psi_P
0.75
0.756746
13
13
\psi_P
0.75
0.76099
10.6
10.6
\psi_P
0.75
0.765234
5.4
5.4
\psi_P
0.75
0.769478
2.9
2.9
\psi_P
0.75
0.773722
0.8
0.8
\psi_P
0.75
0.777966
0.4
0.4
\psi_P
0.75
0.78221
0.1
0.1
\psi_W
0.75
0.720129
0.2
0.2
\psi_W
0.75
0.724272
0.8
0.8
\psi_W
0.75
0.728415
1.8
1.8
\psi_W
0.75
0.732558
3.3
3.3
\psi_W
0.75
0.736702
6.5
6.5
\psi_W
0.75
0.740845
9.7
9.7
\psi_W
0.75
0.744988
13
13
\psi_W
0.75
0.749131
14.7
14.7
\psi_W
0.75
0.753275
15.6
15.5
\psi_W
0.75
0.757418
12.8
12.9
\psi_W
0.75
0.761561
9.4
9.4
\psi_W
0.75
0.765704
6.5
6.5
\psi_W
0.75
0.769848
3.1
3.1
\psi_W
0.75
0.773991
1.8
1.8
\psi_W
0.75
0.778134
0.5
0.5
\psi_W
0.75
0.782277
0.3
0.3

Figure 2:  Sampling Distribution of Parameter Estimates; the Initial Guesses Did Not Concide with the True Values in the Data-Generating Process

Data for Figure 2 immediately follows

Data for Figure 2

variable
true parameter value
bin
finite differences (%)
automatic differentiation (%)
\rho_i
0.95
0.942894
0.2
0.2
\rho_i
0.95
0.9438
0.4
0.5
\rho_i
0.95
0.944706
0.5
0.8
\rho_i
0.95
0.945612
1
1.8
\rho_i
0.95
0.946518
1.7
2.6
\rho_i
0.95
0.947425
2.4
4.9
\rho_i
0.95
0.948331
3.3
4.6
\rho_i
0.95
0.949237
4.7
8.1
\rho_i
0.95
0.950143
4.2
6.7
\rho_i
0.95
0.951049
3.5
5.3
\rho_i
0.95
0.951955
1.5
3.2
\rho_i
0.95
0.952862
1.4
3.4
\rho_i
0.95
0.953768
1.6
2.5
\rho_i
0.95
0.954674
1.2
1.6
\rho_i
0.95
0.95558
0.6
0.8
\rho_i
0.95
0.956486
0.1
0.2
\sigma_i
0.1125
0.089486
0.1
0.1
\sigma_i
0.1125
0.092113
0
0.1
\sigma_i
0.1125
0.09474
0
0
\sigma_i
0.1125
0.097366
0.2
0.4
\sigma_i
0.1125
0.099993
0.9
1.4
\sigma_i
0.1125
0.102619
2.7
3.5
\sigma_i
0.1125
0.105246
3.8
6.3
\sigma_i
0.1125
0.107873
3.9
6.6
\sigma_i
0.1125
0.110499
4.3
7.3
\sigma_i
0.1125
0.113126
3.7
7.2
\sigma_i
0.1125
0.115753
4
6.3
\sigma_i
0.1125
0.118379
2.3
3.6
\sigma_i
0.1125
0.121006
1.3
2.5
\sigma_i
0.1125
0.123633
0.9
1
\sigma_i
0.1125
0.126259
0.1
0.6
\sigma_i
0.1125
0.128886
0.1
0.3
\gamma_{\pi}
1.5
1.464618
0.1
0.1
\gamma_{\pi}
1.5
1.46909
0
0.2
\gamma_{\pi}
1.5
1.473563
0.2
0.5
\gamma_{\pi}
1.5
1.478035
0.9
1.4
\gamma_{\pi}
1.5
1.482507
1.3
2.3
\gamma_{\pi}
1.5
1.48698
3.1
4.4
\gamma_{\pi}
1.5
1.491452
4
7.2
\gamma_{\pi}
1.5
1.495925
4.5
7.2
\gamma_{\pi}
1.5
1.500397
2.9
5.2
\gamma_{\pi}
1.5
1.50487
4.1
6.2
\gamma_{\pi}
1.5
1.509342
3.2
5.2
\gamma_{\pi}
1.5
1.513815
2.4
4
\gamma_{\pi}
1.5
1.518287
0.7
1.9
\gamma_{\pi}
1.5
1.52276
0.4
0.5
\gamma_{\pi}
1.5
1.527232
0.5
0.8
\gamma_{\pi}
1.5
1.531704
-1.1E-16
0.1
\gamma_Y
0.5
0.483832
0.1
0.1
\gamma_Y
0.5
0.485855
0
0
\gamma_Y
0.5
0.487877
0.2
0.2
\gamma_Y
0.5
0.489899
0.8
1.4
\gamma_Y
0.5
0.491921
1.2
2.1
\gamma_Y
0.5
0.493944
1.8
3.4
\gamma_Y
0.5
0.495966
3
5.6
\gamma_Y
0.5
0.497988
4.9
8.5
\gamma_Y
0.5
0.50001
4.8
7.3
\gamma_Y
0.5
0.502033
4.4
7
\gamma_Y
0.5
0.504055
3.3
5.2
\gamma_Y
0.5
0.506077
1.9
2.9
\gamma_Y
0.5
0.508099
0.9
2
\gamma_Y
0.5
0.510122
0.6
0.9
\gamma_Y
0.5
0.512144
0.3
0.5
\gamma_Y
0.5
0.514166
0.1
0.1
\psi_P
0.75
0.718544
0
0.1
\psi_P
0.75
0.722281
0
0
\psi_P
0.75
0.726019
0.2
0.4
\psi_P
0.75
0.729757
0.4
0.8
\psi_P
0.75
0.733494
1.1
1.6
\psi_P
0.75
0.737232
2.3
4.3
\psi_P
0.75
0.74097
3.3
5.3
\psi_P
0.75
0.744707
4.4
6.5
\psi_P
0.75
0.748445
4.2
7.5
\psi_P
0.75
0.752183
4.7
7.6
\psi_P
0.75
0.755921
2.8
4.9
\psi_P
0.75
0.759658
3.1
4.2
\psi_P
0.75
0.763396
1.3
2.3
\psi_P
0.75
0.767134
0.3
1
\psi_P
0.75
0.770871
0.1
0.5
\psi_P
0.75
0.774609
0.1
0.2
\psi_W
0.75
0.720031
0.1
0.2
\psi_W
0.75
0.72418
0.4
0.5
\psi_W
0.75
0.728329
0.5
1.1
\psi_W
0.75
0.732478
1.4
1.9
\psi_W
0.75
0.736627
2.4
3.8
\psi_W
0.75
0.740776
3.4
5.8
\psi_W
0.75
0.744925
3.8
6.5
\psi_W
0.75
0.749074
3.8
6.5
\psi_W
0.75
0.753223
4.1
6.8
\psi_W
0.75
0.757372
3.8
5.8
\psi_W
0.75
0.761521
2
3.6
\psi_W
0.75
0.765669
1.6
2.3
\psi_W
0.75
0.769818
0.3
1
\psi_W
0.75
0.773967
0.5
1
\psi_W
0.75
0.778116
0.1
0.2
\psi_W
0.75
0.782265
0.1
0.2

 

Figure 3:  Percentage Difference Between AD and FD Derivatives

Data for Figure 3 immediately follows

Data for Figure 3 - \rho_i: Percentage Difference in the Magnitude of AD and FD Derivatives

Parameter Value
Percentage Difference
-0.999
-4.4E-05
-0.997
-2.7E-05
-0.995
0.000378
-0.99301
-4.3E-05
-0.99101
-2E-05
-0.98901
-0.00022
-0.98701
-0.00017
-0.98501
-0.00025
-0.98302
-3.2E-05
-0.98102
-5.1E-05
-0.97902
-0.00026
-0.97702
-5.3E-05
-0.97502
8.3E-05
-0.97303
-9.2E-05
-0.97103
1.03E-05
-0.96903
-0.00017
-0.96703
-0.00028
-0.96503
-1.9E-05
-0.96304
0.00019
-0.96104
5.99E-05
-0.95904
4.74E-05
-0.95704
0.000168
-0.95504
-0.00032
-0.95305
5.11E-05
-0.95105
9.21E-05
-0.94905
0.000108
-0.94705
7.23E-06
-0.94505
-0.00011
-0.94306
-0.00026
-0.94106
-7.9E-05
-0.93906
0.000209
-0.93706
5.22E-05
-0.93506
-4.4E-06
-0.93307
-0.0002
-0.93107
-0.00011
-0.92907
-9.8E-05
-0.92707
-8.2E-05
-0.92507
-3.9E-05
-0.92308
-5.3E-05
-0.92108
-4.7E-05
-0.91908
-0.00024
-0.91708
-7E-05
-0.91508
4.28E-05
-0.91309
0.000124
-0.91109
-0.00019
-0.90909
-0.00025
-0.90709
5.47E-06
-0.90509
8.31E-05
-0.9031
0.000125
-0.9011
4.57E-05
-0.8991
-0.00016
-0.8971
-9.8E-05
-0.8951
0.00014
-0.89311
-0.0001
-0.89111
5.34E-05
-0.88911
3.68E-05
-0.88711
0.000184
-0.88511
-0.00015
-0.88312
0.000176
-0.88112
0.000139
-0.87912
-3.5E-05
-0.87712
0.000126
-0.87512
9.19E-06
-0.87313
-0.00012
-0.87113
-0.00027
-0.86913
-6.2E-05
-0.86713
-1.2E-05
-0.86513
3.78E-06
-0.86314
-0.00011
-0.86114
0.000193
-0.85914
0.000156
-0.85714
-9.5E-05
-0.85514
-4.6E-05
-0.85315
0.00012
-0.85115
3.12E-05
-0.84915
-0.00024
-0.84715
-0.00013
-0.84515
0.000128
-0.84316
0.000132
-0.84116
2.81E-05
-0.83916
-1.5E-05
-0.83716
-0.00028
-0.83516
-0.00013
-0.83317
-0.00013
-0.83117
4.74E-05
-0.82917
-4.1E-05
-0.82717
-0.00016
-0.82517
-9.5E-05
-0.82318
1.95E-05
-0.82118
0.000202
-0.81918
-0.00021
-0.81718
-0.00023
-0.81518
0.000167