FRB: 1989 SCF Codebook, part1

1989 Survey of Consumer Finances

CODEBOOK FOR 1989 SURVEY OF CONSUMER FINANCES

Arthur Kennickell

SCF Project Director

Table of Contents

	1. Introduction
	2. Question Text, Variable Names, & Responses
	3. Editing Instructions
	4. Net Worth Program
	5. Public Data Set Variable List

INTRODUCTION

This codebook serves as the authoritative guide to the variables included on the public use version of the 1989 SCF cross-section dataset. However, not every variable included in this codebook is actually in the public use dataset. A list of the variables included is given in the section entitled Public Data Set Variable List at the end of this documentation.

For a general overview of the findings of the 1989 SCF, see Arthur B. Kennickell and Janice Shack-Marquez, "Changes in Family Finances from 1983 to 1989: Evidence from the Survey of Consumer Finances," Federal Reserve Bulletin, January 1992.

QUESTIONNAIRE
In this codebook, many variables have been grouped differently from the way they were originally asked (e.g., lines of credit for homeowners and non-homeowners were originally asked separately -- as noted in the codebook below, these responses have been merged into a single set of variables). With the exception of a few variables (e.g., marital history), the original questions appear in the codebook with the appropriate variable numbers. Because question ordering is important in understanding the effective meaning of many questions, users of the data are encouraged to consult the questionnaire (available separately) for a precise guide to where and how the underlying questions were asked. Note that there are no summary variables (e.g., net worth) in the public version of the dataset.

FILES INCLUDED
The full public dataset consists of three files in addition to this codebook file. The first is the main dataset, which contains most of the survey variables (a 258 megabyte file in its fully expanded form, and 3.7 megabytes in the zipped SAS transport version of the file available here). The second is the questionnaire. The third is a file of replicate weights for use in variance estimation as described below (42 megabytes in its fully expanded form, and 19.9 megabytes in the zipped SAS transport version of the file available here).

VARIABLE NAMES
The public use version of the 1989 SCF cross-section is a SAS dataset of about 200 megabytes. Unlike the case of our earlier SCFs, there are not separate files of "raw" and "cleaned" data. Rather, virtually every missing variable in the dataset has been imputed and every variable has a "shadow" variable that describes the original state of the variable (i.e., whether it was missing for some reason, a range response was given, etc.). An exception is reported values which have been imputed or otherwise altered for purposes of disclosure avoidance. Such values are not flagged in any systematic way. Users who so desire may use the shadow variables to restore the data to something very close to their original condition. The main data values are stored using variable names corresponding to the numbers given in the codebook below and having a prefix of "X." The shadow variables have the same numbers, but with a prefix of "J." A list of the values of the shadow variables is given in the section below entitled RANGE DATA COLLECTION AND J-CODES.
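
As a rough illustration of the naming convention, the sketch below pairs an "X" variable with its "J" shadow and blanks a value whose shadow code says the original response was DK or NA (codes 50 and 51 in the J-code list). It is written in Python rather than SAS, and the record layout (a plain dictionary), the function names, and the data values are assumptions made for illustration; only the X/J naming rule and the two codes come from this codebook.

```python
# Hypothetical helpers: the dict-based record and the function names are
# illustrative assumptions, not part of the SCF dataset.

def shadow_name(x_name):
    """Return the J-variable name that shadows a given X-variable."""
    if not x_name.startswith("X"):
        raise ValueError("expected a name such as 'X5729'")
    return "J" + x_name[1:]

# J-codes meaning the original response was DK (50) or NA (51).
ORIGINALLY_MISSING = {50, 51}

def restore_missing(record, x_name):
    """Return the X value, or None if its shadow marks it as DK/NA."""
    j_code = record[shadow_name(x_name)]
    return None if j_code in ORIGINALLY_MISSING else record[x_name]

# Invented values: one imputed response, one reported response.
imputed = {"X5729": 41000, "J5729": 50}
reported = {"X5729": 38000, "J5729": 0}

print(shadow_name("X5729"))
print(restore_missing(imputed, "X5729"))
print(restore_missing(reported, "X5729"))
```

A fuller restoration would have to decide how to treat the other J-codes (range responses, recodes, panel overrides); that choice is left to the user.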

IMPUTATION
The imputations of missing values provided here are the result of the sixth iteration of a large model described in a paper I gave at the annual ASA meetings in 1991 ("Imputation of the 1989 Survey of Consumer Finances: Multiple Imputation and Stochastic Relaxation"). A copy of the paper is available upon request. In this dataset, values have been MULTIPLY-IMPUTED. The imputations are stored as complete replicates of each case. In the release, there are 5 copies of each survey observation. Thus, the sum of the weights will be equal to five times the number of families in the U.S. in 1989. Users should be careful to make appropriate degrees of freedom adjustments in any calculations they make using these data. For an overview of the theory of multiple imputation and the analysis of multiply-imputed data, see MULTIPLE IMPUTATION FOR NONRESPONSE IN SURVEYS by Donald B. Rubin, John Wiley and Sons, 1987.
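
One common way to work with the five copies of each observation is to compute an estimate separately within each copy and then average the five results, as in the multiple-imputation literature cited above. The Python sketch below shows that idea for a weighted mean. The row layout and the "implicate", "value", and "weight" keys are hypothetical; this introduction does not say how the copies are labelled in the data file, so an explicit label is assumed here.

```python
from collections import defaultdict

def implicate_means(rows):
    """Weighted mean of 'value' within each copy (implicate) of the data.

    Each row is a dict with the hypothetical keys 'implicate', 'value',
    and 'weight'.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for row in rows:
        num[row["implicate"]] += row["value"] * row["weight"]
        den[row["implicate"]] += row["weight"]
    return {k: num[k] / den[k] for k in sorted(num)}

def combined_estimate(rows):
    """Average the per-implicate estimates into one point estimate."""
    means = implicate_means(rows)
    return sum(means.values()) / len(means)

# Invented data: household A has an imputed value that differs across the
# five copies; household B reported its value, so it is the same in each.
rows = []
for i, imputed in enumerate([10.0, 12.0, 11.0, 9.0, 13.0], start=1):
    rows.append({"implicate": i, "value": imputed, "weight": 1.0})
    rows.append({"implicate": i, "value": 20.0, "weight": 3.0})

print(implicate_means(rows))
print(combined_estimate(rows))
```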

ANALYSIS WEIGHTS
The dataset contains several sets of nonresponse-adjusted sampling weights. X42001 is the weight strongly recommended for the analysis of the data for most purposes. This variable is a partially design-based weight constructed at the Federal Reserve using original selection probabilities and frame information along with aggregate control totals estimated from the Current Population Survey. The population defined by the weights for *each implicate* (see above) is 93.0 million households. This weight is a relatively minor revision of the consistent weight series (X42000) maintained for the SCFs beginning with 1989 (for a detailed discussion of these weights, see "Consistent Weight Design for the 1989, 1992, and 1995 SCFs and the Distribution of Wealth," by Arthur B. Kennickell and R. Louise Woodburn, Review of Income and Wealth, Series 45, Number 2, June 1999, pp. 193-215 or the longer version given on the SCF web site at http://www.federalreserve.gov/pubs/oss/oss2/method.html). The nature of the revisions to the consistent weights is described in "Revisions to the SCF Weighting Methodology: Accounting for Race/Ethnicity and Homeownership," by Arthur Kennickell (see SCF web site). A version of the revised weight has been computed for all the surveys beginning with 1989, and this variable has been added to the public versions of the SCF datasets. Weights X40125 and X40131 have also been included in this release for historical reasons. These weights are partially design-based weights (see Heeringa, Conner and Woodburn [1994] "The 1989 Surveys of Consumer Finances: Sample Design and Weighting Documentation") that were originally released with the data. The weight X40125 is the preliminary SRC design-based weight used in the report on the SCF by Janice Shack-Marquez and me in the January 1992 issue of the Federal Reserve Bulletin. This weight was superseded by X40131.
Users should be aware that the sum of each of the weights over all sample cases and imputation replicates is equal to five times the number of households in the sample universe.
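
A practical consequence is that aggregate totals computed over all rows of the file need to be divided by five, whereas ratios such as weighted means are unaffected because the factor cancels. The Python sketch below illustrates the rescaling; the function name and the toy values are assumptions for illustration, not part of the dataset.

```python
N_IMPLICATES = 5  # copies of each case in the release, per this codebook

def pooled_total(values, weights, n_implicates=N_IMPLICATES):
    """Weighted total over all rows, rescaled to a single population."""
    weighted_sum = sum(v * w for v, w in zip(values, weights))
    return weighted_sum / n_implicates

# Invented example: two cases, five copies each, weights 100 and 300.
weights = [100.0] * 5 + [300.0] * 5
ones = [1.0] * 10
dollars = [10.0, 12.0, 11.0, 9.0, 13.0] + [20.0] * 5

# Estimated number of households represented by the two cases.
print(pooled_total(ones, weights))
# Estimated aggregate dollar amount for those households.
print(pooled_total(dollars, weights))
```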

SAMPLING ERROR
For a variety of reasons connected with disclosure limitation, it is not possible to give users the details about the SCF sample design that they would need to compute reasonable estimates of sampling variance by standard means. During the construction of X40202, a small set of replicate weights was created to use for variance estimation. Because it was difficult at that time to compute such weights, only eleven such replicates were computed. Until this release, these replicate weights have not been included with the dataset. They are included here largely for historical documentation. In two separate files, available with the main dataset, we have included sets of replicate weights computed using the same algorithm as was used for X42000 and for X42001. Using detailed information about the original sample design, we selected 999 sample replicates from the final set of completed cases in a way intended to capture the important dimensions of sample variation. For each survey case and each replicate, the file contains a weight and the number of times the case was selected in the replicate. We computed weights for each replicate using exactly the same procedures we used for X42000. Replicate weights were computed only for the first implicate of each case. For most purposes, users will probably want to multiply the weight times the multiplicity: in all cases the sum of the weights times the multiplicities equals the total number of households. To estimate the sampling variance of the mean of family income, for example, a user would estimate the mean 999 times using the replicate weights and compute the standard error of that estimate. An estimate of the total standard error due to the two sources of variability, sampling and imputation, is given by the square root of the sum of the estimated sampling variance and 6/5 times the imputation variance (the factor 6/5 is 1 + 1/5, reflecting the five implicates). The replicate weights associated with this release of the data were recomputed along with the main weight to ensure consistency with the 1992 and 1995 SCFs.
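
The combination rule in the last step can be sketched in Python as follows. Two readings are assumed here and should be checked against the user's own needs: the sampling variance is taken to be the sample variance of the replicate estimates, and the imputation variance is taken to be the sample variance of the per-implicate estimates. The function names and toy numbers are illustrative only.

```python
from math import sqrt
from statistics import variance

def weighted_mean(values, weights, multiplicities):
    """Mean using weight times multiplicity for one replicate."""
    w = [wt * k for wt, k in zip(weights, multiplicities)]
    return sum(v * x for v, x in zip(values, w)) / sum(w)

def total_standard_error(replicate_estimates, implicate_estimates):
    """sqrt(sampling variance + (1 + 1/m) * imputation variance).

    With m = 5 implicates the factor is 6/5, as stated in the text.
    """
    sampling_var = variance(replicate_estimates)    # assumed reading
    m = len(implicate_estimates)
    imputation_var = variance(implicate_estimates)  # assumed reading
    return sqrt(sampling_var + (1 + 1 / m) * imputation_var)

# One replicate's mean from invented values, weights, and multiplicities.
print(weighted_mean([10.0, 20.0], [2.0, 1.0], [1, 2]))

# Invented estimates: three replicates (the real file has 999) and
# five implicates.
replicate_estimates = [10.0, 12.0, 14.0]
implicate_estimates = [11.0, 12.0, 13.0, 12.0, 12.0]
print(total_standard_error(replicate_estimates, implicate_estimates))
```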

DISCLOSURE REVIEW
Unlike earlier releases of the dataset, this one includes all cross-section cases and most important dollar variables. However, the data reported have been systematically altered by several means to minimize the possibility of identifying any survey respondent. For some discrete variables, small or unusual cells were collapsed as noted in the variable descriptions below. Continuous variables were rounded. Data were also blurred by other unspecified means. In addition, 300 cases were identified for more extensive treatment. Some of these cases were selected on the basis of extreme or unusual data values. Others of the 300 cases were selected at random. For each of the 300 cases, a selection of critical variables was set to missing and statistically imputed subject to constraints designed to ensure that any distortions induced in key population statistics would be minimal. Aside from the cell collapsing, there is no key in this codebook or in the dataset that would allow users to identify directly either which data items have been smoothed or otherwise altered, or which cases were selected for imputation of critical values.

DATA REVIEW
We have spent many hours searching for errors in the data. Many inconsistencies are actually in the raw data and seem to have no obvious reconciliation (most prominently the fact that X5729 -- total income -- is not always equal to the sum of the income components). Other types of inconsistencies may have been induced as a byproduct of imputation, even though elaborate checks are built into the imputation routines. We ask our colleagues who use this dataset to help us find the remaining inconsistencies.

CONTACT
It is likely that some users will have trouble dealing with the data at first. If after having framed a focused question and exhausted all of your local resources your problem persists, you may call Gerhard Fries at (202) 452-2578 (or e-mail him at [email protected]) or me at (202) 452-2247 (or e-mail me at [email protected]). We prefer correspondence via e-mail. While we would like to be helpful to you, please realize that we do not have extensive resources to devote to user services. We hope that by persistence, you will almost always be able to figure out what you need by consulting the questionnaire and the codebook below.

RANGE DATA COLLECTION AND J-CODES

0  = value reported on original tape (includes values reported in the
     questionnaire that were altered during editing).

1  = question is inapplicable (e.g., R has no checking account
     so value of checking account is coded as zero -- NOTE: zero is a
     legitimate value for X-variables only for this value of the
     associated J-variable).

2  = evidence from problem sheets that data moved from another
     location, but no evidence that value was altered,
     or data moved.
3  = evidence from problem sheets that data moved from another
     location, and evidence that value was somehow altered based on
     margin notes or other information.

4  = evidence that data imputed from marginal notes.

5  = 83/86 value brought forward.

8  = recode of survey variables, no missing values in antecedents.
9  = recode of survey variables, insufficient data collected to
     compute value, but not imputed.

12 = in case of regular installment loans where term is DK, non-missing
     typical payment moved to monthly payment section.
 
13 = coded value overridden after editing completed.
14 = inapplicable given hard-code decision (15).
15 = hard-coded imputation determined during cleaning.
16 = override of reported 89 data with 86 data.
17 = override of reported 89 data with 83 data.
18 = other panel override of reported data (combination of 16 & 17, logical
     consistency, misc. intuition).

19 = imputation of missing 89 data using 86 data.
20 = imputation of missing 89 data using 83 data.

24 = range card response A: $1 to $500.
25 = range card response B: $501 to $1,000.
26 = range card response C: $1,001 to $2,500.
27 = range card response D: $2,501 to $10,000.
28 = range card response E: $10,001 to $50,000.
29 = range card response F: $50,001 to $250,000.
30 = range card response G: $250,001 to $1,000,000.
31 = range card response H: $1,000,001 to $10,000,000.
32 = range card response I: $10,000,001 to $100,000,000.
33 = range card response J: more than $100,000,000.
34 = range card response < 0 A: -$1 to -$500.                  
35 = range card response < 0 B: -$501 to -$1,000.              
36 = range card response < 0 C: -$1,001 to -$2,500.
37 = range card response < 0 D: -$2,501 to -$10,000.
38 = range card response < 0 E: -$10,001 to -$50,000.          
39 = range card response < 0 F: -$50,001 to -$250,000.         
40 = range card response < 0 G: -$250,001 to -$1,000,000.      
41 = range card response < 0 H: -$1,000,001 to -$10,000,000.   
42 = range card response < 0 I: -$10,000,001 to -$100,000,000. 
43 = range card response < 0 J: less than -$100,000,000.

44 = value < 0, amount DK
45 = value < 0, amount NA


49 = variable imputed during editing from margin notes, reimputed.

50 = original response was DK.
51 = original response was NA (includes refusals, interviewer errors,
     and missing data resulting from editing decisions).  Does not
     include data missing as a result of missing higher-order questions.
52 = original response missing as a result of missing information for
     a higher-order question (typically a YES/NO cut question).  In
     this case, the higher-order question has been imputed in such
     a way as to render response appropriate.
53 = refused (available only for aggregate income, income range
     questions, whether filed tax return, and AGI: T3/4/7b/7d).
54 = some, DK how many (see B6).

79 = data missing because of questionnaire error, or data not collected.

80 = recode variable, missing because data not collected for a
     sub-group, data to be imputed.
81 = recode variable, some, but not all components originally missing.
82 = recode variable, all components originally missing.

88 = for property value, only assessed value given.

98 = override of reported information (e.g., R says has 1 IRA, but
     2 institution types reported) -- value set to missing.

99 = used for absent spouse for J104 or J105 when X104 OR X105 < 0.

100 = Value set to missing while problem with case is being resolved
      (temporary code).

183 = demographic and employment recodes for panel and panel/cross-section
      cases: only 83 data missing.
184 = employment recodes for panel and panel/cross-section cases: 83
      and 86 data missing.
185 = demographic and employment recodes for panel and panel/cross-section
      cases: 83 and 89 data missing. 
186 = demographic, marital history, and employment recodes for panel
      and panel/cross-section cases: only 86 data missing. 
187 = demographic, marital history, and employment recodes for panel
      and panel/cross-section cases: 86 and 89 data missing.
188 = marital history and employment recodes for panel
      and panel/cross-section cases: 83,86 and 89 data missing.
189 = demographic, marital history, and employment recodes for panel
      and panel/cross-section cases: only 89 data missing. 

200 = marital history variable set to missing due to irreconcilable
      inconsistencies between 1986 and 1989 data.
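
For users who want to work with the range card codes (24 through 43) programmatically, the lookup can be sketched in Python as below. The function name and the returned tuple layout are assumptions for illustration, and the negative ranges are assumed to mirror the positive ones exactly; an open-ended range is represented by None.

```python
RANGE_LETTERS = "ABCDEFGHIJ"

# Upper dollar bound of each positive range A..J; J is open-ended.
UPPER_BOUNDS = [500, 1_000, 2_500, 10_000, 50_000, 250_000,
                1_000_000, 10_000_000, 100_000_000, None]

def range_for_j_code(j_code):
    """Return (letter, bound nearest zero, bound farthest from zero).

    Codes 24-33 are positive ranges and 34-43 are negative ranges.
    The farthest bound is None for the open-ended range J.
    """
    if 24 <= j_code <= 33:
        idx, sign = j_code - 24, 1
    elif 34 <= j_code <= 43:
        idx, sign = j_code - 34, -1
    else:
        raise ValueError("not a range card J-code: %r" % (j_code,))
    near = 1 if idx == 0 else UPPER_BOUNDS[idx - 1] + 1
    far = UPPER_BOUNDS[idx]
    if far is not None:
        far = sign * far
    return RANGE_LETTERS[idx], sign * near, far

print(range_for_j_code(24))
print(range_for_j_code(33))
print(range_for_j_code(36))
```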

Last update: October 20, 1999, 5:00pm