1. Introduction
2. Question Text, Variable Names, & Responses
3. Editing Instructions
4. Public Data Set Variable List
The 1983 SCF was the first comprehensive wealth survey since the 1963 Survey of Financial Characteristics of Consumers and the 1964 reinterview, both of which were conducted by Dorothy Projector at the Federal Reserve. In 1986 a very brief reinterview was conducted with the 1983 SCF respondents. The 1989 SCF has a complex sample design that provides both cross-section representation and information on a subset of the 1983 respondents.
SAMPLE DESIGN
The 1983 SCF was based on a dual-frame design. One part was an
area-probability sample, which was intended to provide good coverage of
assets and liabilities that are broadly distributed in the population
(such as cars, mortgages, etc.). The second part of the sample was a
list of names developed from a sample of tax files maintained by the
Statistics of Income Division of the Internal Revenue Service. The
list sample was designed to improve the precision of estimates of
assets and liabilities that are held more narrowly by relatively
wealthy households. For more detail on the sample, see the 1983
cross-section codebook.
In 1989, there was a need to collect cross-section information on household finances. In addition, there was a strong interest in following the behavior of the 1983 SCF respondents. A complex sample design was created to meet these objectives and to control costs. The final sample design for the 1989 SCF falls into two large pieces: a completely new cross-section sample of 2000 completed cases and a group of 1803 cases from an overlapping panel/cross-section design. This second part of the sample consists of three major parts. First, there is a list sample developed in 1983 using tax records. Since the cross-section representation of these cases is not known, they are counted only as panel cases. The remaining cases derive from a modification of an area-probability sample generated for the 1983 SCF. An interviewer was sent to a subsample of 1983 addresses, which is described more fully below. If the person living at that address was a respondent to the 1983 SCF or the spouse or partner of that person, an attempt was made to obtain an interview. If the person living there was a new person, an interview was attempted, and for a subsample the original respondent was also pursued. The new person had only cross-section representation and the 1983 respondent had only panel representation. In general, the sample followed both 1983 respondents and the spouses or partners of 1983 respondents in cases where the original couple had dissolved. A small supplemental sample was added to the panel/cross-section to compensate for new construction. For more detail, see the 1989 cross-section codebook and "Sample Design and Weighting Documentation" (Heeringa, Connor, & Woodburn, 1994).
The cleaned and imputed versions of the entire 1983 and 1986 surveys have been released to the public. The public version of the 1986 SCF is cleaned and imputed using information from the first two waves of the survey. Previously, only the cross-section part of the 1989 survey had been made available. The data set accompanying this documentation includes all cases interviewed in 1989 that have either both cross-section and panel representation or only panel representation. Of the 1479 cases in the panel data set, 849 were included in the cross-section data release (these are households in the area-probability sample that did not move from their 1983 residence).
QUESTIONNAIRE
In the 1989 survey, the questionnaires for the panel and
cross-section cases differed mainly in two areas. Unlike the
cross-section questionnaire, the panel questionnaire assumed the
availability of information reported earlier on marital history and
work history. Data from the earlier surveys were used to construct
variables for the panel cases that are comparable to those available
directly for the cross-section cases. Thus, the panel cases
contain original or recoded information corresponding to all of the
variables described in the codebook for the 1989 cross-section, a
version of which is included in Part 2 of this codebook. Not all
variables in the codebook are available on the public-use data set. The
definitive list of the variables included in the public data set is
given in the section entitled Public Data Set Variable List. Please
consult that list to determine whether a given variable is available
to you.
DATA EDITING
The editing and imputation of panel data raise many problems
(Kennickell and McManus 1994). There is an immense amount of
information in the 1983, 1986, and 1989 surveys that could be checked
for inconsistencies and imputed for missing information. There are
several thousand variables available in each survey. In principle,
all of these variables form a joint distribution to be edited and
imputed simultaneously. While the editing task is very complicated,
the imputation task is much more formidable. A respondent may have
very complicated patterns of missing data over time. All missing data
should, in principle, be imputed conditioning on information available
regardless of the year of the survey. Thus, one should reimpute
earlier data to account for any new information.
To reduce the problems of processing the 1983-89 panel data to a manageable level, a decision was made to limit the problem in significant ways. First, the 1986 SCF is treated only as a source of limited information for the construction of the cross-section variables mentioned above and for some very limited editing. No other information from the 1986 SCF is used in the construction of the 1983-89 panel file, and 1986 variables were not used to condition either the 1983 or 1989 imputations. Thus, USERS ARE ADVISED NOT TO USE THE IMPUTED DATA IN THIS FILE TOGETHER WITH IMPUTED 1986 DATA WITHOUT DETAILED CONSIDERATION. Second, to reduce the scale of the editing and imputation problem further, a summary version of the 1983 SCF was created to serve as a basis for editing and imputation. Most of the variables described at the beginning of the variable list are summary variables created from the 1983 data set. Finally, retrospective data collected in the 1989 panel questionnaire on changes between 1983 and 1989 are generally not cleaned or imputed. Our analysis indicates that these variables provide an unusably noisy picture of change (see Arthur B. Kennickell and Martha Starr-McCluer, "Household Saving and Portfolio Change: Evidence from the 1983-89 SCF Panel," April 1996). This weakness in the data is not surprising given the length of the period over which respondents were asked to recall information.
"J" and "K" VARIABLES
For cleaning and imputation, it is critical that we know what
items were reported by respondents and what items have missing values.
In the 1989 and later SCFs, shadow variables ("J-variables") are
available for all data values indicating whether the value in the
final data set is one given by a respondent, whether it was imputed
using range information provided by a respondent, whether it was
originally missing altogether, and other situations. Unfortunately,
such information is not available for the 1983 SCF. Moreover, the raw
and the cleaned and imputed data sets are contained in different files
with different forms of organization. Often changes were made in the
cleaned file based on additional information from the respondents or
from further editing of the paper questionnaires. In addition, data
were rearranged in the cleaned file to make the reported data
correspond more closely to a desired analytical arrangement. Very few
of these changes appear to have been incorporated in the raw data.
To complete the processing of the panel in a reasonable time, it was necessary to develop some rules to describe the original status of the 1983 data. These rules embody some compromises. A pair of shadow variables was constructed for most 1983 summary variables--a J-variable comparable to those for the 1989 and later surveys (see below for the definitions of the legal codes), and for money variables, a variable that indicates what part of the variable was originally provided by the respondent ("K-variable"). If the raw data item (or items) most directly underlying a variable in the cleaned data set contained a missing value code and it could not be determined within a simple set of alternatives that the original data had been moved to or from another location, the 1983 summary variable was assigned a J-variable code indicating that the data were originally missing. If it could be determined that some part of the original data had been given by the respondent, this amount was included in the K-variable. If the underlying data item was the same as a related variable in the imputed data set, the information was taken to have been reported by the respondent, the J-variable was given a code to indicate this fact, and the K-variable was assigned the same value as the variable in the imputed data set. In the event that an item in the imputed data set could not be linked to either a missing value or a reported item in the raw data set, it was assumed that the data derived from editing or retrieval activities performed after the creation of the raw data set, and the information was treated as if it had been reported by the respondent. There is some unknown degree of error in this series of assumptions used to determine the original status of the data.
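As a rough illustration of how a K-variable might be used, the sketch below computes the share of a 1983 dollar amount that was originally reported by the respondent. It runs on a few invented records rather than on the public file. X50002 and K50002 are used only because both names appear in the rounding list at the end of this introduction; REPORTED_SHARE is a hypothetical name introduced for the example.

```sas
/* Toy records: all values are invented for illustration only. */
DATA TOY;
   INPUT X50002 K50002;
   DATALINES;
1000 1000
1000 800
1000 500
0 0
;
RUN;

DATA SHARES;
   SET TOY;
   /* Share of the final value that the respondent originally gave. */
   /* Left missing when the final value is zero or K is missing.    */
   IF X50002 NE 0 AND NOT MISSING(K50002) THEN
      REPORTED_SHARE = K50002 / X50002;
   ELSE REPORTED_SHARE = .;
RUN;

PROC PRINT DATA=SHARES;
RUN;
```

One minus this share is, under the assumptions above, the part of the value supplied by imputation.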
IMPUTATION
Missing values in the data were multiply-imputed (three times)
using a modified version of the software designed to impute the 1989
cross-section data, and additional software created specifically to
impute the 1983 summary data. Two facts are particularly important to
note. First, the multiple imputations are stored as separate
replicates ("implicates") of each record. Thus, the public data set
contains 4,437 observations. Because multiple imputation seeks to
proxy the *distribution* of the missing information, there is no
"correct" implicate to use. Users are advised to use all three
implicates, but to beware (particularly in regressions) that the
degrees of freedom in their calculations will often be spuriously
inflated by standard computer packages unless users take appropriate
action (see Multiple Imputation for Nonresponse in
Surveys by Donald B. Rubin, John Wiley and Sons, 1987). Second,
users should note that the imputed values in this data set may differ
from those in the public versions of the 1983 and 1989 data sets. The
1983 cases and the 1989 panel cases with cross-section representation
in 1989 were imputed using only cross-section information. The
imputations in this panel data set are conditioned on information from
both the 1983 and 1989 surveys.
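One common form of the "appropriate action" mentioned above is to apply the combining rules given in the Rubin (1987) reference. The notation here is introduced only for this sketch and is not part of the original documentation. Let m = 3 be the number of implicates, and let \hat{Q}_i be an estimate (for example, a regression coefficient) with estimated variance \hat{U}_i, each computed from the 1,479 records of implicate i alone.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, &
\bar{U} &= \frac{1}{m}\sum_{i=1}^{m}\hat{U}_i, \\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2, &
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B, \\
\nu &= (m-1)\Bigl(1+\frac{\bar{U}}{(1+1/m)\,B}\Bigr)^{2}.
\end{align*}
```

Inference is then based on \bar{Q} with total variance T and a t reference distribution with \nu degrees of freedom, rather than on the 4,437 stacked records treated as independent observations.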
The imputations we provide are the result of an intensive effort that frequently employs information that would not be available to outside users. However, because the imputations in the data set are flagged, users are free to perform their own imputations or to devise other treatments for the missing data. Users may also wish to treat the limited 1983 imputations provided as a core for their own imputations of additional 1983 variables.
DIFFERENCES IN PANEL AND CROSS-SECTION ESTIMATES
IT IS VERY IMPORTANT TO NOTE that statistics computed using
the panel file may differ from estimates made using either the full
1983 cross-section or the full 1989 cross-section. These differences
may be attributable to sampling error and our limited ability to make
corrections for nonresponse. It is also important to note the nature
of the population included in the panel. Obviously, everyone must
have been a respondent in 1983. Thus, immigrants since that time are
included only if they became a part of an existing sample family.
Another important group missing from the panel is most families with
heads under the age of 28 in 1989. Those heads of households would
have been under the age of 22 in 1983. Because of the fraction of
people in their late teens and early 20s who are attending college, in
the military, or living with their family, it is very unlikely that
the sample of independent young people observed in 1983 is at all
representative of householders in 1989. For this reason, the design
of the sample excludes households with heads under the age of 22 in
1983. A small number of such households appear in the panel sample
because the sample design followed both halves of couples that
separated since 1983.
DISCLOSURE REVIEW
To protect the confidentiality of information provided by
respondents, we have systematically altered the information in the
public use file according to rules developed jointly by the SCF staff
at the FRB, SOI, and SRC. The general outline of the procedure is as
follows. All dollar figures were rounded using the SAS code
provided at the end of this introduction. For some categorical
variables, cells were collapsed--in particular, three-digit
occupation codes have been reduced to one digit. Many other
unspecified alterations of the data were made, and these changes
cannot be determined from any publicly-available information. This
class of changes is specifically designed to generate very little
distortion of the statistical properties of the data overall while
substantially reducing the probability that the data for any given
case represents an identifiable individual in the population.
ANALYSIS WEIGHTS
Two analysis weights are available for the panel: WGT0195 and
WGT0296. WGT0195 is based on the FRB full sample weight for the
1983 cross section data (B3016). The weight is adjusted separately
for list and area cases using the adjustments computed by University
of Michigan's Survey Research Center (see "Sample Design and Weighting
Documentation," Heeringa et al., 1994). For the list cases, a
nonresponse adjustment is computed by 1983 income half-quantiles
(i.e., the 12.5, 25.0, 37.5, 50.0, ... percentiles). These adjustments
range from 1.14 to 1.330. For the area probability cases, there are
subsampling adjustments for the subselection by age and PSU, as well
as a nonresponse adjustment. The PSU selection factors range from 1
to 2, the age adjustment factor ranges from 1 to 4 and the nonresponse
adjustment ranges from 1.17 to 2.89. In creating the merged sample
analysis weights, the weights resulting from this procedure were
trimmed at the 95th percentile in cross categories of 1983 net worth
vs 1989 net worth and the cell frequencies were then adjusted to the
original weighted frequencies. The weight total was adjusted to the
1983 total of households with the age of the household head > 22 which
is 81,425,929, as estimated using the 1983 SCF cross sectional data.
WGT0296 is a refined version of WGT0195 developed for the purpose of
estimating changes in wealth between 1983 and 1989. For this purpose,
WGT0195 was subjected to a series of post-stratification (age and
homeownership categories) and a three-stage raking (using net worth
categories for comparable age groups in the 1983 and 1989 SCF
cross-sections) designed to provide better estimates of saving.
We use this weight for all calculations of changes in wealth in the
panel sample. A discussion of the WGT0296 weight is given in
"Weighting design for the 1992 Survey of Consumer Finances" by Arthur
Kennickell, Douglas McManus, and R. Louise Woodburn.
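As a minimal sketch of a weighted estimate that pools all three implicates, the example below uses invented records. It assumes that every implicate of a case carries the full case weight, which this documentation does not state, so the weight is divided by three to keep the weighted total of households unchanged. CASEID, IMPLIC, and WGT_POOLED are hypothetical names, and X3506 is used only because it appears in the list of rounded dollar variables.

```sas
/* Toy stacked file: two cases, three implicates each (invented values). */
DATA TOY;
   INPUT CASEID IMPLIC WGT0296 X3506;
   DATALINES;
1 1 30000 100
1 2 30000 120
1 3 30000 110
2 1 60000 500
2 2 60000 500
2 3 60000 500
;
RUN;

DATA POOLED;
   SET TOY;
   /* Assumption: each implicate repeats the full case weight, so   */
   /* divide by the number of implicates before pooling.            */
   WGT_POOLED = WGT0296 / 3;
RUN;

/* Weighted mean and sum of weights across the pooled implicates. */
PROC MEANS DATA=POOLED SUMWGT MEAN;
   VAR X3506;
   WEIGHT WGT_POOLED;
RUN;
```

With these toy values the sum of weights is 90,000 and the weighted mean is 370. Rescaling the weight does not change a weighted mean, but it does matter for weighted totals and counts.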
VARIABLE NAMES
The main data values are stored using variable names
corresponding to the numbers given in the codebook below and having a
prefix of "X." The corresponding K-variables (only for dollar values
in the 1983 recode variables) and J-variables described above are
addressed using the same numbers, but with a prefix of "K" or "J"
respectively. Legal values for the J-variables for variables recoded
from 1983 data are:
0 = value reported on original tape (includes values reported in the
questionnaire that were altered during editing).
1 = question is inapplicable (e.g., R has no checking account
so value of checking account is coded as zero -- NOTE: zero is a
legitimate value for X-variables only for this value of the
associated J-variable).
46 = less than 30 percent of the original 1983 imputed value was
imputed.
50 = original response was DK.
51 = original response was NA (includes refusals, interviewer errors,
and missing data resulting from editing decisions). Does not
include data missing as a result of missing higher-order questions.
52 = original response missing as a result of missing information for
a higher-order question (typically a YES/NO cut question). In
this case, the higher-order question has been imputed in such
a way as to render a response appropriate.
71 = more than 30 percent of the 1983 imputed value was imputed.
Legal J-variable values for 1989 variables can be found in Part 2 of
this codebook.
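The legal codes listed above for variables recoded from 1983 data can be attached to a J-variable as a format, which makes tabulations of the original reporting status easier to read. The sketch below runs on invented codes. J83F is a hypothetical format name, the labels are paraphrases of the definitions above, and J50002 is assumed to exist by analogy with X50002 and K50002.

```sas
/* Labels paraphrase the legal 1983 J-variable codes listed above. */
PROC FORMAT;
   VALUE J83F
      0     = 'Reported (incl. edited)'
      1     = 'Inapplicable'
      46    = 'Less than 30 pct imputed'
      50    = 'Originally DK'
      51    = 'Originally NA'
      52    = 'Higher-order question missing'
      71    = 'More than 30 pct imputed'
      OTHER = 'Not a listed code';
RUN;

/* Toy J-variable values, invented for illustration only. */
DATA TOY;
   INPUT J50002 @@;
   DATALINES;
0 0 1 46 50 51 52 71 0 1
;
RUN;

PROC FREQ DATA=TOY;
   TABLES J50002;
   FORMAT J50002 J83F.;
RUN;
```

The OTHER category is included so that any value outside the documented list stands out in the table.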
SUMMARY VARIABLES
The user will notice that we do not provide the sort of summary
variables that were given in the codebook for the 1983 SCF. It is our
strong belief that users should consider carefully the variables that
underlie summary concepts, such as "financial assets." Guidance to
what we consider to be appropriate definitions for our purposes is
usually given in our published analysis (e.g., see Arthur B. Kennickell
and Martha Starr-McCluer, "Changes in Family Finances from 1989
to 1992: Evidence from the Survey of Consumer Finances,"
Federal Reserve Bulletin, October 1994).
ACKNOWLEDGMENTS
Data for the survey were collected by the Survey Research Center
at the University of Michigan. Officers for the project at SRC were
Richard T. Curtin (1983 and 1986 SCF) and F. Thomas Juster (1989 SCF).
Others at SRC making particularly valuable contributions to the panel
survey include Stephen Heeringa, Dorothy Kempter, and Joan Schleffer.
Fritz Scheuren, former director of Statistics of Income at the
IRS, provided unending intellectual and other support for the project,
and without his participation it is certain that we would not have
been as successful. Louise Woodburn, Barry Johnson, Tom Petska and
Bill Wong at SOI have contributed in a major way to sampling and data
processing.
Major funding for the 1989 reinterview survey was provided by the Federal Reserve and the National Institute on Aging. Particular thanks are due to Richard Suzman at NIA for his support and encouragement. Additional funding was provided by the Department of Health and Human Services, the Comptroller of the Currency, the Federal Deposit Insurance Corporation, the General Accounting Office, and the Joint Committee on Taxation. Thanks to the Board of Governors and the official staff at the Federal Reserve, particularly James Kichline, Michael Prell, Edward Ettin, Myron Kwast, and Barbara Lowrey. Research assistance for the SCF at the Federal Reserve has been provided by Aliki Antonatas, Phoebe Roaf, Daniel Kelley, Kurt Strovink, Alex Wright, Robert Denk, James Faulkner, and Todd King. Other SCF staff at the Federal Reserve have included Robert Avery, Gregory Elliehausen, Gerhard Fries, Arthur Kennickell, Douglas McManus, Janice Shack-Marquez, and Martha Starr-McCluer.
DATA REVIEW
Although we have devoted considerable time to searching for
inconsistencies in the data, some of these problems seem to have no
obvious reconciliation. Thus, seeming conflicts may remain.
Other types of inconsistencies may have been induced during
imputation, in spite of the elaborate checks that are built into the
imputation routines. We ask our colleagues who use this data set to
help us find the remaining irritating inconsistencies.
CONTACT INFORMATION
It is likely that some users will have trouble dealing with the
data at first. If after having framed a focused question and
exhausted all of your local resources your problem persists, you may
e-mail Gerhard Fries ([email protected]) or myself ([email protected]).
While we would like to be helpful to you, please realize that we have
very limited resources to devote to user services. We hope that by
persistence, you will almost always be able to figure out what you
need by consulting the questionnaire (available separately) and this
codebook.
SAS CODE FOR ROUNDING DOLLAR FIGURES
ARRAY AMT {*} X412-X414 X420 X421 X423 X424 X426 X427 X429 X430
X505 X510 X512 X513 X518 X521 X525 X526 X602 X604
X607 X612 X614 X617 X619 X623 X627 X631 X635 X703
X708 X716 X717 X721 X804 X805 X808 X812 X813 X904
X905 X908 X912 X913 X1004 X1005 X1008 X1012 X1013
X1035 X1039 X1040 X1044 X1104 X1108 X1109 X1115
X1119 X1120 X1126 X1130 X1131 X1136 X1202 X1206
X1210 X1211 X1215 X1219 X1220 X1224 X1302 X1405
X1408-X1410 X1415 X1417 X1505 X1508-X1510 X1515
X1517 X1605 X1608-X1610 X1615 X1617 X1619 X1621
X1706 X1709 X1714 X1715 X1718 X1722 X1723 X1730
X1806 X1809 X1814 X1815 X1818 X1822 X1823 X1830
X1906 X1909 X1914 X1915 X1918 X1922 X1923 X1930
X2002 X2003 X2006 X2007 X2010 X2012 X2013 X2016
X2017 X2020 X2105 X2112 X2117 X2209 X2213 X2214
X2218 X2309 X2313 X2314 X2318 X2409 X2413 X2414
X2418 X2422 X2424 X2425 X2506 X2510 X2514 X2515
X2519 X2606 X2610 X2614 X2615 X2619 X2623 X2625
X2626 X2714 X2718 X2719 X2723 X2731 X2735 X2736
X2740 X2814 X2818 X2819 X2823 X2831 X2835 X2836
X2840 X2914 X2918 X2919 X2923 X2931 X2935 X2936
X2940 X3121 X3124 X3126 X3129-X3133 X3221 X3224
X3226 X3229-X3233 X3321 X3324 X3326 X3329-X3333
X3335-X3337 X3408-X3410 X3412-X3414 X3416-X3418
X3420-X3422 X3424-X3426 X3428-X3430 X3506 X3510
X3514 X3518 X3522 X3526 X3529 X3610 X3620 X3630
X3706 X3711 X3716 X3718 X3721 X3804 X3807 X3810
X3813 X3816 X3818 X3822 X3824 X3826 X3828 X3830
X3833 X3835 X3902 X3906 X3908 X3910 X3912 X3915
X3918 X3920 X3922 X3930 X3932 X3939 X3942 X4003
X4005 X4006 X4010 X4011 X4014 X4018 X4022 X4026
X4030 X4032 X4112 X4131 X4204 X4207 X4210 X4214
X4220 X4224 X4226 X4229 X4304 X4307 X4310 X4314
X4320 X4324 X4326 X4329 X4404 X4407 X4410 X4414
X4420 X4424 X4426 X4429 X4436 X4509 X4520 X4532
X4540 X4605 X4613 X5306 X5311 X5318 X5326 X5334
X5418 X5426 X5434 X5504 X5507 X5510 X5513 X5516
X5604 X5608 X5612 X5616 X5620 X5624 X5628 X5632
X5636 X5640 X5644 X5648 X5702 X5704 X5706 X5708
X5710 X5712 X5714 X5716 X5718 X5720 X5722 X5724
X5726 X5729 X5732 X5734 X5751 X5804 X5809 X5814
X5818 X5821 X5823 X5926 X5928 X6403 X6415 X6418
X6421 X6432 X6436 X6437 X6439 X4712 X4731 X4804
X4807 X4810 X4814 X4820 X4824 X4826 X4829 X4904
X4907 X4910 X4914 X4920 X4924 X4926 X4929 X5004
X5007 X5010 X5014 X5020 X5024 X5026 X5029 X5036
X5109 X5120 X5132 X5140 X5205 X5213 X8163 X8164
X8166 X8167 X8168
X50002 X50004 X50006 X50008 X50010
X50012 X50014 X50016 X50018 X50020
X50022 X50024 X50026 X50028 X50030
X50032 X50034 X50036 X50053 X50055
X50061 X50064 X50067 X50069 X50072
X50073 X50075 X50076 X50078 X50079
X50080 X50082 X50083 X50084 X50086
X50087 X50089 X50091 X50092 X50094
X50095 X50097 X50100 X50228 X50231
X50232 X50248 X50251 X50252 X50261
X50263 X50265 X50267 X50269 X50271
X50273 X50275 X50277 X50279 X50281
X50283 X50286
K50002 K50004 K50006 K50008 K50010
K50012 K50014 K50016 K50018 K50020
K50022 K50024 K50026 K50028 K50030
K50032 K50034 K50036 K50053 K50055
K50061 K50064 K50067 K50069 K50072
K50073 K50075 K50076 K50078 K50079
K50080 K50082 K50083 K50084 K50086
K50087 K50089 K50091 K50092 K50094
K50095 K50097 K50100 K50228 K50231
K50232 K50248 K50251 K50252 K50261
K50263 K50265 K50267 K50269 K50271
K50273 K50275 K50277 K50279 K50281
K50283 K50286
X24103 X24703 X25852 X25854 X25859 X25861 X26005
X26008 X26012 X26015 X26017 X26023 X26025 X26027
X26029 X26032 X26034 X26036 X26041 X26045 X26046
X26048 X26105 X26134 X26108 X26137 X26111 X26140
X26114 X26143 X26117 X26146 X26120 X26149 X26123
X26152 X26126 X26155 X26202 X26203 X26206 X26207
X26216 X26218 X26220 X26222 X26224 X26228 X26232
X26317 X26334 X26404 X26432 X26318 X26335 X26405
X26433 X26323 X26342 X26424 X26457 X26325 X26344
X26426 X26459 X26327 X26346 X26418 X26446 X26328
X26347 X26419 X26447 X26412 X26414;
DO I = 1 TO DIM(AMT);
   /* Positive amounts: the rounding unit grows with the magnitude. */
   /* Amounts between 0 and 5 are set to 1.                         */
   IF (0 < AMT{I} < 5) THEN AMT{I}=1;
   IF (5 <= (AMT{I}) < 1000) THEN DO;
      AMT{I}=MAX(1,ROUND(AMT{I},10));
   END;
   IF (1000 <= AMT{I} < 10000) THEN DO;
      AMT{I}=ROUND(AMT{I},100);
   END;
   IF (10000 <= AMT{I} < 100000) THEN DO;
      AMT{I}=ROUND(AMT{I},1000);
   END;
   IF (100000 <= AMT{I} < 1000000) THEN DO;
      AMT{I}=ROUND(AMT{I},10000);
   END;
   IF (1000000 <= AMT{I}) THEN DO;
      AMT{I}=ROUND(AMT{I},100000);
   END;
   /* Top-code positive amounts at 25 million. */
   IF AMT{I} > 25000000 THEN AMT{I}=25000000;
   /* Negative amounts: same rounding units as for positive amounts. */
   IF (-1000 < (AMT{I}) <= - 5) THEN DO;
      AMT{I}=ROUND(AMT{I},10);
   END;
   IF (-10000 < AMT{I} <= -1000) THEN DO;
      AMT{I}=ROUND(AMT{I},100);
   END;
   IF (-100000 < AMT{I} <= -10000) THEN DO;
      AMT{I}=ROUND(AMT{I},1000);
   END;
   IF (-1000000 < AMT{I} <= -100000) THEN DO;
      AMT{I}=ROUND(AMT{I},10000);
   END;
   /* Bottom-code at -1 million. The test .Z < AMT{I} excludes      */
   /* missing values, which sort below every number in SAS.         */
   IF .Z < AMT{I} <= -1000000 THEN
      AMT{I}=-1000000;
END;