DESIGN AND METHODS
The 1983 SCF sample consists of a randomly selected, nationally representative, area
probability sample of all U.S. households (cross-section sample) and a supplemental sample of
high-income households drawn from federal income tax files (high-income sample). The area
probability sample used the 1970 SRC National Sample frame. The sample was designed to represent
housing units in the coterminous United States (the 48 lower states) exclusive of those on
military reservations, nursing and rest homes, college dorms, jails, hotels, missions, convents,
monasteries, and other institutional quarters.[2] Housing units were selected by a multi-stage
procedure that samples successively smaller geographic areas. Probability selection was enforced
at all stages of sampling.
The first stage of the sample selection procedure was that used to draw the 1970 SRC National
Sample. The 1970 frame was used because full data from the 1980 Decennial Census were not
available at the time of the survey. The frame was adjusted, however, to represent 1980
population values. The 1970 SRC National Sample was drawn by assigning all U.S. metropolitan
areas (SMSAs) and non-SMSA counties to 74 relatively homogeneous strata, and selecting one SMSA or
county (primary sampling unit or PSU) per stratum.[3] Twelve of these strata, representing New
York City and the other 11 largest SMSAs, contained only one PSU each, which was therefore selected
with certainty.[4] From each of the remaining 62 strata, which contained between 2 and 200 PSUs, one
primary area was selected with a probability proportionate to population. Thirty-two of these
strata were SMSAs and 30 were strata made up of non-SMSA counties or county groups. The sample
was stratified by region of the country, so that each of the four major geographic regions --
Northeast, North Central, South, and West -- received representation in proportion to its
population. In order to reduce variances below those that would be obtained with ordinary
stratification, PSUs in each region were selected using a controlled selection method. This
method controlled the distribution of sample PSUs by state and degree of urbanization. It results
in a more geographically balanced sample and increases the precision of sample estimates relative
to a more conventional random design. The PSUs selected for the 1983 SCF were drawn from 37
states and the District of Columbia as shown in figure 1.
Response Rates
                                  Total        Area Probability    High-income
                                  Households   Sample              Sample
_______________________________________________________________________________
Original Sample units               --             6062              5000
Sample Households                  5855            5396               459
Complete Interviews                4262            3824               438
Refusals*                          1224            1209                15
Non-interviews**                    369             362                 6
Response Rate                       .73             .71               .95
Interview length (minutes)           75              74                87
_______________________________________________________________________________
*  Contact with household complete, and eligible respondent refused.
** Interview not obtained for reasons other than respondent refusal
   (for example, household members on vacation or respondent health
   problems preclude interview).
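The published response rates appear to equal complete interviews divided by sample households. The following Python sketch reproduces them from the counts in the table above; the function name is illustrative and is not part of the original documentation.

```python
def response_rate(complete_interviews: int, sample_households: int) -> float:
    """Share of sample households that completed an interview."""
    return complete_interviews / sample_households


# Counts taken from the response-rate table above.
rates = {
    "total": response_rate(4262, 5855),
    "area_probability": response_rate(3824, 5396),
    "high_income": response_rate(438, 459),
}
rounded = {name: round(rate, 2) for name, rate in rates.items()}
```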
DATA CLEANING AND IMPUTATION
The actual responses given by respondents may contain missing or inconsistent information due
to respondents' misunderstanding, lack of knowledge, or unwillingness to answer certain questions.
These problems make analysis of the raw data difficult and, depending on the pattern of errors,
may bias conclusions. A series of consistency checks and imputation procedures was developed at
the Federal Reserve Board to clean the raw data and to estimate values for the missing data. The
computer code to implement these data cleaning and imputation procedures is over 40,000 lines
long. Specific information is given for each variable listed in this manual. The general
procedures are described below.
Overview:
Three basic methods were used to impute missing data. The first method computed
missing values by formulas based on respondent information that was closely related to the missing
items. For example, missing earned income could be imputed from reported wage rates, hours
worked, and work history. Asset income could be inferred using average rates of return if asset
values were given. Similarly, asset values could be estimated from reported asset income. Length
of unemployment coupled with the appropriate state benefit formula could be used to impute
unemployment income; and work history and Social Security benefit formulas could be used to impute
Social Security income. Where appropriate, random disturbances were added in making imputations.
The detailed data were also useful for resolving inconsistencies in reported values.
The second method assigned missing values on the basis of random draws from conditional
frequency distributions. This method was used primarily to impute missing values for variables
with discrete values. It was also used to estimate dollar amounts in a few cases in which a very
small number of missing values were present. A variant of this method involved using a
conditional mean together with information reported by the respondent to estimate the value of a
missing item. The amount borrowed for a first mortgage, for example, was sometimes estimated by
multiplying the purchase price of the house by the average loan-to-price ratio in the year of
purchase.
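The second method and its conditional-mean variant can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the conditioning cells, the donor data, and the 0.75 loan-to-price ratio are all hypothetical and do not come from the Federal Reserve Board code.

```python
import random


def draw_from_conditional(observed, cell, rng):
    """Draw one reported value at random from respondents in the same cell.

    Sampling from the list of reported values is equivalent to drawing from
    the conditional frequency distribution for that cell.
    """
    donors = [value for donor_cell, value in observed if donor_cell == cell]
    if not donors:
        raise ValueError(f"no reported values in cell {cell!r}")
    return rng.choice(donors)


def impute_first_mortgage(purchase_price, avg_loan_to_price):
    """Conditional-mean variant: price times the average loan-to-price ratio."""
    return purchase_price * avg_loan_to_price


# Hypothetical data: (conditioning cell, reported discrete response).
observed = [("owner", 1), ("owner", 1), ("owner", 2), ("renter", 3)]
rng = random.Random(0)
imputed_code = draw_from_conditional(observed, "owner", rng)

# Hypothetical average loan-to-price ratio for the year of purchase.
imputed_loan = impute_first_mortgage(60_000, 0.75)
```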
The third method estimated missing values by regression. Missing values were assigned the
value predicted by the regression plus a random disturbance term, which was generally assumed to
be a truncated log-normal variable with the same variance as the residual term of the regression.
This method was used to estimate most missing dollar amounts. Income and asset regression
imputations were done simultaneously, using an iterative technique, in order to preserve all
second moments.
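A simplified Python sketch of the regression method is given below. It assumes a single predictor, a regression fit in logs, and a normal disturbance in logs truncated at two residual standard deviations; these choices, the function names, and the data are illustrative assumptions, not the specification actually used. Truncation lowers the variance of the draw slightly relative to the residual variance, and the simultaneous iterative step for income and assets is not shown.

```python
import math
import random
import statistics


def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual std dev)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (len(x) - 2))
    return intercept, slope, sigma


def impute_dollars(x_missing, intercept, slope, sigma, rng, max_sd=2.0):
    """Predicted log amount plus a normal draw truncated at +/- max_sd * sigma."""
    while True:
        shock = rng.gauss(0.0, sigma)
        if abs(shock) <= max_sd * sigma:
            break
    return math.exp(intercept + slope * x_missing + shock)


# Hypothetical complete cases: log income and log asset value.
log_income = [9.2, 9.8, 10.1, 10.6, 11.0]
log_assets = [8.9, 10.1, 10.4, 11.3, 11.6]
a, b, s = fit_simple_ols(log_income, log_assets)

rng = random.Random(1983)
imputed_assets = impute_dollars(10.3, a, b, s, rng)
```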
Much more was involved in the process of preparing the data set than imputation of missing
values. In many instances respondents gave inconsistent answers to similar questions, which had
to be resolved. There were other instances where assets and/or debts appeared to be reported
twice. A more common type of problem involved the categories to which assets were assigned. For
example, many elderly respondents appeared to confuse Social Security and SSI. The assignment of
assets to money market accounts or SUPER NOW checking accounts was done inconsistently by
respondents. Many of these kinds of problems and inconsistencies resulted in data being changed
or moved.
The area probability and high-income samples were handled separately. Missing values for all
observations in the high-income sample were imputed. In the area probability sample, however, 159
of the original 3824 observations were discarded because virtually all dollar amounts for income
and assets were missing (this procedure is described in the next section). All missing values for
the remaining 3665 observations were imputed.
Most of the imputations for the 1983 survey were made entirely on the basis of 1983 data. In
order to preserve the appropriate inter-temporal correlations, however, some use was made of 1986
responses for those households responding to the 1986 SCF. Specifically, some 1983 values were
reimputed where respondents gave "hard data" in 1986 for an item which was missing in 1983.
Wages and Income:
Dollar amounts for each missing income source were imputed separately, and total family income
was obtained by summing income sources. Missing earned income was estimated
using a wage index to deflate the current wages of the respondent and their spouse to 1982 levels.
If the current wages were missing as well, they were estimated using conditional mean tables
constructed from the March 1983 CPS. Missing values were assigned average log-wages for persons
of the same sex, race, age, and three-digit occupation code plus a random error term. The random
error term was chosen so that the correlation in wages between an individual's current and past
jobs was the same as that observed for the part of the sample with complete information. Missing
wage and business income of self-employed individuals was imputed separately using CPS data for
self-employed individuals. For the high-income sample, a similar procedure was used to impute
wage and business income. Missing values were assigned on the basis of occupation and age, with
random error terms added to the conditional means. The conditional means were obtained from the
high-income sample itself, however, not the CPS.
Unemployment compensation was imputed using state benefit rules and job information reported
in the employment section. Some pension and Social Security income was estimated using
information from the employment section, but most was imputed by regression. Welfare was imputed
by regression using state benefit formulas. Interest and dividend income was estimated by
multiplying dollar amounts of various types of assets held by the appropriate average yields for
those assets in 1982. Average yields were obtained from the Federal Reserve Bulletin and other
publications.
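The yield-based calculation, and the reverse step of capitalizing reported income into an asset value, can be illustrated with a short Python sketch. The yields and balances below are hypothetical placeholders, not the 1982 averages actually used, and the function names are illustrative.

```python
def impute_asset_income(asset_values, average_yields):
    """Sum of each asset balance times the average yield for its type."""
    return sum(asset_values[kind] * average_yields[kind] for kind in asset_values)


def capitalize_income(reported_income, average_yield):
    """Reverse step: back out an asset value from the income it produced."""
    return reported_income / average_yield


# Hypothetical average yields by asset type.
yields = {"savings": 0.05, "cds": 0.10}
holdings = {"savings": 2_000, "cds": 10_000}

income = impute_asset_income(holdings, yields)
implied_cd_balance = capitalize_income(1_000, yields["cds"])
```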
Assets:
Missing values for most financial assets were estimated either by capitalizing
interest and dividend income or by regression if these income sources were not reported. The
procedure used when both income and financial asset values were missing was an iterative one that
imputed income and asset values simultaneously. It built maximum likelihood estimates of the
covariance matrix of the set of imputed variables, conditioned on demographic and other variables,
under the assumption that the variables are jointly log-normal and that, given such information,
observations are missing at random. Imputed values were conditional expectations plus a random error.
Missing house values were imputed by inflating reported purchase prices by regional housing
price indices. Some house values, and all other missing real estate, life insurance, net equity
of businesses, and some land contracts were imputed by regression. Automobile values were imputed
by matching the year and make of the vehicle to the National Automobile Dealers Association
"blue book" listing of used car values at the time of the survey.
Debts:
Respondents were asked to report dollar amounts outstanding for open-end and non-
installment closed-end debts. Missing dollar amounts were estimated by regression. Asset values,
income, and other demographic characteristics of the family were used as predictor variables.
Random errors were added to the predicted amounts outstanding.
Respondents reported credit terms -- payment size, frequency, amount borrowed, due date, and
interest rate -- for mortgages, land contracts, and other real estate loans. Some payments had to
be adjusted because the reported amount included taxes and insurance. The amount outstanding on
each loan could then be obtained by calculating the present value of remaining payments of
principal and interest. Missing credit terms were estimated using appropriate average interest
rates, maturities, and loan-to-price ratios published in the Federal Reserve Bulletin (the amount
outstanding could actually be computed even if one term was not reported, because of a
redundancy).
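The present-value calculation described above can be sketched in Python as follows. The sketch assumes level payments of principal and interest at a constant periodic rate; the function name and the example loan are hypothetical.

```python
def amount_outstanding(payment, periodic_rate, payments_remaining):
    """Present value of a level stream of remaining payments."""
    if periodic_rate == 0:
        return payment * payments_remaining
    discount = (1 + periodic_rate) ** -payments_remaining
    return payment * (1 - discount) / periodic_rate


# Hypothetical loan: $500 a month, 12 percent a year, 240 payments left.
balance = amount_outstanding(500.0, 0.12 / 12, 240)
```

The payment passed in should exclude any taxes and insurance, in line with the adjustment described in the text.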
Similar information was collected for closed-end installment debt, although interest rates
were not reported. Sufficient information was available if all other terms were reported to solve
for the implied interest rate. The amount outstanding for each loan could then be computed in the
same fashion as used for mortgages. When one of the terms was missing or the computed interest
was implausible, an appropriate interest rate from the Federal Reserve Bulletin was assumed for
area probability sample observations. For high-income sample observations, missing values were
assigned on the basis of matches to other comparable loans in the high-income sample.
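Solving for the implied interest rate has no closed form, so a numerical search is needed. The Python sketch below uses bisection, which is an assumption made for illustration; the source does not say which solution method was used. The function names and the example loan are hypothetical.

```python
def present_value(payment, periodic_rate, n_payments):
    """Present value of n level payments at a constant periodic rate."""
    if periodic_rate == 0:
        return payment * n_payments
    return payment * (1 - (1 + periodic_rate) ** -n_payments) / periodic_rate


def implied_periodic_rate(amount_borrowed, payment, n_payments, upper=1.0, tol=1e-10):
    """Bisection search for the rate at which the payments are worth the loan."""
    lower = 0.0
    if present_value(payment, lower, n_payments) < amount_borrowed:
        raise ValueError("payments sum to less than the amount borrowed")
    if present_value(payment, upper, n_payments) > amount_borrowed:
        raise ValueError("implied rate exceeds the search bracket")
    while upper - lower > tol:
        middle = (lower + upper) / 2
        if present_value(payment, middle, n_payments) > amount_borrowed:
            lower = middle  # payments still worth more than the loan: rate is higher
        else:
            upper = middle
    return (lower + upper) / 2


# Hypothetical loan: $5,000 borrowed, 36 monthly payments of $166.07.
monthly_rate = implied_periodic_rate(5_000, 166.07, 36)
```

A rate that raises one of the two errors, or that falls outside a plausible range, would correspond to the cases the text describes as implausible, for which a published rate was assumed instead.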
Summary Data on Imputation:
The impact of the imputation process used for the SCF is apparent
from the non-response rates given for selected assets and liabilities in table 2. Non-response
rates for ownership ("do you have a ...?") and dollar amounts ("what is its value?") were
computed for three subsamples -- the high-income sample, the 3665 edited area probability
observations, and the 159 discarded area probability observations.
NON-RESPONSE RATES FOR SELECTED ASSETS AND LIABILITIES, 1983 SCF
                                  EDITED                                DISCARDED
                               CROSS-SECTION       HIGH-INCOME        CROSS-SECTION
ASSETS                        OWNRSHP    $ AMT   OWNRSHP    $ AMT   OWNRSHP    $ AMT
Principal residence                .1      7.8        .0      1.4        .6     32.8
Other real estate (gross)          .1      9.2        .0      3.0       2.5     35.3
Public stock                      1.3     25.4       1.7      6.7      15.8     97.8
Bonds and trusts                   .7     24.7       2.4      6.2       8.5     82.9
Checking accounts                  .1      9.6        .5      4.7       5.0     71.7
Savings accounts                   .2     14.1        .7      4.8       8.2     89.6
Money market accounts              .2     18.3        .3      9.3       5.0     93.1
Certificates of deposit           1.3     25.6        .9      9.0      15.2     97.4
IRA and Keogh accounts             .3     11.9        .3      2.1       4.4     79.6
Savings bonds                      .1     17.4        .2      4.6       6.3     71.4
Life insurance cash value         2.4     71.7        .5     33.5       9.4     91.8
Business assets (net)              .1     37.2        .5     17.6       4.4     96.9
Automobiles                        .0       --        .0       --        .1       --
DEBT
Automobile debt                    .7      4.6        .0      1.7       2.2     36.4
Consumer debt                      .2      5.6        .4      3.7        .6     59.1
Principal residence debt           .3      9.6        .5      3.6       1.6     50.8
Other real estate debt            1.0      8.2        .0      3.9       7.7     40.0
IMPUTATION PERCENTAGES AND IMPLIED MEAN IMPUTATIONS, 1983 SCF*
                                    HIGH-INCOME            EDITED CROSS-SECTION
                              RESPONDENT  IMPUTATION      RESPONDENT  IMPUTATION
                                 DATA      BY MEANS          DATA      BY MEANS
                               AS % OF    IMPLIED %        AS % OF    IMPLIED %
                                TOTAL      OF TOTAL         TOTAL      OF TOTAL
Principal residence              99.2       100.3            95.6       102.9
Other real estate (gross)        98.8       101.1            90.1        97.5
Public stock                     95.6       100.2            88.4       104.7
Bonds and trusts                 93.7       100.3            74.9        94.3
Checking accounts                96.5       101.1            92.6       101.1
Savings, CDs, money mkt.
  Savings accounts               96.7       101.1            86.3        99.9
  Money market accounts          95.7       100.9            84.3        98.1
  Certificates of deposit        91.8        97.5            76.4        98.9
IRA and Keogh accounts           98.5       101.2            91.3       102.6
Savings bonds                    99.0       100.7            89.9       106.9
Life insurance cash value        58.8        80.7            24.1        70.4
Business assets (net)            80.1       101.1            82.4       118.9
* Full sample weights are used for each subgroup
SAMPLE WEIGHTS
For a variety of reasons, final household survey data will rarely represent a random draw
from the U.S. population. Even if the original survey design were random, different types of
households will have different likelihoods of completing interviews. Very high and very low
income households, for example, are thought to be less likely to participate in surveys. The 1983
SCF had an additional source of non-randomness because of the inclusion of the high-income
supplement. These factors necessitated the construction of weighting variables to compensate for
known or estimable sources of stratification in the final data set. Several different weights
were constructed.
Inclusion of households in the final "cleaned" data set described in this manual results from
a series of implicit stratified selection criteria. There are four major sources of implicit
stratification. First, there are different response rates for different household types. This
occurs because households cannot be reached (up to six attempts to contact participants were made)
or because they refuse to participate when contacted. Second, the sample may not fully reflect
the U.S. population because of sampling error in the survey itself (in the sense that any random sample
will not have exactly the same average characteristics as the population from which it is drawn).
Third, as mentioned above, not all survey observations were usable for analysis because of
significant missing information due to deliberate or inadvertent actions. Of the 3,824 area
probability sample survey households included in the original SRC data tape, 159 observations were
dropped in the cleaned file. Finally, the SCF sample is drawn from both the area probability and
high-income sampling frames. If both samples are used, it is necessary to properly mix them to
have a representative national sample.
There are several ways to compensate for these types of sample
stratification. The method employed here is to construct sample weights, which can be used to
adjust the final sample. Briefly, the weights were calculated as follows.
Area Probability sample weights:
The original SRC sampling frame called for sampling 6,062
housing units in the cross-section. If each housing unit had yielded an interview, each
observation would represent 13,333 housing units. In fact, as shown earlier in table 1, only about
89 percent of the housing units were occupied, and of these, only about 71 percent (3824)
contained households who were willing to participate in the survey. Obviously, the exact
characteristics of the non-responding households are unknown; however, the response and occupancy
rates differed significantly across the survey's 75 PSUs. The "non-response" weight adjusts for
this differential non-response. This weight is equal to the reciprocal of the household response
rate for the PSU to which the household belongs. This weight should compensate for some
location-related characteristics associated with non-response and occupancy.
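As a minimal Python sketch of the "non-response" weight, the reciprocal of a PSU's response rate is the number of sampled households divided by the number responding. The PSU labels and counts below are hypothetical, and the function name is illustrative.

```python
def nonresponse_weights(sampled_by_psu, responded_by_psu):
    """Weight for each PSU: the reciprocal of its household response rate."""
    weights = {}
    for psu, sampled in sampled_by_psu.items():
        responded = responded_by_psu[psu]
        weights[psu] = sampled / responded  # 1 / (responded / sampled)
    return weights


# Hypothetical household counts for two PSUs.
sampled = {"psu_a": 80, "psu_b": 100}
responded = {"psu_a": 60, "psu_b": 50}
weights = nonresponse_weights(sampled, responded)
```

With these weights, the responding households in each PSU add up to the number of households originally sampled there.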
The second type of stratification cited above can be adjusted for by taking into account how
the final area probability sample (weighted for "non-response") compares with the population
according to certain select characteristics. Obviously the sample could be adjusted for any one
of a number of characteristics (age, income, size of household, region, etc.). SRC elected to
compensate for regional sampling error by computing a "post-stratification" weight which adjusts
the sample to have the same relative population as the 1980 census in the four major sample
regions (Northeast, South, North Central, and West) further subdivided into urban and rural. We
computed a similar weight, but post-stratified to the population and urban/rural definitions
represented by the March 1983 CPS.
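A post-stratification adjustment of this kind can be sketched in Python as a ratio of population share to weighted sample share within each cell. The cells and counts below are hypothetical; the actual adjustment used the four regions subdivided into urban and rural, with control totals that are not reproduced here.

```python
def poststratification_factors(weighted_sample_by_cell, population_by_cell):
    """Factor per cell that makes weighted sample shares match population shares."""
    sample_total = sum(weighted_sample_by_cell.values())
    population_total = sum(population_by_cell.values())
    factors = {}
    for cell, sample_count in weighted_sample_by_cell.items():
        sample_share = sample_count / sample_total
        population_share = population_by_cell[cell] / population_total
        factors[cell] = population_share / sample_share
    return factors


# Hypothetical cells: weighted sample counts and population control totals.
weighted_sample = {("Northeast", "urban"): 600.0, ("Northeast", "rural"): 400.0}
population = {("Northeast", "urban"): 700.0, ("Northeast", "rural"): 300.0}
factors = poststratification_factors(weighted_sample, population)
```

Each observation's non-response weight would then be multiplied by the factor for its cell.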
The third type of stratification cited above can be adjusted for in two different ways. The
criteria for inclusion in the cleaned area probability sample were as follows. Any observation
with any dollar values given for earnings or income was automatically included. Remaining
observations were excluded only if there were virtually no dollar values given for housing value
(or rent), other property values, business, and financial assets. Information on debts was not
used in this section. Specific rules were employed to exclude observations whose ratio of missing
data to relevant questions exceeded a certain level. In practice those observations that were
dropped were missing values for virtually all dollar value questions they were asked.[16] None of
the high-income sample observations were dropped. In fact, these observations were remarkably
clean, with comparatively little missing information.
To compensate for these exclusions, a probit model was fit to model the probability that an
observation in the complete 3824 area probability sample file would be included in the cleaned
3665-observation data set. Variables used in this model were selected as follows. Most asset
questions consisted of two parts: an initial screening question that asked whether the respondent had
an asset, and a follow-up for those acknowledging ownership that asked how much they had. In virtually
every instance of missing values, it was the second question which was not answered rather than
the first. Thus, the probit model predicting inclusion in the cleaned sample used a number of the
first type of questions, as well as standard demographic data available for all observations, as
predictor variables.
The fitted probit equation results can be used in two ways. The inverse of the predicted
probability that an observation would have been included in the cleaned sample (conditioned on the
predictor variables used) can be used to weight observations in analysis. In practice, this
procedure implies higher weights for observations holding a large number of different assets.
An alternative way of using the probit results attempts to account explicitly for correlation
between inclusion in the sample and the error term of an estimated equation. A variable has been
created which is the expectation of the "error" in the inclusion equation, conditioned on the fact
that the observation indeed turned out to be included (the so-called Mills ratio). This variable
can be included as a regressor in estimated equations and will account for "sample selection"
bias under certain assumptions.
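Both uses of the probit results can be illustrated in Python, given a fitted probit index for an observation. The sketch assumes a standard normal error and the selection rule "included when index plus error is positive"; the index value is hypothetical. The selection term computed here, phi(index) / Phi(index), is often called the inverse Mills ratio in the econometrics literature.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inclusion_weight(index):
    """Inverse of the predicted probability of inclusion, P = Phi(index)."""
    return 1.0 / normal_cdf(index)


def selection_term(index):
    """E[error | included] for a standard normal error: phi(index) / Phi(index)."""
    return normal_pdf(index) / normal_cdf(index)


# Hypothetical fitted probit index for one observation.
index = 1.5
weight = inclusion_weight(index)
mills = selection_term(index)
```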
High-Income and full sample weights:
The weights for the combined sample containing both the
area probability and high-income samples were more difficult to compute, and are more subject to
question. Unfortunately, full information on the high-income sampling procedure is not publicly
available. Additional complications stem from the fact that the high-income observations were
drawn from a 1980 sampling frame of taxpayers (but were contacted in 1983) and the fact that the
reporting basis for tax files (individuals or married couples) is not always the same as the
survey (families).
Initially, the Office of Tax Analysis used a statistical file set up by the IRS to construct
relative weights for the high-income sample (the sample was divided into nine cells each with a
separate weight). These weights are supposed to make the high-income sample representative of the
unknown full IRS sampling frame. SRC constructed sample distributions for the overlap of the area
probability and high-income samples (over $100,000 in household income) to compute a
"meshed" combined sample weight. The SRC weight retained the relative weights of the high-income
sample and left the weight for area probability observations below $100,000 largely unchanged.[17]
Well after the initial SRC full sample weights were computed, information on the upper tail
of the distribution of 1982 taxpayers became available. When this was compared to reported 1982
household income in the SCF sample, it appeared that the SRC meshed weight may have given too much
weight to the high-income sample. As a consequence, another full-sample weight was constructed at
the Federal Reserve Board which used a different approach to combine the two samples. It was
decided to construct sampling weights for the high-income sample (and area probability
observations with income above a certain level) using a post-stratification scheme based on
control totals for an "extended" income measure constructed from the 1982 Tax Model File (TMF) of
the IRS. The TMF is a stratified sample of 88,218 individual tax returns with a significant over-
sampling of high incomes.[18]
This income measure, which was constructed for all survey households using reported 1982
income data, is roughly comparable to the IRS measure of adjusted gross income plus excluded
realized capital gains. Despite the fairly detailed income questions in the SCF, the survey
measure of business income almost surely overstates the TMF measure. It appears likely
that survey respondents often report something much closer to a cash-flow concept of income rather
than income netted of expenses and depreciation. Unfortunately, there is not sufficient
information in either the SCF or the TMF to make a precise compensating adjustment. A gross
adjustment for the aggregate difference between the survey and TMF business income totals was made
in constructing the survey measure of extended income. Reported business income in the SCF was
deflated by about 40 percent so that IRS and adjusted survey aggregates of business income were
the same. This adjustment was quite ad hoc, however, and the potential for distortion at the
individual level remains, with weights for households with business income particularly suspect.
Because the reporting units in the survey and the TMF differ, the TMF data were adjusted in
order to estimate income on a family basis. Most of the high-income returns in the TMF are either
joint filings or those of single individuals. However, there are some married couples filing
separately, particularly in the $50,000 to $100,000 income range. These observations were
"aggregated" into households by assuming that separate filers were all married to people with the
same income (weights for such observations were halved). The final weights are only slightly
affected by variations in this adjustment.
Post-stratification cells were defined by the seven categories of extended income shown in
table 4. For each of the top six income cells (above $80,000), weights were determined so that
the weighted number of survey observations equaled the TMF totals. High-income observations were
assigned the cell average. Area probability sample observations were assigned relative weights
based on their cross-section weight, but scaled so that the mean for each of the six cells was the
same as that of the high-income observations. The original weights of the area probability
observations with income below $80,000 were adjusted so that the weighted number of SCF households
equaled the population estimated from the March 1983 CPS. High-income sample observations with
income below $80,000 were arbitrarily assigned the same weight as observations in the $80,000 to
$90,000 group.
FRB Full Sample Weight
Household              Number of     Number of      TMF control     Average
extended income        Area Prob.    High-income    totals of       weight
(dollars)              cases         cases          households      assigned
___________________________________________________________________________
under $80,000             3,579            49        82,364,760      22,703
$80,000-89,999               22            11           356,324      10,798
$90,000-99,999               13            16           250,746       8,646
$100,000-124,999             23            40           362,022       5,746
$125,000-199,999             16            92           356,386       3,300
$200,000-499,999             11           148           182,424       1,147
$500,000 and over             1            82            45,338         546
___________________________________________________________________________
All cases                 3,665           438        83,918,000      20,453
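The average weights in the table above appear to equal the TMF control total for a cell divided by the number of survey cases in that cell. The following Python sketch reproduces the six upper-income cells from the published figures; the function name is illustrative.

```python
def cell_average_weight(control_total, area_cases, high_income_cases):
    """Households that each survey case stands for within an income cell."""
    return control_total / (area_cases + high_income_cases)


# (area probability cases, high-income cases, TMF control total) from the table.
cells = {
    "$80,000-89,999": (22, 11, 356_324),
    "$90,000-99,999": (13, 16, 250_746),
    "$100,000-124,999": (23, 40, 362_022),
    "$125,000-199,999": (16, 92, 356_386),
    "$200,000-499,999": (11, 148, 182_424),
    "$500,000 and over": (1, 82, 45_338),
}
average_weights = {
    label: round(cell_average_weight(total, area, high))
    for label, (area, high, total) in cells.items()
}
```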
Weighted and Unweighted Percentages
                                        Unweighted                     Weighted
                                 Area       High-      Full        Area        FRB
                              Probability   Income     Sample   Probability    Full
Age (head)
  34 or less
    married                       17.9        1.4       16.1        17.0       16.9
    unmarried male                 6.0         .5        5.4         6.2        6.2
    unmarried female               7.4         .2        6.7         7.6        7.6
  35 to 44
    married                       13.6       12.6       13.5        13.6       13.6
    unmarried male                 1.8         .4        1.7         1.9        1.8
    unmarried female               4.1         .2        3.7         4.1        4.1
  45 to 54
    married                       10.7       22.8       12.0        10.6       10.5
    unmarried male                 1.6        2.3        1.7         1.7        1.7
    unmarried female               3.2         .0        2.9         3.2        3.2
  55 to 64
    married                        9.6       28.3       11.6         9.7        9.7
    unmarried male                 1.4        2.5        1.5         1.5        1.5
    unmarried female               3.7         .5        3.3         3.7        3.7
  65 or more
    married                        9.4       24.7       11.0         9.7        9.9
    unmarried male                 1.8        1.8        1.8         1.8        1.8
    unmarried female               7.8         .9        7.1         7.7        7.7
Race
  Caucasian                       82.8       98.6       84.5        82.3       82.2
  Nonwhite or Hispanic            17.2        1.4       15.5        17.7       17.8
Family Income
  < $10,000                       25.0         .0       22.2        24.1       24.0
  $10,000-$19,999                 26.8         .0       23.9        26.7       26.8
  $20,000-$29,999                 19.2         .2       17.3        19.2       19.3
  $30,000-$49,999                 19.6         .9       17.5        19.8       19.7
  $50,000-$99,999                  7.4        7.8        7.5         8.1        8.2
  >= $100,000                      2.0       91.1       11.5         2.2        2.0
Family Net Worth
  < $5,000                        25.5         .0       22.8        25.3       25.3
  $5,000-$24,999                  18.3         .0       16.4        18.0       18.0
  $25,000-$50,000                 16.3         .0       14.6        16.0       16.0
  $50,000-$99,999                 17.6        1.4       15.8        17.3       17.2
  $100,000-$249,999               14.3        2.3       13.0        15.0       14.7
  $250,000-$999,999                7.0       26.0        9.0         7.3        7.1
  >= $1,000,000                    1.0       70.2        8.4         1.2        1.7
Homeownership                     64.1       95.4       67.4        63.5       63.4
Education of the Head
  0 to 8 grades                   15.2         .5       13.6        14.4       14.5
  9 to 12 grades                  46.1        5.0       41.7        45.0       44.9
  some college                    16.9       13.0       16.5        17.6       17.7
  college graduate                21.7       81.5       28.1        22.9       22.9
Labor Force Participation
  single, not working             17.1        1.8       15.5        16.9       17.0
  single, working                 21.7        8.4       20.3        22.4       22.4
  married, neither working         9.7        8.0        9.5         9.8        9.8
  married, one working            23.1       52.5       26.2        22.8       23.0
  married, both working           28.4       29.2       28.5        28.1       27.8
SAMPLE STATISTICS, 1983 SCF CROSS-SECTION ONLY
                                  SUM  STD ERR  % GROSS   OVERALL  PERCNT      MEAN   MEDIAN
                                ($ B)      SUM   ASSETS      MEAN  OWNING    OWNERS  OWNERS*
GROSS ASSETS                $10,127.2   $692.8   100.0%  $120,678   96.0%  $125,693  $49,885
Principal residence           3,623.8     87.5    35.8%    43,182   63.5%    68,002   52,342
Other real estate (gross)     1,416.7    149.4    14.0%    16,882   18.8%    89,699   36,000
Public stock                    642.8    127.8     6.3%     7,660   20.2%    37,860    4,696
Bonds and trusts                344.9     57.9     3.4%     4,110    7.3%    56,421   11,000
Checking accounts               119.2      6.9     1.2%     1,421   78.6%     1,808      500
Savings, CDs, money mkt.        959.2     44.2     9.5%    11,430   74.0%    15,445    3,500
Life insurance cash value       262.1     16.0     2.6%     3,124   33.9%     9,205    3,428
Business assets (net)         2,127.3    567.5    21.0%    25,349   13.9%   181,861   40,538
Automobiles                     372.3      6.7     3.7%     4,436   84.4%     5,255    4,100
Miscellaneous                   258.8     22.5     2.6%     3,084   15.5%    19,845    5,575
DEBT                          1,444.6     69.6    14.3%    17,215   69.8%    24,655   10,797
Automobile debt                  92.2      3.6     0.9%     1,098   28.9%     3,797    3,052
Consumer debt                   178.5     12.5     1.8%     2,127   54.9%     3,872    1,120
Principal residence debt        855.3     29.4     8.4%    10,192   37.3%    27,355   21,468
Other real estate debt          318.7     56.4     3.1%     3,797    8.2%    46,291   18,797
NET WORTH                     8,682.5    666.4    85.7%   103,463      --      ----   34,537
INCOME (GROSS)                2,233.5     46.8       --    26,615      --      ----   19,625
* For gross assets, net worth, and income, this is the overall median
SAMPLE STATISTICS, FULL SAMPLE 1983 SCF
                                  SUM  STD ERR  % GROSS   OVERALL  PERCNT      MEAN   MEDIAN
                                ($ B)      SUM   ASSETS      MEAN  OWNING    OWNERS  OWNERS*
GROSS ASSETS                $11,572.2   $504.6   100.0%  $137,899   96.0%  $143,638  $49,588
Principal residence           3,751.7     80.2    32.4%    44,706   63.4%    70,465   52,000
Other real estate (gross)     1,689.9    225.1    14.6%    20,137   18.8%   106,997   37,765
Public stock                  1,033.8    180.2     8.9%    12,319   20.5%    60,161    5,000
Bonds and trusts                677.4    138.1     5.9%     8,072    7.6%   106,365   12,500
Checking accounts               119.4      5.7     1.0%     1,423   78.6%     1,811      500
Savings, CDs, money mkt.      1,052.3     44.4     9.1%    12,539   74.0%    16,951    3,500
Life insurance cash value       285.3     18.9     2.5%     3,400   34.1%     9,978    3,373
Business assets (net)         2,276.3    292.0    19.7%    27,125   14.3%   190,025   45,768
Automobiles                     373.7      6.5     3.2%     4,453   84.3%     5,282    4,085
Miscellaneous                   312.5     25.4     2.7%     3,724   15.5%    24,073    5,506
DEBT                          1,510.6     74.5    13.1%    18,001   69.6%    25,853   10,787
Automobile debt                  90.6      3.5     0.8%     1,080   28.6%     3,772    3,029
Consumer debt                   228.8     26.8     2.0%     2,726   54.9%     4,969    1,117
Principal residence debt        865.5     28.1     7.5%    10,314   37.0%    27,898   21,673
Other real estate debt          325.6     57.2     2.8%     3,880    8.2%    47,317   18,797
NET WORTH                    10,061.6    479.3    86.9%   119,898      --      ----   34,466
INCOME (GROSS)                2,247.4     33.8       --    26,781      --      ----   19,523
* For gross assets, net worth, and income, this is the overall median
COMPARABILITY WITH OTHER SURVEYS
The 1983 Survey of Consumer Finances is the most recent survey in a series of surveys of
household finances conducted by the Survey Research Center of the University of Michigan. Surveys
of Consumer Finances were conducted annually from 1946 to 1970 but were then discontinued.[20] In
1977, the Survey Research Center conducted a comprehensive household finance survey again, under
the sponsorship of the Federal Reserve Board and other federal bank regulatory agencies.[21] The
same basic methods were used in all these surveys, although changes in sampling and interviewing
procedures were introduced from time to time to improve survey results. The content of the
surveys also changed over time because of shifts in interest in various aspects of consumer
finances. The early surveys, which were sponsored by the Federal Reserve Board, were concerned
with the effect of the postwar accumulation of liquid assets on consumer spending. Mortgage and
consumer credit received greater attention in later surveys, and a major part of the 1977 survey
was concerned with the effect of federal regulations on consumer credit use. Nevertheless,
several areas of inquiry were followed through much of the 1946-77 period, and results from the
earlier surveys are generally comparable to those from the area probability sample for the 1983
SCF.
The Federal Reserve Board also sponsored the Survey of Financial Characteristics of Consumers
(SFCC) in 1963, with a follow-up reinterview in 1964.[22] Methodological work for this survey was
conducted by the SRC, and interviewing was performed by the Bureau of the Census. Like the 1983
SCF, the 1963 SFCC collected a more detailed inventory of assets and liabilities than is customary
in other consumer surveys. The 1963 survey also used Federal tax information to oversample high-
income households. For the 1963 survey, a sample of housing units stratified by income reported
in the 1960 Decennial Census was selected to represent households with incomes below $50,000.
Households with incomes of $50,000 or more were selected from a sample of 1960 Federal income tax
returns. Although this sample selection procedure is not exactly the same as that used for the
1983 survey, it produced a heavy over-sampling of households in the upper end of the income
distribution, making the 1963 sample the only household survey sample that is comparable to the
full sample from the 1983 SCF.
The Federal Reserve has also sponsored several recent surveys on the use of different means
of payment for household purchases. The Survey of Currency and Transaction Account Usage,
conducted by the SRC in the summer of 1984, solicited detailed information on the use of checking,
savings, and money market accounts for about 2,000 households. Data were also collected on the
use of currency, credit cards, money orders, and electronic banking services. In 1986, the survey
was repeated for a smaller sample.[23]
Wealth information is also available from other sources. Data from the Internal Revenue
Service for federal estate tax returns have been used to estimate total household wealth and its
percentage distribution. Unfortunately, data from this source are available only in aggregate
form, with very limited demographic breakdowns. Another source is the 1979 Income Survey
Development Program of the Department of Health and Human Services, which provides information for
a sample of households larger than that of most other surveys of wealth. The New York Stock
Exchange has also periodically conducted surveys of household stockholders, doing a survey at a
time comparable to the SCF in 1983, and more recently, in 1985. Wealth data were also collected
for respondents to the Panel Study of Income Dynamics (PSID) in 1984.
The most comprehensive recent survey of household wealth was conducted in 1984 (and repeated
in 1985) by the Bureau of the Census on participants in the Survey of Income and Program
Participation (SIPP). This survey solicited information similar to the SCF for a very large
sample of households. Its initial panel was a random cross-section of about 21,000 households
selected by procedures similar to those used to select the area probability sample for the 1983
SCF. Net worth information was collected between September and December 1984.[24] Aggregate wealth
estimates from the earlier Surveys of Consumer Finances and SIPP are generally comparable to those
from the area probability sample of the 1983 SCF in their understatement of aggregate wealth
relative to estimates from independent sources. Using comparably defined categories, we estimate an
aggregate net worth for the SCF area probability sample of $8,277 billion versus a $7,740 billion
total for the SIPP sample. The difference derives primarily from a smaller estimate of small
business assets in the SIPP. The full-sample SCF estimate of the same net wealth concept is
$9,610 billion. Thus, it appears that the major difference between the two surveys arises from
the inclusion of the high-income sample in the SCF.
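The arithmetic behind this conclusion can be checked directly from the three aggregates quoted above. The short Python sketch below uses only those figures (in billions of dollars); the variable names are ours and are purely illustrative.

```python
# Aggregate net worth estimates quoted in the text, in billions of dollars.
scf_area_sample = 8_277   # SCF area probability sample
sipp_sample = 7_740       # SIPP sample
scf_full_sample = 9_610   # SCF full sample (area probability plus high-income)

# Gap between the two comparably selected samples.
area_vs_sipp = scf_area_sample - sipp_sample
# Amount added to the SCF estimate by including the high-income sample.
high_income_effect = scf_full_sample - scf_area_sample
# Total gap between the full-sample SCF estimate and SIPP.
full_vs_sipp = scf_full_sample - sipp_sample
# Share of the total gap attributable to the high-income sample, in percent.
high_income_share = round(100 * high_income_effect / full_vs_sipp, 1)

print(area_vs_sipp, high_income_effect, full_vs_sipp, high_income_share)
```

The area probability sample exceeds SIPP by $537 billion, while the high-income sample adds a further $1,333 billion, or roughly 71 percent of the $1,870 billion gap between the full-sample SCF estimate and SIPP.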
The annual March Current Population Survey is perhaps the most comprehensive U.S. household
economic survey, soliciting economic information from approximately 59,000 households (U.S. Bureau
of the Census (1984)). The representativeness of the SCF is demonstrated by a comparison of the
sample distributions of various demographic variables for the SCF and the comparable March 1983
CPS in table 8. The CPS data are given for "primary families" defined comparably to families
in the SCF. As can be seen, the SCF has a very similar distribution for most items.
A Comparison of the 1983 SCF and CPS
                                     SCF                 CPS
                                Number  Weighted    Number  Weighted
                              of cases share (%)  of cases share (%)
Age (head)
  34 or less
    married                        661      16.9      9922      16.4
    unmarried male                 223       6.2      3435       5.8
    unmarried female               273       7.6      4105       7.2
  35 to 44
    married                        555      13.6      7830      13.0
    unmarried male                  71       1.8      1388       2.4
    unmarried female               151       4.1      2165       3.7
  45 to 54
    married                        492      10.5      6253      10.3
    unmarried male                  70       1.7       912       1.6
    unmarried female               118       3.2      1735       3.8
  55 to 64
    married                        475       9.7      5967      10.3
    unmarried male                  62       1.5       897       1.5
    unmarried female               136       3.7      2130       3.8
  65 or more
    married                        452       9.9      5532       9.5
    unmarried male                  74       1.8      1437       2.4
    unmarried female               290       7.7      5293       9.2
Race
  Caucasian                       3468      82.3     47515      82.4
  Nonwhite or Hispanic             635      17.7     11486      17.6
Family Income
  less than $10,000                912      24.0     15053      25.2
  $10,000 to $19,999               982      26.8     15580      26.0
  $20,000 to $29,999               711      19.3     12072      20.5
  $30,000 to $49,999               717      19.7     11533      19.8
  $50,000 to $99,999               309       8.2      4480       7.9
  $100,000 or more                 472       2.0       283       0.5
Homeownership                     2766      63.4     38320      64.9
Education of the Head
  0 to 8 grades                    560      14.5      9155      14.8
  9 to 12 grades                  1713      44.9     27269      46.4
  some college                     678      17.7     10355      17.6
  college graduate                1152      22.9     12222      21.3
Labor Force Participation
  single, not working              635      17.0     11130      19.2
  single, working                  833      22.4     12367      21.3
  married, neither working         389       9.8      7088      12.0
  married, one working            1077      23.0     14023      23.4
  married, both working           1169      27.8     14393      24.0
Totals                            4103     100.0     59001     100.0
Comparisons of Income Measured in SCF, CPS, and IRS Data
                                 1982 INCOME  1982 INCOME  1982 INCOME  1982 INCOME
                                    1983 SCF          CPS     1983 SCF     IRS DATA
                                                           TAXABLE INC
                                       ($ B)        ($ B)        ($ B)        ($ B)
Salaries and wages                   1,393.7      1,443.5      1,385.7      1,564.6
Business or farm income                291.3        110.5        290.4         53.7
Taxable interest income                 98.5         95.1         95.9        157.2
Dividend income                         ----         ----         46.7         54.2
Net gains from stocks                   ----         ----         50.4         24.3
Rental or trust income                  ----         ----         54.8         -2.1
Dividends/trust/rental total           102.9         47.3         ----         ----
Welfare or public assistance            23.2         17.4         ----         ----
Unemployment or workman's comp          20.6         32.8         ----         ----
Alimony or child support                35.6         21.4         ----         ----
Retirement income                      194.6        204.3         94.0         59.9
Category totals                      2,160.4      1,972.3      2,017.9      1,911.8
PENSION PROVIDER SURVEY
1986 SURVEY OF CONSUMER FINANCES
MANUAL INSTRUCTIONS
In the remainder of the manual, information is given on all the variables included in the
final data set. A brief description is given for each variable along with information on
imputation and a listing of the values that the variable takes on. The question number
corresponding to the actual survey questionnaire (e.g. R9) is also given for all variables except
recodes. Variables are listed by number, with the numeric code used as the basis of the
variable's internal name in the survey's SAS data set. All variables listed here have a "B"
prefix followed by a four digit number ranging from 3001 to 5749. The original uncleaned survey
responses are contained in variables with a "V" prefix. These variables range from V1 to V2613.
The codes for the "V" variables are described in the original survey codebook released by the SRC.
We note, though, that the "B" variables contain all the same types of information as those
contained in the "V" variables. Thus, for most analyses, it is possible to use the "B" variables
without reference to the "V" variables.
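As an illustration of the naming convention just described, the following Python sketch sorts variable names into cleaned ("B") and original ("V") groups. The function name `classify_variable` is hypothetical and is not part of the data set. The sketch checks only that a name falls within the stated numeric range; it does not confirm that a particular number is actually defined in the data set.

```python
import re

# "B" or "V" followed by digits, e.g. B3122 or V1.
_NAME = re.compile(r"([BV])(\d+)")

def classify_variable(name: str) -> str:
    """Return "cleaned", "original", or "other" for a variable name."""
    match = _NAME.fullmatch(name.strip().upper())
    if match is None:
        return "other"
    prefix, digits = match.group(1), match.group(2)
    number = int(digits)
    # Cleaned variables: "B" plus a four-digit number from 3001 to 5749.
    if prefix == "B" and len(digits) == 4 and 3001 <= number <= 5749:
        return "cleaned"
    # Original uncleaned responses: V1 through V2613.
    if prefix == "V" and 1 <= number <= 2613:
        return "original"
    return "other"

names = ["B3122", "V1", "V2613", "B3000", "V2614", "ID"]
print({name: classify_variable(name) for name in names})
```

A filter of this kind is a convenient first step when an analysis is meant to rely on the "B" variables alone.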
The range of allowable values of the variables is also given. The symbol xxxx is used for
continuous variables with a statement of the units used and the sample range. For discrete
variables with a small number of allowable codes, all possible values and their meanings are
listed. The number of sample cases (out of the 4103 "cleaned" observations) taking on each value
of discrete variables is also given. If the listing is for several variables (such as the 1st,
2nd, and 3rd automobiles), then the case totals are given for the listed variables, in order,
separated by slashes (e.g. 123/45/87 cases). Although useful in giving a flavor of the
distribution of responses to questions, the case listings should not be used for statistical
purposes, as they are unweighted distributions.
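The SCF/CPS comparison table earlier in this manual shows how far unweighted case counts can stray from weighted shares. The Python sketch below takes three rows from that table (number of cases and weighted share) and computes the unweighted share of the 4103 cleaned observations for each; the names used are illustrative only.

```python
TOTAL_CASES = 4103  # cleaned observations in the 1983 SCF

# (number of cases, weighted share in percent) from the SCF columns of the table.
rows = {
    "$100,000 or more": (472, 2.0),
    "college graduate": (1152, 22.9),
    "Homeownership": (2766, 63.4),
}

def unweighted_share(cases: int, total: int = TOTAL_CASES) -> float:
    """Percentage of sample cases, ignoring the sampling weights."""
    return round(100.0 * cases / total, 1)

for label, (cases, weighted) in rows.items():
    print(f"{label}: unweighted {unweighted_share(cases)}%, weighted {weighted}%")
```

Families with incomes of $100,000 or more make up about 11.5 percent of sample cases but only 2.0 percent of the weighted population, a gap that reflects the oversampling of high-income households.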
Most of the information collected for the 1983 SCF applies to the full family unit. Some
information, however, such as employment, education, health, and pension income, was collected
individually for the survey respondent and their spouse (if they had one). For married couples,
the respondent could have been either the husband or the wife. For ease of use in analysis, most
of the person-specific variables in the cleaned data set have been arranged as "head" and "spouse"
(where the head is always the husband for married couples) instead of "respondent" and "spouse." It
is easy to switch data back, if desired, by using the variable B3122, which indicates whether the
respondent was the head or the spouse.
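A minimal sketch of such a switch is given below in Python. It assumes the analyst has already translated B3122 into a true/false flag; the actual code values of B3122 are given in that variable's listing and are not assumed here. The function name and the dictionary layout are hypothetical.

```python
def to_respondent_order(head, spouse, respondent_is_head):
    """Rearrange (head, spouse) data as (respondent, respondent's spouse).

    respondent_is_head is a flag the analyst derives from B3122; its
    coding is NOT assumed here and must be taken from the variable listing.
    """
    if respondent_is_head:
        return head, spouse
    return spouse, head

# Illustrative records only; the field name "age" is hypothetical.
head_data = {"age": 50}
spouse_data = {"age": 48}

respondent, respondent_spouse = to_respondent_order(
    head_data, spouse_data, respondent_is_head=False
)
print(respondent, respondent_spouse)
```

For families without a spouse, the spouse argument can be passed as `None` with the flag set to true, since the respondent is then necessarily the head.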
Several different codes are used in the data set, including:
ACKNOWLEDGEMENTS
Many people contributed to the 1983 Survey of Consumer Finances. Sampling, field work,
editing, and coding were conducted by the Survey Research Center of the University of Michigan
under the direction of Richard T. Curtin. Mr. Curtin also oversaw the cleaning and processing of
the Pension Provider Survey data. Other SRC staff also made substantial contributions. Mary
Grace Moore and Lisa Poole coordinated much of the questionnaire development and data processing
and editing procedures. Steve G. Heeringa supervised the sample design and drawing of the SRC
sample. Field work was supervised by Nancy Gebler. Coding and editing staff were supervised by
Joan Scheffler. Much of the field work and the development of the coding instrument for the
Pension Provider Survey were done by Mathematica Policy Research. Miles Maxfield and Tim Carr
supervised these activities.
Thomas A. Gustafson, of the Department of Health and Human Services, co-authored the
questionnaire, taking primary responsibility for the pension questions and overseeing the
development of the Pension Provider Survey. Arthur Kennickell, of the Federal Reserve Board,
developed the Federal Reserve Board weights, helped in writing this manual, and played a critical
role in the later stages of data cleaning and imputation. He also co-authored the 1986 Survey of
Consumer Finances.
Many individuals helped in the development of the survey instrument. Particularly noteworthy
contributions were made by Glenn B. Canner and James T. Fergus (Federal Reserve Board); Janet
Gordon, Melanie Quinn, and Peter Struck (Office of the Comptroller of the Currency); Daniel J.
Villegas (Federal Trade Commission); Walter Kolodrubetz (Department of Labor); and Nelson McClung
(Office of Tax Analysis). Comments and helpful suggestions were also received from Emily S.
Andrews, Stuart B. Avery, Marshall E. Blume, Thomas A. Durkin, Robert M. Fisher, Gary Gilbert,
Arnold A. Heggestad, Malcolm Jensen, F. Thomas Juster, Robert W. Johnson, Myron Kwast, Barbara R.
Lowrey, Charles A. Luckett, Olivia S. Mitchell, Dorothy S. Projector, Lawrence H. Summers, Cameron
Whiteman, and John D. Wolken. Tom Petska, Fritz Scheuren, and Dan Skelly, of the Statistics of
Income Division of the Internal Revenue Service, provided the high-income sample and weights.
Research assistance for data cleaning and imputation at the Federal Reserve Board was provided by
Aliki Antonatos, Oscar Barnhardt, Phoebe Roaf, Julie Rochlin and Julia Springer. Additional
assistance was provided by Neil Briskman, William Carbaugh, M. Elizabeth Crowell, Charlotte
Jackson, Scott Hedges, Pat Ma, Elaine Peterson, Missi Reinkemeyer, Bob Schmitt, Paul Hughes-
Cromwick, and Sharon Ward.
ENDNOTES
1. The interview questionnaire for the household survey was
prepared by Robert B. Avery and Gregory E. Elliehausen, of the Federal
Reserve Board, and Thomas A. Gustafson, of the Department of Health
and Human Services, with assistance from staffs of the sponsoring
agencies. Field work and the editing and coding of survey responses
were performed under the direction of Richard T. Curtin of the SRC.
Mr. Curtin and Timothy Carr and Miles Maxfield, of Mathematica Policy
Research, administered the Pension Provider Survey. The Statistics of
Income Division of the Internal Revenue Service provided the
high-income sample.
2. A household consists of all the persons who occupy a housing
unit or dwelling. Persons missed by the survey will be
disproportionately young, because they are in college or the military,
and old, because they are in nursing homes. The latter omission,
affecting an estimated 1.4 million people, is the most serious in
terms of wealth measurement. However, the failure to include a large
number of younger persons is likely to affect the long-run
representativeness of the SCF when used as a panel.
3. Non-SMSA counties with less than 2,000 population were linked
with adjacent counties to form multi-county PSUs. The SCF sample used
the 1970 SRC sampling frame which was selected from a national
population of 2,700 PSUs, of which 12 were self-representing. In
addition to SMSA status, the 62 nonself-representing strata were
designed to take into account the location (Census Region), the
population, size of the largest city, and percent manufacturing
(urban) or agricultural (rural) employment of each area. In the South
Region, the percent black population and a special domain distinction
labeled "the Deep South" (South Carolina, Georgia, Alabama,
Mississippi, and Louisiana) were also used. By design, the
nonself-representing strata are of approximately equal size, each
totaling between 1.7 and 2.5 million population in 1970.
4. Because New York City is so large, it is treated as though it is
made up of two PSUs. Thus, we treat the sample as having 75 PSUs, not
74.
5. The sampling of the second stage units (SSU's) was performed
with probability proportionate to size as measured by the 1970 Census
count of year-round housing units. In most areas, the largest or
"central" city of each sample PSU was included with certainty. The
second stage was dropped from the 1980 SRC National Sample design.
6. The units selected in the third stage are termed "chunks." In
urbanized areas, chunks are defined to be the housing units within the
land area given by a Census Block. However, Blocks with fewer than 16
year-round housing units were combined with adjacent Blocks to meet a
minimum size of 16 units. In rural areas, chunks were defined to be
compact parcels of land with clearly recognizable physical boundaries
(roads, rivers, rail lines, etc.) selected with an expected count of
24 year-round housing units. Within SSU's, the sample of chunks was
selected with probability proportionate to their number of year-round
housing units.
7. Once the third stage of selection was complete, SRC personnel
performed a complete listing of all housing units within the physical
boundaries of each chunk. For the 1983 SCF, about three-fourths of the
chunks had been previously listed for other surveys; thus, only
updating of the listings was necessary. These lists formed the basis
of selection for the fourth stage of sampling.
8. Housing units were selected randomly within each chunk. The
sampling rate was set inversely proportional to the number of year-
round housing units within the chunk as determined by the listings.
9. These procedures followed fairly standard SRC methods. For a
more detailed description of these methods, see Irene Hess, Sampling
for Social Research Surveys: 1947-1980, Ann Arbor: Institute for
Social Research, 1985.
10. Unfortunately, because of legal restrictions, knowledge of the
exact sampling procedures is restricted to employees of the SOI. The
sample drawn appears to roughly coincide with individuals having an
"adjusted gross income," modified for full capital gains and other
exclusions, of $100,000 or more in 1980.
11. Actually only the 12 self-representing and 31 of the 62 nonself-
representing PSUs were used for the high-income sample list. The
decision to exclude high-income households in the remaining nonself-
representing PSUs was based on a joint consideration of survey costs
and the relatively small expected size of the high-income sample.
Because the SOI listings were by address, and the area probability
PSUs were defined by county, some slight approximations were used in
defining the SOI sample. The actual SOI sample was defined by the ZIP
codes corresponding to the SRC sample counties, with the county
location of the main post office in a ZIP code used when county and
ZIP code boundaries did not correspond exactly.
12. The overall response rate of the high-income mailing (9 percent)
may not be quite as bad as it appears. SOI typically has response
rates of no more than 20 to 30 percent even for mailings extremely
favorable to the respondent. The low 1983 SCF response rate was also
caused by the failure to send a followup letter.
13. These procedures differ slightly from the procedures normally
used by the SRC for the selection of the household respondent.
Generally, only the economic dominance and age closest to 45 criteria
are used.
14. Rates for income non-response were much higher. 1.8 percent of
the high-income sample gave no income data, and an additional 4.6
percent gave only partial data. Comparable figures for the edited
area probability sample were 5.5 percent and 7.1 percent. Only 2
percent of the discarded area probability sample respondents gave any
income data, and these respondents gave only partial data.
15. Means were computed separately for the high-income and cross-
section sample on an item-by-item basis and were based only on
respondents who gave dollar values.
16. One household was discarded which did not meet these criteria
because it reported more than a billion dollars in assets and appeared
to be an insincere interview.
17. See Steven G. Heeringa and Richard T. Curtin, "Household Income
and Wealth: Sample Design and Estimation for the 1983 Survey of
Consumer Finances," Statistics of Income and Related Record Research
1986-1987, Internal Revenue Service, 1987.
18. See Michael Strudler, General Descriptive Booklet for the 1982
Individual Tax Model File, Statistics of Income Division, Internal
Revenue Service, 1983.
19. An estimate of the standard error due to sampling of the
estimated aggregate of each asset and liability category is given in
column 2. These figures were computed by calculating the sample
variance of each item within each sampling unit (75 area probability
PSUs and nine high-income categories). Assuming independence of
sample draws across each of these cells, the variance of an asset or
debt category total was then calculated as the sum of the variances of
each item included in that category weighted by the cell populations.
Because these estimates take the sampling weights as fixed they are
likely to understate the true sampling variance of the weighted sums.
20. See, for example, George Katona, Louis Mandell, and Jay
Schmiedeskamp, 1970 Survey of Consumer Finances, Ann Arbor: Institute
for Social Research, 1971.
21. See Thomas A. Durkin and Gregory E. Elliehausen, 1977 Consumer
Credit Survey, Washington D.C.: Board of Governors of the Federal
Reserve System, 1978.
22. See Dorothy S. Projector and Gertrude S. Weiss, Survey of
Financial Characteristics of Consumers, Washington D.C.: Board of
Governors of the Federal Reserve System, 1966.
23. See Robert B. Avery, Gregory E. Elliehausen, Arthur B.
Kennickell, and Paul A. Spindt, "The Use of Cash and Transaction
Accounts by American Families," Federal Reserve Bulletin 72 (February
1986): pp. 87-108; and Robert B. Avery, Gregory E. Elliehausen, Arthur
B. Kennickell, and Paul A. Spindt, "Changes in the Use of Transaction
Accounts and Cash from 1984 to 1986," Federal Reserve Bulletin 73
(March 1987): pp. 179-196.
24. Detailed discussion of the survey findings can be found in
"Household Wealth and Asset Ownership: 1984," Household Economic
Studies Series P-70, No. 7 (July 1986), Bureau of the Census; and
John M. McNeil and Enrique J. Lamas, "Year-Apart Estimates of
Household Net Worth from the Survey of Income and Program
Participation," NBER Conference on Research in Income and Wealth,
Baltimore, 1987. Richard T. Curtin, F. Thomas Juster, and James N.
Morgan, "Survey Estimates of Wealth: An Assessment of Quality," NBER
Conference on Research in Income and Wealth, Baltimore, 1987, provide
a detailed comparison of the PSID, SIPP, and 1983 SCF wealth data.
25. The 1986 SCF was co-sponsored by the Federal Reserve Board, the
Department of Health and Human Services, The Office of the Comptroller
of the Currency, and the General Accounting Office.
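The standard error calculation described in endnote 19 can be illustrated with a short Python sketch. The endnote does not state the exact formula, so the sketch assumes the textbook estimator for a stratified total with independent cells and no finite population correction: the sum over cells of (cell population squared) times (within-cell sample variance) divided by (cell sample size). The function name and the toy figures are hypothetical and are not drawn from the survey.

```python
from math import sqrt
from statistics import variance

def total_variance(cells):
    """Variance of an estimated total, assuming independent sampling cells.

    cells is a list of (cell_population, sample_values) pairs. The formula
    is an assumed textbook form, not necessarily the one used for the SCF.
    """
    result = 0.0
    for population, values in cells:
        n = len(values)
        if n < 2:
            raise ValueError("each cell needs at least two observations")
        # statistics.variance is the sample variance (divisor n - 1).
        result += population ** 2 * variance(values) / n
    return result

# Toy data: two cells with made-up populations and reported values.
toy_cells = [
    (100, [1.0, 3.0]),
    (50, [2.0, 2.0, 2.0]),
]
toy_variance = total_variance(toy_cells)
print(toy_variance, sqrt(toy_variance))
```

As the endnote cautions, a calculation of this kind treats the sampling weights as fixed and so is likely to understate the true sampling variance.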