1983 SURVEY OF CONSUMER FINANCES: TECHNICAL MANUAL AND CODEBOOK

Robert B. Avery and Gregory E. Elliehausen
Board of Governors of the Federal Reserve System
April 1985
Last Revision: February 15, 1990

SURVEY DESCRIPTION AND OVERVIEW

INTRODUCTION

There have been many changes in financial markets during the last decade. Inflation and interest rates increased sharply in the late 1970s and then fell after recessions in 1980 and 1981-82. Major financial innovations occurred, such as the introduction of money market funds, and the regulation of financial markets altered dramatically after enactment of the Depository Institutions Deregulation and Monetary Control Act of 1980. To assess the effects of these changes on the financial position of households, the Board of Governors of the Federal Reserve System, the U.S. Department of Health and Human Services, the Federal Deposit Insurance Corporation, the Office of the Comptroller of the Currency, the Federal Trade Commission, the U.S. Department of Labor, and the U.S. Treasury, Office of Tax Analysis joined together to sponsor the 1983 Survey of Consumer Finances (SCF).

The 1983 SCF collected data on the assets and liabilities of a nationally representative sample of U.S. households. The survey was designed to provide detailed information on the characteristics of different types of assets or liabilities and on the institutional sources of these balance sheet items. Data on reasons for various financial choices and attitudes toward financial risk, liquidity, and credit use were also collected to help explain household behavior. In addition, the 1983 survey provides information that permits estimation of pension and Social Security wealth. Such information is not generally available from other data sources.
Interviewing for the 1983 Survey of Consumer Finances was conducted by the Survey Research Center (SRC) of the University of Michigan between February and August of 1983.[1] The survey sample consists of 3,824 randomly selected U.S. households and a supplemental sample of 438 high-income households drawn from federal income tax files. The supplemental high-income sample provides better representation of the upper tail of the wealth distribution than most other surveys do.

In conjunction with the household survey, a survey was conducted with the pension providers of those households reporting pension coverage. The Pension Provider Survey collected the formulas for computing pension benefits for 75 percent of applicable households with pensions (1,421 households). In the summer of 1986, 2,822 of the 1983 SCF respondents were located and given a limited telephone reinterview, offering the opportunity to study household saving over the intervening three years.

This manual describes the set of "recoded, edited, and imputed" variables developed at the Federal Reserve Board. It documents the procedures used for editing the raw survey responses, the statistical methods used for imputing missing data, the construction of new variables from the original variables, and the addition of new variables created by matching information from other data sources. The manual also presents technical material on the survey's design and weights. It should be used in conjunction with an earlier SRC release (dated 7/11/84), which describes the variables representing the raw responses of the survey respondents. The interviewers' instruction manual, also released by SRC, provides useful information as well.

For a summary of some of the basic results of the survey, the reader is referred to the following publications:

Robert B. Avery, Gregory E. Elliehausen, Glenn B. Canner, and Thomas A. Gustafson, "1983 Survey of Consumer Finances," Federal Reserve Bulletin, 70 (September 1984): pp. 679-92.

Robert B. Avery, Gregory E. Elliehausen, Glenn B. Canner, and Thomas A. Gustafson, "1983 Survey of Consumer Finances: A Second Report," Federal Reserve Bulletin, 70 (December 1984): pp. 857-68.

Robert B. Avery and Gregory E. Elliehausen, "Financial Characteristics of High-Income Families," Federal Reserve Bulletin, 72 (March 1986): pp. 163-77.

Robert B. Avery, Gregory E. Elliehausen, and Arthur B. Kennickell, "Measuring Wealth with Survey Data: An Evaluation of the 1983 Survey of Consumer Finances," Review of Income and Wealth, March 1989.

The data from the 1983 SCF, including the high-income sample and the Pension Provider Survey, are available on request from the National Technical Information Service, 5283 Port Royal Road, Springfield, Virginia 22161 (telephone 703-487-4600).

DESIGN AND METHODS

The 1983 SCF sample consists of a randomly selected, nationally representative, area probability sample of all U.S. households (cross-section sample) and a supplemental sample of high-income households drawn from federal income tax files (high-income sample). The area probability sample used the 1970 SRC National Sample frame. The sample was designed to represent housing units in the coterminous United States (the 48 contiguous states), exclusive of those on military reservations and of nursing and rest homes, college dormitories, jails, hotels, missions, convents, monasteries, and other institutional quarters.[2]

Housing units were selected by a multi-stage procedure that samples successively smaller geographic areas. Probability selection was enforced at all stages of sampling. The first stage of the sample selection procedure was the one used to draw the 1970 SRC National Sample. The 1970 frame was used because full data from the 1980 Decennial Census were not available at the time of the survey. The frame was adjusted, however, to represent 1980 population values.
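Because probability selection was enforced at every stage, a housing unit's overall chance of entering the sample is the product of its conditional selection probabilities at each stage, and the reciprocal of that product is its design (base) weight. The minimal sketch below illustrates only this general rule; the function names and the stage probabilities are hypothetical and are not the SCF's actual values.

```python
from math import prod


def overall_inclusion_probability(stage_probabilities):
    """Probability that a housing unit enters the sample under a multi-stage design.

    Each element is the conditional probability of selection at one stage,
    given selection at the stages before it; the overall probability is
    their product.
    """
    for p in stage_probabilities:
        if not 0 < p <= 1:
            raise ValueError("each stage probability must lie in (0, 1]")
    return prod(stage_probabilities)


def base_weight(stage_probabilities):
    """Design weight: the number of population units a sampled unit represents."""
    return 1.0 / overall_inclusion_probability(stage_probabilities)


# Hypothetical stage probabilities (PSU, second-stage area, block, housing unit);
# illustrative values only, not the SCF's actual selection probabilities.
stages = [0.25, 0.125, 0.0625, 0.5]
print(overall_inclusion_probability(stages))  # 0.0009765625
print(base_weight(stages))                    # 1024.0
```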
The 1970 SRC National Sample was drawn by assigning all U.S. metropolitan areas (SMSAs) and non-SMSA counties to 74 relatively homogeneous strata and selecting one SMSA or county (primary sampling unit, or PSU) per stratum.[3] Twelve of these strata, representing New York City and the other 11 largest SMSAs, contained only one PSU and thus were selected with certainty.[4] From each of the remaining 62 strata, which contained between 2 and 200 PSUs, one primary area was selected with probability proportionate to population. Thirty-two of these strata were SMSAs, and 30 were made up of non-SMSA counties or county groups. The sample was stratified by region of the country, so that each of the four major geographic regions (Northeast, North Central, South, and West) received representation in proportion to its population. In order to reduce variances below those that would be obtained with ordinary stratification, PSUs in each region were selected using a controlled selection method. This method controlled the distribution of sample PSUs by state and degree of urbanization. It yields a more geographically balanced sample and increases the precision of sample estimates relative to a more conventional random design. The PSUs selected for the 1983 SCF were drawn from 37 states and the District of Columbia, as shown in figure 1.
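Selecting one PSU per stratum with probability proportionate to population, with one-PSU strata taken with certainty, can be sketched as below. This is a simplified illustration under stated assumptions: each stratum is drawn independently, so the controlled selection method that balanced PSUs by state and degree of urbanization is not reproduced, and the function name, strata, and populations are hypothetical.

```python
import random


def select_one_pps(stratum, rng):
    """Select one PSU from a stratum with probability proportionate to population.

    stratum: dict mapping a PSU name to its population.
    Returns (psu_name, selection_probability). A stratum containing a single
    PSU is a certainty selection (probability 1.0).
    """
    names = list(stratum)
    sizes = [stratum[name] for name in names]
    total = sum(sizes)
    chosen = rng.choices(names, weights=sizes, k=1)[0]
    return chosen, stratum[chosen] / total


# Hypothetical strata; names and populations are invented for illustration.
strata = {
    "certainty stratum": {"Large SMSA": 9_000_000},
    "SMSA stratum": {"SMSA A": 300_000, "SMSA B": 100_000},
    "non-SMSA stratum": {"County group X": 40_000, "County group Y": 60_000},
}
rng = random.Random(1983)
for label, stratum in strata.items():
    psu, prob = select_one_pps(stratum, rng)
    print(f"{label}: {psu} (selection probability {prob:.2f})")
```

The selection probability returned for each PSU is the first-stage factor that would enter the product of stage probabilities for every housing unit in that PSU.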
Figure 1. The SCF Sample (see appendix)

PSUs were divided into successively smaller sampling units in subsequent stages: cities, towns, census tracts, minor civil divisions, and rural areas (second stage);[5] urban blocks and rural areas containing 16 to 40 housing units (third stage);[6,7] and individual housing units (fourth stage).[8] All of the housing units selected in the fourth stage were included in the sample.[9] The area probability sample design was targeted to reach 6,057 housing units, which, assuming an occupancy rate of .93, would yield 5,633 sample households and, with a response rate of .71, 4,000 completed interviews.

The high-income sample was drawn from a large sample of 1980 federal income tax returns created by the Statistics of Income Division (SOI) of the Internal Revenue Service (IRS). The procedures for obtaining the high-income sample were designed to shield the identity of survey respondents from the government and to preserve the confidentiality of tax returns. Using multi-faceted sampling criteria, the IRS selected about 5,000 returns of high-income taxpayers that were estimated to have large amounts of wealth.[10] This sample was drawn from the same primary areas that were selected for the area probability sample.[11] The Comptroller of the Currency sent letters to these 5,000 individuals requesting participation in the survey. Four hundred fifty-nine households notified the Comptroller of the Currency that they were willing to participate. This information was forwarded to SRC, which included these households in the survey sample.[12] Under these procedures the IRS never knew the names of the final respondents. The SRC did not know the names of the high-income individuals who were not willing to participate in the survey, nor did it have access to tax data for survey participants.
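The design arithmetic for the area probability sample given earlier in this section (6,057 housing units, an assumed occupancy rate of .93, and an assumed response rate of .71) can be checked with a few lines of Python. This is a minimal illustrative sketch; the function name is hypothetical. It also compares the planned interviews per housing unit with the realized figure implied by Table 1 (3,824 completed interviews from 6,062 original sample units).

```python
def expected_yield(housing_units, occupancy_rate, response_rate):
    """Expected occupied households and completed interviews from a housing-unit sample."""
    households = housing_units * occupancy_rate
    interviews = households * response_rate
    return households, interviews


households, interviews = expected_yield(6057, 0.93, 0.71)
print(round(households), round(interviews))  # 5633 3999 (about the 4,000 targeted)

# Completed interviews per sampled housing unit: planned versus realized.
print(f"planned {0.93 * 0.71:.3f} vs realized {3824 / 6062:.3f}")  # planned 0.660 vs realized 0.631
```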
The Office of the Comptroller of the Currency, which can legally receive such information from the IRS as a fellow member of the Department of the Treasury, knew only that the 5,000 individuals had relatively high incomes; it did not have access to any tax information for these individuals.

The same questionnaire was used to interview respondents in both the area probability and high-income samples, and field interviewers were not told which households were part of the high-income sample. The interview was conducted in person. Table 1 shows the response rates, final sample sizes, and average interview length for both the area probability and high-income samples. The area probability sample is somewhat smaller than targeted (4,000). The shortfall is due primarily to a higher-than-expected rate of vacant and seasonal residences in the selected sample.

Table 1
Response Rates
_______________________________________________________________________________
                                Total          Area Probability    High-income
                                Households     Sample              Sample
_______________________________________________________________________________
Original sample units              --             6062                5000
Sample households                5855             5396                 459
Complete interviews              4262             3824                 438
Refusals*                        1224             1209                  15
Non-interviews**                  369              362                   6
Response rate                     .73              .71                 .95
Interview length (minutes)         75               74                  87
_______________________________________________________________________________
* Contact with household complete, and eligible respondent refused.
** Interview not obtained for reasons other than respondent refusal (for example, household members on vacation or respondent health problems precluding an interview).

For both samples, selection of the survey respondent within a household unit (or dwelling) followed a set of predetermined rules. Survey interviewers first tried to identify the individual within the household who was "most knowledgeable" about the household's financial affairs. If this could not be determined, interviewers tried to identify the household's economically "dominant" or "mainstream" individual, i.e.,
the individual who owned the house or had significantly more income. This individual was selected as the survey respondent. If no such individual could be identified, as in the case of unrelated roommates sharing living quarters, the household resident closest to age 45 was selected as respondent. If the selected respondent refused to be interviewed, survey procedures allowed the respondent's spouse, but no other household member, to be substituted.[13]

Once the survey respondent was identified, the survey sampling unit was set as the family of that individual, termed the "primary family" of the household. A family is defined as all individuals residing together in the same household who are related to each other by blood, marriage, or adoption. The primary family could be either a one-person unit (if the respondent did not live with relatives) or a unit of two or more individuals. Individuals having "partners" or other common-law relationships were treated as married.

The Census Bureau definition of a household is the same as that of the SRC. The Census Bureau, however, distinguishes between families of two or more persons and single-person units. The Census defines a family as two or more individuals residing together in the same housing unit who are related to each other by blood, marriage, or adoption. Unrelated individuals are defined as individuals living by themselves or with unrelated persons. For each household, the householder is defined as the owner, renter, or economically dominant individual. Any families that do not contain the householder are defined as unrelated subfamilies. Using these definitions, the SRC primary family was virtually always either the householder alone (if an unrelated individual) or the family of the householder.
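The respondent-selection rules described above (most knowledgeable member first, then the economically dominant member, then the resident closest to age 45) can be expressed as a short decision procedure. The sketch below is a hypothetical illustration, not SRC's field procedure: the `Member` fields stand in for judgments interviewers made in the field, a tie at any rule falls through to the next rule, a tie on age is broken by list order (an assumption), and spouse substitution after a refusal is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One household member as seen by the interviewer (hypothetical fields)."""
    name: str
    age: int
    most_knowledgeable: bool = False     # most knowledgeable about household finances
    economically_dominant: bool = False  # e.g., owns the house or has much more income


def select_respondent(members):
    """Apply the selection rules in order; a tie at one rule falls to the next."""
    knowledgeable = [m for m in members if m.most_knowledgeable]
    if len(knowledgeable) == 1:
        return knowledgeable[0]
    dominant = [m for m in members if m.economically_dominant]
    if len(dominant) == 1:
        return dominant[0]
    # No single knowledgeable or dominant member (e.g., unrelated roommates):
    # choose the resident closest to age 45; min() keeps the first of any tie.
    return min(members, key=lambda m: abs(m.age - 45))


roommates = [Member("A", 29), Member("B", 52), Member("C", 36)]
print(select_respondent(roommates).name)  # B
```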
Since the SRC interviewed one primary family per housing unit, the number of households is the same as the number of primary families (we use the terms household and family interchangeably in this manual). Interviews were sought with all primary family units; however, no other families or individuals in the household, such as roomers, boarders, servants, or roommates, were interviewed (although their age and sex are noted in the household listing). The excluded individuals are those in unrelated subfamilies and non-householder unrelated individuals, as defined by the Census. Rough calculations suggest that, because not all household members were interviewed, aggregate wealth totals derived from the survey are only slightly lower (by .4 percent) than they would otherwise be. SCF household listings indicate that non-interviewed individuals represent about 4,818,000 people. For the March 1983 Current Population Survey (CPS), the Bureau of the Census estimated that there were a total of 6,678,000 such persons living in 4,600,000 families, and that they earned 3.8 percent of total household income.

About one-half of the survey respondents were women, including respondents in about 40 percent of married households (the high-income sample respondents, though, were virtually all male). Both spouses were present during about 40 percent of the married-household interviews, however, and callbacks were used in a significant portion of the remaining cases to solicit information about a spouse not present at the interview. Information on the ownership of assets and liabilities was solicited only at the family level and thus cannot be allocated to specific individuals. Information on earnings, pensions, education, and work history was solicited for both the respondent and the spouse. Interviewing for the survey was carried out by the Survey Research Center from February through July 1983.
A complete, detailed inventory of household assets and debts, including businesses, pensions, properties, and financial items, was collected. In addition, a comprehensive work history and demographic data were collected for primary family members. Additional questions focused on the use of financial services, specific income sources, savings and borrowing attitudes, health, and inheritances. Particular attention was paid to pension assets, including detailed questions on types of plans, expected benefits, vesting structure, contributions, and current values of identifiable accounts. The manual itself provides a detailed list of the questions asked as part of the survey.

DATA CLEANING AND IMPUTATION

The actual responses given by respondents may contain missing or inconsistent information due to respondents' misunderstanding, lack of knowledge, or unwillingness to answer certain questions. These problems make analysis of the raw data difficult and, depending on the pattern of errors, may bias conclusions. A series of consistency checks and imputation procedures was developed at the Federal Reserve Board to clean the raw data and to estimate values for the missing data. The computer code implementing these data cleaning and imputation procedures is over 40,000 lines long. Specific information is given for each variable listed in this manual; the general procedures are described below.

Overview: Three basic methods were used to impute missing data. The first method computed missing values by formulas based on respondent information that was closely related to the missing items. For example, missing earned income could be imputed from reported wage rates, hours worked, and work history. Asset income could be inferred using average rates of return if asset values were given. Similarly, asset values could be estimated from reported asset income.
Length of unemployment, coupled with the appropriate state benefit formula, could be used to impute unemployment income, and work history and Social Security benefit formulas could be used to impute Social Security income. Where appropriate, random disturbances were added in making imputations. The detailed data were also useful for resolving inconsistencies in reported values.

The second method assigned missing values on the basis of random draws from conditional frequency distributions. This method was used primarily to impute missing values for variables with discrete values. It was also used to estimate dollar amounts in a few cases in which very few values were missing. A variant of this method used a conditional mean together with information reported by the respondent to estimate the value of a missing item. The amount borrowed for a first mortgage, for example, was sometimes estimated by multiplying the purchase price of the house by the average loan-to-price ratio in the year of purchase.

The third method estimated missing values by regression. Missing values were assigned the value predicted by the regression plus a random disturbance term, which was generally assumed to be a truncated log-normal variable with the same variance as the residual term of the regression. This method was used to estimate most missing dollar amounts. Income and asset regression imputations were done simultaneously, using an iterative technique, in order to preserve all second moments.

Much more was involved in preparing the data set than imputation of missing values. In many instances respondents gave inconsistent answers to similar questions, which had to be resolved. In other instances assets or debts appeared to be reported twice. A more common type of problem involved the categories to which assets were assigned. For example, many elderly respondents appeared to confuse Social Security and SSI.
Respondents also assigned assets inconsistently to money market accounts or SUPER NOW checking accounts. Many of these kinds of problems and inconsistencies resulted in data being changed or moved.

The area probability and high-income samples were handled separately. Missing values for all observations in the high-income sample were imputed. In the area probability sample, however, 159 of the original 3,824 observations were discarded because virtually all dollar amounts for income and assets were missing (this procedure is described in the next section). All missing values for the remaining 3,665 observations were imputed.

Most of the imputations for the 1983 survey were made entirely on the basis of 1983 data. In order to preserve the appropriate intertemporal correlations, however, some use was made of 1986 responses for those households responding to the 1986 SCF. Specifically, some 1983 values were reimputed where respondents gave "hard data" in 1986 for an item that was missing in 1983.

Wages and Income: Dollar amounts for each missing income source were imputed separately, and total family income was obtained by summing income sources. Missing earned income was estimated using a wage index to deflate the current wages of the respondent and spouse to 1982 levels. If the current wages were missing as well, they were estimated using conditional mean tables constructed from the March 1983 CPS. Missing values were assigned the average log-wage for persons of the same sex, race, age, and three-digit occupation code, plus a random error term. The random error term was chosen so that the correlation in wages between an individual's current and past jobs was the same as that observed for the part of the sample with complete information. Missing wage and business income of self-employed individuals was imputed separately using CPS data for self-employed individuals.
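The wage steps just described (deflating a reported wage with a wage index, and assigning a cell mean of log wages plus a random error when the wage itself is missing) can be sketched as follows. This is a simplified illustration, not the Board's code: the function names, cell keys, and donor records are hypothetical stand-ins for the March 1983 CPS tables, the error is drawn from a normal distribution with the cell's standard deviation of log wages, and the adjustment that matched the correlation between current and past wages is not reproduced.

```python
import math
import random
from collections import defaultdict
from statistics import fmean, pstdev


def deflate_wage(wage, index_current, index_base):
    """Deflate a current wage to base-year (e.g., 1982) levels with a wage index."""
    return wage * index_base / index_current


def log_wage_cell_table(records):
    """Conditional mean table: {cell: (mean log wage, sd of log wage)}.

    records: iterable of (cell, wage) pairs, where cell is a hashable key such
    as (sex, race, age_group, occupation_code).
    """
    logs = defaultdict(list)
    for cell, wage in records:
        logs[cell].append(math.log(wage))
    return {cell: (fmean(vals), pstdev(vals)) for cell, vals in logs.items()}


def impute_wage(cell, table, rng):
    """Cell mean of log wages plus a normal disturbance, returned in dollars."""
    mean_log, sd_log = table[cell]
    return math.exp(mean_log + rng.gauss(0.0, sd_log))


# Hypothetical donor records (hourly wages) standing in for the March 1983 CPS.
donors = [
    (("F", "R1", "35-44", "occ_A"), 7.10),
    (("F", "R1", "35-44", "occ_A"), 8.40),
    (("M", "R2", "25-34", "occ_B"), 6.25),
    (("M", "R2", "25-34", "occ_B"), 7.80),
]
table = log_wage_cell_table(donors)
rng = random.Random(42)
print(round(impute_wage(("M", "R2", "25-34", "occ_B"), table, rng), 2))
```

Working in logs means the imputed wage is always positive and the added error scales with the wage level, in keeping with the log-wage tables described in the text.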
For the high-income sample, a similar procedure was used to impute wage and business income. Missing values were assigned on the basis of occupation and age, with random error terms added to the conditional means. The conditional means were obtained from the high-income sample itself, however, not the CPS. Unemployment compensation was imputed using state benefit rules and job information reported in the employment section. Some pension and Social Security income was estimated using information from the employment section, but most was imputed by regression. Welfare income was imputed by regression using state benefit formulas. Interest and dividend income was estimated by multiplying dollar amounts of various types of assets held by the appropriate average yields for those assets in 1982. Average yields were obtained from the Federal Reserve Bulletin and other publications.

Assets: Missing values for most financial assets were estimated either by capitalizing interest and dividend income or, if these income sources were not reported, by regression. When both income and financial asset values were missing, an iterative procedure imputed income and asset values simultaneously. It builds maximum likelihood estimates of the covariance matrix of the set of imputed variables, conditioned on demographic and other variables, under the assumption that the variables are jointly log-normal and that, given such information, observations are missing randomly. Imputed values are conditional expectations plus a random error. Missing house values were imputed by inflating reported purchase prices by regional housing price indices. Some house values, and all other missing real estate, life insurance, net equity of businesses, and some land contracts, were imputed by regression. Automobile values were imputed by matching the year and make of the vehicle to the National Automobile Dealers Association "blue book" listing of used car values at the time of the survey.
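Two of the asset procedures above can be illustrated compactly: capitalizing reported income at an average yield, and imputing a log-normal asset as a conditional expectation plus a random error given observed income. The sketch below is a hypothetical two-variable simplification: the mean vector and covariance matrix are taken as known rather than estimated by the iterative maximum likelihood procedure, there is no conditioning on demographic variables, no truncation of the error, and all names and numbers are illustrative.

```python
import math
import random


def capitalize_income(annual_income, average_yield):
    """Infer an asset balance from reported income and an average yield."""
    return annual_income / average_yield


def conditional_log_normal(log_income, mu, cov):
    """Mean and variance of log(asset) given observed log(income).

    Assumes (log income, log asset) is bivariate normal with mean vector
    mu = (mu_income, mu_asset) and covariance matrix
    cov = ((var_income, cov_ia), (cov_ia, var_asset)).
    """
    mu_i, mu_a = mu
    (var_i, cov_ia), (_, var_a) = cov
    cond_mean = mu_a + cov_ia / var_i * (log_income - mu_i)
    cond_var = var_a - cov_ia ** 2 / var_i
    return cond_mean, cond_var


def impute_asset(log_income, mu, cov, rng):
    """Conditional expectation of log(asset) plus a random error, in dollars."""
    cond_mean, cond_var = conditional_log_normal(log_income, mu, cov)
    return math.exp(cond_mean + rng.gauss(0.0, math.sqrt(cond_var)))


# Hypothetical parameters on the log scale.
print(capitalize_income(900.0, 0.09))  # balance implied by $900 of interest at 9 percent
rng = random.Random(1983)
print(round(impute_asset(math.log(30_000), (10.0, 9.0), ((1.0, 0.8), (0.8, 2.0)), rng)))
```

Adding a draw from the conditional distribution, rather than imputing the conditional mean alone, is what preserves the variance and the income-asset covariance in the completed data.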
Debts: Respondents were asked to report dollar amounts outstanding for open-end and non-installment closed-end debts. Missing dollar amounts were estimated by regression. Asset values, income, and other demographic characteristics of the family were used as predictor variables. Random errors were added to the predicted amounts outstanding.

Respondents reported credit terms (payment size, frequency, amount borrowed, due date, and interest rate) for mortgages, land contracts, and other real estate loans. Some payments had to be adjusted because the reported amount included taxes and insurance. The amount outstanding on each loan could then be obtained by calculating the present value of remaining payments of principal and interest. Missing credit terms were estimated using appropriate average interest rates, maturities, and loan-to-price ratios published in the Federal Reserve Bulletin. (Because the terms are redundant, the amount outstanding could be computed even if one term was not reported.)

Similar information was collected for closed-end installment debt, although interest rates were not reported. If all other terms were reported, sufficient information was available to solve for the implied interest rate. The amount outstanding for each loan could then be computed in the same fashion as for mortgages. When one of the terms was missing or the computed interest rate was implausible, an appropriate interest rate from the Federal Reserve Bulletin was assumed for area probability sample observations. For high-income sample observations, missing values were assigned on the basis of matches to other comparable loans in the high-income sample.

Summary Data on Imputation: The impact of the imputation process used for the SCF is apparent from the non-response rates for selected assets and liabilities given in table 2. Non-response rates for ownership ("do you have a ...?") and dollar amounts ("what is its value?") were computed for three subsamples: the high-income sample, the 3,665 edited area probability observations, and the 159 discarded area probability observations. The non-response rate for each item was computed by dividing the number of missing responses by the number of times that a question was asked. Dollar amounts were asked only if a respondent acknowledged ownership. For closed-end installment debts, where the amount outstanding was computed rather than reported, dollar amounts refer to the amount borrowed or payment size; for other debts, dollar amounts refer to the amount outstanding. There are no non-response rates for automobile value because this value was calculated from reported make and model and was not solicited from respondents.

Table 2
NON-RESPONSE RATES FOR SELECTED ASSETS AND LIABILITIES, 1983 SCF (percent)
_______________________________________________________________________________
                              EDITED            HIGH-INCOME       DISCARDED
                              CROSS-SECTION                       CROSS-SECTION
                              OWNRSHP  $ AMT    OWNRSHP  $ AMT    OWNRSHP  $ AMT
_______________________________________________________________________________
ASSETS
Principal residence              .1     7.8        .0     1.4        .6    32.8
Other real estate (gross)        .1     9.2        .0     3.0       2.5    35.3
Public stock                    1.3    25.4       1.7     6.7      15.8    97.8
Bonds and trusts                 .7    24.7       2.4     6.2       8.5    82.9
Checking accounts                .1     9.6        .5     4.7       5.0    71.7
Savings accounts                 .2    14.1        .7     4.8       8.2    89.6
Money market accounts            .2    18.3        .3     9.3       5.0    93.1
Certificates of deposit         1.3    25.6        .9     9.0      15.2    97.4
IRA and Keogh accounts           .3    11.9        .3     2.1       4.4    79.6
Savings bonds                    .1    17.4        .2     4.6       6.3    71.4
Life insurance cash value       2.4    71.7        .5    33.5       9.4    91.8
Business assets (net)            .1    37.2        .5    17.6       4.4    96.9
Automobiles                      .0      --        .0      --        .1      --

DEBT
Automobile debt                  .7     4.6        .0     1.7       2.2    36.4
Consumer debt                    .2     5.6        .4     3.7        .6    59.1
Principal residence debt         .3     9.6        .5     3.6       1.6    50.8
Other real estate debt          1.0     8.2        .0     3.9       7.7    40.0
_______________________________________________________________________________

Note that few respondents failed to acknowledge ownership of specific items.
Non-response rates for dollar amounts, however, are significantly larger.[14] The high-income sample has lower non-response rates than the area probability sample for virtually every item, despite the fact that the interview lasted longer on average for the high-income sample and its respondents had more complex finances. This may reflect the fact that high-income respondents had agreed to participate before they were even approached by an interviewer.

The effects of missing-value imputations on asset totals are shown in table 3. The first and third columns show the percentage of the final weighted total of each asset that stems from dollar values given by respondents. The difference between this number and 100 is the percentage of each asset's dollar total that was imputed. Percentages are given separately for the edited area probability sample and the high-income sample. Imputations accounted for less than 10 percent of total asset values in ten of the twelve asset categories in the high-income sample but in only four of the twelve categories in the area probability sample. For every asset category except net business assets, the proportion of total assets accounted for by imputations is lower in the high-income sample than in the area probability sample. Cash value of life insurance is the lowest-quality item: imputations account for 41 percent of total cash value in the high-income sample and 76 percent of the cash value in the area probability sample.

Table 3
IMPUTATION PERCENTAGES AND IMPLIED MEAN IMPUTATIONS, 1983 SCF*
_______________________________________________________________________________
                                   HIGH-INCOME               EDITED CROSS-SECTION
                              RESPONDENT   IMPUTATION    RESPONDENT   IMPUTATION
                              DATA AS %    BY MEANS,     DATA AS %    BY MEANS,
                              OF TOTAL     IMPLIED %     OF TOTAL     IMPLIED %
                                           OF TOTAL                   OF TOTAL
_______________________________________________________________________________
Principal residence              99.2        100.3          95.6        102.9
Other real estate (gross)        98.8        101.1          90.1         97.5
Public stock                     95.6        100.2          88.4        104.7
Bonds and trusts                 93.7        100.3          74.9         94.3
Checking accounts                96.5        101.1          92.6        101.1
Savings, CDs, money mkt.:
  Savings accounts               96.7        101.1          86.3         99.9
  Money market accounts          95.7        100.9          84.3         98.1
  Certificates of deposit        91.8         97.5          76.4         98.9
IRA and Keogh accounts           98.5        101.2          91.3        102.6
Savings bonds                    99.0        100.7          89.9        106.9
Life insurance cash value        58.8         80.7          24.1         70.4
Business assets (net)            80.1        101.1          82.4        118.9
_______________________________________________________________________________
* Full sample weights are used for each subgroup.

To examine the robustness of the imputation procedures, each sample was "re-imputed" by assigning the sample mean to each missing dollar value instead of the more complex imputations actually used.[15] The figures in columns 2 and 4 of table 3 express each asset total obtained with these cruder mean-based imputations as a percentage of the total obtained with the original imputations. Most percentages are quite close to 100. The poor data on the cash value of life insurance, however, are again reflected in the sensitivity of its overall total to the imputation procedure used. The life insurance percentages of less than 100 suggest that observations with missing data were those with higher-than-average values (at least as predicted by our more complex imputations). Conversely, the 118.9 percent figure for area probability sample business assets suggests that cross-sectional observations with missing business data were smaller than average. These figures suggest that the patterns of non-response are complex and that simplistic imputation procedures are likely to be flawed.

SAMPLE WEIGHTS

For a variety of reasons, final household survey data will rarely represent a random draw from the U.S. population. Even if the original survey design were random, different types of households will have different likelihoods of completing interviews. Very high-income and very low-income households, for example, are thought to be less likely to participate in surveys. The 1983 SCF had an additional source of non-randomness because of the inclusion of the high-income supplement.
These factors necessitated the construction of weighting variables to compensate for known or estimable sources of stratification in the final data set. Several different weights were constructed.

Inclusion of households in the final "cleaned" data set described in this manual results from a series of implicit stratified selection criteria. There are four major sources of implicit stratification. First, response rates differ across household types. This occurs because households cannot be reached (up to six attempts were made to contact participants) or because they refuse to participate when contacted. Second, the sample may not fully reflect the U.S. population due to sampling error in the survey itself (in the sense that any random sample will not have exactly the same average characteristics as the population from which it is drawn). Third, as mentioned above, not all survey observations were usable for analysis because of significant missing information due to deliberate or inadvertent actions. Of the 3,824 area probability sample households included in the original SRC data tape, 159 observations were dropped from the cleaned file. Finally, the SCF sample is drawn from both the area probability and high-income sampling frames. If both samples are used, they must be properly combined to yield a representative national sample.

There are a number of ways to compensate for these types of sample stratification. The method employed here is to construct sample weights, which can be used to adjust the final sample. Briefly, the weights were calculated as follows.

Area Probability sample weights: The original SRC sampling frame called for sampling 6,062 housing units in the cross-section. If each housing unit had yielded an interview, each observation would represent 13,333 housing units.
In fact, as shown earlier in table 1, only about 89 percent of the housing units were occupied, and of these, only about 71 percent (3,824) contained households that were willing to participate in the survey. The exact characteristics of the non-responding households are unknown; however, the response and occupancy rates differed significantly across the survey's 75 PSUs. The "non-response" weight adjusts for this differential non-response. This weight is equal to the reciprocal of the household response rate for the PSU to which the household belongs. This weight should compensate for some location-related characteristics associated with non-response and occupancy.

The second type of stratification cited above can be adjusted for by comparing the final area probability sample (weighted for non-response) with the population on selected characteristics. The sample could be adjusted for any of a number of characteristics (age, income, size of household, region, etc.). SRC elected to compensate for regional sampling error by computing a "post-stratification" weight that adjusts the sample to have the same relative population as the 1980 census in the four major sample regions (Northeast, South, North Central, and West), further subdivided into urban and rural areas. We computed a similar weight, but post-stratified to the population and urban/rural definitions represented by the March 1983 CPS.

The third type of stratification cited above can be adjusted for in two different ways. The criteria for inclusion in the cleaned area probability sample were as follows. Any observation with any dollar values given for earnings or income was automatically included. Remaining observations were excluded only if there were virtually no dollar values given for housing value (or rent), other property values, business, and financial assets. Information on debts was not used in these criteria.
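The non-response weight (the reciprocal of the PSU household response rate) and a post-stratification adjustment, both described earlier in this section, can be sketched as below. This is a minimal illustration under stated assumptions, not the SRC or Board procedure: the function names are hypothetical, the post-stratification cells and population shares in the usage example are invented (the SCF used the four regions subdivided into urban and rural areas), and the adjustment rescales weights so that each cell's weighted share matches its population share while preserving the weighted total.

```python
from collections import Counter, defaultdict


def nonresponse_weights(households):
    """{psu: 1 / household response rate in that PSU}.

    households: iterable of (psu, responded) pairs, one per occupied
    sample household.
    """
    eligible, responded = Counter(), Counter()
    for psu, did_respond in households:
        eligible[psu] += 1
        if did_respond:
            responded[psu] += 1
    return {psu: eligible[psu] / responded[psu] for psu in eligible if responded[psu]}


def poststratify(respondents, weights, population_shares):
    """Rescale weights so each cell's weighted share matches a population share.

    respondents: list of (respondent_id, cell); weights: {respondent_id: weight};
    population_shares: {cell: share}, shares summing to 1. The weighted total
    is preserved.
    """
    cell_totals = defaultdict(float)
    for rid, cell in respondents:
        cell_totals[cell] += weights[rid]
    grand_total = sum(cell_totals.values())
    return {
        rid: weights[rid] * population_shares[cell] * grand_total / cell_totals[cell]
        for rid, cell in respondents
    }


# Hypothetical example: PSU "A" has 4 occupied households, 2 responding;
# PSU "B" has 2 occupied households, both responding.
households = [("A", True), ("A", False), ("A", True), ("A", False), ("B", True), ("B", True)]
nr = nonresponse_weights(households)  # {'A': 2.0, 'B': 1.0}
respondents = [(1, "NE-urban"), (2, "NE-urban"), (3, "S-rural")]
weights = {1: nr["A"], 2: nr["A"], 3: nr["B"]}
print(poststratify(respondents, weights, {"NE-urban": 0.5, "S-rural": 0.5}))
# {1: 1.25, 2: 1.25, 3: 2.5}
```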
Specific rules were employed to exclude observations whose ratio of missing data to relevant questions exceeded a certain level. In practice, the observations that were dropped were missing values for virtually all of the dollar value questions they were asked.[16] None of the high-income sample observations were dropped; in fact, these observations were remarkably clean, with comparatively little missing information.

To compensate for these exclusions, a probit model was fit to estimate the probability that an observation in the complete 3,824-observation area probability sample file would be included in the cleaned 3,665-observation data set. Variables used in this model were selected as follows. Most asset questions consisted of two parts: an initial screening question that asked whether the respondent had an asset, and a follow-up question, asked of those acknowledging ownership, about how much they had. In virtually every instance of missing values, it was the second question rather than the first that was not answered. Thus, the probit model predicting inclusion in the cleaned sample used a number of the screening questions, as well as standard demographic data available for all observations, as predictor variables.

The fitted probit equation results can be used in two ways. The inverse of the predicted probability that an observation would have been included in the cleaned sample (conditioned on the predictor variables used) can be used to weight observations in analysis. In practice, this procedure implies higher weights for observations holding a large number of different assets. An alternative way of using the probit results attempts to account explicitly for correlation between inclusion in the sample and the error term of an estimated equation. A variable has been created that is the expectation of the "error" in the inclusion equation, conditioned on the fact that the observation was indeed included (the so-called Mills ratio).
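Given the fitted probit index for an included observation, both uses of the inclusion equation reduce to evaluating the standard normal distribution. The sketch below assumes the usual probit setup, in which an observation is included when x'b + u > 0 with u standard normal, so that the inverse-probability weight is 1/Phi(x'b) and the expected error given inclusion is phi(x'b)/Phi(x'b). The function names, coefficient values, and predictors are hypothetical and are not the estimates behind the released variables.

```python
import math


def std_normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inclusion_adjustments(coefs, x):
    """Return (inverse-probability weight, inverse Mills ratio) for one included case.

    coefs[0] is the intercept and coefs[1:] pair with the predictors in x,
    e.g. 0/1 answers to asset screening questions and demographic items.
    """
    index = coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))
    p_included = std_normal_cdf(index)           # predicted probability of inclusion
    ipw = 1.0 / p_included                       # weight for inverse-probability weighting
    mills = std_normal_pdf(index) / p_included   # E[u | x'b + u > 0]
    return ipw, mills


# Hypothetical coefficients: intercept, owns stock (0/1), owns a business (0/1).
coefs = [1.5, -0.4, -0.6]
ipw, mills = inclusion_adjustments(coefs, [1, 1])  # probit index = 0.5
```

With these made-up negative coefficients, a household reporting more asset types has a lower index and therefore a larger weight, which mirrors the pattern noted above.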
The Mills ratio variable can be included as a regressor in estimated equations and will account for "sample selection" bias under certain assumptions.

High-income and full sample weights: The weights for the combined sample containing both the area probability and high-income samples were more difficult to compute and are more open to question. Unfortunately, full information on the high-income sampling procedure is not publicly available. Additional complications stem from the fact that the high-income observations were drawn from a 1980 sampling frame of taxpayers (but were contacted in 1983) and that the reporting basis for tax files (individuals or married couples) is not always the same as that of the survey (families).

Initially, the Office of Tax Analysis used a statistical file set up by the IRS to construct relative weights for the high-income sample (the sample was divided into nine cells, each with a separate weight). These weights are intended to make the high-income sample representative of the unknown full IRS sampling frame. SRC constructed sample distributions for the overlap of the area probability and high-income samples (household income over $100,000) to compute a "meshed" combined sample weight. The SRC weight retained the relative weights of the high-income sample and left the weights for area probability observations below $100,000 largely unchanged.[17]

Well after the initial SRC full sample weights were computed, information on the upper tail of the distribution of 1982 taxpayers became available. When this was compared with reported 1982 household income in the SCF sample, it appeared that the SRC meshed weight may have given too much weight to the high-income sample. As a consequence, another full-sample weight was constructed at the Federal Reserve Board using a different approach to combine the two samples.
It was decided to construct sampling weights for the high-income sample (and for area probability observations with income above a certain level) using a post-stratification scheme based on control totals for an "extended" income measure constructed from the 1982 Tax Model File (TMF) of the IRS. The TMF is a stratified sample of 88,218 individual tax returns with significant over-sampling of high incomes.[18] This income measure, which was constructed for all survey households using reported 1982 income data, is roughly comparable to the IRS measure of adjusted gross income plus excluded realized capital gains.

Despite the fairly detailed income questions in the SCF, the survey measure of business income almost surely overstates the TMF measure. It appears likely that survey respondents often report something closer to a cash-flow concept of income than to income net of expenses and depreciation. Unfortunately, there is not sufficient information in either the SCF or the TMF to make a precise compensating adjustment. Instead, a gross adjustment for the aggregate difference between the survey and TMF business income totals was made in constructing the survey measure of extended income: reported business income in the SCF was deflated by about 40 percent so that the IRS and adjusted survey aggregates of business income were the same. This adjustment was quite ad hoc, however, and the potential for distortion at the individual level remains; weights for households with business income are particularly suspect.

Because the reporting units in the survey and the TMF differ, the TMF data were adjusted to estimate income on a family basis. Most of the high-income returns in the TMF are either joint returns or those of single individuals. However, there are some married couples filing separately, particularly in the $50,000 to $100,000 income range.
These observations were "aggregated" into households by assuming that separate filers were all married to people with the same income (weights for such observations were halved). The final weights are only slightly affected by variations in this adjustment.

Post-stratification cells were defined by the seven categories of extended income shown in table 4. For each of the top six income cells (above $80,000), weights were determined so that the weighted number of survey observations equaled the TMF totals. High-income observations were assigned the cell average. Area probability sample observations were assigned relative weights based on their cross-section weight, but scaled so that the mean for each of the six cells was the same as that of the high-income observations. The original weights of the area probability observations with income below $80,000 were adjusted so that the weighted number of SCF households equaled the population estimated from the March 1983 CPS. High-income sample observations with income below $80,000 were arbitrarily assigned the same weight as observations in the $80,000 to $90,000 group.

                                  Table 4
                           FRB Full Sample Weight

Household extended    Number of    Number of       TMF control     Average
income (dollars)     area prob.  high-income         totals of      weight
                          cases        cases        households    assigned
___________________________________________________________________________
under $80,000             3,579           49        82,364,760      22,703
$80,000-89,999               22           11           356,324      10,798
$90,000-99,999               13           16           250,746       8,646
$100,000-124,999             23           40           362,022       5,746
$125,000-199,999             16           92           356,386       3,300
$200,000-499,999             11          148           182,424       1,147
$500,000 and over             1           82            45,338         546
___________________________________________________________________________
All cases                 3,665          438        83,918,000      20,453

Most of the work done to date with the 1983 SCF has used the SRC meshed weights. Subsequent calculations suggest that the two different weighting schemes may not make as much difference as originally expected.
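The last column of table 4 follows from the rule that each cell's weighted count equals its TMF control total: the average weight in a cell is the control total divided by the number of survey cases (area probability plus high-income) in that cell. The sketch below checks this arithmetic against the table and adds a hypothetical helper that rescales area probability cross-section weights to a cell average while keeping their relative sizes. It is a simplified illustration, not the Board's production code, and it does not reproduce the separate CPS-based treatment of the under-$80,000 cell.

```python
# Rows of table 4: (extended income cell, area probability cases,
# high-income cases, TMF control total of households).
TABLE_4 = [
    ("under $80,000", 3579, 49, 82_364_760),
    ("$80,000-89,999", 22, 11, 356_324),
    ("$90,000-99,999", 13, 16, 250_746),
    ("$100,000-124,999", 23, 40, 362_022),
    ("$125,000-199,999", 16, 92, 356_386),
    ("$200,000-499,999", 11, 148, 182_424),
    ("$500,000 and over", 1, 82, 45_338),
]


def cell_average_weights(rows):
    """Average weight per cell: control total divided by all survey cases in the cell."""
    return {label: total / (area + high) for label, area, high, total in rows}


def scale_area_weights(cross_section_weights, cell_average):
    """Keep relative cross-section weights but scale their mean to the cell average."""
    mean_w = sum(cross_section_weights) / len(cross_section_weights)
    return [w * cell_average / mean_w for w in cross_section_weights]


averages = cell_average_weights(TABLE_4)
print({label: round(avg) for label, avg in averages.items()})
```

Rounded to whole numbers, these ratios reproduce the "Average weight assigned" column of table 4 (22,703 down to 546).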
Aggregate wealth estimates constructed using the SRC weight are about 5.0% higher than those estimated with the Federal Reserve weight, and aggregate income estimates are 3.6% higher.

The importance of the weights is shown by the numbers in table 5. Columns 1-3 show the distribution of the unweighted area probability, high-income, and total samples over various demographic items. Columns 4 and 5 show the weighted sample distribution of the same items using the area probability weight (column 4) and the FRB extended income weight (column 5). The unweighted sample clearly over-represents elderly, high-income, wealthy, white, and married households relative to both weighted samples.

                                      Table 5
                         Weighted and Unweighted Percentages

                                        Unweighted                     Weighted
                                   Area       High-        Full        Area         FRB
                            Probability      Income      Sample Probability        Full
_______________________________________________________________________________________
Age (head)
  34 or less
    married                        17.9         1.4        16.1        17.0        16.9
    unmarried male                  6.0          .5         5.4         6.2         6.2
    unmarried female                7.4          .2         6.7         7.6         7.6
  35 to 44
    married                        13.6        12.6        13.5        13.6        13.6
    unmarried male                  1.8          .4         1.7         1.9         1.8
    unmarried female                4.1          .2         3.7         4.1         4.1
  45 to 54
    married                        10.7        22.8        12.0        10.6        10.5
    unmarried male                  1.6         2.3         1.7         1.7         1.7
    unmarried female                3.2          .0         2.9         3.2         3.2
  55 to 64
    married                         9.6        28.3        11.6         9.7         9.7
    unmarried male                  1.4         2.5         1.5         1.5         1.5
    unmarried female                3.7          .5         3.3         3.7         3.7
  65 or more
    married                         9.4        24.7        11.0         9.7         9.9
    unmarried male                  1.8         1.8         1.8         1.8         1.8
    unmarried female                7.8          .9         7.1         7.7         7.7
Race
  Caucasian                        82.8        98.6        84.5        82.3        82.2
  Nonwhite or Hispanic             17.2         1.4        15.5        17.7        17.8
Family Income
  < $10,000                        25.0          .0        22.2        24.1        24.0
  $10,000-$19,999                  26.8          .0        23.9        26.7        26.8
  $20,000-$29,999                  19.2          .2        17.3        19.2        19.3
  $30,000-$49,999                  19.6          .9        17.5        19.8        19.7
  $50,000-$99,999                   7.4         7.8         7.5         8.1         8.2
  >= $100,000                       2.0        91.1        11.5         2.2         2.0
Family Net Worth
  < $5,000                         25.5          .0        22.8        25.3        25.3
  $5,000-$24,999                   18.3          .0        16.4        18.0        18.0
  $25,000-$50,000                  16.3          .0        14.6        16.0        16.0
  $50,000-$99,999                  17.6         1.4        15.8        17.3        17.2
  $100,000-$249,999                14.3         2.3        13.0        15.0        14.7
  $250,000-$999,999                 7.0        26.0         9.0         7.3         7.1
  >= $1,000,000                     1.0        70.2         8.4         1.2         1.7
Homeownership                      64.1        95.4        67.4        63.5        63.4
Education of the Head
  0 to 8 grades                    15.2          .5        13.6        14.4        14.5
  9 to 12 grades                   46.1         5.0        41.7        45.0        44.9
  some college                     16.9        13.0        16.5        17.6        17.7
  college graduate                 21.7        81.5        28.1        22.9        22.9
Labor Force Participation
  single, not working              17.1         1.8        15.5        16.9        17.0
  single, working                  21.7         8.4        20.3        22.4        22.4
  married, neither working          9.7         8.0         9.5         9.8         9.8
  married, one working             23.1        52.5        26.2        22.8        23.0
  married, both working            28.4        29.2        28.5        28.1        27.8

Further evidence on the importance of the high-income sample and the need for proper weighting is shown by the summary statistics for various components of household wealth given in tables 6 and 7. Table 6 shows aggregate estimates using only the area probability sample, weighted with the area probability sample weights. Table 7 gives the same estimates for the full sample, weighted with the FRB extended income weights.[19] Note that there is virtually no difference between the area probability and full samples in the estimates of median holdings of owners or of the percentage of households owning various asset and debt items. The estimates of mean and aggregate holdings are another matter: despite being weighted to represent the same population, the area probability and full-sample estimates of a number of aggregates look quite different.

                                  Table 6
                SAMPLE STATISTICS, 1983 SCF CROSS-SECTION ONLY

                                 SUM  STD ERR % GROSS  OVERALL  PERCNT     MEAN   MEDIAN
                               ($ B)      SUM  ASSETS     MEAN  OWNING   OWNERS  OWNERS*
________________________________________________________________________________________
GROSS ASSETS               $10,127.2   $692.8  100.0% $120,678   96.0% $125,693  $49,885
Principal residence          3,623.8     87.5   35.8%   43,182   63.5%   68,002   52,342
Other real estate (gross)    1,416.7    149.4   14.0%   16,882   18.8%   89,699   36,000
Public stock                   642.8    127.8    6.3%    7,660   20.2%   37,860    4,696
Bonds and trusts               344.9     57.9    3.4%    4,110    7.3%   56,421   11,000
Checking accounts              119.2      6.9    1.2%    1,421   78.6%    1,808      500
Savings, cds, money mkt.       959.2     44.2    9.5%   11,430   74.0%   15,445    3,500
Life insurance cash value      262.1     16.0    2.6%    3,124   33.9%    9,205    3,428
Business assets (net)        2,127.3    567.5   21.0%   25,349   13.9%  181,861   40,538
Automobiles                    372.3      6.7    3.7%    4,436   84.4%    5,255    4,100
Miscellaneous                  258.8     22.5    2.6%    3,084   15.5%   19,845    5,575
DEBT                         1,444.6     69.6   14.3%   17,215   69.8%   24,655   10,797
Automobile debt                 92.2      3.6    0.9%    1,098   28.9%    3,797    3,052
Consumer debt                  178.5     12.5    1.8%    2,127   54.9%    3,872    1,120
Principal residence debt       855.3     29.4    8.4%   10,192   37.3%   27,355   21,468
Other real estate debt         318.7     56.4    3.1%    3,797    8.2%   46,291   18,797
NET WORTH                    8,682.5    666.4   85.7%  103,463      --     ----   34,537
INCOME (GROSS)               2,233.5     46.8      --   26,615      --     ----   19,625
________________________________________________________________________________________
* For gross assets, net worth, and income, this is the overall median.

                                  Table 7
                  SAMPLE STATISTICS, FULL SAMPLE 1983 SCF

                                 SUM  STD ERR % GROSS  OVERALL  PERCNT     MEAN   MEDIAN
                               ($ B)      SUM  ASSETS     MEAN  OWNING   OWNERS  OWNERS*
________________________________________________________________________________________
GROSS ASSETS               $11,572.2   $504.6  100.0% $137,899   96.0% $143,638  $49,588
Principal residence          3,751.7     80.2   32.4%   44,706   63.4%   70,465   52,000
Other real estate (gross)    1,689.9    225.1   14.6%   20,137   18.8%  106,997   37,765
Public stock                 1,033.8    180.2    8.9%   12,319   20.5%   60,161    5,000
Bonds and trusts               677.4    138.1    5.9%    8,072    7.6%  106,365   12,500
Checking accounts              119.4      5.7    1.0%    1,423   78.6%    1,811      500
Savings, cds, money mkt.     1,052.3     44.4    9.1%   12,539   74.0%   16,951    3,500
Life insurance cash value      285.3     18.9    2.5%    3,400   34.1%    9,978    3,373
Business assets (net)        2,276.3    292.0   19.7%   27,125   14.3%  190,025   45,768
Automobiles                    373.7      6.5    3.2%    4,453   84.3%    5,282    4,085
Miscellaneous                  312.5     25.4    2.7%    3,724   15.5%   24,073    5,506
DEBT                         1,510.6     74.5   13.1%   18,001   69.6%   25,853   10,787
Automobile debt                 90.6      3.5    0.8%    1,080   28.6%    3,772    3,029
Consumer debt                  228.8     26.8    2.0%    2,726   54.9%    4,969    1,117
Principal residence debt       865.5     28.1    7.5%   10,314   37.0%   27,898   21,673
Other real estate debt         325.6     57.2    2.8%    3,880    8.2%   47,317   18,797
NET WORTH                   10,061.6    479.3   86.9%  119,898      --     ----   34,466
INCOME (GROSS)               2,247.4     33.8      --   26,781      --     ----   19,523
________________________________________________________________________________________
* For gross assets, net worth, and income, this is the overall median.

COMPARABILITY WITH OTHER SURVEYS

The 1983 Survey of Consumer Finances is the most recent in a series of surveys of household finances conducted by the Survey Research Center of the University of Michigan. Surveys of Consumer Finances were conducted annually from 1946 to 1970 but were then discontinued.[20] In 1977, the Survey Research Center again conducted a comprehensive household finance survey, under the sponsorship of the Federal Reserve Board and other federal bank regulatory agencies.[21] The same basic methods were used in all these surveys, although changes in sampling and interviewing procedures were introduced from time to time to improve survey results. The content of the surveys also changed over time because of shifts in interest in various aspects of consumer finances. The early surveys, which were sponsored by the Federal Reserve Board, were concerned with the effect of the postwar accumulation of liquid assets on consumer spending. Mortgage and consumer credit received greater attention in later surveys, and a major part of the 1977 survey was concerned with the effect of federal regulations on consumer credit use.
Nevertheless, several areas of inquiry were followed through much of the 1946-77 period, and results from the earlier surveys are generally comparable to those from the area probability sample for the 1983 SCF.

The Federal Reserve Board also sponsored the Survey of Financial Characteristics of Consumers (SFCC) in 1963, with a followup reinterview in 1964.[22] Methodological work for this survey was conducted by the SRC, and interviewing was performed by the Bureau of the Census. Like the 1983 SCF, the 1963 SFCC collected a more detailed inventory of assets and liabilities than is customary in other consumer surveys. The 1963 survey also used Federal tax information to oversample high-income households. For the 1963 survey, a sample of housing units stratified by income reported in the 1960 Decennial Census was selected to represent households with incomes below $50,000. Households with incomes of $50,000 or more were selected from a sample of 1960 Federal income tax returns. Although this sample selection procedure is not exactly the same as that used for the 1983 survey, it produced a heavy over-sampling of households in the upper end of the income distribution, making the 1963 sample the only household survey sample comparable to the full sample from the 1983 SCF.

The Federal Reserve has also sponsored several recent surveys on the use of different means of payment for household purchases. The Survey of Currency and Transaction Account Usage, conducted by the SRC in the summer of 1984, solicited detailed information on the use of checking, savings, and money market accounts for about 2,000 households. Data were also collected on the use of currency, credit cards, money orders, and electronic banking services. In 1986, the survey was repeated for a smaller sample.[23]

Wealth information is also available from other sources.
Data from the Internal Revenue Service for federal estate tax returns have been used to estimate total household wealth and its percentage distribution. Unfortunately, data from this source are available only in aggregate form, with very limited demographic breakdowns. Another source is the 1979 Income Survey Development Program of the Department of Health and Human Services, which provides information for a sample of households larger than that of most other surveys of wealth. The New York Stock Exchange has also periodically conducted surveys of household stockholders, including one at about the same time as the 1983 SCF and, more recently, one in 1985. Wealth data were also collected for respondents to the Panel Study of Income Dynamics (PSID) in 1984.

The most comprehensive recent survey of household wealth was conducted in 1984 (and repeated in 1985) by the Bureau of the Census on participants in the Survey of Income and Program Participation (SIPP). This survey solicited information similar to the SCF for a very large sample of households. Its initial panel was a random cross-section of about 21,000 households selected by procedures similar to those used to select the area probability sample for the 1983 SCF. Net worth information was collected between September and December 1984.[24]

Aggregate wealth estimates from the earlier Surveys of Consumer Finances and from SIPP resemble those from the area probability sample of the 1983 SCF in understating aggregate wealth relative to estimates from independent sources. Using comparably defined categories, we estimate an aggregate net worth of $8,277 billion for the SCF area probability sample versus $7,740 billion for the SIPP sample. The difference derives primarily from a smaller estimate of small business assets in the SIPP. The full-sample SCF estimate of the same net wealth concept is $9,610 billion.
Thus, it appears that the major difference between the two surveys arises from the inclusion of the high-income sample in the SCF.

The annual March Current Population Survey is perhaps the most comprehensive U.S. household economic survey, soliciting economic information from approximately 59,000 households (U.S. Bureau of the Census (1984)). The representativeness of the SCF is demonstrated by a comparison of the sample distributions of various demographic variables for the SCF and the comparable March 1983 CPS, shown in table 8. The CPS data are given for "primary families," defined comparably to families in the SCF. As can be seen, the SCF has a very similar distribution for most items.

                              Table 8
                A Comparison of the 1983 SCF and CPS

                                    SCF                 CPS
                                Number  Weighted    Number  Weighted
                              of cases     share  of cases     share
____________________________________________________________________
Age (head)
  34 or less
    married                        661      16.9      9922      16.4
    unmarried male                 223       6.2      3435       5.8
    unmarried female               273       7.6      4105       7.2
  35 to 44
    married                        555      13.6      7830      13.0
    unmarried male                  71       1.8      1388       2.4
    unmarried female               151       4.1      2165       3.7
  45 to 54
    married                        492      10.5      6253      10.3
    unmarried male                  70       1.7       912       1.6
    unmarried female               118       3.2      1735       3.8
  55 to 64
    married                        475       9.7      5967      10.3
    unmarried male                  62       1.5       897       1.5
    unmarried female               136       3.7      2130       3.8
  65 or more
    married                        452       9.9      5532       9.5
    unmarried male                  74       1.8      1437       2.4
    unmarried female               290       7.7      5293       9.2
Race
  Caucasian                       3468      82.3     47515      82.4
  Nonwhite or Hispanic             635      17.7     11486      17.6
Family Income
  less than $10,000                912      24.0     15053      25.2
  $10,000 to $19,999               982      26.8     15580      26.0
  $20,000 to $29,999               711      19.3     12072      20.5
  $30,000 to $49,999               717      19.7     11533      19.8
  $50,000 to $99,999               309       8.2      4480       7.9
  $100,000 or more                 472       2.0       283        .5
Homeownership                     2766      63.4     38320      64.9
Education of the Head
  0 to 8 grades                    560      14.5      9155      14.8
  9 to 12 grades                  1713      44.9     27269      46.4
  some college                     678      17.7     10355      17.6
  college graduate                1152      22.9     12222      21.3
Labor Force Participation
  single, not working              635      17.0     11130      19.2
  single, working                  833      22.4     12367      21.3
  married, neither working         389       9.8      7088      12.0
  married, one working            1077      23.0     14023      23.4
  married, both working           1169      27.8     14393      24.0
____________________________________________________________________
Totals                            4103     100.0     59001     100.0

The CPS does not collect wealth data comparable to the SCF. However, detailed household money income, by source, is available from both the CPS and the SCF. A comparison of 1982 U.S. household totals for a number of income categories measured by both the SCF and the March 1983 CPS is displayed in table 9. The CPS totals are adjusted to exclude income for secondary families and unrelated individuals, who would not have been included in the SCF. We also show a comparison of the SCF income data with aggregate 1982 household income compiled by the IRS from tax return data. For this comparison, a subset of SCF cases was selected to represent the population of households that would normally file tax returns, and non-taxable income was excluded.

                                  Table 9
          Comparisons of Income Measured in SCF, CPS, and IRS Data

                                1982 INCOME 1982 INCOME 1982 INCOME 1982 INCOME
                                   1983 SCF         CPS    1983 SCF    IRS DATA
                                                        TAXABLE INC
                                      ($ B)       ($ B)       ($ B)       ($ B)
_______________________________________________________________________________
Salaries and wages                  1,393.7     1,443.5     1,385.7     1,564.6
Business or farm income               291.3       110.5       290.4        53.7
Taxable interest income                98.5        95.1        95.9       157.2
Dividend income                        ----        ----        46.7        54.2
Net gains from stocks                  ----        ----        50.4        24.3
Rental or trust income                 ----        ----        54.8        -2.1
Dividends/trust/rental total          102.9        47.3        ----        ----
Welfare or public assistance           23.2        17.4        ----        ----
Unemployment or workman's comp         20.6        32.8        ----        ----
Alimony or child support               35.6        21.4        ----        ----
Retirement income                     194.6       204.3        94.0        59.9
_______________________________________________________________________________
Category totals                     2,160.4     1,972.3     2,017.9     1,911.8

The 1983 SCF overstates comparable CPS income by about 6 percent. Most of this overstatement stems from business income and income from dividends, trusts, and real estate. Interestingly, in a comparison of CPS data with an "independent source" in 1983, the Census Bureau concluded that CPS income data were "underreported by about 10 percent" (U.S. Bureau of the Census (1985, p. 218)).
The SCF also overstates IRS household income by about 7 percent. Much of the discrepancy can be explained by the SCF's failure to find significant business, rental, and security losses.

PENSION PROVIDER SURVEY

The sample for the study of Employer Sponsored Pension Benefit Plans was derived in three interdependent stages. The overall research design was based on the use of the 1983 SCF to identify, in turn: which households were covered by employer sponsored pensions; which pension providers and plans covered these employees; and which benefit formulas and requirements governed these pension entitlements.

As part of the SCF, all respondents and spouses with work experience were questioned about pension coverage on their current job, as well as about vested pension entitlements from prior employers. Households that reported pension coverage were asked to identify the provider of the pension, in most cases their employer. All the pension providers thus identified were pooled, and duplicate references to the same provider were combined. In many instances, the name of the pension provider was sufficient to identify it uniquely, and files available at the Department of Labor could then be used to identify its Employer Identification Number (EIN). In other instances, a telephone interview was conducted with the pension provider. The provider was asked for its EIN and for documentation of the pension plans that covered workers in the occupational classification and work location corresponding to the SCF respondent(s) associated with that provider. (The names of the SCF respondents were not disclosed to the pension providers, who were told that SRC was doing a survey of pensions.) Pension providers were recontacted by mail and telephone as necessary to ensure a high response rate.
For most pension providers, the EIN and plan names and numbers were sufficient for SRC staff to match them to the official summary plan descriptions (SPDs) on file with the Department of Labor. In other instances, the providers sent the SPDs or similar material to SRC. The details of the plans were then coded from the SPDs using a coding instrument developed by Mathematica Policy Research and SRC. Coders were hired by SRC specifically for the project, which required more skill than is generally necessary for coding (the final instrument has about 2,700 variables). Information was compiled separately for defined benefit and defined contribution plans.

A total of 1,886 households (2,262 individuals) were eligible for potential inclusion in the Pension Provider Survey. Of these, 1,421 households (1,641 individuals) were represented in the Pension Provider Survey, for a 75 percent completion rate. These span 845 pension providers and 1,011 pension plans. Data for the Pension Provider Survey are currently being distributed as part of the 1983 SCF package. A description of the survey is found in "Survey of Consumer Finances: Employer Sponsored Pension Benefits Plans," by Richard T. Curtin (Survey Research Center, 1986).[25]

1986 SURVEY OF CONSUMER FINANCES

The 1986 survey reinterviewed respondents to the 1983 SCF. If the respondent had been divorced or separated since the 1983 interview, both the 1983 respondent and the 1983 spouse were included in the 1986 sample. Other members who left the family to form new households were not included. A total of 2,822 interviews were conducted by telephone between June and September 1986. Interviews lasted an average of 27 minutes. The survey was conducted by the SRC under the direction of Richard T. Curtin.

The 1986 interview was primarily designed to update essential information in the 1983 SCF on the household balance sheet (needed to calculate savings) and on employment. However, additional lines of inquiry were also opened.
Questions were asked about health and educational expenses, insurance coverage, the family and family change in more detail (including marriage, divorce, death of a spouse, children outside the home, and ages of parents), charitable gifts of money and time, shared living arrangements, intra-family transfers, and the financial details of divorce settlements.

A limited amount of analysis has been performed on the data. See:

Robert B. Avery, Gregory E. Elliehausen, and Arthur B. Kennickell, "Changes in Consumer Installment Debt: Evidence from the 1983 and 1986 Surveys of Consumer Finances," Federal Reserve Bulletin, Vol. 73, No. 10 (October 1987), pp. 761-778.

Robert B. Avery and Arthur B. Kennickell, "Savings and Wealth: Evidence from the 1986 Survey of Consumer Finances," presented at the May 1988 NBER Conference on Research in Income and Wealth.

MANUAL INSTRUCTIONS

In the remainder of the manual, information is given on all the variables included in the final dataset. A brief description is given for each variable, along with information on imputation and a listing of the values that the variable takes on. The question number corresponding to the actual survey questionnaire (e.g., R9) is also given for all variables except recodes.

Variables are listed by number, with the numeric code used as the basis of the variable's internal name in the survey's SAS data set. All variables listed here have a "B" prefix followed by a four-digit number ranging from 3001 to 5749. The original uncleaned survey responses are contained in variables with a "V" prefix, ranging from V1 to V2613. The codes for the "V" variables are described in the original survey codebook released by the SRC. We note, though, that the "B" variables contain all the same types of information as the "V" variables. Thus, for most analyses, it is possible to use the "B" variables without reference to the "V" variables.
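The naming convention described above is regular enough to check mechanically. The following sketch classifies a variable name as one of the recoded "B" variables or one of the raw "V" variables using only the documented ranges; the function name and its return labels are our own, hypothetical choices and are not part of the SAS data set.

```python
import re

_NAME = re.compile(r"([BV])(\d+)")


def classify_variable(name):
    """Return "recoded" for B3001-B5749, "raw" for V1-V2613; raise ValueError otherwise."""
    match = _NAME.fullmatch(name.strip().upper())
    if match is None:
        raise ValueError(f"not a 1983 SCF variable name: {name!r}")
    prefix, number = match.group(1), int(match.group(2))
    if prefix == "B" and 3001 <= number <= 5749:
        return "recoded"
    if prefix == "V" and 1 <= number <= 2613:
        return "raw"
    raise ValueError(f"{name!r} is outside the documented {prefix} range")
```

For example, classify_variable("B3122") returns "recoded", while a name such as "B6000" is rejected.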
The range of allowable values of each variable is also given. The symbol xxxx is used for continuous variables, with a statement of the units used and the sample range. For discrete variables with a small number of allowable codes, all possible values and their meanings are listed. The number of sample cases (out of the 4,103 "cleaned" observations) taking on each value of a discrete variable is also given. If the listing is for several variables (such as the 1st, 2nd, and 3rd automobiles), then the case totals are given for the listed variables, in order, separated by slashes (e.g., 123/45/87 cases). Although useful in giving a flavor of the distribution of responses to questions, the case listings should not be used for statistical purposes, as they are unweighted distributions.

Most of the information collected for the 1983 SCF applies to the full family unit. Some information, however, such as employment, education, health, and pension income, was collected individually for the survey respondent and the respondent's spouse (if any). For married couples, the respondent could have been either the husband or the wife. For ease of use in analysis, most of the person-specific variables in the cleaned dataset have been arranged by "head" and "spouse" (where the head is always the husband for married couples) rather than by "respondent" and "spouse." It is easy to switch back to the respondent/spouse arrangement, if desired, by using the variable B3122, which indicates whether the respondent was the head or the spouse.
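Switching from the head/spouse arrangement back to a respondent/spouse arrangement amounts to swapping the two sets of person-specific fields whenever the respondent was the spouse. In the sketch below, the field names (head_age, spouse_age, and so on) are hypothetical stand-ins for the actual B variables, and the assumption that a B3122 value of 1 means "respondent was the head" is ours, made only for illustration; the actual codes are given in the B3122 entry.

```python
# ASSUMPTION (illustration only): B3122 == 1 means the respondent was the head.
RESPONDENT_WAS_HEAD = 1


def to_respondent_basis(record, person_fields, b3122):
    """Return a copy of record with respondent_* and other_* fields added.

    record holds head_<field> and spouse_<field> entries (hypothetical names);
    b3122 is the household's value of variable B3122.
    """
    out = dict(record)
    for field in person_fields:
        head_value = record[f"head_{field}"]
        spouse_value = record[f"spouse_{field}"]
        if b3122 == RESPONDENT_WAS_HEAD:
            out[f"respondent_{field}"], out[f"other_{field}"] = head_value, spouse_value
        else:
            out[f"respondent_{field}"], out[f"other_{field}"] = spouse_value, head_value
    return out


household = {"head_age": 52, "spouse_age": 49}
print(to_respondent_basis(household, ["age"], b3122=1)["respondent_age"])  # 52
```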
Several different codes are used in the dataset, including:

(1) The code "1" is almost always used for the answer "yes" to a question.
(2) The code "3" is generally used for the answer "sometimes" or "maybe."
(3) The code "5" is almost always used for the answer "no" to a question.
(4) The code "-4" is used to denote a "small negative number."
(5) The code "-6" is used to denote the answer "none," which is sometimes differentiated from zero.
(6) The code "-7" is used to denote a special "other" response that does not fit into existing codes; SRC has cards indicating what the actual response is. "-7" is also sometimes used to denote answers like "forever" or "never" for continuous variables.
(7) The code "-8" is used to denote the answer "don't know" (DK). Most DKs have been imputed, but some still exist for selected variables where imputation is not appropriate (e.g., attitudinal variables). In a few instances "8" is used for DK.
(8) The code "-9" is used to denote "not answered" (NA). This indicates either that the interviewer inadvertently did not ask a question or that a respondent refused to answer. Most NAs have been imputed, but a few remain. In a few instances "9" is used for NA.
(9) The code "0" is generally used to denote cases where a variable is inappropriate for a particular observation because the question underlying the variable was not asked. For example, questions on spouses would be inappropriate for single households. Note that sometimes a question is asked, but the answer given is none or zero (such as "my business is worth nothing"); these answers are generally coded as -6, not 0. There are some instances, particularly with recoded variables, where 0 does denote none or nothing.

All variables on the tape are integers. All dollar amounts are given in whole dollars (although in answering the questions respondents may have rounded).
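When the recoded data are read into an analysis program, the special codes listed above usually need to be separated from genuine dollar amounts. The sketch below decodes a single dollar-valued variable; the function name, the zero_means_none flag, and the choice to return None for unusable codes are our own assumptions. Because 0 sometimes means "none" in recoded variables, and "8" or "9" occasionally stand for DK or NA in discrete variables, the mapping should be checked against each variable's codebook entry before use.

```python
# Special codes for continuous (dollar) variables, as listed in the codebook text.
SPECIAL_CODES = {
    -4: "small negative number",
    -6: "none",
    -7: "other response",
    -8: "don't know",
    -9: "not answered",
}


def decode_dollar_value(code, zero_means_none=False):
    """Return (value, note) for one dollar-valued variable.

    value is a usable number of dollars or None; note explains special codes.
    Set zero_means_none=True for recoded variables where 0 denotes "none".
    """
    if code == -6:
        return 0, SPECIAL_CODES[-6]            # respondent answered "none"
    if code in SPECIAL_CODES:
        return None, SPECIAL_CODES[code]       # not a usable dollar amount
    if code == 0 and not zero_means_none:
        return None, "inappropriate (question not asked)"
    return code, None                          # ordinary whole-dollar amount
```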
Some variables had to be rescaled so that information would not be lost (such as percentage answers, which are generally multiplied by 100).

ACKNOWLEDGEMENTS

Many people contributed to the 1983 Survey of Consumer Finances. Sampling, field work, editing, and coding were conducted by the Survey Research Center of the University of Michigan under the direction of Richard T. Curtin. Mr. Curtin also oversaw the cleaning and processing of the Pension Provider Survey data. Other SRC staff also made substantial contributions. Mary Grace Moore and Lisa Poole coordinated much of the questionnaire development and data processing and editing procedures. Steve G. Heeringa supervised the sample design and drawing of the SRC sample. Field work was supervised by Nancy Gebler. Coding and editing staff were supervised by Joan Scheffler. Much of the field work and the development of the coding instrument for the Pension Provider Survey were done by Mathematica Policy Research; Miles Maxfield and Tim Carr supervised these activities. Thomas A. Gustafson, of the Department of Health and Human Services, co-authored the questionnaire, taking primary responsibility for the pension questions and overseeing the development of the Pension Provider Survey. Arthur Kennickell, of the Federal Reserve Board, developed the Federal Reserve Board weights, helped in writing this manual, and played a critical role in the later stages of data cleaning and imputation. He also co-authored the 1986 Survey of Consumer Finances.

Many individuals helped in the development of the survey instrument. Particularly noteworthy contributions were made by Glenn B. Canner and James T. Fergus (Federal Reserve Board); Janet Gordon, Melanie Quinn, and Peter Struck (Office of the Comptroller of the Currency); Daniel J. Villegas (Federal Trade Commission); Walter Kolodrubetz (Department of Labor); and Nelson McClung (Office of Tax Analysis). Comments and helpful suggestions were also received from Emily S. Andrews, Stuart B.
Avery, Marshall E. Blume, Thomas A. Durkin, Robert M. Fisher, Gary Gilbert, Arnold A. Heggestad, Malcolm Jensen, F. Thomas Juster, Robert W. Johnson, Myron Kwast, Barbara R. Lowrey, Charles A. Luckett, Olivia S. Mitchell, Dorothy S. Projector, Lawrence H. Summers, Cameron Whiteman, and John D. Wolken. Tom Petska, Fritz Scheuren, and Dan Skelly, of the Statistics of Income Division of the Internal Revenue Service, provided the high-income sample and weights. Research assistance for data cleaning and imputation at the Federal Reserve Board was provided by Aliki Antonatos, Oscar Barnhardt, Phoebe Roaf, Julie Rochlin, and Julia Springer. Additional assistance was provided by Neil Briskman, William Carbaugh, M. Elizabeth Crowell, Charlotte Jackson, Scott Hedges, Pat Ma, Elaine Peterson, Missi Reinkemeyer, Bob Schmitt, Paul Hughes-Cromwick, and Sharon Ward.
WEIGHTS AND I.D. CODES
Inclusion of households in the final "cleaned" survey sample results from a series of implicit stratified selection criteria. There are three major sources of implicit stratification: (1) within the area probability sample, certain types of households turned out to be less likely to participate in the survey when selected; (2) the sample is unlikely to fully reflect the U.S. population because of sampling error in the survey itself; and (3) not all observations turned out to be usable for analysis because of significant missing information due to deliberate or inadvertent actions. A further complication is that two different sampling frames, area probability and IRS tax files, were used to draw observations. Methods used to construct weights that compensate for these sources of stratification and that combine the area probability sample with the high-income observations are described fully in the summary. This section presents information on specific weighting variables. Note that throughout this section, case totals reflect the entire 4,262-observation sample.
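To illustrate the general structure of the area probability weights described in this section, the Python sketch below combines a non-response factor, the reciprocal of the PSU household response rate (as for B3002), with a post-stratification factor that scales the non-response-weighted sample in each cell to a known household total (as for B3003 and B3004). This is a simplified illustration under stated assumptions, not the procedure actually used to build the tape weights: the function names, cell labels, response rates, and control totals are all invented, and the actual weights involve further adjustments described in the summary.

```python
from collections import defaultdict


def nonresponse_factor(psu_response_rate: float) -> float:
    """Reciprocal of the household response rate in the household's PSU (cf. B3002)."""
    if not 0.0 < psu_response_rate <= 1.0:
        raise ValueError("response rate must be in (0, 1]")
    return 1.0 / psu_response_rate


def poststrat_factors(records, control_totals):
    """Factor per cell = known household total / non-response-weighted sample count.

    `records` is a list of dicts with the illustrative keys "cell" and "nr_factor";
    `control_totals` maps each cell to an external household count
    (for example, a Census or CPS figure).
    """
    weighted_count = defaultdict(float)
    for rec in records:
        weighted_count[rec["cell"]] += rec["nr_factor"]
    return {cell: control_totals[cell] / n for cell, n in weighted_count.items()}


def composite_weights(records, control_totals):
    """Composite weight = non-response factor x post-stratification factor."""
    factors = poststrat_factors(records, control_totals)
    return [rec["nr_factor"] * factors[rec["cell"]] for rec in records]


# Toy example: three households in two cells (all numbers invented).
psu_rates = {"psu_a": 0.8, "psu_b": 0.5}
sample = [
    {"cell": "urban", "nr_factor": nonresponse_factor(psu_rates["psu_a"])},
    {"cell": "urban", "nr_factor": nonresponse_factor(psu_rates["psu_b"])},
    {"cell": "rural", "nr_factor": nonresponse_factor(psu_rates["psu_b"])},
]
controls = {"urban": 3_000_000, "rural": 1_000_000}
weights = composite_weights(sample, controls)
print([round(w) for w in weights])  # [1153846, 1846154, 1000000]
```

Because each post-stratification factor is the control total divided by the weighted count in its cell, the composite weights within every cell sum exactly to that cell's control total. On the tape, B3002 is stored times 10,000, so the analogous product for an observation would be B3002/10,000 times B3004, which is how B3005 is described.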
In all other sections of the manual, case totals reflect the 4,103-observation "cleaned" sample.
ENDNOTES
1. The interview questionnaire for the household survey was prepared by Robert B. Avery and Gregory E. Elliehausen, of the Federal Reserve Board, and Thomas A. Gustafson, of the Department of Health and Human Services, with assistance from staffs of the sponsoring agencies. Field work and the editing and coding of survey responses were performed under the direction of Richard T. Curtin of the SRC. Mr. Curtin, together with Timothy Carr and Miles Maxfield of Mathematica Policy Research, administered the Pension Provider Survey. The Statistics of Income Division of the Internal Revenue Service provided the high-income sample.
2. A household consists of all the persons who occupy a housing unit or dwelling. Persons missed by the survey will be disproportionately young, because they are in college or the military, and old, because they are in nursing homes. The latter omission, affecting an estimated 1.4 million people, is the most serious in terms of wealth measurement. However, the failure to include a large number of younger persons is likely to affect the long-run representativeness of the SCF when used as a panel.
3. Non-SMSA counties with less than 2,000 population were linked with adjacent counties to form multi-county PSUs. The SCF sample used the 1970 SRC sampling frame, which was selected from a national population of 2,700 PSUs, of which 12 were self-representing. In addition to SMSA status, the 62 nonself-representing strata were designed to take into account the location (Census Region), the population, the size of the largest city, and the percent manufacturing (urban) or agricultural (rural) employment of each area. In the South Region, the percent black population and a special domain distinction labeled "the Deep South" (South Carolina, Georgia, Alabama, Mississippi, and Louisiana) were also used.
By design, the nonself-representing strata are of approximately equal size, each totaling between 1.7 and 2.5 million population in 1970.
4. Because New York City is so large, it is treated as though it is made up of two PSUs. Thus, we treat the sample as having 75 PSUs, not 74.
5. The sampling of the second-stage units (SSUs) was performed with probability proportionate to size, as measured by the 1970 Census count of year-round housing units. In most areas, the largest or "central" city of each sample PSU was included with certainty. The second stage was dropped from the 1980 SRC National Sample design.
6. The units selected in the third stage are termed "chunks." In urbanized areas, chunks are defined to be housing units within the land area given by a Census Block. However, Blocks with fewer than 16 year-round housing units were combined with adjacent Blocks to meet a minimum 16-unit size. In rural areas, chunks were defined to be compact parcels of land with clearly recognizable physical boundaries (roads, rivers, rail lines, etc.) selected with an expected count of 24 year-round housing units. Within SSUs, the sample of chunks was selected with probability proportionate to each chunk's number of year-round housing units.
7. Once the third stage of selection was complete, SRC personnel performed a complete listing of all housing units within the physical boundaries of each chunk. For the 1983 SCF, about three-fourths of the chunks had been previously listed for other surveys; thus only updating of the listings was necessary. These lists formed the basis of selection for the fourth stage of sampling.
8. Housing units were selected randomly within each chunk. The sampling rate was set inversely proportional to the number of year-round housing units within the chunk as determined by the listings.
9. These procedures followed fairly standard SRC methods.
For a more detailed description of these methods, see Irene Hess, Sampling for Social Research Surveys: 1947-1980, Ann Arbor: Institute for Social Research, 1985.
10. Unfortunately, because of legal restrictions, knowledge of the exact sampling procedures is restricted to employees of the SOI. The sample drawn appears to roughly coincide with individuals having an "adjusted gross income," modified for full capital gains and other exclusions, of $100,000 or more in 1980.
11. Actually, only the 12 self-representing and 31 of the 62 nonself-representing PSUs were used for the high-income sample list. The decision to exclude high-income households in the remaining nonself-representing PSUs was based on a joint consideration of survey costs and the relatively small expected size of the high-income sample. Because the SOI listings were by address, and the area probability PSUs were defined by county, some slight approximations were used in defining the SOI sample. The actual SOI sample was defined by the ZIP codes corresponding to the SRC sample counties, with the county location of the main post office in a ZIP code used when county and ZIP code boundaries did not correspond exactly.
12. The overall response rate of the high-income mailing (9 percent) may not be quite as bad as it appears. SOI typically has response rates of no more than 20 to 30 percent even for mailings extremely favorable to the respondent. The low 1983 SCF response rate was also caused by the failure to send a follow-up letter.
13. These procedures differ slightly from the procedures normally used by SRC for the selection of the household respondent. Generally, only the economic dominance and age-closest-to-45 criteria are used.
14. Rates of income non-response were much higher. Of the high-income sample, 1.8 percent gave no income data, and an additional 4.6 percent gave only partial data. Comparable figures for the edited area probability sample were 5.5 percent and 7.1 percent.
Only 2 percent of the discarded area probability sample respondents gave any income data, and these respondents gave only partial data.
15. Means were computed separately for the high-income and cross-section samples on an item-by-item basis and were based only on respondents who gave dollar values.
16. One household that did not meet these criteria was nevertheless discarded because it reported more than a billion dollars in assets and appeared to be an insincere interview.
17. See Steven G. Heeringa and Richard T. Curtin, "Household Income and Wealth: Sample Design and Estimation for the 1983 Survey of Consumer Finances," Statistics of Income and Related Record Research 1986-1987, Internal Revenue Service, 1987.
18. See Michael Strudler, General Descriptive Booklet for the 1982 Individual Tax Model File, Statistics of Income Division, Internal Revenue Service, 1983.
19. An estimate of the standard error due to sampling of the estimated aggregate of each asset and liability category is given in column 2. These figures were computed by calculating the sample variance of each item within each sampling unit (75 area probability PSUs and nine high-income categories). Assuming independence of sample draws across each of these cells, the variance of an asset or debt category total was then calculated as the sum of the variances of each item included in that category, weighted by the cell populations. Because these estimates take the sampling weights as fixed, they are likely to understate the true sampling variance of the weighted sums.
20. See, for example, George Katona, Louis Mandell, and Jay Schmiedeskamp, 1970 Survey of Consumer Finances, Ann Arbor: Institute for Social Research, 1971.
21. See Thomas A. Durkin and Gregory E. Elliehausen, 1977 Consumer Credit Survey, Washington, D.C.: Board of Governors of the Federal Reserve System, 1978.
22. See Dorothy S. Projector and Gertrude S.
Weiss, Survey of Financial Characteristics of Consumers, Washington, D.C.: Board of Governors of the Federal Reserve System, 1966.
23. See Robert B. Avery, Gregory E. Elliehausen, Arthur B. Kennickell, and Paul A. Spindt, "The Use of Cash and Transaction Accounts by American Families," Federal Reserve Bulletin 72 (February 1986): pp. 87-108; and Robert B. Avery, Gregory E. Elliehausen, Arthur B. Kennickell, and Paul A. Spindt, "Changes in the Use of Transaction Accounts and Cash from 1984 to 1986," Federal Reserve Bulletin 73 (March 1987): pp. 179-196.
24. Detailed discussion of the survey findings can be found in "Household Wealth and Asset Ownership: 1984," Household Economic Studies Series P-70, No. 7 (July 1986), Bureau of the Census; and John M. McNeil and Enrique J. Lamas, "Year-Apart Estimates of Household Net Worth from the Survey of Income and Program Participation," NBER Conference on Research in Income and Wealth, Baltimore, 1987. Richard T. Curtin, F. Thomas Juster, and James N. Morgan, "Survey Estimates of Wealth: An Assessment of Quality," NBER Conference on Research in Income and Wealth, Baltimore, 1987, provide a detailed comparison of the PSID, SIPP, and 1983 SCF wealth data.
25. The 1986 SCF was co-sponsored by the Federal Reserve Board, the Department of Health and Human Services, the Office of the Comptroller of the Currency, and the General Accounting Office.
VARIABLE LISTING AND DEFINITIONS
Observation Code
V1 OBSERVATION CODE. This is a unique observation identifier. It corresponds to the case I.D. on the actual interview facesheet. It was assigned chronologically in the order the interviews were processed. xxxx. code (1 to 4288)
B3001 SAMPLE CODE. This code indicates which sample the observation is in. 1. high-income sample (438 cases) 2. area probability sample, "cleaned" sample (3665 cases) 3. area probability sample, excluded observations (159 cases)
Full Area Probability Sample Weights
B3002 NON-RESPONSE ADJUSTMENT FACTOR.
This variable adjusts the area probability sample for the first type of stratification cited above. The non-response adjustment factor is computed as the reciprocal of the household response rate of the primary sampling unit (PSU) to which the household belongs (see variable B3013). There are 75 different PSUs in the sample (although there are only 64 unique values for B3002). The range of this variable is 1.055 to 2.924, with a mean of 1.41013 for the full area probability sample. xxxxx. weight times 10000 (10,550 to 29,240) 0. high-income sample (438 cases)
B3003 1980 POST-STRATIFICATION WEIGHT. This variable provides one adjustment for the second factor cited above. It adjusts the full area probability sample (weighted by B3002) to have the same total number of households as the 1980 Census (excluding Alaska and Hawaii). It separately weights the sample for the four regions of the country (see B3117), further divided into urban (center city and suburbs, B3118 = 1-4) and rural (adjacent and outlying divisions, B3118 = 5-6). Urban/rural distinctions are determined for each observation according to the treatment of its area in the 1970 Census (although population figures are given as of 1980). B3003 takes on only seven distinct values, with a mean of 14840.1. 13733. urban south (622 cases) 14133. urban northeast (562 cases) 14133. urban north central (625 cases) 15280. rural north central (435 cases) 15400. rural northeast (211 cases) 15613. urban west (482 cases) 15680. rural south (729 cases) 16319. rural west (158 cases) 0. high-income sample (438 cases)
B3004 1983 POST-STRATIFICATION WEIGHT. This variable is identical to B3003 except that a different post-stratification scheme is used. Observations were grouped into five geographic areas within each U.S.
region: (1) central cities of SMSAs with more than 1,000,000 persons (B3119 = 1,2); (2) other areas within SMSAs with more than 1,000,000 persons (B3119 = 4,5); (3) central cities of SMSAs with fewer than 1,000,000 persons (B3119 = 3); (4) other areas within SMSAs with fewer than 1,000,000 persons (B3119 = 6); and (5) non-SMSA areas (B3119 = 7). SMSA and central city distinctions were made according to 1970 Census definitions, because these were used in the basic sampling frame. The one-million-person cutoff, however, was made according to the 1983 estimated population of each 1970-defined SMSA. Post-stratification weights were computed for each of the 20 U.S. areas to blow up the full area probability sample (adjusted for non-response) to the estimated 1983 U.S. total of 83,918,000 households (and to the totals of the 20 subgroups as well). Unlike B3003, the post-stratification for the western region in B3004 includes Alaska and Hawaii. This ex-post weighting scheme is essentially the same as that used by the Census Bureau in reporting the March 1983 CPS, which used a sampling strategy similar to that of the SCF. 9235. northeast, suburban, less than 1,000,000 in the SMSA (215 cases) 10801. north central, suburban, less than 1,000,000 in the SMSA (188 cases) 13123. south, suburban, less than 1,000,000 in the SMSA (265 cases) 14112. north central, non-SMSA (404 cases) 14274. north central, center city, less than 1,000,000 in the SMSA (150 cases) 14958. south, non-SMSA (652 cases) 15191. west, center city, more than 1,000,000 in the SMSA (131 cases) 15533. north central, center city, more than 1,000,000 in the SMSA (144 cases) 15833. south, suburban, more than 1,000,000 in the SMSA (153 cases) 16081. west, non-SMSA (158 cases) 16500. northeast, center city, more than 1,000,000 in the SMSA (144 cases) 16665. northeast, suburban, more than 1,000,000 in the SMSA (192 cases) 17077. northeast, non-SMSA (149 cases) 17202. south, center city, more than 1,000,000 in the SMSA (92 cases) 18123.
west, suburban, less than 1,000,000 in the SMSA (106 cases) 18427. west, suburban, more than 1,000,000 in the SMSA (163 cases) 18439. northeast, center city, less than 1,000,000 in the SMSA (73 cases) 19621. west, center city, less than 1,000,000 in the SMSA (82 cases) 19973. south, center city, less than 1,000,000 in the SMSA (189 cases) 21608. northeast, suburban, more than 1,000,000 in the SMSA (174 cases) 0. high-income sample (438 cases)
B3005 FULL SAMPLE 1983 COMPOSITE WEIGHT. This variable is equal to the non-response adjustment factor weight (B3002, divided by 10,000) times the 1983 post-stratification weight (B3004). THIS IS THE RECOMMENDED WEIGHT TO USE WITH THE FULL AREA PROBABILITY SAMPLE. This weight will "blow up" the 3,824-observation full area probability sample into the aggregate U.S. household population (including Alaska and Hawaii) as measured by the 1983 CPS. The average value of B3005 is 21,945.3, and it totals 83,918,807. xxxxx. weight (10,860 to 50,299) 0. high-income sample (438 cases)
"Cleaned" Area Probability Sample Weights
B3006 INCLUSION PROBIT PREDICTED VALUE. This variable is the "y-hat" of the "cleaned" area probability sample inclusion probit model. xxxxx. "y hat" times 10000 (2,111 to 48,565) 0. high-income sample (438 cases)
B3007 CLEANED SAMPLE INCLUSION WEIGHT. This variable is the inverse of the estimated probability that an observation would be included in the cleaned area probability sample, given that it is in the full area probability sample. Its average is 1.04313. This variable will blow the cleaned sample up into the full sample. xxxxx. weight times 10000 (10,000 to 17,135) 0. high-income sample or excluded area probability sample (597 cases)
B3008 CLEANED AREA PROBABILITY 1980 POST-STRATIFICATION WEIGHT. This variable will post-stratify the cleaned area probability sample to the household totals and regional distribution of the 1980 Census. It is approximately equal to B3003 times B3007 (the post-stratification was changed slightly).
This variable times B3002 (divided by 10,000) is the variable used to weight the data presented in the September 1984 and December 1984 Federal Reserve Bulletin articles on the 1983 SCF. The average value of this weight is 15,496.5 for the 3,665 observations in the cleaned sample. xxxxx. weight (13,958 to 25,536) 0. high-income or excluded area probability sample (597 cases)
B3009 CLEANED AREA PROBABILITY 1983 POST-STRATIFICATION WEIGHT. This variable will post-stratify the cleaned area probability sample to the household totals by region in the 1983 CPS. It is approximately equal to B3004 times B3007. The average value of this weight is 16,217.9 for the 3,665 observations in the cleaned sample. xxxxx. weight (9,276 to 35,950) 0. high-income or excluded area probability sample (597 cases)
B3010 CLEANED-SAMPLE 1983 COMPOSITE WEIGHT. This variable is equal to the non-response adjustment factor weight (B3002, divided by 10,000) times the post-stratification weight (B3009). THIS IS THE RECOMMENDED WEIGHT TO USE WITH THE CLEANED AREA PROBABILITY SAMPLE. This weight will blow the 3,665-observation area probability sample up into the aggregate U.S. household population (including Alaska and Hawaii) as measured by the 1983 CPS. The average value of B3010 is 22,897.4, and it totals 83,919,054. xxxxx. weight (10,974 to 56,337) 0. high-income or excluded area probability sample (597 cases)
B3011 INCLUSION ERROR EXPECTATION. This variable is the expectation of the underlying latent variable error in the cleaned area probability sample probit inclusion equation, conditional on an observation appearing in the cleaned sample (the Mills ratio). Use of this variable as an independent regressor in estimating analytic models may at least partially correct for sample selection bias. xxxxx. error expectation times 10000 (9,370 to 50,409) 0. high-income or excluded area probability sample (597 cases)
High-income and Total Sample Weights
B3012 HIGH-INCOME SAMPLE WEIGHTS.
This variable is given only for the high-income sample and gives relative sampling weights within that sample as computed by the IRS and the Office of Tax Analysis. This weight should generally be the one used when performing analysis using only the high-income sample. B3012 takes on only nine different values, ranging from 60 to 2,533. The nine different classes reflect the original sampling frame and are based primarily on income. The mean of B3012 is 1,280.34. 60. (19 cases) 121. (100 cases) 201. (48 cases) 261. (16 cases) 442. (21 cases) 1528. (58 cases) 2191. (21 cases) 2472. (46 cases) 2533. (109 cases) 0. area probability sample (3824 cases)
B3013 FULL SAMPLE PSU CODE. This variable gives a unique sampling cell number to all observations in the sample. This variable can be used in forming weights. Values 1 through 9 are the nine cells in B3012 for the high-income observations. The remainder indicate area probability sample PSUs. Values 10 through 22 are self-representing PSUs. Values 23 through 54 are PSUs primarily in SMSAs and urban areas. Values 55 through 84 indicate PSUs primarily in rural counties. xx. cell number (1 to 84)
B3014 FULL SAMPLE SRC COMPOSITE WEIGHT. This weight combines the non-response weight (B3002), the 1980 post-stratification weight (B3003), the high-income weights (B3012), and an income-based adjustment to mesh the full area probability sample with the high-income sample. The income adjustment is very slight for area probability sample observations with incomes below $50,000. Area probability sample observations in higher income strata will have a much more significant reduction in their weight. The high-income sample weights are essentially the same as B3012 times 2, with a slight population adjustment. B3014 will blow the full sample up into the aggregate 1980 U.S. household population. xxxxx. weight (173 to 40,069)
B3015 FULL CLEANED SAMPLE SRC COMPOSITE WEIGHT.
This variable is identical to B3014 except that it applies to the cleaned area probability sample and uses the 1983 post-stratification weight B3009. This is the weight that was used for the March 1986 Federal Reserve Bulletin article. xxxxx. weight (182 to 56,264) 0. excluded area probability sample observations (159 cases)
B3016 EXTENDED INCOME FRB WEIGHT. This is a full sample weight which should be similar to B3015 in use. It was constructed by post-stratification to the 1982 IRS tables using extended income (see the summary for a description). THIS IS THE RECOMMENDED FULL SAMPLE WEIGHT. The total number of implied households is 83,917,975, with a mean weight of 20,452.8. No missing values. xxxxx. weight (546 to 56,473) 0. excluded area probability sample observations (159 cases)
B3017 REVISED SRC AREA PROBABILITY WEIGHT. This variable is the revised SRC weight as of 1987. It takes into account the removal of the 159 excluded area probability cases and post-stratifies to July 1, 1983 Census population figures. xxxxx. weight (16,529 to 44,471) 0. high-income and excluded area probability sample observations (597 cases)
B3018 REVISED SRC HIGH-INCOME WEIGHT. This variable is the revised SRC high-income sample weight as of 1987. This weight takes into account different response rates from the self-representing and other PSUs for the initial SOI sampling. xxxx. weight (59 to 11,783) 0. area probability sample (3824 cases)
B3019 REVISED SRC COMPOSITE WEIGHT. This variable is the revised SRC composite weight as of 1987. It combines B3017 and B3018 and is designed to be used with the full 4,103-observation sample. xxxxx. weight (60 to 43,601) 0. excluded area probability sample observations (159 cases)
Pension Provider Survey
(H) Head (husband if married) (S) Spouse (wife)
B3031 (H), B3053 (S) COMPLETION CODE. This variable is a constructed variable indicating whether or not the Pension Provider Survey was successfully completed. 1.
the Pension Provider Survey completed and coded (1181/460 cases) 2. the Pension Provider Survey contact made, but the survey could not be coded (288/67 cases) 3. permission for the Pension Provider Survey given, but could not contact the pension provider (18/3 cases) 4. permission for the Pension Provider Survey given and contact made; however, respondent/spouse actually not eligible for pension and should not have been asked for pension information (38/6 cases) 5. eligible for the Pension Provider Survey, but permission and/or the name of the pension provider not given (147/53 cases) 6. not eligible for Pension Provider Survey (2590/2147 cases) 0. INAP, no spouse (0/1526 cases)
B3032 (H), B3054 (S) PENSION CORRESPONDENCE. This variable indicates which job was identified at the interview as appropriate for the Pension Provider Survey. If the Pension Provider Survey was completed, this variable indicates which job it applies to. 1. Pension Provider Survey corresponds to current job (1537/544 cases) 2. Pension Provider Survey corresponds to the job before retired/disabled or the last paid job if a student or housewife (44/28 cases) 3. Pension Provider Survey corresponds to longest prior job (83/16 cases) 4. Pension Provider Survey corresponds to the job from which respondent expects to or now receives a pension (8/1 cases) 0. INAP, no Pension Provider job information given or no spouse (2590/3673 cases) question: X14/X15
B3033 (H,#1), B3038 (H,#2), B3043 (H,#3), B3048 (H,#4), B3055 (S,#1), B3060 (S,#2), B3065 (S,#3), B3070 (S,#4) PENSION PROVIDER SURVEY PENSION PROVIDER ID NUMBER. This variable indicates the four-digit number assigned to the pension provider sought for this observation. Answered only if contact with the pension provider was sought (B3031 or B3053 coded 1 to 4). Normally only one provider was sought per person. In about ten cases, however, it turned out that an individual had two providers (usually with two plans). In these instances B3033 may differ from B3038, etc. It was common, though, for a person to be in multiple plans provided by the same provider. This will be indicated in the file by the variable B3035, etc. The variable B3033, etc. corresponds to the variable V3 (PPID) in the Provider Survey file. xxxx. code (5001-8033) 0. INAP not in Pension Provider Survey, no spouse, or not that many plans (2737/4132/4239/4261 3726/4219/4258/4262 cases)
B3034 (H,#1), B3039 (H,#2), B3044 (H,#3), B3049 (H,#4), B3056 (S,#1), B3061 (S,#2), B3066 (S,#3), B3071 (S,#4) PENSION PROVIDER SURVEY RESULT CODE. This variable indicates the result of the Pension Provider Survey inquiry for this observation and this plan. Answered only if contact with the pension provider was sought (B3031 or B3053 coded 1 to 4). Except in a few instances, this variable will have the same value for all plans of a person. 1. complete interview (1181/125/22/1/460/42/4/0 cases) 2. partial interview -- no SPD (129/1/0/0/31/0/0/0 cases) 3. complete interview but incomplete SPD (72/4/1/0/10/1/0/0 cases) 4. refusal by provider to complete interview (87/0/0/0/26/0/0/0 cases) 5. no pension plan at provider (28/0/0/0/4/0/0/0 cases) 6. pension plan at provider but respondent/spouse job not covered (10/0/0/0/2/0/0/0 cases) 7. inadequate or incorrect name/address of provider given (4/0/0/0/1/0/0/0 cases) 8. could not locate provider (14/0/0/0/2/0/0/0 cases) 0. INAP not in Pension Provider Survey, no spouse, or not that many plans (2737/4132/4239/4261 3726/4219/4258/4262 cases)
B3035 (H,#1), B3040 (H,#2), B3045 (H,#3), B3050 (H,#4), B3057 (S,#1), B3062 (S,#2), B3067 (S,#3), B3072 (S,#4) PENSION PROVIDER SURVEY PENSION PLAN NUMBER. This variable indicates the three-digit pension plan number assigned to this particular observation in the Pension Provider Survey. Answered only if the Pension Provider Survey was successfully coded (B3031 or B3053 coded 1). This number should be used in combination with the Provider ID for a unique identification of the plan/provider. It corresponds to variable V4 (PLAN #) in the Provider Survey file. xxx. code (1-997) -8. plan number not assigned (30/0/0/0/31/1/0/0 cases) -9. NA, no official plan number (270/11/1/0/151/9/0/0 cases) 0. INAP not in Pension Provider Survey, no spouse, or not that many plans (3081/4137/4240/4261 3802/4220/4258/4262 cases)
B3036 (H,#1), B3041 (H,#2), B3046 (H,#3), B3051 (H,#4), B3058 (S,#1), B3063 (S,#2), B3068 (S,#3), B3073 (S,#4) PENSION PROVIDER SURVEY PENSION PLAN SEQUENCE NUMBER. This variable indicates the sequence code number of the Pension Provider Survey plan that corresponds to the observation. Answered only if the Pension Provider Survey was successfully coded (B3031 or B3053 coded 1). Both head and spouse could have up to four different pension plan sequence IDs. This variable corresponds to the variable V2 (SEQ #) in the Provider Survey file. xxxx. code (1-1132) 0. INAP not in Pension Provider Survey, no spouse, or not that many plans (3081/4137/4240/4261 3802/4220/4258/4262 cases)
B3037 (H,#1), B3042 (H,#2), B3047 (H,#3), B3052 (H,#4), B3059 (S,#1), B3064 (S,#2), B3069 (S,#3), B3074 (S,#4) PENSION PROVIDER SURVEY PENSION INTERVIEW CODING ID. This variable indicates the code number of the Pension Provider Survey plan that corresponds to the observation. Answered only if the Pension Provider Survey was successfully coded (B3031 or B3053 coded 1). Both head and spouse could have up to four different pension plan IDs, although only one pension provider contact was made for each individual. This occurred because some pensions had multiple plans and parts. This variable corresponds to the variable V1 (CODING ID) in the Provider Survey file. If the code is between 1 and 2,999, the plan is a defined benefit plan. If the code is between 3,000 and 4,999, the plan is a defined contribution plan. If the code is 5,000 or over, it signifies a mixed defined benefit/contribution plan. xxxx. code (1-5043) 0. INAP not in Pension Provider Survey, no spouse, or not that many plans (3081/4137/4240/4261 3802/4220/4258/4262 cases)
1986 Survey of Consumer Finances
B3075 1986 SURVEY OF CONSUMER FINANCES CONTACT STATUS. This variable indicates the status of the household in the 1986 Survey of Consumer Finances. 1. excluded area probability observation, not used for either 1983 or 1986 samples (159 cases) 2. household interviewed in 1983 but not in 1986 (1322 cases) 3. household interviewed in 1986 as an intact unit (2612 cases) 4. household split in 1986, both respondent and spouse interviewed separately (41 cases, thus 82 cases in 1986) 5. household split in 1986, only one part (respondent or spouse) interviewed (128 cases)
B3076 (H), B3078 (S) 1986 SURVEY OF CONSUMER FINANCES LOCATION CODE. This variable indicates whether the 1983 head (and/or spouse) participated in the 1986 Survey of Consumer Finances. No missing values. 1. head/spouse married in 1983, still married in 1986, and interviewed in 1986 (counted as one 1986 observation) (1748/1748 cases) 3. head/spouse married in 1983, still married in 1986, and couple found but refused interview in 1986 (284/284 cases) 4. head/spouse married in 1983, still married in 1986, and couple could not complete interview because of physical inability of respondent in 1986 (8/8 cases) 5. head/spouse married in 1983, still married in 1986, and couple could not complete interview because of language problem in 1986 (7/7 cases) 6. head/spouse married in 1983, still married in 1986, and couple could not complete interview because household was overseas during entire study period in 1986 (7/7 cases) 7. head/spouse married in 1983, 1986 marital status unknown, neither respondent nor spouse could be located in 1986 (229/229 cases) 8.
head/spouse married in 1983, 1986 marital status unknown, household not included for 1986 survey because it refused to give either address or phone recontact information in 1983 (46/46 cases) 9. head/spouse married in 1983, 1986 marital status unknown, household not included for 1986 survey because it was excluded by SRC randomly (109/109 cases) 10. head/spouse married in 1983, 1986 marital status unknown, household not included for 1986 survey because it was in the excluded area probability sample in 1983 (it did give recontact information, however) (93/93 cases) 11. head/spouse married in 1983, no longer together in 1986, and head (or spouse) interviewed in 1986 (thus if both B3076 and B3078 are coded 11, the household counts as two 1986 observations; if only one is coded 11, it counts as one 1986 observation) (88/112 cases) 12. head/spouse married in 1983, no longer together in 1986, and head (or spouse) gave a partial interview in 1986 (not treated as a 1986 observation) (9/9 cases) 13. head/spouse married in 1983, no longer together in 1986, and head (or spouse) found but refused interview in 1986 (5/12 cases) 14. head/spouse married in 1983, no longer together in 1986, and head (or spouse) could not complete interview because of physical inability in 1986 (3/3 cases) 16. head/spouse married in 1983, no longer together in 1986, and head (or spouse) could not complete interview because household was overseas during entire study period in 1986 (0/1 cases) 17. head/spouse married in 1983, head (or spouse) deceased in 1986 (73/36 cases) 18. head/spouse married in 1983, no longer together in 1986, and head (or spouse) could not be located in 1986 (23/20 cases) 19. head/spouse living as partners in 1983, no longer together in 1986, and head (or spouse) was not pursued for an interview in 1986 (4/2 cases) 21. respondent not married in 1983, interviewed in 1986 (counted as one 1986 observation) (864/0 cases) 23.
respondent not married in 1983, respondent found but refused interview in 1986 (131/0 cases) 24. respondent not married in 1983, respondent could not complete interview because of physical inability in 1986 (14/0 cases) 25. respondent not married in 1983, respndent could not complete interview because of language problem in 1986 (3/0 cases) 26. respondent not married in 1983, could not complete interview because respndent was overseas during entire study period in 1986 (4/0 cases) 27. respondent not married in 1983, respondent deceased in 1986 (45/0 cases) 28. respondent not married in 1983, respondent could not be located in 1986 (253/0 cases) 29. respondent not married in 1983, household not included for 1986 survey because it refused to give address or phone recontact information in 1983 (35/0 cases) 30. respondent not married in 1983, household not included for 1986 survey because it was excluded by SRC randomly (126/0 cases) 31. respondent not married in 1983, household not included for 1986 survey because it was in the excluded area probability sample in 1983 (it did give recontact information, however) (51/0 cases) 0. INAP, no spouse (0/1526 cases) B3077 (H) 1986 SURVEY OF CONSUMER FINANCES ID CODE. B3079 (S) This variable is the 1986 ID code (V1 number) corresponding to the head (and spouse in B3043 if interviewed separately). xxxx. ID number (17-7340) 0. INAP, household excluded from 1986 survey or no spouse (460/1774 cases) HOUSEHOLD DEMOGRAPHICS Persons in Household B3101 TOTAL NUMBER OF PERSONS IN HOUSEHOLD (PRIMARY FAMILY). This is the total number of people in the household (or primary family) referred to throughout the questionnaire. It excludes all non-related persons who live in the household unit (dwelling) but are not in the primary family. This variable corresponds to the Census Bureau's terms "family" or "non-family householder" and SRC's term "family unit." 
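The household-count variables in this section can be cross-checked against one another: the B3104 entry states that B3104 (persons under 18, excluding respondent and spouse) plus B3102 (persons 18 or older, always including respondent and spouse) equals B3101. The Python sketch below illustrates such a consistency check. It assumes, hypothetically, that each household record has already been read into a dict keyed by codebook variable name; that layout, the helper names, and the sample records are illustrative assumptions, not part of the SRC or Federal Reserve release.

```python
from typing import Dict, List

# Hypothetical record layout: one dict per household, keyed by codebook
# variable name. How the data file is read into this form is an assumption.
Record = Dict[str, int]


def household_size_consistent(rec: Record) -> bool:
    """Return True if B3102 (persons 18+) plus B3104 (persons under 18,
    excluding respondent/spouse) equals B3101 (total persons in primary family)."""
    return rec["B3102"] + rec["B3104"] == rec["B3101"]


def inconsistent_records(records: List[Record]) -> List[int]:
    """Return the positions of records that violate the identity."""
    return [i for i, rec in enumerate(records) if not household_size_consistent(rec)]


# Made-up example records, for illustration only (not survey data).
sample = [
    {"B3101": 4, "B3102": 2, "B3104": 2},  # couple with two children: consistent
    {"B3101": 1, "B3102": 1, "B3104": 0},  # one-person household: consistent
    {"B3101": 3, "B3102": 2, "B3104": 2},  # totals disagree: flagged
]
print(inconsistent_records(sample))  # [2]
```

Since the codebook reports no missing values for these three variables, a check of this kind should flag nothing on the released file; any flagged record would point to a transcription or reading error.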
As indicated in the summary, for household units with multiple families, only the primary family was interviewed. Household composition is taken from the interviewer coding sheet. No missing values.
1. one (938 cases)
2. two (1272 cases)
3. three (717 cases)
4. four (683 cases)
5. five (307 cases)
6. six (115 cases)
7. seven (40 cases)
8. eight (21 cases)
9. nine (7 cases)
11. eleven (2 cases)
13. thirteen (1 case)

B3102 TOTAL NUMBER OF PERSONS IN HOUSEHOLD 18 OR OLDER. This total excludes any non-related persons who live in the household structure but are not in the primary family. Age is determined from the interviewer coding sheet. The respondent and spouse are included here even if under 18. No missing values.
1. one (1169 cases)
2. two (2232 cases)
3. three (482 cases)
4. four (169 cases)
5. five (34 cases)
6. six (15 cases)
7. seven (1 case)
8. eight (1 case)

B3103 TOTAL NUMBER OF PERSONS IN HOUSEHOLD 65 OR OLDER. This total excludes any non-related persons who live in the household structure but are not in the primary family. Age is determined from the interviewer coding sheet. No missing values.
1. one (588 cases)
2. two (320 cases)
3. three (6 cases)
4. four (1 case)
0. no household members 65 or older (3188 cases)

B3104 TOTAL NUMBER OF PERSONS IN HOUSEHOLD UNDER 18. This total excludes any non-related persons who live in the household structure but are not in the primary family. Age is determined from the interviewer coding sheet. The respondent and spouse are not included here even if under 18 (B3104 plus B3102 equals B3101). No missing values.
1. one (667 cases)
2. two (624 cases)
3. three (228 cases)
4. four (71 cases)
5. five (27 cases)
6. six (5 cases)
7. seven (6 cases)
8. eight (2 cases)
0. no household members under 18 (2473 cases)

B3105 AGE OF YOUNGEST CHILD UNDER 18. Excludes any non-related persons who live in the household structure but are not in the primary family.
It includes all children under 18, not just the children of the respondent and/or spouse. Age is determined from the interviewer coding sheet. The respondent and spouse are not listed here even if under 18. No missing values.
1. one (288 cases)
2. two (158 cases)
3. three (88 cases)
4. four (92 cases)
5. five (84 cases)
6. six (82 cases)
7. seven (76 cases)
8. eight (79 cases)
9. nine (75 cases)
10. ten (76 cases)
11. eleven (73 cases)
12. twelve (77 cases)
13. thirteen (82 cases)
14. fourteen (66 cases)
15. fifteen (64 cases)
16. sixteen (84 cases)
17. seventeen (86 cases)
0. no household members under 18 (2473 cases)

B3106 AGE OF OLDEST CHILD UNDER 18. Excludes any non-related persons who live in the household structure but are not in the primary family. Age is determined from the interviewer coding sheet. The respondent and spouse are not listed here even if under 18. No missing values.
1. one (114 cases)
2. two (81 cases)
3. three (69 cases)
4. four (64 cases)
5. five (67 cases)
6. six (70 cases)
7. seven (64 cases)
8. eight (75 cases)
9. nine (77 cases)
10. ten (79 cases)
11. eleven (81 cases)
12. twelve (89 cases)
13. thirteen (95 cases)
14. fourteen (104 cases)
15. fifteen (112 cases)
16. sixteen (175 cases)
17. seventeen (214 cases)
0. no household members under 18 (2473 cases)

B3107 NUMBER OF CHILDREN OF RESPONDENT/SPOUSE NOT LIVING WITH THEM. Indicates the number of children of either the respondent or spouse who do not live in the household (and thus are not included in the totals above). This includes children of previous marriages living with former spouses, as well as older children in college or living on their own. No persons listed on the interviewer coding sheet (see B3125-B3154) are included here. No missing values.
1. one (565 cases)
2. two (628 cases)
3. three (421 cases)
4. four (257 cases)
5. five (121 cases)
6. six (73 cases)
7. seven (34 cases)
8. eight (17 cases)
9. nine (13 cases)
10. ten (5 cases)
11. eleven (5 cases)
12. twelve (4 cases)
13. thirteen (1 case)
16. sixteen (1 case)
17. seventeen (1 case)
0. none (1957 cases)
question: R63/R63a

B3108 TOTAL NUMBER OF CHILDREN OF RESPONDENT AND/OR SPOUSE. The total number of living children of the respondent and/or spouse, including those not living in the household (B3107 plus the children of the respondent and/or spouse included in B3104). No missing values.
1. one (633 cases)
2. two (1071 cases)
3. three (672 cases)
4. four (418 cases)
5. five (233 cases)
6. six (117 cases)
7. seven (69 cases)
8. eight (45 cases)
9. nine (22 cases)
10. ten (15 cases)
11. eleven (10 cases)
12. twelve (4 cases)
13. thirteen (4 cases)
14. fourteen (2 cases)
17. seventeen (2 cases)
0. none (786 cases)

B3109 HOUSEHOLD UNIT COMPOSITION CODE. Type of household unit; describes the residents of the household unit or dwelling.
1. nuclear family -- single persons living by themselves or only with spouse and/or children (3634 cases)
2. extended family -- nuclear family plus other related persons living in the household (brother, parent, etc.) (296 cases)
3. unrelated persons only -- household dwelling includes only the respondent plus other unrelated individuals (roommates, etc.). These individuals would be termed unrelated individuals or residents of group quarters by the Census Bureau (133 cases)
4. nuclear family plus -- household dwelling includes a nuclear family (respondent plus spouse and/or children) plus at least one unrelated individual (a Census-defined unrelated subfamily, formerly called a secondary family, or an unrelated individual) (31 cases)
5. extended family plus -- household dwelling includes an extended family (respondent plus other relatives) plus at least one unrelated individual (a Census-defined unrelated subfamily or unrelated individual) (9 cases)

Household Characteristics

B4503 AGE OF HEAD BY DATE OF BIRTH. The head is the respondent for single persons and the husband for married couples. No missing values.
xx. years (15-98)

B3110 AGE OF HEAD BY DATE OF BIRTH -- RECODE. A recode of B4503.
1. under 25 (295 cases)
2. 25-34 (862 cases)
3. 35-44 (777 cases)
4. 45-54 (680 cases)
5. 55-64 (673 cases)
6. 65-74 (527 cases)
7. 75 and over (289 cases)

B3126 SEX OF HEAD. The head is the respondent for single persons and the husband for married couples. No missing values.
1. male (3135 cases)
2. female (968 cases)

B3111 RACE OF HOUSEHOLD. This variable is the observed race of the survey respondent. All missing values were imputed using census data and other sources.
1. Caucasian except Hispanic (3468 cases)
2. black except Hispanic (478 cases)
3. Hispanic (111 cases)
4. American Indian or Alaskan Native (9 cases)
5. Asian or Pacific Islander (37 cases)
question: X3

B3112 MARITAL STATUS OF RESPONDENT. No missing values (no imputations were needed).
1. married (includes common-law marriage or couples living together as "partners") (2635 cases)
2. separated (144 cases)
3. divorced (431 cases)
4. widowed (442 cases)
5. never married (451 cases)
question: R59

B3113 EDUCATION OF HEAD -- RECODE. A recode of B4505 through B4507. The head is the respondent for single persons and the husband for married couples.
1. 0-8 grades (560 cases)
2. 9-12 grades, no high school diploma (511 cases)
3. high school diploma or equivalent, no college (1201 cases)
4. some college, no college degree (678 cases)
5. college degree (1153 cases)

B3114 OCCUPATION OF HEAD -- RECODE. Recode of current job if working, or of previous job if retired, disabled, or unemployed. The head is the respondent for single persons and the husband for married couples.
1. professional, technical, and kindred workers (640 cases)
2. managers and administrators (except farm) (642 cases)
3. self-employed managers (234 cases)
4. sales, clerical, and kindred workers (518 cases)
5. craftsmen, protective service, and kindred workers (675 cases)
6. operatives, laborers, and service workers (1167 cases)
7. farmers and farm managers (92 cases)
8. miscellaneous (members of armed services, housewives, students, never worked, and other occupations) (135 cases)

B3115 LABOR FORCE PARTICIPATION -- RECODE.
1. single household, not in labor force (635 cases)
2. single household, in labor force (833 cases)
3. respondent and spouse household, neither in labor force (389 cases)
4. respondent and spouse household, one in labor force (1077 cases)
5. respondent and spouse household, both in labor force (1169 cases)

B3116 LIFE-CYCLE STAGE OF HOUSEHOLD. The head is the respondent for single persons and the husband for married couples. No missing values.
1. neither respondent nor spouse 65 or over, with some relative of respondent or spouse over age 18 living in household, but no relatives under 18 (1040 cases)
2. neither respondent nor spouse 65 or over, no other relatives living in household (623 cases)
3. either respondent or spouse 65 or over, with some relative of respondent or spouse over age 18 living in household, but no relatives under 18 (495 cases)
4. either respondent or spouse 65 or over, no other relatives living in household (315 cases)
5. head married, relatives 18 or under living in household (1238 cases)
6. female-headed household (must be single), relatives 18 or under living in the household, but no relative over 18 (270 cases)
7. unmarried head, relatives 18 or under in the household, and either male-headed or female-headed with other relatives over 18 present (122 cases)

B3201 TOTAL 1982 HOUSEHOLD INCOME. Total reported income. No missing values.
xxxxxxx. dollars (-24,062 to 3,425,887)

B3203 TOTAL 1982 HOUSEHOLD INCOME -- RECODED. A recode of B3201.
1. less than $5,000 (351 cases)
2. $5,000-7,499 (298 cases)
3. $7,500-9,999 (263 cases)
4. $10,000-14,999 (525 cases)
5. $15,000-19,999 (457 cases)
6. $20,000-24,999 (385 cases)
7. $25,000-29,999 (326 cases)
8. $30,000-39,999 (462 cases)
9. $40,000-49,999 (255 cases)
10. $50,000 or more (781 cases)

Geographic Location

B3117 REGION OF THE COUNTRY.
Not given for the high-income sample.
1. northeast (Maine, Massachusetts, Connecticut, New York, New Jersey, Pennsylvania) (737 cases)
2. north central (Ohio, Indiana, Illinois, Michigan, Wisconsin, Minnesota, Missouri, Nebraska, Iowa, South Dakota) (1016 cases)
3. south (Virginia, North Carolina, South Carolina, Georgia, Florida, Alabama, Mississippi, Louisiana, Arkansas, Tennessee, Texas, Oklahoma, Kentucky, Maryland, District of Columbia, West Virginia) (1289 cases)
4. west (Colorado, Utah, Arizona, California, Oregon, Washington) (623 cases)
0. high-income sample (438 cases)

B3118 BELT CODE. This variable was coded according to the 1970 Census, with additions from Census population reports. It was used for the 1980 post-stratification weight (B3003). Not given for the high-income sample.
1. central cities of the two Standard Consolidated Areas (SCAs) plus the ten largest SMSAs: New York, Los Angeles, Chicago, Philadelphia, Boston, Washington, Baltimore, Detroit, San Francisco, St. Louis, Cleveland, Pittsburgh. These are the self-representing PSUs (316 cases)
2. central cities of other SMSAs (648 cases)
3. suburbs of the two SCAs or ten largest SMSAs. Suburbs are defined as all urbanized areas within the SMSA exclusive of the central city, plus the remainder of any county containing a central city or part of a central city (516 cases)
4. suburbs of other SMSAs (714 cases)
5. adjacent areas. An adjacent area includes all territory beyond the outer boundary of the suburban belt but within fifty miles of the central business district of a central city. This can still be in the SMSA (844 cases)
6. outlying areas. An outlying area includes all territory more than fifty miles from the central business district of a central city. This can still be in the SMSA (627 cases)
0. high-income sample (438 cases)

B3119 1970 SMSA CODE. This variable was coded acc