-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
CODEBOOK FOR 2004 SURVEY OF CONSUMER FINANCES
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

Survey of Consumer Finances
Financial Studies Section
Division of Research and Statistics
Board of Governors of the Federal Reserve System
Mail Stop 153
Washington, DC 20551

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
To: Users of the 2004 SCF
From: Arthur Kennickell, SCF Project Director
Date: February 9, 2006
Subject: Description of the final version of the 2004 SCF
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

WARNING: This codebook contains over 100,000 lines of text, including this introduction, variable descriptions, the source code for the computer program used to collect the survey data, and other material. Most users will probably NOT want to print the entire document. Generally, we recommend working with the material in a text editor using a non-proportional font for display.

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

The codebook serves as the principal guide to the variables included on the final public version (February 20, 2006 version) of the 2004 SCF data set. However, not every variable included in this codebook is actually in the public use data set.
Among other things, the data set does NOT include most variables related to the sample design, details of geography, or the 3-digit industry and occupation codes. Although we have attempted to mark the variables in the codebook that are not available to the public, there may be errors and omissions. The definitive list of the variables included is given at the end of this file. Please consult that list to determine whether a given variable is available to you.

The SCF is sponsored by the Board of Governors of the Federal Reserve System in cooperation with the Statistics of Income Division of the Internal Revenue Service. Data for the 2004 SCF were collected by NORC, a national organization for research and computing at the University of Chicago.

For a general overview of the 2004 SCF, see Brian K. Bucks, Arthur B. Kennickell, and Kevin B. Moore, "Recent Changes in U.S. Family Finances: Evidence from the 2001 and 2004 Survey of Consumer Finances," Federal Reserve Bulletin, www.federalreserve.gov/pubs/2005/05index.htm, winter 2006.

Results you may obtain from using this release of the 2004 SCF data may differ from those reported in this article for several reasons. First, the analysis weights used in that article were altered to provide robust estimates of the detailed categories shown. In brief, the data were examined for extreme outliers, and where a given case was overly influential in determining an outcome, the weight was trimmed and other weights were inflated to maintain a constant population. Second, as noted below, the public version of the data has been systematically altered to minimize the likelihood that unusual individual cases could be identified. Our analysis of the public data set suggests that these changes should not alter the conclusions of reasonable analyses of the data. Finally, over time we correct errors that we find in the data set. In our past experience, the effects of such errors on the estimates have been quite small.
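The first reason given above (trimming the weight of an overly influential case and inflating the other weights to hold the population constant) can be illustrated with a small sketch. It is written in Python rather than the SAS or Stata used elsewhere in this codebook, and it is not the procedure used for the Bulletin article: how influential cases were identified and how far their weights were trimmed is not described here, so the `flagged` set and the `factor` argument are purely hypothetical inputs.

```python
def trim_and_rebalance(weights, flagged, factor):
    """Scale down flagged weights and inflate the rest so the total is unchanged.

    weights : list of analysis weights, one per case
    flagged : set of list positions judged overly influential (hypothetical input)
    factor  : multiplier in (0, 1] applied to flagged weights (hypothetical input)
    """
    total = sum(weights)
    # Trim the flagged cases.
    trimmed = [w * factor if i in flagged else w for i, w in enumerate(weights)]
    flagged_sum = sum(trimmed[i] for i in flagged)
    rest_sum = sum(w for i, w in enumerate(trimmed) if i not in flagged)
    if rest_sum == 0:
        raise ValueError("no unflagged cases available to absorb the trimmed weight")
    # Inflate every unflagged weight by the same proportion.
    scale = (total - flagged_sum) / rest_sum
    return [w if i in flagged else w * scale for i, w in enumerate(trimmed)]


original = [100.0, 200.0, 700.0]
adjusted = trim_and_rebalance(original, flagged={2}, factor=0.5)
```

In this toy case the third weight is halved and the first two are inflated proportionally, so the weighted population total is the same before and after.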
This codebook is intended to provide only an overview of the most critical technical elements of the survey. For more details, see Arthur B. Kennickell "Wealth Measurement in the Survey of Consumer Finances: Methodology and Directions for Future Research," May 2000, http://www.federalreserve.gov/pub/oss/oss2/method.html and references cited in that paper.

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
TABLE OF CONTENTS
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

DATA FILES INCLUDED ON SCF WEB SITE FOR THIS RELEASE
QUESTIONNAIRE
UNIT OF ANALYSIS
SAMPLE DESIGN
CODEBOOK CONVENTIONS
VARIABLE NAMES
GENERAL DATA CONVENTIONS
CASE ID NUMBERS
"OTHER" CODES
GRIDS
SUMMARY VARIABLES
DATA REVIEW
IMPUTATION
DISCUSSION OF RANGE DATA COLLECTION AND J-CODES
ANALYSIS WEIGHTS
SAMPLING ERROR
DISCLOSURE REVIEW
COMPARISON WITH OTHER DATA
ACKNOWLEDGMENTS
VARIABLE DEFINITIONS
SURVEYCRAFT PROGRAM, MAIN QUESTIONNAIRE (ENGLISH VERSION)
SURVEYCRAFT PROGRAM, MAIN QUESTIONNAIRE (SPANISH VERSION)
SURVEYCRAFT PROGRAM, PROBE$
SURVEYCRAFT PROGRAM, INTERVIEWER COMMENTS
MAPPING FROM SURVEYCRAFT VARIABLES TO SCF VARIABLES
LIST OF VARIABLES INCLUDED IN PUBLIC DATA SET

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
DATA FILES INCLUDED ON SCF WEB SITE FOR THIS RELEASE
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

The primary data files for the survey consist of the following elements: (1) the main data set, (2) a file of replicate weights corresponding to X42001 (see below for a description of the replicate weights), and (3) an aggregated version of the main data set
containing summary variables corresponding to those used in "Recent Changes in U.S. Family Finances: Evidence from the 2001 and 2004 Survey of Consumer Finances" (Brian K. Bucks, Arthur B. Kennickell, and Kevin B. Moore, Federal Reserve Bulletin, 2006). The data are provided in a variety of formats. To aid users in reconciling their calculations with those found in the 2006 Bulletin article, two sets of tables comparable to those in the article are provided: the first set is based on the current internal version of the data, and the second version is based on the current public version of the data. Finally, in the Excel version of the aggregated data file, a table-making utility is provided for users who wish to make customized calculations. ----------------------------------------------------------------------------- ----------------------------------------------------------------------------- QUESTIONNAIRE ----------------------------------------------------------------------------- ----------------------------------------------------------------------------- The 2004 SCF was collected using computer-assisted personal interviewing (CAPI). Thus, there is no questionnaire in the usual sense. This codebook serves as the most comprehensive guide to the definitions of variables included in the survey. Later in this file, copies of the Surveycraft (SC) programs that were used to collect the data are included. The SC programs serve as the authoritative reference for questions relating to question ordering and skip sequences. Because question ordering is important in understanding the meaning of many questions, users of the data are encouraged to consult the SC program. After the SC program, a translation of most SC variables into SCF variables is provided. 
Although there is usually a direct correspondence between the SC variables and the final variables listed in this codebook, there are some places where the connections are indirect: In some cases, the same question is asked in two different places, and in the final data set all instances of answers to the question are mapped into a single location; in other cases variables may be inferred from other information (for example, if a respondent reported a wage on a current job and reported that their employer contributed a certain percent of their wage to a pension plan, then the dollar contribution to the plan would be filled in). Almost always, the data rearrangements can be identified from the shadow variables associated with the variables (see section "VARIABLE NAMES" below).

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
UNIT OF ANALYSIS
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

Most of the data in the survey are intended to represent the financial characteristics of a subset of the household unit referred to as the "primary economic unit" (PEU). In brief, the PEU consists of an economically dominant single individual or couple (married or living as partners) in a household and all other individuals in the household who are financially interdependent with that individual or couple. For example, in the case of a household composed of a married couple who own their home, a minor child, a dependent adult child, and a financially independent parent of one of the members of the couple, the PEU would be the couple and the two children. Summary information is collected at the end of the interview for all household members who are not included in the PEU.
The only variables collected separately for the respondent and the spouse or partner of the respondent are those concerning employment, pension, and demographic characteristics. Throughout the codebook, we refer to the "head" of the household. The use of this term is euphemistic and merely reflects the systematic way in which the data set has been organized. The head is taken to be the single core individual in a PEU without a core couple. In a PEU with a central couple, the head is taken to be either the male in a mixed-sex couple or the older individual in the case of a same-sex couple. No judgment about the internal organization of the households is implied by this organization of the data. When the original respondent was someone other than the person determined to be the head in this sense, all data (including response codes) for the two members of the couple were systematically swapped. The variable X8000 indicates which cases have been subjected to such rearrangement. NOTE: Because only limited information is collected on the ownership of assets and liabilities within the PEU, it is not possible, in general, to make direct separate estimates of the financial characteristics of the individuals in the survey households unless one is prepared to make a number of fairly complex assumptions. To understand this point more thoroughly, there is no substitute for a careful reading of the actual survey questions. ----------------------------------------------------------------------------- ----------------------------------------------------------------------------- SAMPLE DESIGN ----------------------------------------------------------------------------- ----------------------------------------------------------------------------- The SCF is based on a dual-frame sample design. One set of the survey cases was selected from a standard multi-stage area-probability design. 
This part of the sample, which contributed 3,007 cases to the final set of interviews, is intended to provide good coverage of characteristics, such as home ownership, that are broadly distributed in the population. The other set of the survey cases was selected as a list sample from statistical records (the Individual Research Tax File) derived from tax data by the Statistics of Income Division of the Internal Revenue Service (SOI). These records were made available under strict rules governing confidentiality, the rights of potential respondents to refuse participation in the survey, and the types of information that can be made available. This second sample was designed to disproportionately select families that were likely to be relatively wealthy (see "Modeling Wealth with Multiple Observations of Income: Redesign of the Sample for the 2001 Survey of Consumer Finances," Arthur B. Kennickell, October 2001, http://www.federalreserve.gov/pubs/oss/oss2/method.html for a more extended discussion of the design of the list sample). The list sample contributed 1,515 cases to the final set of interviews.

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
CODEBOOK CONVENTIONS
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

For many purposes it is useful to know which responses were available to the interviewer and which were actually known by the respondent. Responses that are noted in the codeframes below by an asterisk are ones that were available to the interviewer on the screen during the interview. In general, if a response is given in the codebook in lower case letters, this indicates that it was permissible for the interviewer to read it to the respondent.
Responses listed in all upper case letters are ones that were not intended to be read to the respondent. Codes that result from the recoding of responses originally reported as "other" are also given in lower case letters. Other subsidiary question texts given in capital letters are intended as interviewer instructions.

In some cases, codes were available conditional on responses to earlier questions. One such example that appears throughout the interview is the reporting of institutions where the respondent has accounts of some type. If the respondent reported fewer than seven financial institutions at X305, every time the interviewer came to a question that asked about the institution where the respondent had an account, the screen displayed the names of the already listed institutions (referred to as "Institution 1" etc. in the codebook), a code for "add an institution," and a code to enter to record an unusual type of institution ("not a financial institution"). Once seven institutions had been recorded (either at X305 or by adding institutions later in the interview), the screen displayed the names of the six institutions, the "not a financial institution" field, and a set of codes for the type of institution (i.e., commercial bank, savings and loan or savings bank, credit union, etc.).

For many questions there are multiple versions. Most commonly, there are variants that are appropriate for single individuals and ones appropriate for families of two or more. Some other variants are more complicated. For example, suppose that a respondent lives in a building with multiple housing units (X702=1), the family owns the entire building (X714=1), and they own the unit they live in separately from the rest of the building. The CAPI program stores the information that there is such a property.
Later in the interview when the respondent is asked about the number of investment real estate and vacation properties, one variant of question X1701 reminds the respondent to include the property mentioned earlier. There are many other such instances where the computer alters questions to suit the previous answers given by the respondent, and this codebook attempts to provide at least a summary form of all the possible questions. For example, at X1711 (correspondingly at X1811 and X1911), the respondent is asked whether there are any outstanding loans on a property. If the respondent had previously reported at X1703 (correspondingly at X1803 and X1903) that the property was a time-share, then the variant for time-shares is asked; otherwise a more generic question is asked.

Telephone interviewing has long been important in the SCF, but it was particularly so for the 2004 survey; 44.7 percent of all interviews were completed by telephone. At the beginning of the interview, the interviewer entered a response to X7578 to indicate the mode of the interview. Because the normal visual cues, such as show cards, are not usually available to respondents for telephone interviews, the question text needed to be altered, generally in simple ways. Throughout the codebook, alternative versions of questions used in telephone interviews have been flagged.

Of the 4,522 interviews, 155 were conducted in Spanish. The Spanish text is not provided in the main section of the codebook, but the interested user may consult the SC program for the Spanish version of the instrument, which is provided later in the codebook.

The authoritative references for determining how the information was presented to the respondent are the executable CAPI program, which is available at http://www.federalreserve.gov/pubs/oss/oss2/2004/scf2004home.html, and the text of the computer program used by interviewers to collect the data; the text of this program is appended to this codebook.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
VARIABLE NAMES
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

The codebook refers to the variables by the names they have in the version of the survey data set formatted for use with SAS. These names consist of a number prefixed by an "X." We have tried, insofar as it was possible, to retain the variable numbering system used in earlier SCFs. Where the content of a variable has changed in a substantive way, we have assigned a new variable number.

Since the 2001 SCF, the following variables were added:
1) variables associated with the newly added 7th financial institution X332, X6656-X6663, X6894-X6897, X334
2) X7138, X7580, X9258, X1226, X7023-X7034, X3024-X3030, X3726
3) variables associated with the new savings grid beginning with X3727 (see variable definitions below)
4) X6551-X6574, X6756-X6758, X7785, X7787, X6575-X6593, X6595-X6599, X6900-X6905, X6933-X6999, X8467-X8472, X7004
5) variables associated with the new current job pension grid beginning with X11001 (see variable definitions below)
6) variables associated with the newly added 6th cash settlement X5818-X5822.

Variables dropped since 2001 include:
1) X7130, X2103, X2110, X2204, X2304, X2404, X7151
2) variables associated with the "old" money market grid, the "old" savings grid, and the old current job pension grid and
3) X6445, X6449, X6453, X3610, X3620, X3630, X3631, X6816-X6820, X6826, X6831-X6835, X6841, X6463, X6468, X6473, X6478, X6483, X6488, X6804, X6805, X8455, X8456, X6491-X6496, X6807, X6808, X8458, X8459, X7666, X7667.
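When checking a data extract against the added and dropped variables listed above, it can help to expand range entries such as X6656-X6663 into individual names. The following is a minimal Python sketch (not part of the SAS materials this codebook describes). It assumes that a range denotes every integer between its endpoints; that assumption is not guaranteed by the codebook, so the result should be treated as a list of candidate names to verify against the list of variables at the end of this file. The function names are illustrative only.

```python
import re

def expand_range(spec):
    """Expand 'X6656-X6663' into individual names; return a single name unchanged."""
    spec = spec.strip()
    match = re.fullmatch(r"([A-Z]+)(\d+)-([A-Z]+)(\d+)", spec)
    if match is None:
        return [spec]
    prefix_lo, lo, prefix_hi, hi = match.groups()
    if prefix_lo != prefix_hi or int(hi) < int(lo):
        raise ValueError(f"malformed range: {spec}")
    # ASSUMPTION: every number between the endpoints is a candidate variable.
    return [f"{prefix_lo}{n}" for n in range(int(lo), int(hi) + 1)]

def expand_list(text):
    """Expand a comma-separated list of names and ranges."""
    names = []
    for part in text.split(","):
        if part.strip():
            names.extend(expand_range(part))
    return names

candidates = expand_list("X332, X6656-X6663, X6894-X6897, X334")
```

Here `candidates` holds 14 names: X332, the eight names X6656 through X6663, the four names X6894 through X6897, and X334.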
Each of the variables in the main data set has a "shadow" variable that describes--in almost all cases--the original state of the variable (i.e., whether it was missing for some reason, a range response was given, etc.). The most important exception is reported values which have been imputed or otherwise altered to protect the privacy of respondents (see "DISCLOSURE REVIEW" below). Users who so desire may use the shadow variables to restore the data to something very close to their original condition. The shadow variables have the same numbers as the main variable, but have a prefix of "J." A list of the values taken by the shadow variables is given in the section below entitled "DISCUSSION OF RANGE DATA COLLECTION AND J-CODES." ----------------------------------------------------------------------------- ----------------------------------------------------------------------------- GENERAL DATA CONVENTIONS ----------------------------------------------------------------------------- ----------------------------------------------------------------------------- Throughout the SCF data set, a value of zero has only one meaning: that the item in question is inapplicable. That is, if a family does not have a checking account, then the number of checking accounts they own would be coded as a zero. Whenever zero is a legitimate response to a question, a value of -1 is used to signify that value. Other specialized codes are defined for specific variables in the codebook. ----------------------------------------------------------------------------- ----------------------------------------------------------------------------- CASE ID NUMBERS ----------------------------------------------------------------------------- ----------------------------------------------------------------------------- Under the original case numbering system (XX1), important aspects of the sample design are apparent from the identification numbers. 
Because such information is not releasable under the agreements which allow us to collect the data, each case included in the public version of the data set has been given a random identification number (YY1). Users should note that it is not possible to know with certainty from the information provided in the public version of this data set which cases derive from the list sample.

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
"OTHER" CODES
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

In almost every case where a respondent could supply a response that did not fit in the codeframe offered to interviewers on their computer screens, the CAPI program was constructed to allow the entry of a verbatim response. There were a few open-ended questions that were set up to accept only a verbatim response. All of these verbatim responses were run through a standard coding process at NORC. Once the data were at the FRB, strenuous efforts were made to resolve all instances of responses that were unresolved by the NORC coders. Responses that remain coded "other" in the final data set are unusual, but legitimate responses which do not fit within the existing codeframe; because these responses appeared unlikely to reoccur in future surveys, the codeframe was not augmented. Responses that were not informative (or were not answers to the questions that were asked) were treated as missing values and were imputed.

An identical process has been followed since the 1995 SCF. In earlier surveys, the information recorded for "other" responses was not as complete, and the efforts to recode the available verbatim data were somewhat less stringent.
Thus, analysts should exercise caution in time series comparisons of "other" responses from the 1995 and later surveys with those in earlier years.

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
GRIDS
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

Some sets of questions in the SCF have a natural iterative pattern. For example, the survey asks for detailed information on up to the first six checking accounts owned by the PEU, and summary information is collected about all remaining accounts. The detailed questions are the same for each account. In past interviews done with paper and pencil, some respondents have resisted answering the detailed questions but have been willing to provide summary information. Typically, interviewers recorded the summary information in the margins of the questionnaire, and editors allocated the data to the skipped questions according to a set of fixed rules. To allow for a variety of respondent-interviewer interactions in the SCF CAPI program, the grid questions were organized to provide a way of collecting summary information in a systematic way. We refer to the associated summary variables as "mop-up variables." Past surveys also indicated that some respondents recalled additional instances of items once they began answering questions in a grid, but interviewers often did not revise the originally reported number. The CAPI procedures were set up to allow for this possibility of recalling additional items.

Consider first a respondent who gives a non-missing response to the question that asks for the number of items of the type to be queried in the grid. The interviewer would ask the respondent the first set of detailed questions on the item.
Then, the interviewer would be confronted with a question (not to be read to the respondent):

     INTERVIEWER: CONTINUE, OR GO TO MOPUP OF LOOP?

The intention of this question was to allow the interviewer to deal with a potentially hostile respondent and immediately branch to the mop-up questions. If the respondent was cooperative, the interviewer entered a CONTINUE response and followed an identical procedure at each iteration until either the number of items reported was exhausted, or the maximum number of detailed questions was asked and the mop-up question was asked to get summary information on all remaining items. If the respondent reported a number of items less than the maximum number about which the detailed questions are asked, the following question was asked at the end of the final iteration:

     Do you (or your family living here) have another xxxx?

A YES response here indicates that the respondent recalled an additional instance in the process of answering the detailed questions. A respondent could continue to "add" iterations until the maximum number of iterations is reached and the mop-up questions are asked.

Another possibility is that a respondent may either not know (or be unwilling to tell) the number of instances of an item. Because it is known that there is at least one such instance, the first set of detailed questions is asked. Then the respondent is asked:

     Do you (or your family living here) have another xxxx?

The questioning then proceeds exactly as it would for a respondent who recalled additional instances after providing an initial number of instances.

In processing the data, several steps were taken to attribute the data collected to their correct location. First, in some cases interviewers answered the question "INTERVIEWER: CONTINUE, OR GO TO MOPUP OF LOOP?" with the latter response, even though only one more instance remained. In such cases, the mop-up data were mapped into the appropriate position in the grid.
This data movement is not directly recorded in the J-variables for such cases, although the movement can be deduced from the patterns of J-variables of other questions within the iteration of the grid that do not have mop-up equivalents: the value of the J-variables for such variables without mop-up equivalents would normally be 2052. Second, when respondents added instances, the originally reported number was updated and stored in the customary SCF variable number. The originally reported number of instances has been retained in the data set since such information cannot be recovered in any other way from the data made available. Third, when summary information was given by respondents who broke off their responses in a grid prematurely, that information was used to bound the imputations of the detailed data. Data items that have an associated J-variable with a value of 90 are ones where a complete response was given in the parallel mop-up variable, and those with a J-variable of 91 are ones where a range response was given in the parallel mop-up variable. There are some complicated mixed cases where a respondent gave a missing value for the number of instances, but was willing to provide non-missing mop-up data. Though tedious, it is possible to deduce this information from the J-variables provided. ----------------------------------------------------------------------------- ----------------------------------------------------------------------------- SUMMARY VARIABLES ----------------------------------------------------------------------------- ----------------------------------------------------------------------------- Summary variables (e.g., NET WORTH) are not included in the main data set. Although it is complicated to construct such variables, it is our belief that a substantial amount of judgment is involved in defining variables, and that other analysts should make their own decisions. 
However, as a convenience to users, we have included on the SCF web site a program written in the SAS language that was used to create the variables used in the 2006 Federal Reserve Bulletin article on the survey and an Excel file containing the summary variables. Users who wish to use the definitions in this program are encouraged to review the definitions to be certain that classifications are appropriate for their analytical purposes.

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
DATA REVIEW
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

A very large amount of time has been spent in searching for errors in the data and resolving those errors. Many seeming inconsistencies are actually in the raw data and appear to have no obvious reconciliation. Our initial presumption is always that the respondent understood each question and reported accurately, and that the process of transcription and coding did not distort that information. In cases where other information led us beyond a reasonable doubt of the validity of the data, we have changed data, either by altering values directly or by setting them to missing and imputing them; in all such cases, the shadow variables indicate that we have overridden reported data (for an overview of the extent of data changes, see "Measuring Data Quality In the 1998 Survey of Consumer Finances," Arthur B. Kennickell, August 1999, http://www.federalreserve.gov/pubs/oss/oss2/method.html). We ask our colleagues who use this data set to help us in finding any remaining resolvable inconsistencies.

The imputations for missing data are subject to hierarchical logical constraints, but otherwise they reflect the data, whether they be consistent or inconsistent.
For example, total income (X5729) in the reported data is sometimes not equal to the sum of the individual components (X5702 etc.), so this constraint is not automatically applied to the imputed data. Variability in the imputations for a variable in a given case may sometimes be large. This variation is a reflection of the fundamental uncertainty about the true value of the item.

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
IMPUTATION
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

Most of the variables that originally contained a missing value code have been imputed. The exceptions include such variables as X6695 (which reports the original number of checking accounts reported by the survey respondent) and X6504 (which is the interviewer's description of the property where the respondent lives). The nature of any originally missing values may be understood by examining the J-codes associated with the variables.

A multiple imputation procedure yielding five values for each missing value is used to approximate the distribution of the missing data. The individual imputations are made by drawing repeatedly from an estimate of the conditional distribution of the data. The imputations are stored as five successive replicates ("implicates") of each data record. Thus, the number of observations in the full data set (22,610) is five times the actual number of respondents (4,522) (see DISCLOSURE REVIEW below for information on the public version of the data set). The imputation procedure is described in "Multiple Imputation in the Survey of Consumer Finances" (Arthur B. Kennickell, September 1998, http://www.federalreserve.gov/pubs/oss/oss2/method.html).
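Because each respondent appears five times in the stacked file, tabulations that pool all records count every household five times unless the implicates are handled separately or the weights are rescaled. The Python sketch below (an illustration outside the SAS and Stata tools this codebook provides) shows both approaches on toy data. It assumes each implicate record carries the household's full analysis weight; the record keys `implicate`, `weight`, and `value` are hypothetical and are not SCF variable names.

```python
from collections import defaultdict

N_IMPLICATES = 5  # five implicates per respondent, as stated above

def per_implicate_weighted_means(records):
    """Return {implicate number: weighted mean of 'value'} for stacked records."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for rec in records:
        numerator[rec["implicate"]] += rec["weight"] * rec["value"]
        denominator[rec["implicate"]] += rec["weight"]
    return {k: numerator[k] / denominator[k] for k in sorted(numerator)}

def pooled_population(records, n_implicates=N_IMPLICATES):
    """Weighted population count from the stacked file, with weights rescaled."""
    return sum(rec["weight"] for rec in records) / n_implicates

# Toy data: two households, each stored as five implicates whose imputed
# value differs slightly from one implicate to the next.
toy = [
    {"implicate": i, "weight": w, "value": v + i}
    for i in range(1, N_IMPLICATES + 1)
    for (w, v) in [(100.0, 10.0), (300.0, 20.0)]
]

means = per_implicate_weighted_means(toy)
population = pooled_population(toy)
```

The toy file has ten records for two households, mirroring the 22,610 records for 4,522 respondents in the full data set. Keeping the five per-implicate estimates separate, as `means` does, preserves the variation across implicates that reflects imputation uncertainty.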
For a general discussion of multiple imputation and its uses, see
"Multiple Imputation for Nonresponse in Surveys" by Donald B. Rubin, John
Wiley and Sons, 1987.  Multiple imputation offers two distinct advantages
compared with singly-imputed data.  First, because multiple imputation
yields multiple outcomes from a random process, it supports more efficient
estimation than singly-imputed data.  Second, multiple imputation allows
users to make straightforward estimates of the degree of uncertainty
associated with the missing information.

For users who want to estimate only simple statistics such as means and
medians ignoring the effects of imputation error on the standard errors of
these estimates, it will probably be sufficient to divide the weights by
5.  Software to compute means and medians and their associated standard
errors with respect to imputation and sampling error is provided in the
section on sampling error later in this codebook.

Users who want to estimate more complex statistics, particularly
regressions, should be cautious in their treatment of the implicates.
Many regression packages will treat each of the five implicates as an
independent observation and correspondingly inflate the reported
statistical significance of results.  Users who want to calculate
regression estimates, but who have no immediate use for proper
significance tests, could either average the dependent and independent
values across the implicates or multiply their standard errors by the
square root of five.

For an easily understandable discussion of multiple imputation in the SCF
from a user's point of view, see Catherine Montalto and Jaimie Sung,
"Multiple Imputation in the 1992 Survey of Consumer Finances," Financial
Counseling and Planning, Volume 7, 1996, pages 133-146 (or on the Internet
at http://www.hec.ohio-state.edu/hanna/imput.htm).  That article also
contains a set of simple SAS macros to use to compute correct standard
errors from multiply imputed data.
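The two shortcuts mentioned above (dividing the weights by 5 for simple
statistics, and multiplying pooled standard errors by the square root of
five) can be sketched as follows.  This is only an illustration on made-up
numbers: the function names, weights, and values are hypothetical, and the
sketch does not account for sampling error.

```python
import math

N_IMPLICATES = 5


def scaled_weights(weights, n_implicates=N_IMPLICATES):
    """Divide each weight by the number of implicates, so that the stacked
    file sums to the population represented by a single implicate."""
    return [w / n_implicates for w in weights]


def weighted_mean(values, weights):
    """Weighted mean over all stacked records."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def inflate_pooled_se(se, n_implicates=N_IMPLICATES):
    """Rough correction for a standard error from a model that treated the
    stacked implicates as independent observations."""
    return se * math.sqrt(n_implicates)


# Toy stacked file: 2 hypothetical respondents x 5 implicates.
weights = [1000.0] * 5 + [3000.0] * 5
values = [10.0, 12.0, 11.0, 9.0, 13.0] + [20.0] * 5

w5 = scaled_weights(weights)
population = sum(w5)              # population represented: 4000
mean = weighted_mean(values, w5)  # pooled weighted mean: 17.75
rough_se = inflate_pooled_se(0.1)
```

A weighted mean is unchanged when all weights are divided by the same
constant, so the division matters mainly for totals and counts.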
Two alternatives for processing general model estimates are offered here,
one written in SAS (MACRO MISECOMP) and the other in Stata
(StataMIcode.do).  See the section "ANALYSIS WEIGHTS" below for a brief
discussion of the inclusion of sample design effects in the estimation of
complex statistics.

*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;
******************************************************************************;
*****                          MACRO MISECOMP                            *****;
******************************************************************************;
*   MACRO MISECOMP computes standard errors corrected for multiple
    imputation;
*   The input may be regression results, or any other results (e.g.,
    probits) that include a point estimate and a standard error estimate
    for each implicate;
*   The data sets are named &DSN.1-&DSN&NIMP (where &DSN and &NIMP are
    defined below);
*   The form of the input data set is described below;
*   Often, it is quite easy to copy output directly from a statistical
    procedure into the form of this program without deleting extraneous
    information;
*   The required input variables are VARN (a name of the statistic of
    interest in all NIMP data sets), B1-B&NIMP (a working name for the
    point estimate of interest for each implicate--where the terminal
    number corresponds to the terminal number of the input data set), and
    S1-S&NIMP (a working name for the standard error of the point estimate
    in each implicate--where the terminal number corresponds to the
    terminal number of the input data set);
*   The parameters of the MACRO are:
       NIMP:   number of implicates (default is 5)
       DSN:    first part of name of each of the NIMP input data sets
               (e.g., DSN11, DSN12,...,DSN15 could be results for
               implicates 1-5 for model 1) (default is DSN1i, where "i"
               ranges from 1 to NIMP)
       PRNTPR: determines the number of digits of the output data
               (default is SAS format 10.6);
*   The output includes three lines for each unique VARN in the input data
    sets: the final point estimate, the final standard error, and the
    final t-statistic;
******************************************************************************;
*   Steps to compute standard errors;
*   (1) run each model (regressions, probits, etc.) for each of the five
        implicates separately;
*   (2) copy the model outputs into program code as described above;
/*      For example,
           DATA DSNij;
              INPUT VARN $ Bj Sj;
           CARDS;
           data here
           ;
           RUN;
        where "i" ranges over the number of distinct models treated, and
        "j" ranges over the number of implicates.
        NOTE: any technique that reads VARN, Bj and Sj into the data sets
        will work. */
*   (3) call MISECOMP (MACRO defaults will work correctly for the SCF if
        the data set names are DSN11, DSN12, DSN13, DSN14, DSN15);
******************************************************************************;
******************************************************************************;

%MACRO MISECOMP(NIMP=5,DSN=DSN1,PRNTPR=10.6);
   /* remember the original order of the statistics */
   DATA &DSN.1;
      SET &DSN.1;
      ORD=_N_;
   RUN;
   %DO I=1 %TO &NIMP;
      PROC SORT DATA=&DSN&I;
         BY VARN;
      RUN;
   %END;
   DATA ALL;
      MERGE %DO I=1 %TO &NIMP; &DSN&I %END; ;
      BY VARN;
      ARRAY BMOD {*} %DO I=1 %TO &NIMP; B&I %END;;
      ARRAY SMOD {*} %DO I=1 %TO &NIMP; S&I %END;;
      BETA=0;
      SIGMA=0;
      ST=0;
      /* average point estimate and average squared standard error */
      DO J=1 TO &NIMP;
         BETA=BMOD{J}+BETA;
         SIGMA=SMOD{J}**2+SIGMA;
      END;
      BETA=BETA/&NIMP;
      SIGMA=SIGMA/&NIMP;
      /* sum of squared deviations of the estimates across implicates */
      DO I=1 TO &NIMP;
         ST=ST+(BETA-BMOD{I})**2;
      END;
      /* use &NIMP rather than a fixed 5 so that NIMP= is honored */
      SIGMA=SQRT(SIGMA+(1+1/&NIMP)*ST/(&NIMP-1));
      TSTAT=BETA/SIGMA;
   RUN;
   PROC SORT DATA=ALL;
      BY ORD;
   RUN;
   DATA ALL;
      SET ALL;
      PUT VARN @15 BETA &PRNTPR / @15 SIGMA &PRNTPR / @15 TSTAT &PRNTPR;
   RUN;
%MEND MISECOMP;

%MISECOMP;

******************************************************************************;
******************************************************************************;
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;
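For readers who do not use SAS, the arithmetic carried out by MISECOMP can
be sketched in a few lines of Python.  The sketch follows the macro's
calculation: the point estimate is the mean of the per-implicate estimates,
and the variance is the mean squared standard error plus (1 + 1/m) times
the variance of the estimates across the m implicates.  The function name
and the input numbers are hypothetical.

```python
import math


def combine_implicates(betas, ses):
    """Combine per-implicate point estimates and standard errors.

    Returns (beta, sigma, tstat), mirroring BETA, SIGMA and TSTAT in the
    MISECOMP macro.
    """
    m = len(betas)
    if m < 2 or m != len(ses):
        raise ValueError("need matching lists with at least two implicates")
    beta = sum(betas) / m
    within = sum(s ** 2 for s in ses) / m
    between = sum((b - beta) ** 2 for b in betas) / (m - 1)
    sigma = math.sqrt(within + (1 + 1 / m) * between)
    return beta, sigma, beta / sigma


# Hypothetical results for one coefficient from five separate regressions.
beta, sigma, tstat = combine_implicates(
    [1.0, 1.2, 0.8, 1.1, 0.9],
    [0.5, 0.5, 0.5, 0.5, 0.5],
)
```

In this made-up case the within-implicate variance is 0.25 and the
between-implicate variance is 0.025, so the combined variance is
0.25 + 1.2 * 0.025 = 0.28.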
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*
******************************************************************************
*****                          Stata.MIcode.do                           *****
******************************************************************************
* Like the MACRO MISECOMP, this code computes coefficients and
* standard errors corrected for multiple imputation for various models
* (regressions, probit, logit, biprobit, heckprob, etc.).  For each
* independent variable (including a constant term), the output includes
* the corrected point estimate, standard error, t_statistic, and an
* indicator of the significance of the coefficient.  One asterisk - 10%
* level, two asterisks - 5% level and three asterisks - 1% level.
* NOTE: In the example below, users must specify a data set name in
* place of YOURDATA, a Stata statistical model command in place of
* REGRESS, and dependent and independent variable names in place of
* DEPVAR VAR1 VAR2 VAR3 ..VARN.  Users may also save the results in a
* permanent data set which in the example below is named FINALRESULTS.
* This code may be copied into a .do file, embedded in a program,
* or executed interactively line by line
******************************************************************************

#delimit ;
clear all;
set mem 160m;
set matsize 600;

* Use a data set from the particular year that includes all five
  imputations;
use "YOURDATA";

* Create a variable which identifies each implicate, substitute y1 and yy1
  with x1 and xx1 when the data is from years prior to 1992;
gen imp=y1-yy1*10;
save YOURDATA, replace;
use YOURDATA;

* Sorting by the implicate variable is necessary before using the statsby
  command;
sort imp;

* The statsby command can be used with any statistical command.  The
  expression must be bound in double quotes, followed by the requested
  output and the BY variable;
* WARNING: Although most estimation commands work with the statsby
  command, not all of the options for the estimation commands will work
  with this code.;

* Output the coefficients with the _b option in the statsby command;
statsby "REGRESS DEPVAR VAR1 VAR2 VAR3 ..VARN " _b , by(imp);

* Create five columns of coefficients per variable and name them by
  implicate number b1 b2 b3 b4 b5;
xpose, clear varname;
renpfix v b;
drop if _varname=="imp";
gen str20 varname=substr(_varname,3,.);
sort varname;
save betas, replace;

* The same procedure outputs the standard errors using the _se option in
  the statsby command;
clear all;
use YOURDATA;
statsby "REGRESS DEPVAR VAR1 VAR2 VAR3 ..VARN " _se , by(imp);

* Create five columns of standard errors per variable and name them by
  implicate number s1 s2 s3 s4 s5;
xpose, clear varname;
renpfix v s;
drop if _varname=="imp";
gen str20 varname=substr(_varname,4,.);
sort varname;
save sigmas, replace;

* Merge together the coefficients and standard errors to calculate the
  adjusted values;
use betas;
merge using sigmas;
gen beta=(b1+b2+b3+b4+b5)/5;
gen s0=((s1)^2+(s2)^2+(s3)^2+(s4)^2+(s5)^2)/5;
gen st=(beta-b1)^2+(beta-b2)^2+(beta-b3)^2+(beta-b4)^2
       +(beta-b5)^2;
gen sigma=sqrt(s0+6/5*st/4);
gen tstat=beta/sigma;
gen str3 signf="***" if abs(tstat) >= 2.58;
replace signf="**" if abs(tstat) >= 1.96 & abs(tstat) < 2.58;
replace signf="*" if abs(tstat) >=1.65 & abs(tstat) < 1.96;
replace signf="" if abs(tstat) < 1.65;
keep varname beta sigma tstat signf;
list varname beta sigma tstat signf;
save FINALRESULTS;

* NOTE: For a model with a selection equation (e.g., Heckman model), the
  results for the two equations will be stacked.  In the case of the
  Heckman model the selection equation is listed first, followed by the
  second stage equation;

******************************************************************************;
******************************************************************************;
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
              DISCUSSION OF RANGE DATA COLLECTION AND J-CODES
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

Since the 1995 SCF, the CAPI program has allowed interviewers a variety of
ways to enter partial information (for a detailed description and analysis
of range data in the 1995 survey, see "Using Range Techniques with CAPI in
the 1995 Survey of Consumer Finances," Arthur B. Kennickell, January 1997,
http://www.federalreserve.gov/pubs/oss/oss2/method.html).  In the past, we
had evidence that some respondents volunteered figures in ranges.  Good
interviewers have always tried to get respondents to settle on a single
"best" figure, but sometimes it may be that there is no firm figure (e.g.,
the value of a privately-held business may be known only at the point it
is actually sold) and probing too far could cause the respondent to answer
"don't know" or to refuse to answer.  The CAPI program allows respondents
to report a range of possible values in such cases.

There is another class of respondents who may not volunteer a range, who
do not know (or will not give) an exact figure, but who will give some
information about the value.  To obtain information from this second group
of people, the CAPI program includes two options.
First, a respondent who is uncomfortable actually saying an amount may report a letter from a card that specifies a number of ranges. The range card has been used very successfully in earlier waves of the SCF, but CAPI allows the option to be presented consistently. Second, a respondent who declines the use of the range card is asked a series of questions in a "decision tree" that are designed to specify a range. The dollar breaks in the decision tree vary by question (so that, for example, monthly rent is not subject to the same ranges as the value of corporate stock). The computer sequences used for range follow-up for all dollar values in the survey (known as "PROBE$") are outlined schematically in a section below. It should be noted that interviewers were strongly instructed that a single dollar value is the best answer to each of these questions. Although there is the distinct possibility that respondents may become "trained" in the use of the range questions during the course of the interview (the effect of this training is unclear at present: respondents may tend to report "too many" ranges because they know that they are allowed; alternatively, respondents may learn that it is much quicker to give a single dollar figure), interviewers should be using all of the standard techniques to get respondents to give a single figure where possible. Evidence from the SCF suggests that this approach dramatically reduces the frequency of "don't know" responses, but it has little effect on refusals. Although the overall proportion of respondents reporting no information is much lower, generally the proportion providing complete responses declined. *++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*; *++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*; Schematic diagram of the sequence used for all dollar questions: Qnn. How much is your [******]? 
level 1: $________ Volunteer a range Don't know Refuse |________________| level 2: Confirm Range card Give a range? or dollar range? RC DR RC DR NO/DK Refuse level 3: OUT Letter Upper bound Letter/Own range Decision Lower bound tree level 4: Confirm Confirm Letter Own Confirm Confirm level 5: OUT OUT Confirm Confirm OUT OUT level 6: OUT OUT (OUT=proceed to next question) At the first level, the respondent has the option of providing a dollar amount (interviewers were strongly urged to obtain a single dollar value where possible), volunteering a range, answering "don't know," or refusing to answer. These responses require a variety of different follow-up questions. In the case of a single dollar figure, the CAPI program displays in words the number the interviewer has typed into the computer and proceeds to the next question. If the respondent volunteers a range, there is an option to report either a range in dollars (in some cases the upper or lower bound of a range may be missing--e.g., as in the case where a respondent answers "greater than a million dollars") or to give a letter from a range card (the ranges are given below). If the respondent answers "don't know" or refuses to answer, the program will request a range; if the respondent agrees, the program will accept the same types of range that may be volunteered directly from the initial dollar screen. If the respondent is unable to provide a range in this way (that is the respondent answers "no" or "don't know"), the program presents a series of questions known as a "decision tree," which is specified in greater detail below. If the respondent refuses at any point beyond the initial dollar screen or answers "don't know" at any point after entering the decision tree, the program proceeds to the confirmation screen. The exact question text for this sequence is given below. Because of software limitations, negative ranges presented a special problem. It was not feasible to build in negative ranges directly. 
As a compromise, interviewers were instructed to collect the ranges in
absolute values and record in a comment box available in the program the
fact that the range was negative.

If R volunteers a range at level 1, the following is displayed at level 2:

     SELECT TYPE OF RANGE:
          ENTER LETTER FROM RANGE CARD
          R WILL GIVE RANGE

If the range card option is chosen, the screen displays:

     ENTER LETTER FROM RANGE CARD:

and any of the following letters on the range card may be entered:

     A ...... $1 - $100
     B ...... $101 - $500
     C ...... $501 - $1,000
     D ...... $1,001 - $2,500
     E ...... $2,501 - $5,000
     F ...... $5,001 - $7,500
     G ...... $7,501 - $10,000
     H ...... $10,001 - $25,000
     I ...... $25,001 - $50,000
     J ...... $50,001 - $75,000
     K ...... $75,001 - $100,000
     L ...... $100,001 - $250,000
     M ...... $250,001 - $500,000
     N ...... $500,001 - $1 million
     O ...... $1 million - $5 million
     P ...... $5 million - $10 million
     Q ...... $10 million - $25 million
     R ...... $25 million - $50 million
     S ...... $50 million - $100 million
     T ...... More than $100 million

If R offers to give a dollar range, the screen displays:

     ENTER LOW END OF RANGE  : $___,___,___.__
     ENTER HIGH END OF RANGE : $___,___,___.__

Whichever type of range is selected, after the range information is
entered, the program skips to the confirmation screen.

If R answers "don't know" or "refuse" at level 1, the following text is
presented:

     IF IN-PERSON READ: Can you give me a range--either your own range or
     one from the range card?
     IF TELEPHONE: Can you give me a range?
          YES, OWN RANGE
          YES, RANGE CARD
          NO

If one of the YES options is chosen, the sequence is as given above for
directly volunteered ranges.  If R refuses at this point, the program
skips directly to the confirmation screen.  If R answers "no/don't know",
level three is a decision tree designed to guide the R into a range
response where possible.

In the decision tree, respondents are asked a series of questions to bound
the true response within an interval.
The intervals are defined in terms of seven values; let those values be
denoted generically by V1, V2, V3, V4, V5, V6, V7.  The questions asked
take the following form:

     Q1.  Was it more than V4 dollars, less than V4 dollars, or about V4
          dollars?
               MORE                       --> GO TO Q2
               LESS                       --> GO TO Q5
               SAME, DK, REF              --> CONFIRMATION SCREEN

     Q2.  Was it more than V5 dollars, less than V5 dollars, or about V5
          dollars?
               MORE                       --> GO TO Q3
               LESS, SAME, DK, REF        --> CONFIRMATION SCREEN

     Q3.  Was it more than V6 dollars, less than V6 dollars, or about V6
          dollars?
               MORE                       --> GO TO Q4
               LESS, SAME, DK, REF        --> CONFIRMATION SCREEN

     Q4.  Was it more than V7 dollars, less than V7 dollars, or about V7
          dollars?
               MORE, LESS, SAME, DK, REF  --> CONFIRMATION SCREEN

     Q5.  Was it more than V1 dollars, less than V1 dollars, or about V1
          dollars?
               MORE                       --> GO TO Q6
               LESS, SAME, DK, REF        --> CONFIRMATION SCREEN

     Q6.  Was it more than V2 dollars, less than V2 dollars, or about V2
          dollars?
               MORE                       --> GO TO Q7
               LESS, SAME, DK, REF        --> CONFIRMATION SCREEN

     Q7.  Was it more than V3 dollars, less than V3 dollars, or about V3
          dollars?
               MORE, LESS, SAME, DK, REF  --> CONFIRMATION SCREEN

To allow for appropriate ranges for all dollar questions, there are eight
different versions of the V1 to V7 variables given below.

Version      V1       V2       V3       V4         V5         V6          V7
   1     10,000  100,000  250,000  500,000  1,000,000  5,000,000  10,000,000
   2         10       25       50      100        200        300         500
   3     50,000  100,000  150,000  250,000    500,000  1,000,000   5,000,000
   4      5,000   25,000   50,000  100,000    250,000    500,000   1,000,000
   5      5,000   10,000   25,000   50,000    100,000    250,000     750,000
   6        500    1,000    5,000   10,000     25,000     75,000     250,000
   7        100      250      500    1,000      2,000     10,000      50,000
   8         50      100      250      500      1,000      5,000      10,000

There are 22 possible unique outcomes for each of the 8 versions of the
decision tree (in this list, YES corresponds to a "more" response and NO
to a "less" response):

      1.  Q1=NO,  Q5=NO
      2.  Q1=NO,  Q5=DK
      3.  Q1=NO,  Q5=Ref
      4.  Q1=NO,  Q5=YES, Q6=NO
      5.  Q1=NO,  Q5=YES, Q6=DK
      6.  Q1=NO,  Q5=YES, Q6=Ref
      7.  Q1=NO,  Q5=YES, Q6=YES, Q7=NO
      8.  Q1=NO,  Q5=YES, Q6=YES, Q7=DK
      9.  Q1=NO,  Q5=YES, Q6=YES, Q7=Ref
     10.  Q1=NO,  Q5=YES, Q6=YES, Q7=YES
     11.  Q1=YES, Q2=NO
     12.  Q1=YES, Q2=DK
     13.  Q1=YES, Q2=Ref
     14.  Q1=YES, Q2=YES, Q3=NO
     15.  Q1=YES, Q2=YES, Q3=DK
     16.  Q1=YES, Q2=YES, Q3=Ref
     17.  Q1=YES, Q2=YES, Q3=YES, Q4=NO
     18.  Q1=YES, Q2=YES, Q3=YES, Q4=DK
     19.  Q1=YES, Q2=YES, Q3=YES, Q4=Ref
     20.  Q1=YES, Q2=YES, Q3=YES, Q4=YES
     21.  Q1=Ref  ---> NOTE: RESULTS IN NO BOUNDING INFORMATION
     22.  Q1=DK   ---> NOTE: RESULTS IN NO BOUNDING INFORMATION

If R answers "don't know" or "refuse" at any point in the decision tree,
the program skips to the confirmation screen.

The confirmation screen:

Where the R has given a complete dollar response the confirmation screen
displays:

     I would like to confirm that that amount is...  (amount in words)

Where the R has given a letter from the range card, the confirmation
screen displays:

     I would like to confirm that is range card letter (letter).

Where the R enters and completes any questions in the decision tree, the
confirmation screen displays:

     I would like to confirm that the amount is in a range around...
     (midpoint of a fully bounded range or the endpoint of an open-ended
     range)

Where the R refuses or answers "don't know" in a way that no range
information at all is obtained, the confirmation screen indicates to the
interviewer that no information has been obtained.  Nothing on the
confirmation screen is read to the R in this case.

The data entry on the confirmation screen offers the following two
options:

     THIS IS CORRECT
     GO BACK AND FIX

*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;

The shadow variables fall into three large groups.  Codes of less than 90
indicate that data were not originally missing (or that they could be
inferred with high confidence from other information).  Codes with an
integer value from 90 through 996 indicate that the respondent provided a
range response.
The extensive form of the paths through the range questions encompasses a
large number of outcomes, as is reflected in the number of possible range
codes.  For the codes that indicate a range response, there may also be a
decimal component.  A code with a decimal part equal to 0.5 indicates that
the initial response that the respondent gave to the associated dollar
question was "don't know."  In every other case, there should be no
decimal component to the shadow variable.  Codes of 997 or more indicate
that the associated data value was completely missing.

There is an important exception to the normal assignment of J-codes.  In
some cases, it is not known where a reported value should actually be
reported, because a higher-order question was missing.  For example, if
the respondent does not know if a car loan is a regular installment loan,
by default the CAPI program asks a generic question about the typical
payment on the loan; if the loan is a regular installment loan, the
appropriate question would be about regular payments; in the initial data
processing, the payment amount is inserted into both potential locations
and one of them is set to the code for inapplicable (zero) after the loan
type is imputed.  In such cases, the original J-code for the reported data
is retained in all relevant locations.  As a quality control mechanism,
the imputation software is set up so that it can never alter an original
J-code.

*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;

Definitions of the "J" Variables (2004 version)

  0 = Originally reported value.  See above for an exception.

  1 = Question is inapplicable (e.g., R has no checking account so value
      of checking account is coded as zero.)
      NOTE: all values of zero in the data set are in some sense
      inapplicable [also see J-code value 14]; reported values of zero
      are typically stored as -1.

  2 = Data taken from (or moved from) another location (e.g., a
      motorcycle misclassified in the automobile grid moved to the other
      vehicle grid); data moved from another location and added to data
      already at the new location (e.g., wage income from spouse reported
      in independent adult part of the questionnaire added to data
      reported for R in the section on total family income); data
      reported in a "mop-up" field that could be directly mapped into the
      correct final location.  These moves and changes may be the result
      of verbatim responses, interviewer comment, or other information.

  4 = CAPI program error: resolution yields a non-missing (non-range)
      value.

  5 = Indicates a value coded directly by FRB staff from a verbatim
      ("other/specify") response or interviewer comments that translate
      directly into a valid response.  This code indicates that there was
      the exercise of a bare minimum of judgment in encoding the content
      of the text data.  ("Super-no" corrections are included here.)

  6 = Indicates a value coded directly by NORC from a verbatim
      ("other/specify") response.

  8 = Variable computed from other non-missing variables.

  9 = Variable overridden by logically equivalent information to maintain
      consistency of data (e.g., when type of property is a time share
      (X1703=25), but R says they own the share alone (X1704=1)--rather
      than saying that the property is a time share (X1704=5)--then the
      response to X1704 is changed to 5).

 10 = This code applies to variables where part of the original value
      reported should have been (or was) also reported elsewhere and is
      edited out here (e.g., in the case where the wage income of NPEU
      member is reported at X6403 and at X5702 along with income of the
      PEU, the NPEU value is removed from X5702 and J5702=10).

 11 = Assumption made in CAPI program to guide questions dependent on
      marital history (applies at X107 only): value originally answered
      "don't know" or "refuse".

 12 = Assumption made in CAPI program to guide questions dependent on
      marital history (applies at X107 and X7020 only); except code 11.
      Question not directly asked.

 13 = Data change in editing; based on information in interviewer
      comments made during or after the interview, data structures
      elsewhere in the interview, data retrieval from interviewers, or
      mechanical review of data patterns.  Judgment is implied in the use
      of this code.

 14 = Inapplicable code generated by any data adjustment (particularly
      adjustments associated with J-codes 2, 10, 13, 15, 16, and 17).

 15 = Non-stochastic imputation of missing data (typically based at least
      in part on other, non-codeable data).

 17 = Value of originally missing data item implied by/computed from
      other variable(s).  Relatively more judgment is implied by this
      code than a code 8.

 30 = Respondent agreed to provide a dollar range as a response (either
      as a directly volunteered range or in response to the question
      soliciting a range after an initial response of "don't know" or
      "refuse"), but the upper and lower bounds of the range given were
      identical.

 31 = Respondent entered the decision tree, but chose one of the boundary
      points of a range as the approximate value.

ALL RESPONSES THAT FOLLOW HAVE AT LEAST SOME MISSING INFORMATION

 90 = Bounding information available based on summary information
      provided by respondent (typically, if a R does not know information
      about items beyond a certain number in a set of detailed questions
      about a larger number of such items, the R is asked one or a number
      of summary questions about all remaining instances).

 91 = Same as 90, but R gave range data for the summary information.
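As an informal aid to reading the codes, the sketch below does two things.
It sorts a J-code into the three broad groups described before the list of
definitions, flagging the .5 decimal that marks an initial "don't know".
It also computes the bounds implied by a path through the decision tree
questions Q1 to Q7, which is how the interval attached to each decision
tree code can be derived.  The function names and the answer format are
hypothetical, and treating every answer other than "MORE" or "LESS" as the
end of the tree is a simplification.

```python
# Dollar breaks V1..V7 for each decision tree version (from the table of
# versions given earlier in this section).
TREE_VALUES = {
    1: (10_000, 100_000, 250_000, 500_000, 1_000_000, 5_000_000, 10_000_000),
    2: (10, 25, 50, 100, 200, 300, 500),
    3: (50_000, 100_000, 150_000, 250_000, 500_000, 1_000_000, 5_000_000),
    4: (5_000, 25_000, 50_000, 100_000, 250_000, 500_000, 1_000_000),
    5: (5_000, 10_000, 25_000, 50_000, 100_000, 250_000, 750_000),
    6: (500, 1_000, 5_000, 10_000, 25_000, 75_000, 250_000),
    7: (100, 250, 500, 1_000, 2_000, 10_000, 50_000),
    8: (50, 100, 250, 500, 1_000, 5_000, 10_000),
}
# Question number -> index of its dollar break in V1..V7 (Q1 asks about V4).
BREAK_INDEX = {1: 3, 2: 4, 3: 5, 4: 6, 5: 0, 6: 1, 7: 2}
# Next question after "MORE" (None means go to the confirmation screen).
NEXT_IF_MORE = {1: 2, 2: 3, 3: 4, 4: None, 5: 6, 6: 7, 7: None}
# Only Q1 leads to a further question after "LESS".
NEXT_IF_LESS = {1: 5}


def classify_jcode(jcode):
    """Return (group, initially_dont_know) for a shadow-variable value."""
    whole = int(jcode)
    if whole < 90:
        group = "not originally missing"
    elif whole <= 996:
        group = "range response"
    else:
        group = "completely missing"
    has_half = abs((jcode - whole) - 0.5) < 1e-9
    return group, group == "range response" and has_half


def tree_bounds(version, answers):
    """Return (lower, upper) for a decision tree path; None is an open end.

    `answers` maps question numbers to "MORE" or "LESS".  The interval is
    read as (> lower, <= upper).
    """
    values = TREE_VALUES[version]
    lower = upper = None
    question = 1
    while question is not None:
        answer = answers.get(question)
        value = values[BREAK_INDEX[question]]
        if answer == "MORE":
            lower = value
            question = NEXT_IF_MORE[question]
        elif answer == "LESS":
            upper = value
            question = NEXT_IF_LESS.get(question)
        else:
            break  # SAME, DK, REF or no answer: the tree stops here
    return lower, upper
```

For instance, in version 1 the path Q1=LESS, Q5=MORE, Q6=LESS gives the
interval (>$10K,<=$100K), which is outcome 4 of the decision tree.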
RANGE RESPONSES: POSITIVE RANGES DECISION TREE RESPONSES THAT RESULTED IN A BOUND FOR POSITIVE NUMBERS (NOTE: for decision tree codes, responses that resulted in no usable bounding information are collected separately below) '*' indicates an open-ended interval NOTE: for J-code outcomes from 101-878, 921-940, and 971-990, .5 is added to the J-code if the original response was DK 101 = Decision tree response, version 1: outcome 1 (*,<=V1):($10K,*) 102 = Decision tree response, version 1: outcome 2 (*,<=V4):($500K,*) 103 = Decision tree response, version 1: outcome 3 (*,<=V4):($500K,*) 104 = Decision tree response, version 1: outcome 4 (>V1,<=V2):(>$10K,<=$100K) 105 = Decision tree response, version 1: outcome 5 (>V1,<=V4):(>$10K,<=$500K) 106 = Decision tree response, version 1: outcome 6 (>V1,<=V4):(>$10K,<=$500K) 107 = Decision tree response, version 1: outcome 7 (>V2,<=V3):(>$100K,<=$250K) 108 = Decision tree response, version 1: outcome 8 (>V2,<=V4):(>$100K,<=$500K) 109 = Decision tree response, version 1: outcome 9 (>V2,<=V4):(>$100K,<=$500K) 110 = Decision tree response, version 1: outcome 10 (>V3,<=V4):(>$250K,<=$500K) 119 = Decision tree response, version 1: outcome 11 (>V4,<=V5):(>$500K,<=$1M) 120 = Decision tree response, version 1: outcome 12 (>V4,*):(>$500K,*) 121 = Decision tree response, version 1: outcome 13 (>V4,*):(>$500,*) 122 = Decision tree response, version 1: outcome 14 (>V5,<=V6):(>$1M,<=$5M) 123 = Decision tree response, version 1: outcome 15 (>V5,*):(>$1M,*) 124 = Decision tree response, version 1: outcome 16 (>V5,*):(>$1M,*) 125 = Decision tree response, version 1: outcome 17 (>V6,<=V7):(>$5M,<=$10M) 126 = Decision tree response, version 1: outcome 18 (>V6,*):(>$5M,*) 127 = Decision tree response, version 1: outcome 19 (>V6,*):(>$5M,*) 128 = Decision tree response, version 1: outcome 20 (>V7,*):(>$10M,*) 201 = Decision tree response, version 2: outcome 1 (*,<=V1):(*,<=$10) 202 = Decision tree response, version 2: outcome 2 
(*,<=V4):(*,<=$100) 203 = Decision tree response, version 2: outcome 3 (*,<=V4):(*,<=$100) 204 = Decision tree response, version 2: outcome 4 (>V1,<=V2:(>$10,<=$25) 205 = Decision tree response, version 2: outcome 5 (>V1,<=V4):(>$10,<=$100) 206 = Decision tree response, version 2: outcome 6 (>V1,<=V4):(>$10,<=$100) 207 = Decision tree response, version 2: outcome 7 (>V2,<=V3):(>$25,<=$50) 208 = Decision tree response, version 2: outcome 8 (>V2,<=V4):(>$25,<=$100) 209 = Decision tree response, version 2: outcome 9 (>V2,<=V4):(>$25,<=$100) 210 = Decision tree response, version 2: outcome 10 (>V3,<=V4):(>$50,<=$100) 219 = Decision tree response, version 2: outcome 11 (>V4,<=V5):(>$100,<=$200) 220 = Decision tree response, version 2: outcome 12 (>V4,*):(>$200,*) 221 = Decision tree response, version 2: outcome 13 (>V4,*):(>$200,*) 222 = Decision tree response, version 2: outcome 14 (>V5,<=V6):(>$200,<=$300) 223 = Decision tree response, version 2: outcome 15 (>V5,*):(>$200,*) 224 = Decision tree response, version 2: outcome 16 (>V5,*):(>$200,*) 225 = Decision tree response, version 2: outcome 17 (>V6,<=V7):(>$300,<=$500) 226 = Decision tree response, version 2: outcome 18 (>V6,*):(>$300,*) 227 = Decision tree response, version 2: outcome 19 (>V6,*):(>$300,*) 228 = Decision tree response, version 2: outcome 20 (>V7,*):(>$500,*) 301 = Decision tree response, version 3: outcome 1 (*,<=V1):(*,<=$50K) 302 = Decision tree response, version 3: outcome 2 (*,<=V4):(*,<=$250K) 303 = Decision tree response, version 3: outcome 3 (*,<=V4):(*,<=$250K) 304 = Decision tree response, version 3: outcome 4 (>V1,<=V2:(>$50K,<=$100K) 305 = Decision tree response, version 3: outcome 5 (>V1,<=V4):(>$50K,<=$250K) 306 = Decision tree response, version 3: outcome 6 (>V1,<=V4):(>$50K,<=$250K) 307 = Decision tree response, version 3: outcome 7 (>V2,<=V3):(>$100K,<=$150K) 308 = Decision tree response, version 3: outcome 8 (>V2,<=V4):(>$100K,<=$250K) 309 = Decision tree response, version 3: outcome 
9 (>V2,<=V4):(>$100K,<=$250K) 310 = Decision tree response, version 3: outcome 10 (>V3,<=V4):(>$150K,<=$250K) 319 = Decision tree response, version 3: outcome 11 (>V4,<=V5):(>$250K,<=$500K) 320 = Decision tree response, version 3: outcome 12 (>V4,*):(>$250K,*) 321 = Decision tree response, version 3: outcome 13 (>V4,*):(>$250K,*) 322 = Decision tree response, version 3: outcome 14 (>V5,<=V6):(>$500K,<=$1M) 323 = Decision tree response, version 3: outcome 15 (>V5,*):(>$500K,*) 324 = Decision tree response, version 3: outcome 16 (>V5,*):(>$500K,*) 325 = Decision tree response, version 3: outcome 17 (>V6,<=V7):(>$1M,<=$5M) 326 = Decision tree response, version 3: outcome 18 (>V6,*):(>$1M,*) 327 = Decision tree response, version 3: outcome 19 (>V6,*):(>$1M,*) 328 = Decision tree response, version 3: outcome 20 (>V7,*):(>$5M,*) 401 = Decision tree response, version 4: outcome 1 (*,<=V1):(*,<=$5K) 402 = Decision tree response, version 4: outcome 2 (*,<=V4):(*,<=$100K) 403 = Decision tree response, version 4: outcome 3 (*,<=V4):(*,<=$100K) 404 = Decision tree response, version 4: outcome 4 (>V1,<=V2):(>$5K,<=$25K) 405 = Decision tree response, version 4: outcome 5 (>V1,<=V4):(>$5K,<=$100K) 406 = Decision tree response, version 4: outcome 6 (>V1,<=V4):(>$5K,<=$100K) 407 = Decision tree response, version 4: outcome 7 (>V2,<=V3):(>$25K,<=$50K) 408 = Decision tree response, version 4: outcome 8 (>V2,<=V4):(>$25K,<=$100K) 409 = Decision tree response, version 4: outcome 9 (>V2,<=V4):(>$25K,<=$100K) 410 = Decision tree response, version 4: outcome 10 (>V3,<=V4):(>$50K,<=$100K) 419 = Decision tree response, version 4: outcome 11 (>V4,<=V5):(>$100K,<=$250K) 420 = Decision tree response, version 4: outcome 12 (>V4,*):(>$100K,*) 421 = Decision tree response, version 4: outcome 13 (>V4,*):(>$100K,*) 422 = Decision tree response, version 4: outcome 14 (>V5,<=V6):(>$250K,<=$500K) 423 = Decision tree response, version 4: outcome 15 (>V5,*):(>$250K,*) 424 = Decision tree response, 
version 4: outcome 16 (>V5,*):(>$250K,*) 425 = Decision tree response, version 4: outcome 17 (>V6,<=V7):(>$500K,<=$1M) 426 = Decision tree response, version 4: outcome 18 (>V6,*):(>$500K,*) 427 = Decision tree response, version 4: outcome 19 (>V6,*):(>$500K,*) 428 = Decision tree response, version 4: outcome 20 (>V7,*):(>$1M,*) 501 = Decision tree response, version 5: outcome 1 (*,<=V1):(*,<=$5K) 502 = Decision tree response, version 5: outcome 2 (*,<=V4):(*,<=$50K) 503 = Decision tree response, version 5: outcome 3 (*,<=V4):(*,<=$50K) 504 = Decision tree response, version 5: outcome 4 (>V1,<=V2):(>$5K,<=$10K) 505 = Decision tree response, version 5: outcome 5 (>V1,<=V4):(>$5K,<=$50K) 506 = Decision tree response, version 5: outcome 6 (>V1,<=V4):(>$5K,<=$50K) 507 = Decision tree response, version 5: outcome 7 (>V2,<=V3):(>$10K,<=$25K) 508 = Decision tree response, version 5: outcome 8 (>V2,<=V4):(>$10K,<=$50K) 509 = Decision tree response, version 5: outcome 9 (>V2,<=V4):(>$10K,<=$50K) 510 = Decision tree response, version 5: outcome 10 (>V3,<=V4):(>$25K,<=$50K) 519 = Decision tree response, version 5: outcome 11 (>V4,<=V5):(>$50K,<=$100K) 520 = Decision tree response, version 5: outcome 12 (>V4,*):(>$50K,*) 521 = Decision tree response, version 5: outcome 13 (>V4,*):(>$50K,*) 522 = Decision tree response, version 5: outcome 14 (>V5,<=V6):(>$100K,<=$250K) 523 = Decision tree response, version 5: outcome 15 (>V5,*):(>$100K,*) 524 = Decision tree response, version 5: outcome 16 (>V5,*):(>$100K,*) 525 = Decision tree response, version 5: outcome 17 (>V6,<=V7):(>$250K,<=$750K) 526 = Decision tree response, version 5: outcome 18 (>V6,*):(>$250K,*) 527 = Decision tree response, version 5: outcome 19 (>V6,*):(>$250K,*) 528 = Decision tree response, version 5: outcome 20 (>V7,*):(>$750K,*) 601 = Decision tree response, version 6: outcome 1 (*,<=V1):(*,<=$500) 602 = Decision tree response, version 6: outcome 2 (*,<=V4):(*,<=$10K) 603 = Decision tree response, version 6: 
outcome 3 (*,<=V4):(*,<=$10K)
604 = Decision tree response, version 6: outcome 4 (>V1,<=V2):(>$500,<=$1K)
605 = Decision tree response, version 6: outcome 5 (>V1,<=V4):(>$500,<=$10K)
606 = Decision tree response, version 6: outcome 6 (>V1,<=V4):(>$500,<=$10K)
607 = Decision tree response, version 6: outcome 7 (>V2,<=V3):(>$1K,<=$5K)
608 = Decision tree response, version 6: outcome 8 (>V2,<=V4):(>$1K,<=$10K)
609 = Decision tree response, version 6: outcome 9 (>V2,<=V4):(>$1K,<=$10K)
610 = Decision tree response, version 6: outcome 10 (>V3,<=V4):(>$5K,<=$10K)
619 = Decision tree response, version 6: outcome 11 (>V4,<=V5):(>$10K,<=$25K)
620 = Decision tree response, version 6: outcome 12 (>V4,*):(>$10K,*)
621 = Decision tree response, version 6: outcome 13 (>V4,*):(>$10K,*)
622 = Decision tree response, version 6: outcome 14 (>V5,<=V6):(>$25K,<=$75K)
623 = Decision tree response, version 6: outcome 15 (>V5,*):(>$25K,*)
624 = Decision tree response, version 6: outcome 16 (>V5,*):(>$25K,*)
625 = Decision tree response, version 6: outcome 17 (>V6,<=V7):(>$75K,<=$250K)
626 = Decision tree response, version 6: outcome 18 (>V6,*):(>$75K,*)
627 = Decision tree response, version 6: outcome 19 (>V6,*):(>$75K,*)
628 = Decision tree response, version 6: outcome 20 (>V7,*):(>$250K,*)
701 = Decision tree response, version 7: outcome 1 (*,<=V1):(*,<=$100)
702 = Decision tree response, version 7: outcome 2 (*,<=V4):(*,<=$250)
703 = Decision tree response, version 7: outcome 3 (*,<=V4):(*,<=$250)
704 = Decision tree response, version 7: outcome 4 (>V1,<=V2):(>$100,<=$1K)
705 = Decision tree response, version 7: outcome 5 (>V1,<=V4):(>$100,<=$1K)
706 = Decision tree response, version 7: outcome 6 (>V1,<=V4):(>$100,<=$1K)
707 = Decision tree response, version 7: outcome 7 (>V2,<=V3):(>$250,<=$500)
708 = Decision tree response, version 7: outcome 8 (>V2,<=V4):(>$250,<=$1K)
709 = Decision tree response, version 7: outcome 9 (>V2,<=V4):(>$250,<=$1K)
710 = Decision tree response, version 7:
outcome 10 (>V3,<=V4):(>$500,<=$1K) 719 = Decision tree response, version 7: outcome 11 (>V4,<=V5):(>$1K,<=$2K) 720 = Decision tree response, version 7: outcome 12 (>V4,*):(>$1K,*) 721 = Decision tree response, version 7: outcome 13 (>V4,*):(>$1K,*) 722 = Decision tree response, version 7: outcome 14 (>V5,<=V6):(>$2K,<=$10K) 723 = Decision tree response, version 7: outcome 15 (>V5,*):(>$2K,*) 724 = Decision tree response, version 7: outcome 16 (>V5,*):(>$2K,*) 725 = Decision tree response, version 7: outcome 17 (>V6,<=V7):(>$10K,<=$50K) 726 = Decision tree response, version 7: outcome 18 (>V6,*):(>$10K,*) 727 = Decision tree response, version 7: outcome 19 (>V6,*):(>$10K,*) 728 = Decision tree response, version 7: outcome 20 (>V7,*):(>$50K,*) 801 = Decision tree response, version 8: outcome 1 (*,<=V1):(*,<=$50) 802 = Decision tree response, version 8: outcome 2 (*,<=V4):(*,<=$500) 803 = Decision tree response, version 8: outcome 3 (*,<=V4):(*,<=$500) 804 = Decision tree response, version 8: outcome 4 (>V1,<=V2):(>$50,<=$100) 805 = Decision tree response, version 8: outcome 5 (>V1,<=V4):(>$50,<=$500) 806 = Decision tree response, version 8: outcome 6 (>V1,<=V4):(>$50,<=$500) 807 = Decision tree response, version 8: outcome 7 (>V2,<=V3):(>$100,<=$250) 808 = Decision tree response, version 8: outcome 8 (>V2,<=V4):(>$100,<=$500) 809 = Decision tree response, version 8: outcome 9 (>V2,<=V4):(>$100,<=$500) 810 = Decision tree response, version 8: outcome 10 (>V3,<=V4):(>$250,<=$500) 819 = Decision tree response, version 8: outcome 11 (>V4,<=V5):(>$500,<=$1K) 820 = Decision tree response, version 8: outcome 12 (>V4,*):(>$500,*) 821 = Decision tree response, version 8: outcome 13 (>V4,*):(>$500,*) 822 = Decision tree response, version 8: outcome 14 (>V5,<=V6):(>$1K,<=$5K) 823 = Decision tree response, version 8: outcome 15 (>V5,*):(>$1K,*) 824 = Decision tree response, version 8: outcome 16 (>V5,*):(>$1K,*) 825 = Decision tree response, version 8: outcome 17 
(>V6,<=V7):(>$5K,<=$10K) 826 = Decision tree response, version 8: outcome 18 (>V6,*):(>$5K,*) 827 = Decision tree response, version 8: outcome 19 (>V6,*):(>$5K,*) 828 = Decision tree response, version 8: outcome 20 (>V7,*):(>$10K,*) RANGE CARD RESPONSES FOR POSITIVE NUMBERS 901 = Range card response via [F9]: range A. $1 to $100 902 = Range card response via [F9]: range B. $101 to $500 903 = Range card response via [F9]: range C. $501 to $1,000 904 = Range card response via [F9]: range D. $1,001 to $2,500 905 = Range card response via [F9]: range E. $2,501 to $5,000 906 = Range card response via [F9]: range F. $5,001 to $7,500 907 = Range card response via [F9]: range G. $7,501 to $10,000 908 = Range card response via [F9]: range H. $10,001 to $25,000 909 = Range card response via [F9]: range I. $25,001 to $50,000 910 = Range card response via [F9]: range J. $50,001 to $75,000 911 = Range card response via [F9]: range K. $75,001 to $100,000 912 = Range card response via [F9]: range L. $100,001 to $250,000 913 = Range card response via [F9]: range M. $250,001 to $500,000 914 = Range card response via [F9]: range N. $500,001 to $1,000,000 915 = Range card response via [F9]: range O. $1,000,001 to $5,000,000 916 = Range card response via [F9]: range P. $5,000,001 to $10,000,000 917 = Range card response via [F9]: range Q. $10,000,001 to $25,000,000 918 = Range card response via [F9]: range R. $25,000,001 to $50,000,000 919 = Range card response via [F9]: range S. $50,000,001 to $100,000,000 920 = Range card response via [F9]: range T. More than $100,000,000 921 = Range card response via DKDOL: range A. $1 to $100 922 = Range card response via DKDOL: range B. $101 to $500 923 = Range card response via DKDOL: range C. $501 to $1,000 924 = Range card response via DKDOL: range D. $1,001 to $2,500 925 = Range card response via DKDOL: range E. $2,501 to $5,000 926 = Range card response via DKDOL: range F. $5,001 to $7,500 927 = Range card response via DKDOL: range G. 
$7,501 to $10,000 928 = Range card response via DKDOL: range H. $10,001 to $25,000 929 = Range card response via DKDOL: range I. $25,001 to $50,000 930 = Range card response via DKDOL: range J. $50,001 to $75,000 931 = Range card response via DKDOL: range K. $75,001 to $100,000 932 = Range card response via DKDOL: range L. $100,001 to $250,000 933 = Range card response via DKDOL: range M. $250,001 to $500,000 934 = Range card response via DKDOL: range N. $500,001 to $1,000,000 935 = Range card response via DKDOL: range O. $1,000,001 to $5,000,000 936 = Range card response via DKDOL: range P. $5,000,001 to $10,000,000 937 = Range card response via DKDOL: range Q. $10,000,001 to $25,000,000 938 = Range card response via DKDOL: range R. $25,000,001 to $50,000,000 939 = Range card response via DKDOL: range S. $50,000,001 to $100,000,000 940 = Range card response via DKDOL: range T. More than $100,000,000 RESPONDENT-PROVIDED DOLLAR RANGE FOR POSITIVE NUMBERS 941 = Upper and lower bounds given: Reached via [F9] 942 = Upper bound given, lower bound missing: Reached via [F9] 943 = Lower bound given, upper bound missing: Reached via [F9] 944 = Upper and lower bounds given: Reached via DK8/9 945 = Upper bound given, lower bound missing: Reached via DK8/9 946 = Lower bound given, upper bound missing: Reached via DK8/9 INTERVIEW COMMENT INDICATES THAT RANGES ARE NEGATIVE DECISION TREE RESPONSES THAT RESULTED IN A BOUND FOR NEGATIVE NUMBERS (NOTE: for decision tree codes, responses that resulted in no usable bounding information are collected separately below) 151 = Decision tree response, version 1: outcome 1 (negative value) 152 = Decision tree response, version 1: outcome 2 (negative value) 153 = Decision tree response, version 1: outcome 3 (negative value) 154 = Decision tree response, version 1: outcome 4 (negative value) 155 = Decision tree response, version 1: outcome 5 (negative value) 156 = Decision tree response, version 1: outcome 6 (negative value) 157 = Decision 
tree response, version 1: outcome 7 (negative value) 158 = Decision tree response, version 1: outcome 8 (negative value) 159 = Decision tree response, version 1: outcome 9 (negative value) 160 = Decision tree response, version 1: outcome 10 (negative value) 169 = Decision tree response, version 1: outcome 11 (negative value) 170 = Decision tree response, version 1: outcome 12 (negative value) 171 = Decision tree response, version 1: outcome 13 (negative value) 172 = Decision tree response, version 1: outcome 14 (negative value) 173 = Decision tree response, version 1: outcome 15 (negative value) 174 = Decision tree response, version 1: outcome 16 (negative value) 175 = Decision tree response, version 1: outcome 17 (negative value) 176 = Decision tree response, version 1: outcome 18 (negative value) 177 = Decision tree response, version 1: outcome 19 (negative value) 178 = Decision tree response, version 1: outcome 20 (negative value) 251 = Decision tree response, version 2: outcome 1 (negative value) 252 = Decision tree response, version 2: outcome 2 (negative value) 253 = Decision tree response, version 2: outcome 3 (negative value) 254 = Decision tree response, version 2: outcome 4 (negative value) 255 = Decision tree response, version 2: outcome 5 (negative value) 256 = Decision tree response, version 2: outcome 6 (negative value) 257 = Decision tree response, version 2: outcome 7 (negative value) 258 = Decision tree response, version 2: outcome 8 (negative value) 259 = Decision tree response, version 2: outcome 9 (negative value) 260 = Decision tree response, version 2: outcome 10 (negative value) 269 = Decision tree response, version 2: outcome 11 (negative value) 270 = Decision tree response, version 2: outcome 12 (negative value) 271 = Decision tree response, version 2: outcome 13 (negative value) 272 = Decision tree response, version 2: outcome 14 (negative value) 273 = Decision tree response, version 2: outcome 15 (negative value) 274 = Decision tree 
response, version 2: outcome 16 (negative value) 275 = Decision tree response, version 2: outcome 17 (negative value) 276 = Decision tree response, version 2: outcome 18 (negative value) 277 = Decision tree response, version 2: outcome 19 (negative value) 278 = Decision tree response, version 2: outcome 20 (negative value) 351 = Decision tree response, version 3: outcome 1 (negative value) 352 = Decision tree response, version 3: outcome 2 (negative value) 353 = Decision tree response, version 3: outcome 3 (negative value) 354 = Decision tree response, version 3: outcome 4 (negative value) 355 = Decision tree response, version 3: outcome 5 (negative value) 356 = Decision tree response, version 3: outcome 6 (negative value) 357 = Decision tree response, version 3: outcome 7 (negative value) 358 = Decision tree response, version 3: outcome 8 (negative value) 359 = Decision tree response, version 3: outcome 9 (negative value) 360 = Decision tree response, version 3: outcome 10 (negative value) 369 = Decision tree response, version 3: outcome 11 (negative value) 370 = Decision tree response, version 3: outcome 12 (negative value) 371 = Decision tree response, version 3: outcome 13 (negative value) 372 = Decision tree response, version 3: outcome 14 (negative value) 373 = Decision tree response, version 3: outcome 15 (negative value) 374 = Decision tree response, version 3: outcome 16 (negative value) 375 = Decision tree response, version 3: outcome 17 (negative value) 376 = Decision tree response, version 3: outcome 18 (negative value) 377 = Decision tree response, version 3: outcome 19 (negative value) 378 = Decision tree response, version 3: outcome 20 (negative value) 451 = Decision tree response, version 4: outcome 1 (negative value) 452 = Decision tree response, version 4: outcome 2 (negative value) 453 = Decision tree response, version 4: outcome 3 (negative value) 454 = Decision tree response, version 4: outcome 4 (negative value) 455 = Decision tree response, 
version 4: outcome 5 (negative value) 456 = Decision tree response, version 4: outcome 6 (negative value) 457 = Decision tree response, version 4: outcome 7 (negative value) 458 = Decision tree response, version 4: outcome 8 (negative value) 459 = Decision tree response, version 4: outcome 9 (negative value) 460 = Decision tree response, version 4: outcome 10 (negative value) 469 = Decision tree response, version 4: outcome 11 (negative value) 470 = Decision tree response, version 4: outcome 12 (negative value) 471 = Decision tree response, version 4: outcome 13 (negative value) 472 = Decision tree response, version 4: outcome 14 (negative value) 473 = Decision tree response, version 4: outcome 15 (negative value) 474 = Decision tree response, version 4: outcome 16 (negative value) 475 = Decision tree response, version 4: outcome 17 (negative value) 476 = Decision tree response, version 4: outcome 18 (negative value) 477 = Decision tree response, version 4: outcome 19 (negative value) 478 = Decision tree response, version 4: outcome 20 (negative value) 551 = Decision tree response, version 5: outcome 1 (negative value) 552 = Decision tree response, version 5: outcome 2 (negative value) 553 = Decision tree response, version 5: outcome 3 (negative value) 554 = Decision tree response, version 5: outcome 4 (negative value) 555 = Decision tree response, version 5: outcome 5 (negative value) 556 = Decision tree response, version 5: outcome 6 (negative value) 557 = Decision tree response, version 5: outcome 7 (negative value) 558 = Decision tree response, version 5: outcome 8 (negative value) 559 = Decision tree response, version 5: outcome 9 (negative value) 560 = Decision tree response, version 5: outcome 10 (negative value) 569 = Decision tree response, version 5: outcome 11 (negative value) 570 = Decision tree response, version 5: outcome 12 (negative value) 571 = Decision tree response, version 5: outcome 13 (negative value) 572 = Decision tree response, version 5: 
outcome 14 (negative value) 573 = Decision tree response, version 5: outcome 15 (negative value) 574 = Decision tree response, version 5: outcome 16 (negative value) 575 = Decision tree response, version 5: outcome 17 (negative value) 576 = Decision tree response, version 5: outcome 18 (negative value) 577 = Decision tree response, version 5: outcome 19 (negative value) 578 = Decision tree response, version 5: outcome 20 (negative value) 651 = Decision tree response, version 6: outcome 1 (negative value) 652 = Decision tree response, version 6: outcome 2 (negative value) 653 = Decision tree response, version 6: outcome 3 (negative value) 654 = Decision tree response, version 6: outcome 4 (negative value) 655 = Decision tree response, version 6: outcome 5 (negative value) 656 = Decision tree response, version 6: outcome 6 (negative value) 657 = Decision tree response, version 6: outcome 7 (negative value) 658 = Decision tree response, version 6: outcome 8 (negative value) 659 = Decision tree response, version 6: outcome 9 (negative value) 660 = Decision tree response, version 6: outcome 10 (negative value) 669 = Decision tree response, version 6: outcome 11 (negative value) 670 = Decision tree response, version 6: outcome 12 (negative value) 671 = Decision tree response, version 6: outcome 13 (negative value) 672 = Decision tree response, version 6: outcome 14 (negative value) 673 = Decision tree response, version 6: outcome 15 (negative value) 674 = Decision tree response, version 6: outcome 16 (negative value) 675 = Decision tree response, version 6: outcome 17 (negative value) 676 = Decision tree response, version 6: outcome 18 (negative value) 677 = Decision tree response, version 6: outcome 19 (negative value) 678 = Decision tree response, version 6: outcome 20 (negative value) 751 = Decision tree response, version 7: outcome 1 (negative value) 752 = Decision tree response, version 7: outcome 2 (negative value) 753 = Decision tree response, version 7: outcome 3 
(negative value) 754 = Decision tree response, version 7: outcome 4 (negative value) 755 = Decision tree response, version 7: outcome 5 (negative value) 756 = Decision tree response, version 7: outcome 6 (negative value) 757 = Decision tree response, version 7: outcome 7 (negative value) 758 = Decision tree response, version 7: outcome 8 (negative value) 759 = Decision tree response, version 7: outcome 9 (negative value) 760 = Decision tree response, version 7: outcome 10 (negative value) 769 = Decision tree response, version 7: outcome 11 (negative value) 770 = Decision tree response, version 7: outcome 12 (negative value) 771 = Decision tree response, version 7: outcome 13 (negative value) 772 = Decision tree response, version 7: outcome 14 (negative value) 773 = Decision tree response, version 7: outcome 15 (negative value) 774 = Decision tree response, version 7: outcome 16 (negative value) 775 = Decision tree response, version 7: outcome 17 (negative value) 776 = Decision tree response, version 7: outcome 18 (negative value) 777 = Decision tree response, version 7: outcome 19 (negative value) 778 = Decision tree response, version 7: outcome 20 (negative value) 851 = Decision tree response, version 8: outcome 1 (negative value) 852 = Decision tree response, version 8: outcome 2 (negative value) 853 = Decision tree response, version 8: outcome 3 (negative value) 854 = Decision tree response, version 8: outcome 4 (negative value) 855 = Decision tree response, version 8: outcome 5 (negative value) 856 = Decision tree response, version 8: outcome 6 (negative value) 857 = Decision tree response, version 8: outcome 7 (negative value) 858 = Decision tree response, version 8: outcome 8 (negative value) 859 = Decision tree response, version 8: outcome 9 (negative value) 860 = Decision tree response, version 8: outcome 10 (negative value) 869 = Decision tree response, version 8: outcome 11 (negative value) 870 = Decision tree response, version 8: outcome 12 (negative 
value) 871 = Decision tree response, version 8: outcome 13 (negative value) 872 = Decision tree response, version 8: outcome 14 (negative value) 873 = Decision tree response, version 8: outcome 15 (negative value) 874 = Decision tree response, version 8: outcome 16 (negative value) 875 = Decision tree response, version 8: outcome 17 (negative value) 876 = Decision tree response, version 8: outcome 18 (negative value) 877 = Decision tree response, version 8: outcome 19 (negative value) 878 = Decision tree response, version 8: outcome 20 (negative value) RANGE CARD RESPONSES FOR NEGATIVE NUMBERS 951 = Range card response via [F9]: range A. -$1 to -$100 952 = Range card response via [F9]: range B. -$101 to -$500 953 = Range card response via [F9]: range C. -$501 to -$1,000 954 = Range card response via [F9]: range D. -$1,001 to -$2,500 955 = Range card response via [F9]: range E. -$2,501 to -$5,000 956 = Range card response via [F9]: range F. -$5,001 to -$7,500 957 = Range card response via [F9]: range G. -$7,501 to -$10,000 958 = Range card response via [F9]: range H. -$10,001 to -$25,000 959 = Range card response via [F9]: range I. -$25,001 to -$50,000 960 = Range card response via [F9]: range J. -$50,001 to -$75,000 961 = Range card response via [F9]: range K. -$75,001 to -$100,000 962 = Range card response via [F9]: range L. -$100,001 to -$250,000 963 = Range card response via [F9]: range M. -$250,001 to -$500,000 964 = Range card response via [F9]: range N. -$500,001 to -$1,000,000 965 = Range card response via [F9]: range O. -$1,000,001 to -$5,000,000 966 = Range card response via [F9]: range P. -$5,000,001 to -$10,000,000 967 = Range card response via [F9]: range Q. -$10,000,001 to -$25,000,000 968 = Range card response via [F9]: range R. -$25,000,001 to -$50,000,000 969 = Range card response via [F9]: range S. -$50,000,001 to -$100,000,000 970 = Range card response via [F9]: range T. Less than -$100,000,000 971 = Range card response via DKDOL: range A. 
-$1 to -$100 972 = Range card response via DKDOL: range B. -$101 to -$500 973 = Range card response via DKDOL: range C. -$501 to -$1,000 974 = Range card response via DKDOL: range D. -$1,001 to -$2,500 975 = Range card response via DKDOL: range E. -$2,501 to -$5,000 976 = Range card response via DKDOL: range F. -$5,001 to -$7,500 977 = Range card response via DKDOL: range G. -$7,501 to -$10,000 978 = Range card response via DKDOL: range H. -$10,001 to -$25,000 979 = Range card response via DKDOL: range I. -$25,001 to -$50,000 980 = Range card response via DKDOL: range J. -$50,001 to -$75,000 981 = Range card response via DKDOL: range K. -$75,001 to -$100,000 982 = Range card response via DKDOL: range L. -$100,001 to -$250,000 983 = Range card response via DKDOL: range M. -$250,001 to -$500,000 984 = Range card response via DKDOL: range N. -$500,001 to -$1,000,000 985 = Range card response via DKDOL: range O. -$1,000,001 to -$5,000,000 986 = Range card response via DKDOL: range P. -$5,000,001 to -$10,000,000 987 = Range card response via DKDOL: range Q. -$10,000,001 to -$25,000,000 988 = Range card response via DKDOL: range R. -$25,000,001 to -$50,000,000 989 = Range card response via DKDOL: range S. -$50,000,001 to -$100,000,000 990 = Range card response via DKDOL: range T. 
Less than -$100,000,000 RESPONDENT-PROVIDED DOLLAR RANGE FOR NEGATIVE NUMBERS 991 = Upper and lower bounds given (negative amount): Reached via [F9] 992 = Upper bound given, lower bound missing (negative amount): Reached via [F9] 993 = Lower bound given, upper bound missing (negative amount): Reached via [F9] 994 = Upper and lower bounds given (negative amount): Reached via DK8/9 995 = Upper bound given, lower bound missing (negative amount): Reached via DK8/9 996 = Lower bound given, upper bound missing (negative amount): Reached via DK8/9 OTHER RANGE RESPONSES THAT YIELDED NO NUMERICAL BOUNDING INFORMATION: ALL VARIABLES WITH J-CODE VALUES BELOW THIS POINT INITIALLY CONTAIN MISSING VALUE CODES AND ALL VARIABLES WITH RANGE J-CODE VALUES ABOVE THIS POINT INITIALLY CONTAIN A RANGE MID-POINT OR OTHER SUCH VALUE 1000 = R answered DK/REF to main $ question, and refused following question requesting a range from the range card (negative amount) 1001 = R answered DK/Ref to main $ question, and refused following question requesting type of range (negative amount) 1002 = R answered DK to main $ question, and DK (entered with a function key) to the following question requesting a range from the range card (negative amount) 1003 = R answered Ref to main $ question, and DK (entered with a function key) to the following question requesting a range from the range card (negative amount) 1094 = Exit decision tree at Q1 with Ref, any version: outcome 21 1095 = Exit Decision tree at Q1 with DK, any version: outcome 22 1100 = R answered DK/REF to main $ question, and refused following question requesting a range from the range card 1101 = R answered DK/Ref to main $ question, and refused following question requesting type of range 1102 = R answered DK to main $ question, and DK (entered with a function key) to the following question requesting a range from the range card 1103 = R answered Ref to main $ question, and DK (entered with a function key) to the following question 
requesting a range from the range card
1104 = Interviewer entered R's initial response as [F9], but R subsequently did not provide any range information within PROBE$

OTHER CODES FOR MISSING DATA

2050 = Original response was DK.
2052 = Original response missing as a result of missing information for a higher-order question. For example, if the respondent refused to say whether or not the family had a checking account, then the number of checking accounts would be missing in this sense. In a few circumstances a different procedure is followed: (1) if a dollar variable was missing and the answers in PROBE$ yielded a missing value, and that variable has an associated frequency question that is only asked when a positive value of the dollar variable is reported, then the frequency variable is given the same J-code as the dollar variable; (2) for clusters of variables containing a dollar amount and percent options (for example, employer match rate percentage contribution and dollar amount of contribution to a pension plan) that can be computed from each other (perhaps given some other variable--in the case of the example, this other variable would be the worker's wage).
2053 = Original response was refused.
2054 = Original response was "some, DK how many" (see B6).
2056 = Missing value determined from verbatim response by NORC coders.
2060 = Unresolved data problem (none should remain in final data set).
2079 = Data missing because of questionnaire error.
2080 = Recode variable, missing because data not collected for sub-group, data to be imputed.
2081 = Recode variable, some, but not all components originally missing.
2082 = Recode variable, all components originally missing.
2097 = Override of reported information with (at least partially) imputed data.
2098 = Override of reported/inap./other information with a missing value.
2099 = Used for absent spouse for J104 or J105 when X104 or X105 < 0.
3000 = Data missing because R broke off the interview (each of these cases was reviewed to be sure that sufficient information is reported that the case can count as a "partial accepted as complete").
3001 = CAPI program error yielding a missing value.
3002 = Temporary value given to variables containing illegal values. These will all be resolved in editing and converted to other existing codes (includes "range U").
3003 = Illegal zero response.
3004 = Uninformative/irrelevant verbatim response.
3005 = Data not available (applies to data from survey screener).
3500 = Data set to missing and imputed for disclosure avoidance.

General instructions for J variable coding for recoded variables:

When a recoded variable is taken directly from another single X-variable, it should have the same J-variable code.

A recoded variable may come either from a single variable among the original X-variables or from a calculation based on some number of X-variables, and it is important to distinguish the information content of the two cases in the J-variables. As noted above, when the value is taken directly, the J-variable should have exactly the same value as the X-variable's shadow J-variable. However, when some calculation is involved, this should be reflected in the J-variable -- codes 8, 2081, and 2082.

When a recoded variable cannot be computed because some part of the underlying information was not collected for some subset of cases, the recoded variable's J-variable should be coded 9 or 2080.
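The numbering of the range-type J-codes listed above is regular enough to be decoded mechanically. The following Python sketch is an illustration only and is not part of the SCF software: the names JCode and decode_jcode are hypothetical, and the treatment of version 1 positive codes (101-128) assumes they follow the same pattern as versions 2-8, since that part of the list is not reproduced here. Respondent-provided ranges (941-946, 991-996) and all missing-data codes are simply reported as "other".

```python
from typing import NamedTuple, Optional


class JCode(NamedTuple):
    """Decoded pieces of a range-type J-code (hypothetical helper type)."""
    kind: str                # "decision_tree", "range_card", or "other"
    sign: int                # +1 positive amount, -1 negative, 0 not applicable
    version: Optional[int]   # decision tree version 1-8, else None
    outcome: Optional[int]   # decision tree outcome 1-20, else None
    letter: Optional[str]    # range card letter A-T, else None
    route: Optional[str]     # "[F9]" or "DKDOL" for range cards, else None


def decode_jcode(code: int) -> JCode:
    """Classify a J-code using the numbering pattern in the code list."""
    # Range card: 901-940 positive, 951-990 negative; within each block the
    # first 20 codes were reached via [F9] and the next 20 via DKDOL.
    if 901 <= code <= 940 or 951 <= code <= 990:
        sign = 1 if code <= 940 else -1
        offset = code - (901 if sign == 1 else 951)
        route = "[F9]" if offset < 20 else "DKDOL"
        letter = chr(ord("A") + offset % 20)
        return JCode("range_card", sign, None, None, letter, route)

    # Decision tree: hundreds digit is the version; the remainder is 01-10
    # or 19-28 for positive values, and the same plus 50 for negative values.
    version, rem = divmod(code, 100)
    if 1 <= version <= 8:
        sign = 1
        if rem >= 51:
            sign, rem = -1, rem - 50
        if 1 <= rem <= 10:
            return JCode("decision_tree", sign, version, rem, None, None)
        if 19 <= rem <= 28:
            # Outcomes 11-20 are stored as 19-28.
            return JCode("decision_tree", sign, version, rem - 8, None, None)

    # Everything else (respondent-provided ranges, missing-data codes, ...).
    return JCode("other", 0, None, None, None, None)
```

For instance, decode_jcode(933) reports a positive range card response, range M, reached via DKDOL, matching the entry for code 933 in the list above.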
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
ANALYSIS WEIGHTS
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

Because the SCF sample is not an equal-probability design, weights play a critical role in interpreting the survey data. The main data set contains the final nonresponse-adjusted sampling weights. These weights are intended to compensate for unequal probabilities of selection in the original design and for unit nonresponse (failure to obtain an interview). The weight (X42001) is a partially design-based weight constructed at the Federal Reserve using original selection probabilities and frame information along with aggregate control totals estimated from the Current Population Survey.

This weight is a relatively minor revision of the consistent weight series (X42000) maintained for the SCFs beginning with 1989 (for a detailed discussion of these weights, see "Consistent Weight Design for the 1989, 1992, and 1995 SCFs and the Distribution of Wealth," by Arthur B. Kennickell and R. Louise Woodburn, Review of Income and Wealth, Series 45, Number 2, June 1999, pp. 193-215 or the longer version given on the SCF web site at http://www.federalreserve.gov/pubs/oss/oss2/method.html). The nature of the revisions to the consistent weights is described in "Revisions to the SCF Weighting Methodology: Accounting for Race/Ethnicity and Homeownership" (Arthur B. Kennickell, January 1999, http://www.federalreserve.gov/pubs/oss/oss2/method.html).
A version of the revised weight has been computed for all the surveys beginning with 1989, and this variable has been added to the public versions of the SCF data sets.

Users should be aware that the population defined by the weights for *each implicate* (see above) is 112.1 million households: the sum of the weights over all sample cases and imputation replicates is equal to five times the number of households in the sample universe.

Although the weights should produce reliable results at the level of broad aggregates (e.g., net worth and income), it is important to note that many of the variables collected in the SCF are highly skewed in their distributions and that many such variables will apply to only a relatively small fraction of the sample; thus, estimates of characteristics of such variables may be distorted by outliers. In the SCF group at the Federal Reserve, we routinely review our calculations for the presence of overly influential outliers, and robust techniques are applied when appropriate. We encourage other users to exercise similar care in analyzing the data.

The issue of weighting in regressions has long been controversial. Users of the SCF may find two references particularly useful:

(1) Analysis of Complex Surveys, C.J. Skinner, D. Holt, and T.M.F. Smith (editors), John Wiley and Sons, 1989 (see particularly pages 8-10, 154-157, and 286-287).
(2) The Analysis of Household Surveys: A Microeconometric Approach to Development Policy, Angus Deaton, Johns Hopkins University Press, 1997 (see particularly pages 67-73).

At the least, users should think carefully about the effects of weights in their particular models. Weighted estimates may be dramatically less efficient than unweighted estimates. If one is interested in estimating descriptions of the population--rather than "structural models"--there are some clear justifications for weighting and making estimates of sampling error (see below) to test for statistical significance.
If weights make a substantial difference in regression estimates, analysts may want to consider the possibility that their models omit some key structure that could be controlled for in a way other than weighting.

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
SAMPLING ERROR
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

The SCF is designed as a scientific instrument for the measurement of behavior. However, even under ideal operational conditions, the measurements of the survey are limited in a fundamental way by the fact that it is based on a sample of respondents rather than the entire population. Variability of estimates based on sample data can be estimated.

Because we are unable to release any of the basic sample information about the cases in the data set, users are unable on their own to compute reasonable estimates of the sampling variances of their estimates. To facilitate such estimation, we provide a file of replicate weights and multiplicity factors corresponding to X42001. Using detailed information about the original sample design, we selected 999 sample replicates from the final set of completed cases in a way intended to capture the important dimensions of sample variation (for details see "Weighting design for the 1992 Survey of Consumer Finances," Arthur Kennickell, Douglas McManus and Louise Woodburn, December 1996, http://www.federalreserve.gov/pubs/oss/oss2/method.html). For each survey case and each replicate, the file contains a weight (WT1B1-WT1B999) and the number of times the case was selected in the replicate (MM1-MM999). We computed weights for each replicate using exactly the same procedures we used for the main weights. Replicate weights were computed only for the first implicate of each case.
For many purposes, users of the replicate weight files will probably want to multiply the weight times the multiplicity: in all cases the sum of the weights times the corresponding multiplicities of the cases equals the total number of households. To estimate the sampling variance of the mean of family income, for example, a user would estimate the mean 999 times using the replicate weights and compute the standard deviation of those 999 estimates, which serves as the standard error due to sampling. An estimate of the total standard error attributable to imputation and sampling is given by SQRT((6/5)*imputation variance + sampling variance).

A simple SAS program to compute the standard error due to sampling and imputation for the mean and median of a given variable is provided below. This program may be adapted easily for other types of calculations. For example, to compute the standard error of a proportion, create a zero/one dummy variable to indicate the presence of the item; the standard error of the mean will be the correct standard error of the proportion. To reduce the computer memory requirement, the program computes sampling error using blocks of about 100 replicate weights rather than the full set at once. Users with large amounts of RAM may wish to increase the size of these blocks, and those with smaller amounts may wish to decrease the size.
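As an illustration only, the calculation described above can also be sketched outside SAS. The following Python fragment is not part of the SCF programs: the function names (weighted_mean, effective_weights, combined_std_error) and the toy inputs are hypothetical, and it covers only the mean, not the median. It assumes that the estimate is computed once per implicate with the main weight and once per replicate with the replicate weight times the multiplicity (missing values treated as zero), and it then combines the two variances as SQRT((6/5)*imputation variance + sampling variance).

```python
import math
from statistics import variance  # sample variance, divisor n - 1


def _nonneg(x):
    """Treat a missing (None) or negative entry as zero."""
    return 0.0 if x is None or x < 0 else float(x)


def effective_weights(rep_weights, multiplicities):
    """Replicate weight times multiplicity for each case in one replicate."""
    return [_nonneg(w) * _nonneg(m) for w, m in zip(rep_weights, multiplicities)]


def weighted_mean(values, weights):
    """Weighted mean; raises ZeroDivisionError if every weight is zero."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def combined_std_error(implicate_estimates, replicate_estimates):
    """SQRT(((m + 1) / m) * imputation variance + sampling variance)."""
    m = len(implicate_estimates)  # 5 implicates in the SCF
    imputation_var = variance(implicate_estimates)
    sampling_var = variance(replicate_estimates)
    return math.sqrt((m + 1) / m * imputation_var + sampling_var)


# Hypothetical toy inputs: three first-implicate cases and three replicates
# (the actual replicate weight file has 999 replicates).
values = [10.0, 20.0, 40.0]
rep_weights = [[2.0, None, 1.0], [1.0, 3.0, None], [None, 1.0, 1.0]]
multiplicities = [[1, None, 2], [2, 1, None], [None, 2, 2]]

replicate_means = [
    weighted_mean(values, effective_weights(w, m))
    for w, m in zip(rep_weights, multiplicities)
]

# Hypothetical means of the same variable, one per implicate, main weight.
implicate_means = [22.0, 24.0, 23.0, 25.0, 21.0]

total_se = combined_std_error(implicate_means, replicate_means)
print("combined standard error of the mean:", total_se)
```

As with the SAS macro, a replicate in which no case carries positive weight (for example, for a very small sub-population) makes the calculation fail rather than return a value.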
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;
* MACRO MEANIT;
* AK November 12, 2002;
* DSN specifies the name of the data set to be used (the data set should contain the following: the main weight renamed as WGT0, a set of variables WGT1-WGT999 equal to the replicate weights multiplied by the multiplicity factors, a variable for which one wishes to compute the standard error due to imputation and sampling for the mean and median, and a variable IMPLIC equal to the implicate number of each case)
  VAR contains the name of the variable for which one desires standard errors
  PFLAG: blank prints interim statistics/any character string (e.g., NO) suppresses printing
  WHERE: defines subsets of data (use IML conventions, e.g., ((X333=3 | X444=4) & X555=5 & X666^=6));
* WARNING: this MACRO is not intended to be used with subsets of the full survey data where the population total varies across subsetted implicates: to use this MACRO to make calculations for a subset of the full data set, invoke the WHERE statement;
* NOTE: the calculation excludes observations with missing values. Thus, if one wants to make the calculation for only non-INAP values, a convenient short cut might be to set all such values (normally zero in the main SCF database) to a missing value (a WHERE condition would also work). The program assumes that missing value patterns are consistent across implicates--if this is not the case, a WHERE condition should be used;
* WARNING: if one uses this MACRO to compute variances for very small sub-populations, there is a chance that some of the replicates may contain no cases where the condition defining the sub-population holds.
In this case, the program will return a fatal error; %MACRO MEANIT(DSN=,VAR=NW2,PFLAG=,WHERE=); PROC SORT DATA=&DSN; BY &VAR; RUN; * compute pooled (over implicates) global mean/median; PROC UNIVARIATE DATA=&DSN; %IF (&WHERE NE ) %THEN %DO; WHERE (&WHERE & &VAR>.Z); %END; %ELSE %DO; WHERE (&VAR>.Z); %END; FREQ WGT0; VAR &VAR; RUN; PROC IML WORKSPACE=9000 SYMSIZE=5000; RESET LOG LINESIZE=78; PRINT "CALCULATION FOR &VAR"; * first imputation variance; EDIT &DSN; TEMP={IMPLIC &VAR WGT0}; %IF (&WHERE EQ ) %THEN %DO; READ ALL VAR TEMP WHERE (&VAR>.Z) INTO MDATA; %END; %ELSE %DO; READ ALL VAR TEMP WHERE (&WHERE) INTO MDATA; %END; * total population; %IF (&WHERE EQ ) %THEN %DO; POP=SUM(MDATA[,3])/5; %END; * create matrix to hold values of means/medians by implicates; IM=SHAPE(0,1,5); ID=SHAPE(0,1,5); * compute mean/median; DO I=1 TO 5; IMP=MDATA[LOC(MDATA[,1]=I),2:3]; * compute mean; MM=IMP[,1]#IMP[,2]; %IF (&WHERE NE ) %THEN %DO; POP=SUM(IMP[,2]); %END; IM[1,I]=MM[+,]/POP; * compute median; IMP[,2]=CUSUM(IMP[,2])/POP; ID[1,I]=IMP[MIN(LOC(IMP[,2]>=.5)),1]; FREE IMP MM; END; IMEAN=IM[,+]/5; IMEDIAN=ID[,+]/5; PRINT "MEAN OVER IMPLICATES " IMEAN; PRINT "MEDIAN OVER IMPLICATES " IMEDIAN; FREE MDATA IMEAN IMEDIAN; %IF (&PFLAG EQ ) %THEN %DO; PRINT IM ID; %END; * next sampling variance; * create matrix to hold values of means/medians by replicates; RM=SHAPE(0,1,999); RD=SHAPE(0,1,999); %DO I=1 %TO 10; %IF (&PFLAG EQ ) %THEN %PUT CLUMP NUMBER &I; %IF (&I EQ 1) %THEN %DO; %LET TOP=99; %LET BOT=1; %LET LEN=100; %END; %ELSE %DO; %LET BOT=%EVAL(&TOP+1); %LET TOP=%EVAL(&TOP+100); %LET LEN=101; %END; %LET WSTR=%STR(); %DO J=&BOT %TO &TOP; %LET WSTR=&WSTR WGT&J; %END; EDIT &DSN; TEMP={&VAR &WSTR}; %IF (&WHERE EQ ) %THEN %DO; READ ALL VAR TEMP WHERE (IMPLIC=1 & &VAR>.Z) INTO MDATA; %END; %ELSE %DO; READ ALL VAR TEMP WHERE (IMPLIC=1 & &WHERE) INTO MDATA; %END; * compute means; MEAN=MDATA[,2:&LEN]#MDATA[,1]; %IF (&WHERE NE ) %THEN %DO; POP=MDATA[+,2:&LEN]; 
RM[,&BOT:&TOP]=MEAN[+,]/POP[,1:&LEN-1]; %END; %ELSE %DO; RM[,&BOT:&TOP]=MEAN[+,]/POP; %END; * compute medians; DO I=2 TO &LEN; %IF (&WHERE NE ) %THEN %DO; MDATA[,I]=CUSUM(MDATA[,I])/POP[I-1]; %END; %ELSE %DO; MDATA[,I]=CUSUM(MDATA[,I])/POP; %END; RD[&BOT+I-2]=MDATA[MIN(LOC(MDATA[,I]>=.5)),1]; END; FREE MDATA; %END; %IF (&PFLAG EQ ) %THEN %DO; PRINT RM RD; %END; * finally, compute standard error wrt imputation/sampling; * (X-X-bar)**2/(n-1); IVM=(IM-IM[,+]/5)##2; IVM=IVM[,+]/4; IVD=(ID-ID[,+]/5)##2; IVD=IVD[,+]/4; RVM=(RM-RM[,+]/999)##2; RVM=RVM[,+]/998; RVD=(RD-RD[,+]/999)##2; RVD=RVD[,+]/998; * SQRT((((ni+1)/ni))*(SIGMAI**2) + SIGMAR**2); TVM=SQRT((6/5)*IVM+RVM); TVD=SQRT((6/5)*IVD+RVD); IVM=SQRT(IVM); IVD=SQRT(IVD); RVM=SQRT(RVM); RVD=SQRT(RVD); PRINT "STD DEV IMPUTATION: MEAN: " IVM " MEDIAN: " IVD; PRINT "STD DEV SAMPLING: MEAN: " RVM " MEDIAN: " RVD; PRINT "COMBINED STD DEV: MEAN: " TVM " MEDIAN: " TVD; QUIT; %MEND MEANIT; * create data set from main data set and replicate weight file; DATA DAT(KEEP=NW IMPLIC WGT0-WGT999); MERGE xxx.main_ds(KEEP=Y1 X42001 ...) 
xxx.rep_wgts(KEEP=Y1 MM1-MM999 WT1B1-WT1B999);
BY Y1;
* multiply replicate weights by the multiplicity;
ARRAY MULT {*} MM1-MM999;
ARRAY RWGT {*} WT1B1-WT1B999;
ARRAY WGTS {*} WGT1-WGT999;
DO I=1 TO DIM(MULT);
  * take max of multiplicity/weight: where cases not selected for a replicate, there are missing values in these variables;
  WGTS{I}=MAX(0,MULT{I})*MAX(0,RWGT{I});
END;
WGT0=X42001;
* define implicate number of case;
IMPLIC=Y1-10*YY1;
* define net worth (for example);
NW=.......;
RUN;
* run the macro;
%MEANIT(DSN=DAT,VAR=NW);
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;
*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*;

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
DISCLOSURE REVIEW
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

Variables not included in the public data set are available only to the project staff working on the SCF. Not even other researchers at the Federal Reserve Board are allowed access to the non-public data. There is no provision whatsoever for allowing direct access to such information for external researchers. Occasionally, when a researcher outside of the SCF project staff has a topic that complements the research interest of the Federal Reserve Board and when there has been time for the project staff to engage with the researcher, the project staff have run, on a limited basis against the internal data, computer programs that were specified and fully tested by the external researcher. A special case must be made for each such instance. THERE IS NO ROUTINE PROVISION FOR ACCESS TO THE RESTRICTED SCF DATA.

A paramount goal of the survey is to protect the privacy of the participants, who generously shared their personal information.
In light of this goal, the data in this release have been systematically altered by several means to minimize the possibility of identifying any survey respondent. For some discrete variables, small or unusual cells were collapsed as noted in the individual variable descriptions below. Continuous variables were rounded. Data were also blurred by other means intentionally not specified. In addition, a number of other cases were identified for more extensive treatment. Some of these cases were selected on the basis of extreme or unusual data values; other cases were selected at random. For each of these cases, a selection of critical variables was set to missing and statistically imputed subject to constraints designed to ensure that any distortions induced in key population statistics would be minimal. Where relevant, the codebook provides more detailed information on cell collapsing and other techniques.

By design, the SCF sample excludes people who are included in the Forbes Magazine list of the 400 wealthiest people in the U.S. (see references in "SAMPLE DESIGN" above). However, there are several reasons why respondents with wealth at this level could appear in the sample anyway. In the 2004 survey, there were 3 observations that had net worth at least equal to the minimum level needed to qualify for the Forbes list. Because it would be very difficult to obscure sufficiently the identity of such people without rendering their data virtually useless, it was decided to remove them from the public version of the data set. Thus, the public version of the data set contains 4,519 of the 4,522 observations in the full data set.
It is important to note that aside from the cell collapsing, there is no key in this codebook or in the data set that would allow users to identify with certainty either which data items have been smoothed or otherwise altered, or which cases were selected for imputation of critical values (that is, the shadow variables in this data set may not always reflect the true original status of every variable). Although this blurring of the data will have some effect on analysis, that effect should be negligible in most cases. For further details on the procedures taken to protect the identity of respondents, see "Analyzing the Disclosure Review Procedures for the 1995 Survey of Consumer Finances" (Gerhard Fries, Barry W. Johnson, and R. Louise Woodburn, September 1997, http://www.federalreserve.gov/pubs/oss/oss2/method.html) and "Multiple Imputation and Disclosure Protection: The Case of the 1995 SCF" (Arthur B. Kennickell, November 1997, http://www.federalreserve.gov/pubs/oss/oss2/method.html).

The disclosure protections applied to the data are the product of an agreement among the Federal Reserve Board, NORC, and SOI. Users who feel that the restrictions imposed on the public data set are too constraining are encouraged to submit written proposals for expanded data release, and those requests will be given serious consideration in the release of data from future surveys.
Note that dollar variables in the public data set have been rounded according to the following scheme which preserves the population mean on average: * All dollar variables except wages; ARRAY AMT {*} X412 X413 X414 X420 X421 X423 X424 X426 X427 X429 X430 X7575 X505 X510 X513 X518 X521 X526 X602 X604 X607 X612 X614 X617 X619 X623 X627 X631 X635 X703 X708 X716 X717 X721 X7138 X804 X805 X808 X813 X812 X904 X905 X908 X913 X912 X1004 X1005 X1008 X1013 X1012 X1035 X1039 X1040 X1044 X7141 X1108 X1109 X1104 X7142 X1119 X1120 X1115 X7143 X1130 X1131 X1126 X1136 X8401 X1202 X1206 X1210 X1211 X1215 X1219 X1220 X1224 X1405 X1408 X1409 X1410 X1415 X1417 X1505 X1508 X1509 X1510 X1515 X1517 X1605 X1608 X1609 X1610 X1615 X1617 X1619 X8402 X1621 X8404 X1706 X1709 X1714 X1715 X1718 X1723 X1722 X1730 X1806 X1809 X1814 X1815 X1818 X1823 X1822 X1830 X1906 X1909 X1914 X1915 X1918 X1923 X1922 X1930 X2002 X2003 X2006 X2007 X2010 X2012 X2013 X2016 X2017 X2020 X8406 X8407 X8410 X8411 X8414 X8416 X8417 X8420 X8421 X8424 X3121 X3124 X3126 X3129 X3130 X3131 X3132 X3221 X3224 X3226 X3229 X3230 X3231 X3232 X3321 X3324 X3326 X3329 X3330 X3331 X3332 X3335 X8425 X3336 X8426 X3337 X8427 X3408 X3409 X3410 X3412 X3413 X3414 X3416 X3417 X3418 X3420 X3421 X3422 X3424 X3425 X3426 X3428 X8452 X3429 X8453 X3430 X8454 X2105 X2112 X2117 X8428 X2209 X2213 X2214 X2218 X2309 X2313 X2314 X2318 X2409 X2413 X2414 X2418 X7158 X7162 X7164 X7169 X2422 X8430 X2424 X8432 X2425 X8433 X2506 X2510 X2514 X2515 X2519 X2606 X2610 X2614 X2615 X2619 X2623 X8435 X2625 X8437 X2626 X8438 X7805 X7815 X7817 X7824 X7828 X7838 X7840 X7847 X7851 X7861 X7863 X7870 X7905 X7915 X7917 X7924 X7928 X7938 X7940 X7947 X7951 X7961 X7963 X7970 X7179 X8440 X7180 X8441 X2714 X2718 X2719 X2723 X2731 X2735 X2736 X2740 X2814 X2818 X2819 X2823 X2831 X2835 X2836 X2840 X2914 X2918 X2919 X2923 X2931 X2935 X2936 X2940 X7183 X8443 X7184 X8444 X3024 X3027 X3029 X7187 X3506 X3510 X3514 X3518 X3522 X3526 X3529 X8446 X6551-X6554 X6558-X6562 X6566-X6570 X6574 
X6756-X6758 X3721 X3730 X3736 X3742 X3748 X3754 X3760 X3765 X8473 X3822 X3824 X3826 X3828 X3830 X7787 X6704 X3833 X3835 X3902 X3906 X7635 X3908 X7636 X3910 X7637 X7633 X7638 X7634 X7639 X6705 X6706 X3915 X3922 X7641 X3918 X3920 X3930 X3932 X6577 X6578 X6580 X8480 X6587 X6588 X6590 X8490 X4003 X4005 X4006 X4010 X4011 X4014 X4018 X4022 X4026 X4030 X4032 X11015 X11023 X11027 X11028 X11032 X11042 X11045 X11051 X11056 X11115 X11123 X11127 X11128 X11132 X11142 X11145 X11151 X11156 X11215 X11223 X11227 X11228 X11232 X11242 X11245 X11251 X11256 X11259 X8465 X11315 X11323 X11327 X11328 X11332 X11342 X11345 X11351 X11356 X11415 X11423 X11427 X11428 X11432 X11442 X11445 X11451 X11456 X11515 X11523 X11527 X11528 X11532 X11542 X11545 X11551 X11556 X11559 X8466 X5306 X5311 X6462 X6464 X5318 X6467 X6469 X5326 X6472 X6474 X5334 X6477 X6479 X5418 X6482 X6484 X5426 X6487 X6489 X5434 X6957 X8467 X6958 X8468 X5504 X5507 X5510 X5513 X5516 X5519 X6806 X8457 X5604 X5608 X6965 X5612 X5616 X6971 X5620 X5624 X6977 X5628 X5632 X6983 X5636 X5640 X6989 X5644 X5648 X6995 X6997 X8470 X6998 X8471 X5702 X5704 X5706 X5708 X5710 X5712 X5714 X5716 X5718 X5720 X5722 X5724 X5729 X7362 X5732 X5734 X5751 X7651 X7652 X5804 X5809 X5814 X5818 X8451 X5821 X5823 X5926 X6650 X6403 X6415 X6418 X6421 X6432 X6436 X6437 X6439 X8163 X8164 X8166 X8167 X8168 X8188; DO I = 1 TO DIM(AMT); IF (0 < AMT{I} < 5) THEN AMT{I}=1; ELSE IF (5 <= AMT{I} < 1000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT{I},10); IF (RAN>PROB/10) THEN AMT{I}=10*(INT(AMT{I}/10)); ELSE AMT{I}=10*(1+INT(AMT{I}/10)); IF AMT{I}=0 THEN AMT{I}=5; END; ELSE IF (1000 <= AMT{I} < 10000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT{I},100); IF (RAN>PROB/100) THEN AMT{I}=100*(INT(AMT{I}/100)); ELSE AMT{I}=100*(1+INT(AMT{I}/100)); END; ELSE IF (10000 <= AMT{I} < 1000000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT{I},1000); IF (RAN>PROB/1000) THEN AMT{I}=1000*(INT(AMT{I}/1000)); ELSE AMT{I}=1000*(1+INT(AMT{I}/1000)); END; ELSE IF (1000000 <= AMT{I}) THEN DO; 
RAN=UNIFORM(5555555); PROB=MOD(AMT{I},10000); IF (RAN>PROB/10000) THEN AMT{I}=10000*(INT(AMT{I}/10000)); ELSE AMT{I}=10000*(1+INT(AMT{I}/10000)); END; ELSE IF (-1000 <= AMT{I} < - 5) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT{I},10); IF (RAN>PROB/10) THEN AMT{I}=10*(INT(AMT{I}/10)); ELSE AMT{I}=10*(1+INT(AMT{I}/10)); END; ELSE IF (-10000 <= AMT{I} < -1000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT{I},100); IF (RAN>PROB/100) THEN AMT{I}=100*(INT(AMT{I}/100)); ELSE AMT{I}=100*(1+INT(AMT{I}/100)); END; ELSE IF (-1000000 < AMT{I} < -10000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT{I},1000); IF (RAN>PROB/1000) THEN AMT{I}=1000*(INT(AMT{I}/1000)); ELSE AMT{I}=1000*(1+INT(AMT{I}/1000)); END; ELSE IF .Z < AMT{I} <= -1000000 THEN AMT{I}=-1000000; END; * wages: special treatment for hourly wages <=25; ARRAY AMT2 {*} X4112 X4131 X4509 X4520 X4532 X4540 X4605 X4613 X4712 X4731 X5109 X5120 X5132 X5140 X5205 X5213; ARRAY PER2 {*} X4113 X4132 X4510 X4521 X4533 X4541 X4606 X4614 X4713 X4732 X5110 X5121 X5133 X5141 X5206 X5214; DO I=1 TO DIM(AMT2); IF PER2{I}=18 THEN DO; IF (AMT2{I} < 25 AND AMT2{I} > 0) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},.1); IF (RAN>PROB/.1) THEN AMT2{I}=.1*(INT(AMT2{I}/.1)); ELSE AMT2{I}=.1*(1+INT(AMT2{I}/.1)); END; ELSE IF (25 <= AMT2{I} < 1000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},10); IF (RAN>PROB/10) THEN AMT2{I}=10*(INT(AMT2{I}/10)); ELSE AMT2{I}=10*(1+INT(AMT2{I}/10)); END; ELSE IF (1000 <= AMT2{I} < 10000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},100); IF (RAN>PROB/100) THEN AMT2{I}=100*(INT(AMT2{I}/100)); ELSE AMT2{I}=100*(1+INT(AMT2{I}/100)); END; ELSE IF (10000 <= AMT2{I} < 1000000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},1000); IF (RAN>PROB/1000) THEN AMT2{I}=1000*(INT(AMT2{I}/1000)); ELSE AMT2{I}=1000*(1+INT(AMT2{I}/1000)); END; ELSE IF (1000000 <= AMT2{I}) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},10000); IF (RAN>PROB/10000) THEN AMT2{I}=10000*(INT(AMT2{I}/10000)); ELSE 
AMT2{I}=10000*(1+INT(AMT2{I}/10000)); END; ELSE IF (-1000 <= AMT2{I} < - 5) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},10); IF (RAN>PROB/10) THEN AMT2{I}=10*(INT(AMT2{I}/10)); ELSE AMT2{I}=10*(1+INT(AMT2{I}/10)); END; ELSE IF (-10000 <= AMT2{I} < -1000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},100); IF (RAN>PROB/100) THEN AMT2{I}=100*(INT(AMT2{I}/100)); ELSE AMT2{I}=100*(1+INT(AMT2{I}/100)); END; ELSE IF (-1000000 < AMT2{I} < -10000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},1000); IF (RAN>PROB/1000) THEN AMT2{I}=1000*(INT(AMT2{I}/1000)); ELSE AMT2{I}=1000*(1+INT(AMT2{I}/1000)); END; ELSE IF .Z < AMT2{I} <= -1000000 THEN AMT2{I}=-1000000; END; ELSE DO; IF (0 < AMT2{I} < 5) THEN AMT2{I}=1; ELSE IF (5 <= AMT2{I} < 1000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},10); IF (RAN>PROB/10) THEN AMT2{I}=10*(INT(AMT2{I}/10)); ELSE AMT2{I}=10*(1+INT(AMT2{I}/10)); END; ELSE IF (1000 <= AMT2{I} < 10000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},100); IF (RAN>PROB/100) THEN AMT2{I}=100*(INT(AMT2{I}/100)); ELSE AMT2{I}=100*(1+INT(AMT2{I}/100)); END; ELSE IF (10000 <= AMT2{I} < 1000000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},1000); IF (RAN>PROB/1000) THEN AMT2{I}=1000*(INT(AMT2{I}/1000)); ELSE AMT2{I}=1000*(1+INT(AMT2{I}/1000)); END; ELSE IF (1000000 <= AMT2{I}) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},10000); IF (RAN>PROB/10000) THEN AMT2{I}=10000*(INT(AMT2{I}/10000)); ELSE AMT2{I}=10000*(1+INT(AMT2{I}/10000)); END; ELSE IF (-1000 <= AMT2{I} < - 5) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},10); IF (RAN>PROB/10) THEN AMT2{I}=10*(INT(AMT2{I}/10)); ELSE AMT2{I}=10*(1+INT(AMT2{I}/10)); END; ELSE IF (-10000 <= AMT2{I} < -1000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},100); IF (RAN>PROB/100) THEN AMT2{I}=100*(INT(AMT2{I}/100)); ELSE AMT2{I}=100*(1+INT(AMT2{I}/100)); END; ELSE IF (-1000000 < AMT2{I} < -10000) THEN DO; RAN=UNIFORM(5555555); PROB=MOD(AMT2{I},1000); IF (RAN>PROB/1000) THEN AMT2{I}=1000*(INT(AMT2{I}/1000)); ELSE 
AMT2{I}=1000*(1+INT(AMT2{I}/1000)); END; ELSE IF .Z < AMT2{I} <= -1000000 THEN AMT2{I}=-1000000; END; END;

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
COMPARISON WITH OTHER DATA
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

In general, medians of financial characteristics estimated from the SCF should compare well with medians estimated from other surveys using comparable population definitions. However, estimates of means will often differ, largely for two reasons. First, means of many financial characteristics may not be very robustly estimated in surveys that interview only a relatively small number of wealthy households. The distribution of many financial characteristics (e.g., net worth) is highly skewed, and sparse representation of the upper tail will translate into a noisy estimate of statistics, such as the mean, that are strongly affected by the top of the distribution. Second, there may also be a degree of bias in the measurement of some financial characteristics. Evidence suggests that there is differentially higher nonresponse among wealthy households. Failure to account for such differences in the creation of analysis weights leads to a misrepresentation of the size of the upper tail of wealth and of characteristics associated with being in that tail. By using frame data for the list sample, the SCF has the means to identify and make some corrections for such nonresponse. However, this option is not available in most other surveys.

The SCF may also be compared with aggregate statistics, such as the flow of funds accounts (FFA), which are constructed by the Board of Governors of the Federal Reserve System.
An extensive analysis of the differences in these two sources is provided by Rochelle Antoniewicz ("A Comparison of the Household Sector from the Flow of Funds Accounts and the Survey of Consumer Finances," October 2000, http://www.federalreserve.gov/pubs/oss/oss2/method.html). As discussed in that paper in detail, there are many conceptual differences between the SCF and the FFA. Three of these differences are particularly noteworthy here. First, the FFA "household sector" includes the holdings of nonprofits, and estimates of those holdings must be made to create a population basis closer to that used in the SCF. Second, the financial concepts used in the FFA often differ from those used in the SCF. Substantial effort is usually required to align the concepts in the two sources, and in some cases there is no clear way of doing so. Third, both the FFA and the SCF provide statistical estimates. Because the two series are developed from independent sources of information with different statistical properties, it would be surprising if they yielded precisely the same totals even if the populations and concepts could be made perfectly consistent.

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
ACKNOWLEDGMENTS
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

The SCF is a large project that involves intense commitment by many people. At the Federal Reserve, the main project staff involved with the creation of the data included Ryan Bledsoe, Brian Bucks, Gerhard Fries, Arthur Kennickell, Kevin Moore, Michael Neal, Brian Preslopsky, and Brooke Wells. Important support has come from the FRB officer corps, particularly Edward Ettin, Diana Hancock and Myron Kwast, who have invested their credibility in making the project possible.
Individual members of the Board of Governors have actively encouraged the development and use of the survey. Support from the Statistics of Income Division at the IRS has been essential. Barry Johnson has been tireless in his work to obtain the necessary data for the selection of the list sample, in his work on the disclosure review, and in sharing the insights he has gained in working with the IRS estate tax data. Nicholas Greenia, the confidentiality officer for SOI, and Michael Strudler were particularly helpful in securing the necessary data and documentation. Thomas Petska, the director of SOI, and James Nunns at the Office of Tax Analysis at the Department of the Treasury have encouraged and supported us through many difficult periods. At NORC, very many people have touched the project in important ways. Leslie Athey was the Project Director. Fritz Scheuren served as NORC corporate liaison to the SCF. Craig Coelen, Nancy Potok and John Thompson also provided key corporate oversight. Angela Abbott, Azure Addison, Christine Carr, Dennis Dew, Sarah Hughes and Mandy Sha played important roles in the development of materials for the project and in the support of data collection. Statistical support was provided by Steven Pedlow and Fritz Scheuren. Data processing support was provided by Haider Baig, Val Cooke, Brian Homan, Sarah Hughes, Sharon Hurn, Katie O'Shea, Mandy Sha and Geoff Walker. Val Cooke revised the CAPI program for the project, building on earlier versions of the program developed by herself, Geoff Walker and Phil Panzuck. Renee Grigorian served as the graphic designer for a new refusal conversion package developed for the survey. The principal Central Office training staff for the project included Azure Addison, Ron Broach, Dennis Dew, Sarah Hughes, Ajay Sagar, Micah Sjoblom and John Sokolowski. Coding of respondents' verbatim responses was provided by Sarah Hughes and Nauman Mirza. Logistical support was provided by Albert Bard. 
Financial management was the responsibility of Antonio Macias. Catherine Haggerty and Bob Bailey, who were leading figures in the 1998 SCF, served as consultants in early