Bootstrapping has become the most widely used replication method. With many surveys, the bootstrap samples are constructed from sampling information available in the survey data themselves. For the 1993 NSSBF, however, creating the weights for the bootstrap samples requires confidential data that are not available to the public. Therefore, a special data set containing 1,000 bootstrap samples was created to allow researchers to use bootstrap methods in their own estimations. Each bootstrap sample was drawn with replacement, so in any given bootstrap sample some firms appear multiple times and some firms do not appear at all. In the data set BOOTSTRP, each of the 4,637 firms in the public data set has 1,000 multiplicity variables (labeled MULT1-MULT1000), which indicate how many times the firm was sampled in each of the bootstrap samples, and 1,000 weights (labeled BWGT1-BWGT1000), which were calculated so that the bootstrap sample weights sum to the target population.
SAS Transport File
ASCII Flat File
Example: Using the 1993 NSSBF Bootstrap Data
The following example shows how to use the BOOTSTRP data set to estimate the variance of a statistic. We assume that the researcher is already familiar with bootstrap estimation. To estimate the mean assets of firms in the target population using the variable ASSETS from the data set NSSBF93, we first compute the weighted mean of ASSETS using the final weight variable FIN_WGT. Most statistical packages also report a standard error for this estimate, but the researcher realizes that, for this sample and the purpose at hand, that standard error is incorrect and decides to estimate the standard error of the mean using bootstrap methods instead. Because the variable ASSETS is reported for almost every firm in the sample, the researcher decides that 100 bootstrap replicates will do.
The first bootstrap mean is computed using the first bootstrap replicate in BOOTSTRP. A temporary data set is first created from BOOTSTRP with two variables: PWCODE and WEIGHT. This temporary data set has one observation for each firm with MULT1=1, two observations for each firm with MULT1=2, and so on. Firms with MULT1=0 are dropped from this temporary data set. The variable BWGT1 is renamed to WEIGHT. The resulting data set might look like this:
This data set is then merged on PWCODE with the NSSBF93 data set (dropping all observations not in the temporary bootstrap replicate data set). The weighted mean of ASSETS is computed again, using the bootstrap weight variable, WEIGHT. This bootstrap mean is saved.
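The steps for a single replicate can be sketched in Python. This is only an illustration under stated assumptions, not the procedure's official implementation: the function names, the dictionary layout, and the three toy firms are hypothetical, and every firm is assumed to have a non-missing value of ASSETS.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(v * w) / sum(w)."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def replicate_mean(assets_by_firm, bootstrap_rows, r):
    """Weighted mean of ASSETS for bootstrap replicate r.

    assets_by_firm: maps PWCODE to ASSETS (stands in for NSSBF93).
    bootstrap_rows: one dict per firm holding PWCODE, MULT<r> and BWGT<r>
                    (stands in for BOOTSTRP).
    """
    values, weights = [], []
    for row in bootstrap_rows:
        multiplicity = row[f"MULT{r}"]
        # A firm with multiplicity 0 adds no rows; a firm with
        # multiplicity 2 adds two rows, each carrying BWGT<r> as WEIGHT.
        for _ in range(multiplicity):
            values.append(assets_by_firm[row["PWCODE"]])  # merge on PWCODE
            weights.append(row[f"BWGT{r}"])
    return weighted_mean(values, weights)


# Hypothetical toy data, not taken from the survey.
assets_by_firm = {"A": 100.0, "B": 200.0, "C": 400.0}
bootstrap_rows = [
    {"PWCODE": "A", "MULT1": 2, "BWGT1": 1.0},
    {"PWCODE": "B", "MULT1": 0, "BWGT1": 0.0},
    {"PWCODE": "C", "MULT1": 1, "BWGT1": 2.0},
]

first_mean = replicate_mean(assets_by_firm, bootstrap_rows, 1)
print(first_mean)
```

In this toy replicate, firm A contributes two rows, firm B is dropped, and firm C contributes one row, so the mean is taken over three weighted observations.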
This process is repeated 99 more times, each time using the multiplicity and weight variables of a different bootstrap replicate, to generate a total of 100 bootstrap estimates of the mean. The 100 bootstrap estimates are then used to compute the standard error (or variance) of the mean.
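The text does not say which formula turns the replicate estimates into a standard error. One common convention, assumed here, takes the sample standard deviation of the replicate estimates (divisor R - 1, deviations measured from the mean of the replicates). Other conventions measure deviations from the full-sample estimate or divide by R. The function name and the three example values below are hypothetical.

```python
import statistics


def bootstrap_standard_error(replicate_estimates):
    """Sample standard deviation of the replicate estimates.

    Assumed convention: divisor R - 1, deviations taken from the mean
    of the replicate estimates themselves.
    """
    return statistics.stdev(replicate_estimates)


# Hypothetical replicate means; a real run would supply 100 of them.
estimates = [10.0, 12.0, 14.0]
standard_error = bootstrap_standard_error(estimates)
variance = standard_error ** 2
print(standard_error, variance)
```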