This section describes the data set and the process used to develop the credit-scoring model employed in this study.
The types of data used for this study, and the sources from which they were drawn, are described below.
The Federal Reserve obtained from TransUnion the full credit records (excluding any identifying personal or creditor information) of a nationally representative random sample of 301,536 individuals as of June 30, 2003.86 The Federal Reserve subsequently received updated information on the credit records of these individuals as of December 31, 2004. Some individuals (15,743) in the initial 2003 sample no longer had active credit records as of December 31, 2004, in some instances because the individual had died. However, other factors may also have limited the ability to update records. A total of 285,793 individuals still had active credit files as of December 31, 2004.87
Characteristics of the sample of credit records. In the aggregate, the sample of credit records used for this study contained information on about 3.7 million credit accounts (also referred to as "tradelines"), more than 318,000 collection-related actions, and roughly 65,000 monetary-related public actions. Not every individual has information of each type. In the sample, approximately 260,000, or 86 percent, of the individuals had records of credit accounts as of the date the sample was drawn (table 7).88 Although a large portion of individuals had items indicating public-record actions, collection agency accounts, or credit inquiries, well less than 1 percent of the individuals with credit records had only public-record items or only records of a creditor inquiry. However, for about 12 percent of the individuals in the sample, the only items in their credit records pertained to collection agency accounts.
Credit characteristics. TransUnion included a file of 312 precalculated summary variables ("credit characteristics") in the data provided to the Federal Reserve (appendix B provides a list of the 312 credit characteristics). These credit characteristics are summary measures of the individual items that constitute a credit record. These characteristics (such as one representing the age of an individual's oldest account) were created by TransUnion for model development according to its own needs and those of its customers. The credit characteristics provided to the Federal Reserve are those commonly offered to model builders by TransUnion.89 The characteristics reflect only credit-related factors, not personal or demographic information, as such information is not included in the credit records maintained by credit-reporting agencies.
Computing performance measures from the credit records. Credit records can be used to estimate various measures of payment performance for each individual or account. Credit records contain information on the payment performance of most accounts for the 48 months preceding the date the record was drawn. For these accounts, the information is sufficient to assess performance over any performance period within the 48 months. Similarly, filing dates for collection and public records determine the precise date when such events occurred. However, month-by-month payment records are not available for all accounts, particularly those that are seriously delinquent. For those accounts, the only information available is the date of last delinquency; it is not possible to determine if the accounts were also delinquent in the months preceding that point. For this reason, in model development, performance is typically measured over a specific period of time, usually 18--24 months, and the end point of that period is the date on which the credit record was drawn.
Typically, for the reasons cited above, performance is determined by whether any of the individual's accounts suffered any of a specific group of problems during the performance period, rather than, for example, by how often a problem occurred during the period. As described later in this study, we measure performance over an 18-month backward-looking period as constructed from credit records drawn on December 31, 2004.
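The binary, backward-looking nature of this measure can be sketched as follows. This is an illustrative simplification, not the Federal Reserve's actual implementation; the function name, field layout, and month arithmetic are assumptions.

```python
from datetime import date

def had_problem_in_window(problem_dates, draw_date, months=18):
    """Return True if any recorded payment problem falls within the
    backward-looking performance window ending at draw_date.
    (Illustrative sketch; field names and date handling are assumptions.)"""
    # Approximate the window start by subtracting whole months;
    # clamp the day to avoid invalid dates such as June 31.
    start_month = draw_date.year * 12 + (draw_date.month - 1) - months
    window_start = date(start_month // 12, start_month % 12 + 1,
                        min(draw_date.day, 28))
    return any(window_start <= d <= draw_date for d in problem_dates)

# The measure is binary: one problem or many during the period, the flag
# is the same.
draw = date(2004, 12, 31)
print(had_problem_in_window([date(2004, 3, 15)], draw))  # True
print(had_problem_in_window([date(2002, 1, 1)], draw))   # False
```

Note that, consistent with the text, the sketch records only *whether* a problem occurred in the window, not how often.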
TransUnion provided two different generic credit history scores for each individual in the sample--the TransRisk Account Management Score (TransRisk Score) and the VantageScore. The two scores used here are as of the date the sample was drawn. The TransRisk Score was generated by TransUnion's proprietary model for assessing the credit risk of existing accounts. In particular, the TransRisk Score was constructed with a selected group of factors drawn from the credit records of individuals to predict the likelihood that at least one existing credit account would become seriously delinquent over an ensuing performance period.
As with other commonly used consumer credit history scores, larger values for the TransRisk Score indicate a lower risk of default. About 20 percent of individuals in the sample received neither the VantageScore nor the TransRisk Score, primarily because they had too few active credit accounts. Most individuals who had a credit account but no credit score could use the account but were not legally responsible for any debt incurred on it. About 7 percent of the sample had a TransRisk Score but not a VantageScore, as the latter had more-restrictive rules for determining which credit records could be scored.
As noted earlier, the VantageScore was developed jointly by Equifax, Experian, and TransUnion to create a measure of credit risk that scores individuals consistently across all three companies.90 The model was developed from a national sample of approximately 15 million anonymous credit files of individuals drawn from each of the agencies' credit files. The data extracted for model development were taken from the same points in time by all three agencies.91 The initial point was June 2003 (the same as in the sample used for the present study). Credit records from that time provided the characteristics used in model development; account performance was measured as of June 2005 (a 24-month performance period in contrast to the 18-month performance period used in this study). The VantageScore predicts the likelihood that a random credit account of an individual will become seriously delinquent over the performance period. Again, higher values of the score are associated with a lower risk of default.
TransUnion supplied a file of its TransRisk Score by census tract for individuals both with and without a mortgage. As with all other data used for this report, the file contained no personal identifying information. The data were based on a nationally representative sample of about 27 million individuals drawn from all credit records maintained by TransUnion as of December 31, 2004. The database was used to determine the mean score for individuals in the census tract as a weighted average of the scores of those with mortgages and those without.
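The tract-level calculation described above is a straightforward weighted average. The sketch below illustrates it; the example figures are placeholders, not values from the TransUnion file.

```python
def tract_mean_score(mortgage_mean, mortgage_n, nonmortgage_mean, nonmortgage_n):
    """Mean TransRisk Score for a census tract, computed as a weighted
    average of the mean scores of individuals with and without mortgages.
    (Sketch; the input figures below are illustrative placeholders.)"""
    total = mortgage_n + nonmortgage_n
    return (mortgage_mean * mortgage_n + nonmortgage_mean * nonmortgage_n) / total

# A tract with 300 mortgage holders averaging 720 and 100 non-holders
# averaging 660 has a weighted mean of 705.
print(tract_mean_score(720.0, 300, 660.0, 100))  # 705.0
```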
The only personal demographic information included in an individual's credit record is the individual's date of birth (date of birth is not present in about one-third of the credit records). However, the credit records contain additional types of information--name, Social Security number, and current and previous addresses--which can be used to obtain further demographic information on the individual from other data sources. For purposes of this study, TransUnion, at the request of the Federal Reserve, provided information to other data repositories--the U.S. Social Security Administration (SSA) and a demographic information company--to obtain demographic data on the individuals in the credit sample. These matches involved a double-blind process between TransUnion and the other data sources so that the integrity and privacy of each party's records were maintained.
TransUnion supplied locational information (but not exact residential addresses) on the individuals in the sample to the Federal Reserve when it provided the credit-record information.
Social Security Administration data. The SSA gathers demographic information on the form used by individuals to apply for a Social Security card.92 Information from the SSA records was made available to the Federal Reserve solely for purposes of preparing this report to the Congress. The procedures followed for this study ensured that the SSA received no information included in the credit records of the individuals other than the personally identifying information needed to match the administrative records maintained by the SSA. The Federal Reserve received from the SSA a data file that included the demographic characteristics of the individuals in the sample but no personally identifying information. TransUnion did not receive any information from the SSA or the Federal Reserve on the demographic characteristics of the individuals in the sample. The SSA data are the same items that are made available to other researchers and government agencies conducting studies that require personal demographic information.
With the names and Social Security numbers provided by TransUnion, the SSA extracted and provided to the Federal Reserve the following information for each matched individual to the extent available: citizenship, the date the individual filed for a Social Security card, place of birth, state or country of birth, race or ethnic description, sex, and date of birth. All of the above information except the race or ethnicity of the applicant is required on the application form for a Social Security card; race or ethnicity is requested on the form, but the applicant is not required to supply it.
Two aspects of the SSA administrative records bear importantly on the analysis in this study. First, some individuals failed to provide some demographic characteristics when completing their applications. Also, some applied more than once for a Social Security card (SSA card) and so had more than one opportunity to report their demographic characteristics; the SSA provided the Federal Reserve the information reported by these individuals on each of their applications, and in some cases the information was inconsistent.93 For example, some individuals reported different dates of birth, sex, or country of origin on their various applications.
Second, the SSA in 1981 changed the options offered to individuals for reporting their racial or ethnic status. For the years preceding 1981, individuals had three choices, from which they were asked to select one--"White," "Black," or "Other." Beginning in 1981, individuals have had five options, from which they choose only one--(1) "Asian, Asian American, or Pacific Islander"; (2) "Hispanic"; (3) "Black (Not Hispanic)"; (4) "North American Indian or Alaskan Native"; and (5) "White (Not Hispanic)."
Data from a national demographic information company. To obtain yet more, or further corroborating, information on the demographic and economic characteristics of the individuals in the sample, the Federal Reserve obtained data from one of the nation's leading demographic information companies. The data received by the Federal Reserve is the same as the information made available to creditors or other entities that use the data for marketing and solicitation activities.
The demographic information company develops information in two ways. It infers language preference, country of origin, ethnicity, and religion by analyzing first and last names in combination with geographic location; consequently, these items were available for all individuals in the company's records. The company gathers other demographic and economic information from thousands of public and private sources nationwide, so not all of these are available for all individuals in its records. The national demographic information company validates the accuracy of its data in various ways, including personal interviews with people from all ethnic and religious groups, immigration records, biographical sources, and other primary databases.
For each individual whose information existed in the records of both TransUnion and the national demographic information company, the Federal Reserve received the following information to the extent available: race, education, sex, marital status, language preference, religion, occupation, income range, and date of birth.
Locational information from Census 2000 data. At the request of the Federal Reserve, TransUnion "geocoded" the current address of each individual in the sample to help identify the year 2000 census-block group of the person's residence.94 The census-block location of about 15 percent of the sample could not be identified, and for an additional very small number of individuals in the sample (544), not even the census tract could be identified. This geographic information was matched to Census 2000 files at the U.S. Bureau of the Census; those data include the racial or ethnic makeup and income of each census-block group and census tract as of April 2000.
Collectively, the sources described above provide information on age, marital status, sex, race, ethnicity, religion, language preference, country of origin, income, and geographic location. Problems of inconsistency and missing data had to be resolved, however, before the information could be used for the present analysis. First, some demographic characteristics for a given individual were provided by multiple sources, and in some of those cases the information was inconsistent. Inconsistency extended even to the SSA records for some individuals because, as noted above, some individuals provided different information for the same item when completing applications for replacement Social Security cards. Second, the information on some demographic characteristics was simply missing.
To resolve inconsistencies across different data sources for race, ethnicity, sex, and age, we chose to rely on the data provided in the records maintained by the SSA unless we had strong reason to believe that this information was incorrect, in which case we deemed it "missing." The SSA data were preferred because applicants are required to provide all information, with the exception of race or ethnicity, to receive a Social Security card and because the data were collected and maintained in a consistent way. Alternative sources for certain characteristics--race, ethnicity, sex, and age from the national demographic information company; date of birth from TransUnion; and characteristics not available in the SSA data, such as religion and language preference--were used only to impute values when the SSA data were not available. The only information obtained from the national demographic information company that was used in the primary analysis was marital status.95
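The SSA-first resolution rule can be sketched as a simple priority lookup. This is a schematic restatement of the rule described above; the function and source names are illustrative, not from the report.

```python
def resolve_characteristic(ssa_value, alternatives):
    """Resolve a demographic item by preferring the SSA value and falling
    back to alternative sources, in priority order, only when the SSA item
    is missing. (Sketch of the resolution rule; names are illustrative.)"""
    if ssa_value is not None:
        return ssa_value, "SSA"
    for source, value in alternatives:
        if value is not None:
            return value, source
    return None, "missing"

# The SSA value wins even when another source disagrees.
print(resolve_characteristic("female", [("demographic company", "male")]))
# Fallback imputation applies only when the SSA item is missing.
print(resolve_characteristic(None, [("demographic company", None),
                                    ("TransUnion", "1961-04-07")]))
```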
Details about the availability of specific demographic items from each source of data are provided in table 8. Overall, almost 80 percent of the 301,536 individuals in the sample could be matched to SSA records. An even larger proportion, 90 percent of those with a credit score as of June 30, 2003--the sample most relevant for this analysis-- could be matched to SSA records.
Age and sex were available for virtually all of the individuals matched to the SSA records. Although information on race or ethnicity was available for almost 97 percent of the individuals matched to the SSA records, data on about 40 percent of the individuals was collected before the SSA changed the race and ethnicity categories it tracks, an aspect of the data discussed below.
In general, demographic information on an individual from multiple data sources was largely consistent across the sources. For example, sex was consistently reported across the sources 96 percent of the time. Reported age, within three years, was consistent 96 percent of the time between the demographic information company and the SSA, and 98 percent of the time between TransUnion and the SSA.
For demographic items not included in the SSA data, the incidence of missing or unreported data varied widely. For example, country of origin was provided for only 10 percent of the 301,536 individuals in the sample. In contrast, religion was available for 86 percent of the individuals, and marital status was provided for 71 percent of the individuals. Census tract of residence was available for virtually everyone in the sample, and a census-block group was identified for 86 percent of the sample.
Inconsistency within the SSA data. Several issues had to be addressed before the SSA data could be used. First, about 51 percent of the sample individuals had more than one SSA filing, and the data in some of those cases were inconsistent. Second, the age information supplied by the SSA was sometimes implausible because it implied that the individual was extremely old or young or because it was inconsistent with the age of the individual's oldest account in their credit record.96 Third, the question on race and ethnicity on the application form for a Social Security card changed in 1981. These issues were dealt with as follows.
In general, when individuals filed more than one application for a Social Security card, we used information from the most recent filing. The only exception to this rule involved age and sex; when such information from the most recent filing was implausible or was inconsistent with information provided by the demographic information company or TransUnion's credit records, we used the information from an earlier filing if it was consistent with information from these other sources.
Various rules were used to identify and address implausible values for age in the SSA data. The basic rule was that if the date of birth in the SSA records indicated that the individual was younger than 15 or greater than 100 years of age at the time the credit records were drawn, then the reported age was deemed to be implausible. In addition, regardless of the age reported in the SSA data, if the age of the oldest credit record in the individual's credit files implied that the person took out credit when the person was younger than 15, then the SSA age data were again deemed implausible. An implausible age suggested that the SSA record and the credit records had potentially been mismatched, and in such cases all SSA data for demographic items--age, race, ethnicity, and sex--were treated as "missing." In total, only about 2 percent of the sample had ages deemed to be implausible.
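The two plausibility screens described above translate directly into a pair of threshold checks. The sketch below restates them; the function name and the convention of measuring both ages in years at the draw date are assumptions.

```python
def ssa_age_plausible(ssa_age, oldest_account_age):
    """Apply the plausibility screens described above: the SSA-derived age
    at the credit-record draw date must be between 15 and 100, and the age
    of the oldest account must not imply that credit was taken out before
    age 15. (Sketch; ages are in years at the draw date.)"""
    if ssa_age < 15 or ssa_age > 100:
        return False
    # Age at which the oldest account was opened.
    if ssa_age - oldest_account_age < 15:
        return False
    return True

print(ssa_age_plausible(40, 20))   # True: oldest account opened at age 20
print(ssa_age_plausible(14, 1))    # False: younger than 15
print(ssa_age_plausible(25, 12))   # False: implies credit taken out at age 13
```

When either screen fails, all SSA demographic items for the individual (age, race, ethnicity, and sex) are treated as missing, since the failure suggests a possible mismatch between the SSA record and the credit record.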
Only 0.5 percent of individuals in the sample gave inconsistent responses on sex when they completed more than one application for an SSA card. If information on sex from the demographic information company was available, it was used to resolve the SSA inconsistency. Otherwise, sex was determined by the individual's most recent application for an SSA card.
The most difficult inconsistency in the SSA data came from the change in the options provided to individuals for identifying their race or ethnicity when applying for SSA cards.
Change in categories of race and ethnicity in the SSA data. As noted, before 1981, individuals were asked to choose one of only three options--white, black, or other. Beginning in 1981, individuals were asked to choose one of five options--(1) Asian, Asian American, or Pacific Islander; (2) Hispanic; (3) black (not Hispanic); (4) North American Indian or Alaskan Native; and (5) white (not Hispanic).
To employ a single set of categories for race and ethnicity and retain the greater detail available after 1980, we chose to use the five post-1980 categories. The problem then became how to treat "pre-1981" individuals, those whose only application for a Social Security card was filed before 1981; their set of three possible responses had to be distributed across the set of five responses available after 1980. We chose to "predict" whether a pre-1981 individual who chose white or black would instead have selected one of the three options unavailable before 1981 had they had the opportunity to do so: Asian, Asian American, or Pacific Islander (hereafter, Asian); Hispanic; or North American Indian or Alaskan Native (hereafter, Native American). For those answering "other," the question was which of the five options, including white or black, they would have chosen, since the option "other" was no longer available.
The "prediction" is the probability that an individual would select one of the missing options; the probability is calculated from a multinomial logistic model estimated with data from individuals applying for Social Security cards in the 1981-85 period. Those individuals were chosen for the estimation sample because it was believed that they would be most similar in age and other characteristics to the pre-1981 sample. The independent variables used in the predictive model were age, sex, and country of origin (from SSA records); ethnicity and race (Hispanic, Asian, black, and Native American), language preference, religion, and marital status (from the demographic information company); and the percentages of the population that were Asian, Hispanic, black, or Native American, according to Census 2000 data, in the census-block group of the individual's residence (or in the census tract, if the census-block group was not available). The model was validated against the responses of individuals who filed applications for Social Security cards before 1981 and then filed again in 1981 or later.97
Pre-1981 individuals classifying themselves as white were assigned a zero probability of being black; the model coefficients were used to assign one of the other four choices for the individual. A similar rule was applied for pre-1981 individuals classifying themselves as black--that is, they were assigned a zero probability of being white--and in addition they were assigned a zero probability of being Native American. No restrictions were imposed for pre-1981 individuals classifying themselves as "other."
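The restriction step described above amounts to zeroing out the categories ruled out by the individual's pre-1981 response and renormalizing the model's predicted probabilities. The sketch below shows only that step; the probabilities are placeholders, since the report estimates them with the multinomial logit, and the category labels are illustrative.

```python
# Categories ruled out by each pre-1981 response, per the rules above.
RESTRICTIONS = {
    "white": {"black"},                     # P(black) = 0
    "black": {"white", "native_american"},  # P(white) = P(Native American) = 0
    "other": set(),                         # no restriction
}

def restricted_probabilities(model_probs, pre1981_response):
    """Zero out the disallowed categories and renormalize so the remaining
    probabilities sum to one. (Sketch; model_probs would come from the
    estimated multinomial logistic model.)"""
    blocked = RESTRICTIONS[pre1981_response]
    probs = {c: (0.0 if c in blocked else p) for c, p in model_probs.items()}
    total = sum(probs.values())
    return {c: p / total for c, p in probs.items()}

raw = {"asian": 0.05, "hispanic": 0.15, "black": 0.20,
       "native_american": 0.05, "white": 0.55}
adj = restricted_probabilities(raw, "white")
print(adj["black"])                   # 0.0
print(round(sum(adj.values()), 10))   # 1.0
```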
This procedure does not result in imputation of race for all pre-1981 sample individuals. With the exception of one small group, whose race was imputed to be black or white by the model, all of the pre-1981 individuals treated as black or non-Hispanic white in the disparate-impact analysis would have reported their race in corresponding terms to the SSA if they had applied for a Social Security card after 1980. The exception was a small number of pre-1981 individuals who classified themselves as "other" and were not assigned high probabilities of being Hispanic, Native American, or Asian. The major impact of the procedure is on the Asian, Hispanic, and Native American groups, whose entire pre-1981 portion of the sample had to be "carved out" from the pre-1981 white, black, and "other" groups.
In total, there are 301,536 individuals in the study sample. These individuals are separated into three groups for most of the analysis. The primary group is the 232,467 individuals with both a TransRisk Score and a VantageScore (table 9). This is the base sample used to evaluate credit-score and performance differences across populations. A subset of this sample is the 200,437 individuals used to estimate the FRB base model described in the next section. The third group is the 69,069 remaining individuals, who lacked at least one of the two scores and were not used for most of the analysis.98
Nine different demographic groupings are used to describe the population for much of the analysis: Two measures of race or ethnicity (SSA data and the location of residence);99 sex; marital status; national origin (foreign-born or not); age; and characteristics of the census block or tract where they reside. The characteristics of the census block or tract are relative income, percentage of the population that is of a racial or ethnic minority, and whether it is urban or rural.100 For most of these categories, there is an "unknown" group where the characteristic could not be determined.
For each demographic group, summary statistics are presented that show the contents of credit-record files for the three sample definitions and nine demographic groupings. Not surprisingly, the full sample of credit records provided by TransUnion differs somewhat from the records of scorable individuals or those used to estimate the FRB base model. The principal difference is in the mean number of credit accounts per individual, which is much lower for the full sample than for either the scorable sample population or the estimation sample. The mean number of trade accounts for the same population group differs little between the scorable sample population and the estimation sample.
Because credit scores reflect the content of credit records, a review of the differences in content across demographic groups provides useful context for the analysis that follows. The patterns found hold both for the scorable population and the somewhat smaller population used to estimate the FRB base model.
The content of credit records differs greatly across populations. For example, blacks are less likely than other racial or ethnic groups to have a revolving or mortgage account and much more likely to have either a public record or a reported medical or other collection item. Also, compared with other populations, blacks and Hispanics evidence elevated rates of at least one account 90 days or more past due. Married individuals, whether male or female, are more likely to have either revolving, installment, or mortgage credit than single individuals, are less likely to have a public record or collection account, and are less likely to have one or more accounts 90 days or more delinquent.
Differences by age are also found. Individuals younger than age 30 are less likely to have a revolving or mortgage account but more likely to have an installment account than older individuals. Younger individuals have a lower incidence of a public record item, but a higher incidence of a nonmedical-related collection account, than older individuals. Also, the incidence of at least one account reported as delinquent 90 days or more declines with age after age 40.
Representativeness of the sample. The sample of credit records of individuals obtained for this study is nationally representative of the individuals included in the credit records of the national credit-reporting agencies.101 Further comparisons were made to evaluate how closely the sample mirrors the population of U.S. adults (those aged 18 or older). The distribution of individuals in the sample population arrayed by their state of residence is quite similar to the distribution of all adults in the United States as of June 2003 as estimated by the Bureau of the Census (table 10). Also, the racial or ethnic characteristics of the sample population as assigned here closely mirror the distribution of race and ethnicity for all adults in the United States as reflected in the census, although the proportion of Hispanics in the sample population is somewhat lower than in the population overall (table 11). In addition, males are slightly overrepresented in the credit-record sample, and younger individuals are underrepresented. The data further show that the distribution by race and age of scorable individuals differs from the distribution of individuals for whom scores were not available. Blacks, younger individuals, and individuals residing in lower-income census tracts and in census tracts with larger shares of minority population were less likely to have been scored.
The desire to maximize the transparency of the credit-scoring model building process used in this study led us to rely entirely on a set of rules (algorithms) to create and select credit characteristics and attributes to be included in the model. This approach differs from industry practice in the construction of such models, which often relies on the experience of the model developer to supplement the automated rules they use. The rules we selected for the development of the present model are intended to mimic general industry practice to the greatest extent possible.
To recall, although the approach used for this study is informative and allows an assessment of the potential for differential effect across groups of individuals, it will not necessarily reflect what the results of a differential effect analysis would be if applied to any specific credit-scoring model currently used by the credit industry. Also, the results here, covering credit-related experiences over the 2003-04 period, may not match results for a different period because credit use and economic conditions change over time.
The development of any model requires decisions about several broad issues, including type of model, sample, and time period. Regarding type of model, the model could be designed to predict performance for new accounts, existing accounts, or a combination of the two. Further, it could predict that at least one account will go bad; that a specific account will go bad; or that a specific category of accounts, such as credit cards, will go bad. Also to be chosen would be the size of the estimating sample and the "performance period," that is, the period over which performance would be tracked. Finally, decisions also have to be made about which credit characteristics would be used as predictive factors in a model.
Two types of generic credit history models are widely used in the credit industry: one to generate a new account acquisition score and one to generate an account maintenance score. New account acquisition models are designed to predict delinquency or default over a performance period on accounts that are opened during the beginning of that period. New account models are used in soliciting accounts and to help underwrite responses to solicitations as well as for the review of other applications for credit. Account maintenance models are designed to predict delinquency or default on accounts that were in active use and not delinquent at the beginning of the performance period. Account maintenance models are used to help adjust credit limits, interest rates, and other features on existing accounts.
In addition, the industry often uses "hybrid" models that are combinations of the above two types. Hybrid models predict performance for any account--new or existing--during the performance period. Largely because of sample-size considerations, the model developed for this study is a hybrid type. Including data on both new and existing accounts makes better use of the available sample.
An additional decision in developing the model was whether to make the model "account based" or "person based." Account-based models assess the probability that a specific account will become delinquent or default, whereas person-based models assess the likelihood that any of an individual's accounts will become delinquent or default over the performance period. Given that both types could be estimated equally well and that the person-based type is the more commonly used in the industry for estimating generic credit history models such as the one here, we chose to estimate a person-based model.
Finally, many credit scores are designed to predict performance for a specific type of product, such as credit cards or automobile loans. Others are generic, designed to predict performance for any loan. As noted above, the model estimated here is generic and thus considers performance on all types of accounts.
In sum, the model we developed is a hybrid, person-based, generic credit history model.
Before this study began, the Federal Reserve had obtained, for other purposes, the nationally representative sample of the credit records of approximately 300,000 individuals as of June 30, 2003, that was described earlier in this section.102 This sample size was deemed sufficient to estimate either an account-maintenance model or a hybrid type; a new account acquisition model would likely have required a larger sample. For reasons discussed above, we chose to use the sample to estimate a hybrid model.
All model development uses credit records for individuals drawn at two different points in time. The length of time between these two dates dictates the performance period used in model development. At the time this study was initiated, the decision on the timing of the updated sample had not been made. Industry practice is to use a performance period ranging from 18 months to 24 months; 24 months is likely the most common for the development of a generic credit history score. The 24-month time frame is desirable because it tends to reduce the effect of seasonality in the use of credit. The time frame established for this study by the Congress led us to select December 31, 2004, for the updated sample of credit records, implying an 18-month performance period. Although it is at the short end of industry practice, this performance period is long enough to provide a sufficient number of defaults and delinquencies to build a viable credit-scoring model.
The choice of model dictates, for the most part, the performance measure. Our choice of a hybrid, person-based model meant that the appropriate performance measure should cover all new and existing accounts for a given individual. Implementing this measure required additional decisions.
First, "new" and "existing" accounts must be defined. Industry practice varies. We defined a new account as one reported as having been opened during the first six months of the performance period (July 2003--December 2003).103 Existing accounts were those opened before the performance period and not closed before the beginning of the period. ("Closed" means that the account either has been paid off or has been "frozen," generally because of poor performance.)
Generally, accounts are closed when they become seriously delinquent. Thus, the requirement that existing accounts not be closed before the beginning of the performance period implies that accounts that were seriously delinquent before the beginning of the performance period would generally be excluded from the calculation of the performance measure (see below). Because minor delinquencies generally do not result in the closing of an account, accounts with such delinquencies were most likely not excluded from the calculation of the performance measure.
The second decision involved how to assess payment performance on an account and payment performance by a person. Payment performance on an account has many dimensions. One could count, for example, the number of times an account has been delinquent; the severity of the delinquency; or the dollar amount past due on the account. The industry uses each of these measures. A common way of measuring performance, and the one used here, is to classify accounts as "good," "bad," or "indeterminate" on the basis of the most severe level of delinquency during the performance period. A credit account that was delinquent for 90 days or more or was involved in bankruptcy, repossession, charge-off, or collection was defined as "bad." An account that exhibited no delinquency whatsoever, showed no other "bad" indicators, and showed satisfactory performance was classified as "good." All other accounts--for example, those 30 days or 60 days delinquent--were classified as "indeterminate."
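The account-level classification just described can be sketched as a simple rule. The following Python illustration uses invented field names (actual credit-record layouts are proprietary):

```python
def classify_account(worst_delinquency_days, has_major_derog):
    """Classify one account over the performance period.

    worst_delinquency_days: most severe delinquency observed
        (0 means never delinquent); has_major_derog: True if the
    account was involved in bankruptcy, repossession, charge-off,
    or collection. Both names are illustrative, not from the study.
    """
    # "Bad": 90 days or more delinquent, or any major-derogatory event.
    if has_major_derog or worst_delinquency_days >= 90:
        return "bad"
    # "Good": no delinquency whatsoever and no other bad indicators.
    if worst_delinquency_days == 0:
        return "good"
    # Everything else (for example, 30- or 60-day delinquencies).
    return "indeterminate"
```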
Payment performance by a person is based on the good, bad, and indeterminate performance (as defined above, with one small adjustment) of all the person's accounts. An individual's payment performance was classified as "bad" if any of that person's accounts was bad. Further, as stated earlier, performance was determined for the 18-month period from June 30, 2003, to December 31, 2004. By law, accounts with major-derogatory information of the sort used here to define "bad" generally must be removed from the credit record after a period of seven to ten years, depending on the type of derogatory information. Accounts without such serious delinquency can remain in the credit record indefinitely. Hence, with one exception, all accounts that were active at any point during the performance period should have performance information present in the December 2004 database. The exception is seriously delinquent accounts transferred to a collection agency; the credit-reporting agency would delete from those accounts the information reported by the original lender. To account for this possibility, if an individual shows evidence of new collections as reported by a collection agency or new public records during the performance period, the individual is categorized as bad.104 This treatment of collections and public records is a common industry practice.
An individual's payment performance was classified as "good" if all of that person's accounts were good and the person had no new public records or record of collection agency accounts. The payment performance of all other individuals was classified as "indeterminate." The small adjustment involved individuals whose payment performance was good with the exception of one account that had a delinquency of, at most, 30 days; the payment performance of such individuals was treated as good.
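The person-level aggregation, including the small adjustment for a single account no worse than 30 days delinquent, might be sketched as follows (the function and argument names are illustrative):

```python
def classify_person(accounts, new_collections_or_public_records=False):
    """Aggregate account-level labels into a person-level label.

    accounts: list of (label, worst_delinquency_days) pairs for the
        person's accounts over the performance period, where label is
        "good", "bad", or "indeterminate".
    new_collections_or_public_records: True if the period shows new
        collection-agency items or public records (treated as bad).
    """
    labels = [label for label, _ in accounts]
    # Bad if any account is bad or there is new derogatory evidence.
    if "bad" in labels or new_collections_or_public_records:
        return "bad"
    # Good if every account is good.
    if all(label == "good" for label in labels):
        return "good"
    # Small adjustment: otherwise-good records with exactly one account
    # no worse than 30 days delinquent are still treated as good.
    not_good = [(label, d) for label, d in accounts if label != "good"]
    if len(not_good) == 1 and not_good[0][1] <= 30:
        return "good"
    return "indeterminate"
```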
Since the full credit records of each individual were available for this study, it would have been possible to have created any credit characteristic that could have conceivably been used in model development. However, in the spirit of the rule-based process of model development used here, the decision was made to restrict the variables eligible for inclusion in the model to the 312 credit characteristics included in the data provided for this study. These characteristics are quite comprehensive and are typical of those used in the industry.
An additional restriction for the estimation sample is that each individual's credit record had to be "scorable" as of June 30, 2003. Credit records with limited credit history information or lacking relatively recent credit activity typically do not contain sufficient information to predict performance and are typically excluded from model development. Industry practice differs in terms of what information is necessary for an individual credit record to be scorable. For the model developed here, the credit records of individuals that had been assigned a TransRisk Score and a VantageScore as of June 30, 2003, were treated as scorable. A review of the credit records of individuals not assigned a credit score indicates that most of them had no credit accounts, and those that did typically had only inactive or extremely new accounts.
The resulting estimation sample consisted of 200,437 individuals who were scorable and also had either good or bad performance for the any-account performance measure used in model estimation. (See table 9 for sample statistics for the estimation sample.)
After the sample of credit records had been drawn and the dependent variable defined and constructed, the sample was segmented, attributes were created, and characteristics were selected. The model was then empirically estimated. Each of these steps is described below.
On the basis of their credit records, individuals were segmented into three groups (following industry practice, these segments are termed "scorecards"): those with two or fewer accounts ("thin-file scorecard"), and two groups of those with three or more accounts--those with a major-derogatory account, collection account, or public record ("major-derogatory scorecard") and those without the credit-record blemishes that define the major-derogatory scorecard ("clean-file scorecard").105 Typically, industry credit history models are based on a multiple scorecard segmentation scheme. Greater predictive power is achieved by segmenting the population and building specific scorecards for subpopulations with distinct credit-risk patterns. However, these models are usually estimated with at least 1 million individuals and often many more. Because the sample size for the present model is only one-fourth the size of the typical industry sample, the number of scorecards had to be limited. The three scorecards chosen here are those generally viewed as the most important by industry model developers. Attribute creation and model estimation were performed separately for each of the three groups.
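As a rough sketch, the three-way segmentation might be expressed as follows (the function name and inputs are illustrative, not drawn from the study):

```python
def assign_scorecard(num_accounts, has_derog_collection_or_public_record):
    """Route an individual to one of the three scorecards.

    num_accounts: number of credit accounts in the record.
    has_derog_collection_or_public_record: True if the record shows a
        major-derogatory account, collection account, or public record.
    """
    # Two or fewer accounts: thin-file scorecard, regardless of blemishes.
    if num_accounts <= 2:
        return "thin-file"
    # Three or more accounts split on the presence of blemishes.
    if has_derog_collection_or_public_record:
        return "major-derogatory"
    return "clean-file"
```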
A series of attributes was created from each of the 312 credit characteristics included by TransUnion in the data provided to the Federal Reserve.106 An attribute is a dichotomous indicator variable (that is, a variable that can only take on values of zero or 1) constructed from a credit characteristic and reflects a specific range of values of the characteristic. An attribute is assigned a value of 1 when the value of the characteristic falls within the range specified for the attribute, and zero otherwise. Many attributes can be created for each characteristic, and together they cover all possible values of the characteristic. The number of attributes used to cover the range of all possible values is determined by the model builder. For example, the characteristic "total number of months since the oldest account was opened" might be assigned three attributes: one attribute for individuals whose oldest account is one or two years old, a second for individuals whose oldest account is three to seven years old, and the third for individuals whose oldest account is eight or more years old; however, it could be assigned just two attributes or many more.
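The mapping from a characteristic's value to its attributes can be illustrated with a small helper that turns a value into a vector of zero/1 indicators, one per range, plus a missing-value indicator. The cut points in the test mirror the three-attribute example in the text; the function itself is an invention for illustration:

```python
def make_attributes(value, cut_points):
    """Expand one characteristic value into dichotomous attributes.

    cut_points defines contiguous ranges; for example, cut_points of
    [3, 8] for age of oldest account (in years) yields three range
    attributes (under 3, 3 to 7, and 8 or more). The first position is
    a separate attribute for a missing value. Exactly one attribute is
    set to 1 for any input.
    """
    if value is None:
        return [1] + [0] * (len(cut_points) + 1)
    attrs = [0]  # missing-value attribute not triggered
    bounds = [float("-inf")] + list(cut_points) + [float("inf")]
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        attrs.append(1 if lo <= value < hi else 0)
    return attrs
```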
Given the myriad ways of subdividing characteristics into attributes, the industry has developed rules and procedures to simplify this task. To create the attributes for the present model, we employed a statistically based procedure that roughly approximates the approach used by industry model developers. An initial process was applied to each characteristic within each scorecard, as follows.
First, an attribute was created for each characteristic with a missing value. Second, the process evaluated all possible divisions of the characteristic's range of nonmissing values into two attributes, each attribute covering a compact set of sequential values. The division selected was the one that best predicted performance for that scorecard. The prediction was formed by assigning a performance probability equal to the average performance of the individuals assigned to each of the two implied attributes.107 An additional constraint for the division was that the difference in the mean performance for individuals in the two implied attributes had to be statistically significant. This rule implied that, for some characteristics, no subdivisions could be created; these characteristics are unrelated to performance.
Once each characteristic was subdivided into two attributes for nonmissing values, further subdivisions of each attribute were evaluated. For each attribute, every possible subdivision into two was evaluated. The same rules and evaluation procedures were employed as in the initial process. Again, only statistically significant subdivisions were allowed. For example, for the characteristic "total number of months since the oldest account was opened," suppose the initial process created the attributes "three years or less" and "four years or more" (for age counted only in whole years). The next step would involve looking at all possible subdivisions of each of those two attributes. For example, subdivision of the "three years or less" attribute would look at two possible further subdivisions, (1) "one year or less" and "two or three years" and (2) "two years or less" and "three years." If neither of these further subdivisions had a statistically significant relationship to performance, then the attribute "three years or less" would not be subdivided. Otherwise, the subdivision that was most predictive of performance would be selected, and the attribute would be split into two attributes.
The process of subdivision continued until there were no remaining attributes with statistically significant splits. At each step, only subdivisions of existing attributes were considered. Thus, for example, if "total number of months since the oldest account was opened" was subdivided into "three years or less" and "four years or more," no subdivisions that would cut across this initial division (for example, an intermediate range of "three years and four years") would be considered. Because this analysis was done separately for each scorecard, the attributes selected for a characteristic do not have to be the same across the three scorecards; in fact, they do differ, as shown below.
Although the creation of attributes was governed by the mechanical application of the procedure outlined above, the process was somewhat more complicated than implied by the preceding discussion. In particular, the process also required that successive attributes imply that the characteristic as a whole be consistently positively or negatively related to performance (referred to here as "monotonicity"). Again using "total number of months since the oldest account was opened" as an example, assume that the attribute "four years or more" had an average performance of 0.5 and that a split of the other attribute, "three years or less," was being considered that would create the subdivisions "one year or less" and "two or three years." An average performance of less than 0.5 for the "one year or less" subdivision and of greater than 0.5 for the "two or three years" subdivision would result in a non-monotonic relationship between the value of the characteristic and performance and would not be considered for that reason.
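A simplified sketch of the significance-constrained splitting step is given below. The study does not specify the exact test statistic, so a pooled two-proportion z-test stands in for it here, and the recursion and monotonicity constraint are omitted for brevity; all names are illustrative:

```python
import math

def best_split(values, outcomes, lo, hi, z_crit=1.96):
    """Find the cut in [lo, hi) that best separates mean performance.

    values: characteristic values; outcomes: 1 for good, 0 for bad.
    Returns (cut, left_mean, right_mean) for the cut with the largest
    statistically significant gap in mean performance, or None if no
    significant division exists (the characteristic is then left
    unsplit, as described in the text).
    """
    pairs = [(v, y) for v, y in zip(values, outcomes) if lo <= v < hi]
    best, best_gap = None, 0.0
    for cut in sorted({v for v, _ in pairs})[1:]:
        left = [y for v, y in pairs if v < cut]
        right = [y for v, y in pairs if v >= cut]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        # Pooled two-proportion z-test for the difference in means.
        p = (sum(left) + sum(right)) / (len(left) + len(right))
        se = math.sqrt(p * (1 - p) * (1 / len(left) + 1 / len(right)))
        if se == 0:
            continue
        z = abs(ml - mr) / se
        if z >= z_crit and abs(ml - mr) > best_gap:
            best, best_gap = (cut, ml, mr), abs(ml - mr)
    return best
```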
Once attributes were created, each characteristic was evaluated for potential inclusion in the model through a process of "forward stepwise regression" applied separately to each scorecard. That technique sequentially chooses from among the 312 available characteristics according to whether inclusion improves the predictive power of the model. When evaluating a characteristic for potential inclusion in the model, all attributes of that characteristic were considered. In some cases, however, some individual attributes were combined to ensure monotonicity in the weights assigned to each attribute.108
As noted earlier, industry practice limits the number of characteristics that are included in a functioning model. The process of determining that number varies across model developers and applications. For the model developed here, the number of characteristics in each scorecard was limited by requiring that the last characteristic added to the model contribute to the predictive power by more than a threshold amount. The threshold was selected somewhat arbitrarily and was defined as a 0.75 percent increase in the "divergence statistic" that results from the inclusion of an additional characteristic.109 For each scorecard, characteristics that were not included in the final model would not have materially improved the predictiveness of the model.110
To carry out the forward stepwise regression, the single characteristic among the full set of 312 whose attributes best predicted performance for individuals on that scorecard was identified. With that characteristic now included in the model, the remaining 311 characteristics were evaluated, and the one among those 311 that most improved the predictiveness of the model was added as the second characteristic. This process was used to select all subsequent characteristics that improved predictiveness by more than the threshold amount.
Once the stopping point has been reached, a second phase commences in which all of the characteristics in the model are tested to determine their individual, marginal contribution to the divergence statistic. That is, for each characteristic that has been included, the divergence statistic for the model is calculated without that characteristic and then with that characteristic. If the divergence statistic does not rise more than 0.75 percent when the characteristic is restored to the model, then that characteristic is dropped. Each time a characteristic is removed, the abridged model is re-estimated to ensure that the contribution of each of the remaining characteristics to the divergence statistic is above the threshold; if it is, then new characteristics are considered for inclusion in the model. New characteristics are added if the percentage improvement to the divergence statistic exceeds 0.75 percent. New characteristics are added until there is no additional characteristic that produces an improvement in the divergence statistic that is above the 0.75 percent threshold, and each included characteristic's contribution to the divergence statistic is above this threshold. As with the rest of the model development process, the characteristic selection process is conducted separately for each scorecard.
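The divergence-based forward selection can be sketched as follows. The divergence statistic here takes the form standard in credit scoring, the squared difference of the mean scores of goods and bads divided by the average of their variances; the fit_and_score callback and the toy data are purely illustrative:

```python
def divergence(scores, outcomes):
    """Divergence of a score: (mu_g - mu_b)^2 / ((var_g + var_b) / 2)."""
    goods = [s for s, y in zip(scores, outcomes) if y == 1]
    bads = [s for s, y in zip(scores, outcomes) if y == 0]
    mg = sum(goods) / len(goods)
    mb = sum(bads) / len(bads)
    vg = sum((s - mg) ** 2 for s in goods) / len(goods)
    vb = sum((s - mb) ** 2 for s in bads) / len(bads)
    pooled = (vg + vb) / 2
    return (mg - mb) ** 2 / pooled if pooled > 0 else 0.0

def forward_select(candidates, fit_and_score, outcomes, threshold=0.0075):
    """Greedy forward selection on the divergence statistic.

    At each step, add the characteristic that most improves divergence;
    stop when the best relative improvement is at or below the 0.75
    percent threshold described in the text.
    """
    selected, current = [], 0.0
    while True:
        best_name, best_d = None, current
        for name in candidates:
            if name in selected:
                continue
            d = divergence(fit_and_score(selected + [name]), outcomes)
            if d > best_d:
                best_name, best_d = name, d
        if best_name is None:
            break
        if current > 0 and (best_d - current) / current <= threshold:
            break
        selected.append(best_name)
        current = best_d
    return selected, current

# Toy illustration: characteristic "a" is predictive, "b" is noise.
FEATURES = {"a": [0, 0, 0, 1, 1, 1, 1, 0], "b": [0, 1, 0, 1, 0, 1, 0, 1]}
OUTCOMES = [0, 0, 0, 0, 1, 1, 1, 1]

def toy_fit_and_score(selected):
    """Score each individual as the sum of the selected features."""
    return [sum(FEATURES[n][i] for n in selected) for i in range(8)]
```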
The credit-scoring model developed for this study, the "FRB base model," consists of three scorecards that incorporate 19 of the 312 credit characteristics available in the data provided by TransUnion (the 19 characteristics are listed in appendix C). Some credit characteristics appear on more than one scorecard, and the number of attributes associated with them varies (see tables 12.A--C). The thin-file scorecard has 8 credit characteristics and includes 9.9 percent of the individuals in the estimation sample. The clean-file scorecard has 8 credit characteristics and covers 58.9 percent of the estimation sample. The major-derogatory scorecard has 10 credit characteristics and includes 31.2 percent of the sample.
Each characteristic and its associated attributes are assigned a certain number of credit points; the points represent the weight assigned to each characteristic in calculating an individual's credit score. On the thin-file scorecard, for example, the characteristic with the widest range of possible credit points is "total number of public records and derogatory accounts with an amount owed greater than $100." This characteristic has five attributes. The attribute associated with the largest number of possible credit points is "five or more" (that is, five or more public record and derogatory accounts); that attribute accounts for negative 425 points. About 8 percent of the individuals on the thin-file scorecard are associated with this specific attribute. To derive an individual's credit score, one would sum the number of credit points across the various characteristics on the scorecard applicable to that individual.
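Scoring an individual is then a table lookup and a sum. In the sketch below, only the negative 425 points for "five or more" public records and derogatory accounts is taken from the text; the other point values, ranges, and names are invented for illustration:

```python
# Hypothetical fragment of a thin-file scorecard: for each
# characteristic, a list of (lower bound inclusive, upper bound
# exclusive, points) attributes. All values except -425 are made up.
THIN_FILE_POINTS = {
    "public_recs_and_derogs_over_100": [
        (0, 1, 120),
        (1, 2, 10),
        (2, 3, -80),
        (3, 5, -220),
        (5, float("inf"), -425),  # "five or more", per the text
    ],
}

def score(record, scorecard=THIN_FILE_POINTS):
    """Sum the points of the matched attribute for each characteristic."""
    total = 0
    for characteristic, attributes in scorecard.items():
        value = record[characteristic]
        for lo, hi, points in attributes:
            if lo <= value < hi:
                total += points
                break
    return total
```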
Because performance is not inherently scaled, a normalization was necessary to estimate the model. In estimating the model here, the dependent variable was defined as a dichotomous variable that took a value of 1000 to represent good performance and zero to represent bad performance, and the model was estimated using ordinary least squares. Thus, the predicted value from the regression is 1000 times the probability that an individual would have good performance. Scores (or individual predictions from the model) of 1000 represent an estimated probability of 1 that an individual's performance will be "good"; scores of zero represent a probability of 1 that an individual's performance will be "bad." A score of 500 indicates that an individual's performance is estimated to be equally likely to be good or bad. For the empirical analysis presented in the forthcoming sections of this study, all the credit scores are further normalized to a rank-order scale of zero to 100 (described below). Converting the FRB base score to this normalized score requires a nonlinear transformation (see table 13).
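A minimal sketch of this estimation step, on invented data: a linear probability model fit by ordinary least squares with the dependent variable coded 1000 for good and zero for bad, so that fitted values approximate 1000 times the probability of good performance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two illustrative attribute dummies per individual (invented data).
X = rng.integers(0, 2, size=(n, 2)).astype(float)

# True good-performance probability used to simulate outcomes.
p_good = 0.3 + 0.4 * X[:, 0] + 0.2 * X[:, 1]
y = 1000.0 * (rng.random(n) < p_good)  # 1000 = good, 0 = bad

# Ordinary least squares with an intercept.
design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

# Each fitted value approximates 1000 times P(good) for that individual.
fitted = design @ beta
```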
The three scorecards differ greatly from each other in terms of the percentage of individuals who experience bad performance over the 18-month performance period (using the measure of bad performance used to estimate the model). The proportion of individuals on the clean-file scorecard who experienced bad performance was 7.4 percent; on the thin-file scorecard, 34.8 percent; and on the major-derogatory-file scorecard, 64.7 percent (shown earlier in tables 12.A--C). Overall, 28.0 percent of the individuals in the sample experienced bad performance over the 18-month performance period (data not shown in table).
As noted earlier, the industry uses a variety of metrics to assess the ability of a credit-scoring model to position individuals on an ordinal scale (that is, "rank order" them) according to the credit risk they pose. The KS statistic is the primary metric used in this study. The higher the KS score, the better the model separates goods from bads. Overall, the KS statistic for the FRB base model is 73.0 percent, which, according to industry representatives, is in line with other generic credit-scoring models that use the same measure of performance for estimation. The ability of the FRB base model to separate goods from bads is illustrated in figure 1, panel A, where the cumulative distribution of scores for individuals exhibiting good performance over the 18-month performance period is consistently and substantially below the distribution of individuals with bad performance. The figure shows that the cumulative distributions of goods and bads in the FRB base model (panel A) are comparable to those of the TransRisk Score (panel B) and the VantageScore (panel C) as measured over the same population and performance measure.
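The KS statistic referred to here is the maximum vertical distance between the cumulative score distributions of the goods and the bads, expressed in percent. A small self-contained sketch:

```python
def ks_statistic(scores, outcomes):
    """Return max |F_good(s) - F_bad(s)| * 100 over score cutoffs s.

    scores: model scores; outcomes: 1 for good, 0 for bad. A larger
    value means the score better separates goods from bads.
    """
    goods = [s for s, y in zip(scores, outcomes) if y == 1]
    bads = [s for s, y in zip(scores, outcomes) if y == 0]
    best = 0.0
    for c in sorted(set(goods + bads)):
        f_good = sum(1 for s in goods if s <= c) / len(goods)
        f_bad = sum(1 for s in bads if s <= c) / len(bads)
        best = max(best, abs(f_good - f_bad))
    return 100.0 * best
```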
The ability of the FRB base model to predict loan performance appears to be on a par with other generic credit-scoring models. The ability of the three scorecards to distinguish between the goods and the bads differs significantly: The scorable sample KS statistic for the thin-file scorecard is 72.3 percent; for the clean-file scorecard, 53.4 percent; and for the major-derogatory scorecard, 61.7 percent. Industry experience indicates that this variation in KS statistic is to be expected. The KS statistic for individual scorecards typically varies depending upon, among other things, the specific sample of credit records used to estimate the scorecard, the time period evaluated, and the measure of performance that is used in estimation.
The credit-scoring model developed here is an approximation of the generic credit-scoring models used by the lending industry. As explained earlier, for purposes of the study, this approximation has many virtues. However, it is only an approximation and, for a number of reasons, does not fully reflect industry models.
First, the model developed here divides the sample of credit records into only three scorecards because of the relatively small size of the credit-record sample. To better classify individuals according to credit risk, the industry commonly uses larger samples and more scorecards.111 Second, whereas the performance period used here is 18 months, industry models more commonly use 24 months. Compared with use of the longer period, the use of 18 months produces fewer observations of loans becoming delinquent and reduces somewhat the precision of the model specification. Third, the definition of a "bad" outcome used here is likely quite similar to, but may differ in nuance from, the definition used commonly in the industry because the definition of a "bad" is typically proprietary. Fourth, the determination of the stopping point for adding characteristics to the three scorecards used here was an arbitrary threshold based on the divergence statistic. Industry model developers may use other techniques or select different thresholds to determine a stopping point. Fifth, the 312 characteristics in the credit-record database used here were those provided by TransUnion; model developers may create and use their own characteristics. Sixth, model developers typically assume a logistic relationship between the predictive characteristics and model performance. For model estimation here, a linear probability model was assumed and estimated with least squares because of data processing costs.112 Finally, model developers have long experience in developing scorecards, and through that experience may have learned to create more effective attributes; as a consequence, the specific attributes of characteristics in the model here may differ from those used in some industry models.