Section 215 of the FACT Act requests several empirical analyses of the relationships between credit scores and other factors for different demographic populations. These include an analysis of the empirical relationship between credit scores and actual losses experienced by lenders; an evaluation of the effect of scores on the availability and affordability of credit; and an evaluation of whether credit scoring in general, and the factors included in credit-scoring models in particular, may result in negative or differential effects on specific subpopulations and, if so, whether such effects could be mitigated by changes in the model development process.
As noted earlier, little previous research has addressed these topics because reliable data for conducting such research are not readily available. Creditors generally are prohibited from collecting race, ethnicity, and other personal demographic information on applications for credit, except in the case of mortgage credit. Even in the context of mortgage credit, only limited information can be collected.78 Likewise, with the exception of dates of birth, the credit records maintained by the credit-reporting agencies do not include any personal demographic information. The empirical analysis advanced in this section uses data from the Social Security Administration (SSA) to supplement credit-record data to make this research possible.
The analysis presented here to address the empirical issues raised in the FACT Act uses a large, nationally representative sample of individual credit records drawn from the credit records maintained by TransUnion. The credit-record data are supplemented with information on personal demographic and economic characteristics obtained from records maintained for other purposes by the SSA and other sources. In addition, TransUnion provided two commercially available credit scores for each individual. Both the credit scores and the credit-record data were obtained for individuals as of two dates separated by 18 months. Over that period, the information was sufficient to assess loan performance, to identify which individuals were able to obtain new credit, and to determine the pricing on a portion of those new loans.
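The two-snapshot design can be illustrated with a short Python sketch. It assumes a hypothetical, simplified account record (`Account`, with an `opened` date) rather than any actual TransUnion file layout, and it treats an individual as having obtained new credit if an account in the later snapshot was opened after the first snapshot date and on or before the second. The dates in the usage note are illustrative only, not the dates used in the study.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Account:
    # Hypothetical, simplified tradeline record; not an actual credit-bureau schema.
    account_id: str
    opened: date


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping to the target month's last day."""
    years, month_index = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def new_accounts(first_date: date, second_date: date, accounts: list[Account]) -> list[Account]:
    """Accounts opened after the first snapshot date and on or before the second."""
    return [a for a in accounts if first_date < a.opened <= second_date]


def obtained_new_credit(first_date: date, second_date: date, accounts: list[Account]) -> bool:
    """True if the individual opened at least one account between the two snapshots."""
    return bool(new_accounts(first_date, second_date, accounts))
```

For example, with a first snapshot on `date(2003, 6, 30)`, `add_months(first, 18)` gives `date(2004, 12, 30)`, and an account opened on `date(2004, 3, 1)` would count as new credit while one opened in 2001 would not.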
The assembled data set was used to address questions about the relationship between credit scores and actual losses experienced by lenders (proxied by loan performance) and about the effect of scores on the availability and affordability of credit. Addressing possible differential effects on different populations was more complicated. As noted earlier, it was determined that this issue could best be addressed by developing an original credit-scoring model. The information in the assembled data set was sufficient to estimate a generic credit history scoring model using methods that emulated standard industry definitions and procedures. The analysis of possible differential effect across populations relies on this estimated model, with the estimation procedures varied in several ways to investigate different aspects of the issue.
This section presents background information on discrimination and lending and discusses the concept of differential effect as used in the present study. The three subsequent sections describe the data set and the process used to develop the credit-scoring model used in this study; present results related to the relationship between credit scores on the one hand and loan performance and credit availability and affordability on the other; and present the results related to an assessment of differential effect.
Under the Equal Credit Opportunity Act (ECOA), it is unlawful for a lender to discriminate against a credit applicant on a prohibited basis in any aspect of a credit transaction. The prohibited bases under ECOA include race, color, religion, sex, national origin, age, and marital status. Under both ECOA and the Fair Housing Act (FHA), it is unlawful for a lender to discriminate on a prohibited basis in a transaction related to residential real estate, although the prohibited bases under the FHA differ somewhat from the prohibited bases under ECOA.79
Unlawful discrimination on a prohibited basis can take a variety of forms, such as
A creditor may not express, orally or in writing, a preference for applicants on a prohibited basis or indicate that it will treat applicants differently on a prohibited basis. A creditor may not discriminate on a prohibited basis because of the personal characteristics of a person associated with a credit applicant (for example, a co-applicant, spouse, business partner, or live-in aide) or the present or prospective occupants of the area where property to be financed is located. Finally, the FHA requires lenders to make reasonable accommodations for a person with disabilities when such accommodations are necessary to afford the person an equal opportunity to apply for credit.
Despite the existence of federal (and state) antidiscrimination laws, longstanding concerns about discrimination in credit markets persist regarding all aspects of the lending process, including marketing, credit evaluation, the establishment of loan terms, and loan servicing.
From a legal standpoint, discrimination in lending generally involves the concepts of "disparate treatment" and "disparate impact." Disparate treatment is deemed to have occurred when a lender treats similarly situated applicants differently based on one of the prohibited factors (for example, offering less favorable terms to minority applicants).80 Disparate impact occurs when a practice that the lender applies uniformly to all applicants has a discriminatory effect on a prohibited basis and does not have a sufficient business justification.
Discriminatory treatment is considered intentional if the lender takes into account the protected characteristic of the individual subject to the discriminatory treatment. Allegations of disparate impact do not presume intentional behavior but rather simply assert the existence of a disproportionate adverse effect on a protected group.
Some observers maintain that increased reliance on automated credit-evaluation systems, including credit scoring, serves to reduce the potential for discrimination in lending because the automated nature of the process reduces the opportunities for bias, whether overt or inadvertent, to influence lending outcomes. Others have expressed the view that the credit-scoring process itself and some of the factors within credit-scoring models may disadvantage minorities or other segments of the population protected by fair lending laws.81
The Federal Reserve's Regulation B, which implements ECOA, notes that there are two broad types of credit evaluation: (1) judgmental credit-evaluation systems, which may rely on a traditional, subjective evaluation by loan officers, and (2) credit-scoring systems that are empirically derived and demonstrably and statistically sound.82
A "credit-scoring system" is a system that evaluates an applicant's creditworthiness mechanically, based on key attributes of the applicant and aspects of the transaction, and that determines, alone or in conjunction with an evaluation of additional information about the applicant, whether the applicant is deemed creditworthy. Section 202.2(p) of Regulation B sets forth several criteria that a credit-scoring system must satisfy to be considered an empirically derived, demonstrably and statistically sound credit-scoring system. First, the system must be based on data that are derived from an empirical comparison of sample groups or the population of creditworthy and noncreditworthy applicants who applied for credit within a reasonable preceding period of time. Second, the system must be developed for the purpose of evaluating the creditworthiness of individuals with respect to the legitimate business interests of the creditor utilizing the system. Third, the system must be developed and validated using accepted statistical principles and methodology. Fourth, the system must be periodically revalidated by the use of appropriate statistical principles and methodology and adjusted as necessary to maintain predictive ability.
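Regulation B does not prescribe a particular statistic for validating or revalidating a system. As one common illustration, and purely as an assumption rather than a regulatory requirement, the sketch below measures rank-ordering power on a recent sample with the area under the ROC curve (AUC), computed as the probability that a randomly chosen good account outscores a randomly chosen bad one. It flags the model for adjustment if AUC has fallen by more than a hypothetical tolerance from its development value.

```python
def auc(good_scores: list[float], bad_scores: list[float]) -> float:
    """Share of (good, bad) pairs in which the good account scores higher; ties count 1/2."""
    if not good_scores or not bad_scores:
        raise ValueError("need at least one good and one bad account")
    wins = 0.0
    for g in good_scores:
        for b in bad_scores:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(good_scores) * len(bad_scores))


def needs_adjustment(development_auc: float, recent_auc: float, tolerance: float = 0.03) -> bool:
    """Hypothetical revalidation rule: flag the model if AUC dropped by more than `tolerance`."""
    return development_auc - recent_auc > tolerance
```

For instance, `auc([700, 650], [600, 650])` returns 0.875 (three wins and one tie out of four pairs). The pairwise loop is quadratic in sample size; for large samples a rank-based (Mann-Whitney) computation yields the same value more efficiently.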
The data from which to develop such a system may be obtained from either a single credit grantor or multiple credit grantors. A creditor is responsible for ensuring its system is validated and revalidated based on the creditor's own data.
An empirically derived, demonstrably and statistically sound credit-scoring system may include age as a predictive factor (provided that those aged 62 or older are not assigned a negative factor or value). Besides age, no other prohibited basis may be used as a factor in a credit-scoring model.
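A scorecard that uses age can be screened for the elderly-applicant condition with a simple check. The sketch below is a simplified reading of the rule under stated assumptions: the scorecard awards points by age band (a hypothetical `(min_age, max_age, points)` layout, with `None` for an open-ended top band), higher points mean a more favorable assessment, and any band covering an age of 62 or older must receive at least as many points as the most favored band of younger applicants. Regulation B defines "negative factor or value" more fully, so a check like this covers only part of the rule.

```python
from typing import Optional

ELDERLY_AGE = 62  # age at which the elderly-applicant protection applies

AgeBand = tuple[int, Optional[int], int]  # hypothetical layout: (min_age, max_age, points)


def elderly_not_disfavored(bands: list[AgeBand]) -> bool:
    """True if no band that includes ages 62+ scores below the best younger-only band."""
    elderly = [pts for _lo, hi, pts in bands if hi is None or hi >= ELDERLY_AGE]
    younger = [pts for _lo, hi, pts in bands if hi is not None and hi < ELDERLY_AGE]
    if not elderly or not younger:
        return True  # nothing to compare
    return min(elderly) >= max(younger)
```

With bands `[(18, 29, 10), (30, 61, 25), (62, None, 25)]` the check passes; lowering the top band to 20 points would make it fail.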
Developers of credit-scoring models may not legally consider race, ethnicity, or other prohibited bases in model development. Thus, so long as the models do not include these characteristics, it is very unlikely that the use of credit scoring would result in discriminatory treatment. Of course, discrimination could arise if lenders fail to apply credit scores evenhandedly, ignore the scores, or exercise overrides for some populations or in some circumstances. These scenarios, however, are beyond the scope of this study.
Under the law, the test for disparate impact requires that a practice have a disproportionate impact on a protected population without a sufficient business justification for that impact. In a well-designed, empirically derived, demonstrably and statistically sound credit-scoring system, the attributes in the model must have a clear predictive value and a sufficient business rationale. The issue of disparate impact may arise, however, if an alternative approach or specification can achieve the business objective with less discriminatory effect or if the predictiveness of the variable stems primarily from the fact that it is a proxy for a protected population.
A banking bulletin issued by the Office of the Comptroller of the Currency (OCC) regarding credit scoring discusses in some detail the circumstances that can lead to disparate impact in the use of credit scoring.83 According to the OCC,
"Disparate impact may occur in a credit scoring system when:
Each of those circumstances must be present to violate fair lending laws under "disparate impact."
Relatively little research has been undertaken to assess the potential disparate impact of credit scoring.84 Fair Isaac conducted one such analysis using a nationally representative sample of roughly 800,000 credit records of individuals obtained from TransUnion.85 Because the personal characteristics of the individuals were not known to Fair Isaac (or TransUnion), the ZIP code of each individual's place of residence was matched to 1990 census data to determine the proportion of the population that was minority (black or Hispanic) in the area where the individual lived. In the study, areas with relatively large minority populations (70 percent or more) were termed "high-minority areas."
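The area classification amounts to a threshold on a population share. The sketch below assumes hypothetical per-area census counts `(black, hispanic, total)` keyed by an area code. Because the census records Hispanic origin separately from race, it also assumes the two counts do not overlap (for example, non-Hispanic black plus Hispanic of any race); the source does not say how the study handled this.

```python
HIGH_MINORITY_THRESHOLD = 0.70  # "70 percent or more", the cutoff described for the study


def minority_share(black: int, hispanic: int, total: int) -> float:
    """Black plus Hispanic share of an area's population (assumes non-overlapping counts)."""
    if total <= 0:
        raise ValueError("total population must be positive")
    return (black + hispanic) / total


def classify_areas(counts: dict[str, tuple[int, int, int]]) -> dict[str, bool]:
    """Map each area code to True if it qualifies as a high-minority area."""
    return {
        area: minority_share(black, hispanic, total) >= HIGH_MINORITY_THRESHOLD
        for area, (black, hispanic, total) in counts.items()
    }
```

An area with 500 black and 250 Hispanic residents out of 1,000 (75 percent) would be classified as high-minority; one with 300 of each (60 percent) would not.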
One area of concern addressed in the study is that certain population segments may be underrepresented in the credit-record files maintained by the national credit-reporting agencies and that, as a consequence, credit-scoring models developed from these data may not provide an accurate indication of the credit use, and therefore credit risks, posed by these underrepresented populations. The Fair Isaac analysis found that there was a reasonably close correspondence between the share of minority population residing in areas with a high concentration of minorities and the overall share of credit records from individuals in such areas. This was taken as an indication that generic credit-scoring model development is based on credit records that reflect a wide range of racial and ethnic groups.
The analysis also revealed that the share of individuals from high-minority areas with relatively low credit scores was about twice as large as the share of individuals from other areas. The research further found that credit scores performed well in rank ordering future loan performance for individuals in both high-minority areas and other areas. Finally, the analysis built separate scorecards for individuals residing in high-minority areas and for the sample as a whole and found that no factor was predictive in one scorecard but not in the other and that the predictive factors aligned closely in descending order of importance in both scorecards. The analysis concluded that Fair Isaac credit scores are both effective and "fair" in assessing risk for both populations.
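Whether scores rank order future performance within a population can be checked by grouping accounts into score bands and confirming that the bad rate falls as scores rise. The sketch below assumes hypothetical `(score, is_bad)` records and illustrative, ascending band edges; it is not the Fair Isaac methodology, which is not described here in that detail. Running it separately on records from high-minority areas and from other areas mirrors the comparison described above.

```python
from bisect import bisect_right
from typing import Optional


def bad_rates_by_band(records: list[tuple[float, bool]], edges: list[float]) -> list[Optional[float]]:
    """Bad rate per score band; edges must be ascending, band 0 is below edges[0]."""
    counts = [0] * (len(edges) + 1)
    bads = [0] * (len(edges) + 1)
    for score, is_bad in records:
        band = bisect_right(edges, score)  # a score equal to an edge falls in the higher band
        counts[band] += 1
        bads[band] += int(is_bad)
    return [b / c if c else None for b, c in zip(bads, counts)]


def rank_orders(rates: list[Optional[float]]) -> bool:
    """True if bad rates never rise as scores rise (empty bands are skipped)."""
    observed = [r for r in rates if r is not None]
    return all(lower_band >= higher_band for lower_band, higher_band in zip(observed, observed[1:]))
```

With edges `[600, 700]`, eight illustrative records split as three below 600 (two bad), three from 600 to 699 (one bad), and two at 700 or above (none bad) give bad rates of 2/3, 1/3, and 0, which rank order as expected.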
In the previous section, the phrase "disparate impact" was used to refer to the possible differential adverse effects that credit-scoring models may have on various groups in a legal context. In this section, we define more precisely the meaning of the term "differential effect" as used in the statistical analysis of the present study. Although related, the legal term "disparate impact" and the term "differential effect" used here are not the same. The concept of disparate impact embodies specific legal criteria, must be applied on a case-by-case basis, and must consider all relevant facts and circumstances, including any business justification. The concept of differential effect used here is a statistical concept and does not necessarily correspond to the legal concept.
The congressional directive does not distinguish between legal and illegal disparate impact (refer to appendix A). Rather, it focuses on the potentially adverse effect that credit scores may have on classes of individuals grouped by personal demographic, economic, and locational characteristics; some of those effects may potentially be illegal, and some may not.
The first step in defining the phrase "differential effect" is to define "effect." In developing a statistical credit-scoring model to predict credit performance, "effect" represents an association between the demographic characteristic (for example, age) and credit performance, controlling for the predictive factors in the model that are related to credit performance in a demographically neutral environment. Thus, "effect" cannot exist unless the demographic characteristic itself (for example, age) is related to (or correlated with) credit performance. An implication of this is that individuals of different ages would not be expected to have the same average performance after controlling for the predictive factors in the model that are related to credit performance in an age-neutral environment.
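In this sense, the effect of a characteristic can be measured as the average gap between actual and predicted performance for each demographic group, where predictions come from a model whose weights reflect only the direct effects of its predictive factors. The sketch below assumes hypothetical `(group, actual, predicted)` records, with `actual` equal to 1 for a bad outcome and 0 otherwise and `predicted` the model's probability of a bad outcome; a positive mean gap means the group performed worse than the model predicted.

```python
from collections import defaultdict


def effect_by_group(records: list[tuple[str, int, float]]) -> dict[str, float]:
    """Mean of (actual - predicted) bad outcome for each demographic group."""
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for group, actual, predicted in records:
        sums[group] += actual - predicted
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}
```

If two younger and two older individuals each carry a predicted bad rate of 0.25, and one of the younger individuals goes bad while neither older individual does, the function reports a gap of +0.25 for the younger group and -0.25 for the older group.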
This definition is a purely statistical one and does not imply causality in the relation between the demographic characteristic and credit performance; for example, it may reflect variables that are not included in the model. Thus, the concept of effect is model specific and, indeed, will depend on the specific sample and methodology used to measure performance as well as on the set of predictive factors included in the model.
If the demographic characteristic, such as age, is not used explicitly in developing a credit-scoring model, one of three outcomes is possible. First, a set of predictive factors in the model may be highly correlated with age and effectively serve as a proxy for age in predicting performance. These factors will be assigned weights that reflect their direct effect on predicted performance (in an age-neutral environment) as well as their role as proxies for age. If these predictive factors are perfect proxies, they will absorb the entire effect of age on performance. If so, there will be no difference in the expected performance of individuals with the same credit scores and different ages.
The second possibility is that none of the predictive factors in the model are correlated with age. If so, the weights assigned to these factors will reflect purely their direct effects on predicted performance, and the scoring model itself will not reflect any of the correlation between age and performance. Factor weights in the model would be the same as those estimated in the age-neutral environment. In this case, individuals with the same score but different ages would not be expected to perform the same. For individuals of different ages, the expected difference between their actual performance and their performance predicted on the basis of the model would represent the "age effect" on performance.
The third possibility is a hybrid of the first two. That is, the predictive factors are imperfect proxies for age. If so, the model's factor weights will reflect some, but not all, of the effects of age on performance. Here, as in the second case, one would not expect individuals of different ages but with the same scores to perform equivalently; however, the expected differences in performance will be smaller than in the second case, in which the predictive factors absorb none of the age effects.
In the cases just described, the issue is the extent to which the predictive factors in the model represent, or "absorb," the effects of age. For the most part, the extent will depend upon the correlations between age and the predictive factors in the model. If the correlations are high, one would expect the model to absorb much of the effects of age; if the correlations are low, one would expect the model to absorb little of those effects.
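The three cases can be illustrated in a stylized linear setting, which is an assumption made only for illustration (production scorecards are typically logistic and use many factors). Let standardized age a and a single predictive factor x have correlation rho, and let a performance index be y = beta*x + gamma*a + noise, where beta is the factor's weight in an age-neutral environment and gamma is the age effect. Fitting y on x alone gives the factor a weight of beta + gamma*rho, and the age association left in the residuals is gamma*(1 - rho^2), so the absorbed share is rho^2: all of the age effect when rho = 1 (first case), none when rho = 0 (second case), and part of it in between (third case). The sketch below checks this by simulation; `simulate` and its parameters are hypothetical.

```python
import random


def absorbed_share(rho: float) -> float:
    """Share of the age effect absorbed by a single factor whose correlation with age is rho."""
    return rho ** 2


def _cov(u: list[float], v: list[float]) -> float:
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


def simulate(beta: float, gamma: float, rho: float,
             n: int = 100_000, seed: int = 1) -> tuple[float, float]:
    """Fit performance on the factor alone; return (fitted weight, remaining age slope)."""
    rng = random.Random(seed)
    ages, xs, ys = [], [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)                                       # standardized age
        x = rho * a + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)   # corr(x, a) = rho
        y = beta * x + gamma * a + rng.gauss(0.0, 1.0)                # performance index
        ages.append(a)
        xs.append(x)
        ys.append(y)
    weight = _cov(xs, ys) / _cov(xs, xs)  # age itself is left out of the model
    residuals = [y - weight * x for x, y in zip(xs, ys)]
    remaining_age_slope = _cov(residuals, ages) / _cov(ages, ages)
    return weight, remaining_age_slope
```

With beta = 1.0, gamma = 0.5, and rho = 0.6, the fitted weight should come out close to 1.3 and the remaining age slope close to 0.32 (that is, 0.5 × (1 − 0.36)); with rho = 0 they should be close to 1.0 and 0.5, and with rho = 1 the remaining age slope is essentially zero.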
The above discussion defines "effect" as used in the term "differential effect." The "differential" portion of the term is a relative concept used for subgroups of a population, not a population as a whole. It focuses on the portion of the effect of a characteristic that is absorbed by other characteristics. Specifically, use of model A can be defined to have differential effect for a specific subgroup, such as younger individuals, relative to model B as follows: The absorbed component of the age effect in model A is larger than the absorbed component of the age effect in model B, and, as a consequence, younger individuals have lower credit scores (that is, higher risk assessments) with model A, controlling for credit performance, than when model B is used.
Defined this way, differential effect will generally be a zero-sum outcome. For example, if good credit performance is positively related to age, then the less a credit-scoring model absorbs age effects, the higher the scores of younger individuals will be. Alternatively, the more a model absorbs the age effects, the higher the scores of older individuals will be. If younger individuals were the focus of attention, then use of a credit-scoring model that absorbs a substantial portion of the age effect would be described as having a differential effect for that group as compared with a model in which less of the age effect is absorbed.
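The definition can be made concrete by comparing, person by person, where each individual falls in the score distribution under two models. Because the two models' scores need not share a scale, the sketch below assumes that scores are first converted to mid-percentile ranks within the sample; this alignment is an illustrative choice, not necessarily the study's. For members of a chosen group, it averages the percentile under model A minus the percentile under model B within each performance outcome, so performance is held fixed. A negative value means the group is ranked lower by model A than by model B for the same outcome, which is a differential effect for that group in the sense defined above. Because mid-percentiles average exactly 0.5 under any model, the shifts sum to zero across the whole sample.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical record layout: (group, outcome, score_model_a, score_model_b)
Record = tuple[str, str, float, float]


def mid_percentiles(scores: list[float]) -> list[float]:
    """Share of the sample scoring below each score, counting ties as half."""
    ordered = sorted(scores)
    n = len(ordered)
    result = []
    for s in scores:
        below, through = bisect_left(ordered, s), bisect_right(ordered, s)
        result.append((below + 0.5 * (through - below)) / n)
    return result


def rank_shift_by_outcome(records: list[Record], group: str) -> dict[str, float]:
    """Mean percentile under model A minus model B for members of `group`, by outcome."""
    pct_a = mid_percentiles([r[2] for r in records])
    pct_b = mid_percentiles([r[3] for r in records])
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for (g, outcome, _, _), a, b in zip(records, pct_a, pct_b):
        if g == group:
            sums[outcome] += a - b
            counts[outcome] += 1
    return {outcome: sums[outcome] / counts[outcome] for outcome in sums}
```

In a four-person illustration where model A moves a younger individual who went bad down the ranking and an older individual who went bad up by the same amount, the younger group shows a shift of -0.25 among bad outcomes and the older group +0.25, with no change among good outcomes.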
In general, the subgroups that, all else being equal, perform worse will be those most likely to show a negative differential effect when a model absorbs the effects of a group characteristic. However, patterns can be complex. Entirely different groups may be affected when a credit-scoring model absorbs the effects of a population characteristic if they happen to have credit profiles similar to those of the portion of the population group with poor performance. For example, if older recent immigrants have short credit histories (as do younger individuals), and length of credit history earns a place in a credit-scoring model only because it absorbs the impact of age, then older recent immigrants may also experience a differential effect from the use of this credit characteristic. In this case, the adversely affected subgroup need not show poor performance.
The concept of differential effect used here applies only to the group as a whole; the outcomes for specific individuals will vary when different models are employed. It is only on average that younger individuals will be more adversely affected (receive lower scores) the more a model absorbs age. Depending upon their specific financial experiences, some younger individuals may have higher scores the more age is absorbed into the model.