Keywords: International trade, product classification
Empirical researchers including Bernard, Redding and Schott (2010, 2011), Bernard, Jensen, Redding, and Schott (2009), Goldberg, Khandelwal, Pavcnik, and Topalova (2010) and Pierce (2011), increasingly use product-level data to study trends in exports, imports and domestic production. These data have been particularly useful for examining the extent to which firms' growth in output or trade is due to "intensive" versus "extensive" margins, i.e., the degree to which growth takes place within surviving products or via product adding and dropping. At the same time, national statistical agencies frequently update product classification systems to incorporate new goods, drop obsolete categories and harmonize their systems with other countries. Absent a proper concordance, it can be difficult for researchers to distinguish true product-switching from spurious changes to product mix associated with product reclassifications.
In this article we present an algorithm for constructing a concordance among revisions of the Harmonized System (HS) product codes used to track U.S. exports and imports over time. HS codes have been used by the U.S. Census Bureau since 1989 and are updated frequently. Our algorithm matches revised codes to synthetic, time-invariant identifiers that follow "families" of related products. We use our algorithm to construct the first comprehensive concordance of U.S. HS codes over time, covering the period 1989 to 2009. In an electronic appendix, we provide the Stata code used to build the concordance, thereby giving other researchers the means to customize it or to extend it to incorporate future revisions of HS categories.
Our concordance reveals that changes in HS codes are frequent and widespread, and that they affect product categories representing a substantial portion of trade value. Indeed, of the 16,836 (8,859) import (export) codes active in 2004, 7,503 (2,929) underwent revision between 1989 and 2004, the years examined in Bernard, Jensen, Redding and Schott (2009). Furthermore, these revised codes represent 59 and 43 percent of import and export value in 2004, respectively.
The prevalence and importance of product code changes in U.S. trade underscore the need for HS code concordances in the analysis of trade flows. Using our concordance to control for changes to product categories over time, for example, Bernard, Jensen, Redding, and Schott (2009) show that most of the year-to-year change in U.S. trade, as well as adjustments to "shocks" such as the 1997 Asian financial crisis, occurs along the intensive margin.
The algorithm is general enough to be used to create concordances of virtually any national or international product classification system over time. This includes other international trade product classification systems such as the European Union's Combined Nomenclature or the Tariff Schedule of Japan. Moreover, the algorithm can be employed to construct concordances over time for a variety of national or international production-based product classification systems such as the North American Industry Classification System (NAICS), International Standard Industrial Classification (ISIC) or the statistical classification of economic activities in the European Union (NACE).
The remainder of the article is organized as follows. Section 2 provides a brief description of U.S. HS codes. Section 3 describes the data used to construct our concordance and Section 4 outlines the concordance algorithm. Section 5 describes the properties of a 1989 to 2004 HS-over-time concordance created using the algorithm from Section 4. Section 6 shows the effect of using the HS-over-time concordance on the measurement of product-adding and dropping using year-over-year decompositions of U.S. exports as in Bernard, Jensen, Redding, and Schott (2009). Section 7 describes the general applicability of the algorithm to other product classification systems. An electronic appendix on our personal websites provides concordance files in .csv format, as well as the Stata code used to generate the concordances.
U.S. HS codes are based on the Harmonized System established by the World Customs Organization (WCO). The WCO assigns 6-digit codes for general categories, and countries adopting the system then define their own codes to capture commodities at more detailed levels. In the United States, the most detailed level of disaggregation is ten digits. In this article, we refer to ten-digit codes as "product" or "goods" categories. U.S. export codes, technically referred to as Schedule B codes, are administered by the United States Census Bureau (Census). U.S. import codes, technically referred to as Harmonized Tariff System (HTS) codes, are administered by the U.S. International Trade Commission (USITC). We refer to HTS and Schedule B codes together as "HS codes" throughout this article.
Changes to U.S. export or import product codes can occur via three routes: changes by the WCO to the official list of international six-digit prefixes; U.S. legislation that affects U.S. eight-digit codes (imports only); and changes by the Committee for Statistical Annotation of Tariff Schedules (known as the "484(f) Committee") to statistical ten-digit codes.
HS codes are updated for several reasons. The WCO, for example, makes adjustments to the HS to reflect developments in technology and changes in trade patterns. In addition, the 484(f) Committee may split a single HS code into several new codes in order to report import or export data at a more detailed level. Similarly, producers may petition one of the official bodies noted above for code changes to obtain a higher profile for the goods they export or import.
A large number of changes in 10-digit U.S. HS codes can be attributed to the WCO's revisions of 6-digit HS categories. The WCO has made three major revisions to the HS, in 1996, 2002 and 2007, with another revision planned for 2012. Each of these revisions resulted in hundreds of 6-digit HS categories being deleted, while hundreds of other 6-digit HS categories were added. The effect of the WCO's revisions on the number of U.S. HS changes is apparent in Table 1, where a large number of HS changes are concentrated in WCO revision years.
Each year, Census publishes documents outlining the HS codes that have become "obsolete" and the "new" codes that will take their place. We refer to these documents as Census' "obsolete-new" files. For exports, HS code changes take effect annually in January; for imports, they can occur within as well as across years. Obsolete-new files for years before 1997 are available only in hard copy and were transcribed into electronic form as part of the construction of our concordance. These files as well as electronic versions of subsequent files were obtained from Mayumi Hairston Escalante at Census. The most recent obsolete-new files are currently posted on the Census website.
We use the terms "simple" and "complex" to describe the two basic changes to HS codes that can occur in an obsolete-new file. Simple changes make no adjustments to the actual items covered by a particular code; they simply swap one ten-digit code for another. There are several possible reasons for a one-to-one renumbering, including:
In contrast to simple changes, complex changes alter the mix of items captured by a particular code. For these changes, the items formerly encompassed by one or more "obsolete" codes are distributed to one or more "new" codes. In 2002, for example, various types of waste oil, which previously were grouped with the fresh oils to which they were most similar, were given their own HS codes. As a result, the (now obsolete) former fresh oil product categories were linked to the new waste oil categories from which they emerged. Some obsolete-new files contain "blanket" mappings, our term for mappings that include codes ending in a series of X's, e.g., 8486XXXXXX. These observations are dropped from our concordance, as we are unable to determine the specific HS codes to which they refer.
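The distinction between simple, complex and blanket mappings can be illustrated with a short Python sketch. It is not taken from the authors' Stata code. It assumes that a one-to-one mapping is an adequate proxy for a "simple" change, and the function names, the code 8486100000 and the pair 1111111111/2222222222 are hypothetical, chosen only for illustration.

```python
def is_blanket(code):
    """True for placeholder codes that end in X's, such as 8486XXXXXX."""
    return code.upper().endswith("X")


def classify_set(obsolete, new):
    """Label one obsolete-new set as 'blanket', 'simple' or 'complex'.

    Assumption: a one-to-one mapping is treated as a simple renumbering.
    """
    if any(is_blanket(code) for code in list(obsolete) + list(new)):
        return "blanket"  # would be dropped from the concordance
    if len(obsolete) == 1 and len(new) == 1:
        return "simple"
    return "complex"


# The 1997 split of 7802000000 cited in the text involves three codes.
print(classify_set(["7802000000"], ["7802000030", "7802000060"]))
# Hypothetical blanket mapping and hypothetical one-to-one renumbering.
print(classify_set(["8486XXXXXX"], ["8486100000"]))
print(classify_set(["1111111111"], ["2222222222"]))
```

In practice a one-to-one mapping could still alter product coverage, so this rule should be read as a first-pass filter rather than a definition.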
For each set of obsolete-new mappings in a particular obsolete-new file, we construct a synthetic HS code which we refer to as a "setyear" (setyr in our Stata code). This synthetic code records both the count of the change since the first change in 1989 and an identifier for when it takes place. Formally, for exports, it is defined as the count of the particular mapping plus the four-digit year in which the change occurs divided by 10,000. For imports, it is the count of the particular mapping plus the six-digit year-month in which the change occurs divided by 1,000,000. The very first setyears for exports and imports, for example, are equal to 1.1989 and 1.198906.
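As a worked illustration of the setyear formula, the following Python sketch reproduces the two values quoted above. The function names are hypothetical, and the use of `Decimal` is a choice made here (to avoid binary floating-point rounding in the fractional part), not something the article or its Stata code prescribes.

```python
from decimal import Decimal


def setyear_export(count, year):
    """Count of the mapping plus the four-digit year divided by 10,000."""
    return Decimal(count) + Decimal(year) / Decimal(10_000)


def setyear_import(count, year, month):
    """Count of the mapping plus the six-digit year-month divided by 1,000,000."""
    year_month = year * 100 + month
    return Decimal(count) + Decimal(year_month) / Decimal(1_000_000)


print(setyear_export(1, 1989))     # first export setyear
print(setyear_import(1, 1989, 6))  # first import setyear
```

The integer part identifies the set and the fractional part identifies the date, so formatting with a fixed number of decimals (four for exports, six for imports) keeps years such as 2000 unambiguous.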
Table 1 summarizes the number of obsolete-new mappings in the raw data for export and import codes, respectively. Results for export codes are displayed in the left panel while those for import codes are displayed in the middle and right panels. The first column of each panel notes the year-month in which the noted changes take place. The second and third columns report the total number of retired and replacement codes encompassed by the number of sets reported in column four. Note that the number of sets in column four of each panel is smaller than the numbers of HS codes in columns two and three because multiple codes are often involved in a particular change (i.e., a particular set). The fifth column reports the number of changes that are "simple" in the sense outlined above.
As indicated in the table, HS codes are updated unevenly in the sense that some years (e.g., 2002) encompass substantially more changes than others (e.g., 2000).
Concording HS codes over time is complicated by the existence of chains of HS-code changes across months and years, which we refer to as "family trees". There are two basic types of family tree. We refer to the first case, displayed in Figure 1, generically as a "growing family tree". In this case, code $a$ from period $t$ may become obsolete and be mapped to new codes $b$ and $c$ in period $t+1$. Then, in period $t+2$, codes $b$ and $c$ may become obsolete and be mapped to new codes $d$ and $e$, and $f$ and $g$, respectively. Our concordance of the period $t$ to period $t+2$ HS codes assigns a common synthetic code to all HS codes in a growing family tree. Such an assignment may result in potentially many more HS codes being mapped to a given synthetic code in the final year of the concordance than in the first year. In 1997, for example, 7802000000 is mapped to 7802000030 and 7802000060. In a 1996 to 1997 concordance, we would assign a single synthetic HS code to all of these actual HS codes. For this reason, it may be useful for some analyses to restrict a concordance to a narrower set of years than the 1989 to 2009 concordance provided below.
The second type of family tree, which we refer to generically as a "shrinking family tree", is displayed in Figure 2. In this case, codes $a$ and $b$, and $c$ and $d$, from period $t$ separately become obsolete and are mapped to codes $e$ and $f$, respectively, in period $t+1$. Then, in period $t+2$, codes $e$ and $f$ become obsolete and are assigned to new code $g$. In this case, the number of HS codes mapped to the family's common synthetic code declines over time. In 1997, for example, 8506800010 and 8506800050 are mapped to 8506800000. In a 1996 to 1997 concordance, we would assign a single synthetic HS code to all of these actual HS codes.
Notes: Table reports changes to export (left panel) and import (middle and right panel) HS codes in noted year-month. Obsolete is number of codes retired from prior year. New is number of codes replacing these retirements. Sets is a count of the overall number of obsolete-new matches. Simple refers to re-numberings of individual codes.
The algorithm we develop for concording HS codes between arbitrary beginning and ending year-months accounts for both types of family trees, as well as combinations of the two types. Implementation details can be found in the Stata code in the electronic appendix; the basic steps are as follows:
Step four is accomplished by successively merging subsequent obsolete-new mappings to all periods' obsolete-new mappings between the beginning and end years of the concordance. To bridge codes used from 1989 onwards, for example, the chained file is constructed as follows. First, merge the new codes in the 1990 file to the obsolete codes in the 1991 file, dropping any codes that are unique to 1991. Second, merge the obsolete codes in the 1992 file to the new codes in the previously merged 1990-1991 file, again dropping any codes unique to 1992. This procedure is then repeated until reaching the desired end year of the concordance. Note that this successive merging has to be done starting with every year-month between the beginning and ending year-month because chains can begin in any year-month, and they would otherwise be missed given the dropping just mentioned. After these chains are created, they are appended into a single file and added to all obsolete-new mappings that are not parts of a chain.
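The grouping of chained mappings into families can be sketched compactly in Python. Note that this sketch swaps in a different technique from the successive merges described above: it treats every obsolete-new set as linking its codes and finds connected components with a union-find structure. The function name and the choice of the smallest code as the family identifier are assumptions made for illustration; the authors' Stata code assigns setyears instead.

```python
def build_families(mappings):
    """Group HS codes linked by obsolete-new sets into families.

    mappings: iterable of (obsolete_codes, new_codes) pairs, in any order.
    Returns {code: family_id}, where family_id is the smallest code
    (as a string) in the family. This labeling is an illustrative choice.
    """
    parent = {}

    def find(code):
        parent.setdefault(code, code)
        while parent[code] != code:
            parent[code] = parent[parent[code]]  # path halving
            code = parent[code]
        return code

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller code as root so the label is deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    for obsolete, new in mappings:
        codes = list(obsolete) + list(new)
        for code in codes:
            union(codes[0], code)  # also registers single-code sets

    return {code: find(code) for code in list(parent)}


# The growing and shrinking examples cited in the text (both from 1997).
families = build_families([
    (["7802000000"], ["7802000030", "7802000060"]),
    (["8506800010", "8506800050"], ["8506800000"]),
])
print(families["7802000060"])
print(families["8506800050"])
```

Because connected components ignore timing, this shortcut would merge the two lives of a code that is retired and later reintroduced, so it is best viewed as a conceptual aid rather than a replacement for the date-aware merging described in the text.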
This section describes a 1989 to 2004 concordance constructed using the algorithm described above, which was employed in Bernard, Jensen, Redding, and Schott (2009). The first and second columns of Table 2 summarize total U.S. exports in 1989 and 2004 and the total number of HS product categories exported in those two years, respectively. Columns three and four provide analogous detail with respect to U.S. imports. As indicated in the table, (nominal) exports more than double while (nominal) imports more than triple over the fifteen-year interval. The number of pre-concordance export and import HS codes observed in each year of data grows 13 percent and 21 percent, respectively.
|Exports Value||Exports Codes||Imports Value||Imports Codes|
Notes: Export and import values in billions of U.S. dollars. Number of codes refers to number of original ten-digit HS categories in the raw trade data.
Table 3 reports two decompositions of export and import codes. The first three rows of the table show how many of the original HS codes in each year survive versus being replaced by synthetic codes. The remaining rows in the table decompose the actual plus synthetic codes that remain after the concordance into those which are common across years and those which are idiosyncratic to a particular year.
|Exports 1989||Percent||Exports 2004||Percent||Imports 1989||Percent||Imports 2004||Percent|
|Original HS codes||7853||100||8859||100||13941||100||16836||100|
|Not replaced by synthetic codes||5936||76||5930||67||9383||67||9333||55|
|Replaced by synthetic codes||1917||24||2929||33||4558||33||7503||45|
|Actual + synthetic codes after concordance||7162||91||7157||81||12527||90||12534||74|
|Actual codes: common to both years||5904||75||5904||67||9047||65||9047||54|
|Actual codes: appear in only one year||32||0||26||0||336||2||286||2|
|Synthetic codes: common to both years||1221||16||1221||14||3057||22||3057||18|
|Synthetic codes: appear in only one year||5||0||6||0||87||1||144||1|
Notes: Table decomposes the number of original HS codes in each year into those replaced by a synthetic code versus not, and total surviving HS plus synthetic codes in each year into noted sub-groups. All replacements are with respect to a 1989 to 2004 concordance. Even columns display values as a percent of first row in preceding column.
Of the 7,853 original HS codes appearing in the 1989 U.S. export data, for example, 1,917 are replaced by synthetic codes. Since the same synthetic code is often assigned to more than one original code, the resulting concorded dataset contains 7,162 actual plus synthetic codes. Of these, 5,936 and 1,226 are actual and synthetic, respectively. Each of these totals, in turn, can be broken down into actual codes which are common to both 1989 and 2004 (5,904), synthetic codes that are common to both 1989 and 2004 (1,221), actual codes unique to 1989 (32) and synthetic codes that are unique to 1989 (5). These breakdowns reveal that the number of actual and synthetic export and import goods actually added and dropped between 1989 and 2004 is relatively small.
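The decomposition reported in Table 3 amounts to a handful of set operations, which the following Python sketch illustrates on toy data. The function name, the product labels and the synthetic identifier "s1" are hypothetical; the sketch assumes the concordance is available as a mapping from original codes to synthetic codes.

```python
def decompose(codes_first, codes_last, concordance):
    """Split concorded codes into actual/synthetic, common/one-year groups.

    codes_first, codes_last: sets of original HS codes observed in each year.
    concordance: dict mapping a replaced HS code to its synthetic code.
    """
    def concord(codes):
        actual = {c for c in codes if c not in concordance}
        synthetic = {concordance[c] for c in codes if c in concordance}
        return actual, synthetic

    actual_0, synthetic_0 = concord(codes_first)
    actual_1, synthetic_1 = concord(codes_last)
    return {
        "actual_common": len(actual_0 & actual_1),
        "actual_only_first": len(actual_0 - actual_1),
        "actual_only_last": len(actual_1 - actual_0),
        "synthetic_common": len(synthetic_0 & synthetic_1),
        "synthetic_only_first": len(synthetic_0 - synthetic_1),
        "synthetic_only_last": len(synthetic_1 - synthetic_0),
    }


# Toy data: B is split into B1 and B2; C and D are dropped; E is added.
result = decompose(
    {"A", "B", "C", "D"},
    {"A", "B1", "B2", "E"},
    {"B": "s1", "B1": "s1", "B2": "s1"},
)
print(result)
```

The same identity holds in the 1989 export column of Table 3: 5,904 + 32 + 1,221 + 5 = 7,162 actual plus synthetic codes.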
The values of U.S. exports and imports associated with each of the cells in Table 3 are reported in Table 4. As indicated below, synthetic codes account for the majority of import value in both 1989 and 2004.
|Exports 1989||Percent||Exports 2004||Percent||Imports 1989||Percent||Imports 2004||Percent|
|Original HS codes||353765||100||817936||100||468012||100||1460160||100|
|Not replaced by synthetic codes||222293||63||467854||57||196051||42||600941||41|
|Replaced by synthetic codes||131472||37||350082||43||271961||58||859219||59|
|Actual + synthetic codes after concordance||353765||100||817936||100||468012||100||1460160||100|
|Actual codes: common to both years||204570||58||448183||55||193451||41||588628||40|
|Actual codes: appear in only one year||17723||5||19672||2||2600||1||12314||1|
|Synthetic codes: common to both years||131405||37||347416||42||270859||58||855029||59|
|Synthetic codes: appear in only one year||67||0||2666||0||1103||0||4190||0|
Tables 3 and 4 also underscore the prevalence of changes in HS codes over time. As of 2004, 45 percent of import products and 33 percent of export products had been involved in an HS code change since 1989. Moreover, trade in products with code changes accounted for 59 percent of the value of U.S. imports and 43 percent of the value of U.S. exports in 2004.
We note that two features of Census' obsolete-new mappings complicate the identification of new product introductions (e.g., iPods). First, new HS codes always emerge from predecessor HS codes. Second, new HS codes' emergence may take place an unknown period of time after an underlying good has been introduced. Statistical agencies may wait to establish a new HS category until it reaches a certain size or until manufacturers apply sufficient lobbying.
In this section we illustrate the importance of controlling for HS code reclassifications when measuring product adding and dropping in U.S. export data. In Table 5 below, we present the value and share of U.S. exports associated with product adding and dropping, both with and without controlling for changes in HS codes over time. The top portion of the table reports results with unadjusted HS codes and the bottom portion reports results after controlling for HS code reclassifications using our concordance. We report these results for two-year periods between 1993 and 2003 as in Bernard, Jensen, Redding, and Schott (2009).
The figures reported in Table 5 were generated using publicly available product-level U.S. export data. At this level of data aggregation, product adding refers to an instance in which the U.S. does not export a product in the beginning year of the period but does export that product in the end year. Similarly, product dropping refers to an instance in which the U.S. exports a product in the beginning year but does not export that product in the end year.
|No concordance: Added products||11934||63662||108544||15735||25009||4338||1484||4593||92395||4587|
|No concordance: Added products (% Beginning year exports)||2.6%||12.4%||18.6%||2.5%||3.6%||0.6%||0.2%||0.6%||12.6%||0.7%|
|No concordance: Dropped products||11028||52010||102890||16547||24907||4114||1954||4920||101289||5357|
|No concordance: Dropped products (% Beginning year exports)||2.4%||10.1%||17.6%||2.7%||3.6%||0.6%||0.3%||0.6%||13.9%||0.8%|
|With concordance: Dropped products||360||53||963||713||522||220||477||208||683||420|
|With concordance: Dropped products (% Beginning year exports)||0.1%||0.0%||0.2%||0.1%||0.1%||0.0%||0.1%||0.0%||0.1%||0.1%|
|With concordance: Added products||276||15||900||26||2172||2573||6||1937||44||41|
|With concordance: Added products (% Beginning year exports)||0.1%||0.0%||0.2%||0.0%||0.3%||0.4%||0.0%||0.2%||0.0%||0.0%|
|No concordance: Net Intensive Margin Growth||46652||58963||34142||65583||-7225||12122||88068||-49065||-28874||31256|
|No concordance: Net Intensive Margin Growth (% Beginning Year Exports)||10.04%||11.51%||5.86%||10.53%||-1.05%||1.78%||12.71%||-6.29%||-3.95%||4.51%|
|With concordance: Net Intensive Margin Growth||47641||70653||39860||65457||-8773||9993||88068||-51122||-37130||30865|
|With concordance: Net Intensive Margin Growth (% Beginning Year Exports)||10.25%||13.79%||6.84%||10.51%||-1.28%||1.47%||12.71%||-6.55%||-5.08%||4.45%|
Notes: Table displays the value of U.S. exports associated with added and dropped products over two-year time periods where products are defined both without and with the HS-over-time concordance. Rows for "Added Products" and "Dropped Products" are measured in Millions of U.S. Dollars. Additional rows report the value associated with added and dropped products as a share of the total value of exports in the beginning year of each two-year period.
As can be seen in the table, the value of exports associated with product adding and dropping is greatly overstated in the "no concordance" case with unadjusted HS codes. The reason for this overstatement is intuitive: some of the products that appeared and disappeared during each two-year period did so because of changes in HS codes, rather than because the U.S. started or stopped exporting those products. This phenomenon is particularly pronounced in time periods with many HS code changes such as 1995-1996 and 2001-2002. In 1995-1996, for example, export data with unadjusted HS codes indicate that product adding (dropping) equaled 19 percent (18 percent) of the value of 1995 exports. After using the concordance, the shares of 1995 exports associated with product adding and dropping were 0.2 percent each.
This example illustrates the importance of properly controlling for changes in HS codes in research examining product-adding and dropping. Indeed, accounting for these changes in HS codes contributed to Bernard, Jensen, Redding, and Schott's (2009) finding that most of the year-to-year changes in U.S. trade values occurred along the intensive margin associated with surviving products, rather than the extensive margin associated with product-adding and dropping.
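The decomposition underlying Table 5 can be illustrated with a minimal Python sketch. The function name, the product labels and the export values are hypothetical, and the sketch assumes each year's exports are held as a mapping from product code to value that contains only products with positive exports.

```python
def margins(begin, end):
    """Decompose export growth between two years into margins.

    begin, end: dicts mapping product code -> export value in each year.
    Returns the value of added products, the value of dropped products
    and the net growth within products exported in both years.
    """
    added = sum(value for code, value in end.items() if code not in begin)
    dropped = sum(value for code, value in begin.items() if code not in end)
    net_intensive = sum(end[code] - begin[code] for code in begin.keys() & end.keys())
    return {"added": added, "dropped": dropped, "net_intensive": net_intensive}


# Hypothetical data: product B is reclassified into B1 and B2.
raw_begin = {"A": 100, "B": 50}
raw_end = {"A": 120, "B1": 30, "B2": 25}

# The same data after replacing B, B1 and B2 with one synthetic code.
concorded_begin = {"A": 100, "fam": 50}
concorded_end = {"A": 120, "fam": 55}

print(margins(raw_begin, raw_end))
print(margins(concorded_begin, concorded_end))
```

In both cases total growth (added minus dropped plus net intensive) is 25, but without the concordance the reclassification of B is misread as product dropping and adding. The same accounting identity links the "no concordance" and "with concordance" rows of Table 5.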
The algorithm described in this article can be used to create a concordance for any product classification system over time so long as the associated statistical agency periodically makes available mappings of obsolete and new codes. Given this information, the process of assigning product codes to families will be identical to that described above, and it should be fairly simple to adapt our Stata code to cover any idiosyncrasies.
For example, the algorithm could be applied to other international trade product classification systems such as the European Union's Combined Nomenclature (CN) codes. Changes to the CN are published annually in the L-series of the Official Journal of the European Communities. Application of our method would permit evaluation of the EU's product-level exports and imports on a consistent basis over time. Moreover, it is possible to apply the algorithm to more aggregated levels of international trade product classification systems, such as the 6-digit HS codes defined by the WCO.
Our algorithm can also be applied to track changes in production-based industry classification systems such as NAICS (North America) or NACE (EU). The U.S. Census Bureau, for example, publishes correspondence tables for the various revisions to NAICS, and these can be used to identify "families" of industry codes over time. The analogous information for NACE is published by Eurostat with each NACE revision.
Controlling for changes in product codes over time is critical in the growing body of research examining firms' product-mix choices. In this article, we present a concordance algorithm that can be used to track changes in product codes and generate time-consistent "synthetic" codes. We use this algorithm to generate the first complete concordance of changes in U.S. HS codes over time. We also describe the prevalence of changes in HS codes over time, underscoring the importance of controlling for these changes in empirical research. Lastly, we provide an electronic appendix containing the final concordance files, as well as Stata code that can be used to customize this and other product code concordances.
This appendix describes the files contained in the electronic appendix available online at:
http://www.som.yale.edu/faculty/pks4/sub_international.htm. All files are contained in a zip folder with filename hs_concordance_20101020.zip.
The files hts.do and schedule_b.do contain our algorithm for creating import and export HS concordances, respectively, for arbitrary beginning and ending year-months between 1989 and 2009. Those comfortable with Stata programming should find these files relatively easy to manipulate. Those unfamiliar with Stata programming can instead use one of the output files described below.
The file trade_merge.do is a Stata program that matches our HS-over-time concordances to publicly available U.S. trade data. Researchers may find this example useful when employing the concordances in their own research. In addition, this Stata program produces some of the output files described below.
Each Stata program requires as an input a data file containing the raw obsolete-new mappings discussed in the main text. These input files are named sch_b_concordances_20100522_02.dta and hts_concordances_20100522_02.dta, respectively, where 20100522 is the user-defined version date. The basic structure of these input files resembles the raw obsolete-new files; i.e., each set of obsolete HS codes is followed by the new set of HS codes into which they map. In this sense, researchers who wish to examine a simple record of changes to HS codes, as reported in the official obsolete-new releases, may find these files useful. The files contain the following variables:
The Stata programs described above produce the output files that can be used to concord HS codes in U.S. import and export data. Specifically, the code produces output files:
where BEG and END reflect beginning and end years (exports: 1989_2009) or year-months (imports: 198906_200907), respectively. These concordances include the same variables as the input files, but with setyr and effyr standardized across family trees, as described in Section 4 above. Variables in the concordance output files include:
The files simple_hts_198906_200907.dta and simple_schedule_b_1989_2009.dta provide the setyear for all HS codes that have experienced changes between 1989 and 2009 for imports and exports, respectively. The files have a simple two-column format where the first column reports the HS code that has experienced a change between 1989 and 2009 and the second column provides the setyear for that HS code. Researchers can merge this file by HS code with product-level trade data and easily assign a setyear to any HS codes that have been changed. HS codes not appearing in these output files are consistent across all years of the data.
In almost every case, this simple concordance is one-to-one, in the sense that each HS code maps to a single setyear. However, six (two) HTS (Schedule B) codes were listed as obsolete in one year and then "reappeared" as new codes in a later year with a different setyear. Each of these HS codes, therefore, has two setyears. The dates given in the setyear indicate the years in which they became active. These duplicate HS codes are: HTS - 2905492000, 5112196010, 5112196020, 5112196040, 5112196050, 7304390040; Schedule B - 481190900, 9027501000.
The files setyr_x_1989_2009.dta and setyr_m_1989_2009.dta provide a record of every HS code associated with every setyear that appears in the 1989-2009 concorded data. The first column of each file lists the setyears, sorted from low to high. Each additional column lists the actual HS codes appearing in a particular year of the trade data that should be replaced by the setyear. These actual HS codes also are sorted from low to high in each year. To concord U.S. trade data from 1989 to 2009, one would just replace all codes listed in the table with the synthetic setyear, and then collapse the data according to these setyears. HS codes not appearing in these output files are consistent across all years of the data.
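The replace-and-collapse step just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reading of the actual .dta files: the setyear-by-year layout is represented as (setyr, year, HS code) triples, the function names are hypothetical, and the setyear "500.1997", the code 0101000000 and all trade values are made up. Only the codes 8506800010, 8506800050 and 8506800000 come from the text.

```python
from collections import defaultdict


def build_lookup(rows):
    """rows: iterable of (setyr, year, hs_code) triples.

    Returns {(year, hs_code): setyr}. Keying on year as well as code allows
    a code that was retired and later reintroduced to carry different
    setyears in different years.
    """
    return {(year, code): setyr for setyr, year, code in rows}


def concord_and_collapse(trade, lookup):
    """trade: iterable of (year, hs_code, value) records.

    Replaces listed codes with their setyr and sums values by
    (year, product). Codes absent from the lookup are kept unchanged.
    """
    totals = defaultdict(float)
    for year, code, value in trade:
        product = lookup.get((year, code), code)
        totals[(year, product)] += value
    return dict(totals)


lookup = build_lookup([
    ("500.1997", 1996, "8506800010"),
    ("500.1997", 1996, "8506800050"),
    ("500.1997", 1997, "8506800000"),
])
trade = [
    (1996, "8506800010", 10.0),
    (1996, "8506800050", 5.0),
    (1997, "8506800000", 18.0),
    (1996, "0101000000", 2.0),  # hypothetical code with no revisions
    (1997, "0101000000", 3.0),
]
print(concord_and_collapse(trade, lookup))
```

After the collapse, each synthetic product has one observation per year, so added and dropped products can be measured on a consistent basis.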