Native language and acquired language as determinants of product-level trade
Abstract
The present paper estimates response parameters of bilateral export flows at the disaggregated Harmonised System 6-digit level to two fractional variables reflecting the language commonality between exporting and importing countries: common native language and common acquired language. In particular, the response to common acquired language is shown to vary systematically with the price (and trade) elasticity of demand across products and with the degree of product differentiation and competition across products. These findings support the notion that foreign-language acquisition plays a relatively important role in stimulating cross-border trade and international integration, which are known to induce (average) consumer gains.
1 INTRODUCTION
Common language features among the most prominent drivers of bilateral trade (see Melitz & Toubal, 2014) and measures thereof are now a customary ingredient in the trade-cost function in the literature of so-called gravity models (see Egger & Lassmann, 2012, 2015). While common language is often portrayed by way of a binary variable indicating whether two countries share a common official language or not, there are two deeper ways in which language commonality should actually matter—through common native language as a measure of common cultural heritage and common spoken language as a measure of communication proficiency between the citizens of the same or of different countries. The results in Melitz and Toubal (2014) suggest that, across a wide range of countries, communication proficiency appears to be relatively more important than cultural heritage for stimulating cross-border trade transactions. The overview in Egger and Toubal (2016) provides further support for these conclusions.
The present paper scrutinises the latter result by providing evidence on the responsiveness of trade at the disaggregated product level to common native language versus common acquired language. It does so by conducting a two-step analysis. In a first step, the response parameters of bilateral exports at the extensive and, alternatively, the intensive country margin are estimated for each product (i.e., the responsiveness of the probability of positive exports versus the volume of positive exports). In a second step, these response parameters are used as dependent variables in an exploratory analysis.
The main results of this paper may be summarised as follows. First, the partial effect of an increase in the prevalence of speaking an acquired or native common language on bilateral exports varies to a large extent across product categories. In particular, language coefficients of either type are positive and statistically significant at the extensive margin of bilateral product-level exports but not at the intensive margin (the value of positive bilateral product-level exports). Overall, language commonality appears to have little bearing on the intensive margin of exports when estimating gravity-type regressions at the product level. Hence, findings which point to positive effects at the aggregate (country-pair) level may be an artefact of aggregation, and common spoken language appears to be mainly important as a fixed-cost-reducing factor for the propensity to enter a market. Second, on average, common acquired language coefficients are larger than common native language coefficients at the extensive export margin across products. This is the case to an even stronger extent for differentiated products relative to other (in particular, homogeneous) ones. Third, on average, the importance of common spoken (acquired or native) language rises gradually with the degree of product differentiation, as measured by the trade elasticity for a product.
The remainder of the paper is organised as follows. The subsequent section provides a brief overview of earlier work related to the present paper. Section 3 describes the data and presents some descriptive facts. Section 4 outlines the empirical approach. Section 5 summarises the results. The last section concludes with a brief summary.
2 AN OVERVIEW OF THE RELATED LITERATURE
There is widespread agreement that common languages and cultural proximity play an important role in overcoming informal barriers to international trade (see Anderson & van Wincoop, 2004). As shown by Hummels (1999) or Eaton and Kortum (2002), the absence of a common language is a transaction cost in international exchange. Both papers estimate the impact of language on bilateral trade using a gravity framework and introduce a dummy variable that equals unity if two countries share a common official language and zero otherwise. Results from both studies imply a tax equivalent on bilateral trade of about 7% when the elasticity of substitution between products is assumed to be σ = 8 at the aggregate level. The positive effect of common official language is confirmed in many studies, as shown by Egger and Lassmann (2012). In their meta-analysis based on 701 language coefficients on bilateral trade, they find that sharing a common language induces an average positive partial impact (before any general-equilibrium adjustments) on bilateral trade flows which varies between 33% and 44% depending on the control variables in the regression models.
Two comments are in order with regard to these results. First, the binary official language indicator does not reflect the significance of all linguistic factors, because it focuses solely on indirect communication through (official) translation. It therefore fails to capture a large part of other linguistic aspects, such as the ease of communication by speaking a language (possibly one other than the official ones) or the ethnic ties and trust which go along with common native languages. Melitz (2008) provides a first analysis of the channels through which a common language promotes bilateral trade. In particular, he provides evidence that direct communication is far more important in explaining bilateral trade than translation, and his results emphasise the role of common spoken language as a means to facilitate trade through direct communication. In subsequent work, Melitz and Toubal (2014) provide an important step towards the understanding of the impact of common language on bilateral trade based on data on 42 common native and spoken languages in a large sample of 195 countries. They find that the joint impact of different aspects of common language is at least twice as large as that of common official language. Their findings, moreover, suggest that common spoken (acquired plus native) language is particularly important in comparison with native language alone, and that the ease of communication plays a substantial role in explaining the role of common language for bilateral trade.
Second, the tax equivalent of any form of language commonality on bilateral trade depends on the elasticity of substitution between all goods and, hence, on the degree of competition on product markets. Economists have found that product differentiation is associated with a greater need for information by customers (see Rauch, 1999), and we would suspect that the greater degree of product complexity associated with differentiation raises the importance of language commonality as a means of information provision. However, much less is known about the latter in the context of acquired versus native language commonality, a gap the current paper aims to fill.
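To illustrate how the tax equivalent of a given language effect depends on the elasticity of substitution, the following minimal sketch converts a proportional trade effect into an ad-valorem tax equivalent. It assumes the standard CES gravity mapping, in which trade responds to an ad-valorem cost factor (1 + t) with elasticity σ − 1; this mapping, the function name `tax_equivalent` and the grid of σ values are assumptions made for illustration and are not taken from the studies cited above.

```python
import math


def tax_equivalent(effect: float, sigma: float) -> float:
    """Ad-valorem tax equivalent t implied by a proportional trade effect.

    Assumption: (1 + effect) = (1 + t) ** (sigma - 1), i.e. a CES gravity
    model in which the trade elasticity with respect to (1 + t) is sigma - 1.
    """
    if sigma <= 1.0:
        raise ValueError("sigma must exceed 1 for the mapping to be defined")
    return math.exp(math.log1p(effect) / (sigma - 1.0)) - 1.0


# Range of average partial effects reported in the meta-analysis (33% to 44%),
# evaluated at a few illustrative (assumed) elasticities of substitution.
for sigma in (4.0, 8.0, 12.0):
    low = tax_equivalent(0.33, sigma)
    high = tax_equivalent(0.44, sigma)
    print(f"sigma = {sigma:4.1f}: tax equivalent between {low:.3f} and {high:.3f}")
```

Under this mapping, the same trade effect corresponds to a smaller tax equivalent the higher σ is, which is why product-level heterogeneity in σ matters for interpreting language coefficients.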
3 DATA AND FACTS
The analysis of the direct (partial) effects of common spoken language on bilateral export flows of products requires detailed information on native and spoken languages as well as detailed product-level export data for a large sample of country pairs. In order to conduct the empirical analysis, we combine three data sets which have information on export values of 6-digit product-level categories by sources and destinations, bilateral information on common spoken native and acquired languages, and information on (gravity-equation-type) control variables.
3.1 Data description
3.1.1 Trade data
We use data on bilateral export flows at the disaggregated Harmonised System 6-digit level (HS6) in 2005. The source for these data is the BACI database of CEPII, which corrects for various inconsistencies in the original data (see Gaulier & Zignago, 2010). The database covers 224 countries and territories and 5,107 HS6 product categories in 2005. Of all the countries and territories, 29 (mostly tiny islands) are excluded from the analysis due to missing data on the share of native and acquired spoken languages as well as on the covariates.
3.1.2 Data on common languages
Regarding common languages, we rely on the Language Dataset provided by Melitz and Toubal (2014). In constructing the data set, Melitz and Toubal (2014) required all languages to be spoken by at least 4% of the population in two different countries. The native and spoken languages are collected from the same source (i.e., the same year) wherever possible. When this could not be done, preference is given to closer dates. Inevitably, the data are collected over a range of years between 2001 and 2008. Based on this source, we obtain a cross-sectional data set, which covers 42 common native and spoken languages in a large sample of 195 countries. In this study, we focus particularly on the role of common spoken language for bilateral trade at the product level. As a spoken language is either acquired by non-native speakers or is innate in native speakers, we propose discerning the influence of common spoken native language from that of common spoken acquired language. We use the data on native and spoken language to construct two measures, Nij and Aij, which denote native and acquired language overlap between countries i and j around the year of interest.
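The exact formulas for Nij and Aij are not reproduced in this section. Purely as an illustration of how fractional overlap measures of this kind can be built from speaker shares, the sketch below uses a simple product-of-shares index: native overlap sums the products of native-speaker shares over languages, and acquired overlap is the part of spoken overlap that is not native. The functional form, the function name `language_overlap` and the share values are assumptions, not the construction used by Melitz and Toubal (2014) or in this paper.

```python
import numpy as np


def language_overlap(native, acquired, i, j):
    """Return (N_ij, A_ij) under an ASSUMED product-of-shares index.

    native, acquired: arrays of shape (countries, languages) holding the
    population shares of native and of non-native (acquired) speakers.
    The unadjusted spoken index may exceed one when many people are
    multilingual; published measures typically correct for this.
    """
    spoken = native + acquired                 # share able to speak each language
    n_ij = float(native[i] @ native[j])        # overlap through native speakers only
    s_ij = float(spoken[i] @ spoken[j])        # overlap through all speakers
    return n_ij, s_ij - n_ij                   # acquired overlap as the remainder


# Hypothetical shares for two countries (rows) and three languages (columns).
native = np.array([[0.80, 0.10, 0.00],
                   [0.05, 0.90, 0.00]])
acquired = np.array([[0.10, 0.30, 0.05],
                     [0.40, 0.05, 0.00]])

n_01, a_01 = language_overlap(native, acquired, 0, 1)
print(f"N_01 = {n_01:.3f}, A_01 = {a_01:.3f}")
```

In this toy example the two countries share few native speakers, so most of their overlap runs through acquired language.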





3.1.3 Gravity control variables
We use a set of gravity controls in order to discern the impact of language commonality from other determinants of bilateral trade. Data on geographical and historical trade barriers are taken from the database of the Centre d'Études Prospectives et d'Informations Internationales (CEPII). The corresponding variables are log(Distanceij) (log of the geographical distance between the main economic centres of countries i and j; continuous), Contiguityij (land adjacency between countries i and j; binary), Colony(1945)ij (whether one of the countries i and j was the coloniser of the other after 1945; binary), Common coloniserij (whether countries i and j had the same coloniser at some point in history; binary), Islandij (whether the two countries are islands; binary), Continentij (whether both countries belong to the same continent; binary) and Landlockedij (whether both countries are landlocked; binary).
Moreover, we condition on sharing the same legal system by adding a binary indicator variable for common legal system, Legalij. We also add a fractional variable for common religion, Religionij, that reflects the probability that two randomly drawn people from two countries will share the same religion. Finally, we use the history of wars between countries since 1823 as a control variable, Warij. These variables are taken from and described in detail in Melitz and Toubal (2014).
Including these variables is important because they capture aspects of common cultural heritage, history and trust, with which common spoken language is correlated. Hence, in order to tease out the direct impact of language commonality on exports, it is important to hold these features constant (see Egger & Lassmann, 2015).
3.2 Facts
Table 1 reports the average domestic (within-country) and foreign (between-country) spoken language homogeneity with regard to native language and acquired language.
Variable | Obs. | Mean | SD |
---|---|---|---|
Native language, domestic | 195 | 0.435 | 0.390 |
Acquired language, domestic | 195 | 0.174 | 0.232 |
Native language, foreign | 37,830 | 0.031 | 0.138 |
Acquired language, foreign | 37,830 | 0.037 | 0.097 |
According to the table, not surprisingly, the average language homogeneity is much higher within countries than between them. This is particularly the case for native language but, to a lesser extent, also for acquired language. Figures A1 and A2 display domestic native and acquired language homogeneity, respectively, while Figures A3 and A4 display the corresponding foreign measures, respectively, by way of maps. In these figures, darker colours refer to a higher value of the respective common language overlap concept. The colours are allotted such that each one represents a specific number of countries within a certain bracket of language commonality.
According to Figure A1, domestic native language homogeneity is highest in South America, in Europe and in Russia. There is a much higher degree of native language plurality in other countries. According to Figure A2, domestic acquired language homogeneity is highest in northern and central Europe as well as in Western Africa. The Moran statistics in Figures A1 and A2 suggest a U-shaped pattern of spatial clustering across the levels of domestic country language proximity: countries with a very homogeneous domestic language portfolio tend to be close to each other (not necessarily speaking the same language, though) and ones which have a very dispersed domestic language portfolio also tend to be fairly strongly spatially clustered. This pattern reflects the consequences of colonialism.
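For readers unfamiliar with Moran statistics, the following minimal sketch computes the global Moran's I for a toy example. It is not a replication of the statistics shown in the figures (which are reported by level of language proximity); the neighbourhood matrix and the values are hypothetical, and a binary contiguity weighting is assumed.

```python
import numpy as np


def morans_i(x, w):
    """Global Moran's I: (n / sum(w)) * (z' W z) / (z' z), with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)


# Hypothetical example: four countries on a line, each bordering its neighbours.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

clustered = [0.9, 0.8, 0.2, 0.1]     # similar values sit next to each other
alternating = [0.9, 0.1, 0.9, 0.1]   # neighbours are systematically dissimilar

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
print(f"clustered: {i_clustered:.2f}, alternating: {i_alternating:.2f}")
```

Positive values indicate that neighbouring countries have similar language homogeneity, whereas negative values indicate that neighbours tend to differ.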
Similar to the domestic measures in Figures A1 and A2, we display the spatial pattern of foreign native and acquired language commonality in Figures A3 and A4, respectively. An inspection suggests that the latter patterns are largely different from the domestic ones. There are three fundamental reasons for this difference: first, some peoples are more likely to migrate; second, some countries are more open to immigration; and, third, some languages (such as English) are more likely to be taught and learned than others.
Figures A3 and A4 suggest that, across both concepts, English-, Scandinavian-, Spanish- (especially European) and German-speaking countries are most connected to the average (population-weighted) foreign economy by way of the considered common spoken language concepts. According to the figures, the spatial clustering of the foreign country language proximity concepts is much lower than that of domestic country language proximity. There is some evidence of positive spatial clustering only at higher levels of proximity (see the corresponding Moran statistics).
Table 2 reports correlation coefficients among Nij and Aij, which we will focus on in the subsequent analysis. As expected, there is a negative unconditional correlation between Nij and Aij by design, since the total number of potential speakers in a country is normalised to unity and fixed.
Variable | Nij | Aij |
---|---|---|
Nij | 1.000 | |
Aij | −0.050 | 1.000 |
4 EMPIRICAL APPROACH
Our empirical approach is based on a two-step estimation procedure. In the first step, we estimate the impact of common native and acquired language on the extensive and intensive margins of bilateral exports for each product category using a gravity framework. We obtain one estimate each for common native and acquired language for each trade margin and product. In the second step, we describe the variation in export responses to native and acquired language commonality.
4.1 Estimating behavioural responses of product-level bilateral exports to native and acquired language commonality






This approach yields 4,582 pairs of parameter estimates (one for native and one for acquired language commonality) for each margin m.
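The estimating equations are not reproduced in this section. As a rough sketch of what a first-step regression for a single product could look like, the code below fits a linear probability model for the extensive margin and a log-linear model on positive flows for the intensive margin, each with exporter and importer dummies. The choice of estimators, the homoskedastic covariance matrix, the single distance control and all simulated numbers are assumptions made for illustration only.

```python
import numpy as np


def design_matrix(covariates, exp_id, imp_id, n_countries):
    """Covariates plus exporter dummies and importer dummies (first importer dropped)."""
    d_exp = np.eye(n_countries)[exp_id]
    d_imp = np.eye(n_countries)[imp_id][:, 1:]
    return np.column_stack([covariates, d_exp, d_imp])


def ols(y, X):
    """Least-squares coefficients with a homoskedastic covariance matrix."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    cov = (resid @ resid / dof) * np.linalg.pinv(X.T @ X)
    return beta, cov


# Simulated country pairs for ONE hypothetical product (all numbers are made up).
rng = np.random.default_rng(0)
n = 20
exp_id, imp_id = np.array([(i, j) for i in range(n) for j in range(n) if i != j]).T
m = exp_id.size
N = rng.uniform(0.0, 0.5, m)                  # native language overlap
A = rng.uniform(0.0, 0.5, m)                  # acquired language overlap
log_dist = rng.normal(8.0, 1.0, m)            # stand-in for the gravity controls
latent = 0.3 * N + 0.6 * A - 0.2 * (log_dist - 8.0) + rng.normal(0.0, 0.3, m)
positive = latent > 0.1                       # indicator of positive exports
log_exports = 5.0 + 0.1 * N + 0.2 * A - (log_dist - 8.0) + rng.normal(0.0, 1.0, m)

X = design_matrix(np.column_stack([N, A, log_dist]), exp_id, imp_id, n)

# Extensive margin: probability of positive exports (linear probability model).
b_ext, V_ext = ols(positive.astype(float), X)
# Intensive margin: log exports, using pairs with positive exports only.
b_int, V_int = ols(log_exports[positive], X[positive])

# Keep the (native, acquired) coefficient pair and its 2x2 covariance block.
pair_ext, cov_ext = b_ext[:2], V_ext[:2, :2]
pair_int, cov_int = b_int[:2], V_int[:2, :2]
print("extensive margin (native, acquired):", pair_ext.round(3))
print("intensive margin (native, acquired):", pair_int.round(3))
```

Repeating such a regression for every product and each margin would produce one coefficient pair and one covariance block per product, which is the kind of input the second step requires.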
4.2 Describing the variation in export responses to native and acquired language commonality
In a second step, we describe the variation in the estimated language coefficients across products p in two ways: once by considering average differences across product classes in terms of the Rauch (1999) classification of products (distinguishing among differentiated, listed and homogeneous products); and once by relating them to the price (and trade) elasticities estimated by Kee, Nicita and Olarreaga (2008, 2009), which serve as a continuous measure of product diversity and taste for variety.
For either approach, we need to take into account that the language coefficients are estimated with imprecision, which requires sampling techniques in order to obtain appropriate inference. When relating the language coefficients to the trade elasticity, we need to account for the fact that the latter is estimated with imprecision as well. We address these issues by bootstrap sampling, where the language coefficients are drawn from the respective margin- and product-specific (mp-specific) mean and bivariate-normal variance–covariance matrix of the parameters (as these parameters are asymptotically normal), and the price elasticities of demand from Kee et al. (2008, 2009) are independently sampled from a univariate normal with point-estimate mean and variance as reported by these authors.
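The sampling scheme just described can be sketched as follows. In each replication, the coefficient pair of every product is drawn from a bivariate normal, the elasticity is drawn from an independent univariate normal, and the drawn coefficients are regressed on the drawn elasticity. The function name `bootstrap_slopes`, the simple linear second-step regression and all simulated inputs are assumptions for illustration; only the number of replications (100) is taken from the text.

```python
import numpy as np


def bootstrap_slopes(b_hat, V_hat, eps_hat, eps_se, n_boot=100, seed=0):
    """Parametric bootstrap of the slope of language coefficients on elasticities.

    b_hat:   (P, 2) first-step coefficient pairs (native, acquired)
    V_hat:   (P, 2, 2) their variance-covariance matrices
    eps_hat: (P,) elasticity point estimates; eps_se: (P,) their standard errors
    Returns the mean slope and its bootstrap standard error for each coefficient.
    """
    rng = np.random.default_rng(seed)
    n_products = b_hat.shape[0]
    chol = np.linalg.cholesky(V_hat)          # one Cholesky factor per product
    slopes = np.empty((n_boot, 2))
    for r in range(n_boot):
        shocks = rng.standard_normal((n_products, 2))
        b_draw = b_hat + np.einsum("pij,pj->pi", chol, shocks)
        eps_draw = eps_hat + eps_se * rng.standard_normal(n_products)
        X = np.column_stack([np.ones(n_products), eps_draw])
        coef, *_ = np.linalg.lstsq(X, b_draw, rcond=None)
        slopes[r] = coef[1]                   # slope on the elasticity, both outcomes
    return slopes.mean(axis=0), slopes.std(axis=0, ddof=1)


# Simulated inputs for 300 hypothetical products (all numbers are made up).
rng = np.random.default_rng(1)
P = 300
eps_hat = -rng.uniform(0.5, 5.0, P)           # negative price elasticities
eps_se = np.full(P, 0.1)
b_hat = np.column_stack([0.02 + 0.005 * eps_hat,
                         0.05 + 0.010 * eps_hat]) + rng.normal(0.0, 0.01, (P, 2))
V_hat = np.tile(np.array([[1e-4, 2e-5], [2e-5, 1e-4]]), (P, 1, 1))

slope, se = bootstrap_slopes(b_hat, V_hat, eps_hat, eps_se)
print("slope (native, acquired):", slope.round(4))
print("bootstrap s.e.:          ", se.round(4))
```

Replacing the elasticity by a binary differentiated-goods indicator (which needs no resampling) would give the analogous comparison across Rauch product classes.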
5 RESULTS
5.1 Coefficients on common native versus acquired language in bilateral product-level exports regressions
In Figures 1 and 2, we present the set of estimated coefficients for each trade margin along the differentiated and homogeneous goods dimensions proposed by Rauch (1999). The figures display only the estimated coefficients that are significant at the 1% level.


In Figure 1, we show the effects of acquired and native languages on the extensive margin of trade by product. The estimated coefficients are positive for the vast majority of product categories. The figure suggests that acquired language is more essential for conveying information and affects the fixed costs of exporting more strongly than native language does. This can be seen from the fact that a much bigger mass of coefficients takes on large values in the right panel (acquired language commonality) relative to the left one (native language commonality).
We do not find many significant parameters of common native language on the probability of exporting homogeneous products. While common native languages do not influence the extensive margin of exports of homogeneous goods, we find positive and significant coefficients of the common acquired language variable. This accords with the idea that the role of common spoken language for homogeneous exports is mostly driven by common acquired language, while owing little to personal affinities and trust (which should be related to cultural heritage as captured by common native language). In the case of differentiated products, both common acquired and native languages have a statistically significant impact. This result confirms that linguistic influences become more important as we turn from trade in simple products to trade in more complex ones.
The results concerning the impact of common native and acquired languages on the value of exports as reported in Figure 2 are less clear-cut. We find statistically significant coefficients of the native and the acquired language variables on very few product categories only. Therefore, the mass of coefficients displayed in Figure 2 is much smaller than the one in Figure 1. In the case of homogeneous goods, native and acquired language are mostly statistically insignificant. In the case of differentiated product categories, we find that acquired language commonality affects export volumes of many more product categories in a positive way than native language commonality does.
Overall, these results suggest larger effects of acquired language commonality on bilateral exports than of native language commonality, in particular for differentiated products. Common native and acquired languages affect both margins of trade. However, a larger number of differentiated products are affected by acquired language through the extensive margin of trade than through the intensive one.
5.2 Explaining the variation in common native and acquired language regression coefficients across products
We may condense the information about the language regression coefficients displayed in Figures 1 and 2 by way of simple (heteroscedastic) regressions. In the subsequent analysis, we use 100 bootstrap samples of the data from which we estimate the coefficients as before and then regress them in each sample on explanatory variables. We conduct three types of such regressions, each for the extensive and the intensive margin coefficients. Two of them involve a binary explanatory variable which indicates whether a product is differentiated in the conservative Rauch-classification sense. In the first, the control group includes both homogeneous and listed products. In a way, the corresponding results indicate the average difference in the language coefficients between the differentiated and homogeneous product samples underlying Figures 1 and 2. In the second, we restrict the control group to homogeneous products and drop the listed ones. In the third regression, we use the estimate of the trade elasticity of products, as made available by Kee et al. (2008, 2009), as a continuous measure of product differentiation. With regard to the latter elasticities, it is important to note that they are negative and higher (smaller in absolute value) for more differentiated products. We conduct this analysis once for all estimated coefficients (see Table 3) and once for the restricted sample where we exclude statistically insignificant coefficients (see Table A1). Each of the tables is organised in two vertical blocks, one for the extensive and one for the intensive export margin at the product level, with columns pertaining to the acquired language coefficient and to the native language coefficient. Within each vertical block and column, we report two parameters from independent regressions: the one on the binary differentiated-goods indicator and the one on the continuous trade elasticity from Kee et al. (2008, 2009). Concerning the extensive margin results, the pattern of coefficients in Tables 3 and 4 is similar, so that we can focus on the findings contained in Table 3. The results are more nuanced when we analyse the intensive margins.
Regressor | Acquired language coefficient | Native language coefficient |
---|---|---|
Panel A. Extensive margin | | |
Trade elasticity | 0.001*** (0.000) | 0.001*** (0.000) |
Differentiated products (1/0) | 0.027*** (0.000) | 0.011*** (0.000) |
Panel B. Intensive margin | | |
Trade elasticity | −0.005 (0.006) | 0.009 (0.006) |
Differentiated products (1/0) | 0.532*** (0.084) | 0.006 (0.397) |
Obs. | 4,582 | 4,582 |
Notes
- Dependent variable: estimated common native and acquired language coefficients. Standard errors bootstrapped with 100 replications.
- ***, **, *, significant at 1%, 5%, 10%, respectively.
Regressor | Acquired language coefficient | Native language coefficient |
---|---|---|
Panel A. Extensive margin | | |
Trade elasticity | 0.001*** (0.000) | 0.001*** (0.000) |
Differentiated products (1/0) | 0.023*** (0.003) | 0.010*** (0.002) |
Panel B. Intensive margin | | |
Trade elasticity | −0.010 (0.017) | 0.006 (0.005) |
Differentiated products (1/0) | 0.426* (0.229) | 0.528*** (0.094) |
Obs. | 3,154 | 3,154 |
- Dependent variable: estimated common native and acquired language coefficients. Standard errors bootstrapped with 100 replications.
- ***, **, *, significant at 1%, 5%, 10%, respectively.
The results in the first column of Table 3 suggest that differentiated goods have a higher coefficient on common acquired language at the extensive export margin than others. The same is true for native language, but to a lesser quantitative extent. On average, a higher trade elasticity for a product (that is, a larger degree of product differentiation) is associated with higher language coefficients at the extensive margin. This result is consistent with the insights gained from the binary variables. However, the responsiveness to acquired language commonality appears to react more sensitively to an increase in the degree of product differentiation than the responsiveness to native language commonality does.
The table suggests that product differentiation plays little to no role at the intensive margin of product-level exports. The results concerning the differentiated goods categories in Table 4 are more nuanced when we drop the listed goods category and draw a sharper distinction between differentiated and homogeneous goods. At the intensive export margin, the responsiveness to native language commonality reacts more sensitively to an increase in the degree of product differentiation than the responsiveness to acquired language commonality does.
6 CONCLUSIONS
This paper provides an in-depth analysis of the responsiveness of product-level exports to acquired versus native language commonality. It gains insights from a large number of product-level regressions on the language variables apart from controls and relates the estimates as measures of partial effects of either type of language commonality on the propensity to export (an extensive margin) versus the log level of positive exports (an intensive margin).
The results suggest that language commonality—in particular, in terms of acquired and somewhat less so for native language—is very important at the extensive but not the intensive export margin. In fact, the difference in either partial effect of language commonality is positive and significantly different from zero between differentiated and other products, and the difference is bigger for acquired than for native language. We also find that the partial impact of greater product differentiation measured by the continuous trade elasticity is positive on either type of responsiveness to language commonality, but particularly so for common native language.
APPENDIX A
LIST OF FIGURES AND TABLE










Regressor | Acquired language coefficient | Native language coefficient |
---|---|---|
Panel A. Extensive margin | | |
Trade elasticity | 0.001*** (0.000) | 0.001*** (0.000) |
Differentiated products (1/0) | 0.022*** (0.001) | 0.011*** (0.001) |
Obs. | 4,133 | 840 |
Panel B. Intensive margin | | |
Trade elasticity | −0.031 (0.020) | 0.450 (3.198) |
Differentiated products (1/0) | 1.703*** (0.392) | −9.025 (53.004) |
Obs. | 802 | 134 |
- Dependent variable: estimated common native and acquired language coefficients. Standard errors bootstrapped with 100 replications.
- ***, **, *, significant at 1%, 5%, 10%, respectively.