Volume 18, Issue 4 pp. 430-453
RESEARCH ARTICLE
Open Access

Gender diversity and patent quality: Evidence from Chinese patent data

Zhijie Zhang

School of Public Economics and Administration, Shanghai University of Finance and Economics, Shanghai, China

Contribution: Software, Writing - original draft

Qingqing Zong

Corresponding Author

School of Public Economics and Administration, Shanghai University of Finance and Economics, Shanghai, China

Correspondence Qingqing Zong, School of Public Economics and Administration, Room 518, Shanghai University of Finance and Economics, Shanghai 200433, China.

Email: [email protected]

First published: 07 June 2023

Abstract

Using patent data from the China National Intellectual Property Administration, we examine the effect of gender diversity in inventor teams on patent quality. We argue that gender diversity in inventor teams promotes patent quality, especially for invention patents, which are of high quality and radically innovative. Moreover, we find that the positive effect is stronger in places where women are well educated and where gender discrimination is lower. We further propose that the effect is more pronounced when market competition or market uncertainty is high and when enterprises are private and growing. Ultimately, our study suggests that governments and enterprises should pay more attention to the female labor force, especially in science and innovation, which would help improve innovation in China and reduce gender inequality in the labor market.

1 INTRODUCTION

Gender inequality in the labor market is a severe socioeconomic problem in China (Chen et al., 2013; Kuhn & Shen, 2013; Zhang et al., 2021). Women are a minority in the labor market, especially in science and innovation. Gender inequality in science and innovation wastes and misallocates female human capital, which harms innovation. Against this background, this paper uses patent data to explore the impact of gender diversity within inventor teams on patent quality, and how this impact varies with regional factors, industry characteristics, and enterprise characteristics. The findings can help improve innovation in China and reduce gender inequality.

There are many reasons why women are underrepresented in science and innovation. An important one is the long-standing sex preference in East Asian countries. Innovation is knowledge-intensive and requires well-educated participants, while sex preference leads families to neglect human capital investment in women (Becker & Lewis, 1973). Another reason is stereotypes about women. A common stereotype holds that women are less rational and weaker in logical thinking than men, and are therefore less likely to meet the requirements of being a good scientist (Reuben et al., 2014). In addition, women traditionally take on more responsibilities at home (Giménez-Nadal et al., 2019) and need to make more effort than men to balance family and career (Sullivan, 2019). For these reasons, women are more likely to face the “glass ceiling” and may even change their career plans, which distorts the allocation of female human capital. Gender discrimination also means that female scientists receive insufficient attention and their talents are wasted, which is not conducive to gender synergies in innovation.

To increase the proportion of female scientists in the science and innovation area, the Chinese government has issued a series of policies to eliminate gender inequality in the labor market and support female scientists to give full play to their talents. In view of the important role of women in promoting innovation, recent studies have begun to explore the relationship between gender diversity in the workforce and innovation. But most studies focused on the effects of gender diversity among executives or employees on innovation (Attah-Boakye et al., 2020; Griffin et al., 2021), neglecting the role of gender diversity in R&D teams. Using the unique patent data in China from 2003 to 2013 and the Chinese industrial enterprises' data, we try to explore the impact of gender diversity within inventor teams on patent quality.

Gender diversity within inventor teams helps improve patent quality in two ways. On the one hand, gender diversity can lead to better team creativity, which arises from a diverse and comprehensive knowledge base and good team dynamics. Díaz-García et al. (2013) found that gender diversity within R&D teams leads to radical innovation rather than incremental innovation. Because values and behavior patterns differ across genders, gender diversity within inventor teams can improve the team's ability to acquire information and bring new ideas to the group. Prat (2002) argued that teams whose members have different information sets can obtain more information, especially in highly uncertain activities like innovation. Teams with diverse backgrounds and perspectives are also more likely to generate new ideas for solving problems. The “value-in-diversity” perspective likewise suggests that gender diversity in R&D teams can increase innovation performance by offering a variety of ideas (Østergaard et al., 2011; Sastre, 2015). In addition, gender diversity within inventor teams contributes to better social interactions and an open work climate, which support good team dynamics. Men tend to be assertive and opinionated, which can provoke conflicts and make team members afraid to share their ideas, whereas women tend to be more democratic and better at communication, which can effectively improve the work climate. Fenwick and Neal (2001) argued that the excellent performance of mixed groups is attributable to the combination of women's and men's work styles. In fact, it is not only diversity itself but also the interactions between members that make the group more productive and creative.

On the other hand, according to psychological research, both men and women undergo psychological and physiological changes in the presence of the opposite sex, which helps improve labor productivity and job satisfaction. Ronay and Hippel (2010) found that male skateboarders take more risks and increase their chances of success when women are present. Fields and Blum (1997) and Nielsen and Madsen (2017) both found that employees in mixed-gender teams are more satisfied with their jobs and build more social connections with team members, which reduces the probability of leaving.

Based on the above theoretical analysis, we empirically examine the effect of gender diversity within inventor teams on patent quality using Chinese patent data from 2003 to 2013. We find a significantly positive relationship between gender diversity within inventor teams and patent quality. The results remain consistent across a series of robustness checks. Further, we conduct heterogeneity analyses along four dimensions: regional factors, industry characteristics, enterprise characteristics, and patent type. We find that the positive effect is stronger in places where women are well educated and where gender discrimination is lower. We propose that this effect is more pronounced when market competition or market uncertainty is high and when companies are private and growing. Our study can help reduce gender inequality in the labor market, especially among highly skilled groups, and improve innovation in China.

The rest of this paper is organized as follows: Section 2 is the literature review. Section 3 is the descriptive statistics of the data and the model setting. Section 4 gives the empirical results and their interpretations, and Section 5 is the conclusions and policy implications.

2 LITERATURE REVIEW

Research about gender inequality in the labor market, patent quality, and the effect of gender diversity on innovation is closely related to this study.

2.1 Gender inequality in labor market

Many studies have explored the origins of gender inequality in the labor market. Some explain the low female labor participation rate in terms of physical strength. Alesina et al. (2013) found that women from cultures where the plow is used in agriculture participate less in the workforce, because plowing requires considerable physical strength. Carranza (2014) found that exogenous variation in soil texture can affect the female labor participation rate in agriculture, because it is physically difficult for women to plow clayey soils.

Other studies explain gender inequality in terms of gender norms. Bertrand et al. (2015) found that the wife's share of household income is concentrated to the left of 1/2 (that is, below one half), reflecting the gender norm that wives should earn less than their husbands. In China, women also participate less in the workforce and spend more time on housework because of gender norms (Zhao et al., 2022). Moreover, women traditionally take on more work at home, including care for the elderly and for children, which further reduces the female labor participation rate (Giménez-Nadal et al., 2019; Sullivan, 2019).

Other studies have begun to explain gender inequality from behavioral pattern differences between men and women with the use of big data. For example, Wiswall and Zafar (2018) found that gender differences in preferences for the workplace explain at least a quarter of the gender wage gaps. Women prefer choosing jobs with greater work flexibility and job stability, while men prefer jobs with higher earnings growth. Cook et al. (2021) found that there are roughly 7% gender earning gaps among Uber drivers due to three factors: experiences, preferences, and constraints over where to work and live, and preferences for driving speed. Women prefer to work near home and drive slowly, which makes them get fewer earnings than men.

Gender inequality in the labor market has serious adverse effects on social and economic development. Cooke et al. (2019) used a comprehensive business registration reform in Portugal as a quasi-natural experiment and found that discriminatory employers grow less over time after the reform and are more likely to exit the market. Ashenfelter and Hannan (1986) found an inverse relationship between the proportion of female employees and the market power of enterprises in the banking industry. Thus, eliminating gender inequality and making full use of women's talents benefit social and economic development (Tsou & Yang, 2019). This paper explores the impact of gender diversity within inventor teams on innovation against the background of gender inequality in the labor market.

2.2 Patent quality

Patent quality has become an important indicator of innovation at the macro and micro levels. Many studies use the number of forward citations to measure patent quality. The number of forward citations reflects the importance and progressiveness of a patent in its industry, so a patent with many forward citations is regarded as high quality. Some studies instead use the knowledge width method, which measures patent quality based on international patent classification numbers (Aghion et al., 2005; Akcigit et al., 2016). In addition, some studies use machine learning to extract keywords from patent summaries and compare them with the keywords of other patents to determine whether a patent is innovative (Kelly et al., 2021).

Many studies focus on the factors affecting patent quality, mostly from three aspects: enterprise characteristics, regional factors, and policies. For example, Hsieh et al. (2022) found that firms with STEM directors apply for more invention patents and have higher R&D expenditures, especially firms focusing on innovation, because STEM directors may contribute more technical expertise to corporate strategic decisions on innovation activities. Furman et al. (2021) used the expansion of the USPTO Patent and Trademark Depository Library to study the effect of information disclosure through patents on subsequent innovation. They found that local patenting increases sharply after a patent library opens, and that this effect operates through the disclosure of technical information in patent documents. Dang and Motohashi (2015) explored the impact of patent subsidy programs on innovation and found that such programs significantly increase patent applications and grants, especially of low-quality patents. In addition, many studies have examined the effects on innovation of tax policy (Ivus et al., 2021), regional human capital (Kong et al., 2022), and population aging (Tan et al., 2022). However, few studies have explored the role of the characteristics of R&D personnel, who are the core participants in innovation.

2.3 Effect of gender diversity on innovation

Most closely related to our study is the literature on the effect of gender diversity on innovation. Most studies focus on the impact of gender diversity among executives. Attah-Boakye et al. (2020) empirically examined the effect of boardroom gender diversity on innovation using data from 472 multinational enterprises and found that gender diversity among directors can promote innovation, because female and male directors have different behavior patterns and thus bring different information to group decision-making. Griffin et al. (2021) found a positive relationship between enterprises' innovation capability and boardroom gender diversity by analyzing data on enterprises' patents and boardrooms from 45 countries and regions. In addition, some studies focus on the impact of gender diversity among enterprises' employees on innovation. Horbach and Jacob (2018) found a positive relationship between gender diversity among employees and innovation using data from the German Employment Institute. Wang and Wei (2017) also found that complementary effects among employees of different genders can improve corporate labor productivity. However, few studies have paid attention to gender diversity among R&D personnel, who are the core participants in innovation (González-Moreno et al., 2018; Xie et al., 2020). We explore the effect of gender diversity in inventor teams on patent quality using patent data from the China National Intellectual Property Administration.

Compared to most studies, the marginal contributions of this paper are as follows:

(1) We attempt to study the influencing factors of patent quality from the aspect of gender structure of inventor teams, which contributes some findings to related literature. Most studies that explore the factors affecting patent quality focus on enterprises' characteristics or regional factors, but we pay more attention to the impacts of characteristics of inventors who are the core participants in innovation.

(2) Only a few studies have examined the effect of the gender structure of R&D personnel on innovation, while most have focused on gender diversity among enterprises' directors or employees (Xie et al., 2020). Furthermore, we use patent data from the China National Intellectual Property Administration, the most detailed patent database in China, to systematically study the effect of gender diversity in inventor teams on patent quality, thus enriching research on gender diversity.

(3) As the patent data does not offer information about the gender of inventors, we use machine learning to train a model that can predict gender based on people's names, which provides reference for future research.

(4) We analyze the heterogeneity from four dimensions: regional factors, industries' characteristics, enterprises' characteristics, and patent type, and find a lot of meaningful results.

3 DATA AND MODEL SETTING

3.1 Data source and processing

We use patent data for 2003 to 2013 from the China National Intellectual Property Administration to examine the effect of gender diversity on patent quality. The data include inventors' names and other information, but not the number of forward citations and some other important variables. We obtain the number of forward citations and these other variables from the PatSnap database, which offers detailed information on patents. The city-level data come from the China City Statistical Yearbook, and the enterprise-level data come from the China Industrial Enterprises Database.

Following Kou and Liu (2020), we match the patent data with the enterprise-level data using enterprises' names and patent owners, and then match the result with the city-level data. We remove the 2010 samples because enterprises' names are missing from the China Industrial Enterprises Database for that year. Because design patents differ significantly from utility model patents and invention patents, we keep only the samples of utility model patents and invention patents. We further drop observations with missing information on control variables.
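The sample construction described above can be sketched in a few lines. This is only an illustrative sketch, not the procedure of Kou and Liu (2020): the field names (`owner`, `name`, `year`, `type`, `firm_id`), the name-normalization rule, and the choice to match on name and year are all assumptions made for the example, and the records are made-up toy data.

```python
import re

def normalize_name(name: str) -> str:
    """Drop whitespace and punctuation and lower-case the name (assumed rule)."""
    return re.sub(r"[\W_]+", "", name).lower()

def match_patents(patents, enterprises,
                  excluded_years=(2010,),
                  kept_types=("invention", "utility_model")):
    """Keep invention and utility model patents outside the excluded years
    and attach the enterprise whose normalized name and year match the owner."""
    index = {(normalize_name(e["name"]), e["year"]): e for e in enterprises}
    matched = []
    for p in patents:
        if p["year"] in excluded_years or p["type"] not in kept_types:
            continue  # 2010 records and design patents are dropped
        firm = index.get((normalize_name(p["owner"]), p["year"]))
        if firm is not None:
            matched.append({**p, "firm_id": firm["firm_id"]})
    return matched

# Hypothetical toy records, for illustration only.
enterprises = [
    {"firm_id": "F001", "name": "ACME CO LTD", "year": 2005},
    {"firm_id": "F001", "name": "ACME CO LTD", "year": 2010},
]
patents = [
    {"patent_id": "P1", "owner": "Acme Co., Ltd.", "year": 2005, "type": "invention"},
    {"patent_id": "P2", "owner": "Acme Co., Ltd.", "year": 2010, "type": "invention"},
    {"patent_id": "P3", "owner": "Acme Co., Ltd.", "year": 2005, "type": "design"},
]
matched = match_patents(patents, enterprises)
```

In this toy run only `P1` survives: `P2` falls in the excluded year and `P3` is a design patent.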

3.2 Variable definition

3.2.1 Dependent variable: Patent quality (Patent_quality)

Most studies use the number of forward citations as the proxy for patent quality (Ferrucci, 2020; Zhang et al., 2020). The more forward citations a patent receives, the more important it is in its field, so the number of forward citations can reflect patent quality. Some studies use the knowledge width method to measure patent quality based on international patent classification numbers (Aghion et al., 2005; Akcigit et al., 2016). This paper uses patent quality calculated by the knowledge width method as a robustness check.

3.2.2 Independent variable: Gender diversity in inventor teams (Diversity)

Identifying the gender of each inventor is the key to measuring gender diversity in inventor teams. The patent data do not contain the gender of each inventor, so we use machine learning to predict it from inventors' names, which reflect gender to some extent. The World Intellectual Property Organization produced a “Global Gender-Name Dictionary” that can predict people's gender from their names. Jensen et al. (2018) and Koning et al. (2021) both used the “dictionary method” to identify inventors' gender. However, such studies are based on foreign data, and the method is not appropriate for Chinese names. Hence, we use a more cutting-edge method, machine learning, to identify the gender of each inventor.

Specifically, we first train a model that predicts gender from a name, using data from the 2005 1% China Population Sample Survey, which contains over 2.5 million observations with respondents' names and genders. The training process is as follows: (1) As only the first name reflects gender to some extent, we extract the first names and genders from the over 2.5 million observations and split each first name into individual words (characters). After removing duplicates, 3453 distinct words remain. (2) We define a 3453-dimensional vector for each word. For example, the first word is represented by a vector whose first element is 1 and whose remaining elements are 0; the other words are represented in the same way. (3) We represent each name in the training sample as a 3453-dimensional vector based on the encoded words. For example, if a name contains the first and second words, it is represented by the sum of the vectors of those two words. (4) We use the Bayesian algorithm to train the model on the encoded names and genders.
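The four training steps can be illustrated with a small sketch. The text only says “the Bayesian algorithm”; the multinomial naive Bayes classifier with Laplace smoothing used here is an assumption, as are the function names, and the six training names are made-up toy data rather than survey records.

```python
import math
from collections import Counter

def build_vocab(names):
    """Steps (1)-(2): give each distinct character its own vector position."""
    chars = sorted({ch for name in names for ch in name})
    return {ch: i for i, ch in enumerate(chars)}

def encode(name, vocab):
    """Step (3): a name is the sum of the one-hot vectors of its characters."""
    vec = [0] * len(vocab)
    for ch in name:
        if ch in vocab:  # characters unseen in training are ignored
            vec[vocab[ch]] += 1
    return vec

def train_nb(names, labels, vocab, alpha=1.0):
    """Step (4), assumed form: multinomial naive Bayes with Laplace smoothing."""
    class_counts = Counter(labels)
    feat_counts = {c: [0] * len(vocab) for c in class_counts}
    for name, label in zip(names, labels):
        for i, x in enumerate(encode(name, vocab)):
            feat_counts[label][i] += x
    model = {}
    for c, counts in feat_counts.items():
        total = sum(counts) + alpha * len(vocab)
        model[c] = {
            "log_prior": math.log(class_counts[c] / len(labels)),
            "log_lik": [math.log((n + alpha) / total) for n in counts],
        }
    return model

def predict(name, model, vocab):
    """Return the class with the highest posterior log score."""
    vec = encode(name, vocab)
    scores = {
        c: p["log_prior"] + sum(x * ll for x, ll in zip(vec, p["log_lik"]))
        for c, p in model.items()
    }
    return max(scores, key=scores.get)

# Toy first names with hypothetical labels ("M" male, "F" female).
train_names = ["伟强", "建军", "志明", "秀英", "丽娟", "美芳"]
train_labels = ["M", "M", "M", "F", "F", "F"]
vocab = build_vocab(train_names)
model = train_nb(train_names, train_labels, vocab)
```

With the real training data the vocabulary would have 3453 positions instead of the 12 in this toy example; `predict("伟军", model, vocab)` then scores a new first name from the characters it shares with the training names.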

After obtaining the model, we apply it to the patent data to identify gender from the inventors' first names. In addition, we use Python's third-party module Ngender to identify inventors' gender, and the gender predicted by this method is used for the robustness check. Gender diversity in inventor teams is calculated according to Blau (1977) and is defined as follows:

$$\text{Diversity} = 1 - \left(P_{\text{female}}^{2} + P_{\text{male}}^{2}\right)$$

where $P_{\text{female}}^{2}$ is the square of the proportion of female members in the inventor team, and $P_{\text{male}}^{2}$ is the square of the proportion of male members in the inventor team. The construction of this index is similar to that of the Herfindahl–Hirschman Index (HHI), which is widely used in industrial economics. The index is a continuous variable ranging from 0 to 0.5: it equals 0 when all members of the team are of the same gender, and 0.5 when the proportions of female and male members are equal. Although male inventors are dominant in most samples, so that Diversity is positively related to the proportion of female inventors in the team, we study whether increasing gender diversity in inventor teams can improve patent quality, rather than whether increasing the proportion of female inventors can do so.
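As a quick illustration, the Blau index described above (one minus the sum of the squared female and male shares of a team) can be computed as follows. The function name and its arguments are chosen for this sketch only.

```python
def blau_diversity(n_female: int, n_male: int) -> float:
    """Blau index for a two-category (female/male) inventor team."""
    total = n_female + n_male
    if total <= 0:
        raise ValueError("a team needs at least one inventor")
    p_female = n_female / total
    p_male = n_male / total
    return 1.0 - (p_female ** 2 + p_male ** 2)

# A single-gender team, a balanced team, and a team with one woman and two men.
single_gender = blau_diversity(0, 3)
balanced = blau_diversity(2, 2)
one_of_three = blau_diversity(1, 2)
```

The single-gender team scores 0, the balanced team scores 0.5, and the team with one woman and two men scores 1 − (1/9 + 4/9) = 4/9, or about 0.444, which shows how the index rises toward 0.5 as the gender shares become more equal.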

3.2.3 Control variables

Patent quality is also related to many patent characteristics, enterprise characteristics, and regional factors, so we need to control for these variables. Following existing studies (Ebersberger et al., 2023; Ferrucci & Lissoni, 2019), we select control variables at the patent level, enterprise level, and city level. For patent characteristics, we include the number of inventors in the team (Total_num), the logarithm of the number of patents in the same family (Family_size), the logarithm of the number of patent backward citations (Patent_citation), the logarithm of the number of literature citations (Literature_citation), and whether the patent is an invention patent (Patent_type). For enterprise characteristics, we control for the nature of enterprise property rights (State), the logarithm of enterprise assets (Asset), the logarithm of per-capita sales income (Labor_productivity), and enterprise age (Age). At the city level, we control for the logarithm of per-capita gross domestic product (Pgdp), the proportion of output value of secondary industry (Second_industry), the logarithm of the number of students per 10,000 people (Human_capital), and the proportion of actually used foreign capital in regional GDP (Fdishare).

Table 1 presents the descriptive statistics of the main variables. The total number of observations is 813,816, of which invention patents account for 59.1%. The mean of patent quality is 4.302 and the standard deviation is 5.756, which shows that quality varies considerably across patents. The mean of gender diversity is 0.131 and the standard deviation is 0.198. Figure 1 further reports the distribution of gender diversity, which shows that the gender composition of Chinese inventor teams is relatively homogeneous. According to the data used in this paper, the main reason is the low proportion of female inventors. The average number of patents in the same family is 1.645, and the mean number of backward citations is 1.933, while the average number of literature citations is 0.141, indicating that the main knowledge sources of the patents are existing patents. The average size of the inventor teams in our sample is 2.847, showing that cooperation is very common in innovation. At the enterprise level, patents owned by state-owned enterprises account for 23.20%, and the average age of the enterprises is 13.313 years.

Table 1. Descriptive statistics.
| Variable | Observations | Mean | Standard deviation | Min | Max |
|---|---|---|---|---|---|
| Patent_quality | 813,816 | 4.302 | 5.756 | 0.000 | 31.000 |
| Diversity | 813,816 | 0.131 | 0.198 | 0.000 | 0.500 |
| Panel A: Patent level | | | | | |
| Family_size | 813,816 | 1.645 | 1.446 | 1.000 | 11.000 |
| Literature_citation | 813,816 | 0.141 | 0.515 | 0.000 | 3.000 |
| Patent_citation | 813,816 | 1.933 | 2.780 | 0.000 | 11.000 |
| Total_num | 813,816 | 2.847 | 2.110 | 1.000 | 11.000 |
| Patent_type | 813,816 | 0.591 | 0.492 | 0.000 | 1.000 |
| Panel B: Enterprise level | | | | | |
| State | 813,816 | 0.232 | 0.422 | 0.000 | 1.000 |
| Asset | 813,816 | 20.346 | 2.644 | 15.794 | 25.984 |
| Labor_productivity | 813,816 | 13.548 | 1.465 | 11.007 | 20.246 |
| Age | 813,816 | 13.313 | 11.547 | 1.000 | 64.000 |
| Panel C: Regional level | | | | | |
| Pgdp | 813,816 | 11.326 | 0.909 | 9.042 | 13.054 |
| Second_industry | 813,816 | 0.496 | 0.090 | 0.223 | 0.674 |
| Human_capital | 813,816 | 5.538 | 0.818 | 2.980 | 7.075 |
| Fdishare | 813,816 | 0.039 | 0.021 | 0.002 | 0.117 |
  • Abbreviation: Pgdp, per-capita gross domestic product.
Figure 1. Distribution of variable Diversity.

3.3 Model setting and estimation method

To quantify the impact of gender diversity on patent quality, we construct the following model:
$$\text{Patent\_quality}_{itjqhc} = \alpha + \beta\,\text{Diversity}_{itjqhc} + \gamma X_{itjqhc} + \lambda_{t} + \mu_{q} + \eta_{h} + \theta_{c} + \varepsilon_{itjqhc}$$

Here, i represents the patent, t the year, j the enterprise, q the patent category, h the industry, and c the city. Patent_quality is patent quality, measured by the number of forward citations. Diversity is gender diversity in inventor teams and X is a set of control variables; $\alpha$ is the constant term, and $\beta$ and $\gamma$ are the corresponding coefficients. $\lambda_{t}$ is the year fixed effect, $\mu_{q}$ is the patent category fixed effect, $\eta_{h}$ is the industry fixed effect, $\theta_{c}$ is the city fixed effect, and $\varepsilon_{itjqhc}$ is the random error.

The number of forward citations is a non-negative integer, and the estimates would be biased if we used ordinary least squares (OLS), so we use a count model, either Poisson regression or negative binomial regression. The choice between the two depends on whether the variance of patent quality differs significantly from its mean, namely “overdispersion.” If there is no significant difference, Poisson regression is appropriate; otherwise, negative binomial regression should be used. According to the descriptive statistics in Table 1, the mean of patent quality is 4.302 and the standard deviation is 5.756, so the variance (about 33.13) is far larger than the mean, which means that we should use negative binomial regression.
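The comparison of mean and variance behind this choice can be made explicit with a short calculation. The sketch below uses the Table 1 figures for patent quality; the function names and the toy citation counts are illustrative assumptions, and a raw variance-to-mean ratio is only an informal diagnostic, not a formal test of overdispersion.

```python
import statistics

def dispersion_ratio(mean: float, std: float) -> float:
    """Variance-to-mean ratio; values well above 1 point to overdispersion."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return std ** 2 / mean

def dispersion_ratio_from_counts(counts):
    """Same ratio computed from raw counts (sample variance)."""
    return dispersion_ratio(statistics.mean(counts), statistics.stdev(counts))

# Patent_quality in Table 1: mean 4.302, standard deviation 5.756.
ratio_table1 = dispersion_ratio(4.302, 5.756)

# Hypothetical citation counts, for illustration only.
ratio_toy = dispersion_ratio_from_counts([0, 0, 1, 2, 15])
```

For the Table 1 values the variance is roughly 7.7 times the mean, whereas a Poisson-distributed variable would have a ratio close to 1.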

4 EMPIRICAL RESULTS

4.1 Benchmark regression results

The benchmark results are shown in Table 2. Column (1) includes only gender diversity in inventor teams. The coefficient of gender diversity is positive, indicating that improving gender diversity within inventor teams can promote patent quality. Column (2) further controls for the year fixed effect, city fixed effect, industry fixed effect, and patent category fixed effect, and the coefficient of gender diversity is still significantly positive. Columns (3), (4), and (5) further include control variables for patent characteristics, enterprise characteristics, and city factors, respectively. The regression results show that improving gender diversity in inventor teams can promote patent quality, which is consistent with the above theoretical analysis. Because the results in Table 2 are based on negative binomial regression, the coefficients do not directly reflect the marginal effect. Calculating the marginal effect at the average of gender diversity based on the result in column (5), we find that a one-unit increase in gender diversity in inventor teams raises the number of forward citations by 0.26, which corresponds to an improvement in patent quality of about 6.04% relative to the sample mean of 4.302. The effect of gender diversity on patent quality is therefore highly significant. As for the control variables, the number of patents in the same family is positively related to patent quality, which is consistent with Ferrucci and Lissoni (2019). If enterprises spend more money to file patent applications in many countries to gain priority, the patents are likely to receive more citations and to be of good quality. The numbers of literature citations and patent citations are also positively related to patent quality, which means that the existing literature and patents are important knowledge sources for the patents. Besides, the larger the team, the higher the quality of the patent. At the same time, the quality of invention patents is higher than that of utility model patents. At the enterprise level, patents belonging to larger or state-owned enterprises are of higher quality. As for regional factors, an improvement in regional human capital improves patent quality, while a higher proportion of secondary industry output may inhibit it.
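The 6.04% figure quoted in the benchmark discussion can be reproduced from the reported marginal effect and the sample mean. The sketch below also shows, as an assumption about how such coefficients are commonly read under a log link, the proportional change exp(βΔ) − 1 implied by the column (5) coefficient. That is a different summary from the marginal effect at the average and need not coincide with it. Variable and function names are chosen for this example.

```python
import math

MEAN_CITATIONS = 4.302   # mean of Patent_quality, Table 1
MARGINAL_EFFECT = 0.26   # marginal effect reported in the text
COEF_DIVERSITY = 0.067   # Diversity coefficient, Table 2, column (5)

# Marginal effect expressed relative to the sample mean.
relative_effect = MARGINAL_EFFECT / MEAN_CITATIONS

def proportional_change(beta: float, delta: float) -> float:
    """Proportional change in the expected count for a change `delta`
    in the regressor, assuming a log-link count model."""
    return math.exp(beta * delta) - 1.0

per_unit = proportional_change(COEF_DIVERSITY, 1.0)
# Diversity ranges from 0 to 0.5, so the largest attainable change is 0.5.
zero_to_balanced = proportional_change(COEF_DIVERSITY, 0.5)
```

Since Diversity cannot exceed 0.5, the last quantity (moving from a single-gender team to a balanced one) may be the more practical reading of the estimate.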

Table 2. Negative binomial regression on patent quality.
Dependent variable: Number of patent forward citations

| Independent variable | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Diversity | 0.414*** (0.039) | 0.333*** (0.020) | 0.077*** (0.014) | 0.067*** (0.014) | 0.067*** (0.014) |
| Family_size | | | 0.254*** (0.037) | 0.241*** (0.036) | 0.240*** (0.036) |
| Literature_citation | | | 0.170*** (0.015) | 0.167*** (0.015) | 0.168*** (0.015) |
| Patent_citation | | | 0.138*** (0.009) | 0.134*** (0.010) | 0.133*** (0.010) |
| Total_num | | | 0.039*** (0.002) | 0.035*** (0.002) | 0.035*** (0.002) |
| Patent_type | | | 0.488*** (0.028) | 0.482*** (0.028) | 0.483*** (0.028) |
| State | | | | 0.062*** (0.020) | 0.061*** (0.021) |
| Asset | | | | 0.019*** (0.004) | 0.019*** (0.004) |
| Labor_productivity | | | | 0.008 (0.006) | 0.008 (0.006) |
| Age | | | | −0.013 (0.008) | −0.013 (0.008) |
| Pgdp | | | | | −0.091 (0.082) |
| Second_industry | | | | | −0.399** (0.168) |
| Human_capital | | | | | 0.081*** (0.027) |
| Fdishare | | | | | 0.912 (0.632) |
| Year fixed effect | N | Y | Y | Y | Y |
| City fixed effect | N | Y | Y | Y | Y |
| Industry fixed effect | N | Y | Y | Y | Y |
| Patent category fixed effect | N | Y | Y | Y | Y |
| Observations | 813,816 | 813,816 | 813,816 | 813,816 | 813,816 |
| Pseudo R² | 0.001 | 0.020 | 0.036 | 0.037 | 0.037 |
  • Note: Values in parentheses are robust standard errors clustered at the company level.
  • Abbreviations: N, no; Pgdp, per-capita gross domestic product; Y, yes.
  • *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.

4.2 Robustness checks

We further perform a series of robustness checks on the following aspects.

(1) Replacing the dependent variable: Patent quality is measured by the number of forward citations above. Here, following Aghion et al. (2005) and Akcigit et al. (2016), we measure patent quality by the knowledge width method, which reflects the knowledge a patent contains. Specifically, each patent has its international patent classification numbers, and the number of classification numbers reflects the patent's complexity to some extent. Thus, we construct an index of patent quality in the manner of the HHI, based on the international patent classification numbers. The regression result is in column (1) of Table 3. Meanwhile, to avoid the problems caused by different time spans, we use the number of forward citations received within 5 and 3 years after registration to measure patent quality. The regression results are in columns (2) and (3). There is a significantly positive correlation between gender diversity and patent quality, which is consistent with the above results.
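One possible form of an HHI-style knowledge width index is sketched below. The text does not give the exact formula, so everything here is an assumption made for illustration: classification numbers are grouped by their first four characters (the IPC subclass), and the index is one minus the sum of squared group shares, so that a patent spread over more subclasses scores higher.

```python
from collections import Counter

def knowledge_width(ipc_codes, level: int = 4) -> float:
    """One minus the sum of squared shares of IPC groups (assumed definition).

    `level` is the number of leading characters used to group codes;
    4 corresponds to the IPC subclass, e.g. "A61K".
    """
    if not ipc_codes:
        raise ValueError("a patent needs at least one classification number")
    groups = Counter(code.strip()[:level] for code in ipc_codes)
    n = sum(groups.values())
    return 1.0 - sum((count / n) ** 2 for count in groups.values())

# Illustrative classification lists.
narrow = knowledge_width(["A61K 31/00", "A61K 9/20"])
two_fields = knowledge_width(["A61K 31/00", "C07D 213/00"])
mixed = knowledge_width(["A61K 31/00", "A61K 9/20", "G06F 17/00"])
```

Under this assumed definition, a patent whose classification numbers all fall in one subclass scores 0, and one split evenly across two subclasses scores 0.5.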

Table 3. Robustness checks.
Independent variable (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)
Diversity 0.146*** (0.034) 0.071*** (0.015) 0.081*** (0.017) 0.067*** (0.014) 0.062*** (0.018) 0.060*** (0.015) 0.358*** (0.065) 0.066*** (0.014) 0.060*** (0.014) 0.037*** (0.013)
Diversity2 −0.039*** (0.008)
Female_percent 0.031*** (0.011)
Family_size 0.103** (0.053) 0.223*** (0.026) 0.240*** (0.022) 0.240*** (0.036) 0.240*** (0.036) 0.196*** (0.041) 0.053 (0.047) 0.238*** (0.038) 1.178*** (0.152) 0.236*** (0.034) 0.240*** (0.036) 0.222*** (0.032)
Literature_citation 0.130*** (0.021) 0.168*** (0.013) 0.151*** (0.013) 0.168*** (0.015) 0.168*** (0.036) 0.189*** (0.016) 0.206*** (0.015) 0.162*** (0.016) 0.645*** (0.072) 0.163*** (0.015) 0.168*** (0.015) 0.151*** (0.013)
Patent_citation 0.022** (0.011) 0.154*** (0.012) 0.138*** (0.013) 0.133*** (0.010) 0.134*** (0.010) 0.088*** (0.013) 0.047*** (0.014) 0.136*** (0.010) 0.140*** (0.051) 0.134*** (0.010) 0.133*** (0.010) 0.112*** (0.009)
Total_num 0.021*** (0.004) 0.035*** (0.002) 0.035*** (0.002) 0.036*** (0.002) 0.037*** (0.002) 0.031*** (0.003) 0.030*** (0.003) 0.036*** (0.002) 0.145*** (0.011) 0.034*** (0.002) 0.035*** (0.002) 0.027*** (0.002)
Patent_type 0.620*** (0.029) 0.575*** (0.028) 0.588*** (0.028) 0.483*** (0.028) 0.483*** (0.028) 0.487*** (0.023) 0.497*** (0.019) 0.470*** (0.029) 2.130*** (0.136) 0.482*** (0.027) 0.483*** (0.028) 0.529*** (0.024)
State 0.003 (0.048) 0.015 (0.023) −0.002 (0.021) 0.061*** (0.021) 0.061*** (0.021) 0.038** (0.018) 0.045** (0.018) 0.050** (0.021) 0.299*** (0.089) 0.058*** (0.020) 0.061*** (0.021) 0.067*** (0.018)
Asset −0.046*** (0.009) 0.025*** (0.009) 0.031*** (0.005) 0.019*** (0.004) 0.019*** (0.004) 0.025*** (0.004) 0.031*** (0.005) 0.024*** (0.004) 0.080*** (0.021) 0.019*** (0.004) 0.019*** (0.004) 0.010*** (0.004)
Labor_productivity 0.001 (0.013) 0.013** (0.006) 0.011 (0.008) 0.008 (0.006) 0.008 (0.006) 0.011** (0.005) 0.009* (0.005) 0.005 (0.006) 0.026 (0.030) 0.008 (0.006) 0.008 (0.006) 0.012** (0.005)
Age −0.048*** (0.017) −0.022** (0.009) −0.030*** (0.009) −0.013 (0.008) −0.013 (0.008) −0.011 (0.008) −0.011 (0.009) −0.020** (0.008) −0.071* (0.042) −0.013 (0.008) −0.013 (0.008) −0.014* (0.007)
Pgdp −0.039 (0.139) 0.083 (0.128) 0.314* (0.173) −0.091 (0.082) −0.091 (0.081) 0.063 (0.042) 0.060* (0.036) −0.102 (0.085) 0.229 (0.446) −0.080 (0.052) −0.090 (0.082) −0.025 (0.068)
Second_industry 0.072 (0.355) 0.072 (0.217) −0.063 (0.207) −0.399** (0.168) −0.399** (0.168) −0.715*** (0.179) −0.533*** (0.175) −0.434** (0.176) −2.272*** (0.830) −0.396** (0.157) −0.399** (0.168) −0.411*** (0.145)
Human_capital 0.075 (0.062) 0.140*** (0.032) 0.150*** (0.036) 0.081*** (0.027) 0.081*** (0.027) 0.128*** (0.030) 0.103*** (0.029) 0.112*** (0.034) 0.403*** (0.151) 0.077*** (0.026) 0.081*** (0.027) 0.033 (0.023)
Fdishare 1.509 (1.169) 2.165** (0.876) 2.083** (0.827) 0.913 (0.632) 0.914 (0.633) 0.601 (0.567) −0.298 (0.546) 1.079* (0.653) 3.470 (2.701) 0.869 (0.621) 0.913 (0.632) 0.431 (0.508)
Year fixed effect Y Y Y Y Y Y Y Y Y Y Y Y
City fixed effect Y Y Y Y Y Y Y Y Y Y Y Y
Industry fixed effect Y Y Y Y Y Y Y Y Y Y Y Y
Patent category fixed effect Y Y Y Y Y Y Y Y Y Y Y Y
Observations 813,816 813,816 813,816 813,816 813,816 1,034,993 1,255,264 768,826 813,816 813,816 813,816 527,451
R² 0.044 0.043 0.048 0.037 0.037 0.049 0.056 0.037 0.156 0.037 0.035
  • Note: Column (1) replaces the dependent variable with the knowledge width measure. Columns (2) and (3) replace the dependent variable with the number of forward citations within 5 and 3 years after registration, respectively. Column (4) replaces the independent variable with Diversity2, and column (5) replaces it with the proportion of female inventors in the team. Column (6) adds samples of listed manufacturing companies from 2014 to 2019. Column (7) adds samples of listed companies from 2014 to 2019. Column (8) excludes the fuzzy-matched samples. Column (9) is estimated by OLS, and column (10) by zero-inflated negative binomial regression. Column (11) replaces the gender prediction method. Column (12) excludes patents with only one inventor.
  • Abbreviations: OLS, ordinary least-square method; Pgdp, per-capita gross domestic product; Y, yes.
  • *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.

(2) Replacing the independent variable: We measure gender diversity in inventor teams by an alternative index (Diversity2), the absolute value of the gap between the proportions of male and female members in the team. This index ranges from 0 to 1; the larger it is, the more homogeneous the gender composition of the team. The regression result is in column (4) of Table 3. The coefficient of Diversity2 is significantly negative, so more homogeneous teams are associated with lower patent quality, which confirms the above conclusions again. Although our paper focuses on gender diversity in inventor teams rather than the proportion of female inventors, the two are positively related. Thus, as a further robustness check, we run another regression whose independent variable is the proportion of female inventors. The regression result is in column (5).
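The two alternative measures can be sketched as follows. The function names are hypothetical, and the inputs are assumed to be head counts of male and female inventors on one patent.

```python
def gender_gap_index(n_male, n_female):
    """Diversity2: |share of men - share of women|.

    Ranges from 0 (perfectly balanced team) to 1 (single-gender team).
    """
    total = n_male + n_female
    if total == 0:
        raise ValueError("team has no members")
    return abs(n_male - n_female) / total


def female_share(n_male, n_female):
    """Proportion of female inventors in the team."""
    total = n_male + n_female
    if total == 0:
        raise ValueError("team has no members")
    return n_female / total
```

For a team of three men and one woman, `gender_gap_index(3, 1)` is 0.5 and `female_share(3, 1)` is 0.25. Because Diversity2 rises as a team becomes more homogeneous, its expected sign in the regression is the opposite of that of a diversity index.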

(3) Increasing samples: Because of the limited availability of the China Industrial Enterprises Database, we use only patent data from 2003 to 2013 in the benchmark regression. To ensure the timeliness of our conclusions, we match the data of listed companies with the China Industrial Enterprises Database by company name and add the patent data of listed manufacturing companies from 2014 to 2019 to the regression. The result is in column (6). We also add all patent data of listed companies from 2014 to 2019 to the regression; the result is in column (7). Columns (6) and (7) show that increasing gender diversity in inventor teams can improve patent quality, which is consistent with our conclusions.

(4) Excluding the fuzzy-matched samples: Following Kou and Liu (2020), we match the patent data with enterprise-level data in two ways: by the enterprises' full names and by keywords in the enterprises' names. To avoid the impact of fuzzy matching, we exclude the samples matched by keywords. The regression result is in column (8), and the positive relationship remains.

(5) Replacing the estimation methods: We estimate the effect by OLS, and the regression result is in column (9). Moreover, because the number of forward citations is zero for many observations, we also estimate the effect by zero-inflated negative binomial regression. The result is in column (10). Columns (9) and (10) show the same positive correlation between gender diversity in inventor teams and patent quality.

(6) Replacing the gender prediction method: Our model, which predicts gender from people's first names, is trained on data from the 1% China Population Sample Survey in 2005. To guard against the possibility that these data are not sufficiently representative, we alternatively use the third-party Python module Ngender to predict the inventors' gender. Ngender is based on more than 20 million hotel check-in records and also uses the Bayesian algorithm to train its model. The result is in column (11) of Table 3.

(7) Excluding single-inventor patents: Because some patents list only one inventor, we exclude such samples. The result is in column (12). The positive relationship between gender diversity within inventor teams and patent quality remains.

4.3 Endogeneity test

In some fields, especially chemistry, gender diversity in inventor teams may be higher where patent quality is higher, which may cause a reverse causality problem. In addition, although we include control variables at three levels and fixed effects in four dimensions, some variables that influence both the dependent and the independent variable may still be omitted, especially some characteristics of patents. Moreover, gender diversity is calculated from gender predicted by machine learning, so it contains some measurement error. In view of these problems, we use an instrumental variable to alleviate the potential endogeneity. We use the gender diversity of each national economy industry at the two-digit standard industrial classification level as the instrumental variable for gender diversity in inventor teams. The data come from the China Labor Statistical Yearbook, and we also follow Blau (1977) to construct the index. Because the number of forward citations is a non-negative integer, we follow Wooldridge (2015) and estimate the model by the control function method. We first regress the endogenous independent variable on the instrumental variable and the other exogenous control variables, and then add the residual obtained in this first stage to the second-stage regression; the significance of the residual in the second stage indicates whether there is an endogeneity problem. The regression results are shown in Table 4. The coefficient of the gender diversity in each national economy industry is significantly positive and the F-statistic is greater than 10, which means that the instrumental variable is strongly correlated with our independent variable. The coefficient of the residual in the second stage is significantly negative, which means that there is indeed endogeneity in our model.
According to the regression results in Table 4, gender diversity in inventor teams is still positively correlated with the number of forward citations, which is consistent with our above results.
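The two-step logic of the control function method can be sketched on simulated data. This is a minimal illustration under strong simplifications, not the estimation used in the paper: the second stage here is linear rather than a count model, there is a single instrument and no other controls, and the names `ols` and `control_function` as well as all simulated parameters are hypothetical.

```python
import random


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows (each row includes the constant); returns b.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def control_function(z, x, y):
    """Return (coefficient on x, coefficient on the first-stage residual)."""
    # Stage 1: regress the endogenous regressor x on the instrument z.
    a0, a1 = ols([[1.0, zi] for zi in z], x)
    resid = [xi - (a0 + a1 * zi) for xi, zi in zip(x, z)]
    # Stage 2: regress y on x and the first-stage residual.
    _, b_x, b_resid = ols([[1.0, xi, ri] for xi, ri in zip(x, resid)], y)
    return b_x, b_resid


# Simulated data (all numbers are assumptions): u is an unobserved
# confounder that raises x and lowers y; the true effect of x on y is 0.5.
random.seed(0)
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 0.5 * xi - ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

naive = ols([[1.0, xi] for xi in x], y)[1]   # biased by the confounder
b_x, b_resid = control_function(z, x, y)     # corrected estimate
```

In this linear setting the control function estimate of the coefficient on `x` coincides with the instrumental variable estimate, and a significant coefficient on the residual signals endogeneity. The simulation is constructed so that the confounder biases the naive estimate downward, which is why the residual enters with a negative sign.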

Table 4. Estimation result of two-stage least squares method.
Independent variable (1) (2)
Diversity Number of forward citations
First stage Second stage
Diversity 10.527*** (2.157)
Gender diversity in each industry 0.161*** (0.049)
Family_size 0.004** (0.002) 0.199*** (0.035)
Literature_citation 0.013*** (0.001) 0.031 (0.029)
Patent_citation 0.002* (0.001) 0.115*** (0.010)
Total_num 0.033*** (0.001) −0.305*** (0.070)
Patent_type −0.003 (0.002) 0.512*** (0.026)
State 0.009*** (0.003) −0.038 (0.027)
Asset 0.003*** (0.000) −0.011 (0.008)
Labor_productivity −0.001** (0.001) 0.023*** (0.006)
Age 0.002* (0.001) −0.038*** (0.010)
Pgdp 0.004 (0.011) −0.109 (0.083)
Second_industry −0.014 (0.025) −0.194 (0.165)
Human_capital −0.006* (0.003) 0.142*** (0.026)
Fdishare 0.077 (0.072) 0.210 (0.599)
Residual from the first regression −10.462*** (2.154)
Year fixed effect Y Y
City fixed effect Y Y
Industry fixed effect Y Y
Patent category fixed effect Y Y
Observations 813,816 813,816
R² 0.163 0.037
  • Abbreviations: Pgdp, per-capita gross domestic product; Y, yes.
  • *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.

4.4 Heterogeneity analysis

We further explore the heterogeneous impacts of gender diversity in inventor teams on patent quality from four perspectives: regional factors, industries' characteristics, enterprises' characteristics, and patent type.

4.4.1 Heterogeneity of regional factors

(1) Heterogeneity of gender discrimination (Discrimination): Approximately 30% of the patents in our sample include female inventors, and female inventors are the majority in only 5% of the samples, which means that female inventors are at a disadvantage. The effect of gender diversity on patent quality works through the complementarity between female and male inventors. Only when female inventors are fully valued can the synergy between male and female inventors be realized. We therefore expect the impact of gender diversity on innovation to vary with the degree of regional gender discrimination: the effect will be suppressed where gender discrimination is more severe and female inventors are seriously neglected.

We use the answers to the statement “men are naturally stronger than women” in the Chinese General Social Survey, averaged at the province level, to reflect the degree of regional gender discrimination. The greater the index is, the more severe the bias against women is. The regression result is in column (1) of Table 5. The coefficient of the interaction between gender discrimination and gender diversity is significantly negative, which shows that the effect of gender diversity on patent quality is suppressed by gender discrimination.
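To see how the interaction term is read, the point estimates in column (1) of Table 5 imply that the coefficient of gender diversity falls as the discrimination index rises. The expression below is only a reading of the reported coefficients in the model's linear index; the scale of the Discrimination index is not stated in this section, so the break-even value is purely illustrative.

```latex
\frac{\partial\,(\text{linear index})}{\partial\,\text{Diversity}}
  = 0.526 - 0.153 \times \text{Discrimination},
\qquad
0.526 - 0.153 \times \text{Discrimination} = 0
  \;\Longleftrightarrow\;
  \text{Discrimination} \approx 3.44 .
```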

Table 5. Heterogeneity of regional factors.
Independent variable Gender discrimination Female education
(1) (2)
Diversity 0.526** (0.209) 0.038** (0.019)
Family_size 0.278*** (0.031) 0.238*** (0.036)
Literature_citation 0.200*** (0.015) 0.167*** (0.015)
Patent_citation 0.232*** (0.008) 0.132*** (0.010)
Total_num 0.030*** (0.003) 0.035*** (0.002)
Patent_type 0.209*** (0.019) 0.488*** (0.028)
State 0.058** (0.028) 0.061*** (0.020)
Asset 0.019*** (0.005) 0.019*** (0.004)
Labor_productivity 0.016** (0.007) 0.007 (0.006)
Age −0.015 (0.010) −0.013 (0.085)
Pgdp 0.253 (0.2761) 0.002 (0.085)
Second_industry 0.280 (0.967) −0.453*** (0.170)
Human_capital 0.119* (0.066) 0.076*** (0.028)
Fdishare −1.864 (1.733) 0.753 (0.592)
Discrimination −0.089** (0.034)
Discrimination × Diversity −0.153** (0.069)
Education −0.100*** (0.017)
Education × Diversity 0.062** (0.030)
Year fixed effect Y Y
City fixed effect Y Y
Industry fixed effect Y Y
Patent category fixed effect Y Y
Observations 353,135 813,816
R² 0.038 0.037
  • Abbreviations: Pgdp, per-capita gross domestic product; Y, yes.
  • *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.

(2) Heterogeneity of female education (Education): A key reason for the low proportion of women in inventor teams is discrimination against women's scientific research ability (Brammer Charlotte, 2018). As women's education improves, this discrimination will be reduced and female inventors can make full use of their abilities. We therefore expect the effect of gender diversity on patent quality to be more significant where women are well-educated.

Following Chen and Li (2021), we use the proportion of women with college degrees or above in the total population as a proxy for women's education level, based on data from the China Statistical Yearbook. We divide all observations into two groups at the median of women's education level; the dummy variable equals one for the higher-education group and zero for the lower-education group. We then add the interaction between women's education and gender diversity in inventor teams to the empirical model to analyze the heterogeneity of women's education. The regression result is in column (2) of Table 5. The coefficient of the interaction is significantly positive, which shows that the effect of gender diversity on innovation is amplified in places where women are well-educated.
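The median split and the interaction term can be sketched as follows. The function names and the example values are hypothetical, and the treatment of observations exactly at the median (assigned zero here) is an assumption, since the text does not specify it.

```python
from statistics import median


def median_dummy(values):
    """Return 1 for observations strictly above the sample median, else 0."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


def interaction(dummy, diversity):
    """Element-wise product of the group dummy and the diversity index."""
    return [d * x for d, x in zip(dummy, diversity)]


# Hypothetical shares of college-educated women and team diversity values.
education_share = [0.05, 0.08, 0.12, 0.20]
diversity = [0.0, 0.5, 0.3, 0.4]

education = median_dummy(education_share)           # [0, 0, 1, 1]
education_x_diversity = interaction(education, diversity)
```

With the estimates in column (2) of Table 5, the implied coefficient of Diversity is 0.038 in the lower-education group and 0.038 + 0.062 = 0.100 in the higher-education group.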

4.4.2 Heterogeneity of industries' characteristics

(1) Heterogeneity of market uncertainty (Market_uncertain): Following Xie et al. (2020) and Bergh and Lawless (1998), we use the sales revenue of each industry, taken from the China Industry Statistical Yearbook, to construct a proxy for market uncertainty. Specifically, we regress industry sales on time and divide the standard error of the regression slope coefficient by the average of industry sales. We then divide all samples into two groups at the median of this measure and set a dummy variable. Again, we add the interaction between market uncertainty and gender diversity to our empirical model to analyze the heterogeneity of market uncertainty.
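The uncertainty proxy described above can be sketched for a single industry as follows. The function name `market_uncertainty` is hypothetical, and time is assumed to be a simple index 0, 1, 2, … over consecutive years.

```python
from math import sqrt


def market_uncertainty(sales):
    """Standard error of the time-trend slope divided by mean sales.

    sales: annual industry sales in chronological order (at least 3 years).
    """
    n = len(sales)
    if n < 3:
        raise ValueError("need at least three periods")
    t = list(range(n))
    t_mean = sum(t) / n
    s_mean = sum(sales) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    # OLS slope and intercept of sales on time.
    slope = sum((ti - t_mean) * (si - s_mean) for ti, si in zip(t, sales)) / sxx
    intercept = s_mean - slope * t_mean
    # Residual sum of squares and the usual standard error of the slope.
    rss = sum((si - intercept - slope * ti) ** 2 for ti, si in zip(t, sales))
    se_slope = sqrt(rss / (n - 2) / sxx)
    return se_slope / s_mean
```

An industry whose sales grow along a perfectly straight line has a measure of zero, while an industry with the same average sales but erratic year-to-year movements has a larger value.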

The regression result is in column (1) of Table 6, which shows that the effect of gender diversity on patent quality is more significant in industries with high market uncertainty. In such industries, enterprises face more risk and need to update their technology quickly. Female and male inventors have different information sets, which enhances the information available to the team and helps enterprises update their technology quickly.

Table 6. Heterogeneity of industries' characteristics.
Independent variable Market uncertainty Market competition
(1) (2)
Diversity 0.044** (0.018) 0.101*** (0.022)
Family_size 0.232*** (0.036) 0.240*** (0.036)
Literature_citation 0.162*** (0.0150) 0.168*** (0.015)
Patent_citation 0.127*** (0.010) 0.133*** (0.010)
Total_num 0.035*** (0.002) 0.035*** (0.002)
Patent_type 0.505*** (0.026) 0.483*** (0.028)
State 0.051** (0.021) 0.061*** (0.021)
Asset 0.018*** (0.004) 0.019*** (0.004)
Labor_productivity 0.011* (0.006) 0.008 (0.006)
Age −0.014 (0.009) −0.013 (0.008)
Pgdp −0.138 (0.086) −0.092 (0.082)
Second_industry −0.298 (0.215) −0.399** (0.167)
Human_capital 0.109*** (0.029) 0.080*** (0.027)
Fdishare 0.280 (0.473) 0.910 (0.632)
Market_uncertain 0.002 (0.013)
Market_uncertain × Diversity 0.047* (0.027)
HHI 0.002 (0.015)
HHI × Diversity −0.064** (0.031)
Year fixed effect Y Y
City fixed effect Y Y
Industry fixed effect Y Y
Patent category fixed effect Y Y
Observations 737,451 813,816
R² 0.037 0.037
  • Abbreviations: HHI, Herfindahl–Hirschman Index; Pgdp, per-capita gross domestic product; Y, yes.
  • *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.

(2) Heterogeneity of market competition (HHI): We further explore the heterogeneity of market competition. Following most studies, we use the HHI as a proxy for market competition. Using data from the China Industrial Enterprises Database, we calculate the HHI based on the enterprises' assets; then, we divide all samples into two groups at the median of the HHI and set a dummy variable in the same way as before. We add the interaction between market competition and gender diversity to our empirical model to analyze the heterogeneity. The result is in column (2) of Table 6. The coefficient of the interaction is significantly negative. Because a higher HHI indicates a more concentrated and thus less competitive industry, this shows that the effect of gender diversity on patent quality is more significant in industries with acute competition. Enterprises in such industries need to update their technology quickly and improve their innovation ability to build competitive advantages. They can benefit from mixed-gender teams, which are sensitive to market information.
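For reference, the HHI of one industry is the sum of the squared shares of its enterprises. The sketch below applies this standard definition to assets, as the text describes; the function name `hhi` and the example values are hypothetical.

```python
def hhi(assets):
    """Herfindahl-Hirschman Index: sum of squared asset shares in one industry."""
    total = sum(assets)
    if total <= 0:
        raise ValueError("total assets must be positive")
    return sum((a / total) ** 2 for a in assets)
```

Four equally sized enterprises give `hhi([1, 1, 1, 1])` of 0.25, whereas a single enterprise gives 1.0, so a lower HHI corresponds to fiercer competition.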

4.4.3 Heterogeneity of enterprises' characteristics

We further explore the heterogeneity of enterprises' characteristics. We divide all samples into two groups by the nature of the enterprises' property rights (state-owned or private) and run a separate regression for each group. We likewise divide all samples into two groups at the median of the enterprises' age (mature or growing) and run a separate regression for each group. The results are in Table 7.

Table 7. Heterogeneity of enterprises' characteristics.
Nature of property right Enterprises' age
State-owned enterprises Private enterprises Mature enterprises Growing enterprises
Independent variable (1) (2) (3) (4)
Diversity 0.025 (0.021) 0.071*** (0.017) 0.034** (0.016) 0.113*** (0.020)
Family_size 0.205*** (0.034) 0.243*** (0.047) 0.194*** (0.043) 0.319*** (0.014)
Literature_citation 0.131*** (0.019) 0.170*** (0.018) 0.147*** (0.020) 0.208*** (0.012)
Patent_citation 0.073*** (0.021) 0.146*** (0.010) 0.116*** (0.014) 0.151*** (0.009)
Total_num 0.029*** (0.003) 0.039*** (0.002) 0.036*** (0.002) 0.035*** (0.003)
Patent_type 0.633*** (0.034) 0.443*** (0.031) 0.500*** (0.036) 0.453*** (0.020)
Asset 0.010* (0.006) 0.022*** (0.005) 0.022*** (0.005) 0.011** (0.005)
Labor_productivity 0.014*** (0.005) 0.011 (0.007) 0.004 (0.006) 0.019*** (0.007)
Age −0.004 (0.009) −0.018* (0.010) 0.011 (0.017) −0.044*** (0.014)
Pgdp 0.033 (0.081) −0.145 (0.104) −0.065 (0.094) −0.136* (0.083)
Second_industry −0.536** (0.231) −0.350* (0.212) −0.374 (0.270) −0.393* (0.203)
Human_capital −0.032 (0.034) 0.106*** (0.036) 0.101** (0.041) 0.058* (0.030)
Fdishare −0.790 (0.551) 1.439** (0.729) 0.068 (0.721) 1.659** (0.721)
State 0.043* (0.022) 0.092*** (0.025)
Year fixed effect Y Y Y Y
City fixed effect Y Y Y Y
Industry fixed effect Y Y Y Y
Patent category fixed effect Y Y Y Y
Observations 188,606 625,210 457,145 356,672
R² 0.035 0.037 0.037 0.036
  • Abbreviations: Pgdp, per-capita gross domestic product; Y, yes.
  • *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.

According to columns (1) and (2) of Table 7, the effect of gender diversity is more significant in private enterprises. A possible reason is that private enterprises have strong innovation incentives because they face more risk and uncertainty. Mixed-gender teams can make full use of their advantages to help improve the innovation ability of private enterprises. Likewise, according to columns (3) and (4) of Table 7, the effect of gender diversity on patent quality is more significant in growing enterprises, which also have strong innovation incentives and can benefit from mixed-gender teams.

4.4.4 Heterogeneity of patent types

Díaz-García et al. (2013) found that gender diversity within R&D teams leads to radical innovation rather than incremental innovation because it can foster team creativity and generate novel ideas. Compared with utility model patents, invention patents are more radically innovative (Li & Zheng, 2016) and require more team creativity. We divide all samples into invention patents and utility model patents and run a separate regression for each. The results in Table 8 show that the effect of gender diversity on patent quality is more pronounced for invention patents, which is consistent with Díaz-García et al. (2013) and suggests, to a degree, that gender diversity can improve patent quality by promoting team creativity.

Table 8. Heterogeneity of patent type.
Invention patents Utility model patents
(1) (2)
Diversity 0.076*** (0.018) −0.005 (0.018)
Family_size 0.205*** (0.034) 0.313*** (0.010)
Literature_citation 0.150*** (0.016) 0.288*** (0.060)
Patent_citation 0.178*** (0.015) 0.307*** (0.013)
Total_num 0.032*** (0.003) 0.036*** (0.002)
State 0.080*** (0.029) 0.025 (0.019)
Asset 0.011** (0.005) 0.025*** (0.005)
Labor_productivity 0.013** (0.006) 0.002 (0.006)
Age −0.010 (0.011) −0.009 (0.007)
Pgdp −0.038 (0.089) −0.043 (0.053)
Second_industry −0.231 (0.276) −0.781*** (0.146)
Human_capital 0.004 (0.042) 0.101*** (0.022)
Fdishare 1.428 (0.042) −0.458 (0.376)
Year fixed effect Y Y
City fixed effect Y Y
Industry fixed effect Y Y
Patent category fixed effect Y Y
Observations 480,721 333,095
R² 0.026 0.017
  • Abbreviations: Pgdp, per-capita gross domestic product; Y, yes.
  • *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.

5 CONCLUSIONS AND POLICY IMPLICATIONS

In the context of gender inequality in the labor market, the low proportion of female scientists, and the innovation-driven development strategy in China, we use patent data from the China National Intellectual Property Administration to explore the effect of gender diversity on patent quality and perform a series of heterogeneity analyses covering regional factors, industries' characteristics, enterprises' characteristics, and patent type.

Based on the empirical results, we draw the following conclusions.
  • (1)

    Gender diversity in inventor teams can promote patent quality: In terms of the marginal effect, increasing gender diversity in inventor teams by 1 unit increases the number of forward citations of patents by about 6%. Gender diversity within the inventor team promotes patent quality mainly by increasing team creativity and labor productivity, which benefit mainly from a better ability to acquire information and an open work climate. In addition, the heterogeneity analysis by patent type shows that the effect of gender diversity on patent quality is more significant for invention patents, which are radical innovations and place higher demands on team creativity. This supports the mechanism to some degree.

  • (2)

    The results remain robust to a series of robustness checks: replacing the independent variable, replacing the dependent variable, increasing samples, excluding single-inventor patents, excluding the fuzzy-matched samples, replacing the estimation method, and replacing the gender prediction method.

  • (3)

    The results of heterogeneity of regional factors show that the effect of gender diversity on patent quality is more significant in places where gender discrimination is weak and women are well-educated. The main reason is that the complementarity of female inventors and male inventors can be more effective in these places.

  • (4)

    The results of the heterogeneity of industries' characteristics show that the effect of gender diversity on patent quality is more obvious in industries with high uncertainty and fierce competition as the enterprises in these industries have higher innovation incentives.

  • (5)

    The results of the heterogeneity of enterprises' characteristics show that the effect of gender diversity on patent quality is stronger in private enterprises and growing enterprises.

The conclusions of this study have some important policy implications for reducing gender inequality in the labor market, especially in the innovation area, and for promoting Chinese innovation ability:
  • (1)

    We find that improving gender diversity in inventor teams can promote patent quality, especially for invention patents. Thus, the government should give more support to female scientists and improve their representation. At the same time, the government should break the “glass ceiling” that women face at work to make full use of female talent and avoid distorting the allocation of human capital.

  • (2)

    According to the heterogeneity analyses, the government should eliminate gender discrimination and improve the protection of the female labor force. On the one hand, the government should improve labor protection legislation and strengthen labor protection supervision, especially for female workers; on the other hand, the government should allocate more education resources to women, encourage more women to study science and technology, and remove industry entry barriers to realize the optimal allocation of human resources.

  • (3)

    The effect of gender diversity on patent quality is more significant in industries with high uncertainty and fierce competition, and it is also more obvious in private enterprises and growing enterprises. Thus, the government should pay more attention to these industries and enterprises to maximize the contribution of female scientists.

ACKNOWLEDGMENTS

Zhijie Zhang acknowledges the financial support from the Innovation and Research Fund for Postgraduates by Shanghai University of Finance and Economics (Grant No.: CXJJ-2021-346). Qingqing Zong acknowledges the financial support from the National Natural Science Foundation of China (Grant No.: 71804104).

    CONFLICT OF INTEREST STATEMENT

    The authors declare no conflicts of interest.

    ETHICS STATEMENT

    Not applicable.

    APPENDIX 1

    See Figure A1.

    Figure A1. The process of machine learning.
    The specific process of machine learning is described step by step below, following the flowchart above:
    • (1)

      Data source: The first step of machine learning is to determine the data, whose quality and quantity directly determine the accuracy of the prediction. We use the data from the 1% China Population Sample Survey in 2005, with a total of 2,585,476 samples, including the name and gender of each respondent; the data are highly representative. We randomly divide all samples into two parts, a training set and a test set, which account for 90% and 10% of the total samples, respectively. The training set is used to train the prediction model and the test set is used to verify its accuracy.

    • (2)

      Feature selection: The biggest challenge of this article is that the patent data do not provide the inventors' gender. We therefore predict inventors' gender from their first names, because a first name reflects its bearer's gender to some extent. For example, we predict that a person whose first name is Jianguo (建国) is male, because the proportion of men is higher than that of women among the people called Jianguo (建国). Because the prediction is based only on first names, people with the same first name are given the same predicted gender regardless of their actual gender. Because computers cannot recognize whole Chinese names, we split all first names into single words (characters) and encode them. We remove recurring words and rare words that appear fewer than five times in all names, because rare words affect the prediction accuracy of the model. Patents whose inventors have first names containing rare words are removed because our model cannot predict their gender; such samples are very few. After removing recurring and rare words, 3453 words are left.

    • (3)

      Encoding: Because computers cannot recognize Chinese characters, we have to encode them. We use the one-hot method to represent each word. Specifically, the words are numbered in order as 1, 2, 3, …, 3453, and word j is expressed by a 3453-dimensional vector $(0, \ldots, 0, 1, 0, \ldots, 0)$, in which the jth element is 1 and the others are 0. After each word is encoded, each name is expressed by a 3453-dimensional vector obtained by summing the encoded words that the first name contains.

    • (4)

      Training: After encoding all names, we use the Bayesian algorithm to train the model. The Bayesian algorithm used in our paper is based on Bayes' theorem, $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$. We use an example to illustrate how the algorithm predicts gender from names. Suppose we want to predict the gender of San Zhang (张三); then we have to calculate $P(\text{male} \mid \text{San}) = P(\text{San} \mid \text{male})\,P(\text{male}) / P(\text{San})$. Here $P(\text{male} \mid \text{San})$ is the probability that San Zhang is male, $P(\text{San} \mid \text{male})$ is the proportion of male samples called San in our training set, $P(\text{male})$ is the proportion of men in our training set, and $P(\text{San})$ is the proportion of people whose names are San in our training set. We calculate the probability that San Zhang is male according to the above formula and predict the gender based on this probability.

    • (5)

      Estimation: After training, the accuracy of the model can be evaluated according to a series of standards. Because the model is trained on the training set, we verify its accuracy on the test set. We do not know the gender of the inventors in our regression samples, so we cannot verify the accuracy of the model on the regression samples. The accuracy of our model is 86%, which means that the predicted gender of 86% of the samples in the test set is consistent with their actual gender. This is a high level of accuracy.

    • (6)

      Application: The last step is to apply our model to our patent data.
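    The six steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: it uses a naive Bayes classifier over the characters of a first name with add-one smoothing (the smoothing is an assumption), all function names are hypothetical, and the toy data set is invented for the example apart from the name Jianguo (建国) mentioned above.

```python
import math
import random
from collections import Counter


def split_train_test(samples, test_share=0.1, seed=0):
    """Randomly split (first_name, gender) pairs into training and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


def build_vocabulary(train, min_count=5):
    """Index every character that appears at least min_count times."""
    counts = Counter(ch for name, _ in train for ch in name)
    kept = sorted(ch for ch, c in counts.items() if c >= min_count)
    return {ch: i for i, ch in enumerate(kept)}


def encode(name, vocab):
    """Sum of the one-hot vectors of a name's characters (None if any is rare)."""
    if any(ch not in vocab for ch in name):
        return None
    vec = [0] * len(vocab)
    for ch in name:
        vec[vocab[ch]] += 1
    return vec


def train_naive_bayes(train, vocab):
    """Estimate log P(gender) and log P(character | gender), add-one smoothed."""
    labels = ("male", "female")
    n_label = Counter()
    char_counts = {g: [0] * len(vocab) for g in labels}
    for name, gender in train:
        vec = encode(name, vocab)
        if vec is None:
            continue  # names with rare characters are dropped
        n_label[gender] += 1
        for i, v in enumerate(vec):
            char_counts[gender][i] += v
    total = sum(n_label.values())
    model = {}
    for g in labels:
        denom = sum(char_counts[g]) + len(vocab)
        model[g] = (math.log(n_label[g] / total),
                    [math.log((c + 1) / denom) for c in char_counts[g]])
    return model


def predict(name, vocab, model):
    """Return the gender with the highest posterior score, or None if unknown."""
    vec = encode(name, vocab)
    if vec is None:
        return None
    scores = {g: prior + sum(v * lp for v, lp in zip(vec, loglik))
              for g, (prior, loglik) in model.items()}
    return max(scores, key=scores.get)


def accuracy(test, vocab, model):
    """Share of test names (with a prediction) whose predicted gender is correct."""
    scored = [(predict(name, vocab, model), g) for name, g in test]
    scored = [(p, g) for p, g in scored if p is not None]
    return sum(p == g for p, g in scored) / len(scored)


# Toy data: invented for illustration only.
toy = ([("建国", "male")] * 30 + [("建军", "male")] * 30
       + [("秀英", "female")] * 30 + [("秀兰", "female")] * 30)
train, test = split_train_test(toy)
vocab = build_vocabulary(train)
model = train_naive_bayes(train, vocab)
```

    On this toy data the two genders share no characters, so the classifier separates them trivially; the result says nothing about the 86% accuracy reported above. A name containing a character outside the vocabulary returns `None`, which mirrors the removal of names with rare words.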

    • 1 The number of forward citations is the number of times the patent is cited by other patents.
    • 2 Detailed descriptions of machine learning are provided in Appendix 1.
    • 3 If a word appears fewer than five times in all names, we define it as a rare word. Rare words affect the prediction accuracy of the model, so we remove them.
    • 4 The prediction accuracy of our model is 86%.
    • 5 Source: https://github.com/observerss/ngender/tree/0a018e374888e70f43b91f5777de6a0ba448f940.
    • 6 A patent family is a group of documents that repeatedly publish or approve the same, or basically the same, content based on the same priority document. A large patent family not only increases the application cost but also increases the possibility of being cited.
    • 7 The number of patent backward references is the number of other patents that the patent cites.
    • 8 The number of literature references is the number of literature references that the patent cites.
    • 9 Invention and utility model patents are classified into eight sections: A, human necessities (agriculture, light industry, medicine); B, operations and transportation; C, chemistry and metallurgy; D, textiles and paper; E, fixed constructions (building, mining); F, mechanical engineering; G, physics; and H, electricity.
