Inferring gender from first names: Comparing the accuracy of Genderize, Gender API, and the gender R package on authors of diverse nationality
Author summary
Gender disparities in academia have prompted researchers to investigate gender gaps in professorship roles and publication authorship. Of particular concern are the gender gaps in cancer clinical trial authorship. Methodologies that evaluate these disparities, for example by determining the gender ratios of academic departments or of publishing authors in a discipline, often rely on tools that infer gender from first names. However, researchers must choose among gender-prediction tools that vary in accuracy, ease of use, and cost. We evaluated the binary gender prediction accuracy of Genderize, Gender API, and the gender R package on a gold-standard dataset of 32,968 clinical trialists from around the world. Genderize and Gender API are commercially available, while the gender R package is free and open source. We found that Genderize and Gender API were more accurate than the gende