Volume 169, Issue 2 pp. 385-387
LETTER TO THE EDITOR

Multivariate ordinal probit analysis in the skeletal assessment of sex

Lyle W. Konigsberg

Corresponding Author

Department of Anthropology, University of Illinois at Urbana-Champaign, Urbana, Illinois

Correspondence

Lyle Konigsberg, Department of Anthropology, University of Illinois at Urbana-Champaign, Urbana, IL 61801.

Email: [email protected]

Susan R. Frankenberg

Department of Anthropology, University of Illinois at Urbana-Champaign, Urbana, Illinois

First published: 29 March 2019

In their article, Klales, Ousley, and Vollner (2012) codify Phenice's (1969) three sexually dimorphic characters of the pubic symphysis into five-point ordinal scores. Although we see this as a valuable contribution, we have considerable issues with the statistical methods used in Klales et al. (2012). Although seven years have passed since their publication in this journal, we feel that a comment is necessary because their statistical methods have continued to be used (Kenyhercz, Klales, Stull, McCormick, & Cole, 2017; Klales, 2016; Klales & Burns, 2017; Klales & Cole, 2017; Lesciotto & Doershuk, 2017) including recently within this journal (Gómez-Valdés et al., 2017).

Klales and co-workers (Klales, 2016:297; Klales & Burns, 2017:750; Klales et al., 2012:109) erroneously refer to their analyses as using “ordinal logistic regression,” a technique also known by the shorter name of “ordered logit.” Ordered logit, or the similar ordered probit, is used when there is an ordinal dependent variable, such as in Equation 8 of Konigsberg and Hens (1998). What Klales and co-workers used is a logistic regression of the binary variable sex onto scores for the Phenice traits (see Equations 4 and 5 of Konigsberg & Hens, 1998). This is not simply a semantic difference. It reverses the philosophy of “transition analysis” (Boldsen, Milner, Konigsberg, & Wood, 2002; Milner & Boldsen, 2012), where skeletal characteristics are considered the dependent variables and some demographic measure is the independent variable. Thus, Klales et al. should have used ordered probit or logit regression of the Phenice characteristics onto sex, treating the Phenice characteristics as ordinal categorical dependent variables and sex as the binary independent variable. This modeling makes logical sense, as the Phenice indicators depend on the sex of the individual, rather than the sex of the individual depending on the indicators.
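The two model directions can be contrasted compactly. The following is an illustrative sketch in generic notation, not a reproduction of the equations in Konigsberg and Hens (1998). The symbols are assumptions introduced for this illustration: s is sex (coded 0 for female and 1 for male), x_j is the ordinal score on trait j, the betas are logistic coefficients, tau_{jk} is the k-th threshold for trait j, and mu_j is the male mean on the latent scale (the female mean being fixed at zero).

```latex
% Logistic regression of sex on the trait scores (sex is the dependent variable)
\Pr(s = 1 \mid x_1, \dots, x_p)
  = \frac{1}{1 + \exp\left\{-\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_j\right)\right\}}

% Ordered probit of trait j on sex (the trait score is the dependent variable)
\Pr(x_j \le k \mid s) = \Phi\left(\tau_{jk} - \mu_j s\right), \qquad k = 1, \dots, 4
```

In the multivariate ordinal probit the latent variables for the traits are additionally allowed to be correlated within sex, so the probability of a joint score pattern becomes a multivariate normal rectangle probability rather than a product of univariate terms.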

With this said, we first analyze a large dataset in the backwards fashion of Klales and co-workers. The data are from 774 individuals in the Terry Collection (Hunt & Albanese, 2005). This sample is a subset of a slightly larger dataset we previously analyzed using methods that allowed for missing data (Konigsberg, Herrmann, & Wescott, 2002). The three Phenice characteristics were scored on a five-point ordinal scale of “F,” “F?,” “?,” “M?,” and “M” which we numbered from one to five. Data collection was funded by National Science Foundation grant BCS97-27386 awarded to the first author. Dr. Nicholas P. Herrmann scored 669 of the pubic bones, the first author scored 42 bones, and Dr. Daniel J. Wescott scored 33 bones. Given that one observer scored most of the bones and that the three observers were working at the same time, we suspect that there is little observer error. We do not suggest that our scoring was the same as Klales and co-workers', because we did not have reference to their ordinal scale.

Klales et al. did not consider model selection in their original analysis, but we have used the Bayesian Information Criterion (BIC) (Schwarz, 1978) to compare models. All of our analyses were done in R (R Core Team, 2018), and the scripts and data are available from http://faculty.las.illinois.edu/lylek/AJPA2019/. The models we considered were the saturated model with all three indicators, the three models with two indicators, the three models with one indicator, and the null model with no indicators. The model with the lowest BIC was the one that included only the ventral arc (VA) and the subpubic concavity (SPC). Applying this model, 339 of the 347 actual females were correctly identified, as were 420 of the 427 actual males, for a percent correct classification of 98.06%. Klales et al. (2012) argued that the percent correct classification in their study was biased upward because the training and test samples were one and the same. They consequently applied a leave-one-out (LOO) cross-validation. We did this as well and found exactly the same classification results as in the original biased case of testing the method on the training sample. This is as one would expect for a large sample with many ties. There were 291 females with scores of one and one for the two traits and 372 males with scores of five and five. In place of the LOO method, we also used the 632+ method (following some of the code in “errorest_632plus” from the library “sortinghat” [Ramey, 2013]) described in Efron and Tibshirani (1997). We used 50 LOO bootstraps as suggested in Efron and Tibshirani (1997) and ultimately found an estimated correct identification percentage of 97.98%. As this is only 0.08% less than the biased rate of 98.06%, we give no further consideration to potential bias from reclassification of training cases.
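For readers unfamiliar with the 632+ estimator, the weighting it applies can be sketched in a few lines. This is an illustration in Python rather than the R code used for the analyses, and it is a simplified form of the estimator: the relative overfitting rate is simply clipped to the interval from 0 to 1. The function names are hypothetical, and the leave-one-out bootstrap error passed in at the end is a made-up placeholder, because that intermediate quantity is not reported in the text.

```python
def no_information_rate(n_actual_female, n_predicted_female, n_total):
    """Error rate expected if predictions were independent of the true sex."""
    p = n_actual_female / n_total      # observed proportion of females
    q = n_predicted_female / n_total   # proportion classified as female
    return p * (1.0 - q) + (1.0 - p) * q


def err_632_plus(apparent_err, loo_boot_err, gamma):
    """Blend the apparent and leave-one-out bootstrap error rates.

    apparent_err : error when the training sample is reclassified
    loo_boot_err : leave-one-out bootstrap error
    gamma        : no-information error rate
    """
    if gamma > apparent_err:
        r = (loo_boot_err - apparent_err) / (gamma - apparent_err)
    else:
        r = 0.0
    r = min(max(r, 0.0), 1.0)          # relative overfitting rate in [0, 1]
    w = 0.632 / (1.0 - 0.368 * r)      # weight runs from 0.632 up to 1
    return (1.0 - w) * apparent_err + w * loo_boot_err


# Counts taken from the text: 15 of 774 individuals misclassified on
# resubstitution; 347 actual females; 339 + 7 = 346 classified as female.
apparent = 15 / 774
gamma = no_information_rate(347, 346, 774)

# HYPOTHETICAL leave-one-out bootstrap error, for illustration only.
hypothetical_loo_boot = 0.0205
print(round(100 * (1 - err_632_plus(apparent, hypothetical_loo_boot, gamma)), 2))
```

When the bootstrap error equals the apparent error there is no evidence of overfitting and the estimate reduces to the apparent error; when it reaches the no-information rate the weight goes to one and the apparent error is ignored.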

We then fit a bivariate ordinal probit model to the ventral arc and subpubic concavity traits. We fit the model using the package “mvord” (Hirk, Hornik, Vana, & Genz, 2018), which uses composite likelihood estimation (Varin, Reid, & Firth, 2011). The estimated threshold parameters were 1.025, 1.807, 1.930, and 2.071 for VA and 1.606, 1.976, 2.000, and 2.185 for SPC. The male means for these two traits were 3.278 and 3.847 (female means were fixed at 0.0 and 0.0), and the residual polychoric correlation was 0.795 (the original polychoric correlation was 0.988). Using numerical bivariate integration from the R package “mvtnorm” (Genz et al., 2014), we can find the posterior probability of being female (or male) after assuming some prior probability. For the prior probability of being female, we used both a prior probability of 0.5 and the actual prior of 0.439, which made no difference in our results. As in the logistic regression, we correctly classified 339 of the 347 actual females and 420 of the 427 actual males.
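The posterior calculation described above can be sketched as follows. This is an illustration in Python using only the standard library, not the “mvtnorm” code used for the reported results. It plugs in the thresholds, male means, and residual correlation given in the text, and it assumes unit latent variances and the same residual correlation in both sexes. The bivariate rectangle probability is obtained by Simpson's rule over one latent variable, and all function names are hypothetical.

```python
import math

TAU_VA = [1.025, 1.807, 1.930, 2.071]    # thresholds reported for VA
TAU_SPC = [1.606, 1.976, 2.000, 2.185]   # thresholds reported for SPC
MALE_MEANS = (3.278, 3.847)              # female means are fixed at 0
RHO = 0.795                              # residual polychoric correlation


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def limits(score, thresholds):
    """Latent-scale interval for an ordinal score coded 1..5."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return cuts[score - 1], cuts[score]


def rect_prob(lo1, hi1, lo2, hi2, rho, n=2000):
    """P(lo1 < Z1 < hi1, lo2 < Z2 < hi2) for a standard bivariate normal.

    Integrates the density of Z1 times the conditional probability for Z2
    with Simpson's rule (n must be even); infinite limits are cut at +/-10.
    """
    a, b = max(lo1, -10.0), min(hi1, 10.0)
    if a >= b:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def g(z):
        upper = norm_cdf((hi2 - rho * z) / s)
        lower = norm_cdf((lo2 - rho * z) / s)
        return norm_pdf(z) * (upper - lower)

    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


def likelihood(va, spc, means):
    """Probability of the score pair given the latent means for one sex."""
    lo1, hi1 = limits(va, TAU_VA)
    lo2, hi2 = limits(spc, TAU_SPC)
    m1, m2 = means
    return rect_prob(lo1 - m1, hi1 - m1, lo2 - m2, hi2 - m2, RHO)


def posterior_female(va, spc, prior_female=0.5):
    lf = likelihood(va, spc, (0.0, 0.0))
    lm = likelihood(va, spc, MALE_MEANS)
    return prior_female * lf / (prior_female * lf + (1.0 - prior_female) * lm)


for pair in [(1, 1), (3, 3), (5, 5)]:
    print(pair, round(posterior_female(*pair), 4))
```

Changing `prior_female` shows how strongly the likelihood dominates the prior for these highly dimorphic traits.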

Given that our results from bivariate ordinal probit are identical to those from the logistic regression, why use the computationally more complex multivariate ordinal probit? First, the probit, unlike the logistic regression, reflects the actual causal relationship between sex and indicators. Second, the multivariate ordinal probit allows for maximum likelihood estimation of the proportion of females in a sample, which in turn enables us to calculate highest posterior densities (HPD) and thus to evaluate the risk of misclassification. We found the proportion of females in the sample using numerical maximization of the log-likelihood given in Equation 10 of Konigsberg and Hens (1998). The 95% HPD for this estimate can be found by integrating the normed likelihood. To demonstrate this, we used a parametric bootstrap (i.e., Monte Carlo simulation from our estimated parameters) to form a sample with 25 females and 975 males for a proportion of females of 0.0250. We then found the maximum likelihood estimate of the proportion of females at 0.0251 with a 95% HPD of 0.0149 to 0.0382. Using 0.0251 as an informative prior for the proportion of females, 21 of the 25 females were correctly classified and 971 of the 975 males were correctly classified for a percent correct classification of 99.2%. By contrast, using the original 774 individuals as a training sample for the logistic regression and using these results to predict the sex for the 1,000 simulated individuals led to one of the 25 simulated females being misclassified and 34 of the 975 simulated males being misclassified. This gives an estimated proportion of females of 0.058, which is outside of the 95% HPD from the correct analysis.
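The estimation of the proportion of females and its HPD can be illustrated with a reduced example. The sketch below is in Python, not the R code used for the reported results, and it makes several simplifying assumptions: it uses a single trait (VA, with the thresholds and male mean given earlier) instead of two correlated traits, it treats the sex-specific category probabilities as known, it evaluates the mixture log-likelihood on a grid rather than by numerical maximization, and it reads the "normed likelihood" as the likelihood rescaled to sum to one over that grid. Function names and the random seed are hypothetical, and the numbers it prints are not expected to match those in the text.

```python
import math
import random

TAU_VA = [1.025, 1.807, 1.930, 2.071]
MALE_MEAN_VA = 3.278


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def category_probs(thresholds, mean):
    """Probabilities of scores 1..5 for a latent normal with the given mean."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return [norm_cdf(cuts[k + 1] - mean) - norm_cdf(cuts[k] - mean)
            for k in range(len(cuts) - 1)]


P_F = category_probs(TAU_VA, 0.0)
P_M = category_probs(TAU_VA, MALE_MEAN_VA)


def simulate_counts(n, prop_female, rng):
    """Parametric bootstrap: draw sex, then a score, and tally the scores."""
    counts = [0] * 5
    for _ in range(n):
        probs = P_F if rng.random() < prop_female else P_M
        counts[rng.choices(range(5), weights=probs)[0]] += 1
    return counts


def log_lik(p, counts):
    """Mixture log-likelihood for the proportion of females p."""
    return sum(c * math.log(p * f + (1.0 - p) * m)
               for c, f, m in zip(counts, P_F, P_M) if c)


def mle_and_hpd(counts, mass=0.95, grid_size=2000):
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    ll = [log_lik(p, counts) for p in grid]
    top = max(ll)
    dens = [math.exp(v - top) for v in ll]   # rescaled to avoid underflow
    total = sum(dens)
    # Add grid points from highest to lowest density until the mass is reached.
    order = sorted(range(grid_size), key=lambda i: dens[i], reverse=True)
    kept, acc = [], 0.0
    for i in order:
        kept.append(i)
        acc += dens[i] / total
        if acc >= mass:
            break
    return grid[order[0]], grid[min(kept)], grid[max(kept)]


counts = simulate_counts(1000, 0.025, random.Random(1))
mle, lower, upper = mle_and_hpd(counts)
print(counts, mle, lower, upper)
```

Because the mixture log-likelihood is concave in the proportion of females, the retained grid points form a single interval, so reporting the smallest and largest retained values is adequate here.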

The contrast between multivariate ordinal probit and logistic regression becomes even starker if we consider a simulation example where the traits under consideration are less sexually dimorphic. For this simulation, we considered two traits where the thresholds for each were at 0.0, 0.25, 0.35, and 0.6, the male means were both at 0.6, and the residual correlation between the two traits was 0.5. We formed a training sample of 500 females and 500 males and a test sample of 680 females and 320 males. Logistic regression in this simulation produced a correct classification of 64.2% on the test sample, whereas multivariate ordinal probit with estimation of the proportion of females produced a correct classification of 72.1%. The logistic regression produced an estimated proportion of females of 0.558, whereas multivariate ordinal probit produced a comparable proportion of 0.6671 (close to the true value of 0.680) with a 95% HPD of 0.6131 to 0.8024.
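Why the prior matters for weakly dimorphic traits can be seen from a small calculation. The sketch below is an illustration in Python, not part of the reported analysis. It uses only one of the two simulated traits, with the thresholds and male mean given above, and compares the posterior probability of being female under a 1:1 prior with that under the true test-sample proportion of 0.68. Function names are hypothetical.

```python
import math

THRESHOLDS = [0.0, 0.25, 0.35, 0.6]   # thresholds used in the simulation
MALE_MEAN = 0.6                       # female mean is fixed at 0


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def category_probs(thresholds, mean):
    """Probabilities of scores 1..5 for a latent normal with the given mean."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return [norm_cdf(cuts[k + 1] - mean) - norm_cdf(cuts[k] - mean)
            for k in range(len(cuts) - 1)]


P_F = category_probs(THRESHOLDS, 0.0)
P_M = category_probs(THRESHOLDS, MALE_MEAN)


def posterior_female(score, prior_female):
    """Bayes' theorem for a single ordinal score coded 1..5."""
    f, m = P_F[score - 1], P_M[score - 1]
    return prior_female * f / (prior_female * f + (1.0 - prior_female) * m)


for score in range(1, 6):
    print(score,
          round(posterior_female(score, 0.5), 3),
          round(posterior_female(score, 0.68), 3))
```

For this single trait, an individual with the most male-like score has a posterior probability of being female below 0.5 under the 1:1 prior but above 0.5 under the 0.68 prior, so the choice of prior alone changes the classification.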

Although these numerical acrobatics might seem unnecessary for the Phenice characteristics, given that even incorrect methods can give correct answers, we advocate for adopting the more complex and logically correct methods for two principal reasons. First, for traits that are less sexually dimorphic than the Phenice characteristics, the differences in results from logistic regression versus multivariate ordinal probit become more apparent, especially when the prior differs from 1:1. This is because the likelihood no longer dominates the prior. Second, multivariate ordinal probit can handle missing data in a straightforward manner, whereas logistic regression cannot. The trivariate ordinal probit estimated using “mvord” in R produces parameter values for VA and SPC that differ only in the hundredths decimal place and adds threshold values for the medial aspect of the ischio-pubic ramus (MA) of 1.419, 1.674, 1.861, and 2.009 with an associated mean for males of 3.670. The additional residual correlations are 0.742 between VA and MA and 0.919 between SPC and MA. From these trivariate parameters, one can find posterior probabilities under any missing data pattern by placing limits of negative infinity and infinity for any missing variables. By contrast, logistic regression requires finding seven different equations to cover all the missing data patterns. The problem becomes even more acute as the number of traits increases. For four traits, there are 15 patterns of missing data requiring 15 logistic regressions. For five traits, as in Konigsberg and Hens (1998), there are 31 patterns of missing data. We therefore see advantages in using multivariate ordinal probit over logistic regression, as was the case for Konigsberg and Hens in 1998.
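The counts of missing data patterns and the treatment of a missing score can be checked with a short sketch. This is an illustration in Python, not the code used for the analyses. It enumerates the non-empty sets of observed traits, each of which would need its own logistic regression, and shows how a missing score (represented here by `None`, an assumption of this example) maps to integration limits of negative infinity and infinity. Function names and the generic trait labels are hypothetical; the MA thresholds are those reported above.

```python
import math
from itertools import combinations

TAU_MA = [1.419, 1.674, 1.861, 2.009]   # thresholds reported for MA


def observed_patterns(traits):
    """All non-empty subsets of traits that could be the observed ones."""
    return [subset
            for size in range(1, len(traits) + 1)
            for subset in combinations(traits, size)]


def integration_limits(score, thresholds):
    """Latent-scale limits for a score coded 1..5, or for a missing score."""
    if score is None:
        return (-math.inf, math.inf)     # missing: integrate the trait out
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return cuts[score - 1], cuts[score]


for k in (3, 4, 5):
    labels = ["trait%d" % (i + 1) for i in range(k)]
    print(k, "traits:", len(observed_patterns(labels)), "patterns")

print(integration_limits(2, TAU_MA))     # observed score of 2
print(integration_limits(None, TAU_MA))  # missing score
```

With the probit parameters in hand, the same multivariate normal integration serves every pattern, because only the limits change.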
