Volume 116, Issue 3 e2328
RESEARCH ARTICLE

A search for factors associated with reduced carbohydrate intake and NTD risk in two population-based studies

Gary M. Shaw (Corresponding Author)

Department of Pediatrics, Division of Neonatology, Stanford University School of Medicine, Stanford, California, USA

Correspondence

Gary M. Shaw, Department of Pediatrics, Stanford University, 453 Quarry Road, Palo Alto, CA 94304, USA.

Wei Yang

Department of Pediatrics, Division of Neonatology, Stanford University School of Medicine, Stanford, California, USA

Kari A. Weber

Department of Epidemiology, Fay W. Boozman College of Public Health, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA

Andrew F. Olshan

Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA

Tania A. Desrosiers

Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA

The National Birth Defects Prevention Study

First published: 07 March 2024

Abstract

Background

Two population-based case–control studies have reported an increased risk of neural tube defect (NTD)-affected pregnancies among women with a low carbohydrate diet in the periconceptional period. Given that only two studies have investigated this association, it is unclear to what degree the findings could be affected by residual confounding. Here, we further interrogated both studies with the objective of identifying, from a much larger set of candidate factors, any that might explain the association.

Methods

By employing a machine learning algorithm (random forest), we investigated a baseline set of over 200 variables. These analyses produced the top 10 variables in each data set for cases and controls that predicted periconceptional low carbohydrate intake.

Results

Examining those prediction variables with logistic regression modeling, we did not observe any particular variable that substantially contributed to the NTD-low carbohydrate association in either data set.

Conclusions

If there are underlying factors that explain the association, our findings suggest that none of the 200+ variables we examined were sufficiently correlated with what that true explanatory exposure may be. Alternatively, our findings may suggest that there are other unidentified factor(s) at play, or the association observed in two independent data sets is directly related to low carbohydrate intake.

1 INTRODUCTION

Two recent studies observed that women's low carbohydrate diet (≤5th percentile) in the periconceptional period was associated with increased odds of neural tube defect (NTD)-affected pregnancies. Desrosiers et al. (2018) observed that women's low carbohydrate diet in the year before conception was associated with a modest increased risk (adjusted OR = 1.3) of NTD-affected pregnancies. Their investigation was conducted in the National Birth Defects Prevention Study (NBDPS) corresponding to the years following US mandatory folic acid fortification of grains. Their novel findings, however, could not disentangle whether the increased NTD risk observed with restricted carbohydrate intake was a consequence of lower folate intake or other factors related to low carbohydrate intake because in this study period, folic acid and carbohydrate measures would derive from similar foods (e.g., enriched grains such as breads and pastas). This association was recapitulated in data from an NTD study conducted in California (CA-NTD Study) prior to folic acid fortification showing a 2-fold increased NTD risk among women reporting low carbohydrate intake (Shaw & Yang, 2019). This subsequent finding suggested that the observed increase in NTD risk was attributable, at least in part, to something other than reduced folic acid intake.

If the low carbohydrate-NTD association is explained (at least in part) by something other than folic acid intake, it is unclear what those factor(s) could be. Each of the original studies investigated only a limited set of suspected confounding factors based on substantive inferences conceptualized using directed acyclic graphs (e.g., glycemic index, elevated body mass index), with no factor being identified as a major source of confounding. However, there may be residual confounding by other factor(s) not considered in the original studies. Given the unknown mechanism underlying the observed association between low carbohydrate intake and NTDs, and the unknown degree to which that association may be due to residual confounding, our objective was to use machine learning to further interrogate both NBDPS and CA-NTD Study data to investigate over 200 additional, previously unconsidered nutritional, demographic, and behavioral factors that might potentially explain the observed association.

2 METHODS

2.1 NBDPS data

Details of the population-based case-control National Birth Defects Prevention Study (NBDPS) conducted at nine US centers can be found elsewhere (Reefhuis et al., 2015). Briefly, NBDPS included data from women with pregnancies affected with selected birth defects as well as women who had pregnancies without birth defects (controls) corresponding to estimated dates of delivery between October 1997 and December 2011. Cases included for this analysis were livebirths, stillbirths, and terminations with an NTD, specifically spina bifida or anencephaly. These phenotypes were ascertained prenatally through 1-year after delivery employing criteria of diagnostic descriptions based on physical examination, surgery, imaging, or autopsy. Further diagnosis review for eligibility was also conducted by clinical geneticists (Rasmussen et al., 2003). NTDs occurring in conjunction with chromosomal abnormalities or single gene disorders were not included. Controls were selected randomly from the same area and time period as cases.

Interviews were conducted with women with 1959 NTD-affected pregnancies (66%) and 11,829 women who delivered control infants (64%). The computer-assisted-telephone interview was structured, approximately an hour in duration, conducted in English or Spanish, and completed within 6 weeks to 24 months from the woman's estimated delivery date. Women were queried on demographic factors, their medical history, and myriad lifestyle and behavioral factors. Usual consumption of foods in the year before pregnancy was obtained using a previously validated, modified 58-item food frequency questionnaire (Willett et al., 1987). Additional questions elicited information on cereal consumption for the 3 months prior to conception. Intake amounts of dietary nutrients including carbohydrates from food and nonalcoholic beverages (e.g., sodas and juices) were estimated using the USDA National Nutrient Database for Standard Reference, Version 27, which contains values for nutrients and for numerous food items (Pehrsson et al., 2015; USDA Agricultural Research Service, 2007).

Among interviewed participants we further restricted to (i) cases (n = 1870) and controls (n = 11,452) that were singletons; (ii) cases (n = 1831) and controls (n = 11,370) where the pregnancy was without pregestational diabetes; (iii) cases (n = 1730) and controls (n = 10,691) with energy intake ranging from 500 to 5000 kcal and missing fewer than two food frequency questionnaire items as a quality control; and (iv) cases (n = 1726) and controls (n = 10,649) where interview data had fewer than 10% missing questions. These 1726 cases and 10,649 controls served as the analytic base, with the case group phenotype further classified in some analyses as anencephaly (n = 561) or spina bifida (n = 1165). This analytic sample resembled that in the Desrosiers et al. (2018) study with the exception that here we also included pregnancies with the date of conception before April 1, 1998 as well as cases and controls that derived from the New Jersey center early in the conduct of the NBDPS. These additions to the analytic sample were considered appropriate as they derived from the larger NBDPS study and were appropriate for our research queries.

2.2 CA-NTD study data

Details of this population-based case–control study can be found elsewhere (Shaw et al., 1995). Briefly, 653 infants or fetuses with an NTD (cases) were ascertained by reviewing medical records, including ultrasonography, at all hospitals and clinics for those infants/fetuses delivered in select California counties between February 1989 and January 1991. Singleton, live born infants and fetuses with NTDs, including those prenatally diagnosed and electively terminated, were considered as cases. Controls (n = 644) were selected randomly from each area hospital in proportion to the hospital's estimated contribution to the total population of singleton infants born alive in the same time period as cases. We excluded women who only spoke languages other than English or Spanish, or who had a previous NTD-affected pregnancy, leaving 613 cases and 611 controls.

Interviews were conducted in person with mothers of 538 (88%) cases and of 539 (88%) controls an average of 5 months from the actual or projected date of term delivery. Interviews obtained information on pregnancy and lifestyle factors including vitamin/mineral supplements women used in the period 3 months before conception and in each trimester of pregnancy. Information about average daily intake of food items was obtained by administering a well-established, 100-item semiquantitative food frequency questionnaire (Block et al., 1986). Women were asked to estimate usual frequency and portion size of food items consumed during the 3 months before conception. Of the 1077 women who completed an interviewer-administered questionnaire, 1007 completed a food frequency questionnaire, and 916 (of 454 cases and 462 controls) contained suitable data based on error checks built into the associated analytic software. We further restricted analyses to 449 case and 458 control women without pregestational diabetes. Cases were further classified as anencephaly (n = 175) or spina bifida (n = 253) for some analyses. This is the same analytic sample as used in Shaw and Yang (2019).

2.3 Analyses

We defined low carbohydrate diet as intake in the 5th percentile or lower among control women in each data set separately, corresponding to an estimated carbohydrate intake of approximately 95 g/day in NBDPS data and 122 g/day in CA-NTD Study data. This percentile cutoff was employed in the two previous studies that demonstrated the association between restricted carbohydrate intake and NTDs (Desrosiers et al., 2018; Shaw & Yang, 2019).
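The cutoff rule described above can be sketched in a few lines. This is an illustrative Python sketch rather than the authors' code (the analyses were run in R and SAS); the function names, the nearest-rank percentile convention, and the toy intake values are all assumptions made for the example.

```python
import math

def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile using the nearest-rank method (an assumed convention)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def flag_low_carbohydrate(case_intakes, control_intakes, pct=5):
    """Derive the cutoff from controls only, then apply it to both groups."""
    cutoff = percentile_nearest_rank(control_intakes, pct)
    low_cases = [g <= cutoff for g in case_intakes]
    low_controls = [g <= cutoff for g in control_intakes]
    return cutoff, low_cases, low_controls

# Toy carbohydrate intakes in g/day (hypothetical values, not study data).
controls = [90 + 10 * i for i in range(20)]   # 90, 100, ..., 280
cases = [85, 150, 240]
cutoff, low_cases, low_controls = flag_low_carbohydrate(cases, controls)
```

The point of the design is that the threshold is estimated in controls alone, so the exposure definition does not depend on the case distribution.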

Using a random forest approach, we assessed >200 variables from interview responses. Random forest, a data mining algorithm, produces a set of decision trees using random subsets of the data and combines them to produce a mean prediction model; it also ranks predictors by variable importance (Strobl et al., 2009). This method accounts for interactions and nonlinear associations among many factors simultaneously to determine their prediction of a variable of interest (i.e., low carbohydrate) (Strobl et al., 2009). Importance for each potential predictor was estimated using the “varimp” function (party package in R software) to obtain the mean decrease in accuracy (MDA) metric, specifying “ntree = 2500” and “mtry = 15” as the number of trees and the number of predictor variables randomly selected as candidates at each split, respectively. The MDA is the decrease in model accuracy observed when the values of a variable are randomly permuted, so the more a permutation decreases the accuracy, the more important that variable is to the model. As an example, the MDA of irrelevant variables should vary randomly around zero since permuting them does not decrease the accuracy (Strobl et al., 2009). Accordingly, variables were considered “important” when their MDA fell outside the magnitude of the lowest negative value in either the positive or negative direction.
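To make the MDA idea concrete, the following is a minimal, generic sketch of permutation importance in Python. It is not the conditional-inference forest or the “varimp” routine used in the study: the fitted model is replaced by a hypothetical stand-in function (`toy_model`), the data are simulated, and accuracy is computed on the full toy data set rather than on out-of-bag samples.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the observed label."""
    hits = sum(predict(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def mean_decrease_accuracy(predict, rows, labels, n_repeats=20, seed=0):
    """Permutation importance: average drop in accuracy after shuffling one column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between column j and the outcome
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy data: column 0 determines the label, column 1 is pure noise.
rng = random.Random(1)
rows = [[rng.random(), rng.random()] for _ in range(200)]
labels = [int(r[0] < 0.3) for r in rows]

def toy_model(row):
    """Stand-in for a fitted classifier; it uses only column 0."""
    return int(row[0] < 0.3)

mda = mean_decrease_accuracy(toy_model, rows, labels)
```

In this toy setting the noise column has an MDA of exactly zero because the stand-in model ignores it; in a fitted forest, irrelevant variables would instead fluctuate around zero, which is why the lowest negative value serves as a noise benchmark.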

The >200 variables assessed in each study represented demographic (e.g., maternal age and education), behavioral (e.g., cigarette smoking and cannabis use), medical (e.g., epilepsy and diabetes), and dietary factors (e.g., supplements used and food intake) (see Appendix 1 for details). Each was assessed for its “prediction” of low carbohydrate intake separately in the case and control groups. This assessment in both cases and controls ensured that prediction variables in either group were identified and included in analyses. Only variables derived from interview questions that had a frequency >0.1% in the NBDPS and >1% in the CA-NTD Study (i.e., to allow about 10 participants answering “Yes” or reporting being exposed in each study) were considered for further analyses. Values for all time-varying variables were considered for the period from 1 month before through 2 months after conception (NBDPS) or 3 months before and after conception (CA-NTD Study) to capture the etiologically relevant window for neural tube development.

The random forest algorithm provided a ranked list of variables that “predict” low carbohydrate diet in NTD cases as well as in controls. We performed additional analyses using the top 10 variables (in case or control groups) identified as predicting low carbohydrate intake (selecting the top 20 did not offer further information).

The association (ORs and 95% CIs) between low carbohydrate intake and NTDs overall, as well as spina bifida and anencephaly alone, was estimated with logistic regression separately for each data set. To explore what might explain the associations observed between low carbohydrate intake and NTDs, we calculated effect measure estimates stratified on each of the top predictor variables. Stratification variables measured on a continuous scale were categorized as low (≤25th percentile of the reported within-sample distribution) versus not low (>25th percentile). Stratified ORs were explored for heterogeneity (Wald Chi-squared test) by adding an interaction term between low carbohydrate and each predictor in logistic regression models.
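The heterogeneity test can be illustrated with a short Python sketch. It assumes a logistic model containing only low carbohydrate intake, one binary stratifier, and their product term, with no other covariates; in that saturated case the interaction coefficient equals the difference between the stratum-specific log ORs, and its Wald variance is the sum of the reciprocals of the eight cell counts. The function names and the counts are hypothetical and are not taken from the study.

```python
import math

def log_or_and_variance(a, b, c, d):
    """Log odds ratio and its Wald variance for one 2x2 table.

    a, b = exposed cases, exposed controls; c, d = unexposed cases, unexposed controls.
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def wald_interaction_test(stratum_low, stratum_not_low):
    """Wald chi-squared test (1 df) that two stratum-specific ORs are equal."""
    log_or1, var1 = log_or_and_variance(*stratum_low)
    log_or0, var0 = log_or_and_variance(*stratum_not_low)
    beta = log_or1 - log_or0                    # interaction coefficient (log-odds scale)
    chi2 = beta ** 2 / (var1 + var0)
    p_value = math.erfc(math.sqrt(chi2 / 2))    # upper tail of chi-squared with 1 df
    return math.exp(log_or1), math.exp(log_or0), chi2, p_value

# Hypothetical counts: (low-carb cases, low-carb controls, not-low cases, not-low controls)
or_low, or_not_low, chi2, p = wald_interaction_test((30, 100, 300, 1500),
                                                    (20, 100, 400, 2000))
```

A large p-value here indicates little evidence that the low carbohydrate OR differs between strata; small cell counts inflate the variance and reduce the power of the test.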

Random forest analyses were performed in R software (version 4.1.3) using the party package. All other analyses were performed using SAS version 9.4 (SAS Institute, Cary, NC). Analytic activities associated with this project were approved by the institutional review boards of each collaborating center as well as the California Department of Public Health for both studies.

3 RESULTS

3.1 NBDPS data

The random forest analysis yielded the ranked variables for all NTD cases, anencephaly, spina bifida, and controls shown in Figure 1. The top 10 variables were nearly identical between all NTD cases and controls yielding a combined list of 11 variables that were subjected to further analysis. All 11 variables were related to dietary intake: magnesium, copper, thiamin, glycemic index, caffeine from soda, vitamin B6, iron, diet quality index, dietary folate equivalents, vitamin C, and riboflavin.

FIGURE 1. Random forest analysis of variables predicting low carbohydrate intake among NTD cases and controls, NBDPS 1997–2011. Shown are random forest analyses for NTDs (upper-left, accuracy = 96.3%, sensitivity = 40.0%, specificity = 100%), anencephaly (upper-right, accuracy = 94.5%, sensitivity = 8.8%, specificity = 100%), spina bifida (lower-left, accuracy = 96.2%, sensitivity = 38.0%, specificity = 100%), and controls (lower-right, accuracy = 97.5%, sensitivity = 50.4%, specificity = 100%). The x-axis is the value of mean decrease in accuracy and the y-axis is a list of the top 10 predictors. Of note, random forest analysis identifies the relative importance of predictive variables but does not indicate magnitude or direction of potential associations with NTDs.
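The combination of high accuracy with modest sensitivity reported in the caption is expected when the predicted outcome is rare (low carbohydrate intake is defined at the 5th percentile). The Python sketch below shows the arithmetic for the control panel. The group sizes (532 low and 10,117 not low) come from Table 1, but the split of the 532 into 268 correctly and 264 incorrectly classified is an assumption back-calculated from the reported 50.4% sensitivity, not a count given in the article.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

# Control panel: 532 low-carbohydrate and 10,117 not-low controls (Table 1).
# tp=268 is an assumption back-calculated from the reported 50.4% sensitivity;
# fp=0 corresponds to the reported 100% specificity.
acc, sens, spec = classification_metrics(tp=268, fn=532 - 268, fp=0, tn=10117)
```

Under these assumed counts, nearly all of the accuracy comes from the large not-low group, so accuracy alone says little about how well low carbohydrate intake itself is predicted.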

The association between low carbohydrate intake and NTDs overall, as well as spina bifida and anencephaly alone, is shown in Table 1. The OR was 1.2 for each grouping, with 95% CIs of 1.0–1.5 for NTDs overall, 1.0–1.6 for spina bifida, and 0.9–1.9 for anencephaly. The stratified ORs and 95% CIs between low versus not low carbohydrate intake and NTDs for each stratum of each of the top ranked variables (from either cases or controls) are presented in Table 2. ORs for anencephaly and spina bifida are presented in Tables S1 and S2, respectively. We did not observe much evidence for heterogeneity (statistical precision may have been low for some comparisons) in the overall OR of 1.2 for any of the “prediction” variables with the possible exception of glycemic index. Low versus not low strata of glycemic index had some statistical evidence of heterogeneity for the association between low carbohydrate and anencephaly (Table S1). The direction of this finding indicated that low glycemic index combined with low carbohydrate was associated with the highest OR. This may have arisen by chance owing to the number of comparisons made; it is nevertheless opposite to what we would expect a priori, namely an increase rather than a decrease in odds with increasing glycemic index values (Shaw et al., 2003).

TABLE 1. Odds ratios for NTDs overall, spina bifida, and anencephaly associated with periconceptional low carbohydrate intake, NBDPS 1997–2011.
Case type | Carbohydrate intake [a] | Cases | Controls | Odds ratio (95% CI)
All NTD | Low | 105 | 532 | 1.2 (1.0, 1.5)
All NTD | Not low | 1621 | 10,117 | Referent
Anencephaly | Low | 34 | 532 | 1.2 (0.9, 1.9)
Anencephaly | Not low | 527 | 10,117 | Referent
Spina bifida | Low | 71 | 532 | 1.2 (1.0, 1.6)
Spina bifida | Not low | 1094 | 10,117 | Referent
  • [a] Low intake defined as carbohydrate intake ≤5th percentile, that is, approximately 95 g/day, determined among controls; not low intake considered >95 g/day.
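As a worked check of the “All NTD” row of Table 1, the following Python sketch computes a crude odds ratio and Wald 95% confidence interval from the four cell counts. It assumes the tabulated OR is unadjusted (in which case a logistic regression with a single binary exposure gives the same estimate); the function name is illustrative and the study itself used SAS.

```python
import math

def odds_ratio_wald_ci(exp_cases, exp_controls, unexp_cases, unexp_controls, z=1.96):
    """Crude odds ratio with a Wald 95% confidence interval from a 2x2 table."""
    odds_ratio = (exp_cases * unexp_controls) / (exp_controls * unexp_cases)
    se_log_or = math.sqrt(1 / exp_cases + 1 / exp_controls
                          + 1 / unexp_cases + 1 / unexp_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# "All NTD" row of Table 1: 105 low-carbohydrate cases, 532 low-carbohydrate controls,
# 1621 not-low cases, 10,117 not-low controls.
or_all, lo_all, hi_all = odds_ratio_wald_ci(105, 532, 1621, 10117)
print(f"OR = {or_all:.1f} (95% CI: {lo_all:.1f}, {hi_all:.1f})")
```

Rounded to one decimal place, this gives 1.2 (1.0, 1.5), in agreement with the tabulated value for NTDs overall.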
TABLE 2. Odds ratios for associations between NTDs overall comparing low with not low carbohydrate intake across strata of variables identified as important predictors in random forest analyses, NBDPS 1997–2011.
Stratification predictor variable | Low carbohydrate intake [a]: NTD cases (n = 1726) | Low carbohydrate intake [a]: Controls (n = 10,649) | OR (95% CI) for low carbohydrate intake | p-value for interaction term [c]
Magnesium (mg/day)
  ≤ 25th percentile | 97 | 504 | 1.1 (0.8, 1.4) | 0.2
  > 25th percentile | 8 | 28 | 1.8 (0.8, 4.1) |
Copper (mg/day)
  ≤ 25th percentile | 97 | 497 | 1.2 (0.9, 1.5) | 0.7
  > 25th percentile | 8 | 35 | 1.4 (0.7, 3.1) |
Thiamin (mg/day)
  ≤ 25th percentile | 100 | 496 | 1.1 (0.9, 1.4) | 0.7
  > 25th percentile | 5 | 36 | 0.9 (0.4, 2.3) |
Glycemic index
  ≤ 25th percentile | 36 | 151 | 1.5 (1.0, 2.2) | 0.2
  > 25th percentile | 69 | 381 | 1.1 (0.9, 1.5) |
Caffeine from soda (mg/day)
  ≤ 25th percentile | 38 | 239 | 1.2 (0.9, 1.8) | 0.8
  > 25th percentile | 67 | 293 | 1.3 (1.0, 1.7) |
Vitamin B6 (mg/day)
  ≤ 25th percentile | 92 | 472 | 1.2 (0.9, 1.5) | 0.7
  > 25th percentile | 13 | 60 | 1.4 (0.8, 2.5) |
Iron (mg/day)
  ≤ 25th percentile | 86 | 445 | 1.2 (0.9, 1.6) | 0.7
  > 25th percentile | 19 | 87 | 1.4 (0.8, 2.3) |
Diet quality index
  ≤ 25th percentile | 97 | 490 | 1.1 (0.9, 1.4) | 0.8
  > 25th percentile | 8 | 42 | 1.2 (0.6, 2.6) |
Folate DFE (μg/day)
  ≤ 25th percentile | 88 | 451 | 1.1 (0.9, 1.4) | 0.5
  > 25th percentile | 17 | 81 | 1.4 (0.8, 2.3) |
Vitamin C (mg/day)
  ≤ 25th percentile | 85 | 404 | 1.2 (0.9, 1.6) | 0.5
  > 25th percentile | 20 | 128 | 1.0 (0.6, 1.6) |
Riboflavin (mg/day)
  ≤ 25th percentile | 81 | 425 | 1.0 (0.8, 1.3) | 0.2
  > 25th percentile | 24 | 107 | 1.5 (0.9, 2.3) |
Any predictor [b]
  ≤ 25th percentile | 105 | 532 | 1.2 (1.0, 1.5) | NA
  > 25th percentile | 0 | 0 | NA |
  • Abbreviation: DFE, Dietary folate equivalents.
  • [a] Low intake defined as carbohydrate intake ≤5th percentile, that is, approximately 95 g/day, determined among controls; not low intake considered >95 g/day.
  • [b] ≤25th percentile in any predictors listed above versus not.
  • [c] p-value from Wald Chi-squared statistic.

We also conducted an analysis that compared low intake (≤25th percentile) in any of the prediction variables with not low intake (>25th percentile) in all predictors, to determine whether any versus none contributed to the observed association of 1.2 (95% CI: 1.0, 1.5) for low carbohydrate. These analyses gave an OR of 1.2 (Table 2 and Tables S1 and S2), indicating that the collection of prediction variables (11 for overall NTD cases, spina bifida, and controls; 12 for anencephaly and controls) identified by random forest contributed to the association either directly or indirectly through some unknown variable correlated with some or all (vs. none) of the prediction variables. Of note, there were no cases or controls in the low carbohydrate intake group who were also not low in all predictors. This suggests the random forest algorithm functionally identified the strongest predictors.

3.2 CA-NTD study

ORs for the association between low carbohydrate intake and NTDs are shown in Table 3: OR = 2.0 (95% CI: 1.2, 3.4) for NTDs overall, OR = 2.1 (95% CI: 1.2, 3.7) for spina bifida, and OR = 1.8 (95% CI: 0.9, 3.5) for anencephaly.

TABLE 3. Odds ratios for NTDs overall, spina bifida, and anencephaly associated with periconceptional low carbohydrate intake, CA-NTD Study 1989–1991.
Case type | Carbohydrate intake [a] | Cases | Controls | Odds ratio (95% CI)
All NTD | Low | 43 | 23 | 2.0 (1.2, 3.4)
All NTD | Not low | 406 | 435 | Referent
Anencephaly | Low | 15 | 23 | 1.8 (0.9, 3.5)
Anencephaly | Not low | 160 | 435 | Referent
Spina bifida | Low | 25 | 23 | 2.1 (1.2, 3.7)
Spina bifida | Not low | 228 | 435 | Referent
  • [a] Low intake defined as carbohydrate intake ≤5th percentile, that is, approximately 122 g/day, determined among controls; not low intake considered >122 g/day.

Random forest results predicting low carbohydrate intake yielded the ranked variables for all NTD cases, anencephaly, spina bifida, and controls shown in Figure 2. In this data set, the top 10 variables were similar for all NTD cases and controls, producing a combined listing of 14 variables to be considered for further analysis. As observed in the NBDPS analysis, the top selected “prediction” variables were all based on dietary intake: total calorie intake; phosphorus; potassium; thiamin; protein; cysteine; sodium; calcium; niacin; sucrose; glucose; fiber; daily grams of bread, cereal, rice and pasta; and grains for fiber.

FIGURE 2. Random forest analysis of variables predicting low carbohydrate intake among NTD cases and controls, CA-NTD Study 1989–1991. Shown are random forest analyses for NTDs (upper-left, accuracy = 98.0%, sensitivity = 83.7%, specificity = 99.5%), anencephaly (upper-right, accuracy = 97.7%, sensitivity = 73.3%, specificity = 100%), spina bifida (lower-left, accuracy = 98.0%, sensitivity = 88.0%, specificity = 99.1%), and controls (lower-right, accuracy = 98.3%, sensitivity = 65.2%, specificity = 100%). The x-axis is the value of mean decrease in accuracy and the y-axis is a list of the top 10 predictors. Of note, random forest analysis identifies the relative importance of predictive variables but does not indicate magnitude or direction of potential associations with NTDs.

The stratified ORs and 95% CIs between low versus not low carbohydrate intake and NTDs overall for each stratum of each of these prediction variables are presented in Table 4, and in Tables S3 and S4 for anencephaly and spina bifida. We did not observe evidence for heterogeneity (statistical precision may have been low for some comparisons) in the overall OR of ~2.0 for any of the “prediction” variables individually, with the possible exceptions of niacin and calcium. As in the NBDPS data set, we explored whether low in any of the prediction variables relative to not low in all predictors contributed to the observed association. These results similarly showed an OR of ~2.0, indicating that the prediction variables (14 for overall NTDs, spina bifida, and controls; 13 for anencephaly and controls) identified by random forest in total contributed to the association either directly or indirectly through some unknown variable correlated with some or all of the prediction variables. That is, these prediction variables, considered as any versus none (not low in all variables), are related to the low carbohydrate association, but none in isolation accounts for the association.

TABLE 4. Odds ratios for associations between NTDs overall comparing low with not low carbohydrate intake across strata of variables identified as important predictors in random forest analyses, CA-NTD Study 1989–1991.
Stratification predictor variable | Low carbohydrate intake [a]: NTD cases (n = 449) | Low carbohydrate intake [a]: Controls (n = 458) | OR (95% CI) for low carbohydrate intake | p-value for interaction term [b]
Total calories (kcal/day)
  ≤ 25th percentile | 43 | 23 | 1.9 (1.0, 3.4) | NA
  > 25th percentile | 0 | 0 | NA |
Phosphorus (mg/day)
  ≤ 25th percentile | 42 | 22 | 1.7 (1.0, 3.1) | 0.8
  > 25th percentile | 1 | 1 | 1.1 (0.1, 18.1) |
Potassium (mg/day)
  ≤ 25th percentile | 43 | 22 | 1.7 (0.9, 3.0) | 0.9
  > 25th percentile | 0 | 1 | NA |
Thiamin (mg/day)
  ≤ 25th percentile | 41 | 21 | 1.9 (1.0, 3.4) | 0.6
  > 25th percentile | 2 | 2 | 1.1 (0.2, 7.9) |
Protein (g/day)
  ≤ 25th percentile | 41 | 23 | 1.4 (0.8, 2.4) | 0.9
  > 25th percentile | 2 | 0 | NA |
Cysteine (mg/day)
  ≤ 25th percentile | 39 | 23 | 1.5 (0.8, 2.6) | 0.9
  > 25th percentile | 4 | 0 | NA |
Sodium (mg/day)
  ≤ 25th percentile | 43 | 23 | 1.5 (0.9, 2.7) | NA
  > 25th percentile | 0 | 0 | NA |
Calcium (mg/day)
  ≤ 25th percentile | 41 | 17 | 2.3 (1.2, 4.3) | 0.1
  > 25th percentile | 2 | 6 | 0.4 (0.1, 1.9) |
Niacin (mg/day)
  ≤ 25th percentile | 41 | 20 | 2.2 (1.2, 4.1) | 0.2
  > 25th percentile | 2 | 3 | 0.7 (0.1, 4.3) |
Sucrose (g/day)
  ≤ 25th percentile | 37 | 21 | 2.6 (1.4, 4.9) | 0.9
  > 25th percentile | 6 | 2 | 3.0 (0.6, 14.9) |
Fiber (g/day)
  ≤ 25th percentile | 40 | 23 | 1.7 (0.9, 3.0) | 1.0
  > 25th percentile | 3 | 0 | NA |
Bread, cereal, rice and pasta (g/day)
  ≤ 25th percentile | 39 | 20 | 2.0 (1.1, 3.6) | 0.7
  > 25th percentile | 4 | 3 | 1.5 (0.3, 6.6) |
Grains for fiber (g/day)
  ≤ 25th percentile | 38 | 21 | 1.9 (1.0, 3.5) | 0.7
  > 25th percentile | 5 | 2 | 2.7 (0.5, 14.0) |
Glucose (g/day)
  ≤ 25th percentile | 35 | 21 | 1.9 (1.0, 3.5) | 0.3
  > 25th percentile | 8 | 2 | 4.2 (0.9, 20.1) |
Any predictor [c]
  ≤ 25th percentile | 43 | 23 | 1.9 (1.1, 3.3) | NA
  > 25th percentile | 0 | 0 | NA |
  • [a] Low intake defined as carbohydrate intake ≤5th percentile, that is, approximately 122 g/day, determined among controls; not low intake considered >122 g/day.
  • [b] p-value from Wald Chi-squared statistic.
  • [c] ≤25th percentile in any predictors listed above versus not.

4 DISCUSSION

The objective of this work was to further interrogate two data sets from population-based case-control studies that previously demonstrated an association between low carbohydrate intake and NTDs to attempt to determine whether another variable or set of variables, other than low carbohydrate, might underlie the association. To aid this endeavor we employed the machine learning algorithm random forest to identify potential factors from among more than 200 variables reflecting nutrition, demographics, and behaviors—an approach that was agnostic to variable selection and not hindered by data features such as collinearity. Although the machine learning algorithm could choose from >200 variables from each data set, no single variable was observed to substantially contribute to the low carbohydrate association in either data set. Analyses that explored whether low in any (vs. not low in all) of the top prediction variables chosen (in either cases or controls) by the algorithm in each data set did, however, reflect the elevated OR for low carbohydrate intake. These analyses indicated that the prediction variables identified by random forest in total contributed to the association directly or indirectly by some unknown correlated variable to some or all of the prediction variables. We could not further discriminate what that variable relationship might be with these two data sets. Of note, thiamin was observed to be an important predictor of low carbohydrate in both data sets. Lowered thiamin has been observed as a potential, but only modest, risk factor of NTDs (anencephaly) in the NBDPS (Chandler, 2012) and in the CA-NTD Study among nonusers of vitamin supplements (Shaw et al., 1999).

The meaning of a low carbohydrate diet is complex (e.g., it may reflect the absence of important nutrients or serve as a surrogate marker for an at-risk subset of the population with metabolic disease). Given that only two studies have investigated the association between low carbohydrate intake and NTD risk, and that we did not identify another variable in these studies to “explain” the consistently observed association, we believe it is premature to speculate on what these findings may mean in terms of recommendations for periconceptional diet choices.

Our failure to find a single variable that specifically contributed to the previously observed association of low carbohydrate in these two data sets may suggest that (a) the selected machine learning algorithm may not have been sufficiently sensitive, although it did identify that a “low” value in any of the important predictor variables, in composite, reproduced the elevated odds ratio; (b) none of the 200+ variables are sufficiently correlated with the true exposure, if it is indeed something other than low carbohydrate intake underlying the low carbohydrate association; (c) at least one of the 200+ variables does explain the association directly or indirectly, but is measured with insufficient specificity or subject to exposure misclassification in these studies; or (d) the association observed in two independent data sets is unbiased by residual confounding and is directly related to low carbohydrate intake. The latter explanation will require investigation in other data that may be better able to quantify intake of specific carbohydrates.

ACKNOWLEDGMENTS

This work was partially supported by the Centers for Disease Control and Prevention Centers of Excellence (U01DD001226 and U01DD001302). The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the California Department of Public Health.

DATA AVAILABILITY STATEMENT

The data from the two studies interrogated here (a CDC-funded multi-site study and a CA-only study) are not publicly available at this time due to privacy or ethical restrictions.
