Prediction of adolescent weight status by machine learning: a population-based study – BMC Public Health

Design and setting

We conducted a retrospective cohort study of Primary 4 (P4) students from the 1995/1996 to 2015/2016 academic cohorts, who were followed until Secondary 6 (S6, Grade 12 in the US). P4 students are cognitively competent to provide self-reported measurements [22]. Additionally, we selected a cohort of P6 students from the 1995/1996 to 2013/2014 academic cohorts to predict weight status after P6, the last year of primary education in Hong Kong before students are promoted to the secondary level. Students who attended assessments in at least two years and had complete health measurement records were included. Data were obtained from the Student Health Service (SHS) of the Department of Health in Hong Kong, which has provided voluntary territory-wide annual health assessment services for primary and secondary students since 1995/1996. The health assessment questionnaire changed in 2015/2016 [23]. Therefore, we included P4 students from 1995/1996 to 2014/2015, allowing at least one year of follow-up prediction. Further details of the health assessment scheme can be found elsewhere [24, 25].

Weight (to the nearest 0.1 kg) and height (to the nearest 0.1 cm) were measured annually at the SHS by well-trained healthcare workers or nurses according to the study protocol. Demographics included sex, age and family socioeconomic level. Family socioeconomic status was indicated by parental educational level, parental occupation and the type of housing [26].

Dietary habits were assessed by breakfast eating habit, sweetness preference during the past 7 days, junk food intake habit, fruit/vegetable intake, and milk consumption habit. Physical activity behaviors were assessed by the frequency of aerobic exercise each week, the hours of aerobic exercise each week, and daily hours of TV viewing. All of these predictors in the structured questionnaires had four response options representing different degrees of frequency or duration. Breakfast habits were assessed by the item "I usually have breakfast at?", for which we considered three response categories: (i) home, representing frequently eating at home; (ii) rarely at home, after combining the original categories of fast food stall/cafeteria/restaurant and some other places; and (iii) no breakfast at all, representing never eating at home. Thus, this item can be considered an assessment of the frequency of breakfast eating at home.

Psychological development was assessed using the 60-item self-reported Culture Free Self-Esteem Inventory for Children Questionnaire (CFSEI-2), which has been validated in Hong Kong children and adolescents [27, 28]. The Self-Esteem Inventory (SEI) comprises a total score and four domain scores: (i) general self-esteem, denoting children's overall perception of themselves (the score7 was considered very low); (ii) social self-esteem, denoting children's perception of their peer relationships; (iii) school-related self-esteem, denoting children's perception of their ability to achieve academic success; and (iv) parent-related self-esteem, denoting children's perception of their family's thoughts. Scores2 in any of these three subscales were considered very low [27]. Children with a total score19 or a very low score in any domain were considered to have low self-esteem. A lie scale score was also obtained, and a score2 indicates that the corresponding child's self-reported assessment is unreliable [27].

Potential behavioral problems of children and adolescents were assessed using the 4-item Rutter Behavior Questionnaire (RBQ), which has been validated in Hong Kong children [29]. It inquired about hyperactivity, conduct, and emotional disturbances and was completed by parents. An RBQ total score19 indicated a potential behavior problem [30]. In total, 25 predictors were considered as input variables in developing the multiclass prediction models.

The weight status to be predicted was classified as normal, obese, overweight, or underweight, based on the body mass index (BMI, expressed in kg/m²) at the next measurement year and the age- and sex-specific BMI references of the International Obesity Task Force (IOTF) standards.
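The outcome definition above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the names `Cutoffs`, `bmi` and `weight_status` are hypothetical, and the treatment of values that fall exactly on a cut-off is an assumption. The only cut-offs shown are the adult-equivalent values (BMI 18.5, 25 and 30 at age 18) through which the IOTF curves are defined to pass; the age- and sex-specific values for younger students must be looked up in the published IOTF tables and are not reproduced here. Which IOTF thinness grade the study used for "underweight" is not stated in this section, so grade 1 (18.5) is assumed.

```python
from typing import NamedTuple


class Cutoffs(NamedTuple):
    """BMI cut-offs (kg/m^2) for one age-and-sex group (hypothetical container)."""
    underweight: float  # below this value -> underweight
    overweight: float   # at or above this value -> overweight
    obese: float        # at or above this value -> obese


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def weight_status(bmi_value: float, cutoffs: Cutoffs) -> str:
    """Map a BMI value to one of the four weight-status classes.

    Boundary handling (>= for the upper cut-offs, < for the lower one)
    is an assumption made for this sketch.
    """
    if bmi_value >= cutoffs.obese:
        return "obese"
    if bmi_value >= cutoffs.overweight:
        return "overweight"
    if bmi_value < cutoffs.underweight:
        return "underweight"
    return "normal"


# Adult-equivalent cut-offs at age 18. Younger ages need the
# age- and sex-specific IOTF table values instead.
ADULT_EQUIVALENT = Cutoffs(underweight=18.5, overweight=25.0, obese=30.0)

if __name__ == "__main__":
    value = bmi(weight_kg=50.0, height_cm=160.0)
    print(round(value, 2), weight_status(value, ADULT_EQUIVALENT))
```

In practice the `Cutoffs` passed to `weight_status` would be selected per student from the student's age and sex at the next measurement year.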

Children with a lie self-esteem score2 were considered unreliable and removed. For the type of housing and parental occupation, we ordered the response categories by socioeconomic level using the median monthly domestic household income for each type of housing and occupation obtained from the Hong Kong Census and Statistics Department. Sex, as a categorical variable, was one-hot encoded. The responses to the dietary and physical activity behavioral measurements were treated as ordinal variables, and the other predictors were treated as continuous variables. Missing data on socioeconomic status were filled in according to the information reported in the student's other assessment years. The other measurements had less than 5% missing data, which was considered inconsequential to the validity of the model development [31]. We applied the k-nearest neighbour imputation algorithm to the training and test sets separately to facilitate the use of ML methods that require complete data [32].
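The idea behind k-nearest neighbour imputation can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study: the function name `knn_impute`, the choice of k = 2, the rescaled Euclidean distance over jointly observed features, and the toy rows are all hypothetical. In a Scikit-Learn workflow the `KNNImputer` class in `sklearn.impute` provides this functionality. Following the text, the training and test sets are imputed in two separate calls, so no test-set values influence the training data.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def _distance(a: Row, b: Row) -> float:
    """Euclidean distance over features observed in both rows,
    rescaled by the share of features that could be compared."""
    shared = [(x - y) ** 2 for x, y in zip(a, b)
              if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing in common, so not a usable neighbour
    return math.sqrt(sum(shared) * len(a) / len(shared))


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing value with the mean of that feature among the
    k nearest rows (within the same data set) that have it observed."""
    completed = [list(r) for r in rows]  # copy; the input is not modified
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if not nearest:
                raise ValueError(f"no usable neighbour for row {i}, feature {j}")
            completed[i][j] = sum(nearest) / len(nearest)
    return completed


if __name__ == "__main__":
    # Toy rows (made up): [age in years, weight in kg]
    train = [[10.0, 32.0], [11.0, 36.0], [10.5, None], [17.0, 60.0]]
    test = [[10.0, None], [9.5, 30.0], [10.5, 34.0]]
    print(knn_impute(train, k=2))  # imputed using training rows only
    print(knn_impute(test, k=2))   # imputed using test rows only
```

Features on very different scales would normally be standardised before computing distances; that step is omitted here to keep the sketch short.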

Categorical data were expressed as numbers with percentages for each weight status and compared using the chi-square test. Numerical data were presented as the mean ± standard deviation (SD).

P4 students were randomly divided into a training set and a test set at an 80:20 ratio. Multiclass prediction models were developed using the P4 training data to predict weight status in each subsequent year until S6, creating eight prediction windows. We used the same procedure to develop prediction models for the P6 training cohort, creating six prediction windows until S6. The weight status in our cohorts was imbalanced, with the underweight, overweight and obese categories being underrepresented. This imbalance could have led to biased model performance, with the model being more accurate at predicting the majority weight status while performing poorly on the minority weight statuses. To address this issue, we applied the Synthetic Minority Oversampling Technique (SMOTE) to the training sets [33]. SMOTE is a widely used technique that creates synthetic samples for the minority categories by generating new instances that are similar to the original underrepresented categories. We attempted several ML approaches, including Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), and eXtreme Gradient Boosting (XGBoost), as well as the LG approach for comparison. The short- and long-term prediction abilities of the models were compared by calculating the correct classification rate, the overall accuracy on the test set, and the micro- and macro-averaged area under the curve (AUC). Receiver operating characteristic (ROC) curves for each weight status on the test set were also obtained. The AUC, precision, recall and F1-score were calculated to evaluate the model prediction accuracy and to assess the ability to predict an abnormal weight status. Precision and recall are conceptually equivalent to the positive predictive value and sensitivity, respectively, and the F1-score is the harmonic mean of precision and recall [34].
For predicting a specific weight status, all accuracy measures ranged from 0 to 1, with a higher value indicating a higher accuracy.
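For readers less familiar with these measures, the per-class precision (positive predictive value), recall (sensitivity) and F1-score described above can be computed as in the following sketch. It is an illustration only: the function names and the eight toy labels are made up, and returning 0.0 when a denominator is zero is a convention chosen here rather than something stated in the study. AUC is not covered because it requires predicted probabilities rather than predicted labels.

```python
from typing import Dict, List, Sequence

LABELS = ["underweight", "normal", "overweight", "obese"]


def per_class_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                      labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest precision, recall and F1 for every weight status."""
    results: Dict[str, Dict[str, float]] = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Precision = positive predictive value; recall = sensitivity.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


def macro_average(metrics: Dict[str, Dict[str, float]], key: str) -> float:
    """Unweighted mean of one metric across classes."""
    return sum(m[key] for m in metrics.values()) / len(metrics)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of students whose weight status was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up labels for eight students, for illustration only.
    y_true: List[str] = ["normal", "normal", "normal", "normal",
                         "overweight", "overweight", "obese", "underweight"]
    y_pred: List[str] = ["normal", "normal", "normal", "overweight",
                         "overweight", "normal", "obese", "normal"]
    metrics = per_class_metrics(y_true, y_pred, LABELS)
    for label, values in metrics.items():
        print(label, values)
    print("accuracy:", accuracy(y_true, y_pred))
    print("macro recall:", macro_average(metrics, "recall"))
```

The toy data also shows why overall accuracy alone can mislead on imbalanced outcomes: a model can score well on the majority class while the recall for a minority class is zero.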

To examine the importance of each predictor at both the population and individual levels, we applied Shapley Additive Explanations (SHAP) to the best-performing prediction models to obtain each predictor's contribution for a prediction window [35]. A SHAP value is assigned to each predictor and quantifies its contribution by comparing the model predictions with and without that predictor. The Shapley values from all prediction windows in each cohort were used to compare the summary importance of predictors across weight statuses. Furthermore, to better understand the individual-level prediction of weight status, we selected two students as examples and used SHAP waterfall plots to illustrate the importance of different predictors for each student. Figure 1 shows the workflow used for this study. All prediction models were developed and compared using Python (version 3.10) with Scikit-Learn.
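The "with and without that predictor" comparison underlying Shapley values can be made concrete with a small exact computation. The sketch below is not the SHAP library and not the study's models: `toy_score`, its coefficients, the three predictors and the baseline student are hypothetical, and "without a predictor" is approximated here by setting it to a baseline value, whereas SHAP implementations average over background data. Exact enumeration over all orderings is only feasible for a handful of predictors.

```python
from itertools import permutations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(predict: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one instance x relative to a baseline.

    For every ordering of the predictors, each predictor is switched
    from its baseline value to its observed value in turn, and the
    resulting change in the prediction is credited to that predictor.
    """
    n = len(x)
    contributions = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]  # "add" predictor i to the coalition
            value = predict(current)
            contributions[i] += value - previous
            previous = value
    n_orders = factorial(n)
    return [c / n_orders for c in contributions]


def toy_score(features: Sequence[float]) -> float:
    """Hypothetical risk score with an interaction between BMI and TV hours."""
    bmi, tv_hours, exercise_hours = features
    return 0.5 * bmi + 0.2 * tv_hours - 0.3 * exercise_hours + 0.1 * bmi * tv_hours


if __name__ == "__main__":
    student = [22.0, 3.0, 1.0]   # made-up values: BMI, TV hours, exercise hours
    reference = [18.0, 1.0, 2.0]  # made-up baseline student
    phi = shapley_values(toy_score, student, reference)
    for name, value in zip(["BMI", "TV hours", "exercise hours"], phi):
        print(f"{name}: {value:+.3f}")
    # The contributions add up to the gap between the two predictions.
    print(sum(phi), toy_score(student) - toy_score(reference))
```

The per-predictor contributions printed for one student are the quantities that a waterfall plot stacks from the baseline prediction up to the individual prediction.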

Figure 1. Graphical illustration of the workflow used for this study
