In the present issue of the American Journal of Psychiatry, Smucny et al. suggest that predictive algorithms for psychosis using machine learning (ML) methods may already achieve a clinically useful level of accuracy (1). In support of this perspective, these authors report on the results of an analysis using the North American Prodrome Longitudinal Study, Phase 3 (NAPLS3) data set (2), which they accessed through the National Institute of Mental Health Data Archive (NDAR). This is a large multisite study of youth at clinical high risk for psychosis followed up on multiple occasions with clinical, cognitive, and biomarker assessments. Several ML approaches were compared with each other and with Cox (time-to-event) and logistic regression using the clinical, neurocognitive, and demographic features from the NAPLS2 individualized risk calculator (3), with salivary cortisol also tested as an add-on biomarker. When these variables were analyzed using Cox and logistic regression, the model applied to the NAPLS3 cohort attained a level of predictive accuracy comparable to that observed in the original NAPLS2 cohort (overall accuracy in the 66%–68% range). However, several ML algorithms produced nominally better results, with a random forest model performing best (overall accuracy in the 90% range). Although a predictive algorithm with 90% or higher predictive accuracy would have greater clinical utility than one with substantially lower accuracy, several issues remain to be resolved before it can be determined whether ML methods have attained this utility threshold.
First and foremost, an ML model's expected real-world performance can only be ascertained when tested in an independent sample/data set that the model has never before encountered. ML methods are very adept at finding apparent structure in data that predict an outcome, but if that structure is idiosyncratic to the training data set, the model will fail to generalize to other contexts and thus not be useful, a problem known as overfitting (4). Internal cross-validation methods are not sufficient to overcome this problem, since the model sees all the training data at certain points in the process, even if some is left out on a particular iteration (5). Overfitting is indicated by a big drop-off in model accuracy moving from the original internally cross-validated training data set to an external, independent cross-validation test. Smucny et al. (1) acknowledge the need for an external replication test before the utility of the ML models they evaluated using only internal cross-validation methods can be fully appreciated.
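The drop-off described above can be illustrated with a deliberately extreme toy case. The sketch below is not the analysis reported by Smucny et al. and does not reproduce a cross-validation procedure; it assumes simulated data in which the features carry no information about the outcome, and a hypothetical memorizing classifier (one-nearest-neighbor) chosen only because it overfits by construction. All function names, sample sizes, and the random seed are illustrative assumptions.

```python
import random


def make_noise_sample(n_cases, n_features, rng):
    """Simulate cases whose binary outcome is unrelated to every feature."""
    features = [[rng.random() for _ in range(n_features)] for _ in range(n_cases)]
    outcomes = [rng.randint(0, 1) for _ in range(n_cases)]
    return features, outcomes


def predict_nearest_neighbor(train_x, train_y, case):
    """Return the outcome of the training case closest to `case`."""
    nearest = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], case)),
    )
    return train_y[nearest]


def accuracy(train_x, train_y, test_x, test_y):
    """Proportion of test cases whose outcome is predicted correctly."""
    hits = sum(
        predict_nearest_neighbor(train_x, train_y, case) == outcome
        for case, outcome in zip(test_x, test_y)
    )
    return hits / len(test_y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_x, train_y = make_noise_sample(200, 5, rng)
external_x, external_y = make_noise_sample(200, 5, rng)

# Accuracy on the data the model was built from: every case is its own
# nearest neighbor, so the model looks perfect.
apparent_accuracy = accuracy(train_x, train_y, train_x, train_y)

# Accuracy on an independent sample: the memorized structure was noise,
# so performance should hover around chance.
external_accuracy = accuracy(train_x, train_y, external_x, external_y)

print(f"apparent accuracy: {apparent_accuracy:.2f}")
print(f"external accuracy: {external_accuracy:.2f}")
```

Because the simulated outcome is pure noise, any accuracy above chance in the training data reflects structure idiosyncratic to that sample, which is the situation an external test is designed to expose.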
Is there likely to be a big drop-off in accuracy of the ML models reported by Smucny et al. (1) when such an external validation test is performed? On one hand, they limited consideration to a small number of features that have previously been shown to predict psychosis in numerous independent samples (i.e., the variables in the NAPLS2 risk calculator [3]). This aspect mitigates the overfitting issue to some extent because the features used in model building are already filtered (based on prior work) to be highly likely to predict conversion to psychosis, both individually and when combined in a regression model. On the other hand, the ML models employed in the study use various approaches to find higher-order interactive and nonlinear amalgamations among this set of feature variables that maximally discriminate outcome groups. This aspect increases the risk of overfitting given that a very large number of such higher-order interactive effects are assessed in model building, with relatively few subjects available to represent each unique permutation, a problem known as the curse of dimensionality (6). Tree-based methods such as the random forest model that performed best in the NAPLS3 data set are not immune to this problem and, in fact, are particularly vulnerable to it when applied to data sets with relatively small numbers of individuals with the outcome of interest (7).
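A rough counting exercise shows how quickly the space of higher-order combinations grows relative to the number of outcome events. The figures used here are illustrative assumptions only: `n_features = 8` and `n_converters = 80` are hypothetical values, not counts taken from NAPLS3, and dichotomizing each feature is a simplification rather than a description of how a random forest partitions the data.

```python
from math import comb


def interaction_term_count(n_features):
    """Number of distinct interaction terms of order 2 up to n_features."""
    return sum(comb(n_features, k) for k in range(2, n_features + 1))


def events_per_cell(n_events, n_features):
    """Average outcome events per cell if every feature is split in two."""
    return n_events / (2 ** n_features)


n_features = 8     # hypothetical number of predictors
n_converters = 80  # hypothetical number of outcome events

print(interaction_term_count(n_features))         # candidate interaction terms
print(2 ** n_features)                            # cells after dichotomizing
print(events_per_cell(n_converters, n_features))  # converters per cell, on average
```

Under these assumptions there are more candidate cells than converters, so most combinations of feature values would be represented by few or no individuals with the outcome of interest.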
The relatively low base rate of conversion to psychosis (i.e., 10%–15%), even in a sample selected to be at elevated risk as in NAPLS3, creates another problem for ML methods; namely, such models can achieve high levels of predictive accuracy in the training data set simply by guessing that each case is a nonconverter. Smucny et al. (1) attempt to overcome this issue using a synthetic approach that effectively upsamples the minority class (in this case, converters to psychosis) to the point that it has 50% representation in the synthetic sample (8). Although this approach is very helpful in preventing ML models from defaulting to prediction of a majority class, its use in computing cross-validation performance metrics is likely to be highly misleading, given that real-world application of the model is not likely to occur in a context in which there is a 50:50 rate of future converters and nonconverters. Rather, the model will be applied in circumstances in which new clinical high risk (CHR) individuals' likelihoods of conversion are computed, and those CHR individuals will derive from a population in which the base rate of conversion is 15%. It is now well established that the same predictive model will result in different risk distributions (and, thereby, different thresholds in model-predicted risk for making binary predictions) in samples that vary in base rates of conversion to psychosis (9). Given this, a 90% predictive accuracy of an ML algorithm in a synthetically derived sample in which the base rate of psychosis conversion is artificially created to be 50% is highly unlikely to generalize to an independent, real-world CHR sample, at least as ascertained using current approaches.
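The dependence of headline metrics on the base rate can be made concrete with simple arithmetic. The sketch below assumes a hypothetical classifier whose sensitivity (0.95) and specificity (0.85) stay fixed across samples; those values are illustrative and are not results from Smucny et al. or from NAPLS3. Holding sensitivity and specificity constant is itself a simplifying assumption, since the text above notes that risk distributions also shift with the base rate.

```python
def majority_class_accuracy(base_rate):
    """Accuracy of a rule that labels every case a nonconverter."""
    return 1.0 - base_rate


def overall_accuracy(sensitivity, specificity, base_rate):
    """Proportion of all cases classified correctly at a given base rate."""
    return sensitivity * base_rate + specificity * (1.0 - base_rate)


def positive_predictive_value(sensitivity, specificity, base_rate):
    """Proportion of predicted converters who actually convert."""
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


SENSITIVITY = 0.95  # hypothetical value
SPECIFICITY = 0.85  # hypothetical value

for base_rate in (0.50, 0.15):
    print(
        f"base rate {base_rate:.2f}: "
        f"accuracy {overall_accuracy(SENSITIVITY, SPECIFICITY, base_rate):.3f}, "
        f"PPV {positive_predictive_value(SENSITIVITY, SPECIFICITY, base_rate):.3f}, "
        f"all-nonconverter accuracy {majority_class_accuracy(base_rate):.2f}"
    )
```

With these illustrative inputs, overall accuracy slips from 0.90 in the balanced sample to roughly 0.87 at a 15% base rate, while the positive predictive value falls from roughly 0.86 to roughly 0.53, and a rule that calls everyone a nonconverter already reaches 0.85 accuracy at that base rate.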
When developing the NAPLS2 risk calculator, the investigators made purposeful decisions to allow the resulting algorithm to be applied validly in scaling the risk of newly ascertained CHR individuals (3). Key among these decisions was to avoid using the NAPLS2 data set to test different possible models, which would then necessitate an external validation test. Rather, a small number of predictor variables was chosen based on their empirical associations with conversion to psychosis in prior studies, and Cox regression was employed to generate an additive multivariate model of predicted risk (i.e., no interactive or non-linear combinations of the variables were included). As a result, the ratio of converters to predictor variables was 10:1 (helping to create adequate representation of the scale values of each predictor in the minority class), and there was no need to use a synthetic sampling approach given that Cox regression is well suited for prediction of low base rate outcomes. The predictor variables chosen for inclusion are ones that are easily ascertained in standard clinical settings and have a high level of acceptability (face validity) for use in clinical decision making. It is important to note that the NAPLS2 model has been shown to replicate (in terms of area under the curve or concordance index) when applied to multiple external independent data sets (10).
Nevertheless, two issues continue to limit the utility of the NAPLS2 risk calculator. One is that it will generate differently shaped risk distributions on samples that vary in conversion risk and in distributions of the individual predictor variables, making it problematic to apply the same threshold of predicted risk for binary predictions across samples that differ in these ways (9, 11). However, it appears possible to derive comparable prediction metrics across samples with differing conversion risks when considering the relative recency of onset or worsening of attenuated positive symptoms at the baseline assessment (11). A more recent onset or worsening of attenuated positive symptoms defines a subgroup of CHR individuals with a higher average predicted risk and higher overall transition rate and in whom particular putative illness mechanisms, in this case an accelerated rate of cortical thinning (12), appear to be differentially relevant (11).
The second rate-limiting issue for the utility of the NAPLS2 risk calculator is that its performance in terms of sensitivity, specificity, and balanced accuracy, even when accounting for recency of onset of symptoms, is still in the 65%–75% range. Although ML methods represent one approach that, if externally validated, could conceivably result in predictive models at the 90% or higher level of accuracy, such models would continue to have the disadvantage of being relatively opaque (black box) in terms of how the underlying predictor variables aggregate in defining risk and for that reason may not be used as readily in clinical practice. Alternatively, it may be possible to rely on more transparent analytic approaches to achieve the needed level of accuracy. It has recently been demonstrated that integrating information on short-term (baseline to 2-month follow-up) change on a single clinical variable (e.g., deterioration in odd behavior/appearance) improves the performance of the NAPLS2 risk calculator to >90% levels of sensitivity, specificity, and balanced accuracy, i.e., a range that would support its use in clinical trial design and clinical decision making (13). Importantly, although the Cox regression model aspect of this algorithm has been externally validated, the incorporation of short-term clinical change (via mixed effects growth modeling) requires replication in an external data set.
Smucny et al. (1) are to be congratulated on a well-motivated and well-executed analysis of the NAPLS3 data set. It is heartening to see such creative uses of this unique shared resource for our field bear fruit, reinforcing the value of open science. As we move forward toward the time and place in which prediction models of psychosis and related outcomes have utility for clinical decision making, whether those models rely on machine learning methods or more traditional approaches, it will be crucial to insist on external validation of results before deciding that we are, in fact, there.
Clark L. Hull Professor of Psychology and Professor of Psychiatry, Yale University, New Haven, Conn.
Dr. Cannon reports no financial relationships with commercial interests.
1. Smucny J, Davidson I, Carter CS: Are we there yet? Predicting conversion to psychosis using machine learning. Am J Psychiatry 2023; 180:836–840
2. Addington J, Liu L, Brummitt K, et al.: North American Prodrome Longitudinal Study (NAPLS 3): methods and baseline description. Schizophr Res 2022; 243:262–267
3. Cannon TD, Yu C, Addington J, et al.: An individualized risk calculator for research in prodromal psychosis. Am J Psychiatry 2016; 173:980–988
4. Cawley GC, Talbot NLC: On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res 2010; 11:2079–2107
5. Arlot S, Celisse A: A survey of cross-validation procedures for model selection. Statist Surv 2010; 4:40–79
6. Hughes G: On the mean accuracy of statistical pattern recognizers. IEEE Trans Inf Theor 1968; 14:55–63
7. Peng Y, Nagata MH: An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data. Chaos Solitons Fractals 2020; 139:110055
8. Chawla NV, Bowyer KW, Hall LO, et al.: SMOTE: Synthetic Minority Over-Sampling Technique. J Artif Intell Res 2002; 16:321–357
9. Koutsouleris N, Worthington M, Dwyer DB, et al.: Toward generalizable and transdiagnostic tools for psychosis prediction: an independent validation and improvement of the NAPLS-2 risk calculator in the multisite PRONIA cohort. Biol Psychiatry 2021; 90:632–642
10. Worthington MA, Cannon TD: Prediction and prevention in the clinical high-risk for psychosis paradigm: a review of the current status and recommendations for future directions of inquiry. Front Psychiatry 2021; 12:770774
11. Worthington MA, Collins MA, Addington J, et al.: Improving prediction of psychosis in youth at clinical high-risk: pre-baseline symptom duration and cortical thinning as moderators of the NAPLS2 risk calculator. Psychol Med 2023:1–9
12. Collins MA, Ji JL, Chung Y, et al.: Accelerated cortical thinning precedes and predicts conversion to psychosis: the NAPLS3 longitudinal study of youth at clinical high-risk. Mol Psychiatry 2023; 28:1182–1189
13. Worthington MA, Addington J, Bearden CE, et al.: Dynamic prediction of outcomes for youth at clinical high risk for psychosis: a joint modeling approach. JAMA Psychiatry 2023:e232378