Category Archives: Machine Learning
First-of-its-kind test uses machine learning to predict dementia up to 9 years in advance – PsyPost
In a groundbreaking study published in Nature Mental Health, researchers from Queen Mary University of London have developed a new method for predicting dementia with over 80% accuracy up to nine years before a clinical diagnosis. This method, which outperforms traditional memory tests and measurements of brain shrinkage, relies on detecting changes in the brain's default mode network (DMN) using functional magnetic resonance imaging (fMRI).
Dementia is a collective term used to describe a variety of conditions characterized by the gradual decline in cognitive function severe enough to interfere with daily life and independent functioning. It affects memory, thinking, orientation, comprehension, calculation, learning capacity, language, and judgment.
Alzheimer's disease is the most common cause of dementia, accounting for 60-70% of cases. Other types include vascular dementia, dementia with Lewy bodies, and frontotemporal dementia.
Dementia is a progressive condition, meaning symptoms worsen over time, often leading to significant impairments in daily activities and quality of life. Currently, there is no cure for dementia, and treatments primarily focus on managing symptoms and supporting patients and their caregivers.
Early diagnosis is important because it opens the door to interventions that might slow the progression of the disease, improve quality of life, and provide individuals and their families more time to plan for the future. Traditional diagnostic methods, such as memory tests and brain scans to detect atrophy, often catch the disease only after significant neural damage has occurred. These methods are not sensitive enough to detect the very early changes in brain function that precede clinical symptoms.
"Predicting who is going to get dementia in the future will be vital for developing treatments that can prevent the irreversible loss of brain cells that causes the symptoms of dementia," said Charles Marshall, who led the research team within the Centre for Preventive Neurology at Queen Mary's Wolfson Institute of Population Health. "Although we are getting better at detecting the proteins in the brain that can cause Alzheimer's disease, many people live for decades with these proteins in their brain without developing symptoms of dementia.
"We hope that the measure of brain function that we have developed will allow us to be much more precise about whether someone is actually going to develop dementia, and how soon, so that we can identify whether they might benefit from future treatments."
The study involved a nested case-control design using data from the UK Biobank, a large-scale biomedical database. The researchers focused on a subset of participants who had undergone functional magnetic resonance imaging (fMRI) scans and either had a diagnosis of dementia or developed it later. The sample consisted of 148 dementia cases and 1,030 matched controls, ensuring a robust comparison group by matching on age, sex, ethnicity, handedness, and the geographical location of the MRI scanning center.
Participants underwent resting-state fMRI (rs-fMRI) scans, which measure brain activity by detecting changes in blood flow. The researchers specifically targeted the default mode network (DMN), a network of brain regions active during rest and involved in high-level cognitive functions such as social cognition and self-referential thought.
Using a technique called dynamic causal modeling (DCM), they analyzed the rs-fMRI data to estimate the effective connectivity between different regions within the DMN. This method goes beyond simple correlations to model the causal influence one brain region has over another, providing a detailed picture of neural connectivity.
The researchers then used these connectivity estimates to train a machine learning model. This model aimed to distinguish between individuals who would go on to develop dementia and those who would not. The training process involved a rigorous cross-validation technique to ensure the model's reliability and to prevent overfitting. Additionally, a prognostic model was developed to predict the time until dementia diagnosis, using similar data and validation techniques.
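The article does not say which cross-validation scheme the researchers used. As a rough illustration of the general idea only, the sketch below assumes plain k-fold splitting: every sample is held out exactly once, and a model would be fitted on the remaining folds each time. The function name `k_fold_indices` and the toy sizes are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k (train_indices, test_indices) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test_idx in enumerate(folds):
        # Training set = all folds except the one currently held out.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


# Toy usage: 10 samples, 5 folds. A real pipeline would fit a model on
# train_idx and score it on test_idx inside this loop.
for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
    print("held out:", sorted(test_idx), "| training size:", len(train_idx))
```

Because no sample is ever scored by a model that saw it during fitting, the averaged held-out score is a less optimistic estimate than training accuracy.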
The predictive model achieved an area under the curve (AUC) of 0.824, indicating excellent performance in distinguishing between future dementia cases and controls. This level of accuracy is significantly higher than traditional diagnostic methods, which often struggle to detect early-stage dementia.
The model identified 15 key connectivity parameters within the DMN that differed significantly between future dementia cases and controls. Among these, the most notable changes included increased inhibition from the ventromedial prefrontal cortex (vmPFC) to the left parahippocampal formation (lPHF) and from the left intraparietal cortex (lIPC) to the lPHF, as well as attenuated inhibition from the right parahippocampal formation (rPHF) to the dorsomedial prefrontal cortex (dmPFC).
In addition to its diagnostic capabilities, the study also developed a prognostic model to predict the time until dementia diagnosis. This model showed a strong correlation (Spearman's ρ = 0.53) between predicted and actual times until diagnosis, indicating its potential to provide valuable timelines for disease progression. The predictive power of these connectivity patterns suggests that changes in the DMN can serve as early biomarkers for dementia, offering a window into the disease process years before clinical symptoms appear.
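For readers unfamiliar with the statistic, Spearman's rank correlation is the Pearson correlation computed on the ranks of the two variables. The sketch below is a minimal pure-Python version run on invented toy numbers, not on the study's data. The helper names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy values: predicted vs. actual years until diagnosis.
predicted_years = [1.0, 3.5, 2.0, 6.0, 8.0]
actual_years = [2.0, 3.0, 1.0, 7.0, 9.0]
print(round(spearman_rho(predicted_years, actual_years), 3))
```

Because only the ordering matters, the statistic rewards a model that ranks people correctly by time to diagnosis even if the predicted durations are off in absolute terms.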
Furthermore, the study explored the relationship between DMN connectivity changes and various risk factors for dementia. They found a significant association between social isolation and DMN dysconnectivity, suggesting that social isolation might exacerbate the neural changes associated with dementia. This finding highlights the importance of considering environmental and lifestyle factors in dementia risk and opens up potential avenues for intervention.
"Using these analysis techniques with large datasets we can identify those at high dementia risk, and also learn which environmental risk factors pushed these people into a high-risk zone," said co-author Samuel Ereira. "Enormous potential exists to apply these methods to different brain networks and populations, to help us better understand the interplays between environment, neurobiology and illness, both in dementia and possibly other neurodegenerative diseases. fMRI is a non-invasive medical imaging tool, and it takes about 6 minutes to collect the necessary data on an MRI scanner, so it could be integrated into existing diagnostic pathways, particularly where MRI is already used."
Despite the promising results, there are some caveats to consider. One limitation of the study is the use of data from the UK Biobank, which may not be fully representative of the general population. Participants in this cohort tend to be healthier and less socio-economically deprived. Future research should validate these findings in more diverse and representative samples.
"One in three people with dementia never receive a formal diagnosis, so there's an urgent need to improve the way people with the condition are diagnosed. This will be even more important as dementia becomes a treatable condition," Julia Dudley, the head of Strategic Research Programmes at Alzheimer's Research UK, told the Science Media Centre.
"This study provides intriguing insights into early signs that someone might be at greater risk of developing dementia. While this technique will need to be validated in further studies, if it is, it could be a promising addition to the toolkit of methods to detect the diseases that cause dementia as early as possible. An earlier and accurate diagnosis is key to unlocking personalised care and support, and, soon, to accessing first-of-a-kind treatments that are on the horizon."
Eugene Duff, an advanced research fellow at the UK Dementia Research Institute at Imperial College London, added: "This work shows how advanced analysis of brain activity measured using MRI can predict future dementia diagnosis. Early diagnosis of dementia is valuable for many reasons, particularly as improved pharmaceutical treatments become available.
"Brain activity measures may be complementary to cognitive, blood and other markers for identifying those at risk for dementia. The brain modelling approach they use has the benefit of potentially clarifying the brain processes affected in the early stages of disease. However, the study cohort of diagnosed patients was relatively small (103 cases). Further validation and head-to-head comparisons of predictive markers is needed."
The study, "Early detection of dementia with default-mode network effective connectivity," was authored by Sam Ereira, Sheena Waters, Adeel Razi, and Charles R. Marshall.
Implementing Neural Networks in TensorFlow (and PyTorch) | by Shreya Rao | Jul, 2024 – Towards Data Science
Step-by-step code guide on building a Neural Network
Welcome to the practical implementation guide of our Deep Learning Illustrated series. In this series, we'll bridge the gap between theory and application, bringing to life the neural network concepts explored in previous articles.
Remember the simple neural network we discussed for predicting ice cream revenue? We will build that using TensorFlow, a powerful tool for creating neural networks.
And the kicker: we'll do it in less than 5 minutes with just 27 lines of code!
Let's first start with: what is TensorFlow?
TensorFlow is a comprehensive ecosystem of tools, libraries, and community resources for building and deploying machine learning applications. Developed by Google, it's designed to be flexible and efficient, capable of running on various platforms from CPUs to GPUs and even specialized hardware
The Weather Company enhances MLOps with Amazon SageMaker – AWS Blog
This blog post is co-written with Qaish Kanchwala from The Weather Company.
As industries begin adopting processes dependent on machine learning (ML) technologies, it is critical to establish machine learning operations (MLOps) that scale to support growth and utilization of this technology. MLOps practitioners have many options for establishing an MLOps platform; one of them is a cloud-based integrated platform that scales with data science teams. AWS provides a full stack of services to establish an MLOps platform in the cloud that is customizable to your needs while reaping all the benefits of doing ML in the cloud.
In this post, we share the story of how The Weather Company (TWCo) enhanced its MLOps platform using services such as Amazon SageMaker, AWS CloudFormation, and Amazon CloudWatch. TWCo data scientists and ML engineers took advantage of automation, detailed experiment tracking, integrated training, and deployment pipelines to help scale MLOps effectively. TWCo reduced infrastructure management time by 90% while also reducing model deployment time by 20%.
TWCo strives to help consumers and businesses make informed, more confident decisions based on weather. Although the organization has used ML in its weather forecasting process for decades to help translate billions of weather data points into actionable forecasts and insights, it continuously strives to innovate and incorporate leading-edge technology in other ways as well. TWCo's data science team was looking to create predictive, privacy-friendly ML models that show how weather conditions affect certain health symptoms and create user segments for improved user experience.
TWCo was looking to scale its ML operations with more transparency and less complexity to allow for more manageable ML workflows as its data science team grew. There were noticeable challenges when running ML workflows in the cloud. TWCo's existing cloud environment lacked transparency for ML jobs, monitoring, and a feature store, which made it hard for users to collaborate. Managers lacked the visibility needed for ongoing monitoring of ML workflows. To address these pain points, TWCo worked with the AWS Machine Learning Solutions Lab (MLSL) to migrate these ML workflows to Amazon SageMaker and the AWS Cloud. The MLSL team collaborated with TWCo to design an MLOps platform to meet the needs of its data science team, factoring in present and future growth.
Examples of business objectives set by TWCo for this collaboration are:
Functional objectives were set to measure the impact of MLOps platform users, including:
The solution uses the following AWS services:
The following diagram illustrates the solution architecture.
This architecture consists of two primary pipelines:
The proposed MLOps architecture includes flexibility to support different use cases, as well as collaboration between various team personas like data scientists and ML engineers. The architecture reduces the friction between cross-functional teams moving models to production.
ML model experimentation is one of the sub-components of the MLOps architecture. It improves data scientists' productivity and model development processes. Model experimentation on MLOps-related SageMaker services requires features such as Amazon SageMaker Pipelines, Amazon SageMaker Feature Store, and SageMaker Model Registry, accessed through the SageMaker SDK and AWS Boto3 libraries.
When setting up pipelines, resources are created that are required throughout the lifecycle of the pipeline. Additionally, each pipeline may generate its own resources.
The pipeline setup resources are:
The pipeline run resources are:
You should delete these resources when the pipelines expire or are no longer needed.
In this section, we discuss the manual provisioning of pipelines through an example notebook and automatic provisioning of SageMaker pipelines through the use of a Service Catalog product and SageMaker project.
By using Amazon SageMaker Projects and its powerful template-based approach, organizations establish a standardized and scalable infrastructure for ML development, allowing teams to focus on building and iterating ML models, reducing time wasted on complex setup and management.
The following diagram shows the required components of a SageMaker project template. Use Service Catalog to register a SageMaker project CloudFormation template in your organization's Service Catalog portfolio.
To start the ML workflow, the project template serves as the foundation by defining a continuous integration and delivery (CI/CD) pipeline. It begins by retrieving the ML seed code from a CodeCommit repository. Then the BuildProject component takes over and orchestrates the provisioning of SageMaker training and inference pipelines. This automation delivers a seamless and efficient run of the ML pipeline, reducing manual intervention and speeding up the deployment process.
The solution has the following dependencies:
In this post, we showed how TWCo uses SageMaker, CloudWatch, CodePipeline, and CodeBuild for its MLOps platform. With these services, TWCo extended the capabilities of its data science team while also improving how data scientists manage ML workflows. These ML models ultimately helped TWCo create predictive, privacy-friendly experiences that improve user experience and explain how weather conditions impact consumers' daily planning or business operations. We also reviewed the architecture design that helps keep responsibilities between different users modularized. Typically, data scientists are only concerned with the science aspect of ML workflows, whereas DevOps and ML engineers focus on the production environments. TWCo reduced infrastructure management time by 90% while also reducing model deployment time by 20%.
This is just one of many ways AWS enables builders to deliver great solutions. We encourage you to get started with Amazon SageMaker today.
Qaish Kanchwala is an ML Engineering Manager and ML Architect at The Weather Company. He has worked on every step of the machine learning lifecycle and designs systems to enable AI use cases. In his spare time, Qaish likes to cook new food and watch movies.
Chezsal Kamaray is a Senior Solutions Architect within the High-Tech Vertical at Amazon Web Services. She works with enterprise customers, helping to accelerate and optimize their workload migration to the AWS Cloud. She is passionate about management and governance in the cloud and helping customers set up a landing zone that is aimed at long-term success. In her spare time, she does woodworking and tries out new recipes while listening to music.
Anila Joshi has more than a decade of experience building AI solutions. As an Applied Science Manager at the AWS Generative AI Innovation Center, Anila pioneers innovative applications of AI that push the boundaries of possibility and guides customers to strategically chart a course into the future of AI.
Kamran Razi is a Machine Learning Engineer at the Amazon Generative AI Innovation Center. With a passion for creating use case-driven solutions, Kamran helps customers harness the full potential of AWS AI/ML services to address real-world business challenges. With a decade of experience as a software developer, he has honed his expertise in diverse areas like embedded systems, cybersecurity solutions, and industrial control systems. Kamran holds a PhD in Electrical Engineering from Queen's University.
Shuja Sohrawardy is a Senior Manager at AWS's Generative AI Innovation Center. For over 20 years, Shuja has utilized his technology and financial services acumen to transform financial services enterprises to meet the challenges of a highly competitive and regulated industry. Over the past 4 years at AWS, Shuja has used his deep knowledge in machine learning, resiliency, and cloud adoption strategies, which has resulted in numerous customer success journeys. Shuja holds a BS in Computer Science and Economics from New York University and an MS in Executive Technology Management from Columbia University.
Francisco Calderon is a Data Scientist at the Generative AI Innovation Center (GAIIC). As a member of the GAIIC, he helps discover the art of the possible with AWS customers using generative AI technologies. In his spare time, Francisco likes playing music and guitar, playing soccer with his daughters, and enjoying time with his family.
Factors of acute respiratory infection among under-five children across sub-Saharan African countries using machine … – Nature.com
A comprehensive investigation of morphological features responsible for cerebral aneurysm rupture using machine … – Nature.com
In this section, we discuss the outcomes produced by different machine learning models, aiming to compare and determine the most effective model for predicting cerebral aneurysm rupture based on 35 morphological and 3 clinical inputs. The evaluation criteria include accuracy on the train and test datasets; recall, precision, and accuracy on the test dataset; and the receiver operating characteristic (ROC) curve. Following these evaluations for each model, we discuss the most significant features identified by the models. We aim to shed light on the correlation between each parameter and the rupture status of cerebral aneurysms. This analysis provides a comprehensive understanding of the influential factors contributing to the accurate prediction of aneurysm rupture.
The main metric for evaluating model performance and enabling comparisons between different models is accuracy, which is measured on both the train and test datasets. Accuracy is defined as the ratio of correctly predicted cases to all predicted cases. It is important to note that while high accuracy is desirable, achieving 100% accuracy is not optimal, as it may indicate overfitting and a lack of generalization to unseen data. Ideally, the train and test datasets should have similar accuracy, with a recommended maximum difference of 10%. In Fig. 5, we present the accuracy results for all models. It is evident that all models can achieve an accuracy exceeding 0.70. XGB demonstrates the highest accuracy at 0.91, while KNN exhibits the lowest accuracy at 0.74. Assessing the generalizability of the models to new data, both MLP and SVM demonstrated superior performance, achieving an accuracy of 0.82 for the test dataset. This indicates that MLP and SVM outperform the other models in terms of predictive accuracy for unseen data.
Accuracy of train and test datasets.
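The accuracy definition and the train/test gap check described above can be sketched in a few lines. The labels are invented toy values, and the function names and the `max_gap=0.10` default are illustrative assumptions that mirror the 10% guideline rather than anything taken from the authors' code.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def gap_is_acceptable(train_acc, test_acc, max_gap=0.10):
    """True when train accuracy exceeds test accuracy by at most max_gap."""
    return (train_acc - test_acc) <= max_gap


# Invented toy labels (1 = ruptured, 0 = unruptured).
train_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
train_pred = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # two mistakes
test_true = [1, 0, 1, 0]
test_pred = [1, 0, 0, 0]                     # one mistake

train_acc = accuracy(train_true, train_pred)
test_acc = accuracy(test_true, test_pred)
print(train_acc, test_acc, gap_is_acceptable(train_acc, test_acc))
```

A model that scores far higher on the training data than on the test data fails this check, which is the overfitting signal the paragraph warns about.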
In addition to accuracy, we included precision and recall as important metrics to comprehensively evaluate model performance. We made this decision due to the sensitivity of the medical data under consideration, emphasizing the importance of timely disease recognition. In simple terms, recall measures the model's ability to correctly identify the presence of a disease. Recall is defined as the ratio of true positive predictions to the total number of actual positive cases. Similarly, precision reflects the model's ability to accurately predict positive occurrences. Precision is defined as the ratio of true positive predictions to the total number of predicted positive cases.
In the medical context, recall holds particular significance, but accuracy and precision should not be overlooked, as they collectively contribute to overall model efficacy. Figure 6 presents the evaluation of all three metrics (accuracy, precision, and recall) for the test dataset, with a specific focus on the ruptured class, representing the occurrence scenario in our study. SVM and MLP are the top-performing models once again. The results show that SVM and MLP have high recall rates of 0.92 and 0.90, respectively, in predicting the occurrence of cerebral aneurysm rupture. SVM also has an accuracy and precision of 0.82, whereas MLP has a precision of 0.83 and an accuracy of 0.82. In contrast, RF performed relatively poorly in all three criteria. However, it is noteworthy that even for RF, all performance metrics for the test dataset exceeded 0.75, indicating a high level of predictive capability.
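The two definitions translate directly into code. This is a minimal sketch on invented labels, with the ruptured class treated as the positive class. The function names are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not something the study specifies.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def precision(y_true, y_pred, positive=1):
    """True positives divided by all predicted positives."""
    tp, fp, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred, positive=1):
    """True positives divided by all actual positives."""
    tp, _, fn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Invented toy labels (1 = ruptured, 0 = unruptured).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
print(precision(y_true, y_pred), recall(y_true, y_pred))
```

In this toy case one ruptured aneurysm is missed (lowering recall) and two unruptured ones are flagged (lowering precision), which shows why the two metrics can diverge.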
Accuracy, precision, and recall for the test dataset.
Another metric used for evaluation is the ROC curve, which illustrates the true positive rate versus the false positive rate. Linear behavior, where the true and false positive rates are equal, represents a random classifier. As the model improves, the curve shifts toward the upper-left point. An ideal model would have a true positive rate of 1 and a false positive rate of 0. The area under the curve (AUC) is a representative measure of the model's performance, with an AUC of 0.5 indicating a random classifier and an AUC of 1 indicating an ideal classifier. Figure 7 presents the behavior of the ROC curve for each model, along with the corresponding AUC. Based on these criteria, SVM and MLP are the top-performing models engaged in close competition. Their ROC curves exhibit a favorable trajectory, and their AUC values affirm their strong performance. Conversely, RF demonstrates a comparatively poorer performance than the other models. In summary, all models demonstrate highly acceptable performance and scores. Optimizing these models to improve their reliability and effectiveness in predicting cerebral aneurysm rupture represents a valuable endeavor.
Receiver operating characteristic (ROC) curve for all models.
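As a rough illustration of how an ROC curve and its AUC are obtained, the sketch below sweeps the decision threshold over classifier scores and integrates with the trapezoidal rule. The function names, labels and scores are hypothetical, and the code assumes binary 0/1 labels with both classes present; it is not the study's implementation.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by lowering the decision threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


# Hypothetical labels (1 = ruptured) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_points(labels, scores)
auc_value = auc(fpr, tpr)
print(f"AUC = {auc_value:.3f}")
```

An AUC near 0.5 corresponds to the diagonal of a random classifier, while a value of 1 corresponds to a curve passing through the upper-left corner.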
Given that each machine learning model employs a unique set of algorithms and mathematical relations, a difference in the weight assigned to each parameter for the final classification decision is expected. Figure 8 displays the weights of each parameter for the two top-performing models in this study. The SVM model identifies the first five dominant features as EI (Ellipticity Index), SR (Size Ratio), I (Irregularity), UI (Undulation Index), and IR (Ideal Roundness), a new parameter introduced in this study. The MLP model, on the other hand, prioritizes EI, I, Location, NA (Neck Area), and IR, with IR once again demonstrating a significant impact.
Dominant Features for the two top-performing models.
Other novel parameters introduced in this study include NC, IS, ON, IRR, COD, ISR, and IOR, which occupy positions 6, 9, 13, 19, 27, 30, and 36, respectively, for SVM. For the MLP model, the order of these new parameters is IR (5), NC (7), ON (18), IS (24), ISR (27), COD (32), IOR (34), and IRR (38). Notably, some parameters for the MLP model exhibit negative values, indicating an inverse effect on the model's prediction and an inverse correlation with the output. It is important to acknowledge that this pattern may vary depending on the architecture used for the MLP model.
One potential question that may arise in this study is whether bifurcation aneurysms are more prone to rupture than lateral aneurysms, based on physicians' experience. However, our study does not show a significant contribution from this factor. This discrepancy does not imply that the bifurcation and lateral status are insignificant. Instead, it highlights that when other features are considered alongside this parameter, there is a stronger correlation among other parameters than with this specific one. Essentially, by expanding our input variables and making decisions based on more comprehensive information, we uncover the significance of parameters that may not have been previously considered. Thanks to modern machine learning models, it is now possible to compare several parameters simultaneously and discern the contribution of each in relation to others. This approach allows for more reliable decision-making by considering a broader set of factors and better understanding the complex interplay of variables that contribute to the prediction of cerebral aneurysm rupture.
We now undertake a brief comparison between prior research and the current study, focusing specifically on the testing datasets used across all studies. To facilitate this analysis, we refer to Table 3, which presents the outcomes of six comparable studies alongside those of our own investigation. As previously indicated, we endeavored to incorporate a comprehensive array of morphological parameters to ensure the robustness of our findings.
As the scope of parameters considered expands, shifts in the relative importance assigned to each parameter are anticipated. Furthermore, increasing the size of the dataset can enhance the reliability of the results. Among the parameters of significance, the size ratio emerges as a recurrent focal point, underscoring its inherent importance in assessing the risk of rupture. Once more, we underscore the significance of the recall score, given the sensitivity inherent in medical data. It is noteworthy that our study achieves an outstanding recall score, a metric that is unfortunately absent from prior studies, thus limiting direct comparison.
AI tools to be developed for DARPA by Lockheed Martin – Military Embedded Systems
News
July 09, 2024
Technology Editor
Military Embedded Systems
LITTLETON, Colorado. Lockheed Martin will develop artificial intelligence (AI) tools for dynamic airborne missions under a $4.6 million contract from the Defense Advanced Research Projects Agency (DARPA), the company announced in a statement.
This initiative is part of DARPA's Artificial Intelligence Reinforcements (AIR) program, which aims to enhance modeling and simulation (M&S) approaches and create AI agents for live, multi-ship, beyond visual range (BVR) missions, the statement reads.
The AIR program seeks to improve the speed and predictive performance of government-provided baseline models to better reflect real-world Department of Defense systems. Over an 18-month period, Lockheed Martin will apply AI and machine learning (ML) techniques to develop surrogate models of aircraft, sensors, electronic warfare, and weapons in dynamic and operationally representative environments, the statement reads.
Lockheed Martin will leverage its ARISE infrastructure to deliver significant data, allowing service members to make faster and more informed decisions, the company says.
European beech spring phenological phase prediction with UAV-derived multispectral indices and machine learning … – Nature.com
Phenological data historical overview
In the observation period from 2006 to 2020, the process for spring leafing out observations typically began before budburst, at the point when buds began swelling (phase 0.5). Figure 3 shows a comprehensive overview of the yearly averaged spring phenological observations from 2006 to 2020 for the Beech plot, with a maximum between-year variation of 40 days in the start and 20 days in the end of all phases.
The phenological spring phase development for Beech at the Britz research station between 2006 and 2020. As shown in the figure, the timing of phenological phases can vary considerably over the years, due to a variety of climatic factors.
The analysis of the duration between different phenological phases is crucial for understanding two key aspects: first, the timing of budburst in relation to climate change impacts, and second, the progression to later stages, such as phase 4 and phase 5, when leaves are nearing full development. The "hardening" of leaf cell tissues, which occurs at these later stages, renders the leaves less vulnerable to late frosts, intense early spring solar radiation, and biotic pests such as Orchestes fagi. Additionally, in early spring drought conditions, certain phases may be delayed, extending the development period from phases 1.0 to 5.0. This phenomenon was observed at the Britz research station in 2006, 2012, 2015, and 2019.
Figure 4 in the study visually illustrates the variability in phase duration from 2006 to 2020, which ranged from 23 to 41 days. Meanwhile, Table 5 offers a comprehensive summary with descriptive statistics for the length of time between phases. The phase lengths presented in Fig. 4 and Table 5 are derived from the average timings across all sampled beech trees in the phenology plot. For more accurate predictions of other phases based on a single observed phase, it might be more effective to model using data from individual trees, given the significant heterogeneity that can exist among them during the spring phenological phases. Further research in this direction is warranted to explore these possibilities.
The average spring phenological phases at the Britz research station shown in length between phase 1 and 5 from years 2006 to 2020.
Trends show an earlier onset of phase 1.0 (see Fig. 5; left), as well as phase 5.0 (see Fig. 5; right). A gradual increase in average yearly air temperature (see Fig. 6; left) is also evident, alongside a steady decrease in yearly precipitation (Fig. 6; right).
(left) Yearly linear trend in phenological phase 1.0; (right) Yearly linear trend in phenological phase 5.0.
(left) Yearly linear trend of average air temperature between 2006 and 2020; (right) Yearly linear trend of average precipitation between 2006 and 2020. Both are results from the Britz research station.
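A yearly linear trend of this kind can be estimated with an ordinary least-squares line through the yearly values. The sketch below shows the calculation; the onset days are invented for illustration and are not the Britz observations, and the function name is an assumption.

```python
def linear_trend(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical day-of-year values for the onset of phase 1.0.
years = [2006, 2008, 2010, 2012, 2014]
onset_doy = [120, 118, 117, 115, 113]
slope, intercept = linear_trend(years, onset_doy)
print(f"trend: {slope:.2f} days per year")
```

A negative slope corresponds to an earlier onset over time; the same function applies to yearly air temperature or precipitation series.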
Several of the trees used for phenological observations at the research site are equipped with electronic band dendrometers and sap flow measurement devices. Figure 7 depicts the relationship between the phenological phases and the onset of stem growth for tree number 328 during the growth season. Notably, in both 2017 and 2018, the onset of stem diameter growth in this tree coincided with the achievement of phase 3.0, which is marked by the emergence of the first fully unfolded leaves.
Spring phenological phases shown in relation to band dendrometer measurements from 2017 (left) and 2018 (right). Stem growth typically began around the arrival of phase 3.0.
The dendrometer data from 2018 reveal significant fluctuations in growth deficit throughout the growth season. These fluctuations align with the prolonged drought conditions reported in that year, as documented by Schuldt et al.45. This correlation highlights the impact of environmental factors, such as drought, on the growth patterns and phenological development of trees, providing valuable insights into the interplay between climatic conditions and tree physiology.
The analysis of the phase and foliation datasets is further elaborated through the histograms presented in Fig. 8. These histograms exhibit a distinct bimodal distribution, characterized by noticeable left- and right-skewed distributions on the tail ends. This pattern arises from a typical surplus of observations occurring before phase 1.0, which is primarily due to the intensified frequency of observations in anticipation of budburst. Additionally, the extended duration between phases 4.0 and 5.0 contributes to this bimodal distribution. This phenomenon highlights the uneven distribution of observations across different phenological phases, influenced by the varying rates of development and the specific focus of the observation periods.
Histograms showing a distinct bimodal distribution of the phase and foliation ground observations from 2019 and 2020.
Due to the spectral reflectance characteristics of vegetation, visible bands tend to show a positive correlation among each other, whereas the NIR band shows a negative correlation (Mather & Koch, 2011). All the vegetation indices, whether derived from visible or NIR bands or a combination thereof, have a positive correlation with the phase and foliation datasets except for the NDWI, which typically has an inverse relationship with the phases and foliation (see Fig. 9). The most consistent index throughout all datasets, whether originating from single or combined years, is evidently the NDVI with a persistent correlation of r>0.9 (p<0.001) over all datasets.
Spearman correlation analysis of the spectral indices derived from the 2019 and 2020 datasets in relation to the ground observations.
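For orientation, Spearman's coefficient is the Pearson correlation of the ranks of the two variables, with tied values sharing their average rank. The sketch below uses made-up index and phase values (not the study's measurements), and the function names are assumptions.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical index values per observation and the observed phases.
ndvi = [0.21, 0.35, 0.48, 0.62, 0.80, 0.83]
ndwi = [0.40, 0.31, 0.22, 0.15, 0.05, 0.02]
phase = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
rho_ndvi = spearman(ndvi, phase)   # monotonically increasing with phase
rho_ndwi = spearman(ndwi, phase)   # monotonically decreasing with phase
print(rho_ndvi, rho_ndwi)
```

Because only ranks enter the calculation, any strictly monotonic relationship yields a coefficient of magnitude 1, regardless of its shape.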
Indices derived from visual bands (i.e., GCC and NGRDI) showed a correlation of r=0.65 (p<0.001), and those uncalibrated were even poorer. Interestingly, the AIRTEMP meteorological-based feature correlated very well with the ground observations (r=0.9; p<0.001), with a very high correlation coefficient to the phenological phases at r=0.95 (p<0.001).
In terms of correlation among independent features (see Fig. 10), the aim was to refrain from implementing highly correlated features when multiple independent features were incorporated into the modeling process. This could be especially problematic when multiple indices are derived from the same bands (i.e., NDVI and EVI). Here, we could deduce that the NDREI and GCC, when used together for the modeling process, have a lower correlation (r=0.73) and do not share any similar bands. Likewise, the NDRE and the NDWI do not share the same bands and have a negative correlation coefficient of r=-0.8. The NDWI and the GCC share only the green band and correlate negatively at r=-0.74.
Between-variable Spearman correlation assessment of the 2019/2020 features.
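One simple way to act on such a between-variable assessment is a greedy filter that keeps a feature only if its absolute correlation with every feature already kept stays below a cutoff. This is a sketch under assumptions: the 0.9 cutoff, the function name and the correlation values below are hypothetical, and the study did not rely on correlation filtering alone.

```python
def select_uncorrelated(features, corr, max_abs_r=0.9):
    """Greedily keep features whose |r| with all kept features is <= max_abs_r."""
    kept = []
    for name in features:
        if all(abs(corr[frozenset((name, other))]) <= max_abs_r for other in kept):
            kept.append(name)
    return kept


# Hypothetical pairwise correlations, keyed by unordered feature pair.
corr = {
    frozenset(("NDVI", "EVI")): 0.98,
    frozenset(("NDVI", "GCC")): 0.70,
    frozenset(("EVI", "GCC")): 0.68,
}
selected = select_uncorrelated(["NDVI", "EVI", "GCC"], corr)
print(selected)
```

The result depends on the order in which features are offered, so in practice the list would be ordered by some measure of relevance first.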
In analyzing the use of correlation for feature selection, it is important to note that while this method is informative, particularly for evaluating multicollinearity, it can potentially be misleading. This is because correlation coefficients might be artificially high due to the bimodal influence on the dataset. The aggregation of data points at the tail ends of the distribution results in a biased similarity caused by an oversampling of similar phases, thus leading to high correlation coefficients. Consequently, correlation filtering methods were not the sole reliance for feature selection, as outlined by Chandrashekar and Sahin46. This approach recognizes the limitations of using correlation analysis in isolation, especially in datasets with unique distribution characteristics such as the one described here.
The addition of polynomial terms into regression models can aid in the characterization of nonlinear patterns43 and is conducive to representing phenological trends, particularly those of the spring green-up phases. As polynomial fitting may not be capable of identifying the complexities of phenology metrics in comparison to other algorithms47,48, we used the fitting of polynomials here for the purpose of feature selection, where the aim was to identify which features best correspond to the typical spring phenology curve. Figure 11 shows the fitting of the five polynomial orders using the example for the NDVI, resulting in an RMSE of 0.55, MAE of 0.41 and R-squared of 0.91. Here, the third polynomial order was deemed the best choice for further analysis where the curve is not oversimplified or too complex.
Modelling of the spring phenological phases (2019/2020) dataset with polynomial regression of the first to fifth order.
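To make the polynomial-fitting step concrete, the sketch below fits a third-order polynomial by least squares (normal equations solved with Gauss-Jordan elimination) and reports the RMSE of the fit. The NDVI and phase values are invented for illustration, the helper names are assumptions, and a numerical library would normally replace the hand-written solver.

```python
import math


def polyfit(x, y, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] via normal equations."""
    m = degree + 1
    # Augmented system (X^T X | X^T y) for the monomial basis 1, x, x^2, ...
    a = [[sum(xi ** (r + c) for xi in x) for c in range(m)]
         + [sum(yi * xi ** r for xi, yi in zip(x, y))]
         for r in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[r][m] / a[r][r] for r in range(m)]


def polyval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical NDVI values and observed phases.
ndvi = [0.25, 0.30, 0.42, 0.55, 0.68, 0.78, 0.84, 0.88]
phase = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 4.5, 5.0]
coeffs = polyfit(ndvi, phase, degree=3)
fit_rmse = rmse(phase, [polyval(coeffs, v) for v in ndvi])
print(f"third-order RMSE = {fit_rmse:.3f}")
```

Repeating the call with `degree` set from 1 to 5 gives the comparison across polynomial orders; higher orders always fit the training points at least as well, which is why the choice of order weighs fit against curve complexity.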
To follow, each of the selected individual features was tested with the 3rd-order polynomial separately for the 2019/2020 and 2020/2021 datasets for both phase (Fig. 12) and foliation (Fig. 13). In terms of the phenological phases, the GNDVI shows quite a low dispersal of RMSE for the 2019/2020 dataset, yet the dispersal is higher for the 2020/2021 dataset. A similar result is evident for the NDVI, where less dispersal is found in the 2020/2021 dataset than in the 2019/2020 dataset. The cumulative warming days (AIRTEMP) as well as the indices derived from the uncalibrated visible bands (GCC_UC and NGRDI_UC) fared poorly for both datasets. This was also the case for foliation; however, AIRTEMP performed better for the 2019/2020 dataset. Regarding foliation, the NDVI also performed well for the 2020/2021 dataset, as did the NDREI for both datasets.
Overview of the spring phenological phases and indices modelled with third-order polynomial regression for the 2019/2020 (left) and 2020/2021 (right) datasets.
Overview of spring foliation and indices modelled with polynomial regression of the third order for the 2019/2020 (left) and 2020/2021 (right) datasets.
Based on the results of the correlation analysis and polynomial fitting, we were able to select the most relevant features for further scrutiny during the subsequent modeling process. It is important to note here that using only the correlation analysis in the initial feature selection process could have produced an unseen bias due to an aggregation of data points at the tail ends of the datasets, which was especially evident for the 2019/2020 dataset. We proceeded to build three models based on ML algorithms that aided in choosing the best performing algorithms as well as features. Each of the selected individual and combined indices was modeled with each algorithm and evaluated using an 80/20 training/validation data split. This not only helped in choosing the best ML algorithm but also assisted in a type of model-based feature selection by further narrowing down the selected features. In terms of the phenological phases, an RMSE of 0.5 (0.6) is deemed acceptable and similar to the magnitude of potential human error. For the Britz method of foliation, an RMSE of 10% is assumed to be acceptable; however, some may argue that an RMSE of 5% in terms of foliage observations is possible with ground observations. Here, it should be noted that the Britz method of foliation is based on the percentage of leaves that have fully opened rather than fractional cover or greening-up.
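The evaluation scheme described here, an 80/20 split followed by RMSE, MAE and R-squared on the held-out part, can be sketched as follows. The function names, the fixed random seed and the toy values are assumptions for illustration; the sketch does not reproduce the study's models or data.

```python
import math
import random


def train_validation_split(samples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a validation fraction."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical sample identifiers split 80/20.
train_ids, validation_ids = train_validation_split(range(10))

# Hypothetical observed and predicted phases for a validation set.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

A model would then be judged against the stated acceptance levels by comparing its validation RMSE with the chosen cutoff for phases or foliation.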
Regarding the phenological phases, the GAM boosting algorithm showed the best results overall (see Table 6). The GAM models with the features NDREI+GCC resulted in an RMSE of 0.51, MAE of 0.33 and an R-squared of 0.95. The feature combination of NDWI+GCC resulted in an RMSE of 0.46, MAE of 0.3 and R-squared of 0.96. The top performing model was that of GAM boosting with the NDVI, which produced an RMSE of 0.28, MAE of 0.18, and R-squared of 0.98. The second-best performing model was that of the GAM model with the NDRE+NDWI input features, resulting in an RMSE of 0.44, MAE of 0.31 and R-squared of 0.96. Interestingly, the uncalibrated GCC (GCC_UC) outperformed the calibrated GCC with an RMSE of 0.73 for gradient boosting and the GCC_UC index as opposed to an RMSE of 0.81 for GAM boosting and the GCC.
At this stage of the modeling process, the GAM boosting algorithm with the NDVI showed very good results (RMSE=0.28), and the question here is whether the model is overfit to the Britz research station beech stand (Table 7). At this point, it is imperative to test the models with unseen data and assess which ones are generalizable over various beech stands, especially those of increased age. In terms of the models derived from indices from the visual bands, the uncalibrated GCC performed slightly better than the radiometrically calibrated GCC and better than some of the models derived from the calibrated multispectral bands, which is particularly interesting, as RGB sensors are typically available at a much lower price.
For the most part, all models failed the 10% cutoff point except for those using the NDVI as an input feature. Both the NDVI-based GAM boosting and gradient boosting models obtained an RMSE of 7%, MAE of 4% and R-squared of 0.98. Here, overfitting could also be a factor; however, it will still be interesting for further model assessment of the prediction of foliation on a new dataset (2022) as well as datasets outside of the Britz research station. The worst performing models were those utilizing the radiometrically calibrated GCC, which acquired an RMSE of 22%, MAE of 16%, and R-squared of 0.92.
With the aim of testing the robustness and generalizability of the developed models, new data from 2022 as well as data from different forest stands (beech) were introduced (Table 7). Here, we tested the models on new spring phenological data from the same stand from 2022 (n=17) as well as an older beech stand in Kahlenberg (n=10) located in the same region as the Britz research station and a beech stand in the more mountainous region of the Black Forest (n=8) in southwestern Germany. The three test datasets are limited to only one epoch, where the Kahlenberg site is comprised of mostly later phases and the Britz and Black Forest datasets have a wide range of earlier phases (<4.0). Additionally, training datasets were divided into three different subdivisions based on the year of origin: 2019/2020, 2020/2021 and all datasets together (2019–2021). This was carried out for the purpose of distinguishing whether data acquisition methods from a certain year contributed to error propagation. For example, the 2019 field data were collected by a different observer and often not recorded on the same day as flights (3 days), and the 2019 radiometric calibration was of low quality. The models chosen for testing were those implementing GAM boosting and the RGB-derived indices GCC (Micasense Altum) and GCC_UC (Zenmuse X7) and the NDVI (Micasense Altum). Table 8 displays a list of all the tested models with reference to the applied index, location, training data subdivision and date.
The results of the model testing of the phenological phase prediction (see Fig. 14) and foliation (see Fig. 15) were ranked in order of the RMSE. Notably, all the models of the phenological phase prediction that achieved the 0.5 threshold (left of green dotted line) were those of the calibrated and uncalibrated GCC, which originate from bands of the visible portion of the electromagnetic spectrum. Five of six of these models were from the Kahlenberg dataset, and one was from the Black Forest dataset. The best performing models were selected for each of the test sites and are mapped out in Figs. 16, 17, 18, 19. All image data acquired for the test sites with Zenmuse X7 lack radiometric calibration except for the Britz dataset (see Fig. 19), which was acquired with both the X7 and radiometrically calibrated Micasense Altum data.
Graph showing the RMSE for the phase prediction ranked in order from poorest to best RMSE. The green dashed line depicts the cut-off point of acceptable accuracy. Allowing an RMSE of up to 0.6 would admit the NDVI model derived from the multispectral datasets. Otherwise, only models originating from the visible bands are considered operational.
Graph showing the RMSE for foliation prediction ranked in order from poorest to best. The green dashed line depicts the cut-off point of 10%. None of the models for foliation prediction are considered functional.
Phase prediction of an older beech stand (>100 years) utilizing the model originating from the uncalibrated GCC 2020/2021 dataset. The very low RMSE of 0.22 proves a highly generalizable model; however, it should be noted that this is a relatively small dataset (n=10) and comprised of only later phases (>3.0). The ML phase is the predicted phase, and the Phase originates from ground-based observations.
Phase prediction of a beech stand (<70 years) utilizing the model originating from the calibrated GCC 2019/2020 dataset. The Black Forest dataset is particularly challenging, as a wide range of phases are available. An RMSE of 0.43 is within the accepted error cut-off of 0.5.
Phase prediction of a beech stand (47 years) utilizing the model originating from the calibrated GCC 2020/2021 dataset. Despite being a larger dataset (n=17) in comparison to the other test sites, an RMSE of 0.54 was achieved, which can be regarded as achieving the 0.5 threshold.
Phase prediction of a beech stand (50 years) utilizing the model originating from the calibrated NDVI 2020/2021 dataset. This is the only model derived from the nonvisible band (NIR) that is in proximity to the 0.5 threshold (RMSE=0.61). CIR=Color-infrared.
The Kahlenberg dataset (see Fig. 16) with the gcc-uc-2021 model resulted in a very low RMSE of 0.22, MAE of 0.16 and R-squared of 0.08 (n=10). Such a low RMSE for an uncalibrated RGB-based model is an unexpected result here and shows that the later phases, in particular phase 4.0, predict well. Phase 4.0 is a significant phase in the spring green-up, as it corresponds to the completion of all leaf and shoot development. The transition to Phase 5.0 would then follow with the hardening of leaf tissue alongside a change to darker green and increased late-frost hardiness.
Regarding the Black Forest dataset with the bf-gcc-19-20 model, an RMSE of 0.43, MAE of 0.32, and R-squared of 0.02 (n=8) were achieved (see Fig. 17). Here, a scene with a wide range of phases (0.9–3.8) was available, and a successful phenological phase prediction was possible with the calibrated GCC model and training data from 2019 and 2020. It is important to note that the radiometrically calibrated GCC model was used to predict the GCC, which is derived from the noncalibrated Zenmuse X7. Significant here is that sensor mixing in terms of model training with the multispectral sensor and prediction with a consumer grade RGB sensor is attainable. We considered the low R-squared as insignificant due to the overall low sample rate of the test datasets.
The Britz dataset (see Fig. 18) also implemented the GCC and 2019/2020 training model (br-gcc-19-20) and resulted in an RMSE of 0.54, MAE of 0.45 and R-squared of 0.65 (n=17). The Britz test dataset possesses more samples than the other test sites and achieves the 0.5 threshold. This test dataset, however, comprises the same trees as those in the training dataset, providing the model with an advantage at the Britz test site. This advantage might not extend to other test sites, potentially limiting the model's ability to generalize well in different settings.
With respect to the test sites involving phase prediction from the multispectral sensor (Micasense Altum), only the Britz and Kahlenberg sites were available. The only NDVI-based model that was in proximity to the 0.5 threshold was the Britz test dataset (br-ndvi-20-21), with an RMSE of 0.61, MAE of 0.52, and R-squared of 0.58 (n=17). We hypothesized that the radiometric calibration methods from 2019 would influence the model accuracy; however, there was only a marginal difference in the RMSEs of the 2019/2020 and 2020/2021 datasets.
Overall, the best performing and most consistent model for predicting the spring phenological phases was the calibrated GCC model trained on the 2019/2020 dataset. This model (gcc-uc-19-20) demonstrated strong generalization across all test sites, including the Black Forest (bf-gcc-19-20) and Kahlenberg (ka-gcc-uc-19-20), with the highest RMSE observed at the Britz (br-gcc-uc-19-20) 2022 test site (RMSE=0.54). For a visual representation of the model's performance, please refer back to Fig. 14.
This research highlights the challenges in obtaining radiometrically calibrated datasets over multiple growing seasons, despite pre- and post-mission calibration panel acquisition and DLS data usage. Issues arise when reflectance values bottom out, such as during the calculation of NDVI or other indices involving the NIR band, which occurs when clouds temporarily clear during flight missions, exposing the terrain to direct sunlight. This issue of oversaturation in the NIR band was also reported by Wang41. While the DLS compensates for fluctuations in irradiance, it is effective only for global changes in lighting conditions. The problem is exacerbated in dense forests, where obtaining shadow-free reference panels is nearly impossible, and capturing calibration data at different locations before and after missions is impractical. This could result in time differences from the actual flight mission, during which considerable changes in solar angle might occur.
The size of the reflectance panels also impacts the difficulty of radiometric calibration. Honkavaara et al.49 showed a better calibration for larger, custom-made reference panels of 1 × 1 m than with the manufacturer's provided method. Some studies have also demonstrated improved calibration methods using even larger reflectance tarps50,51,52. However, this does not alleviate the problem of acquiring calibration data in dense forests or the previously mentioned sudden changes in illumination. Therefore, further testing and development of improved field radiometric calibration strategies are imperative to more effectively utilize multispectral sensor capabilities.
Despite the challenges with multispectral sensors, particularly in the NIR band, the utility of the RGB bands is notable. Low-cost UAV setups with RGB sensors are widely available, facilitating the collection of vast data. This high data volume is crucial for developing models for various tree species in intensive monitoring plots. A key question is whether training data for models derived from visible bands need calibration from the multispectral sensor. In this case, the model trained with calibrated GCC generalized well with the uncalibrated GCC, but it remains to be seen if this holds true for new datasets and other tree species.
Errors can also arise from crown segmentation in pixel value extraction. For instance, branches from a neighboring tree with earlier phenological onset could overlap into the segmented crown area of the target tree. As segmentation is typically performed with a fully developed canopy (after phase 5.0), such overlapping branches are challenging to account for. Recording influential branches from neighboring trees during ground observations and excluding them from training datasets could improve the quality of training data.
The feature selection process in this research, especially partitioning training datasets by year for testing, was effective. It allowed for scrutinizing and removing training data portions that could affect model generalizability. For instance, the br-ndvi-20-21 derived from multispectral sensors excludes the 2019 dataset due to its lower quality radiometric calibration, time differences between observations, a slightly different multispectral sensor, and a different observer for ground observations. Conversely, the gcc-19-20 models generalized well with the 2019 datasets incorporated, using only bands from the visible spectrum. This suggests that the main factors in error propagation lie in the quality of radiometric calibration and sensor mixing with NIR bands, a conclusion that might not have been apparent without partitioning training by year. Interestingly, sensor mixing does not seem to be an issue with RGB imagery, which is advantageous for acquiring large data volumes.
Incorporating meteorological data, such as warming days (AIRTEMP), as a model feature suggests that other factors, such as a dynamic start date and chilling days, should also be considered for a successful phenological model in fusion with spectral data. However, this concept is somewhat limited, as meteorological data at the individual tree level might not explain the heterogeneity of individual trees in phenological development. The fusion of meteorological and spectral data is more suited for larger-scale applications, where phenological data are applied standwise rather than at the individual tree level.
Regarding the Britzer foliation method, translating ground observations into remote sensing data was not feasible. Consequently, the Britzer method of foliation has been abandoned at the Britz research station and replaced with the ICP Forests flushing method. Currently, the long-term Britzer phase method, alongside the flushing method, is conducted with the aim of simplifying observations and enabling harmonization of Britz research station data with the ICP Forests network at the international level.
From overlooked to overachiever: Using AI to drive worker mobility – Human Resource Executive
In the last few years, as the labor market tightened, companies increasingly turned inward, searching within their ranks for hidden gems of talent.
The strategy, although well-intentioned, often fell short of its potential. HR managers and department heads rolled out skills assessments, performance evaluations and career development programs in a bid to squeeze out more productivity and promote from within.
A knock-on effect was, in the best of cases, a more spirited corporate culture and a boost in loyalty and retention, as employees felt seen and valued.
However, let's be honest: Little headway was made. Rather than identifying internal prospects for promotion, these approaches were best suited to measuring what employees did well in the jobs they were in.
On top of that, these efforts to advance worker mobility were costly, time-consuming and challenging to scale throughout the organization. Frustrated leaders often resorted to a default tactic: hiring externally. Place job ads and watch the resumes roll in.
The grow-your-own push seemed to be fading. Data showed that worker mobility has been waning for years, partly because companies fail to benchmark metrics that produce better outcomes.
But there's a more promising approach to finding these hidden gems: leveraging technology, specifically the hot new capabilities unlocked by artificial intelligence, machine learning and large language models.
This tech combination could be a perfect fit for companies, especially medium-sized ones seeking to boost their workforce's productivity. What's key is that the AI combo can process reams of valuable information produced by workers in a variety of interactions that we tend to overlook: presentations, video conferences, reports and emails.
Leaders can discover talents and capabilities obscured by what a worker is currently being asked to do. Data is produced in real time, continuously updated and can be analyzed on the spot.
With AI engines capable of crunching that large volume of data, comparisons among employees become easier for managers. Some workers may need more training. Others are ripe for promotion. As a workforce development tool, it is very empowering for employees.
Here are examples of what AI can uncover on the job:
In addition to analyzing hard skills, AI can also home in on soft skills, which are in high demand. This technology can scrutinize and benchmark problem-solving, collaboration, critical thinking and communication skills. Recent developments in LLMs, particularly with omnichannel models, show new applications in scaling coaching for employees in these essential skills. By leveraging personalized engagement and AI-driven insights, organizations can provide tailored coaching and development opportunities that were previously impractical at scale.
Because AI, machine learning and large language models can review most work done on the job, employees' true skill sets are uncovered. These are the hidden gems that can bring overlooked value to companies. It might turn out that a reticent programmer has the gift of persuasion. An entry-level worker may possess exceptional math skills. And so on.
Companies can also discover new information in their industry. For example, AI might reveal an uptick in the demand for prompt engineers in job ads or indicate a decline in capital expenditures in quarterly reports, potentially signaling a recession.
A challenge for companies considering using this AI combo is how to implement it. Up-front effort is required. It works like this: An organization starts with a pre-trained model with generic data and then spends four to six weeks calibrating it so the information captured is specific to that entity. Once the knowledge engine starts learning the company's data, leaders can expect a high level of accuracy in picking up the presence and proficiency levels in the skills of each employee.
A misunderstood part of AI that I consider good news is the fact that it will end up helping, not hurting, workers. After all, it can pinpoint hard and soft skills, leading to their development and promotion within a company.
In that way, AI benefits both employer and employee. As I like to say: The employer is sitting on the best evidence of what an employee is capable of.
ARG Introduces Market Insights on AI-Enabled Communications – GlobeNewswire
MCLEAN, Va., July 09, 2024 (GLOBE NEWSWIRE) -- Expanding on its growing portfolio of exclusive market insights and decision guides, ARG announced the availability of the AI-Enabled Communications Market Insights & Decision Guide, a comprehensive report based on information gathered from hundreds of meetings with ARG clients, and created in partnership with 8x8, Dialpad, Fusion Connect, and Vonage.
The AI-Enabled Communications Market Insights and Decision Guide examines how businesses can maximize their investments in enterprise software when they leverage communication platforms that utilize artificial intelligence (AI) and machine learning (ML). Among the report's key findings:
"Generative AI and machine learning are changing how people work," said Jason Hart, Managing Partner, ARG. "Choosing a technology partner that can deliver a communications platform with the right native AI integrations is pivotal to maximizing efficiencies and achieving other business benefits such as improved decision-making, an enhanced customer experience, increased productivity, cost savings, and revenue growth."
Innovation provides businesses with the power to unlock their full potential
The migration from on-premises to cloud-based communication infrastructure has facilitated the implementation of the impactful AI that benefits businesses today. For example, organizations can now leverage AI to solve problems and break down silos, mine data for deeper customer interactions, automate administrative tasks and decision-making, provide customized training, collect real-time data and insights to predict customer needs, and monitor customer interactions.
"The advantages of AI-enablement go well beyond the employee and customer experience," said Jeff Milford, Product Lead, UC and CC, ARG. "There are significant efficiencies to be gained by integrating an AI-enabled communications platform with line-of-business applications such as CRM, ERP, digital workflow management, and helpdesk. Through native integrations, transcription, and automation, these implementations can directly impact nearly every part of an organization, helping these businesses to unlock their full potential."
Technology advances maximize efficiency and improve customer engagement
According to the Market Insights and Decision Guide for AI-Enabled Communications, there are several areas where AI excels:
Discovering the right AI-enabled communications platform with ARG
ARG has a well-established framework to help organizations select the right AI-enabled communications platform and map their desired business outcomes and software investments to service providers' most robust integration and performance criteria. With a vast client base, organizations can benefit from valuable insights into negotiating better deals with suppliers, reducing procurement costs, and improving vendor management. ARG can also manage the entire procurement life cycle, service management, and vendor and account management at no additional cost.
The AI-Enabled Communications Market Insights & Decision Guide provides an overview of how generative AI is used to enhance customer experience, the impact of SaaS integrations with AI on further innovation, the departmental implications of AI across the organization, and peer insights and successful use cases demonstrating the deployment of AI-enabled communications platforms. For ARG's entire resource library, visit https://www.myarg.com/resources/.
About ARG
There are two problems in the IT market: the first is the overwhelming choice, and the second is the pace of change. Companies are afraid of making the wrong choice or not choosing the latest technology because they are simply not aware of it. For 32 years, ARG has helped over 4,000 companies make the right choice from thousands of options and bleeding-edge new products. We call it IT Clarity; our clients call it brilliant. To learn more about ARG, contact info@myarg.com.
AI and Big Data Governance: Challenges and Top Benefits – AiThority
Artificial intelligence (AI) and big data share a symbiotic relationship. One of the primary challenges in implementing big data governance is ensuring data awareness and understanding across the organization. Data governance initiatives often fail when stakeholders are not aware of the importance of data governance or lack the knowledge to implement it effectively. Automation plays a pivotal role in modern data governance, significantly enhancing cost-effectiveness. By automating processes, organizations can streamline governance efforts and allocate resources more efficiently. Machine learning further advances these efforts by accelerating metadata collection and improving categorization accuracy, highlighting its critical role in optimizing data governance practices.
AI relies heavily on vast datasets for enhancing model training, enabling more precise predictions. Concurrently, big data leverages AI tools to bolster its analytical capabilities. AI's effectiveness hinges on data availability. Without sufficient data, AI functions merely as a theoretical concept. This interplay becomes increasingly crucial as data accessibility expands, facilitating machine learning and iterative processes that drive improved accuracy and operational efficiency autonomously.
A recent report from Drexel University's LeBow Center for Business Analytics reveals that 77% of data and analytics professionals prioritize data-driven decision-making within their data programs. However, less than half of survey respondents express high or very high levels of trust in their data. This lack of confidence is largely attributed to poor data quality, which not only obstructs the success of data programs but also undermines data integration efforts and compromises data integrity, presenting significant challenges for big data governance.
Big data governance refers to the management framework implemented within an organization to ensure the proper handling, integrity, usability, and security of big data sets. This framework includes policies, procedures, and standards that govern data access, data quality, compliance with data-related regulations, and data protection.
AIGA (AI Governance and Auditing), led by the University of Turku in Finland, collaborates with academic and industry partners, such as Google, to offer guidance on the responsible development and deployment of AI.
The AIGA AI Governance Framework serves as a practical manual for organizations aiming to implement ethical and responsible AI systems. Its primary objectives include:
At the core of the discussion is the overlap between ethical AI, data policies, and current data governance. Beyond technical familiarity with algorithms, ethical AI encompasses efforts to build fairness, transparency, and accountability into the development and implementation of AI. Data governance, in turn, provides the scaffolding for managing, protecting, and using data assets responsibly.
Fairness ensures that AI systems do not reflect bias or result in discrimination. Transparency assures stakeholders about how AI algorithms operate. Accountability makes developers and operators liable for the decisions and results of AI systems. Together, these principles help AI applications lower ethical risks and increase trust from users and society.
The interplay between ethical AI and data governance raises a series of challenges and opportunities. It calls for a culture of conscientiousness and responsibility around ethical AI. Companies that engage with these ethical questions will be able to get the most from AI transformation while guarding individual rights and aspirations.
Organizations increasingly use AI to strengthen their data analytics capabilities and maintain an advantage in the market. When AI is combined with data governance rules, companies can maximize ROI by identifying ineffective practices and boosting successful strategies.
Usage differs across organizational departments, depending on the data sources relevant to their industries; sales departments, for example, analyze consumer trends. Predictive analytics is a particularly common application, and it increases operational efficiency.
Manufacturing departments base their AI investment on analytics that meet their industry's needs, chiefly the improvement of production processes. Root causes of quality issues are identified, management is equipped to make decisions, and in some cases those issues are prevented through predictive maintenance strategies.
AI is important for anomaly detection and cybersecurity. Machine learning enables AI to detect and respond to threats in a timely manner, especially those involving data breaches. This proactive approach ensures data integrity and compliance through continuous monitoring and rapid response capabilities.
AI also greatly advances the democratization of data governance by providing secure data access that is protected from interception by cybercriminals using sophisticated tactics such as man-in-the-middle attacks or ransomware. By automating privacy, compliance, and security measures, AI acts as a 24/7 safeguard against cyber threats, thus enhancing data protection.
Moreover, AI enables automated process discovery, analyzing behavioral data and building digital records with ease, which streamlines data management processes.
AI systems heavily rely on extensive datasets for learning and operational tasks. However, ensuring data accuracy and fairness poses challenges when dealing with incomplete, outdated, inconsistent, or biased data. Organizations must establish stringent data standards, validate sources rigorously, and continually monitor and audit data quality throughout the AI lifecycle to mitigate these issues effectively.
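The kind of ongoing data-quality audit described above can be sketched in a few lines. This is a minimal illustration, not a governance product: the function `audit_records`, the field names, and the thresholds are hypothetical, and the duplicate-ID check stands in for a fuller consistency check. Bias detection is out of scope here.

```python
from datetime import date

def audit_records(records, required, max_age_days, today):
    """Return a dict mapping an issue name to the indexes of affected records.

    Hypothetical checks: missing required fields (incomplete), stale
    'updated' dates (outdated), and repeated 'id' values (duplicate_id).
    """
    issues = {"incomplete": [], "outdated": [], "duplicate_id": []}
    seen_ids = set()
    for i, rec in enumerate(records):
        # Incomplete: a required field is absent or empty.
        if any(rec.get(field) in (None, "") for field in required):
            issues["incomplete"].append(i)
        # Outdated: last update is older than the allowed age.
        updated = rec.get("updated")
        if updated is not None and (today - updated).days > max_age_days:
            issues["outdated"].append(i)
        # Duplicate: the same id has already appeared earlier in the data.
        if rec.get("id") in seen_ids:
            issues["duplicate_id"].append(i)
        seen_ids.add(rec.get("id"))
    return issues

# Invented sample rows for illustration.
records = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 6, 1)},
    {"id": 2, "email": "", "updated": date(2024, 6, 15)},
    {"id": 1, "email": "a@example.com", "updated": date(2021, 1, 1)},
]

report = audit_records(
    records, required=("id", "email"), max_age_days=365, today=date(2024, 7, 1)
)
print(report)
```

Running such a check on a schedule, and tracking the counts over time, is one simple way to make "continually monitor and audit" concrete.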
The processing of sensitive data by AI systems, such as health records or financial information, exposes organizations to significant risks like breaches and misuse. Securing data through robust encryption, access controls, and anonymization techniques is crucial. Moreover, compliance with data protection regulations and ethical principles is essential to safeguard against unauthorized access and ensure data privacy.
Integrating diverse data types (structured, unstructured, streaming) from various sources (internal, external, cloud-based) presents significant challenges in data consistency and compatibility. Adopting standardized data models, schemas, and formats, along with leveraging integration tools and platforms, helps organizations achieve seamless data exchange and interoperability across systems.
Effective utilization of AI requires a workforce equipped with strong data literacy skills and a supportive data-driven organizational culture. Enhancing data literacy involves enabling employees to understand, analyze, and effectively utilize data. Fostering a data-driven culture encourages informed decision-making and innovation. Organizations should invest in comprehensive data education, training, and collaborative initiatives to build trust and maximize the adoption of AI technologies among stakeholders.
Improve Data Quality
Data quality is fundamental to any effective data strategy. AI enhances data quality by automating error detection and correction within datasets, thereby reducing inconsistencies and inaccuracies. AI algorithms also standardize data structures, facilitating easier comparison and analysis while uncovering hidden trends and patterns.
Automate Data Compliance
In today's landscape of escalating cyber threats, maintaining data compliance is crucial. AI plays a pivotal role in ensuring continuous compliance by monitoring data flows in real-time. It detects anomalies, unauthorized access attempts, and potential violations of data regulations, triggering alerts and recommendations for corrective actions. Additionally, AI automates the classification and labeling of sensitive data and generates compliance reports, thereby reducing administrative burdens.
Strengthen Data Security
AI enhances data security by proactively analyzing data access patterns to detect suspicious activities such as intrusions or unauthorized access attempts. Leveraging machine-learning-based malware detection systems, AI identifies and mitigates both known and unknown threats by analyzing behavioral patterns. Moreover, AI automates security patch management and monitors adherence to security policies, bolstering overall cybersecurity measures.
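To make the idea of analyzing access patterns concrete, here is a deliberately simple sketch. It uses a statistical z-score threshold rather than a trained machine-learning model, so it is a stand-in for the approach described above, not an instance of it. The function `flag_unusual_access`, the threshold, and the access counts are all hypothetical.

```python
from statistics import mean, stdev

def flag_unusual_access(counts, threshold=3.0):
    """Return users whose latest daily access count is far above their own history.

    Hypothetical rule: flag a user when the latest value lies more than
    `threshold` sample standard deviations above the mean of earlier values.
    """
    flagged = []
    for user, history in counts.items():
        *past, latest = history
        if len(past) < 2:
            continue  # not enough history to estimate spread
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            continue  # constant history: a z-score is undefined, so skip
        if (latest - mu) / sigma > threshold:
            flagged.append(user)
    return flagged

# Invented daily record-access counts per user; the last value is "today".
counts = {
    "alice": [12, 15, 11, 14, 13, 90],
    "bob": [40, 42, 38, 41, 39, 43],
}

suspicious = flag_unusual_access(counts)
print(suspicious)
```

A production system would likely combine many signals (time of day, resource sensitivity, location) and a learned model; the per-user baseline shown here is only the simplest version of that idea.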
Democratize Data
Central to effective data strategy is fostering a data-driven culture within organizations. AI facilitates this by simplifying data access and analysis. AI-powered search engines swiftly extract relevant information from extensive datasets, enabling employees to efficiently retrieve necessary data. Furthermore, AI automates data aggregation and presentation through interactive dashboards, enhancing data accessibility and facilitating seamless information sharing across teams.
The volume of data is growing exponentially, projected to reach 180 zettabytes by 2025. To navigate this vast landscape effectively, artificial intelligence (AI) plays a pivotal role in extracting actionable insights.
AI utilizes machine learning and deep learning tools that leverage big data to learn and evolve over time. These algorithms iteratively refine models to optimize solutions and generate valuable insights for informed decision-making.
Traditionally, data analysis provided a snapshot of current conditions: "This is what has occurred." With AI and machine learning, predictive capabilities extend to forecasting future scenarios and prescribing optimal strategies for sustainable outcomes.
Moreover, AI has revolutionized data analysis by automating complex tasks that were once labor-intensive. Previously, analysts relied on SQL queries and manual statistical modeling, which could take weeks to yield insights. Today, AI-driven analytics processes data swiftly, reducing analysis times to just one or two days.
This section illustrates how AI enhances data insights by harnessing advanced technologies to derive deeper, faster, and more accurate business intelligence from expansive datasets.
The future of data governance is intricately intertwined with the evolution of Artificial Intelligence (AI). In response to escalating data complexity and volume, AI is poised to become an indispensable tool, elevating data governance to a more sophisticated, agile, and proactive level.
AI's capacity for learning, adaptation, and prediction will revolutionize compliance, security processes, and policy adjustments in real time, introducing a forward-thinking approach to governance. By leveraging predictive capabilities, organizations can anticipate challenges and capitalize on opportunities, ensuring that data remains a secure and reliable asset for informed decision-making.
Looking ahead, the integration of AI into data governance transcends mere enhancement; it is essential for unlocking the full potential of data while upholding compliance and strategic integrity. This transformation towards AI-enhanced governance represents a crucial adaptation to a digital landscape where data plays a pivotal role in driving business operations forward.