Category Archives: Machine Learning

Undergraduate Researchers Help Unlock Lessons of Machine Learning and AI – College of Natural Sciences

Brain-Machine Interface

AI also intersects with language in other research areas. Nihita Sarma, a computer science third-year student and member of Dean's Scholars and Turing Scholars, researches the intersection of neuroscience and machine learning to understand language in the brain, working with Michael Mauk, professor of neuroscience, and Alexander Huth, an assistant professor of computer science and neuroscience.

As research subjects listen to podcasts, they lie in an MRI machine while readings track their brain activity. These customized-to-the-subject readings are then used to train machine learning models called encoding models, and Sarma then passes them through decoding models.

"My research is taking those encodings and trying to backtrack and figure out, based on this neural representation, based on the brain activity that was going on at that moment, what could the person inside the MRI machine possibly have been thinking or listening to at that moment?" Sarma said.

Along with gaining a better understanding of how language is represented in the brain, Sarma said the research has possible applications as a noninvasive communication tactic for people unable to speak or sign.

"We would be able to decode what they're thinking or what they're trying to say, and allow them to communicate with the outside world," Sarma said.

Read more from the original source:
Undergraduate Researchers Help Unlock Lessons of Machine Learning and AI - College of Natural Sciences

Safeguarding AI: A Policymaker's Primer on Adversarial Machine Learning Threats – R Street

Artificial intelligence (AI) has become increasingly integrated into the digital economy, and as we've learned from the advent of the internet and the expansion of Internet-of-Things products and services, mass adoption of novel technology comes with widespread benefits as well as security tradeoffs. For policymakers to support the resilience of AI and AI-enabled technology, it is crucial for them to understand malicious attacks associated with AI integration, such as adversarial machine learning (ML); to support responsible AI development; and to develop robust security measures against these attacks.

Adversarial Machine Learning Attacks

Adversarial ML attacks aim to undermine the integrity and performance of ML models by exploiting vulnerabilities in their design or deployment, or by injecting malicious inputs to disrupt the model's intended function. ML models power a range of applications we interact with daily, including search recommendations, medical diagnosis systems, fraud detection, financial forecasting tools, and much more. Malicious manipulation of these ML models can lead to consequences like data breaches, inaccurate medical diagnoses, or manipulation of trading markets. Though adversarial ML attacks are often explored in controlled environments like academia, vulnerabilities have the potential to be translated into real-world threats as adversaries consider how to integrate these advancements into their craft. Adversarial ML attacks can be categorized into white-box and black-box attacks based on the attacker's ability to access the target model.

White-box attacks imply that the attacker has open access to the model's parameters, training data, and architecture. In black-box attacks, the adversary has limited access to the target model and can only learn about it through application programming interfaces (APIs) and by reverse-engineering its behavior from the outputs it generates. Black-box attacks are more relevant than white-box attacks because white-box attacks assume the adversary has complete access, which isn't realistic: it can be extremely complicated for attackers to gain complete access to fully trained commercial models in the deployment environments of the companies that own them.

Types of Adversarial Machine Learning Attacks

Query-based Attacks

Query-based attacks are a type of black-box ML attack where the attacker has limited information about the model's internal workings and can only interact with the model through an API. The attacker submits various queries as inputs and analyzes the corresponding output to gain insight into the model's decision-making process. These attacks can be broadly classified into model extraction and model inversion attacks.

Figure 1: Explaining query-based ML attacks (Source: Adversarial Robustness Toolbox)

Model Extraction: The attacker's goal is to reconstruct or replicate the target model's functionality by analyzing its responses to various inputs. This stolen knowledge can be used for malicious purposes like replicating the model for personal gain, conducting intellectual property theft, or manipulating the model's behavior to reduce its prediction accuracy.
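For intuition, here is a minimal Python sketch of the extraction idea using scikit-learn, assuming a target model reachable only through a label-returning query function; all names, data, and sizes are illustrative, not drawn from any specific incident:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical target: in a real attack this would sit behind a remote API;
# a locally trained model stands in for it here.
rng = np.random.default_rng(0)
X_private = rng.random((1000, 10))
y_private = (X_private.sum(axis=1) > 5).astype(int)
target = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                       random_state=0).fit(X_private, y_private)

def query_api(x):
    """Black-box access: the attacker sees only predicted labels."""
    return target.predict(x)

# Extraction: probe the API with attacker-chosen inputs, then train a
# surrogate on the (query, response) pairs to clone the decision boundary.
X_probe = rng.random((5000, 10))
surrogate = DecisionTreeClassifier(random_state=0).fit(X_probe, query_api(X_probe))

# Agreement on fresh inputs indicates how much functionality was stolen.
X_test = rng.random((1000, 10))
agreement = (surrogate.predict(X_test) == query_api(X_test)).mean()
print(f"surrogate/target agreement: {agreement:.2%}")
```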

Model Inversion: The attacker attempts to decipher characteristics of the input data used to train the model by analyzing its outputs. This can potentially expose sensitive information embedded in the training data, raising significant privacy concerns related to personally identifiable information of the users in the dataset. Even if the model's predictions are not directly revealing, the attacker can reconstruct the outputs to infer subtle patterns or characteristics of the training dataset. State-of-the-art models offer some resistance to such attacks due to their increased infrastructure complexity. New entrants, however, are more susceptible to these attacks because they possess limited resources to invest in security measures like differential privacy or complex input validation.

Data Poisoning Attacks

Data poisoning attacks occur in both white- and black-box settings, where attackers deliberately add malicious samples to manipulate data. Attackers can also use adversarial examples to deceive the model by skewing its decision boundaries. Data poisoning occurs at different stages of the ML pipeline, including data collection, data preprocessing, and model training. Generally, the attacks are most effective during the model training phase because that is when the model learns about different elements within the data. Such attacks induce biases and reduce the model's robustness.
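A minimal sketch of one common poisoning vector, label flipping, assuming a simple scikit-learn classifier; the synthetic data and the 20% flip rate are illustrative assumptions only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Poisoning step: flip the labels of 20% of the training samples. In practice
# the bad samples would be injected during data collection or preprocessing.
y_bad = y_tr.copy()
idx = rng.choice(len(y_tr), size=int(0.2 * len(y_tr)), replace=False)
y_bad[idx] = 1 - y_bad[idx]
poisoned = LogisticRegression(max_iter=1000).fit(X_tr, y_bad)

print(f"clean test accuracy:    {clean.score(X_te, y_te):.3f}")
print(f"poisoned test accuracy: {poisoned.score(X_te, y_te):.3f}")
```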

Figure 2: Explaining data poisoning attacks (Source: Adversarial Robustness Toolbox)

Adversaries face significant challenges when manipulating data in real time to affect model output, thanks to technical constraints and operational hurdles that make it impractical to alter the data stream dynamically. For example, pre-trained models like OpenAI's ChatGPT or Google's Gemini, trained on large and diverse datasets, may be less prone to data poisoning compared to models trained on smaller, more specific datasets. This is not to say that pre-trained models are completely immune; these models sometimes fall prey to adversarial ML techniques like prompt injection, where the chatbot either hallucinates or produces biased outputs.

Protecting Systems Against Adversarial Machine Learning Attacks

Addressing the risk of adversarial ML attacks necessitates a balanced approach. Adversarial attacks, while posing a legitimate threat to user data protections and the integrity of predictions made by the model, should not be conflated with speculative, science fiction-esque notions like uncontrolled superintelligence or an AI doomsday. More realistic ML threats relate to poisoned and biased models, data breaches, and vulnerabilities within ML systems. It is important to prioritize the development of secure ML systems alongside efficient deployment timelines to ensure continued innovation and resilience in a highly competitive market. Following is a non-exhaustive list of approaches to secure systems against adversarial ML attacks.

Secure-by-design principles for AI development: One method to ensure the security of an ML system is to employ security throughout its design, development, and deployment processes. Resources like the joint U.S. Cybersecurity and Infrastructure Security Agency and U.K. National Cyber Security Centre guidelines on secure AI development and the National Institute of Standards and Technology (NIST) Secure Software Development Framework provide guidance on how to develop and maintain ML models properly and securely.

Incorporating principles from the AI Risk Management Framework: NIST's AI Risk Management Framework (RMF) is a flexible framework to address and assess AI risk. According to the RMF, ML models should prioritize anonymity and confidentiality of user data. The AI RMF also suggests that models should consider de-identification and aggregation techniques for model outputs and balance model accuracy with user data security. While specialized techniques for preventing adversarial ML attacks are essential, traditional cybersecurity defensive tools like red teaming and vulnerability management remain paramount to systems protection.

Supporting new entrants with tailored programs and resources: Newer players like startups and other smaller organizations seeking to integrate AI capabilities into their products are more likely to be vulnerable to these attacks due to their reliance on third-party data sources and any potential deficiencies in their technology infrastructure to secure their ML systems. It's important that these organizations receive adequate support from tailored programs or resources.

Risk and threat analysis: Organizations should conduct an initial threat analysis of their ML systems using tools like MITRE's ATLAS to identify interfaces prone to attacks. Proactive threat analysis helps organizations minimize risks by implementing safeguards and contingency plans. Developers can also incorporate adversarial ML mitigation strategies to verify the security of their systems.

Data sanitization: Detecting individual data points that hurt the model's performance and removing them from the final training dataset can defend the system against data poisoning. Data sanitization can be expensive to conduct because of the computational resources it requires. Organizations can reduce the risk of data poisoning with stricter vetting standards for imported data used in the ML model, accomplished through data validation, anomaly detection, and continual monitoring of data quality over time.
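As a rough illustration of the anomaly-detection step, the following Python sketch uses scikit-learn's IsolationForest to flag suspicious training points; the synthetic data and the contamination rate are assumptions for demonstration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
X_clean = rng.normal(0, 1, size=(950, 20))
X_poison = rng.normal(6, 1, size=(50, 20))     # implausible injected points
X_train = np.vstack([X_clean, X_poison])

# Flag statistical outliers before training. `contamination` encodes the
# analyst's guess at the fraction of bad data and must be tuned per dataset.
detector = IsolationForest(contamination=0.05, random_state=0).fit(X_train)
keep = detector.predict(X_train) == 1          # +1 = inlier, -1 = outlier
X_sanitized = X_train[keep]
print(f"kept {keep.sum()} of {len(X_train)} samples")
```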

Because these attacks have the potential to compromise user data privacy and undermine the accuracy of results in critical sectors, it is important to stay ahead of threats. Understanding policy implications and conducting oversight is essential, but succumbing to fear and hindering innovation through excessive precaution is detrimental. Policymakers can foster environments conducive to secure ML development by providing resources and frameworks to navigate the complexities of securing ML technologies effectively. A balance between developing resilient systems and sustained innovation is key for the United States to maintain its position as a leading AI innovator.

The rest is here:
Safeguarding AI: A Policymakers Primer on Adversarial Machine Learning Threats - R Street

Machine Learning Accelerates the Simulation of Dynamical Fields – Eos

Editors' Highlights are summaries of recent papers by AGU's journal editors. Source: Journal of Advances in Modeling Earth Systems

Accurately simulating and appropriately representing the aerosol-cloud-precipitation system poses significant challenges in weather and climate models. These challenges are particularly daunting due to knowledge gaps in crucial processes that occur at scales smaller than typical large-eddy simulation model grid sizes (e.g., 100 meters). Particle-resolved direct numerical simulation (PR-DNS) models offer a solution by resolving small-scale turbulent eddies and tracking individual particles. However, PR-DNS requires extensive computational resources, limiting its use to small-domain simulations and a limited number of physical processes.

Zhang et al. [2024] develop PR-DNS surrogate models using the Fourier neural operator (FNO), which affords improved computational performance and accuracy. The new solver achieves a two-orders-of-magnitude reduction in computational cost, especially for high-resolution simulations, and exhibits excellent generalization, allowing for different initial conditions and zero-shot super resolution without retraining. These findings highlight the FNO method as a promising tool for simulating complex fluid dynamics problems with high accuracy, computational efficiency, and generalization capability, enhancing our ability to model the aerosol-cloud-precipitation system and to develop digital twins for similarly high-resolution measurements.
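For readers curious how a Fourier neural operator works mechanically, here is a minimal 1D spectral convolution layer in PyTorch. This is a toy sketch of the generic FNO building block, not the paper's PR-DNS emulator; all sizes and names are illustrative:

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """Core FNO building block: convolution as multiplication in Fourier
    space, keeping only the lowest `modes` frequencies."""
    def __init__(self, in_ch, out_ch, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (in_ch * out_ch)
        self.weight = nn.Parameter(
            scale * torch.randn(in_ch, out_ch, modes, dtype=torch.cfloat))

    def forward(self, x):                      # x: (batch, in_ch, n_points)
        x_ft = torch.fft.rfft(x, dim=-1)       # to Fourier space
        out_ft = torch.zeros(x.size(0), self.weight.size(1),
                             x_ft.size(-1), dtype=torch.cfloat)
        out_ft[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weight)
        return torch.fft.irfft(out_ft, n=x.size(-1), dim=-1)

# Because the learned weights live on Fourier modes, the same layer can be
# evaluated on a finer grid than it was trained on, which is the intuition
# behind the "zero-shot super resolution" noted above.
layer = SpectralConv1d(in_ch=1, out_ch=1, modes=16)
field = torch.randn(4, 1, 128)                 # batch of discretized 1D fields
print(layer(field).shape)                      # torch.Size([4, 1, 128])
```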

Citation: Zhang, T., Li, L., López-Marrero, V., Lin, M., Liu, Y., Yang, F., et al. (2024). Emulator of PR-DNS: Accelerating dynamical fields with neural operators in particle-resolved direct numerical simulation. Journal of Advances in Modeling Earth Systems, 16, e2023MS003898. https://doi.org/10.1029/2023MS003898

Jiwen Fan, Editor, JAMES

View original post here:
Machine Learning Accelerates the Simulation of Dynamical Fields - Eos

Wall Street’s Favorite Machine Learning Stocks? 3 Names That Could Make You Filthy Rich – InvestorPlace

Machine learning stocks receive a lot of love in 2024


United States equities are on the rise again in 2024. The S&P 500 and Nasdaq have appreciated 7.2% and 7.4%, respectively. With stocks back on the rise, equities investors may want to consider putting money into innovative companies. Given the traction AI-related technology companies gained last year, machine learning stocks may also receive a lot of love in 2024.

Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to learn from data and experience without explicit programming. Over the past decade, the technology has garnered attention for its numerous applications, and ML has also received positive attention from Wall Street. Below are three machine learning stocks that could make investors rich over the long term.


UiPath (NYSE:PATH) creates and implements software allowing customers to automate various business processes using robotic process automation (RPA) and artificial intelligence.

The UiPath Business Automation Platform enables employees to quickly build automations for both existing and new processes by using software robots to perform a myriad of repetitive tasks. These range from simply logging into applications or moving folders to extracting information from documents and updating information fields and databases. UiPath also provides a number of turnkey automation solutions, allowing the company to target customers in a variety of industries including banking, healthcare and manufacturing.

Last year, shares of PATH almost doubled. Since the start of the new year, there has been a pullback across all the major indices, and, of course, UiPath, at its frothy valuation, saw some selling pressure. The company's share price has fallen 7% YTD. Selling pressure continued after weaker-than-expected guidance in UiPath's Q4 2023 earnings report. Outside of guidance, the company beat both revenue and earnings estimates: Q4 revenue increased 31% YOY to $405 million, and annual recurring revenue increased 22% to $1.5 billion. The company also achieved its first quarter of GAAP profitability as a public company in the fourth quarter.

Strong financial figures, despite weaker-than-expected guidance, could make UiPath a strong performer in 2024.


It's hard to make a machine learning list without including a semiconductor name, since semiconductors are what make machine learning programs work the way they do. Advanced Micro Devices (NASDAQ:AMD) has built a range of advanced hardware for gaming and other computing applications, and AMD's RDNA 3 architecture-based Radeon GPUs now support desktop-level AI and machine learning workflows.

2024 will be a big year for AMD in terms of AI and ML computing. The chipmaker announced the MI300X GPU chipset almost a year ago in its second-quarter 2023 earnings report. To follow that up, in the third-quarter earnings report, AMD announced it expects to sell $2 billion in AI chips next year. Because these AI chips are still in high demand in North America, Europe and Asia, AMD will likely reap a significant profit upon entering the space.

Wall Street, notably, is loving AMD's stock, and Wall Street firms have recently begun to boost their target prices for the chipmaker. The investment bank Jefferies raised its target price for AMD to $200/share from $130/share. JPMorgan, Goldman Sachs, Baird and a host of other investment banks also made significant increases to their target prices in late January 2024. Moreover, Japanese bank Mizuho Securities recently raised its target price from $200/share to $235/share.


Last on our list of machine learning stocks is Palantir Technologies (NYSE:PLTR). Palantir has received a lot of love from some on Wall Street and from a number of retail investors; shares have risen 37% YTD. For those who don't know, Palantir initially focused on serving the defense and intelligence sectors but has since expanded its customer base to include industries such as healthcare, energy and finance. The company provides a number of AI and ML-based data analytics tools for a range of businesses.

Most recently, Palantir has enjoyed a lot of attention due to its new AI Platform (AIP). AIP can deploy commercial and open-source large language models onto internally held data sets and, from there, recommend business processes and actions. Although I think Palantir has become overvalued, with many treating it as a fully grown AI company when it is just getting started, the company certainly has the potential to make investors money over the long term.

On the date of publication, Tyrik Torres did not have (either directly or indirectly) any positions in the securities mentioned in this article. The opinions expressed in this article are those of the writer, subject to the InvestorPlace.com Publishing Guidelines.

Tyrik Torres has been studying and participating in financial markets since he was in college, and he has a particular passion for helping people understand complex systems. His areas of expertise are semiconductor and enterprise software equities. He has work experience in both investing (public and private markets) and investment banking.

See more here:
Wall Street's Favorite Machine Learning Stocks? 3 Names That Could Make You Filthy Rich - InvestorPlace

GE HealthCare and Hartford renew imaging agreement around AI and machine learning – DOTmed HealthCare Business News

Hartford HealthCare HealthCenter - Southington (Photo courtesy of Hartford HealthCare)

The collaboration dates back to 2016 and includes AI and machine learning software deployments to enhance clinical expertise, as well as upgrades, through a phased approach, of Hartford HealthCare's CT, PET/CT, MR, X-ray, nuclear medicine, mammography, ultrasound, and OEC 3D surgical imaging C-arm solutions. GE HealthCare will also provide its most recent patient monitoring, anesthesia, maternal infant care, and diagnostic cardiology technologies.

As part of the agreement, GE HealthCare technicians will be available in-house for repairs and maintenance, and regular upgrades will be performed, including build-in-place upgrades of some existing MR, CT, PET/CT, and X-ray systems that refresh older equipment while minimizing construction costs, waste, equipment downtime, and disruptions to patient care.

"This is especially important now, as technologies, equipment, and training are advancing at an ever-increasing pace," said Karen Goyette, executive vice president and chief strategy and transformation officer, in a statement.

Hartford HealthCare is made up of nearly 500 locations, including two tertiary-level teaching hospitals, an acute-care community teaching hospital, an acute-care hospital and trauma center, three community hospitals, a behavioral health network, a multispecialty physician group, a clinical care organization, a regional home care system, an array of senior care services, a mobile neighborhood health program and a comprehensive physical therapy and rehabilitation network. It serves 185 towns and cities.

Many of the software and AI solutions will be deployed within various imaging modalities to accelerate speed of use and improve accuracy, including:

X-ray: GE HealthCare's Critical Care Suite 2.0 will assess scans for signs of critical conditions, such as collapsed lungs or errors in chest X-ray acquisition, using AI-powered insights and analytics, and provide feedback to ICU clinicians to help expedite diagnosis, optimize treatment decisions, and improve patient outcomes.

CT: GE HealthCare's TrueFidelity CT image-reconstruction technology is powered by a deep neural network that improves reading confidence for head, whole-body, cardiovascular, and other anatomical applications for patients of all ages.

MR: Using AI, AIR Recon DL technology reconstructs MR images, improving the quality, speed, and workflow of the scanning process by reducing artifacts, increasing clarity, and facilitating faster acquisitions. This, in turn, improves patient comfort.

As part of the initial collaboration, the jointly created Care Logistics Center, formed in 2017, will match patients with the best care regimens based on their needs.

The renewal extends the collaboration to 2030.

See the original post here:
GE HealthCare and Hartford renew imaging agreement around AI and machine learning - DOTmed HealthCare Business News

EASA Discusses Autonomous Operations in New Artificial Intelligence Paper – Inside Unmanned Systems

The European Union Aviation Safety Agency (EASA) has published Issue 2 of its Concept Paper on Artificial Intelligence (AI) and Machine Learning (ML).

AI is being adopted widely and rapidly, including in the aviation domain, with its development significantly accelerating in the last decade due to a rising capacity to collect and store massive amounts of data. Increasing computing power and the development of ever more potent algorithms and architectures are also playing a role, affecting aviation products, services and business plans.

In its new Artificial Intelligence Concept Paper Issue 02, "Guidance for Level 1 & 2 machine learning applications," EASA lays out a number of autonomous operations-related scenarios that are likely to become relevant in the near future. To cite an example, the report describes an ongoing innovation partnership contract (IPC) between Boeing and EASA involving an experimental auto-taxi system.

As currently envisaged, the system would receive, via standard radio communication, taxi clearance from ground control, provide a readback of the clearance, and plan an appropriate ground taxiing route based on that clearance. The system then executes the plan and autonomously controls the aircraft as it travels from one location to another at an airfield, such as from the boarding gate to the departure runway. While executing the plan, the system detects potential obstacles in the aircraft's path, to which it can then react accordingly. The system employs a LIDAR system for the detection of obstacles. Optical cameras can also be added to the sensor array for object classification, to support improved awareness and intent prediction capabilities for objects and people in the environment. System operations are monitored by the flight crew, who retain the ability to override and disconnect the system at any time.

More widely, the new Concept Paper focuses on strengthening four aviation pillars (safety, efficiency, sustainability, and passenger experience) while positioning ML at the forefront of aviation innovation. EASA acknowledges that the path to ML deployment brings unique challenges, particularly in terms of safeguarding operational safety.

The Concept Paper refines EASA guidance for Level 1 AI applications, i.e., those enhancing human capabilities, while broadening the discussion on topics such as learning assurance, AI explainability and ethics-based assessment. It also provides comprehensive guidance for the development and deployment of Level 2 AI-based systems. Level 2 AI includes the groundbreaking concept of human-AI teaming (HAT), setting the stage for AI systems that automatically make decisions under human oversight.

With the paper, EASA highlights its commitment to a future where AI and ML are fully integrated into aviation systems, while emphasizing the building of trust in AI applications, ensuring they complement human expertise and enhance overall aviation safety and sustainability.

As an independent and neutral body, EASA works to ensure confidence in safe air operations in Europe and worldwide: proposing and formulating rules, standards, and guidance; certifying aircraft, parts, and equipment; and endorsing and overseeing organizations in all aviation domains.

Read the original here:
EASA Discusses Autonomous Operations in New Artificial Intelligence Paper - Inside Unmanned Systems

Self-supervised learning: What is it? How does it work? – DataScientest

In the case of Natural Language Processing (NLP), we use self-supervised learning to train the model on sentences from which words have been randomly omitted. It must then predict these removed words.

This method, applied to NLP, has proved effective and highly relevant. For example, the wav2vec and BERT models developed respectively by Facebook and Google AI are among the most revolutionary in NLP. Wav2vec has proved its worth in the field of Automatic Speech Recognition (ASR).

In this approach, certain parts of the audio are masked and the model is trained to predict these parts. BERT, an acronym for Bidirectional Encoder Representations from Transformers, is a Deep Learning model that currently offers the best results for most NLP tasks.

Unlike previous models, which scan text in a single direction to predict the next word, the BERT algorithm hides words at random positions in the sentence and tries to predict them. To do this, it uses the full context of the sentence, both left and right.
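A quick way to see this masked-word objective in action is Hugging Face's fill-mask pipeline, shown in the short Python sketch below; the model choice and the example sentence are illustrative:

```python
from transformers import pipeline

# BERT's pre-training objective is exactly this: predict tokens that were
# masked out, using context from both the left and the right of the gap.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for candidate in fill_mask("The cat sat on the [MASK]."):
    print(f"{candidate['token_str']:>10}  score={candidate['score']:.3f}")
```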

Read more:
Self-supervised learning: What is it? How does it work? - DataScientest

Benchmarking machine learning and parametric methods for genomic prediction of feed efficiency-related traits in … – Nature.com

The FE-related traits and genomic information were obtained for 1,156 animals from an experimental breeding program at the Beef Cattle Research Center (Institute of Animal Science, IZ).

Animals were from an experimental breeding program at the Beef Cattle Research Center at the Institute of Animal Science (IZ) in Sertãozinho, São Paulo, Brazil. Since the 1980s, the experimental station has maintained three selection herds: Nellore control (NeC), with animals selected for yearling body weight (YBW) with a selection differential close to zero within birth year and herd, while Nellore Selection (NeS) and Nellore Traditional (NeT) animals are selected for YBW with a maximum selection differential, also within birth year and herd25. In the NeT herd, sires from commercial herds or NeS were eventually used in the breeding season, while the NeC and NeS were closed herds (only sires from the same herd were used in the breeding season), with the inbreeding rate controlled by planned matings. In addition, the NeT herd has been selected for lower residual feed intake (RFI) since 2013. In the three herds, animal selection is based on YBW measured at 378 days of age in young bulls.

The FE-related traits were evaluated on 1,156 animals born between 2004 and 2015 in a feeding efficiency trial, in which they were either housed in individual pens (683 animals) or group pens equipped with the GrowSafe feeding system (473 animals), with animals grouped by sex. From those, 146 animals were from the NeC herd (104 young bulls and 42 heifers), 300 from the NeS herd (214 young bulls and 86 heifers), and 710 from the NeT herd (483 young bulls and 227 heifers). Both feeding trials comprised at least 21 days of adaptation to the feedlot diet and management and at least 56 days of data collection. The young bulls and heifers had an average age at the end of the feeding trial of 366 ± 27.5 and 384 ± 45.4 days, respectively.

A total of 780 animals were genotyped with the Illumina BovineHD BeadChip assay (770k; Illumina Inc., San Diego, CA, USA), while 376 animals were genotyped with the GeneSeek Genomic Profiler (GGP Indicus HD, 77K). The animals genotyped with the GGP chip were imputed to the HD panel using FImpute v.3 (ref. 26) with an expected accuracy higher than 0.97. Autosomal SNP markers with a minor allele frequency (MAF) lower than 0.10 and a significant deviation from Hardy-Weinberg equilibrium ($P < 10^{-5}$) were removed, and markers and samples with call rates lower than 0.95 were also removed. The MAF threshold of 10% was used to remove genetic markers with lower significance and noisy information in a stratified population. After this quality control procedure, genotypes from 1,024 animals and 305,128 SNP markers remained for GS analyses. Population substructure was evaluated using a principal component analysis (PCA) based on the genomic relationship matrix using the ade4 R package (Supplementary Figure S1)27.

Animals were weighed without fasting at the beginning and end of the feeding trial, as well as every 14 days during the experimental period. The mixed ration (dry corn grain, corn silage, soybean, urea, and mineral salt) was offered ad libitum and formulated with 67% total digestible nutrients (TDN) and 13% crude protein (CP), aiming for an average daily gain (ADG) of 1.1 kg.

The following feed efficiency-related traits were evaluated: ADG, dry matter intake (DMI), feed efficiency (FE), and RFI. In the individual pens, the orts were weighed daily in the morning before feed delivery to calculate the daily dietary intake. In the group pens, the GrowSafe feeding system automatically recorded the feed intake. The DMI (expressed as kg/day) was estimated as the feed intake of each animal with subsequent adjustments for dry matter content. ADG was estimated as the slope of the linear regression of body weight (BW) on feeding trial days, and FE was expressed as the ratio of ADG to DMI. Finally, RFI was calculated within each contemporary group (CG) as the difference between the observed and expected feed intake, considering the average metabolic body weight (MBW) and ADG of each animal (Koch et al., 1963), as follows:

$$DMI = CG + \beta_0 + \beta_1 ADG + \beta_2 MBW + \varepsilon$$

where $\beta_0$ is the model intercept, $\beta_1$ and $\beta_2$ are the linear regression coefficients for $ADG$ and $MBW = BW^{0.75}$, respectively, and $\varepsilon$ is the residual of the equation, representing the RFI estimate.

The contemporary groups (CG) were defined by sex, year of birth, type of feed trial pen (individual or collective), and selection herd. Phenotypic observations with values outside the interval of ±3.5 standard deviations around the mean of each CG for each trait were excluded, and the number of animals per CG ranged from 10 to 70.
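To make the RFI computation concrete, here is a small Python sketch of the Koch et al. regression using statsmodels on synthetic records; the column names and values are illustrative stand-ins, not the study's data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic records: DMI (kg/day), ADG (kg/day), body weight (kg), and a
# contemporary-group label per animal.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "DMI": rng.normal(9.0, 1.0, 300),
    "ADG": rng.normal(1.1, 0.2, 300),
    "BW": rng.normal(380, 40, 300),
    "CG": rng.choice(["A", "B", "C"], 300),
})
df["MBW"] = df["BW"] ** 0.75                 # metabolic body weight

# Regress DMI on ADG and MBW with contemporary-group effects; the residual
# of this regression is each animal's RFI estimate.
model = smf.ols("DMI ~ C(CG) + ADG + MBW", data=df).fit()
df["RFI"] = model.resid
print(df[["DMI", "ADG", "MBW", "RFI"]].head())
```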

The (co)variance components and heritability for FE-related traits were estimated considering a multi-trait GBLUP (MTGBLUP) as follows:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{a} + \mathbf{e},$$

where $\mathbf{y}$ is the matrix of phenotypic FE-related traits (ADG, FE, DMI, and RFI) of dimension N × 4 (N individuals and four traits); $\boldsymbol{\beta}$ is the vector of fixed effects, including the linear and quadratic effects of cow age and the linear effect of animal age at the beginning of the test; $\mathbf{a}$ is the vector of additive genetic effects (breeding values) of the animals; and $\mathbf{e}$ is the vector of residual terms. $\mathbf{X}$ and $\mathbf{Z}$ are the incidence matrices related to the fixed ($\boldsymbol{\beta}$) and random ($\mathbf{a}$) effects, respectively. It was assumed that the random effects of animals and residuals were normally distributed, as $\mathbf{a} \sim N(0, \mathbf{G} \otimes \mathbf{S}_{\mathbf{a}})$ and $\mathbf{e} \sim N(0, \mathbf{I} \otimes \mathbf{S}_{\mathbf{e}})$, where $\mathbf{G}$ is the additive genomic relationship matrix between genotyped individuals according to VanRaden28, $\mathbf{I}$ is an identity matrix, $\otimes$ is the Kronecker product, and

$$\mathbf{S}_{\mathbf{a}} = \begin{bmatrix} \sigma^2_{a1} & \cdots & \sigma_{a1,4} \\ \vdots & \ddots & \vdots \\ \sigma_{a1,4} & \cdots & \sigma^2_{a4} \end{bmatrix}, \qquad \mathbf{S}_{\mathbf{e}} = \begin{bmatrix} \sigma^2_{e1} & \cdots & \sigma_{e1,4} \\ \vdots & \ddots & \vdots \\ \sigma_{e1,4} & \cdots & \sigma^2_{e4} \end{bmatrix}$$

are the additive genetic and residual (co)variance matrices, respectively. The G matrix was obtained according to VanRaden28:

$$\mathbf{G} = \frac{\mathbf{M}\mathbf{M}'}{2\sum_{j=1}^{m} p_j (1 - p_j)}$$

where $\mathbf{M}$ is the SNP marker matrix with codes 0, 1, and 2 for genotypes AA, AB, and BB, adjusted for allele frequency expressed as $2p_j$, and $p_j$ is the frequency of the second allele of the jth SNP marker.
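A compact NumPy sketch of the VanRaden G matrix computation described above, on toy genotypes (purely illustrative):

```python
import numpy as np

def vanraden_g(geno):
    """VanRaden genomic relationship matrix from an (animals x markers)
    genotype matrix coded 0/1/2."""
    p = geno.mean(axis=0) / 2.0              # second-allele frequency per SNP
    M = geno - 2.0 * p                       # center by allele frequency
    denom = 2.0 * np.sum(p * (1.0 - p))      # VanRaden scaling factor
    return (M @ M.T) / denom

# Toy example: 5 animals, 8 SNPs.
rng = np.random.default_rng(3)
geno = rng.integers(0, 3, size=(5, 8)).astype(float)
print(np.round(vanraden_g(geno), 2))
```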

The analyses were performed using the restricted maximum likelihood (REML) method through the airemlf90 software29. The predictf90 software29 was used to obtain the phenotypes adjusted for the fixed effects and covariates ($\mathbf{y}^* = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$). The adjusted phenotypes were used as the response variable in the genomic predictions.

The GEBV accuracy ($Acc_{GEBV}$) in the whole population was calculated based on the prediction error variance (PEV) and the genetic variance for each FE-related trait ($\sigma^2_a$) using the following equation30: $Acc = 1 - \sqrt{PEV/\sigma^2_a}$.

A forward validation scheme was applied for computing the prediction accuracies using machine learning and parametric methods, splitting the dataset based on year of birth, with animals born between 2004 and 2013 assigned to the reference population (n = 836) and those born in 2014 and 2015 (n = 188) to the validation set. For the ML approaches, we randomly split the training dataset into five folds to train the models.

Genomic prediction for FE-related traits considering the STGBLUP can be described as follows:

$$\mathbf{y}^{*} = \boldsymbol{\mu} + \mathbf{Z}\mathbf{a} + \mathbf{e}$$

where $\mathbf{y}^{*}$ is the N × 1 vector of adjusted phenotypic values for FE-related traits, $\mu$ is the model intercept, $\mathbf{Z}$ is the incidence matrix connecting observations; $\mathbf{a}$ is the vector of predicted values, assumed to follow a normal distribution given by $N(0, \mathbf{G}\sigma^2_a)$, and $\mathbf{e}$ is the N × 1 vector of residual values, considered normally distributed as $N(0, \mathbf{I}\sigma^2_e)$, in which $\mathbf{I}$ is an identity matrix and $\sigma^2_e$ is the residual variance. The STGBLUP model was fitted using the blupf90+ software29.

Genomic prediction for FE-related traits considering MTGBLUP can be described as follows:

$$\mathbf{y}^{*} = \boldsymbol{\mu} + \mathbf{Z}\mathbf{a} + \mathbf{e}$$

where $\mathbf{y}^{*}$ is the matrix of adjusted phenotypes of dimension N × 4, $\boldsymbol{\mu}$ is the trait-specific intercept vector, $\mathbf{Z}$ is the incidence matrix for the random effect; $\mathbf{a}$ is an N × 4 matrix of predicted values, assumed to follow a normal distribution given by $MVN(0, \mathbf{G} \otimes \mathbf{S}_{\mathbf{a}})$, where $\mathbf{S}_{\mathbf{a}}$ represents the genetic (co)variance matrix for the FE-related traits (4 × 4). The residual effects ($\mathbf{e}$) were considered normally distributed as $MVN(0, \mathbf{I} \otimes \mathbf{S}_{\mathbf{e}})$, in which $\mathbf{I}$ is an identity matrix and $\mathbf{S}_{\mathbf{e}}$ is the residual (co)variance matrix for FE-related traits (4 × 4). The MTGBLUP was implemented in the BGLR R package14, considering a Bayesian GBLUP with a multivariate Gaussian model with an unstructured (co)variance matrix between traits ($\mathbf{S}_{\mathbf{a}}$), using Gibbs sampling with 200,000 iterations, including 20,000 samples as burn-in and a thinning interval of 5 cycles. Convergence was checked by visual inspection of trace plots and distribution plots of the residual variance.

Five Bayesian regression models with different priors were used for the GS analyses: Bayesian ridge regression (BRR), Bayesian Lasso (BL), BayesA, BayesB, and BayesC. The Bayesian algorithms for GS were implemented using the R package BGLR version 1.09 (ref. 14). The BGLR default priors were used for all models, with 5 degrees of freedom ($df_u$) and a scale parameter ($S$). The Bayesian analyses were performed considering Gibbs sampling chains of 200,000 iterations, with the first 20,000 iterations excluded as burn-in and a sampling interval of 5 cycles. Convergence was checked by visual inspection of trace plots and distribution plots of the residual variance. For the Bayesian regression methods, the general model can be described as follows:

$$\mathbf{y}^{*} = \mu + \sum_{w=1}^{p} x_{iw} u_w + e_i$$

where $\mu$ is the model intercept; $x_{iw}$ is the genotype of the ith animal at locus w (coded as 0, 1, and 2); $u_w$ is the additive SNP marker effect of the wth SNP (p = 305,128); and $e_i$ is the residual effect associated with the observation of the ith animal, assumed to be normally distributed as $\mathbf{e} \sim N(0, \mathbf{I}\sigma^2_e)$.

The BRR method14 assumes a Gaussian prior distribution for the SNP marker effects ($u_w$), with a common variance ($\sigma^2_u$) across markers, so that $p(u_1, \ldots, u_p \mid \sigma^2_u) = \prod_{w=1}^{p} N(u_w \mid 0, \sigma^2_u)$. The variance of SNP marker effects is assigned a scaled-inverse chi-squared distribution, $p(\sigma^2_u) = \chi^{-2}(\sigma^2_u \mid df_u, S_u)$, and the residual variance is also assigned a scaled-inverse chi-squared distribution with degrees of freedom ($df_e$) and scale parameter ($S_e$).

Bayesian Lasso (BL) regression31 used an idea from Tibshirani32 to connect the LASSO (least absolute shrinkage and selection operator) method with Bayesian analysis. In the BL, the source of variation is split into the residual term ($\sigma^2_e$) and the variation due to SNP markers ($\sigma^2_{u_w}$). The prior distribution for the additive effect of the SNP marker, $p(u_w \mid \tau^2_j, \sigma^2_e)$, follows a Gaussian distribution with marker-specific prior variance, given by $p(u_w \mid \tau^2_j, \sigma^2_e) = \prod_{w=1}^{p} N(u_w \mid 0, \tau^2_j \sigma^2_e)$. This prior distribution leads to marker-specific shrinkage of the effects, whose extent depends on the variance parameters ($\tau^2_j$). The variance parameters ($\tau^2_j$) are assigned exponential independent and identically distributed priors, $p(\tau^2_j \mid \lambda) = \prod_{j=1}^{p} \mathrm{Exp}(\tau^2_j \mid \lambda^2)$, and the squared lambda regularization parameter ($\lambda^2$) follows a Gamma distribution, $p(\lambda^2) = \mathrm{Gamma}(r, \theta)$, where r and $\theta$ are the rate and shape parameters, respectively31. Thus, the marginal prior for SNP markers is given by a double exponential (DE) distribution, as follows: $p(u_w \mid \lambda) = \int N(u_w \mid 0, \tau^2_j \sigma^2_e)\, \mathrm{Exp}(\tau^2_j \mid \lambda^2)\, d\tau^2_j$, where the DE distribution places a higher density at zero and has thicker tails, inducing stronger shrinkage of estimates for markers with relatively small effects and less shrinkage for markers with substantial effects. The residual variance ($\sigma^2_e$) is specified as a scaled-inverse chi-squared prior density, with degrees of freedom $df_e$ and scale parameter $S_e$.

The BayesA method14,33 considers a Gaussian distribution with null mean as the prior for SNP marker effects ($u_w$), with an SNP marker-specific variance ($\sigma^2_w$). The variance associated with each marker effect assumes a scaled-inverse chi-squared prior distribution, $p(\sigma^2_w) = \chi^{-2}(\sigma^2_w \mid df_u, S^2_u)$, with degrees of freedom ($df_u$) and scale parameter ($S^2_u$) treated as known14. Thus, BayesA places a t-distribution on the marker effects, i.e., $p(u_w \mid df_u, S^2_u) = t(0, df_u, S^2_u)$, providing a thicker-tailed distribution compared to the Gaussian and allowing a higher probability of moderate to large SNP effects.

BayesB assumes that a known proportion of SNP markers have a null effect (i.e., a point of mass at zero), while a subset of markers with non-null effects follow univariate t-distributions3,12, as follows:

$$p(u_w \mid \pi, df_u, S^2_B) = \begin{cases} 0 & \text{with probability } \pi \\ t(u_w \mid df_u, S^2_B) & \text{with probability } (1 - \pi) \end{cases}$$

where $\pi$ is the proportion of SNP markers with null effect, and $1 - \pi$ is the probability of SNP markers with non-null effects contributing to the variability of the FE-related trait3. Thus, the prior distribution assigned to SNPs with non-null effects is a scaled-inverse chi-squared distribution.

The BayesC method34 assumes a spike-slab prior for marker effects, which refers to a mixture distribution in which a fixed proportion of SNP markers, with probability $\pi$, have a null effect, whereas with probability $1 - \pi$ markers have effects sampled from a normal distribution. The prior distribution is as follows:

$$p(u_w, \sigma^2_w, \pi) = \left\{ \prod_{j=1}^{p} \left[ \pi(u_w = 0) + (1 - \pi) N(0, \sigma^2_w) \right] \right\} \chi^{-2}(\sigma^2_w \mid df_u, S^2_B)\, \beta(\pi \mid p_0, \pi_0),$$

where $\sigma^2_w$ is the common variance for the marker effects, $df_u$ and $S^2_B$ are the degrees of freedom and scale parameter, respectively, and $p_0$ and $\pi_0 \in [0, 1]$ are the prior shape parameters of the beta distribution.

Two machine learning (ML) algorithms were applied for genomic prediction: a multi-layer neural network (MLNN) and support vector regression (SVR). The ML approaches were used to relax the standard assumption adopted in the linear methods, which is restricted to additive genetic effects of markers without considering more complex modes of gene action. Thus, the ML methods are expected to improve predictive accuracy for different target traits. To identify the best combination of hyperparameters (i.e., parameters that must be tuned to control the learning process to obtain a model with optimal performance) in the supervised ML algorithms (MLNN and SVR), we performed a random grid search by splitting the reference population from the forward scheme into five folds35.

For the MLNN, handling a large genomic dataset, such as 305,128 SNPs, is difficult due to the large number of parameters that need to be estimated, leading to a significant increase in computational demand36. Therefore, an SNP pre-selection strategy based on GWAS results in the training population, using the MTGBLUP method (Fig. 1A), was used to reduce the number of markers considered as input to the MLNN. In addition, this strategy can remove noisy information in the genomic dataset. In this study, the traits displayed major regions explaining a large percentage of the genetic variance, which makes using pre-selected markers useful37.

(A) Manhattan plot of the percentage of genetic variance explained by each SNP marker, estimated through multi-trait GWAS in the training population, used as the pre-selection strategy for the multi-layer neural network. (B) General representation of a neural network with two hidden layers used to model nonlinear dependencies between the trait and SNP marker information. The input layer ($X = x_{i,p}$) considered in the neural network refers to the SNP marker information (coded as 0, 1, and 2) of the ith animal. The selected node represents the initial weights ($W = w_p$), assigned as random values between -0.5 and 0.5, connecting each input node to the first hidden layer; in the second layer, $w_{up}$ refers to the output weights from the first hidden layer, and b represents the bias, which helps to control the values in the activation function. The output layer ($\hat{y}$) represents a weighted sum of the input features mapped in the second layer.

The MLNN model can be described as a two-step regression38. The MLNN approach consists of three different layer types: an input layer, hidden layers, and an output layer. The input layer receives the input data, i.e., the SNP markers. The hidden layer contains mapping processing units, commonly called neurons, where each neuron in the hidden layer computes a non-linear (activation) function of the weighted sum of nodes in the previous layer. Finally, the output layer provides the outcome of the MLNN. Our proposed MLNN architecture comprises two fully connected hidden layers, schematically represented in Fig. 1B. The input layer of the MLNN considered SNP markers that explained more than 0.125% of the genetic variance for FE-related traits (Fig. 1A; ~15k for ADG and DMI, and ~16k for FE and RFI). The input covariate matrix $X = \{x_p\}$ contains the pre-selected SNP markers (p) with dimension N × p (N individuals and p markers). The pre-selected SNP markers are combined with each of the k neurons (with k = 1, ..., Nr) through the weight vector ($W$) in the hidden layer and then summed with a neuron-specific bias ($b_k$), computing the linear score for neuron k as $Z_i^{[1]} = f(b_k^{[1]} + XW^{[1]})$ (Fig. 1B). Subsequently, this linear score is transformed using an activation function $f(\cdot)$ to map the k neuron-specific scores and produce the first hidden layer output ($f(z_{1,i})$). In the second hidden layer, each neuron k receives a net input coming from hidden layer 1 as $Z_i^{[2]} = b_k^{[2]} + Z_i^{[1]} W^{[2]}$, where $W^{[2]}$ represents the weight matrix of dimension k × k (k = number of neurons) connecting $Z_i^{[1]}$ into the second hidden layer, and $b_k^{[2]}$ is the bias term in hidden layer 2. Then, the activation function is applied to map the kth hidden neuron unit in the second hidden layer and generate the output layer as $V_{2,i} = f(z_{2,i})$. In the MLNN, a hyperbolic tangent activation function ($\tanh(x) = (e^x - e^{-x})/(e^x + e^{-x})$) was adopted in the first and second layers, providing greater flexibility in the MLNN39.

The prediction of the adjusted FE-related trait was obtained as follows38:

$$\mathbf{y}^{*} = f(\mathbf{b} + \mathbf{V}_{2,i}\mathbf{W}_0) + \mathbf{e}$$

where $\mathbf{y}^{*}$ represents the target adjusted feed efficiency-related trait for the ith animal; $k$ is the number of neurons considered in the model, assumed to be the same in the first and second layers; $\mathbf{W}_0$ represents the weights from the k neurons in layer 2; and $\mathbf{b}$ is the bias parameter. The optimal weights used in the MLNN were obtained by minimizing the mean squared error of prediction in the training subset40.

The MLNN model was implemented using the R package h2o (https://github.com/h2oai/h2o-3), with a random grid search using the h2o.grid function (https://cran.r-project.org/web/packages/h2o) to determine the number of neurons that maximizes the prediction accuracy. We used the training population split into five folds to assess the best neural network architecture and then applied it to the disjoint validation set41,42. We considered a total of 1000 epochs36, numbers of neurons ranging from 50 to 2500 with intervals of 100, a dropout ratio of 0.2, and L1 and L2 regularization parameters of 0.0015 and 0.0005, respectively. In this framework, the MLNN was performed using two hidden layers, with the number of neurons (k) obtained during the training process being 750 for ADG, 1035 for DMI, 710 for FE, and 935 for RFI.
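Since the study used R's h2o package, the following is only a rough PyTorch translation of the described architecture (two tanh hidden layers, dropout 0.2, L1/L2 penalties); the data, sizes, and single training step are illustrative:

```python
import torch
import torch.nn as nn

class MLNN(nn.Module):
    """Two hidden tanh layers with dropout, mirroring the architecture
    described above; all sizes are illustrative."""
    def __init__(self, n_snps, k):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_snps, k), nn.Tanh(), nn.Dropout(0.2),
            nn.Linear(k, k), nn.Tanh(), nn.Dropout(0.2),
            nn.Linear(k, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

# One training step on synthetic data. The L1 penalty is added manually and
# L2 enters through weight_decay, approximating the stated 0.0015/0.0005.
X = torch.randn(64, 15000)                  # ~15k pre-selected SNP codes
y = torch.randn(64)                         # adjusted phenotypes
model = MLNN(n_snps=15000, k=750)           # k = 750 as reported for ADG
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0005)

loss = nn.functional.mse_loss(model(X), y)
loss = loss + 0.0015 * sum(p.abs().sum() for p in model.parameters())
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```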

Support vector regression (SVR) is a kernel-based supervised learning technique used for regression analysis43. In the context of GS, SVR uses linear models to implement nonlinear regression by mapping the predictor variables (i.e., SNP markers) into a feature space using different kernel functions (linear, polynomial, or radial basis function) to predict the target information, e.g., the adjusted phenotype44. SVR can map linear or nonlinear relationships between phenotypes and SNP markers depending on the kernel function. The best kernel function mapping genotype to phenotype (linear, polynomial, or radial basis) was determined using the training subset split into five folds. The radial basis function (RBF) was chosen, as it outperformed the linear and polynomial (degree 2) kernels in the training process, increasing predictive ability by 8.25% and showing the lowest MSE.

The general model for SVR using an RBF function can be described as38,45: $\mathbf{y}_i^{*} = b + h(\mathbf{m})^T \mathbf{w} + e$, where $h(\mathbf{m})^T$ represents the kernel radial basis function used to transform the original predictor variables, i.e., the SNP marker information ($\mathbf{m}$), $b$ denotes the model bias, and $\mathbf{w}$ represents the unknown regression weight vector. In the SVR, the learned function $h(\mathbf{m})^T$ was obtained by minimizing the loss function. The SVR was fitted using epsilon-support vector regression, which ignores residuals with absolute value ($|y_i^* - \hat{y}_i^*|$) smaller than some constant ($\epsilon$) and penalizes larger residuals46.

The kernel RBF function considered in the SVR follows the form $h(\mathbf{m})^T = \exp(-\gamma \|\mathbf{m}_i - \mathbf{m}_j\|^2)$, where $\gamma$ is a gamma parameter quantifying the shape of the kernel function, and $\mathbf{m}_i$ and $\mathbf{m}_j$ are the vectors of predictor variables for labels i and j. The main parameters in SVR are the cost parameter ($C$), the gamma parameter ($\gamma$), and epsilon ($\epsilon$). The parameters $C$ and $\epsilon$ were defined using the training data set information as proposed by Cherkassky and Ma47: $C = \max\left(\left|\bar{y}^* + 3\sigma_{y^*}\right|, \left|\bar{y}^* - 3\sigma_{y^*}\right|\right)$ and $\epsilon = 3\sigma_{y^*}\sqrt{\ln(n)/n}$, in which $\bar{y}^*$ and $\sigma_{y^*}$ are the mean and the standard deviation of the adjusted FE-related traits in the training population, and n represents the number of animals in the training set. The gamma parameter ($\gamma$) was determined through a random search of values varying from 0 to 5, using the training folder split into five folds. The best-trained SVR model considered a $\gamma$ parameter of 2.097 for ADG, 0.3847 for DMI, 0.225 for FE, and 1.075 for RFI. The SVR was implemented using the e1071 R package48.
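A short Python sketch of this setup with scikit-learn's SVR, computing C and epsilon from the training data as in Cherkassky and Ma; the synthetic data and the gamma value (here the reported FE value) are illustrative:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X_train = rng.normal(size=(800, 500))       # stand-in SNP codes
y_train = rng.normal(size=800)              # adjusted phenotypes
n = len(y_train)

# C and epsilon from the training data, following Cherkassky and Ma; gamma
# would come from the random search described above (0.225 is the FE value).
C = max(abs(y_train.mean() + 3 * y_train.std()),
        abs(y_train.mean() - 3 * y_train.std()))
epsilon = 3 * y_train.std() * np.sqrt(np.log(n) / n)

svr = SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=0.225).fit(X_train, y_train)
print(svr.predict(X_train[:5]))
```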

The prediction accuracy (acc) of the different statistical approaches was assessed by Pearson's correlation between the adjusted phenotypes ($y^*$) and their predicted values ($\hat{y}_i^*$) in the validation set, along with the root mean squared error (RMSE). The prediction bias was assessed using the slope of the linear regression of $\hat{y}_i^*$ on $y^*$ for each model. The Hotelling-Williams test49 was used to assess the significance level of the difference in the predictive ability of the Bayesian methods (BayesA, BayesB, BayesC, BL, and BRR), MTGBLUP, and machine learning (MLNN and SVR) against STGBLUP. The similarity between the predictive performance of the different models was assessed using Ward's hierarchical clustering method with a Euclidean distance analysis. The relative difference (RD) in predictive ability was measured as $RD = \frac{(r_m - r_{STGBLUP})}{r_{STGBLUP}} \times 100$, where $r_m$ represents the acc of each alternative approach (SVR, MLNN, MTGBLUP, or the Bayesian regression models BayesA, BayesB, BayesC, BL, and BRR), and $r_{STGBLUP}$ is the predictive ability obtained using the STGBLUP method.
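These evaluation quantities can be computed in a few lines; this Python sketch assumes NumPy/SciPy and uses purely illustrative values:

```python
import numpy as np
from scipy.stats import pearsonr

def evaluate(y_adj, y_pred):
    """Accuracy (Pearson r), RMSE, and bias (slope of y_pred regressed
    on y_adj), as defined above."""
    acc = pearsonr(y_adj, y_pred)[0]
    rmse = np.sqrt(np.mean((y_adj - y_pred) ** 2))
    slope = np.polyfit(y_adj, y_pred, 1)[0]
    return acc, rmse, slope

y_adj = np.array([1.2, 0.8, 1.5, 0.9, 1.1])   # adjusted phenotypes (toy)
y_pred = np.array([1.0, 0.9, 1.4, 1.0, 1.2])  # model predictions (toy)
acc, rmse, slope = evaluate(y_adj, y_pred)
rd = (acc - 0.35) / 0.35 * 100                # RD vs. a hypothetical STGBLUP acc
print(acc, rmse, slope, rd)
```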

The animal procedures and data sampling presented in this study were approved and performed following the Animal Care and Ethical Committee recommendations of São Paulo State University (UNESP), School of Agricultural and Veterinary Science (protocol number 18.340/16).

Read more here:
Benchmarking machine learning and parametric methods for genomic prediction of feed efficiency-related traits in ... - Nature.com

Research on lightweight algorithm for gangue detection based on improved Yolov5 | Scientific Reports – Nature.com

The following improvements have been made to Yolov5s. The EfficientVIT network was proposed by Liu et al.22; it cascades groups of attention modules and feeds different complete features to each attention head, which saves computational cost and increases attention diversity. Comprehensive experiments demonstrate that its efficiency is significantly better than that of existing effective models, yielding a better speed-capacity trade-off. Mpdiou is a modern bounding box similarity comparison metric based on minimum point distance, proposed by Ma23 and others, which incorporates all the relevant factors considered in the existing loss functions, i.e., overlapping or non-overlapping areas, centroid distances, and width and height biases, while simplifying the computation process. C3_Faster builds on the recent Partial Convolution (PConv) technique proposed by Chen et al.24, which performs spatial feature extraction more efficiently by reducing both redundant computation and memory accesses. Based on PConv, FasterNet, a novel family of neural networks, was additionally proposed, which achieves higher operation speed than alternatives on different devices without compromising accuracy on visual tasks. The lightweight improvement of Yolov5s requires a reduction in both the number of parameters and the amount of computation, which all of the above methods provide, satisfying the experimental requirements. Thus, first, the entire backbone network in the original Yolov5s is replaced by the EfficientVIT network in the backbone module; second, the C3 module is replaced by C3_Faster in the HEAD module; third, the Neck region of the Yolov5 model is appropriately streamlined, deleting the 20×20 feature map branch, which has the largest receptive field and is suited to detecting objects of larger size; and finally, Mpdiou is used to replace CIOU, while the SE attention mechanism is introduced, which helps the model better fuse valuable features and improves detection performance. A schematic of the structure of the improved model is shown in Fig. 2.

Structure of Yolov5s improved model.

EfficientViT is a lightweight network model. It uses a building block with a sandwich layout: a single memory-bound MHSA (multi-head self-attention) layer between efficient FFN (feed-forward network) layers, which improves channel communication while increasing memory efficiency. EfficientViT also proposes a cascaded group attention module that assigns a different split of the full feature to each attention head25; the overall framework is shown in Fig. 3. The network contains three stages, each comprising a number of sandwich structures that consist of 2N DWConv layers (for spatially local communication), FFN layers (for channel communication), and cascaded group attention. Cascaded group attention differs from previous MHSA in that the heads are first given separate feature splits, from which Q, K, and V are then generated. In addition, to learn richer feature maps and increase model capacity, the output of each head is added to the input of the next head. Finally, the outputs of the multiple heads are concatenated and mapped through a linear layer to obtain the final output, as given by:

$$\widetilde{X}_{ij} = \mathrm{Attn}\left(X_{ij} W_{ij}^{Q},\; X_{ij} W_{ij}^{K},\; X_{ij} W_{ij}^{V}\right)$$

(1)

$$\widetilde{X}_{i+1} = \mathrm{Concat}\left[\widetilde{X}_{ij}\right]_{j=1:h} W_{i}^{P}$$

(2)

$$X^{\prime}_{ij} = X_{ij} + \widetilde{X}_{i(j-1)}, \quad 1 < j \le h$$

(3)

In Eqs. (1) and (2), the j-th head computes self-attention on $X_{ij}$, the j-th partition of the input feature $X_i$, i.e., $X_i = [X_{i1}, X_{i2}, \ldots, X_{ih}]$ with $1 \le j \le h$, where h is the total number of heads. $W_{ij}^{Q}$, $W_{ij}^{K}$, and $W_{ij}^{V}$ are the projection layers that map the input feature partition into different subspaces, and $W_{i}^{P}$ is a linear layer that projects the concatenated output features back to a dimension consistent with the input.

In Eq. (3), $X^{\prime}_{ij}$ is the sum of the j-th input partition $X_{ij}$ and the output $\widetilde{X}_{i(j-1)}$ of the (j-1)-th head, computed according to Eq. (1). It replaces $X_{ij}$ as the input feature for the j-th head when computing self-attention. In addition, another token interaction layer is applied after the Q projection, which allows self-attention to jointly capture local and global relations and greatly enhances the feature representation.
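To make Eqs. (1)-(3) concrete, here is a minimal PyTorch sketch of cascaded group attention; the tensor shapes and the plain scaled-dot-product Attn operator are simplifying assumptions, and the published EfficientViT implementation differs in detail.

```python
# A simplified sketch of cascaded group attention (Eqs. 1-3), assuming a
# (batch, tokens, dim) input; not the official EfficientViT code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadedGroupAttention(nn.Module):
    def __init__(self, dim: int, heads: int):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.d = heads, dim // heads
        # Per-head projections W_ij^Q, W_ij^K, W_ij^V and the output
        # projection W_i^P from Eqs. (1)-(2).
        self.q = nn.ModuleList(nn.Linear(self.d, self.d) for _ in range(heads))
        self.k = nn.ModuleList(nn.Linear(self.d, self.d) for _ in range(heads))
        self.v = nn.ModuleList(nn.Linear(self.d, self.d) for _ in range(heads))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the input feature into h partitions X_ij.
        parts = x.chunk(self.heads, dim=-1)
        outs, carry = [], 0
        for j in range(self.heads):
            # Eq. (3): add the previous head's output to this head's input.
            xj = parts[j] + carry if j > 0 else parts[j]
            q, k, v = self.q[j](xj), self.k[j](xj), self.v[j](xj)
            attn = F.softmax(q @ k.transpose(-2, -1) / self.d ** 0.5, dim=-1)
            carry = attn @ v  # Eq. (1): per-head self-attention
            outs.append(carry)
        # Eq. (2): concatenate head outputs and project back to `dim`.
        return self.proj(torch.cat(outs, dim=-1))
```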

The loss function is an influential component of a neural network; its main role is to measure the distance between the network's predictions and the desired output: the closer the two are, the smaller the value of the loss function. The loss functions of the YOLO family mainly comprise a localization loss (lossrect), a confidence prediction loss (lossobj), and a category loss (losscls). The localization loss used by Yolov5 is the CIOU function, computed as follows.

$$\mathrm{CIOU\_Loss} = 1 - \mathrm{IOU} + \frac{\rho^{2}(a, a^{gt})}{c^{2}} + \alpha\mu$$

(4)

$$\alpha = \frac{\mu}{(1 - \mathrm{IOU}) + \mu}$$

(5)

$$\mu = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}$$

(6)

In Eqs. (4)-(6), a and agt are the centroids of the predicted and target boxes, respectively, and ρ is the Euclidean distance between the two centroids; c is the diagonal length of the smallest region enclosing both the predicted and target boxes; α is the weighting function; and μ measures the consistency of the aspect ratios of the two boxes. Here, w and h are the width and height of the predicted box, and wgt and hgt are the width and height of the target box. The CIOU function mainly attends to the overlapping parts of the predicted and target boxes; in this work, the Mpdiou loss function is used instead.
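For illustration, a sketch of the CIoU loss in Eqs. (4)-(6) follows, assuming boxes are given as (cx, cy, w, h) tensors; it follows the standard published formula rather than any repository-specific implementation.

```python
# A hedged sketch of the CIoU loss (Eqs. 4-6) for center-format boxes.
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    px, py, pw, ph = pred.unbind(-1)
    tx, ty, tw, th = target.unbind(-1)

    # Intersection over union of the two boxes.
    iw = (torch.min(px + pw / 2, tx + tw / 2) - torch.max(px - pw / 2, tx - tw / 2)).clamp(0)
    ih = (torch.min(py + ph / 2, ty + th / 2) - torch.max(py - ph / 2, ty - th / 2)).clamp(0)
    inter = iw * ih
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # rho^2: squared Euclidean distance between the two centroids.
    rho2 = (px - tx) ** 2 + (py - ty) ** 2
    # c^2: squared diagonal of the smallest box enclosing both boxes.
    cw = torch.max(px + pw / 2, tx + tw / 2) - torch.min(px - pw / 2, tx - tw / 2)
    ch = torch.max(py + ph / 2, ty + th / 2) - torch.min(py - ph / 2, ty - th / 2)
    c2 = cw ** 2 + ch ** 2 + eps

    # mu (Eq. 6): aspect-ratio consistency; alpha (Eq. 5): its weight.
    mu = (4 / math.pi ** 2) * (torch.atan(tw / th) - torch.atan(pw / ph)) ** 2
    alpha = mu / ((1 - iou) + mu + eps)

    return 1 - iou + rho2 / c2 + alpha * mu  # Eq. (4)
```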

Mpdiou is a bounding-box similarity metric based on minimum point distance that includes all the relevant factors considered in existing loss functions. It simplifies the similarity comparison between two bounding boxes and applies to both overlapping and non-overlapping bounding-box regression. Mpdiou can therefore serve as a decent alternative to the intersection-over-union ratio for all performance metrics in 2D/3D computer vision tasks. It also simplifies the computation by directly minimizing the distances between the top-left and bottom-right points of the predicted and ground-truth bounding boxes. Mpdiou is computed as follows.

$$d_{1}^{2} = (x_{1}^{B} - x_{1}^{A})^{2} + (y_{1}^{B} - y_{1}^{A})^{2}$$

(7)

$$d_{2}^{2} = (x_{2}^{B} - x_{2}^{A})^{2} + (y_{2}^{B} - y_{2}^{A})^{2}$$

(8)

$$\mathrm{Mpdiou} = \frac{A \cap B}{A \cup B} - \frac{d_{1}^{2}}{w^{2} + h^{2}} - \frac{d_{2}^{2}}{w^{2} + h^{2}}$$

(9)

In Eqs. (7)-(9), A and B are two arbitrary shapes with $A, B \subseteq \mathbb{R}^{n}$; w and h are the width and height of the input image; d1 and d2 are the distances between corresponding corner points defined above; and Mpdiou is the output. Let $(x_{1}^{A}, y_{1}^{A})$ and $(x_{2}^{A}, y_{2}^{A})$ denote the coordinates of the top-left and bottom-right points of A, and let $(x_{1}^{B}, y_{1}^{B})$ and $(x_{2}^{B}, y_{2}^{B})$ denote the coordinates of the top-left and bottom-right points of B, respectively.
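A minimal sketch of Eqs. (7)-(9) follows, assuming corner-format boxes (x1, y1, x2, y2) and a known input image size; it is illustrative rather than the authors' implementation.

```python
# A minimal sketch of the Mpdiou metric (Eqs. 7-9) for corner-format boxes,
# where (w, h) is the width and height of the input image.
import torch

def mpdiou(a: torch.Tensor, b: torch.Tensor, w: int, h: int, eps: float = 1e-7) -> torch.Tensor:
    ax1, ay1, ax2, ay2 = a.unbind(-1)
    bx1, by1, bx2, by2 = b.unbind(-1)

    # Plain IoU term: A∩B / A∪B.
    iw = (torch.min(ax2, bx2) - torch.max(ax1, bx1)).clamp(0)
    ih = (torch.min(ay2, by2) - torch.max(ay1, by1)).clamp(0)
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter + eps
    iou = inter / union

    # Eqs. (7)-(8): squared distances between the top-left and
    # bottom-right corner points of the two boxes.
    d1 = (bx1 - ax1) ** 2 + (by1 - ay1) ** 2
    d2 = (bx2 - ax2) ** 2 + (by2 - ay2) ** 2

    # Eq. (9): penalize both corner distances, normalized by the image diagonal.
    return iou - d1 / (w ** 2 + h ** 2) - d2 / (w ** 2 + h ** 2)
```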

The object detection head is the part of the feature pyramid that performs object detection; it includes multiple convolutional, pooling, and fully connected layers, among others. In the Yolov5 model, the detection head module is mainly responsible for processing the multiple object-detection feature maps extracted from the backbone network, and it consists of three main parts. The C3 module is an essential part of the Yolov5 network; its main role is to increase the depth and receptive field of the network and improve its feature extraction capability. C3_Faster is built from multiple Faster_Blocks, each implemented with the lightweight PConv convolution proposed in the literature21 combined with additional operations, and can replace the C3 module in Yolov5 to accelerate network inference. In this work, the C3 module in the HEAD module is replaced with C3_Faster.

The Neck region of the Yolov5 model uses a multipath structure to aggregate features and enhance feature fusion. Coal and gangue are small relative to the whole image, which makes part of the Neck region redundant for large-object detection. To improve detection speed, the Neck region is therefore streamlined: the 20×20 feature map branch, which has the largest receptive field and is suited to detecting larger objects, is removed. This reduces model complexity and improves the real-time performance of detection, as shown in Fig. 4.

Improved neck and prediction structure.

The SE attention mechanism is introduced into the original model to improve object detection accuracy. It consists of three parts, namely Squeeze, Excitation, and Scale (feature recalibration), with the main purpose of enhancing useful features. First, the Squeeze operation obtains the global information of the feature maps via global average pooling, compressing the feature maps along the spatial dimension so that each two-dimensional feature channel becomes a single real number; a W×H×C feature map is thus compressed into a 1×1×C vector whose dimension matches the number of input feature channels. Next, the Excitation operation applies fully connected layers to this vector, followed by a Sigmoid activation function, to obtain normalized channel weights that are learned from the data. Finally, the Scale operation multiplies the weights obtained from the Excitation step with the corresponding channels of the original feature map one by one, recalibrating the feature map channel-wise. The SE module is shown in Fig. 5.
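The following is a minimal PyTorch sketch of the SE block just described: squeeze by global average pooling, excitation through fully connected layers with a Sigmoid, and channel-wise rescaling. The reduction ratio r = 16 is a common default, not a value taken from the paper.

```python
# A minimal sketch of a squeeze-and-excitation (SE) block; r=16 is an
# assumed, commonly used reduction ratio.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Squeeze: compress each W x H x C feature map to a 1 x 1 x C vector.
        s = x.mean(dim=(2, 3))
        # Excitation: learn normalized per-channel weights.
        w = self.fc(s).view(b, c, 1, 1)
        # Scale: recalibrate the original feature map channel by channel.
        return x * w
```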

See more here:
Research on lightweight algorithm for gangue detection based on improved Yolov5 | Scientific Reports - Nature.com

Orange isn't building its own AI foundation model – here's why – Light Reading

There has been a flurry of interest in generative AI (GenAI) from telcos, each of which has taken its own nuanced approach to the idea of building its own large language models (LLMs). While Vodafone seems to dismiss the idea and Verizon appears content to build on existing foundation models, Deutsche Telekom and SK Telecom announced last year that they will develop telco-specific LLMs. Orange, meanwhile, doesn't currently see the need to build a foundation model, its chief AI officer Steve Jarrett recently told Light Reading.

Jarrett said the company is currently content with using existing models and adapting them to its needs using two main approaches. The first one is retrieval-augmented generation (RAG), where a detailed source of information is passed to the model together with the prompt to augment its response.

He said this allows the company to experiment with different prompts easily, adding that existing methodologies can be used to assess the results. "That is a very, very easy way to dynamically test different models, different styles of structuring the RAG and the prompts. And [...] that solves the majority of our needs today," he elaborated.
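As a toy illustration of the RAG pattern Jarrett describes, the sketch below retrieves relevant passages and passes them to the model alongside the user's prompt; `retrieve` and `llm_complete` are hypothetical stand-ins, not Orange's internal APIs.

```python
# A toy sketch of retrieval-augmented generation (RAG): the retrieved
# context is passed to the model together with the prompt. The retrieve()
# and llm_complete() callables are hypothetical placeholders.
from typing import Callable, List

def rag_answer(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    llm_complete: Callable[[str], str],
    k: int = 3,
) -> str:
    # Retrieve the k passages most relevant to the question.
    passages = retrieve(question, k)
    # Augment the prompt with the retrieved context before calling the model.
    context = "\n\n".join(passages)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm_complete(prompt)
```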

At the same time, Jarrett admitted that the downside of RAG is that it may require a lot of data to be passed along with the prompt, making more complex tasks slow and expensive. In such cases, he argued, fine-tuning is a more appropriate approach.

Distilling models

In this case, he explained, "you take the information that you would have used in the RAG for [...] a huge problem area. And you make a new version of the underlying model that embeds all that information." Another related option is to distill the model.

This involves not just structuring the output of the model, but downsizing it, "like you're distilling fruit into alcohol," Jarrett said, adding "there are techniques to actually chop the model down into a much smaller model that runs much faster."
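As a rough sketch of what distillation involves in this sense, the loss below trains a smaller student model to match a larger teacher's output distribution; the temperature and weighting are illustrative defaults, not details disclosed by Orange.

```python
# A hedged sketch of a standard knowledge-distillation loss: the student
# matches the teacher's softened outputs while also fitting the labels.
# T and alpha are illustrative hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(
    student_logits: torch.Tensor,
    teacher_logits: torch.Tensor,
    labels: torch.Tensor,
    T: float = 2.0,
    alpha: float = 0.5,
) -> torch.Tensor:
    # Soft targets: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```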

This approach is, however, highly challenging. "Even my most expert people frequently make mistakes," he admitted, saying: "It's not simple, and the state of the art of the tools to fine tune are changing every single day." At the same time, he noted that these tools are improving constantly and, as a result, he expects fine-tuning to get easier over time.

He pointed out that building a foundation model from scratch would be an even more complex task, which the company currently doesn't see a reason to embark on. Nevertheless, he stressed that it's impossible to predict how things will evolve in the future.

Complexity budget

One possibility is that big foundational models will eventually absorb so much information that the need for RAG and other tools will diminish. In this scenario, Orange may never have to create its own foundation model, Jarrett said, "as long as we have the ability to distill and fine tune models, where we need to, to make the model small enough to run faster and cheaper and so on."

He added: "I think it's a very open question in the industry. In the end, will we have a handful of massive models, and everyone's doing 99% RAG and prompt engineering, or are there going to be millions of distilled and fine-tuned models?"

One factor that may determine where things will go in the future is what Jarrett calls the complexity budget. This is a concept that conveys how much computing was needed from start to finish to produce an answer.

While a very large model may be more intensive to train in the beginning, there may be less computing required for RAG and fine-tuning. "The other approach is you have a large language model that also obviously took a lot of training, but then you do a ton more compute to fine tune and distill the model so that your model is much smaller," he added.

Apart from cost, there is also an environmental concern. While hyperscalers tend to perform relatively well in terms of using clean energy, and Jarrett claimed that Orange is "fairly green as a company," he added that the carbon intensity of the energy used for on-premises GPU clusters tends to vary in the industry.

Right tool for the job

The uncertainty surrounding GenAI's future evolution is one of the reasons why Orange is taking a measured approach to the technology, with Jarrett stressing it is not a tool suited to every job. "You don't want to use the large language model sledgehammer to hit every nail," he said.

"I think, fairly uniquely compared to most other telco operators, we actually have the ability, the skill inside of Orange to help make these decisions about what tool to use when. So we prefer to use a statistical method or basic machine learning to solve problems because those results are more [] explainable. They're usually cheaper, and they're usually less impactful on the environment," he added.

In fact, Jarrett says one of the things Orange is investigating at the moment is how to use multiple AI models together to solve problems. The notion, he added, is called agents, and refers to a high-level abstraction of a problem, such as asking how the network in France is working on a given day. This, he said, will enable the company to solve complex problems more dynamically.

In the meantime, the company is making a range of GenAI models available to its employees, including ChatGPT, Dolly and Mistral. To do so, it has built a solution that Jarrett says provides a "secure, European-resident version of leading AI models that we make available to the entire company."

Improving customer service

Jarrett says this is a more controlled and safer way for employees to use models than if they were accessed directly. The solution also notifies the employee of the cost of running a specific model to answer a question. Available for several months, it has so far been used by 12% of employees.

Orange has already deployed GenAI in many countries within its customer service solutions to predict what the most appealing offer may be to an individual customer, Jarrett said, adding "what we're trialling right now is can generative AI help us to customize and personalize the text of that offer? Does that make the offer incrementally more appealing?"

Another potential use case is in transcribing a conversation with a customer care agent in real time, using generative AI to create prompts. The tool is still in development but could help new recruits to improve faster, raising employee and customer satisfaction, said Jarrett.

While Orange doesn't currently use GenAI for any use cases in the network, some are under development, although few details are being shared at this stage. One use case involves predicting when batteries at cell sites may need replacing.

Jarrett admits, however, that GenAI is still facing a number of challenges, such as hallucinations. "In a scenario where the outputs have to be correct 100% of the time, we're not going to use generative AI for that today, because [it's] not correct 100% of the time," he said.

Dealing with hallucinations

Yet it can be applied in areas that are less sensitive. "For example, if for internal use you want to have a summary of an enormous transcript of a long meeting that you missed, it's okay if the model hallucinates a little bit," he added.

Hallucinations cannot be stopped entirely and will likely continue to be a problem for some time, said Jarrett. But he believes RAG and fine-tuning could mitigate the issue to some extent.

"The majority of the time, if we're good at prompt engineering and we're good at passing the appropriate information with the response, the model generates very, very useful, relevant answers," Jarrett said about the results achieved with RAG.

The availability and quality of data is another issue that is often discussed, and also one that Orange is trying to address. Using data historically kept in separate silos has been difficult, said Jarrett. "[The] availability of the data from the marketing team to be able to run a campaign on where was our network relatively strong, for example those use cases were either impossible, or took many, many, many months of manual meetings and collaboration."

As a result, the company is trying to create a marketplace where data is made widely available inside each country and appropriately labeled. Orange calls this approach data democracy.

Visit link:
Orange isn't building its own AI foundation model – here's why – Light Reading