Category Archives: Machine Learning

UCF Researcher Clearing the Way for Smart Wireless Networks – UCF

Communicating unimpeded at distances near and far is a dream Murat Yuksel is hoping to realize.

His ongoing research, titled INWADE: INtelligent Waveform Adaptation with DEep Learning and funded by the U.S. Air Force Research Laboratory, aims to get us closer to that dream by improving the quality of high-frequency wireless networks, using machine learning to fine-tune the networks' efficacy.

The need to efficiently improve wireless signal quality will grow with the continuing proliferation of wireless networks for use in communications, says Yuksel, who is a UCF Department of Electrical and Computer Engineering professor within the College of Engineering and Computer Science.

"The emerging 5G-and-beyond wireless networks regularly use high frequency signals that are very sensitive to the environment," he says. "They get blocked easily or attenuate quickly as they travel. Even the nature of the particles in the air affects them significantly. Deep learning enables us to learn the features of the environment. Hence, using these learned features enables us to better tune the wireless signals to attain higher data transfer rates."

INWADE is an automated means to design multiple communication blocks at the transmitter and the receiver jointly by training them as a combination of deep neural networks, benefitting wireless network users.

The development and study of the INWADE network was catalyzed by the need to keep pace with the spread and usage of wireless networks.

"Demand for wireless data transfers (such as cellular and Wi-Fi) is ever-increasing, and this causes more tussle over the sharing of the underlying natural resource, which is the radio spectrum that supports these wireless transfers," Yuksel says.

The deep learning aspect of the research is an emerging approach for delivering better wireless signals with minimal delay. The deep learning network selects the optimal waveform modifications and beam direction based on its perception of the radio frequency environment, managing the drones and nodes that provide wireless signals and modifications.

"Our work shows the feasibility of using deep reinforcement learning in real time to fine-tune millimeter-wave signals, which operate in part of the super-6 GHz bands," Yuksel says. "Further, the project aims to show that deep learning at the link level as well as the network level can work together to make the signals deep smart."

Harnessing existing wireless networking resources and navigating fixed obstacles or crowded airwaves quickly is an omnipresent concern, and it leads network managers to search for spectra at higher frequencies than the commonly used sub-6 GHz frequency bands, Yuksel says.

These super-6 GHz bands are difficult to access and maintain, so deep learning is something Yuksel is hoping to use to address that challenge.

"They operate with highly directional antennas, which makes them brittle to mobility/movement, and they cannot reach far as they are sensitive to particles in the air," Yuksel says. "Hence, we have to handle them very carefully to support high-throughput data transfers. This requires advanced algorithmic methods that can learn the environment and tune the super-6 GHz wireless signals to the operational environment."

Some initial findings regarding the viability of algorithms that may be implemented in INWADE were published at the International Federation for Information Processing Internet of Things Conference in late 2023.

The project started earlier in 2024 after receiving the first portion of the awarded $250,000 from the Air Force Research Laboratory in late 2023, but there are already promising findings, Yuksel notes.

"We have shown in a lab testbed that our deep learning methods can successfully solve some of the fundamental link-level problems, such as angle-of-arrival detection, or finding the direction of an incoming signal," he says. "This capability is very useful for several critical applications, for example determining location and movement direction. Next steps include demonstrating similar capability on drones and showing the feasibility of co-existence of deep learning at the link and network levels."

After developing and testing the INWADE framework, Yuksel foresees additional challenges and considerations that may require further study when implementing machine learning.

"A key theoretical endeavor is to understand if multiple machine learning agents can co-exist at different layers of the wireless network and still attain improved performance without jeopardizing each other's goals," he says.

Although Yuksel is the principal investigator for the research, he credits his students and collaborators for much of his success.

"My students help in performing the experiments and gathering results," he says. "I am indebted to them. We are also collaborating with Clemson, as they are working on designing new machine learning methods for the problems we are tackling."

Yuksel's work continues, and he is optimistic that his research will further benefit the greater scientific endeavor of making wireless networks accessible for all.

"The potential for this effort is huge," he says. "I consider the radio spectrum to be a critical natural resource, like water or clean air. As machine learning methods are advancing, being able to use them for better sharing the spectrum and solving critical wireless challenges is very much needed."

Distribution A. Approved for public release: Distribution Unlimited: AFRL-2024-2894 on 17 Jun 2024

Researcher's Credentials

Yuksel is a professor at UCF's Department of Electrical and Computer Engineering and served as its interim chair from 2021 to 2022. He received his doctoral degree in computer science from Rensselaer Polytechnic Institute in 2002. Yuksel's research interests include wireless systems, optical wireless, and network management, and he has multiple ongoing research projects funded by the National Science Foundation.


Uncovering hidden and complex relations of pandemic dynamics using an AI driven system | Scientific Reports – Nature.com

This section presents the experimental results and comprehensive evaluations of BayesCovid. We will explicitly discuss the results of the algorithms applied to the clinical datasets to uncover the hidden patterns of COVID-19 symptoms.

We set up a Spark on Hadoop YARN cluster consisting of 4 EC2 machines (1 master and 3 workers) in AWS to deploy BayesCovid. We chose Ubuntu Server 20.04 LTS as the operating system for all the machines and installed Hadoop version 3.3.2 and Spark 3.3.1. All the nodes have 4 cores and 16 GB of memory.
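For illustration, a minimal PySpark sketch of initializing a session against a Spark-on-YARN cluster of this shape is shown below; the executor counts and memory settings are assumptions chosen to mirror the described 3-worker, 4-core, 16 GB layout rather than the authors' actual configuration, and the data path is hypothetical.

```python
from pyspark.sql import SparkSession

# Minimal sketch: connect to a Spark-on-YARN cluster similar to the one
# described above (1 master, 3 workers, 4 cores / 16 GB each).
# Executor sizing here is an assumption, not the authors' configuration.
spark = (
    SparkSession.builder
    .appName("BayesCovid")
    .master("yarn")
    .config("spark.executor.instances", "3")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "12g")
    .getOrCreate()
)

# Load the symptom dataset (hypothetical HDFS path) as a Spark DataFrame.
df = spark.read.csv("hdfs:///data/covid_symptoms.csv", header=True, inferSchema=True)
df.printSchema()
```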

The dataset, prepared by Carbon Health and Braid Health14, was obtained through RT-PCR tests from 11,169 individuals living in the United States; approximately 3% of the patients had COVID-positive tests and 97% had COVID-negative tests. This dataset, which Carbon Health began collecting in early April 2020, was collected under the anonymity standard of the Health Insurance Portability and Accountability Act (HIPAA) privacy rule. The dataset covers multiple physiognomies, including Epidemiological (Epi) Factors, comorbidity, vital signs, healthcare worker-identified, patient-reported, and COVID-19 symptoms. In addition, information about patients, such as heart rate, temperature, diabetes, cancer, asthma, smoking and age, is also available. The Carbon Health team also gathered datasets from the Braid Health team, which collected radiological information, including CXR information. The dataset includes data from patients with one or more symptoms as well as patients with no symptoms, and we only used the COVID-19 symptom information indicated in Fig. 1. Radiological information was not included in the analysis. Table 2 shows the statistical information of the COVID-19 dataset. We have 18,538 test results of 11 different COVID-19 symptoms and COVID severity values, belonging to 11,169 individuals. Moreover, Table 3 demonstrates the number of false (negative) and true (positive) values for each symptom.

Cross-validation is an important step in assessing the predictive power of models while mitigating the risk of overfitting15. To rigorously evaluate our models, we implemented ten-fold cross-validation by dividing the dataset into ten equal parts. During each iteration, one part served as the validation/test set, while the remaining nine were used for model training. This process was repeated ten times, and the resulting accuracies were averaged across all folds to assess each model's performance comprehensively. Importantly, ten-fold cross-validation ensures that every instance in the dataset is used exactly once as a test sample, which minimises the risk of overfitting16.
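A minimal sketch of this ten-fold protocol, assuming scikit-learn's KFold and hypothetical train_model/evaluate_accuracy helpers, is given below.

```python
import numpy as np
from sklearn.model_selection import KFold

# Minimal sketch of the ten-fold cross-validation described above:
# each fold serves exactly once as the test set, and accuracies are averaged.
# `train_model` and `evaluate_accuracy` are hypothetical placeholders.
def cross_validate(X, y, train_model, evaluate_accuracy, n_splits=10, seed=0):
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accuracies = []
    for train_idx, test_idx in kf.split(X):
        model = train_model(X[train_idx], y[train_idx])
        accuracies.append(evaluate_accuracy(model, X[test_idx], y[test_idx]))
    return float(np.mean(accuracies)), float(np.std(accuracies))
```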

This subsection explains three distinct Bayesian networks: Naïve Bayesian, Tree-Augmented Naïve Bayesian, and Complex Bayesian models. These models have unveiled intricate and concealed patterns within COVID-19, offering valuable insights into the complex dynamics and relationships underlying the disease.

Figure 3a depicts the dependencies for the Naïve Bayesian algorithm, where the class variable, COVID severity, is the only parent associated with each symptom, and there is no link between symptoms. Figures 4 and 5 show the probability percentages of the symptoms for their positive and negative values. For example, in Fig. 4, while the probability of diarrhea is around 3% for COVID severity level 1, the probability of this symptom for level 3 is about 95%. Moreover, the probabilities of shortness of breath for levels 1, 2, 3, and 4 are very low, about 5%, and the likelihood of having this symptom is very high for levels 5 and 6. In short, the distribution of symptoms differs according to the severity levels of COVID-19, and the probability of some symptoms increases as the COVID-19 severity level rises. When we compare Figs. 4 and 5, it is seen that there is an inverse relationship between the incidence and absence of symptoms.
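Conditional probabilities of this kind can be estimated directly from the raw symptom table; the following is a small pandas sketch, with column names assumed for illustration.

```python
import pandas as pd

# Minimal sketch of the kind of conditional probabilities reported in
# Figs. 4 and 5: P(symptom positive | COVID severity level), estimated
# directly from a dataframe of binary (0/1) symptom columns plus a severity
# column. Column names are assumptions for illustration.
def symptom_given_severity(df: pd.DataFrame, symptom: str,
                           severity_col: str = "covid_severity") -> pd.Series:
    # Mean of a 0/1 column within each severity group equals the empirical
    # P(symptom = 1 | severity = level).
    return df.groupby(severity_col)[symptom].mean()

# Example usage (hypothetical data): symptom_given_severity(df, "diarrhea")
```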

Conditional probability of symptoms with COVID-19 severity if symptoms are positive for Naïve Bayes.

Conditional probability of symptoms with COVID-19 severity if symptoms are negative for Naïve Bayes.

The dependency network built using the Tree-Augmented Naïve Bayesian network is depicted in Fig. 3b. COVID severity is the class variable, similar to the Naïve Bayesian network, but connections between the symptoms (features) are also available. As seen from the figure, for example, cough has an effect on both headache and fever, while muscle sore is affected by headache and affects fatigue. For the probabilities, Tables 4 and 5 show some results of the Conditional Probability Table (CPT). In Table 4, when shortness of breath and fever are negative (F), the probability of COVID severity level 1 is 92.95%. In contrast, when shortness of breath is positive (T) and fever is negative, the probability of COVID severity level 1 is 2.04%. When headache is positive but cough is negative, the probability of COVID severity level 4 is 65.92% (see Table 5).

Figure 3c shows the dependencies between all the symptoms (features) and COVID severity (class variable). Cough is the feature most affected by other symptoms and does not affect any features. While the class variable, COVID severity, has impacts on shortness of breath, fever, fatigue, and sore throat, interestingly, it is affected by diarrhea. Another interesting pattern, different from the Tree-Augmented Naïve Bayesian network, is that fever affects muscle sore. While Table 6 shows the CPT for three variables, namely COVID severity, shortness of breath, and fever, Table 7 displays the probabilities for four variables, namely diarrhea, fatigue, muscle sore, and headache. When shortness of breath is negative and fever is positive, the probability of COVID severity level 4 is 98.92%. In contrast, when shortness of breath is positive but fever is negative, the probability of COVID severity level 6 is 48.18% (see Table 6). For the probabilities based on the status of four symptoms (see Table 7), for instance, when all three symptoms, diarrhea, fatigue, and muscle sore, are positive, the probability of having the headache symptom is 73.18%. Another remarkable finding in Table 7 is that if an individual has fatigue, muscle sore, and headache, the probability of not having diarrhea is 58.43%.

In this study, we have also investigated and implemented three distinct Bayesian models, each representing a unique intersection of deep learning and Bayesian inference. The first model, Deep Learning-based Naïve Bayes (DL-NB), is a deep learning-based Naïve Bayes structure that capitalises on the capacity of deep neural networks to refine the traditional Naïve Bayes model for enhanced feature learning and dependency representation. Additionally, we extended our exploration to traditional Bayesian network structures by implementing Deep Learning-based Tree-Augmented Naïve Bayes (DL-TAN), where deep learning principles are integrated to augment the classic Tree-Augmented Naïve Bayes algorithm, providing richer feature representations. Furthermore, our investigation includes Deep Learning-based Complex Bayesian (DL-CB), a model designed to overcome the limitations of traditional Complex Bayesian structures in modelling intricate relationships within high-dimensional data. This comprehensive analysis and implementation of DL-NB, DL-TAN, and DL-CB contribute to the broader understanding of the synergies between deep learning and Bayesian techniques in various Bayesian network architectures. Figure 6 demonstrates the network dependencies of the deep learning-based Bayesian network algorithms, which uncover the complex and hidden relationships between COVID symptoms. As illustrated in Fig. 6a–c, our Bayesian deep learning models, namely DL-NB, DL-TAN, and DL-CB, reveal a richer web of relationships among features compared to their traditional counterparts. The Bayesian deep learning models exhibit a higher density of connections, which indicates a more nuanced understanding of inter-feature dependencies. This heightened connectivity reflects the enhanced capacity of Bayesian deep learning to capture complex relationships within the data, providing a comprehensive and informative model of the underlying dynamics.

Bayesian deep learning dependency networks.

Figure 7 demonstrates the accuracies of the three different algorithms proposed in our system, namely the Naïve Bayesian Network, Tree-Augmented Naïve Bayesian Network, and Complex Bayesian Network. Although the overall accuracies of the algorithms are close to each other, there are apparent differences in the per-symptom accuracies. The algorithms perform poorly for the cough symptom, with accuracies between 60% and 68%, while they show high accuracies for COVID severity, ranging from 94% to 97%. The overall accuracies of these three algorithms are 83.52%, 87.35%, and 85.15%, respectively.

Total accuracies of the algorithms.

In the evaluation of the accuracy of deep learning-based Bayesian network algorithms, the results, as depicted in Fig. 8, showcase the performance of three distinct models: DL-NB, DL-TAN, and DL-CB. The overall accuracies reveal nuanced differences among the algorithms. DL-TAN emerges with the highest cumulative accuracy of 95.21%, which indicates its superior predictive capabilities across a spectrum of symptoms. DL-NB and DL-CB follow closely, exhibiting overall accuracies of 91.04% and 92.81%, respectively. These results underscore the efficacy of deep learning-based Bayesian approaches in capturing complex relationships within the dataset.

The comparative analysis of Bayesian deep learning algorithms against traditional Bayesian network algorithms elucidates a discernible advantage favouring the former. Notably, the Bayesian deep learning models, such as DL-NB, DL-TAN, and DL-CB, exhibit superior predictive performance across various symptoms.

Total accuracies of the deep learning-based Bayesian algorithms.

We have developed a web interface for the BayesCovid decision support system that can be used by any clinical practitioner or other users. It utilises Python libraries for probabilistic graphical models, data manipulation, network analysis, and data visualization. Additionally, tkinter is adopted for the graphical user interface, and PyMuPDF (fitz) is leveraged for PDF file handling. All the source code and accompanying documentation for the BayesCovid decision support system are available as open source on GitHub (https://github.com/umitdemirbaga/BayesCovid). A demonstration is also available online on YouTube (https://youtu.be/7j36HuC9Zto). The designed user interface provides the dual functionality highlighted below.

Dependency analysis: This component of the application ensures efficient and accurate relationship analysis between the symptoms and severity assessment, enhancing the decision-making process in clinical settings. Figure 9a depicts the user-friendly interface, where a data file can be uploaded using the Select CSV button. After the data file is uploaded, six radio buttons allow users to select one of the following Bayesian models: (a) Naïve Bayesian Network, (b) Tree-Augmented Naïve Bayesian Network, (c) Complex Bayesian Network, (d) Naïve Bayes Deep Learning, (e) Tree-Augmented Bayes Deep Learning, and (f) Complex Bayes Deep Learning. An Analyse button starts processing the selected model with the uploaded CSV file, and a progress bar shows the processing status. After the model is processed, the dependency network plot is generated (see Fig. 9b) and the CPT output is saved as a file.
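A stripped-down sketch of such an interface, using the same tkinter widgets named above (file dialog, radio buttons, progress bar), might look as follows; the widget layout and callback bodies are illustrative only and are not taken from the BayesCovid source.

```python
import tkinter as tk
from tkinter import filedialog, ttk

# Illustrative sketch of a dependency-analysis front end in the spirit of
# Fig. 9a: a CSV selector, radio buttons for the six Bayesian models, an
# Analyse button, and a progress bar. The real BayesCovid interface (see the
# GitHub repository) is more elaborate; names here are placeholders.
MODELS = [
    "Naive Bayesian Network", "Tree-Augmented Naive Bayesian Network",
    "Complex Bayesian Network", "Naive Bayes Deep Learning",
    "Tree-Augmented Bayes Deep Learning", "Complex Bayes Deep Learning",
]

root = tk.Tk()
root.title("BayesCovid (sketch)")
csv_path = tk.StringVar()
model_choice = tk.StringVar(value=MODELS[0])

def select_csv():
    csv_path.set(filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")]))

def analyse():
    progress.start(10)                  # placeholder: the real tool runs the chosen model here
    root.after(2000, progress.stop)     # stop the bar after a mock delay

tk.Button(root, text="Select CSV", command=select_csv).pack(fill="x")
for name in MODELS:
    tk.Radiobutton(root, text=name, variable=model_choice, value=name).pack(anchor="w")
progress = ttk.Progressbar(root, mode="indeterminate")
progress.pack(fill="x")
tk.Button(root, text="Analyse", command=analyse).pack(fill="x")
root.mainloop()
```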

Severity analysis: This component of the application assists clinical staff in calculating the severity of COVID-19. The clinician selects the symptoms that the patient exhibits, and the tool subsequently determines the severity of COVID-19. As depicted in Fig. 9c, a clinician or user can select the visible symptoms and calculate the severity. The tool then outputs the COVID-19 severity level based on the input symptoms, as shown in Fig. 9c.


Improve productivity when processing scanned PDFs using Amazon Q Business | Amazon Web Services – AWS Blog

Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and extract insights directly from the content in digital as well as scanned PDF documents in your enterprise data sources, without needing to extract the text first.

Customers across industries such as finance, insurance, healthcare and life sciences, and more need to derive insights from various document types, such as receipts, healthcare plans, or tax statements, which are frequently in scanned PDF format. These document types often have a semi-structured or unstructured format, which requires processing to extract text before indexing with Amazon Q Business.

The launch of scanned PDF document support with Amazon Q Business can help you seamlessly process a variety of multi-modal document types through the AWS Management Console and APIs, across all supported Amazon Q Business AWS Regions. You can ingest documents, including scanned PDFs, from your data sources using supported connectors, index them, and then use the documents to answer questions, provide summaries, and generate content securely and accurately from your enterprise systems. This feature eliminates the development effort required to extract text from scanned PDF documents outside of Amazon Q Business, and improves the document processing pipeline for building your generative artificial intelligence (AI) assistant with Amazon Q Business.

In this post, we show how to asynchronously index and run real-time queries with scanned PDF documents using Amazon Q Business.

You can use Amazon Q Business for scanned PDF documents from the console, AWS SDKs, or AWS Command Line Interface (AWS CLI).

Amazon Q Business provides a versatile suite of data connectors that can integrate with a wide range of enterprise data sources, empowering you to develop generative AI solutions with minimal setup and configuration. To learn more, visit Amazon Q Business, now generally available, helps boost workforce productivity with generative AI.

After your Amazon Q Business application is ready to use, you can directly upload the scanned PDFs into an Amazon Q Business index using either the console or the APIs. Amazon Q Business offers multiple data source connectors that can integrate and synchronize data from multiple data repositories into a single index. For this post, we demonstrate two scenarios to use documents: one with the direct document upload option, and another using the Amazon Simple Storage Service (Amazon S3) connector. If you need to ingest documents from other data sources, refer to Supported connectors for details on connecting additional data sources.

In this post, we use three scanned PDF documents as examples: an invoice, a health plan summary, and an employment verification form, along with some text documents.

The first step is to index these documents. Complete the following steps to index documents using the direct upload feature of Amazon Q Business. For this example, we upload the scanned PDFs.

You can monitor the uploaded files on the Data sources tab. The Upload status changes from Received to Processing to Indexed or Updated, at which point the file has been successfully indexed into the Amazon Q Business data store. The following screenshot shows the successfully indexed PDFs.

The following steps demonstrate how to integrate and synchronize documents using an Amazon S3 connector with Amazon Q Business. For this example, we index the text documents.

When the sync job is complete, your data source is ready to use. The following screenshot shows that all five documents (scanned and digital PDFs, and text files) are successfully indexed.

The following screenshot shows a comprehensive view of the two data sources: the directly uploaded documents and the documents ingested through the Amazon S3 connector.

Now lets run some queries with Amazon Q Business on our data sources.

Your documents might be dense, unstructured, scanned PDFs. Amazon Q Business can identify and extract the most salient, information-dense text from them. In this example, we use the multi-page health plan summary PDF we indexed earlier. The following screenshot shows an example page.

This is an example of a health plan summary document.

In the Amazon Q Business web UI, we ask What is the annual total out-of-pocket maximum, mentioned in the health plan summary?

Amazon Q Business searches the indexed document, retrieves the relevant information, and generates an answer while citing the source for its information. The following screenshot shows the sample output.
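For readers who prefer to query an application programmatically rather than through the web UI, a hedged boto3 sketch is shown below. It assumes the qbusiness client's chat_sync operation and the applicationId/userMessage request fields; verify the exact parameter and response names against the current SDK documentation before relying on them.

```python
import boto3

# Hedged sketch: ask the same question programmatically via the ChatSync API.
# Parameter and response field names are assumptions based on the Amazon Q
# Business API shapes; check the current boto3 documentation.
qbusiness = boto3.client("qbusiness")

response = qbusiness.chat_sync(
    applicationId="your-application-id",   # hypothetical application ID
    userMessage=("What is the annual total out-of-pocket maximum, "
                 "mentioned in the health plan summary?"),
)

print(response["systemMessage"])                      # generated answer
for source in response.get("sourceAttributions", []):
    print("cited:", source.get("title"))              # cited source document
```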

Documents might also contain structured data elements in tabular format. Amazon Q Business can automatically identify, extract, and linearize structured data from scanned PDFs to accurately resolve any user queries. In the following example, we use the invoice PDF we indexed earlier. The following screenshot shows an example.

This is an example of an invoice.

In the Amazon Q Business web UI, we ask How much were the headphones charged in the invoice?

Amazon Q Business searches the indexed document and retrieves the answer with reference to the source document. The following screenshot shows that Amazon Q Business is able to extract bill information from the invoice.

Your documents might also contain semi-structured data elements in a form, such as key-value pairs. Amazon Q Business can accurately satisfy queries related to these data elements by extracting specific fields or attributes that are meaningful for the queries. In this example, we use the employment verification PDF. The following screenshot shows an example.

This is an example of an employment verification form.

In the Amazon Q Business web UI, we ask What is the applicant's date of employment in the employment verification form? Amazon Q Business searches the indexed employment verification document and retrieves the answer with reference to the source document.

In this section, we show you how to use the AWS CLI to ingest structured and unstructured documents stored in an S3 bucket into an Amazon Q Business index. You can quickly retrieve detailed information about your documents, including their statuses and any errors that occurred during indexing. If you're an existing Amazon Q Business user and have indexed documents in various formats, such as scanned PDFs and other supported types, and you now want to reindex the scanned documents, complete the following steps:

"errorMessage": "Document cannot be indexed since it contains no text to index and search on. Document must contain some text."

If you're a new user and haven't indexed any documents, you can skip this step.

The following is an example of using the ListDocuments API to filter documents with a specific status and their error messages:
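The post demonstrates this with the AWS CLI; an equivalent hedged boto3 sketch is shown below. The response field names (documentDetailList, documentId, error) follow the documented DocumentDetails shape but should be treated as assumptions and checked against the current API reference.

```python
import boto3

# Hedged sketch: list documents in an index and keep only the ones that
# carry an error, printing their error messages. Response field names are
# assumptions based on the documented Amazon Q Business API shapes.
qbusiness = boto3.client("qbusiness")

def documents_with_errors(application_id: str, index_id: str):
    failed, token = [], None
    while True:
        kwargs = {"applicationId": application_id, "indexId": index_id}
        if token:
            kwargs["nextToken"] = token
        page = qbusiness.list_documents(**kwargs)
        for doc in page.get("documentDetailList", []):
            error = doc.get("error")
            if error:  # alternatively, filter on the document status field
                failed.append((doc.get("documentId"), error.get("errorMessage")))
        token = page.get("nextToken")
        if not token:
            return failed

for doc_id, message in documents_with_errors("your-application-id", "your-index-id"):
    print(doc_id, "->", message)
```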

The following screenshot shows the AWS CLI output with a list of failed documents with error messages.

Now you batch-process the documents. Amazon Q Business supports adding one or more documents to an Amazon Q Business index.
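A hedged boto3 sketch of such a batch ingestion call is shown below; the document and content field names follow the documented BatchPutDocument request shape, while the bucket, keys, and role ARN are hypothetical.

```python
import boto3

# Hedged sketch: re-ingest documents stored in S3 via the BatchPutDocument API.
# Field names follow the documented request shape but are assumptions here;
# the bucket, keys, and role ARN are hypothetical placeholders.
qbusiness = boto3.client("qbusiness")

documents = [
    {
        "id": key,  # reuse the S3 key as the document ID
        "content": {"s3": {"bucket": "example-bucket", "key": key}},
    }
    for key in ["scans/invoice.pdf", "scans/health-plan-summary.pdf"]
]

response = qbusiness.batch_put_document(
    applicationId="your-application-id",
    indexId="your-index-id",
    roleArn="arn:aws:iam::123456789012:role/example-qbusiness-role",
    documents=documents,
)
# A successful run should report no failed documents.
print("failed:", response.get("failedDocuments", []))
```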

The following screenshot shows the AWS CLI output. You should see failed documents as an empty list.

The following screenshot shows that the documents are indexed in the data source.

If you created a new Amazon Q Business application and don't plan to use it further, unsubscribe and remove assigned users from the application and delete it so that your AWS account doesn't accumulate costs. Moreover, if you don't need to use the indexed data sources further, refer to Managing Amazon Q Business data sources for instructions to delete your indexed data sources.

This post demonstrated the support for scanned PDF document types with Amazon Q Business. We highlighted the steps to sync, index, and query supported document types, now including scanned PDF documents, using generative AI with Amazon Q Business. We also showed examples of queries on structured, unstructured, or semi-structured multi-modal scanned documents using the Amazon Q Business web UI and AWS CLI.

To learn more about this feature, refer to Supported document formats in Amazon Q Business. Give it a try on the Amazon Q Business console today! For more information, visit Amazon Q Business and the Amazon Q Business User Guide. You can send feedback to AWS re:Post for Amazon Q or through your usual AWS support contacts.

Sonali Sahu is leading the Generative AI Specialist Solutions Architecture team in AWS. She is an author, thought leader, and passionate technologist. Her core area of focus is AI and ML, and she frequently speaks at AI and ML conferences and meetups around the world. She has both breadth and depth of experience in technology and the technology industry, with industry expertise in healthcare, the financial sector, and insurance.

Chinmayee Rane is a Generative AI Specialist Solutions Architect at AWS. She is passionate about applied mathematics and machine learning. She focuses on designing intelligent document processing and generative AI solutions for AWS customers. Outside of work, she enjoys salsa and bachata dancing.

Himesh Kumar is a seasoned Senior Software Engineer, currently working on Amazon Q Business at AWS. He is passionate about building distributed systems in the generative AI/ML space. His expertise extends to developing scalable and efficient systems, ensuring high availability, performance, and reliability. Beyond his technical skills, he is dedicated to continuous learning and staying at the forefront of technological advancements in AI and machine learning.

Qing Wei is a Senior Software Developer for the Amazon Q Business team at AWS, and is passionate about building modern applications using AWS technologies. He loves community-driven learning and sharing of technology, especially for machine learning hosting and inference-related topics. His main focus right now is on building serverless and event-driven architectures for RAG data ingestion.


Accurate Prediction of Protein Structural Flexibility by Deep Learning Integrating Intricate Atomic Structures and Cryo … – Nature.com

Overview of RMSF-net procedure

We propose a deep learning approach named RMSF-net to analyze protein dynamics based on cryo-electron microscopy (cryo-EM) maps. The primary objective of this method is to predict the RMSF of local structures (residues, atoms) within proteins. RMSF is a widely used measure of the flexibility of molecular structures in MD analysis and is defined by the following equation:

$$\mathrm{RMSF}=\sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(x(t)-\widetilde{x}\right)^{2}}$$

where \(x\) represents the real-time position of atoms or residues, \(t\) represents time, and \(\widetilde{x}\) represents the mean position over a period of time \(T\). In addition to the experimental cryo-EM maps, RMSF-net incorporates fitted PDB models, which represent the mean structures of fluctuating proteins. A schematic overview of RMSF-net is depicted in Fig. 1a. The cryo-EM map and PDB model are initially combined to create a dual feature pair. The PDB models are converted into voxelized density maps using the MOLMAP tool in UCSF Chimera23 to facilitate seamless integration with cryo-EM maps. Subsequently, both the density grids of the cryo-EM maps and the PDB simulated maps are divided into uniform-sized density boxes (40×40×40) with a stride of 10. The corresponding density boxes from the mapping pair are concatenated to form a two-channel feature input for the neural network (RMSF-net) to predict the RMSF of atoms within the central subbox (10×10×10). RMSF-net is a three-dimensional convolutional neural network comprising two interconnected modules. The primary module employs a Unet++ (L3) architecture24 for feature encoding and decoding on the input density boxes. The other module utilizes 1-kernel convolutions for regression on the channels of the feature map generated by the Unet++ backbone. A center crop is then applied to the regression module output to obtain the central RMSF subboxes, where the voxel values correspond to the RMSF of the atoms contained within them. Finally, the RMSF sub-boxes are spatially merged into an RMSF map using a merging algorithm.
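To make the box bookkeeping concrete, the following is a minimal NumPy sketch of the segmentation and cropping steps described above: sliding 40³ windows with stride 10 over an aligned cryo-EM/simulated map pair, stacking each matching pair as a two-channel input, and cropping the central 10³ region of a prediction. The shapes follow the text; the function names are illustrative and not taken from the RMSF-net code.

```python
import numpy as np

BOX, STRIDE, CENTER = 40, 10, 10

def extract_two_channel_boxes(cryo_map: np.ndarray, sim_map: np.ndarray):
    """Slide 40^3 windows with stride 10 over an aligned map pair and
    stack each matching pair of boxes into a two-channel input."""
    assert cryo_map.shape == sim_map.shape
    boxes, origins = [], []
    nx, ny, nz = cryo_map.shape
    for x in range(0, nx - BOX + 1, STRIDE):
        for y in range(0, ny - BOX + 1, STRIDE):
            for z in range(0, nz - BOX + 1, STRIDE):
                pair = np.stack([cryo_map[x:x+BOX, y:y+BOX, z:z+BOX],
                                 sim_map[x:x+BOX, y:y+BOX, z:z+BOX]])
                boxes.append(pair)
                origins.append((x, y, z))
    return np.asarray(boxes), origins   # boxes: (N, 2, 40, 40, 40)

def center_crop(pred_box: np.ndarray) -> np.ndarray:
    """Crop the central 10^3 sub-box of a 40^3 network output."""
    lo = (BOX - CENTER) // 2
    return pred_box[lo:lo+CENTER, lo:lo+CENTER, lo:lo+CENTER]
```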

a Overview of RMSF-net. The data preparation and RMSF inference for RMSF-net are illustrated in the upper section. Cryo-EM maps and their fitted atomic structure models were obtained from the EMDB and PDB databases. The PDB models were simulated as density maps resembling cryo-EM maps. Both the cryo-EM map and PDB simulated map were then segmented into 40-cubic density boxes. The density boxes with matching positions from the pair were concatenated into a two-channel tensor and input to the 3D CNN of RMSF-net to infer the RMSF for atoms in the central 103 voxels (subboxes). The RMSF prediction across the entire map was obtained by combining predictions from these subboxes. The lower section depicts the RMSF-net supervised training process. The RMSF-net neural network architecture is shown in the lower right, with the number of channels indicated alongside the hidden feature maps. With Unet++ (L3) as the backbone, a regression head and crop operation were added. The ground truth RMSF for training RMSF-net was derived from MD simulations, as illustrated in the lower left. b Data processing of maps in RMSF-net.

RMSF-net incorporates several data processing strategies for maps, as illustrated in Fig. 1b. First, to ensure a consistent spatial scale, all the cryo-EM maps were resampled using the ndimage.zoom module from SciPy25 to obtain a uniform voxel size of 1.5 Å, which is approximately the size of a carbon atom. Second, a screening algorithm is applied to retain only those boxes that encompass atoms within the central subbox, to avoid processing unnecessary boxes located in structure-free regions. This strategy significantly improves the efficiency of RMSF-net when dealing with large cryo-EM maps. The retained box indices are recorded for the subsequent merging algorithm. In addition, the voxel densities within the boxes are normalized to a range of [0, 1] before being input into the network, thus mitigating density distribution variations across cryo-EM maps. Voxel density normalization is achieved through the following process: within each box, any density values less than 0 are set to 0, and the densities are then divided by the maximum density value within the box, thus scaling the voxel density to a range from 0 to 1.
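The per-box normalization step admits a one-line expression; a minimal NumPy sketch is shown below (the epsilon guard for empty boxes is an addition for numerical safety, not part of the described procedure).

```python
import numpy as np

# Minimal sketch of the per-box density normalization described above:
# clip negative densities to zero, then scale by the box maximum so that
# values fall in [0, 1].
def normalize_box(box: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    box = np.clip(box, 0.0, None)
    return box / max(float(box.max()), eps)
```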

We created a high-quality protein dataset with 335 entries for training and evaluating RMSF-net. The dataset was constructed by selecting data from EMDB26 and PDB27. As of November 2022, EMDB contained over 23,593 deposited entries, with more than half being high-resolution maps with resolutions ranging from 2 to 4 Å. We focused on maps within this range. Initially, we included a high-resolution cryo-EM map dataset from EMNUSS12, which underwent rigorous screening in EMDB and PDB prior to October 2020 and consisted of 468 entries. In addition, we performed data selection on cryo-EM maps and PDB models deposited from October 2020 to November 2022 to incorporate newly deposited data. The selected data had to meet specific criteria, including a well-fitting cryo-EM map and PDB model with a fitness above 0.7 (measured by the correlation between the PDB simulated map and cryo-EM map); the proteins had to contain at least one alpha helix or beta-strand, with no missing chains or nucleic acids. We further filtered these data by applying a clustering procedure to remove redundancy. Using the K-Medoids28 algorithm with a k value of 50, we defined the distance between two proteins as the maximum distance between any two chains from each protein, where chain distances were determined by sequence identity. After clustering, we selected the 50 medoids and added them to the dataset. Finally, out of the remaining 518 entries, 335 were successfully subjected to MD simulations, resulting in the RMSF-net dataset.
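A hedged sketch of the redundancy-removal step is given below, assuming scikit-learn-extra's KMedoids with a precomputed protein-to-protein distance matrix; constructing that matrix (maximum pairwise chain distance, with chain distance derived from sequence identity) is left as a placeholder.

```python
import numpy as np
from sklearn_extra.cluster import KMedoids

# Hedged sketch: cluster proteins on a precomputed distance matrix and keep
# the 50 medoids, mirroring the redundancy-removal step described above.
# `protein_distance_matrix` is a hypothetical (n, n) array where entry (i, j)
# is the maximum distance between any chain of protein i and any chain of
# protein j, with chain distance derived from sequence identity.
def select_medoids(protein_distance_matrix: np.ndarray, k: int = 50):
    km = KMedoids(n_clusters=k, metric="precomputed", random_state=0)
    km.fit(protein_distance_matrix)
    return km.medoid_indices_   # indices of the k representative proteins
```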

RMSF-net employs a supervised training approach, requiring labeled RMSF values derived from MD simulations19. We conducted MD simulations on the PDB models of the dataset following a standardized procedure using Assisted Model Building with Energy Refinement (AMBER)20, which consists of four stages: energy minimization, heating, equilibration, and production runs. To focus on local structure fluctuations around specific protein conformations, we configured the production run for 30 nanoseconds. Specifically, the initial atomic coordinates of the proteins were set to the original PDB model coordinates. Small molecule ligands in all complexes were removed to purely study the characteristics of the proteins. Each system was immersed in a truncated octahedron box filled with TIP3P29 water molecules (with at least a 12 Å buffer distance between the solute and the edge of the periodic box). Based on the charge carried by the protein, Na⁺ or Cl⁻ ions were placed randomly in the simulation box to keep each system neutral. An additional 150 mM NaCl solution was added to all systems according to the screening layer tally by the container average potential method30 to better match the experimental conditions. All MD simulations were performed using the AMBER 20 software package20,31 on NVIDIA Tesla A100 graphics cards. The parameters for Na⁺ and Cl⁻ ions were derived from the previous work by Joung et al.32. The parameters used for the protein structure were from the AMBER ff14SB force field33. Each system was energy minimized using the conjugate gradient method for 6000 steps. Then, the systems were heated using the Langevin thermostat34 from 0 to 300 K over 400 ps, using position restraints with a force constant of 1000 kcal mol⁻¹ Å⁻² on the protein structure (NVT ensemble, T = 300 K). Subsequently, each system was gradually released over 5 ns (spending 1 ns each with position restraints of 1000, 100, 10, 1, and 0 kcal mol⁻¹ Å⁻²) using the NPT ensemble (P = 1 bar, T = 300 K) before a production run. Afterward, the final structure of each system was subjected to a 30 ns MD simulation at constant temperature (300 K) and pressure (1 bar) with periodic boundary conditions and the particle mesh Ewald (PME) method35. We used the isotropic Berendsen barostat36 with a time constant of 2 ps to control constant pressure. The protein structure was completely free in solution during the equilibration and production process. Simulations were run with an integration step of 2 fs, and bond lengths for hydrogen atoms were fixed using the SHAKE algorithm37. PME electrostatics were calculated with an Ewald radius of 10 Å, and the cutoff distance was also set to 10 Å for the van der Waals potential.

After simulation, the trajectories were processed and analyzed using the built-in Cpptraj module of AMBER Tools package38. We first removed the translational and rotational motion of all protein molecules to ensure a rigorous comparison between different trajectories. Then, the average structure of each protein (only heavy atoms) was calculated as a reference structure. Afterward, each conformation in the trajectory was aligned to the reference structure and RMSF of the protein molecule was output. These computed RMSF values were subsequently mapped onto voxels of cryo-EM maps to serve as the ground truth for training and evaluating RMSF-net.
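The RMSF computed in this step follows directly from the equation given earlier; a minimal NumPy sketch for a trajectory that has already been aligned to its average structure is shown below.

```python
import numpy as np

# Minimal NumPy sketch of the RMSF definition used above, for a trajectory
# that has already been aligned to the average structure (as done with Cpptraj).
# coords has shape (T, N, 3): T frames, N heavy atoms.
def rmsf(coords: np.ndarray) -> np.ndarray:
    mean_pos = coords.mean(axis=0)                    # average structure, (N, 3)
    sq_dev = ((coords - mean_pos) ** 2).sum(axis=-1)  # squared deviation per frame/atom
    return np.sqrt(sq_dev.mean(axis=0))               # RMSF per atom, shape (N,)
```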

For the training of RMSF-net, we utilized a masked mean squared error (MSE) loss function to compute the loss between the predicted RMSF and the ground truth RMSF on labeled voxels of the output subboxes. The training spanned 100 epochs with a batch size of 32, and we employed the Adam optimizer39 with a learning rate of 0.004. Several techniques were implemented to mitigate overfitting, including Kaiming weight initialization40, learning rate decay, and early stopping. If the validation loss did not decrease for 10 consecutive epochs, the learning rate was halved; if it did not decrease for 30 epochs, training was terminated and the model with the minimum validation loss was saved. We applied rotation and mirroring augmentation to the training set to account for the lack of rotational and mirror symmetry in convolutional networks, increasing the training data eightfold. The training of RMSF-net was conducted on two NVIDIA Tesla A100 graphics cards, typically lasting 5–8 h. Following training, we conducted RMSF predictions on the test set to evaluate the performance of RMSF-net.
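A hedged PyTorch sketch of this training loop, with the masked MSE loss, Adam at a learning rate of 0.004, learning-rate halving, and early stopping described above, is shown below; the model and data loaders are assumed to be defined elsewhere, and the scheduling details are simplifications of the text rather than the authors' exact code.

```python
import torch

# Hedged sketch: masked MSE on labeled voxels of the output sub-boxes,
# Adam with lr 0.004, lr halving after stagnant validation epochs, and
# early stopping after 30 stagnant epochs. `model`, `train_loader`, and
# `val_loader` are assumed to exist; batches yield (boxes, rmsf_gt, mask).
def masked_mse(pred, target, mask):
    diff = (pred - target) ** 2
    return (diff * mask).sum() / mask.sum().clamp(min=1)

def train(model, train_loader, val_loader, epochs=100, lr=4e-3, device="cuda"):
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_state, stagnant = float("inf"), None, 0
    for epoch in range(epochs):
        model.train()
        for boxes, rmsf_gt, mask in train_loader:
            boxes, rmsf_gt, mask = boxes.to(device), rmsf_gt.to(device), mask.to(device)
            opt.zero_grad()
            loss = masked_mse(model(boxes), rmsf_gt, mask)
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(masked_mse(model(b.to(device)), t.to(device), m.to(device)).item()
                      for b, t, m in val_loader) / len(val_loader)
        if val < best_val:
            best_val, best_state, stagnant = val, model.state_dict(), 0
        else:
            stagnant += 1
            if stagnant % 10 == 0:          # halve the learning rate every 10 stagnant epochs
                for g in opt.param_groups:
                    g["lr"] *= 0.5
            if stagnant >= 30:              # early stopping
                break
    model.load_state_dict(best_state)       # restore the best validation checkpoint
    return model
```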

We employed a five-fold cross-validation approach to assess the performance of this method. The dataset was randomly divided into five equal partitions, with one partition used as the test set each time, while the remaining four partitions served as the training and validation sets. In particular, the division was based on the maps rather than the segmented boxes in order to ensure independence between these sets. The training and testing process was repeated five times, and every data entry was tested once to obtain the method's performance on the entire dataset. To prevent overfitting during model training, the training and validation sets were split at a ratio of 3:1. During testing, the correlation coefficients between the predicted RMSF and the ground truth (RMSF values derived from MD simulations) were computed as the evaluation metric. The correlation coefficients were computed at two levels: the voxel level, corresponding to RMSF on the map voxels, and the residue level, corresponding to RMSF on the PDB model residues (obtained by averaging RMSF over the corresponding atoms). We defaulted to using the correlation coefficient at the voxel level when analyzing and comparing model performance unless otherwise specified. In addition, we employed the correlation coefficient at the residue level when discussing the protein test cases.

As a baseline, we initially used only the cryo-EM map intensity as a single-channel input to the neural network, referred to as RMSF-net_cryo. Cross-validation using RMSF-net_cryo on the dataset revealed an average correlation coefficient of 0.649 and a bias of 0.156. We also performed cross-validation using the prior DEFMap method for comparison. DEFMap reported a test correlation of approximately 0.7 on its dataset. However, its dataset includes only 34 proteins, and the dataset used in our study is more diverse and significantly larger. Therefore, we applied the DEFMap pipeline to our dataset to ensure a fair comparison. Notably, DEFMap employed different data preprocessing strategies and neural networks. During its preprocessing, a low-pass filter was adopted to standardize the resolution of the cryo-EM maps. In addition, the neural network it used took 10-cubic subvoxels as input and outputted the RMSF of the central voxel. We strictly followed DEFMap's procedures and network for training and testing. The results showed an average correlation coefficient of 0.6 and a bias of 0.171. Through comparison, it is evident that RMSF-net_cryo exhibits superior performance compared to DEFMap.

Although RMSF-net_cryo performed better than DEFMap with our designed network and data processing strategies, it still relies on neural networks to directly establish patterns between cryo-EM maps and flexibility. What role the structural information plays in this process remains unknown. This prompted us to divide dynamic prediction via cryo-EM maps into two sequential steps: first, structural information extraction, and second, dynamic prediction based on the extracted structural information.

To accomplish the extraction of structural information, as depicted in Fig. 2a, we introduced an Occ-net module. This module predicts the probabilities of structural occupancy on cryo-EM map voxels using a 3D convolutional network. Both the input and output dimensions were set to 40³. For training and evaluating Occ-net, we utilized PDB models to generate structure annotation maps as the ground truth, where voxels were categorized into two classes: occupied by structure and unoccupied by structure. The details of this network and the data annotation process are provided in the Supplementary Information (section Structure of Occ-net and the data annotation process). A cross-entropy loss function was employed during training, with class weights set to 0.05:0.95 to address class imbalance. Once this training stage was completed, the Occ-net parameters were fixed, and the second stage of training commenced. In the second stage, the two-channel classification probabilities output by Occ-net were input into the dynamics extraction module to predict the RMSF for the central 10³ voxels, which is consistent with the RMSF-net approach.
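A minimal PyTorch sketch of the class-weighted loss used for Occ-net training is shown below; which class receives the 0.05 and which the 0.95 weight is not stated explicitly in the text, so the ordering here is an assumption.

```python
import torch
import torch.nn as nn

# Minimal sketch of the class-weighted loss described above: cross-entropy
# over two classes (unoccupied vs. occupied by structure) with weights
# 0.05:0.95 to counter the heavy class imbalance. The assignment of the
# weights to class indices 0 and 1 is an assumption.
occ_loss = nn.CrossEntropyLoss(weight=torch.tensor([0.05, 0.95]))

# logits: (batch, 2, 40, 40, 40) raw class scores from Occ-net
# labels: (batch, 40, 40, 40) with 0 = unoccupied, 1 = occupied by structure
def occupancy_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    return occ_loss(logits, labels)
```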

a Overview of Occ2RMSF-net. In the first stage, the cryo-EM density (40³) is input into Occ-net to predict the probabilities (40³) of structure occupancy on the voxels, with Pu denoting the probabilities of voxels being occupied by the protein structure and Po denoting those of not being occupied by the structure. Then, in the second stage, the two-channel probabilities are input into RMSF-net to predict the RMSF on the center 10³ voxels. b Test performance of Occ-net. For six classification thresholds from 0.3 to 0.8, the precisions, recalls, and F1-scores of the positive class (structure occupied) on the test set were computed and are shown in the plot. c Comparison of RMSF prediction performance between Occ2RMSF-net and RMSF-net_cryo on the dataset. CC is an abbreviation for correlation coefficient. d Count distribution of test correlation coefficients for DEFMap, RMSF-net_cryo, and RMSF-net on the dataset. e Data distribution of correlation coefficients for RMSF-net_cryo and RMSF-net_pdb relative to RMSF-net on the dataset. f Count distribution of test correlation coefficients for RMSF-net_pdb, RMSF-net_pdb01, and RMSF-net on the dataset. g Data distribution of correlation coefficients for RMSF-net and RMSF-net_pdb relative to RMSF-net_cryo on data points where the test correlation coefficients with RMSF-net_cryo are above 0.4. The color for each method in d, e, f and g is shown in the legend.

This model is referred to as Occ2RMSF-net, and cross-validation was conducted on it. After training, we first assessed the performance of Occ-net by calculating the precision, recall, and F1-score at the voxel level for the positive class (structure class) on the test set. This evaluation involved six classification thresholds, ranging from 0.3 to 0.8. As depicted in Fig. 2b, achieving high precision and recall simultaneously was challenging due to severe class imbalance and noise. A relative balance was achieved at the threshold of 0.7, where the F1 score reached its highest value of 0.581. Regarding the final output RMSF, the correlation between the Occ2RMSF-net predictions and the ground truth on the dataset is 0.662 ± 0.158, showing a slight improvement over RMSF-net_cryo. Figure 2c displays the scatter plot of the test correlations for Occ2RMSF-net and RMSF-net_cryo. The two models exhibited similar performance on most of the data points, with Occ2RMSF-net slightly outperforming RMSF-net_cryo overall. This highlights the critical role of structural information from cryo-EM maps for predicting the RMSF in the network and enhances the interpretability of methods like DEFMap and RMSF-net_cryo.

Inspired by the above results, we incorporated PDB models, representing precise structural information, and integrated them into our method in a density map-like manner, i.e., as simulated density maps generated based on the PDB models. We employed two approaches to input the PDB simulated maps into the network. First, the PDB simulated map was taken as a single-channel feature input to the neural network, referred to as RMSF-net_pdb. Second, the PDB simulated map was transformed into a binary encoding map representing occupancy of the structure to highlight tertiary structural features: a threshold of 3σ (where σ represents the r.m.s.d. of the PDB simulated map density) was chosen, and voxels with densities above the threshold were encoded as 1 and the others as 0. This encoding map was then converted into a two-channel one-hot input to the network, known as RMSF-net_pdb01. The same cross-validation was applied to these two models. The results showed that RMSF-net_pdb achieved a test correlation coefficient of 0.723 ± 0.117, and RMSF-net_pdb01 achieved 0.712 ± 0.112. These two approaches demonstrated significantly better performance than the cryo-EM map-based methods above, further demonstrating the strong correlation between protein structure topology and flexibility.
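A minimal NumPy sketch of the RMSF-net_pdb01 encoding is shown below; whether the threshold statistic is the root mean square of the density or its standard deviation is taken here as the former, which is an assumption.

```python
import numpy as np

# Minimal sketch of the RMSF-net_pdb01 input encoding described above:
# binarize the PDB simulated map at a 3-sigma density threshold, then expand
# the binary mask into a two-channel one-hot volume.
def pdb01_encoding(sim_map: np.ndarray) -> np.ndarray:
    sigma = np.sqrt(np.mean(sim_map ** 2))       # r.m.s. of the simulated density (assumed statistic)
    occupied = (sim_map > 3.0 * sigma).astype(np.float32)
    return np.stack([1.0 - occupied, occupied])  # (2, nx, ny, nz) one-hot channels
```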

We further combined the information from the PDB structure and cryo-EM map and input them into the network, which is the main method proposed in this work. We refer to this method, along with the neural network it employs, simply as RMSF-net. As outlined in the Overview of RMSF-net procedure section, RMSF-net takes the dual-channel feature of density from the cryo-EM map and PDB simulated map at the same spatial position as input, while the main part of the network remains the same as RMSF-net_cryo and RMSF-net_pdb. Conducting the same cross-validation on RMSF-net, the results revealed an average correlation coefficient of 0.746, with a median of 0.767 and a standard deviation of 0.127.

RMSF-net demonstrated an approximately 10% improvement over the RMSF-net_cryo baseline, and a 15% enhancement over DEFMap. Figure 2d presents a comparison of the distribution of data quantities at different test correlation levels for the three methods. Overall, the two cryo-EM map-based methods (DEFMap and RMSF-net_cryo) exhibit similar distribution shapes, while the distribution of RMSF-net is more concentrated, focusing on the range between 0.6 and 0.9. Nearly half of the data points cluster around 0.7 to 0.8, and close to one-third fall between 0.8 and 0.9. In comparison, the two PDB-based methods in Fig. 2f exhibit distributions similar to RMSF-net. This suggests that the structural information from PDB models plays a primary role in the ability of RMSF-net to predict flexibility. On the other hand, RMSF-net further outperforms the PDB-based methods through its combination with information from cryo-EM maps, indicating that image features related to structural flexibility in the cryo-EM map are an effective auxiliary. Regarding the robustness of these approaches, Table 1 demonstrates that RMSF-net_pdb and RMSF-net_pdb01 exhibited less deviation on the test set compared to RMSF-net, while RMSF-net_cryo displayed the highest deviation. This indicates that the flexibility-related information in cryo-EM maps is unstable compared to that in PDB models, which might be caused by noise and alignment errors in cryo-EM maps.

The experimental results above prove that the combination of the cryo-EM map and the PDB model results in the superior performance of RMSF-net. As shown in Fig. 2e, the prediction of RMSF-net is better in most cases compared with models utilizing only the cryo-EM map or only the PDB model. Because the PDB models are built from the corresponding cryo-EM maps, their spatial coordinates are naturally aligned, and their structural information is consistent. Moreover, the PDB model built from the cryo-EM map corresponds precisely to the average position of the structure, while the cryo-EM map reconstructed from multiple particles in the sample corresponds to the information of multiple instantaneous conformations. By combining the expectation and conformational variance from the two sources, we believe that this structural consistency and complementarity create an alignment effect and promote the superior performance of RMSF-net. However, structural deviations may exist between the PDB model and the cryo-EM map in some instances, or the PDB model may only partially occupy the cryo-EM map. These anomalies might lead to the subpar performance of RMSF-net_cryo compared to RMSF-net and RMSF-net_pdb. To exclude the influence of these factors, we performed dataset filtering by excluding data points with test correlations below 0.4 for RMSF-net_cryo and compared the three models on the filtered dataset. Figure 2g shows that RMSF-net and RMSF-net_pdb still demonstrated better performance overall compared to RMSF-net_cryo on the filtered dataset. The test correlations for the three models were 0.760 ± 0.084, 0.733 ± 0.083, and 0.684 ± 0.1, respectively. When the filtering threshold was increased to 0.5, the correlations for the three models were 0.761 ± 0.08, 0.734 ± 0.08, and 0.698 ± 0.084, respectively, showing consistent results.

Figure 3a showcases RMSF-net predictions for three relatively small proteins: the bluetongue virus membrane-penetration protein VP5 (EMD-6240/PDB 3J9E)41, African swine fever virus major capsid protein p72 (EMD-0776/PDB 6KU9)42 and C-terminal truncated human Pannexin1 (EMD-0975/PDB 6LTN)43. Among these, 3J9E displays an irregular shape composed of loops and alpha helices, while 6KU9 and 6LTN exhibit good structural symmetry with beta sheets and alpha helices, respectively. The predictions by RMSF-net exhibit strong agreement with the ground truth for these proteins, yielding correlation coefficients of 0.887, 0.731, and 0.757, respectively, as depicted in Fig. 3b. Predictions by RMSF-net_cryo and RMSF-net_pdb are supplied in Supplementary Figs. S1–S3. On 3J9E and 6KU9, both RMSF-net_cryo and RMSF-net_pdb perform well, achieving correlations of 0.82, 0.69, and 0.881, 0.7, respectively. However, on 6LTN, RMSF-net_cryo only exhibits a correlation of 0.3 with the ground truth, possibly due to ring noise in the intermediate region of EMD-0975, leading to model misjudgment. In contrast, RMSF-net_pdb achieves a higher correlation of 0.767 on this protein, even surpassing RMSF-net, suggesting that instability factors in cryo-EM maps have a slight impact on RMSF-net's inference.

a, b Show RMSF-net performance on three small proteins (EMD-6240/3J9E, EMD-0776/6KU9, EMD-0975/6LTN). a Visual comparison of RMSF-net predictions and ground truth. The first column shows the cryo-EM map overlaid on the PDB model. The second and third columns depict the PDB model colored according to the normalized RMSF values, using colors indicated by the color bar on the right. The second column represents the RMSF predictions by RMSF-net, and the third column represents the ground truth RMSF values from MD simulations. b Correlation plots between normalized RMSF-net predicted values and normalized ground truth values at residue levels for each protein. The normalized RMSF for residues is calculated as the average normalized RMSF of all atoms within that residue. Normalization is achieved by subtracting the mean and dividing by the standard deviation of RMSF in the PDB model. CC is an abbreviation for correlation coefficient. c RMSF-net performance on large protein complexes (PDB entry 6FBV, 6KU9, and 6LTN). The first and second rows display the PDB models of three protein complexes, with colors corresponding to the normalized RMSF values, indicated by the color bar on the right. The first row is colored based on the RMSF predictions by RMSF-net. The second row is colored according to the ground truth RMSF values from MD simulations. The third row demonstrates the profiles of normalized RMSF-net predicted values and normalized ground truth values along residue sequences for three proteins, where residue IDs correspond to the sequence order in the PDB models. CC is an abbreviation for correlation coefficient. d The dynamic change of NTCP protein from the inward-facing conformation to the open-pore conformation (PDB entry 7PQG/7PQQ). From left to right, the first cartoon illustrates the conformational transition of 7PQG to 7PQQ. The second, third, and fourth cartoons depict the dynamic changes in the conformational transition of this protein from the front, back, and top-down perspectives, respectively, where the RMSF difference is calculated and colored on the 7PQQ using the color bar provided in the upper right corner. The RMSF visualization was generated using PyMOL59 and UCSF ChimeraX60.

In addition to small proteins, Fig. 3c presents test examples of large protein complexes, including Mycobacterium tuberculosis RNA polymerase with Fidaxomicin44 (EMD-4230/PDB 6FBV), the RSC complex45 (EMD-9905/PDB 6K15), and the coronavirus spike glycoprotein trimer46 (EMD-6516/PDB 3JCL). RMSF-net also excels on these complex structures, achieving correlation coefficients of 0.902, 0.819, and 0.804, respectively. Remarkably, these proteins are associated with human diseases and drug development, emphasizing the potential value of RMSF-net in facilitating drug development efforts. Predictions by RMSF-net_cryo and RMSF-net_pdb for these proteins are provided in Supplementary Figs. S4–S6, with correlation coefficients of 0.759 and 0.859 for 6FBV, 0.661 and 0.774 for 6K15, and 0.635 and 0.784 for 3JCL, respectively. Comparing the model predictions in the Supplementary Figs. shows that RMSF-net aligns more closely with RMSF-net_pdb, supporting the previous argument that information from the PDB model plays a primary role in RMSF-net's feature processing.

We further applied RMSF-net to investigate dynamic changes in the NTCP protein during its involvement in biochemical processes. NTCP (Na+/taurocholate co-transporting polypeptide)47 is a vital membrane transport protein predominantly located on the cell membrane of liver cells in the human body. It is responsible for transporting bile acids from the bloodstream into liver cells and plays a crucial role in the invasion and infection processes of liver viruses such as the hepatitis B virus. Therefore, structural and functional analysis of NTCP is crucial for liver disease treatment. In a previous study, two NTCP conformations, the inward-facing conformation (EMD-13593/PDB 7PQG) and the open-pore conformation (EMD-13596/PDB 7PQQ), were resolved using cryo-electron microscopy. We performed dynamic predictions using RMSF-net on these two conformations and revealed dynamic changes during the transition from the inward-facing to the open-pore state, as shown in Fig. 3d. Compared to the inward-facing state, the open-pore conformation displayed increased dynamics in the TM1 and TM6 regions of the panel domain, the TM7 and TM9 regions of the core domain, and the X motif in the center. Other regions maintained stability or exhibited enhanced stability. We hypothesize that the increased flexibility in these regions is associated with the relative motion between the panel and core domains in the open-pore state, facilitating the transport of bile acids and the binding of preS1 of HBV in this conformation.

Despite its high performance, RMSF-net is trained and tested on relatively short (30 ns) MD simulations. To determine whether the structural fluctuation patterns obtained from a 30 ns simulation are stable enough for model training, we performed longer simulations on three proteins [46,48,49] (PDB 3JCL, 6QNT, and 6SXA). The detailed setup and results are provided in the Supplementary Information (section "MD simulations over longer time periods"). In these cases, the results of the original 30 ns simulations showed strong correlations with the results of simulations extended up to 500 ns, indicating that a 30 ns simulation effectively captures the stable structural fluctuation modes and is therefore a sound foundation for our model training. The RMSF-net predictions also maintain high correlations with the long-term MD simulations, showing that the trained model has effectively absorbed the structural fluctuation patterns present in the MD simulations.
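
As an illustration of the comparison described above, the sketch below computes C-alpha RMSF profiles over a short window and over a full trajectory and correlates them. It uses the MDAnalysis library purely as an assumed example toolkit (the study's own simulation and analysis pipeline may differ), and the file names and frame spacing are hypothetical:

```python
import numpy as np
import MDAnalysis as mda
from MDAnalysis.analysis import align, rms

# Hypothetical input files; any topology/trajectory pair readable by MDAnalysis works.
u = mda.Universe("protein.pdb", "traj_500ns.xtc")
ref = mda.Universe("protein.pdb")

# Remove global rotation/translation before measuring per-atom fluctuations.
align.AlignTraj(u, ref, select="protein and name CA", in_memory=True).run()

calphas = u.select_atoms("protein and name CA")

# RMSF over the frames covering ~30 ns (assumption: one saved frame per 0.1 ns) ...
n_frames_30ns = 300
rmsf_short = rms.RMSF(calphas).run(stop=n_frames_30ns).results.rmsf

# ... and over the full, longer trajectory.
rmsf_long = rms.RMSF(calphas).run().results.rmsf

# If the fluctuation pattern is stable, the two profiles should correlate strongly.
print("Pearson r:", np.corrcoef(rmsf_short, rmsf_long)[0, 1])
```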

In addition, to make large-scale simulations feasible, we removed small molecules and ionic ligands during the MD simulations, although ligands are included in the input density of RMSF-net. The simulation results may therefore be inaccurate regarding the flexibility of ligand-containing proteins, especially near the ligands. To assess the impact of this treatment, we performed MD simulations for two protein systems containing ligands: the cargo-bound AP-1:Arf1:tetherin-Nef closed trimer monomeric subunit [50] (EMD-7537/PDB 6CM9) and the spastin hexamer in complex with substrate [51] (EMD-20226/PDB 6P07). Configurations of the MD simulations are provided in the Supplementary Information (section "MD simulation configurations for ligand-binding proteins and membrane proteins"). The simulations with and without ligands exhibited high RMSF correlations of 0.748 and 0.859 for the two proteins, respectively (Table 2). The RMSF-net predictions maintain comparable correlations with the additional simulations, of 0.757 and 0.859, respectively. This indicates that, overall, small-molecule ligands have little impact on protein structural flexibility. However, near the ligands the RMSF obtained from simulations without ligands is indeed higher than that obtained from simulations with ligands, as shown in Fig. 4. In these regions, the RMSF-net predictions are even closer to the simulations with ligands, i.e., the predicted RMSF is lower. Our interpretation is that, on the one hand, the ligands are small relative to the protein structure, so their influence is local and does not greatly affect the global distribution of structural flexibility; on the other hand, the local structures near the ligands relax when the ligands are absent, as in the original simulations. The deep model uses the learned relationship between internal protein structure and flexibility to infer the dynamics of the ligand-bound structure. Although this is only an approximation, it provides some corrective effect.
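
A minimal sketch of the kind of near-ligand comparison shown in Fig. 4, assuming precomputed coordinates and per-atom RMSF arrays (all values below are synthetic placeholders, and the 5 Å cutoff follows the figure caption):

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical arrays: protein atom coordinates, their residue IDs, ligand atom
# coordinates, and per-atom RMSF from the three approaches compared in Fig. 4.
protein_xyz = np.random.rand(5000, 3) * 60.0
residue_ids = np.repeat(np.arange(625), 8)
ligand_xyz = np.random.rand(40, 3) * 60.0
rmsf_with_ligand = np.random.rand(5000)
rmsf_without_ligand = rmsf_with_ligand + 0.2 * np.random.rand(5000)
rmsf_net_pred = rmsf_with_ligand + 0.05 * np.random.randn(5000)

# Atoms within 5 A of any ligand atom, expanded to whole residues.
dist_to_ligand, _ = cKDTree(ligand_xyz).query(protein_xyz)
near_residues = np.unique(residue_ids[dist_to_ligand < 5.0])
near_mask = np.isin(residue_ids, near_residues)

for name, rmsf in [("MD with ligands", rmsf_with_ligand),
                   ("MD without ligands", rmsf_without_ligand),
                   ("RMSF-net", rmsf_net_pred)]:
    print(f"{name}: mean RMSF near ligands = {rmsf[near_mask].mean():.3f}")
```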

a Results for 6CM9. The top panel shows scatter plots of RMSF on residues, with colors corresponding to the three approaches as indicated in the legend. The middle panels present the visualizations of RMSF from the three approaches on the PDB structure, with colors corresponding to the normalized RMSF values, indicated by the color bar on the right. Ligands (GTP) are shown as yellow sticks, residues within 5 Å of the ligands are shown as surfaces, and black boxes indicate their positions. The bottom panels display the RMSF of structures near the ligands separately, highlighting regions within the black boxes in the middle panels. b Results for 6P07. The top panel shows scatter plots of RMSF on residues, with colors corresponding to the three approaches as indicated in the legend. The middle panels present the visualizations of RMSF from the three approaches on the PDB structure, with colors corresponding to the normalized RMSF values, indicated by the color bar on the right. Ligands (ADP, ATP) are shown as yellow sticks, and residues within 5 Å of the ligands are shown as surfaces. The bottom panels display the RMSF of structures near the ligands separately.

Another aspect of MD simulation is the treatment of membrane proteins. In cryo-EM sample preparation, membrane proteins are purified and separated from membrane structures (Van Heel et al., 2000), which means that the structure and dynamics of membrane proteins in cryo-EM reflect their free state in solution. Correspondingly, our dynamic simulations were also performed in the membrane-free state, so our model applies to proteins in their free state in solution. In vivo, however, membrane proteins are embedded in the cell membrane, so including the membrane in the simulation environment will more accurately reproduce their dynamics in biological systems. To explore the differences introduced by the membrane, we conducted MD simulations in a membrane environment for two membrane proteins, the cryo-EM structure of TRPV5 (1-660) in nanodisc [52] (EMD-0593/PDB 6O1N) and the cryo-EM structure of the MscS channel YnaI [53] (EMD-6805/PDB 5Y4O). The configurations of the MD simulations are provided in the Supplementary Information (section "MD simulation configurations for ligand-binding proteins and membrane proteins"). The results, shown in Table 3 and Fig. 5, demonstrate that the RMSF obtained from MD simulations with and without membranes remains consistent overall, with correlations of 0.767 and 0.678 on 6O1N and 5Y4O, respectively. The correlations between the RMSF-net predictions and the MD simulations with membranes are 0.804 and 0.675, respectively, for these two proteins. As shown in Fig. 5b, the presence of the membrane leads to some changes in the flexibility of 5Y4O: in its upper region, the RMSF obtained from MD simulations with the membrane is lower than that from the MD simulations without the membrane and from RMSF-net. We speculate that this region is constrained by the membrane, resulting in decreased flexibility, while the overall flexibility distribution remains largely unchanged. In addition, we observe that on these two highly symmetric structures, the RMSF-net predictions maintain a symmetry similar to that of the MD simulations.

a Results for 6O1N. b Results for 5Y4O. The top panels show scatter plots of RMSF on residues, with colors corresponding to the three approaches indicated in the legend. The bottom panels present the visualizations of RMSF from the three approaches on the PDB structure, with colors corresponding to the normalized RMSF values, indicated by the color bar in the middle.

The experimental results also indicate that our method exhibits consistent performance across cryo-EM maps of varying resolution. The resolution of a cryo-EM map signifies the minimum scale at which credible structures are discernible within the map. In our dataset, there are more maps in the 3-4 Å resolution range than in the 2-3 Å range, as shown in Fig. 6a. Because our method takes cryo-EM maps of various resolutions into network training, concerns arise regarding potential model bias towards specific map resolutions. To address this concern, we analyzed the test performance of RMSF-net_cryo, Occ2RMSF-net, and RMSF-net compared to RMSF-net_pdb on maps of different resolutions. The results demonstrate that these models exhibit no significant performance differences across resolution ranges, as shown in Fig. 6b-d. Only a minor deviation is observed in the 2-2.5 Å range, which is statistically insignificant given the limited number of data points (N = 7). This underscores that the neural networks can fit the data indiscriminately within the high-resolution range of 2-4 Å, without the need to process the maps to a uniform resolution during preprocessing. The similar distributions of RMSF-net and Occ2RMSF-net across resolutions, shown in Fig. 6b, e, further support the conclusion that dynamic inference from cryo-EM maps relies on an intermediate process of structural resolution. Furthermore, Fig. 6d demonstrates that, on average, RMSF-net outperforms RMSF-net_pdb across the resolution ranges, indicating that cryo-EM maps provide an auxiliary benefit over PDB models alone for dynamic analysis at all resolutions considered.

a Resolution distribution of cryo-EM maps in the dataset. b-d Performance of the RMSF prediction methods on maps of different resolutions in the dataset. For the resolution groups 2-2.5 Å, 2.5-3 Å, 3-3.5 Å, and 3.5-4 Å, the sample sizes are N = 7, 42, 120, and 166, respectively. b Distribution of correlation coefficients (CC) of RMSF-net_cryo on maps across the four resolution ranges. c Distribution of correlation coefficients of Occ2RMSF-net across the four resolution ranges. d Distribution of the correlation difference between RMSF-net and RMSF-net_pdb across the four resolution ranges. In b-d, the center white line and the lower and upper bounds of the box in each violin plot indicate the median, the first quartile (Q1), and the third quartile (Q3), respectively. The whiskers of the boxes indicate Q1 - 1.5*IQR and Q3 + 1.5*IQR, where IQR is the interquartile range. The bounds of the violin plots show the minima and maxima, and the width indicates the density of the data. e, f Relationship between RMSF-net run time and data size. e RMSF-net run time on CPUs versus the weighted sum of map size and PDB size across the data points of the dataset. The map size and PDB size are weighted 0.0015:0.9985 based on linear regression, both measured in k voxels. The plotted range of the weighted sum is capped at 1000, which encompasses the majority of the data; the full range is presented in Supplementary Fig. S12a. f RMSF-net run time on GPUs versus map size across the data points of the dataset. The plotted map size is capped at 300³ voxels, which encompasses the majority of the data; the full range is presented in Supplementary Fig. S12b.
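
For reference, the box statistics quoted in the caption (median, Q1, Q3, and the Q1 - 1.5*IQR / Q3 + 1.5*IQR whiskers) can be computed as in this small sketch, here applied to a hypothetical set of correlation coefficients for one resolution group:

```python
import numpy as np

def violin_box_stats(cc_values):
    """Box statistics used inside each violin: median, Q1, Q3, and the
    Q1 - 1.5*IQR / Q3 + 1.5*IQR whisker bounds."""
    q1, median, q3 = np.percentile(cc_values, [25, 50, 75])
    iqr = q3 - q1
    return {"median": median, "Q1": q1, "Q3": q3,
            "lower_whisker": q1 - 1.5 * iqr,
            "upper_whisker": q3 + 1.5 * iqr}

# Hypothetical correlation coefficients for one resolution group (e.g. 3-3.5 Å, N=120).
cc_group = np.random.uniform(0.5, 0.9, size=120)
print(violin_box_stats(cc_group))
```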

In addition to its superior performance, RMSF-net demonstrates rapid inference capabilities and minimal storage overhead, whether running on high-performance GPUs or CPUs. Using a computer equipped with 10 CPU cores and 2 NVIDIA Tesla A100 GPUs, we conducted runtime assessments on the dataset, revealing a strong linear relationship between the execution time of RMSF-net and the data size. Moreover, compared to conventional MD simulations and DEFMap, this approach achieves substantial acceleration in processing speed.

As shown in Fig. 6e and Supplementary Figs. S11c and S12a, when executed on CPUs, RMSF-net's runtime is directly proportional to the weighted sum of the cryo-EM map size and the PDB model size, with a map-to-PDB weight ratio of 0.0015:0.9985, both measured in k voxels. For most data points, this weighted sum is within 500 k voxels, and processing completes in under a minute. When executed on GPUs, most of the time is spent on preprocessing, so the total time scales linearly with the map size, as shown in Fig. 6f and Supplementary Figs. S11a and S12b. For most maps, with sizes below 300³ voxels, the computation completes within 30 s. Detailed information on the RMSF-net processing time is provided in the Supplementary Information (section "Details of the RMSF-net processing time").
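
The weighted-size regression described above can be illustrated with the following sketch, which fits runtime against map size and PDB size by least squares and normalizes the coefficients into a weight ratio analogous to the reported 0.0015:0.9985 (the data below are synthetic placeholders, not the paper's measurements):

```python
import numpy as np

# Hypothetical measurements: map size and PDB model size (both in k voxels)
# together with the measured CPU run time for each data point.
map_size_k = np.random.uniform(100, 30000, size=200)
pdb_size_k = np.random.uniform(10, 800, size=200)
runtime_s = 0.0004 * map_size_k + 0.26 * pdb_size_k + 5 + np.random.randn(200)

# Fit runtime = a*map_size + b*pdb_size + c by ordinary least squares.
X = np.column_stack([map_size_k, pdb_size_k, np.ones_like(map_size_k)])
(a, b, c), *_ = np.linalg.lstsq(X, runtime_s, rcond=None)

# Normalizing a and b gives a weight ratio analogous to the reported 0.0015:0.9985.
w_map, w_pdb = a / (a + b), b / (a + b)
print(f"weights map:pdb = {w_map:.4f}:{w_pdb:.4f}, intercept = {c:.2f} s")
```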

For comparative analysis, we selected ten relatively small maps from our dataset and performed runtime assessments using RMSF-net and DEFMap. As presented in Supplementary Tables S1-S3, across these ten data points DEFMap exhibited processing times of 45.94 ± 31.84 minutes on CPUs and 37.51 ± 10.51 minutes on GPUs, while generating data files of 11.98 ± 8.30 GB. In contrast, RMSF-net showed remarkable efficiency, with runtimes of 16.66 ± 9.60 s and 3.09 ± 1.45 s on CPUs and GPUs, respectively, and data files of only 66.30 ± 31.06 MB. In both storage footprint and time consumption, RMSF-net demonstrates significant improvements over DEFMap. Furthermore, in contrast to extended MD simulations, which often require hours or even days to perform a 30 ns simulation of an individual protein, RMSF-net delivers predictions with an average correlation of up to 0.75 while saving substantial time and resources, making it an ultra-fast means of protein dynamics analysis.

Read this article:
Accurate Prediction of Protein Structural Flexibility by Deep Learning Integrating Intricate Atomic Structures and Cryo ... - Nature.com

Empowering Manufacturing Innovation: How AI & GenAI Centers of Excellence can drive Modernization | Amazon Web … – AWS Blog

Introduction

Technologies such as machine learning (ML), artificial intelligence (AI), and Generative AI (GenAI) unlock a new era of efficient and sustainable manufacturing while empowering the workforce. Areas where AI can be applied in manufacturing include predictive maintenance, defect detection, supply chain visibility, demand forecasting, product design, and many more. Benefits include improving uptime and safety, reducing waste and costs, improving operational efficiency, enhancing products and customer experience, and faster time to market. Many manufacturers have started adopting AI. Georgia-Pacific uses computer vision to reduce paper tears, improving quality and increasing profits by millions of dollars. Baxter was able to prevent 500 hours of downtime in just one facility with AI-powered predictive maintenance.

However, many companies struggle (per a recent World Economic Forum study) to fully leverage AI due to weak foundations in organization and technology. Reasons include lack of skills, resistance to change, lack of quality data, and challenges in technology integration. AI projects often get stuck at the pilot stage and do not scale to production use. Successfully leveraging AI and GenAI technologies requires a holistic approach across cultural and organizational aspects, in addition to technical expertise. This blog explores how an AI Center of Excellence (AI CoE) provides a comprehensive approach to accelerating modernization through AI and GenAI adoption.

The manufacturing industry faces unique challenges for AI adoption as it requires merging the traditional physical world (Operational Technology, or OT) and the digital world (Information Technology, or IT). Challenges include cultural norms, organizational structures, and technical constraints.

Factory personnel deal with mission-critical OT systems. They prioritize uptime and safety and perceive change as risky. Historically, cybersecurity was not a high priority because these systems were isolated from the open internet. Traditional factory operators rely on the experience gained through years of making operational decisions, so understanding how AI systems arrive at their decisions is crucial for gaining their trust and overcoming adoption barriers. Factory teams are siloed, autonomous, and operate under local leadership, making AI adoption challenging. The initial investment in AI systems and infrastructure can be substantial, depending on the approach, and many manufacturers may struggle to justify the expense.

AI relies on vast amounts of high-quality data, which may be fragmented, outdated, or inaccessible in many manufacturing environments. Legacy systems in manufacturing often run on vendor-dependent proprietary software that uses non-standard protocols and data formats, posing integration challenges for AI. Limited internet connectivity in remote locations adds latency challenges, since manufacturing systems rely on accurate and reliable real-time responses. For example, an AI system needs to process sensor data and camera images in real time to identify defects as products move down the line; a slight delay in detection could let defective products pass through quality control. Additionally, manufacturing AI systems need to meet stringent regulatory requirements and industry standards, adding complexity to AI development and deployment processes. The field of AI is still evolving, and there is a lack of standardization in tools, frameworks, and methodologies.

Transformative AI adoption requires commitment and alignment from both OT and IT senior leadership. OT leaders benefit by recognizing that a connected, smart industrial operation simplifies work without compromising uptime, safety, security, or reliability. Likewise, IT leaders demonstrate business value through AI technologies when they understand the uniqueness of shop-floor requirements. In fact, OT can be viewed as a business function enabled by IT. Integrating OT and IT perspectives is crucial for realizing AI's business value, such as revenue growth, new products, and improved productivity. Leadership must craft a clear vision linking AI to strategic goals and foster a collaborative culture to drive functional and cultural change.

While vision provides the "why" behind AI adoption, successful AI adoption requires that vision to be translated into action. The AI CoE bridges the gap between vision and action.

Overview: The AI CoE is a multi-disciplinary team of passionate AI and manufacturing subject matter experts (SMEs) who drive responsible AI adoption. They foster human-centric AI, standardize best practices, provide expertise, upskill the workforce, and ensure governance. They develop a modernization roadmap focused on edge computing and modern data platforms. The AI CoE can start small, with 2-4 members, and scale as needed. For the AI CoE to be successful, it requires executive sponsorship and the ability to act autonomously. Figure 1 outlines the core capabilities of the AI CoE.

Figure 1 AI CoE capabilities

The AI CoE should champion explainable AI in manufacturing, where safety and uptime are critical. For example, when an AI model predicts an equipment malfunction, a binary output such as "failure likely" or "failure unlikely" won't earn trust with factory personnel. An output such as "Failure likely due to a 15% increase in vibration detected in the bearing sensor, similar to historical bearing failure patterns" makes people far more likely to trust the AI's advice. AWS provides multiple ways to enhance AI model explainability.
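
As one hypothetical way to produce that richer kind of output (not necessarily the AWS tooling the post refers to), per-prediction feature attribution with the open-source shap package can identify which sensor drove a high failure-risk score; everything in the sketch below, including the sensor names and toy data, is illustrative:

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical bearing sensor features and a toy "failure risk" target.
feature_names = ["vibration", "temperature", "rpm"]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
risk = 0.7 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=0.05, size=500)

model = GradientBoostingRegressor(random_state=0).fit(X, risk)

# Per-prediction attribution: which sensor drove this high-risk score?
explainer = shap.TreeExplainer(model)
reading = np.array([[1.15, 0.1, -0.3]])          # elevated vibration reading
contrib = explainer.shap_values(reading)[0]      # one attribution per feature

for name, value in sorted(zip(feature_names, contrib), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
# A human-readable alert can then cite the dominant feature, e.g.
# "Failure risk elevated mainly due to increased vibration."
```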

The AI CoE should partner with HR and leadership to upskill staff for the AI-powered workplace by developing career paths and training programs that leverage existing skills. GenAI solutions can help close the skills gap by showcasing how AI complements worker expertise. Leaders should emphasize how AI-enabled capabilities can free up time for complex problem-solving and interpreting AI insights. For example, Hitachi, Ericsson, and AWS demonstrated a computer vision system, running over a private 5G wireless network, that could inspect 24 times more components simultaneously than manual inspection to detect defects.

The AI CoE ensures collaboration and joint decision rights between AI solution builders and factory domain experts. Together, they work backwards from business goals, breaking down silos and converging on AI solutions to achieve desired results. Additionally, the CoE acts as a hub to pinpoint impactful AI use cases, evaluating factors such as data availability, quick success potential, and business value. For example, in a textile factory, the AI CoE can leverage data analysis to optimize energy-intensive processes, delivering cost savings and sustainability benefits. Explore additional use cases with the AWS AI Use Case explorer.

Governance and data platforms are critical for scaling manufacturing AI. The CoE establishes policies, standards, and processes for responsible, secure, and ethical AI use, including data governance and model lifecycle management. AWS offers several tools to build and deploy AI solutions responsibly. The CoE develops a secure data platform to connect diverse sources, enable real-time analysis and scalable AI, and achieve regulatory compliance. This data foundation lays the groundwork for broader AI adoption, as demonstrated by Merck's manufacturing data and analytics platform on AWS, which tripled performance and reduced costs by 50%.

The AI CoE evaluates and standardizes AI and GenAI technologies, tools, and vendors based on manufacturing needs, requirements, and best practices. AWS offers a comprehensive set of AI and GenAI services to build, deploy, and manage solutions that reinvent customer experiences. Scaling AI requires automation: the AI CoE designs automated data and deployment pipelines that reduce manual work and errors, accelerating time to market. Toyota exemplifies AI deployment at scale by using AWS services to process data from millions of vehicles, enabling real-time responses in emergencies.

The value of the AI CoE should be measured in business terms. This requires a holistic approach that is a mix of both hard and soft metrics. Metrics should include business outcomes such as ROI, improved customer experience, efficiency, and productivity gains from manufacturing operations. Internal surveys can gauge employee and stakeholder sentiment towards AI. These metrics help stakeholders understand the value of the AI CoE and investments.

Figure 2 Steps for building AI CoE foundations

Setting up an AI CoE requires a phased approach, as illustrated in Figure 2. The first step is to secure executive support from both OT and IT leadership. The next step is to assemble a diverse team of experts consisting of shop floor personnel and AI IT experts. The team is trained in AI and defines the objectives of the CoE. They identify and deliver pilot use cases to demonstrate value. In parallel, they develop and enhance governance frameworks, provide training, foster collaboration, gather feedback, and iterate for continuous improvement. Integrating GenAI can further enhance the CoE's content creation and problem-solving abilities, accelerating AI adoption across the enterprise. An AI CoE evolves over time: initially it takes a hands-on role, building expertise, setting standards, and launching pilot projects; over time it transitions to an advisory role, providing training, facilitating collaboration, and tracking success metrics. This empowers the workforce and ensures long-term AI adoption.

AI and GenAI technologies have the potential to create radical, new product designs, drive unprecedented levels of manufacturing productivity, and optimize supply chain applications. Adopting these technologies requires a holistic approach that addresses technical, organizational, and cultural challenges. The AI CoE acts as a catalyst by bridging the gap between business needs and responsible AI solutions. It fosters collaboration, training, and data solutions to optimize efficiency, cut costs, and spur innovation on the factory floor.

Artificial Intelligence and Machine Learning for Industrial

AWS Industrial Data Platform (IDP)

AWS Cloud Adoption Framework for Artificial Intelligence, Machine Learning, and Generative AI

The organization of the future: Enabled by gen AI, driven by people

Deloitte: 2024 manufacturing industry outlook

World Economic Forum: Mastering AI quality for successful adoption of AI in manufacturing

Harnessing the AI Revolution in Industrial Operations: A Guidebook

Managing Organizational Transformation for Successful OT/IT Convergence

The Future of Industrial AI in Manufacturing

Digital Manufacturing escaping pilot purgatory

Nurani Parasuraman is part of the Customer Solutions team in AWS. He is passionate about helping enterprises succeed and realize significant benefits from cloud adoption by driving basic migration to large-scale cloud transformation across people, processes, and technology. Prior to joining AWS, he held multiple senior leadership positions and led technology delivery and transformation in financial services, retail, telecommunications, media, and manufacturing. He has an MBA in Finance and a BS in Mechanical Engineering.

Saurabh Sharma is a Technical and Strategic Sr. Customer Solutions Manager (CSM) at AWS. He is part of the account team that supports enterprise customers in their cloud transformation journey. In this role, Saurabh works with customers to drive cloud strategy and adoption, provides thought leadership on moving and modernizing their workloads to help them move fast to the cloud, and drives a culture of innovation.

Matthew leads the Customer Solutions organization for our North American Automotive & Manufacturing division. He and his team focus on helping customers transform across people, process, and technology. Prior to joining AWS, Matthew led efforts at numerous organizations to transform their operational processes using automation and AI/ML technologies.

Go here to read the rest:
Empowering Manufacturing Innovation: How AI & GenAI Centers of Excellence can drive Modernization | Amazon Web ... - AWS Blog

Promising directions of machine learning for partial differential equations – Nature.com

Read the rest here:
Promising directions of machine learning for partial differential equations - Nature.com

The future of productivity agents with NinjaTech AI and AWS Trainium | Amazon Web Services – AWS Blog

This is a guest post by Arash Sadrieh, Tahir Azim, and Tengfei Xue from NinjaTech AI.

NinjaTech AI's mission is to make everyone more productive by taking care of time-consuming, complex tasks with fast and affordable artificial intelligence (AI) agents. We recently launched MyNinja.ai, one of the world's first multi-agent personal AI assistants, to advance that mission. MyNinja.ai is built from the ground up using specialized agents capable of completing tasks on your behalf, including scheduling meetings, conducting deep research from the web, generating code, and helping with writing. These agents can break down complicated, multi-step tasks into branched solutions and can evaluate the generated solutions dynamically while continually learning from past experiences. All of these tasks are accomplished in a fully autonomous and asynchronous manner, freeing you up to continue your day while Ninja works in the background and engages only when your input is required.

Because no single large language model (LLM) is perfect for every task, we knew that building a personal AI assistant would require multiple LLMs optimized for a variety of tasks. To deliver the accuracy and capabilities needed to delight our users, we also knew these models would have to work in tandem. Finally, we needed scalable and cost-effective methods for training these various models, an undertaking that has historically been too costly for most startups to pursue. In this post, we describe how we built our cutting-edge productivity agent NinjaLLM, the backbone of MyNinja.ai, using AWS Trainium chips.

We recognized early on that delivering on the mission of tackling tasks on a user's behalf would require multiple models, each optimized for a specific task. Examples include our Deep Researcher, Deep Coder, and Advisor models. After testing the available open source models, we found that their out-of-the-box capabilities and responses were insufficient with prompt engineering alone to meet our needs. Specifically, we wanted to make sure each model was optimized for a ReAct/chain-of-thought style of prompting. We also wanted the model, when deployed as part of a Retrieval Augmented Generation (RAG) system, to accurately cite each source and to prefer saying "I don't know" over generating false answers. For these reasons, we chose to fine-tune the models for the various downstream tasks.
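
For illustration, a prompt template along these lines could encode the ReAct-style loop, per-claim citations, and the "I don't know" preference; this is a hypothetical sketch, not NinjaTech's actual prompt:

```python
# A minimal, hypothetical prompt template illustrating the behaviors described
# above: ReAct-style reasoning, per-claim citations, and an explicit
# "I don't know" instruction.
REACT_RAG_TEMPLATE = """You are a research assistant. Answer using ONLY the
numbered passages below.

Passages:
{passages}

Follow this loop:
Thought: reason about what the question needs.
Action: look up the relevant passage(s).
Observation: note what the passage says.
(repeat Thought/Action/Observation as needed)
Answer: a concise answer in which every claim ends with its source, e.g. [2].
If the passages do not contain the answer, reply exactly: "I don't know."

Question: {question}
"""

def build_prompt(question, passages):
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return REACT_RAG_TEMPLATE.format(passages=numbered, question=question)

print(build_prompt("Who proposed the ReAct prompting pattern?",
                   ["ReAct combines reasoning traces with actions...",
                    "Chain-of-thought prompting elicits intermediate steps..."]))
```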

In constructing our training dataset, our goal was twofold: adapt each model to its downstream task and persona (Researcher, Advisor, Coder, and so on), and adapt the models to follow a specific output structure. To that end, we followed the LIMA approach for fine-tuning, using a training sample size of roughly 20 million tokens and focusing on the format and tone of the output with a diverse but relatively small sample. To construct our supervised fine-tuning dataset, we began by creating initial seed tasks for each model. From these seed tasks, we generated an initial synthetic dataset using Meta's Llama 2 model and used it to perform a first round of fine-tuning. To evaluate the performance of this fine-tuned model, we crowd-sourced user feedback to iteratively create more samples. We also used a series of benchmarks, internal and public, to assess model performance and continued to iterate.
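
A hypothetical example of what such seed-task records might look like as supervised fine-tuning data, in a simple JSONL layout (the schema, personas, and contents are illustrative assumptions, not NinjaTech's actual dataset):

```python
import json

# Hypothetical supervised fine-tuning records: each seed task pairs a
# persona-specific instruction with the desired output structure.
samples = [
    {
        "persona": "Deep Researcher",
        "instruction": "Summarize recent findings on solid-state batteries "
                       "and cite each source.",
        "output": "Thought: ...\nAnswer: Solid-state cells ... [1][3]",
    },
    {
        "persona": "Deep Coder",
        "instruction": "Write a Python function that deduplicates a list "
                       "while preserving order.",
        "output": "def dedupe(items):\n    seen = set()\n"
                  "    return [x for x in items if not (x in seen or seen.add(x))]",
    },
]

with open("sft_seed_tasks.jsonl", "w") as f:
    for record in samples:
        f.write(json.dumps(record) + "\n")
```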

We elected to start with the Llama models as a pre-trained base for several reasons: most notably their strong out-of-the-box performance, robust ecosystem support from various libraries, and a truly open and permissive license. At the time, we began with Llama 2, testing across the various sizes (7B, 13B, and 70B). For training, we chose a cluster of trn1.32xlarge instances to take advantage of Trainium chips, using 32 instances to efficiently parallelize the training and AWS ParallelCluster to manage cluster orchestration. With this Trainium cluster, each fine-tuning iteration took less than 3 hours at a cost of less than $1,000. This quick iteration time and low cost allowed us to rapidly tune and test our models and improve model accuracy. To achieve the accuracies discussed in the following sections, we spent only around $30,000, saving hundreds of thousands, if not millions, of dollars had we trained on traditional training accelerators.

The following diagram illustrates our training architecture.

After we had established our fine-tuning pipelines on top of Trainium, we were able to fine-tune and refine our models using the Neuron Distributed training libraries. This was exceptionally useful and timely, because Meta's Llama 3 models were released just before the launch of MyNinja.ai. Llama 3 and Llama 2 share a similar architecture, so we were able to rapidly upgrade to the newer model. This velocity allowed us to take advantage of the inherent gains in model accuracy, quickly run another round of fine-tuning with the Llama 3 weights, and prepare for launch.

For evaluating the model, there were two objectives: evaluate the model's ability to answer user questions, and evaluate the system's ability to answer questions with provided sources, because this is our personal AI assistant's primary interface. We selected the HotPotQA and Natural Questions (NQ) Open datasets, both of which are a good fit because they are open benchmarking datasets with public leaderboards.

We calculated accuracy by matching the model's answer to the expected answer, using the top 10 passages retrieved from a Wikipedia corpus. We performed content filtering and ranking using ColBERTv2, a BERT-based retrieval model. Using our enhanced Llama 3 RAG model, we achieved accuracies of 62.22% on the NQ Open dataset and 58.84% on HotPotQA, demonstrating notable improvements over other baseline models. The following figure summarizes our results.
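
One common way to score this kind of open-domain QA setup is normalized exact match, sketched below; this is an illustrative scoring function under our own assumptions, not necessarily the exact evaluation script used for the reported numbers:

```python
import re
import string

def normalize(text):
    """Lowercase, strip punctuation, drop articles, and collapse whitespace,
    the usual open-domain QA normalization before exact-match scoring."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match_accuracy(predictions, gold_answers):
    """gold_answers[i] is the list of acceptable answers for question i."""
    hits = sum(
        any(normalize(pred) == normalize(gold) for gold in golds)
        for pred, golds in zip(predictions, gold_answers)
    )
    return hits / len(predictions)

preds = ["The Eiffel Tower", "paris", "I don't know"]
golds = [["Eiffel Tower"], ["Paris"], ["Lyon"]]
print(f"accuracy = {exact_match_accuracy(preds, golds):.2%}")  # 66.67%
```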

Looking ahead, we're working on several developments to continue improving our models' performance and user experience. First, we intend to use ORPO to fine-tune our models. ORPO combines traditional fine-tuning with preference alignment, using a single preference-alignment dataset for both. We believe this will allow us to better align our models and achieve better results for users.

Additionally, we intend to build a custom ensemble model from the various models we have fine-tuned so far. Inspired by Mixture of Experts (MoE) architectures, we intend to introduce a routing layer in front of our various models. We believe this will radically simplify our model serving and scaling architecture, while maintaining the quality across the various tasks that our users have come to expect from our personal AI assistant.
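
A minimal sketch of the routing idea: classify each request and dispatch it to the best-suited specialist model. The keyword heuristic and handler stubs below are placeholders for whatever learned router and fine-tuned models would actually sit behind the layer:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Specialist:
    name: str
    handler: Callable[[str], str]

# Stub handlers standing in for the fine-tuned specialist models.
def researcher(q): return f"[Researcher] deep-dive on: {q}"
def coder(q):      return f"[Coder] generated code for: {q}"
def advisor(q):    return f"[Advisor] guidance on: {q}"

SPECIALISTS = {
    "research": Specialist("Deep Researcher", researcher),
    "code":     Specialist("Deep Coder", coder),
    "advice":   Specialist("Advisor", advisor),
}

def route(query: str) -> str:
    """Pick a specialist for the query; a learned classifier would replace
    this keyword heuristic in a real routing layer."""
    q = query.lower()
    if any(k in q for k in ("implement", "bug", "function", "code")):
        key = "code"
    elif any(k in q for k in ("compare", "sources", "research", "papers")):
        key = "research"
    else:
        key = "advice"
    return SPECIALISTS[key].handler(query)

print(route("Implement a function that merges two sorted lists"))
print(route("Research recent papers on AWS Trainium training throughput"))
```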

Building next-gen AI agents to make everyone more productive is NinjaTech AI's pathway to achieving its mission. Democratizing access to this transformative technology requires access to high-powered compute, open source models, and an ecosystem of tools that make training each new agent affordable and fast. AWS's purpose-built AI chips, access to top open source models, and its training architecture make this possible.

To learn more about how we built NinjaTech AI's multi-agent personal AI, you can read our whitepaper. You can also try these AI agents for free at MyNinja.ai.

Arash Sadrieh is the Co-Founder and Chief Science Officer at Ninjatech.ai. Arash co-founded Ninjatech.ai with a vision to make everyone more productive by taking care of time-consuming tasks with AI agents. This vision was shaped during his tenure as a Senior Applied Scientist at AWS, where he drove key research initiatives that significantly improved infrastructure efficiency over six years, earning him multiple patents for optimizing core infrastructure. His academic background includes a PhD in computer modeling and simulation, with collaborations with esteemed institutions such as Oxford University, Sydney University, and CSIRO. Prior to his industry tenure, Arash had a postdoctoral research tenure marked by publications in high-impact journals, including Nature Communications.

Tahir Azim is a Staff Software Engineer at NinjaTech. Tahir focuses on NinjaTech's Inf2- and Trn1-based training and inference platforms, its unified gateway for accessing these platforms, and its RAG-based research skill. He previously worked at Amazon as a senior software engineer, building data-driven systems for optimal utilization of Amazon's global Internet edge infrastructure and driving down cost, congestion, and latency. Before moving to industry, Tahir earned an M.S. and Ph.D. in Computer Science from Stanford University, taught for three years as an assistant professor at NUST (Pakistan), and did a postdoc in fast data analytics systems at EPFL. Tahir has authored several publications presented at top-tier conferences such as VLDB, USENIX ATC, MobiCom, and MobiHoc.

Tengfei Xue is an Applied Scientist at NinjaTech AI. His current research interests include natural language processing and multimodal learning, particularly using large language models and large multimodal models. Tengfei completed his PhD studies at the School of Computer Science, University of Sydney, where he focused on deep learning for healthcare using various modalities. He was also a visiting PhD candidate at the Laboratory of Mathematics in Imaging (LMI) at Harvard University, where he worked on 3D computer vision for complex geometric data.

Read the original here:
The future of productivity agents with NinjaTech AI and AWS Trainium | Amazon Web Services - AWS Blog

Generative AI vs. AI: Advantages, Limitations, Ethical Considerations – eWeek


Generative artificial intelligence (AI) is valued for its ability to create new content, including text, images, video, and music. It uses AI algorithms to analyze patterns in datasets and mimic their style or structure to produce different types of content, and it can be used to create deepfake videos and voice messages.

Generative AI is a subset of artificial intelligence, which also encompasses a broad range of technologies that enable machines to perform tasks that once required human intelligence and judgment. AI is often used to build systems with the cognitive capacity to mine data, and it continuously improves its performance over repeated events. Here's what you need to know about the benefits and logistics of using AI and generative AI, as well as the ethical concerns to be aware of.

Both generative AI and artificial intelligence, sometimes called traditional AI, use machine learning algorithms to obtain their results. However, they have different goals and purposes. Generative AI is intended to create new content, while AI goes much broader and deeper: in essence, to wherever the algorithm coder wants to take it. AI's possible deployments include better decision-making, removing the tedium from repetitive tasks, and spotting anomalies and issuing alerts for cybersecurity. The following summary spells out the common differences between generative AI and AI:

To fully understand the relationship between generative AI and AI, it's necessary to understand each of these technologies at a deeper level.

Generative AI is an open-ended and rapidly evolving form of artificial intelligence. Its major characteristics include the following:

With its ability to use source data for any number of creative tasks, generative AI's use cases range from product design to software development to fraud detection.

Generative AI helps create innovative designs that meet specific performance criteria, from prototyping to design optimization, while minimizing both material use and waste. Additionally, generative AI excels at creating highly personalized product experiences by analyzing user data to create products that align with the preferences and needs of individual users. This personalization can also help with creating marketing and sales campaigns.

For the creative industries, generative AI can mimic various artistic styles, compose original music and even generate complete pieces of artwork. This application is expanding the horizons of creative expression and is being used by artists, musicians, and other content creators to increase their output.

Generative AI provides the ability to automate code generation, bug fixes, and optimization. This results in more efficient development cycles and higher-quality software. AI tools can also generate synthetic data for training and testing purposes, which plays an important role in developing robust AI applications.

Generative AI-powered chatbots and virtual assistants provide 24/7 assistance, personalize interactions, and handle complex queries. These tools raise customer satisfaction and operational efficiency by automating routine support tasks and offering faster responses than human operators.

In finance and insurance, generative AI is used to detect fraud and manage risk. It analyzes transaction patterns and identifies anomalies, then helps in creating detailed reports and summaries that aid in decision-making, thereby enhancing the overall security and reliability of financial operations.

Based on the significant advancements that keep enhancing generative AI's capabilities, its future is incredibly promising. Expect models to become larger and more powerful, like GPT-4 and PaLM 2, which are revolutionizing content creation and personalized customer communications. Such models enable businesses to generate high-quality, human-like outputs more efficiently, with impact seen across many market sectors.

We can also expect to see generative AI models run on a wider variety of hardware devices, which will open up an array of use cases. A notable trend is the rise of multimodal AI models that can understand and generate content across several forms of data, such as text, images, and audio. The result? Users will get more immersive and natural user experiences, especially in fields like virtual reality and augmented reality.

Additionally, generative AI is driving new levels of personalization by improving how it adapts products and services to individual preferences. It's therefore seen as a particularly aggressive driver of change across the retail, marketing, and ecommerce sectors.

Although artificial intelligence has enjoyed an enormously higher profile over the last few years, the history of AI stretches back to the 1940s. This traditional AI is the basis for generative AI, and while there are major differences, there is also major overlap between the two technologies. To fully understand the topic, here's a deeper look at artificial intelligence itself.

Overall, traditional AI is focused on explicit programming to execute tasks with precision. The following are its core characteristics:

Artificial intelligence can compute exponentially faster than the fastest team of human experts, even as it handles far greater complexity. This capability enables an array of use cases, ranging from business automation to research and development to cybersecurity.

AI-driven automation is streamlining repetitive and manual business operations. Robotic process automation (RPA) uses AI to automate routine administrative tasks, freeing up human workers for more complex activities. AI algorithms are used to optimize supply chain management by predicting demand, managing inventory, and optimizing logistics.

In research and development (R&D), traditional AI accelerates innovation by analyzing huge datasets to identify patterns, predict outcomes, and generate new insights. In pharmaceuticals, AI helps drug discovery by predicting the efficacy of compounds and optimizing clinical trials. In engineering, AI models can be used to optimize product designs, which helps to lower the time and cost associated with bringing new products to market.

AI is increasingly used for predictive maintenance, with use cases like analyzing data from machinery to predict failures before they occur. This proactive approach helps schedule maintenance activities at optimal times. The benefits include lower downtime and extended equipment lifespans. Industries such as manufacturing, energy, and transportation are the biggest beneficiaries of predictive maintenance.

AIs role in cybersecurity and fraud detection includes analyzing network traffic and identifying potential threats in real time. AI algorithms detect anomalies and patterns associated with cyber attacks, which leads to faster and more accurate responses. AI-driven systems can automate responses to a variety of threats and reduce the risk of breaches and enhance overall security.

AI-enabled forecasting models help financial leaders predict future trends. AI systems incorporate variables such as mixed economic forecasts and non-traditional data sources, allowing for more reliable and comprehensive financial scenario planning and more specific revenue projections.

The future of AI involves handling ever more complex and multifaceted real-world scenarios. Innovations will likely focus on enhancing the adaptability of rule-based systems, making them more flexible and capable of dealing with unforeseen situations. Expect to see enhanced flexibility and the rise of multimodal systems capable of processing many data types simultaneously. This will allow AI to tackle more complex enterprise challenges across multiple domains and significantly broaden its impact.

Self-improving AI systems are also emerging. They leverage reinforcement learning and dynamic analysis for autonomous optimization of performance over time. This will further enhance adaptability and efficiency without constant human intervention.

The integration of traditional AI with generative AI is expected to create hybrid systems that deliver an exponentially more powerful combination. Innovations in AI hardware and infrastructure, including specialized AI processors, will support these advanced systems. This will allow traditional AI to provide more sophisticated solutions across an expanding array of use cases.

Generative AI and traditional AI face largely similar challenges in terms of ethics, including biases built into systems, job displacement and potential environmental impact.

AI systems can inadvertently magnify biases that were built into their training data. These biases can lead to unfair outcomes, particularly for marginalized groups. To ensure fairness in AI, whether generative or traditional AI, there needs to be meticulous scrutiny of the training data, implementation of bias mitigation strategies, and continual monitoring of AI systems for biased behavior. Techniques like algorithmic fairness reviews and bias audits are a step toward promoting equity and inclusivity in AI applications.

The security and privacy concerns raised by the deployment of AI technologies are pervasive. AI systems often need vast amounts of data, including personal and sensitive information, to function effectively. Whether generative or traditional, ensuring robust data protection measures and maintaining privacy throughout the AI lifecycle are critical. This includes implementing strong encryption, data anonymization techniques, and complying with regulations such as GDPR. Transparency about data usage and incorporating user consent is also essential in building trust and safeguarding privacy.

Given the ever-increasing reach and use cases of AI, we need to be able to trust AI and hold the technology accountable, yet many users do not trust AI systems. This trust is enabled by transparency in AI systems. Explainable AI (XAI) practices allow users and stakeholders to understand how AI algorithms make decisions. By providing clear and understandable explanations of AI processes, organizations can enhance user trust and facilitate better decision-making. A transparent system makes it easier to identify and address ethical issues and to ensure AI systems are used responsibly.

One of the greatest concerns about the rise of AI has been job displacement as automated systems replace human roles. Alleviating this issue calls for strategies for transitioning workforces to new or evolved roles, such as reskilling and upskilling programs to prepare employees for roles created by AI advancements. Organizations need to consider the broader social implications of deploying AI solutions and work to implement practices that strike a balance between technological progress and socioeconomic stability.

The deployment and training of large AI models, especially generative AI, requires significant computational resources, which leads to substantial energy consumption and environmental impact. Organizations using AI need to develop and implement energy-efficient AI models. They also need to optimize computational resources to minimize carbon footprints. Encouraging sustainable practices in AI development and operation is a must for reducing the environmental impact and promoting green AI technologies.

Artificial intelligence in all its forms is advancing at a remarkable rate, so it's advantageous for tech professionals to stay knowledgeable about AI skills and developments. Here are relevant courses to help you use these technologies effectively. Be aware that while each title below refers to generative AI, these courses all teach fundamental concepts that also cover overall AI technology.

This course provides a solid foundation in generative AI, covering fundamental concepts, model types, and practical applications. It's suitable for those who are new to the field and want to explore the potential of generative AI using Google Cloud tools like Vertex AI.

Andrew Ng's course offers a comprehensive introduction to generative AI, covering how it works, its uses, and its impact across various industries. The course also includes hands-on exercises for putting the concepts into practice.

Based on a partnership between AWS and DeepLearning.AI, this intermediate-level course focuses on using large language models (LLMs) like GPT-4 for generative AI. It covers LLM architecture, training processes, practical applications, and more. The course is designed for data scientists, AI developers, and anyone interested in mastering LLMs and applying them effectively in their work.

No, conversational AI and generative AI are related but distinct subsets of artificial intelligence. Conversational AI is designed to interact with users through dialogue, often used in chatbots and virtual assistants like Siri, Alexa, or Google Assistant. It focuses on understanding and generating human-like responses to deliver meaningful interactions. Generative AI, on the other hand, refers to AI systems that create new content based on learned patterns from existing data. While conversational AI can use generative AI techniques to give responses, generative AI covers a broader range of creative applications beyond just conversation.

Predictive AI focuses on analyzing existing data to forecast future events or trends. It uses techniques like regression analysis, time series analysis, and machine learning models to predict outcomes such as stock prices, weather conditions, or customer behaviors. Generative AI, however, aims to create new data rather than predict future events. It uses models like Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) to generate new content that is similar to the training data.
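
To make the contrast concrete, here is a minimal sketch of the predictive side: fitting a regression model to made-up historical sales figures and forecasting the next few periods. The numbers and variable names are purely illustrative.

```python
# A minimal predictive-AI sketch: fit a regression to historical observations
# and forecast the next values. The data is invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly sales history (months 1-8).
months = np.arange(1, 9).reshape(-1, 1)
sales = np.array([120, 135, 150, 160, 172, 185, 198, 210])

model = LinearRegression().fit(months, sales)

# Forecast the next three months from the learned trend.
future = np.arange(9, 12).reshape(-1, 1)
print(model.predict(future))
```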

Generative AI has rapidly gained popularity due to several key factors. The development of sophisticated models like GPT-4, GANs, and VAEs has significantly improved the quality and realism of generated content. Increased access to high-performance computing resources such as GPUs and cloud computing has enabled the training of complex generative models. The vast amount of data available for training these models has allowed them to learn from diverse and extensive datasets, enhancing their capabilities. Plus, the wide range of applications, from creative industries like art and music to practical uses such as text generation and synthetic data creation, has driven interest and investment in generative AI.

Generative AI and traditional AI each bring unique strengths and challenges to the table. Generative AI is geared toward creativity, generating new and innovative content, and is seeing more integration into fields like art, music, and content creation. In contrast, traditional AI focuses on analyzing existing data to improve efficiency, accuracy, and decision-making, making it invaluable in sectors that value consistency and predictability, such as finance, healthcare, and manufacturing.

As both these technologies continue to evolve rapidly, the differences between them will likely lessen, with generative AI's creativity and traditional AI's data-crunching strength found side by side in many advanced applications.

Read our guide to the Top 20 Generative AI Tools and Apps 2024 to learn more about what platforms organizations are using to deploy these dynamic technologies across their businesses.

Follow this link:
Generative AI vs. AI: Advantages, Limitations, Ethical Considerations - eWeek

Microsoft is a Leader in the 2024 Gartner Magic Quadrant for Data Science and Machine Learning Platforms – Microsoft

Microsoft is a Leader in this year's Gartner Magic Quadrant for Data Science and Machine Learning Platforms. Azure AI provides a powerful, flexible end-to-end platform for accelerating data science and machine learning innovation while providing the enterprise governance that every organization needs in the era of AI.

In May 2024, Microsoft was also named a Leader for the fifth year in a row in the Gartner Magic Quadrant for Cloud AI Developer Services, where we placed furthest for our Completeness of Vision. We're pleased by these recognitions from Gartner as we continue helping customers, from large enterprises to agile startups, bring their AI and machine learning models and applications into production securely and at scale.

Azure AI is at the forefront of purpose-built AI infrastructure, responsible AI tooling, and helping cross-functional teams collaborate effectively using Machine Learning Operations (MLOps) for generative AI and traditional machine learning projects. Azure Machine Learning provides access to a broad selection of foundation models in the Azure AI model catalog, including the recent releases of Phi-3, JAIS, and GPT-4o, and tools to fine-tune or build your own machine learning models. Additionally, the platform supports a rich library of open-source frameworks, tools, and algorithms so that data science and machine learning teams can innovate in their own way, all on a trusted foundation.

Microsoft is named a Leader in the 2024 Gartner Magic Quadrant for Data Science and Machine Learning Platforms

We're now able to get a functioning model with relevant insights up and running in just a couple of weeks thanks to Azure Machine Learning. We've even managed to produce verified models in just four to six weeks.

Azure Machine Learning helps organizations build, deploy, and manage high-quality AI solutions quickly and efficiently, whether building large models from scratch, running inference on pre-trained models, consuming models as a service, or fine-tuning models for specific domains. Azure Machine Learning runs on the same powerful AI infrastructure that powers some of the world's most popular AI services, such as ChatGPT, Bing, and Azure OpenAI Service. Additionally, Azure Machine Learning's compatibility with ONNX Runtime and DeepSpeed can help customers further optimize training and inference time for performance, scalability, and power efficiency.
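
As a rough illustration of the ONNX Runtime path mentioned above, the sketch below exports a toy PyTorch model to ONNX and runs inference with onnxruntime. The model, file name, and shapes are placeholders, not Azure-specific code; real workloads would export their own models.

```python
# Export a toy PyTorch model to ONNX, then run inference with ONNX Runtime,
# which applies graph-level optimizations at session creation time.
import torch
import onnxruntime as ort

model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
model.eval()

dummy_input = torch.randn(1, 16)
torch.onnx.export(model, dummy_input, "toy_model.onnx",
                  input_names=["input"], output_names=["logits"])

session = ort.InferenceSession("toy_model.onnx")
outputs = session.run(None, {"input": dummy_input.numpy()})
print(outputs[0].shape)  # (1, 2)
```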

Whether your organization is training a deep learning model from scratch using open source frameworks or bringing an existing model into the cloud, Azure Machine Learning enables data science teams to scale out training jobs using elastic cloud compute resources and seamlessly transition from training to deployment. With managed online endpoints, customers can deploy models across powerful CPU and graphics processing unit (GPU) machines without needing to manage the underlying infrastructure, saving time and effort. Similarly, customers do not need to provision or manage infrastructure when deploying foundation models as a service from the Azure AI model catalog. This means customers can easily deploy and manage thousands of models across production environments, from on-premises to the edge, for batch and real-time predictions.
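
For readers curious what a managed online endpoint deployment can look like, here is a hedged sketch using the Azure ML Python SDK v2 (azure-ai-ml). The endpoint name, model reference, instance type, and workspace identifiers are placeholders, and the exact options for a real deployment should be taken from the Azure Machine Learning documentation.

```python
# A sketch of creating a managed online endpoint and deploying a registered
# model with the azure-ai-ml SDK. All identifiers below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Create the endpoint (the stable URL and auth boundary for the model).
endpoint = ManagedOnlineEndpoint(name="demo-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy a registered model behind the endpoint without managing servers.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="demo-endpoint",
    model="azureml:my-registered-model:1",   # placeholder model reference
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```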

Prompt flow helped streamline our development and testing cycles, which established the groundedness we required for making sure the customer and the solution were interacting in a realistic way.

Machine learning operations (MLOps) and large language model operations (LLMOps) sit at the intersection of people, processes, and platforms. As data science projects scale and applications become more complex, effective automation and collaboration tools become essential for achieving high-quality, repeatable outcomes.

Azure Machine Learning is a flexible MLOps platform, built to support data science teams of any size. The platform makes it easy for teams to share and govern machine learning assets, build repeatable pipelines using built-in interoperability with Azure DevOps and GitHub Actions, and continuously monitor model performance in production. Data connectors to Microsoft sources such as Microsoft Fabric and external sources such as Snowflake and Amazon S3 further simplify MLOps. Interoperability with MLflow also makes it seamless for data scientists to scale existing workloads from local execution to the cloud and edge, while storing all MLflow experiments, run metrics, parameters, and model artifacts in a centralized workspace.
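
A minimal sketch of the MLflow interoperability described above: logging parameters, metrics, and a model so a local run can be captured in a centralized tracking workspace. The tracking URI is a placeholder, and the toy training job is illustrative only.

```python
# Log a run's parameters, metrics, and model with MLflow.
import mlflow
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

# Point MLflow at a remote tracking server if not using the local default
# (placeholder; in Azure ML the URI comes from the workspace).
# mlflow.set_tracking_uri("<your-tracking-uri>")

X, y = load_diabetes(return_X_y=True)

with mlflow.start_run():
    alpha = 0.5
    model = Ridge(alpha=alpha).fit(X, y)
    mlflow.log_param("alpha", alpha)
    mlflow.log_metric("train_r2", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")
```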

Azure Machine Learning prompt flow helps streamline the entire development cycle for generative AI applications with its LLMOps capabilities, orchestrating executable flows composed of models, prompts, APIs, Python code, and tools for vector database lookup and content filtering. Azure AI prompt flow can be used together with popular open-source frameworks like LangChain and Semantic Kernel, enabling developers to bring experimental flows into prompt flow to scale those experiments and run comprehensive evaluations. Developers can debug, share, and iterate on applications collaboratively, integrating built-in testing, tracing, and evaluation tools into their CI/CD system to continually reassess the quality and safety of their application. Then, developers can deploy applications when ready with one click and monitor flows for key metrics such as latency, token usage, and generation quality in production. The result is end-to-end observability and continuous improvement.

The responsible AI dashboard provides valuable insights into the performance and behavior of computer vision models, providing a better level of understanding into why some models perform differently than others, and insights into how various underlying algorithms or parameters influence performance. The benefit is better-performing models, enabled and optimized with less time and effort.

AI principles such as fairness, safety, and transparency are not self-executing. That's why Azure Machine Learning provides data scientists and developers with practical tools to operationalize responsible AI right in their flow of work, whether they need to assess and debug a traditional machine learning model for bias, protect a foundation model from prompt injection attacks, or monitor model accuracy, quality, and safety in production.

The Responsible AI dashboard helps data scientists assess and debug traditional machine learning models for fairness, accuracy, and explainability throughout the machine learning lifecycle. Users can also generate a Responsible AI scorecard to document and share model performance details with business stakeholders, for more informed decision-making. Similarly, developers in Azure Machine Learning can review model cards and benchmarks and perform their own evaluations to select the best foundation model for their use case from the Azure AI model catalog. Then they can apply a defense-in-depth approach to mitigating AI risks using built-in capabilities for content filtering, grounding on fresh data, and prompt engineering with safety system messages. Evaluation tools in prompt flow enable developers to iteratively measure, improve, and document the impact of their mitigations at scale, using built-in metrics and custom metrics. That way, data science teams can deploy solutions with confidence while providing transparency for business stakeholders.

Read more on Responsible AI with Azure.

We needed to choose a platform that provided best-in-class security and compliance due to the sensitive data we require and one that also offered best-in-class services as we didn't want to be an infrastructure hosting company. We chose Azure because of its scalability, security, and the immense support it offers in terms of infrastructure management.

In today's data-driven world, effective data security, governance, and privacy require every organization to have a comprehensive understanding of their data and AI and machine learning systems. AI governance also requires effective collaboration between diverse stakeholders, such as IT administrators, AI and machine learning engineers, data scientists, and risk and compliance roles. In addition to enabling enterprise observability through MLOps and LLMOps, Azure Machine Learning helps organizations ensure that data and models are protected and compliant with the highest standards of security and privacy.

With Azure Machine Learning, IT administrators can restrict access to resources and operations by user account or groups, control incoming and outgoing network communications, encrypt data both in transit and at rest, scan for vulnerabilities, and centrally manage and audit configuration policies through Azure Policy. Data governance teams can also connect Azure Machine Learning to Microsoft Purview, so that metadata on AI assetsincluding models, datasets, and jobsis automatically published to the Microsoft Purview Data Map. This enables data scientists and data engineers to observe how components are shared and reused and examine the lineage and transformations of training data to understand the impact of any issues in dependencies. Likewise, risk and compliance professionals can track what data is used to train models, how base models are fine-tuned or extended, and where models are employed across different production applications, and use this as evidence in compliance reports and audits.

Lastly, with the Azure Machine Learning Kubernetes extension enabled by Azure Arc, organizations can run machine learning workloads on any Kubernetes clusters, ensuring data residency, security, and privacy compliance across hybrid public clouds and on-premises environments. This allows organizations to process data where it resides, meeting stringent regulatory requirements while maintaining flexibility and control over their MLOps. Customers using federated learning techniques along with Azure Machine Learning and Azure confidential computing can also train powerful models on disparate data sources, all without copying or moving data from secure locations.

Machine learning continues to transform the way businesses operate and compete in the digital era, whether you want to optimize your business operations, enhance customer experiences, or innovate. Azure Machine Learning provides a powerful, flexible machine learning and data science platform to operationalize AI innovation responsibly.

*Gartner, Magic Quadrant for Data Science and Machine Learning Platforms, By Afraz Jaffri, Aura Popa, Peter Krensky, Jim Hare, Raghvender Bhati, Maryam Hassanlou, Tong Zhang, 17 June 2024.

Gartner, Magic Quadrant for Cloud AI Developer Services, Jim Scheibmeir, Arun Batchu, Mike Fang, Published 29 April 2024.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, Magic Quadrant is a registered trademark of Gartner, Inc. and/or its affiliates and is used herein with permission. All rights reserved.

Gartner does not endorse any vendor, product or service depicted in its research publications and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's Research & Advisory organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from this link.

Here is the original post:
Microsoft is a Leader in the 2024 Gartner Magic Quadrant for Data Science and Machine Learning Platforms - Microsoft

The 10 Hottest Data Science And Machine Learning Tools Of 2024 (So Far) – CRN

Here's a look at 10 data science and machine learning tools that solution and service providers should be aware of.

Deep Thoughts

Data science and machine learning technologies have long been important for data analytics tasks and predictive analytical software. But with the wave of artificial intelligence and generative AI development in 2023, the importance of data science and machine learning tools has risen to new heights.

One absolute truth about AI systems is that they need huge amounts of data to be effective.

Data science combines math and statistics, advanced analytics, specialized programming and other skills and tools to help uncover actionable insights within an organization's data. The global data science tool market reached $8.73 billion last year and will nearly double to $16.85 billion by 2030, according to 24MarketReports.

Machine learning systems make business-outcome decisions and predictions based on algorithms and statistical models that analyze and draw inferences from huge amounts of data. The worldwide machine learning market is expected to reach $79.29 billion this year, according to Statista, and grow at a 36 percent CAGR to $503.40 billion by 2030.

Here's a look at some of the hottest data science and machine learning tools in use today. Some of the following tools are relatively new to the market while others have been around for a while and recently updated. The list also includes both commercial products and open-source software.

Amazon SageMaker

Amazon SageMaker is one of Amazon Web Services' (AWS) flagship AI and machine learning software tools and one of the most prominent machine learning products in the industry.

In November, at the AWS re:Invent extravaganza, AWS expanded SageMaker's functionality with five new capabilities that the company said help accelerate the building, training and deployment of large language models and other foundation machine learning models that power generative AI.

One new capability enhances SageMakers ability to scale models by accelerating model training time while another optimizes managed ML infrastructure operations by reducing deployment costs and model latency.

The new SageMaker Clarify tool makes it easier to select the right model based on quality parameters that support responsible use of AI. A new no-code feature in SageMaker Canvas makes it possible to prepare data using natural language instructions. And Canvas continues to democratize model building and customization, AWS said, by making it easier to use models to extract insights, make predictions and generate content using an organization's proprietary data.

AWS also offers Amazon Machine Learning, a more highly automated tool for building machine learning models.

Anaconda Distribution for Python

Python has become the most popular programming language overall, but it has long been used by data scientists for development in data analytics, AI and machine learning. Anaconda's distribution of open-source Python is one of the most widely used data science and AI platforms.

In addition to its distribution of Python, Anaconda offers its Data Science and AI Workbench platform that data science and machine learning teams use for expediting model development and deployment while adhering to security and governance requirements.

Over the last year Anaconda has established alliances with major IT vendors to expand the use of its platform. In April Anaconda announced a partnership to integrate its Anaconda Python Repository with Teradata's VantageCloud and ClearScape Analytics. A collaboration with IBM announced in February provides watsonx.ai users with access to the Anaconda software repository. And in August 2023 the company unveiled the Anaconda Distribution for Python in Microsoft Excel.

ClearML

ClearML's platform, designed for data scientists and data engineers, automates and simplifies the development and management of machine learning solutions. The system provides a comprehensive lineup of capabilities spanning data science, data management, MLOps, and model orchestration and deployment.

In March startup ClearML added new orchestration capabilities to its platform to expand control over AI infrastructure management and compute costs while maximizing the use of compute resources and improving model serving visibility.

Also in March, ClearML introduced an open-source fractional GPU tool that helps businesses improve GPU utilization by enabling multi-tenancy for all Nvidia GPUs.

Databricks Mosaic AI

At Databricks' recent Data + AI Summit, the company unveiled a number of new capabilities for its Mosaic AI software for building and deploying production-quality ML and GenAI applications.

Databricks acquired MosaicML in June 2023 in a blockbuster $1.3-billion deal and has been integrating the startup's technology with its data lakehouse platform. (Databricks has since rebranded the product as Mosaic AI.)

The latest capabilities in Mosaic AI include support for building compound AI systems, new functionality to improve model quality, and AI governance tools. Databricks said the innovations give users the confidence to build and measure production-quality applications, delivering on the promises of generative AI for their business.

Dataiku

The Dataiku platform offers a comprehensive lineup of data science, machine learning and AI capabilities including machine learning development, MLOps, data preparation, DataOps, visualization, analytical applications and generative AI.

In September 2023, Dataiku launched LLM Mesh, a new tool for integrating large language models within the enterprise that the company called the common backbone for Gen AI applications. LLM Mesh capabilities include universal AI service routing, secure access and auditing for AI services, performance and cost tracking, and safety provisions for private data screening and response moderation.

In April, Dataiku debuted LLM Cost Guard, a new capability within LLM Mesh that creates standards for tracking and optimizing generative AI use cases.

dotData Feature Factory 1.1

dotData's Feature Factory is an automated feature discovery and engineering platform that helps data scientists find and use data features within large-scale data sets for use in AI and machine learning projects.

In Feature Factory version 1.1, introduced in May, the company provided significant enhancements including new data quality assessment capabilities, support for user-defined features and interactive feature selection, improved support for AutoML through the Python-based PyCaret AutoML library, and preview support for generative AI feature discovery.
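
For context on the PyCaret library named above, the sketch below shows generic PyCaret AutoML usage, training and ranking candidate models on a hypothetical churn table. It illustrates the open-source library itself, not dotData's specific integration, and the file and column names are assumptions.

```python
# Generic PyCaret AutoML workflow: set up an experiment, compare candidate
# models, and refit the best one. Dataset and target column are hypothetical.
import pandas as pd
from pycaret.classification import setup, compare_models, finalize_model

df = pd.read_csv("customer_churn.csv")        # hypothetical training table
setup(data=df, target="churned", session_id=42)

best = compare_models()                       # trains and ranks candidate models
final = finalize_model(best)                  # refits the winner on the full dataset
print(final)
```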

Hopsworks MLOps Platform

The Hopsworks platform is used to develop, deploy and monitor AI/ML models at scale.

The core of the serverless system is its machine learning feature store for storing data for ML models running on AWS, Azure and Google Cloud platforms and in on-premises systems. The Hopsworks platform also provides machine learning pipelines and a comprehensive development toolkit.

Hopsworks 3.7, which the company called the GenAI release, became generally available in March with new capabilities to support GenAI and large language model use cases. It also introduced feature monitoring, a new notification service to track changes to specific features, and support for the Delta Lake data storage format.

Founded in Sweden in 2016, Hopsworks has offices in Stockholm, London and Palo Alto, Calif.

Obviously AI

A problem faced by many businesses is the shortage of people with data science and machine learning expertise. Obviously AI looks to close that gap with its no-code AI/ML platform that allows people without technical backgrounds to build and train machine learning models.

The platform helps quickly build models that run predictions on historical data, everything from sales and revenue forecasting to predictions about energy consumption and population growth.

"Because data science shouldn't feel like rocket science," the company's website says.

PyTorch

PyTorch is a powerful open-source framework and deep learning library for data scientists who are building and training deep learning models.

PyTorch is popular for such applications as computer vision, natural language processing, image classification and text generation. It can be used for a variety of algorithms including convolutional neural networks, recurrent neural networks and generative adversarial networks, according to a LinkedIn posting by data scientist and analysis expert Vitor Mesquita.
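
As a small taste of the kind of model PyTorch is used to build, here is a minimal sketch of a convolutional image classifier. The layer sizes and input shape are arbitrary choices for illustration.

```python
# A tiny convolutional network for image classification in PyTorch.
import torch
from torch import nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TinyCNN()
dummy_batch = torch.randn(8, 1, 28, 28)   # e.g., MNIST-sized grayscale images
logits = model(dummy_batch)
print(logits.shape)                       # torch.Size([8, 10])
```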

PyTorch 2.3 was released on April 24.

PyTorch grew out of the Lua-based Torch framework and was released by Facebook's AI research lab in 2017. Today PyTorch is part of the Linux Foundation and is available through the pytorch.org website.

PyTorch and TensorFlow are generally seen as the leading, and often competing, open-source data science and machine learning systems, according to a Projectpro.com comparison. PyTorch is often considered better for smaller-scale research projects, while TensorFlow is more widely used for production-scale projects.

TensorFlow

TensorFlow is a popular open-source, end-to-end machine learning platform and library for building ML models that can run in any environment. The system handles data preprocessing, model building and model training tasks.
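
To illustrate that build-and-train workflow, here is a minimal sketch using the Keras API that ships with TensorFlow, trained on synthetic data purely for demonstration.

```python
# Build, compile, and train a small Keras model on synthetic binary labels.
import numpy as np
import tensorflow as tf

# Synthetic dataset: 200 samples of 4 features with binary labels.
X = np.random.rand(200, 4).astype("float32")
y = (X.sum(axis=1) > 2.0).astype("int32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

print(model.evaluate(X, y, verbose=0))  # [loss, accuracy]
```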

TensorFlow, generally seen as an alternative to PyTorch, was originally developed by the Google Brain team for internal research and production tasks, particularly around machine learning and deep learning neural networks. It was released as open-source software under the Apache License 2.0 in November 2015.

Google continues to own and maintain TensorFlow, which is available through the tensorflow.org community website. A major update, TensorFlow 2.0, was released in September 2019.

Go here to read the rest:
The 10 Hottest Data Science And Machine Learning Tools Of 2024 (So Far) - CRN