
Starting this Summer, Students Can Minor in Applications of Artificial Intelligence and Machine Learning – Georgia Tech College of Engineering

The minor was initially suggested by leaders and external advisory board members in Georgia Tech's biomedical engineering department to provide more AI and ML curriculum for their students.

"AI is so rampant in so many disciplines, and biomedical engineering students need to have that background before and after graduation," said Jaydev Desai, associate chair for undergraduate studies in the Wallace H. Coulter Department of Biomedical Engineering at Georgia Tech and Emory University. "Once we talked to the rest of the College about the minor, it quickly became clear that other schools had the same goals. This minor will truly be a transformative experience for our undergraduate students and make them competitive after graduation, be it in industry or pursuit of further education."

Desai quickly found a partner in IAC when he spoke with Shatakshee Dhongde, the College's associate dean for academic affairs. Together they built a program that teaches both the technical and policy aspects of AI.

"This program is designed to address how AI and machine learning models are applied to solve some of the world's most pressing and complex problems," said Dhongde, who is also an associate professor in the School of Economics. "Using AI/ML models with an understanding of the ethical issues around the technology is the real strength of this minor."

Students on both tracks are required to take three core courses, including a philosophy course in AI ethics and policy, and two electives.

Engineering courses are offered by six of the College's eight schools: biomedical, chemical, electrical and computer, industrial and systems, materials, and mechanical engineering. Subjects range from robotics to biomedical AI to signal processing and more.

IAC courses cover topics that include machine learning for economics, language and computers, race and gender and digital media, and public policy.

Organizers developed several new courses to create the minor. Desai has already heard from other Georgia Tech colleges and schools about adding more classes, and he's excited to see many more undergraduate students at Georgia Tech benefit as the initiative expands.

"We wanted to create something that will improve the educational experience of our undergraduate students and make them more competitive in the marketplace," Desai said. "The current collection of courses also will make them stronger if they have a goal of starting their own businesses or creating devices. The minor really has a nice structure that welcomes other disciplines around the campus, and we look forward to them joining us in the future."

See the rest here:
Starting this Summer, Students Can Minor in Applications of Artificial Intelligence and Machine Learning - Georgia Tech College of Engineering

Read More..

AI Jobs: Your Gateway to Careers in AI and Machine Learning – KillerStartups

AI Jobs emerges as a pioneering force in the realm of employment within the rapidly evolving fields of Artificial Intelligence (AI), Machine Learning (ML), and Data Science. Founded in 2023 by seasoned entrepreneur Adam Krzywda, AI Jobs has quickly established itself as a critical resource for professionals seeking to navigate the burgeoning job market created by AI advancements.

Company Overview

Progress and Current Status

AI Jobs is already live and making significant strides in the industry by rapidly expanding its job listings, enhancing content, and integrating new features. The platform is dedicated to connecting job seekers with a wealth of opportunities in AI-related fields, including positions for AI developers, prompt engineers, AI testers, and internships.

Inspiring Story

The journey of AI Jobs is a testament to the tenacity required to thrive in the competitive job board space. Faced with the challenge of establishing a foothold amidst established giants, the AI Jobs team has engaged in relentless marketing and outreach efforts. This commitment to growth, despite the odds, underscores the startup's dedication to serving as a comprehensive resource for individuals passionate about AI and ML.

Future Outlook

In the next four years, AI Jobs aspires to become the quintessential online hub for AI, Data Science, and ML career opportunities. Alongside its extensive job listings, the platform aims to broaden its content offerings, including blogs, videos, and podcasts, thereby establishing itself as a major resource for AI-related content. Through continuous content production and strategic partnerships, AI Jobs is poised to achieve its goal of being a leading authority in the AI job market.

AI Jobs stands at the forefront of addressing the workforce needs generated by AI advancements, serving as a vital bridge between innovative companies and talented individuals eager to shape the future of technology.

Excerpt from:
AI Jobs: Your Gateway to Careers in AI and Machine Learning - KillerStartups

Read More..

AI in Finance: Machine Learning Models for Stock Price Predication and Auto-Trading – Clark University

Please join the Data Science Seminar this Wednesday for a talk by Dr. Timothy Li. Tim currently serves as principal data scientist and vice president on the Central Modeling Team of Citizens Bank. He will present two of his earlier AI projects in finance, which used innovative machine learning models for stock price prediction and auto-trading that outperformed conventional approaches.

Li will present two of his earlier AI finance projects. The first used a deep-learning long short-term memory model to forecast the alpha returns of approximately 1,000 stocks on both the Chinese A-stock market and the U.S. stock market. Based on predicted returns, the research team devised a new portfolio strategy that significantly outperformed a traditional model in both markets.

In the second project, Li and his fellow researchers developed an optimal portfolio execution system (OPEX), using a tree-based ensemble machine-learning model for automated stock trading to reduce the trading cost. The study demonstrated that the OPEX system could effectively reduce trading costs, with an estimated savings of approximately $35 million per year compared to a legacy linear model.

Dr. Timothy (Minghai) Li currently serves as a principal data scientist and vice president on the central modeling team of Citizens Bank. Prior to this role, he held positions as a senior data scientist at Fidelity Investments, Netbrain Tech Inc., and FIS. He earned his Ph.D. in physics from Boston University and conducted post-doctoral research in Professor Sharon Huo's group at Clark. He has authored or co-authored more than 20 publications covering topics in physics, materials science, chemistry, biology, and computer modeling and simulation. Presently, his focus lies in the application of advanced machine learning algorithms and generative AI to finance.

Read more here:
AI in Finance: Machine Learning Models for Stock Price Predication and Auto-Trading - Clark University

Read More..

Functional and structural reorganization in brain tumors: a machine learning approach using desynchronized functional … – Nature.com

Acquisition and usage of MRIs

A detailed explanation of the participants as well as the acquisition of the data is already available63; nonetheless, for the sake of transparency, we briefly present some crucial aspects. Subjects were asked to undergo MR scans in both pre- and post-surgery sessions. Out of the 36 subjects that agreed to take part in the pre-surgery session (11 healthy [58.6±10.6 years], 14 meningioma [60.4±12.3 years] and 11 glioma [47.5±11.3 years]), 28 were scanned after surgery (10 healthy [59.6±10.3 years], 12 meningioma [57.9±11.0 years] and 7 glioma [50.7±11.7 years]). The post-surgery scan session took place during the first medical consultation at the hospital after the surgical intervention (mean: 7.9 months, range: 5.2–10.7 months). There were no differences in the time intervals between the groups (meningioma [243±12 days], glioma [223±15 days], p=0.328, two-tailed U-test). As a result, 19 pre- and post-surgery pairs of structural connectomes were usable as training and testing data. All brain tumors were classified as grade I, II, or III according to the World Health Organization. All ethical regulations relevant to human research participants were followed63.

Each MR session consisted of a T1-MPRAGE anatomical scan (160 slices, TR = 1750 ms, TE = 4.18 ms, field of view = 256 mm, flip angle = 9°, voxel size 1×1×1 mm³, acquisition time of 4:05 min) followed by a multi-shell HARDI acquisition (60 slices, TR = 8700 ms, TE = 110 ms, field of view = 240 mm, voxel size 2.5×2.5×2.5 mm³, acquisition time of 15:14 min, 101–102 directions, b = 0, 700, 1200, 2800 s/mm²) together with two reversed phase-encoding b = 0 s/mm² blips to correct susceptibility-induced distortions64. Resting-state functional echo-planar imaging data were obtained (42 slices, TR = 2100 ms, TE = 27 ms, field of view = 192 mm, flip angle = 90°, voxel size 3×3×3 mm³, acquisition time of 6:24 min). The TR was accidentally changed to 2400 ms after 4 control subjects, 5 meningioma patients and 2 glioma patients had been scanned, changing the acquisition time to 7:19 min. For all the subsequent Fourier analyses, this TR mismatch was solved by adding zero padding and truncating the shorter time series to ensure that equivalent spectra were sampled by the Python methods (for further details see Supplementary Material).

Additionally, segmented lesions including the edema, non-enhancing, enhancing, and necrotic areas were available. Tumor masks were obtained with a combination of manual delineation, disconnectome63, and the Unified Segmentation with Lesion toolbox4. To identify the tumor core of gliomas, two clinicians with more than thirty and ten years of experience, respectively, performed and independently validated the segmentations using 3D Slicer. The data only allowed for the identification of the tumor cores; hence we subtracted the resulting cores from the whole lesion to obtain a non-necrotic region for each of the patients diagnosed with a glioma-like tumor.

High-resolution anatomical T1-weighted images were skull-stripped65, corrected for bias field inhomogeneities66, registered to MNI space67, and segmented into 5 tissue-type images68. Diffusion-weighted images suffer from many artifacts, all of which were appropriately corrected. Images were also skull-stripped65, corrected for susceptibility-induced distortions64, denoised69, freed from Gibbs ringing artifacts70 and corrected for eddy-current and motion artifacts71. The preprocessed images were then co-registered to their corresponding anatomical template (already in MNI space)67, resampled to a 1.5 mm³ voxel size and eventually corrected for bias field inhomogeneities66. After motion correction as well as registration to the MNI template, the B-matrix was appropriately rotated72.

Functional data were preprocessed with fMRIPrep73 and the eXtensible Connectivity Pipeline (XCP-D)74, two BIDS-compatible apps that perform all recommended processing steps to correct for distortion artifacts in functional data. Regression of the global signal has been shown to improve denoising in BOLD series without excessive loss of community structure75. In total, 36 nuisance regressors were selected from the nuisance confound matrices of the fMRIPrep output, which included six motion parameters, global signal, the mean white matter and mean CSF signals with their temporal derivatives, and the quadratic expansion of the six motion parameters, tissue signals and their temporal derivatives76. Volumes with framewise displacement higher than 0.3 mm were regressed out. Although smoothed time series were available, our analysis did not consider them. All specific steps were common to all subjects, both controls and brain tumor patients. All images (T1s, T1 segmentations, diffusion, lesion masks and functional) were eventually co-registered to MNI space for comparison.

BOLD signals belonging to the DMN were identified with the Gordon functional Parcellation77. More precisely, each one of the 41 regions classified as Default by the parcellation image was used as a binary mask to extract the time series from the functional image. For each subject (patient and control), the pair-wise Pearson correlation coefficient between time series was computed to obtain a functional connectivity matrix. The spatial overlap between DMNs and tumor masks was computed by summing all the voxels in the lesion mask belonging to one of these 41 regions. To normalize this score, we divided the resulting number by the number of voxels belonging to each one of the 41 regions labeled as Default. Note that, with this definition, an overlap of 1 would mean the presence of a tumor the size of the entire DMN.

$$\mathrm{Overlap}=\frac{\left|Tumor\cap DMN\right|}{\left|DMN\right|}$$

(1)

Moreover, the spatial distance between the center of mass of the tumor and the DMN was computed by averaging the Euclidean distances to the center of mass of each one of the DMN nodes.
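The computations above are straightforward to express in code. The following is a minimal sketch (not the authors' released code), assuming `ts` is a (41, T) array of DMN time series and that `tumor_mask`, `dmn_mask`, and the per-node masks are aligned 3D binary arrays:

```python
import numpy as np
from scipy.ndimage import center_of_mass

def functional_connectivity(ts):
    # Pairwise Pearson correlation between the 41 regional time series.
    return np.corrcoef(ts)

def dmn_overlap(tumor_mask, dmn_mask):
    # Eq. (1): |Tumor ∩ DMN| / |DMN|, computed on binary voxel masks.
    return np.logical_and(tumor_mask, dmn_mask).sum() / dmn_mask.sum()

def mean_distance_to_dmn(tumor_mask, node_masks):
    # Mean Euclidean distance from the tumor's center of mass to each node's.
    tumor_com = np.array(center_of_mass(tumor_mask))
    return np.mean([np.linalg.norm(tumor_com - np.array(center_of_mass(m)))
                    for m in node_masks])
```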

The DMN of the patients was compared to the mean of the healthy networks with two different metrics to assess (1) differences node-wise and (2) the Richness of the networks. Node similarity was assessed by computing the mean Pearson correlation between the same nodes in two different networks. For that, each row in the adjacency matrices was treated as a vector and compared with the same row of all matrices from the healthy subjects. After iterating through all nodes in the DMN, the mean and standard errors were computed for comparison. Furthermore, to assess the complexity of a given network, we computed the absolute difference between the distribution of correlations building the network and a uniform distribution30. We refer to this score as Richness:

$$\Theta=1-\frac{m}{2(m-1)}\sum_{\mu=1}^{m}\left|P_{\mu}\left(r_{ij}\right)-\frac{1}{m}\right|$$

(2)

where $m=15$ is the number of bins of the histogram estimating the distribution of correlations in the network, $P_{\mu}(r_{ij})$. Zamora-López and colleagues showed the robustness of the quantity in Eq. (2) with regard to the value of the parameter $m$. However, sensible choices range from 10 to 20 to ensure a sufficiently rich approximation of $P_{\mu}(r_{ij})$. The changes in richness across patients were obtained by computing the difference relative to the richness of the mean DMN obtained from control subjects: $\Delta\Theta=\Theta_{Patient}-\Theta_{Healthy}$.
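As a hedged illustration of Eq. (2), the Richness can be estimated in a few lines, assuming `fc` is a functional connectivity matrix whose off-diagonal entries are the correlations $r_{ij}$:

```python
import numpy as np

def richness(fc, m=15):
    # Off-diagonal correlations r_ij (upper triangle, excluding the diagonal).
    r = fc[np.triu_indices_from(fc, k=1)]
    # Histogram estimate of P_mu(r_ij) over m bins spanning [-1, 1].
    counts, _ = np.histogram(r, bins=m, range=(-1, 1))
    p = counts / counts.sum()
    # Theta = 1 for a uniform distribution, 0 when all mass is in one bin.
    return 1 - (m / (2 * (m - 1))) * np.abs(p - 1.0 / m).sum()
```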

A similar procedure was followed to study BOLD signals inside the lesioned tissue. For each patient, the binary mask containing the edema was used to extract the time series from the patient, as well as from all control subjects. Consequently, BOLD signals in lesioned regions of the brain were comparable to 11 healthy signals from the same region. No network was computable in this case, making the use of Eq. (2) pointless.

To compare time series between subjects, we computed the Real Fast Fourier Transform of the BOLD series. This allowed us to compare the power spectrum of two or more signals regardless of, for example, the dephasing between them. Let $A_{\omega}$ be the amplitude of the component with frequency $\omega$. Then, the total power of the signal can easily be obtained by summing the squared amplitudes of all the components:

$$P_{T}=\sum_{\forall\omega}\left|A_{\omega}\right|^{2}$$

(3)

With the Fourier decomposition, we could also characterize the power distribution of the signals as a function of frequency. Analogous to Eq. (3), we summed the squared amplitudes corresponding to frequencies inside a bin of width $\Delta\omega$:

$$P_{\omega_{c}}=\frac{100}{P_{T}}\cdot\sum_{\forall\omega\in[\omega_{c}-\Delta\omega,\,\omega_{c}]}\left|A_{\omega}\right|^{2}$$

(4)

Since each signal had a different $P_{T}$, to compare between subjects and/or regions, we divided the result by the total power $P_{T}$ and multiplied by 100 to make it a percentage. Arbitrarily, we chose the parameter $\Delta\omega$ for each subject so that each bin included 10% of the total power. The qualitative results did not depend on the exact choice of the bin width.

Similarly, we computed the cumulative power distribution $CP_{\omega}$ by summing all the squared amplitude coefficients up to a certain threshold. For consistency, we measured $CP_{\omega}$ as a percentage score and chose the thresholds to be multiples of exact percentages (i.e., $\omega'\propto 10\%, 20\%, \ldots$).

$$CP_{\omega_{c}}=\frac{100}{P_{T}}\cdot\sum_{\forall\omega\in[0,\omega_{c}]}\left|A_{\omega}\right|^{2}$$

(5)
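A minimal numpy sketch of Eqs. (3)-(5) follows, assuming `bold` is a one-dimensional BOLD series sampled at repetition time `tr` (in seconds); it illustrates the computation rather than reproducing the authors' code:

```python
import numpy as np

def power_spectrum(bold, tr):
    amps = np.fft.rfft(bold)                  # Real Fast Fourier Transform
    power = np.abs(amps) ** 2                 # |A_w|^2 per frequency component
    freqs = np.fft.rfftfreq(len(bold), d=tr)
    return freqs, power

def cumulative_power(bold, tr):
    freqs, power = power_spectrum(bold, tr)
    total = power.sum()                       # P_T, Eq. (3)
    cp = 100.0 * np.cumsum(power) / total     # CP_w as a percentage, Eq. (5)
    return freqs, cp
```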

Both the power distribution $P_{\omega}$ and the cumulative power distribution $CP_{\omega}$ can be used to compare dynamics between time series, but they have the inconvenience of not being scalar numbers. Furthermore, computing any distance-like metric (e.g., KL divergence) between these distributions across subjects would not yield any information on whether BOLD signals had slower dynamics (more power located in low frequencies) or the opposite (e.g., when comparing the DMN of a healthy subject and a patient).

To overcome this, we designed a DAS between time series based on the difference between two cumulative power distributions. It is worth noting that in the limit $\Delta\omega\to 0$, the summations in Eqs. (3), (4), and (5) become integrals, simplifying the following mathematical expressions. The DAS between two BOLD signals $i,j$ was computed as the area between the two cumulative power distributions:

$$\mathrm{DAS}\left(i,j\right)=\int d\omega\left(CP_{\omega}^{i}-CP_{\omega}^{j}\right)=-\mathrm{DAS}(j,i)$$

(6)

Finding a positive $DAS(i,j)$ would mean that time series $i$ had slower dynamics than time series $j$, since more power is accumulated in lower frequencies with respect to the total. Throughout this manuscript, DASs were defined as the difference in power distribution between patients and the healthy cohort. For a simplified and, hopefully, comprehensive example, we kindly refer the reader to Fig. S1. To characterize a specific DMN, all these measures were computed for each region separately and then averaged [mean±SEM]. As opposed to the Richness, the DAS was computable both for DMNs and tumors since it only requires two time series rather than a complete distribution. To compute absolute values of this score, the DAS for each region (or tumor) was made strictly positive. Only then was averaging between regions and subjects performed. Notably, these two operations are not interchangeable.

For the score defined in Eq. (6) to make sense, the Real Fast Fourier Transform of the time series needed to be computed using the same frequency intervals, which, in short, implied that the time duration of the signals needed to be equal. For functional images with different TRs, this was solved by adding zero-padding to the shortest signal to match the same time duration (Fig. S14). Further permutation analyses on a reduced subset of subjects with identical TRs confirmed the tendencies reported in the text (Fig. S15).
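Putting the two preceding points together, the sketch below computes the DAS of Eq. (6) from the `cumulative_power` helper above; for simplicity it assumes a common TR, with the zero-padding handling only the difference in series length:

```python
import numpy as np

def das(bold_i, bold_j, tr):
    # Zero-pad the shorter series so both spectra share one frequency grid.
    n = max(len(bold_i), len(bold_j))
    freqs, cp_i = cumulative_power(np.pad(bold_i, (0, n - len(bold_i))), tr)
    _, cp_j = cumulative_power(np.pad(bold_j, (0, n - len(bold_j))), tr)
    # Positive DAS(i, j): series i accumulates its power at lower frequencies,
    # i.e., it has slower dynamics than series j.
    return np.trapz(cp_i - cp_j, freqs)
```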

To ensure a detailed subject-specific network, we used a state-of-the-art pipeline to obtain brain graphs while at the same time not neglecting tracts inside lesioned regions of the brain (i.e., brain tumors). We combined two reconstruction methods, yielding two different tractograms and three connectivity matrices. Roughly, the first tractogram aims at reconstructing white matter fibers using non-contaminated diffusion signal, while the second one carefully assesses the presence of meaningful diffusion signal in perilesional and lesioned tissue. Later, for each tractogram, a personalized connectivity matrix can be obtained and combined to yield a unique abstraction of the brain in surgical contexts. A schematic workflow of the pipeline is in Fig. 3a, and a detailed account of the parameters is in Table 2.

The first branch of the method consisted of a well-validated set of steps to reconstruct the network without considering lesioned regions of the brain. To ensure this was the case, we used a binary brain mask that did not include the segmented lesion (i.e., we subtracted the lesion from the brain binary mask). This step was added for consistency with the logic of not tracking within the lesion. Nonetheless, the steps were repeated without this mask and the results were found to be almost identical (Fig. S6). This was expected, as multi-shell methods highly disregard cerebrospinal fluid contamination inside the lesion15. The lesion mask was added to the 5 tissue-type image to be considered as pathological tissue78. Within this mask, for each b-value shell and tissue type (white matter, gray matter, and cerebrospinal fluid), a response function was estimated79, and the fiber orientation distribution functions (FODs) were built and intensity-normalized using a multi-shell multi-tissue (MSMT) constrained spherical deconvolution approach80. Within the same binary mask excluding potentially damaged tissue, anatomically constrained whole-brain probabilistic tractography was performed using dynamic seeding, backtracking, and the iFOD2 algorithm68,81. The total number of streamlines was set to 8 million minus the number of streamlines intersecting the lesion (see below). We used spherical-deconvolution informed filtering to assign a weight to each generated streamline and assess its contribution to the initial diffusion image82. Finally, a healthy structural connectivity matrix was constructed by counting the number of weighted streamlines between each pair of regions of interest as delineated by the third version of the Automated Anatomical Labeling atlas83.

Next, to consider fiber bundles that originate in and traverse lesioned tissue, a recent method for reconstruction was used only in the segmented lesion17. The coined Single-Shell 3-Tissue Constrained Spherical Deconvolution (SS3T) algorithm uses only one diffusion shell and the unweighted b = 0 volumes. We used the shell with the highest gradient strength (i.e., b = 2800 s/mm²) as it offered the best white-matter contrast15,80. These FODs were reconstructed and intensity-normalized only inside the lesion mask using the same underlying response function as estimated earlier in the healthy tissue. We merged the reconstructed FODs with those previously obtained with the multi-shell algorithm (Fig. 3a, center). It is important to note that both images were in NIFTI format, co-registered, and non-overlapping, therefore making this step straightforward. Anatomical constraints were no longer suited since white and gray matter are compromised inside the lesion and in the perilesional tissue. Moreover, regardless of the FOD reconstruction procedure, the anatomical constraints caused fibers to stop around the edema since those surrounding voxels were (nearly) always segmented as gray matter (see Fig. S6). We used dynamic seeding only within the masked lesion and whole-brain probabilistic tractography with backtracking to estimate white-matter fibers within the whole-brain mask68,81. The number of streamlines was set as the average number of streamlines intersecting the lesion in the healthy cohort. We superimposed the lesion on the tractograms of each control subject and tallied the overlapping streamlines78. This was important given that each lesion was in a different location and the natural density of streamlines in that specific location differed. This subject-specific streamline count ensured that the tract densities were comparable to the healthy cases (Fig. 3b–e; see also Figs. S10–S13). Spherical-deconvolution informed filtering82 was applied to ensure that each streamline adequately contributed to the lesioned diffusion signal (i.e., filtering was applied inside the lesion mask). Then, a lesion structural connectivity matrix was constructed similarly to the previous case.

$$N_{\mathrm{streamlines\ in\ lesion}}=\frac{1}{N_{\mathrm{control}}}\sum_{i=1}^{N_{\mathrm{control}}}\sum_{\mathrm{streamline}=1}^{\mathrm{streamlines}}\begin{cases}1 & \mathrm{if\ streamline}\in\mathrm{Lesion}\\ 0 & \mathrm{otherwise}\end{cases}$$

(7)

Finally, we merged the two available connectivity matrices to reconstruct a lesioned structural brain network. To do so, we employed a greedy approach where we took the maximum connectivity strength for each pair of regions:

$$\omega_{ij}=\max\left(\omega_{ij}^{\mathrm{healthy}},\,\omega_{ij}^{\mathrm{lesion}}\right)$$

(8)
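Because the two matrices are aligned to the same atlas, the greedy merge of Eq. (8) reduces to an element-wise maximum. A one-line sketch, assuming `w_healthy` and `w_lesion` are the two weighted adjacency matrices:

```python
import numpy as np

def merge_connectomes(w_healthy, w_lesion):
    # Eq. (8): keep the stronger of the two estimates for every edge.
    return np.maximum(w_healthy, w_lesion)
```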

Thus, for each pre-operative scan, a total of 3 different connectivity matrices were available: the healthy connections, the (potentially) lesioned connections, and the full lesioned structural network. The networks from the control subjects and post-operative scans from patients were reconstructed using the usual multi-shell multi-tissue pipeline without the binary lesion-free mask but with the same parameters (see Table 2). Note that the third version of the Automated Anatomical Labeling atlas has 4 empty regions out of 170 to maximize compatibility with previous versions.

As suggested by previous works, guiding learning with healthy cohorts should be useful to inform predictions43,44,45. Brain graphs are notoriously heterogeneous when considering age-related differences. To take this into account, we selected subjects with significant age overlap between healthy subjects and patients for both tumor types. However, we did not consider sex segregation, since structural differences are rather unclear84; moreover, the sample size for each subgroup would have been severely reduced. We built a prior probability distribution of healthy links to guide the predictions using a thresholded average of the set of connections present in this healthy cohort (see Supplementary Material). This thresholded average allowed us to control for the inclusion (or exclusion) of spurious connections, while minimizing the false positive rate of connections85.

For each reconstructed network, a total of 13,695 normalized edges needed to be reconstructed, thus making the problem ill-posed. Nonetheless, as argued in the introduction, we hypothesized that a fully connected network adequately guided with anatomical information could capture essential properties (see Supplementary Material). We evaluated the model using Leave-One-Out Cross-Validation, therefore training and testing a total of 19 models, or 19 folds.

The high number of reconstructed fibers yielded high values for the connectivity between ROIs (~10³). To prevent numerical overflow as well as to enhance differences in weaker connections, all weights $\omega$ were normalized by computing $\log\left(1+\omega\right)$ before feeding them into the artificial deep neural network.

The model consisted of a deep neural network with one hidden layer, trained by minimizing the Mean Squared Error (MSE) between the output and the ground truth determined from the MRIs (see Supplementary Material). The weights were optimized using stochastic gradient descent with a learning rate of 0.01 for 100 epochs to avoid overfitting. Evaluation metrics included the Mean Absolute Error (MAE), Pearson Correlation Coefficient (PCC) and Cosine Similarity (CS) between the flattened predicted and ground-truth graphs. The topology of the generated networks was evaluated by computing the Kullback-Leibler and Jensen-Shannon divergences between the weight probability distributions of the generated and real graphs.

Leave-One-Out cross-validation was done using 18 connectomes to train each of the 19 models. For each model, the training data was randomly split into train (80%) and validation (20%) sets to prevent overfitting. Validation steps were run every 20 training epochs. For each fold, each model was tested on the left-out connectome (Table S1).
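The following PyTorch sketch illustrates this training setup under stated assumptions (the hidden-layer width is a placeholder, and the validation split is omitted for brevity; the paper's exact architecture is in its Supplementary Material): log(1 + w) normalization, a one-hidden-layer network trained with SGD on MSE, and a Leave-One-Out loop over the 19 connectomes:

```python
import numpy as np
import torch
import torch.nn as nn

def train_loocv(x_pre, y_post, hidden=1024, epochs=100, lr=0.01):
    # x_pre, y_post: (19, 13695) arrays of flattened connectome edge weights.
    X = torch.tensor(np.log1p(x_pre), dtype=torch.float32)
    Y = torch.tensor(np.log1p(y_post), dtype=torch.float32)
    n, d = X.shape
    preds = []
    for test in range(n):                      # 19 folds, one model per fold
        train_idx = [i for i in range(n) if i != test]
        model = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                              nn.Linear(hidden, d))
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(X[train_idx]), Y[train_idx])
            loss.backward()
            opt.step()
        with torch.no_grad():
            preds.append(model(X[test:test + 1]))  # predict left-out subject
    return torch.cat(preds)
```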

Statistical tests and p-value computations were done with SciPy's stats module and in-house permutation scripts. Unless stated otherwise, we used one-tailed hypotheses only when addressing the significance of strictly positive magnitudes, combined with non-parametric methods. Non-negative magnitudes cannot be tested for negative results and do not need to satisfy normality.

The Leave-One-Out cross-validation approach yielded a pool of 19 subjects that were independently tested. For each metric, we computed the z-score by subtracting the mean and dividing by the standard deviation of the sample. Despite verifying that all of them were normally distributed, we ran both parametric and non-parametric statistical tests due to the small sample size. Topological metrics were computed using the NetworkX Python library86. Since the brain graphs were weighted, we computed a weight probability distribution instead of the more common degree distribution (see Supplementary Material). To compare the weight probability distributions of two graphs, we computed the Kullback-Leibler as well as the Jensen-Shannon divergences. The Jensen-Shannon divergence has the advantage of being both symmetric and normalized between 0 and 1, therefore interpretable as a distance between two distributions (i.e., predicted vs. ground truth).
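As a small illustrative sketch (the bin count is an assumption), the weight probability distributions and their Jensen-Shannon divergence can be computed as follows:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def weight_js_divergence(w_pred, w_true, bins=50):
    # Shared histogram support so the two distributions are comparable.
    lo = min(w_pred.min(), w_true.min())
    hi = max(w_pred.max(), w_true.max())
    p, _ = np.histogram(w_pred, bins=bins, range=(lo, hi))
    q, _ = np.histogram(w_true, bins=bins, range=(lo, hi))
    # SciPy returns the JS distance (the square root of the divergence).
    return jensenshannon(p, q) ** 2
```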

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Continued here:
Functional and structural reorganization in brain tumors: a machine learning approach using desynchronized functional ... - Nature.com

Read More..

Accelerating industrialization of Machine Learning at BMW Group using the Machine Learning Operations (MLOps … – AWS Blog

The BMW Group and Amazon Web Services (AWS) announced a strategic collaboration in 2020. The goal of that collaboration is to help further accelerate the BMW Group's pace of innovation by placing data and analytics at the center of its decision-making.

The BMW Group's Cloud Data Hub (CDH) manages company-wide data and data solutions on AWS. The CDH provides BMW analysts and data scientists with access to data that helps drive business value through data analytics and machine learning (ML). As part of BMW's larger strategy to leverage the availability of data within the CDH and to help accelerate the industrialization of machine learning, the BMW Group worked closely with AWS Professional Services to develop its Machine Learning Operations (MLOps) solution.

The BMW Group's MLOps solution includes (1) a reference architecture, (2) reusable Infrastructure as Code (IaC) modules that use Amazon SageMaker and analytics services, (3) ML workflows using AWS Step Functions, and (4) a deployable MLOps template that covers the ML lifecycle from data ingestion to inference.

The MLOps solution supported the BMW Group in accelerating the industrialization of its AI/ML use cases, resulting in significant value generation within the first two years after the solution's release. The long-term goal of BMW's MLOps solution team is to help accelerate the industrialization of over 80% of the AI/ML use cases at the BMW Group, helping to enable continuous innovation and improvement in AI/ML at the BMW Group.

Starting in 2022, the MLOps solution has been rolled out to AI/ML use cases at the BMW Group. It has seen widespread adoption and is recognized as the BMW-internal master solution for MLOps.

In this blog, we talk about the BMW Group's MLOps solution, its reference architecture, high-level technical details, and its benefits to the AI/ML use case teams that develop and productionize ML models using the MLOps solution.

The MLOps solution has been developed to address the requirements of AI/ML use cases at the BMW Group. This includes integration with the BMW data lake (the CDH), as well as ML workflow orchestration, data and model lineage, and governance requirements such as compliance, networking, and data protection.

AWS Professional Services and the MLOps solution team from the BMW Group collaborated closely with various AI/ML use case teams to discover successful patterns and practices. This enabled AWS and the BMW Group's MLOps solution team to gain a comprehensive understanding of the technology stack, as well as the complexities involved in productionizing AI/ML use cases.

To meet the BMW Group's AI/ML use case requirements, the team worked backwards and developed the MLOps solution architecture shown in Figure 1 below.

Figure 1: MLOps Solution Architecture

In the section below, we explain the details of each component of the MLOps solution as represented in the MLOps solution architecture.

The MLOps template is a composition of IaC modules and ML workflows built using AWS managed services with a serverless-first strategy, designed to allow the BMW Group to use the scalability, reduced maintenance costs, and agility of ML on AWS. The template is deployed to the AWS account of an AI/ML use case to create an end-to-end, deployable ML and infrastructure pipeline. This is designed to act as a starting point for building AI/ML use cases at the BMW Group.

The MLOps template offers functional capabilities for the BMW Group's Data Scientists and ML Engineers ranging from data import, exploration, and training to deployment of the ML model for inference. It supports the operations of AI/ML use cases at the BMW Group by offering version control as well as infrastructure and ML monitoring capabilities.

The MLOps solution is designed to offer its functional and infrastructure capabilities as independent building blocks. AI/ML use case teams can adopt these capabilities as a whole or choose selected blocks to help the BMW Group meet its business goals.

Below is an overview of the MLOps template building blocks offered by the BMW Group's MLOps solution:

Figure 2: MLOps Template building blocks

The MLOps solution provides the BMW Group's Data Scientists and ML Engineers with example notebooks that help shorten their learning curve with AWS services.

The MLOps solution's training pipeline, developed using the AWS Step Functions Data Science Python SDK, consists of the steps required to train ML models, including data loading, feature engineering, model training, evaluation, and model monitoring.

Use case teams at the BMW Group have the flexibility to modify or expand the MLOps solution's training pipeline as required for their specific projects. Common customizations thus far have included parallel model training, simultaneous experiments, pre-production approval workflows, and monitoring and alert notifications via Amazon SNS integration.

The details of the MLOps solution's training pipeline steps are shown in Figure 3 below:

Figure 3: Training Pipeline

The MLOps solution employs AWS CodePipeline to facilitate continuous integration and deployment workflows. The AWS CodePipeline sourcing steps allow users at the BMW Group to select their preferred source control, such as AWS CodeCommit or GitHub Enterprise.

AI/ML use case teams at the BMW Group can use AWS CodePipeline to deploy the ML training pipeline, thereby bootstrapping the AWS infrastructure required to orchestrate it, from reading data from the BMW Group data lake (e.g., the CDH) to model training, evaluation, and ML model registration.

When the model training pipeline completes by registering the ML model in the Amazon SageMaker Model Registry, the MLOps solution uses Amazon EventBridge notifications to trigger AWS CodePipeline to deploy the inference module.
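The blog does not publish BMW's implementation, but a trigger like this is typically wired up as an EventBridge rule on SageMaker Model Registry events that targets a pipeline. A hedged boto3 sketch, with all names and ARNs as placeholders rather than BMW's actual resources:

```python
import json
import boto3

events = boto3.client("events")

# Fire when a model package is registered and approved in the Model Registry.
events.put_rule(
    Name="model-registered-trigger",
    EventPattern=json.dumps({
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {"ModelApprovalStatus": ["Approved"]},
    }),
)

# Start the inference deployment pipeline (placeholder ARNs).
events.put_targets(
    Rule="model-registered-trigger",
    Targets=[{
        "Id": "inference-pipeline",
        "Arn": "arn:aws:codepipeline:eu-central-1:123456789012:inference-deploy",
        "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-codepipeline",
    }],
)
```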

Around 80% of the AI/ML use cases at the BMW Group served by the MLOps solution require high-performance, high-throughput methods for transforming raw data and generating inferences from it. To meet these needs, the MLOps solution offers a batch inference pipeline with the steps required for users at the BMW Group to load and pre-process raw data, generate predictions, and monitor the predicted results for quality and explainability.

Along with the batch inference pipeline, AI/ML use case teams at the BMW Group are provided with the modules required to set up real-time inference in case they need low-latency predictions and API integration with external use case applications.

The details of the MLOps solution's batch inference pipeline steps are shown in Figure 4 below:

Figure 4: Inference Pipeline

The MLOps solution allows the BMW Group's AI/ML use case teams to bring their own application stack in addition to the set of modules offered as part of the MLOps solution. This helps AI/ML use cases at the BMW Group make the customizations their business and technical needs require.

The MLOps solution helped the BMW Group's AI/ML use cases build and deploy production-grade models, reducing overall time to market by approximately 75%. The MLOps solution also offers the BMW Group a broad range of additional benefits.

Learn more about BMW's Cloud Data Hub (CDH) in this blog post, explore AWS offerings at the AWS for Automotive page, or contact your AWS team today.

See the original post:
Accelerating industrialization of Machine Learning at BMW Group using the Machine Learning Operations (MLOps ... - AWS Blog

Read More..

Getting Machine Learning Projects from Idea to Execution – The Machine Learning Times

Humanity's latest, greatest invention is stalling right out of the gate. Machine learning projects have the potential to help us navigate our most significant risks, including wildfires, climate change, pandemics, and child abuse. ML can boost sales, cut costs, prevent fraud, streamline manufacturing, and strengthen health care.

But ML initiatives routinely fail to deliver returns or fail to deploy entirely. They stall before deploying, and at great cost. One of the major issues is that companies tend to focus more on the technology than on how it should be deployed. This is like being more excited about the development of a rocket than its launch.

In this article, I offer an antidote: a six-step practice for ushering machine learning projects from conception to deployment that I call bizML. This framework is an effort to establish an updated, industry-standard playbook for running successful ML projects that is pertinent and compelling to both business professionals and data professionals.

ML's problem is in its popularity. For all the hoopla about the core technology, the gritty details of how its deployment improves business operations are often glossed over. In this way, ML is now too hot for its own good. After decades of consulting and running ML conferences, the lesson has sunk in.

Today's hype about ML is overzealous because it feeds a common misconception: the ML fallacy. It goes like this: Since ML algorithms can successfully generate models that hold up for new, unseen situations (which is both amazing and true), their models are intrinsically valuable (which is not necessarily true). The value of ML comes only when it creates organizational change, that is, when an ML-generated model is deployed to actively improve operations. Until a model is used to actively reshape how your organization works, it's use-less, literally. A model doesn't solve any business problems on its own, and it ain't gonna deploy itself. ML can be the disruptive technology it's cracked up to be, but only if you disrupt with it.

Unfortunately, businesses often fail to bridge the business/tech culture gap, a disconnect between data scientists and business stakeholders that precludes deployment and leads to models collecting dust. On the one hand, data scientists, who perform the model development step, fixate solely on data science and generally prefer not to be bothered with mundane managerial activities. Often, they take the deployment of their model for granted and jump past a rigorous business process that would engage stakeholders to collaboratively plan for deployment.

On the other hand, many business professionals, especially those already inclined to forgo the particulars as too technical, have been seduced into seeing this stunning technology as a panacea that solves problems on its own. They defer to data scientists for any project specifics. But when they're ultimately faced with the operational change that a deployed model would incur, it's a tough sell. Taken off guard, the stakeholder hesitates before altering operations that are key to the company's profitability.

With no one taking proactive ownership, the hose and the faucet fail to connect. Far too often, the data scientist delivers a viable model, but the operational team isn't ready for the pass and they drop the ball. There are wonderful exceptions and glowing successes, but the generally poor track record we witness today forewarns of broad disillusionment with ML, even a dreaded AI winter.

The remedy is to rigorously plan for deployment from the inception of each ML project. Laying the groundwork for the operational change that deployment would bring to fruition takes more preaching, socializing, cross-disciplinary collaboration, and change-management panache than many, including myself, initially realized.

To accomplish this, a knowledgeable team must collaboratively follow an end-to-end practice that begins by planning backward from deployment. As I mentioned above, I call this practice bizML, and it consists of the following six steps.

Step 1: Define the business value proposition: how ML will affect operations in order to improve them (i.e., operationalization or implementation).

Example: UPS predicts which destination addresses will receive a package delivery in order to plan a more efficient delivery process.

Define what the ML model will predict for each individual case. Each detail matters from a business perspective.

Example: For each destination, how many packages across how many stops will be required tomorrow? For example, a group of three office buildings with 24 business suites at 123 Main St. will require two stops with three packages each by 8:30 a.m.

Step 3: Determine the salient benchmarks to track during both model training and model deployment, and determine what performance level must be achieved for the project to be considered a success.

Examples: Miles driven, gallons of fuel consumed, tons of carbon emitted, and stops-per-mile (the more densely a route is packed with deliveries, the more value is generated from each mile of driving).

Step 4: Define what the training data must look like and get it into that form.

Example: Assemble a large number of positive and negative examples from which to learn: both destinations that did receive deliveries on certain days and others that did not.

Step 5: Generate a predictive model from the data. The model is the thing that's learned.

Examples: decision trees, logistic regression, neural networks, and ensemble models.

Step 6: Use the model to render predictive scores (probabilities), thereby applying what's been learned to new cases, and then act on those scores to improve business operations.

Example: By accounting for predicted packages along with known packages, UPS improved its system that assigns packages to delivery trucks at shipping centers. This improvement annually saves an estimated 18.5 million miles, $35 million, 800,000 gallons of fuel, and 18,500 metric tons of emissions.

These six steps define a business practice that charts a shrewd path to ML deployment. Anyone who wishes to participate in ML projects must be familiar with them, no matter whether they're in a business or technical role.

After culminating with step 6, deployment, you have finished starting something new. BizML only begins an ongoing journey: a new phase of running improved operations and of keeping things working. Once launched, a model requires upkeep: monitoring it, maintaining it, and periodically refreshing it.

Following these six steps in this order is almost a logical inevitability. To understand why, let's start with the end. The final two culminating steps, steps 5 and 6, are the two main steps of ML: model training and deployment. BizML ushers the project through to their completion.

The step just before those two, step 4 (prepare the data), is a known requirement that always precedes model training. You must provide ML software with data in the right form in order for it to work. That step has always been an integral part of modeling projects, ever since linear regression was first applied by businesses in the 1960s.

Before the technical magic, you must perform business magic. That's where the first three steps come in. They establish a greatly needed preproduction phase of pitching, socializing, and collaborating in order to jointly agree on how ML will be deployed and how its performance will be evaluated. Importantly, these first steps go much further than only agreeing on a project's business objective. They ask business professionals to dive into the mechanics that define exactly how predictions will alter operations, and they ask data scientists to reach beyond their usual sphere and work closely with business-side personnel. This cross-disciplinary team is uniquely equipped to navigate to a deployment plan that is both technically feasible and operationally viable.

Following all six steps of the bizML practice is uncommon, but hardly unheard of. Many ML projects succeed wildly, even if they're in the minority. While a well-known, established framework has been a long time coming, the ideas at the heart of the bizML framework are not new to many experienced data scientists.

And yet the folks who need it most, business leaders and other business stakeholders, are least likely to be familiar with it. In fact, the business world in general has yet to become aware of even the need for a specialized business practice in the first place. This is understandable, since the common narrative leads them astray. AI is often oversold as an impenetrable yet exciting cure-all. Meanwhile, many data scientists far prefer to crunch numbers than to take pains to elucidate.

First things first: Business professionals need some edification. Before those in charge can participate in the bizML practice and, ultimately, green-light model deployment with confidence, they must gain a concrete understanding of how an ML project works from end to end: What will the model predict? Precisely how will those predictions affect operations? Which metric meaningfully tracks how well it predicts? And what kind of data is needed? This isn't the rocket science part, but it's still a modest book's worth.

Considering the innumerable dollars and resources pumped into ML, how much more potential value could we capture by adopting a universal procedure that facilitates the collaboration and planning needed to reach deployment? Lets find out.

This article is adapted from the book, The AI Playbook: Mastering the Rare Art of Machine Learning Deployment, with permission from the publisher, MIT Press. It is a product of the author's work while he held a one-year position as the Bodily Bicentennial Professor in Analytics at the UVA Darden School of Business.

View original post here:
Getting Machine Learning Projects from Idea to Execution - The Machine Learning Times

Read More..

Can language models read the genome? This one decoded mRNA to make better vaccines. – EurekAlert

Image: Machine learning expert Mengdi Wang, in partnership with biotech startup RVAC Medicines, has developed a language model that used its powers of semantic representation to design a more effective mRNA vaccine for COVID-19. Credit: Sameer A. Khan/Fotobuddy

The same class of artificial intelligence that made headlines coding software and passing the bar exam has learned to read a different kind of text: the genetic code.

That code contains instructions for all of life's functions and follows rules not unlike those that govern human languages. Each sequence in a genome adheres to an intricate grammar and syntax, the structures that give rise to meaning. Just as changing a few words can radically alter the impact of a sentence, small variations in a biological sequence can make a huge difference in the forms that sequence encodes.

Now Princeton University researchers led by machine learning expert Mengdi Wang are using language models to home in on partial genome sequences and optimize those sequences to study biology and improve medicine. And those efforts are already underway.

In a paper published April 5 in the journal Nature Machine Intelligence, the authors detail a language model that used its powers of semantic representation to design a more effective mRNA vaccine, such as those used to protect against COVID-19.

Found in Translation

Scientists have a simple way to summarize the flow of genetic information. They call it the central dogma of biology. Information moves from DNA to RNA to proteins. Proteins create the structures and functions of living cells.

Messenger RNA, or mRNA, converts the information into proteins in that final step, called translation. But mRNA is interesting: only part of it holds the code for the protein. The rest is not translated but controls vital aspects of the translation process.

Governing the efficiency of protein production is a key mechanism by which mRNA vaccines work. The researchers focused their language model there, on the untranslated region, to see how they could optimize efficiency and improve vaccines.

After training the model on a small variety of species, the researchers generated hundreds of new optimized sequences and validated the results through lab experiments. The best sequences outperformed several leading benchmarks for vaccine development, including a 33% increase in the overall efficiency of protein production.

Increasing protein production efficiency by even a small amount provides a major boost for emerging therapeutics, according to the researchers. Beyond COVID-19, mRNA vaccines promise to protect against many infectious diseases and cancers.

Wang, a professor of electrical and computer engineering and the principal investigator in this study, said the model's success also pointed to a more fundamental possibility. Trained on mRNA from a handful of species, it was able to decode nucleotide sequences and reveal something new about gene regulation. Scientists believe gene regulation, one of life's most basic functions, holds the key to unlocking the origins of disease and disorder. Language models like this one could provide a new way to probe it.

Wang's collaborators include researchers from the biotech firm RVAC Medicines as well as the Stanford University School of Medicine.

The Language of Disease

The new model differs in degree, not kind, from the large language models that power today's AI chatbots. Instead of being trained on billions of pages of text from the internet, their model was trained on a few hundred thousand sequences. The model was also trained to incorporate additional knowledge about the production of proteins, including structural and energy-related information.

The research team used the trained model to create a library of 211 new sequences. Each was optimized for a desired function, primarily an increase in the efficiency of translation. The resulting proteins, like the spike protein targeted by COVID-19 vaccines, drive the immune response to infectious disease.

Previous studies have created language models to decode various biological sequences, including proteins and DNA, but this was the first language model to focus on the untranslated region of mRNA. In addition to a boost in overall efficiency, it was also able to predict how well a sequence would perform at a variety of related tasks.

Wang said the real challenge in creating this language model was in understanding the full context of the available data. Training a model requires not only the raw data with all its features but also the downstream consequences of those features. If a program is designed to filter spam from email, each email it trains on would be labeled "spam" or "not spam." Along the way, the model develops semantic representations that allow it to determine what sequences of words indicate a spam label. Therein lies the meaning.

Wang said looking at one narrow dataset and developing a model around it was not enough to be useful for life scientists. She needed to do something new. Because this model was working at the leading edge of biological understanding, the data she found was all over the place.

"Part of my dataset comes from a study where there are measures for efficiency," Wang said. "Another part of my dataset comes from another study [that] measured expression levels. We also collected unannotated data from multiple resources." Organizing those parts into one coherent and robust whole, a multifaceted dataset that she could use to train a sophisticated language model, was a massive challenge.

"Training a model is not only about putting together all those sequences, but also putting together sequences with the labels that have been collected so far. This had never been done before."

The paper, "A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions," was published in Nature Machine Intelligence. Additional authors include Dan Yu, Yupeng Li, Yue Shen and Jason Zhang from RVAC Medicines; Le Cong from Stanford; and Yanyi Chu and Kaixuan Huang from Princeton.

Nature Machine Intelligence

Experimental study

Cells

A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions

5-Apr-2024


Read more:
Can language models read the genome? This one decoded mRNA to make better vaccines. - EurekAlert

Read More..

Artificial Intelligence and Machine Learning in Clinical Data Management: Opportunities and Ethical Considerations – PharmiWeb.com


Originally posted here:
Artificial Intelligence and Machine Learning in Clinical Data Management: Opportunities and Ethical Considerations - PharmiWeb.com

Read More..

When Students Get Lost in the Algorithm: The Problems with Nevada’s AI School Funding Experiment – New America

Nevada is not going to be the last state to integrate big data and predictive analytics into its school funding formula, so we need to think critically about how best to deploy these tools. There is value in efforts to use these resources to better target high-need students, and education funding models can certainly take full advantage of the growing data resources available. However, if school funding, and by extension the opportunities available to students, is directly linked to the outputs of a machine learning model, then that model must be designed with transparency, equity, and accountability in mind from the start. The methodology must be in the public domain so that it can be evaluated for how fairly and how well it actually supports high-need students, which should include students from low-income backgrounds as well as those with other needs and challenges. Nevada's new policy falls short on all these fronts, leaving too many students out of the equation.

See more here:
When Students Get Lost in the Algorithm: The Problems with Nevada's AI School Funding Experiment - New America

Read More..

18 of the best large language models in 2024 – TechTarget

Large language models are the dynamite behind the generative AI boom of 2023. However, they've been around for a while.

LLMs are black box AI systems that use deep learning on extremely large datasets to understand and generate new text. Modern LLMs began taking shape in 2014 when the attention mechanism -- a machine learning technique designed to mimic human cognitive attention -- was introduced in a research paper titled "Neural Machine Translation by Jointly Learning to Align and Translate." In 2017, that attention mechanism was honed with the introduction of the transformer model in another paper, "Attention Is All You Need."
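To make the attention mechanism concrete, here is a minimal sketch of scaled dot-product attention, the operation at the heart of the transformer model, written in plain NumPy. The function name and toy shapes are illustrative choices on our part, not code from either paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (seq_len, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # how strongly each query attends to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                        # weighted sum of the value vectors

# Toy example: 3 tokens with 4-dimensional representations
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```

Each output row is a blend of the value vectors, weighted by how relevant each token is to the query: the "cognitive attention" the paragraph above describes.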

Some of the most well-known language models today are based on the transformer model, including the generative pre-trained transformer series of LLMs and bidirectional encoder representations from transformers (BERT).

ChatGPT, which runs on a set of language models from OpenAI, attracted more than 100 million users just two months after its release in 2022. Since then, many competing models have been released. Some belong to big companies such as Google and Microsoft; others are open source.

Constant developments in the field can be difficult to keep track of. Here are some of the most influential models, past and present: those that paved the way for today's leaders as well as those that could have a significant effect in the future.

Below are some of the most relevant large language models today. They drive advances in natural language processing and influence the architecture of future models.

BERT is a family of LLMs that Google introduced in 2018. BERT is a transformer-based model that can convert sequences of data to other sequences of data. BERT's architecture is a stack of transformer encoders and features 342 million parameters. BERT was pre-trained on a large corpus of data and then fine-tuned for specific tasks such as natural language inference and sentence text similarity. It was used to improve query understanding in the 2019 iteration of Google search.
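For readers who want to see the pre-train-then-use pattern in practice, here is a minimal sketch of loading a pre-trained BERT encoder with the Hugging Face transformers library and extracting contextual embeddings. The library, its PyTorch backend and the bert-base-uncased checkpoint are assumptions on our part, not part of Google's original release.

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Language models encode text bidirectionally.",
                   return_tensors="pt")
outputs = model(**inputs)

# One contextual embedding per input token: (batch, tokens, hidden size)
print(outputs.last_hidden_state.shape)
```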

The Claude LLM focuses on constitutional AI, which shapes AI outputs according to a set of principles intended to keep the AI assistant it powers helpful, harmless and accurate. Claude was created by the company Anthropic. The latest iteration of the Claude LLM is Claude 3.0.

Cohere is an enterprise AI platform that provides several LLMs including Command, Rerank and Embed. These LLMs can be custom-trained and fine-tuned to a specific company's use case. The company that created the Cohere LLM was founded by one of the authors of Attention Is All You Need. One of Cohere's strengths is that it is not tied to one single cloud -- unlike OpenAI, which is bound to Microsoft Azure.

Ernie is Baidu's large language model, which powers the Ernie 4.0 chatbot. The bot was released in August 2023 and has garnered more than 45 million users. Ernie is rumored to have 10 trillion parameters. The bot works best in Mandarin but is capable in other languages.

Falcon 40B is a transformer-based, causal decoder-only model developed by the Technology Innovation Institute. It is open source and was trained on English data. The model is available in two smaller variants as well: Falcon 1B and Falcon 7B (1 billion and 7 billion parameters). Amazon has made Falcon 40B available on Amazon SageMaker. It is also available for free on GitHub.

Gemini is Google's family of LLMs that power the company's chatbot of the same name. The model replaced Palm in powering the chatbot, which was rebranded from Bard to Gemini upon the model switch. Gemini models are multimodal, meaning they can handle images, audio and video as well as text. Gemini is also integrated in many Google applications and products. It comes in three sizes -- Ultra, Pro and Nano. Ultra is the largest and most capable model, Pro is the mid-tier model and Nano is the smallest model, designed for efficiency with on-device tasks. Gemini outperforms GPT-4 on most evaluated benchmarks.

Gemma is a family of open-source language models from Google that were trained on the same resources as Gemini. Gemma comes in two sizes -- a 2 billion parameter model and a 7 billion parameter model. Gemma models can be run locally on a personal computer, and surpass similarly sized Llama 2 models on several evaluated benchmarks.

GPT-3 is OpenAI's large language model with more than 175 billion parameters, released in 2020. GPT-3 uses a decoder-only transformer architecture. In September 2020, Microsoft announced it had exclusive use of GPT-3's underlying model. GPT-3 is more than 100 times larger than its predecessor, GPT-2. GPT-3's training data includes Common Crawl, WebText2, Books1, Books2 and Wikipedia.

GPT-3 is the last of the GPT series of models in which OpenAI made the parameter counts publicly available. The GPT series was first introduced in 2018 with OpenAI's paper "Improving Language Understanding by Generative Pre-Training."

GPT-3.5 is an upgraded version of GPT-3 with fewer parameters. GPT-3.5 was fine-tuned using reinforcement learning from human feedback. GPT-3.5 is the version of GPT that powers ChatGPT. There are several models, with GPT-3.5 turbo being the most capable, according to OpenAI. GPT-3.5's training data extends to September 2021.

It was also integrated into the Bing search engine but has since been replaced with GPT-4.

GPT-4 is the largest model in OpenAI's GPT series, released in 2023. Like the others, it's a transformer-based model. Unlike the others, its parameter count has not been released to the public, though there are rumors that the model has more than 170 trillion parameters. OpenAI describes GPT-4 as a multimodal model, meaning it can process and generate both language and images as opposed to being limited to only language. GPT-4 also introduced a system message, which lets users specify tone of voice and task.
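Here is a hedged sketch of what that system message looks like in use, written against the openai Python package's v1-style client. The model name, message contents and environment setup are illustrative assumptions, not an official OpenAI recipe.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # The system message sets tone and task before any user input arrives
        {"role": "system", "content": "You are a concise technical editor."},
        {"role": "user", "content": "Explain the transformer in one sentence."},
    ],
)
print(response.choices[0].message.content)
```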

GPT-4 demonstrated human-level performance in multiple academic exams. At the model's release, some speculated that GPT-4 came close to artificial general intelligence (AGI), which means it is as smart or smarter than a human. GPT-4 powers Microsoft Bing search, is available in ChatGPT Plus and will eventually be integrated into Microsoft Office products.

LaMDA (Language Model for Dialogue Applications) is a family of LLMs developed by Google Brain and announced in 2021. LaMDA used a decoder-only transformer language model and was pre-trained on a large corpus of text. In 2022, LaMDA gained widespread attention when then-Google engineer Blake Lemoine went public with claims that the program was sentient. It was built on the Seq2Seq architecture.

Large Language Model Meta AI (Llama) is Meta's LLM released in 2023. The largest version is 65 billion parameters in size. Llama was originally released to approved researchers and developers but is now open source. Llama comes in smaller sizes that require less computing power to use, test and experiment with.

Llama uses a transformer architecture and was trained on a variety of public data sources, including webpages from CommonCrawl, GitHub, Wikipedia and Project Gutenberg. Llama was effectively leaked and spawned many descendants, including Vicuna and Orca.

Mistral is a 7 billion parameter language model that outperforms Llama models of a similar size on all evaluated benchmarks. Mistral also has a fine-tuned variant that is specialized to follow instructions. Its smaller size enables self-hosting and competent performance for business purposes. It was released under the Apache 2.0 license.
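Because self-hosting is one of Mistral's selling points, here is a minimal sketch of running the instruction-tuned variant locally via Hugging Face transformers. The checkpoint name, prompt format and hardware setup are assumptions on our part; sufficient GPU or CPU memory is required.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" spreads the weights across available hardware
# (this also assumes the accelerate package is installed)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "[INST] What does it mean for a model to be self-hosted? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```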

Orca was developed by Microsoft and has 13 billion parameters, meaning it's small enough to run on a laptop. It aims to improve on advancements made by other open source models by imitating the reasoning procedures achieved by LLMs. Orca achieves the same performance as GPT-4 with significantly fewer parameters and is on par with GPT-3.5 for many tasks. Orca is built on top of the 13 billion parameter version of LLaMA.

The Pathways Language Model (PaLM) is a 540 billion parameter transformer-based model from Google powering its AI chatbot Bard. It was trained across multiple TPU v4 Pods -- Google's custom hardware for machine learning. PaLM specializes in reasoning tasks such as coding, math, classification and question answering. PaLM also excels at decomposing complex tasks into simpler subtasks.

PaLM gets its name from a Google research initiative to build Pathways, ultimately creating a single model that serves as a foundation for multiple use cases. There are several fine-tuned versions of PaLM, including Med-PaLM 2 for life sciences and medical information as well as Sec-PaLM for cybersecurity deployments to speed up threat analysis.

Phi-1 is a transformer-based language model from Microsoft. At just 1.3 billion parameters, Phi-1 was trained for four days on a collection of textbook-quality data. Phi-1 is an example of a trend toward smaller models trained on better quality data and synthetic data.

"We'll probably see a lot more creative scaling down work: prioritizing data quality and diversity over quantity, a lot more synthetic data generation, and small but highly capable expert models," wrote Andrej Karpathy, former director of AI at Tesla and OpenAI employee, in a tweet.

Phi-1 specializes in Python coding and has fewer general capabilities because of its smaller size.

StableLM is a series of open source language models developed by Stability AI, the company behind image generator Stable Diffusion. There are 3 billion and 7 billion parameter models available and 15 billion, 30 billion, 65 billion and 175 billion parameter models in progress at time of writing. StableLM aims to be transparent, accessible and supportive.

Vicuna is another influential open source LLM derived from Llama. It was developed by LMSYS and was fine-tuned using data from sharegpt.com. It is smaller and less capable than GPT-4 according to several benchmarks, but does well for a model of its size. Vicuna has only 33 billion parameters, whereas GPT-4 is rumored to have trillions.

Although LLMs are a recent phenomenon, their precursors go back decades. Learn how recent precursor Seq2Seq and distant precursor ELIZA set the stage for modern LLMs.

Seq2Seq is a deep learning approach used for machine translation, image captioning and natural language processing. It was developed by Google and underlies some of the company's modern LLMs, including LaMDA. Seq2Seq also underlies AlexaTM 20B, Amazon's large language model. It uses a mix of encoders and decoders.

Eliza was an early natural language processing program created in 1966. It is one of the earliest examples of a language model. Eliza simulated conversation using pattern matching and substitution. Running a certain script, Eliza could parody the interaction between a patient and a therapist by applying weights to certain keywords and responding to the user accordingly. The creator of Eliza, Joseph Weizenbaum, wrote a book on the limits of computation and artificial intelligence.
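To illustrate the pattern-matching-and-substitution idea, here is a toy Eliza-style exchange in Python. The rules and replies are simplified inventions for this sketch, not Weizenbaum's original script.

```python
import re

# Each rule: a keyword pattern and a response template for its captured text
RULES = [
    (re.compile(r"\bI am (.+)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.I), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.I), "Tell me more about your {0}."),
]

def respond(text):
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return "Please go on."  # default reply when no keyword matches

print(respond("I am worried about my exams"))
```

The first matching rule wins, and the captured phrase is substituted back into a canned reply, which is essentially all the "understanding" Eliza had.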

View original post here:
18 of the best large language models in 2024 - TechTarget

Read More..