Large language models are the dynamite behind the generative AI boom of 2023. However, they've been around for a while.
LLMs are black box AI systems that use deep learning on extremely large datasets to understand and generate new text. Modern LLMs began taking shape in 2014 when the attention mechanism -- a machine learning technique designed to mimic human cognitive attention -- was introduced in a research paper titled "Neural Machine Translation by Jointly Learning to Align and Translate." In 2017, that attention mechanism was honed with the introduction of the transformer model in another paper, "Attention Is All You Need."
Some of the most well-known language models today are based on the transformer model, including the generative pre-trained transformer series of LLMs and bidirectional encoder representations from transformers (BERT).
ChatGPT, which runs on a set of language models from OpenAI, attracted more than 100 million users just two months after its release in 2022. Since then, many competing models have been released. Some belong to big companies such as Google and Microsoft; others are open source.
Constant developments in the field can be difficult to keep track of. Here are some of the most influential models, both past and present. Included in it are models that paved the way for today's leaders as well as those that could have a significant effect in the future.
Below are some of the most relevant large language models today. They do natural language processing and influence the architecture of future models.
BERT is a family of LLMs that Google introduced in 2018. BERT is a transformer-based model that can convert sequences of data to other sequences of data. BERT's architecture is a stack of transformer encoders and features 342 million parameters. BERT was pre-trained on a large corpus of data then fine-tuned to perform specific tasks along with natural language inference and sentence text similarity. It was used to improve query understanding in the 2019 iteration of Google search.
The Claude LLM focuses on constitutional AI, which shapes AI outputs guided by a set of principles that help the AI assistant it powers helpful, harmless and accurate. Claude was created by the company Anthropic. The latest iteration of the Claude LLM is Claude 3.0.
Cohere is an enterprise AI platform that provides several LLMs including Command, Rerank and Embed. These LLMs can be custom-trained and fine-tuned to a specific companys use case. The company that created the Cohere LLM was founded by one of the authors of Attention Is All You Need. One of Coheres strengths is that it is not tied to one single cloud -- unlike OpenAI, which is bound to Microsoft Azure.
Ernie is Baidus large language model which powers the Ernie 4.0 chatbot. The bot was released in August 2023 and has garnered more than 45 million users. Ernie is rumored to have 10 trillion parameters. The bot works best in Mandarin but is capable in other languages.
Falcon 40B is a transformer-based, causal decoder-only model developed by the Technology Innovation Institute. It is open source and was trained on English data. The model is available in two smaller variants as well: Falcon 1B and Falcon 7B (1 billion and 7 billion parameters). Amazon has made Falcon 40B available on Amazon SageMaker. It is also available for free on GitHub.
Gemini is Google's family of LLMs that power the company's chatbot of the same name. The model replaced Palm in powering the chatbot, which was rebranded from Bard to Gemini upon the model switch. Gemini models are multimodal, meaning they can handle images, audio and video as well as text. Gemini is also integrated in many Google applications and products. It comes in three sizes -- Ultra, Pro and Nano. Ultra is the largest and most capable model, Pro is the mid-tier model and Nano is the smallest model, designed for efficiency with on-device tasks. Gemini outperforms GPT-4 on most evaluated benchmarks.
Gemma is a family of open-source language models from Google that were trained on the same resources as Gemini. Gemma comes in two sizes -- a 2 billion parameter model and a 7 billion parameter model. Gemma models can be run locally on a personal computer, and surpass similarly sized Llama 2 models on several evaluated benchmarks.
GPT-3 is OpenAI's large language model with more than 175 billion parameters, released in 2020. GPT-3 uses a decoder-only transformer architecture. In September 2022, Microsoft announced it had exclusive use of GPT-3's underlying model. GPT-3 is 10 times larger than its predecessor. GPT-3's training data includes Common Crawl, WebText2, Books1, Books2 and Wikipedia.
GPT-3 is the last of the GPT series of models in which OpenAI made the parameter counts publicly available. The GPT series was first introduced in 2018 with OpenAI's paper "Improving Language Understanding by Generative Pre-Training."
GPT-3.5 is an upgraded version of GPT-3 with fewer parameters. GPT-3.5 was fine-tuned using reinforcement learning from human feedback. GPT-3.5 is the version of GPT that powers ChatGPT. There are several models, with GPT-3.5 turbo being the most capable, according to OpenAI. GPT-3.5's training data extends to September 2021.
It was also integrated into the Bing search engine but has since been replaced with GPT-4.
GPT-4 is the largest model in OpenAI's GPT series, released in 2023. Like the others, it's a transformer-based model. Unlike the others, its parameter count has not been released to the public, though there are rumors that the model has more than 170 trillion. OpenAI describes GPT-4 as a multimodal model, meaning it can process and generate both language and images as opposed to being limited to only language. GPT-4 also introduced a system message, which lets users specify tone of voice and task.
GPT-4 demonstrated human-level performance in multiple academic exams. At the model's release, some speculated that GPT-4 came close to artificial general intelligence (AGI), which means it is as smart or smarter than a human. GPT-4 powers Microsoft Bing search, is available in ChatGPT Plus and will eventually be integrated into Microsoft Office products.
Lamda (Language Model for Dialogue Applications) is a family of LLMs developed by Google Brain announced in 2021. Lamda used a decoder-only transformer language model and was pre-trained on a large corpus of text. In 2022, LaMDA gained widespread attention when then-Google engineer Blake Lemoine went public with claims that the program was sentient. It was built on the Seq2Seq architecture.
Large Language Model Meta AI (Llama) is Meta's LLM released in 2023. The largest version is 65 billion parameters in size. Llama was originally released to approved researchers and developers but is now open source. Llama comes in smaller sizes that require less computing power to use, test and experiment with.
Llama uses a transformer architecture and was trained on a variety of public data sources, including webpages from CommonCrawl, GitHub, Wikipedia and Project Gutenberg. Llama was effectively leaked and spawned many descendants, including Vicuna and Orca.
Mistral is a 7 billion parameter language model that outperforms Llama's language model of a similar size on all evaluated benchmarks. Mistral also has a fine-tuned model that is specialized to follow instructions. Its smaller size enables self-hosting and competent performance for business purposes. It was released under the Apache 2.0 license.
Orca was developed by Microsoft and has 13 billion parameters, meaning it's small enough to run on a laptop. It aims to improve on advancements made by other open source models by imitating the reasoning procedures achieved by LLMs. Orca achieves the same performance as GPT-4 with significantly fewer parameters and is on par with GPT-3.5 for many tasks. Orca is built on top of the 13 billion parameter version of LLaMA.
The Pathways Language Model is a 540 billion parameter transformer-based model from Google powering its AI chatbot Bard. It was trained across multiple TPU 4 Pods -- Google's custom hardware for machine learning. Palm specializes in reasoning tasks such as coding, math, classification and question answering. Palm also excels at decomposing complex tasks into simpler subtasks.
PaLM gets its name from a Google research initiative to build Pathways, ultimately creating a single model that serves as a foundation for multiple use cases. There are several fine-tuned versions of Palm, including Med-Palm 2 for life sciences and medical information as well as Sec-Palm for cybersecurity deployments to speed up threat analysis.
Phi-1 is a transformer-based language model from Microsoft. At just 1.3 billion parameters, Phi-1 was trained for four days on a collection of textbook-quality data. Phi-1 is an example of a trend toward smaller models trained on better quality data and synthetic data.
"We'll probably see a lot more creative scaling down work: prioritizing data quality and diversity over quantity, a lot more synthetic data generation, and small but highly capable expert models," wrote Andrej Karpathy, former director of AI at Tesla and OpenAI employee, in a tweet.
Phi-1 specializes in Python coding and has fewer general capabilities because of its smaller size.
StableLM is a series of open source language models developed by Stability AI, the company behind image generator Stable Diffusion. There are 3 billion and 7 billion parameter models available and 15 billion, 30 billion, 65 billion and 175 billion parameter models in progress at time of writing. StableLM aims to be transparent, accessible and supportive.
Vicuna is another influential open source LLM derived from Llama. It was developed by LMSYS and was fine-tuned using data from sharegpt.com. It is smaller and less capable that GPT-4 according to several benchmarks, but does well for a model of its size. Vicuna has only 33 billion parameters, whereas GPT-4 has trillions.
Although LLMs are a recent phenomenon, their precursors go back decades. Learn how recent precursor Seq2Seq and distant precursor ELIZA set the stage for modern LLMs.
Seq2Seq is a deep learning approach used for machine translation, image captioning and natural language processing. It was developed by Google and underlies some of their modern LLMs, including LaMDA. Seq2Seq also underlies AlexaTM 20B, Amazon's large language model. It uses a mix of encoders and decoders.
Eliza was anearly natural language processing programcreated in 1966. It is one of the earliest examples of a language model. Eliza simulated conversation using pattern matching and substitution. Eliza, running a certain script, could parody the interaction between a patient and therapist by applying weights to certain keywords and responding to the user accordingly. The creator of Eliza, Joshua Weizenbaum, wrote a book on the limits of computation and artificial intelligence.
View original post here:
18 of the best large language models in 2024 - TechTarget
- What Is Machine Learning? | How It Works, Techniques ... [Last Updated On: September 5th, 2019] [Originally Added On: September 5th, 2019]
- Start Here with Machine Learning [Last Updated On: September 22nd, 2019] [Originally Added On: September 22nd, 2019]
- What is Machine Learning? | Emerj [Last Updated On: October 1st, 2019] [Originally Added On: October 1st, 2019]
- Microsoft Azure Machine Learning Studio [Last Updated On: October 1st, 2019] [Originally Added On: October 1st, 2019]
- Machine Learning Basics | What Is Machine Learning? | Introduction To Machine Learning | Simplilearn [Last Updated On: October 1st, 2019] [Originally Added On: October 1st, 2019]
- What is Machine Learning? A definition - Expert System [Last Updated On: October 2nd, 2019] [Originally Added On: October 2nd, 2019]
- Machine Learning | Stanford Online [Last Updated On: October 2nd, 2019] [Originally Added On: October 2nd, 2019]
- How to Learn Machine Learning, The Self-Starter Way [Last Updated On: October 17th, 2019] [Originally Added On: October 17th, 2019]
- definition - What is machine learning? - Stack Overflow [Last Updated On: November 3rd, 2019] [Originally Added On: November 3rd, 2019]
- Artificial Intelligence vs. Machine Learning vs. Deep ... [Last Updated On: November 3rd, 2019] [Originally Added On: November 3rd, 2019]
- Machine Learning in R for beginners (article) - DataCamp [Last Updated On: November 3rd, 2019] [Originally Added On: November 3rd, 2019]
- Machine Learning | Udacity [Last Updated On: November 3rd, 2019] [Originally Added On: November 3rd, 2019]
- Machine Learning Artificial Intelligence | McAfee [Last Updated On: November 3rd, 2019] [Originally Added On: November 3rd, 2019]
- Machine Learning [Last Updated On: November 3rd, 2019] [Originally Added On: November 3rd, 2019]
- AI-based ML algorithms could increase detection of undiagnosed AF - Cardiac Rhythm News [Last Updated On: November 19th, 2019] [Originally Added On: November 19th, 2019]
- The Cerebras CS-1 computes deep learning AI problems by being bigger, bigger, and bigger than any other chip - TechCrunch [Last Updated On: November 19th, 2019] [Originally Added On: November 19th, 2019]
- Can the planet really afford the exorbitant power demands of machine learning? - The Guardian [Last Updated On: November 19th, 2019] [Originally Added On: November 19th, 2019]
- New InfiniteIO Platform Reduces Latency and Accelerates Performance for Machine Learning, AI and Analytics - Business Wire [Last Updated On: November 19th, 2019] [Originally Added On: November 19th, 2019]
- How to Use Machine Learning to Drive Real Value - eWeek [Last Updated On: November 19th, 2019] [Originally Added On: November 19th, 2019]
- Machine Learning As A Service Market to Soar from End-use Industries and Push Revenues in the 2025 - Downey Magazine [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
- Rad AI Raises $4M to Automate Repetitive Tasks for Radiologists Through Machine Learning - - HIT Consultant [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
- Machine Learning Improves Performance of the Advanced Light Source - Machine Design [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
- Synthetic Data: The Diamonds of Machine Learning - TDWI [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
- The transformation of healthcare with AI and machine learning - ITProPortal [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
- Workday talks machine learning and the future of human capital management - ZDNet [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
- Machine Learning with R, Third Edition - Free Sample Chapters - Neowin [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
- Verification In The Era Of Autonomous Driving, Artificial Intelligence And Machine Learning - SemiEngineering [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
- Podcast: How artificial intelligence, machine learning can help us realize the value of all that genetic data we're collecting - Genetic Literacy... [Last Updated On: November 28th, 2019] [Originally Added On: November 28th, 2019]
- The Real Reason Your School Avoids Machine Learning - The Tech Edvocate [Last Updated On: November 28th, 2019] [Originally Added On: November 28th, 2019]
- Siri, Tell Fido To Stop Barking: What's Machine Learning, And What's The Future Of It? - 90.5 WESA [Last Updated On: November 28th, 2019] [Originally Added On: November 28th, 2019]
- Microsoft reveals how it caught mutating Monero mining malware with machine learning - The Next Web [Last Updated On: November 28th, 2019] [Originally Added On: November 28th, 2019]
- The role of machine learning in IT service management - ITProPortal [Last Updated On: November 28th, 2019] [Originally Added On: November 28th, 2019]
- Global Director of Tech Exploration Discusses Artificial Intelligence and Machine Learning at Anheuser-Busch InBev - Seton Hall University News &... [Last Updated On: November 28th, 2019] [Originally Added On: November 28th, 2019]
- The 10 Hottest AI And Machine Learning Startups Of 2019 - CRN: The Biggest Tech News For Partners And The IT Channel [Last Updated On: November 28th, 2019] [Originally Added On: November 28th, 2019]
- Startup jobs of the week: Marketing Communications Specialist, Oracle Architect, Machine Learning Scientist - BetaKit [Last Updated On: November 30th, 2019] [Originally Added On: November 30th, 2019]
- Here's why machine learning is critical to success for banks of the future - Tech Wire Asia [Last Updated On: December 2nd, 2019] [Originally Added On: December 2nd, 2019]
- 3 questions to ask before investing in machine learning for pop health - Healthcare IT News [Last Updated On: December 8th, 2019] [Originally Added On: December 8th, 2019]
- Machine Learning Answers: If Caterpillar Stock Drops 10% A Week, Whats The Chance Itll Recoup Its Losses In A Month? - Forbes [Last Updated On: December 8th, 2019] [Originally Added On: December 8th, 2019]
- Measuring Employee Engagement with A.I. and Machine Learning - Dice Insights [Last Updated On: December 8th, 2019] [Originally Added On: December 8th, 2019]
- Amazon Wants to Teach You Machine Learning Through Music? - Dice Insights [Last Updated On: December 8th, 2019] [Originally Added On: December 8th, 2019]
- Machine Learning Answers: If Nvidia Stock Drops 10% A Week, Whats The Chance Itll Recoup Its Losses In A Month? - Forbes [Last Updated On: December 8th, 2019] [Originally Added On: December 8th, 2019]
- AI and machine learning platforms will start to challenge conventional thinking - CRN.in [Last Updated On: December 23rd, 2019] [Originally Added On: December 23rd, 2019]
- Machine Learning Answers: If Twitter Stock Drops 10% A Week, Whats The Chance Itll Recoup Its Losses In A Month? - Forbes [Last Updated On: December 23rd, 2019] [Originally Added On: December 23rd, 2019]
- Machine Learning Answers: If Seagate Stock Drops 10% A Week, Whats The Chance Itll Recoup Its Losses In A Month? - Forbes [Last Updated On: December 23rd, 2019] [Originally Added On: December 23rd, 2019]
- Machine Learning Answers: If BlackBerry Stock Drops 10% A Week, Whats The Chance Itll Recoup Its Losses In A Month? - Forbes [Last Updated On: December 23rd, 2019] [Originally Added On: December 23rd, 2019]
- Amazon Releases A New Tool To Improve Machine Learning Processes - Forbes [Last Updated On: December 23rd, 2019] [Originally Added On: December 23rd, 2019]
- Another free web course to gain machine-learning skills (thanks, Finland), NIST probes 'racist' face-recog and more - The Register [Last Updated On: December 23rd, 2019] [Originally Added On: December 23rd, 2019]
- Kubernetes and containers are the perfect fit for machine learning - JAXenter [Last Updated On: December 23rd, 2019] [Originally Added On: December 23rd, 2019]
- TinyML as a Service and machine learning at the edge - Ericsson [Last Updated On: December 23rd, 2019] [Originally Added On: December 23rd, 2019]
- AI and machine learning products - Cloud AI | Google Cloud [Last Updated On: December 23rd, 2019] [Originally Added On: December 23rd, 2019]
- Machine Learning | Blog | Microsoft Azure [Last Updated On: December 23rd, 2019] [Originally Added On: December 23rd, 2019]
- Machine Learning in 2019 Was About Balancing Privacy and Progress - ITPro Today [Last Updated On: December 25th, 2019] [Originally Added On: December 25th, 2019]
- CMSWire's Top 10 AI and Machine Learning Articles of 2019 - CMSWire [Last Updated On: December 25th, 2019] [Originally Added On: December 25th, 2019]
- Here's why digital marketing is as lucrative a career as data science and machine learning - Business Insider India [Last Updated On: January 13th, 2020] [Originally Added On: January 13th, 2020]
- Dell's Latitude 9510 shakes up corporate laptops with 5G, machine learning, and thin bezels - PCWorld [Last Updated On: January 13th, 2020] [Originally Added On: January 13th, 2020]
- Finally, a good use for AI: Machine-learning tool guesstimates how well your code will run on a CPU core - The Register [Last Updated On: January 13th, 2020] [Originally Added On: January 13th, 2020]
- Cloud as the enabler of AI's competitive advantage - Finextra [Last Updated On: January 13th, 2020] [Originally Added On: January 13th, 2020]
- Forget Machine Learning, Constraint Solvers are What the Enterprise Needs - - RTInsights [Last Updated On: January 13th, 2020] [Originally Added On: January 13th, 2020]
- Informed decisions through machine learning will keep it afloat & going - Sea News [Last Updated On: January 13th, 2020] [Originally Added On: January 13th, 2020]
- The Problem with Hiring Algorithms - Machine Learning Times - machine learning & data science news - The Predictive Analytics Times [Last Updated On: January 13th, 2020] [Originally Added On: January 13th, 2020]
- New Program Supports Machine Learning in the Chemical Sciences and Engineering - Newswise [Last Updated On: January 13th, 2020] [Originally Added On: January 13th, 2020]
- AI-System Flags the Under-Vaccinated in Israel - PrecisionVaccinations [Last Updated On: January 22nd, 2020] [Originally Added On: January 22nd, 2020]
- New Contest: Train All The Things - Hackaday [Last Updated On: January 22nd, 2020] [Originally Added On: January 22nd, 2020]
- AFTAs 2019: Best New Technology Introduced Over the Last 12 MonthsAI, Machine Learning and AnalyticsActiveViam - www.waterstechnology.com [Last Updated On: January 22nd, 2020] [Originally Added On: January 22nd, 2020]
- Educate Yourself on Machine Learning at this Las Vegas Event - Small Business Trends [Last Updated On: January 22nd, 2020] [Originally Added On: January 22nd, 2020]
- Seton Hall Announces New Courses in Text Mining and Machine Learning - Seton Hall University News & Events [Last Updated On: January 22nd, 2020] [Originally Added On: January 22nd, 2020]
- Looking at the most significant benefits of machine learning for software testing - The Burn-In [Last Updated On: January 22nd, 2020] [Originally Added On: January 22nd, 2020]
- Leveraging AI and Machine Learning to Advance Interoperability in Healthcare - - HIT Consultant [Last Updated On: January 22nd, 2020] [Originally Added On: January 22nd, 2020]
- Adventures With Artificial Intelligence and Machine Learning - Toolbox [Last Updated On: January 22nd, 2020] [Originally Added On: January 22nd, 2020]
- Five Reasons to Go to Machine Learning Week 2020 - Machine Learning Times - machine learning & data science news - The Predictive Analytics Times [Last Updated On: January 22nd, 2020] [Originally Added On: January 22nd, 2020]
- Uncover the Possibilities of AI and Machine Learning With This Bundle - Interesting Engineering [Last Updated On: January 22nd, 2020] [Originally Added On: January 22nd, 2020]
- Learning that Targets Millennial and Generation Z - HR Exchange Network [Last Updated On: January 23rd, 2020] [Originally Added On: January 23rd, 2020]
- Red Hat Survey Shows Hybrid Cloud, AI and Machine Learning are the Focus of Enterprises - Computer Business Review [Last Updated On: January 23rd, 2020] [Originally Added On: January 23rd, 2020]
- Vectorspace AI Datasets are Now Available to Power Machine Learning (ML) and Artificial Intelligence (AI) Systems in Collaboration with Elastic -... [Last Updated On: January 23rd, 2020] [Originally Added On: January 23rd, 2020]
- What is Machine Learning? | Types of Machine Learning ... [Last Updated On: January 23rd, 2020] [Originally Added On: January 23rd, 2020]
- How Machine Learning Will Lead to Better Maps - Popular Mechanics [Last Updated On: January 30th, 2020] [Originally Added On: January 30th, 2020]
- Jenkins Creator Launches Startup To Speed Software Testing with Machine Learning -- ADTmag - ADT Magazine [Last Updated On: January 30th, 2020] [Originally Added On: January 30th, 2020]
- An Open Source Alternative to AWS SageMaker - Datanami [Last Updated On: January 30th, 2020] [Originally Added On: January 30th, 2020]
- Machine Learning Could Aid Diagnosis of Barrett's Esophagus, Avoid Invasive Testing - Medical Bag [Last Updated On: January 30th, 2020] [Originally Added On: January 30th, 2020]
- OReilly and Formulatedby Unveil the Smart Cities & Mobility Ecosystems Conference - Yahoo Finance [Last Updated On: January 30th, 2020] [Originally Added On: January 30th, 2020]