The race is on to create one neural network that can process multiple kinds of data -- a more-general artificial intelligence that doesn't discriminate about types of data but instead can crunch them all within the same basic structure.
Multi-modality, as this genre of neural network is called, is seeing a flurry of activity in which different kinds of data, such as images, text, and speech audio, are passed through the same algorithm to produce a score on different tests such as image recognition, natural language understanding, or speech detection.
And these ambidextrous networks are racking up scores on benchmark tests of AI. The latest achievement is what's called "data2vec," developed by researchers at the AI division of Meta (parent of Facebook, Instagram, and WhatsApp).
The point, as Meta researchers Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, and Michael Auli explain in a blog post, is to approach something more like the general learning ability that the human mind seems to encompass.
"While people appear to learn in a similar way regardless of how they get information -- whether they use sight or sound, for example -- there are currently big differences in the way self-supervised learning algorithms learn from images, speech, text, and other modalities," the blog post states.
The main point is that "AI should be able to learn to do many different tasks, including those that are entirely unfamiliar."
Meta's CEO, Mark Zuckerberg, offered a quote about the work and its ties to a future Metaverse:
"People experience the world through a combination of sight, sound, and words, and systems like this could one day understand the world the way we do. This will all eventually get built into AR glasses with an AI assistant so, for example, it could help you cook dinner, noticing if you miss an ingredient, prompting you to turn down the heat, or more complex tasks."
The name data2vec is a play on the name of a program for language "embedding" developed at Google in 2013 called "word2vec." That program predicted how words cluster together, and so word2vec is representative of a neural network designed for a specific type of data, in that case text.
In the case of data2vec, however, Baevski and colleagues are taking a standard version of what's called a Transformer, developed by Ashish Vaswani and colleagues at Google in 2017, and extending it to be used for multiple data types.
The Transformer neural network was originally developed for language tasks, but it has been widely adapted in the years since for many kinds of data. Baevski et al. show that the Transformer can be used to process multiple kinds of data without being altered, and the trained neural network that results can perform on multiple different tasks.
In the formal paper, "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language," Baevski et al. train the Transformer on image data, speech audio waveforms, and text language representations.
The very general Transformer serves as what is called pre-training, a starting point that can then be applied to specific neural networks in order to perform specific tasks. For example, the authors use data2vec as pre-training to equip what's called "ViT," the "vision Transformer," a neural network specifically designed for vision tasks that was introduced last year by Alexey Dosovitskiy and colleagues at Google.
Meta shows top scores for the venerable ImageNet image-recognition competition.
When used on ViT to try to solve the standard ImageNet test of image recognition, their results come in at the top of the pack, with accuracy of 84.1%. That's better than the score of 83.2% received by a team at Microsoft, led by Hangbo Bao, that pre-trained ViT last year.
And the same data2vec Transformer outputs results that are state-of-the-art for speech recognition and that are competitive, if not the best, for natural language understanding:
"Experimental results show data2vec to be effective in all three modalities, setting a new state of the art for ViT-B and ViT-L on ImageNet-1K, improving over the best prior work in speech processing on speech recognition and performing on par to RoBERTa on the GLUE natural language understanding benchmark."
The crux is that this is happening without any modification of the neural network to suit images, and the same goes for speech and text. Instead, every input type goes into the same network and completes the same very general task. That task is one that Transformer networks commonly use, known as "masked prediction."
The way that data2vec performs masked prediction, however, is an approach known as "self-supervised" learning. In a self-supervised setting, a neural network is trained or developed by having to pass through multiple stages.
First, the network constructs a representation of the joint probability of data input, be it images or speech or text. Then, a second version of the network has some of those input data items "masked out," left unrevealed. It has to reconstruct the joint probability that the first version of the network had constructed, which forces it to create increasingly better representations of the data by essentially filling in the blanks.
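The masking step described above can be illustrated with a toy sketch. This is not Meta's code: the `mask_inputs` helper, the `MASK` placeholder, and the 50% masking ratio are all assumptions made for illustration, and real systems mask image patches, audio frames, or sub-word tokens rather than whole words.

```python
import random

MASK = None  # hypothetical placeholder standing in for a hidden input item


def mask_inputs(items, mask_ratio, rng):
    """Return (masked_copy, masked_positions), hiding a random subset of items."""
    n_masked = max(1, round(len(items) * mask_ratio))
    positions = sorted(rng.sample(range(len(items)), n_masked))
    masked = list(items)  # copy, so the unmasked view stays intact
    for p in positions:
        masked[p] = MASK
    return masked, positions


rng = random.Random(0)  # fixed seed so the sketch is repeatable
tokens = ["the", "cat", "sat", "on", "the", "mat"]

teacher_view = tokens  # the first network sees every item
student_view, hidden = mask_inputs(tokens, mask_ratio=0.5, rng=rng)
```

The second network only ever receives `student_view`; its job is to fill in what belongs at the positions listed in `hidden`.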
An overview of the data2vec approach.
The two networks, the one with the full pattern of the joint probability, and the one with the incomplete version that it is trying to complete, are called, sensibly enough, "Teacher" and "Student." The Student network tries to develop its sense of the data, if you will, by reconstructing what the Teacher has already achieved.
You can see the code for the models on GitHub.
How does the neural network play Teacher and Student for three very different types of data? The key is that the "target" of joint probability in all three data cases is not a specific output data type, as is the case in versions of the Transformer for a specific data type, such as Google's BERT or OpenAI's GPT-3.
Rather, data2vec is grabbing a bunch of neural network layers that are inside the neural network, somewhere in the middle, that represent the data before it is produced as a final output.
As the researchers write, "One of the main differences of our method [...] other than performing masked prediction, is the use of targets which are based on averaging multiple layers from the teacher network." Specifically, "we regress multiple neural network layer representations instead of just the top layer," so that "data2vec predicts the latent representations of the input data."
They add, "We generally use the output of the FFN [feed-forward network] prior to the last residual connection in each block as target," where a "block" is the Transformer equivalent of a neural network layer.
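To make the idea of averaged-layer targets concrete, here is a minimal sketch under stated assumptions. The helper names `average_layers` and `regression_loss` are hypothetical, the numbers are made up, and plain mean squared error is used as a stand-in for whatever regression loss the authors actually use; the only thing taken from the quotes above is that the target is an average over several Teacher layers and that the Student regresses toward it.

```python
def average_layers(layer_outputs, top_k):
    """Average the last top_k layers' vectors, position by position.

    layer_outputs[l][t] is the vector that layer l produced for position t.
    """
    chosen = layer_outputs[-top_k:]
    n_positions = len(chosen[0])
    dim = len(chosen[0][0])
    return [
        [sum(layer[t][d] for layer in chosen) / len(chosen) for d in range(dim)]
        for t in range(n_positions)
    ]


def regression_loss(student_pred, targets, masked_positions):
    """Mean squared error, counted only at the masked positions."""
    total, count = 0.0, 0
    for t in masked_positions:
        for p, y in zip(student_pred[t], targets[t]):
            total += (p - y) ** 2
            count += 1
    return total / count


# Toy Teacher: 3 layers, 2 positions, vectors of width 2 (made-up numbers).
teacher_layers = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
targets = average_layers(teacher_layers, top_k=2)

# Toy Student output; position 1 was masked, so only it is scored.
student_pred = [[2.0, 3.0], [4.0, 4.0]]
loss = regression_loss(student_pred, targets, masked_positions=[1])
```

Nothing in the loss refers to pixels, words, or audio samples, which is why the same training objective can be reused across data types.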
The point is that every data type that goes in becomes the same challenge for the Student network of reconstructing something inside the neural network that the Teacher had composed.
This averaging is different from other recent approaches to building One Network To Crunch All Data. For example, last summer, Google's DeepMind unit offered up what it calls "Perceiver," its own multi-modal version of the Transformer. The training of the Perceiver neural network is the more-standard process of producing an output that is the answer to a labeled, supervised task such as ImageNet. In the self-supervised approach, data2vec isn't using those labels; it's just trying to reconstruct the network's internal representation of the data.
Even more ambitious efforts wait in the wings. Jeff Dean, head of Google's AI efforts, in October teased "Pathways," calling it a "next generation AI architecture" for multi-modal data processing.
Mind you, data2vec's very general approach to a single neural net for multiple modalities still has a lot of information about the different data types. Image, speech, and text are all prepared by pre-processing of the data. In that way, the multi-modal aspect of the network still relies on clues about the data, what the team refer to as "small modality-specific input encoders."
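A rough sketch of what "small modality-specific input encoders" in front of a shared network could look like follows. Everything here is a toy assumption: the three `encode_*` functions, the vector width `DIM`, and the `shared_network` stand-in are hypothetical and far simpler than real patch, waveform, or token embeddings. The point is only that each data type gets its own pre-processing, after which the network sees the same kind of thing, a sequence of fixed-width vectors.

```python
DIM = 4  # shared vector width (an arbitrary choice for this toy)


def encode_text(text):
    """One toy vector per whitespace-separated word."""
    return [
        [float(len(w)), float(ord(w[0]) % 7), float(ord(w[-1]) % 7), 1.0]
        for w in text.split()
    ]


def encode_image(pixels, patch=2):
    """One vector per non-overlapping patch x patch block of a 2D grid."""
    vectors = []
    for r in range(0, len(pixels), patch):
        for c in range(0, len(pixels[0]), patch):
            vectors.append(
                [float(pixels[r + i][c + j]) for i in range(patch) for j in range(patch)]
            )
    return vectors


def encode_audio(samples, frame=DIM):
    """One vector per non-overlapping frame of the waveform."""
    return [
        [float(s) for s in samples[i:i + frame]]
        for i in range(0, len(samples) - frame + 1, frame)
    ]


def shared_network(vectors):
    """Stand-in for the shared Transformer: accepts any sequence of DIM-wide vectors."""
    assert all(len(v) == DIM for v in vectors)
    return [sum(v) / DIM for v in vectors]  # placeholder computation


text_out = shared_network(encode_text("a cat sat"))
image_out = shared_network(encode_image([[r * 4 + c for c in range(4)] for r in range(4)]))
audio_out = shared_network(encode_audio([0, 1, 0, -1, 0, 1, 0, -1]))
```

Only the encoders know which modality they are handling; `shared_network` does not.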
We are not yet at a world where a neural net is trained with no sense whatsoever of the input data types. We are also not at a point in time when the neural network can construct one representation that combines all the different data types, so that the neural net is learning things in combination.
That fact is made clear from an exchange between ZDNet and the researchers. ZDNet reached out to Baevski and team and asked, "Are the latent representations that serve as targets a combined encoding of all three modalities at any given time step, or are they usually just one of the modalities?"
Baevski and team responded that it is the latter case, and their reply is interesting enough to quote at length:
"The latent variables are not a combined encoding for the three modalities. We train separate models for each modality but the process through which the models learn is identical. This is the main innovation of our project since before there were large differences in how models are trained in different modalities. Neuroscientists also believe that humans learn in similar ways about sounds and the visual world. Our project shows that self-supervised learning can also work the same way for different modalities."
Given data2vec's modality-specific limitations, a neural network that might truly be One Network To Rule Them All remains the technology of the future.
Link:
Meta's 'data2vec' is a step toward One Neural Network to Rule Them All - ZDNet