Arguably one of the premier events that brought AI to popular attention in recent years was the invention of the Transformer by Ashish Vaswani and colleagues at Google in 2017. The Transformer spawned a wave of language programs, such as Google's BERT and OpenAI's GPT-3, that can produce surprisingly human-seeming sentences, giving the impression that machines can write like a person.
Now, scientists at DeepMind in the U.K., which is owned by Google, want to take the benefits of the Transformer beyond text, to let it revolutionize other material including images, sounds and video, and spatial data of the kind a car records with LiDAR.
The Perceiver, unveiled this week by DeepMind in a paper posted on arXiv, adapts the Transformer with some tweaks that let it consume all of those types of input and perform the various tasks, such as image recognition, for which separate kinds of neural networks are usually developed.
The DeepMind work appears to be a waystation on the road to an envisioned super-model of deep learning: a neural network that could perform a plethora of tasks and learn faster with less data, something Google's head of AI, Jeff Dean, has described as a "grand challenge" for the discipline.
One model to rule them all? DeepMind's Perceiver performs decently on multiple tests of proficiency even though, unlike most neural networks, which specialize, the program is not built for any one kind of input. Perceiver combines a now-standard Transformer neural network with a trick adapted from "inducing points," a summary of the data, to reduce how much raw data from pixels or audio or video needs to be computed.
The paper, Perceiver: General Perception with Iterative Attention, by authors Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, and Joao Carreira, is to be presented this month at the International Conference on Machine Learning, which kicks off July 18th and is being held as a virtual event this year.
Perceiver continues the trend toward generality that has been underway for several years now, meaning having less and less built into an AI program that is specific to a task. Before Vaswani et al.'s Transformer, most natural-language programs were constructed around a particular language function, such as question answering or language translation. The Transformer erased those distinctions, producing one program that could handle a multitude of tasks by creating a sufficiently adept representation of language.
Also: AI in sixty seconds
Likewise, Perceiver challenges the idea that different kinds of data, such as sound or image, need different neural network architectures.
Perceiver, however, points to something more profound. Last year, at the International Solid State Circuits Conference, an annual technical symposium held in San Francisco, Google's Dean described in his keynote address one future direction of deep learning as the "goal of being able to train a model that can perform thousands or millions of tasks in a single model."
"Building a single machine learning system that can handle millions of tasks is a true grand challenge in the field of artificial intelligence and computer systems engineering," said Dean.
In a conversation with ZDNet at the conference, Dean explained how a kind of super-model would build up from work over the years on neural networks that combine "modalities," different sorts of input such as text and image, and combinations of models known as "mixture of experts":
Mixture of experts-style approaches, I think, are going to be important, and multi-task, and multi-modal approaches, where you sort-of learn representations that are useful for many different things, and sort-of jointly learn good representations that help you be able to solve new tasks more quickly, and with less data, fewer examples of your task, because you are already leveraging all the things you already know about the world.
Perceiver is in the spirit of that multi-tasking approach. It takes in three kinds of inputs: images, videos, and what are called point clouds, collections of dots that describe what a LiDAR sensor on top of a car "sees" of the road.
Once the system is trained, it performs meaningfully well on benchmark tests, including the classic ImageNet test of image recognition; AudioSet, a test developed at Google that requires a neural net to pick out kinds of audio clips from a video; and ModelNet, a test developed in 2015 at Princeton in which a neural net must use 2,000 points in space to correctly identify an object.
Also: Google experiments with AI to design its in-house computer chips
Perceiver manages to achieve this using two tricks, or maybe one trick and one cheat.
The first trick is to reduce the amount of data the Transformer needs to operate on directly. While large Transformer neural networks have been fed gigabytes and gigabytes of text data, the amount of data in images, video, audio files, or point clouds is potentially vastly larger. Just think of every pixel in a 224 by 224 pixel image from ImageNet, more than 50,000 values. In the case of a sound file, "1 second of audio at standard sampling rates corresponds to around 50,000 raw audio samples," write Jaegle and team.
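To get a feel for why that matters, consider a back-of-the-envelope comparison (a hypothetical Python sketch, not from the paper): the self-attention at the heart of a Transformer compares every input element with every other, so its cost grows with the square of the input length.

```python
# Illustrative only: element counts per modality, and the quadratic
# pairwise cost a standard Transformer's self-attention would incur.
inputs = {
    "text (2,048 tokens)": 2_048,
    "image (224 x 224 pixels)": 224 * 224,            # 50,176 elements
    "audio (1 s at 48 kHz)": 48_000,                  # ~50,000 raw samples
    "video (32 frames of 224 x 224)": 32 * 224 * 224, # ~1.6M elements
}

for name, n in inputs.items():
    print(f"{name:32s} N = {n:>9,d}   N^2 = {n * n:>16,d}")
```

The gap is stark: a modest video clip has hundreds of times more elements than a long text passage, and the quadratic term grows accordingly.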
So, Jaegle and team went in search of a way to reduce the so-called "dimensionality" of those data types. They borrow from the work of Juho Lee and colleagues at Oxford University, who introduced what they called the Set Transformer. The Set Transformer reduced the computing needed for a Transformer by creating a second version of each data sample, a kind of summary, which they called inducing points. Think of it as data compression.
Jaegle and team adapt this as what they call a "learned latent array," whereby the sample data is boiled down to a summary that is far less data-hungry. The Perceiver works in an "asymmetric" fashion: some of its attention operates on the actual data, but much of it looks only at the summary, the compressed version. That reduces the overall compute required.
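Here is a minimal sketch of that asymmetric design (hypothetical PyTorch code, not DeepMind's implementation; the class name, sizes, and layer choices are illustrative): a small learned latent array queries the much larger input array, so the expensive step scales with the product of the two lengths rather than the square of the input length.

```python
import torch
import torch.nn as nn

class LatentCrossAttention(nn.Module):
    """Illustrative Perceiver-style asymmetric attention (not the paper's code).

    A small learned latent array (num_latents << input length) attends to the
    raw input, costing O(num_latents * input_len) rather than O(input_len^2).
    """
    def __init__(self, dim=256, num_latents=512, num_heads=8):
        super().__init__()
        # The "learned latent array": trained parameters, not derived from data.
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, inputs):  # inputs: (batch, input_len, dim)
        q = self.latents.unsqueeze(0).expand(inputs.shape[0], -1, -1)
        # Latents query the raw inputs: the cheap, asymmetric step.
        latents, _ = self.cross_attn(q, inputs, inputs)
        # Transformer-style attention then runs on the small summary only.
        latents, _ = self.self_attn(latents, latents, latents)
        return latents

# 50,176 pixel vectors (224 x 224) are distilled into 512 latent vectors.
x = torch.randn(1, 224 * 224, 256)
print(LatentCrossAttention()(x).shape)  # torch.Size([1, 512, 256])
```

In the real Perceiver this cross-attend-then-self-attend cycle is repeated, which is where the paper's "iterative attention" gets its name.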
The second trick, really kind of a cheat, is to give the model some clues about the structure of the data. The problem with a Transformer is that it knows nothing about the spatial layout of an image or the time ordering of an audio clip. A Transformer is what's called permutation-invariant, meaning it is insensitive to such structural details of the particular kind of data.
That is a potential problem baked into the generality of the Perceiver. Neural networks built for images, for example, have some sense of the structure of a 2-D image. A classic convolutional neural network processes pixels in groups within a section of the image, a property known as locality. Transformers, and derivatives such as Perceiver, aren't built that way.
The authors, surprisingly, cite the 18th-century German philosopher Immanuel Kant, who said that such structural understanding is crucial.
"Spatial relationships are essential for sensory reasoning," Jaegle and team write, citing Kant, "and this limitation is clearly unsatisfying."
So, to give the neural network back some sense of the structure of images or sound, the authors borrow a technique employed last year by Matthew Tancik and colleagues at UC Berkeley and Google, what are called Fourier features. Fourier features explicitly tag each piece of input with some meaningful information about its structure.
For example, the coordinates of a pixel in an image can be "mapped" to an array of features so that the locality of the data is preserved. The Perceiver then takes that tag, that structural information, into account during its training phase.
As Jaegle and team describe it:
We can compensate for the lack of explicit structures in our architecture by associating position and modality-specific features with every input element (e.g. every pixel, or each audio sample); these can be learned or constructed using high-fidelity Fourier features. This is a way of tagging input units with a high-fidelity representation of position and modality, similar to the labeled line strategy used to construct topographic and cross-sensory maps in biological neural networks by associating the activity of a specific unit with a semantic or spatial location.
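In code, the general recipe looks something like this (a hypothetical NumPy sketch of 2-D Fourier position features; the scaling and frequency bank are illustrative, not the paper's exact parameterization):

```python
import numpy as np

def fourier_position_features(height, width, num_bands=16):
    """Tag each pixel with sin/cos features of its (row, col) position.

    Illustrative sketch: coordinates are scaled to [-1, 1] and multiplied by
    a bank of frequencies, so nearby pixels receive similar feature vectors,
    restoring a sense of locality to a permutation-invariant model.
    """
    rows = np.linspace(-1.0, 1.0, height)
    cols = np.linspace(-1.0, 1.0, width)
    yy, xx = np.meshgrid(rows, cols, indexing="ij")
    coords = np.stack([yy, xx], axis=-1)                # (H, W, 2)
    freqs = np.linspace(1.0, height / 2.0, num_bands)   # frequency bank
    angles = np.pi * coords[..., None] * freqs          # (H, W, 2, bands)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(height, width, -1)             # (H, W, 4 * bands)

# Each of the 224 x 224 pixels gets a 64-dimensional positional tag,
# concatenated to its color values before entering the network.
print(fourier_position_features(224, 224).shape)        # (224, 224, 64)
```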
The results of the benchmark tests are intriguing. Perceiver is better than the industry-standard ResNet-50 neural network on ImageNet in terms of accuracy, and better than a Transformer that has been adapted to images, the Vision Transformer introduced this year by Alexey Dosovitskiy and colleagues at Google.
On the AudioSet test, the Perceiver outperforms most, but not all, state-of-the-art models on accuracy. And on the ModelNet test of point clouds, the Perceiver also earns quite high marks.
Jaegle and team claim for their program a kind of uber-proficiency that wins by being best all around: "When comparing these models across all different modalities and combinations considered in the paper, the Perceiver does best overall."
There are a number of outstanding issues with Perceiver that make it perhaps not yet the million-task super-model Dean has described. One is that the program doesn't always do as well as programs made for a particular modality. For example, on AudioSet, the Perceiver fell short of a program introduced last year by Haytham M. Fayek and Anurag Kumar of Facebook that "fuses" information about audio and video.
On the point-cloud test, it falls far short of PointNet++, a 2017 neural network built just for point clouds by Charles Qi and colleagues at Stanford.
And on ImageNet, the Perceiver was clearly helped by the cheat of having Fourier features that tag the structure of images. When the authors tried a version of the Perceiver with the Fourier features removed, using a "learned position" encoding instead, it didn't do nearly as well as ResNet-50 and ViT.
A second issue is that nothing about Perceiver appears to deliver the more-efficient computing and smaller data requirements Dean alluded to. In fact, the authors note that the data they use isn't always big enough. They observe that the Perceiver may sometimes fail to generalize, quipping that "With great flexibility comes great overfitting." Overfitting is when a neural network is so much larger than its training data set that it can simply memorize the data rather than learn representations that generalize beyond it.
Hence, "In future work, we would like to pre-train our image classification model on very large scale data," they write.
That leads to a larger question about just what is going on in what the Perceiver has "learned." If Google's Jeff Dean is right, then something like Perceiver should be learning representations that are mutually reinforcing. Clearly, the fact of a general model being able to perform well in spite of its generality suggests that something of the kind is going on. But what?
All we know is that Perceiver can learn different kinds of representations. The authors show a number of what are called attention maps, visualizations that purport to show what the Perceiver is emphasizing in each clump of training data. Those attention maps suggest the Perceiver adapts where it places the focus of its computing.
As Jaegle and team write, "it can adapt its attention to the input content."
An attention map purports to show what the Perceiver is emphasizing in its video inputs, suggesting it is learning new representations specific to the "modality" of the data.
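One can get a feel for how such maps are produced with a short sketch (hypothetical code, reusing the illustrative LatentCrossAttention class from the earlier sketch): pull the cross-attention weights out of the model and reshape one latent's weights back onto the pixel grid.

```python
import torch
import matplotlib.pyplot as plt

# Assumes the illustrative LatentCrossAttention class defined above.
model = LatentCrossAttention()
x = torch.randn(1, 224 * 224, 256)          # stand-in for embedded pixels
q = model.latents.unsqueeze(0)
_, weights = model.cross_attn(q, x, x)      # weights: (1, 512, 50176)
attn_map = weights[0, 0].reshape(224, 224)  # one latent's focus over pixels
plt.imshow(attn_map.detach().numpy(), cmap="viridis")
plt.title("Cross-attention of latent 0 over the input (illustrative)")
plt.show()
```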
A third weakness is specifically highlighted by the authors: the question of the Fourier features, the cheat. The cheat seems to help in some cases, and it's not clear how, or even whether, that crutch can be dispensed with.
As the authors put it, "End-to-end modality-agnostic learning remains an interesting research direction."
On a philosophical note, it's interesting to wonder whether Perceiver will lead to new kinds of abilities that are specifically multi-modal. Perceiver doesn't show any apparent synergy between the different modalities; image, sound, and point clouds still exist apart from one another. That's probably mostly down to the tasks: all the tasks used in the evaluation were designed for single neural networks.
Clearly, Google needs a new benchmark to test multi-modality.
For all those limitations, it's important to realize that Perceiver may be merely a stage on the way to what Dean described. As Dean told ZDNet, an eventual super-model would emerge through a kind of evolutionary process:
The nice thing about that vision of being able to have a model that does a million tasks is there are good intermediate points along the way. You can say, well, we're not going to bite off multi-modal, instead let's try to just do a hundred vision tasks in the same model first. And then a different instance of it where we try to do a hundred textual tasks, and not try to mix them together. And then say, that seems to be working well, let's try to combine the hundred vision and hundred textual tasks, and, hopefully, get them to improve each other, and start to experiment with the multi-modal aspects.
Also: Ethics of AI: Benefits and risks of artificial intelligence