The 3 Most Important AI Innovations of 2023 | TIME

In many ways, 2023 was the year that people began to understand what AI really is, and what it can do. It was the year that chatbots first went truly viral, and the year that governments began taking AI risk seriously. Those developments weren't so much new innovations as they were technologies and ideas taking center stage after a long gestation period.

But there were plenty of new innovations, too. Here are three of the biggest from the past year:

Multimodality might sound like jargon, but it's worth understanding what it means: it's the ability of an AI system to process lots of different types of data, not just text, but also images, video, audio and more.

This year was the first time that the public gained access to powerful multimodal AI models. OpenAI's GPT-4 was the first of these, allowing users to upload images as well as text inputs. GPT-4 can "see" the contents of an image, which opens up all kinds of possibilities, for example asking it what to make for dinner based on a photograph of the contents of your fridge. In September, OpenAI rolled out the ability for users to interact with ChatGPT by voice as well as text.

Google DeepMind's latest model, Gemini, announced in December, can also work with images and audio. A launch video shared by Google showed the model identifying a duck based on a line drawing on a Post-it note. In the same video, after being shown an image of pink and blue yarn and asked what it could be used to create, Gemini generated an image of a pink and blue octopus plushie. (The marketing video appeared to show Gemini observing moving images and responding to audio commands in real time, but in a post on its website, Google said the video had been edited "for brevity," and that the model was being prompted using still images, not video, and text prompts, not audio, although the model does have audio capabilities.)

"I think the next landmark that people will think back to, and remember, is [AI systems] going much more fully multimodal," Google DeepMind co-founder Shane Legg said on a podcast in October. "It's early days in this transition, and when you start really digesting a lot of video and other things like that, these systems will start having a much more grounded understanding of the world." In an interview with TIME in November, OpenAI CEO Sam Altman said multimodality in the company's new models would be one of the key things to watch out for next year.

Read More: Sam Altman is TIME's 2023 CEO of the Year

The promise of multimodality isn't just that models become more useful. It's also that the models can be trained on abundant new sets of data (images, video, audio) that contain more information about the world than text alone. The belief inside many top AI companies is that this new training data will translate into these models becoming more capable or powerful. It is a step on the path, many AI scientists hope, toward artificial general intelligence, the kind of system that can match human intellect, making new scientific discoveries and performing economically valuable labor.

One of the biggest unanswered questions in AI is how to align it to human values. If these systems become smarter and more powerful than humans, they could cause untold harm to our species (some even say total extinction) unless, somehow, they are constrained by rules that put human flourishing at their center.

The process that OpenAI used to align ChatGPT (to avoid the racist and sexist behaviors of earlier models) worked well, but it required a large amount of human labor, through a technique known as reinforcement learning with human feedback, or RLHF. Human raters would assess the AI's responses and give it the computational equivalent of a doggy treat if the response was helpful, harmless, and compliant with OpenAI's list of content rules. By rewarding the AI when it was good and punishing it when it was bad, OpenAI developed an effective and relatively harmless chatbot.
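The feedback loop described above can be sketched in a few lines of toy code. This is a deliberately simplified illustration, not OpenAI's actual pipeline: the rating rules, function names, and update weights are all invented stand-ins for the real human raters and reinforcement-learning machinery.

```python
# Toy sketch of the RLHF loop: a stand-in "human rater" scores candidate
# responses, and probability mass shifts toward the preferred one (the
# "doggy treat"). All rules and numbers here are illustrative only.

def human_rating(response: str) -> float:
    """Stand-in for a human rater: rewards helpful, harmless replies."""
    score = 0.0
    if "sorry" not in response:   # toy rule: penalize unhelpful refusals
        score += 1.0
    if "insult" not in response:  # toy rule: penalize harmful content
        score += 1.0
    return score

def rlhf_step(policy: dict, candidates: list) -> dict:
    """One simplified update: boost the rater-preferred candidate,
    mildly penalize the rest, then renormalize the 'policy'."""
    rewards = {c: human_rating(c) for c in candidates}
    best = max(candidates, key=rewards.get)
    for c in candidates:
        policy.setdefault(c, 1.0)
        policy[c] *= 1.5 if c == best else 0.9  # reward vs. mild penalty
    total = sum(policy.values())
    return {c: w / total for c, w in policy.items()}

policy = rlhf_step({}, ["Hello there!", "sorry, insult"])
```

After one step, the polite, harmless response holds the larger share of the toy policy's probability, which is the core dynamic RLHF relies on at vastly greater scale.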

But since the RLHF process relies heavily on human labor, there's a big question mark over how scalable it is. It's expensive. It's subject to the biases or mistakes made by individual raters. It becomes more failure-prone the more complicated the list of rules is. And it looks unlikely to work for AI systems that are so powerful they begin doing things humans can't comprehend.

Constitutional AI, first described by researchers at top AI lab Anthropic in a December 2022 paper, tries to address these problems, harnessing the fact that AI systems are now capable enough to understand natural language. The idea is quite simple. First, you write a constitution that lays out the values you'd like your AI to follow. Then you train the AI to score responses based on how aligned they are to the constitution, and then incentivize the model to output responses that score more highly. Instead of reinforcement learning from human feedback, it's reinforcement learning from AI feedback. "These methods make it possible to control AI behavior more precisely and with far fewer human labels," the Anthropic researchers wrote. Constitutional AI was used to align Claude, Anthropic's 2023 answer to ChatGPT. (Investors in Anthropic include Salesforce, where TIME co-chair and owner Marc Benioff is CEO.)
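The three steps above (write a constitution, score responses against it, prefer high-scoring outputs) can be condensed into a toy sketch. The principles and scoring heuristics below are invented for illustration and bear no relation to Anthropic's actual constitution or judge model.

```python
# Toy sketch of reinforcement learning from AI feedback (RLAIF): an AI
# judge, not a human rater, scores responses against a written
# constitution. Principles and scoring logic here are illustrative only.

CONSTITUTION = [
    "be helpful",   # respond substantively rather than refusing
    "be harmless",  # avoid abusive or dangerous content
]

def ai_feedback(response: str) -> int:
    """Stand-in for an AI judge scoring a response against the constitution."""
    score = 0
    if len(response) > 10:       # crude proxy for "be helpful"
        score += 1
    if "abuse" not in response:  # crude proxy for "be harmless"
        score += 1
    return score

def best_response(candidates: list) -> str:
    """The core RLAIF idea, reduced to one line: prefer the candidate
    that the constitution-based judge scores highest."""
    return max(candidates, key=ai_feedback)

choice = best_response(["no.", "Here is a detailed, safe explanation."])
```

In a real system the judge's preferences would train a reward signal for further fine-tuning, rather than just selecting among candidates, but the division of labor is the same: the constitution, not per-response human ratings, supplies the feedback.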

"With constitutional AI, you're explicitly writing down the normative premises with which your model should approach the world," Jack Clark, Anthropic's head of policy, told TIME in August. "Then the model is training on that." There are still problems, like the difficulty of making sure the AI has understood both the letter and the spirit of the rules ("you're stacking your chips on a big, opaque AI model," Clark says), but the technique is a promising addition to a field where new alignment strategies are few and far between.

Of course, constitutional AI doesn't answer the question of whose values AI should be aligned to. But Anthropic is experimenting with democratizing that question. In October, the lab ran an experiment that asked a representative group of 1,000 Americans to help pick rules for a chatbot, and found that while there was some polarization, it was still possible to draft a workable constitution based on statements that the group came to a consensus on. Experiments like this could open the door to a future where ordinary people have much more of a say over how AI is governed, compared to today, when a small number of Silicon Valley executives write the rules.

One noticeable outcome of the billions of dollars pouring into AI this year has been the rapid rise of text-to-video tools. Last year, text-to-image tools had barely emerged from their infancy; now, there are several companies offering the ability to turn sentences into moving images with increasingly fine-grained levels of accuracy.

One of those companies is Runway, a Brooklyn-based AI video startup that wants to make filmmaking accessible to anybody. Its latest model, Gen-2, allows users not just to generate a video from text, but also to change the style of an existing video based on a text prompt (for example, turning a shot of cereal boxes on a tabletop into a nighttime cityscape), in a process it calls video-to-video.

"Our mission is to build tools for human creativity," Runway's CEO Cristobal Valenzuela told TIME in May. He acknowledges that this will have an impact on jobs in the creative industries, where AI tools are quickly making some forms of technical expertise obsolete, but he believes the world on the other side is worth the upheaval. "Our vision is a world where human creativity gets amplified and enhanced, and it's less about the craft, and the budget, and the technical specifications and knowledge that you have, and more about your ideas." (Investors in Runway include Salesforce, where TIME co-chair and owner Marc Benioff is CEO.)

Another startup in the text-to-video space is Pika AI, which is reportedly being used to create millions of new videos each week. Run by two Stanford dropouts, the company launched in April but has already secured funding that values it at between $200 million and $300 million, according to Forbes. Pitched not at professional filmmakers but at the general user, free tools like Pika are trying to transform the user-generated content landscape. That could happen as soon as 2024, but text-to-video tools are computationally expensive, so don't be surprised if they start charging for access once the venture capital runs out.
