Achieving alignment: How U of T researchers are working to keep AI on track

In the year since OpenAI released ChatGPT, what once seemed like an esoteric question among researchers has pushed its way to the forefront of public discourse: As artificial intelligence becomes more capable, how do we ensure AI systems act in the best interests of humans and, crucially, do not turn against us?

He recently spoke with U of T News about the alignment problem and what is being done to try to solve it.

What, exactly, is meant by AI alignment?

In the research sense, it means trying to make sure that AI does what we intend it to do, so that it follows the objectives we give it. But there are lots of problems that can arise, some of which we're already seeing in today's models.

One is called reward misspecification. It's tricky to specify the reward function, or objective, you want in the form of a number that an AI model can understand. For example, if you're a company, you might try to maximize profits, which is a relatively simple objective. But in pursuing it, there can be unintended consequences in the real world. The model might make or recommend decisions that are harmful to employees or the environment. Rewards can be underspecified in even simpler settings: if we ask a robot to bring us coffee, we are also implicitly asking it to do so without breaking anything in the kitchen.
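To make this concrete, here is a minimal toy sketch of reward misspecification in Python. The "coffee robot" policies and the numbers attached to them are invented for illustration; they are not drawn from any real system.

```python
# Two candidate robot policies, summarized by the outcomes they produce.
# The numbers are made up for illustration.
policies = {
    "careful":  {"coffees_delivered": 1, "items_broken": 0},
    "reckless": {"coffees_delivered": 2, "items_broken": 3},  # faster, but smashes mugs
}

def specified_reward(outcome):
    """The objective we actually wrote down: only coffee counts."""
    return outcome["coffees_delivered"]

def intended_reward(outcome):
    """What we really wanted: coffee, without collateral damage."""
    return outcome["coffees_delivered"] - outcome["items_broken"]

best_specified = max(policies, key=lambda p: specified_reward(policies[p]))
best_intended = max(policies, key=lambda p: intended_reward(policies[p]))
print(best_specified)  # 'reckless': broken mugs are invisible to this objective
print(best_intended)   # 'careful'
```

An optimizer given only the specified reward happily picks the reckless policy, because the cost we cared about was never written into the objective.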

If we built the AI models, how is it that they learn to do things we didn't foresee?

When we talk about emergent behaviours, abilities that are present in larger models but not in smaller ones, it's useful to think about large language models (LLMs) such as ChatGPT. Given an incomplete sentence, ChatGPT's objective is to predict what the next word is going to be. But if you're giving it a wide range of training data, from the works of Shakespeare to mathematical textbooks, the model is going to gain some level of understanding in order to get better at predicting what word comes next.
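As a rough illustration of that objective, here is a toy next-word predictor in Python. It uses simple bigram counts over an invented corpus; real LLMs use deep neural networks over tokens, but the training signal has the same shape: given the context, assign high probability to the word that actually follows.

```python
from collections import Counter, defaultdict

corpus = "to be or not to be that is the question".split()

# Count which word follows which in the training text.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word and its estimated probability."""
    counts = following[word]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

print(predict_next("to"))  # ('be', 1.0): "to" is always followed by "be" here
print(predict_next("be"))  # ('or', 0.5): a 50/50 split, tie broken by first occurrence
```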

We don't specify hard-coded rules for what these models are supposed to learn, so we don't have that much control over what the model generates. One example of this is hallucinations, where models such as ChatGPT produce plausible but false claims.

What is artificial general intelligence (AGI) and what are some of the existential concerns about it?

There are many definitions, but in a general sense, AGI refers to the possibility that we develop an AI system that performs most tasks requiring intelligence at or above the level of humans.

People who believe this might happen are concerned about whether these models are going to be aligned with human values. In other words, if they're more intelligent than the average human, it's not clear that they'll actually help us.

Some sci-fi ideas about AIs taking over the world or hurting a lot of humans are getting a lot of media attention. One reason people think this might happen is that an AI can often pursue its objectives more effectively if it has more resources. Hypothetically, an AI system might decide that manipulating humans, or hurting them in some way, would make it easier to acquire resources. This scenario is not going to happen today, but the potential risk is why luminaries such as Geoffrey Hinton emphasize the importance of studying and better understanding the models we are training.

How are U of T researchers working to tackle the short- and long-term risks of AI?

There are five key areas of AI alignment research: specification, interpretability, monitoring, robustness and governance. The Schwartz Reisman Institute is at the forefront of bringing together people from different disciplines to try to steer this technology in a positive direction.

In the case of specification, a common approach to the problem of reward misspecification is a technique that allows models to learn from human feedback, known as reinforcement learning from human feedback (RLHF). This is already being put into practice in training LLMs like ChatGPT. Going forward, some researchers are looking for ways to encode a set of human principles for future advanced models to follow. An important question that we can all think about is: alignment to whom? What sort of guidelines do we want these models to follow?
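To give a flavour of how learning from human feedback works, here is a heavily simplified Python sketch of fitting a reward model from pairwise human preferences, in the spirit of the Bradley-Terry formulation used in RLHF pipelines. The features and preference data are made up for illustration; real systems train a neural network over full model responses.

```python
import math

# Each response is reduced to one hand-crafted feature here (say, a
# "helpfulness" score). Pairs are (feature of the response the human
# preferred, feature of the one they rejected). Numbers are invented.
preferences = [(0.9, 0.2), (0.7, 0.4), (0.8, 0.1)]

w = 0.0   # reward model parameter: reward(x) = w * x
lr = 0.5  # learning rate

for _ in range(200):
    for chosen, rejected in preferences:
        # Bradley-Terry: P(chosen preferred) = sigmoid(r_chosen - r_rejected)
        margin = w * chosen - w * rejected
        p = 1.0 / (1.0 + math.exp(-margin))
        # Gradient ascent on the log-likelihood of the human's choices.
        w += lr * (1.0 - p) * (chosen - rejected)

print(f"learned weight: {w:.2f}")  # positive: higher feature -> higher reward
```

The learned reward then stands in for the human during training, which is also where the "alignment to whom?" question bites: the model inherits the values of whoever supplied the preference data.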
