Does AI Know What an Apple Is? She Aims to Find Out. – Quanta Magazine

What does understanding or meaning mean, empirically? What, specifically, do you look for?

When I was starting my research program at Brown, we decided that meaning involves concepts in some way. I realize this is a theoretical commitment that not everyone makes, but it seems intuitive. If you use the word "apple" to mean apple, you need the concept of an apple. That has to be a thing, whether or not you use the word to refer to it. That's what it means to have meaning: there needs to be the concept, something you're verbalizing.

I want to find concepts in the model. I want something that I can grab within the neural network, evidence that there is a thing that represents apple internally, that allows it to be consistently referred to by the same word. Because there does seem to be this internal structure that's not random and arbitrary. You can find these little nuggets of well-defined function that reliably do something.

I've been focusing on characterizing this internal structure. What form does it have? It can be some subset of the weights within the neural network, or some kind of linear algebraic operation over those weights, some kind of geometric abstraction. But it has to play a causal role [in the model's behavior]: It's connected to these inputs but not those, and these outputs and not those.

That feels like something you could start to call meaning. It's about figuring out how to find this structure and establish relationships, so that once we get it all in place, we can apply it to questions like "Does it know what apple means?"
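To make that concrete, here is a minimal sketch, not taken from the interview or from the researcher's code, of what treating a concept as a "geometric abstraction" in activation space can look like. The layer width, the activation, and the candidate "apple" direction below are random placeholders; the causal test would be whether erasing that direction changes apple-related outputs while leaving unrelated outputs alone.

```python
import torch

hidden_size = 768                          # assumed layer width
h = torch.randn(hidden_size)               # a hidden activation (placeholder)
apple_dir = torch.randn(hidden_size)       # candidate "apple" direction (placeholder)
apple_dir = apple_dir / apple_dir.norm()   # normalize to a unit vector

# Reading the concept off the activation: how strongly is "apple" present?
apple_score = h @ apple_dir

# Erasing the concept: remove the component of the activation that lies
# along the candidate direction. The causal test is then whether running
# the model forward from this edited state changes apple-related outputs
# while leaving unrelated ones alone.
h_without_apple = h - apple_score * apple_dir
```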

Yes, one result involves how a language model retrieves a piece of information. If you ask the model, "What is the capital of France?" it needs to say "Paris," and "What is the capital of Poland?" should return "Warsaw." It very readily could just memorize all these answers, and they could be scattered all around [within the model]; there's no real reason it needs to have a connection between those things.

Instead, we found a small place in the model where it basically boils that connection down into one little vector. If you add it to "What is the capital of France," it will retrieve "Paris"; and that same vector, if you ask "What is the capital of Poland," will retrieve "Warsaw." It's like this systematic retrieve-capital-city vector.
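As an illustration of the kind of intervention being described, the snippet below is a sketch under assumptions, not the authors' actual setup: it adds a single vector to one hidden layer of a small open model and checks what the model then produces for a capital-city prompt. The model name, the layer index, and the vector itself (a zero placeholder here) are all assumptions; in the work described, the vector is extracted from the model's own activations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"            # assumed small model, purely for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layer_idx = 6                  # assumed intervention layer
# In the experiments described, this vector comes from the model itself;
# here it is a zero placeholder so the sketch runs end to end.
capital_vector = torch.zeros(model.config.n_embd)

def add_vector_hook(module, inputs, output):
    # Add the vector to the hidden state at the last position. (In this
    # simplified version it is applied on every forward pass during decoding.)
    hidden = output[0] if isinstance(output, tuple) else output
    hidden[:, -1, :] += capital_vector
    return output

handle = model.transformer.h[layer_idx].register_forward_hook(add_vector_hook)

prompt = "Q: What is the capital of Poland? A:"
ids = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=5, do_sample=False,
                         pad_token_id=tok.eos_token_id)
print(tok.decode(out[0][ids["input_ids"].shape[1]:]))

handle.remove()                # detach the hook once the intervention is done
```

Comparing the completion with and without the hook attached is the kind of causal evidence the interview points to: the same added vector should push a France prompt toward "Paris" and a Poland prompt toward "Warsaw."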

That's a really exciting finding because it seems like [the model is] boiling down these little concepts and then applying general algorithms over them. And even though we're looking at these really [simple] questions, it's about finding evidence of these raw ingredients that the model is using. In this case, it would be easier to get away with memorizing; in many ways, that's what these networks are designed to do. Instead, it breaks [information] down into pieces and reasons about it. And we hope that as we come up with better experimental designs, we might find something similar for more complicated kinds of concepts.

Read more here:

Does AI Know What an Apple Is? She Aims to Find Out. - Quanta Magazine
