Seeing Our Reflection in LLMs

By Stephanie Kirmer | Towards Data Science | March 2024

Photo by Vince Fleming on Unsplash

By now, I'm sure most of you have heard the news about Google's new LLM, Gemini, generating pictures of racially diverse people in Nazi uniforms. This little news blip reminded me of something I've been meaning to discuss: what happens when models have blind spots, and we apply expert rules to the predictions they generate to avoid returning something wildly outlandish to the user.

This sort of thing is not that uncommon in machine learning, in my experience, especially when you have flawed or limited training data. A good example I remember from my own work was predicting when a package was going to be delivered to a business office. Mathematically, our model would be very good at estimating exactly when the package would get physically near the office, but sometimes truck drivers arrive at destinations late at night and then rest in their truck or in a hotel until morning. Why? Because no one's in the office to receive/sign for the package outside of business hours.

Teaching a model about the idea of business hours can be very difficult, and the much easier solution was just to say, "If the model says the delivery will arrive outside business hours, add enough time to the prediction that it changes to the next hour the office is listed as open." Simple! It solves the problem and it reflects the actual circumstances on the ground. We're just giving the model a little boost to help its results work better.
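
To make that concrete, here's a minimal sketch of what such a post-prediction rule could look like, assuming a hypothetical Monday-to-Friday, 9-to-5 office schedule. The function names and schedule are invented for illustration, not taken from any real delivery system.

from datetime import datetime, timedelta

# Hypothetical business hours: Monday-Friday, 09:00-17:00 local time.
OPEN_HOUR, CLOSE_HOUR = 9, 17

def is_open(ts: datetime) -> bool:
    """True if the office is listed as open at this timestamp."""
    return ts.weekday() < 5 and OPEN_HOUR <= ts.hour < CLOSE_HOUR

def next_open_time(ts: datetime) -> datetime:
    """Roll a timestamp forward, hour by hour, to the next open hour."""
    ts = ts.replace(minute=0, second=0, microsecond=0)
    while not is_open(ts):
        ts += timedelta(hours=1)
    return ts

def apply_business_hours_rule(predicted_arrival: datetime) -> datetime:
    """If the model predicts an arrival outside business hours,
    push the prediction to the next hour the office is open."""
    return predicted_arrival if is_open(predicted_arrival) else next_open_time(predicted_arrival)

# A 2:30 AM Saturday prediction becomes 9:00 AM the following Monday.
print(apply_business_hours_rule(datetime(2024, 3, 2, 2, 30)))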

However, this does cause some issues. For one thing, now we have two different model predictions to manage. We can't just throw away the original model prediction, because that's what we use for model performance monitoring and metrics. You can't assess a model on predictions after humans got their paws in there; that's not mathematically sound. But to get a clear sense of the real-world model impact, you do want to look at the post-rule prediction, because that's what the customer actually experienced/saw in your application. In ML, we're used to a very simple framing, where every time you run a model you get one result or set of results, and that's that, but when you start tweaking the results before you let them go, then you need to think at a different scale.
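
In practice, that can be as simple as carrying both numbers through the pipeline and pointing each metric at the right one. A rough sketch, with field names invented for illustration:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class DeliveryPrediction:
    # Keep both versions of the prediction so each can feed its own metrics.
    package_id: str
    raw_prediction: datetime       # straight from the model; used for model monitoring
    adjusted_prediction: datetime  # after the business-hours rule; what the customer saw

def model_error_hours(pred: DeliveryPrediction, actual_arrival: datetime) -> float:
    """Model-quality metric: compare the untouched prediction to reality."""
    return abs((actual_arrival - pred.raw_prediction).total_seconds()) / 3600

def business_error_hours(pred: DeliveryPrediction, actual_arrival: datetime) -> float:
    """Business-impact metric: compare what the customer was told to reality."""
    return abs((actual_arrival - pred.adjusted_prediction).total_seconds()) / 3600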

I kind of suspect that a form of this is what's going on with LLMs like Gemini. However, instead of a post-prediction rule, it appears the smart money says Gemini and other models are applying secret prompt augmentations to try to change the results the LLMs produce.
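
Nobody outside these companies knows exactly what those augmentations say, but mechanically the idea could be as blunt as quietly appending extra instructions to the user's prompt before it ever reaches the model. The snippets below are invented purely for illustration:

import random

# Entirely hypothetical augmentation text: illustrative only, not what any vendor actually injects.
HIDDEN_AUGMENTATIONS = [
    "Depict people of a diverse range of races and genders.",
    "Avoid stereotypical depictions of professions.",
]

def augment_prompt(user_prompt: str) -> str:
    """Silently append an instruction the user never sees, then send the result to the model."""
    return f"{user_prompt}. {random.choice(HIDDEN_AUGMENTATIONS)}"

print(augment_prompt("Show me a doctor"))
# e.g. "Show me a doctor. Depict people of a diverse range of races and genders."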

In essence, without this nudging, the model will produce results that are reflective of the content it has been trained on. That is to say, the content produced by real people: our social media posts, our history books, our museum paintings, our popular songs, our Hollywood movies, etc. The model takes in all that stuff, and it learns the underlying patterns in it, whether they are things we're proud of or not. A model given all the media available in our contemporary society is going to get a whole lot of exposure to racism, sexism, and myriad other forms of discrimination and inequality, to say nothing of violence, war, and other horrors. While the model is learning what people look like, and how they sound, and what they say, and how they move, it's learning the warts-and-all version.


This means that if you ask the underlying model to show you a doctor, it's probably going to be a white guy in a lab coat. This isn't just random; it's because in our modern society white men have disproportionate access to high-status professions like being doctors, because they on average have access to more and better education, financial resources, mentorship, social privilege, and so on. The model is reflecting back at us an image that may make us uncomfortable because we don't like to think about that reality.

The obvious argument is, "Well, we don't want the model to reinforce the biases our society already has; we want it to improve representation of underrepresented populations." I sympathize with this argument, quite a lot, and I care about representation in our media. However, there's a problem.

It's very unlikely that applying these tweaks is going to be a sustainable solution. Recall the story I started with about Gemini. It's like playing whac-a-mole, because the work never stops: now we've got people of color being shown in Nazi uniforms, and this is understandably deeply offensive to lots of folks. So, maybe where we started by randomly appending "as a black person" or "as an indigenous person" to our prompts, we have to add something more to exclude cases where that's inappropriate. But how do you phrase that in a way an LLM can understand? We probably have to go back to the beginning, think about how the original fix works, and revisit the whole approach. In the best case, applying a tweak like this fixes one narrow issue with outputs, while potentially creating more.

Let's play out another very real example. What if we add to the prompt, "Never use explicit or profane language in your replies, including [list of bad words here]"? Maybe that works for a lot of cases, and the model will refuse to say bad words that a 13-year-old boy is requesting to be funny. But sooner or later this has unexpected additional side effects. What if someone's looking for the history of Sussex, England? A naive filter will balk at the substring hiding inside that place name. Alternatively, someone's going to come up with a bad word you left out of the list, so maintaining it becomes constant work. What about bad words in other languages? Who judges what goes on the list? I have a headache just thinking about it.
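
Here's a toy sketch of that kind of blocklist rule, with an invented word list and function name, just to show how it leaks: a naive substring check flags "Sussex," while anything not on the list sails straight through.

# Hypothetical blocklist filter: a naive substring check, for illustration only.
BLOCKLIST = {"sex", "damn"}

def violates_blocklist(text: str) -> bool:
    """Flag text containing any blocklisted word as a substring."""
    lowered = text.lower()
    return any(bad_word in lowered for bad_word in BLOCKLIST)

print(violates_blocklist("Tell me about the history of Sussex, England"))  # True: "Sussex" contains "sex"
print(violates_blocklist("What the heck"))  # False: synonyms, misspellings, and other languages slip through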

These are just two examples, and I'm sure you can think of more such scenarios. It's like putting band-aid patches on a leaky pipe: every time you patch one spot, another leak springs up.

So, what is it we actually want from LLMs? Do we want them to generate a highly realistic mirror image of what human beings are actually like and how our human society actually looks from the perspective of our media? Or do we want a sanitized version that cleans up the edges?

Honestly, I think we probably need something in the middle, and we have to continue to renegotiate the boundaries, even though it's hard. We don't want LLMs to reflect the real horrors and sewers of violence, hate, and more that human society contains; that is a part of our world that should not be amplified even slightly. Zero content moderation is not the answer. Fortunately, this motivation aligns with the desire of the large corporate entities running these models to be popular with the public and make lots of money.


However, I do want to continue to make a gentle case for the fact that we can also learn something from this dilemma in the world of LLMs. Instead of simply being offended and blaming the technology when a model generates a bunch of pictures of a white male doctor, we should pause to understand why that's what we received from the model. And then we should debate thoughtfully about whether the response from the model should be allowed, and make a decision that is founded in our values and principles, and try to carry it out to the best of our ability.

As I've said before, an LLM isn't an alien from another universe; it's us. It's trained on the things we wrote/said/filmed/recorded/did. If we want our model to show us doctors of various sexes, genders, races, etc., we need to make a society that enables all those different kinds of people to have access to that profession and the education it requires. If we're worrying about how the model mirrors us, but not taking to heart the fact that it's us that needs to be better, not just the model, then we're missing the point.

