Data bias is giving way to training bias

Andrew Marble
Feb 24, 2024

Google was in the news this week for the way its Gemini AI application generated images with racial bias. I’m not going to dwell on that; there are plenty of articles [1]. What’s interesting, though, is that the traditional concern (for AI, traditional means from two years ago) was about big AI models perpetuating bias in the training data. It’s been repeatedly demonstrated (and is obvious) that datasets of images or text scraped from the internet contain many different biases and stereotypes. And since stereotyping is literally what AI does, a raw model trained on this data is going to do things like assume someone of a certain profession is male or female, or be more likely to depict a CEO as an old white man. Someone using a pretrained model for an application that exposes users to this bias would run the risk of passing it on.

What’s different now, and what Gemini could be the poster child for, is that the dominant (though not the only) mode of bias in AI is moving towards the biases of the creators rather than the biases of internet data (for lack of a better term). Aspects of the images Gemini generates, like the race and gender of the people in them, are skewed towards what its creators trained it to generate, not towards the distribution of billions of internet images. The same is true for “aligned” language models. They are doing what they’re told (through fine-tuning) and not what they might have done based on how they were pre-trained.

Overall that’s a great shift. If we can make a model do something stupid but repeatable, we’re on track to get it to do something useful instead of just acting as an autocomplete. But it also means that the responsibility for behavior moves to the team building the model, and the risk profile is different.

One risk is the political or brand impact of training decisions. A model that refuses to answer benign questions ends up subject to ridicule. That will become less relevant for end applications that are focused on narrower use cases. A general-purpose model like Llama-chat giving a lecture on misinformation when asked to generate a poem about the moon landing being fake is well over the top. A customer service chatbot refusing to talk about anything but customer service is normal and expected [2].

The more concerning risk to me is side effects as models are fine-tuned to remove unwanted biases and promote particular outputs. Superficial fixes may cause unexpected behavior that ends up as bad as or worse than the issue they were trying to fix. This appears to be what happened at Google. These models are still computer programs and risk responding to the letter of their training instead of the spirit. A very careful approach to training, evaluation, and red-teaming is going to be needed. That possibly includes making some tough decisions about the kind of output that’s appropriate, and looking beyond first-order notions of bias. Meta’s approach of comparing “helpfulness” vs “harmfulness” and looking at how the two trade off is an example [3].

I think it’s especially important to be clear about what decisions we’re expecting AI models to make, and how we read into them. If I ask an image generation model to draw me a picture of a person in 1930, I’m giving it an underspecified problem, and an expectation of stereotyping is implicit in the question [4]. There are several obvious approaches a person might use to handle this: draw something reflective of a historically significant event at that time and explain why you chose this depiction (this could even be an opportunity for some education about a historical event); draw a historical figure; or ask for more information, such as what country or what setting. At least some part of the problem is the expectation that it’s possible to generate output that’s universally appropriate without sufficient input, and that we allow the models to be used in this way.

We’ve moved on from accepting the raw bias of training data to having to make our own choices about the AI systems we build. But even if humans are now firmly in control, this applies mostly to the recent instruction-tuned models. There are still plenty of ways for models to be biased in unwanted or inappropriate ways. And it applies to people who know what they’re doing, which few if any do until hindsight kicks in. Google has one of the most prominent AI research groups in the world and it can still make mistakes.