Rubbish In, Rubbish Out – Is accuracy in Artificial Intelligence as simple as we think?
What do we want from Artificial Intelligence? Do we want the kind of hyper-intelligent, almost omniscient electronic being we have become used to (and frightened of) in popular culture? Or do we expect something emotionless and hyper-neutral in the mould of HAL 9000 in ‘2001: A Space Odyssey’ (although the two are not mutually exclusive)?
If hyper-neutrality is what we want or expect, the next question is ‘how do we achieve this?’, especially given that humans are inherently biased, mistaken or misinformed.
Bias in images…an experiment
An article in The Lancet, ‘Reflections before the storm: the AI reproduction of biased imagery in global health visuals,’ gives an idea of how difficult this can be to achieve. In response to a study on the effect of stereotypes and tropes used in images to illustrate world health stories, Arsenii Alenichev, Patricia Kingori and Koen Peeters Grietens used generative AI to investigate how deeply embedded such tropes are within global health images.
To do this they attempted to switch these tropes around, using image-generating prompts via Midjourney Bot Version 5.1 (released in May 2023) to create ‘photographs’ of black doctors treating suffering white children.
The task proved all but impossible. While the AI could generate images of suffering white children and images of black doctors, it was unable to merge the two into an image of a black doctor treating a suffering white child.
The sole image it produced that met the set task showed a tribal doctor, dressed in an extreme stereotype of traditional African dress, treating a white child.
AI as bias spreader
Elsewhere, it has been found that humans adopt the biases they encounter while working with biased AI systems, and replicate them even when AI is no longer involved in the decision-making process.
Lucía Vicente and Helena Matute, of Deusto University in Bilbao, Spain, set up a series of three experiments in which subjects undertook a medical-themed classification task, with and without the help of AI. The result was that people working on a diagnostic task with the help of a biased AI made decisions influenced by that bias. More than that, they continued to show the bias even when the AI was no longer involved.
If an AI learns from biased data, it will reproduce that bias. If people work with a biased AI, they will inherit its biases and go on to make the same mistakes the AI has already made.
It also underlines how strong our preconceived ideas of AI and its reliability are.
‘Sexist’ credit card and other stories
Apple’s experience with credit cards highlights this point.
In August 2019, Apple launched its credit card, issued and backed by Goldman Sachs and designed to work with Apple devices and Apple Pay. On November 7 that year, customers, including Apple co-founder Steve Wozniak, began complaining that the card appeared to offer smaller lines of credit to women than it did to men.
Goldman claimed the card could not possibly discriminate along lines of gender because…gender was not used as an input. There was, it appeared, no way for the artificial intelligence to know the gender of any customer who applied for credit.
It quickly became obvious that Goldman Sachs had made a mistake. While it had explicitly removed gender as a category in issuing credit, it had not gone far enough in identifying data that could act as a proxy for gender.
Women tend to hold more credit cards than men, and to have higher credit utilisation. Some claim women are more likely than men to go bankrupt. It is possible that this and similar data (women often use different retail outlets than men) were learnt by the algorithm and then applied to everyone who applied for an Apple Card.
Gender is not the only attribute for which proxies can be found without resorting to a blunt categorisation. Whether an applicant owns a Mac or a PC can shift an assessment of creditworthiness; a residential address can stand in for race.
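How easily can such proxies hide in supposedly neutral data? One simple test, sketched below in Python with entirely hypothetical column names and a hypothetical dataset (this is an illustration of the idea, not Goldman Sachs’ actual data or process), is to see whether the removed attribute can be predicted from the features that remain. If it can, a model trained on those features can rediscover it.

```python
# Sketch: does a 'gender-blind' dataset still leak gender through proxies?
# The file and all column names are hypothetical, for illustration only.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

applicants = pd.read_csv("credit_applicants.csv")  # hypothetical dataset

# Features a credit model might be allowed to see (gender deliberately excluded).
neutral_features = ["num_credit_cards", "credit_utilisation",
                    "retail_spend_mix", "device_type"]
X = pd.get_dummies(applicants[neutral_features])  # one-hot encode categoricals
y = applicants["gender"]  # held back purely for this audit, never for scoring

# If these features predict gender much better than chance, they act as
# proxies for it, and a credit model trained on them can rediscover the bias.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print(f"Gender predicted from 'neutral' features with {scores.mean():.0%} accuracy")
```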
Nor is Apple the first organisation (and it will doubtless not be the last) to be embarrassed by its algorithm. Amazon was forced to pull a hiring algorithm because of gender bias, and the UK government has been questioned over how one of its algorithms works after Bulgarian nationals living in Britain were flagged up as potential fraudsters by a Department for Work and Pensions algorithm. Tellingly, the Department says the algorithm does not take account of nationality.
As we have seen, removing a single category from an algorithm’s inputs will not prevent it from reaching unfair decisions. Indeed, removing such categories can make auditing the software for bias somewhat harder.
Producing an algorithm free from such biases is not easy, given the way information can quietly describe us as individuals without baldly stating what we are.
University of San Francisco professor Rachel Thomas argues that these attributes should be included precisely so that organisations can measure the software for bias. For example, the data used by an algorithm could be checked for bias before it is fed into the algorithm, not only when the algorithm produces an output.
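A minimal sketch of what such a pre-training check might look like, using the pandas library and invented column names (‘gender’, ‘approved’, ‘credit_limit’); it illustrates the idea rather than any organisation’s actual auditing process:

```python
# Sketch: audit historical decision data for bias before any model sees it.
# The file and column names ('gender', 'approved', 'credit_limit') are invented.
import pandas as pd

history = pd.read_csv("historical_credit_decisions.csv")

# Compare outcomes across the sensitive attribute in the raw data.
by_group = history.groupby("gender").agg(
    approval_rate=("approved", "mean"),
    median_limit=("credit_limit", "median"),
    applicants=("approved", "size"),
)
print(by_group)

# Crude 'four-fifths rule' style check: flag any group whose approval rate
# falls below 80% of the best-treated group's rate.
ratio = by_group["approval_rate"] / by_group["approval_rate"].max()
if (ratio < 0.8).any():
    print("Warning: historical decisions are skewed; a model trained on them "
          "unchanged is likely to learn the same skew.")
```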
Explaining the ‘Black Box’
Outside of academic experiments, there are a number of cases where AI has appeared to fail because it learns bias or is unable to identify risk. Among the proffered solutions are Explainable AI (XAI) and Transparent AI (TAI).
Transparent AI is the building of ethical, lawful and robust AI systems. Key to this is XAI, a methodology used to ensure that the results produced by AI-based systems are understandable to people, an important step in developing trust in what is often a mysterious technology.
In recent years, organisations such as The Royal Society in the UK, the US Defense Advanced Research Projects Agency (DARPA), China’s State Council and the government of Finland have all organised events and programmes to advance explainability and interpretability in AI.
Machine learning models are commonly classified as transparent or opaque. Transparent models, such as linear and logistic regression, k-nearest neighbours and decision trees, produce results that can be explained by inspecting the model itself. Opaque models, such as convolutional neural networks (CNN), recurrent neural networks (RNN), support vector machines (SVM) and random forests (RF), are by their ‘black box’ nature far harder to understand. Despite their accuracy, identifying how they arrived at their results can be all but impossible. Even here, there are methods of extracting information to understand how a decision has been reached, often applied after the AI has accomplished its task. These ‘post-hoc explainability’ approaches divide into ‘model-agnostic’ and ‘model-specific’ techniques.
As the name suggests, model-agnostic techniques can be applied to any machine learning model, even the most complex. Model-specific techniques can only be used with a single model or class of models.
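To make this concrete, here is a minimal sketch of one widely used model-agnostic, post-hoc technique: permutation importance, as implemented in scikit-learn, applied to a random forest (one of the opaque models listed above). The dataset is scikit-learn’s bundled breast-cancer sample, chosen purely for convenience; the point is that the technique needs only the model’s predictions, not its internals.

```python
# Sketch: model-agnostic, post-hoc explanation of an opaque model.
# Permutation importance measures how much the score drops when one feature
# is shuffled; it needs only predictions, so it works for any model.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: accurate, but its internal reasoning is hard to inspect.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Post-hoc and model-agnostic: shuffle each feature in turn and measure the
# drop in test accuracy caused by destroying that feature's information.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, importance in sorted(zip(X.columns, result.importances_mean),
                               key=lambda pair: -pair[1])[:5]:
    print(f"{name}: {importance:.3f}")
```

The output is a ranking of which inputs the opaque model actually relies on, obtained without opening the black box.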
Artificial Intelligence can be a complex field, but it doesn’t have to be threateningly mysterious. With good, reliably provenanced data to train with, and the right interpretation techniques, even the most complex AI can be safely and productively used.
What models (if any) can we use to eliminate bias from artificial intelligence? Given the apparently chaotic nature of cause and effect in AI, how far can a mechanistic approach go in removing bias? How far do we have to go before we can create an AI that can assess information with the subtlety and nimbleness of a human? Humans speak human and AI speaks maths; what gets lost in translation between the two?