In this opinion piece, Charlotte Halkett from buzzvault looks at how valuable data has become, and how skewed or fake data can be bad for business. How the insurance industry gathers, sifts and acts upon its wealth of data will make a big difference to customer relationships in the future.
How much has changed in the last 12 months! A year ago, The Economist published its article “The world’s most valuable resource is no longer oil, but data” (which, interestingly, also put forward a case for how the internet giants should be regulated).
Now the world is talking about data. The Facebook/Cambridge Analytica scandal has catapulted the value of personal data into mainstream consciousness, and enthusiastic GDPR emails have been bombarding everyone’s inbox like a meteorite storm in a Sci-Fi movie. Everyone knows that data is desirable, delicious, keenly pursued.
But I want to underline one area of data that isn’t being talked about enough right now. And that’s bad data. What about when your data is a poor representation of reality, collated with inherent bias or, even worse, just plain wrong?
Data isn’t collected for no reason: it’s there to form the foundation of decisions. Brainy people will build complex models to scenario-test business cases. Machine learning will scrape through your data, finding patterns that cluster your population into groups. Artificial Intelligence will learn from your data, using it to make decisions about the optimal way to act.
But all those processes amplify what’s underneath, and there is great danger in smart models amplifying bad, unreliable data into bad, unreliable decision-making. If my very clever system designed by clever people tells me that the sky is always blue, how can it ever be otherwise? (Better check what time of day you took that data sample…)
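To make the blue-sky point concrete, here is a minimal sketch of how the timing of a sample shapes the conclusion. The `sky_colour` rule and the hour ranges are invented purely for illustration; they are not real observations.

```python
from collections import Counter

def sky_colour(hour):
    """Toy rule for the sky's colour at a given hour (an assumption, not real data)."""
    if 8 <= hour < 18:
        return "blue"
    if hour in (6, 7, 18, 19):
        return "orange"
    return "black"

def colour_share(hours):
    """Share of each sky colour across the sampled hours."""
    counts = Counter(sky_colour(h) for h in hours)
    total = sum(counts.values())
    return {colour: n / total for colour, n in counts.items()}

all_hours = list(range(24))
office_hours = [h for h in all_hours if 9 <= h < 17]  # sampled only while at work

biased = colour_share(office_hours)  # every sampled hour is "blue"
full = colour_share(all_hours)       # blue is well under half of the day

print("Office-hours sample:", biased)
print("Full-day sample:", full)
```

A model trained only on the office-hours sample would be entirely consistent with its data and still wrong about the sky.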
This isn’t a new conclusion. Statisticians can tell you it’s been around longer than they have. It is often called, uncharacteristically simply for statistics: cr*p in = cr*p out.
So why bring it up now? Because the processes we use for modelling data are becoming ever more powerful and are used by an ever-widening group of people. I have had a number of encounters this year with so-called experts who seem to have a poor grasp of data hygiene and, in particular, data bias. Having a lot of data is not the end of the story.
CAN ARTIFICIAL INTELLIGENCE DEVELOP AN UNCONSCIOUS BIAS?
Last October, Google’s AI lead John Giannandrea described bias as the real danger associated with AI systems.
“The real safety question, if you want to call it that, is that if we give these systems biased data, they will be biased”
Data used to train machines can be (and probably is) biased. At a recent Machine Learning and AI conference I attended, a question to an expert panel about what could be done to minimise bias in machine-learning algorithms was met with confused faces and even, shockingly, laughter from the audience. There’s such a lack of understanding that even some of those driving the industry don’t recognise that the issue exists. I have even found people who seem actually offended by the idea that there may be unconscious bias in their thinking. Why?
We are intelligent beings who don’t exist in a vacuum. You can only gain by understanding what data you have and why it came about. Then you can understand the potential limitations and opportunities in how you are training your models.
There are particular risks where computational techniques trained on biased data or processes may amplify existing societal bias. For example, a recent study showed that the error rate in facial recognition software varies by both skin type and gender: the analysis found error rates of 0.8 percent for light-skinned men and 34.7 percent for dark-skinned women. The consequences of this could be far-reaching as the technology is rolled out. Or how about AI for screening job candidates? If your model is looking for the historical characteristics of people in a role, it could amplify past bias, such as gender discrimination.
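One simple check for this kind of problem is to report error rates per group rather than a single headline figure. The sketch below uses invented counts for two hypothetical groups (they are not the study's data) to show how a respectable overall error rate can hide a large gap.

```python
# Hypothetical evaluation counts, invented for illustration only.
results = {
    "group_a": {"n": 900, "errors": 9},   # well represented in the test set
    "group_b": {"n": 100, "errors": 30},  # under-represented in the test set
}

def error_rate(errors, n):
    """Fraction of cases the model got wrong."""
    return errors / n

total_errors = sum(g["errors"] for g in results.values())
total_cases = sum(g["n"] for g in results.values())

overall = error_rate(total_errors, total_cases)
per_group = {name: error_rate(g["errors"], g["n"]) for name, g in results.items()}

print(f"Overall error rate: {overall:.1%}")
for name, rate in per_group.items():
    print(f"{name}: {rate:.1%}")
```

Because group_a dominates the test set, the overall figure sits close to its error rate, and the much higher rate for group_b only shows up once the results are split out.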
The insurance industry is particularly vulnerable to poor data and bias because it tries to assess future risk based on past records. It’s vital to understand how your data was built and what it really means, or you might make costly decisions.
Before a skyscraper is built, the foundations must be solid. So don’t just prioritise data in your organisation – prioritise quality data and understand how and why it came about. What bias may exist in the way it was collected? Then go and build that gorgeous stats model with your eyes open.