
Rob Bevington, Head of Data Science at Synectics Solutions, takes a look at the pros and cons of supervised and unsupervised learning.
According to the Association of British Insurers (ABI), 49,000 motor insurance fraud cases were detected in 2021, valued at £577 million. Motor fraud continued to make up 60% of all claims fraud detected by Aviva in 2022.
In recent years, Artificial Intelligence (AI) has completely revolutionised how the industry operates. One such use case is identifying fraud at every touchpoint in the customer journey. Here, the focus is on First Notice Of Loss (FNOL) to enable same-day payments – improving customer experience while minimising fraud.
In the customer journey, there are several points where data is continuously provided in myriad formats, including structured and unstructured data and images. At these points, AI models are being deployed to detect patterns and anomalies that may indicate fraud.
Two modelling techniques are having a transformative impact in motor claims: supervised and unsupervised machine learning.
Supervised Learning
Supervised machine learning involves training a model on ‘labelled’ claims – claims where the investigation outcome is known, for example Confirmed Fraud, Defeated, Partial Loss or Clear. New claims are then scored against the supervised model to highlight any high degree of similarity with adverse claims. It’s an effective approach: one customer we work with was able to capture over 90% of the fraud from less than 2% of the data.
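To make this concrete, a supervised claims model can be sketched along these lines. This is a minimal illustration using scikit-learn on synthetic data; the features, outcome rule and model choice are assumptions for the example, not a description of any production pipeline.

```python
# Minimal sketch of a supervised fraud model: train on labelled claims,
# then score new claims by their similarity to historic adverse outcomes.
# Features and labels below are synthetic and purely illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
# Illustrative claim features, e.g. claim value, policy age, prior claims
X = rng.normal(size=(n, 3))
# Labels from investigation outcomes: 1 = adverse (e.g. Confirmed Fraud), 0 = Clear
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 1.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# Score new claims: a higher probability means greater similarity to known fraud
scores = model.predict_proba(X_test)[:, 1]
```

In practice the scores would feed a referral threshold, so that only the highest-risk slice of claims is routed to investigators.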
Supervised models can be applied almost anywhere. The only requirements are a reasonable amount of data and the label/outcome you want to predict. This could include fraud prevention, marketing optimisation, pricing etc. They are also predictable and explainable, aiding model governance approval as AI regulation increases.
How can they be used? In a number of ways. Supervised models can directly target specific challenges (e.g. staged accident fraud), provide a comprehensive fraud defence and consistently and quickly produce quality referrals. This point about ‘quality referrals’ is key. We’ve seen a client save £6 million in the first month of adopting supervised learning models to help them identify fraud and prioritise investigations. Another reduced referrals by 50% and increased conversion rates to fraud by 30%.
Unsupervised Learning
Unsupervised learning involves building models on data which hasn’t yet been labelled. It relies on statistical techniques like anomaly detection to identify claims that are not yet marked as adverse but are nonetheless unusual compared to the typical claim portfolio. An algorithm discovers hidden patterns and data groupings without human intervention.
This type of modelling is well-suited to highlighting new fraud modus operandi and a good fit for huge-volume use cases like quotes and transactions. Unsupervised models are also useful where labelled data is not available, such as when launching into new markets. One of our customers has been able to identify incremental cases of fraud and anti-money laundering concerns at a better than 2:1 false positive ratio using an anomaly detection-based model.
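An anomaly detection approach of this kind can be sketched as follows. This is a minimal illustration using an Isolation Forest on synthetic, unlabelled data; the features, cluster shapes and contamination setting are assumptions for the example only.

```python
# Minimal sketch of unsupervised anomaly detection on an unlabelled
# claim portfolio using an Isolation Forest. Data is synthetic and
# purely illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Most claims look typical...
typical = rng.normal(loc=0.0, scale=1.0, size=(980, 3))
# ...while a few are atypical (e.g. unusually high value, odd timing)
atypical = rng.normal(loc=5.0, scale=1.0, size=(20, 3))
claims = np.vstack([typical, atypical])

# 'contamination' sets the expected share of claims to flag for review
detector = IsolationForest(contamination=0.02, random_state=0).fit(claims)
flags = detector.predict(claims)  # -1 = anomalous, 1 = typical
```

No label is needed at any point: the detector flags claims purely because they sit far from the bulk of the portfolio, which is exactly how novel fraud patterns can surface before any investigation outcome exists.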
However, unsupervised models do have some disadvantages. While the example I just gave showed a very good false positive rate, unsupervised learning deployed as a stand-alone model, without the right level and frequency of recalibration, can be prone to high false positive rates. Consequently, obtaining model governance approval from regulatory authorities for production use of unsupervised models is often difficult.
Difference between the two models
The key difference between the supervised and unsupervised models is the approach to using labelled datasets.
· Supervised modelling uses labelled datasets to check for adverse claims, whereas unsupervised modelling does not.
· In supervised modelling, the algorithm learns from the labelled datasets and makes predictions based on similarity to historic labelled behaviours, whereas unsupervised modelling highlights data that looks unusual or different in general. Unsupervised modelling therefore tends to be less accurate than supervised modelling.
· Unlike unsupervised modelling, supervised modelling requires human intervention to label the dataset in the first place.
· Unsupervised modelling can work autonomously to detect anomalies and inconsistent or atypical patterns or behaviours.
Data is King
Whether talking about supervised or unsupervised learning, we must always mention data. Regardless of modelling technique, it’s imperative that the dataset being analysed is relevant, timely and accurate for the problem being solved. It also needs to be the right blend.
Claims models which show optimal AUC (area under the curve), for instance, leverage core data comprising claimant information (including information obtained at quote and policy stage), accident details (structured and unstructured) and vehicle data, coupled with syndicated fraud data from National SIRA* – data from the latter adding a 30% increase to the predictive performance of models.
*National SIRA is the largest proprietary syndicated database of cross-sector customer risk intelligence in the UK, with close to half a billion records and over 6 million adverse records.
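The effect of adding a richer feature blend can be measured directly through AUC. The sketch below shows the mechanics on synthetic data, comparing a model trained on "core" features alone against one that also uses an extra feature block standing in for syndicated data. The data, features and uplift here are illustrative assumptions, not the 30% figure quoted above.

```python
# Illustrative sketch of measuring AUC uplift when an extra feature block
# (standing in for syndicated fraud data) is added to core claim features.
# All data is synthetic; the resulting uplift is not the article's figure.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
core = rng.normal(size=(n, 4))   # core claim features
extra = rng.normal(size=(n, 2))  # additional syndicated-style features
# The outcome depends on both blocks, so the extra features carry signal
y = (core[:, 0] + 2 * extra[:, 0] + rng.normal(size=n) > 1).astype(int)

idx_train, idx_test = train_test_split(np.arange(n), random_state=0)

def auc(features):
    # Fit on the training split, report AUC on the held-out split
    clf = LogisticRegression().fit(features[idx_train], y[idx_train])
    return roc_auc_score(y[idx_test], clf.predict_proba(features[idx_test])[:, 1])

auc_core = auc(core)
auc_both = auc(np.hstack([core, extra]))
assert auc_both > auc_core  # richer data lifts ranking performance
```

Comparing AUC on the same held-out claims is what makes the "right blend" claim testable: any candidate data source either moves the curve or it doesn't.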
Which works best?
It’s a legitimate question for insurers to ask, and one we put to the test with a Proof of Concept (PoC) project specifically designed to determine which technique proved more performant as an FNOL anti-fraud defence. Both models – supervised and unsupervised (data anomaly detection) – used the same base dataset for a fair comparison.
Analysing the results, it was clear that a combination of both as part of a holistic anti-fraud claims strategy yields optimal results. Of the cases classed as high risk by either model, 98.5% were flagged by only one of the two models, showing how well they complement each other. However, if forced to select either the supervised model or the anomaly detection model as a stand-alone deployment, the supervised model proved a clear winner – finding over five times more claims that went on to be confirmed as fraud than the anomaly model did.
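The combined defence described above can be sketched as a simple union of the two referral streams: a claim is flagged for review if either the supervised model or the anomaly detector considers it high risk. The models, features and thresholds below are illustrative assumptions, not the PoC configuration.

```python
# Sketch of a combined FNOL defence: flag a claim if EITHER the supervised
# model or the anomaly detector considers it high risk. Synthetic data;
# thresholds and model choices are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, IsolationForest

rng = np.random.default_rng(7)
n = 1000
X = rng.normal(size=(n, 3))
y = (X[:, 0] > 1.2).astype(int)  # synthetic "confirmed fraud" label

supervised = GradientBoostingClassifier().fit(X, y)
anomaly = IsolationForest(contamination=0.02, random_state=0).fit(X)

new_claims = rng.normal(size=(50, 3))
p_fraud = supervised.predict_proba(new_claims)[:, 1]
is_anomaly = anomaly.predict(new_claims) == -1

# Union of the two referral streams: since the two models tend to flag
# largely different claims, combining them widens fraud coverage
high_risk = (p_fraud > 0.5) | is_anomaly
```

Because the two models flag largely non-overlapping claims, the union widens coverage without either model having to do the other's job.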
No one size fits all
The reality is that both have their strengths and weaknesses. But what’s most important when talking AI models is that there is no ‘one size fits all’ answer. Choosing the right model, or combination of models, is inevitably linked to each organisation’s specific business focus, availability of labelled data, data characteristics, task objectives, scalability, outlier handling and so on.
AI has huge implications for changing the way the motor insurance industry detects and prevents fraud. The key at this stage is to explore options and tailor solutions rather than trying to adopt a modelling option that just doesn’t quite fit.