Insurance Risk Management

Risk management is central to the insurance industry. Insurance means that insurance companies take over risks from customers. Insurers consider every available quantifiable factor to develop profiles of high and low insurance risk. The level of risk determines insurance premiums. Generally, insurance policies involving factors with a greater risk of claims are charged at a higher rate. The more information insurers have at hand, the more accurately they can evaluate the risk of insurance policies. To this end, insurers collect a vast amount of information about policy holders and insured objects. Statistical methods and tools based on data mining techniques can be used to analyze or determine insurance policy risk levels.

Insurance Risk Analysis

This page describes the following insurance risk analysis methods:

  • Insurance risk factor profiling
  • Insurance predictive modeling
  • Insurance risk modeling
  • Insurance scoring
  • Insurance risk-level classification

Profiling of Risky Segments

Profiling insurance risk factors is very important. The Pareto principle suggests that 80% to 90% of insurance claims may come from 10% to 20% of the insurance segment groups. Profiling these segments can reveal invaluable information for insurance risk management. Insurance providers often collect a large amount of information on insured entities. Policy information (for automobile insurance, life insurance, general insurance, etc.) often consists of dozens or even hundreds of variables, involving both categorical and numerical data with noisy information. Profiling identifies the factors and variables that best summarize these segments.
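As a rough illustration of the Pareto-style concentration mentioned above, the following minimal Python sketch computes the share of all claims that comes from the top fraction of segments. The function name and the per-segment claim counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def claims_share_of_top_segments(segment_claims, top_fraction):
    """Share of all claims contributed by the top `top_fraction` of segments."""
    ordered = sorted(segment_claims, reverse=True)
    # Always include at least one segment, even for very small fractions.
    top_count = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:top_count]) / sum(ordered)


# Hypothetical number of claims in each of ten segments.
segment_claims = [420, 380, 40, 35, 30, 25, 25, 20, 15, 10]

share = claims_share_of_top_segments(segment_claims, 0.2)
print(f"Top 20% of segments account for {share:.0%} of claims")
```

With these made-up numbers, two of the ten segments account for most of the claims, which is the kind of segment that profiling aims to find and describe.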

Combinatorial Factor Analysis and Combinatorial Blowout!

Analyzing such a vast amount of information is an extremely difficult and challenging task. In conventional profiling methods, factor analysis is performed on a few (to several) variables at a time using statistical software. As the total number of variables increases, the number of combinations to be examined in this way grows combinatorially. When a large number of variables is involved, the number of combinations becomes too large, and thorough systematic analysis is all but impossible! A conventional workaround is to examine only the combinations that are thought likely to have influence. However, relying on hunches can leave out important factors without anyone noticing.
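To give a feel for this growth, the minimal Python sketch below counts how many combinations of variables exist when up to three variables are examined at a time. The variable counts (20, 50, 100) and the limit of three are illustrative assumptions. The count covers variable combinations only; combinations of individual values would be larger still.

```python
from math import comb


def subsets_to_examine(num_variables: int, max_size: int) -> int:
    """Number of variable combinations of size 1..max_size."""
    return sum(comb(num_variables, k) for k in range(1, max_size + 1))


for n in (20, 50, 100):
    print(f"{n} variables: {subsets_to_examine(n, 3)} combinations of up to 3")
```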

Fortunately, this problem can be overcome with CMS Hotspot Profiling Analysis Tools. Hotspot profiling analysis drills down into data systematically and accurately detects important relationships, co-factors, interactions, dependencies and associations among many variables and values, using Artificial Intelligence techniques such as incremental learning and searching, and generates profiles of the most interesting segments. Note that insurance premiums are normally set using profiles of risky (or very low-risk) policy holders. Hotspot analysis can identify profiles of high (and low) risk policies accurately through thorough analysis of all available insurance data. The following are examples of risk factor profiles. Note that the same approach can be applied to other quantifiable insurance risks, such as credit insurance, general insurance, and so on.

High risk healthcare coverage risk factor profiling

An insurance company keeps health care insurance coverage (or health insurance for short) or life insurance records in its database: gender, age, education, smoking, drinking, sun activity, height, weight (i.e., obesity level), claim payment, etc., as well as other contact information. The company wishes to know which health insurance groups are at the highest risk, i.e., have the highest claim ratio. The following is a possible output of hotspot profiling analysis:

[Figure: high-risk health insurance profiling]

High risk auto insurance risk factor profiling

An insurance company keeps records on motor vehicle insurance (or automobile insurance) in its database, containing driver and vehicle information: gender, age, license experience, education, occupation, drinking, smoking, mobile phone use; vehicle manufacturer, type, model, year of make, and so on. The company wishes to know which motor vehicle insurance groups are at the highest risk, i.e., have the highest average insurance payouts. The following is a possible output of hotspot profiling analysis:

[Figure: high-risk auto insurance profiling]
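The idea of ranking segments by risk can be sketched in a few lines of Python. This is a naive exhaustive version for illustration only, not the CMS hotspot algorithm: it enumerates every combination of up to two attribute values and ranks the resulting segments by claim ratio. The records, attribute names and the `profile_segments` function are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical policy records: (attributes, whether a claim was made).
policies = [
    ({"age": "18-25", "vehicle": "sports", "gender": "M"}, True),
    ({"age": "18-25", "vehicle": "sports", "gender": "F"}, True),
    ({"age": "18-25", "vehicle": "sedan", "gender": "M"}, False),
    ({"age": "26-60", "vehicle": "sports", "gender": "M"}, False),
    ({"age": "26-60", "vehicle": "sedan", "gender": "F"}, False),
    ({"age": "26-60", "vehicle": "sedan", "gender": "M"}, False),
    ({"age": "18-25", "vehicle": "sports", "gender": "M"}, True),
    ({"age": "26-60", "vehicle": "sedan", "gender": "F"}, True),
]


def profile_segments(records, max_factors=2, min_size=2):
    """Return (claim_ratio, size, segment) tuples, highest claim ratio first."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [claims, total]
    for attrs, claimed in records:
        items = sorted(attrs.items())
        for k in range(1, max_factors + 1):
            for combo in combinations(items, k):
                counts[combo][0] += int(claimed)
                counts[combo][1] += 1
    ranked = [
        (claims / total, total, segment)
        for segment, (claims, total) in counts.items()
        if total >= min_size  # ignore segments too small to be meaningful
    ]
    ranked.sort(key=lambda row: (-row[0], -row[1], row[2]))
    return ranked


top = profile_segments(policies)
for ratio, size, segment in top[:3]:
    print(f"{ratio:.0%} of {size} policies claimed: {dict(segment)}")
```

Because every combination is enumerated, this approach is only practical for a handful of variables, which is exactly the combinatorial problem described earlier.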

Insurance Risk Modeling

If the past is any guide for predicting future events, predictive modeling is an excellent technique for insurance risk management. Predictive models are developed from historical records of insurance policies, containing financial, demographic, psychographic, and geographic information, along with properties of insured objects. From this past policy information, predictive models can learn patterns of different insurance claim ratios and can then be used to predict risk levels of future insurance policies. It is important to note that this statistical process requires a substantially large number of historical records (or insurance policies) containing useful information. Useful information is anything that can act as a factor that differentially affects insurance claim ratios.

Insurance Predictive Modeling and Tools

CMS supports robust, easy-to-use predictive modeling tools. Users can develop models with the help of intuitive model visualization tools. Application and deployment of insurance risk models are also very simple. CMS supports the following predictive modeling tools:

  • Neural Network is a very powerful modeling tool. It generally offers the most accurate and versatile predictive models. It is very easy to develop neural network predictive models with CMS. Network visualization tools guide users through configuration, training, testing, and, more importantly, direct application to databases.
  • Cramer Decision Tree produces the most compact and thus most general decision trees. Decision trees can be used for predicting segmentation-based statistical probability of insurance claims.
  • Regression produces mathematical functions for predicting insurance claims. It can be very limiting as a general-purpose method for predicting insurance claims. However, regression may be useful for some ad hoc special-case modeling.

Pitfalls of classification modeling techniques

Classification models assign events to categorical cases, say, "risky" or "safe". Classification is primarily supported by decision trees, support vector machines (SVM), neural networks, and so on. Intuitively, classification is a very appealing approach because predictions are made in terms that anyone can understand without professional interpretation. However, there is a serious drawback in applying classification techniques to insurance risk management. The problem is that insurance claims are in general very low-ratio events, say, less than 10%. Developing predictive models from such skewed data is very difficult, especially with decision tree classification.

Decision trees develop predictive models by repeatedly segmenting the population into smaller groups. The dominant (most frequent) value of each segment is used as the predicted value for that segment. With two classes, the dominant value is the one held by over 50% of the segment population. Insurance customers are already well screened, so it is possible that no segment contains more than 50% risky customers! Even if such segments exist, they may be only slightly over 50%. A segment in which 49% of customers have a claim history will be predicted as "not risky", although it is a very high-risk group! Models of this type will have very low accuracy in predicting risky customers as "risky". Much worse, as a consequence, more non-risky customers may end up being classified as "risky". These are not very useful properties!

It is important to note that all classification techniques have this limitation when they report only the dominant class. To overcome this problem, some may be tempted to use tricks such as introducing extra instances of the rare class. However, such tricks necessarily distort the overall representation of the population, and the problem remains! A better approach is insurance scoring using statistical probability, described in the Insurance Risk Scoring section.
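The 50% pitfall can be seen with a small Python sketch. The segment names and claim counts below are hypothetical. The first function mimics a hard classifier that reports only the dominant class of a segment; the second simply reports the observed claims ratio.

```python
# Hypothetical terminal segments: name -> (policies with claims, total policies).
segments = {
    "young drivers, sports cars": (49, 100),
    "urban, high mileage": (30, 100),
    "rural, low mileage": (3, 100),
}


def majority_label(claimed, total):
    """Hard classification: 'risky' only if over 50% of the segment claimed."""
    return "risky" if claimed / total > 0.5 else "safe"


def claim_ratio(claimed, total):
    """Probability-style score: the observed claims ratio of the segment."""
    return claimed / total


labels = {name: majority_label(*counts) for name, counts in segments.items()}
scores = {name: claim_ratio(*counts) for name, counts in segments.items()}

for name in segments:
    print(f"{name}: label={labels[name]}, claims ratio={scores[name]:.2f}")
```

Every segment receives the same "safe" label, even though the claims ratios differ by more than a factor of ten. The ratio keeps the information that the label throws away.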

Do regression methods work?

Generally speaking, regression methods don't work well for complex modeling. This is especially true if the modeling data are severely skewed. They tend to produce rather random predictions. The following histograms compare different modeling techniques under severe data skew:

By Neural Network
Neural network is a very powerful modeling framework. As the figure shows, it can learn in great detail. Most green areas are located below 0.4. Most red areas are located above 0.4.
By Cramer Tree Segmentation
This result is from probability modeling using Cramer decision tree segmentation. Although it is not as good as the neural network, it still produces useful result patterns.
By Regression: General Linear Model
This result is produced with general linear regression models. With a general linear model of RR=0.99936, it produces totally useless predictive patterns! The figure shows no pattern in the distribution of reds.

Insurance Risk Scoring

Insurance risk scoring is a numerical rating of insurance policies. It measures the level of risk that a claim will be made on a policy. This section describes advanced insurance risk modeling and insurance scoring methods:

Method 1: Predicting claims-probability

A decision tree divides insurance customer segments into smaller sub-segments recursively. At each segment, the split is made in a way that boosts the proportion of either claimed policies or no-claim policies in each resulting sub-segment. This process repeats until no further improvement can be made.

[Figure: decision trees - statistical probability insurance scoring]

The above figure shows a CMS decision tree. Insurance segments are partitioned recursively in a way that increases the proportion of either claimed policies or no-claim policies. In the figure, reds represent claimed policy portions and greens represent no-claim policies. Nodes in red indicate that over 50% of the customers in the segment have claimed policies. Green nodes have less than 50% claimed policies.

For new insurance applications, when a customer's information is applied to the tree, it will normally lead to a terminal node segment. The claims ratio of that node is used as the insurance score of the customer or policy. If the segment had a 35% claims ratio in the past, the score will be 35% (or 0.35). For more information, please read Decision Tree Software.
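The scoring step described above can be sketched in Python as follows. This is a hand-built toy tree, not a CMS model: the `Node` class, the split fields (`age`, `years_licensed`), the thresholds and the claim counts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Internal nodes test one field against a threshold; leaves carry counts.
    field: Optional[str] = None
    threshold: Optional[float] = None
    below: Optional["Node"] = None        # branch taken when value < threshold
    at_or_above: Optional["Node"] = None  # branch taken otherwise
    claimed: int = 0                      # past policies with claims (leaf only)
    total: int = 0                        # all past policies (leaf only)

    def is_leaf(self) -> bool:
        return self.field is None


def insurance_score(node: Node, applicant: dict) -> float:
    """Walk to a terminal node and return its historical claims ratio."""
    while not node.is_leaf():
        if applicant[node.field] < node.threshold:
            node = node.below
        else:
            node = node.at_or_above
    return node.claimed / node.total


# Hypothetical tree with three terminal segments.
tree = Node(
    field="age", threshold=25,
    below=Node(claimed=35, total=100),
    at_or_above=Node(
        field="years_licensed", threshold=5,
        below=Node(claimed=12, total=100),
        at_or_above=Node(claimed=4, total=100),
    ),
)

print(insurance_score(tree, {"age": 22, "years_licensed": 3}))
```

The applicant under 25 lands in the segment with a 35% historical claims ratio, so the score is 0.35, matching the example in the text.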

Better method: Predicting relative claims risk levels

Tree-based insurance scoring provides coarse-level prediction. It lacks the accuracy that neural network models can produce. Neural network is a very powerful predictive modeling technique, modeled on animal nervous systems (e.g., the human brain). The heart of the technique is the (artificial) neural network. Neural networks can learn to predict in detail with high accuracy. The following shows the neural network module of CMS:

[Figure: neural network for credit scoring]

A neural network works differently from a decision tree. It can be trained to predict either relative claim levels or expected claim amounts. When the former is used, the network predicts the relative level of insurance claims. The latter predicts expected claim amounts. The following are CMS histograms showing the distribution of insurance scores predicted by a neural network insurance scoring model. Note that reds are claimed policies and greens represent no-claim policies. Clearly, the neural network model gives claimed policies higher scores and no-claim policies lower scores. By analyzing the distribution of scores, claims probability may be deduced.
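One simple way to deduce claims probability from a score distribution is to bin the scores and compute the proportion of claimed policies in each bin. The Python sketch below assumes scores lie between 0 and 1. The function name, the bin count and the scored policies are hypothetical, and this is not presented as the method CMS uses.

```python
def claims_probability_by_bin(scored, num_bins=5):
    """scored: iterable of (score in [0, 1], claimed) pairs.

    Returns one (bin_low, bin_high, claims_ratio) tuple per bin;
    claims_ratio is None for bins with no policies.
    """
    claims = [0] * num_bins
    totals = [0] * num_bins
    for score, claimed in scored:
        index = min(int(score * num_bins), num_bins - 1)  # 1.0 goes in last bin
        totals[index] += 1
        claims[index] += int(claimed)
    return [
        (i / num_bins, (i + 1) / num_bins,
         claims[i] / totals[i] if totals[i] else None)
        for i in range(num_bins)
    ]


# Hypothetical (model score, claimed) pairs for past policies.
scored = [
    (0.05, False), (0.10, False), (0.15, False), (0.30, False), (0.35, True),
    (0.45, False), (0.55, True), (0.70, True), (0.85, True), (0.90, True),
]

table = claims_probability_by_bin(scored)
for low, high, ratio in table:
    print(f"score {low:.1f}-{high:.1f}: claims ratio {ratio}")
```

A new policy's score can then be looked up in the table to estimate its claims probability. In practice far more records per bin would be needed for the ratios to be reliable.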

[Figures: proportional distribution of insurance scores; insurance score distribution histogram]

*** Find out the limitations of insurance risk management based on predictive modeling in the following sections.

Incorporating Judgmental Scoring

The insurance industry relies heavily on judgmental methods. Judgments are made from past experience of important factors. Judgmental rules are used to arrive at ratings.

Normally, this process is performed manually. With the advancement of predictive rule engines, it is now possible to automate it. Automation can incorporate the best of both judgmental and statistical scoring methods. Critical data that form the basis of judgment can be collected from financial statements and similar sources. Judgmental data may be included as well. Judgmental data are subjective, soft data. For example, certain judgmental data may be extracted from financial statements as subjective assessments by staff. Rules are then developed to score risks based on critical and judgmental data. This type of automated system promotes scoring consistency and accuracy in ratings while maintaining flexibility.

Predictive models may be included in judgmental rules. That is, rules can be used to assess outcomes of statistical predictive models. Combining both judgmental and statistical predictive models can result in best industry practices.
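The combination of judgmental rules and a statistical model can be sketched in plain Python. This is not RME or RME-EP syntax. The field names, the rating labels and the score thresholds are hypothetical, and `model_score` is a stand-in for a trained predictive model.

```python
def model_score(application: dict) -> float:
    """Stand-in for a trained statistical model (hypothetical)."""
    return application["predicted_claim_probability"]


def rate_application(application: dict) -> str:
    # Judgmental rules come first: they cover cases that are rare or absent
    # in historical data and that a statistical model may not capture.
    if application.get("license_suspended"):
        return "decline"
    if application.get("prior_fraud_flag"):
        return "refer to underwriter"

    # Otherwise, rules interpret the outcome of the statistical model.
    score = model_score(application)
    if score >= 0.30:
        return "high risk"
    if score >= 0.10:
        return "medium risk"
    return "low risk"


print(rate_application({"predicted_claim_probability": 0.02,
                        "license_suspended": True}))
print(rate_application({"predicted_claim_probability": 0.15}))
```

In this sketch the first application is declined by a judgmental rule despite its low model score, while the second is rated purely from the model output.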

Real-time Expert Advisor for Insurance Scoring

Predictive modeling is based on past statistical evidence. If there is not enough evidence, predictive modeling can fail to predict reliably. In general, most high-risk applications are filtered out manually by various regulations, policies and judgmental discretion, so they do not appear in the modeling data. Most statistical evidence for high-risk insurance policies is therefore not present in historical data. Thus predictive models will fail to predict even the most obvious risks. Predictive modeling alone cannot be used as the whole solution.

Rule-based modeling is a very powerful platform that combines the knowledge of experienced human actuaries with the power of predictive modeling. It is ideally suited to overcoming the limitations of predictive modeling for risk management, and it incorporates judgmental scoring. Rosella BI Platform provides two rule-based modeling engines: RME and RME-EP. Both are based on SQL-like rule specification languages. They are very powerful languages incorporating predictive models along with logical expressions and mathematical formulas. RME is a procedural language. RME-EP is for rule-based expert systems. Together they serve as a very powerful platform for risk modeling. For more, please read Expert Systems Shell - Rule Engines.

The following figure shows examples of web-embedded risk management dashboard components. It shows visualized risk levels inferred using rule-based predictive models. Models are evaluated on the Rosella BI server and fed to the internal charting system:

[Figure: web-embedded credit risk dashboards]

The rule-based model specification language in the Rosella Platform is based on the powerful SQL database query language, with enhanced predictive modeling support. The intuitiveness and expressive power of SQL are well proven. It can easily incorporate the following into insurance scoring models:

  • Government regulations.
  • Internal business policies.
  • Common sense and judgmental rules.
  • Industry actuarial heuristics.

If you are interested in a trial, please write to us.


Are you an Insurance Solution Developer or Provider?

Rosella BI Platform is a multi-purpose, end-to-end developer platform for insurance solutions. It supports all the tools needed to develop insurance risk management solutions: profiling, segmentation, decision tree, neural network, rule-based modeling, expert system shells, model validation, model deployment, charting and report engines, and so on. It provides all the features for CPM, BAM, BEP, CEP and Balanced Scorecard for insurance solutions. For more, please read Rosella BI Platform, Expert Systems Rule Engines and Predictive Modeling.