Predictive Modeling and Predictive Models

A predictive model is a system created and used to perform prediction. Predictive models can predict or forecast a variety of things and events, for example future share prices, credit defaults, insurance claims, and customer product orders. Predictive models are developed from historical data or from data collected purposely through sampling. Typical examples include:

  • Insurance: Annual insurance policy applications and claims records can be used to develop models that predict the probability (or level of risk) of insurance claims. Such models use demographic and financial information about policy holders, along with characteristics of the insured objects, to determine risk levels. For more, please read Insurance Risk Modeling.
  • Credit loans: Predicting default risk for credit loan applications is another use of predictive models. Data collected from past customer loans, including demographic and financial information about borrowers, can be used to build predictive models that predict the likelihood of default. For more, please read Credit Risk Modeling.
  • Marketing: Predictive modeling can be used for various marketing tasks. For example, from past customer purchasing records, you can develop models that select customers who are likely to buy your new products. Another example is customer churn detection: using past customer information, you can develop models that predict which customers are likely to churn in the near future. This can be very useful in retention marketing. For more, please read Direct Mail Marketing and Customer Churn Modeling.

Key requirement for predictive modeling

The most important factor in the successful implementation of predictive modeling is the availability of useful information. Predictive models are statistically developed patterns extracted from historical data or from purposely collected survey samples. When the data properly represents the predictive patterns of the application domain, accurate predictive models can be developed quite easily. For more, please read Cookbook for Predictive Analytics.

Predictive Modeling Software Program Tools

Predictive modeling is done automatically by computer software that learns patterns from data. StarProbe data miner supports powerful predictive modeling tools. Users can build models with the help of intuitive model visualization tools and can apply models directly to their own data using built-in database interface tools. StarProbe data miner comes with the following predictive modeling software programs:

  • Predictive Neural Network
    A neural network is a predictive model loosely based on the architecture of the brain. It can be used to predict both numerical values and categorical classifications. Generally speaking, neural networks offer the most accurate and versatile predictive models. For more, please read Neural Network.

  • Decision Tree
    A decision tree develops predictive models based on recursive segmentation, and the resulting models have tree-like structures. As a rule, a decision is made on the majority principle, in which the winner takes all: the category with the largest number of cases at a decision node becomes the predicted category. This has a limitation, since the minority cases at the node are ignored. To overcome this, StarProbe data miner also uses probability. For more, please read Decision Tree Classifier Software.

  • Rule Induction classification
    Rule induction is based on Hotspot Profiling. It builds predictive models based on profiles of category hotspots. It determines risk levels based on risk factor profiles.

  • Regression
    Compared to the above methods, regression can be limiting and inflexible, since all categorical information must be encoded into numerical variables. However, regression can be very useful in developing mathematically oriented models with simple variable sets.
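
The difference between a winner-takes-all decision and a probability at a decision tree node can be illustrated with a small Python sketch. This is not StarProbe code: the function name `leaf_prediction` and the example labels are assumptions made purely for illustration.

```python
from collections import Counter

def leaf_prediction(labels):
    """Return (winner, probabilities) for the training cases that reach one node."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Winner-takes-all: the category with the most cases becomes the prediction.
    winner = counts.most_common(1)[0][0]
    # Probability view: keep each category's share instead of discarding the minority.
    probabilities = {category: n / total for category, n in counts.items()}
    return winner, probabilities

# Hypothetical node reached by 6 'Safe' cases and 4 'Risky' cases.
winner, probabilities = leaf_prediction(["Safe"] * 6 + ["Risky"] * 4)
print(winner)         # Safe
print(probabilities)  # {'Safe': 0.6, 'Risky': 0.4}
```

The winner alone hides the fact that 40% of the cases at this node were 'Risky', which is the kind of information a probability preserves.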

[Figures: decision tree classification predictive modeling; neural network predictive modeling]

New way of Deploying Predictive Models for End-users

Normally, predictive models are developed with desktop software programs, and model developers and end-users are often different people. For example, insurance risk models may be developed by professional model developers, and special graphical user interface (GUI) programs then have to be developed and distributed to insurance actuaries. However, developing fully featured GUI programs is costly and time consuming, and managing software distribution and updates is even more difficult. Software security is another concern, since risk modeling systems are generally confidential and for internal use only.

StarProbe Webkit provides hassle-free deployment kits. Predictive models developed with StarProbe can be deployed over the web without any costly programming effort. You can easily embed them into your enterprise systems using industry-standard protocols and formats such as SOAP and XML. Updating models and adding new ones is also simple. The following figure shows an example of web deployment of predictive risk models:

[Figure: insurance risk scoring on the web]

This technology brings the predictive models on your personal device to your corporate colleagues and to users around the world!

How can you develop predictive models?

To learn more about predictive modeling, please read The Cookbook for Predictive Analytics.

Advanced Predictive Modeling Techniques

Developing a single all-purpose predictive model that predicts well in every situation is difficult, because predictive models have limited learning capacity. Advanced techniques can overcome some of these limitations:

  • Develop segment-specific predictive models.
    People (or entities) with similar attributes tend to exhibit similar behavior patterns. Predictive segmentation can identify segments that are rich (or poor) in desirable business outcomes. Although predictive modeling per se may address this problem to a certain degree, predictive segmentation offers additional benefit for applications where the outcome is severely skewed. For example, in direct marketing the positive response ratio is typically very small, so building predictive models directly on all the data tends to produce poor results. The same problem occurs in many other areas, e.g., customer churn identification, credit risk modeling, and insurance risk modeling. Predictive segmentation can lead to segments with boosted response ratios, and predictive models can then be developed specifically for such segments. This can improve predictive accuracy significantly.

  • Employ multiple predictive models.
    Predictive models are generally developed using a single dataset (a set of data records). Such models work well for the dataset used for modeling, but they inherently carry biases and may not work well on other datasets. To reduce this problem, multiple models are applied and their results are combined statistically, using the average, minimum, maximum, or most frequent value. Combining models in this way is often called ensembling; "bagging" (bootstrap aggregating) is one well-known form of it.

  • Incorporate rules and formulas with predictive models.
    Business transactions are subject to many regulations and internal policies designed to deal with risky transactions. Customer behavior may be described with rules, and rules are also useful in describing customer segments. Moreover, complex rules commonly involve complex mathematical formulas. This cannot be handled by flat predictive modeling alone.
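
Combining the results of multiple models, as described in the second technique above, might look like the following Python sketch. The helper names `combine_scores` and `combine_labels` and the sample values are hypothetical and are not part of StarProbe. The sketch only shows the statistical combination step, not how the individual models are trained.

```python
from collections import Counter
from statistics import mean

def combine_scores(scores, how="average"):
    """Combine numeric predictions (e.g. default probabilities) from several models."""
    if how == "average":
        return mean(scores)
    if how == "minimum":
        return min(scores)
    if how == "maximum":
        return max(scores)
    raise ValueError(f"unknown combination method: {how}")

def combine_labels(labels):
    """Combine categorical predictions by taking the most frequent value."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical outputs of three models for the same customer.
scores = [0.25, 0.5, 0.75]
print(combine_scores(scores))             # 0.5
print(combine_scores(scores, "maximum"))  # 0.75
print(combine_labels(["Risky", "Safe", "Risky"]))  # Risky
```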

Why Rule-based Modeling and Rule-based Model Evaluation?

Generally, predictive modeling is of limited use if it cannot deal with the complexities of real-world requirements. Predictive modeling is very effective when it is applied to very specific problems, but most real-world problems can be readily defined with rules and mathematical formulas. On its own, this leaves predictive modeling with a narrower range of applicable domains. Rule-based modeling provides an environment where predictive models can be used in conjunction with rules and formulas. In other words, rules and formulas are used to model real-world problems, and predictive models are used as functions inside those rules and formulas. This creates an ideal modeling environment for advanced applications that involve complex rules and predictive models.
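
The idea of using a predictive model as a function inside rules and formulas can be sketched in Python as follows. This is only an illustration of the concept, not the RME language: `toy_default_model` is a made-up stand-in for a trained model, and the field names and thresholds are assumptions.

```python
def toy_default_model(applicant):
    """Stand-in for a trained model: returns a made-up default probability."""
    return 0.30 if applicant["income"] < 30_000 else 0.05

def loan_decision(applicant, default_model=toy_default_model):
    # Rule: a policy check that no model needs to learn.
    if applicant["age"] < 18:
        return "Reject"
    # Formula: expected loss = default probability x loan amount.
    expected_loss = default_model(applicant) * applicant["loan_amount"]
    # Rule on top of the formula (the threshold is hypothetical).
    return "Refer to underwriter" if expected_loss > 2_000 else "Approve"

print(loan_decision({"age": 35, "income": 25_000, "loan_amount": 10_000}))  # Refer to underwriter
print(loan_decision({"age": 35, "income": 60_000, "loan_amount": 10_000}))  # Approve
```

Here the model supplies only the probability, while the rules and the formula carry the business logic around it.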

Rule-based Modeling Environment (RME) is a platform for developing complex predictive models, offered uniquely by StarProbe data miner. It allows you to employ advanced techniques and integrate them into your enterprise applications seamlessly. For more, please read Rule-based Modeling Environment (PDF/1.0MB). The following examples demonstrate the expressive power of the rule-based modeling language. (Note that predictive models are represented with "MODEL" or "PREDICT" expressions.)

    // for the segment, returns average purchase amount;
    IF Age < 25 and Gender= 'Male' THEN
         RETURN total_purchase / purchase_frequency
    END ;

    // expressing logistic regression function;
    RETURN  // classification based on logistic regression;
       CASE 5.0 / (1.0 + EXP(2.5 + 0.4 * X))
        WHEN < 1.5  THEN 'Low risk'
        WHEN < 3.5  THEN 'Average risk'
        WHEN >= 3.5 THEN 'High risk'
       END ;

    // even complex bagging is trivial in RME;
    IF MIN(AVG(MODEL(Default1), MODEL(Default2)), // minimum of the two averages
               AVG(MODEL(Default3), MODEL(Default4))) > 0.25 THEN
         RETURN MIN(MAX(MODEL(Default1), MODEL(Default2)), 0.3) // maximum, capped at 0.3
    END ;

    // if any one predicts as 'Risky', it returns average probability;
    IF 'Risky' IN (PREDICT(tree1), PREDICT(tree2), PREDICT(tree3))
       THEN RETURN AVG(PREDICT(tree1, 'Risky'), PREDICT(tree2, 'Risky'),
                    PREDICT(tree3, 'Risky')) // returns average probability
    END ;

    // expressing manually-developed decision tree is straightforward;
    CASE 
       WHEN Gender  = 'Male' THEN
          CASE Age
            WHEN < 20 THEN RETURN 'Teen male' // male under 20
            WHEN < 40 THEN RETURN 'Young male' // male under 40
            WHEN <= 65 THEN RETURN 'Mature male' // male up to 65
            ELSE RETURN 'Retired male' // male over 65
          END
       WHEN Gender  = 'Female' THEN
          CASE Age
            WHEN < 20 THEN RETURN 'Teen female' // female under 20
            WHEN < 40 THEN RETURN 'Young female' // female under 40
            WHEN <= 65 THEN RETURN 'Mature female' // female up to 65
            ELSE RETURN 'Retired female' // female over 65
          END
    END ;
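
For readers who do not use RME, the logistic scoring rule shown earlier in this listing can be mirrored in Python. This is a plain transliteration of the formula and thresholds from that example; the function name `risk_band` is an assumption, and `x` stands for the input variable X.

```python
import math

def risk_band(x):
    """Classify x with the same logistic formula and cut-offs as the RME example."""
    score = 5.0 / (1.0 + math.exp(2.5 + 0.4 * x))  # ranges between 0 and 5
    if score < 1.5:
        return "Low risk"
    if score < 3.5:
        return "Average risk"
    return "High risk"

print(risk_band(0))    # Low risk
print(risk_band(-6))   # Average risk
print(risk_band(-10))  # High risk
```

Note that with these coefficients the score decreases as x increases, so larger values of x map to lower risk bands.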

Building Expert systems and Decision support systems (DSS)

Expert systems provide high-level heuristic know-how and expertise that cannot easily be transferred to others. They are designed to provide computerized versions of the opinions of human experts in the relevant domains. Building expert systems is difficult, mainly because extracting rules from human experts and transforming them into computerized form is hard. Rule-based predictive modeling provides an alternative approach: most rules can be extracted from past data automatically, while meta and supplemental rules are provided by human experts. In this way, robust expert systems can be developed quickly.

Decision support systems (DSS) are, as the name suggests, computerized systems developed to help in decision making. For example, insurance actuaries have to make decisions on insurance applications, and being able to determine the level of risk systematically is a big advantage. There are many other areas in which decision support systems with rule-based predictive modeling can be developed, e.g., credit approval, fraud detection, and marketing.

Audit and Event Monitoring with Predictive Modeling

Auditing of business transactions is required for various reasons. Laws and government regulations prohibit certain types of transactions, and internal policies may forbid others. In addition, auditing supports fraud detection and risk management. Rule-based predictive modeling is an excellent platform for implementing real-time auditing systems. With the expressive power of the language integrated with predictive modeling, complex auditing rules can be expressed very easily. In addition, RME can be embedded into users' transaction processing systems. Together, these provide a well-suited environment for real-time auditing systems. For more, please read real-time transaction audit and event monitoring systems.
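
A transaction audit that mixes fixed rules with a predictive score could be sketched in Python as below. This is a conceptual illustration rather than RME code: `toy_fraud_model`, the transaction fields, and both limits are hypothetical.

```python
def toy_fraud_model(transaction):
    """Stand-in for a trained model: scores foreign transactions as more suspicious."""
    return 0.9 if transaction["country"] != transaction["home_country"] else 0.1

def audit_transaction(transaction, fraud_model, amount_limit=10_000.0, score_limit=0.8):
    """Return the reasons a transaction should be flagged (empty list if none)."""
    reasons = []
    # Fixed rule, e.g. from regulation or internal policy (the limit is hypothetical).
    if transaction["amount"] > amount_limit:
        reasons.append("amount over limit")
    # Predictive check: the model is called like any other function inside the rule.
    if fraud_model(transaction) > score_limit:
        reasons.append("high fraud score")
    return reasons

transactions = [
    {"id": 1, "amount": 250.0, "country": "US", "home_country": "US"},
    {"id": 2, "amount": 12_000.0, "country": "FR", "home_country": "US"},
]
for t in transactions:
    reasons = audit_transaction(t, toy_fraud_model)
    if reasons:
        print(t["id"], reasons)  # 2 ['amount over limit', 'high fraud score']
```

In a real-time setting, a check like this would run on each transaction as it is processed.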

Applications of Advanced Predictive Modeling

The robustness of rule-based predictive modeling makes it well suited to applications that are otherwise known to be difficult to model successfully. Typical examples include the following:

  • Credit, Finance, Loans Default Risk Predictive Modeling
    Predictive modeling can be used to assess risk levels for financial loans such as mortgages, credit cards, and installment purchases. For example, motor vehicle financiers can develop models using past loan data that includes default information, such as default amounts and default status. Such models can predict the probability of default or the expected default amount. For more, read credit risk modeling.

  • Insurance Risk Analysis and Claims Predictive Modeling
    Predictive modeling can be used to analyze risk levels of various insurance policies such as motor vehicle, health, and life. For instance, motor accident insurance companies can use past insurance data containing claim information to develop models that predict the expected claim amount or the probability of a claim. For more, read insurance risk analysis and scoring.

  • Pinpoint Precision Direct Marketing
    In direct mail marketing, predictive modeling can be used to select customers who are most likely to respond positively to marketing campaigns. One source of modeling data is customers' past purchasing history. The other is to send promotional information to small, selected sample groups and collect their responses. This information is then used to develop predictive models, which are applied to all customers in the database (excluding those who participated in the sampling). For more, read database marketing, direct mail catalog sales, and RFM marketing.

  • Customer Churn Prevention Modeling
    Customer retention is very important because acquiring a new customer is far more expensive than keeping an existing one. The wireless telecom industry in particular has a very high annual churn rate. Customers who are likely to churn in the near future can be identified with predictive modeling, so that preventive measures can be taken.

  • Healthcare Fraud Detection
    Predictive modeling has been used to detect potentially fraudulent cases and for real-time transaction auditing.

  • Real-time auditing, Security checks, Process control, Exception monitoring
    Real-time process auditing and control is an important application of advanced predictive techniques that combine rules and predictive modeling.

  • Email Spam Filtering
    Spam is a persistent nuisance for email users. Rule-based predictive models can be used to filter out spam effectively.

For more information and a trial, please write to us.