Rosella       Machine Intelligence & Data Mining

Top-down Drill-down Data analysis

Conventional techniques based on statistical packages, OLAP pivot tables, and BI software can be considered bottom-up approaches. In this approach, analysts examine fields and their values one by one, manually, using statistical data visualization and reporting tools that generally support only a few dimensions. This works well provided the numbers of variables and dimensions are small.

Combinational Factor analysis and Combinatorial Blowout!

The conventional bottom-up approach does not work well as the numbers of variables and dimensions grow, because the number of combinations that analysts must examine grows combinatorially. Analysts have to compare different combinations and compile the different results by hand, which is time-consuming and error-prone. Beyond a certain number of variables, the approach is simply not practical. As a result, analysts normally do not perform a systematic, thorough analysis, but only a partial one.
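The growth described above can be made concrete with a small calculation. The sketch below is only an illustration under simplifying assumptions that are not from the original text: every variable is categorical with the same number of values, and a "combination" is a segment defined by fixing the values of one or more variables, up to a chosen depth. The function name `count_segments` is hypothetical.

```python
from math import comb


def count_segments(num_vars: int, values_per_var: int, max_depth: int) -> int:
    """Count segments defined by fixing values of 1..max_depth variables.

    Assumes every variable has the same number of values (a simplification).
    """
    return sum(
        comb(num_vars, r) * values_per_var ** r  # choose r variables, then a value for each
        for r in range(1, max_depth + 1)
    )


# How the count grows with the number of variables (4 values each, depth <= 3).
for num_vars in (5, 10, 20, 40):
    print(num_vars, count_segments(num_vars, values_per_var=4, max_depth=3))
```

Even with the depth capped at three variables, the count rises steeply as variables are added, which is why manual examination stops being practical.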

Many business and scientific data sets consist of dozens of variables. Many of these are categorical variables that lend themselves to dimensional data analysis. For example, customer survey data, direct marketing customer records, and government census data consist of numerous fields. A better technology is needed!

Hotspot Drill-down Segmentation analysis

Hotspot analysis employs a drill-down analytic process using Artificial Intelligence techniques such as search and incremental learning. The analysis starts from the whole population. Step by step, it generates hypotheses in all possible directions, tests (or scores) them against the input data, and orders them based on user-selected scoring criteria. Examples of scoring criteria can be found here. This gives analysts an accurate map of the most interesting segments, i.e., hotspots.
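The generate, score, and order loop described above can be sketched roughly as follows. This is a minimal illustration, not the product's actual algorithm: the beam-limited search, the scoring criterion (proportion of a target value in the segment), the minimum segment size, and all names such as `hotspot_search` and the sample records are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
Segment = Tuple[Tuple[str, str], ...]  # sorted (variable, value) conditions


def matches(rec: Record, seg: Segment) -> bool:
    return all(rec.get(var) == val for var, val in seg)


def score(records: List[Record], seg: Segment, target: str, hit: str,
          min_size: int) -> Optional[float]:
    """Assumed scoring criterion: share of records in the segment with target == hit."""
    members = [r for r in records if matches(r, seg)]
    if len(members) < min_size:
        return None  # too small to be a meaningful segment
    return sum(r[target] == hit for r in members) / len(members)


def hotspot_search(records: List[Record], target: str, hit: str,
                   max_depth: int = 2, beam: int = 3, min_size: int = 2):
    variables = sorted({k for r in records for k in r if k != target})
    values = {v: sorted({r[v] for r in records if v in r}) for v in variables}
    frontier: List[Segment] = [()]  # start from the whole population
    found: Dict[Segment, float] = {}
    for _ in range(max_depth):
        candidates: Dict[Segment, float] = {}
        for seg in frontier:
            used = {var for var, _ in seg}
            for var in variables:
                if var in used:
                    continue
                for val in values[var]:  # hypothesize one more condition
                    child = tuple(sorted(seg + ((var, val),)))
                    if child in candidates:
                        continue
                    s = score(records, child, target, hit, min_size)
                    if s is not None:
                        candidates[child] = s
        ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
        frontier = [seg for seg, _ in ranked[:beam]]  # drill down from the best only
        found.update(candidates)
    return sorted(found.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up sample data for illustration only.
sample_policies = [
    {"age": "young", "car": "sports", "claimed": "yes"},
    {"age": "young", "car": "sports", "claimed": "yes"},
    {"age": "young", "car": "sedan", "claimed": "no"},
    {"age": "young", "car": "sedan", "claimed": "yes"},
    {"age": "old", "car": "sports", "claimed": "no"},
    {"age": "old", "car": "sports", "claimed": "yes"},
    {"age": "old", "car": "sedan", "claimed": "no"},
    {"age": "old", "car": "sedan", "claimed": "no"},
]
hotspots = hotspot_search(sample_policies, target="claimed", hit="yes")
for segment, rate in hotspots[:3]:
    print(segment, rate)
```

Keeping only the best few segments at each level is one way to avoid enumerating every combination; other search strategies and scoring criteria would fit the same outline.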

The hotspot drill-down process is performed automatically by the system. Analysts can then work in a top-down fashion. Initially, hotspot search can be used to identify factors, properties, sub-populations, etc., which offer starting points for top-down data analysis. The following figure shows an example of hotspot analysis output. The top-left panel is the hotspot drill-down tree. The top-right panel shows detailed statistics of the selected hotspots. The bottom-left and bottom-right panels provide gains/lift factor analysis.

Top-down drill-down data analysis using hotspot profiling.
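For readers unfamiliar with the gains/lift factors mentioned for the hotspot output, the following sketch shows the usual definitions: lift compares the hit rate inside a segment with the hit rate of the whole population, and gain is the share of all hits that the segment captures. The function names and the sample figures are made up for illustration, and the tool's exact formulas may differ.

```python
def lift(segment_hits: int, segment_size: int,
         total_hits: int, total_size: int) -> float:
    """Segment hit rate divided by the population hit rate."""
    segment_rate = segment_hits / segment_size
    base_rate = total_hits / total_size
    return segment_rate / base_rate


def gain(segment_hits: int, total_hits: int) -> float:
    """Fraction of all hits that fall inside the segment."""
    return segment_hits / total_hits


# Hypothetical figures: 1000 policies with 100 claims overall;
# a segment of 50 policies containing 20 of those claims.
print(lift(20, 50, 100, 1000))
print(gain(20, 100))
```

A lift well above 1 marks a segment where the outcome of interest is concentrated relative to the population as a whole.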

For more information on Hotspot analysis, click Hotspot analysis.
To learn how the insurance industry can use hotspot analysis to develop profiles of risky insurance policies, click here.


Hierarchical Drill-down Segmentation analysis

Another useful drill-down analysis is the decision tree. A decision tree repeatedly divides a population into smaller segments. At each node, it selects a single variable whose values boost the proportion of the largest categorical value in each resulting segment. If the population is insurance policies, each segmentation tries to increase the proportion of either never-claimed or claimed policies. This tends to lead to segments with a higher proportion of claimed policies. The same applies to other areas, e.g., credit, finance, direct mail catalog responses, customer churn, and so on. This type of segmentation is useful in many applications.
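The variable selection at a node can be illustrated with a short sketch. It scores each candidate variable by the size-weighted share of the largest class across the resulting segments, which follows the description above literally. This is an assumption for illustration: practical decision tree learners commonly use criteria such as entropy or the Gini index, and the names `split_score` and `best_split` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

Record = Dict[str, str]


def majority_share(labels: List[str]) -> float:
    """Proportion of the most frequent label in a segment."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def split_score(records: List[Record], var: str, target: str) -> float:
    """Size-weighted majority share over the segments produced by splitting on var."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[var]].append(rec[target])
    total = len(records)
    return sum(len(g) / total * majority_share(g) for g in groups.values())


def best_split(records: List[Record], variables: List[str], target: str) -> str:
    """Pick the variable whose split yields the purest segments."""
    return max(variables, key=lambda v: split_score(records, v, target))


# Made-up insurance policies for illustration only.
tree_policies = [
    {"vehicle": "sports", "region": "north", "claimed": "yes"},
    {"vehicle": "sports", "region": "south", "claimed": "yes"},
    {"vehicle": "sedan", "region": "north", "claimed": "no"},
    {"vehicle": "sedan", "region": "south", "claimed": "no"},
    {"vehicle": "sedan", "region": "north", "claimed": "no"},
    {"vehicle": "sports", "region": "south", "claimed": "no"},
]
print(best_split(tree_policies, ["vehicle", "region"], "claimed"))
```

Applying `best_split` again inside each resulting segment gives the repeated, hierarchical division that the tree represents.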

The following figure shows the CMSR Data Miner decision tree classifier module. Trees are drawn from left to right, which gives a compact presentation. Node statistics are shown on the right-hand side. They include variable-selection criteria scores, value distribution, and prediction value distribution. For the insurance example, red represents the proportion of customers with claims and green the proportion of never-claimed customers. Nodes in red indicate that over 50% of the customers in the segment have claims. Green nodes have less than 50% of customers with claims. In addition to red nodes, green nodes with a shorter green bar may be of interest, since they represent relatively higher proportions of risky customers.

decision tree classifier

Find out how a drill-down tree is used in Insurance Risk analysis.

Find out how a drill-down tree is used in Credit Risk analysis.