Deviation Analysis and Detection

Deviation analysis can reveal surprising facts hidden inside data. StarProbe provides tools that can be used to detect deviations, anomalies, and outliers. Detection is needed for various reasons:

  • knowledge discovery: such information is often a vital part of important business decisions and scientific discoveries.
  • auditing: examining such information can reveal problems and malpractice.
  • fraud detection: fraudulent claims often carry inconsistent information, which can reveal fraud cases.
  • data cleaning: such information may come from data entry mistakes, which should be corrected.

Cross Tables and Hidden Patterns

StarProbe Data Miner and DBisual Database Chart Mate support powerful deviation detection methods for cross tables, based on Chi-square statistics. The methods can reveal patterns and information hidden inside cross table numbers. As an example, consider two dimension variables, gender and job category, as in the following table. The population is 100. Along the gender variable, there are 60 males and 40 females. (Note that you can think of the numbers as percentages.) Similarly, there are 50 clerks, 40 graduates and 10 management persons in the population.

          Clerks   Graduates   Management   TOTAL
Male                                           60
Female                                         40
TOTAL         50          40           10     100

The distributions of these dimension variables can be considered general (or overall) patterns of the population. If the population is totally bias-free, that is, if gender and category are independent, the expected distribution of the gender-category interactions is as follows. Note that the interactions (the inner cells) are computed from the proportions of both variables. For example, "Male-Clerks" is computed as 100*60%(Male)*50%(Clerks)=30.

          Clerks   Graduates   Management   TOTAL
Male          30          24            6      60
Female        20          16            4      40
TOTAL         50          40           10     100
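The expected-count arithmetic, and the Chi-square style comparison it feeds, can be sketched in a few lines of Python. This is only an illustration, not StarProbe's implementation: the function names are made up here, and the observed counts are hypothetical numbers chosen to match the marginal totals in the text.

```python
# Marginal totals taken from the cross table in the text.
row_totals = {"Male": 60, "Female": 40}
col_totals = {"Clerks": 50, "Graduates": 40, "Management": 10}
population = 100


def expected_counts(row_totals, col_totals, population):
    """Expected cell counts when the two variables are independent."""
    return {
        (r, c): r_total * c_total / population
        for r, r_total in row_totals.items()
        for c, c_total in col_totals.items()
    }


def chi_square_contributions(observed, expected):
    """Per-cell (observed - expected)^2 / expected; large values flag deviations."""
    return {
        cell: (observed[cell] - expected[cell]) ** 2 / expected[cell]
        for cell in expected
    }


expected = expected_counts(row_totals, col_totals, population)

# HYPOTHETICAL observed counts, invented for illustration only.
# They are chosen so that row and column sums match the totals above.
observed = {
    ("Male", "Clerks"): 20, ("Male", "Graduates"): 32, ("Male", "Management"): 8,
    ("Female", "Clerks"): 30, ("Female", "Graduates"): 8, ("Female", "Management"): 2,
}

contributions = chi_square_contributions(observed, expected)

# List cells from the largest deviation to the smallest.
for cell, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(cell, "expected:", expected[cell], "observed:", observed[cell],
          "contribution:", round(value, 2))
```

Cells with the largest contributions are the ones that depart most from the bias-free expectation, so they are the natural candidates for closer inspection.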

Predictive Modeling

Predictive modeling tools, such as decision trees, rule induction and neural networks, can be used to detect deviations. To detect anomalies in categorical fields, all three tools can be used. For numerical fields, however, only neural networks can be employed, because decision tree and rule induction cannot predict values for numerical fields. With StarProbe, this works as follows:

  • Build predictive models for targeted fields, using other fields as induction fields.
  • Apply the models to the data in the database and save the results in the database rows.
  • Identify records whose predicted values differ from the stored values. For numerical fields, ratios of the two values can be used. You can do this easily with simple SQL statements.
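The last step can be illustrated with a small, self-contained sketch that runs SQL through Python's built-in sqlite3 module. The table name, column names, sample rows and the ratio thresholds (0.5 and 2.0) are all assumptions made for this example; they do not come from StarProbe.

```python
import sqlite3

# Hypothetical schema: actual values stored next to saved model predictions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims ("
    " id INTEGER PRIMARY KEY,"
    " category TEXT, predicted_category TEXT,"
    " amount REAL, predicted_amount REAL)"
)
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A", "A", 100.0, 110.0),
        (2, "B", "A", 200.0, 190.0),  # categorical mismatch
        (3, "A", "A", 900.0, 300.0),  # actual is three times the prediction
        (4, "B", "B", 50.0, 55.0),
    ],
)

# Categorical field: rows where the stored value differs from the prediction.
categorical_deviations = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM claims"
        " WHERE category <> predicted_category"
        " ORDER BY id"
    )
]

# Numerical field: rows whose actual/predicted ratio falls outside a band.
# The band limits are arbitrary and would need tuning for real data.
numerical_deviations = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM claims"
        " WHERE predicted_amount <> 0"
        " AND (amount / predicted_amount < 0.5"
        "      OR amount / predicted_amount > 2.0)"
        " ORDER BY id"
    )
]

print("categorical deviations:", categorical_deviations)
print("numerical deviations:", numerical_deviations)
```

Using a ratio rather than a difference keeps the numerical check scale-independent, which matters when a field spans several orders of magnitude.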

Hotspot Analysis

Hotspot Analysis can detect outliers. More specifically, it detects patterns of outliers, defined in terms of profile conditions. Outliers can have extremely high or low averages, probabilities, etc. With StarProbe, you can proceed as follows:

  • Search for hotspot profiles.
  • Query the database using the hotspot profiles and examine the result rows.
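To make the idea of a profile concrete, the following sketch treats a profile as a conjunction of field=value conditions and ranks profiles by the average of a target field. It is a brute-force illustration under assumed data, not the search StarProbe performs; the records, the field names and the `find_hotspots` helper are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records, invented for illustration only.
records = [
    {"region": "north", "channel": "web", "claim": 100.0},
    {"region": "north", "channel": "web", "claim": 110.0},
    {"region": "north", "channel": "agent", "claim": 105.0},
    {"region": "north", "channel": "agent", "claim": 95.0},
    {"region": "south", "channel": "web", "claim": 100.0},
    {"region": "south", "channel": "web", "claim": 90.0},
    {"region": "south", "channel": "agent", "claim": 400.0},
    {"region": "south", "channel": "agent", "claim": 420.0},
]


def find_hotspots(records, fields, target, min_support=2, max_conditions=2):
    """Rank profiles (conjunctions of field=value conditions) by target mean."""
    groups = defaultdict(list)
    for rec in records:
        for size in range(1, max_conditions + 1):
            for combo in combinations(fields, size):
                profile = tuple((f, rec[f]) for f in combo)
                groups[profile].append(rec[target])
    ranked = [
        (profile, sum(values) / len(values), len(values))
        for profile, values in groups.items()
        if len(values) >= min_support  # ignore profiles with too few rows
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


hotspots = find_hotspots(records, ["region", "channel"], "claim")
top_profile, top_mean, top_count = hotspots[0]

# "Query" step: fetch the rows that satisfy the top profile for examination.
matching = [r for r in records if all(r[f] == v for f, v in top_profile)]

print(top_profile, top_mean, top_count)
for row in matching:
    print(row)
```

The same ranking read from the bottom gives the profiles with extremely low averages.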

Clustering

Clustering objects based on similarity and analyzing the clusters may reveal outliers. With StarProbe, you can proceed as follows:

  • Cluster objects based on similarity.
  • Examine clusters using cluster visualization tools.
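As a rough illustration of how clustering can expose outliers, the sketch below groups points that lie within a distance threshold of each other (a simple single-linkage style grouping) and flags clusters with very few members. The algorithm, the threshold, the sample points and the `cluster_by_distance` helper are assumptions chosen for the example; StarProbe's own clustering method may differ.

```python
import math


def cluster_by_distance(points, threshold):
    """Group points into connected components: two points share a cluster
    when a chain of neighbours, each within `threshold`, links them."""
    clusters = []
    unvisited = set(range(len(points)))
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        members, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [
                i for i in unvisited
                if math.dist(points[current], points[i]) <= threshold
            ]
            for i in near:
                unvisited.remove(i)
            members.extend(near)
            frontier.extend(near)
        clusters.append(sorted(members))
    return clusters


# Hypothetical two-dimensional objects, invented for illustration only.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1),   # first dense group
    (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),   # second dense group
    (9.0, 0.5),                           # isolated point
]

clusters = cluster_by_distance(points, threshold=1.0)

# Clusters with a single member are treated as outlier candidates.
outlier_points = [points[c[0]] for c in clusters if len(c) == 1]

print(clusters)
print(outlier_points)
```

Very small clusters are only candidates; whether they are genuine outliers still has to be judged by examining the underlying records.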