Clustering and Segmentation

Segmentation is the process of grouping similar objects together into clusters, which is why it is often referred to as clustering. Members of a cluster are homogeneous, while different clusters are, ideally, heterogeneous. The rationale for intra-group homogeneity is that objects with similar attributes are likely to respond similarly to a given action. This property has many uses in both business and scientific research.

What are the problems with clustering techniques?

Most clustering techniques were developed for simple, laboratory-generated data consisting of a few numerical variables. Applying them to business data, which typically contain many complex categorical variables, runs into several limitations, as described below.
Lost in Translation?

The transformation process described above can obscure hidden patterns in data. Generally speaking, transformation changes information, so patterns discovered in transformed data may not represent the genuine patterns in the original data; at the very least, they are less accurate and precise. For optimal segmentation, you need clustering tools that do not require extensive data transformation.
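The transformation the paragraph refers to is not described in this section. As an assumed illustration, take one common transformation: coding a nominal variable as integers so that a numeric, distance-based clustering algorithm can accept it. The region names and codes below are hypothetical. The codes invent an ordering and unequal distances that the original categories never had, which is the kind of information change the paragraph warns about.

```python
from itertools import combinations

# Hypothetical integer codes for a nominal variable (assigned alphabetically).
REGION_CODES = {"East": 1, "North": 2, "South": 3, "West": 4}


def coded_distance(a: str, b: str) -> int:
    """Distance a numeric clustering algorithm sees after integer coding."""
    return abs(REGION_CODES[a] - REGION_CODES[b])


def nominal_distance(a: str, b: str) -> int:
    """Distance faithful to a nominal scale: two categories are equal or not."""
    return 0 if a == b else 1


# Compare the two views for every pair of regions.
for a, b in combinations(REGION_CODES, 2):
    print(f"{a:>5} vs {b:<5}  coded={coded_distance(a, b)}  nominal={nominal_distance(a, b)}")
```

After coding, East and West look three times as far apart as East and North, although nothing in the data says so; on the nominal scale every pair is equally different.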
Self Organizing Maps (SOM) and Competitive Learning

Self Organizing Maps (SOM), also known as Kohonen Feature Maps, were developed to simulate the way vision systems work in the brain. Maps constructed with SOM are very useful for clustering data because they learn the patterns present in the data automatically. SOM is a type of neural network, and neural networks do not suffer greatly from the limitations discussed above. SOM trains its network (that is, learns patterns) with competitive learning. This is often called the "winner takes all" strategy, since nodes compete among themselves to show the strongest activation for a given input.

Neural Clustering

StarProbe provides a neural clustering procedure based on SOM, which is best explained with the figures shown below. In the figures, objects are placed into the cells of a two-dimensional grid of 81 cells (9 rows by 9 columns). Note that some cells are empty. Each cell contains the most similar objects, i.e., objects sharing many properties. Objects in neighboring (or nearby) cells are also similar: the closer two cells are, the more similar their objects. Neural clustering offers the following advantages:
Segment Analysis

In the left figure, the colored circles are pie charts showing the distribution of gender and race combinations in each cell. Notice that every cell contains objects of a single type, and that cells holding the same type of objects sit next to each other. An example of perfect clustering! The middle figure shows histograms of a numerical variable; notice that nearby cells have similar distributions. The right figure shows all-in-one distribution charts for a specific cell segment.
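The "perfect clustering" in the left figure can also be checked numerically. A minimal sketch, assuming each object has already been assigned to a grid cell (for example by a SOM) and carries a categorical label such as a gender/race combination: tabulate each cell's category mix, which is what the pie charts draw, and compute the cell's purity, the share of members in its most common category. A purity of 1.0 in every occupied cell corresponds to the single-type cells in the figure. The cell coordinates and label codes are hypothetical.

```python
from collections import Counter


def cell_profiles(assignments, labels):
    """Tabulate the category mix of each cell.

    assignments: one (row, col) cell per object
    labels: one category value per object (e.g. a gender/race combination)
    """
    profiles = {}
    for cell, label in zip(assignments, labels):
        profiles.setdefault(cell, Counter())[label] += 1
    return profiles


def purity(profile):
    """Share of a cell's members that belong to its most common category."""
    return max(profile.values()) / sum(profile.values())


# Hypothetical cell assignments and labels for seven objects.
assignments = [(0, 0), (0, 0), (0, 1), (0, 1), (4, 4), (4, 4), (4, 4)]
labels = ["F/A", "F/A", "F/A", "F/B", "M/A", "M/A", "M/A"]
profiles = cell_profiles(assignments, labels)
for cell, prof in sorted(profiles.items()):
    print(cell, dict(prof), f"purity={purity(prof):.2f}")
```

Here cell (0, 1) mixes two categories and falls short of the perfect clustering shown in the figure.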
Neural clustering is robust at detecting patterns and organizes them in a way that supports powerful cluster visualization, as shown in the figures above. This is extremely useful for marketing and business data. The following is another example of neural clustering, this time based on two numerical variables, a type of clustering commonly found in scientific research. Notice how well neural clustering works on both numerical and categorical data.
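The SOM training and cell assignment described above can be sketched in plain Python. This is a textbook Kohonen map, not StarProbe's implementation, whose details are not given here. Assumptions: numeric input vectors of equal length, Euclidean distance, a Gaussian neighborhood, and a learning rate and radius that decay linearly. The competition step picks a single winning cell, as in the "winner takes all" description; the Kohonen update then also pulls the winner's grid neighbors toward the input, which is what makes nearby cells hold similar objects. The sketch handles numeric data only; how StarProbe treats categorical variables is not described in this section.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competition: return the (row, col) whose prototype is closest to x."""
    best, best_d2 = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d2 = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d2 < best_d2:
                best, best_d2 = (r, c), d2
    return best


def train_som(data, rows=9, cols=9, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on a list of equal-length numeric vectors."""
    rng = random.Random(seed)
    radius0 = max(rows, cols) / 2
    # Each cell's prototype starts as a copy of a randomly chosen data point.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1 - frac)                    # learning rate decays toward 0
            radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks
            br, bc = best_matching_unit(weights, x)
            # Cooperation: every cell moves toward x, weighted by a Gaussian
            # of its grid distance to the winner (the winner moves the most).
            for r in range(rows):
                for c in range(cols):
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                    w = weights[r][c]
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def assign_cells(weights, data):
    """Map each object (by index) to its winning cell; unwon cells stay empty."""
    cells = {}
    for i, x in enumerate(data):
        cells.setdefault(best_matching_unit(weights, x), []).append(i)
    return cells


# Usage: two hypothetical groups described by two numerical variables.
gen = random.Random(1)
group_a = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(30)]
group_b = [[gen.gauss(10, 1), gen.gauss(10, 1)] for _ in range(30)]
data = group_a + group_b
weights = train_som(data)
cells = assign_cells(weights, data)
print(f"{len(cells)} of 81 cells are occupied")
```

Cells that win no object never appear in `cells`; they correspond to the empty cells in the 9 by 9 grid.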
Segmentation Variable Selection Methods

Although neural clustering can adjust variable weights automatically, it is often desirable to work only with variables of significant importance. Restricting segmentation to such variables produces segments with simpler, cleaner profiles. Careful selection of segmentation variables with respect to the target is essential in predictive segmentation modeling: unlike standard predictive modeling, it relies on the modeler's manual selection of predictive variables, and without that care the segmentation may not yield models with predictive power. Identifying significant variables can be very difficult without proper tools. StarProbe's link analysis and predictive neural network can be used to analyze each variable's significance, and selecting segmentation variables with link analysis and/or the predictive neural network helps ensure that the segmentation results will have predictive power. For more on segmentation variable selection methods, see Link Analysis.
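How StarProbe's link analysis scores a variable is not described here. As a swapped-in stand-in, a common filter method ranks categorical candidates by their mutual information with the target: a variable that tells you nothing about the target scores 0 bits. A minimal sketch; the column names and records are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two categorical sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), nxy in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
        mi += (nxy / n) * math.log2(nxy * n / (px[x] * py[y]))
    return mi


def rank_variables(records, target):
    """Rank candidate variables by mutual information with the target column."""
    ys = [r[target] for r in records]
    scores = {
        name: mutual_information([r[name] for r in records], ys)
        for name in records[0]
        if name != target
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical campaign records.
records = [
    {"region": "East", "channel": "web", "responded": "yes"},
    {"region": "West", "channel": "web", "responded": "yes"},
    {"region": "East", "channel": "mail", "responded": "no"},
    {"region": "West", "channel": "mail", "responded": "no"},
]
print(rank_variables(records, "responded"))  # [('channel', 1.0), ('region', 0.0)]
```

In this toy data, channel determines the response completely (1 bit) while region is independent of it (0 bits), so only channel would be kept as a segmentation variable.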
Lift Factor Analysis with Lift Charts / Gains Charts

Once the network is trained and suitable clusters are found, the clusters that score highest on a property of interest can be selected for follow-up actions. For example, in direct mail marketing, you may select the customers in the cells with the highest response ratios and mail them the catalog for your new product sales campaign! The following shows the segment selection panel of StarProbe Data Miner:
While selecting cells, you can perform gains factor analysis using the gains chart at the center left. The gains chart takes three different forms, depending on the types of variables and the options selected. The horizontal axis shows the cumulative population covered by the table entries, taken in the order they appear in the table, from 0% of the whole population at the left to 100% at the right. The vertical white line marks the selected table entry. The remaining elements depend on the form of the chart, as described below.

Nominal target values

The vertical axis shows the cumulative share of the selected target class (value) captured, relative to that class's size in the whole population. The diagonal blue line represents an even (random) distribution of the selected target class. The red line plots the cumulative share of the target class captured against the cumulative population. In most cases we look for steeply rising curves: we use clustering to find groups in which a property is denser than average, and such groups show up as a curve that rises steeply above the diagonal blue line. The figure above shows that less than 20% of the population covers about 90% of the target class.

Response & Profit analysis

This is a special feature for direct marketing. The blue curve shows the cumulative capture ratio of responses, the green curve the cumulative capture ratio of order quantities, and the red curve the cumulative profit amount. The figures show curves that rise rapidly at the left, which is a good indication of effective segmentation.

For how clustering is used in specific applications, see Market Segmentation, Geographic Segmentation, Customer Segmentation, Database Marketing, Direct Marketing, and Direct Mail Marketing.
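The nominal-target gains chart described above can be computed from per-cell counts. A minimal sketch, assuming a table that lists each cell's population and its count of the selected target class, sorted so the densest cells come first; the numbers are hypothetical, and this is not StarProbe's code. Each point pairs the cumulative share of the population with the cumulative share of the target class captured (the red line); the diagonal y = x is the random baseline (the blue line), and lift is the ratio of the two.

```python
def gains_curve(entries):
    """Cumulative gains points for table entries in their listed order.

    entries: list of (population, target_count) per cell or segment.
    Returns (cumulative % of population, cumulative % of target class) pairs,
    starting at (0, 0). The random-selection baseline is the diagonal y = x.
    """
    total_pop = sum(p for p, _ in entries)
    total_target = sum(t for _, t in entries)
    points, pop, hits = [(0.0, 0.0)], 0, 0
    for p, t in entries:
        pop += p
        hits += t
        points.append((100 * pop / total_pop, 100 * hits / total_target))
    return points


# Hypothetical table: (customers in cell, customers in the target class).
table = [(300, 15), (100, 45), (520, 10), (80, 30)]
# List the densest cells first, as one would when choosing mailing targets.
table.sort(key=lambda e: e[1] / e[0], reverse=True)

for x, y in gains_curve(table)[1:]:
    print(f"{x:5.1f}% of population -> {y:5.1f}% of target class (lift {y / x:.2f})")
```

In this made-up table, 18% of the population captures 75% of the target class, a lift of about 4.2. Accumulating order quantities instead of target counts would give a capture-ratio curve of the kind the green line shows.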
Data Mining Tools for Segmentation

StarProbe Data Miner is available for trial with free technical support. If you are interested in a trial with your own industrial data, please write to us via Contact Us.




