When we study things, we often need to classify them. For example, geological exploration classifies samples according to geophysical and geochemical indicators; paleontology classifies excavated bones according to their shape and size; and in monitoring, where the volume of observation data is very large, it is sometimes necessary to classify the data, pick typical representatives, and analyze those in depth. Classifying things, then summarizing them to discover the laws they follow, has become one of the important ways people understand and change the world.
Because the objects are complex, experience and expertise alone often cannot classify them accurately. With the development of multivariate statistical techniques and the spread of computers, using mathematical methods for more scientific classification has become not only necessary but entirely possible.
In recent years, numerical taxonomy has gradually given rise to a new branch called cluster analysis. Cluster analysis applies to many types of data sets, and many research fields, such as engineering, biology, medicine, linguistics, anthropology, psychology, and marketing, have contributed to the development and application of clustering techniques.
What is cluster analysis?
Cluster analysis, also known as group analysis or point group analysis, is a quantitative method for studying classification problems that involve multiple factors. It is a relatively new multivariate statistical method that combines taxonomy with multivariate analysis. The basic principle is to start from the attributes of the samples themselves, use a chosen similarity or difference index to measure mathematically how closely the samples are related, and then cluster the samples according to how close they are.
Cluster analysis treats the objects to be classified as points in a multi-dimensional space and groups them according to how close they are to one another in that space.
In layman's terms, cluster analysis identifies the attributes of things and gathers things with similar attributes into one class, so that things in the same class are highly similar.
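To make the idea of "closeness in a multi-dimensional space" concrete, here is a minimal sketch that measures the affinity between samples with Euclidean distance. The choice of Euclidean distance, the function names, and the sample values are all assumptions made for illustration; the article does not prescribe a particular index.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two samples with the same number of attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(samples):
    """Symmetric matrix of pairwise distances; entry [i][j] compares samples i and j."""
    return [[euclidean(p, q) for q in samples] for p in samples]

# Hypothetical samples, each described by two attributes.
samples = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0)]
dist = distance_matrix(samples)

# Samples 0 and 1 are close, so they are candidates for the same class;
# sample 2 is far from both.
for row in dist:
    print([round(d, 2) for d in row])
```

A small distance means a close relationship, so a clustering method working from this matrix would tend to group the first two samples together and keep the third apart.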
Cluster analysis is an important method for quantitatively studying the classification of geographic objects and geographic regionalization. Common clustering methods include systematic (hierarchical) clustering, dynamic clustering, and fuzzy clustering.
What are the benefits of cluster analysis?
Cluster analysis classifies individuals (samples) or objects (variables) by similarity, that is, by how near or far apart they are, so that elements in the same class are more similar to each other than to elements of other classes. The goal is to maximize the homogeneity of elements within a class and the heterogeneity between classes. In other words, samples clustered into the same group should be similar to each other, and samples belonging to different groups should be sufficiently dissimilar.
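One common way to put numbers on "homogeneous within, heterogeneous between" is to compare the within-cluster and between-cluster sums of squares. The sketch below is only an illustration under that assumption; the function names and the two hypothetical groupings are not from the article.

```python
def centroid(points):
    """Mean position of a group of samples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_ss(clusters):
    """Sum of squared distances from each sample to its own cluster centroid."""
    return sum(sq_dist(p, centroid(c)) for c in clusters for p in c)

def between_ss(clusters):
    """Size-weighted squared distances from each cluster centroid to the overall centroid."""
    overall = centroid([p for c in clusters for p in c])
    return sum(len(c) * sq_dist(centroid(c), overall) for c in clusters)

# The same four hypothetical samples, grouped in two different ways.
good = [[(1.0, 1.0), (1.0, 3.0)], [(9.0, 9.0), (9.0, 7.0)]]
bad = [[(1.0, 1.0), (9.0, 9.0)], [(1.0, 3.0), (9.0, 7.0)]]

print(within_ss(good), between_ss(good))  # small within, large between
print(within_ss(bad), between_ss(bad))    # large within, no separation between
```

The first grouping keeps nearby samples together, so almost all of the variation lies between the clusters. The second grouping mixes distant samples, so the variation sits inside the clusters and the two cluster centers coincide.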
Common clustering methods: the system (hierarchical) clustering method, the K-means method, the fuzzy clustering method, clustering of ordered samples, the decomposition method, and the addition method.
Precautions:
1. The system clustering method can classify either variables or records, whereas the K-means method can only classify records;
2. The K-means method requires the analyst to know in advance how many classes the samples should be divided into;
3. The requirements on the variables, such as multivariate normality and homogeneity of variance, are relatively high.
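The second precaution is easy to see in code: the number of classes is an input to K-means, not something the method discovers. The following is a minimal sketch of the standard assign-and-update loop, not a production implementation. Using the first k samples as starting centers is an assumption made to keep the example deterministic, and the function names and data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100):
    """Plain K-means; k must be chosen by the analyst beforehand."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of samples")
    # Assumption: the first k samples serve as the initial centers.
    # Real implementations usually randomize this choice.
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each sample joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Hypothetical records with two attributes each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

Calling `kmeans(points, k=3)` on the same records would return three classes just as readily, which is why the analyst has to justify the choice of k separately.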
Application areas: market segmentation, classification of consumer behavior, design of sampling plans, etc.
Advantages: The cluster analysis model is intuitive, and its conclusions are concise.
Disadvantages: When the sample size is large, it is difficult to obtain clustering conclusions. The similarity coefficient is an index built from the data the subjects report, and it is meant to reflect the intrinsic links between those subjects. In practice, however, the reported data may appear closely related even though there is no intrinsic connection between the things themselves. In that case, clustering results based on distance or a similarity coefficient are clearly inappropriate, yet the cluster analysis model itself cannot detect such errors.
Cluster analysis is an exploratory analysis. The analyst does not have to supply a classification standard in advance; cluster analysis derives the classes automatically from the sample data. Different clustering methods often lead to different conclusions, and different researchers clustering the same data set may not arrive at the same number of clusters.
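The point that different methods can give different conclusions on the same data can be shown with a small sketch. It assumes the system clustering method corresponds to agglomerative hierarchical clustering, and it compares two standard ways of measuring the distance between clusters: single linkage (nearest members) and complete linkage (farthest members). The function name and the one-dimensional data are hypothetical.

```python
from itertools import combinations

def agglomerate(values, k, linkage):
    """Merge the two closest clusters repeatedly until k clusters remain.

    `linkage` is `min` for single linkage or `max` for complete linkage.
    """
    clusters = [[v] for v in values]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(
                abs(a - b) for a in clusters[pair[0]] for b in clusters[pair[1]]
            ),
        )
        merged = clusters.pop(j)  # j > i, so index i is still valid
        clusters[i].extend(merged)
    return sorted(sorted(c) for c in clusters)

# Hypothetical one-dimensional measurements.
values = [0.0, 1.0, 2.5, 4.5]
single = agglomerate(values, k=2, linkage=min)
complete = agglomerate(values, k=2, linkage=max)
print(single)    # [[0.0, 1.0, 2.5], [4.5]]
print(complete)  # [[0.0, 1.0], [2.5, 4.5]]
```

Both runs use the same data and ask for the same number of clusters, yet single linkage chains 2.5 onto the first group while complete linkage pairs it with 4.5. Neither answer is wrong; they reflect different definitions of closeness between groups.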
What is the significance of cluster analysis?
Cluster analysis is the process of grouping a collection of physical or abstract objects into classes of similar objects. Grouping of this kind is an important human activity.
The goal of cluster analysis is to group data on the basis of similarity. Clustering draws on many fields, including mathematics, computer science, statistics, biology, and economics, and many clustering techniques have been developed in different application areas. These techniques are used to describe data, measure the similarity between different data sources, and sort data sources into different clusters.
Business: Cluster analysis is used to discover distinct customer groups and to characterize them by their purchasing patterns. It is an effective tool for market segmentation, and it can also be used to study consumer behavior, find new potential markets, select test markets, and serve as a preprocessing step for multivariate analysis.
Biology: Cluster analysis is used to classify animals, plants, and genes, in order to understand the inherent structure of populations.
Geography: Clustering can help identify similarities in the data held in Earth observation databases.
Insurance industry: Cluster analysis identifies groups of auto insurance policyholders by their high average consumption, and it also identifies groups of properties in a city by residential type, value, and geographic location.
Internet: Cluster analysis is used to classify documents online, which helps organize the information they contain.
E-commerce: Cluster analysis is also an important part of data mining for e-commerce websites. By clustering customers with similar browsing behavior and analyzing their common characteristics, it helps e-commerce operators understand their customers and provide them with more suitable services.