Should data mining be called data mining? If you think about it, digging gold out of rock is called gold mining, not “rock mining.” So maybe data mining should be named “knowledge mining,” because knowledge is what you are essentially looking for. Data mining can reveal patterns in the form of business rules, affinities, correlations, trends, or prediction models.
Data mining can involve many different software packages and analytics tools. The process can be automatic or manual, depending on the demands of the project. In essence, data mining describes sophisticated searching protocols that return specific results from large databases. For instance, a data mining tool might examine decades of financial information to calculate expenses for any given period. Analysts can then cross-reference this information to discover patterns or trends.
Data mining is not a single discipline but a blend of many, which is why it can be complicated to understand. It draws on statistics, artificial intelligence, machine learning and pattern recognition, information visualization, database management and data warehousing, and management science and information systems.
Data is often buried deep within large databases, which sometimes hold records spanning several years. In many cases, the data is cleansed and consolidated into a data warehouse. With new tools, you can now reach previously unexplored places and use cutting-edge data mining software to discover even more insight.
What is essential to know is that data mining often produces unexpected results. This forces end users to think outside the box, both during the process and when interpreting the findings.
Big data vs. data mining
When considering big data vs. data mining, big data is the asset, and data mining is the method of extracting intelligence from it. However, data mining does not depend on big data; software packages and data scientists can mine data sets of any scale. The value of big data, on the other hand, is contingent on data mining: if data mining cannot uncover actionable insights, big data is of no use. Big data by itself fulfills the variety and volume criteria, while data mining is what delivers business intelligence at a rapid pace.
How data mining works
Data mining builds models to detect patterns in collected data (internal and external). It seeks to find four major types of patterns: associations, predictions, clusters, and sequential relationships.
Associations
Associations find commonly co-occurring groupings of things, discovering interesting relationships among variables in large databases. They are mainly used in the retail industry, where barcode scanners make the data easy to collect. Think of a market-basket analysis, in which retailers discover relationships between products that are often bought together.
Two other popular derivatives of association data mining are link analysis and sequence mining. With link analysis, the linkage between many objects of interest is discovered automatically. With sequence mining, relationships are examined in terms of their order of occurrence to identify associations over time.
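To make market-basket analysis more concrete, here is a minimal sketch in Python. The baskets are made-up data, and `pair_rules` is a hypothetical helper written for this illustration, not part of any library. It only looks at pairs of items and scores each "if X then Y" rule with three commonly used measures: support (how often the pair occurs), confidence (how often Y appears when X does), and lift (confidence relative to how common Y is overall).

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one shopping basket.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (x, y, support, confidence, lift) for rules 'x -> y' over item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # skip pairs that are bought together too rarely
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]       # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    # Highest lift first: these pairs co-occur more than chance would suggest.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


rules = pair_rules(baskets)
for x, y, support, confidence, lift in rules:
    print(f"{x} -> {y}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

In this toy data set, every basket that contains beer also contains diapers, so the rule "beer -> diapers" gets a confidence of 1.0. Real tools extend the same idea to larger item sets and prune the search with algorithms such as Apriori.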
Predictions
Predictions tell the nature of future occurrences of certain events based on the past. Predicting is commonly understood as the act of telling about the future. Prediction consists of classification, regression, or time-series forecasting. With classification, the predicted thing is a class label; for instance, a weather forecast of “sunny.” With regression, the predicted thing is an actual number; in this case, a temperature of 30 ºC.
Classification is the most common of all data mining tasks. The objective is to analyze historical data stored in a database and generate a model to predict future behavior. With this, the hope is to predict future events accurately. Classification tools include decision trees (from machine learning), neural networks, support vector machines, and genetic algorithms.
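The difference between classification and regression is easier to see side by side. The sketch below uses invented weather observations and two deliberately tiny models: a one-level decision tree (a "decision stump") that outputs a label, and a least-squares line that outputs a number. The data, the function names, and the assumption that low humidity means "sunny" are all made up for illustration.

```python
# Hypothetical observations: (humidity in %, temperature in °C, weather label).
history = [
    (30, 31.0, "sunny"),
    (35, 30.0, "sunny"),
    (45, 27.5, "sunny"),
    (60, 24.0, "rainy"),
    (70, 22.5, "rainy"),
    (80, 20.0, "rainy"),
]


def fit_stump(rows):
    """Classification: find the humidity threshold that misclassifies the fewest rows.

    Assumes (for this toy example) that humidity at or below the threshold means "sunny".
    """
    values = sorted({humidity for humidity, _, _ in rows})
    best_threshold, best_errors = None, None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        errors = sum(
            ("sunny" if humidity <= threshold else "rainy") != label
            for humidity, _, label in rows
        )
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def fit_line(rows):
    """Regression: ordinary least squares for temperature = slope * humidity + intercept."""
    n = len(rows)
    mean_x = sum(humidity for humidity, _, _ in rows) / n
    mean_y = sum(temp for _, temp, _ in rows) / n
    sxy = sum((humidity - mean_x) * (temp - mean_y) for humidity, temp, _ in rows)
    sxx = sum((humidity - mean_x) ** 2 for humidity, _, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


threshold = fit_stump(history)
slope, intercept = fit_line(history)


def predict_label(humidity):
    return "sunny" if humidity <= threshold else "rainy"  # a class label


def predict_temperature(humidity):
    return slope * humidity + intercept  # an actual number


print(predict_label(40), round(predict_temperature(40), 1))
```

Both models learn from the same history, but one answers "which category?" and the other answers "how much?". Production tools replace these toy models with full decision trees, neural networks, and the other techniques listed above.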
Clusters
Clustering means partitioning things into natural groupings based on their characteristics; for example, assigning customers to different segments based on their demographics and previous shopping history.
Often, an expert needs to modify and interpret the clusters suggested by the algorithm before the results can be put into actual use. This is because different algorithms sometimes end up with different sets of clusters for the same data set.
The goal is to create groups so that members within a group have maximum similarity and members of different groups have minimum similarity, which can be useful for segmenting customers and directing appropriate marketing tools to each segment.
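One well-known way to build such groups is the k-means algorithm, sketched below in plain Python. This is only one of many clustering methods, and the customer figures (store visits per month, average basket value) are invented for the example. The `k_means` function is a hypothetical helper: it alternates between assigning each customer to the nearest group center and moving each center to the average of its group.

```python
import random
from math import dist

# Hypothetical customers: (store visits per month, average basket value).
customers = [(2, 15), (3, 18), (1, 12), (10, 80), (12, 95), (11, 85)]


def k_means(points, k, iterations=20, seed=0):
    """Group points into k clusters; returns (centroids, clusters)."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = k_means(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

On this small data set the occasional shoppers and the frequent big spenders end up in separate groups. On real data, the result depends on the starting points, the number of groups chosen, and how the features are scaled, which is one reason an expert still has to review the suggested clusters.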
Sequential relationships
Sequential relationships capture time-ordered events. A clear example is predicting that an existing banking customer who already has a checking account will open a savings account, followed by an investment account, within a year.
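As a rough illustration of the banking example, the sketch below counts how often one account type is opened after another. The customer histories are invented, and `sequence_confidence` is a hypothetical helper; it ignores how much time passes between events, which a real sequence-mining tool would take into account.

```python
from collections import Counter
from itertools import combinations

# Hypothetical account-opening histories, each already sorted by date.
histories = {
    "c1": ["checking", "savings", "investment"],
    "c2": ["checking", "savings"],
    "c3": ["checking", "credit_card"],
    "c4": ["savings", "checking", "investment"],
}


def sequence_confidence(histories):
    """For each ordered pair (a, b), the share of customers with a who later opened b."""
    owners = Counter()       # how many customers have each product
    followed_by = Counter()  # how many customers opened a and, later, b
    for events in histories.values():
        owners.update(set(events))
        # combinations() keeps the input order, so every pair is (earlier, later).
        followed_by.update(set(combinations(events, 2)))
    return {(a, b): count / owners[a] for (a, b), count in followed_by.items()}


confidence = sequence_confidence(histories)
print(confidence[("checking", "savings")])
print(confidence[("savings", "investment")])
```

Unlike the market-basket case, order matters here: customer c4 owns both a checking and a savings account, but opened savings first, so that history does not count toward "checking, then savings."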
The benefits of data mining
Data mining has become a tool for addressing many complex business problems and opportunities, and has proven successful in many different areas, including:
Customer relationship management:
Customer relationship management aims to build one-on-one relationships with customers by developing an intimate understanding of their needs and wants, and data mining can be very useful here. With all the data generated from various events (product inquiries, sales, product reviews), there are many ways data mining can provide more insight:
- Identify the most likely buyers of, and responders to, new products and services.
- Understand the root causes of customer attrition to improve customer retention.
- Discover time-variant associations between products and services to maximize sales and customer value.
- Identify the most profitable customers and their preferences to strengthen relationships and maximize sales.
The retail industry:
- Accurately predict sales volumes at specific inventory levels.
- Identify sales relationships between different product types (market-basket analysis).
- Forecast consumption levels of different product types (based on seasonal and environmental conditions) to optimize logistics and hence maximize revenue.
- Discover interesting patterns in the movement of products through a supply chain, especially products with a short shelf life, by analyzing sensor and RFID data.
Manufacturing and production:
- Predict machinery failures before they occur by using sensor data, which enables condition-based maintenance.
- Identify commonalities and anomalies in production systems to optimize manufacturing capacity.
- Discover novel patterns to identify and improve product quality.
The travel industry (airlines or hotels):
- Predict sales of different services (seat types on airplanes, room types in hotels) to price them optimally and maximize revenue as part of yield management.
- Forecast demand at different locations to better allocate limited organizational resources.
- Identify the most profitable customers and provide them with personalized services to maintain their repeat business.
- Retain valuable employees by identifying and acting on the root causes for attrition.
Data mining is complicated, but once you understand what it is about, it provides value in many different aspects.