Where is the "big" of big data mining?

We used to talk about data mining; in the era of big data, we talk about big data mining. So where exactly is the "big" in big data mining? This article summarizes a few answers, in the hope of offering some ways to think about the question.

If you find any shortcomings, please leave a comment.

1、 Large amount of data

How much data counts as big? This is the question many people ask when they start mining big data.

Judging from practical applications, if the amount of data processed every day reaches the terabyte (T) or petabyte (P) level, you can consider deploying a big data processing platform such as Hadoop or Spark. The advantages of these platforms only show once the data volume is large enough.

When the amount of data is small, reading and moving the data takes up a disproportionate share of the time, and the advantages of a big data processing platform do not show. Many applications adopt big data for its own sake and set up Hadoop for a few hundred megabytes. Therefore, equating big data with platforms such as Hadoop and Spark is a limited view.

Of course, deciding whether to use a big data platform may involve more factors, such as pooling many low-performance machines, portability across heterogeneous software and hardware platforms, and processing large amounts of unstructured data.
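The considerations in this section can be written down as a small checklist. The sketch below is only an illustration: the function name, its parameters, and the use of one terabyte per day as the cut-off are assumptions made for the example, not a rule from any platform's documentation.

```python
TB = 1024 ** 4  # bytes in one terabyte (binary); the threshold itself is an assumption


def reasons_to_consider_platform(daily_bytes,
                                 pools_low_end_machines=False,
                                 mostly_unstructured=False):
    """Return the reasons (possibly none) that speak for a big data platform.

    Hypothetical helper: it mirrors the factors listed in the text and
    nothing more. An empty list suggests a single machine may be enough.
    """
    reasons = []
    if daily_bytes >= TB:
        reasons.append("daily volume at the T level or above")
    if pools_low_end_machines:
        reasons.append("needs to pool many low-performance machines")
    if mostly_unstructured:
        reasons.append("large amount of unstructured data")
    return reasons


# A few hundred megabytes per day gives no reason on volume alone
small_job = reasons_to_consider_platform(300 * 1024 ** 2)
big_job = reasons_to_consider_platform(2 * TB, mostly_unstructured=True)
```

Here `small_job` is an empty list, which matches the point that a few hundred megabytes do not justify a cluster by themselves.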

2、 Diversification of data types

In the era of data mining, we mainly mined relational data. In the era of big data, applications produce many kinds of data, so big data mining usually involves multiple data types. The data types meant here are not the ordinary data types of programming; they are closer to how an application represents its data, and usually include time series data, trajectory data, graph data, text data, and so on.

Daily sales records and prices are ordinary data, but once they are connected in order along the time dimension, the resulting time series reflects how prices change, and of course carries richer meaning.

A person's location is just an ordinary (x, y) value, but connecting the locations in the order of movement forms that person's activity trajectory, which reflects their life and habits. This hidden information is what big data mining should pay attention to.

Each user of a microblog or forum exists independently and is also ordinary data. But if users are connected through relationships such as followers and followees, they form a large graph, that is, graph data. The communities and outliers in the graph, along with higher-level graph data carrying attributes such as group preference and group movement, are the focus of big data mining.
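As a minimal sketch of this idea, the snippet below orders a few sales records by date and derives the day-over-day price change. The records and field layout are made up for illustration.

```python
from datetime import date

# Hypothetical daily sales records: (day, unit price, units sold)
records = [
    (date(2018, 12, 3), 10.5, 40),
    (date(2018, 12, 1), 10.0, 35),
    (date(2018, 12, 2), 10.2, 38),
]

# Ordering by the time dimension turns isolated records into a time series
series = sorted(records, key=lambda r: r[0])

# Day-over-day price change: information no single record carries
price_changes = [round(b[1] - a[1], 2) for a, b in zip(series, series[1:])]
```

Each record alone is an ordinary row; only the ordered sequence exposes the rising price.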
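The same step can be sketched for location data. The sample points, the integer timestamps, and the `path_length` helper below are assumptions chosen for the example.

```python
import math

# Hypothetical location fixes for one person: (timestamp, x, y)
fixes = [(3, 3.0, 4.0), (1, 0.0, 0.0), (2, 3.0, 0.0)]

# Connecting the points in order of movement gives the trajectory
trajectory = [(x, y) for _, x, y in sorted(fixes)]


def path_length(points):
    """Sum of straight-line distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


distance_travelled = path_length(trajectory)
```

The distance travelled is a property of the ordered sequence, not of any single (x, y) value.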
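A rough sketch of turning relationships into graph data follows. The user names and follow pairs are invented, and connected components are used here only as a simple stand-in for real community detection, which typically needs more elaborate algorithms.

```python
from collections import defaultdict

# Hypothetical "follows" pairs: (follower, followed)
follows = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("dan", "eve")]
users = {"ann", "bob", "cat", "dan", "eve", "fay"}

# Build an undirected adjacency list from the relationships
neighbors = defaultdict(set)
for a, b in follows:
    neighbors[a].add(b)
    neighbors[b].add(a)


def connected_groups(users, neighbors):
    """Split users into groups that are linked directly or indirectly."""
    seen, result = set(), []
    for start in sorted(users):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            u = stack.pop()
            if u in group:
                continue
            group.add(u)
            stack.extend(neighbors.get(u, set()) - group)
        seen |= group
        result.append(group)
    return result


groups = connected_groups(users, neighbors)
# Users with no relationships at all are the simplest kind of outlier
outliers = sorted(u for g in groups if len(g) == 1 for u in g)
```

With this data, `fay` has no connections and ends up as the only outlier, while the other users fall into two groups.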

3、 Noise in the data being processed

In the era of data mining, data came from relational databases: it was business-related and of high quality, and could be mined as soon as it was retrieved. Big data mining is certainly not like that. Big data thinking requires us to consider that data from different sources varies in quality and mixes different structures, and to make data processing more robust accordingly. For example, in enterprise-level customer analysis, different branches may use different customer management systems. Some systems record a customer's education as undergraduate/master/doctoral, while others use undergraduate/graduate. This calls for data consistency processing. Data format, data integrity, and similar issues must also be considered in big data mining.
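The degree example can be sketched as a small normalization step. The mapping table and function name are hypothetical, and mapping both master and doctoral onto "graduate" is one possible choice (the coarser scheme is the only one both systems can express).

```python
# Hypothetical mapping onto the coarser scheme shared by both systems
TO_COARSE = {
    "undergraduate": "undergraduate",
    "master": "graduate",
    "doctoral": "graduate",
    "graduate": "graduate",
}


def normalize_degree(raw):
    """Return the unified label, or None for missing or unknown values."""
    if raw is None:
        return None
    # Tolerate differences in case and stray whitespace between systems
    return TO_COARSE.get(raw.strip().lower())


branch_a = ["Undergraduate", "Master", "Doctoral"]  # three-level system
branch_b = ["undergraduate", "graduate ", None]     # two-level system, one gap
unified = [normalize_degree(v) for v in branch_a + branch_b]
```

Returning `None` instead of raising keeps the pipeline running on incomplete records, which is one way to handle the integrity problems mentioned above.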

4、 Diversification of data mining tasks

In the era of data mining, the focus was generally on a single analysis task, whereas big data mining may have to handle several mining tasks at once, such as classification, prediction, correlation, and clustering. Although the business requirements are many, these classification, prediction, correlation, and clustering tasks may use the same model at the bottom layer. It is therefore very important to separate models, algorithms, and services when mining big data, which is the so-called big data processing hierarchy.
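One way to picture this layering is the toy sketch below, in which two business-facing services share a single underlying model. The centroid model, the function names, and the sample points are all assumptions made for illustration; real systems would use far richer models.

```python
import math


# Model layer: one shared representation (here, a hypothetical centroid model)
class CentroidModel:
    def __init__(self, labeled_points):
        sums = {}
        for label, (x, y) in labeled_points:
            sx, sy, n = sums.get(label, (0.0, 0.0, 0))
            sums[label] = (sx + x, sy + y, n + 1)
        self.centroids = {k: (sx / n, sy / n) for k, (sx, sy, n) in sums.items()}


# Algorithm layer: a generic routine that depends only on the model
def nearest(model, point):
    return min(model.centroids,
               key=lambda k: math.dist(model.centroids[k], point))


# Service layer: different business tasks reuse the same model and algorithm
def classify_customer(model, point):
    return nearest(model, point)


def group_customers(model, points):
    groups = {}
    for p in points:
        groups.setdefault(nearest(model, p), []).append(p)
    return groups


model = CentroidModel([("low", (1, 1)), ("low", (2, 2)), ("high", (9, 9))])
label = classify_customer(model, (8, 8))
segments = group_customers(model, [(0, 0), (10, 10), (2, 1)])
```

Because the services touch the model only through `nearest`, the model can be retrained or replaced without changing either service.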

Date：2018-12-11 Source: Zeng Jianping Type: website encyclopedia