What is a Data Warehouse and Data Mining?
A data warehouse has a wide scope because it covers every area of interest to the entire organization. A data mart covers a specified portion of corporate-wide data, and that portion is mainly of interest to a particular group of users. A virtual warehouse is a set of views over the operational databases, generated on demand.
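The difference between the three can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and view names (operational_orders, warehouse_sales, marketing_mart, virtual_sales_by_region) and the sample rows are assumptions made up for the example, and a real warehouse would normally live in a separate system from the operational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operational database: a hypothetical order-entry table.
conn.execute(
    "CREATE TABLE operational_orders (region TEXT, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO operational_orders VALUES (?, ?, ?)",
    [
        ("north", "marketing", 120.0),
        ("south", "marketing", 80.0),
        ("north", "finance", 200.0),
    ],
)

# Data warehouse: a physically separate copy covering every department.
conn.execute("CREATE TABLE warehouse_sales AS SELECT * FROM operational_orders")

# Data mart: a stored subset of the warehouse for one group of users.
conn.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT region, amount FROM warehouse_sales WHERE department = 'marketing'"
)

# Virtual warehouse: a view over the operational table, computed on demand.
conn.execute(
    "CREATE VIEW virtual_sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM operational_orders GROUP BY region"
)

# A new operational row arrives after the warehouse and the mart were loaded.
conn.execute("INSERT INTO operational_orders VALUES ('south', 'marketing', 50.0)")

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY region"
).fetchall()
view_rows = conn.execute(
    "SELECT region, total FROM virtual_sales_by_region ORDER BY region"
).fetchall()
print(mart_rows)
print(view_rows)
```

In this sketch the view reflects the late row because it is evaluated against the operational table at query time, while the mart keeps only what was loaded into it.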
Characteristics of a Data Warehouse
The four main characteristics of a data warehouse are:
Subject-oriented – A data warehouse is organized around major subjects such as sales, customers, and products. Data is arranged by subject rather than by application. For instance, an insurance company would arrange the data in its data warehouse by customer, claim, and premium, as opposed to by its various products.
Integrated – A data warehouse is constructed by integrating various heterogeneous sources, for example flat files, OLTP systems, and relational databases. When data is stored in separate applications in the operational environment, its encoding is often inconsistent. When data moves from the operational environment to the data warehouse, it is converted to a consistent coding convention.
Nonvolatile – A data warehouse is a physically separate store of data converted from the application data in the operational environment. Because of this separation, a data warehouse does not require transaction processing, concurrency control, or recovery mechanisms. Once data enters the data warehouse, it can only be loaded, refreshed, or accessed for queries; it is never changed or updated in place.
Time variant – The main goal of storing data in the data warehouse is to provide a historical perspective. Every key structure in the data warehouse contains an element of time, either explicitly or implicitly.
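The integrated, nonvolatile, and time-variant properties can be illustrated with a small Python sketch. Everything here is hypothetical: the GENDER_CODES mapping, the integrate function, the Warehouse class, and the sample records are assumptions chosen for the example, not a description of any particular product.

```python
from datetime import date

# Assumed mapping from source-specific codes to one warehouse convention.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}


def integrate(record, source):
    """Convert one operational record to the warehouse coding convention."""
    return {
        "customer_id": record["id"],
        "gender": GENDER_CODES[str(record["gender"]).strip().lower()],
        "source": source,
    }


class Warehouse:
    """Append-only store: rows are loaded and queried, never updated in place."""

    def __init__(self):
        self._rows = []

    def load(self, rows, snapshot_date):
        # Time variant: every stored row carries an explicit time element.
        for row in rows:
            self._rows.append({**row, "snapshot_date": snapshot_date})

    def query(self, as_of):
        # Return copies so callers cannot modify the stored history.
        return [dict(r) for r in self._rows if r["snapshot_date"] <= as_of]


# Two hypothetical source systems that encode the same attribute differently.
crm_records = [{"id": 1, "gender": "male"}]
billing_records = [{"id": 2, "gender": 0}]

warehouse = Warehouse()
warehouse.load([integrate(r, "crm") for r in crm_records], date(2024, 1, 31))
warehouse.load([integrate(r, "billing") for r in billing_records], date(2024, 2, 29))

january = warehouse.query(date(2024, 1, 31))
february = warehouse.query(date(2024, 2, 29))
print(january)
print(february)
```

The class deliberately offers no update or delete method, which is one simple way to model the rule that warehouse data is loaded and read but not changed.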
Data mining techniques are divided into two main categories:
Descriptive methods – These methods describe the data by finding human-interpretable patterns. For example, clustering groups together similar documents returned by a search engine according to their context.
Predictive methods – These methods use some variables to predict the future or unknown values of other variables.
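The contrast between the two categories can be shown with a minimal Python sketch. The sales figures and the helper names describe and fit_line are assumptions invented for the illustration; the predictive part uses an ordinary least-squares line as one simple stand-in for a predictive method.

```python
from collections import defaultdict

# Hypothetical (region, month, revenue) records.
sales = [
    ("north", 1, 10.0), ("north", 2, 12.0), ("north", 3, 14.0),
    ("south", 1, 20.0), ("south", 2, 24.0), ("south", 3, 28.0),
]


def describe(rows):
    """Descriptive: summarize existing data as average revenue per region."""
    by_region = defaultdict(list)
    for region, _month, revenue in rows:
        by_region[region].append(revenue)
    return {region: sum(v) / len(v) for region, v in by_region.items()}


def fit_line(xs, ys):
    """Predictive: least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


averages = describe(sales)

north = [(month, revenue) for region, month, revenue in sales if region == "north"]
slope, intercept = fit_line([m for m, _ in north], [r for _, r in north])
forecast_month_4 = slope * 4 + intercept  # estimate for a month not yet observed

print(averages)
print(forecast_month_4)
```

The describe function only restates what the data already contains, while fit_line is used to estimate a value that is not in the data.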
Data mining techniques
Regression modeling – applies standard statistics to the data provided in order to prove or disprove a hypothesis.
Visualization – utilizes multidimensional graphs to understand patterns, relationships, or trends.
Correlation – this technique identifies relationships between variables in a given data set.
Variance analysis – a statistical technique that checks for differences in mean values between known variables and non-dependent variables.
Discriminant analysis – a classification technique for identifying the factors that discriminate between groups and so determine group membership.
Forecasting – this technique estimates the outcome of variables based on the known results of past events.
Cluster analysis – this technique reduces the data by grouping it into clusters and then analyzing the characteristics of each cluster.
Decision trees – this technique separates data according to a set of rules, which are expressed in “if-then-else” form.
Neural networks – these are data models that simulate cognitive functions.
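Three of the techniques listed above can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions: the function names (pearson, kmeans_1d, classify), the sample values, the starting cluster centers, and the income threshold are all hypothetical, and real projects would normally rely on a statistics or machine learning library instead.

```python
import math


def pearson(xs, ys):
    """Correlation: Pearson coefficient between two variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def kmeans_1d(values, centers, iterations=10):
    """Cluster analysis: a very small k-means on one-dimensional data."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


def classify(customer):
    """Decision tree: hand-written if-then-else rules with assumed thresholds."""
    if customer["income"] >= 50000:
        if customer["has_default"]:
            return "review"
        else:
            return "approve"
    else:
        return "decline"


r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
centers, groups = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [1.0, 12.0])
decision = classify({"income": 60000, "has_default": False})

print(r)
print(centers, groups)
print(decision)
```

In practice a decision tree algorithm learns its rules from the data; the rules in classify are written by hand only to show the “if-then-else” form the resulting tree takes.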
Most organizations today face immense pressure as they compete in an environment where deadlines are tight and profits are low. To compete effectively, enterprises require faster decision support that draws on forecasting and on predictive analysis of existing behavior. Data mining and data warehousing can provide these capabilities.