STEPS INVOLVED IN DATA MINING
Here I am going to share the various steps involved in Data Mining. I came to know about them while doing my courses, and these are my notes.
1. Establishing Goals:- The first step is to set goals. Identify the key questions to be answered and determine the accuracy required of the results. Higher accuracy costs more and lower accuracy costs less, so the cost-benefit trade-off for the desired level of accuracy is an important part of setting Data Mining goals. A good starting point for Data Mining is Data Visualization.
2. Selecting Data:- The output of Data Mining depends largely on the quality of the data being used, so this step is crucial. Sometimes the data is not readily available; in such cases we must identify other sources of data or plan new data collection initiatives.
3. Preprocessing Data:- This is an important step in which you identify the irrelevant attributes of the data. The data should also be checked to ensure its integrity. If values are missing at random, a simple set of solutions is enough, but if they are missing systematically, we must determine the impact of the missing data on the results. Check for this in advance.
4. Transforming Data:- We have to determine the appropriate format in which the data must be stored. An important task in Data Mining is to reduce the number of attributes needed to explain the phenomena. This calls for data reduction algorithms such as Principal Component Analysis, which can reduce the number of attributes without a significant loss of information.
5. Storing Data:- It is also important to store the data on storage media that keep it secure. The data must be stored in a format that gives Data Scientists immediate read privileges. Data safety and privacy should be a primary concern when storing data. After these steps, an evaluation is carried out to improve the process further.
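
To make the preprocessing and transforming steps a bit more concrete, here is a minimal Python sketch using NumPy. It is only an illustration under my own assumptions, not a definitive implementation: the function names missing_fraction and pca_reduce, the 95% variance threshold, and the synthetic data are all made up for this example. The first function reports how much of each attribute is missing, and the second performs Principal Component Analysis through a singular value decomposition of the centered data.

```python
import numpy as np


def missing_fraction(X):
    """Return the fraction of missing (NaN) values in each attribute (column)."""
    return np.isnan(X).mean(axis=0)


def pca_reduce(X, variance_to_keep=0.95):
    """Project X onto the fewest principal components whose combined
    explained variance reaches `variance_to_keep` (hypothetical threshold)."""
    # PCA works on centered data, so subtract each attribute's mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    # Smallest k whose cumulative explained variance reaches the threshold.
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(S))
    return X_centered @ Vt[:k].T, explained[:k]


# Synthetic data (an assumption for illustration): the third attribute is
# almost exactly the sum of the first two, so it adds little new information.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + b + 0.01 * rng.normal(size=200)])

print(missing_fraction(X))   # fraction of missing values per attribute
reduced, explained = pca_reduce(X)
print(reduced.shape)         # number of rows and of retained components
print(explained)             # share of variance kept by each component
```

In this sketch the three attributes should collapse to two components, because the third one is nearly redundant. Note that pca_reduce assumes the missing values have already been dealt with, which is why the missing-data check comes first.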