Difference: IvoaKDDguideProcess (1 vs. 2)

Revision 2, 2011-01-30 - SabineMcConnell

 
META TOPICPARENT name="IvoaKDDguide"

IVOA KDD-IG: A user guide for Data Mining in Astronomy

3: Overview of the data mining process

Changed:
<
<
Here we elucidate common steps in the data mining process, from raw data to science result. While each particular case will be unique, and driven by the particular science question being addressed, many of the issues encountered are common to different pieces of work, and we describe those here.
>
>
<--Here we elucidate common steps in the data mining process, from raw data to science result. While each particular case will be unique, and driven by the particular science question being addressed, many of the issues encountered are common to different pieces of work, and we describe those here.-->
Added:
>
>
The data mining process can be broken down into the following steps, each of which is repeatedly executed and refined.
 
  • Data collection
Changed:
<
<
  • Data preprocessing
  • Attribute selection [Incl. dimension reduction]
  • Selection of algorithm
  • Improving results
>
>
  • Data preprocessing
  • Model Building
  • Model Validation
  • Model Deployment
Deleted:
<
<
  • Algorithm application and limitations
 
Changed:
<
<

>
>
The extraction of information through data mining is an iterative process that is impossible to automate. Each dataset is different in a multitude of ways: it was collected for a different purpose, often as a by-product of some other process.
Added:
>
>
In addition, a large variety of algorithms is available, each with its own characteristics. One of the most important rules is therefore to be cautious, and not to trust any results without careful validation of the approach. The following sections highlight the key points for each of the main steps in data mining.
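The rule about not trusting results without validation can be illustrated with a hold-out check: part of the data is set aside, the model is built on the remainder, and its accuracy on the held-out part is compared with a trivial majority-class baseline. The following is a minimal sketch in Python, not a method prescribed by this guide. The function names (`holdout_split`, `validate`, `fit_threshold`, `predict_threshold`), the 30% test fraction, and the toy threshold model are all assumptions made for illustration.

```python
import random
from collections import Counter


def holdout_split(n, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices) for n samples, shuffled reproducibly."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(n * test_fraction)))
    return indices[n_test:], indices[:n_test]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def validate(features, labels, fit, predict, test_fraction=0.3, seed=0):
    """Fit on the training part only, then score on data the model never saw."""
    train, test = holdout_split(len(labels), test_fraction, seed)
    model = fit([features[i] for i in train], [labels[i] for i in train])
    predicted = [predict(model, features[i]) for i in test]
    actual = [labels[i] for i in test]
    # Baseline: always predict the most common training label.
    majority = Counter(labels[i] for i in train).most_common(1)[0][0]
    return {
        "model": accuracy(predicted, actual),
        "baseline": accuracy([majority] * len(actual), actual),
    }


# Hypothetical toy model: a single threshold halfway between the class means.
def fit_threshold(xs, ys):
    class0 = [x for x, y in zip(xs, ys) if y == 0]
    class1 = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0
```

A model that does not clearly beat the baseline on held-out data has not yet earned any trust. A single split is only a starting point: repeating it with different seeds, or using cross-validation, gives an idea of how stable the score is.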

Step 1: Data Collection

This step is usually outside the control of the data miner. The data mining process typically starts with a data set that was collected as a by-product of some other process. The most important point for this step is therefore to understand the bias in the data, and the restrictions it imposes on your choices in the later steps.

Step 2: Data Preprocessing

This is the most time-consuming step in the overall approach. Not only must issues with the data, such as missing values, multiple measurements, and noise, be addressed; the data must also be transformed into a format suitable for the algorithm that is to be applied. Since the algorithm is likely to change during the course of a data mining exercise (at least in the initial iterations through this process), the preprocessing step has to be revisited repeatedly, with varying requirements.
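The preprocessing issues named above (missing values, multiple measurements, and conversion to an algorithm-friendly format) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a recommended recipe: the sentinel value `-99.0`, the row layout (a dict with an `id` key plus feature keys), the function name `preprocess`, and the choices of averaging repeats, median imputation, and standardisation are all hypothetical. Real data will often call for different choices.

```python
from statistics import mean, median, pstdev

SENTINEL = -99.0  # hypothetical placeholder meaning "no measurement"


def preprocess(rows, features):
    """Turn raw catalogue rows into (ids, matrix) ready for a learning algorithm.

    rows: list of dicts with an 'id' key plus one key per feature.
    Assumes every feature has at least one valid measurement overall.
    """
    # Collect the valid measurements of each feature for each object.
    per_object = {}
    for row in rows:
        slots = per_object.setdefault(row["id"], {f: [] for f in features})
        for f in features:
            value = row.get(f)
            if value is not None and value != SENTINEL:
                slots[f].append(value)

    ids = sorted(per_object)
    columns = []
    for f in features:
        # Multiple measurements: average them; none at all: mark as missing.
        col = [mean(per_object[i][f]) if per_object[i][f] else None for i in ids]
        # Missing values: fill with the median of the known values.
        fill = median([v for v in col if v is not None])
        col = [fill if v is None else v for v in col]
        # Format: rescale to zero mean and unit variance.
        mu, sigma = mean(col), pstdev(col)
        columns.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in col])

    # One row per object, one column per feature.
    matrix = [list(r) for r in zip(*columns)]
    return ids, matrix
```

Keeping these choices inside one function makes it cheap to rerun the step with different settings whenever the algorithm, and with it the required input format, changes.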

Step 3: Model Building

Step 4: Model Validation

Step 5: Model Deployment
 
Added:
>
>
  Under construction by group members


-- NickBall - 05 Sep 2010

Added:
>
>

-- SabineMcConnell - 30 Jan 2011
 

Revision 1, 2010-09-05 - NickBall

 
META TOPICPARENT name="IvoaKDDguide"

IVOA KDD-IG: A user guide for Data Mining in Astronomy

3: Overview of the data mining process

Here we elucidate common steps in the data mining process, from raw data to science result. While each particular case will be unique, and driven by the particular science question being addressed, many of the issues encountered are common to different pieces of work, and we describe those here.

  • Data collection
  • Data preprocessing
  • Attribute selection [Incl. dimension reduction]
  • Selection of algorithm
  • Improving results
  • Algorithm application and limitations


Under construction by group members


-- NickBall - 05 Sep 2010


 