Difference: IvoaKDDguide (1 vs. 14)

Revision 14 - 2012-06-26 - root

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: A user guide for Data Mining in Astronomy




Who's interested?

NickBall
KirkBorne
RaffaeleDAbrusco


Purpose of this Guide

The purpose of this guide is to show how the techniques of data mining, a large well-known field with a wide array of applications, can be used within astronomy to improve the science that is done with the available data. It is written with the typical astronomer in mind, i.e., one whose first priority is to get good science done.


Table of contents


Authors

  • Nick Ball is an active participant in the Next Generation Virgo Cluster Survey at the Herzberg Institute of Astrophysics in Victoria, British Columbia, Canada. His research interests include the galaxy luminosity function, galaxy, AGN, and quasar properties versus environment, photometric redshifts and classification, and data mining.

  • Sabine McConnell (main author of sections 3 and 4) is an Assistant Professor in the Department of Computing and Information Systems at Trent University, Peterborough, Ontario, Canada.


-- NickBall - 18 Mar 2011
-- NickBall - 23 Sep 2011


Revision 13 - 2011-09-24 - NickBall

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: A user guide for Data Mining in Astronomy




Who's interested?

NickBall
KirkBorne
RaffaeleDAbrusco


Purpose of this Guide

The purpose of this guide is to show how the techniques of data mining, a large well-known field with a wide array of applications, can be used within astronomy to improve the science that is done with the available data. It is written with the typical astronomer in mind, i.e., one whose first priority is to get good science done.

Deleted:
<
<

Under construction

Currently, sections 1, 2, 3, 4, 7, and 8 are largely written; from here, work on them should hopefully be more about improvement than rewriting. The plan is for the guide to grow in usefulness with time.

Sections 6 and 9 are unwritten. Contributions to these are welcome. Material exists for section 5.

 

Table of contents


Authors

  • Nick Ball is an active participant in the Next Generation Virgo Cluster Survey at the Herzberg Institute of Astrophysics in Victoria, British Columbia, Canada. His research interests include the galaxy luminosity function, galaxy, AGN, and quasar properties versus environment, photometric redshifts and classification, and data mining.

  • Sabine McConnell (main author of sections 3 and 4) is an Assistant Professor in the Department of Computing and Information Systems at Trent University, Peterborough, Ontario, Canada.


-- NickBall - 18 Mar 2011

Added:
>
>

-- NickBall - 23 Sep 2011
 


Revision 12 - 2011-03-19 - NickBall

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: A user guide for Data Mining in Astronomy

Deleted:
<
<

 
Added:
>
>

 

Who's interested?

NickBall
KirkBorne
RaffaeleDAbrusco


Purpose of this Guide

The purpose of this guide is to show how the techniques of data mining, a large well-known field with a wide array of applications, can be used within astronomy to improve the science that is done with the available data. It is written with the typical astronomer in mind, i.e., one whose first priority is to get good science done.

Under construction

Currently, sections 1, 2, 3, 4, 7, and 8 are largely written; from here, work on them should hopefully be more about improvement than rewriting. The plan is for the guide to grow in usefulness with time.

Sections 6 and 9 are unwritten. Contributions to these are welcome. Material exists for section 5.

Added:
>
>

 

Table of contents

Added:
>
>

 

Authors

  • Nick Ball is an active participant in the Next Generation Virgo Cluster Survey at the Herzberg Institute of Astrophysics in Victoria, British Columbia, Canada. His research interests include the galaxy luminosity function, galaxy, AGN, and quasar properties versus environment, photometric redshifts and classification, and data mining.

  • Sabine McConnell (main author of sections 3 and 4) is an Assistant Professor in the Department of Computing and Information Systems at Trent University, Peterborough, Ontario, Canada.
Deleted:
<
<
 

-- NickBall - 18 Mar 2011

Changed:
<
<

>
>

 

Revision 11 - 2011-03-19 - NickBall

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: A user guide for Data Mining in Astronomy


Deleted:
<
<
 

Who's interested?

NickBall
KirkBorne
RaffaeleDAbrusco


Purpose of this Guide

Changed:
<
<
The purpose of this guide is to show how the techniques of data mining, a large well-known field with a wide array of applications, can be used within astronomy to improve the science that can be done with the available data. It is written with the typical astronomer in mind, i.e., one whose first priority is to get good science done.
>
>
The purpose of this guide is to show how the techniques of data mining, a large well-known field with a wide array of applications, can be used within astronomy to improve the science that is done with the available data. It is written with the typical astronomer in mind, i.e., one whose first priority is to get good science done.
 

Under construction

Changed:
<
<
Currently, the guide is under construction and will grow in usefulness with time. Section 1 is largely complete. Active work is ongoing on sections 2-5 and 8.
>
>
Currently, sections 1, 2, 3, 4, 7, and 8 are largely written; from here, work on them should hopefully be more about improvement than rewriting. The plan is for the guide to grow in usefulness with time.
 
Changed:
<
<
Contributions to the sections, especially 2, 6, 7, 8, and 9, are welcome.
>
>
Sections 6 and 9 are unwritten. Contributions to these are welcome. Material exists for section 5.
 

Table of contents

Changed:
<
<
>
>
 
Changed:
<
<
>
>
 

Authors

  • Nick Ball is an active participant in the Next Generation Virgo Cluster Survey at the Herzberg Institute of Astrophysics in Victoria, British Columbia, Canada. His research interests include the galaxy luminosity function, galaxy, AGN, and quasar properties versus environment, photometric redshifts and classification, and data mining.
Changed:
<
<
-- NickBall - 07 Jan 2011
>
>
  • Sabine McConnell (main author of sections 3 and 4) is an Assistant Professor in the Department of Computing and Information Systems at Trent University, Peterborough, Ontario, Canada.
 
Added:
>
>

 
Added:
>
>
-- NickBall - 18 Mar 2011
 
Added:
>
>
 

Revision 10 - 2011-01-07 - NickBall

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: A user guide for Data Mining in Astronomy




Who's interested?

NickBall
KirkBorne
RaffaeleDAbrusco


Purpose of this Guide

The purpose of this guide is to show how the techniques of data mining, a large well-known field with a wide array of applications, can be used within astronomy to improve the science that can be done with the available data. It is written with the typical astronomer in mind, i.e., one whose first priority is to get good science done.

Under construction

Changed:
<
<
Currently, the guide is under construction and will grow in usefulness with time. Section 1 is largely complete. Active work is ongoing on sections 2-5.
>
>
Currently, the guide is under construction and will grow in usefulness with time. Section 1 is largely complete. Active work is ongoing on sections 2-5 and 8.
  Contributions to the sections, especially 2, 6, 7, 8, and 9, are welcome.

Table of contents

Authors

  • Nick Ball is an active participant in the Next Generation Virgo Cluster Survey at the Herzberg Institute of Astrophysics in Victoria, British Columbia, Canada. His research interests include the galaxy luminosity function, galaxy, AGN, and quasar properties versus environment, photometric redshifts and classification, and data mining.

-- NickBall - 07 Jan 2011



Revision 9 - 2011-01-07 - NickBall

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: A user guide for Data Mining in Astronomy




Who's interested?

NickBall
KirkBorne
RaffaeleDAbrusco


Added:
>
>

Purpose of this Guide

The purpose of this guide is to show how the techniques of data mining, a large well-known field with a wide array of applications, can be used within astronomy to improve the science that can be done with the available data. It is written with the typical astronomer in mind, i.e., one whose first priority is to get good science done.

Under construction

Currently, the guide is under construction and will grow in usefulness with time. Section 1 is largely complete. Active work is ongoing on sections 2-5.

Contributions to the sections, especially 2, 6, 7, 8, and 9, are welcome.

 

Table of contents

Changed:
<
<
-- NickBall - 05 Sep 2010
>
>
Added:
>
>

Authors

  • Nick Ball is an active participant in the Next Generation Virgo Cluster Survey at the Herzberg Institute of Astrophysics in Victoria, British Columbia, Canada. His research interests include the galaxy luminosity function, galaxy, AGN, and quasar properties versus environment, photometric redshifts and classification, and data mining.

-- NickBall - 07 Jan 2011

 



Revision 6 - 2010-09-05 - NickBall

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: A user guide for Data Mining in Astronomy




Who's interested?

NickBall
RaffaeleDAbrusco


Changed:
<
<

Draft table of contents

>
>

Table of contents

 
Changed:
<
<
Each heading (1, 2, ...) is to link to its own page. Items in square brackets are examples of possible content to include.
>
>
Added:
>
>
 
Changed:
<
<
  • 1: What is 'data mining' and why is it important in astronomy?
>
>
-- NickBall - 05 Sep 2010
Deleted:
<
<
We begin by attempting to summarize what exactly is meant by 'data mining', and why it will be important to a significant fraction of the astronomical community.

The two most important points are:

    • It allows one to do better science with given data
    • Handling upcoming large astronomical datasets will be intractable without it

[Astroinformatics in context: the 'fourth paradigm', X-informatics and computational-X; bio-, geo-, etc.]

  • 2: Examples of improved science enabled by data mining techniques

We describe some examples where data mining techniques allowed improved science results.

[Published science results are best; improved object detection, object classification, and photometric redshifts also count indirectly, but these latter are not really convincing in their own right.]

  • 3: Overview of the data mining process

Here we elucidate common steps in the data mining process, from raw data to science result. While each particular case will be unique, and driven by the particular science question being addressed, many of the issues encountered are common to different pieces of work, and we describe those here.

    • Data collection
    • Data preprocessing
    • Attribute selection [Incl. dimension reduction]
    • Selection of algorithm
    • Improving results
    • Algorithm application and limitations

  • 4: The main data mining algorithms

One of the aims of the KDD-IG is to build up an inventory of data mining algorithms that are of use to astronomy. (The current list is here [link].) We don't attempt to duplicate that here, but instead provide descriptions of some of the most well-known data mining algorithms, many of which have been fairly extensively used in astronomy.

    • Artificial neural network
    • Decision tree
    • Genetic algorithms
    • k nearest neighbor
    • k-means clustering
    • Kernel density estimation
    • Kohonen self-organizing map
    • Independent component analysis
    • Mixture models and EM algorithm
    • Support vector machine
    • Bayesian Algorithms

  • 5: Which algorithm to use?

Unfortunately, there is no simple answer, because the characteristics of each algorithm make it more or less suitable for a given problem. While the data mining community has produced much literature, for example comparing different algorithms on a specific dataset or examining their theoretical properties in idealized (read: unrealistic) situations, much less is available to help make a practical choice with real data. We attempt to remedy that here by comparing and contrasting the characteristics of some commonly used algorithms.

[Comparison table: e.g., algorithm, advantages, disadvantages]

  • 6: Present and future directions

The combination of an abundance of available data mining algorithms, advancing technology, large amounts of new astronomical data continuously opening up new regions of parameter space, and the consequent large number of newly addressable science questions means that several interesting new directions for data mining in astronomy are emerging in the near term.

    • The time domain [Markov models, etc.]
    • Graphics Processing Units [CUDA, code types amenable to speedup]
    • Parallel/distributed data mining [Clock speed -> more cores, code has to be rewritten]
    • Visualization [High dimensionality]
    • The VO [Standardized data access]
    • Semantics [e.g. MG's Semantics and Data Mining IVOA talk]
    • Clouds

  • 7: Algorithms and techniques astronomers could benefit from but don't use

There are a large number of algorithms and approaches that are well-known to computer scientists and statisticians, but are little- or un-used in astronomy. There is much potential for novelty in collaboration between these three subject areas.

  • 8: Links: websites, books

There are of course a huge number of websites and books about data mining. This section aims to point to some of those that are most useful for astronomy.

[Elements of Statistical Learning, mine and Kirk's DM reviews, 4th paradigm book, ...]

  • 9: Worked example

Illustrate raw-data-to-science from an existing paper.

-- NickBall - 04 Aug 2010

 



Revision 5 - 2010-08-05 - NinanSajeethPhilip

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: A user guide for Data Mining in Astronomy




Who's interested?

NickBall
RaffaeleDAbrusco


Draft table of contents

Each heading (1, 2, ...) is to link to its own page. Items in square brackets are examples of possible content to include.

  • 1: What is 'data mining' and why is it important in astronomy?
Changed:
<
<
We begin by attempting to summarize what exactly is mean by 'data mining', and why it will be important to a significant fraction of the astronomical community.
>
>
We begin by attempting to summarize what exactly is meant by 'data mining', and why it will be important to a significant fraction of the astronomical community.
  The two most important points are:

    • It allows one to do better science with given data
    • Handling upcoming large astronomical datasets will be intractable without it

[Astroinformatics in context: the 'fourth paradigm', X-informatics and computational-X; bio-, geo-, etc.]

  • 2: Examples of improved science enabled by data mining techniques

We describe some examples where data mining techniques allowed improved science results.

[Published science results are best; improved object detection, object classification, and photometric redshifts also count indirectly, but these latter are not really convincing in their own right.]

  • 3: Overview of the data mining process

Here we elucidate common steps in the data mining process, from raw data to science result. While each particular case will be unique, and driven by the particular science question being addressed, many of the issues encountered are common to different pieces of work, and we describe those here.

    • Data collection
    • Data preprocessing
    • Attribute selection [Incl. dimension reduction]
    • Selection of algorithm
    • Improving results
    • Algorithm application and limitations
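As one possible illustration of what section 3 could walk through, the steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a recommended method: the catalogue values are invented, the function names (`drop_incomplete`, `fit_scaler`, `scale`, `knn_predict`, `run_pipeline`) are hypothetical, and k nearest neighbor is used only as an example algorithm. It shows data collection (a toy table), preprocessing (dropping incomplete rows, standardizing with statistics from the training set only), application of a chosen algorithm, and evaluation on held-out objects.

```python
import math
from collections import Counter

# Hypothetical toy catalogue (data collection): two attributes per object plus a
# class label; None marks a missing measurement. Values are invented for illustration.
RAW = [
    (0.2, 0.1, "star"), (0.3, 0.2, "star"), (0.25, None, "star"),
    (0.1, 0.3, "star"), (1.2, 1.1, "galaxy"), (1.0, 1.3, "galaxy"),
    (1.1, 0.9, "galaxy"), (1.3, 1.2, "galaxy"), (0.2, 0.25, "star"),
    (1.15, 1.05, "galaxy"),
]


def drop_incomplete(rows):
    """Data preprocessing: discard rows with any missing value."""
    return [r for r in rows if None not in r]


def fit_scaler(rows):
    """Per-attribute mean and standard deviation, computed on training rows only."""
    n_attr = len(rows[0]) - 1
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_attr)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)) or 1.0
            for j in range(n_attr)]
    return means, stds


def scale(rows, means, stds):
    """Standardize attributes so that no single attribute dominates the distance."""
    return [([(r[j] - m) / s for j, (m, s) in enumerate(zip(means, stds))], r[-1])
            for r in rows]


def knn_predict(train, x, k=3):
    """Example algorithm: majority vote among the k nearest training objects."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def run_pipeline(rows, k=3):
    """Raw data -> preprocessing -> algorithm -> accuracy on held-out objects."""
    clean = drop_incomplete(rows)
    # Deterministic split: every third object is held out for testing.
    train_rows = [r for i, r in enumerate(clean) if i % 3 != 2]
    test_rows = [r for i, r in enumerate(clean) if i % 3 == 2]
    means, stds = fit_scaler(train_rows)
    train = scale(train_rows, means, stds)
    test = scale(test_rows, means, stds)
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


print(run_pipeline(RAW))
```

Fitting the scaling on the training rows alone, and applying it unchanged to the test rows, is a deliberate choice: it avoids leaking information from the test set into the preprocessing step.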

  • 4: The main data mining algorithms

One of the aims of the KDD-IG is to build up an inventory of data mining algorithms that are of use to astronomy. (The current list is here [link].) We don't attempt to duplicate that here, but instead provide descriptions of some of the most well-known data mining algorithms, many of which have been fairly extensively used in astronomy.

    • Artificial neural network
    • Decision tree
    • Genetic algorithms
    • k nearest neighbor
    • k-means clustering
    • Kernel density estimation
    • Kohonen self-organizing map
    • Independent component analysis
    • Mixture models and EM algorithm
    • Support vector machine
Added:
>
>
    • Bayesian Algorithms
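As an example of the kind of description section 4 could give for one of the listed algorithms, here is a minimal sketch of k-means clustering using Lloyd's algorithm (alternating assignment and centroid-update steps). It assumes small lists of 2-D points, and it uses a naive initialisation (the first k points) purely for simplicity; practical implementations usually use random restarts or k-means++ seeding. The function name `kmeans` and the example points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """k-means via Lloyd's algorithm: alternate assignment and update steps."""
    # Naive initialisation (an assumption for this sketch): the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, labels


# Two well-separated hypothetical groups of 2-D points.
pts = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
centres, assignment = kmeans(pts, k=2)
print(assignment)  # [0, 1, 0, 1, 0, 1]
```

Because k-means only finds a local optimum of the within-cluster scatter, the result can depend on the initial centroids; this is one of the practical characteristics that a comparison of algorithms would need to mention.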
 
  • 5: Which algorithm to use?

Unfortunately, there is no simple answer, because the characteristics of each algorithm make it more or less suitable for a given problem. While the data mining community has produced much literature, for example comparing different algorithms on a specific dataset or examining their theoretical properties in idealized (read: unrealistic) situations, much less is available to help make a practical choice with real data. We attempt to remedy that here by comparing and contrasting the characteristics of some commonly used algorithms.

[Comparison table: e.g., algorithm, advantages, disadvantages]
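One common practical way to compare algorithms on a particular dataset is k-fold cross-validation: every object is used for testing exactly once, and the fold-averaged scores of the candidate algorithms are compared. The sketch below assumes a tiny invented labelled dataset and two deliberately simple classifiers (nearest class centroid and single nearest neighbour); all names here (`DATA`, `nearest_centroid`, `one_nearest_neighbour`, `cross_val_accuracy`) are hypothetical and chosen only to show the procedure.

```python
import math
from statistics import mean

# Hypothetical labelled toy data: (attributes, class). Values invented for illustration.
DATA = [
    ((0.2, 0.1), "star"), ((1.2, 1.1), "galaxy"), ((0.3, 0.2), "star"),
    ((1.0, 1.3), "galaxy"), ((0.1, 0.3), "star"), ((1.1, 0.9), "galaxy"),
    ((0.2, 0.25), "star"), ((1.3, 1.2), "galaxy"), ((0.25, 0.15), "star"),
]


def nearest_centroid(train, test_x):
    """Classify by distance to each class's mean attribute vector."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(c) / len(xs) for c in zip(*xs)] for y, xs in groups.items()}
    return [min(centroids, key=lambda y: math.dist(x, centroids[y])) for x in test_x]


def one_nearest_neighbour(train, test_x):
    """Classify by the label of the single closest training object."""
    return [min(train, key=lambda t: math.dist(x, t[0]))[1] for x in test_x]


def cross_val_accuracy(data, classifier, n_folds=3):
    """Mean accuracy over n_folds train/test splits; each object is tested once."""
    scores = []
    for f in range(n_folds):
        test = [d for i, d in enumerate(data) if i % n_folds == f]
        train = [d for i, d in enumerate(data) if i % n_folds != f]
        predictions = classifier(train, [x for x, _ in test])
        hits = sum(p == y for p, (_, y) in zip(predictions, test))
        scores.append(hits / len(test))
    return mean(scores)


for clf in (nearest_centroid, one_nearest_neighbour):
    print(clf.__name__, cross_val_accuracy(DATA, clf))
```

On this deliberately easy toy set both classifiers score perfectly; with real data it is the differences between the fold-averaged scores, and their spread across folds, that inform the choice. Assigning folds by index modulo assumes the rows are not sorted by class; shuffling the rows first is the usual safeguard.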

  • 6: Present and future directions

The combination of an abundance of available data mining algorithms, advancing technology, large amounts of new astronomical data continuously opening up new regions of parameter space, and the consequent large number of newly addressable science questions means that several interesting new directions for data mining in astronomy are emerging in the near term.

    • The time domain [Markov models, etc.]
    • Graphics Processing Units [CUDA, code types amenable to speedup]
    • Parallel/distributed data mining [Clock speed -> more cores, code has to be rewritten]
    • Visualization [High dimensionality]
    • The VO [Standardized data access]
    • Semantics [e.g. MG's Semantics and Data Mining IVOA talk]
    • Clouds

  • 7: Algorithms and techniques astronomers could benefit from but don't use

There are a large number of algorithms and approaches that are well-known to computer scientists and statisticians, but are little- or un-used in astronomy. There is much potential for novelty in collaboration between these three subject areas.

  • 8: Links: websites, books

There are of course a huge number of websites and books about data mining. This section aims to point to some of those that are most useful for astronomy.

[Elements of Statistical Learning, mine and Kirk's DM reviews, 4th paradigm book, ...]

  • 9: Worked example

Illustrate raw-data-to-science from an existing paper.

-- NickBall - 04 Aug 2010



Revision 4 - 2010-08-05 - RaffaeleDAbrusco

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: A user guide for Data Mining in Astronomy




Who's interested?

Changed:
<
<
NickBall
>
>
NickBall
RaffaeleDAbrusco
 

Draft table of contents

Each heading (1, 2, ...) is to link to its own page. Items in square brackets are examples of possible content to include.

  • 1: What is 'data mining' and why is it important in astronomy?

We begin by attempting to summarize what exactly is meant by 'data mining', and why it will be important to a significant fraction of the astronomical community.

The two most important points are:

    • It allows one to do better science with given data
    • Handling upcoming large astronomical datasets will be intractable without it

[Astroinformatics in context: the 'fourth paradigm', X-informatics and computational-X; bio-, geo-, etc.]

  • 2: Examples of improved science enabled by data mining techniques

We describe some examples where data mining techniques allowed improved science results.

[Published science results are best; improved object detection, object classification, and photometric redshifts also count indirectly, but these latter are not really convincing in their own right.]

  • 3: Overview of the data mining process

Here we elucidate common steps in the data mining process, from raw data to science result. While each particular case will be unique, and driven by the particular science question being addressed, many of the issues encountered are common to different pieces of work, and we describe those here.

    • Data collection
    • Data preprocessing
    • Attribute selection [Incl. dimension reduction]
    • Selection of algorithm
    • Improving results
    • Algorithm application and limitations

  • 4: The main data mining algorithms

One of the aims of the KDD-IG is to build up an inventory of data mining algorithms that are of use to astronomy. (The current list is here [link].) We don't attempt to duplicate that here, but instead provide descriptions of some of the most well-known data mining algorithms, many of which have been fairly extensively used in astronomy.

    • Artificial neural network
    • Decision tree
    • Genetic algorithms
    • k nearest neighbor
    • k-means clustering
    • Kernel density estimation
    • Kohonen self-organizing map
    • Independent component analysis
    • Mixture models and EM algorithm
    • Support vector machine

  • 5: Which algorithm to use?

Unfortunately, there is no simple answer, because the characteristics of each algorithm make it more or less suitable for a given problem. While the data mining community has produced much literature, for example comparing different algorithms on a specific dataset or examining their theoretical properties in idealized (read: unrealistic) situations, much less is available to help make a practical choice with real data. We attempt to remedy that here by comparing and contrasting the characteristics of some commonly used algorithms.

[Comparison table: e.g., algorithm, advantages, disadvantages]

  • 6: Present and future directions

The combination of an abundance of available data mining algorithms, advancing technology, large amounts of new astronomical data continuously opening up new regions of parameter space, and the consequent large number of newly addressable science questions means that several interesting new directions for data mining in astronomy are emerging in the near term.

    • The time domain [Markov models, etc.]
    • Graphics Processing Units [CUDA, code types amenable to speedup]
    • Parallel/distributed data mining [Clock speed -> more cores, code has to be rewritten]
    • Visualization [High dimensionality]
    • The VO [Standardized data access]
    • Semantics [e.g. MG's Semantics and Data Mining IVOA talk]
    • Clouds

  • 7: Algorithms and techniques astronomers could benefit from but don't use

There are a large number of algorithms and approaches that are well-known to computer scientists and statisticians, but are little- or un-used in astronomy. There is much potential for novelty in collaboration between these three subject areas.

  • 8: Links: websites, books

There are of course a huge number of websites and books about data mining. This section aims to point to some of those that are most useful for astronomy.

[Elements of Statistical Learning, mine and Kirk's DM reviews, 4th paradigm book, ...]

  • 9: Worked example

Illustrate raw-data-to-science from an existing paper.

-- NickBall - 04 Aug 2010



Revision 3 - 2010-08-05 - NickBall

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: A user guide for Data Mining in Astronomy




Who's interested?

Added:
>
>
NickBall
 
Added:
>
>

 
Added:
>
>

Draft table of contents

 
Added:
>
>
Each heading (1, 2, ...) is to link to its own page. Items in square brackets are examples of possible content to include.
 
Added:
>
>
  • 1: What is 'data mining' and why is it important in astronomy?

We begin by attempting to summarize what exactly is meant by 'data mining', and why it will be important to a significant fraction of the astronomical community.

The two most important points are:

    • It allows one to do better science with given data
    • Handling upcoming large astronomical datasets will be intractable without it

[Astroinformatics in context: the 'fourth paradigm', X-informatics and computational-X; bio-, geo-, etc.]

  • 2: Examples of improved science enabled by data mining techniques

We describe some examples where data mining techniques allowed improved science results.

[Published science results are best; improved object detection, object classification, and photometric redshifts also count indirectly, but these latter are not really convincing in their own right.]

  • 3: Overview of the data mining process

Here we elucidate common steps in the data mining process, from raw data to science result. While each particular case will be unique, and driven by the particular science question being addressed, many of the issues encountered are common to different pieces of work, and we describe those here.

    • Data collection
    • Data preprocessing
    • Attribute selection [Incl. dimension reduction]
    • Selection of algorithm
    • Improving results
    • Algorithm application and limitations

  • 4: The main data mining algorithms

One of the aims of the KDD-IG is to build up an inventory of data mining algorithms that are of use to astronomy. (The current list is here [link].) We don't attempt to duplicate that here, but instead provide descriptions of some of the most well-known data mining algorithms, many of which have been fairly extensively used in astronomy.

    • Artificial neural network
    • Decision tree
    • Genetic algorithms
    • k nearest neighbor
    • k-means clustering
    • Kernel density estimation
    • Kohonen self-organizing map
    • Independent component analysis
    • Mixture models and EM algorithm
    • Support vector machine

  • 5: Which algorithm to use?

Unfortunately, there is no simple answer, because the characteristics of each algorithm make it more or less suitable for a given problem. While the data mining community has produced much literature, for example comparing different algorithms on a specific dataset or examining their theoretical properties in idealized (read: unrealistic) situations, much less is available to help make a practical choice with real data. We attempt to remedy that here by comparing and contrasting the characteristics of some commonly used algorithms.

[Comparison table: e.g., algorithm, advantages, disadvantages]

  • 6: Present and future directions

The combination of an abundance of available data mining algorithms, advancing technology, large amounts of new astronomical data continuously opening up new regions of parameter space, and the consequent large number of newly addressable science questions means that several interesting new directions for data mining in astronomy are emerging in the near term.

    • The time domain [Markov models, etc.]
    • Graphics Processing Units [CUDA, code types amenable to speedup]
    • Parallel/distributed data mining [Clock speed -> more cores, code has to be rewritten]
    • Visualization [High dimensionality]
    • The VO [Standardized data access]
    • Semantics [e.g. MG's Semantics and Data Mining IVOA talk]
    • Clouds

  • 7: Algorithms and techniques astronomers could benefit from but don't use

There are a large number of algorithms and approaches that are well-known to computer scientists and statisticians, but are little- or un-used in astronomy. There is much potential for novelty in collaboration between these three subject areas.

  • 8: Links: websites, books

There are of course a huge number of websites and books about data mining. This section aims to point to some of those that are most useful for astronomy.

[Elements of Statistical Learning, mine and Kirk's DM reviews, 4th paradigm book, ...]

  • 9: Worked example

Illustrate raw-data-to-science from an existing paper.

-- NickBall - 04 Aug 2010

 



Revision 2 - 2010-07-16 - RaffaeleDAbrusco

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: A user guide for Data Mining in Astronomy


Added:
>
>


Who's interested?

 



Revision 1 - 2010-07-16 - RaffaeleDAbrusco

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: A user guide for Data Mining in Astronomy



 
This site is powered by the TWiki collaboration platform. Copyright © 2008-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.