<H1>IVOA KDD-IG: A user guide for Data Mining in Astronomy</H1>
<br/>
---
%TOC%
---

---++ Who's interested?

NickBall<br>
RaffaeleDAbrusco

---

---++ Draft table of contents

Each heading (1, 2, ...) is to link to its own page. Items in square brackets are examples of possible content to include.

   * *1: What is 'data mining' and why is it important in astronomy?* We begin by attempting to summarize what exactly is meant by 'data mining', and why it will be important to a significant fraction of the astronomical community. The two most important points are:
      * It allows one to do better science with given data
      * Handling upcoming large astronomical datasets will be intractable without it [Astroinformatics in context: The 'fourth paradigm', X-informatics and computational-X. Bio- geo- etc.]
   * *2: Examples of improved science enabled by data mining techniques* We describe some examples where data mining techniques allowed improved science results. [Published science results best, indirectly also improved object detection, object classification, photometric redshifts. But these latter are not really convincing in their own right.]
   * *3: Overview of the data mining process* Here we elucidate common steps in the data mining process, from raw data to science result. While each particular case will be unique, and driven by the particular science question being addressed, many of the issues encountered are common to different pieces of work, and we describe those here.
      * Data collection
      * Data preprocessing
      * Attribute selection [Incl. dimension reduction]
      * Selection of algorithm
      * Improving results
      * Algorithm application and limitations
   * *4: The main data mining algorithms* One of the aims of the KDD-IG is to build up an inventory of data mining algorithms that are of use to astronomy. (The current list is here [link].) We don't attempt to duplicate that here, but instead provide descriptions of some of the most well-known data mining algorithms, many of which have been fairly extensively used in astronomy.
      * Artificial neural network
      * Decision tree
      * Genetic algorithms
      * k-nearest neighbor
      * k-means clustering
      * Kernel density estimation
      * Kohonen self-organizing map
      * Independent component analysis
      * Mixture models and EM algorithm
      * Support vector machine
      * Bayesian algorithms
   * *5: Which algorithm to use?* Unfortunately, there is no simple answer, because the differing characteristics of different algorithms render them more or less suitable for different problems. Within the data mining community there is much literature that, for example, compares different algorithms on a specific dataset, or looks at their theoretical properties in idealized (read: unrealistic) situations, but there is much less available to help make a practical choice with real data. We attempt to remedy that here by comparing and contrasting the characteristics of some commonly used algorithms. [Comparison table: e.g., algorithm, advantages, disadvantages]
   * *6: Present and future directions* The combination of an abundance of available data mining algorithms, advancing technology, large amounts of new astronomical data continuously opening up new regions of parameter space, and the consequent large number of newly addressable science questions, means that several interesting new directions for data mining in astronomy are opening up in the near-term future.
      * The time domain [Markov models, etc.]
      * Graphics Processing Units [CUDA, code types amenable to speedup]
      * Parallel/distributed data mining [Clock speed -> more cores, code has to be rewritten]
      * Visualization [High dimensionality]
      * The VO [Standardized data access]
      * Semantics [e.g. MG's Semantics and Data Mining IVOA talk]
      * Clouds
   * *7: Algorithms and techniques astronomers could benefit from but don't use* There are a large number of algorithms and approaches that are well-known to computer scientists and statisticians, but are little- or un-used in astronomy. There is much potential for novelty in collaboration between these three subject areas.
   * *8: Links: websites, books* There are of course a huge number of websites and books about data mining. This section aims to point to some of those that are most useful for astronomy. [Elements of Statistical Learning, mine and Kirk's DM reviews, 4th paradigm book, ...]
   * *9: Worked example* Illustrate raw-data-to-science from an existing paper.

-- IVOA.NickBall - 04 Aug 2010
<br/>
<!--
   * Set ALLOWTOPICRENAME = %MAINWEB%.TWikiAdminGroup
-->
Topic revision: r5 - 2010-08-05 - NinanSajeethPhilip
Copyright © 2008-2025 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.