IVOA KDD-IG: A user guide for Data Mining in Astronomy

7: Algorithms and techniques astronomers could benefit from but seldom use

There are a large number of algorithms and approaches that are well-known to workers in other fields, but are seldom used in astronomy. There is much potential for novelty in collaboration with these subject areas. In particular, collaboration with statisticians has created astrostatistics, a field closely related to astroinformatics.

Subject areas include:

  • Applied mathematics
  • Artificial intelligence
  • Computer science
  • Signal processing
  • Statistics

We now describe some of the techniques that hold potential; short Python sketches illustrating several of them follow the list.

  • Bootstrap aggregating (bagging): Random subsamples with replacement (bootstrap samples) are taken, and the algorithm, for example, a decision tree, is trained on each. The resulting models then vote on the testing set. Bagging often works well on real-world data. When bagging is combined with the random selection of a small subset of features for splitting at each node of a decision tree, the procedure is known as a Random Forest (sketch below).

  • Mixture of experts: This is a general approach that often works well on real-world data. The outputs from several instances of an algorithm (also called a committee), or from several different algorithms, are combined, using some method such as voting, or the output of one algorithm forming the input of another. The final performance is often better than that of any of the individual algorithms (sketch below).

  • Boosting: This exploits the fact that several weak instances of an algorithm can be combined to produce a stronger instance, often better than any single instance. Boosted decision trees have gained popularity in recent years in several fields. The weak learners are added iteratively, and misclassified objects receive a higher weight; thus, differences between classes are emphasized, i.e., boosted. Boosting is different from bagging or mixture of experts, because the data weights themselves are altered (sketch below).

  • Simulated annealing: This is a modification to the search for a minimum in a parameter space, in which the current point is randomly perturbed and worse solutions are occasionally accepted, with a probability that decreases as the search 'cools'. This helps the search avoid becoming trapped in a local minimum, making it more likely to find the global minimum (sketch below).

  • Semi-supervised learning: This combines the advantages of supervised and unsupervised learning: existing classes are incorporated as training data, but new classes can be discovered in an automated way from the data. It could be used for, e.g., large surveys in which spectra are available for a subset of objects, with the results applied to the much larger photometric sample (sketch below).

  • Radial basis function networks: Similar to the artificial neural networks (ANNs) commonly used in astronomy, these fit smooth functions to a training set and can thus interpolate, which helps to overcome the problem of high-dimensional data being sparse (the 'curse of dimensionality'). They use a hyperelliptical kernel, and are a route to partial unsupervised learning (sketch below).

  • Adaptive resonance theory: Patterns are fed sequentially to an ANN, and new examples can add to existing clusters, or form new ones.

  • Fuzzy c-means clustering: An extension of k-means clustering that assigns each object a degree (or probability) of membership in every cluster, removing the requirement that clusters be completely separated. This is useful because different classes of astronomical object often overlap in the parameter space of observations (sketch below).

  • Locally linear embedding: Part of manifold learning, this is another method of nonlinear dimension reduction that can project high-dimensional data onto something more manageable (sketch below).

  • Probabilistic principal surfaces: A generalization of PCA that performs unsupervised clustering using dimension reduction.

  • Negative entropy clustering: Clustering performed hierarchically based on the information difference between a Gaussian and a generic multivariate distribution in a space of the same number of dimensions.

  • Wavelets: A wavelet is a wave that starts and ends at amplitude zero. A given shape of wavelet can be convolved with data to extract signals of the same shape from the data. These are already used in astronomy, for example in analyzing the cosmic microwave background, but there remains much potential to be exploited, including other shapes, such as needlets, curvelets, platelets, and so on, and other generalizations, such as to the low-counts regime (sketch below).

  • Hidden Markov models: Part of the broader class of Bayesian networks, which are themselves graphical models, these have been used in astronomy for real-time classification of time-domain data in cases where the existing base of knowledge that would form the training set for a more traditional supervised method is sparse (e.g., few objects, or missing data), so prior knowledge is useful. Markov logic networks allow domain knowledge to be expressed as a quantified value within the network (sketch below).

  • Online learning: A machine learning algorithm learns from a training set as it is streamed, rather than all at once. This is useful for training sets that are too large to be loaded into machine memory (sketch below).

  • Diffusion maps: Part of spectral connectivity analysis, these provide a nonlinear method for transforming observed data onto a simpler, more natural coordinate system. Results can be superior to those from principal component analysis, and the method is robust to outliers (sketch below).

  • False discovery rate: This is a method for establishing significant discoveries from smaller samples than the usual N-sigma hypothesis test requires. It allows one to control, a priori, the expected proportion of false rejections among all rejections when testing a model against a null hypothesis. It also works with correlated data (sketch below).

  • Anderson-Darling test: A statistical test for departures of a sample from a normal distribution that is more sensitive than the Kolmogorov-Smirnov test widely used in astronomy (sketch below).

  • Mark correlation function: This generalizes the usual 2- or N-point correlation function, so that the properties (or marks) of an object are taken into account. An example is galaxies that are clustered in space, where the clustering is a function of several properties. Related to this, the mark connection function gives the relative frequencies of mark pairs as a function of pair separation. Both of these are part of the statistics of marked point processes.

  • Multiscale methods: These are capable of adapting to spatially varying smoothness in images, enabling deconvolution with error estimates.
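
The following short Python sketches illustrate several of the techniques above. They are minimal illustrations only: the library choices (scikit-learn, SciPy, PyWavelets, hmmlearn), the synthetic stand-in data, and all parameter values are assumptions made for demonstration, not recommendations.

Bagging and Random Forests, sketched with scikit-learn's ensemble classes:

# Bagging: many trees, each trained on a bootstrap sample, vote on new data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for, e.g., photometric features and object classes.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        random_state=0).fit(X_train, y_train)

# Random Forest: bagging plus a random feature subset at each tree split.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print(bag.score(X_test, y_test), rf.score(X_test, y_test))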
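
Mixture of experts, in its simplest committee form: a hard-voting ensemble of three different algorithms. (A full mixture of experts adds a learned gating network that weights the experts per object; this sketch shows only the voting case.)

# A committee: three different algorithms vote on each object.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=1)
committee = VotingClassifier([('lr', LogisticRegression(max_iter=1000)),
                              ('knn', KNeighborsClassifier()),
                              ('tree', DecisionTreeClassifier())],
                             voting='hard')  # simple majority vote
committee.fit(X, y)
print(committee.score(X, y))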
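
Boosting, sketched with AdaBoost on decision stumps; each new weak learner concentrates on the objects its predecessors misclassified:

# AdaBoost: weak trees are added iteratively; misclassified objects
# are up-weighted so that later trees focus on them.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
stump = DecisionTreeClassifier(max_depth=1)  # a very weak learner
boost = AdaBoostClassifier(stump, n_estimators=200, random_state=0).fit(X, y)
print(boost.score(X, y))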
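
Simulated annealing, hand-rolled for a one-dimensional objective with many local minima (scipy.optimize.dual_annealing offers a production-quality version):

import math
import random

def f(x):  # objective with many local minima
    return x**2 + 10 * math.sin(3 * x)

random.seed(0)
x = random.uniform(-5, 5)
T = 5.0  # initial "temperature"
while T > 1e-3:
    x_new = x + random.gauss(0, 0.5)  # perturb the current point
    dE = f(x_new) - f(x)
    # Downhill moves are always accepted; uphill moves are accepted with
    # probability exp(-dE/T), which lets the search escape local minima.
    if dE < 0 or random.random() < math.exp(-dE / T):
        x = x_new
    T *= 0.999  # cool slowly
print(x, f(x))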
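
Semi-supervised learning, sketched with scikit-learn's LabelSpreading; labels are hidden for most objects, mimicking a survey where only a small subset has spectra:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
rng = np.random.default_rng(0)
y_partial = y.copy()
unlabeled = rng.random(len(y)) < 0.9  # hide 90% of the labels
y_partial[unlabeled] = -1             # -1 marks "no label"
model = LabelSpreading().fit(X, y_partial)
# Accuracy of the labels inferred for the unlabeled objects:
print((model.transduction_[unlabeled] == y[unlabeled]).mean())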
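
A radial basis function network, hand-rolled in NumPy: Gaussian bumps centred on k-means centres with a least-squares linear readout (a spherical kernel is used here for simplicity; full RBF networks may use hyperelliptical kernels):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(2 * x[:, 0]) + 0.1 * rng.normal(size=200)  # noisy smooth function

centres = KMeans(n_clusters=10, n_init=10, random_state=0).fit(x).cluster_centers_
width = 0.5

def features(pts):
    # One Gaussian basis function per centre.
    d = pts[:, None, :] - centres[None, :, :]
    return np.exp(-np.sum(d**2, axis=2) / (2 * width**2))

w, *_ = np.linalg.lstsq(features(x), y, rcond=None)  # linear readout
x_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print(features(x_test) @ w)  # smooth interpolated predictions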
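
Fuzzy c-means, hand-rolled: cluster memberships and centres are updated alternately, and objects in overlap regions end up with split memberships:

import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),   # two overlapping clusters
               rng.normal(2, 1, (100, 2))])
c, m = 2, 2.0  # number of clusters, fuzziness exponent
U = rng.random((len(X), c))
U /= U.sum(axis=1, keepdims=True)  # random initial memberships

for _ in range(100):
    Um = U ** m
    centres = Um.T @ X / Um.sum(axis=0)[:, None]   # membership-weighted centroids
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
    U = d ** (-2 / (m - 1))                        # closer => higher membership
    U /= U.sum(axis=1, keepdims=True)

print(U[:3])  # per-object membership in each cluster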
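
Locally linear embedding, via scikit-learn, unrolling a three-dimensional "swiss roll" manifold into two dimensions:

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)  # 3-D manifold data
emb = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
X2 = emb.fit_transform(X)  # 2-D coordinates preserving local geometry
print(X2.shape)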
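
Wavelet denoising of a one-dimensional signal, sketched with the PyWavelets package:

import numpy as np
import pywt

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024)
signal = np.sin(8 * np.pi * t) + 0.5 * rng.normal(size=t.size)  # noisy sine

coeffs = pywt.wavedec(signal, 'db4', level=5)  # multilevel decomposition
# Hard-threshold the detail coefficients; keep the approximation intact.
coeffs = [coeffs[0]] + [pywt.threshold(c, 0.5, mode='hard') for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, 'db4')  # reconstruct the cleaned signal
print(denoised.shape)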
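
A hidden Markov model, sketched with the hmmlearn package on a toy light curve that switches between a quiescent and a burst state:

import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
quiet = rng.normal(0.0, 0.2, size=(200, 1))  # quiescent flux
burst = rng.normal(2.0, 0.4, size=(50, 1))   # brighter burst state
obs = np.vstack([quiet, burst, quiet])

model = GaussianHMM(n_components=2, n_iter=50, random_state=0).fit(obs)
states = model.predict(obs)  # most likely hidden state at each epoch
print(states[190:210])       # state transition around index 200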
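
Online learning, sketched with scikit-learn's SGDClassifier: the training set is streamed in batches via partial_fit, so it never has to fit in memory all at once:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=10000, n_features=20, random_state=0)
model = SGDClassifier(random_state=0)
classes = np.unique(y)  # must be declared on the first partial_fit call
for start in range(0, len(X), 1000):  # stream in chunks of 1000 objects
    batch = slice(start, start + 1000)
    model.partial_fit(X[batch], y[batch], classes=classes)
print(model.score(X, y))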
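
A bare-bones diffusion map in NumPy: Gaussian affinities are normalized into a Markov transition matrix, and the leading nontrivial eigenvectors supply the new coordinates:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))  # placeholder data
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # pairwise distances^2
eps = np.median(d2)            # simple kernel-bandwidth heuristic
K = np.exp(-d2 / eps)          # Gaussian affinities
P = K / K.sum(axis=1, keepdims=True)  # row-stochastic transition matrix

vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
# Skip the trivial constant eigenvector; keep the next two as coordinates.
coords = vecs[:, order[1:3]].real * vals[order[1:3]].real
print(coords.shape)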
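
The Benjamini-Hochberg procedure for controlling the false discovery rate, hand-rolled: find the largest rank k with p_(k) <= (k/N) * alpha and reject every hypothesis up to it:

import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    p = np.asarray(pvals)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, len(p) + 1) / len(p)
    below = p[order] <= thresholds
    reject = np.zeros(len(p), dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest passing rank
        reject[order[:k + 1]] = True      # reject all smaller p-values too
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.07, 0.2, 0.5, 0.9]
print(benjamini_hochberg(pvals))  # only the first two survive at alpha=0.05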
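
The Anderson-Darling normality test, via SciPy; the statistic is compared against tabulated critical values at several significance levels:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(size=500)  # stand-in for, e.g., residuals of a fit
result = stats.anderson(sample, dist='norm')
print(result.statistic)           # measures departure from normality
print(result.critical_values)     # reject normality if statistic exceeds
print(result.significance_level)  # ... the value at a given level (%)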


-- NickBall - 19 Mar 2011
-- NickBall - 23 Sep 2011

