Unfortunately, there is no simple answer, because the differing characteristics of different algorithms render them more or less suitable for different problems. While within in the data mining community there is much literature, for example, comparing different algorithms on a specific dataset, or looking at their theoretical properties in idealized (read: unrealistic) situations, there is much less available to help make a practical choice with real data. We attempt to remedy that here by comparing and contrasting the characteristics of some commonly used algorithms.
Given the number of factors influencing the decision, we do not recommend a particular algorithm. However, if it is truly unknown where to start, the decision tree is a good bet, because, although it may not provide the best results, it is generally robust to issues encountered in real-world data.
An interesting point to note is that, aside from their relative popularity, none of the attributes of the algorithms mentioned are specific to astronomy, or, indeed, any given problem. This emphasizes their generic utility for a variety of analyses, and hence, science goals.
Material for this table is based on section 4 of this guide, Table 1 of Ball & Brunner (2010), and Table 10.1 of Hastie et al. (2001).
Algorithm |
Advantages |
Disadvantages |
|
Training can be slow |
|
No missing values |
|
Can be slow to converge |
|
Requires numerical inputs if using Euclidean distance |
|
Not guaranteed to find local minimum |
|
Basic algorithm requires no overlap between classes |
|
Requires normalized data |
|
Susceptible to outliers |
|
Often requires a large number of iterations through the data |
|
Sensitive to outliers if maximum likelihood is used |
Genetic algorithm |
Able to avoid getting trapped in local minima |
Not guaranteed to find the local minimum |
|
Can approximate nonlinear functions |
Poor at handling missing or irrelevant attributes |
|
Can be extended to the nonlinear case and noisy data |
Requires whitening of the data as preprocessing step |
|
Can give class labels for semi-supervised learning |
Can be slow to converge |
|
Can input and output numerical or categorical variables |
Generally poorer predictive power than ANN, SVM or kNN |
|
Conceptually simple, and easy to code |
|
constrained k-means, guided by training data |
Affected by outliers |
|
Copes with missing data |
Doesn't work well with a large number of components |
Support Vector Machine |
Copes with noise |
Harder to classify > 2 classes |
|
Does not require training |
No model is created |
|
Easily parallelized |
Susceptible to local minima |
|
Easily parallelized |
Non-trivial choice of fitness function and solution representation |
|
Easily parallelized |
Susceptible to noise and irrelevant attributes |
|
Extensively used in astronomy |
Sensitive to noise |
|
Few or no adjustable parameters |
Performs poorly with high-dimensional data |
|
Gives expected error rate |
No model is created |
Mixture Models & Expectation Maximization |
Gives number of clusters in the data |
Can be biased toward Gaussians |
|
Gives unique solution (no local minima) |
Some adjustable parameters |
Artificial Neural Network |
Good approximation of nonlinear functions |
Black-box model |
|
Good at handling missing values and outliers |
|
Good computational scalability |
Building the tree can be slow (data sorting) |
|
Good predictive power |
Non-trivial architecture: many adjustable parameters |
|
Good predictive power |
Long training time |
|
Good predictive power |
|
Good scalability to high-dimensional data |
Can overfit |
|
Interpretable model, easily converted to rules |
Can overfit |
|
Many extensions, e.g., |
Biased towards finding spherical clusters |
Decompositions (SVD, PCA, ICA) |
PCA is extensively used in astronomy |
Applicable to numerical data only |
|
Popular algorithm in astronomy |
Poor interpretability |
Decision Tree |
Popular real-world data mining algorithm |
Can generate large trees that require pruning |
|
probabilistic cluster membership |
Has difficulties with different densities in the clusters |
|
Requires numerical data, not categorical |
|
Robust to irrelevant or redundant attributes |
Can overfit |
|
Robust to outliers, noisy or redundant attributes, missing values |
Many adjustable parameters |
|
Suitable for clusters of differerent density, size and shape |
Local minima |
k-Nearest Neighbor |
Uses all available information |
Computationally intensive (but can be mitigated, e.g., k-d tree) |
|
Uses numerical data |
Kohonen Self-Organizing maps |
Well-known, popular algorithm |
Susceptible to noise |
k-means clustering |
Well-known, simple, popular algorithm |
Susceptible to noise |