| Algorithm | Advantages | Disadvantages |
| --- | --- | --- |
| Artificial Neural Network | Good approximation of nonlinear functions | Black-box model |
| | Easily parallelized | Susceptible to local minima |
| | Good predictive power | Non-trivial architecture: many adjustable parameters |
| | Extensively used in astronomy | Sensitive to noise |
| | Robust to irrelevant or redundant attributes | Can overfit |
| | | Training can be slow |
| | | Cannot handle missing values |
| Decision Tree | Popular real-world data mining algorithm | Can generate large trees that require pruning |
| | Can input and output numerical or categorical variables | Generally poorer predictive power than ANN, SVM, or kNN |
| | Interpretable model, easily converted to rules | Can overfit |
| | Robust to outliers, noisy or redundant attributes, and missing values | Many adjustable parameters |
| | Good computational scalability | Building the tree can be slow (requires data sorting) |
| Genetic Algorithm | Able to avoid getting trapped in local minima | Not guaranteed to find the global minimum |
| | Easily parallelized | Non-trivial choice of fitness function and solution representation |
| | | Can be slow to converge |
| Support Vector Machine | Copes with noise | Harder to classify more than two classes |
| | Gives expected error rate | No model is created |
| | Good predictive power | Long training time |
| | Popular algorithm in astronomy | Poor interpretability |
| | Can approximate nonlinear functions | Poor at handling missing or irrelevant attributes |
| | Good scalability to high-dimensional data | Can overfit |
| | Gives a unique solution (no local minima) | Some adjustable parameters |
| k-Nearest Neighbor | Uses all available information | Computationally intensive (can be mitigated, e.g., with a k-d tree; see the k-d tree sketch following the table) |
| | Does not require training | No model is created |
| | Easily parallelized | Susceptible to noise and irrelevant attributes |
| | Few or no adjustable parameters | Performs poorly with high-dimensional data |
| | Good predictive power | |
| | Uses numerical data | |
| | Conceptually simple and easy to code | |
| | Good at handling missing values and outliers | |
| k-means clustering | Well-known, simple, popular algorithm | Susceptible to noise |
| | Many extensions, e.g., probabilistic cluster membership and constrained k-means guided by training data | Biased towards finding spherical clusters |
| | | Has difficulties with clusters of different densities |
| | | Affected by outliers |
| | | Requires numerical inputs if using Euclidean distance |
| | | Not guaranteed to find the global minimum |
| | | Basic algorithm requires no overlap between classes |
| Mixture Models & Expectation Maximization | Gives the number of clusters in the data | Can be biased toward Gaussians |
| | Suitable for clusters of different density, size, and shape (see the mixture-model sketch following the table) | Can get stuck in local minima |
| | Copes with missing data | Does not work well with a large number of components |
| | Can give class labels for semi-supervised learning | Can be slow to converge |
| | | Requires numerical data, not categorical |
| Kohonen Self-Organizing Maps | Well-known, popular algorithm | Susceptible to noise |
| | | Requires normalized data |
| | | Susceptible to outliers |
| | | Often requires a large number of iterations through the data |
| Decompositions (SVD, PCA, ICA) | PCA is extensively used in astronomy | Applicable to numerical data only |
| | Can be extended to the nonlinear case and noisy data | Requires whitening of the data as a preprocessing step (see the SVD sketch following the table) |
| | | Sensitive to outliers if maximum likelihood is used |
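The table notes that the query cost of k-Nearest Neighbor can be mitigated with a k-d tree. The following is a minimal Python sketch of that idea, not a prescribed implementation: the data, labels, and the `knn_predict` helper are hypothetical, and it assumes purely numerical features so that `scipy.spatial.cKDTree` applies, making each query roughly O(log N) rather than the O(N) of a brute-force scan in low to moderate dimensions.

```python
# Sketch: k-NN classification with a k-d tree to reduce query cost.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100_000, 3))    # numerical feature vectors
y_train = (X_train[:, 0] > 0).astype(int)  # toy class labels for illustration only

tree = cKDTree(X_train)                    # build the k-d tree once

def knn_predict(X_query, k=5):
    """Classify each query point by majority vote of its k nearest neighbors."""
    _, idx = tree.query(X_query, k=k)      # neighbor indices, cheap per query
    votes = y_train[idx]                   # shape (n_queries, k)
    return (votes.mean(axis=1) > 0.5).astype(int)

X_query = rng.normal(size=(10, 3))
print(knn_predict(X_query))
```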
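To make the contrast between k-means and mixture models concrete, here is a toy sketch, assuming scikit-learn and synthetic two-cluster data invented for illustration: k-means returns hard labels and prefers spherical clusters of similar spread, while a Gaussian mixture fitted by Expectation Maximization returns probabilistic cluster memberships and adapts to clusters of different size and density.

```python
# Sketch: hard k-means labels vs. probabilistic memberships from a Gaussian
# mixture fitted by EM, on synthetic clusters of very different size and spread.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(500, 2)),  # tight, dense cluster
    rng.normal(loc=[4.0, 0.0], scale=1.5, size=(100, 2)),  # broad, sparse cluster
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
hard_labels = km.labels_                    # one label per point, no uncertainty

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
memberships = gmm.predict_proba(X)          # per-point cluster probabilities

print(hard_labels[:5])
print(memberships[:5].round(3))
```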
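Finally, a minimal sketch of a decomposition computed via the SVD, including a whitening transform of the kind the table lists as a preprocessing step (needed, for example, before ICA). The data are synthetic and, as the table notes, assumed to be purely numerical; only NumPy is used.

```python
# Sketch: PCA via the SVD of centered data, plus a whitening transform that
# leaves the components decorrelated with unit variance.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))  # correlated toy data

Xc = X - X.mean(axis=0)                    # center each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

explained_var = s**2 / (X.shape[0] - 1)    # variance captured by each component
scores = Xc @ Vt.T                         # data expressed in the principal basis
whitened = U * np.sqrt(X.shape[0] - 1)     # decorrelated, unit-variance data

print(explained_var.round(3))
print(np.cov(whitened, rowvar=False).round(3))  # approximately the identity
```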