IVOA KDD-IG: A user guide for Data Mining in Astronomy

2: Examples of improved results enabled by data mining techniques

We describe some examples where data mining techniques allowed improved astronomical results. The list aims to illustrate the range of results, but it is neither exhaustive nor a literature review. The emphasis is on refereed articles, or published conference proceedings.

We begin with science results, including Virtual-Observatory-enabled science and the discovery of new objects, then improved utility results such as processing, object detection and classification, and photometric redshifts. The emphasis is on results enabled by KDD and astroinformatics, although inevitably there is overlap with astrostatistics, computer science, and other fields.

Broadly, the section can be summarized as a collection of examples where specific, significant improvement came from the methods, or improved access to existing data, rather than new data. However, if a result is from a combination of both new data and astroinformatics methods (e.g., the data might otherwise be intractable), then such results are also included. Papers are given in approximate chronological order, newest first.

Because our aim here is not to review everything, but rather to select highlights, we provide links to online databases, e.g., the NASA Astrophysics Data System, which give a more complete overview of the full range of astronomy literature in which data mining was used.

Given the rather nebulous definition of data mining, it is inevitable that what to include here and what not to include is also somewhat arbitrary and subjective. Generally, we include results related to machine learning and prediction. In particular, the increasing volume of improved analyses using Bayesian inference, while extremely important, is generally not included unless there is an obvious data mining component. We also do not include descriptions of software packages, unless they have been used to give a particular science result.

Omission of a particular result may or may not be deliberate. Suggestions for additions are welcome.

General Science / VO-enabled science

  • Ascasibar Y. & Sanchez-Almeida J., MNRAS 415 2417 (2011): Do galaxies form a spectroscopic sequence? --- Using spectroscopic classification based on K-means clustering, most spectral classes are distributed on a 1D curve, suggesting a single affine parameter can describe the spectra.

  • Chilingarian I.V. & Zolotukhin I.Y., arXiv/1102.1159 (2011): A universal ultraviolet-optical colour–colour–magnitude relation of galaxies --- Virtual observatory (VO) technologies are used to cross-match 225,000 galaxies in the GALEX and SDSS sky surveys, showing that 'adding the near ultraviolet colour (GALEX NUV λ_eff = 227 nm) to the optical (g−r vs M_r) colour-magnitude diagram reveals a tight relation in the three-dimensional colour–colour–magnitude space smoothly continuing from the “blue cloud” to the “red sequence”.' Photometric redshifts with better accuracy than those from most multi-colour datasets are also possible from this 3-dimensional parameter space.

  • Chilingarian I.V., et al., arXiv/1011.1852 (2011): Dynamical versus Stellar Masses of Ultracompact Dwarf Galaxies in the Fornax Cluster --- 'Using Virtual Observatory technologies we found archival HST images for two more UCDs and then determined their structural properties. ... Overall, the observed UCD characteristics suggest at least two formation channels: tidal threshing of nucleated dwarf galaxies for massive UCDs (~10^8 Msun), and a classical scenario of red globular cluster formation for lower-mass UCDs (< 10^7 Msun).'

  • Fathi K., ApJ 722 120 (2010): Revisiting the Scale Length-mu0 Plane and the Freeman Law in the Local Universe --- VO tools quantify the scale length - surface brightness relation for disks and confirm the Freeman law's upper limit for surface brightness using a sample two orders of magnitude larger than previously.

  • Fraix-Burnet D., et al., MNRAS 407 2207 (2010): Structures in the fundamental plane of early-type galaxies --- Multivariate clustering and cladistic analyses show that the fundamental plane of galaxies is made of several groups, each forming its own plane. The planes have varying orientations and thicknesses. The plane properties directly relate to the different formation histories of the constituent galaxies, e.g., mergers, accretion, and are objectively obtained.

  • Steiner J.E., et al., MNRAS 395 64 (2009): PCA Tomography: how to extract information from data cubes --- Illustrative use of principal component analysis (PCA) on a data cube of spectra to demonstrate that the galaxy NGC 4736 has a type 1 active nucleus, which was not previously known.

  • Chilingarian I.V., et al., Science 326 1379 (2009): A Population of Compact Elliptical Galaxies Detected with the Virtual Observatory --- A sample of 21 compact ellipticals assembled with the VO and followup spectroscopy supports the tidal stripping scenario for their formation.

  • Banerji M., et al., MNRAS 386 1219 (2008): Photometric redshifts for the Dark Energy Survey and VISTA and implications for large-scale structure --- 'By removing all galaxies with a neural network photo-z error estimate of greater than 0.1 from our DES + VHS sample, we can constrain the galaxy power spectrum out to a redshift of 2 and reduce the fractional error on this power spectrum by ~15–20 per cent compared to using the entire catalogue.'

  • Bayo A., et al., A&A 492 277 (2008): VOSA: Virtual Observatory SED Analyzer. An application to the Collinder 69 open cluster --- VOSA is used to apply spectral energy distribution (SED) fitting to the multiwavelength data describing the Collinder 69 stellar association to derive physical parameters. The VO tool is a key step enabling this to be done for all ~170 candidate members.

  • Dalla S., Fletcher L. & Walton N.A., A&A 479 L1 (2008): Invisible sunspots and rate of solar magnetic flux emergence --- Use of AstroGrid to analyse 6862 sunspot regions shows that the visibility function from centre to limb is much stronger than would be expected from projection effects, implying that the duration of the growth phase of solar regions has previously been underestimated.

  • Ball N.M., et al., MNRAS 373 845 (2006): Bivariate galaxy luminosity functions in the Sloan Digital Sky Survey --- Use of neural network classifications allows a study of the luminosity function in the Sloan survey, bivariate with Hubble Type morphology, classified with equal accuracy to human experts, but for a sample size of 37,047 objects.

  • Serra-Ricart M., et al., AJ 109 312 (1995): Multidimensional Interpolation Using Artificial Neural Networks: Application to an HI Cloud in Perseus --- An ANN is used to interpolate multidimensional unbinned data, giving an HI map of an interstellar cloud, from which the physical parameters are readily obtained.
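
Several of the entries above rest on unsupervised clustering, e.g., the K-means classification of spectra used by Ascasibar & Sanchez-Almeida. The following is a minimal sketch of the K-means idea (Lloyd's algorithm) in plain Python, not the code used in any of the papers. The toy "spectra", the function names, and the deterministic farthest-first initialisation are all assumptions made for illustration; published analyses work on thousands of flux values per spectrum and typically use random or repeated initialisations.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centres(points, k):
    """Deterministic initialisation (an illustrative choice): start from the
    first point, then repeatedly add the point farthest from all centres so far."""
    centres = [list(points[0])]
    while len(centres) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(list(nxt))
    return centres


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centres)."""
    centres = farthest_first_centres(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centres are stable too
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Toy "spectra": three made-up flux values each (blue, green, red).
spectra = [
    [1.0, 0.6, 0.3], [0.9, 0.6, 0.3], [1.0, 0.5, 0.2],   # blue-dominated
    [0.3, 0.6, 1.0], [0.2, 0.5, 0.9], [0.3, 0.5, 1.0],   # red-dominated
]
labels, centres = kmeans(spectra, k=2)
print(labels)
print(centres)
```

Each cluster centre plays the role of a template spectrum for its class; how the classes are then ordered along a sequence is a separate analysis step that is not sketched here.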

Discovery of New Objects

Several papers demonstrate the use of VO and data mining tools to discover new astronomical objects.

  • Oreiro R., et al., A&A 530 A2 (2011): A search for new hot subdwarf stars by means of Virtual Observatory tools --- A VO search produces 38 hot subdwarf stellar candidates. An increased number of detections is important to enable statistical studies to elucidate the still unknown origin of these stars. 26 of 30 were confirmed, indicating the high success rate of this technique. An all-sky search is planned.

  • Zolotukhin I.Y. & Chilingarian I.V., A&A 526 A84 (2011): Virtual Observatory based identification of AX J194939+2631 as a new cataclysmic variable --- VO tools enable the discovery of a new cataclysmic variable star, its nature confirmed by multiwavelength data, and spectroscopic followup.

  • Jiménez-Esteban F.M., Caballero J.A. & Solano E., A&A 525 A29 (2011): Identification of blue high proper motion objects in the Tycho-2 and 2MASS catalogues using Virtual Observatory tools --- 'Cross-matching the all-sky Tycho-2 and 2MASS catalogues, and assembling multi-wavelength photometry, using the VO tools Aladin and VOSA, respectively, results in the discovery of 5 new blue high proper motion objects, one of which is confirmed as a hot subdwarf star using the far-UV FUSE satellite.'

  • Chilingarian I.V. & Bergond G., MNRAS 405 L11 (2010): SDSSJ150634.27+013331.6: the second compact elliptical galaxy in the NGC5846 group --- VO workflows enabled the discovery of the fifth compact elliptical for which resolved stellar populations and stellar kinematics can be obtained.

  • Eatough R.P., et al., MNRAS 407 2443 (2010): Selection of radio pulsar candidates using artificial neural networks --- Selection of candidates by human experts is impractical due to the large amount of data. Replacing it with automated selection results in the discovery of a previously unidentified pulsar.

  • Chilingarian I.V. & Mamon G.A., MNRAS 385 83 (2008): SDSSJ124155.33+114003.7 -- a Missing Link Between Compact Elliptical and Ultracompact Dwarf Galaxies --- The object lies in between the compact elliptical and ultracompact dwarf sequences, and a description is enabled using VO tools, which suggests tidal stripping.

  • Padovani P., et al., A&A 424 545 (2004): Discovery of optically faint obscured quasars with Virtual Observatory tools --- Use of VO tools combining X-ray and optical data more than triples the number of optically obscured quasars discovered in the GOODS fields from 9 to 31, and suggests the surface density of such quasars on the sky has been underestimated.
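
Many of the discoveries above start from a positional cross-match between two catalogues. A minimal sketch of that step follows, assuming each catalogue is simply a list of (RA, Dec) pairs in degrees and using a brute-force nearest-neighbour search. The function names, the coordinates, and the 2 arcsec radius are hypothetical; VO tools such as Aladin use indexed searches and handle proper motions, epochs, and positional errors, none of which is attempted here.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp guards against rounding slightly above 1 for antipodal points.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, radius_arcsec):
    """For each source in cat_a, return the index of the nearest cat_b source
    within radius_arcsec, or None if there is no counterpart that close."""
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for ra_a, dec_a in cat_a:
        best, best_sep = None, radius_deg
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best, best_sep = j, sep
        matches.append(best)
    return matches


# Made-up positions: two sources have counterparts, one does not.
cat_a = [(10.0, 20.0), (150.0, -30.0), (200.0, 5.0)]
cat_b = [(150.0002, -30.0001), (10.0001, 20.0001), (300.0, 45.0)]
matches = cross_match(cat_a, cat_b, radius_arcsec=2.0)
print(matches)
```

Sources left unmatched (None) are often the interesting ones in a discovery search, e.g., X-ray sources without a bright optical counterpart.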

Improved processing

Many techniques improve the quality of processing or the time needed to achieve useful results, often with broad potential applicability to many different studies. This applicability is algorithmic, rather than simply the creation of new software (which is beyond the scope of this section).

  • Grassi T., et al., arXiv/1103.0509: MaNN: Multiple Artificial Neural Networks for modelling the Interstellar Medium --- ANNs are able to speed up codes modeling the interstellar medium, by reproducing the details of interstellar gas evolution, and could replace real-time calculation of chemical evolution in hydrodynamical codes.

  • Silva L., et al., MNRAS 410 2043 (2011): Modelling the spectral energy distribution of galaxies: introducing the artificial neural network --- Computing the radiative transfer of stellar radiation through dust for an SED is too time consuming for use in semi-analytic models. An artificial neural network (ANN) added to the radiative transfer code GRASIL gives results comparable to the full code.

  • Norgaard-Nielsen H.U., A&A 520 A87 (2010): Foreground removal from WMAP 5yr temperature maps using an MLP neural network --- 'A simple multilayer perceptron neural network with two hidden layers provides temperature estimates over more than 75 per cent of the sky with random errors significantly below those previously extracted from these data.'

  • Gruen D., et al., ApJ 720 639 (2010): Bias-free Shear Estimation Using Artificial Neural Networks --- 'We demonstrate that bias present in existing shear measurement pipelines ... can be almost entirely removed by means of neural networks.'

  • Wu H.Y., Rozo E. & Wechsler R.H., ApJ 713 127 (2010): Annealing a Follow-up Program: Improvement of the Dark Energy Figure of Merit for Optical Galaxy Cluster Surveys --- Simulated annealing with fixed observational cost for calibrating the relation of cluster mass to dark energy 'can reduce the observational cost required to achieve a specified precision by up to an order of magnitude'.

  • Walker M.G., et al., AJ 137 3109 (2009): Clean Kinematic Samples in Dwarf Spheroidals: An Algorithm for Evaluating Membership and Estimating Distribution Parameters When Contamination is Present --- Using the expectation maximization algorithm for the selection of stars that are members of dwarf spheroidal galaxies, 'the EM algorithm distinguishes members from contaminants and returns accurate parameter estimates much more reliably than conventional methods of contaminant removal (e.g., sigma clipping).'

  • Prsa A., et al., ApJ 687 542 (2008): Artificial Intelligence Approach to the Determination of Physical Properties of Eclipsing Binaries. I. The EBAI Project --- Only hundreds of eclipsing binary stars have been analyzed so far, but millions will soon be available. An ANN trained on model light curves outputs model parameters, surmounting the previous bottleneck of required human inspection.

  • Budavari T., et al., MNRAS 394 1496 (2008): Reliable Eigenspectra for New Generation Surveys --- A robust PCA algorithm that deals with outliers (to which PCA is normally sensitive), and missing data, gives components that are much cleaner. For example, it separates narrow nebular emission lines from broader absorption features, a separation not seen previously. The method also shows an otherwise unseen line of higher ionization, attributed to AGN.

  • Carroll T.A., Kopf M. & Strassmeier K.G., A&A 488 781 (2008): A fast method for Stokes profile synthesis. Radiative transfer modeling for ZDI and Stokes profile inversion --- The approximation provided by ANNs of the nonlinear mapping between stellar atmospheric parameters and Stokes profiles accelerates their synthesis by over a factor of 1000.

  • Auld T., Bridges M. & Hobson M.P., MNRAS 387 1575 (2008): COSMONET: fast cosmological parameter estimation in non-flat models using neural networks --- COSMONET can estimate cosmological parameters approximately 32x faster than the earlier CAMB code.

  • Hogg D.W., arXiv/0807.4820 (2008): Data analysis recipes: Choosing the binning for a histogram --- A jackknife (leave-one-out cross-validation likelihood) method for finding optimal, rather than arbitrary, histogram bin widths.

  • Gai M. & Cancelliere R., MNRAS 362 1483 (2005): Neural network correction of astrometric chromaticity --- The nonlinear approximation enabled by ANNs can reduce stellar atmospheric chromaticity (the SED-dependent positional variation of stars) from milliarcseconds to microarcseconds.

  • Rohde D.J., et al., MNRAS 360 69 (2005): Applying machine learning to catalogue matching in astrophysics --- Machine learning raises the fraction of sources in the HIPASS radio catalogue (which has large positional uncertainty) matched to the SuperCOSMOS optical catalogue from 44% to 72%, giving 1209 new matches.

  • Ramirez J.F., Fuentes O. & Gulati R.K., Experimental Astronomy 12 163 (2001): Prediction of Stellar Atmospheric Parameters Using Instance-Based Machine Learning and Genetic Algorithms --- 'Our experimental results show that the feature selection performed by the genetic algorithm reduces the running time of KNN up to 92%, and the predictive accuracy error up to 35%.'

  • Snider S., et al., ApJ 562 528 (2001): Three-dimensional Spectral Classification of Low-Metallicity Stars Using Artificial Neural Networks --- ANNs can predict the stellar atmospheric parameters T_eff, log g and [Fe/H] for medium resolution uncalibrated spectra to the same accuracy as that from previous fine analysis of high resolution spectra.

  • Gothoskar P. & Khobragade S., MNRAS 277 1274 (1995): Detection of interplanetary activity using artificial neural networks --- ANNs can detect interplanetary activity from scintillation spectra, which otherwise require manual intervention.

  • Angel J.R.P., et al., Nature 348 221 (1990): Adaptive Optics for Array Telescopes Using Neural Network Techniques --- ANNs improved image quality obtained by adaptive optics.
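
As one concrete illustration from the list above, the leave-one-out idea behind choosing a histogram bin width (Hogg 2008) can be sketched in a few lines. For each candidate number of bins, every data point is scored by the density that the histogram of all the other points predicts at its location, and the bin count with the highest summed log-likelihood wins. The smoothing parameter alpha (which keeps a point that is alone in its bin from receiving zero probability), the equal-width bins, the function names, and the toy data are assumptions of this sketch rather than a transcription of the paper.

```python
import math


def loo_log_likelihood(data, n_bins, lo, hi, alpha=1.0):
    """Leave-one-out log-likelihood of an equal-width histogram on [lo, hi].

    Assumes lo < hi and that every value in data lies within [lo, hi].
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        j = min(int((x - lo) / width), n_bins - 1)  # x == hi goes in the last bin
        counts[j] += 1
    n = len(data)
    total = 0.0
    for c in counts:
        if c > 0:
            # Each of the c points in this bin sees (c - 1) neighbours there,
            # out of (n - 1) remaining points, with alpha added to every bin.
            density = (c - 1 + alpha) / ((n - 1 + alpha * n_bins) * width)
            total += c * math.log(density)
    return total


def best_n_bins(data, lo, hi, candidates, alpha=1.0):
    """Return the candidate bin count with the highest leave-one-out score."""
    return max(candidates, key=lambda m: loo_log_likelihood(data, m, lo, hi, alpha))


# Made-up sample with two clumps on the unit interval.
sample = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.80, 0.83, 0.85, 0.88, 0.90, 0.95]
print(best_n_bins(sample, 0.0, 1.0, range(1, 11)))
```

Too few bins blur the two clumps together and too many leave most points alone in their bins; the leave-one-out score penalises both, which is what makes the chosen width less arbitrary.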

Object Detection and Classification

Numerous papers provide improved detections and classifications of objects. We present examples where there was a demonstrable, qualitative change in the quality of results compared to a 'traditional' method, rather than just minor improvement.

  • Kim D-W., et al., ApJ 735 68 (2011): Quasi-stellar Object Selection Algorithm Using Time Variability and Machine Learning: Selection of 1620 Quasi-stellar Object Candidates from MACHO Large Magellanic Cloud Database --- SVM on MACHO time series finds 1620 QSO candidates. Cross-matching with the SAGE survey suggests over 70% of them are true QSOs.

  • Richards J.W., et al., arXiv/1106.2832 (2011): Active learning to overcome sample selection bias: Application to photometric variable star classification --- Active learning, in which the testing set is used to iteratively improve the training set by populating its most potentially useful regions with further examples, allows for dramatic improvement in the classification of variable stars trained on Hipparcos and OGLE and applied to the ASAS survey.

  • Richards J.W., et al., ApJ 733 10 (2011): On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data --- New methods are introduced for assigning classification probabilities for variable stars. A random forest classifier gives a 24% improvement over the previous best classification of a set of 1542 variable stars. Specific science classes, such as pulsational variables used in Milky Way tomography, can be selected with over 95% confidence.

  • D'Abrusco R., Longo G. & Walton N.A., MNRAS 396 223 (2009): Quasar candidates selection in the Virtual Observatory era --- Probabilistic principal surfaces and negative entropy clustering allow unsupervised selection of quasar candidates, guided by existing knowledge. Results for the SDSS and UKIDSS surveys are comparable to or better than existing methods.

  • Vanderplas J. & Connolly A., AJ 138 1365 (2009): Reducing the Dimensionality of Data: Locally Linear Embedding of Sloan Galaxy Spectra --- Locally linear embedding improves on PCA and line diagnostics, combining their strengths, for nonlinear dimension reduction of spectra.

  • Huertas-Company M., et al., A&A 478 971 (2008): A robust morphological classification of high-redshift galaxies using support vector machines on seeing limited images I. Method description and II. Quantifying morphological k-correction in the COSMOS field at 1 < z < 2: Ks band vs. I band --- Support vector machines can separate early/late galaxies with 20% contamination down to K_AB = 22 mag, which is 2x better than the CAS system, and comparable to space data.

  • Bailer-Jones C.A.L., et al., MNRAS 391 1838 (2008): Finding rare objects and building pure samples: probabilistic quasar classification from low-resolution Gaia spectra --- 'Modifying the output probabilities from a classifier so as to accommodate our expectation (priors) concerning the relative frequencies of different classes of objects allows the selection of a very pure sample of quasars for the Gaia satellite, even though quasars are extremely rare within the sample. This is a considerable improvement compared to results for unmodified probabilities.'

  • Abdalla F., et al., MNRAS 387 945 (2008): Predicting spectral features in galaxy spectra from broad-band photometry --- ANNs can be used to classify AGN, passive, and star-forming galaxies, without requiring the usual diagnostic emission lines to be present.

  • Bailey S., et al., ApJ 665 1246 (2007): How to Find More Supernovae With Less Work: Object Classification Techniques for Difference Imaging --- Support vector machine and random forest improve on threshold cuts for selecting supernovae by up to 10x.

  • Bazell D. & Miller D.J., ApJ 618 723 (2005): Class Discovery in Galaxy Classification --- The use of semi-supervised learning allows up to a 57% reduction in classification error compared to a normal ANN using only labeled data.

  • Carballo R., Cofino A.S. & Gonzalez-Serrano J.I., MNRAS 353 211 (2004): Selection of quasar candidates from combined radio and optical surveys using neural networks --- ANN selection allows a sample with 87% reliability, and 80% completeness, compared to 56% for the original candidate list.

  • Philip N.S., et al., A&A 385 1119 (2002): A difference boosting neural network for automated star-galaxy classification --- The performance of the DBNN is comparable to SExtractor, but is significantly faster and more flexible during both training and classification.

  • Bazell D. & Aha D.W., ApJ 548 219 (2001): Ensembles of Classifiers for Morphological Galaxy Classification --- 'the ensemble approach can significantly increase the performance of certain automated classification methods when applied to the domain of morphological galaxy classification.'

  • Andreon S., et al., MNRAS 319 700 (2000): Wide field imaging I. Applications of neural networks to object detection and star/galaxy classification --- The NExt (Neural Extractor) software is at least as effective as SExtractor, without the myriad of adjustable parameters.

  • Serra-Ricart M., et al., A&AS 115 195 (1996): Faint Object Classification Using Artificial Neural Networks --- The ANN produces similar results to traditional methods (FOCAS), but gives a clear advantage in terms of speed.

  • Weir N., Fayyad U.M. & Djorgovski S., AJ 109 6 (1995): Automated Star/Galaxy Classification for Digitized POSS-II --- Use of decision trees deepens the detection of objects on plate material by 0.5-1 mag.

  • Naim A., et al., MNRAS 275 567 (1995): Automated morphological classification of APM galaxies by supervised artificial neural networks --- ANNs can morphologically classify galaxies with an RMS dispersion of 1.8 Hubble T types, the same as that between human experts, but at an enormously higher speed, so that large galaxy catalogues can be classified.

  • Weaver W.B. & Torres-Dodgen A.V., ApJ 446 300 (1995): Neural Network Classification of the Near-Infrared Spectra of A-Type Stars --- ANNs are able to classify A-type stars on the Morgan-Keenan (MK) system at least as accurately as human experts. (Also extended to temperature and luminosity classes in 1997, ApJ 487 847.)

  • Rogers R.D. & Riess A.G., PASP 106 532 (1994): Detection and classification of CCD defects with an artificial neural network --- An ANN finds defects with higher efficiency and in a much shorter time than human inspectors.

  • Klusch M. & Napiwotzki R., A&A 276 309 (1993): HNS - a Hybrid Neural System and its Use for the Classification of Stars --- Combining a neural and semantic network to perform classification of stars on the MK system, 'performance and results of stellar classification of the HNS show significant improvements compared to conventional astronomical techniques.'
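
The Bailer-Jones et al. entry above turns on a step that is easy to state in code: rescaling a classifier's output probabilities so that they reflect the expected class frequencies in the real sky rather than those of the training set. The sketch below applies Bayes' rule for that rescaling. It assumes the classifier outputs are posteriors under the training-set class fractions; the function name and all numbers are made up for illustration and are not taken from the paper.

```python
def adjust_for_priors(probs, train_priors, true_priors):
    """Rescale class probabilities from training-set priors to expected priors.

    probs        : classifier output P(class | data), one value per class
    train_priors : class fractions in the training set
    true_priors  : class fractions expected in the target population
    """
    weighted = [p * t / r for p, r, t in zip(probs, train_priors, true_priors)]
    norm = sum(weighted)
    return [w / norm for w in weighted]


# Hypothetical two-class case, ordered as [star, quasar].
# The classifier was trained on a balanced sample and is 90% sure of "quasar",
# but quasars are assumed to make up only 1 in 1000 objects in the survey.
raw = [0.1, 0.9]
adjusted = adjust_for_priors(raw, train_priors=[0.5, 0.5],
                             true_priors=[0.999, 0.001])
print(adjusted)
```

With these made-up numbers the adjusted quasar probability drops below 1%, which is why a threshold on adjusted probabilities yields a much purer sample of a rare class than the same threshold on raw outputs.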

Photometric Redshifts

Similar to classification, numerous papers provide results for photometric redshifts.

  • Laurino O., et al., arXiv/1107.3160 (2011): Astroinformatics of galaxies and quasars: a new general method for photometric redshift estimation --- The method of weak gated experts provides photo-zs for SDSS galaxies and quasars with dispersions of 0.021 and 0.035, respectively, and an estimate of error for each photo-z.

  • Gillis B.R. & Hudson M.J., MNRAS 410 13 (2011): Group-finding with photometric redshifts: the photo-z probability peaks algorithm --- Using photometric redshift (photo-z) probability density functions (PDFs) allows the direct detection of lower richness groups than similar 2D matched-filter methods.

  • Gerdes D.W., et al., ApJ 715 823 (2010): ArborZ: Photometric Redshifts Using Boosted Decision Trees --- Stacked photo-z PDFs from this method give improved estimates of the redshift distribution of objects, N(z).

  • Bonfield D.G., et al., MNRAS 405 987 (2010): Photometric redshift estimation using Gaussian processes --- Gaussian process regression is superior to other regression codes when the training set is small, as is the case with high redshift spectroscopy; the RMS errors increase more smoothly, and the method can interpolate across gaps.

  • Myers A.D., White M. & Ball N.M., MNRAS 399 2279 (2009): Incorporating photometric redshift probability density information into real-space clustering measurements --- Use of full redshift PDFs for objects increases the signal-to-noise of the clustering signal such that the effective survey size is increased by 4-5x compared to the use of single values.

  • Wolf C., MNRAS 397 520 (2009): Bayesian photometric redshifts with empirical training sets --- Empirical chi-square photo-zs, combining the advantages of empirical and template approaches.

  • van Breukelen C. & Clewley L., MNRAS 395 1845 (2009): A reliable cluster detection technique using photometric redshifts: introducing the 2TecX algorithm --- Using the full probability density function in redshift, as opposed to just a single value plus Gaussian error, and combining two cluster finding methods (Voronoi and Friends-of-Friends) substantially reduces the number of spurious detections when searching for high redshift galaxy clusters.

  • Wittman D., ApJL 700 174 (2009): What Lies Beneath: Using p(z) to Reduce Systematic Photometric Redshift Errors --- Using a photo-z estimator that is a good approximation to the full photo-z PDF 'can substantially reduce systematics in dark energy parameter estimation from weak lensing, at no cost to the survey.'

  • Budavari T., ApJ 695 747 (2009): A Unified Framework for Photometric Redshifts --- General formalism, in which template-based and empirical photo-zs are special cases, enabling their advantages to be combined.

  • Ball N.M., et al., ApJ 683 12 (2008): Robust Machine Learning Applied to Astronomical Data Sets. III. Probabilistic Photometric Redshifts for Galaxies and Quasars in the SDSS and GALEX --- The k nearest neighbour method is used to assign full PDFs, rather than a single-valued redshift, to the photo-zs of quasars. Selecting quasars with a single peak in their probability distribution virtually eliminates catastrophic failures (objects where the redshift is completely incorrect), a feature that previously plagued quasar datasets.

  • Firth A.E., Lahav O. & Somerville R.S., MNRAS 339 1195 (2003): Estimating photometric redshifts with artificial neural networks --- 'ANNs produce photometric redshift accuracies at least as good as and often better than the template-fitting method. The Bayesian priors on the underlying redshift distribution are automatically taken into account. Furthermore, inputs other than galaxy colours – such as morphology, angular size and surface brightness – may be easily incorporated, and their utility assessed.'

  • Budavari T., et al., AJ 120 1588 (2000): Creating Spectral Templates From Multicolor Redshift Surveys --- Using an iterative procedure to improve the photo-z templates utilizing empirical data, 'We find that in a small number of iterations the dispersion in the photometric redshifts estimator (a comparison between predicted and measured redshifts) can decrease by up to a factor of 2.'

  • Csabai I., et al., AJ 119 69 (2000): Reconstructing Galaxy Spectral Energy Distributions from Broadband Photometry --- Using a method that combines the information from templates and empirical data, 'we demonstrate that these improved spectral energy distributions lead to a photometric redshift relation for the Hubble Deep Field that is more accurate than standard template-based approaches.'
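
A recurring theme in the entries above is the use of a full redshift probability density function (PDF) per object instead of a single value. The sketch below shows one simple way to build such a PDF, as a histogram of the spectroscopic redshifts of an object's k nearest neighbours in colour space, and then to stack the PDFs into an estimate of N(z). It is a deliberately simplified illustration and is not the published method of any paper listed; the training colours, redshifts, bin edges, and function names are all made up.

```python
def knn_photoz_pdf(colours, train_colours, train_z, k, z_edges):
    """Histogram PDF over redshift bins from the k nearest training objects.

    Neighbours whose redshift falls outside z_edges are dropped, in which
    case the returned values sum to less than 1.
    """
    order = sorted(
        range(len(train_colours)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(colours, train_colours[i])),
    )
    n_bins = len(z_edges) - 1
    pdf = [0.0] * n_bins
    for i in order[:k]:
        for j in range(n_bins):
            if z_edges[j] <= train_z[i] < z_edges[j + 1]:
                pdf[j] += 1.0 / k
                break
    return pdf


def stack_pdfs(pdfs):
    """Average per-object PDFs bin by bin to estimate the sample's N(z)."""
    n = len(pdfs)
    return [sum(col) / n for col in zip(*pdfs)]


# Made-up training set: one colour per object and its spectroscopic redshift.
train_colours = [(0.10,), (0.12,), (0.11,), (0.90,), (0.92,), (0.88,)]
train_z = [0.10, 0.12, 0.15, 1.10, 1.20, 1.15]
z_edges = [0.0, 0.5, 1.0, 1.5]

pdf_blue = knn_photoz_pdf((0.10,), train_colours, train_z, k=3, z_edges=z_edges)
pdf_red = knn_photoz_pdf((0.90,), train_colours, train_z, k=3, z_edges=z_edges)
n_of_z = stack_pdfs([pdf_blue, pdf_red])
print(pdf_blue, pdf_red, n_of_z)
```

An object whose neighbours split between widely separated redshift bins would show a multi-peaked PDF; keeping that information, or selecting only single-peaked objects, is what the PDF-based papers above exploit.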

Data Mining Literature

These links provide an overview of the astronomy literature that uses the main methods of data mining.

Although it is trivial to, e.g., query ADS for "neural networks", these queries add some simple refinements to give a better overview. Nevertheless, because we limit the number of variations to keep control of the multiplicity of possible links, the returned results are not guaranteed to be complete.

The numbers are the number of results returned by the query, as of Sep 26th 2011.

Data Mining Algorithm            Refereed   Non-refereed
Neural network                   634        736
Decision tree                    35         36
Genetic algorithm                225        204
Support vector machine           49         83
k nearest neighbor               12         14
k-means clustering               6          2
Expectation maximization         39         36
Kernel density estimation        22         14
Self organizing map              27         33
Independent component analysis   32         23

Some notes on query criteria:

  • We query the astronomy database and not arXiv or physics, because those return a large number of non-astronomy articles.
  • We query both the title and abstract.
  • We provide two links, refereed, and not refereed, but do not differentiate between 'articles' and other sources for non-refereed (all refereed are articles), or between journals and conference proceedings.
  • We sort by citation count, not, e.g., normalized citations, because the importance here is the impact of the given data mining paper, not individual authors.
  • We set the number of items to return to a large value (9999), since some queries return more than 200 results, though not many more.
  • All other ADS settings are the default settings, unless otherwise stated (e.g., short list format, synonym replacement on).
  • Because the results are automated, and, e.g., synonym replacement is on, occasional hits will be spurious.
  • The EM query is done without quotes or synonym replacement.

-- NickBall - 19 Mar 2011
-- NickBall - 23 Sep 2011
-- NickBall - 03 Oct 2011

Topic revision: r8 - 2011-10-03 - NickBall