Harvesting outliers: data barriers to turn anomalies into discoveries
Over the last few years astronomers have become increasingly effective
at identifying anomalous objects in large astronomical datasets. So
far, that has meant "finding objects in sparsely populated regions of
a multidimensional feature space". This is done using a number of
methods that include ensemble methods such as random forest
searches and, more recently, generative models that identify anomalies
as those objects that are more difficult for the trained model to
reconstruct. This has produced huge lists of anomalies in diverse datasets
that include SDSS galaxy spectra, Kepler and TESS light curves, and
X-ray catalogs. Yet most of those anomalies are not followed up,
because scientists find it culturally difficult to interpret
multidimensional scatter plots that have no labels on their axes. We
argue that such a cultural barrier can be overcome with novel ways to
combine domain expertise with data visualization, or even by
incorporating domain knowledge directly into the anomaly detection
algorithms. We would like to discuss ways in which VO tools can help
in the identification of anomalies that represent true astronomical
discoveries, by harvesting the currently publicly available catalogs
of anomalies.
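
As a rough illustration of what "finding objects in sparsely populated regions of a multidimensional feature space" can mean in practice, the sketch below ranks objects by the distance to their k-th nearest neighbour. This is a simple density proxy chosen for brevity, not the random forest or generative approaches mentioned above. The function name `knn_anomaly_scores`, the value of `k`, and the synthetic feature table are all assumptions made for this example; no real catalog is used.

```python
import math
import random


def knn_anomaly_scores(points, k=5):
    """Return, for each point, the distance to its k-th nearest neighbour.

    A large value means the point sits in a sparsely populated region.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Synthetic stand-in for a table of derived features (hypothetical data).
rng = random.Random(0)
features = [
    (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    for _ in range(200)
]
features.append((8.0, 8.0, 8.0))  # one planted object far from the bulk

scores = knn_anomaly_scores(features, k=5)

# Indices of all objects, most isolated first.
ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
top_candidate = ranked[0]
```

A ranking of this kind only says that an object is isolated in the abstract feature space. It does not say why, which is the interpretation gap that domain knowledge and labeled visualizations are meant to close.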