IVOA Knowledge Discovery Interest Group


Knowledge discovery is the task of processing and analyzing data sets with the aim of extracting new knowledge. The area spans multiple disciplines, including visualization, remote data exploration, machine learning techniques, statistical methods, workflow orchestration, and polymorphic data access. To support the process of discovery, the KD-IG interacts closely with the other working and interest groups and feeds the requirements of the scientific community back to them.

The activities of the KD-IG include the following items, with a strong emphasis on the first two:

  1. Participating in the definition of new data preservation and exchange formats with a view to supporting machine learning algorithms.
  2. Introducing uncertainties and probabilistic descriptions to VO standards and services.

  3. Presenting and collecting best practice examples of scientific data analytics in astronomy.
  4. Defining requirements for implementing and adding machine learning capabilities to services.
  5. Coordinating and unifying the access to data visualization functionalities.
  6. Discussing the aspect of data provenance with respect to data used to derive/train models.
  7. Introducing proper statistical scoring and evaluation methods as services.
  8. Contributing to the discussion on scripting and orchestrating the scientific discovery workflow.
  9. Supporting the development of dedicated knowledge discovery applications.


During the Strasbourg InterOp Meeting the need emerged for an Interest Group on Data Mining (KD-IG) as an indispensable step to bridge the Virtual Observatory infrastructure with the expected VO science. In fact, “...Data mining, or KDD, is the semi-automatic discovery of patterns, associations, changes, anomalies, and statistically significant structures and events in data. In other words, traditional data analysis is assumption driven as a hypothesis is formed and validated against the data. Data mining, in contrast, is discovery driven as the patterns are automatically extracted from data....”

Data mining is a rapidly evolving set of methodologies that needs to be brought under the VO umbrella, not treated as just another application. As such, data mining cannot be just a tool or a suite of tools offered by a group of developers to a “passive community”. Data mining involves a large number of researchers across many domains. The astronomical community, which has only recently entered the Massive Data Sets era, makes use of just a handful of methods and tools, which are very often far from optimal. The synergy of the different kinds of expertise present in the IVOA makes it the ideal arena for exploring new and more modern approaches.

The KD-IG requires strong and continuous interaction with the scientific community, which, besides testing the proposed solutions, methods, and tools, will also provide feedback and input aimed at extending the scientific capabilities of the VO.

The KD-IG will interface with many other IVOA working and interest groups: Applications, Semantics, Time Domain, Data Models, Grid & Web Services, and Resource Registry. This cross-disciplinary nature is also a primary reason to create a specific IG. Data mining, in fact, addresses sophisticated and extreme modes of usage which require careful orchestration and fine tuning of the standards, methods, and tools provided by the other IVOA WGs and IGs. Typical examples are the automatic extraction of knowledge bases from VO archives using VO ontologies; transparent access to large computational facilities regardless of the computational paradigm; automated switching from asynchronous to synchronous modes of data access; and the extreme usage of workflows and advanced visualization methods. Furthermore, effective KD requires that an inexperienced user be able to contribute their own KD routines and methods, or at least use them seamlessly within the VO infrastructure. This places strong requirements on security and opens new problems for ticketing and scheduling. In other words, the KD-IG will provide feedback on the solutions implemented by the WGs and, by posing new operational problems, will stimulate the development and adoption of new solutions and standards.

We also wish to stress that, ultimately, the goal of the KD-IG is to allow the VO to produce new scientific knowledge publishable in astronomical journals. On the one hand, its activities will help demonstrate to the community the power and necessity of federated access to the vast VO universe of data; on the other, the KD-IG will illustrate the power and performance of data mining algorithms in facilitating and accelerating astronomical discovery within this data universe.

KD-IG Meetings

| IG Session | At Meeting | When | Where |
| IVOA.InterOpMay2010KDD | InterOpMay2010 | May 2010 | Victoria |
| IVOA.InterOpMay2011KDD | InterOpMay2011 | May 2011 | Naples |
| IVOA.InterOpMay2016-KDD | InterOpMay2016 | May 2016 | Cape Town |
| IVOA.InterOpMay2017-KDD | InterOpMay2017 | May 2017 | Shanghai |
| IVOA.InterOpOCt2017KDD | InterOpOct2017 | October 2017 | Santiago |
| IVOA.InterOpMay2018KDD | InterOpMay2018 | May 2018 | Victoria |
| IVOA.InterOpMay2019KDD | InterOpMay2019 | May 2019 | Paris |
| IVOA.InterOpMay2020KDD | InterOpMay2020 | May 2020 | Sydney (virtual) |
| IVOA.InterOpNov2021KDD | InterOpNov2021 | November 2021 | virtual |

Other Interesting Meetings for KD-IG

| Meeting | When | Where | Docs |
| Challenges and Methods for Massive Astronomical Data | August 2010 | CfA | Slides |

Related Topics

Several hot topics were identified in the past. They follow the priorities that emerged during the first KD-IG meeting, held at IVOA.InterOpMay2010KDD in Victoria, and were singled out by the Chair in his welcome message to the members.

Please follow the links and edit the specific pages:

  1. Dictionary of Data Mining terms;
  2. Census of Data Mining and Machine Learning tools and methods of astronomical interest;
  3. Template datasets for algorithm benchmarking;
  4. A user guide for Knowledge Discovery in Databases in Astronomy;
  5. Knowledge Discovery in Databases and VO standards;
  6. Specific fields of applications of KDD in astronomical research.


Chair: RaffaeleDAbrusco

Vice Chair: Yihan Tao

Topic revision: r44 - 2022-09-07 - YihanTao