IVOA KDD-IG: Template datasets for algorithm benchmarking

Who's interested?

RoyWilliams
CiroDonalek
RaffaeleDAbrusco

ftp://ftp.astro.caltech.edu/users/donalek/DM_templates/

reg_class: data from SDSS-DR7.

Dataset 1: regression problem.
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz z
The first 8 columns contain the features (color errors and colors of galaxies); the last column contains the target (spectroscopic redshift). Any combination of the first 8 columns can be used as features. Training, evaluation (if needed) and test sets must be extracted from this file.
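
A minimal sketch of extracting the three sets, assuming the file is whitespace-delimited with a single `#`-prefixed header line as shown above. The function names, the 70/15/15 split and the fixed seed are illustrative choices, not part of the dataset description, and the sample rows are made-up numbers used only to exercise the code.

```python
import io
import random

def load_table(handle):
    """Read a whitespace-delimited table whose header line starts with '#'."""
    names, rows = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if names is None:  # the first comment line is taken as the header
                names = line.lstrip("#").split()
            continue
        rows.append([float(value) for value in line.split()])
    return names, rows

def split_rows(rows, train=0.70, valid=0.15, seed=0):
    """Shuffle a copy of the rows and cut it into train/validation/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

# Synthetic stand-in for the real file (assumption: same column layout).
SAMPLE = "# err_umg err_gmr err_rmi err_imz umg gmr rmi imz z\n" + "".join(
    f"0.1 0.1 0.1 0.1 1.{i} 0.{i} 0.3 0.2 0.{i}\n" for i in range(10, 30))

names, rows = load_table(io.StringIO(SAMPLE))
train_set, valid_set, test_set = split_rows(rows)

# Features are the first 8 columns, the target is the last one.
train_features = [row[:8] for row in train_set]
train_targets = [row[8] for row in train_set]
```

To run this on a downloaded file, replace `io.StringIO(SAMPLE)` with an open file handle. Fixing the seed keeps the split reproducible, so that different algorithms are benchmarked on the same partition.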

******************************

Dataset 2: classification problem.
# umg gmr rmi imz specClass
The first 4 columns contain the features (colors of stellar sources), the last column contains the target (spectroscopic classification). Training, evaluation (if needed) and test sets must be extracted from this file.
The target values are (0, 1, 2, 3, 4, 6).
While a classification into separate classes, one per distinct target value, would be preferable, a coarser classification into two classes (namely, target = (0,1,2,6) for stars and galaxies, and target = (3,4) for quasars) would be interesting and useful as well.
Classes:
0 -> unknown source
1 -> star
2 -> galaxy
3 -> quasar
4 -> high-redshift quasar
5 -> artifact
6 -> late-type star
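
The two-class grouping can be sketched as a simple lookup. The grouping of target values follows the description above; the function name and the 0/1 output labels are assumptions made for illustration.

```python
# Grouping of specClass values exactly as stated in the dataset description.
STAR_OR_GALAXY = {0, 1, 2, 6}
QUASAR = {3, 4}

def to_binary_label(spec_class):
    """Return 0 for the star/galaxy group and 1 for quasars (labels are arbitrary)."""
    if spec_class in QUASAR:
        return 1
    if spec_class in STAR_OR_GALAXY:
        return 0
    # Any other value (for example 5, artifact) is not among the listed targets.
    raise ValueError(f"unexpected specClass value: {spec_class}")

labels = [to_binary_label(c) for c in (0, 1, 2, 3, 4, 6)]
```

Raising on an unlisted value, rather than silently assigning it to a group, makes a malformed row visible before training starts.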

******************************

Dataset 3: regression problem.
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz zspec
This dataset is similar to dataset 1, except that the sources are quasars rather than galaxies. The same rules apply.

Notes

I am going to add more datasets... -- Ciro

It seems to me that these two points are intimately related: (3) the templates and (5) working with IVOA, meaning the choice of data models and formats. I see the following as interesting data objects for the KDD-IG:

Catalog of sources, each with ID, position, magnitudes, ... more.
Light curve of source, with time, magnitude, upper-limit observations, ... more.
Image, with WCS, calibration, ... more.

In each case there is an IVOA approach through the spectrum data model (VOTable + Utypes). There are also other formats that are not IVOA-approved. And of course there is always a call for the stripped-down CSV ("just gimme the data"). The archive can offer many format choices (so what are they?).

One could also say that the formatting is easy once the data is in the database, which avoids choice of a specific syntax. But the semantics of that database schema must enable the semantics of every proposed output format. -- Roy
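
As an illustration of the point about one schema feeding several output formats, here is a small sketch in which a single in-memory record is written out as stripped-down CSV and as JSON. The class name, field names and the sample values are hypothetical, and VOTable output is deliberately left out; this is not a proposed data model.

```python
import csv
import io
import json
from dataclasses import dataclass, field, asdict

@dataclass
class CatalogSource:
    """Hypothetical catalog record: ID, position and per-band magnitudes."""
    source_id: str
    ra_deg: float
    dec_deg: float
    magnitudes: dict = field(default_factory=dict)  # band name -> magnitude

def to_csv(sources, bands):
    """Flatten records into CSV, one column per requested band."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["source_id", "ra_deg", "dec_deg", *bands])
    for s in sources:
        writer.writerow([s.source_id, s.ra_deg, s.dec_deg,
                         *(s.magnitudes.get(b, "") for b in bands)])
    return out.getvalue()

def to_json(sources):
    """Serialize the same records as JSON, keeping the nested structure."""
    return json.dumps([asdict(s) for s in sources], indent=2)

# Made-up example record, only to exercise both writers.
example = [CatalogSource("src-1", 10.5, -3.25, {"g": 18.2, "r": 17.9})]
csv_text = to_csv(example, bands=["g", "r"])
json_text = to_json(example)
```

The CSV writer has to be told which bands to emit, while the JSON writer carries the nested structure along unchanged; that difference is a small instance of the schema needing to hold enough semantics for every target format.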

Topic revision: r9 - 2011-05-04 - CiroDonalek