
<H1>IVOA KDD-IG: Template datasets for algorithm benchmarking</H1>
<br/>

---
%TOC%
---
---++ Who's interested?
RoyWilliams<br/>
CiroDonalek<br/>
RaffaeleDAbrusco<br/>
<hr/>
The template datasets are available at ftp://ftp.astro.caltech.edu/users/donalek/DM_templates/
<p><b>reg_class: data from SDSS-DR7.</b><br/>
<br/>
Dataset 1: regression problem.<br/>
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz z<br/>
The first 8 columns contain the features (color errors and colors of galaxies); the last column contains the target (spectroscopic redshift).
Any combination of the first 8 columns can be used as features.
Training, evaluation (if needed) and test sets must be extracted from this file.<br/>
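<p>A minimal Python sketch of loading a template file and extracting training, evaluation and test sets from it. It assumes (this is not stated above) that each file is plain whitespace-separated text whose header line starts with <code>#</code>, as in the column lists shown on this page. The function names <code>load_template</code> and <code>split</code>, the default 60/20/20 fractions and the fixed seed are hypothetical choices for illustration only.</p>

```python
import random
from typing import List, Sequence, Tuple

# One example: (feature values, target value).
Row = Tuple[List[float], float]


def load_template(text: str, n_features: int) -> Tuple[List[str], List[Row]]:
    """Parse template text: a '#' header line, then whitespace-separated numeric rows.

    Assumed layout: the first n_features columns are features, the last column is the target.
    """
    names: List[str] = []
    rows: List[Row] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            names = line.lstrip("#").split()  # column names from the header line
            continue
        values = [float(v) for v in line.split()]
        if len(values) <= n_features:
            raise ValueError(f"expected more than {n_features} columns, got {len(values)}")
        rows.append((values[:n_features], values[-1]))
    return names, rows


def split(rows: Sequence[Row], train_frac: float = 0.6, eval_frac: float = 0.2,
          seed: int = 0) -> Tuple[List[Row], List[Row], List[Row]]:
    """Shuffle rows reproducibly and cut them into training, evaluation and test sets.

    The test set receives whatever remains after the training and evaluation slices.
    """
    if train_frac < 0 or eval_frac < 0 or train_frac + eval_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_train = int(train_frac * len(shuffled))
    n_eval = int(eval_frac * len(shuffled))
    train = shuffled[:n_train]
    evaluation = shuffled[n_train:n_train + n_eval]
    test = shuffled[n_train + n_eval:]
    return train, evaluation, test
```

<p>For Dataset 1 and Dataset 3 this would be called with <code>n_features=8</code>; for Dataset 2 with <code>n_features=4</code>, converting the target to <code>int</code>. If no evaluation set is needed, pass <code>eval_frac=0</code>.</p>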
<br/>
******************************<br/>
<br/>
Dataset 2: classification problem.<br/>
# umg gmr rmi imz specClass<br/>
The first 4 columns contain the features (colors of stellar sources); the last column contains the target (spectroscopic classification).
Training, evaluation (if needed) and test sets must be extracted from this file.<br/>
The target values are (0, 1, 2, 3, 4, 6); class 5 (artifact) does not occur.<br/>
While a classification with one class per distinct target value would be preferable,
a coarser two-class classification (target = (0, 1, 2, 6) for stars, galaxies and unknown sources, versus target = (3, 4) for quasars) would also be interesting and useful.<br/>
Classes:<br/>
0 -> unknown source<br/>
1 -> star<br/>
2 -> galaxy<br/>
3 -> quasar<br/>
4 -> high-redshift quasar<br/>
5 -> artifact<br/>
6 -> late-type star<br/>
<br/>
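<p>The class codes and the suggested two-class grouping for Dataset 2 could be encoded as in the hypothetical Python sketch below. The names <code>SPEC_CLASS</code> and <code>coarse_label</code>, and the choice of 1 for quasars and 0 for every other class present, are assumptions for illustration, not part of the dataset. If the target is read as a floating-point value, convert it with <code>int()</code> first.</p>

```python
# specClass codes as listed for Dataset 2 (code 5 is listed but does not occur in the file).
SPEC_CLASS = {
    0: "unknown source",
    1: "star",
    2: "galaxy",
    3: "quasar",
    4: "high-redshift quasar",
    5: "artifact",
    6: "late-type star",
}

# Coarse grouping suggested for Dataset 2: quasars versus all other classes present.
QUASAR_CODES = frozenset({3, 4})
OTHER_CODES = frozenset({0, 1, 2, 6})


def coarse_label(spec_class: int) -> int:
    """Map a specClass code to 1 (quasar: 3 or 4) or 0 (codes 0, 1, 2, 6)."""
    if spec_class in QUASAR_CODES:
        return 1
    if spec_class in OTHER_CODES:
        return 0
    # Code 5 (artifact) and any unlisted code fall outside the two-class grouping.
    raise ValueError(f"specClass {spec_class} is not in the two-class grouping")
```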
******************************<br/>
<br/>
Dataset 3: regression problem.<br/>
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz zspec<br/>
This dataset is similar to Dataset 1, except that the sources are quasars rather than galaxies. The same rules apply.<br/>
<hr/>
<b>Notes</b>
<p>I am going to add more datasets... -- Ciro</p>
<p>
It seems to me that two of the points are intimately related: (3) the templates and (5) working with IVOA, meaning the choice of data models and formats. I see the following as interesting data objects for the KDD-IG:
</p>
<ul>
<li> Catalog of sources, each with ID, position, magnitudes, etc.
<li> Light curve of a source, with times, magnitudes, upper-limit observations, etc.
<li> Image, with WCS, calibration, etc.
</ul>
<p>
In each case there is an IVOA approach to these objects through the spectrum data model (VOTable + Utypes). There are also other formats for them that are not IVOA-approved. And of course there is always a call for stripped-down CSV ("just gimme the data"). The archive can offer many format choices, so which ones should they be?
</p><p>
One could also argue that formatting is easy once the data is in a database, which avoids committing to a specific syntax. But the *semantics* of that database schema must support the *semantics* of every proposed output format.
-- Roy
</p>
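<p>As a rough illustration of the point that syntax is cheap once the data sits in a structured store, the hypothetical Python sketch below renders the same rows in two ways: as a <code>#</code>-headed, whitespace-separated layout like the template files (the whitespace separation is an assumption) and as CSV. It only changes syntax; it does not address the semantic mapping (for example, Utypes in VOTable) raised in the note above, and no VOTable writer is attempted. The function names are illustrative.</p>

```python
import csv
import io
from typing import Sequence


def to_template_text(names: Sequence[str], rows: Sequence[Sequence[float]]) -> str:
    """Render rows as a '#' header line followed by whitespace-separated values."""
    lines = ["# " + " ".join(names)]
    lines.extend(" ".join(str(v) for v in row) for row in rows)
    return "\n".join(lines) + "\n"


def to_csv_text(names: Sequence[str], rows: Sequence[Sequence[float]]) -> str:
    """Render the same rows as CSV, with the column names as the first row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # '\n' instead of the default '\r\n'
    writer.writerow(names)
    writer.writerows(rows)
    return buf.getvalue()
```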
<br/>
<!--
      * Set ALLOWTOPICRENAME = IVOA.TWikiAdminGroup
-->
Topic revision: r9 - 2011-05-04 - CiroDonalek