Revision 10 - 2012-06-26 - root

META TOPICPARENT	name="IvoaKDD"

IVOA KDD-IG: Template datasets for algorithm benchmarking

Who's interested?

RoyWilliams
CiroDonalek
RaffaeleDAbrusco

ftp://ftp.astro.caltech.edu/users/donalek/DM_templates/

reg_class: data from SDSS-DR7.

Dataset 1: regression problem.
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz z
The first 8 columns contain the features (color errors and colors of galaxies); the last contains the target (spectroscopic redshift). Any combination of the first 8 columns can be used. Training, evaluation (if needed) and test sets must be extracted from this file.
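A minimal sketch of how the three subsets could be drawn from the single file, assuming whitespace-separated numeric rows under the `#` header shown above. The function names, the 60/20/20 fractions and the fixed seed are illustrative assumptions, not part of the dataset definition.

```python
import random

# Column order taken from the "#" header line of dataset 1.
COLUMNS = ["err_umg", "err_gmr", "err_rmi", "err_imz",
           "umg", "gmr", "rmi", "imz", "z"]


def parse_rows(lines):
    """Parse whitespace-separated numeric rows, skipping '#' comments and blanks."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values = [float(v) for v in line.split()]
        if len(values) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
        rows.append(values)
    return rows


def split_rows(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle once, then cut into training, evaluation and test subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_eval = int(fractions[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_eval],
            shuffled[n_train + n_eval:])


def features_and_target(rows, feature_names=tuple(COLUMNS[:8])):
    """Select any subset of the 8 feature columns; the target is the last column."""
    idx = [COLUMNS.index(name) for name in feature_names]
    if COLUMNS.index("z") in idx:
        raise ValueError("the target column cannot be used as a feature")
    X = [[row[i] for i in idx] for row in rows]
    y = [row[-1] for row in rows]
    return X, y
```

Calling `features_and_target(train, ["umg", "gmr", "rmi", "imz"])` would, for instance, use the four colors only and ignore their errors.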

******************************

Dataset 2: classification problem.
# umg gmr rmi imz specClass
The first 4 columns contain the features (colors of stellar sources), the last column contains the target (spectroscopic classification). Training, evaluation (if needed) and test sets must be extracted from this file.
The target values are (0, 1, 2, 3, 4, 6).
While a classification into a separate class for each distinct target value would be preferable, a coarser classification into two classes (namely, target = (0,1,2,6) for stars and galaxies, and target = (3,4) for quasars) would be interesting and useful as well.
Classes:
0 -> unknown source
1 -> star
2 -> galaxy
3 -> quasars
4 -> high redshift quasars
5 -> artifact
6 -> late type stars
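
The class codes and the coarser two-class grouping can be written down as a small lookup. This is only a sketch: the names `SPEC_CLASS`, `coarse_label` and the 0/1 output labels are assumptions made here. Code 5 (artifact) is not among the listed target values and belongs to neither group, so the sketch returns `None` for it.

```python
# Class codes as listed above.
SPEC_CLASS = {
    0: "unknown source",
    1: "star",
    2: "galaxy",
    3: "quasar",
    4: "high redshift quasar",
    5: "artifact",
    6: "late type star",
}

# Two-class grouping described in the text.
STARS_AND_GALAXIES = frozenset({0, 1, 2, 6})
QUASARS = frozenset({3, 4})


def coarse_label(spec_class):
    """Return 0 for the stars-and-galaxies group, 1 for quasars, None otherwise."""
    if spec_class in STARS_AND_GALAXIES:
        return 0
    if spec_class in QUASARS:
        return 1
    return None  # e.g. 5 (artifact), which the text assigns to neither group
```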

******************************

Dataset 3: regression problem.
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz zspec
This dataset is similar to dataset 1, except that the sources are quasars rather than galaxies. The same rules apply.

Notes

I am going to add more datasets... -- Ciro

It would seem to me that these two points are intimately related: (3) the templates and (5) working with IVOA, meaning the choice of data models and formats. I would see the following as interesting data objects for the KDD-IG:

Catalog of sources, each with ID, position, magnitudes, ... more.
Light curve of source, with time, magnitude, upper-limit observations, ... more.
Image, with WCS, calibration, ... more.
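
As a rough illustration of the first two objects, here is a sketch using Python dataclasses. All field names (`source_id`, `ra_deg`, `time_mjd`, `is_upper_limit`, ...) are hypothetical choices made here, not an IVOA data model, and the image object is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogSource:
    source_id: str
    ra_deg: float
    dec_deg: float
    magnitudes: Dict[str, float] = field(default_factory=dict)  # band -> magnitude


@dataclass
class LightCurvePoint:
    time_mjd: float
    magnitude: float              # limiting magnitude when is_upper_limit is True
    is_upper_limit: bool = False  # True when the source was not detected


@dataclass
class LightCurve:
    source_id: str
    points: List[LightCurvePoint] = field(default_factory=list)

    def detections(self):
        """Points with a measured magnitude, excluding upper limits."""
        return [p for p in self.points if not p.is_upper_limit]
```

Keeping the upper-limit flag on each point, rather than dropping non-detections, lets an algorithm decide for itself whether to use them.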

In each case, there is an IVOA approach to these things through the spectrum data model (VOTable + Utypes). There are also other formats for these things that are not IVOA approved. And of course there is always a call for the stripped-down CSV (just gimme the data). The archive can have many format choices (so what are they?).

One could also say that the formatting is easy once the data is in the database, which avoids choice of a specific syntax. But the semantics of that database schema must enable the semantics of every proposed output format. -- Roy
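
To make the last point concrete, here is a sketch in which one column schema drives two output formats. The schema layout, the column names and both function names are assumptions made for illustration. The VOTable output is deliberately bare: it has no namespace declaration or Utypes and is not validated against the VOTable schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical schema: each column carries the metadata that the richest
# output format (here VOTable) needs; CSV simply ignores most of it.
SCHEMA = [
    {"name": "id", "datatype": "char", "arraysize": "*", "unit": None},
    {"name": "ra", "datatype": "double", "arraysize": None, "unit": "deg"},
    {"name": "dec", "datatype": "double", "arraysize": None, "unit": "deg"},
    {"name": "umg", "datatype": "double", "arraysize": None, "unit": "mag"},
]


def to_csv(rows):
    """Stripped-down CSV: a header line with column names, then the values."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col["name"] for col in SCHEMA])
    for row in rows:
        writer.writerow([row[col["name"]] for col in SCHEMA])
    return buf.getvalue()


def to_votable(rows):
    """Bare VOTable-style XML: FIELD elements carry datatype and unit."""
    votable = ET.Element("VOTABLE", version="1.2")  # version chosen for illustration
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    for col in SCHEMA:
        attrs = {"name": col["name"], "datatype": col["datatype"]}
        if col["arraysize"]:
            attrs["arraysize"] = col["arraysize"]
        if col["unit"]:
            attrs["unit"] = col["unit"]
        ET.SubElement(table, "FIELD", attrs)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for col in SCHEMA:
            ET.SubElement(tr, "TD").text = str(row[col["name"]])
    return ET.tostring(votable, encoding="unicode")
```

The CSV writer only needs the column names, while the VOTable writer also needs datatypes and units, so the schema has to store at least what the most demanding format asks for.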

Revision 9 - 2011-05-04 - CiroDonalek

Changed:

< reg_class: data from SDSS-DR7.
> reg_class: data from SDSS-DR7.
< Training, evaluation (if needed) and test sets must be extracted from this file.
> Training, evaluation (if needed) and test sets must be extracted from this file.
< Notes
> Notes

Revision 8 - 2011-05-04 - CiroDonalek

Changed:

< CiroDonalek
< RaffaeleDAbrusco
> CiroDonalek
> RaffaeleDAbrusco

Revision 7 - 2011-05-04 - RaffaeleDAbrusco

Added:

> RaffaeleDAbrusco

Revision 6 - 2011-05-04 - CiroDonalek

Added:

> ftp://ftp.astro.caltech.edu/users/donalek/DM_templates/

Also added: the descriptions of datasets 1 to 3 (reg_class, SDSS-DR7) and the Notes entry from Ciro, as they appear under Revision 10 above.

Revision 5 - 2010-10-13 - RaffaeleDAbrusco

Changed:

< Ciro Donalek
> CiroDonalek

Revision 4 - 2010-09-10 - RaffaeleDAbrusco

Added:

> RoyWilliams
> Ciro Donalek

Revision 3 - 2010-07-16 - RoyWilliams

Deleted:

< Who's interested?

Added: the note on data models and formats signed "-- Roy", as it appears under Revision 10 above.

Revision 2 - 2010-07-16 - RaffaeleDAbrusco

Added:

> IVOA KDD-IG: Template datasets for algorithm benchmarking
> Who's interested?

Revision 1 - 2010-07-16 - RaffaeleDAbrusco

META TOPICPARENT	name="IvoaKDD"

IVOA KDD-IG: Template datasets for algorithm benchmarking

Difference: IvoaKDDTemplate (1 vs. 10)
