Difference: IvoaKDDTemplate (1 vs. 10)

Revision 10 (2012-06-26) - root

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: Template datasets for algorithm benchmarking




Who's interested?

RoyWilliams
CiroDonalek
RaffaeleDAbrusco

ftp://ftp.astro.caltech.edu/users/donalek/DM_templates/

reg_class: data from SDSS-DR7.

Dataset 1: regression problem.
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz z
The first 8 columns contain the features (color errors and colors of galaxies); the last column contains the target (spectroscopic redshifts). Any combination of the first 8 columns can be used. Training, evaluation (if needed) and test sets must be extracted from this file.
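As a minimal sketch of how the three sets could be extracted, the snippet below parses rows laid out like the header above and splits them at random. It assumes the file is whitespace-separated with a leading `#` header line; the function names, the 60/20/20 fractions and the fixed seed are illustrative choices, not part of the dataset definition.

```python
import random

# Column names taken from the header line of Dataset 1.
COLUMNS = ["err_umg", "err_gmr", "err_rmi", "err_imz",
           "umg", "gmr", "rmi", "imz", "z"]


def parse_rows(lines):
    """Parse numeric rows, skipping blank lines and '#' comment lines.

    Assumption: values are separated by whitespace.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values = [float(tok) for tok in line.split()]
        if len(values) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} columns, got {len(values)}")
        rows.append(values)
    return rows


def split_rows(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle rows and cut them into train / evaluation / test lists.

    The fractions are an illustrative assumption; the test set receives
    whatever remains after the first two cuts.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_eval = int(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    evaluation = shuffled[n_train:n_train + n_eval]
    test = shuffled[n_train + n_eval:]
    return train, evaluation, test


def features_and_target(rows):
    """Separate the 8 feature columns from the redshift target (last column)."""
    features = [row[:-1] for row in rows]
    target = [row[-1] for row in rows]
    return features, target
```

With the real file, `parse_rows(open(path))` would be the entry point, where `path` is wherever the file from the FTP directory was saved. Shuffling before cutting avoids any ordering present in the file leaking into the split.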

******************************

Dataset 2: classification problem.
# umg gmr rmi imz specClass
The first 4 columns contain the features (colors of stellar sources), the last column contains the target (spectroscopic classification). Training, evaluation (if needed) and test sets must be extracted from this file.
The target values are (0, 1, 2, 3, 4, 6).
While a classification into a separate class for each distinct value of the target would be preferable, a coarser classification into two classes (namely, target = (0,1,2,6) for stars and galaxies, and target = (3,4) for quasars) would be interesting and useful as well.
Classes:
0 -> unknown source
1 -> star
2 -> galaxy
3 -> quasars
4 -> high redshift quasars
5 -> artifact
6 -> late type stars
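The two-class grouping described above can be sketched as a small lookup. The code numbers and class names come from the list; the 0/1 output labels and the function name are assumptions made for illustration.

```python
# Class codes and names as listed for Dataset 2.
CLASS_NAMES = {
    0: "unknown source",
    1: "star",
    2: "galaxy",
    3: "quasars",
    4: "high redshift quasars",
    5: "artifact",
    6: "late type stars",
}

# Coarse grouping from the text: (3, 4) versus (0, 1, 2, 6).
QUASAR_CODES = frozenset({3, 4})
NON_QUASAR_CODES = frozenset({0, 1, 2, 6})


def coarse_label(spec_class):
    """Map a specClass code to a binary label.

    Returns 1 for the (3, 4) group and 0 for the (0, 1, 2, 6) group.
    The 0/1 encoding is an illustrative assumption. Code 5 (artifact)
    is not among the listed target values, so it is rejected here.
    """
    if spec_class in QUASAR_CODES:
        return 1
    if spec_class in NON_QUASAR_CODES:
        return 0
    raise ValueError(f"unexpected specClass code: {spec_class}")
```

Rejecting unlisted codes rather than silently assigning them to a group makes it visible if a file ever contains a value outside the documented target set.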

******************************

Dataset 3: regression problem.
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz zspec
This dataset is similar to Dataset 1, except that the sources are quasars, not galaxies. The same rules apply.


Notes

I am going to add more datasets... -- Ciro

It seems to me that the following two points are intimately related: (3) the templates and (5) working with IVOA, meaning the choice of data models and formats. I see the following as interesting data objects for the KDD-IG:

  • Catalog of sources, each with ID, position, magnitudes, ... more.
  • Light curve of source, with time, magnitude, upper-limit observations, ... more.
  • Image, with WCS, calibration, ... more.

In each case, there is an IVOA approach to these things through the spectrum data model (VOTable + Utypes). There are also other formats for these things that are not IVOA approved. And of course there is always a call for the stripped-down CSV (just gimme the data). The archive can have many format choices (so what are they?).

One could also say that the formatting is easy once the data is in the database, which avoids choice of a specific syntax. But the semantics of that database schema must enable the semantics of every proposed output format. -- Roy


Revision 9 (2011-05-04) - CiroDonalek

 
Changed:
<
<

reg_class: data from SDSS-DR7.

>
>

reg_class: data from SDSS-DR7.

 
Changed:
<
<
Training, evaluation (if needed) and test sets must be extracted from this file.
>
>
Training, evaluation (if needed) and test sets must be extracted from this file.
 
Deleted:
<
<


Changed:
<
<
Notes
>
>
Notes
 


Revision 8 (2011-05-04) - CiroDonalek

 
Changed:
<
<
CiroDonalek
RaffaeleDAbrusco
>
>
CiroDonalek
RaffaeleDAbrusco
 

Revision 7 (2011-05-04) - RaffaeleDAbrusco

 
Added:
>
>
RaffaeleDAbrusco
 

Revision 6 (2011-05-04) - CiroDonalek

 

Added:
>
>
ftp://ftp.astro.caltech.edu/users/donalek/DM_templates/

reg_class: data from SDSS-DR7.

Dataset 1: regression problem.
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz z
The first 8 columns contain the features (color errors and colors of galaxies); the last column contains the target (spectroscopic redshifts). Any combination of the first 8 columns can be used. Training, evaluation (if needed) and test sets must be extracted from this file.
******************************

Dataset 2: classification problem.
# umg gmr rmi imz specClass
The first 4 columns contain the features (colors of stellar sources), the last column contains the target (spectroscopic classification). Training, evaluation (if needed) and test sets must be extracted from this file.
The target values are (0, 1, 2, 3, 4, 6).
While a classification into a separate class for each distinct value of the target would be preferable, a coarser classification into two classes (namely, target = (0,1,2,6) for stars and galaxies, and target = (3,4) for quasars) would be interesting and useful as well.

Classes:
0 -> unknown source
1 -> star
2 -> galaxy
3 -> quasars
4 -> high redshift quasars
5 -> artifact
6 -> late type stars

******************************

Dataset 3: regression problem.
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz zspec
This dataset is similar to dataset 1, except for the fact that the sources are quasars, not galaxies. Same rules apply.


Notes

I am going to add more datasets... -- Ciro

 


Revision 5 (2010-10-13) - RaffaeleDAbrusco

 
Changed:
<
<
Ciro Donalek
>
>
CiroDonalek
 


Revision 4 (2010-09-10) - RaffaeleDAbrusco

 

Added:
>
>
RoyWilliams
Ciro Donalek
 


Revision 3 (2010-07-16) - RoyWilliams

 




Deleted:
<
<
 

Who's interested?

Added:
>
>

It would seem to me that these points below are intimately related: (3) the templates and (5) working with IVOA, meaning choice of data models and formats. I would see the following as interesting data objects for the KDDIG:

  • Catalog of sources, each with ID, position, magnitudes, ... more
  • Light curve of source, with time, magnitude, upper-limit observations, ... more.
  • Image, with WCS, calibration, ...more.

In each case, there is an IVOA approach to these things through the spectrum data model (VOTable + Utypes). There are also other formats for these things that are not IVOA approved. And of course there is always a call for the stripped-down CSV (just gimme the data). The archive can have many format choices (so what are they?).

One could also say that the formatting is easy once the data is in the database, which avoids choice of a specific syntax. But the semantics of that database schema must enable the semantics of every proposed output format. -- Roy

 



Revision 2 (2010-07-16) - RaffaeleDAbrusco

 


Added:
>
>


 
Added:
>
>

Who's interested?

 



Revision 1 (2010-07-16) - RaffaeleDAbrusco

 
META TOPICPARENT name="IvoaKDD"

IVOA KDD-IG: Template datasets for algorithm benchmarking


