IvoaKDDTemplate (2011-05-04, CiroDonalek)
<H1>IVOA KDD-IG: Template datasets for algorithm benchmarking</H1>
<br/>
---
%TOC%
---
---++ Who's interested?
RoyWilliams<br/>
CiroDonalek<br/>
RaffaeleDAbrusco<br/>
<hr/>
ftp://ftp.astro.caltech.edu/users/donalek/DM_templates/
<p><b>reg_class: data from SDSS-DR7.</b><br/>
<br/>
Dataset 1: regression problem.<br/>
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz z<br/>
The first 8 columns contain the features (color errors and colors of galaxies); the last column contains the target (spectroscopic redshift). Any combination of the first 8 columns can be used as input. Training, evaluation (if needed) and test sets must be extracted from this file.<br/>
<br/>
******************************<br/>
<br/>
Dataset 2: classification problem.<br/>
# umg gmr rmi imz specClass<br/>
The first 4 columns contain the features (colors of stellar sources); the last column contains the target (spectroscopic classification). Training, evaluation (if needed) and test sets must be extracted from this file.<br/>
The target takes the values 0, 1, 2, 3, 4 and 6.<br/>
Ideally, each distinct target value would be treated as a separate class, but a coarser two-class classification (target = 0, 1, 2, 6 for stars, galaxies and unknown sources versus target = 3, 4 for quasars) would also be interesting and useful.<br/>
Classes:<br/>
0 -> unknown source<br/>
1 -> star<br/>
2 -> galaxy<br/>
3 -> quasar<br/>
4 -> high-redshift quasar<br/>
5 -> artifact<br/>
6 -> late-type star<br/>
<br/>
******************************<br/>
<br/>
Dataset 3: regression problem.<br/>
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz zspec<br/>
This dataset has the same structure as Dataset 1, except that the sources are quasars rather than galaxies and the target column is named zspec. The same rules apply.<br/>
<hr/>
<b>Notes</b>
<p>I am going to add more datasets... -- Ciro</p>
<p> It would seem to me that these two points are intimately related: (3) the templates and (5) working with IVOA, meaning the choice of data models and formats.
I would see the following as interesting data objects for the KDDIG: </p>
<ul>
<li> Catalog of sources, each with ID, position, magnitudes, ... more.
<li> Light curve of a source, with time, magnitude, upper-limit observations, ... more.
<li> Image, with WCS, calibration, ... more.
</ul>
<p> In each case, there is an IVOA approach to these objects through the spectrum data model (VOTable + Utypes). There are also other formats for them that are not IVOA-approved. And of course there is always a call for stripped-down CSV (just gimme the data). The archive can offer many format choices (so what are they?). </p><p> One could also say that formatting is easy once the data is in the database, which avoids committing to a specific syntax. But the *semantics* of that database schema must support the *semantics* of every proposed output format. -- Roy </p>
<br/>
<!-- * Set ALLOWTOPICRENAME = IVOA.TWikiAdminGroup -->
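<p>The template datasets listed above share one layout: a <code>#</code> header line naming the columns, then one source per row with the target in the last column. The sketch below assumes the rows are whitespace-separated numbers (the exact file layout on the FTP site is not stated on this page, so check it first) and shows one way to parse a file and extract reproducible training, evaluation and test sets. The function names and the 60/20/20 default split are illustrative choices, not part of the template specification.</p>

```python
import random
from typing import List, Sequence, Tuple

# One parsed source: (feature values, target value).
Row = Tuple[List[float], float]


def parse_template(text: str) -> List[Row]:
    """Parse whitespace-separated rows, treating the last column as the target.

    Assumption: the file is plain text with '#'-prefixed header lines followed
    by purely numeric columns, as the column listings on this page suggest.
    """
    rows: List[Row] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the column header
        values = [float(v) for v in line.split()]
        rows.append((values[:-1], values[-1]))
    return rows


def split_rows(rows: Sequence[Row], train: float = 0.6, evaluation: float = 0.2,
               seed: int = 0) -> Tuple[List[Row], List[Row], List[Row]]:
    """Shuffle reproducibly and cut into training, evaluation and test sets.

    Whatever remains after the training and evaluation fractions becomes the
    test set; pass evaluation=0.0 when no evaluation set is needed.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_train = int(len(shuffled) * train)
    n_eval = int(len(shuffled) * evaluation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_eval],
            shuffled[n_train + n_eval:])
```

<p>Applied to a local copy of the Dataset 1 file, <code>parse_template</code> should yield 8 features per row and the redshift as the target; Dataset 3 works the same way. Fixing the seed lets different groups benchmark their algorithms on identical splits.</p>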
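<p>For the coarse two-class version of Dataset 2, the specClass target can be collapsed before training. This minimal sketch encodes the grouping described for that dataset: codes 3 and 4 (quasars) become 1, and codes 0, 1, 2 and 6 become 0. The 1/0 convention and the function names are illustrative assumptions. Code 5 (artifact) appears in the class list but not among the target values, so the sketch rejects it rather than guessing a group.</p>

```python
from typing import Iterable, List

# Spectroscopic classes listed for Dataset 2 (specClass column).
SPEC_CLASSES = {
    0: "unknown source",
    1: "star",
    2: "galaxy",
    3: "quasar",
    4: "high-redshift quasar",
    5: "artifact",
    6: "late-type star",
}

# Coarse grouping from the dataset description: (3, 4) versus (0, 1, 2, 6).
QUASAR_CODES = frozenset({3, 4})
OTHER_CODES = frozenset({0, 1, 2, 6})


def to_binary(spec_class: float) -> int:
    """Map a specClass value to 1 (quasar) or 0 (star, galaxy or unknown).

    Accepts floats such as 3.0, as produced by a numeric parser. The 1/0
    labels are an illustrative convention, not part of the dataset.
    """
    code = int(spec_class)
    if code != spec_class:
        raise ValueError(f"specClass must be an integer code, got {spec_class!r}")
    if code in QUASAR_CODES:
        return 1
    if code in OTHER_CODES:
        return 0
    # Code 5 (artifact) and unlisted codes fall outside the two-class grouping.
    label = SPEC_CLASSES.get(code, "unlisted")
    raise ValueError(f"specClass {code} ({label}) is outside the two-class grouping")


def relabel(targets: Iterable[float]) -> List[int]:
    """Collapse a sequence of specClass values into binary labels."""
    return [to_binary(t) for t in targets]
```

<p>Keeping the full <code>SPEC_CLASSES</code> table next to the grouping makes it easy to switch back to the multi-class problem, which the dataset description names as the preferable goal.</p>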
Topic revision: r9 - 2011-05-04 - CiroDonalek
Copyright © 2008-2025 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.