<H1>IVOA KDD-IG: Template datasets for algorithm benchmarking</H1> <hr/>
<H2>Who's interested?</H2> RoyWilliams<br/> CiroDonalek<br/> RaffaeleDAbrusco<br/> <hr/>
ftp://ftp.astro.caltech.edu/users/donalek/DM_templates/
<p><b>reg_class: data from SDSS-DR7.</b><br/> <br/>
Dataset 1: regression problem.<br/>
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz z<br/>
The first 8 columns contain the features (color errors and colors of galaxies); the last column contains the target (spectroscopic redshift). Any combination of the first 8 columns can be used. Training, evaluation (if needed) and test sets must be extracted from this file.<br/> <br/>
******************************<br/> <br/>
Dataset 2: classification problem.<br/>
# umg gmr rmi imz specClass<br/>
The first 4 columns contain the features (colors of stellar sources); the last column contains the target (spectroscopic classification). Training, evaluation (if needed) and test sets must be extracted from this file.<br/>
The target values are (0, 1, 2, 3, 4, 6).<br/>
While it would be preferable to obtain a separate class for each distinct value of the target, a coarser classification into two classes (namely, target = (0,1,2,6) for stars and galaxies, and target = (3,4) for quasars) would be interesting and useful as well.<br/>
Classes:<br/>
0 -> unknown source<br/>
1 -> star<br/>
2 -> galaxy<br/>
3 -> quasar<br/>
4 -> high-redshift quasar<br/>
5 -> artifact<br/>
6 -> late-type star<br/> <br/>
******************************<br/> <br/>
Dataset 3: regression problem.<br/>
# err_umg err_gmr err_rmi err_imz umg gmr rmi imz zspec<br/>
This dataset is similar to Dataset 1, except that the sources are quasars, not galaxies. The same rules apply.<br/></p>
<hr/> <b>Notes</b> <p>I am going to add more datasets... -- Ciro</p> <p> It seems to me that the two points below are intimately related: (3) the templates and (5) working with IVOA, meaning the choice of data models and formats. 
I would see the following as interesting data objects for the KDDIG: </p> <ul> <li> Catalog of sources, each with ID, position, magnitudes, ... more</li> <li> Light curve of source, with time, magnitude, upper-limit observations, ... more</li> <li> Image, with WCS, calibration, ... more</li> </ul> <p> In each case, there is an IVOA approach to these things through the spectrum data model (VOTable + Utypes). There are also other formats for these things that are not IVOA approved. And of course there is always a call for the stripped-down CSV (just gimme the data). The archive can have many format choices (so what are they?). </p><p> One could also say that the formatting is easy once the data is in the database, which avoids the choice of a specific syntax. But the <b>semantics</b> of that database schema must enable the <b>semantics</b> of every proposed output format. -- Roy </p> <br/> <!-- * Set ALLOWTOPICRENAME = IVOA.TWikiAdminGroup -->
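<p>A minimal sketch of how the template datasets described above might be read, split, and relabelled. It assumes each file is plain text with a leading '#' line naming the columns, followed by whitespace-separated numeric rows; the page does not state the on-disk format, so this is an assumption. The names parse_table, split_rows and coarse_label, the 60/20/20 split, and the sample rows are hypothetical placeholders for illustration, not real SDSS values.</p>

```python
import random

# Coarse two-class grouping suggested in the Dataset 2 description.
# Class 5 (artifact) is not among the listed target values, so it is left unmapped.
NON_QUASAR = {0, 1, 2, 6}
QUASAR = {3, 4}


def parse_table(text):
    """Parse whitespace-separated rows; the first '#' line is assumed to name the columns."""
    names, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if names is None:
                names = line.lstrip("#").split()
            continue
        rows.append([float(tok) for tok in line.split()])
    return names, rows


def split_rows(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle reproducibly, then cut into training / evaluation / test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_eval = int(fractions[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_eval],
            shuffled[n_train + n_eval:])


def coarse_label(spec_class):
    """Map a specClass value to 0 (classes 0, 1, 2, 6) or 1 (classes 3, 4)."""
    code = int(spec_class)
    if code in QUASAR:
        return 1
    if code in NON_QUASAR:
        return 0
    raise ValueError(f"unexpected specClass {code}")


# Made-up rows in the Dataset 2 column layout, for illustration only.
SAMPLE = """# umg gmr rmi imz specClass
1.10 0.40 0.15 0.05 1
0.20 0.10 0.05 0.02 3
1.80 0.90 0.40 0.30 2
0.30 0.05 0.10 0.01 4
2.50 1.40 0.90 0.50 6
"""

names, rows = parse_table(SAMPLE)
train, evaluation, test = split_rows(rows)
labels = [coarse_label(row[-1]) for row in rows]
```

<p>Fixing the shuffle seed keeps the three subsets reproducible, which matters if different algorithms are to be benchmarked on the same split. The same parse_table and split_rows helpers would apply to Datasets 1 and 3, with the last column (z or zspec) used as the regression target instead of a class label.</p>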
Topic revision: r9 - 2011-05-04 - CiroDonalek
Copyright © 2008-2025 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.