=================================================================== From: Pedro Osuna Reply-To: Pedro Osuna To: dm@ivoa.net Cc: Pedro.Osuna@esa.int, Christophe.Arviset@esa.int, Jesus.Salgado@sciops.esa.int Subject: [CATALOGUE]Starting Data Model Subgroup Date: Mon, 26 Jul 2004 12:48:16 +0200 Dear all, at the last IVOA meeting in Cambridge, Boston, I approached Jonathan to get the vacant responsibility of coordinating the efforts in a "Catalogue" subgroup of the Data Model. After his agreement, I'm sending this note to ask for volunteers to join this subgroup and start getting inputs from all of you to compile information that would eventually become a Catalogue Data Model recommendation. In order to give a bit of flesh on what I understand we are after, I send you some brainstorming on the whole idea of the Catalogue DM and hope it serves to start proper discussions on the issue. Further mails on this will appear with a [CATALOGUE] heading so that they can be conveniently filtered/trashed. Thank you. Cheers, Pedro Osuna. Catalogue Data Model Subgroup starting inputs --------------------------------------------- In order to build a proper Data Model for Catalogues, I think it would be important to answer the following questions: 1) What is a Catalogue? 2) What is a Catalogue used for? 3) Why do we want to model Catalogues? 4) Where do Catalogues find a place within the VO? 5) What are the interesting Use Cases for a Catalogue DM? The most important in the first stages of this work is to identify what exactly we mean by a Catalogue, to come to a common agreement on what we will be modeling. Some of my own views on the definition of what a Catalogue is follow with the idea to serve as a starting/discussion point. 
DEFINITION OF A CATALOGUE
-------------------------

From "Webster's Revised Unabridged Dictionary (1913)":

"[...]A list or enumeration of names, or articles arranged methodically, often in alphabetical order; as, a catalogue of the students of a college, or of books, or of the stars.[...]"

In the case of astronomy, thus, a catalogue would be a list or enumeration of certain astronomical objects (to be clarified later) in a certain order and including certain information per object.

The definition of an astronomical object in this context would vary. An astronomical object could be anything from Stars to Galaxies, etc., but also something more general like Observations, Sources or Observatories.

In this sense, the Catalogue data model would not have to describe the inner details of the object it is cataloging (those should be described in other data models), but just the information relevant for the catalogue itself.

It is also true that some of the internal properties of the astronomical objects would appear in the catalogue itself through its columns.

For example, the XMM-Newton "1XMM" is a list of serendipitous sources detected by the satellite in its observing campaign. The model for this catalogue could consist of things like the provenance (ESA), number of columns (400), number of rows (~32000), etc., or it might give more relevant information like: column number three in the catalogue is the Source.likelihood, where likelihood is an attribute of the Source Data Model. I think this is an interesting point for discussion.....

A place to find literally thousands of catalogues is the CDS, where they have 5587 Catalogues available. Their classification of the catalogues follows the type of data they are cataloging, e.g., Astrometric Data, Photometric data, Spectroscopic data, etc. The same question as above arises: whether we would have to create a specific data model for each of the eventual astronomical object categories we are cataloging.
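The 1XMM example above can be made a little more concrete with a small sketch. It is only one possible reading of the idea, not a proposed model: the class names `Catalogue` and `Column`, the field `dm_attribute`, and the column names `col1` to `col3` are all hypothetical. Only the provenance, the approximate row count, and the link from column three to Source.likelihood come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str                           # column label as it appears in the catalogue
    datatype: str                       # e.g. "double" or "char"
    dm_attribute: Optional[str] = None  # link into another data model, if any


@dataclass
class Catalogue:
    name: str
    provenance: str
    n_rows: int
    columns: List[Column] = field(default_factory=list)

    def columns_linked_to(self, dm_class: str) -> List[Column]:
        """Return the columns whose values are attributes of the given data model class."""
        prefix = dm_class + "."
        return [c for c in self.columns
                if c.dm_attribute is not None and c.dm_attribute.startswith(prefix)]


# Hypothetical column names; only the link for column three comes from the text.
example = Catalogue(
    name="1XMM",
    provenance="ESA",
    n_rows=32000,  # approximate figure quoted in the text
    columns=[
        Column("col1", "char"),
        Column("col2", "double"),
        Column("col3", "double", dm_attribute="Source.likelihood"),
    ],
)

linked = example.columns_linked_to("Source")
print([c.name for c in linked])
```

The point of the sketch is that the catalogue model itself stays small (provenance, size, column descriptions), while the meaning of a column is delegated to whichever data model it points at.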
It would be nice, in passing, to get someone from CDS directly involved in this subgroup, given their experience in catalogues. A point to clarify as well is whether a catalogue -in the Data Model sense- has to be bi-dimensional or can have more than two dimensions. What I mean by this is that, for example, we might have two different catalogues for the same set of objects, one for filter A and the other one for filter B. In the Data Model, however, we might have a unique object with just three axes, one for the objects, other for filter A and the other for filter B. The final representations of the catalogues would always be bi-dimensional, but a Data Model representation allowing more axes would be more compact, powerful and flexible. Whether this would be a Pandora box or not I hope to get people's impressions.... In summary, there are obvious things to model from a catalogue, like its provenance, number of columns, type of columns, names of columns, number of rows, etc., but there are others which might make the model more interesting and powerful, like including n-dimensions (in the, let's say, cartesian sense of orthogonal catalogues, not in a relational one) or linking the objects cataloged with their own data model.... Hope this serves somehow as a starting point. I will be on holiday, back on Aug 23., then I'll process any eventual inputs you sent. P.S.: on a personal note to me, Jonathan was touching on the issue of whether we should say CATALOG or CATALOGUE, and the same for other IVOA standard docs, whether we should use British or American english. Not being a native speaker, I don't feel with the right to say anything and apologize beforehand because of my absence of accuracy when writing this, and other, word(s). -- Pedro Osuna Alcalaya Software Engineer European Space Astronomy Center (ESAC/ESA) e-mail: Pedro.Osuna@esa.int Tel + 34 91 8131314 European Space Agency VILLAFRANCA Satellites Tracking Station P.O. 
Box 50727
E-28080 Villafranca del Castillo
MADRID - SPAIN

=============================================================================
From: Matthew Graham
To: Pedro Osuna
Cc: dm@ivoa.net, Pedro.Osuna@esa.int, Christophe.Arviset@esa.int, Jesus.Salgado@sciops.esa.int
Subject: Re: [CATALOGUE]Starting Data Model Subgroup
Date: Mon, 26 Jul 2004 08:31:16 -0700 (PDT)

Hi Pedro,

I am curious how you see the difference between the Catalogue DM and VOTable? Surely, at least, VOTable is the XML serialization of whatever it is that the Catalogue DM group come up with, so isn't this more a case of reverse engineering?

Cheers,

Matthew

=============================================================================
From: Jonathan McDowell
Reply-To: Jonathan McDowell
To: dm@ivoa.net
Subject: Re: [CATALOGUE]Starting Data Model Subgroup
Date: Mon, 26 Jul 2004 11:50:43 -0400 (EDT)

> VOTable is the XML serialization of whatever it is that the Catalogue DM group come up with, so isn't this more a case of reverse engineering?

No - the other working groups have made it clear they have requirements for a standard model for astronomical source catalogs. The VOTable is a serialization of a simple table, but an astronomical catalog is more than a table - it will have extra standard metadata linking the sources to their parent observations and extraction algorithms, for instance. One output of the Catalogue data model effort will certainly be a more formal statement of the VOTable model, but another output will be recommendations for ways to serialize this extra metadata in VOTable (particular PARAM and FIELD values for certain things, for example). And yet another output will be an XML schema for those who prefer to use generic XML, although in the particular case of catalogs I hope that a VOTable-based serialization will be the preferred approach. But just saying "write a VOTable" is not a sufficient spec.
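As a rough illustration of what "particular PARAM and FIELD values" might look like, here is a minimal sketch that builds a simplified VOTable-like document with Python's standard library. The element names (VOTABLE, RESOURCE, TABLE, PARAM, FIELD, DATA, TABLEDATA, TR, TD) are VOTable elements; everything else is an assumption made for the example: the PARAM names `extraction_algorithm` and `parent_observation`, their values, the UCD strings, and the omission of namespace and version attributes. No agreed convention is implied.

```python
import xml.etree.ElementTree as ET


def build_catalogue_votable(rows):
    """Build a simplified VOTable-like tree carrying catalogue-level metadata."""
    votable = ET.Element("VOTABLE")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", name="sources")

    # Catalogue-level metadata as PARAM elements (hypothetical names and values).
    ET.SubElement(table, "PARAM", name="extraction_algorithm",
                  datatype="char", arraysize="*", value="hypothetical-detector")
    ET.SubElement(table, "PARAM", name="parent_observation",
                  datatype="char", arraysize="*", value="obs-0001")

    # Per-source columns as FIELD elements (UCD strings are illustrative).
    ET.SubElement(table, "FIELD", name="id", datatype="char", arraysize="*")
    ET.SubElement(table, "FIELD", name="ra", datatype="double",
                  unit="deg", ucd="pos.eq.ra")
    ET.SubElement(table, "FIELD", name="dec", datatype="double",
                  unit="deg", ucd="pos.eq.dec")

    # One TR per source, one TD per FIELD, in FIELD order.
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return votable


doc = build_catalogue_votable([("src-1", 10.5, -3.25)])
print(ET.tostring(doc, encoding="unicode"))
```

The table body is ordinary VOTable; the "more than a table" part lives entirely in which PARAMs are required and what they mean, which is exactly what a bare "write a VOTable" instruction leaves unspecified.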
I hope the CDS folks can say a little about how a Vizier README is converted to VOTable, and others can comment on how pipeline-generated catalogs should be recorded, and what extra metadata (wavelet scales, data characterization like wavelength band, etc) are appropriate. - Jonathan ============================================================================= From: Brian Thomas To: Pedro Osuna Cc: Pedro.Osuna@esa.int, Christophe.Arviset@esa.int, Jesus.Salgado@sciops.esa.int Subject: Re: [CATALOGUE]Starting Data Model Subgroup Date: Mon, 26 Jul 2004 12:06:07 -0400 On Monday 26 July 2004 06:48 am, Pedro Osuna wrote: > Dear all, > > at the last IVOA meeting in Cambridge, Boston, I approached Jonathan to > get the vacant responsibility of coordinating the efforts in a > "Catalogue" subgroup of the Data Model. > > After his agreement, I'm sending this note to ask for volunteers to join > this subgroup and start getting inputs from all of you to compile > information that would eventually become a Catalogue Data Model > recommendation. > Hi Pedro, I have been working on catalog schema for some time, most recently for NOAO survey data. I am interested in belonging to this subgroup. =b.t. -- * Dr. Brian Thomas * Dept of Astronomy/University of Maryland-College Park * NOAO Science Archive * Code 630.1/Goddard Space Flight Center-NASA * fax: (301) 286-1775 * phone: (301) 286-6128 [GSFC] (301) 405-2312 [UMD] ============================================================================= From: Roy Williams Reply-To: Roy Williams To: Jonathan McDowell , dm@ivoa.net Subject: Re: [CATALOGUE]Starting Data Model Subgroup Date: Mon, 26 Jul 2004 11:37:34 -0700 > The VOTable > is a serialization of a simple table, but an astronomical catalog > is more than a table - it will have extra standard metadata linking > the sources to their parent observations and extraction algorithms, > for instance. You will use inheritance, I hope. Not just build everything from scratch? 
We already have a simple example. A ConeSearchResponse inherits from the VOTable model. It is a VOTable that must have RA, Dec, and ID attributes. We can also inherit curatedTable from Table by adding the VOResource curation information. Please tell me you are not going to rebuild all this stuff that we already have....? Roy ============================================================================= From: Kirk Borne Reply-To: Kirk Borne To: roy@caltech.edu Cc: Jonathan McDowell , dm@ivoa.net, Kirk Borne (at George Mason University) Subject: Re: [CATALOGUE]Starting Data Model Subgroup Date: Mon, 26 Jul 2004 15:11:00 -0400 (EDT) A catalogue is not a table, generally speaking. It can be expressed as a table or as a set of many tables, but that is not the point. A catalogue is a set of derived data: derived from imaging, spectral, event lists, time series, interferometric, or other types of data. The data model must describe not only the structure of the "table", including attributes and values, but also inheritance, provenance, semantics, error columns, and more (units, formats, column-column relationships, table-table relationships). We will *not* have to rebuild all this stuff that we already have, if we simply recognize the work of the former-ADC group, which started building the catalogue model already, as expressed in "dataset"... http://xml.gsfc.nasa.gov/#dataset Furthermore, UCDs already provide the semantics. - Kirk > From owner-dm@eso.org Mon Jul 26 14:51:05 2004 > From: "Roy Williams" > To: "Jonathan McDowell" , > Subject: Re: [CATALOGUE]Starting Data Model Subgroup > Date: Mon, 26 Jul 2004 11:37:34 -0700 > > > The VOTable > > is a serialization of a simple table, but an astronomical catalog > > is more than a table - it will have extra standard metadata linking > > the sources to their parent observations and extraction algorithms, > > for instance. > > You will use inheritance, I hope. Not just build everything from scratch? 
> > We already have a simple example. A ConeSearchResponse inherits from the > VOTable model. It is a VOTable that must have RA, Dec, and ID attributes. > > We can also inherit curatedTable from Table by adding the VOResource > curation information. > > Please tell me you are not going to rebuild all this stuff that we already > have....? > > Roy > ============================================================================= From: Arnold Rots Reply-To: Arnold Rots To: dm@ivoa.net Subject: Re: [CATALOGUE]Starting Data Model Subgroup Date: Mon, 26 Jul 2004 15:29:23 -0400 (EDT) That's correct, it's a set of derived data - often in the past, for practical reasons, printed as a table. But now that we are much more flexible in choosing our publication forms, there may be links to data objects from which parameter values may be derived on the fly and according to the user's specifications. Or, for that matter, the publication form may be graphical, or a spreadsheet. In other words, we now have the ability to create truly interactive catalogs and it would be a pity to constrain ourselves by what a catalog(ue) used to look like. And, by the way, STC contains space-time coordinate metadata elements specifically intended for catalog records. - Arnold Kirk Borne wrote: > A catalogue is not a table, generally speaking. It can be expressed > as a table or as a set of many tables, but that is not the point. > A catalogue is a set of derived data: derived from imaging, spectral, > event lists, time series, interferometric, or other types of data. > The data model must describe not only the structure of the "table", > including attributes and values, but also inheritance, provenance, > semantics, error columns, and more (units, formats, column-column > relationships, table-table relationships). 
> > We will *not* have to rebuild all this stuff that we already have, > if we simply recognize the work of the former-ADC group, which started > building the catalogue model already, as expressed in "dataset"... > > http://xml.gsfc.nasa.gov/#dataset > > Furthermore, UCDs already provide the semantics. > > - Kirk > > > > From owner-dm@eso.org Mon Jul 26 14:51:05 2004 > > From: "Roy Williams" > > To: "Jonathan McDowell" , > > Subject: Re: [CATALOGUE]Starting Data Model Subgroup > > Date: Mon, 26 Jul 2004 11:37:34 -0700 > > > > > The VOTable > > > is a serialization of a simple table, but an astronomical catalog > > > is more than a table - it will have extra standard metadata linking > > > the sources to their parent observations and extraction algorithms, > > > for instance. > > > > You will use inheritance, I hope. Not just build everything from scratch? > > > > We already have a simple example. A ConeSearchResponse inherits from the > > VOTable model. It is a VOTable that must have RA, Dec, and ID attributes. > > > > We can also inherit curatedTable from Table by adding the VOResource > > curation information. > > > > Please tell me you are not going to rebuild all this stuff that we already > > have....? > > > > Roy > > > > > -------------------------------------------------------------------------- Arnold H. 
Rots Chandra X-ray Science Center Smithsonian Astrophysical Observatory tel: +1 617 496 7701 60 Garden Street, MS 67 fax: +1 617 495 7356 Cambridge, MA 02138 arots@head.cfa.harvard.edu USA http://hea-www.harvard.edu/~arots/ -------------------------------------------------------------------------- ============================================================================= From: Matthew Graham Reply-To: Matthew Graham To: Kirk Borne Cc: roy@caltech.edu, Jonathan McDowell , dm@ivoa.net, Kirk Borne (at George Mason University) Subject: Re: [CATALOGUE]Starting Data Model Subgroup Date: Mon, 26 Jul 2004 12:32:16 -0700 (PDT) Hi, > A catalogue is not a table, generally speaking. It can be expressed > as a table or as a set of many tables, but that is not the point. > A catalogue is a set of derived data: derived from imaging, spectral, > event lists, time series, interferometric, or other types of data. > The data model must describe not only the structure of the "table", > including attributes and values, but also inheritance, provenance, > semantics, error columns, and more (units, formats, column-column > relationships, table-table relationships). This really just sounds like defining the VOTable superclass. If there is more to it than that might I suggest changing the name of this DM from Catalogue to Derived Data Set or something similar because when I think of a catalogue, I think of a tabulated structure, be it for books in my local library or lingerie from Victoria's Secret. Cheers, Matthew ============================================================================= From: Gerard Lemson Reply-To: Gerard Lemson To: Matthew Graham Cc: dm@ivoa.net Subject: RE: [CATALOGUE]Starting Data Model Subgroup Date: Tue, 27 Jul 2004 10:57:56 +0200 Hi, > > A catalogue is not a table, generally speaking. It can be expressed > > as a table or as a set of many tables, but that is not the point. 
> > A catalogue is a set of derived data: derived from imaging, spectral, > > event lists, time series, interferometric, or other types of data. > > The data model must describe not only the structure of the "table", > > including attributes and values, but also inheritance, provenance, > > semantics, error columns, and more (units, formats, column-column > > relationships, table-table relationships). > > This really just sounds like defining the VOTable superclass. If there is > more to it than that might I suggest changing the name of this DM from > Catalogue to Derived Data Set or something similar because when I think of > a catalogue, I think of a tabulated structure, be it for books in my local > library or lingerie from Victoria's Secret. > The Catalogue data model will hopefully provide a formal way to describe an astronomical catalogue "completely". This will include more than simply saying it is a table with a certain number of columns whose meaning can sometimes be described by some string expression. The fact that some catalogue instances can be serialized into such a structure does not mean that a catalogue "is-a" (VO)table, or a (VO)table "is-a" catalogue. It is precisely the task of the DM group to define ways by which one can distinguish different tabular data structures by providing means for describing their contents, i.e. by providing a model for the meta-data that should be attached to the structure. To stay with your metaphor, such a description will help you make the right choice in the morning so that you will come to work wearing a boxer short instead of the latest Harry Potter. 
Cheers

Gerard

=============================================================================
From: Pierre Didelon
Reply-To: Pierre Didelon
To: dm@ivoa.net
Cc: Kirk Borne , Kirk Borne (at George Mason University)
Subject: Re: [CATALOGUE]Starting Data Model Subgroup
Date: Tue, 27 Jul 2004 11:14:07 +0200

Hi everybody,

some comments concerning provenance/history handling, a little bit off the main subject of this thread, but which may (perhaps) have a deep impact on catalog design.

Even if it is obvious, as claimed by Pedro Osuna ("In summary, there are obvious things to model from a catalogue, like its provenance..."), it can be complex depending on the level considered.

In fact a catalog history can be handled at different levels:
 - first, it can be related to the whole catalog itself
   -> one catalog - one provenance/history
 - it can be related to a column or a row
   -> one row/column - one provenance/history
 - it can be related to a cell
   -> one cell - one provenance/history
 - or it can even be related to a group of cells, a sub-cube in the catalog cube (of possibly n dim.)
   -> one sub-cube/group_of_data - one provenance/history

One obvious and simple example of this is illustrated by all RA, DEC, ErrRA, ErrDEC obtained with one astrometric calibration for a certain set of astronomical objects; in this case this sub-cube (4 cols * n rows) has a common history which can be different from that of a photometric data part of the (same?) catalog.

I remember that I had a fruitful conversation with Pat Dowler in Cambridge (UK) concerning this subject, and their related experience at CADC. Maybe he can add some comments on this subject.

But it can already be seen that, depending on the granularity at which we want to handle provenance/history, the implementation may be different and more or less complex.

Kirk Borne wrote:
> A catalogue is not a table, generally speaking. It can be expressed
> as a table or as a set of many tables, but that is not the point.
> A catalogue is a set of derived data: derived from imaging, spectral,
> event lists, time series, interferometric, or other types of data.

Yes. But how the derivation is made, and how it is kept in (or with) the catalog, is not always identical; it depends on the kind of catalog. I remember an article by C. Jaschek making a kind of classification of catalog types, from Observation Catalog to Compilation Catalog of data collections, merging and homogenisation:
http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1984QJRAS..25..259J&db_key=AST&high=3d9c6cf76d17675
Some of its considerations may be of interest, as well as some from another article concerning information and catalogues:
http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1973IAUS...50..275J&db_key=AST&high=3d9c6cf76d17675

> The data model must describe not only the structure of the "table",
> including attributes and values, but also inheritance, provenance,

See my remarks above.

> semantics, error columns, and more (units, formats, column-column
> relationships, table-table relationships).
>
> We will *not* have to rebuild all this stuff that we already have,
> if we simply recognize the work of the former-ADC group, which started
> building the catalogue model already, as expressed in "dataset"...
>
> http://xml.gsfc.nasa.gov/#dataset

As well as the CDS ReadMe and all the former catalog descriptions they use (see http://vizier.u-strasbg.fr/doc/catstd.htx). All this, including VOTable, must guide the DM group, but must not freeze the elaboration of the Catalogue DM.

>
> Furthermore, UCDs already provide the semantics.
> > - Kirk > SY -- Pierre -------------------------------------------------------------------------- DIDELON :@: pdidelon_at_cea.fr Phone : 33 (0)1 69 08 58 89 CEA SACLAY - Service d'Astrophysique 91191 Gif-Sur-Yvette Cedex -------------------------------------------------------------------------- ============================================================================= From: Gerard Lemson Reply-To: Gerard Lemson To: dm@ivoa.net Subject: RE: [CATALOGUE]Starting Data Model Subgroup Date: Tue, 27 Jul 2004 13:18:45 +0200 > > VOTable is the XML serialization of whatever it > is that the Catalogue DM group come up with, so isn't this more a case of > reverse engineering? > > No - the other working groups have made it clear they have requirements > for a standard model for astronomical source catalogs. The VOTable > is a serialization of a simple table, but an astronomical catalog > is more than a table - it will have extra standard metadata linking > the sources to their parent observations and extraction algorithms, > for instance. One output of the Catalogue data model effort will > certainly be a more formal statement of the VOTable model, but another > output will be recommendations for ways to serialize this extra metadata > in VOTable (particular PARAM and FIELD values for certain things, for > example). And yet another output will be an XML schema for those who > prefer to use generic XML, although in the particular case of catalogs > I hope that a VOTable-based serialization will be the preferred > approach. But just saying "write a VOTable" is not a sufficient spec. > I hope the CDS folks can say a little about how a Vizier README is > converted to VOTable, and others can comment on how pipeline-generated > catalogs should be recorded, and what extra metadata (wavelet scales, > data characterization like wavelength band, etc) are appropriate. > I fully agree with Jonathan here, but would like to add some comments. 
I think one of the things that is often not realized is the fact that the DM WG needs to provide models for the meta-data describing the contents of some data product, as well as models for the data themselves. For example, the Observation model is mainly a model for the meta-data describing the results of an observation. This is more than describing how the data is stored and/or formatted. The latter may be done using the Quantity model, I guess.

Secondly, it still seems that people confuse the act of defining a data model with that of defining representations/serializations of the data model applicable to a particular runtime environment within which one wants to deal with instances of the data model, be that messaging (XML), a Java virtual machine or a relational database. Defining such serializations is, or should be, part of the DM WG's tasks.

In the data modeling effort it *is* extremely useful to look at existing data models, even if only implicitly represented in particular serializations, if only to see which concepts, entities, attributes and relationships others have thought of already and should therefore probably be incorporated into the IVOA data model. One cannot, however, insist in advance that the data model itself should be tied to some existing representation, as this may be unsuitable for representations that must work in a different environment.

Even when we interpret some of the comments in the context of the definition of a serialization, I think we should not predefine *how* exactly to use the results of existing efforts. For example, I see no a priori reason why we should follow Roy's suggestion to "use inheritance". Inheritance is only one way in which the results of the VOTable/conesearch can be reused. Data modeling languages allow many different types of relations between entities, and in fact inheritance is the one most often abused.
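The difference between the two kinds of reuse can be sketched in a few lines. This is only an illustration under assumed names: `Table`, `CuratedTable` and `Catalogue` are hypothetical classes written for this example (loosely echoing the "curatedTable from Table" suggestion earlier in the thread), not part of any existing IVOA model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    columns: List[str]
    rows: List[tuple] = field(default_factory=list)


# Option 1: inheritance ("is-a"). A curated table is a table plus curation info.
@dataclass
class CuratedTable(Table):
    curation: Dict[str, str] = field(default_factory=dict)


# Option 2: composition ("has-a"). A catalogue owns curation info and any
# number of tables, without itself being a table.
@dataclass
class Catalogue:
    curation: Dict[str, str]
    tables: List[Table] = field(default_factory=list)


curated = CuratedTable(columns=["id", "ra", "dec"], curation={"publisher": "example"})

catalogue = Catalogue(
    curation={"publisher": "example"},
    tables=[
        Table(columns=["id", "ra", "dec"]),
        Table(columns=["id", "band", "mag", "mag_err"]),
    ],
)

print(isinstance(curated, Table))    # the inherited form can stand in for a Table
print(isinstance(catalogue, Table))  # the composed form cannot, by design
print(len(catalogue.tables))
```

With inheritance, everything that accepts a `Table` also accepts a `CuratedTable`, which is convenient but commits the model to "a catalogue is a table". With composition, the table work is still reused, while a catalogue made of several tables remains expressible.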
Cheers Gerard ============================================================================= From: Ed Shaya To: Pedro Osuna , Data Model IVOA List Subject: Re: [CATALOGUE]Starting Data Model Subgroup Date: Mon, 02 Aug 2004 16:04:30 -0400 Pedro Osuna wrote: >Dear all, > >at the last IVOA meeting in Cambridge, Boston, I approached Jonathan to >get the vacant responsibility of coordinating the efforts in a >"Catalogue" subgroup of the Data Model. > > It is great that someone is taking this on! > > > >DEFINITION OF A CATALOGUE >------------------------- > >>From "Webster's Revised Unabridged Dictionary (1913)": > >"[...]A list or enumeration of names, or articles arranged >methodically, often in alphabetical order; as, a catalogue of >the students of a college, or of books, or of the stars.[...]" > > >In the case of astronomy, thus, a catalogue would be a list or >enumeration of certain astronomical objects (to be clarified later) in a >certain order and including certain information per object. > >The definition of an astronomical object in this context would vary. >An astronomical object could be anything from Stars to Galaxies, etc., >but also something more general like Observations, Sources or >Observatories. > >In this sense, the Catalogue data model would not have to describe the >inner details of the object it is cataloging, that should be described >in other data models, but just the information relevant for the >catalogue itself. > I agree with everything up to this point. >It is also true that some of the internal properties >of the astronomical objects would appear in the catalogue itself through >its columns. > Here. The mere mention of columns is, in my opinion, out of place. The concept of rows and columns should not appear in any component of our data model. They belong in a relational database data model. Here I think we are working on a more abstract level in which objects may contain other objects. This results in tree-like structures. 
We should worry about transformation into a set of interrelated relational tables only after the VO data model for this is complete. I believe that Roy correctly chimed in that VOTable can already do this only because Pedro incorrectly brought up the issue of describing rows and columns.

>
>For example, the XMM-Newton "1XMM" is a list of serendipitous sources
>detected by the satellite in its observing campaign. The model for this
>catalogue could consist of things like the provenance (ESA), number of
>columns (400) number of rows (~32000), etc., or it might give more
>relevant information like: column number three in the catalogue is the
>Source.likelihood where likelihood is an attribute of the Source Data
>Model.
>I think this is an interesting point for discussion.....
>
>
>

A catalog should be a list of sourceObjects which holds/contains/aggregates Quantities. The quantities should be allowed to be of arbitrary depth and detail. That is, one should be free to enter QuantitySets of QuantitySets. To make this more concrete, let's talk about a general catalog of galaxies. We wish to provide at a minimum basic data about each galaxy (i.e. simple quantities: magnitudes, ra, dec, morphological class). Also, one wants the Observations of each galaxy, such as Image. We may just want to hold crucial metadata about each image (exposure time, ra, dec, filter) and perhaps a URL to the actual data. But we may want to group these images into various regions. So we have /galaxy/region/observation/image so far. Region may specify not just the location on the celestial sphere, but also give information on the type of region (spiral arm, interarm, open cluster region, outer halo, etc). There may be photometry catalogs created from these images that are to be included. These catalogs should have starObjects with mags with errors and filter info, and location pointers to pixel coordinates in the image.
Some of the photometryCatalogs are the children of images, but some may be a concatenation of several tables within a region. That would be a child of the region. Also in the region may be some higher resolution images in a crowded region (/galaxy/region/region/observation/photoCatalog). We may want to point out variable stars, supernovae, etc., so one has special subCatalogs of these. There may be reasons for others to attach additional info about the variable stars since they may be messing up the TRGB distances. Finally, there are outputs of the tip edge detectors and their input parameters as well.

Columns do not mean anything in this context. Although one could and will provide a mechanism to serialize this by VOTABLE, a more object oriented method is preferred, not because it is easier for the human to read, but because it is easier for the machine to read. To make it manageable to the human one has XSLT scripts for each object type. One can provide skeleton views to see the general nested structure and then click on an object to display it more completely.

>A place to find literally thousands of catalogues is the CDS, where they
>have 5587 Catalogues available. Their clasification of the catalogues
>obeys to the type of data they are cataloging, e.g., Astrometric Data,
>Photometric data, Spectroscopic data, etc.. The same question as above
>on whether we would have to create specific data model for each of the
>eventual astronomical object categories we are cataloging arises.
>
>
>

I think this is what the DM is all about. We are creating spectral object, bandpass object, and STCobject. These are building blocks for spectralCatalog, photometricCatalog, and astrometricCatalog respectively. However, I think 90% of what one wants in any astronomicalObject is satisfied by the same set of things.
Universe, cluster of galaxies, galaxy, cluster, star, planet, comet, can all take STC for location, Quantity for any global property, Region for subregions, Layer for layers like convection zone or mesosphere, Members or perhaps Parts for component parts. The real power of this schema is that one can establish a data model schema for query that is acceptable to all data centers, but completely hides each datacenters internal organization. A query for galaxies with supergiant stars in the interarm region is a simple XPath: //galaxy//region[@type="interarm"]//star//spectralType/value="supergiant" This query could be sent to all datacenters and be decomposed at the datacenters into a set of SQLs to retrieve the appropriate data and then construct a galaxyCatalog for output. Used inside of an XQuery, the request could compose an alternate structure for the output XML object such as starCatalog rather than galaxyCatalog. Ed ============================================================================= From: Martin Hill Reply-To: Martin Hill To: Data Model IVOA List Subject: Re: [CATALOGUE]Starting Data Model Subgroup - terms like 'column' Date: Tue, 03 Aug 2004 09:32:11 +0100 Ed Shaya wrote: > Here. The mere mention of columns is, in my opinion, out of place. The > concept of rows and columns should not appear in any component of our > data model. They belong in a relational database data model. Here I > think we are working on a more abstract level in which objects may > contain other objects. This results in tree-like structures. We > should worry about transformation into a set of interelated relational > tables only after the VO data model for this is complete. I believe > that Roy correctly chimed in that VOTable can already do this only > because Pedro incorrectly brought up the issue of describing rows and > columns. Being a bit pedantic, but our data models won't necessarily be trees either. 
In fact our data models are 'relational' - they consist of various bits of information in 'lumps' that make sense to us, related to other 'lumps'. Some of these relations will be tree-like, but some won't. We *could* write down our models where the 'lumps' are 'table definitions' and the 'bits of information' are 'columns' in those tables. I believe however we are intending to model these lumps as 'objects' and the bits of information as 'properties' of those objects, and use UML relational diagrams to write it down. Using UML to represent our data and its relationships is fine, but we must also remember that our data may be stored and processed in non-OO languages, such as FORTRAN. If some find it easy to think in columns and tables, and others in terms of objects and properties, we should be able to cope with both. But we should avoid using particular implementations of representations; we shouldn't try to describe *models* in terms of Java Objects/Interfaces or Sybase or VOTables or FORTRAN structs or XML Schemas. These are specific implementations of representations, not suitable for our general models, but we may want to use them for 'worked examples' of how our models might be used in practice. Cheers, Martin -- Martin Hill www.mchill.net +44 7901 55 24 66 ============================================================================= From: Martin Hill Reply-To: Martin Hill Cc: Data Model IVOA List Subject: Re: [CATALOGUE]Starting Data Model Subgroup - reinventing modelling Date: Tue, 03 Aug 2004 09:44:24 +0100 Ed Shaya wrote: > Pedro Osuna wrote: >> For example, the XMM-Newton "1XMM" is a list of serendipitous sources >> detected by the satellite in its observing campaign.
The model for this >> catalogue could consist of things like the provenance (ESA), number of >> columns (400) number of rows (~32000), etc., or it might give more >> relevant information like: column number three in the catalogue is the >> Source.likelihood where likelihood is an attribute of the Source Data >> Model. >> I think this is an interesting point for discussion..... >> >> >> > A catalog should be a list of sourceObjects which > holds/contains/aggregates Quantities. The quantities should be allowed > to be of arbitrary depth and detail. That is, one should be free to > enter QuntitySets of QuantitySets. I'm happy with the next bit, which is a prose description of some of the things that Ed would like to see in the catalogue data model. Bearing in mind my previous email though, our models can be relational rather than restricted to trees, so our catalogues might not be just a set of things that might be sets of other things. However the above bit seems to complicate what should be straightforward; we already have a system for modelling 'things' that are aggregates of other 'things' - they are 'Objects' in UML. Let's model those things first, *then* see if there are common elements we can factor out to 'QuantitySets'. Doing it the other way around is Bad Practice (as I have mentioned before) and adds an unnecessary layer of IVO-specific terms to what should be a straightforward exercise. Let's hear more about what people need to know, and also about what people don't want to know. For example, it seems some people don't care about Passband details for most cases; they just need a simple Fravergy error band on a flux measurement. This implies we may need more than one way of modelling similar information. Cheers, Martin > To make this more concrete, lets > talk about a general catalog of galaxies. We wish to provide at a > minimum basic data about each galaxy (ie. simple quantities: magnitudes, > ra, dec, morphological class). 
Also, one wants the Observations of each > galaxy, such as Image. We may just want to hold crucial metadata about > each image (exposure time, ra,dec, filter) and perhaps a URL to the > actual data. But we may want to group these images into various > regions. So we have /galaxy/region/observation/image so far. Region may > specify not just the location on the celestial sphere, but also give > information on the type of region (spiral arm, interarm, open cluster > region, outerhalo, etc). There may be photometry catalogs created from > these images that are to be included. These catalogs should have > starObjects with mags with errors and filter info, and location pointers > to pixel coordinates in the image. Some of the photometryCatalogs are > the children of images but some may be concatention of several tables > within a region. That would be a child of the region. Also in the > region may be some higher resolution images in a crowded region > (/galaxy/region/region/observation/photoCatalog). We may want to > point out variable stars, supernovae, etc so one has special subCatalogs > of these. There may be reasons for others to attach additional info > about the variable stars since they may be messing up the TRGB > distances. Finally there are outputs of the tip edge detectors and > their input paramters as well. > > -- Martin Hill www.mchill.net +44 7901 55 24 66 ============================================================================= From: Ed Shaya Reply-To: Ed Shaya To: Martin Hill Cc: Data Model IVOA List Subject: Re: [CATALOGUE]Starting Data Model Subgroup - terms like 'column' Date: Tue, 03 Aug 2004 13:14:10 -0400 Martin Hill wrote: > Ed Shaya wrote: > >> Here. The mere mention of columns is, in my opinion, out of place. >> The concept of rows and columns should not appear in any component of >> our data model. They belong in a relational database data model. 
>> Here I think we are working on a more abstract level in which objects >> may contain other objects. This results in tree-like structures. >> We should worry about transformation into a set of interelated >> relational tables only after the VO data model for this is complete. >> I believe that Roy correctly chimed in that VOTable can already do >> this only because Pedro incorrectly brought up the issue of >> describing rows and columns. > > > Being a bit pedantic, but our data models won't necessarily be trees > either. In fact our data models are 'relational' - they consist of > various bits of information in 'lumps' that make sense to us, related > to other 'lumps'. Some of these relations will be tree-like, but some > won't. We *could* write down our models where the 'lumps' are 'table > definitions' and the 'bits of information' are 'columns' in those > tables. I believe however we are intending to model these lumps as > 'objects' and the bits of information as 'properties' of those > objects, and use UML relational diagrams to write it down. My use of the word relational was a poor choice. I was just saying that a catalog should not be restricted to only two-dimensional datasets. You are right that tree-like is also not general enough since there could be explicit relationships from any object to any other object. If we agree to extend the meaning of the word column to mean a set of similarly classed objects, then I could accept its use as well. But still a Catalog may not have any columns since each object may have differing sets of properties. Take, for instance, a list of two clusters of galaxies: for one we know its richness and X-ray properties, for the other we know its member names and its mass. > > Using UML to represent our data and its relationships is fine, but we > must also remember that our data may be stored and processed in non-OO > languages, such as FORTRAN.
If some find it easy to think in columns > and tables, and others in terms of objects and properties, we should > be able to cope with both. > I'm afraid it will be just too difficult to program in FORTRAN77 for the general Catalog. But for certain common subclasses of Catalog it should be fine. > But we should avoid using particular implemenations of > representations; we shouldn't try and describe *models* in terms of > Java Objects/Interfaces or Sybase or VOTables or FORTRAN structs or > XML Schemas. These are specific implementations of representations, > not suitable for our general models, but we may want to use them for > 'worked examples' of how our models might be used in practice. > Of course. One needs either a modeling language or an ontology. Along these lines, I believe that modeling languages like UML are best for processing and data flow architectures. Ontology is best for information and knowledge statement architectures. Most of what we are trying to do in DM is the latter. > Cheers, > > Martin > ============================================================================= From: Elizabeth Auden To: Pedro.Osuna@esa.int Subject: [CATALOGUE]Starting Data Model Subgroup - Astrogrid joiner Date: Wed, 11 Aug 2004 16:43:48 +0100 (BST) Hi Pedro, > at the last IVOA meeting in Cambridge, Boston, I approached Jonathan to > get the vacant responsibility of coordinating the efforts in a > "Catalogue" subgroup of the Data Model. Tony Linde has asked me to join the catalogue data model subgroup on behalf of Astrogrid. I have just joined the data model mail list, and I've been working my way through the Catalogue thread. The main experience I've had with catalogues in the past has been 1) bright star catalogues (such as Tycho II), 2) photometric catalogues (like HST standard candles), and 3) solar event catalogues (flares, loops, coronal mass ejections, etc). > What is a Catalogue? 
In my experience, either a list of objects with something in common (i.e., bright stars, non-variable UV standard candles) that are organized by coordinates, OR a list of events such as gamma ray bursts and solar flares, organized by coordinates and by time. > What is a Catalogue used for? I've used different catalogues for different things: 1. Bright stars: used for navigation for the Swift satellite ("don't point the satellite at this") 2. Photometric standards: calibration of filters and grisms for XMM-OM and Swift 3. Solar event catalogues: producing movies from specific pieces of solar satellite data (i.e., give me a movie showing yesterday's solar flare, but don't give me a movie of the rest of the time when nothing exciting happened) > Why do we want to model Catalogues? Zillions of column headers have been identified since space objects and events were first catalogued; the data will be more efficiently searched (and hopefully yield more efficient science) if common themes, such as spatial, spectral, and temporal information, can be exploited. Catalogues aren't just used for pure science research; they're also useful for hardware and software instrumentation. > Where do Catalogues find a place within the VO? As the front gate to a data source, or as a starting point to obtain data products from multiple sources. > What are the interesting Use Cases for a Catalogue DM? I'll give this one some more thought! cheers, Elizabeth Auden ============================================================================= From: Pedro Osuna To: Elizabeth Auden , Brian Thomas , Jonathan McDowell Cc: Pedro.Osuna@esa.int, Jesus.Salgado@sciops.esa.int Subject: [CATALOGUE]Warming up... Date: Mon, 30 Aug 2004 17:25:07 +0200 Dear all, I went through all the mails received on the Catalogue issue. Thank you for volunteering to join the effort. I think Brian has done some work related to Catalogues and so has Elizabeth.
I haven't seen comments from either of you on the mails people sent, so it would be nice to survey the claims made and try to come to an agreement on what we are after. So I'll start myself with a summary of my and Jesus's view of the comments from people, and hope to get inputs from you as well. - With respect to the comment that "VOTable is the XML serialization of whatever it is that the DM group come up with" (M. Graham) I disagree, as I think a data model (whichever it is) is independent of the way it is serialized. For "complex" data models, the VOTable might NOT allow for complete serialization, whereas for simple ones it might (e.g., SIAP extensions). But in summary, the serialization is independent of the model. - with respect to the "use of inheritance" (R. Williams) I didn't quite understand the point. We will use inheritance in the model wherever it is appropriate for the data model, just as part of the modeling effort, but using inheritance as a general sort of tool I do not understand. I did not understand either what the Data Model for VOTable is (mentioned by Roy and also in the answer from Jonathan to M. Graham) so maybe Jonathan could tell us more about what that means (as far as I know, there are only three data models going on: Observation, Quantity and Spectrum (plus the current Catalogue)). - about the use of the words "tables", "attributes" and "values" in the mail from Kirk Borne, I'd prefer to avoid in the future using these types of words (like column and row, which created so much discussion after my mail) as people tend to interpret words literally for what they mean in their experience and do not go any further in the interpretation. This is a very important point in the definition of the Catalogue. In a private mail from Jonathan before I sent the mail to the DM, I was wondering whether we should model "two-dimensional" catalogues or "n-dimensional" ones.
Jonathan answered me along these lines: "[...]I'm going to argue (as a member of the group, not wearing my Chair hat) that we should focus on source catalog(ue)s rather than general tables like lists of observatories[...]" "[...]To put it another way, is there a difference between a CATALOG(UE) model and a TABLE model? What things are tables but not catalog(ue)s? I am not convinced we should spend too much effort on a very general table model[...]" However, I see that some of the people answering my original mail seemed to be pushing for a very general Data Model including all possible types of catalogues. I have to say that trying to model all types of catalogues could be a never-ending task and I tend to agree with Jonathan that we should be concentrating on standard source type catalogues, but could you all please give an opinion on this? - the mail from Arnold mentioned that we are in a position to have "fully interacting catalogues". I'll send a mail to Arnold, copying you, asking him to provide us with a use case for interacting catalogues. - Pierre Didelon raised his concern about "trivialising" things like Provenance. It is clear that some of these things are not trivial at all, but it shows that we should be very careful when designing the data model as people seem to be very touchy about the things they know best. - G. Lemson says that "[...]Defining such serializations -for the data models- should be part of the DM WG's task[...]" I think this is a very important point with which I disagree, as I believe (as said before) that a DM is independent of the serialization. Could you all please comment on this? Jesus and I are preparing a first attempt at a data model for a General catalogue and will be sending it during the course of the week. We will also try to model the Source object, which I think is the main object we should be cataloguing. If you have any inputs, please let us know. Waiting for your news.... Cheers, P.
-- Pedro Osuna Alcalaya Software Engineer European Space Astronomy Center (ESAC/ESA) e-mail: Pedro.Osuna@esa.int Tel + 34 91 8131314 European Space Agency VILLAFRANCA Satellites Tracking Station P.O. Box 50727 E-28080 Villafranca del Castillo MADRID - SPAIN ============================================================================= From: Pedro Osuna Reply-To: Pedro Osuna To: dm@ivoa.net Cc: Pedro.Osuna@esa.int Subject: [CATALOGUE]Second round seeking people.... Date: Mon, 30 Aug 2004 17:26:57 +0200 Dear all, after having processed all the mails related to the Catalogue Subgroup creation, I have seen only two people showing interest in joining the group, Brian Thomas and Elizabeth Auden. I have, however, seen a lot of interesting discussions, so I'd again insist in having more people joining the group. For the time being, I'll consider the group as being formed by Elizabeth, Brian, Jonathan (as head of the general group), Jesus (Salgado) and myself, to whom more restricted mails will be sent whenever appropriate. With respect to all the mails received, they give a lot of meat to discuss, as expected from my original mail, so we will start with internal discussions and let the rest of the general DM group know with a proper "[CATALOGUE]" header as already agreed whenever appropriate. Cheers, P. -- Pedro Osuna Alcalaya Software Engineer European Space Astronomy Center (ESAC/ESA) e-mail: Pedro.Osuna@esa.int Tel + 34 91 8131314 European Space Agency VILLAFRANCA Satellites Tracking Station P.O. Box 50727 E-28080 Villafranca del Castillo MADRID - SPAIN ============================================================================= From: Martin Hill @ ROE To: Pedro Osuna Subject: Re: [CATALOGUE]Second round seeking people.... Date: Mon, 30 Aug 2004 16:45:32 +0100 My apologies Pedro, I obviously spent too much time replying to your first round's technical comments without actually replying to the original question... Could you stick me in the list too please? 
Most of the data we're publishing just now is catalogue data (INT-WFS, SuperCOSMOS, 6dF, 2dF and 2MASS). Thanks! Martin Pedro Osuna wrote: > Dear all, > > > after having processed all the mails related to the Catalogue Subgroup > creation, I have seen only two people showing interest in joining the > group, Brian Thomas and Elizabeth Auden. > > I have, however, seen a lot of interesting discussions, so I'd again > insist in having more people joining the group. > > For the time being, I'll consider the group as being formed by > Elizabeth, Brian, Jonathan (as head of the general group), Jesus > (Salgado) and myself, to whom more restricted mails will be sent > whenever appropriate. > > With respect to all the mails received, they give a lot of meat to > discuss, as expected from my original mail, so we will start with > internal discussions and let the rest of the general DM group know with > a proper "[CATALOGUE]" header as already agreed whenever appropriate. > > > Cheers, > P. > -- Martin Hill, Software Engineer AstroGrid (ROE) +44 7901 55 24 66 http://www.roe.ac.uk/~mch/ ============================================================================= From: Kirk Borne Reply-To: Kirk Borne (at George Mason University) To: Pedro.Osuna@sciops.esa.int Cc: Kirk Borne (at George Mason University) Subject: Re: [CATALOGUE]Second round seeking people.... Date: Mon, 30 Aug 2004 13:28:26 -0400 (EDT) Hello Pedro. Please include me in the group. I will try to contribute as much as time permits. - Kirk > From owner-dm@eso.org Mon Aug 30 11:48:25 2004 > Date: Mon, 30 Aug 2004 17:26:57 +0200 > From: Pedro Osuna > Subject: [CATALOGUE]Second round seeking people.... > To: dm@ivoa.net > Cc: Pedro.Osuna@esa.int > > Dear all, > > > after having processed all the mails related to the Catalogue Subgroup > creation, I have seen only two people showing interest in joining the > group, Brian Thomas and Elizabeth Auden. 
> > I have, however, seen a lot of interesting discussions, so I'd again > insist in having more people joining the group. > > For the time being, I'll consider the group as being formed by > Elizabeth, Brian, Jonathan (as head of the general group), Jesus > (Salgado) and myself, to whom more restricted mails will be sent > whenever appropriate. > > With respect to all the mails received, they give a lot of meat to > discuss, as expected from my original mail, so we will start with > internal discussions and let the rest of the general DM group know with > a proper "[CATALOGUE]" header as already agreed whenever appropriate. > > > Cheers, > P. > > -- > Pedro Osuna Alcalaya > > > Software Engineer > European Space Astronomy Center > (ESAC/ESA) > e-mail: Pedro.Osuna@esa.int > Tel + 34 91 8131314 > > European Space Agency > VILLAFRANCA Satellites Tracking Station > P.O. Box 50727 > E-28080 Villafranca del Castillo > MADRID - SPAIN ============================================================================= From: Elizabeth Auden To: Pedro Osuna Cc: Brian Thomas , Jonathan McDowell , Pedro.Osuna@esa.int, Jesus.Salgado@sciops.esa.int Subject: Re: [CATALOGUE]Warming up... Date: Mon, 30 Aug 2004 21:46:22 +0100 (BST) Hi, My recent experience with catalogues has mainly been with solar and solar terrestrial physics data. I have been working on registering data archives in these disciplines with the Astrogrid registry. > there are only three data models going on: > Observation, Quantity and > Spectrum (plus the current Catalogue)). Where does magnetic data fit into this? There are several solar and STP magnetic data sources, including the RAL world data centre's ionosonde data or the upcoming Solar Dynamic Observatory's helioseismic magnetic imager. > In a private mail from Jonathan before I sent the mail to the DM, I was > wondering whether we should model "two-dimensional" catalogues or > "n-dimensional" ones. 
Drawing on experience with RAL Ionosonde data again, the setup for that data is a collection of several 2-D tables that are many layers deep. Would the cataloguing effort be concerned with how interconnecting tables are modelled, or is this a job left for VO workflows and advanced registry searches? > concentrating on standard source type catalogues, but could you all > please give an opinion on this? I agree - standard source catalogues are a good starting point, but I'd like to see the group work with standard solar and STP catalogues, too. Solar event catalogues seem to be mainly 2-D tables, and I'm still getting to grips with STP catalogues. A quick google has given me a link for the OSSE solar flare catalogue to view as an example: http://heseweb.nrl.navy.mil/gamma/solarflare/flarelib.htm cheers, Elizabeth ============================================================================= From: Pedro Osuna Cc: Pedro.Osuna@esa.int Subject: First-attempt Catalogue DM for discussion Date: Wed, 01 Sep 2004 16:10:34 +0200 Dear all, our small group has increased by two members. The list is currently (in alphabetical order): Elizabeth Auden Kirk Borne Martin Hill Jonathan McDowell (DM Chairman) Jesus Salgado Brian Thomas and myself. I think it is really important that we come to an agreement on _what_ exactly we are trying to model when we talk about the CATALOGUE Data Model. If we take as an example the many CDS catalogues, they are merely two-dimensional tables, with rows and columns. They subdivide their catalogues into categories, somewhat arbitrarily, in terms of astronomical "branches". They also have "Tables" coming from publications (again, 2-D tables). Another example of flat tables is the pointer that Elizabeth was giving. That's just a 2-D table giving information on solar flares happening on certain dates under certain conditions. Another 2-D example is the 1XMM-Newton Source catalogue.
If what we are after is just the model for a 2-D table, the model as such would be quite simple. We have done an example Data Model for that, which could serve as a starting internal discussion point. This is the diagram below called CATALOGUE_DM_UML.jpg. The real data model is only the upper part. Then, already existing DataModels like the Observation or Quantity would come naturally into the game. As you can see in this Data Model, a "Catalogue" is formed of "CatalogueEntry"-ies, which either AstronomicalObject or AstronomicalEvent extend. A SolarFlare, e.g., would extend an AstronomicalEvent, and an Observation (could be called CatalogueObservation and implement the existing Observation DM) would extend it as well (this would be the case if we want to allow for "Observation" Catalogues. Otherwise, following Jonathan's comments questioning whether we should discuss these types of catalogues, they would just not appear there as an extension of an AstronomicalEvent, as they would NOT in general be "Entry"-ies ever). In this type of model, a Source would just be an AstronomicalObject, and what we would really have to concentrate on would be the attributes, etc. of the objects marked in red, i.e., Catalogue, AstronomicalEvent, AstronomicalObject and Source for the time being. The Source could be just one of many other objects whose attributes might be modeled when the need arises. I guess there should be a centralized point and procedure through which models are added to the whole VO machinery. In this case, someone modeling a SolarFlare (i.e., identifying its attributes, etc.) would propose that model and then it would be accepted by the board and become part of the VO general model. Someone else might also like to model the object Galaxy, and then we could go on and on, with models added regularly..... What the position of the VO in general is with respect to this issue I don't know...
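
The flat model described above can be sketched in code. The Python below is only an illustration under stated assumptions, not the content of CATALOGUE_DM_UML.jpg: the class names Catalogue, CatalogueEntry, AstronomicalObject, AstronomicalEvent, Source and SolarFlare come from this mail, and "likelihood" is the Source attribute mentioned in the original [CATALOGUE] mail, but every other attribute (identifier, ra_deg, dec_deg, start_time, peak_flux, name, provenance as a plain string) and the add() method are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogueEntry:
    """One entry (row) of a flat, 2-D catalogue."""
    identifier: str  # hypothetical attribute, for illustration only


@dataclass
class AstronomicalObject(CatalogueEntry):
    ra_deg: float   # hypothetical: position on the sky, in degrees
    dec_deg: float


@dataclass
class AstronomicalEvent(CatalogueEntry):
    start_time: str  # hypothetical: ISO 8601 timestamp of the event


@dataclass
class Source(AstronomicalObject):
    likelihood: float  # attribute named in the original mail


@dataclass
class SolarFlare(AstronomicalEvent):
    peak_flux: float  # hypothetical attribute


@dataclass
class Catalogue:
    name: str        # hypothetical
    provenance: str  # hypothetical: kept as a plain string here
    entries: List[CatalogueEntry] = field(default_factory=list)

    def add(self, entry: CatalogueEntry) -> None:
        """Accept only CatalogueEntry instances, as in the flat model."""
        if not isinstance(entry, CatalogueEntry):
            raise TypeError("a Catalogue holds CatalogueEntry instances only")
        self.entries.append(entry)


# Usage: a one-entry flare list with made-up values.
flares = Catalogue(name="example flare list", provenance="made-up example")
flares.add(SolarFlare(identifier="flare-1",
                      start_time="2004-09-01T00:00:00",
                      peak_flux=1.0))
```

In this sketch an Observation is deliberately left out, since whether it should be a valid entry at all is exactly the open question raised above.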
In the diagram CATALOGUE_DM_Instantiation_UML.jpg we give an instantiation example of this simple model, using the SolarFlare example catalogue that Elizabeth sent because of its simplicity. If we want to allow an Entry in the Catalogue to be composed of one or more entries, and those entries in turn to include more entries, then the modeling effort would get much more complicated. The problem of allowing, for example, different Observations which observed the same Source in a catalogue and/or different sources observed by the same observation (Observation being a valid Entry) is a tough problem. In the image called CATALOGUE_DM_Recursive_UML.jpg below, we have added an auto-association in the entries to reflect that idea. However, how to deal with the Jekyll/Hyde problem (multi-inheritance) of an Observation being an Entry and containing one or more sources which can be Entry as well would not be easy to solve. I would say that for the time being, and again following the recommendation of the DM Team Leader, we might want to concentrate on modeling the CATALOGUE in the first option (2-D tables) and define the AstronomicalObject and Events, etc. as mentioned before. This could give people an idea of how we want to proceed and then evolve further when discussions start. Please send me your comments and ideas in this respect. Please don't be too touchy about real UML modeling, as the model is only illustrative and is not meant to be rigorous in any aspect. As soon as we come to an agreement on how to attack the problem, we can start thinking about doing things rigorously. Waiting for your comments. Cheers, P. -- Pedro Osuna Alcalaya Software Engineer European Space Astronomy Center (ESAC/ESA) e-mail: Pedro.Osuna@esa.int Tel + 34 91 8131314 European Space Agency VILLAFRANCA Satellites Tracking Station P.O.
Box 50727 E-28080 Villafranca del Castillo MADRID - SPAIN JPEG image attachment (CATALOGUE_DM_Recursive_UML.jpg) JPEG image attachment (CATALOGUE_DM_UML.jpg) JPEG image attachment (CATALOGUE_DM_Instantiation_UML.jpg) ============================================================================= From: Elizabeth Auden To: Pedro Osuna Subject: Re: First-attempt Catalogue DM for discussion Date: Wed, 01 Sep 2004 15:38:10 +0100 (BST) Hi Pedro, > If what we are after is just the model for a 2-D table, the model as > such would be quite simple. I'm just going to send out an email to the Astrogridders and my colleagues at MSSL to see if there are any important non-2-D catalogues that we should take into consideration while working on the data model. I'll get back to you on this by Friday at the latest. cheers, Elizabeth ============================================================================= From: Ed Shaya To: Pedro Osuna Subject: Re: [CATALOGUE]Second round seeking people.... Date: Fri, 03 Sep 2004 09:12:38 -0400 Pedro, I would like to be on this subgroup. Ed Pedro Osuna wrote: >Dear all, > > >after having processed all the mails related to the Catalogue Subgroup >creation, I have seen only two people showing interest in joining the >group, Brian Thomas and Elizabeth Auden. > >I have, however, seen a lot of interesting discussions, so I'd again >insist in having more people joining the group. > >For the time being, I'll consider the group as being formed by >Elizabeth, Brian, Jonathan (as head of the general group), Jesus >(Salgado) and myself, to whom more restricted mails will be sent >whenever appropriate. > >With respect to all the mails received, they give a lot of meat to >discuss, as expected from my original mail, so we will start with >internal discussions and let the rest of the general DM group know with >a proper "[CATALOGUE]" header as already agreed whenever appropriate. > > >Cheers, >P. 
> > > ============================================================================= From: Ed Shaya To: Pedro Osuna Subject: Re: [CATALOGUE]Second round seeking people.... Date: Fri, 03 Sep 2004 10:30:35 -0400 Pedro, In the WCS we will need redshift or velocities as well. I think the recursion is not so bad because an observation of an astroObject leads to multiple objects at smaller scales and Observation of those objects leads to objects on yet smaller scales. Therefore one never gets led back to the original object. So there is no circularity. Still missing: 1) subregions of an object - "NW spiral arm of a galaxy", chromosphere of the star, along the ionizing rim of Orion cloud, etc 2) references - A catalog of papers or articles about kinematics within galaxies in the Coma Cluster. 3) pointers to external Observations - One might have a photometry table but the observations are not in the same catalog. I have trouble with calling an Observation a type of astroEvent. To me an astroEvent is something that happens in the universe independent of humans: a flare, a supernova, a neutron star - neutron star collision. An observation is something we generate. Just because both happen in time does not make them the same or even related. I could agree that astroEvent and Observation are both types of Events. Ed PS - I do not have everyone's address in the subgroup so would you be kind enough to relay this to the rest of the group. Pedro Osuna wrote: >Hi Ed, > >thanks for joining. > >Please have a look at the attached emial I sent to the reduced >distribution list a couple of days ago. > >Cheers, >P. > >On Fri, 2004-09-03 at 15:12, Ed Shaya wrote: > > >>Pedro, >> I would like to be on this subgroup. >>Ed >> >> >>Pedro Osuna wrote: >> >> >> >>>Dear all, >>> >>> >>>after having processed all the mails related to the Catalogue Subgroup >>>creation, I have seen only two people showing interest in joining the >>>group, Brian Thomas and Elizabeth Auden. 
>>> >>>I have, however, seen a lot of interesting discussions, so I'd again >>>insist in having more people joining the group. >>> >>>For the time being, I'll consider the group as being formed by >>>Elizabeth, Brian, Jonathan (as head of the general group), Jesus >>>(Salgado) and myself, to whom more restricted mails will be sent >>>whenever appropriate. >>> >>>With respect to all the mails received, they give a lot of meat to >>>discuss, as expected from my original mail, so we will start with >>>internal discussions and let the rest of the general DM group know with >>>a proper "[CATALOGUE]" header as already agreed whenever appropriate. >>> >>> >>>Cheers, >>>P. >>> >>> >>> >>> >>> >>> >>> ------------------------------------------------------------------------ >>> >>> Subject: >>> First-attempt Catalogue DM for discussion >>> From: >>> Pedro Osuna >>> Date: >>> Wed, 01 Sep 2004 16:10:34 +0200 >>> To: >>> Undisclosed-Recipient: ; >>> >>> To: >>> Undisclosed-Recipient: ; >>> CC: >>> Pedro.Osuna@esa.int >>> >>> >>>Dear all, >>> >>> >>>our small group has increased in two members. >>>The list is currentl (in alphabetical order): >>> >>>Elizabeth Auden >>>Kirk Borne >>>Martin Hill >>>Jonathan McDowell (DM Chairman) >>>Jesus Salgado >>>Brian Thomas >>> >>>and myself. >>> >>> >>> >>>I think it is really important that we come to an agreement on _what_ >>>exactly we are trying to model when we talk about the CATALOGUE Data >>>Model. >>> >>>If we take as example the many CDS catalogues, they are merely 2 >>>dimensional tables, with rows and columns. They subdivide their >>>catalogues in categories, somehow arbitrary, in terms of astronomical >>>"branches". They also have "Tables" coming from publications (again, 2-D >>>tables). >>> >>>Another example of flat tables is the pointer that Elizabeth was giving. >>>That's just a 2-D table giving solar flare information happening in >>>certain dates with certain conditions. 
>>>
>>>Another 2-D example is the 1XMM-Newton Source catalogue.
>>>
>>>If what we are after is just the model for a 2-D table, the model as
>>>such would be quite simple. We have done an example Data Model for that,
>>>which could serve as a starting internal discussion point. This is the
>>>diagram below called CATALOGUE_DM_UML.jpg. The real data model is only
>>>the upper part. Then, already existing DataModels like the Observation
>>>or Quantity would come naturally into the game.
>>>
>>>As you can see in this Data Model, a "Catalogue" is formed of
>>>"CatalogueEntry"-ies, which either AstronomicalObject or
>>>AstronomicalEvent extend. A SolarFlare, e.g., would extend an
>>>AstronomicalEvent, and an Observation (could be called
>>>CatalogueObservation and implement the existing Observation DM) would
>>>extend it as well (this would be the case if we want to allow for
>>>"Observation" Catalogues; otherwise, following Jonathan's comments
>>>questioning whether we should discuss this type of catalogue, they
>>>would just not appear there as an extension of an AstronomicalEvent, as
>>>they would NOT in general be "Entry"-ies ever).
>>>
>>>In this type of model, a Source would just be an AstronomicalObject, and
>>>what we would really have to concentrate on would be the attributes,
>>>etc. of the objects marked in red, i.e., Catalogue, AstronomicalEvent,
>>>AstronomicalObject and Source for the time being.
>>>
>>>The Source could be just one of many other objects whose attributes
>>>might be modeled when necessity arises. I guess there should be a
>>>centralized point and procedure where models are being added to the
>>>whole VO machinery. In this case, someone modeling a SolarFlare (i.e.,
>>>identifying its attributes, etc.) would propose that model and then it
>>>would be accepted by the board and become part of the VO general model.
>>>Someone would also like to model the object Galaxy and then we could go
>>>on and on, with models added regularly... What the position of the VO
>>>in general is with respect to this issue I don't know...
>>>
>>>In the diagram CATALOGUE_DM_Instantiation_UML.jpg we give an
>>>instantiation example of this simple model for the SolarFlare example
>>>catalogue that Elizabeth sent, chosen for its simplicity.
>>>
>>>
>>>If we want to allow an Entry in the Catalogue to be composed of one or
>>>more entries, and those entries in turn to include more entries, then
>>>the modeling effort would get much more complicated. Allowing, for
>>>example, different Observations which observed the same Source in a
>>>catalogue and/or different sources observed by the same observation
>>>(Observation being a valid Entry) is a tough problem. In the image called
>>>CATALOGUE_DM_Recursive_UML.jpg below, we have added an auto-association
>>>in the entries to reflect that idea. However, the Jekyll/Hyde problem
>>>(multi-inheritance) of an Observation being an Entry and containing one
>>>or more sources which can be Entries as well would not be easy to solve.
>>>
>>>
>>>I would say that for the time being, and again following the
>>>recommendation of the DM Team Leader, we might want to concentrate in
>>>modeling the CATALOGUE in the first option (2-D tables) and define the
>>>AstronomicalObject and Events, etc. as mentioned before. This could give
>>>people an idea of how we want to proceed and then evolve further when
>>>discussions start.
>>>
>>>Please send me your comments and ideas in this respect.
>>>Please don't be too touchy about real UML modeling, as the model is
>>>only illustrative and is not meant to be rigorous in any respect. As
>>>soon as we come to an agreement on how to attack the problem, we can
>>>start thinking on doing the things rigorously.
>>>
>>>Wait for your comments.
>>>
>>>Cheers,
>>>P.
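
As a rough illustration of the flat (2-D) option described in the quoted
message above, the class hierarchy might be sketched in Python as follows.
This is only a sketch under assumptions: the class names (Catalogue,
CatalogueEntry, AstronomicalObject, AstronomicalEvent, Source, SolarFlare)
come from the message, but every attribute (identifier, likelihood,
start_time, flare_class), the add() and column() helpers, and all values
are hypothetical and invented for the example. The UML diagrams themselves
are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogueEntry:
    """One row of a flat (2-D) catalogue."""
    identifier: str  # hypothetical attribute, not taken from the thread


@dataclass
class AstronomicalObject(CatalogueEntry):
    """Entry describing something that exists (star, galaxy, source...)."""


@dataclass
class AstronomicalEvent(CatalogueEntry):
    """Entry describing something that happens (flare, supernova...)."""


@dataclass
class Source(AstronomicalObject):
    likelihood: float = 0.0  # hypothetical attribute


@dataclass
class SolarFlare(AstronomicalEvent):
    start_time: str = ""   # hypothetical attribute
    flare_class: str = ""  # hypothetical attribute


@dataclass
class Catalogue:
    """An ordered collection of entries; entries never contain entries."""
    name: str
    entries: List[CatalogueEntry] = field(default_factory=list)

    def add(self, entry: CatalogueEntry) -> None:
        self.entries.append(entry)

    def column(self, attribute: str) -> list:
        """Return one attribute for every entry, i.e. one table column."""
        return [getattr(entry, attribute) for entry in self.entries]


# Invented example rows, only to show the shape of the structure.
flares = Catalogue(name="example flare list")
flares.add(SolarFlare("flare-1", start_time="2004-01-01T00:00:00", flare_class="M1.0"))
flares.add(SolarFlare("flare-2", start_time="2004-01-02T00:00:00", flare_class="X2.0"))
print(flares.column("flare_class"))
```

In this sketch each entry is a single row and each attribute a column, so
the catalogue stays a 2-D table. The recursive option discussed in the
message would amount to giving CatalogueEntry a list of child entries,
which is where the multiple-inheritance difficulty mentioned above
appears.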
=============================================================================
From: Elizabeth Auden
To: Pedro Osuna
Cc: Martin Hill, Brian Thomas, Jesus Salgado, Kirk Borne,
    Jonathan McDowell, Ed Shaya, Pedro.Osuna@esa.int
Subject: n-dimensional catalogues
Date: Fri, 03 Sep 2004 15:52:25 +0100 (BST)

Hi all,

A few days ago Pedro discussed basing the DM catalogue model on 2-d
tables to begin with. I emailed my colleagues at MSSL and on the
Astrogrid project to see if anyone regularly used catalogues with more
than 2 dimensions in tables. Aside from several responses along the
lines of "What do you mean by 'catalogue'?" (well, exactly), I received
a few examples of tables containing members which are either vectors or
n-dimensional tables.

1. From MSSL:

   XID-DB has an object-oriented structure which means that tables - for
   us objects - have members which are themselves tables - objects. It is
   also the case for XCat which is the catalogue of XMM sources.

2. From Astrogrid:

   A catalogue of spectral line fluxes in galaxies may fall into that
   category (of n-dimensional tables), whereas for each galaxy one can
   have more like a 2d table associated to a conventional data-cell than
   a single point, eg,

   Object:    line     Flux
   NGC 1068   Halpha   1.234
              HBeta    2.345
              [OIII]   2.347

If we choose not to model these kinds of tables during a first attempt,
it would be good to make sure our model can evolve to describe such
catalogues.

cheers,
Elizabeth

=============================================================================
From: Mark Taylor
To: Pedro.Osuna@esa.int
Subject: Re: [CATALOGUE]Second round seeking people....
Date: Tue, 07 Sep 2004 09:27:45 +0100 (BST) > From: owner-dm@eso.org [mailto:owner-dm@eso.org] On Behalf Of Pedro Osuna > Sent: 30 August 2004 16:27 > To: dm@ivoa.net > Cc: Pedro.Osuna@esa.int > Subject: [CATALOGUE]Second round seeking people.... > > Dear all, > > > after having processed all the mails related to the Catalogue Subgroup > creation, I have seen only two people showing interest in joining the > group, Brian Thomas and Elizabeth Auden. > > I have, however, seen a lot of interesting discussions, so I'd again > insist in having more people joining the group. > > For the time being, I'll consider the group as being formed by > Elizabeth, Brian, Jonathan (as head of the general group), Jesus > (Salgado) and myself, to whom more restricted mails will be sent > whenever appropriate. > > With respect to all the mails received, they give a lot of meat to > discuss, as expected from my original mail, so we will start with > internal discussions and let the rest of the general DM group know with > a proper "[CATALOGUE]" header as already agreed whenever appropriate. > > > Cheers, > P. Dear Pedro, sorry for the delay in replying, I don't normally read the DM list and this message was brought to my attention by someone else. If you're agreeable I would like to be in on the discussions of the Catalog(ue) Subgroup. My interest is as an author of catalogue-handling software (TOPCAT, STIL) - I'm not sure at present how much contribution I would have to make to catalogue data model design, but I would at least be interested to know the way that the discussions are going, and may be able to comment on some of the software implications. 

Thanks,

Mark Taylor
Starlink project (UK)

--
Mark Taylor   Starlink Programmer   Physics, Bristol University, UK
m.b.taylor@bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/

=============================================================================
From: Pedro Osuna
To: Martin Hill, Brian Thomas, Jesus Salgado, Kirk Borne,
    Jonathan McDowell, Elizabeth Auden, Ed Shaya, Mark Taylor
Cc: Pedro.Osuna@esa.int
Subject: Re: First-attempt Catalogue DM for discussion
Date: Tue, 14 Sep 2004 16:29:11 +0200

Dear all,

Mark Taylor joined the group late August, so welcome Mark.

After my email on a first attempt for the Catalogue DM, I only got a
comment from Elizabeth concerning 3-D catalogues and one from Ed (which
I forwarded by that time).

From Elizabeth's mail:

[...]1. From MSSL:

   XID-DB has an object-oriented structure which means that tables - for
   us objects - have members which are themselves tables - objects. It is
   also the case for XCat which is the catalogue of XMM sources.
[...]

I do not think this is a sound example of an n-D catalogue. A catalogue
is not a set of tables or objects in a database, but a collection of
items in a certain order. How they are organized internally does not
matter. What matters is the final catalogue they give, and for a source
catalogue like their XMM one, a 2-D catalogue can be produced.

For the second example:

[...]A catalogue of spectral line fluxes in galaxies may fall into that
   category (of n-dimensional tables), whereas for each galaxy one can
   have more like a 2d table associated to a conventional data-cell than
   a single point, eg,

   Object:    line     Flux
   NGC 1068   Halpha   1.234
              HBeta    2.345
              [OIII]   2.347
[...]

again, this can be converted to a 2-D table by displaying (which is very
often the case):

Object    Halpha Flux   HBeta Flux   OIII Flux
------    -----------   ----------   ---------
NGC1068   1.234         2.345        3.456
NGC1222   5.432         4.321        3.210

What I meant with n-D catalogues was more in the direction of allowing
an entry to contain one or more entries inside, and that's where the
serious problems appear when dealing with multiple inheritance, etc.

With respect to Ed's comments, I basically agree with the missing bits
in the WCS (it was just an example from me without trying to put all the
attributes) and the references and pointers to external observations (I
think this last one is implicit in the model already), although I'm not
so sure about the "subregions" of an object.

Anyway, this mail is trying to get some feedback from you all, as I
haven't heard anything back since I sent the first attempt DM. In
particular, I would like Jonathan to give me the green light or
otherwise to start working on a first draft (written in the conventional
IVOA format, etc.) along the lines that I wrote in the mail.

Wait for your news.

Cheers,
p.

P.S.: group members list:

Elizabeth Auden
Kirk Borne
Martin Hill
Jonathan McDowell (DM chairman)
Jesus Salgado
Mark Taylor
Brian Thomas

On Wed, 2004-09-01 at 16:10, Pedro Osuna wrote:
> Dear all,
>
>
> our small group has increased by two members.
> The list is currently (in alphabetical order):
>
> Elizabeth Auden
> Kirk Borne
> Martin Hill
> Jonathan McDowell (DM Chairman)
> Jesus Salgado
> Brian Thomas
>
> and myself.
>
>
>
> I think it is really important that we come to an agreement on _what_
> exactly we are trying to model when we talk about the CATALOGUE Data
> Model.
>
> If we take as example the many CDS catalogues, they are merely
> 2-dimensional tables, with rows and columns. They subdivide their
> catalogues into categories, somewhat arbitrary, in terms of astronomical
> "branches". They also have "Tables" coming from publications (again, 2-D
> tables).
>
> Another example of flat tables is the pointer that Elizabeth was giving.
> That's just a 2-D table giving solar flare information happening on
> certain dates with certain conditions.
>
> Another 2-D example is the 1XMM-Newton Source catalogue.
>
> If what we are after is just the model for a 2-D table, the model as
> such would be quite simple. We have done an example Data Model for that,
> which could serve as a starting internal discussion point. This is the
> diagram below called CATALOGUE_DM_UML.jpg. The real data model is only
> the upper part. Then, already existing DataModels like the Observation
> or Quantity would come naturally into the game.
>
> As you can see in this Data Model, a "Catalogue" is formed of
> "CatalogueEntry"-ies, which either AstronomicalObject or
> AstronomicalEvent extend. A SolarFlare, e.g., would extend an
> AstronomicalEvent, and an Observation (could be called
> CatalogueObservation and implement the existing Observation DM) would
> extend it as well (this would be the case if we want to allow for
> "Observation" Catalogues; otherwise, following Jonathan's comments
> questioning whether we should discuss this type of catalogue, they
> would just not appear there as an extension of an AstronomicalEvent, as
> they would NOT in general be "Entry"-ies ever).
>
> In this type of model, a Source would just be an AstronomicalObject, and
> what we would really have to concentrate on would be the attributes,
> etc. of the objects marked in red, i.e., Catalogue, AstronomicalEvent,
> AstronomicalObject and Source for the time being.
>
> The Source could be just one of many other objects whose attributes
> might be modeled when necessity arises. I guess there should be a
> centralized point and procedure where models are being added to the
> whole VO machinery. In this case, someone modeling a SolarFlare (i.e.,
> identifying its attributes, etc.) would propose that model and then it
> would be accepted by the board and become part of the VO general model.
> Someone would also like to model the object Galaxy and then we could go
> on and on, with models added regularly... What the position of the VO
> in general is with respect to this issue I don't know...
>
> In the diagram CATALOGUE_DM_Instantiation_UML.jpg we give an
> instantiation example of this simple model for the SolarFlare example
> catalogue that Elizabeth sent, chosen for its simplicity.
>
>
> If we want to allow an Entry in the Catalogue to be composed of one or
> more entries, and those entries in turn to include more entries, then
> the modeling effort would get much more complicated. Allowing, for
> example, different Observations which observed the same Source in a
> catalogue and/or different sources observed by the same observation
> (Observation being a valid Entry) is a tough problem. In the image called
> CATALOGUE_DM_Recursive_UML.jpg below, we have added an auto-association
> in the entries to reflect that idea. However, the Jekyll/Hyde problem
> (multi-inheritance) of an Observation being an Entry and containing one
> or more sources which can be Entries as well would not be easy to solve.
>
>
> I would say that for the time being, and again following the
> recommendation of the DM Team Leader, we might want to concentrate in
> modeling the CATALOGUE in the first option (2-D tables) and define the
> AstronomicalObject and Events, etc. as mentioned before. This could give
> people an idea of how we want to proceed and then evolve further when
> discussions start.
>
> Please send me your comments and ideas in this respect.
> Please don't be too touchy about real UML modeling, as the model is
> only illustrative and is not meant to be rigorous in any respect. As
> soon as we come to an agreement on how to attack the problem, we can
> start thinking on doing the things rigorously.
>
> Wait for your comments.
>
> Cheers,
> P.

--
Pedro Osuna Alcalaya

Software Engineer
European Space Astronomy Center (ESAC/ESA)
e-mail: Pedro.Osuna@esa.int
Tel + 34 91 8131314

European Space Astronomy Center
European Space Agency
P.O. Box 50727
E-28080 Villafranca del Castillo
MADRID - SPAIN

=============================================================================
From: Mark Taylor
To: Pedro Osuna
Subject: Re: First-attempt Catalogue DM for discussion
Date: Wed, 15 Sep 2004 10:13:40 +0100 (BST)

On Tue, 14 Sep 2004, Pedro Osuna wrote:

> Dear all,
>
> Mark Taylor joined the group late August, so welcome Mark.
>
>
> After my email on a first attempt for the Catalogue DM, I only got a
> comment from Elizabeth concerning 3-D catalogues and one from Ed (which
> I forwarded by that time).

Pedro,

thanks for including me on the circulation for the catalogue effort as
requested. I presume that the text part of your 'first attempt' is the
discussion included at the end of this message (originally sent
1 September); this references a couple of diagrams
(CATALOGUE_DM_UML.jpg, CATALOGUE_DM_Instantiation_UML.jpg) that I don't
have. Would you be kind enough to send me copies?

Thanks a lot

Mark

--
Mark Taylor   Starlink Programmer   Physics, Bristol University, UK
m.b.taylor@bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/

=============================================================================
From: Mark Taylor
To: Pedro Osuna
Cc: Martin Hill, Brian Thomas, Jesus Salgado, Kirk Borne,
    Jonathan McDowell, Elizabeth Auden, Ed Shaya
Subject: Re: First-attempt Catalogue DM for discussion
Date: Wed, 15 Sep 2004 19:56:25 +0100 (BST)

Pedro,

Your comments seem like a good starting point.
In particular: On Wed, 2004-09-01 at 16:10, Pedro Osuna wrote: > I would say that for the time being, and again following the > recommendation of the DM Team Leader, we might want to concentrate in > modeling the CATALOGUE in the first option (2-D tables) and define the > AstronomicalObject and Events, etc. as mentioned before. This could give > people an idea of how we want to proceed and then evolve further when > discussions start. I agree with this. 2-d catalogues represent a well-defined structure and the questions to be answered are relatively clear, as you've outlined. While there are many situations in which one wants to think in terms of more complicated data structures than this, I'd say that such situations fall more into the domain of data processing than of data modelling. If a simple and well-defined model for catalogues is available, users can operate on them (for instance perform various kinds of joins) in customised ways which don't have to be codified by the IVOA. In my opinion, attempting to come up with a model which can cope with the various kinds of "n-dimensional" tables would risk producing something which is too complicated to be implemented and/or too restrictive to be useful. Mark -- Mark Taylor Starlink Programmer Physics, Bristol University, UK m.b.taylor@bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/ ============================================================================= From: Clive Davenhall To: Pedro.Osuna@esa.int Subject: Catalogue subgroup of the IVOA Data Modelling group. Date: Thu, 16 Sep 2004 11:02:38 +0100 (BST) 16/9/04. Pedro, I saw a copy of the message that you circulated a couple of weeks ago inviting expressions of interest in joining a Catalogue Subgroup of the IVOA Data Modelling group. I'd like to become involved in this Catalogue Subgroup, if this is still possible. Obviously I only have a limited amount of time available for this work, but that is always the case. 
Maybe I should mention that I've worked on developing astronomical
catalogue software for many years and have been involved in the VOTable
work. Maybe you could let me know whether you're still open to new
members.

regards,

Clive.

-----------------------------------------------------------------------------
Clive Davenhall
Institute for Astronomy,      e-mail (internet, JANET): acd @ roe.ac.uk
Royal Observatory Edinburgh,  fax from within the UK:   0131-668-8416
Blackford Hill, Edinburgh,    fax from overseas:        +44-131-668-8416
EH9 3HJ, Scotland.

=============================================================================
From: Ed Shaya
To: Brian Thomas
Cc: Mark Taylor, Pedro Osuna, Martin Hill, Jesus Salgado, Kirk Borne,
    Jonathan McDowell, Elizabeth Auden
Subject: Re: First-attempt Catalogue DM for discussion
Date: Thu, 16 Sep 2004 10:54:22 -0400

All,

Brian said most of what I would also say, but I would add that this is
not a matter of deciding how many N of an N-dimensional cube to start
at. The issue should be whether we want a cube, tree, or directed graph
(nodes with pointers to other nodes in a random space). The N-cube does
not work for the use cases that I gave (before we went into a huddle).
And as Brian mentions, that would just be a minor upgrade to VOTable.

A tree is simple, can be described fairly well in a schema and therefore
has advantages in query. It also happens to be the way the universe is
structured:

universe
region, cluster, field
region, galaxy, QSO, absorption line system (ALS), ICM
region, stellar cluster, IGM, molecular cloud
region, stellar system
region, planet, star, asteroid, comet
region, surface, core, layer

where each line is "contained" by the previous line.

Another advantage of using such a hierarchy is that a set of Catalogs
can be neatly merged into a new larger Catalog. Or, another way to think
of this, a query over many Catalogs can be expressed as if it is over a
single Catalog.
You can call that all processing, I suppose, but it sure is easier if the data model supports it! Directed graph has the advantage of allowing any topological connectedness and it is supported by both OWL and Topic Maps. While I am a big fan of such things, the tree is simpler, closely matches the actual relationships between objects in our discipline, and one can support the occasional link between distant objects in a tree with a relationship pointer. Ed Brian Thomas wrote: > Hi All, > > Been silent recently because of being mobbed at work..but I wanted to > throw in my 2 cents on this.. > >On Wednesday 15 September 2004 06:56 pm, Mark Taylor wrote: > > >>Pedro, >> >>Your comments seem like a good starting point. In particular: >> >>On Wed, 2004-09-01 at 16:10, Pedro Osuna wrote: >> >> >> >>>I would say that for the time being, and again following the >>>recommendation of the DM Team Leader, we might want to concentrate in >>>modeling the CATALOGUE in the first option (2-D tables) and define the >>>AstronomicalObject and Events, etc. as mentioned before. This could give >>>people an idea of how we want to proceed and then evolve further when >>>discussions start. >>> >>> >>I agree with this. 2-d catalogues represent a well-defined structure and >>the questions to be answered are relatively clear, as you've outlined. >> >> > > I think we have an opportunity to think larger than standard 2D > catalogs here. If the goal of this group is to get a standard out quickly, > then I would be forced to agree that the focus is constrained to 2D > catalogs (objects are simple collection of properties). IF the goal is to > consider longer term effects of what we would like to be able to do with > catalogs, then we are short-changing ourselves. > > I don't think that N-Dimensional catalogs are that hard a problem. I (and Ed) > have a proposal for doing such things already based on the Quantity. 
And > this is no amazing ground that we are breaking..I know that former formats > exist which do the same as well, to wit NDF (I think) and XDF (and there > must be others). > > Another consideration for going beyond 2D is social/political: A narrow focus > on 2D catalogs is essentially a study of astronomical tables. This has been > already done by the VOTable people. Are we then to just rubber-stamp VOTable? > Or appear as if we want to re-invent the wheel? > > I think we should have a clear picture of what the catalog standard should > provide to the VO. What comes to mind are the following: > > 1. Standard for transport/exchange > > 2. Standard for a "catalog" search across the VO > > I've gone a bit out of MDA ordering..but we can form an opinion of whether or > not 2D catalogs will be sufficient in light of a variety of use-cases. I imagine > the above 2 requirements will appear. What use-cases we accept as "needed" > will say whether or not 2D is the only catalog we wish to have (and I present > one such 3D use-case below) > > Now a few specific replies... > > > >>While there are many situations in which one wants to think in terms >>of more complicated data structures than this, I'd say that such >>situations fall more into the domain of data processing than of >>data modelling. >> >> > > I respectfully disagree. Any time you start talking about cataloging > "objects" which are more complex than a simple collection of scalar > properties, you have 3+ dimensions to store. And this isn't theoretical > argument. There is much astronomical data which is better modeled > as higher dimensional catalogs..for example what about a grism image > survey where each spectra is associated with a sky position? Thats clearly > 3D in nature (although I allow that you could "flatten it" to 2D if you liked... > but thats ugly and makes it harder to design an appropriate search). 
> > > >>If a simple and well-defined model for catalogues >>is available, users can operate on them (for instance perform >>various kinds of joins) in customised ways which don't have to be >>codified by the IVOA. In my opinion, attempting to come up with a >>model which can cope with the various kinds of "n-dimensional" tables >>would risk producing something which is too complicated to be >>implemented and/or too restrictive to be useful. >> >> > > I do agree that we need a simple 2D catalog that will be used in 60% of cases. > But all that is needed there is to see that the full N-D model can "collapse" to > the 2D case. If people are interested, I can present a possible model that > does this. > > Laters, > > =b.t. > > > ============================================================================= From: Brian Thomas To: Mark Taylor Cc: Pedro Osuna , Martin Hill , Jesus Salgado , Kirk Borne , Jonathan McDowell , Elizabeth Auden , Ed Shaya Subject: Re: First-attempt Catalogue DM for discussion Date: Thu, 16 Sep 2004 16:57:10 +0000 Hi All, Been silent recently because of being mobbed at work..but I wanted to throw in my 2 cents on this.. On Wednesday 15 September 2004 06:56 pm, Mark Taylor wrote: > Pedro, > > Your comments seem like a good starting point. In particular: > > On Wed, 2004-09-01 at 16:10, Pedro Osuna wrote: > > > I would say that for the time being, and again following the > > recommendation of the DM Team Leader, we might want to concentrate in > > modeling the CATALOGUE in the first option (2-D tables) and define the > > AstronomicalObject and Events, etc. as mentioned before. This could give > > people an idea of how we want to proceed and then evolve further when > > discussions start. > > I agree with this. 2-d catalogues represent a well-defined structure and > the questions to be answered are relatively clear, as you've outlined. I think we have an opportunity to think larger than standard 2D catalogs here. 
If the goal of this group is to get a standard out quickly, then I would be forced to agree that the focus is constrained to 2D catalogs (objects are simple collection of properties). IF the goal is to consider longer term effects of what we would like to be able to do with catalogs, then we are short-changing ourselves. I don't think that N-Dimensional catalogs are that hard a problem. I (and Ed) have a proposal for doing such things already based on the Quantity. And this is no amazing ground that we are breaking..I know that former formats exist which do the same as well, to wit NDF (I think) and XDF (and there must be others). Another consideration for going beyond 2D is social/political: A narrow focus on 2D catalogs is essentially a study of astronomical tables. This has been already done by the VOTable people. Are we then to just rubber-stamp VOTable? Or appear as if we want to re-invent the wheel? I think we should have a clear picture of what the catalog standard should provide to the VO. What comes to mind are the following: 1. Standard for transport/exchange 2. Standard for a "catalog" search across the VO I've gone a bit out of MDA ordering..but we can form an opinion of whether or not 2D catalogs will be sufficient in light of a variety of use-cases. I imagine the above 2 requirements will appear. What use-cases we accept as "needed" will say whether or not 2D is the only catalog we wish to have (and I present one such 3D use-case below) Now a few specific replies... > While there are many situations in which one wants to think in terms > of more complicated data structures than this, I'd say that such > situations fall more into the domain of data processing than of > data modelling. I respectfully disagree. Any time you start talking about cataloging "objects" which are more complex than a simple collection of scalar properties, you have 3+ dimensions to store. And this isn't theoretical argument. 
There is much astronomical data which is better modeled as higher dimensional catalogs..for example what about a grism image survey where each spectra is associated with a sky position? Thats clearly 3D in nature (although I allow that you could "flatten it" to 2D if you liked... but thats ugly and makes it harder to design an appropriate search). > If a simple and well-defined model for catalogues > is available, users can operate on them (for instance perform > various kinds of joins) in customised ways which don't have to be > codified by the IVOA. In my opinion, attempting to come up with a > model which can cope with the various kinds of "n-dimensional" tables > would risk producing something which is too complicated to be > implemented and/or too restrictive to be useful. I do agree that we need a simple 2D catalog that will be used in 60% of cases. But all that is needed there is to see that the full N-D model can "collapse" to the 2D case. If people are interested, I can present a possible model that does this. Laters, =b.t. -- * Dr. Brian Thomas * Dept of Astronomy/University of Maryland-College Park * Code 630.1/Goddard Space Flight Center-NASA * fax: (301) 286-1775 * phone: (301) 286-6128 [GSFC] (301) 405-2312 [UMD] ============================================================================= From: Mark Taylor To: Brian Thomas Cc: Pedro Osuna , Martin Hill , Jesus Salgado , Kirk Borne , Jonathan McDowell , Elizabeth Auden , Ed Shaya Subject: Re: First-attempt Catalogue DM for discussion Date: Fri, 17 Sep 2004 11:48:10 +0100 (BST) Brian, On Thu, 16 Sep 2004, Brian Thomas wrote: > > I think we have an opportunity to think larger than standard 2D > catalogs here. If the goal of this group is to get a standard out quickly, > then I would be forced to agree that the focus is constrained to 2D > catalogs (objects are simple collection of properties). 
IF the goal is to > consider longer term effects of what we would like to be able to do with > catalogs, then we are short-changing ourselves. > > I don't think that N-Dimensional catalogs are that hard a problem. I (and Ed) > have a proposal for doing such things already based on the Quantity. And > this is no amazing ground that we are breaking..I know that former formats > exist which do the same as well, to wit NDF (I think) and XDF (and there > must be others). NDF doesn't describe anything like a catalogue or table, it only describes a single N-dimensional array of primitives, associating coordinate systems and per-pixel quality flags and error values. However, I realise that XDF does allow much more flexibility than this. I agree that describing a model which can handle N-dimensional tables or other flexible data structures is not in itself that difficult. My concern is that having done so the resulting structure will be difficult (a) for data centres to implement and (b) for VO clients to make sense of (of these I think that (b) is the more serious problem). > Another consideration for going beyond 2D is social/political: A narrow focus > on 2D catalogs is essentially a study of astronomical tables. This has been > already done by the VOTable people. Are we then to just rubber-stamp VOTable? > Or appear as if we want to re-invent the wheel? VOTable addresses the problem of a standard transport/exchange/storage format. It does not address the problem of semantic interpretation of the data thus represented, although to the casual eye it might look like it does. The introduction of the 'utype' attribute in VOTable 1.1 makes this explicit (VOTable 1.1 recommendation sec 4.5). In order to gain semantic information from a VOTable (e.g.: what class of physical object does row #i represent? what is its position on the sky?) you really need to associate elements of the VOTable with elements of a data model. 
You can have a go at this kind of semantic interpretation by grubbing around with UCDs and column names, but it is not a rigorous or reliable way to go about things. So VOTable does stand in need of a data model it can hook up to. I agree that attempting to answer this need is more a workmanlike task than a great voyage of discovery, but I don't think that makes it less worthwhile. > I think we should have a clear picture of what the catalog standard should > provide to the VO. What comes to mind are the following: Excellent idea! > 1. Standard for transport/exchange > > 2. Standard for a "catalog" search across the VO These things are needed, but we don't necessarily have to start from scratch to provide them. *If* we go with 2-d-like tables, then I believe that VOTable may be most of the answer to (1). Clearly (2) has got a lot to do with the VOQL effort. My conception of what we are supposed to be doing is to provide the semantic glue which will permit these things to be able to work. However, I don't recall seeing a terms of reference or mission statement or similar for this group, and some of the disagreements may be as a result of disagreements about our aims. Pedro: what is the question that the catalogue-dm subgroup is supposed to be answering? > Now a few specific replies... > > > While there are many situations in which one wants to think in terms > > of more complicated data structures than this, I'd say that such > > situations fall more into the domain of data processing than of > > data modelling. > > I respectfully disagree. Any time you start talking about cataloging > "objects" which are more complex than a simple collection of scalar > properties, you have 3+ dimensions to store. And this isn't theoretical > argument. There is much astronomical data which is better modeled > as higher dimensional catalogs..for example what about a grism image > survey where each spectra is associated with a sky position? 
That's clearly
> 3D in nature (although I allow that you could "flatten it" to 2D if you
> liked...
> but that's ugly and makes it harder to design an appropriate search).

If I understand correctly the structure you're talking about (in its 'flattened' form a 2-d table with cells in some columns containing a vector of numeric values representing a spectrum) then I'm not sure I agree that the flattened form is a bad way to deal with it. VOTable and FITS are quite happy to deal with N-dimensional arrays of primitives in a cell like this. As for searching - is a search on particular pixels of an array of this sort the kind of thing you'd want to do? I'd have thought that to search on the characteristics of a spectrum you would typically need the whole thing so you could fit lines etc, but perhaps I'm just not familiar enough with this sort of thing - can you give an example?

Mark

--
Mark Taylor  Starlink Programmer  Physics, Bristol University, UK
m.b.taylor@bris.ac.uk  +44-117-928-8776  http://www.star.bris.ac.uk/~mbt/

=============================================================================
From: Mark Taylor
To: Ed Shaya
Cc: Brian Thomas, Pedro Osuna, Martin Hill, Jesus Salgado, Kirk Borne, Jonathan McDowell, Elizabeth Auden
Subject: Re: First-attempt Catalogue DM for discussion
Date: Fri, 17 Sep 2004 11:51:34 +0100 (BST)

Ed,

On Thu, 16 Sep 2004, Ed Shaya wrote:

> All,
> Brian said most of what I would also say, but I would add that this
> is not a matter of deciding how many N of an N-dimensional cube to start
> at.  The issue should be whether we want a cube, tree, or directed graph
> (nodes with pointers to other nodes in a random space).  The N-cube
> does not work for the use cases that I gave (before we went into a
> huddle).  And as Brian mentions, that would just be a minor upgrade to
> VOTable.  A tree is simple, can be described fairly well in a
> schema and therefore has advantages in query.
It also happens to be
> the way the universe is structured:
>    universe
>    region, cluster, field
>    region, galaxy, QSO, absorption line system (ALS), ICM
>    region, stellar cluster, IGM, molecular cloud
>    region, stellar system
>    region, planet, star, asteroid, comet
>    region, surface, core, layer
> where each line is "contained" by the previous line.
> Another advantage of using such a hierarchy is that a set of Catalogs
> can be neatly merged into a new larger Catalog.  Or, another way to
> think of this, a query over many Catalogs can be expressed as if it is
> over a single Catalog.  You can call that all processing, I suppose, but
> it sure is easier if the data model supports it!

It's true that there's a whole load of possible data structures out there and we'd better think carefully before we settle on one.

There are powerful aspects to a tree-like structure, but one important question is: how do you define what tree you're going to use? I'm not sure if you're suggesting that we decide on a particular tree that looks something like the above and hardwire that into the catalogue data model. As you say it would make certain sorts of query very tidy, but people are always going to want some items which don't fit in well - for instance where do you put an observed object for which you have provenance, magnitudes and spectra, but as yet no classification? The opposite approach is to have the tree in use defined on a per-catalogue basis, but this loses you a lot of the interoperability benefits. Which did you have in mind?
Mark

--
Mark Taylor  Starlink Programmer  Physics, Bristol University, UK
m.b.taylor@bris.ac.uk  +44-117-928-8776  http://www.star.bris.ac.uk/~mbt/

=============================================================================
From: Ed Shaya
Reply-To: Ed Shaya
To: Data Model IVOA List
Subject: [Fwd: Re: First-attempt Catalogue DM for discussion]
Date: Fri, 17 Sep 2004 11:29:33 -0400

-------- Original Message --------
Subject: Re: First-attempt Catalogue DM for discussion
Date: Fri, 17 Sep 2004 08:55:23 -0400
From: Ed Shaya
To: Mark Taylor

>There are powerful aspects to a tree-like structure, but one important
>question is: how do you define what tree you're going to use?
>I'm not sure if you're suggesting that we decide on a particular
>tree that looks something like the above and hardwire that into the
>catalogue data model.  As you say it would make certain sorts of
>query very tidy, but people are always going to want some items
>which don't fit in well - for instance where do you put an
>observed object for which you have provenance, magnitudes and spectra,
>but as yet no classification?  The opposite approach is to have the tree
>in use defined on a per-catalogue basis, but this loses you a lot
>of the interoperability benefits.  Which did you have in mind?
>
>Mark
>

Both! If you don't know the classification yet, they are astroObjects, which are permitted at any level.

If they are real and you know from Lyman limit cutoff that they are beyond cz=8.3

8.3

If it is simulated and it is some unclassifiable clump of mass points within a cluster:

Besides having Universe as a top-level element we need other possibilities: universe, DataCenters, Filters, astroObject, etc. One chooses the top-level element depending on what the catalog is about.

By the way, the fact that Universe and universe have different meanings in English is one of several reasons why I disagreed with a standard that uses capitals to indicate an element.
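As an illustration only of the "astroObject permitted at any level" idea above, here is a minimal Python sketch of a containment tree in which an unclassified object sits under a cluster. The `Node` class, its fields, and the example names are assumptions made for this sketch; they are not part of any proposed Catalogue schema.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """One element of a containment tree (hypothetical structure)."""
    kind: str                       # e.g. "universe", "cluster", "astroObject"
    name: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child and return it, so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], "Node"]]:
        """Yield (path of kinds from the root, node) for every node, depth first."""
        here = path + (self.kind,)
        yield here, self
        for child in self.children:
            yield from child.walk(here)


# Top-level element chosen for a catalogue "about" the universe.
universe = Node("universe", "U")
cluster = universe.add(Node("cluster", "C1"))
galaxy = cluster.add(Node("galaxy", "G1"))
galaxy.add(Node("star", "S1"))
# An object with no classification yet: a generic astroObject inside the cluster.
cluster.add(Node("astroObject", "X1"))

unclassified = [(path, node.name) for path, node in universe.walk()
                if node.kind == "astroObject"]
```

The containment path of each node is recovered from its position in the tree, which is the property the tree approach relies on; a catalogue about something else would simply start from a different root `kind`.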
Ed

=============================================================================
From: Ed Shaya
Reply-To: Ed Shaya
To: Mark Taylor, Data Model IVOA List
Subject: Re: First-attempt Catalogue DM for discussion
Date: Fri, 17 Sep 2004 11:30:03 -0400

Mark Taylor wrote:

>
>VOTable addresses the problem of a standard transport/exchange/storage
>format.  It does not address the problem of semantic interpretation
>of the data thus represented, although to the casual eye it might look
>like it does.  The introduction of the 'utype' attribute in VOTable 1.1
>makes this explicit (VOTable 1.1 recommendation sec 4.5).  In order to
>gain semantic information from a VOTable (e.g.: what class of physical
>object does row #i represent? what is its position on the sky?)
>you really need to associate elements of the VOTable with elements
>of a data model.  You can have a go at this kind of semantic
>interpretation by grubbing around with UCDs and column names, but it
>is not a rigorous or reliable way to go about things.
>
>So VOTable does stand in need of a data model it can hook up to.
>I agree that attempting to answer this need is more a workmanlike task
>than a great voyage of discovery, but I don't think that makes it
>less worthwhile.
>

Mark,

I agree with you. We need to discuss semantics as well. We need to decide if UCDs are sufficient, or UCDs plus utypes, or something else altogether. So let's start.

What do we need to know about the items within a field/column semantically?

What they are to the highest degree of specificity: e.g., variable giant stars in binary systems.
What their relationship is to ID_MAIN: e.g., hosts to the planets in ID_MAIN.
What their relationship is to other objects in the catalog: e.g., components of Deneb and observed in image 12.

Is there anything else that is needed?
So a simple table of planets in orbit about components of a binary star would normally look like this:

ID_MAIN   component   e
P1        A           0.1     112
P2        B           0.3     22
P3        A,B         undef   6

How would we put this into Catalog in such a way that it is machine understandable?

With just a UCD-type mechanism (without actually looking it up in our present UCD system, which?) we would get something like this for the component column: "stellar; binary component, giant, variable planetary system; host star" (This is a parsing nightmare). For the e column, we have "orbit; ellipticity". For the fourth column, we have "orbit; mean velocity". And we need some sort of link from component to the binary star name Deneb (perhaps in an upper level table) and image 12.

What is missing here (aside from sanity) is that nowhere does the table say that the planets in ID_MAIN orbit the stars in component! Humans comprehend it pretty fast, but a computer would not.

If I understand utype (I am not on any discussion group that has discussed utype), utype tries to provide some additional knowledge here. One provides a model of gravitational orbits and points from the table fields to parts of the model. I could imagine having an OWL description of orbitingSystem: 2 objects in circular motion about each other. A subclass would be a planetaryOrbitingSystem. It would require that at least one of its components is a planet. The ID_MAIN would have a utype that indicated these are planets and another pointer to indicate component of planetaryOrbitingSystem, while the component would point to planetaryOrbitingSystem and binarySystem, where binarySystem is another subclass of orbitingSystem with two similar components. Ellipticity could also point to something in the "model". However, it is ambiguous whether the ellipticity refers to the orbit of components A and B in Deneb or to the planets around their hosts. Same with mean velocity.

In a Catalog with a tree approach, information can be inserted directly into the right locations.
We can know that the velocity applies to the planet because it is a child of planet. If it were the velocity of the component star, then it would be a child of that star. In the following, the binary system Deneb and its components would be introduced as ancestors to the planets.

K Giant 0.1 km/s 112 K Giant 0.3 km/s 22 km/s 112

The relationships and containments are now clear and can be parsed by standard XML tools. I introduced an orbit which takes around, ellipticity, and velocity. I allow around to take one or more objects to allow a planet to go around A and B, although I could have said it goes around Deneb. Perhaps if it is done this way it would mean that it weaves between components, and the other way it means it just goes completely around the system.

=============================================================================
From: Pedro Osuna
To: Martin Hill, Brian Thomas, Jesus Salgado, Kirk Borne, Jonathan McDowell, Elizabeth Auden, Ed Shaya, Mark Taylor, Clive Davenhall, Mireille Louys
Cc: po, js
Subject: Latest issues before Poona
Date: Mon, 20 Sep 2004 17:12:58 +0200

Dear all,

Clive Davenhall joined recently, welcome Clive. Also, after a very interesting two-day meeting with CDS people here in Madrid, I have stubbornly invited Mireille Louys to join the group as a representative of the CDS (in view of the absence of any answer from them), an invitation which she has kindly accepted ;-). Therefore, welcome Mireille as well.

Our reduced group therefore grows to:

Elizabeth Auden
Kirk Borne
Clive Davenhall
Martin Hill
Mireille Louys
Jonathan McDowell (DM Chairman)
Pedro Osuna (Catalogue subgroup DM chairman)
Jesus Salgado
Ed Shaya
Mark Taylor
Brian Thomas

I have been asked by Jonathan whether I could present something for the Poona meeting. I have, as you've probably seen, answered that the main discussion topic is, as of yet, _what_ we are trying to model.
I believe I will make a presentation whose main discussion topic is whether we want to model only Source Catalogues for the time being (the original intention from Jonathan) or more general catalogues, including N-dimensional ones. I intend the presentation to be the "official" birth of the Catalogue Data Model subgroup, as soon as people start the on-line discussions on what we should and should not be modeling.

In this respect, I send below our (Jesus Salgado's and my) own ideas with regard to the 2-D versus N-D issue.

I still believe there is confusion on what we mean by a catalogue. In our opinion, the n-dimensionality of a catalogue (and this is the sense I tried to give to it in my initial mails) lies in allowing entries in catalogues to have entries inside as well, i.e., to be catalogues themselves, and also in allowing different "entry types" to coexist in the same catalogue. The former approach would pose problems like infinite loops (an entry can have references inside to a catalogue that might be the same as the catalogue it belongs to). The latter would give rise to multiple-inheritance problems (a source could have been observed by several observations, and would therefore appear as both an "event" (not an astroEvent, as pointed out by Ed) and an AstroObject (through the inheritance of a Source)). In summary, that type of abnormal behaviour coming from allowing entries inside entries is the type of (probably wrongly called) N-dimensionality I was talking about.

Other types of N-dimensionality can find their place naturally in our model of a catalogue. For example, in the first-attempt data model I sent, there is certainly N-dimensionality in the sense that, e.g., a Source makes reference to Observation, StcWcs, and Quantity, which are other Data Models being worked on. Therefore, other dimensions are allowed through the entries themselves, but the dimensions come from other Data Models and not from the model of the catalogue.
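To illustrate the distinction drawn above, the following Python sketch shows a catalogue that is a flat collection of entries, where any extra "dimensions" come only from references to other models. The class names Source, Observation, StcWcs and Quantity are taken from this message; every field shown is an assumption made for the sketch and is not taken from the first-attempt data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Quantity:
    """Stand-in for the Quantity data model (fields are assumed)."""
    value: float
    unit: str


@dataclass
class StcWcs:
    """Stand-in for the STC/WCS data model: a sky position in degrees (assumed)."""
    ra_deg: float
    dec_deg: float


@dataclass
class Observation:
    """Stand-in for the Observation data model (fields are assumed)."""
    obs_id: str


@dataclass
class Source:
    """A catalogue entry; it refers to other models but never contains a catalogue."""
    name: str
    position: StcWcs
    likelihood: Quantity
    observations: List[Observation] = field(default_factory=list)


@dataclass
class Catalogue:
    """A flat collection of entries of one type."""
    title: str
    entries: List[Source] = field(default_factory=list)


demo = Catalogue(
    title="demo catalogue",
    entries=[
        Source(
            name="S1",
            position=StcWcs(ra_deg=10.0, dec_deg=-5.0),
            likelihood=Quantity(value=12.5, unit=""),
            observations=[Observation("obs-1"), Observation("obs-2")],
        )
    ],
)
```

Because `Catalogue.entries` can only hold `Source` objects, an entry cannot be a catalogue itself, so the reference loops described above cannot arise in this sketch.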
Things like trees or directed graphs are, in my view, ways to represent data, rather than data models in themselves. I appreciate that some astronomical objects are better represented in a tree-like form, but again this is not the objective of the Catalogue Data Model, as we are not trying to model all the objects in the universe. In line with this, I have already raised my concern that we will have to model astronomical objects on a one-by-one basis, i.e., we might start by modelling the Source object (the initial idea of the whole of the Catalogue DM) and then go ahead to model other objects, like Galaxy, or Star, etc. Again, the modeling of the whole Universe in this sense might take a long time, and whether it is worth the effort would have to be weighed...

Also, I do not believe that VOTable is a data model for anything, but just a representation of data. VOTable is not a data model for a table; it is just an agreed way to represent tables. Nor does it give any information about _the model_ itself, only about how to represent it. Data model and representation are separate things in my view.

Nor is it, in my understanding, the intention of the Catalogue Data Model to give answers to querying problems, as mentioned by some of you. The data models are just one part of the overall picture of the VO, and they will have to be isolated enough from other parts of the VO that they can be plugged in without friction. In this sense, it will be up to the VOQL to define _how_ to access resources, and our models will have to be independent enough to adapt to whatever way of querying is defined. How effectively joins work between different catalogues will not depend, then, on the model but on how well the catalogue has been structured.

As a summary of our view, I would like to call your attention to a very easy Use Case proposed by Mireille that I believe is a very good example of what a Catalogue Data Model might be used for.
This is the crossmatch of two sources found in two different catalogues. An example of this could be a VO client that wants to be able to call two different catalogues to check whether the source found by project X at (RA, DEC) in the infrared corresponds to the source found around (RA, DEC) by the radio project Y. In my view, the steps that this tool would have to execute are the following:

- contact the "nearest" registry and ask for "Catalogue" resources (this is the "Registry" part in the first-attempt DM I sent. It is part of the registry data of the Catalogue object itself).
- select the Infrared and Radio catalogues available (this is part of the registry data of the Catalogue)
- select entries in the catalogue around (RA, DEC) with a certain size (this is in the Coordinates part of the Sources object)
- from those entries, select the ones above a certain likelihood (this is part of the Quantity data, however this one is represented, of the Source)
- run whatever algorithm is used to decide whether the sources represent the same object (I guess this is up to the client; the quantity and quality of those algorithms can be varied)
- if there is an image of the source, overlay images from the different sources in the different wavelengths (this is again at the entry (Source) level).
- decide if the source emitting in the IR corresponds or not to the one in the radio (this last one is a human action).

In this specific case, the "N-dimensionality" has come from the Registry model, the Source model, the Quantity model and the STC model, whereas we have always been working with only a couple of "2-D" catalogues (in the sense of being an X-Y collection).

Mireille asked me why, in the model I sent, a Galaxy is separated from a Source, as a Source "could be a Galaxy". In this very first attempt model, there is a place reserved for "Type" in the Source Object. That type would be a reference to the Galaxy object, in the case that the source happened to be a Galaxy.
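As a rough illustration of the crossmatch use case in this message, together with the "Type" placeholder just mentioned, here is a Python sketch. It covers only the positional selection, the likelihood cut and a nearest-neighbour match; the registry lookup, the image overlay and the final human decision are left out. All names, fields, thresholds and example values are assumptions made for the sketch, and the nearest-neighbour rule is just one possible matching algorithm.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Source:
    """Hypothetical catalogue entry; object_type plays the role of the "Type" placeholder."""
    name: str
    ra_deg: float
    dec_deg: float
    likelihood: float
    object_type: Optional[str] = None  # None while the classification is unknown


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees, using the haversine formula."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_select(entries: List[Source], ra: float, dec: float,
                radius_deg: float, min_likelihood: float) -> List[Source]:
    """Keep entries within radius_deg of (ra, dec) and above the likelihood cut."""
    return [s for s in entries
            if s.likelihood >= min_likelihood
            and angular_separation_deg(s.ra_deg, s.dec_deg, ra, dec) <= radius_deg]


def crossmatch(cat_a: List[Source], cat_b: List[Source],
               tolerance_deg: float) -> List[Tuple[Source, Source]]:
    """Pair each entry of cat_a with its nearest entry of cat_b, if close enough."""
    def sep(a: Source, b: Source) -> float:
        return angular_separation_deg(a.ra_deg, a.dec_deg, b.ra_deg, b.dec_deg)

    pairs = []
    for a in cat_a:
        nearest = min(cat_b, key=lambda b: sep(a, b), default=None)
        if nearest is not None and sep(a, nearest) <= tolerance_deg:
            pairs.append((a, nearest))
    return pairs


# Made-up example entries for an infrared and a radio catalogue.
infrared = [Source("IR-1", 150.000, 2.000, 25.0),
            Source("IR-2", 150.300, 2.000, 30.0),
            Source("IR-3", 150.001, 2.001, 3.0)]
radio = [Source("R-1", 150.0005, 2.0003, 40.0),
         Source("R-2", 151.000, 3.000, 50.0)]

ir_hits = cone_select(infrared, 150.0, 2.0, radius_deg=0.1, min_likelihood=10.0)
radio_hits = cone_select(radio, 150.0, 2.0, radius_deg=0.1, min_likelihood=10.0)
pairs = crossmatch(ir_hits, radio_hits, tolerance_deg=5.0 / 3600.0)  # 5 arcsec
```

Here `pairs` holds only candidate associations; whether the infrared and radio sources really are the same object remains a decision for the user.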
However, in most cases there is not enough information about a Source to know whether it is a Galaxy or a Quasar or whatever (see, e.g., the 1XMM Catalogue) and therefore the placeholder for Type can be left empty. If better knowledge of a specific source becomes available in the future, data can be added to the particular Object which the source turned out to be. On the other hand, and in view of the fact that people wanted to have all sorts of catalogues, and not only Sources, there could be a catalogue of Galaxies (e.g., some of the VizieR ones) and in those, the entries would just be Galaxies and therefore modeled after the Galaxy object.

As a final summary, I believe that the first-attempt data model covers most of what has been said in all your mails, except for all details regarding UCDs, Semantics etc., which I believe have to be tackled later. It covers N-dimensionality through the link to other objects while keeping the simplicity of modeling "simple X-Y" catalogues (as opposed to "complex" catalogues of catalogues) (I have intentionally left out 2-D wording here).

Cheers,
P.

--
Pedro Osuna Alcalaya
Software Engineer
European Space Astronomy Center (ESAC/ESA)
e-mail: Pedro.Osuna@esa.int
Tel + 34 91 8131314

European Space Astronomy Center
European Space Agency
P.O. Box 50727
E-28080 Villafranca del Castillo
MADRID - SPAIN

=============================================================================
From: Jonathan McDowell
To: Pedro.Osuna@sciops.esa.int
Subject: Catalog requirements
Date: Mon, 20 Sep 2004 11:16:31 -0400 (EDT)

Pedro,

I think I forgot to send you some possible requirements that Tony Linde sent to me in July. Here they are, belatedly.
Jonathan

> From: "Tony Linde"
> To: "'Jonathan McDowell'"
> Cc: "Andy Lawrence"
> Subject: Catalog/Tabular data model
> Date: Mon, 19 Jul 2004 15:55:29 +0100
>
> Hi Jonathan,
>
> Andy asked me to send you some requirements for a catalogue/tabular data
> model as needed by Registry and for a prototype DM-based data exchange
> mechanism.
>
> I think the focus of this needs to be a small team, defining the
> requirements and working on a first draft, much as you've done in the past.
> I'd suggest you, me and someone from each of NVO, AstroGrid and CDS plus
> whoever is already working on this for your workgroup - what do you think?
> Personally, I think Mireille should be in it because of her work on IDHA or
> am I misreading that? I'm not sure yet who from AstroGrid should come in:
> Martin has done work on this but is rather confrontational; Elizabeth has
> done good work but is more used to Solar area (though that might be
> beneficial).
>
> The scope of the DM is that it ought to be able to model systems like LEDAS,
> VizieR, SDSS etc.
>
> Some initial requirements might be:
>
> 1. implementation agnostic (though I'm most interested in xsd).
>
> 2. able to model any catalog and tabular based data (and the route by which
> it is known?), from the level of the data centre down to columns in the
> tables, via catalogs and other intermediate representations.
>
> 3. able to represent the structure of holdings (to the extent to which this
> is needed for querying etc).
>
> 4.
allows for the modelling of 'associated' metadata such as observation log
> details etc.
>
> But all these ought be developed by the working party we set up.
>
> Anyway, let me know if you think this is enough to get started with and who
> we ought to approach to join the party.
>
> Cheers,
> Tony.
>
> __
> Tony Linde
> Phone: +44 (0)116 223 1292    Mobile: +44 (0)7753 603356
> Fax: +44 (0)116 252 3311      Email: ael@star.le.ac.uk
> Post: Department of Physics & Astronomy,
>       University of Leicester
>       Leicester, UK LE1 7RH
>
> Project Manager,            Director,
> AstroGrid                   Leicester e-Science Centre
> http://www.astrogrid.org    http://www.e-science.le.ac.uk/
>
=============================================================================