===================================================================
From: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
Reply-To: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
To: 	dm@ivoa.net
Cc: 	Pedro.Osuna@esa.int, Christophe.Arviset@esa.int, Jesus.Salgado@sciops.esa.int
Subject: 	[CATALOGUE]Starting Data Model Subgroup
Date: 	Mon, 26 Jul 2004 12:48:16 +0200	
Dear all,

at the last IVOA meeting in Cambridge, Boston, I approached Jonathan to
get the vacant responsibility of coordinating the efforts in a
"Catalogue" subgroup of the Data Model.

After his agreement, I'm sending this note to ask for volunteers to join
this subgroup and start getting inputs from all of you to compile
information that would eventually become a Catalogue Data Model
recommendation. 

To put a bit of flesh on what I understand we are after, I am sending
you some brainstorming on the whole idea of the Catalogue DM, in the
hope that it helps to start proper discussions on the issue.

Further mails on this will appear with a [CATALOGUE] heading so that
they can be conveniently filtered/trashed.

Thank you.

Cheers,
Pedro Osuna. 


Catalogue Data Model Subgroup starting inputs
---------------------------------------------

In order to build a proper Data Model for Catalogues, I think it would
be important to answer the following questions:

1) What is a Catalogue?
2) What is a Catalogue used for?
3) Why do we want to model Catalogues?
4) Where do Catalogues find a place within the VO?
5) What are the interesting Use Cases for a Catalogue DM?


The most important thing in the first stages of this work is to
identify what exactly we mean by a Catalogue, so that we come to a
common agreement on what we will be modeling.
Some of my own views on the definition of a Catalogue follow, with the
idea that they serve as a starting point for discussion.


DEFINITION OF A CATALOGUE
-------------------------

From "Webster's Revised Unabridged Dictionary (1913)":

"[...]A list or enumeration of names, or articles arranged
methodically, often in alphabetical order; as, a catalogue of
the students of a college, or of books, or of the stars.[...]"


In the case of astronomy, thus, a catalogue would be a list or
enumeration of certain astronomical objects (to be clarified later) in a
certain order and including certain information per object.

The definition of an astronomical object in this context would vary.
An astronomical object could be anything from Stars to Galaxies, etc.,
but also something more general like Observations, Sources or
Observatories.

In this sense, the Catalogue data model would not have to describe the
inner details of the object it is cataloging, which should be described
in other data models, but just the information relevant for the
catalogue itself.  It is also true that some of the internal properties
of the astronomical objects would appear in the catalogue itself through
its columns.

For example, the XMM-Newton "1XMM" catalogue is a list of serendipitous
sources detected by the satellite in its observing campaign. The model
for this catalogue could consist of things like the provenance (ESA),
number of columns (400), number of rows (~32000), etc., or it might give
more relevant information like: column number three in the catalogue is
the Source.likelihood, where likelihood is an attribute of the Source
Data Model.
I think this is an interesting point for discussion.
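
As a rough illustration of the two levels of description mentioned
above, here is a minimal Python sketch. It is only one possible reading,
not a proposed model: the class names, the placeholder column names
"column_1" and "column_2", and the dotted path syntax are all
assumptions made for the example. Only the provenance, the approximate
row count and the binding of column three to Source.likelihood come
from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnDescription:
    """One catalogue column, optionally bound to an attribute of another data model."""
    position: int                          # 1-based column number
    name: str                              # placeholder names in this example
    model_attribute: Optional[str] = None  # e.g. "Source.likelihood" (assumed syntax)


@dataclass
class CatalogueDescription:
    """Catalogue-level metadata only; the catalogued objects are modelled elsewhere."""
    name: str
    provenance: str
    n_rows: int
    columns: List[ColumnDescription] = field(default_factory=list)

    def columns_bound_to(self, model: str) -> List[ColumnDescription]:
        """Return the columns whose bound attribute belongs to the given data model."""
        prefix = model + "."
        return [c for c in self.columns
                if c.model_attribute is not None
                and c.model_attribute.startswith(prefix)]


one_xmm = CatalogueDescription(
    name="1XMM",
    provenance="ESA",
    n_rows=32000,  # approximate figure quoted in the text
    columns=[
        ColumnDescription(1, "column_1"),  # hypothetical placeholder
        ColumnDescription(2, "column_2"),  # hypothetical placeholder
        ColumnDescription(3, "likelihood", model_attribute="Source.likelihood"),
    ],
)

# Columns that expose attributes of the (separately defined) Source model.
bound = one_xmm.columns_bound_to("Source")
```

The point of the sketch is that the catalogue description only refers
to Source.likelihood by name; it does not define what a Source is.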


A place to find literally thousands of catalogues is the CDS, where they
have 5587 Catalogues available. Their classification of the catalogues
follows the type of data they are cataloging, e.g., Astrometric Data,
Photometric Data, Spectroscopic Data, etc. The same question as above
arises: would we have to create a specific data model for each of the
astronomical object categories we might be cataloging?

It would be nice, in passing, to get someone from CDS directly involved
in this subgroup, given their experience in catalogues.


A point to clarify as well is whether a catalogue, in the Data Model
sense, has to be two-dimensional or can have more than two dimensions.
What I mean by this is that, for example, we might have two different
catalogues for the same set of objects, one for filter A and the other
for filter B. In the Data Model, however, we might have a single
object with just three axes: one for the objects, another for filter A
and another for filter B. The final representations of the catalogues
would always be two-dimensional, but a Data Model representation allowing
more axes would be more compact, powerful and flexible. Whether or not
this would open a Pandora's box, I hope to get people's impressions.
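
To make the idea of extra axes a little more tangible, here is a small
Python sketch. It assumes one particular reading of the paragraph above:
a single structure indexed by (object, filter), from which each
per-filter catalogue is recovered as a two-dimensional view. The class
and method names are invented for the example.

```python
class MultiAxisCatalogue:
    """Values indexed by (object, axis); each axis projects to a 2-D table."""

    def __init__(self, quantity):
        self.quantity = quantity
        self._cells = {}  # (object_id, axis) -> value

    def set(self, object_id, axis, value):
        self._cells[(object_id, axis)] = value

    def axes(self):
        """All axis labels present, e.g. the filters."""
        return sorted({axis for _, axis in self._cells})

    def table_for(self, axis):
        """Two-dimensional view: rows of (object_id, value) for one axis."""
        return sorted((obj, val)
                      for (obj, ax), val in self._cells.items()
                      if ax == axis)


mags = MultiAxisCatalogue("magnitude")
mags.set("obj1", "filter A", 12.3)
mags.set("obj2", "filter A", 14.1)
mags.set("obj1", "filter B", 11.9)

# Each classical catalogue is one slice of the same structure.
catalogue_a = mags.table_for("filter A")
catalogue_b = mags.table_for("filter B")
```

The values above are made up; the sketch only shows that the two
"catalogues" share one underlying object.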


In summary, there are obvious things to model from a catalogue, like its
provenance, number of columns, type of columns, names of columns, number
of rows, etc., but there are others which might make the model more
interesting and powerful, like including n-dimensions (in the, let's
say, cartesian sense of orthogonal catalogues, not in a relational one)
or linking the objects cataloged with their own data model....

Hope this serves somehow as a starting point.

I will be on holiday until Aug 23; once back, I'll process any inputs
you have sent.


P.S.: In a personal note to me, Jonathan touched on the issue of
whether we should say CATALOG or CATALOGUE and, likewise for other IVOA
standard docs, whether we should use British or American English.
Not being a native speaker, I don't feel I have the right to say anything,
and I apologize beforehand for any lack of accuracy when writing
this, and other, word(s).

-- 
Pedro Osuna Alcalaya

 
Software Engineer
European Space Astronomy Center
(ESAC/ESA)
e-mail: Pedro.Osuna@esa.int
Tel + 34 91 8131314
                                                                                
European Space Agency
VILLAFRANCA Satellites Tracking Station
P.O. Box 50727
E-28080 Villafranca del Castillo
MADRID - SPAIN
=============================================================================

	From: 	Matthew Graham <mjg@cacr.caltech.edu>
To: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
Cc: 	dm@ivoa.net, Pedro.Osuna@esa.int, Christophe.Arviset@esa.int, Jesus.Salgado@sciops.esa.int
Subject: 	Re: [CATALOGUE]Starting Data Model Subgroup
Date: 	Mon, 26 Jul 2004 08:31:16 -0700 (PDT)	

Hi Pedro,

I am curious how you see the difference between the Catalogue DM and
VOTable? Surely, at least, VOTable is the XML serialization of whatever it
is that the Catalogue DM group come up with, so isn't this more a case of
reverse engineering?

        Cheers,

        Matthew
=============================================================================
	From: 	Jonathan McDowell <jcm@head.cfa.harvard.edu>
Reply-To: 	Jonathan McDowell <jcm@head.cfa.harvard.edu>
To: 	dm@ivoa.net
Subject: 	Re: [CATALOGUE]Starting Data Model Subgroup
Date: 	Mon, 26 Jul 2004 11:50:43 -0400 (EDT)	

> VOTable is the XML serialization of whatever it
> is that the Catalogue DM group come up with, so isn't this more a case of
> reverse engineering?

No - the other working groups have made it clear they have requirements
for a standard model for astronomical source catalogs. The VOTable
is a serialization of a simple table, but an astronomical catalog
is more than a table - it will have extra standard metadata linking
the sources to their parent observations and extraction algorithms,
for instance. One output of the Catalogue data model effort will
certainly be a more formal statement of the VOTable model, but another
output will be recommendations for ways to serialize this extra metadata
in VOTable (particular PARAM and FIELD values for certain things, for
example). And yet another output will be an XML schema for those who
prefer to use generic XML, although in the particular case of catalogs
I hope that a VOTable-based serialization will be the preferred 
approach. But just saying "write a VOTable" is not a sufficient spec.
I hope the CDS folks can say a little about how a Vizier README is
converted to VOTable, and others can comment on how pipeline-generated
catalogs should be recorded, and what extra metadata (wavelet scales,
data characterization like wavelength band, etc) are appropriate.
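
As a purely illustrative sketch of what "particular PARAM and FIELD
values" might look like, the Python fragment below builds a tiny
VOTable-shaped document with the standard library. The element names
(VOTABLE, RESOURCE, TABLE, PARAM, FIELD, DATA, TABLEDATA, TR, TD) are
VOTable's own; the PARAM names "parent_observation" and
"extraction_algorithm", their values, and the helper build_votable are
assumptions invented for the example, not an agreed convention.

```python
import xml.etree.ElementTree as ET


def build_votable(params, fields, rows):
    """Build a minimal VOTable-like tree with table-level PARAMs and FIELDs."""
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    # Catalogue-level metadata that does not fit in any column.
    for name, value in params.items():
        ET.SubElement(table, "PARAM", name=name, datatype="char",
                      arraysize="*", value=value)
    # Column descriptions, with a UCD giving the semantics.
    for name, datatype, ucd in fields:
        ET.SubElement(table, "FIELD", name=name, datatype=datatype, ucd=ucd)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for cell in row:
            ET.SubElement(tr, "TD").text = str(cell)
    return votable


votable = build_votable(
    params={
        "parent_observation": "obs-0001",      # hypothetical identifier
        "extraction_algorithm": "wavelet-v1",  # hypothetical label
    },
    fields=[
        ("RA", "double", "pos.eq.ra"),
        ("Dec", "double", "pos.eq.dec"),
    ],
    rows=[(10.5, -3.25)],
)
xml_text = ET.tostring(votable, encoding="unicode")
```

Which metadata should be standardised this way is exactly the open
question; the sketch only shows where such values could sit.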

  - Jonathan
=============================================================================
	From: 	Brian Thomas <thomas@astro.umd.edu>
To: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
Cc: 	Pedro.Osuna@esa.int, Christophe.Arviset@esa.int, Jesus.Salgado@sciops.esa.int
Subject: 	Re: [CATALOGUE]Starting Data Model Subgroup
Date: 	Mon, 26 Jul 2004 12:06:07 -0400	
On Monday 26 July 2004 06:48 am, Pedro Osuna wrote:
> Dear all,
>
> at the last IVOA meeting in Cambridge, Boston, I approached Jonathan to
> get the vacant responsibility of coordinating the efforts in a
> "Catalogue" subgroup of the Data Model.
>
> After his agreement, I'm sending this note to ask for volunteers to join
> this subgroup and start getting inputs from all of you to compile
> information that would eventually become a Catalogue Data Model
> recommendation.
>

        Hi Pedro,

        I have been working on catalog schema for some time, most recently for
        NOAO survey data. 

        I am interested in belonging to this subgroup.


        =b.t.


-- 

  * Dr. Brian Thomas 

  * Dept of Astronomy/University of Maryland-College Park
  * NOAO Science Archive
  * Code 630.1/Goddard Space Flight Center-NASA

  *   fax: (301) 286-1775
  * phone: (301) 286-6128 [GSFC]
           (301) 405-2312 [UMD] 
=============================================================================
	From: 	Roy Williams <roy@caltech.edu>
Reply-To: 	Roy Williams <roy@caltech.edu>
To: 	Jonathan McDowell <jcm@head.cfa.harvard.edu>, dm@ivoa.net
Subject: 	Re: [CATALOGUE]Starting Data Model Subgroup
Date: 	Mon, 26 Jul 2004 11:37:34 -0700	
> The VOTable
> is a serialization of a simple table, but an astronomical catalog
> is more than a table - it will have extra standard metadata linking
> the sources to their parent observations and extraction algorithms,
> for instance.

You will use inheritance, I hope. Not just build everything from scratch?

We already have a simple example. A ConeSearchResponse inherits from the
VOTable model. It is a VOTable that must have RA, Dec, and ID attributes.

We can also inherit curatedTable from Table by adding the VOResource
curation information.
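
A minimal Python sketch of the inheritance idea described above might
look as follows. The class names mirror the ones in this message, but
the constructor signatures and the use of plain column names "ID", "RA"
and "Dec" as the required attributes are simplifying assumptions made
for illustration.

```python
class Table:
    """A plain table: column names plus rows of values."""

    def __init__(self, column_names, rows):
        self.column_names = list(column_names)
        self.rows = [list(r) for r in rows]


class ConeSearchResponse(Table):
    """A table that must carry ID, RA and Dec columns."""

    REQUIRED_COLUMNS = ("ID", "RA", "Dec")

    def __init__(self, column_names, rows):
        missing = [c for c in self.REQUIRED_COLUMNS if c not in column_names]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
        super().__init__(column_names, rows)


class CuratedTable(Table):
    """A table extended with curation information (free-form here)."""

    def __init__(self, column_names, rows, curation):
        super().__init__(column_names, rows)
        self.curation = dict(curation)


response = ConeSearchResponse(["ID", "RA", "Dec", "Vmag"],
                              [["s1", 10.0, -5.0, 12.3]])
curated = CuratedTable(["ID"], [["s1"]], {"publisher": "example data centre"})
```

Anything written against Table keeps working with either subclass,
which is the reuse being argued for.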

Please tell me you are not going to rebuild all this stuff that we already
have....?

Roy
=============================================================================
	From: 	Kirk Borne <borne@rings.gsfc.nasa.gov>
Reply-To: 	Kirk Borne <borne@rings.gsfc.nasa.gov>
To: 	roy@caltech.edu
Cc: 	Jonathan McDowell <jcm@head.cfa.harvard.edu>, dm@ivoa.net, Kirk Borne (at George Mason University) <kborne@gmu.edu>
Subject: 	Re: [CATALOGUE]Starting Data Model Subgroup
Date: 	Mon, 26 Jul 2004 15:11:00 -0400 (EDT)	
A catalogue is not a table, generally speaking.  It can be expressed 
as a table or as a set of many tables, but that is not the point.  
A catalogue is a set of derived data: derived from imaging, spectral, 
event lists, time series, interferometric, or other types of data.
The data model must describe not only the structure of the "table", 
including attributes and values, but also inheritance, provenance,
semantics, error columns, and more (units, formats, column-column 
relationships, table-table relationships).  

We will *not* have to rebuild all this stuff that we already have,
if we simply recognize the work of the former-ADC group, which started 
building the catalogue model already, as expressed in "dataset"...

    http://xml.gsfc.nasa.gov/#dataset

Furthermore, UCDs already provide the semantics.

- Kirk


=============================================================================
 	From: 	Arnold Rots <arots@head.cfa.harvard.edu>
Reply-To: 	Arnold Rots <arots@head.cfa.harvard.edu>
To: 	dm@ivoa.net
Subject: 	Re: [CATALOGUE]Starting Data Model Subgroup
Date: 	Mon, 26 Jul 2004 15:29:23 -0400 (EDT)	
That's correct, it's a set of derived data - often in the past, for
practical reasons, printed as a table.
But now that we are much more flexible in choosing our publication
forms, there may be links to data objects from which parameter values
may be derived on the fly and according to the user's specifications.
Or, for that matter, the publication form may be graphical, or a
spreadsheet.  In other words, we now have the ability to create truly
interactive catalogs and it would be a pity to constrain ourselves by
what a catalog(ue) used to look like.
And, by the way, STC contains space-time coordinate metadata elements
specifically intended for catalog records.

  - Arnold

--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
60 Garden Street, MS 67                              fax:  +1 617 495 7356
Cambridge, MA 02138                             arots@head.cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------

=============================================================================
	From: 	Matthew Graham <mjg@cacr.caltech.edu>
Reply-To: 	Matthew Graham <mjg@cacr.caltech.edu>
To: 	Kirk Borne <borne@rings.gsfc.nasa.gov>
Cc: 	roy@caltech.edu, Jonathan McDowell <jcm@head.cfa.harvard.edu>, dm@ivoa.net, Kirk Borne (at George Mason University) <kborne@gmu.edu>
Subject: 	Re: [CATALOGUE]Starting Data Model Subgroup
Date: 	Mon, 26 Jul 2004 12:32:16 -0700 (PDT)	

Hi,

> A catalogue is not a table, generally speaking.  It can be expressed 
> as a table or as a set of many tables, but that is not the point.  
> A catalogue is a set of derived data: derived from imaging, spectral, 
> event lists, time series, interferometric, or other types of data.
> The data model must describe not only the structure of the "table", 
> including attributes and values, but also inheritance, provenance,
> semantics, error columns, and more (units, formats, column-column 
> relationships, table-table relationships).  

This really just sounds like defining the VOTable superclass. If there is
more to it than that might I suggest changing the name of this DM from
Catalogue to Derived Data Set or something similar because when I think of
a catalogue, I think of a tabulated structure, be it for books in my local
library or lingerie from Victoria's Secret.

        Cheers,

        Matthew
=============================================================================
	From: 	Gerard Lemson <gerard.lemson@mpe.mpg.de>
Reply-To: 	Gerard Lemson <gerard.lemson@mpe.mpg.de>
To: 	Matthew Graham <mjg@cacr.caltech.edu>
Cc: 	dm@ivoa.net
Subject: 	RE: [CATALOGUE]Starting Data Model Subgroup
Date: 	Tue, 27 Jul 2004 10:57:56 +0200	
Hi,

> > A catalogue is not a table, generally speaking.  It can be expressed
> > as a table or as a set of many tables, but that is not the point.
> > A catalogue is a set of derived data: derived from imaging, spectral,
> > event lists, time series, interferometric, or other types of data.
> > The data model must describe not only the structure of the "table",
> > including attributes and values, but also inheritance, provenance,
> > semantics, error columns, and more (units, formats, column-column
> > relationships, table-table relationships).
>
> This really just sounds like defining the VOTable superclass. If there is
> more to it than that might I suggest changing the name of this DM from
> Catalogue to Derived Data Set or something similar because when I think of
> a catalogue, I think of a tabulated structure, be it for books in my local
> library or lingerie from Victoria's Secret.
>
The Catalogue data model will hopefully provide a formal way to describe
an astronomical catalogue "completely".
This will include more than simply saying it is a table with a certain
number of columns whose meaning can sometimes be described by some string
expression. The fact that some catalogue instances can be serialized
into such a structure does not mean that a catalogue "is-a" (VO)table,
or a (VO)table "is-a" catalogue.

It is precisely the task of the DM group to define ways by which one can
distinguish different tabular data structures by providing means for
describing their contents, i.e. by providing a model for the meta-data
that should be attached to the structure.

To stay with your metaphor, such a description will help you
make the right choice in the morning so that you will come to work wearing a
boxer short instead of the latest Harry Potter.

Cheers

Gerard

=============================================================================
	From: 	Pierre Didelon <pdidelon@cea.fr>
Reply-To: 	Pierre Didelon <pdidelon@cea.fr>
To: 	dm@ivoa.net
Cc: 	Kirk Borne <borne@rings.gsfc.nasa.gov>, Kirk Borne (at George Mason University) <kborne@gmu.edu>
Subject: 	Re: [CATALOGUE]Starting Data Model Subgroup
Date: 	Tue, 27 Jul 2004 11:14:07 +0200	
Hi everybody,

some comment concerning provenance/history handling,
a little bit off from the main subject of this thread,
but which may (perhaps) impact deeply on catalog design.
Even if provenance is obvious, as claimed by Pedro Osuna
("In summary, there are obvious things to model from a catalogue, like its
provenance..."), it can be complex depending on the level considered.

In fact a catalog history can be handled at different levels:
- first, it can be related to the whole catalog itself
        -> one catalog - one provenance/history
- it can be related to a column or a row
        -> one row/column - one provenance/history
- it can be related to a cell
        -> one cell - one provenance/history
- or it can even be related to a group of cells,
  a sub-cube in the catalog cube (possibly of n dimensions)
        -> one sub-cube/group_of_data - one provenance/history
One obvious and simple example of this is the set of all
RA, DEC, ErrRA, ErrDEC values obtained with one astrometric
calibration for a certain set of astronomical objects;
in this case this sub-cube (4 cols * n rows) has a common
history, which can be different from that of the photometric
part of the (same?) catalog.
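
The levels listed above can be sketched with a single "scope" notion,
where a missing row or column restriction means "all of them". This is
only one possible way to organise it; the class names and the
provenance labels below are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceScope:
    """Part of a catalogue a history record applies to; None means 'all'."""
    rows: Optional[frozenset] = None
    columns: Optional[frozenset] = None

    def covers(self, row, column):
        return ((self.rows is None or row in self.rows)
                and (self.columns is None or column in self.columns))


class ProvenanceIndex:
    """Collects (scope, description) records and answers per-cell queries."""

    def __init__(self):
        self._records = []

    def attach(self, scope, description):
        self._records.append((scope, description))

    def history_of(self, row, column):
        """All history records that apply to one cell, in insertion order."""
        return [d for s, d in self._records if s.covers(row, column)]


index = ProvenanceIndex()
# Whole catalogue.
index.attach(ProvenanceScope(), "catalogue compilation")
# Sub-cube: four astrometric columns, all rows.
index.attach(
    ProvenanceScope(columns=frozenset({"RA", "DEC", "ErrRA", "ErrDEC"})),
    "astrometric calibration",
)
# Single cell.
index.attach(
    ProvenanceScope(rows=frozenset({0}), columns=frozenset({"Vmag"})),
    "manual correction",
)

ra_history = index.history_of(0, "RA")
vmag_history = index.history_of(0, "Vmag")
```

A whole-catalogue, per-column, per-cell or sub-cube history then
differs only in how restrictive the scope is.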

I remember that I had a fruitful conversation with Pat Dowler
in Cambridge (UK) concerning this subject and the related experience
at CADC. He can maybe add some comments on this subject.

But it can already be seen that, depending on the granularity
at which we want to handle provenance/history, the implementation may be
different and more or less complex.


Kirk Borne wrote:

> A catalogue is not a table, generally speaking.  It can be expressed 
> as a table or as a set of many tables, but that is not the point.  
> A catalogue is a set of derived data: derived from imaging, spectral, 
> event lists, time series, interferometric, or other types of data.
Yes. But how the derivation is made, and how it is kept in (or with) the
catalog, is not always the same; it depends on the kind of catalog.
I remember an article by C. Jaschek proposing a classification
of catalog types, from Observation Catalogs to Compilation Catalogs
that merge and homogenise data collections:

http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1984QJRAS..25..259J&db_key=AST&high=3d9c6cf76d17675

some considerations may be of interest, as well as some from
another article concerning information and catalogues,

http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1973IAUS...50..275J&db_key=AST&high=3d9c6cf76d17675


> The data model must describe not only the structure of the "table", 
> including attributes and values, but also inheritance, provenance,
See my remarks above.
> semantics, error columns, and more (units, formats, column-column 
> relationships, table-table relationships).  
> 
> We will *not* have to rebuild all this stuff that we already have,
> if we simply recognize the work of the former-ADC group, which started 
> building the catalogue model already, as expressed in "dataset"...
> 
>     http://xml.gsfc.nasa.gov/#dataset
As well as the CDS ReadMe and all the former catalog descriptions they use
(see http://vizier.u-strasbg.fr/doc/catstd.htx). All this, including VOTable,
must guide the DM group, but not freeze the elaboration of the Catalogue DM.
> 
> Furthermore, UCDs already provide the semantics.
> 
> - Kirk
> 

SY
--
Pierre
--------------------------------------------------------------------------
DIDELON :@: pdidelon_at_cea.fr        Phone : 33 (0)1 69 08 58 89
CEA SACLAY - Service d'Astrophysique  91191 Gif-Sur-Yvette Cedex
--------------------------------------------------------------------------


=============================================================================
	From: 	Gerard Lemson <gerard.lemson@mpe.mpg.de>
Reply-To: 	Gerard Lemson <gerard.lemson@mpe.mpg.de>
To: 	dm@ivoa.net
Subject: 	RE: [CATALOGUE]Starting Data Model Subgroup
Date: 	Tue, 27 Jul 2004 13:18:45 +0200	
> > VOTable is the XML serialization of whatever it
> is that the Catalogue DM group come up with, so isn't this more a case of
> reverse engineering?
>
> No - the other working groups have made it clear they have requirements
> for a standard model for astronomical source catalogs. The VOTable
> is a serialization of a simple table, but an astronomical catalog
> is more than a table - it will have extra standard metadata linking
> the sources to their parent observations and extraction algorithms,
> for instance. One output of the Catalogue data model effort will
> certainly be a more formal statement of the VOTable model, but another
> output will be recommendations for ways to serialize this extra metadata
> in VOTable (particular PARAM and FIELD values for certain things, for
> example). And yet another output will be an XML schema for those who
> prefer to use generic XML, although in the particular case of catalogs
> I hope that a VOTable-based serialization will be the preferred
> approach. But just saying "write a VOTable" is not a sufficient spec.
> I hope the CDS folks can say a little about how a Vizier README is
> converted to VOTable, and others can comment on how pipeline-generated
> catalogs should be recorded, and what extra metadata (wavelet scales,
> data characterization like wavelength band, etc) are appropriate.
>

I fully agree with Jonathan here, but would like to add some comments.

I think one of the things that is often not realized is the fact that
the DM WG needs to provide models for the meta-data describing the
contents of some data product, as well as models for the data themselves.
For example, the Observation model is mainly a model for the meta-data
describing the results of an observation. This is more than describing
how the data is stored and/or formatted. The latter may be done using
the Quantity model, I guess.

Secondly, it still seems that people confuse the act of defining a
data model with that of defining representations/serializations of the
data model applicable to a particular runtime environment within which
one wants to deal with instances of the data model, be that messaging
(XML), a Java virtual machine or a relational database.
Defining such serializations is, or should be, part of the DM WG's tasks.

In the data modeling effort it *is* extremely useful to look at existing
data models, even if only implicitly represented in particular
serializations, if only to see which concepts, entities, attributes and
relationships others have thought of already and should therefore
probably be incorporated into the IVOA data model.
One cannot, however, insist in advance that the data model itself should
be tied to some existing representation, as this may be unsuitable for
representations that must work in a different environment.

Even when we interpret some of the comments in the context of the
definition of a serialization, I think we should not predefine *how*
exactly to use the results of existing efforts.
For example, I see no a-priori reason why we should follow Roy's
suggestion to "use inheritance".
Inheritance is only one way in which the results of the
VOTable/conesearch work can be reused.
Data modeling languages allow many different types of relations between
entities, and in fact inheritance is the one most often abused.
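
One of the other relations alluded to above is composition: a catalogue
that can produce a table without being one. The following Python sketch
shows that alternative under stated assumptions; the class names, the
dictionary-based records and the to_table method are all invented for
the example.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """A plain tabular serialization target."""
    column_names: list
    rows: list


@dataclass
class Catalogue:
    """Holds records and metadata; a table is just one way to serialize it."""
    title: str
    metadata: dict = field(default_factory=dict)
    records: list = field(default_factory=list)  # each record is a dict

    def to_table(self, column_names):
        """Project the records onto the requested columns."""
        rows = [[rec.get(c) for c in column_names] for rec in self.records]
        return Table(list(column_names), rows)


catalogue = Catalogue(
    title="example catalogue",
    metadata={"provenance": "example pipeline"},
    records=[{"ID": "s1", "RA": 10.0, "Dec": -5.0, "note": "variable"}],
)
table = catalogue.to_table(["ID", "RA", "Dec"])
```

Here the catalogue "has-a" tabular view rather than "is-a" table, so the
same records could also be serialized in some other form.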

Cheers

Gerard	

=============================================================================
	From: 	Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
To: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>, Data Model IVOA List <dm@ivoa.net>
Subject: 	Re: [CATALOGUE]Starting Data Model Subgroup
Date: 	Mon, 02 Aug 2004 16:04:30 -0400	
Pedro Osuna wrote:

>Dear all,
>
>at the last IVOA meeting in Cambridge, Boston, I approached Jonathan to
>get the vacant responsibility of coordinating the efforts in a
>"Catalogue" subgroup of the Data Model.
>  
>
It is great that someone is taking this on!

>
>
>
>DEFINITION OF A CATALOGUE
>-------------------------
>
>From "Webster's Revised Unabridged Dictionary (1913)":
>
>"[...]A list or enumeration of names, or articles arranged
>methodically, often in alphabetical order; as, a catalogue of
>the students of a college, or of books, or of the stars.[...]"
>
>
>In the case of astronomy, thus, a catalogue would be a list or
>enumeration of certain astronomical objects (to be clarified later) in a
>certain order and including certain information per object.
>
>The definition of an astronomical object in this context would vary.
>An astronomical object could be anything from Stars to Galaxies, etc.,
>but also something more general like Observations, Sources or
>Observatories.
>
>In this sense, the Catalogue data model would not have to describe the
>inner details of the object it is cataloging, that should be described
>in other data models, but just the information relevant for the
>catalogue itself.  
>
I agree with everything up to this point.

>It is also true that some of the internal properties
>of the astronomical objects would appear in the catalogue itself through
>its columns.
>
Here.  The mere mention of columns is, in my opinion, out of place.  The
concept of rows and columns should not appear in any component of our
data model.  They belong in a relational database data model.  Here I
think we are working on a more abstract level in which objects may
contain other objects.  This results in tree-like structures.  We
should worry about transformation into a set of interrelated relational
tables only after the VO data model for this is complete.  I believe
that Roy correctly chimed in that VOTable can already do this only
because Pedro incorrectly brought up the issue of describing rows and
columns.

>
>For example, the XMM-Newton "1XMM" is a list of serendipitous sources
>detected by the satellite in its observing campaign. The model for this
>catalogue could consist of things like the provenance (ESA), number of
>columns (400) number of rows (~32000), etc., or it might give more
>relevant information like: column number three in the catalogue is the
>Source.likelihood where likelihood is an attribute of the Source Data
>Model.
>I think this is an interesting point for discussion.....
>
>  
>
A catalog should be a list of sourceObjects which
holds/contains/aggregates Quantities.  The quantities should be allowed
to be of arbitrary depth and detail.  That is, one should be free to
enter QuantitySets of QuantitySets.  To make this more concrete, let's
talk about a general catalog of galaxies. We wish to provide at a
minimum basic data about each galaxy (i.e. simple quantities: magnitudes,
ra, dec, morphological class).  Also, one wants the Observations of each
galaxy, such as Image.  We may just want to hold crucial metadata about
each image (exposure time, ra, dec, filter) and perhaps a URL to the
actual data.  But we may want to group these images into various
regions. So we have /galaxy/region/observation/image so far.  Region may
specify not just the location on the celestial sphere, but also give
information on the type of region (spiral arm, interarm, open cluster
region, outer halo, etc).  There may be photometry catalogs created from
these images that are to be included.  These catalogs should have
starObjects with mags with errors and filter info, and location pointers
to pixel coordinates in the image.  Some of the photometryCatalogs are
the children of images, but some may be a concatenation of several tables
within a region.  That would be a child of the region.  Also in the
region may be some higher resolution images in a crowded region
(/galaxy/region/region/observation/photoCatalog).  We may want to
point out variable stars, supernovae, etc., so one has special subCatalogs
of these.  There may be reasons for others to attach additional info
about the variable stars since they may be messing up the TRGB
distances.  Finally there are outputs of the tip edge detectors and
their input parameters as well.
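
The nesting described above can be sketched as a generic tree in which
every object carries its own quantities and children. This Python
fragment is only an illustration: the Node class and its methods are
invented here, and the quantity values are made up; only the two paths
come from the text.

```python
class Node:
    """An object holding named quantities and any number of child objects."""

    def __init__(self, kind, **quantities):
        self.kind = kind
        self.quantities = quantities
        self.children = []

    def add(self, child):
        """Attach a child and return it, so calls can be chained."""
        self.children.append(child)
        return child

    def paths(self, prefix=""):
        """Yield the slash-separated path of this node and all descendants."""
        here = f"{prefix}/{self.kind}"
        yield here
        for child in self.children:
            yield from child.paths(here)


galaxy = Node("galaxy", name="example galaxy")
region = galaxy.add(Node("region", type="interarm"))
observation = region.add(Node("observation"))
observation.add(Node("image", filter="V"))

# A crowded sub-region with its own observation and photometry catalogue.
crowded = region.add(Node("region", type="open cluster region"))
crowded.add(Node("observation")).add(Node("photoCatalog"))

all_paths = list(galaxy.paths())
```

Nothing in the structure refers to rows or columns; a tabular view
would be a later transformation.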

Columns do not mean anything in this context.  Although one could and
will provide a mechanism to serialize this as VOTable, a more
object-oriented method is preferred, not because it is easier for the
human to read, but because it is easier for the machine to read.  To make
it manageable for the human, one has XSLT scripts for each object type.
One can provide skeleton views to see the general nested structure and
then click on an object to display it more completely.

>A place to find literally thousands of catalogues is the CDS, where they
>have 5587 Catalogues available. Their classification of the catalogues
>follows the type of data they are cataloguing, e.g., Astrometric Data,
>Photometric data, Spectroscopic data, etc. The same question as above
>arises: whether we would have to create a specific data model for each
>of the eventual astronomical object categories we are cataloguing.
>
>  
>
I think this is what the DM is all about.  We are creating spectral 
object, bandpass object, and STCobject. These are building blocks for 
spectralCatalog, photometricCatalog, and astrometricCatalog respectively.
However, I think 90% of what one wants in any astronomicalObject is 
satisfied by the same set of things.  Universe, cluster of galaxies, 
galaxy, cluster, star, planet, comet,  can all take STC for location, 
Quantity for any global property, Region for subregions, Layer for 
layers like convection zone or mesosphere,  Members or perhaps Parts  
for  component parts.

The real power of this schema is that one can establish a data model
schema for query that is acceptable to all data centers, but completely
hides each data center's internal organization.  A query for galaxies
with supergiant stars in the interarm region is a simple XPath:

//galaxy//region[@type="interarm"]//star//spectralType/value="supergiant"

This query could be sent to all data centers and be decomposed at each
data center into a set of SQL queries to retrieve the appropriate data
and then construct a galaxyCatalog for output.  Used inside an XQuery,
the request could compose an alternate structure for the output XML
object, such as starCatalog rather than galaxyCatalog.
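
To make the intent of the query concrete, here is a small Python sketch
that selects the matching galaxies from a made-up sample document.  The
sample data and the helper name has_interarm_supergiant are assumptions
for illustration.  Note that, read strictly as XPath 1.0, the expression
above is an equality test and would evaluate to true or false; selecting
the galaxies themselves would normally mean moving the test into a
predicate on galaxy.  The sketch avoids the question by walking the tree
explicitly.

```python
import xml.etree.ElementTree as ET

# Made-up sample document; tag and attribute names follow the query above.
SAMPLE = """
<galaxyCatalog>
  <galaxy name="A">
    <region type="interarm">
      <star><spectralType><value>supergiant</value></spectralType></star>
    </region>
  </galaxy>
  <galaxy name="B">
    <region type="spiral arm">
      <star><spectralType><value>supergiant</value></spectralType></star>
    </region>
  </galaxy>
  <galaxy name="C">
    <region type="interarm">
      <star><spectralType><value>dwarf</value></spectralType></star>
    </region>
  </galaxy>
</galaxyCatalog>
"""


def has_interarm_supergiant(galaxy):
    """True if any interarm region at any depth holds a supergiant star."""
    for region in galaxy.iter("region"):
        if region.get("type") != "interarm":
            continue
        for star in region.iter("star"):
            for spectral_type in star.iter("spectralType"):
                if spectral_type.findtext("value") == "supergiant":
                    return True
    return False


root = ET.fromstring(SAMPLE)
matches = [g.get("name") for g in root.iter("galaxy")
           if has_interarm_supergiant(g)]
print(matches)
```

Only galaxy A satisfies both conditions; B has the right star in the
wrong kind of region, and C has the right region but no supergiant.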

Ed

=============================================================================

	From: 	Martin Hill <mchill@dial.pipex.com>
Reply-To: 	Martin Hill <mchill@dial.pipex.com>
To: 	Data Model IVOA List <dm@ivoa.net>
Subject: 	Re: [CATALOGUE]Starting Data Model Subgroup - terms like 'column'
Date: 	Tue, 03 Aug 2004 09:32:11 +0100	
Ed Shaya wrote:

> Here.  The mere mention of columns is, in my opinion, out of place.  The 
> concept of rows and columns should not appear in any component of our 
> data model.  They belong in a relational database data model.  Here I 
> think we are working on a more abstract level in which objects may 
> contain other objects.  This results in  tree-like structures.  We 
> should worry about transformation into a set of interrelated relational 
> tables only after the VO data model for this is complete.  I believe 
> that Roy correctly chimed in that  VOTable can  already do this only 
> because Pedro incorrectly brought up the issue of  describing rows and 
> columns.

Being a bit pedantic, but our data models won't necessarily be trees either.  In 
fact our data models are 'relational' - they consist of various bits of 
information in 'lumps' that make sense to us, related to other 'lumps'.  Some of 
these relations will be tree-like, but some won't.  We *could* write down our 
models where the 'lumps' are 'table definitions' and the 'bits of information' 
are 'columns' in those tables.   I believe however we are intending to model 
these lumps as 'objects' and the bits of information as 'properties' of those 
objects, and use UML relational diagrams to write it down.

Using UML to represent our data and its relationships is fine, but we must also 
remember that our data may be stored and processed in non-OO languages, such as 
FORTRAN.  If some find it easy to think in columns and tables, and others in 
terms of objects and properties, we should be able to cope with both.
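
A minimal Python sketch of what "coping with both" could look like: the
same information viewed once as objects with properties and once as a
table with columns, with a lossless conversion between the two.  The
Star class and its fields are hypothetical and serve only as an example.

```python
from dataclasses import dataclass, fields, astuple


@dataclass
class Star:
    # Hypothetical properties, for illustration only.
    name: str
    ra_deg: float
    dec_deg: float
    magnitude: float


stars = [Star("s1", 10.5, -3.2, 12.1), Star("s2", 11.0, -3.0, 14.7)]

# "Columns and tables" view: one list of column names plus one tuple per object.
columns = [f.name for f in fields(Star)]
rows = [astuple(s) for s in stars]

# "Objects and properties" view, rebuilt from the table.
rebuilt = [Star(**dict(zip(columns, row))) for row in rows]
```

This only works so neatly because every object here has the same flat
set of properties; relations between 'lumps' would need more than one
table.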

But we should avoid using particular implementations of representations; we 
shouldn't try and describe *models* in terms of Java Objects/Interfaces or 
Sybase or VOTables or FORTRAN structs or XML Schemas.  These are specific 
implementations of representations, not suitable for our general models, but we 
may want to use them for 'worked examples' of how our models might be used in 
practice.

Cheers,

Martin

-- 
Martin Hill
www.mchill.net
+44 7901 55 24 66
=============================================================================

	From: 	Martin Hill <mchill@dial.pipex.com>
Reply-To: 	Martin Hill <mchill@dial.pipex.com>
Cc: 	Data Model IVOA List <dm@ivoa.net>
Subject: 	Re: [CATALOGUE]Starting Data Model Subgroup - reinventing modelling
Date: 	Tue, 03 Aug 2004 09:44:24 +0100	
Ed Shaya wrote:

> Pedro Osuna wrote:
>> For example, the XMM-Newton "1XMM" is a list of serendipitous sources
>> detected by the satellite in its observing campaign. The model for this
>> catalogue could consist of things like the provenance (ESA), number of
>> columns (400) number of rows (~32000), etc., or it might give more
>> relevant information like: column number three in the catalogue is the
>> Source.likelihood where likelihood is an attribute of the Source Data
>> Model.
>> I think this is an interesting point for discussion.....
>>
>>  
>>
> A catalog should be a list of sourceObjects which 
> holds/contains/aggregates  Quantities.  The quantities should be allowed 
> to be of arbitrary depth and detail.  That is, one should be free to 
> enter QuantitySets of QuantitySets.   

I'm happy with the next bit, which is a prose description of some of the things 
that Ed would like to see in the catalogue data model.  Bearing in mind my 
previous email though, our models can be relational rather than restricted to 
trees, so our catalogues might not be just a set of things that might be sets of 
other things.

However the above bit seems to complicate what should be straightforward; we 
already have a system for modelling 'things' that are aggregates of other 
'things' - they are 'Objects' in UML.  Let's model those things first, *then* 
see if there are common elements we can factor out to 'QuantitySets'.  Doing it 
the other way around is Bad Practice (as I have mentioned before) and adds an 
unnecessary layer of IVO-specific terms to what should be a straightforward 
exercise.

Let's hear more about what people need to know, and also about what people don't 
want to know.  For example, it seems some people don't care about Passband 
details for most cases; they just need a simple Fravergy error band on a flux 
measurement.  This implies we may need more than one way of modelling similar 
information.

Cheers,

Martin

> To make this more concrete, let's 
> talk about a general catalog of galaxies. We wish to provide at a 
> minimum basic data about each galaxy (i.e., simple quantities: magnitudes, 
> ra, dec, morphological class).  Also, one wants the Observations of each 
> galaxy, such as Image.  We may just want to hold crucial metadata about 
> each image (exposure time, ra,dec, filter) and perhaps a URL to the 
> actual data.  But we may want to group these images into various 
> regions. So we have /galaxy/region/observation/image so far.  Region may 
> specify not just the location on the celestial sphere, but  also give 
> information on the type of region (spiral arm, interarm, open cluster 
> region, outerhalo, etc). There may be photometry catalogs created from 
> these images that are to be included.  These catalogs should have 
> starObjects with mags with errors and filter info, and location pointers 
> to pixel coordinates in the image.  Some of the photometryCatalogs are 
> the children of  images but some may be concatenation of several tables 
> within a region.  That would be a child of the region.  Also in the 
> region may be some higher resolution images in a crowded region 
> (/galaxy/region/region/observation/photoCatalog).    We may want to 
> point out variable stars, supernovae, etc so one has special subCatalogs 
> of these.   There may be reasons for others to attach additional info 
> about  the variable stars since they may be messing up the TRGB 
> distances.  Finally there are outputs of the tip edge detectors and 
> their input parameters as well.
> 
> 

-- 
Martin Hill
www.mchill.net
+44 7901 55 24 66

=============================================================================
	From: 	Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
Reply-To: 	Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
To: 	Martin Hill <mchill@dial.pipex.com>
Cc: 	Data Model IVOA List <dm@ivoa.net>
Subject: 	Re: [CATALOGUE]Starting Data Model Subgroup - terms like 'column'
Date: 	Tue, 03 Aug 2004 13:14:10 -0400	
Martin Hill wrote:

> Ed Shaya wrote:
>
>> Here.  The mere mention of columns is, in my opinion, out of place.  
>> The concept of rows and columns should not appear in any component of 
>> our data model.  They belong in a relational database data model.  
>> Here I think we are working on a more abstract level in which objects 
>> may contain other objects.  This results in  tree-like structures.  
>> We should worry about transformation into a set of interrelated 
>> relational tables only after the VO data model for this is complete.  
>> I believe that Roy correctly chimed in that  VOTable can  already do 
>> this only because Pedro incorrectly brought up the issue of  
>> describing rows and columns.
>
>
> Being a bit pedantic, but our data models won't necessarily be trees 
> either.  In fact our data models are 'relational' - they consist of 
> various bits of information in 'lumps' that make sense to us, related 
> to other 'lumps'.  Some of these relations will be tree-like, but some 
> won't.  We *could* write down our models where the 'lumps' are 'table 
> definitions' and the 'bits of information' are 'columns' in those 
> tables.   I believe however we are intending to model these lumps as 
> 'objects' and the bits of information as 'properties' of those 
> objects, and use UML relational diagrams to write it down.

My use of the word relational was a poor choice.  I was just saying that
a catalog should not be restricted to only two-dimensional datasets.
You are right that tree-like is also not general enough, since there
could be explicit relationships from any object to any other object.
If we agree to extend the meaning of the word column to mean a set of
similarly classed objects, then I could accept its use as well.  But
still, a Catalog may not have any columns, since each object may have
a differing set of properties.  Take, for instance, a list of two
clusters of galaxies: for one we know its richness and X-ray
properties; for the other we know its member names and its mass.
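
The two-cluster example can be sketched in a few lines of Python.  The
property names and all values below are invented placeholders; the
sketch only shows that the two entries share almost no "columns", and
that forcing them into one table leaves gaps.

```python
# Two hypothetical cluster-of-galaxies entries with different known
# properties.  All names and numbers are placeholders, not real data.
clusters = [
    {"name": "cluster-1", "richness": 2, "xray_luminosity": 3.1e44},
    {"name": "cluster-2", "members": ["g1", "g2", "g3"], "mass": 5.0e14},
]

all_keys = set().union(*(c.keys() for c in clusters))
shared_keys = set.intersection(*(set(c) for c in clusters))

# Forcing the entries into one table leaves a gap (None) wherever a
# property is not known for that entry.
header = sorted(all_keys)
table = [[c.get(k) for k in header] for c in clusters]

print("shared:", shared_keys)
print(header)
for row in table:
    print(row)
```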

>
> Using UML to represent our data and its relationships is fine, but we 
> must also remember that our data may be stored and processed in non-OO 
> languages, such as FORTRAN.  If some find it easy to think in columns 
> and tables, and others in terms of objects and properties, we should 
> be able to cope with both.
>
I'm afraid it will be just too difficult to program in FORTRAN77 for the 
general Catalog.  But for certain common subclasses of Catalog it should 
be fine. 

> But we should avoid using particular implementations of 
> representations; we shouldn't try and describe *models* in terms of 
> Java Objects/Interfaces or Sybase or VOTables or FORTRAN structs or 
> XML Schemas.  These are specific implementations of representations, 
> not suitable for our general models, but we may want to use them for 
> 'worked examples' of how our models might be used in practice.
>
Of course.  One needs either a modeling language or an ontology.  Along 
these lines, I believe that modeling languages like UML are best for 
processing and data flow architectures.  Ontology is best for 
information and knowledge statement architectures.  Most of what we are 
trying to do in DM  is the latter.

> Cheers,
>
> Martin
>
=============================================================================
	From: 	Elizabeth Auden <eca@mssl.ucl.ac.uk>
To: 	Pedro.Osuna@esa.int
Subject: 	[CATALOGUE]Starting Data Model Subgroup - Astrogrid joiner
Date: 	Wed, 11 Aug 2004 16:43:48 +0100 (BST)	
Hi Pedro,

> at the last IVOA meeting in Cambridge, Boston, I approached Jonathan to 
> get the vacant responsibility of coordinating the efforts in a 
> "Catalogue" subgroup of the Data Model.

Tony Linde has asked me to join the catalogue data model subgroup on 
behalf of Astrogrid. I have just joined the data model mail list, and I've 
been working my way through the Catalogue thread.  The main experience 
I've had with catalogues in the past has been 1) bright star catalogues 
(such as Tycho II), 2) photometric catalogues (like HST standard candles), 
and 3) solar event catalogues (flares, loops, coronal mass ejections, 
etc).

> What is a Catalogue?
In my experience, either a list of objects with something in common
(i.e., bright stars, non-variable UV standard candles) that are
organized by coordinates, OR a list of events such as gamma ray bursts
and solar flares, organized by coordinates and by time.

> What is a Catalogue used for?

I've used different catalogues for different things:
1. Bright stars: used for navigation for the Swift satellite ("don't point 
the satellite at this")
2. Photometric standards: calibration of filters and grisms for XMM-OM and 
Swift
3. Solar event catalogues: producing movies from specific pieces of solar 
satellite data (i.e., give me a movie showing yesterday's solar flare, but 
don't give me a movie of the rest of the time when nothing exciting 
happened)

> Why do we want to model Catalogues?

Zillions of column headers have been identified since space objects and 
events were first catalogued; the data will be more efficiently searched 
(and hopefully yield more efficient science) if common themes, such as 
spatial, spectral, and temporal information, can be exploited.

Catalogues aren't just used for pure science research; they're also useful 
for hardware and software instrumentation.

> Where do Catalogues find a place within the VO?

As the front gate to a data source, or as a starting point to obtain data 
products from multiple sources.

> What are the interesting Use Cases for a Catalogue DM? 
I'll give this one some more thought!

cheers,
Elizabeth Auden

=============================================================================

	From: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
To: 	Elizabeth Auden <eca@mssl.ucl.ac.uk>, Brian Thomas <thomas@astro.umd.edu>, Jonathan McDowell <jcm@head.cfa.harvard.edu>
Cc: 	Pedro.Osuna@esa.int, Jesus.Salgado@sciops.esa.int
Subject: 	[CATALOGUE]Warming up...
Date: 	Mon, 30 Aug 2004 17:25:07 +0200	
Dear all,


I went through all the mails received on the Catalogue issue. Thank you
for volunteering to join the effort.

I think Brian has done some work related to Catalogues and so has
Elizabeth. I haven't seen comments from either of you on the mails that
people sent, so it would be nice to survey the points made and try to
come to an agreement on what we are after.

So I'll start with a summary of my and Jesus's view of the comments
from people, and hope to get inputs from you as well.


- With respect to the comment that "VOTable is the XML serialization of
whatever it is that the DM group come up with" (M. Graham), I disagree,
as I think a data model (whichever it is) is independent of the way it
is serialized. For "complex" data models, VOTable might NOT allow for
complete serialization, whereas for simple ones it might (e.g., SIAP
extensions). But in summary, the serialization is independent of the
model.

- With respect to the "use of inheritance" (R. Williams), I didn't quite
understand the point. We will use inheritance in the model wherever it
is appropriate for the data model, just as part of the modeling effort,
but I do not understand using inheritance as a general sort of tool. Nor
did I understand what the Data Model for VOTable is (mentioned by Roy
and also in the answer from Jonathan to M. Graham), so maybe Jonathan
could tell us more about what that means (as far as I know, there are
only three data models going on: Observation, Quantity and Spectrum,
plus the current Catalogue).

- About the use of the words "tables", "attributes" and "values" in the
mail from Kirk Borne: I'd prefer to avoid these types of words in the
future (like column and row, which created so much discussion after my
mail), as people tend to interpret words literally, according to what
they mean in their own experience, and do not go any further in the
interpretation. This is a very important point in the definition of the
Catalogue.

In a private mail exchange with Jonathan before I sent the mail to the
DM list, I was wondering whether we should model "two-dimensional"
catalogues or "n-dimensional" ones. Jonathan answered me along these
lines:
"[...]I'm going to argue (as a member of the group, not wearing my Chair
hat) that we should focus on source catalog(ue)s rather
than general tables like lists of observatories[...]" "[...]To put it
another way, is there a difference between a CATALOG(UE) model and a
TABLE model? What things are tables but not catalog(ue)s? I am not
convinced we should spend too much effort on a very general table
model[...]"

However, I see that some of the people answering my original mail seemed
to be pushing for a very general Data Model including all possible types
of catalogues.

I have to say that trying to model all types of catalogues could be a
never-ending task and I tend to agree with Jonathan that we should be
concentrating on standard source type catalogues, but could you all
please give an opinion on this? 

- The mail from Arnold mentioned that we are in a position to have
"fully interacting catalogues". I'll send a mail to Arnold, with a copy
to you, asking him to provide us with a use case for interacting
catalogues.

- Pierre Didelon raised his concern about "trivialising" things like
Provenance. It is clear that some of these things are not trivial at
all, and it shows that we should be very careful when designing the data
model, as people seem to be very touchy about the things they know best.

- G. Lemson says that "[...]Defining such serializations -for the data
models- should be part of the DM WG's task[...]"
I think this is a very important point with which I disagree, as I
believe (as said before) that a DM is independent from the
serialization. Could you all please comment on this?
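
To illustrate what is meant here by "independent", the following is a
minimal Python sketch: one in-memory model, two interchangeable
serializations, and the model unchanged by either.  The Source class,
its attributes, and the choice of JSON and a plain XML layout are
assumptions for the example only; in particular, the XML below is not
VOTable.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass
class Source:
    # Hypothetical attributes; the real Source model is still to be defined.
    name: str
    ra_deg: float
    dec_deg: float


def to_json(source):
    """Serialize the model as a JSON object."""
    return json.dumps(asdict(source), sort_keys=True)


def from_json(text):
    return Source(**json.loads(text))


def to_xml(source):
    """Serialize the same model as a flat XML element (not VOTable)."""
    element = ET.Element("Source")
    for key, value in asdict(source).items():
        ET.SubElement(element, key).text = str(value)
    return ET.tostring(element, encoding="unicode")


def from_xml(text):
    element = ET.fromstring(text)
    return Source(
        name=element.findtext("name"),
        ra_deg=float(element.findtext("ra_deg")),
        dec_deg=float(element.findtext("dec_deg")),
    )


source = Source("src-1", 150.25, 2.5)
print(to_json(source))
print(to_xml(source))
```

Whether the DM group should also define such serializations is a
separate question from whether the model depends on them; the sketch
speaks only to the second.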


Jesus and I are preparing a first-attempt data model for a General
catalogue and will be sending it during the course of the week. We will
also try to model the Source object, which I think is the main object we
should be cataloguing. If you have any inputs, please let us know.


I look forward to your news.

Cheers,
P.

 
-- 
Pedro Osuna Alcalaya

 
Software Engineer
European Space Astronomy Center
(ESAC/ESA)
e-mail: Pedro.Osuna@esa.int
Tel + 34 91 8131314
                                                                                
European Space Agency
VILLAFRANCA Satellites Tracking Station
P.O. Box 50727
E-28080 Villafranca del Castillo
MADRID - SPAIN
=============================================================================
	From: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
Reply-To: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
To: 	dm@ivoa.net
Cc: 	Pedro.Osuna@esa.int
Subject: 	[CATALOGUE]Second round seeking people....
Date: 	Mon, 30 Aug 2004 17:26:57 +0200	
Dear all,


after having processed all the mails related to the Catalogue Subgroup
creation, I have seen only two people showing interest in joining the
group, Brian Thomas and Elizabeth Auden.
 
I have, however, seen a lot of interesting discussions, so I'd again
encourage more people to join the group. 

For the time being, I'll consider the group as being formed by
Elizabeth, Brian, Jonathan (as head of the general group), Jesus
(Salgado) and myself, to whom more restricted mails will be sent
whenever appropriate.

With respect to all the mails received, they give a lot of meat to
discuss, as expected from my original mail, so we will start with
internal discussions and let the rest of the general DM group know with
a proper "[CATALOGUE]" header as already agreed whenever appropriate.


Cheers,
P.

-- 
Pedro Osuna Alcalaya

 
Software Engineer
European Space Astronomy Center
(ESAC/ESA)
e-mail: Pedro.Osuna@esa.int
Tel + 34 91 8131314
                                                                                
European Space Agency
VILLAFRANCA Satellites Tracking Station
P.O. Box 50727
E-28080 Villafranca del Castillo
MADRID - SPAIN

=============================================================================
	From: 	Martin Hill @ ROE <mch@roe.ac.uk>
To: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
Subject: 	Re: [CATALOGUE]Second round seeking people....
Date: 	Mon, 30 Aug 2004 16:45:32 +0100	
My apologies Pedro, I obviously spent too much time replying to your first 
round's technical comments without actually replying to the original question...

Could you stick me in the list too please?  Most of the data we're publishing 
just now is catalogue data (INT-WFS, SuperCOSMOS, 6dF, 2dF and 2MASS).

Thanks!

Martin

Pedro Osuna wrote:

> Dear all,
> 
> 
> after having processed all the mails related to the Catalogue Subgroup
> creation, I have seen only two people showing interest in joining the
> group, Brian Thomas and Elizabeth Auden.
>  
> I have, however, seen a lot of interesting discussions, so I'd again
> insist in having more people joining the group. 
> 
> For the time being, I'll consider the group as being formed by
> Elizabeth, Brian, Jonathan (as head of the general group), Jesus
> (Salgado) and myself, to whom more restricted mails will be sent
> whenever appropriate.
> 
> With respect to all the mails received, they give a lot of meat to
> discuss, as expected from my original mail, so we will start with
> internal discussions and let the rest of the general DM group know with
> a proper "[CATALOGUE]" header as already agreed whenever appropriate.
> 
> 
> Cheers,
> P.
> 

-- 
Martin Hill, Software Engineer
AstroGrid (ROE)
+44 7901 55 24 66
http://www.roe.ac.uk/~mch/


=============================================================================
	From: 	Kirk Borne <borne@rings.gsfc.nasa.gov>
Reply-To: 	Kirk Borne (at George Mason University) <kborne@gmu.edu>
To: 	Pedro.Osuna@sciops.esa.int
Cc: 	Kirk Borne (at George Mason University) <kborne@gmu.edu>
Subject: 	Re: [CATALOGUE]Second round seeking people....
Date: 	Mon, 30 Aug 2004 13:28:26 -0400 (EDT)	
Hello Pedro.  Please include me in the group.  I will try 
to contribute as much as time permits.

- Kirk


> From owner-dm@eso.org  Mon Aug 30 11:48:25 2004
> Date: Mon, 30 Aug 2004 17:26:57 +0200
> From: Pedro Osuna <Pedro.Osuna@sciops.esa.int>
> Subject: [CATALOGUE]Second round seeking people....
> To: dm@ivoa.net
> Cc: Pedro.Osuna@esa.int
> 
> Dear all,
> 
> 
> after having processed all the mails related to the Catalogue Subgroup
> creation, I have seen only two people showing interest in joining the
> group, Brian Thomas and Elizabeth Auden.
>  
> I have, however, seen a lot of interesting discussions, so I'd again
> insist in having more people joining the group. 
> 
> For the time being, I'll consider the group as being formed by
> Elizabeth, Brian, Jonathan (as head of the general group), Jesus
> (Salgado) and myself, to whom more restricted mails will be sent
> whenever appropriate.
> 
> With respect to all the mails received, they give a lot of meat to
> discuss, as expected from my original mail, so we will start with
> internal discussions and let the rest of the general DM group know with
> a proper "[CATALOGUE]" header as already agreed whenever appropriate.
> 
> 
> Cheers,
> P.
> 
> -- 
> Pedro Osuna Alcalaya
> 
>  
> Software Engineer
> European Space Astronomy Center
> (ESAC/ESA)
> e-mail: Pedro.Osuna@esa.int
> Tel + 34 91 8131314
>                                                                                 
> European Space Agency
> VILLAFRANCA Satellites Tracking Station
> P.O. Box 50727
> E-28080 Villafranca del Castillo
> MADRID - SPAIN
=============================================================================
	From: 	Elizabeth Auden <eca@mssl.ucl.ac.uk>
To: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
Cc: 	Brian Thomas <thomas@astro.umd.edu>, Jonathan McDowell <jcm@head.cfa.harvard.edu>, Pedro.Osuna@esa.int, Jesus.Salgado@sciops.esa.int
Subject: 	Re: [CATALOGUE]Warming up...
Date: 	Mon, 30 Aug 2004 21:46:22 +0100 (BST)	
Hi,

My recent experience with catalogues has mainly been with solar and solar 
terrestrial physics data. I have been working on registering data archives 
in these disciplines with the Astrogrid registry.

> there are only three data models going on: 
> Observation, Quantity and Spectrum (plus the current Catalogue)).

Where does magnetic data fit into this? There are several solar and STP 
magnetic data sources, including the RAL World Data Centre's ionosonde 
data and the upcoming Solar Dynamics Observatory's Helioseismic and 
Magnetic Imager.

> In a private mail from Jonathan before I sent the mail to the DM, I was
> wondering whether we should model "two-dimensional" catalogues or
> "n-dimensional" ones.

Drawing on experience with RAL Ionosonde data again, the setup for that 
data is a collection of several 2-D tables that are many layers deep. 
Would the cataloguing effort be concerned with how interconnecting tables 
are modelled, or is this a job left for VO workflows and advanced registry 
searches?

> concentrating on standard source type catalogues, but could you all
> please give an opinion on this?

I agree - standard source catalogues are a good starting point, but I'd 
like to see the group work with standard solar and STP catalogues, too. 
Solar event catalogues seem to be mainly 2-D tables, and I'm still getting 
to grips with STP catalogues. A quick google has given me a link for the 
OSSE solar flare catalogue to view as an example: 
http://heseweb.nrl.navy.mil/gamma/solarflare/flarelib.htm

cheers,
Elizabeth
=============================================================================
	From: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
Cc: 	Pedro.Osuna@esa.int
Subject: 	First-attempt Catalogue DM for discussion
Date: 	Wed, 01 Sep 2004 16:10:34 +0200	

Dear all,


Our small group has grown by two members. 
The list is currently (in alphabetical order):

Elizabeth Auden
Kirk Borne
Martin Hill
Jonathan McDowell (DM Chairman)
Jesus Salgado 
Brian Thomas

and myself.


I think it is really important that we come to an agreement on _what_
exactly we are trying to model when we talk about the CATALOGUE Data
Model.

If we take as an example the many CDS catalogues, they are merely
two-dimensional tables, with rows and columns. They subdivide their
catalogues into somewhat arbitrary categories, in terms of astronomical
"branches". They also have "Tables" coming from publications (again, 2-D
tables).

Another example of flat tables is the pointer that Elizabeth gave.
That's just a 2-D table giving information on solar flares that
occurred on certain dates under certain conditions.

Another 2-D example is the XMM-Newton "1XMM" source catalogue.

If what we are after is just the model for a 2-D table, the model as
such would be quite simple. We have drawn up an example Data Model for
it that could serve as a starting point for internal discussion. This is
the diagram below called CATALOGUE_DM_UML.jpg. The real data model is
only the upper part. Already existing Data Models like Observation or
Quantity would then come naturally into play.
 
As you can see in this Data Model, a "Catalogue" is formed of
"CatalogueEntry"-ies, which either AstronomicalObject or
AstronomicalEvent extend. A SolarFlare, e.g., would extend an
AstronomicalEvent, and an Observation (which could be called
CatalogueObservation and implement the existing Observation DM) would
extend it as well. This would be the case if we want to allow for
"Observation" Catalogues. Otherwise (following Jonathan's comments
questioning whether we should discuss these types of catalogues),
Observations would simply not appear there as an extension of an
AstronomicalEvent, as they would NOT in general ever be "Entry"-ies.
 
In this type of model, a Source would just be an AstronomicalObject, and
what we would really have to concentrate on would be the attributes,
etc. of the objects marked in red, i.e., Catalogue, AstronomicalEvent,
AstronomicalObject and Source for the time being.
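
For those who prefer code to diagrams, here is a rough Python sketch of
the hierarchy just described.  The class names are the ones used in the
text; every attribute (identifier, ra_deg, dec_deg, start_time, name)
and every value is a placeholder assumption, since the attributes are
exactly what remains to be defined.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogueEntry:
    identifier: str  # placeholder attribute


@dataclass
class AstronomicalObject(CatalogueEntry):
    pass


@dataclass
class AstronomicalEvent(CatalogueEntry):
    pass


@dataclass
class Source(AstronomicalObject):
    ra_deg: float = 0.0   # hypothetical attributes
    dec_deg: float = 0.0


@dataclass
class SolarFlare(AstronomicalEvent):
    start_time: str = ""  # hypothetical attribute


@dataclass
class Catalogue:
    name: str
    entries: List[CatalogueEntry] = field(default_factory=list)


# A flat (2-D style) catalogue: every entry is one row's worth of data.
flares = Catalogue("example flare list")
flares.entries.append(SolarFlare("flare-1", start_time="2004-09-01T00:00:00"))
flares.entries.append(SolarFlare("flare-2", start_time="2004-09-02T00:00:00"))

sources = Catalogue("example source list")
sources.entries.append(Source("src-1", ra_deg=150.25, dec_deg=2.5))
```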

The Source could be just one of many objects whose attributes might be
modeled when the need arises. I guess there should be a centralized
point and procedure through which models are added to the whole VO
machinery. In this case, someone modeling a SolarFlare (i.e.,
identifying its attributes, etc.) would propose that model, and it
would then be accepted by the board and become part of the VO general
model. Someone else would like to model the object Galaxy, and so we
could go on and on, with models added regularly. What the position of
the VO in general is with respect to this issue, I don't know.

In the diagram CATALOGUE_DM_Instantiation_UML.jpg we give an
instantiation example of this simple model for the SolarFlare example
catalogue that Elizabeth sent, chosen for its simplicity.


If instead we want to allow an Entry in the Catalogue to be composed of
one or more entries, which can in turn include more entries, then the
modeling effort gets much more complicated. Allowing, for example,
different Observations that observed the same Source in a catalogue,
and/or different sources observed by the same observation (Observation
being a valid Entry), is a tough problem. In the image called
CATALOGUE_DM_Recursive_UML.jpg below, we have added an auto-association
on the entries to reflect that idea. However, the Jekyll/Hyde problem
(multiple inheritance) of an Observation being an Entry and containing
one or more sources, which can be Entries as well, would not be easy to
solve.
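
One possible way to picture the auto-association, sketched in Python
under loud assumptions: instead of subclassing, each entry carries a
"kind" tag and a list of nested entries, so an Observation can be an
entry and also hold Source entries.  This is a swapped-in
simplification for illustration, not what the diagram specifies, and it
sidesteps rather than solves the multiple-inheritance question.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class CatalogueEntry:
    identifier: str
    kind: str  # e.g. "Observation" or "Source"; a tag instead of a subclass
    entries: List["CatalogueEntry"] = field(default_factory=list)


def walk(entry: CatalogueEntry, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, identifier) for an entry and everything nested in it."""
    yield depth, entry.identifier
    for child in entry.entries:
        yield from walk(child, depth + 1)


# The same source seen by two different observations.
source = CatalogueEntry("src-1", "Source")
obs_a = CatalogueEntry("obs-A", "Observation", [source])
obs_b = CatalogueEntry("obs-B", "Observation", [source])

listing = list(walk(obs_a)) + list(walk(obs_b))
for depth, identifier in listing:
    print("  " * depth + identifier)
```

The shared source shows up under both observations, which is exactly
the many-to-many situation that a strictly tabular (2-D) catalogue
cannot express without repeating rows.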


I would say that for the time being, and again following the
recommendation of the DM Team Leader, we might want to concentrate on
modeling the CATALOGUE as in the first option (2-D tables) and define
the AstronomicalObject, AstronomicalEvent, etc. as mentioned before.
This could give people an idea of how we want to proceed, and the model
can then evolve further when discussions start.

Please send me your comments and ideas in this respect. 
Please don't be too touchy about real UML modeling, as the model is
purely illustrative and is not meant to be rigorous in any respect. As
soon as we come to an agreement on how to attack the problem, we can
start thinking about doing things rigorously.

I look forward to your comments.

Cheers,
P.

-- 
Pedro Osuna Alcalaya

 
Software Engineer
European Space Astronomy Center
(ESAC/ESA)
e-mail: Pedro.Osuna@esa.int
Tel + 34 91 8131314
                                                                                
European Space Agency
VILLAFRANCA Satellites Tracking Station
P.O. Box 50727
E-28080 Villafranca del Castillo
MADRID - SPAIN

			JPEG image attachment (CATALOGUE_DM_Recursive_UML.jpg)


			JPEG image attachment (CATALOGUE_DM_UML.jpg)


			JPEG image attachment (CATALOGUE_DM_Instantiation_UML.jpg)


=============================================================================
	From: 	Elizabeth Auden <eca@mssl.ucl.ac.uk>
To: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
Subject: 	Re: First-attempt Catalogue DM for discussion
Date: 	Wed, 01 Sep 2004 15:38:10 +0100 (BST)	
Hi Pedro,

> If what we are after is just the model for a 2-D table, the model as
> such would be quite simple.

I'm just going to send out an email to the Astrogridders and my colleagues 
at MSSL to see if there are any important non-2-D catalogues that we 
should take into consideration while working on the data model.  I'll get 
back to you on this by Friday at the latest.

cheers,
Elizabeth
=============================================================================

	From: 	Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
To: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
Subject: 	Re: [CATALOGUE]Second round seeking people....
Date: 	Fri, 03 Sep 2004 09:12:38 -0400	
Pedro,
    I would like to be on this subgroup.
Ed


Pedro Osuna wrote:

>Dear all,
>
>
>after having processed all the mails related to the Catalogue Subgroup
>creation, I have seen only two people showing interest in joining the
>group, Brian Thomas and Elizabeth Auden.
> 
>I have, however, seen a lot of interesting discussions, so I'd again
>insist in having more people joining the group. 
>
>For the time being, I'll consider the group as being formed by
>Elizabeth, Brian, Jonathan (as head of the general group), Jesus
>(Salgado) and myself, to whom more restricted mails will be sent
>whenever appropriate.
>
>With respect to all the mails received, they give a lot of meat to
>discuss, as expected from my original mail, so we will start with
>internal discussions and let the rest of the general DM group know with
>a proper "[CATALOGUE]" header as already agreed whenever appropriate.
>
>
>Cheers,
>P.
>
>  
>

=============================================================================
	From: 	Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
To: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
Subject: 	Re: [CATALOGUE]Second round seeking people....
Date: 	Fri, 03 Sep 2004 10:30:35 -0400	
Pedro,
    In the WCS we will need redshift or velocities as well.

    I think the recursion is not so bad, because an observation of an
astroObject leads to multiple objects at smaller scales, and observation
of those objects leads to objects on yet smaller scales. Therefore one
never gets led back to the original object, so there is no circularity.

    Still missing:
    1)  subregions of an object - "NW spiral arm of a galaxy",
        chromosphere of a star, along the ionizing rim of the Orion
        cloud, etc.
    2)  references - a catalog of papers or articles about kinematics
        within galaxies in the Coma Cluster.
    3)  pointers to external Observations - one might have a photometry
        table but the observations are not in the same catalog.
    
  I have trouble with calling an Observation a type of astroEvent.  To 
me an astroEvent is something that happens in the universe independent 
of humans: a flare, a supernova, a neutron star - neutron star 
collision.  An observation is something we generate.  Just because both 
happen in time does not make them the same or even related.  I could 
agree that astroEvent and Observation are both types of Events. 
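
A minimal sketch of the class relationship proposed in this paragraph,
assuming hypothetical Python class and attribute names (none of them come
from an agreed data model): astroEvent and Observation both extend a
common Event, and neither extends the other.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Anything that happens at some time (hypothetical common base)."""
    start_time: str  # ISO 8601 timestamp, kept as a plain string here


@dataclass
class AstroEvent(Event):
    """Something that happens in the universe, independent of observers."""
    kind: str  # e.g. "flare", "supernova"


@dataclass
class Observation(Event):
    """Something an observer generates."""
    instrument: str  # hypothetical attribute, for illustration only


flare = AstroEvent(start_time="2004-09-01T12:00:00", kind="flare")
obs = Observation(start_time="2004-09-01T12:05:00", instrument="some-telescope")

# Both are Events, but an Observation is not a kind of AstroEvent.
print(isinstance(flare, Event), isinstance(obs, Event))
print(isinstance(obs, AstroEvent))
```

With this layout, code that only needs timing information can accept any
Event, while code about physical phenomena can insist on an AstroEvent.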

Ed
PS - I do not have everyone's address in the subgroup so would you be 
kind enough to relay this to the rest of the group.


Pedro Osuna wrote:

>Hi Ed,
>
>thanks for joining.
>
>Please have a look at the attached email I sent to the reduced
>distribution list a couple of days ago.
>
>Cheers,
>P.
>
>On Fri, 2004-09-03 at 15:12, Ed Shaya wrote:
>  
>
>>Pedro,
>>    I would like to be on this subgroup.
>>Ed
=============================================================================
	From: 	Elizabeth Auden <eca@mssl.ucl.ac.uk>
To: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
Cc: 	Martin Hill <mchill@dial.pipex.com>, Brian Thomas <thomas@astro.umd.edu>, Jesus Salgado <Jesus.Juan.Salgado@esa.int>, Kirk Borne <borne@rings.gsfc.nasa.gov>, Jonathan McDowell <jcm@head.cfa.harvard.edu>, Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>, Pedro.Osuna@esa.int
Subject: 	n-dimensional catalogues
Date: 	Fri, 03 Sep 2004 15:52:25 +0100 (BST)	
Hi all,

A few days ago Pedro discussed basing the DM catalogue model on 2-d tables 
to begin with. I emailed my colleagues at MSSL and on the Astrogrid 
project to see if anyone regularly used catalogues with more than 2 
dimensions in tables. Aside from several responses along the lines of 
"What do you mean by 'catalogue'?" (well, exactly), I received a few 
examples of tables containing members which are either vectors or 
n-dimensional tables.

1. From MSSL:
XID-DB has an object-oriented structure which means that tables -
for us objects - have members which are themselves tables - objects.
It is also the case for XCat which is the catalogue of XMM sources.

2. From Astrogrid:
A catalogue of spectral line fluxes in galaxies may fall into that 
category (of n-dimensional tables), whereas for each galaxy one can have 
more like a 2d table associated to a conventional data-cell than a single 
point, eg,

Object          Line    Flux
NGC 1068        Halpha  1.234
                HBeta   2.345
                [OIII]  2.347

If we choose not to model these kinds of tables during a first attempt, it 
would be good to make sure our model can evolve to describe such 
catalogues.

cheers,
Elizabeth
=============================================================================

	From: 	Mark Taylor <m.b.taylor@bristol.ac.uk>
To: 	Pedro.Osuna@esa.int
Subject: 	Re: [CATALOGUE]Second round seeking people....
Date: 	Tue, 07 Sep 2004 09:27:45 +0100 (BST)	
> From: owner-dm@eso.org [mailto:owner-dm@eso.org] On Behalf Of Pedro Osuna
> Sent: 30 August 2004 16:27
> To: dm@ivoa.net
> Cc: Pedro.Osuna@esa.int
> Subject: [CATALOGUE]Second round seeking people....
> 
> Dear all,
> 
> 
> after having processed all the mails related to the Catalogue Subgroup
> creation, I have seen only two people showing interest in joining the
> group, Brian Thomas and Elizabeth Auden.
>  
> I have, however, seen a lot of interesting discussions, so I'd again
> insist in having more people joining the group. 
> 
> For the time being, I'll consider the group as being formed by
> Elizabeth, Brian, Jonathan (as head of the general group), Jesus
> (Salgado) and myself, to whom more restricted mails will be sent
> whenever appropriate.
> 
> With respect to all the mails received, they give a lot of meat to
> discuss, as expected from my original mail, so we will start with
> internal discussions and let the rest of the general DM group know with
> a proper "[CATALOGUE]" header as already agreed whenever appropriate.
> 
> 
> Cheers,
> P.

Dear Pedro,

sorry for the delay in replying, I don't normally read the DM list
and this message was brought to my attention by someone else.

If you're agreeable I would like to be in on the discussions of
the Catalog(ue) Subgroup.  My interest is as an author of 
catalogue-handling software (TOPCAT, STIL) - I'm not sure at present
how much contribution I would have to make to catalogue data model
design, but I would at least be interested to know the way that
the discussions are going, and may be able to comment on some of
the software implications.

Thanks, 

Mark Taylor
Starlink project (UK)

-- 
Mark Taylor    Starlink Programmer     Physics,  Bristol University, UK
m.b.taylor@bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/

=============================================================================
	From: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
To: 	Martin Hill <mchill@dial.pipex.com>, Brian Thomas <thomas@astro.umd.edu>, Jesus Salgado <Jesus.Juan.Salgado@esa.int>, Kirk Borne <borne@rings.gsfc.nasa.gov>, Jonathan McDowell <jcm@head.cfa.harvard.edu>, Elizabeth Auden <eca@mssl.ucl.ac.uk>, Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>, Mark Taylor <m.b.taylor@bristol.ac.uk>
Cc: 	Pedro.Osuna@esa.int
Subject: 	Re: First-attempt Catalogue DM for discussion
Date: 	Tue, 14 Sep 2004 16:29:11 +0200	
Dear all,

Mark Taylor joined the group in early September, so welcome Mark.


After my email on a first attempt for the Catalogue DM, I only got a
comment from Elizabeth concerning 3-D catalogues and one from Ed (which
I forwarded at the time).

From Elizabeth's mail:

[...]1. From MSSL:
XID-DB has an object-oriented structure which means that tables -
for us objects - have members which are themselves tables - objects.
It is also the case for XCat which is the catalogue of XMM sources.
[...]

I do not think this is a sound example of an n-D catalogue. A catalogue
is not a set of tables or objects in a database, but a collection of
items in a certain order. How they are organized internally does not
matter. What matters is the final catalogue they give, and for a source
catalogue like their XMM one, a 2-D catalogue can be produced.

For the second example:

[...]A catalogue of spectral line fluxes in galaxies may fall into that
category (of n-dimensional tables), whereas for each galaxy one can have
more like a 2d table associated to a conventional data-cell than a
single point, eg,

Object          Line    Flux
NGC 1068        Halpha  1.234
                HBeta   2.345
                [OIII]  2.347
[...]

Again, this can be converted to a 2-D table by displaying it as follows
(which is very often how it is done):

Object  Halpha Flux     HBeta Flux      OIII Flux
------  -----------     ----------      ---------
NGC1068 1.234           2.345           2.347
NGC1222 5.432           4.321           3.210
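
The conversion described above can be sketched in a few lines of Python.
This is only an illustration: the function name `pivot` and the column
naming are assumptions, and the input values are the three NGC 1068
fluxes from Elizabeth's example.

```python
from collections import defaultdict

# Long form: one (object, line, flux) triple per measurement.
rows = [
    ("NGC 1068", "Halpha", 1.234),
    ("NGC 1068", "HBeta", 2.345),
    ("NGC 1068", "[OIII]", 2.347),
]


def pivot(triples):
    """Turn (object, line, flux) triples into one row per object."""
    lines = []                  # column order follows first appearance
    table = defaultdict(dict)   # object name -> {line name: flux}
    for obj, line, flux in triples:
        if line not in lines:
            lines.append(line)
        table[obj][line] = flux
    header = ["Object"] + [f"{line} Flux" for line in lines]
    # Objects with no measurement for a given line get None in that column.
    body = [[obj] + [fluxes.get(line) for line in lines]
            for obj, fluxes in table.items()]
    return header, body


header, body = pivot(rows)
print(header)
for row in body:
    print(row)
```

The flattening works because every object is described by the same kind
of quantity (a flux per line); an object lacking a given line simply
leaves an empty cell in the 2-D table.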


What I meant by n-D catalogues was more in the direction of allowing
an entry to contain one or more entries inside, and that's where the
serious problems appear when dealing with multiple inheritance, etc. 
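
As a rough illustration of "an entry containing entries", the following
Python sketch shows only the self-association. The class name `Entry`
and the entry names are hypothetical, and the sketch deliberately does
not attempt to solve the multiple-inheritance question.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Entry:
    """A catalogue entry that may itself contain further entries."""
    name: str
    children: List["Entry"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Entry"]]:
        """Yield (depth, entry) for this entry and everything nested in it."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# An observation entry containing two source entries (made-up names).
obs = Entry("observation-1", [Entry("source-a"), Entry("source-b")])
for depth, entry in obs.walk():
    print("  " * depth + entry.name)
```

Nothing here says what an entry *is* (Observation, Source, ...); that is
exactly the part that becomes hard once an entry must be both at once.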


With respect to Ed's comments, I basically agree with the missing bits
in the WCS (it was just an example from me without trying to put all the
attributes) and the references and pointers to external observations (I
think this last one is implicit in the model already) although I'm not
so sure about the "subregions" of an object.


Anyway, this mail is trying to get some feedback from you all, as I
haven't heard anything back since I sent the first attempt DM.

In particular, I would like Jonathan to give me the green light (or
otherwise) to start working on a first draft (written in the conventional
IVOA format, etc.) along the lines that I wrote in the mail.

I look forward to your news.

Cheers,
p.


P.S.: group members list:
Elizabeth Auden
Kirk Borne
Martin Hill
Jonathan McDowell (DM chairman)
Jesus Salgado
Ed Shaya
Mark Taylor
Brian Thomas


On Wed, 2004-09-01 at 16:10, Pedro Osuna wrote:
> Dear all,
> 
> 
> our small group has increased in two members. 
> The list is currently (in alphabetical order):
> 
> Elizabeth Auden
> Kirk Borne
> Martin Hill
> Jonathan McDowell (DM Chairman)
> Jesus Salgado 
> Brian Thomas
> 
> and myself.
> 
> 
> 
> I think it is really important that we come to an agreement on _what_
> exactly we are trying to model when we talk about the CATALOGUE Data
> Model.
> 
> If we take as example the many CDS catalogues, they are merely 2
> dimensional tables, with rows and columns. They subdivide their
> catalogues in categories, somehow arbitrary, in terms of astronomical
> "branches". They also have "Tables" coming from publications (again, 2-D
> tables).
> 
> Another example of flat tables is the pointer that Elizabeth was giving.
> That's just a 2-D table giving solar flare information happening in
> certain dates with certain conditions. 
> 
> Another 2-D example is the 1XMM-Newton Source catalogue.
> 
> If what we are after is just the model for a 2-D table, the model as
> such would be quite simple. We have done an example Data Model for that
> that could serve as a starting internal discussion point. This is the
> diagram below called CATALOGUE_DM_UML.jpg. The real data model is only
> the upper part. Then, already existing DataModels like the Observation
> or Quantity would come naturally into the game.
>  
> As you can see in this Data Model, a "Catalogue" is formed of
> "CatalogueEntry"-ies, which either AstronomicalObject or
> AstronomicalEvent extend. A SolarFlare, e.g., would extend an
> AstronomicalEvent, and an Observation (could be called
> CatalogueObservation and implement the existing Observation DM) would
> extend it as well (this would be the case if we want to allow for
> "Observation" Catalogues. Otherwise -following Jonathan comments
> questioning whether we should discuss these type of catalogues) they
> would just not appear there as an extension of an AstronomicalEvent, as
> they would NOT in general be "Entry"-ies ever).
>  
> In this type of model, a Source would just be an AstronomicalObject, and
> what we would really have to concentrate on would be the attributes,
> etc. of the objects marked in red, i.e., Catalogue, AstronomicalEvent,
> AstronomicalObject and Source for the time being.
> 
> The Source could be just one of other many objects whose attributes
> might be modeled when necessity arises. I guess there should be a
> centralized point and procedure where models are being added to the
> whole VO machinery. In this case, someone modeling a SolarFlare (i.e.,
> identifying its attributes, etc.) would propose that model and then it
> would be accepted by the board and become part of the VO general model.
> Someone would also like to model the object Galaxy and then we could go
> on and on, and models added regularly..... what the position of the VO
> in general is with respect to this issue I don't know...
> 
> We give an instantiation example of this simple model for the case of
> the SolarFlare example catalogue that Elizabeth sent due to its
> simplicity in the diagram CATALOGUE_DM_Instantiation_UML.jpg.
> 
> 
> In case we are after allowing that an Entry in the Catalogue can be
> composed of one or more entries, and then those entries as well can
> include more entries, then the modeling effort would get much more
> complicated. The problem of allowing, for example, different
> Observations which observed the same Source in a catalogue and/or
> different sources observed by the same observation (Observation being a
> valid Entry) is a tough problem. In the image called
> CATALOGUE_DM_Recursive_UML.jpg below, we have added an auto-association
> in the entries to reflect that idea. However, how to deal with the
> Jekyll/Hyde problem (multi-inheritance) of an Observation being an Entry
> and containing one or more sources which can be Entry as well would not
> be easy to solve.
> 
> 
> I would say that for the time being, and again following the
> recommendation of the DM Team Leader, we might want to concentrate in
> modeling the CATALOGUE in the first option (2-D tables) and define the
> AstronomicalObject and Events, etc. as mentioned before. This could give
> people an idea of how we want to proceed and then evolve further when
> discussions start.
> 
> Please send me your comments and ideas in this respect. 
> Please don't be too touchy on real UML modeling, as the model is just
> only illustrative and does not mean to be rigorous in any aspect. As
> soon as we come to an agreement on how to attack the problem, we can
> start thinking on doing the things rigorously.
> 
> Wait for your comments.
> 
> Cheers,
> P.
-- 
Pedro Osuna Alcalaya

 
Software Engineer
European Space Astronomy Center
(ESAC/ESA)
e-mail: Pedro.Osuna@esa.int
Tel + 34 91 8131314
                                                                                
European Space Astronomy Center
European Space Agency
P.O. Box 50727
E-28080 Villafranca del Castillo
MADRID - SPAIN

=============================================================================
	From: 	Mark Taylor <m.b.taylor@bristol.ac.uk>
To: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
Subject: 	Re: First-attempt Catalogue DM for discussion
Date: 	Wed, 15 Sep 2004 10:13:40 +0100 (BST)	
On Tue, 14 Sep 2004, Pedro Osuna wrote:

> Dear all,
> 
> Mark Taylor joined the group late August, so welcome Mark.
> 
> 
> After my email on a first attempt for the CAtalogue DM, I only got a
> comment from Elizabeth concerning 3-D catalogues and one from Ed (which
> I forwarded by that time).

Pedro,

thanks for including me on the circulation for the catalogue effort
as requested.  I presume that the text part of your 'first attempt' 
is the discussion included at the end of this message (originally 
sent 1 September); this references a couple of diagrams 
(CATALOGUE_DM_UML.jpg, CATALOGUE_DM_Instantiation_UML.jpg) that I 
don't have.  Would you be kind enough to send me copies?

Thanks a lot

Mark

-- 
Mark Taylor    Starlink Programmer     Physics,  Bristol University, UK
m.b.taylor@bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/


=============================================================================
	From: 	Mark Taylor <m.b.taylor@bristol.ac.uk>
To: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
Cc: 	Martin Hill <mchill@dial.pipex.com>, Brian Thomas <thomas@astro.umd.edu>, Jesus Salgado <Jesus.Juan.Salgado@ESA.INT>, Kirk Borne <borne@rings.gsfc.nasa.gov>, Jonathan McDowell <jcm@head.cfa.harvard.edu>, Elizabeth Auden <eca@mssl.ucl.ac.uk>, Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
Subject: 	Re: First-attempt Catalogue DM for discussion
Date: 	Wed, 15 Sep 2004 19:56:25 +0100 (BST)	
Pedro,

Your comments seem like a good starting point.  In particular:

On Wed, 2004-09-01 at 16:10, Pedro Osuna wrote:

> I would say that for the time being, and again following the
> recommendation of the DM Team Leader, we might want to concentrate in
> modeling the CATALOGUE in the first option (2-D tables) and define the
> AstronomicalObject and Events, etc. as mentioned before. This could give
> people an idea of how we want to proceed and then evolve further when
> discussions start.

I agree with this.  2-d catalogues represent a well-defined structure and
the questions to be answered are relatively clear, as you've outlined.
While there are many situations in which one wants to think in terms
of more complicated data structures than this, I'd say that such
situations fall more into the domain of data processing than of
data modelling.  If a simple and well-defined model for catalogues 
is available, users can operate on them (for instance perform 
various kinds of joins) in customised ways which don't have to be 
codified by the IVOA.  In my opinion, attempting to come up with a 
model which can cope with the various kinds of "n-dimensional" tables 
would risk producing something which is too complicated to be 
implemented and/or too restrictive to be useful.

Mark

-- 
Mark Taylor    Starlink Programmer     Physics,  Bristol University, UK
m.b.taylor@bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
=============================================================================
	From: 	Clive Davenhall <acd@roe.ac.uk>
To: 	Pedro.Osuna@esa.int
Subject: 	Catalogue subgroup of the IVOA Data Modelling group.
Date: 	Thu, 16 Sep 2004 11:02:38 +0100 (BST)	
16/9/04.

Pedro,

I saw a copy of the message that you circulated a couple of weeks ago
inviting expressions of interest in joining a Catalogue Subgroup of the
IVOA Data Modelling group.  I'd like to become involved in this Catalogue
Subgroup, if this is still possible.  Obviously I only have a limited
amount of time available for this work, but that is always the case.
Maybe I should mention that I've worked on developing astronomical
catalogue software for many years  and have been involved in the VOTable
work.

Maybe you could let me know whether you're still open to new members.

regards,
Clive.

-----------------------------------------------------------------------------
Clive Davenhall                                      Institute for Astronomy,
e-mail (internet, JANET): acd @ roe.ac.uk        Royal Observatory Edinburgh,
fax from within the UK:   0131-668-8416            Blackford Hill, Edinburgh,
fax from overseas:     +44-131-668-8416                    EH9 3HJ, Scotland.

=============================================================================
	From: 	Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
To: 	Brian Thomas <thomas@astro.umd.edu>
Cc: 	Mark Taylor <m.b.taylor@bristol.ac.uk>, Pedro Osuna <Pedro.Osuna@sciops.esa.int>, Martin Hill <mchill@dial.pipex.com>, Jesus Salgado <Jesus.Juan.Salgado@esa.int>, Kirk Borne <borne@rings.gsfc.nasa.gov>, Jonathan McDowell <jcm@head.cfa.harvard.edu>, Elizabeth Auden <eca@mssl.ucl.ac.uk>
Subject: 	Re: First-attempt Catalogue DM for discussion
Date: 	Thu, 16 Sep 2004 10:54:22 -0400	
All,
    Brian said most of what I would also say, but I would add that this
is not a matter of deciding at which N of an N-dimensional cube to
start. The issue should be whether we want a cube, tree, or directed graph
(nodes with pointers to other nodes in a random space). The N-cube
does not work for the use cases that I gave (before we went into a
huddle). And as Brian mentions, that would just be a minor upgrade to
VOTable. A tree is simple, can be described fairly well in a
schema and therefore has advantages in query. It also happens to be
the way the universe is structured:
universe
    region, cluster, field
       region, galaxy, QSO, absorption line system (ALS), ICM
          region, stellar cluster, IGM, molecular cloud
             region, stellar system
                region, planet, star, asteroid, comet
                   region, surface, core,  layer
where each line is "contained" by the previous line. 
Another advantage of using such a hierarchy is that a set of Catalogs
can be neatly merged into a new larger Catalog. Or, another way to
think of this, a query over many Catalogs can be expressed as if it is
over a single Catalog. You can call that all processing, I suppose, but
it sure is easier if the data model supports it!

A directed graph has the advantage of allowing any topological
connectedness and it is supported by both OWL and Topic Maps. While I
am a big fan of such things, the tree is simpler, closely matches the
actual relationships between objects in our discipline, and one can
support the occasional link between distant objects in a tree with a
relationship pointer.
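
A minimal sketch of the tree idea, assuming hypothetical names throughout
(`Node`, `merge`, `related`, and the object names are illustrative only):
each node contains its children, two catalogues sharing a root can be
merged into one tree, and `related` stands in for the occasional
relationship pointer between distant objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One object in the containment tree."""
    name: str
    kind: str
    children: Dict[str, "Node"] = field(default_factory=dict)
    related: List[str] = field(default_factory=list)  # names of distant nodes

    def add(self, child: "Node") -> "Node":
        """Attach a child and return it, so calls can be chained."""
        self.children[child.name] = child
        return child

    def find(self, name: str) -> Optional["Node"]:
        """Depth-first search for a node by name."""
        if self.name == name:
            return self
        for child in self.children.values():
            hit = child.find(name)
            if hit is not None:
                return hit
        return None


def merge(a: Node, b: Node) -> Node:
    """Merge two trees with the same root into a new tree.

    Subtrees present in only one input are shared, not copied.
    """
    if a.name != b.name:
        raise ValueError("catalogues must share a root to be merged")
    out = Node(a.name, a.kind, related=a.related + b.related)
    names = list(a.children) + [n for n in b.children if n not in a.children]
    for name in names:
        left, right = a.children.get(name), b.children.get(name)
        if left is not None and right is not None:
            out.children[name] = merge(left, right)
        else:
            out.children[name] = left if left is not None else right
    return out


# Two small catalogues describing the same cluster (made-up names).
cat1 = Node("universe", "universe")
cat1.add(Node("cluster-1", "cluster")).add(Node("galaxy-a", "galaxy"))
cat2 = Node("universe", "universe")
cat2.add(Node("cluster-1", "cluster")).add(Node("galaxy-b", "galaxy"))

merged = merge(cat1, cat2)
print(sorted(merged.find("cluster-1").children))
```

A query such as `merged.find("galaxy-b")` then runs over the merged tree
exactly as it would over a single catalogue.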

Ed

Brian Thomas wrote:

> Hi All,
>
> Been silent recently because of being mobbed at work..but I wanted to 
> throw in my 2 cents on this..
>
>On Wednesday 15 September 2004 06:56 pm, Mark Taylor wrote:
>  
>
>>Pedro,
>>
>>Your comments seem like a good starting point.  In particular:
>>
>>On Wed, 2004-09-01 at 16:10, Pedro Osuna wrote:
>>
>>    
>>
>>>I would say that for the time being, and again following the
>>>recommendation of the DM Team Leader, we might want to concentrate in
>>>modeling the CATALOGUE in the first option (2-D tables) and define the
>>>AstronomicalObject and Events, etc. as mentioned before. This could give
>>>people an idea of how we want to proceed and then evolve further when
>>>discussions start.
>>>      
>>>
>>I agree with this.  2-d catalogues represent a well-defined structure and
>>the questions to be answered are relatively clear, as you've outlined.
>>    
>>
>
> I think we have an opportunity to think larger than standard 2D
> catalogs here. If the goal of this group is to get a standard out quickly,
> then I would be forced to agree that the focus is constrained to 2D
> catalogs (objects are simple collection of properties). IF the goal is to 
> consider longer term effects of what we would like to be able to do with 
> catalogs, then we are short-changing ourselves. 
>
> I don't think that N-Dimensional catalogs are that hard a problem. I (and Ed)
> have a proposal for doing such things already based on the Quantity. And
> this is no amazing ground that we are breaking..I know that former formats 
> exist which do the same as well, to wit NDF (I think) and XDF (and there
> must be others).
>
> Another consideration for going beyond 2D is social/political: A narrow focus 
> on 2D catalogs is essentially a study of astronomical tables. This has been 
> already done by the VOTable people. Are we then to just rubber-stamp VOTable?
> Or appear as if we want to re-invent the wheel?
>
> I think we should have a clear picture of what the catalog standard should
> provide to the VO. What comes to mind are the following:
>
> 1. Standard for transport/exchange
>
> 2. Standard for a "catalog" search across the VO
>
> I've gone a bit out of MDA ordering..but we can form an opinion of whether or 
> not 2D catalogs will be sufficient in light of a variety of use-cases. I imagine
> the above 2 requirements will appear. What use-cases we accept as "needed"
> will say whether or not 2D is the only catalog we wish to have (and I present
> one such 3D use-case below)
>
> Now a few specific replies...
>
>  
>
>>While there are many situations in which one wants to think in terms
>>of more complicated data structures than this, I'd say that such
>>situations fall more into the domain of data processing than of
>>data modelling.  
>>    
>>
>
> I respectfully disagree. Any time you start talking about cataloging
> "objects" which are more complex than a simple collection of scalar
> properties, you have 3+ dimensions to store. And this isn't theoretical
> argument. There is much astronomical data which is better modeled
> as higher dimensional catalogs..for example what about a grism image 
> survey where each spectra is associated with a sky position? Thats clearly 
> 3D in nature (although I allow that  you could "flatten it" to 2D if you liked...
> but thats ugly and makes it harder to design an appropriate search).
>
>  
>
>>If a simple and well-defined model for catalogues  
>>is available, users can operate on them (for instance perform 
>>various kinds of joins) in customised ways which don't have to be 
>>codified by the IVOA.  In my opinion, attempting to come up with a 
>>model which can cope with the various kinds of "n-dimensional" tables 
>>would risk producing something which is too complicated to be 
>>implemented and/or too restrictive to be useful.
>>    
>>
>
> I do agree that we need a simple 2D catalog that will be used in 60% of cases.
> But all that is needed there is to see that the full N-D model can "collapse" to
> the 2D case. If people are interested, I can present a possible model that 
> does this.
>
> Laters,
>
> =b.t.
>
>  
>

=============================================================================
	From: 	Brian Thomas <thomas@astro.umd.edu>
To: 	Mark Taylor <m.b.taylor@bristol.ac.uk>
Cc: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>, Martin Hill <mchill@dial.pipex.com>, Jesus Salgado <Jesus.Juan.Salgado@esa.int>, Kirk Borne <borne@rings.gsfc.nasa.gov>, Jonathan McDowell <jcm@head.cfa.harvard.edu>, Elizabeth Auden <eca@mssl.ucl.ac.uk>, Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
Subject: 	Re: First-attempt Catalogue DM for discussion
Date: 	Thu, 16 Sep 2004 16:57:10 +0000	

 Hi All,

 Been silent recently because of being mobbed at work..but I wanted to 
 throw in my 2 cents on this..

On Wednesday 15 September 2004 06:56 pm, Mark Taylor wrote:
> Pedro,
> 
> Your comments seem like a good starting point.  In particular:
> 
> On Wed, 2004-09-01 at 16:10, Pedro Osuna wrote:
> 
> > I would say that for the time being, and again following the
> > recommendation of the DM Team Leader, we might want to concentrate in
> > modeling the CATALOGUE in the first option (2-D tables) and define the
> > AstronomicalObject and Events, etc. as mentioned before. This could give
> > people an idea of how we want to proceed and then evolve further when
> > discussions start.
> 
> I agree with this.  2-d catalogues represent a well-defined structure and
> the questions to be answered are relatively clear, as you've outlined.

 I think we have an opportunity to think larger than standard 2D
 catalogs here. If the goal of this group is to get a standard out quickly,
 then I would be forced to agree that the focus is constrained to 2D
 catalogs (objects are simple collections of properties). If the goal is to 
 consider longer term effects of what we would like to be able to do with 
 catalogs, then we are short-changing ourselves. 

 I don't think that N-Dimensional catalogs are that hard a problem. I (and Ed)
 have a proposal for doing such things already based on the Quantity. And
 this is no amazing ground that we are breaking..I know that former formats 
 exist which do the same as well, to wit NDF (I think) and XDF (and there
 must be others).

 Another consideration for going beyond 2D is social/political: A narrow focus 
 on 2D catalogs is essentially a study of astronomical tables. This has been 
 already done by the VOTable people. Are we then to just rubber-stamp VOTable?
 Or appear as if we want to re-invent the wheel?

 I think we should have a clear picture of what the catalog standard should
 provide to the VO. What comes to mind are the following:

 1. Standard for transport/exchange

 2. Standard for a "catalog" search across the VO

 I've gone a bit out of MDA ordering..but we can form an opinion of whether or 
 not 2D catalogs will be sufficient in light of a variety of use-cases. I imagine
 the above 2 requirements will appear. What use-cases we accept as "needed"
 will say whether or not 2D is the only catalog we wish to have (and I present
 one such 3D use-case below)

 Now a few specific replies...

> While there are many situations in which one wants to think in terms
> of more complicated data structures than this, I'd say that such
> situations fall more into the domain of data processing than of
> data modelling.  

 I respectfully disagree. Any time you start talking about cataloging
 "objects" which are more complex than a simple collection of scalar
 properties, you have 3+ dimensions to store. And this isn't a theoretical
 argument. There is much astronomical data which is better modeled
 as higher dimensional catalogs..for example what about a grism image 
 survey where each spectrum is associated with a sky position? That's clearly 
 3D in nature (although I allow that you could "flatten it" to 2D if you liked...
 but that's ugly and makes it harder to design an appropriate search).

> If a simple and well-defined model for catalogues  
> is available, users can operate on them (for instance perform 
> various kinds of joins) in customised ways which don't have to be 
> codified by the IVOA.  In my opinion, attempting to come up with a 
> model which can cope with the various kinds of "n-dimensional" tables 
> would risk producing something which is too complicated to be 
> implemented and/or too restrictive to be useful.

 I do agree that we need a simple 2D catalog that will be used in 60% of cases.
 But all that is needed there is to see that the full N-D model can "collapse" to
 the 2D case. If people are interested, I can present a possible model that 
 does this.

 Laters,

 =b.t.

-- 

  * Dr. Brian Thomas 

  * Dept of Astronomy/University of Maryland-College Park 
  * Code 630.1/Goddard Space Flight Center-NASA

  *   fax: (301) 286-1775
  * phone: (301) 286-6128 [GSFC]
           (301) 405-2312 [UMD] 

=============================================================================
	From: 	Mark Taylor <m.b.taylor@bristol.ac.uk>
To: 	Brian Thomas <thomas@astro.umd.edu>
Cc: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>, Martin Hill <mchill@dial.pipex.com>, Jesus Salgado <Jesus.Juan.Salgado@esa.int>, Kirk Borne <borne@rings.gsfc.nasa.gov>, Jonathan McDowell <jcm@head.cfa.harvard.edu>, Elizabeth Auden <eca@mssl.ucl.ac.uk>, Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
Subject: 	Re: First-attempt Catalogue DM for discussion
Date: 	Fri, 17 Sep 2004 11:48:10 +0100 (BST)	
Brian,
 
On Thu, 16 Sep 2004, Brian Thomas wrote:
>
> I think we have an opportunity to think larger than standard 2D
> catalogs here. If the goal of this group is to get a standard out quickly,
> then I would be forced to agree that the focus is constrained to 2D
> catalogs (objects are simple collections of properties). If the goal is to
> consider longer term effects of what we would like to be able to do with
> catalogs, then we are short-changing ourselves.
>
> I don't think that N-Dimensional catalogs are that hard a problem. I (and Ed)
> have a proposal for doing such things already based on the Quantity. And
> this is no amazing ground that we are breaking..I know that former formats
> exist which do the same as well, to wit NDF (I think) and XDF (and there
> must be others).
 
NDF doesn't describe anything like a catalogue or table, it only
describes a single N-dimensional array of primitives, associating
coordinate systems and per-pixel quality flags and error values.
However, I realise that XDF does allow much more flexibility than this.
 
I agree that describing a model which can handle N-dimensional tables
or other flexible data structures is not in itself that difficult.
My concern is that having done so the resulting structure will be
difficult (a) for data centres to implement and (b) for VO clients
to make sense of (of these I think that (b) is the more serious problem).
 
 
> Another consideration for going beyond 2D is social/political: A narrow focus
> on 2D catalogs is essentially a study of astronomical tables. This has been
> already done by the VOTable people. Are we then to just rubber-stamp VOTable?
> Or appear as if we want to re-invent the wheel?
 
VOTable addresses the problem of a standard transport/exchange/storage
format.  It does not address the problem of semantic interpretation
of the data thus represented, although to the casual eye it might look
like it does.  The introduction of the 'utype' attribute in VOTable 1.1
makes this explicit (VOTable 1.1 recommendation sec 4.5).  In order to
gain semantic information from a VOTable (e.g.: what class of physical
object does row #i represent? what is its position on the sky?)
you really need to associate elements of the VOTable with elements
of a data model.  You can have a go at this kind of semantic
interpretation by grubbing around with UCDs and column names, but it
is not a rigorous or reliable way to go about things.
 
So VOTable does stand in need of a data model it can hook up to.
I agree that attempting to answer this need is more a workmanlike task
than a great voyage of discovery, but I don't think that makes it
less worthwhile.
 
> I think we should have a clear picture of what the catalog standard should
> provide to the VO. What comes to mind are the following:
 
Excellent idea!
 
> 1. Standard for transport/exchange
>
> 2. Standard for a "catalog" search across the VO
 
These things are needed, but we don't necessarily have to start from
scratch to provide them.  *If* we go with 2-d-like tables, then I 
believe that VOTable may be most of the answer to (1).  
Clearly (2) has got a lot to do with the VOQL effort.
My conception of what we are supposed to be doing is to provide the 
semantic glue which will permit these things to be able to work.
However, I don't recall seeing a terms of reference or mission statement
or similar for this group, and some of the disagreements may be
as a result of disagreements about our aims.  Pedro: what is the
question that the catalogue-dm subgroup is supposed to be answering?
 
 
>  Now a few specific replies...
>
> > While there are many situations in which one wants to think in terms
> > of more complicated data structures than this, I'd say that such
> > situations fall more into the domain of data processing than of
> > data modelling.
>
>  I respectfully disagree. Any time you start talking about cataloging
>  "objects" which are more complex than a simple collection of scalar
>  properties, you have 3+ dimensions to store. And this isn't a theoretical
>  argument. There is much astronomical data which is better modeled
>  as higher dimensional catalogs..for example what about a grism image
>  survey where each spectrum is associated with a sky position? That's clearly
>  3D in nature (although I allow that you could "flatten it" to 2D if you
>  liked... but that's ugly and makes it harder to design an appropriate
>  search).
 
If I understand correctly the structure you're talking about (in its
'flattened' form a 2-d table with cells in some columns containing
a vector of numeric values representing a spectrum) then I'm not sure
I agree that the flattened form is a bad way to deal with it.
VOTable and FITS are quite happy to deal with N-dimensional
arrays of primitives in a cell like this.  As for searching - is a
search on particular pixels of an array of this sort the kind of
thing you'd want to do?  I'd have thought that to search on the
characteristics of a spectrum you would typically need the whole
thing so you could fit lines etc, but perhaps I'm just not familiar
enough with this sort of thing - can you give an example?
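
A minimal sketch of the flattened form described above, assuming a
hypothetical three-row survey table in which the "flux" cell of each row
holds a whole spectrum as a vector. The column names, the values, the
shared wavelength grid and the helper functions are all invented for
illustration; no VOTable or FITS library is used. The search works on a
characteristic derived from the whole vector rather than on single pixels.

```python
# Hypothetical "flattened" grism survey: one row per sky position,
# with the whole spectrum stored as a vector in a single cell.
WAVELENGTHS_NM = [500.0, 600.0, 700.0, 800.0]  # assumed shared wavelength grid

rows = [
    {"id": "S1", "ra": 10.10, "dec": -5.20, "flux": [1.0, 4.0, 2.0, 1.0]},
    {"id": "S2", "ra": 10.15, "dec": -5.25, "flux": [0.5, 0.6, 0.7, 3.0]},
    {"id": "S3", "ra": 10.30, "dec": -5.10, "flux": [2.0, 2.1, 1.9, 2.0]},
]


def peak_wavelength(flux):
    """Return the wavelength at which the spectrum is brightest."""
    brightest = max(range(len(flux)), key=lambda i: flux[i])
    return WAVELENGTHS_NM[brightest]


def select_by_peak(table, lo_nm, hi_nm):
    """Return the ids of rows whose spectral peak lies in [lo_nm, hi_nm].

    Each test needs the whole vector held in the cell, not one pixel of it.
    """
    return [r["id"] for r in table
            if lo_nm <= peak_wavelength(r["flux"]) <= hi_nm]


red_peaked = select_by_peak(rows, 750.0, 850.0)
```

Whether a client would rather search on such derived characteristics or
on individual array elements is exactly the open question in the
paragraph above.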

Mark

-- 
Mark Taylor    Starlink Programmer     Physics,  Bristol University, UK
m.b.taylor@bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
=============================================================================

	From: 	Mark Taylor <m.b.taylor@bristol.ac.uk>
To: 	Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
Cc: 	Brian Thomas <thomas@astro.umd.edu>, Pedro Osuna <Pedro.Osuna@sciops.esa.int>, Martin Hill <mchill@dial.pipex.com>, Jesus Salgado <Jesus.Juan.Salgado@esa.int>, Kirk Borne <borne@rings.gsfc.nasa.gov>, Jonathan McDowell <jcm@head.cfa.harvard.edu>, Elizabeth Auden <eca@mssl.ucl.ac.uk>
Subject: 	Re: First-attempt Catalogue DM for discussion
Date: 	Fri, 17 Sep 2004 11:51:34 +0100 (BST)	
Ed,

On Thu, 16 Sep 2004, Ed Shaya wrote:
 
> All,
>     Brian said most of what I would also say, but I would add that this
> is not a matter of deciding how many N of an N-dimensional cube to start
> at.  The issue should be whether we want a cube, tree, or directed graph
> (nodes with pointers to other nodes in a random space).   The N-cube
> does not work for the use cases that I gave (before we went into a
> huddle).   And as Brian mentions, that would just be a minor upgrade to
> VOTable.    A  tree is simple, can be  described fairly well  in a
> schema and therefore  has  advantages in query.  It also happens to  be
> the way the universe is structured:
> universe
>     region, cluster, field
>        region, galaxy, QSO, absoption line system (ALS), ICM
>           region, stellar cluster, IGM, molecular cloud
>              region, stellar system
>                 region, planet, star, asteroid, comet
>                    region, surface, core,  layer
> where each line is "contained" by the previous line.
> Another advantage of using such a hierarchy is that a set of  Catalogs
> can be neatly merged into a new larger Catalog.  Or, another way to
> think of this, a query over many Catalogs can be expressed as if it is
> over a single Catalog.  You can call that all processing, I suppose, but
> it sure is easier if the data model supports it!

It's true that there's a whole load of possible data structures out there
and we'd better think carefully before we settle on one.
 
There are powerful aspects to a tree-like structure, but one important
question is: how do you define what tree you're going to use?
I'm not sure if you're suggesting that we decide on a particular
tree that looks something like the above and hardwire that into the
catalogue data model.  As you say it would make certain sorts of
query very tidy, but people are always going to want some items
which don't fit in well - for instance where do you put an
observed object for which you have provenance, magnitudes and spectra,
but as yet no classification?  The opposite approach is to have the tree
in use defined on a per-catalogue basis, but this loses you a lot
of the interoperability benefits.  Which did you have in mind?

Mark

-- 
Mark Taylor    Starlink Programmer     Physics,  Bristol University, UK
m.b.taylor@bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
=============================================================================
	From: 	Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
Reply-To: 	Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
To: 	Data Model IVOA List <dm@ivoa.net>
Subject: 	[Fwd: Re: First-attempt Catalogue DM for discussion]
Date: 	Fri, 17 Sep 2004 11:29:33 -0400	


-------- Original Message --------
Subject:        Re: First-attempt Catalogue DM for discussion
Date:   Fri, 17 Sep 2004 08:55:23 -0400
From:   Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
To:     Mark Taylor <m.b.taylor@bristol.ac.uk>
References: 
<Pine.LNX.4.44.0409171148160.10750-100000@andromeda.star.bris.ac.uk>


>There are powerful aspects to a tree-like structure, but one important
>question is: how do you define what tree you're going to use?
>I'm not sure if you're suggesting that we decide on a particular
>tree that looks something like the above and hardwire that into the
>catalogue data model.  As you say it would make certain sorts of
>query very tidy, but people are always going to want some items
>which don't fit in well - for instance where do you put an
>observed object for which you have provenance, magnitudes and spectra,
>but as yet no classification?  The opposite approach is to have the tree
>in use defined on a per-catalogue basis, but this loses you a lot
>of the interoperability benefits.  Which did you have in mind?
>
>Mark
>
>  
>
Both!  If you don't know the classification yet, they are astroObjects, 
which are permitted at any level.  If they are real and you know from the 
Lyman limit cutoff that they are beyond cz=8.3:
<Universe>
   <region  type="redshiftCut">
      <redshift><unitless/><value inequality="greaterThan">8.3</value></redshift>
      <astroObject name="A2323"><position...
      <astroObject name="A2324"><position...
      <astroObject name="A2325"><position...
   </region>
</Universe>

If it is simulated and it is some unclassifiable clump of mass points 
within a cluster:
<universe type="N-body" run="879">
   <GalaxyCluster name="326">
      <AstroObject/>
   </GalaxyCluster>
</universe>

Besides having Universe as a top level element we need other 
possibilities: universe, DataCenters,  Filters, astroObject, etc.  One 
chooses the top level element depending on what the catalog is about.

By the way, the fact that Universe and universe have different meanings 
in English is one of several reasons why I disagreed with a standard 
that uses capitals to indicate an element.

Ed

=============================================================================
	From: 	Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
Reply-To: 	Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>
To: 	Mark Taylor <m.b.taylor@bristol.ac.uk>, Data Model IVOA List <dm@ivoa.net>
Subject: 	Re: First-attempt Catalogue DM for discussion
Date: 	Fri, 17 Sep 2004 11:30:03 -0400	
Mark Taylor wrote:

>
>VOTable addresses the problem of a standard transport/exchange/storage
>format.  It does not address the problem of semantic interpretation
>of the data thus represented, although to the casual eye it might look
>like it does.  The introduction of the 'utype' attribute in VOTable 1.1
>makes this explicit (VOTable 1.1 recommendation sec 4.5).  In order to
>gain semantic information from a VOTable (e.g.: what class of physical
>object does row #i represent? what is its position on the sky?)
>you really need to associate elements of the VOTable with elements
>of a data model.  You can have a go at this kind of semantic
>interpretation by grubbing around with UCDs and column names, but it
>is not a rigorous or reliable way to go about things.
> 
>So VOTable does stand in need of a data model it can hook up to.
>I agree that attempting to answer this need is more a workmanlike task
>than a great voyage of discovery, but I don't think that makes it
>less worthwhile.
>  
>
Mark,
    I agree with you.  We need to discuss semantics as well.  We need to 
decide if UCDs are sufficient, or UCDs plus utypes, or something else 
altogether.  So let's start.
    What do we need to know about the items within a field/column 
semantically?  What they are to the highest degree of specificity: e.g., 
variable giant stars in binary systems.  What their relationship is to 
ID_MAIN: e.g., hosts to the planets in ID_MAIN.  What their relationship 
is to other objects in the catalog: e.g., components of Deneb and observed 
in image 12.  Is there anything else that is needed?

So a simple table of planets in orbit about components of a binary star 
would normally look like this:
ID_MAIN    component    e        <v>
P1         A            0.1      112
P2         B            0.3       22
P3         A,B          undef      6


How would we put this into Catalog in such a way that it is  machine 
understandable?
With just UCD type mechanism (without actually looking it up in our 
present UCD system, which?) we would get something like this for 
component column:
"stellar; binary component, giant, variable  planetary system; host 
star" (This is a parsing nightmare).
For the e column, we have "orbit; ellipticity"
For the <v> column, we have "orbit; mean velocity"
And we need some sort of link from component to the binary star name 
Deneb (perhaps in an upper-level table) and image 12.
What is missing here (aside from sanity)  is that nowhere does the table 
say that the planets in ID_MAIN orbit the stars in component!  Humans 
comprehend it pretty fast, but a computer would not.

If I understand utype (I am not on any discussion group that has 
discussed utype), utype tries to provide some additional knowledge 
here.  One provides a model of a gravitational orbit and points from the 
table fields to parts in the model.  I could imagine having an OWL 
description of orbitingSystem: 2 objects in circular motion about each 
other.  A subclass would be a planetaryOrbitingSystem.  It would 
require that at least one of its components is a planet.  The ID_MAIN 
would have a utype that indicated these are planets and another pointer to 
indicate component of planetaryOrbitingSystem, while the component would 
point to planetaryOrbitingSystem and binarySystem, where binarySystem is 
another subclass of orbitingSystem with two similar components.

Ellipticity could also point to something in the "model".   However, it 
is ambiguous if the ellipticity refers to the orbit of  component A and 
B in Deneb or the planets around their hosts.  Same with mean velocity.

In a Catalog with a tree approach, information can be inserted directly 
into the right locations.  We can know that the velocity applies to the 
planet because it is a child of planet.  If it were the velocity of the 
component star, then it would be a child of that star.  In the 
following, the binary system Deneb and its components would be introduced 
as ancestors to the planets. 
<binaryStar name="Deneb">
    <component id="A">
       <spect_class>K Giant</spect_class>
       <variable/>
       <planet name="P1">
         <orbit>
            <around>
                <star ref="A"/>
            </around>
            <ellipticity>
                <unitless/>
                <value>0.1</value>
            </ellipticity>
            <velocity>
                <unit>km/s</unit>
                <value>112</value>
            </velocity>
        </orbit>
       </planet>
    </component>
    <component id="B">
       <spect_class>K Giant</spect_class>
       <variable/>    
       <planet name="P2">
         <orbit>
            <around>
                <star ref="B"/>
            </around>
            <ellipticity>
                <unitless/>
                <value>0.3</value>
            </ellipticity>
            <velocity>
                <unit>km/s</unit>
                <value>22</value>
            </velocity>
         </orbit>
       </planet>
    </component>
    <planet name="P3">
        <orbit>
          <around>
              <star ref="A"/>
              <star ref="B"/>
          </around>
          <ellipticity>
                <unitless/>
                <value special="undefined"/>
          </ellipticity>
          <velocity>
            <unit>km/s</unit>
            <value>6</value>
          </velocity>
        </orbit>
    </planet>
</binaryStar>
      
The relationships and containments are now clear and can be parsed by 
standard XML tools.  I introduced an orbit which takes around, 
ellipticity, and velocity.  I allow around to take one or more objects so 
that a planet can go around A and B, although I could have said it goes 
around Deneb.  Perhaps if it is done this way it would mean that it weaves 
between components, and the other way it means it just goes completely 
around the system.
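
As a rough check of the "standard XML tools" point, here is a minimal
Python sketch using the standard library's xml.etree.ElementTree. It
assumes a trimmed, well-formed version of the fragment above (spectral
class and ellipticity omitted, values for P3 taken from the table
earlier in this message); the planet_hosts helper is invented for the
example. It recovers which stars each planet orbits purely from the
tree structure.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the Deneb fragment (illustrative only).
DOC = """
<binaryStar name="Deneb">
  <component id="A">
    <planet name="P1">
      <orbit>
        <around><star ref="A"/></around>
        <velocity><unit>km/s</unit><value>112</value></velocity>
      </orbit>
    </planet>
  </component>
  <component id="B">
    <planet name="P2">
      <orbit>
        <around><star ref="B"/></around>
        <velocity><unit>km/s</unit><value>22</value></velocity>
      </orbit>
    </planet>
  </component>
  <planet name="P3">
    <orbit>
      <around><star ref="A"/><star ref="B"/></around>
      <velocity><unit>km/s</unit><value>6</value></velocity>
    </orbit>
  </planet>
</binaryStar>
"""


def planet_hosts(xml_text):
    """Map each planet name to (list of host star refs, mean velocity)."""
    root = ET.fromstring(xml_text)
    hosts = {}
    for planet in root.iter("planet"):
        # The stars listed under <around> are the bodies the planet orbits.
        stars = [s.get("ref") for s in planet.findall("./orbit/around/star")]
        speed = float(planet.findtext("./orbit/velocity/value"))
        hosts[planet.get("name")] = (stars, speed)
    return hosts


hosts = planet_hosts(DOC)
```

Because the velocity is a descendant of the planet element, there is no
ambiguity about which body it belongs to.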


=============================================================================

	From: 	Pedro Osuna <Pedro.Osuna@sciops.esa.int>
To: 	Martin Hill <mchill@dial.pipex.com>, Brian Thomas <thomas@astro.umd.edu>, Jesus Salgado <Jesus.Juan.Salgado@esa.int>, Kirk Borne <borne@rings.gsfc.nasa.gov>, Jonathan McDowell <jcm@head.cfa.harvard.edu>, Elizabeth Auden <eca@mssl.ucl.ac.uk>, Ed Shaya <edward.j.shaya.1@gsfc.nasa.gov>, Mark Taylor <m.b.taylor@bristol.ac.uk>, Clive Davenhall <acd@roe.ac.uk>, Mireille Louys <louys@alinda.u-strasbg.fr>
Cc: 	po, js
Subject: 	Latest issues before Poona
Date: 	Mon, 20 Sep 2004 17:12:58 +0200	
Dear all,

Clive Davenhall joined recently; welcome, Clive.

Also, after a very interesting two-day meeting with CDS people here
in Madrid, I have stubbornly invited Mireille Louys to join the group as
the CDS representative (in view of the absence of any answer from
them), an invitation which she has kindly accepted ;-). Therefore, welcome
Mireille as well.


Our reduced group therefore grows to:

Elizabeth Auden
Kirk Borne
Clive Davenhall
Martin Hill
Mireille Louys
Jonathan McDowell (DM Chairman)
Pedro Osuna (Catalogue subgroup DM chairman)
Jesus Salgado 
Ed Shaya
Mark Taylor
Brian Thomas


I have been asked by Jonathan whether I could present something for the
Poona meeting. I have, as you've probably seen, answered that the main
discussion topic is, as of yet, _what_ we are trying to model.

I believe I will make a presentation whose main discussion topic is
whether we want to model only Source Catalogues for the time being
(the original intention from Jonathan) or more general catalogues,
including N-dimensional ones. I intend the presentation to be the
"official" birth of the Catalogue Data Model subgroup, as soon as people
start the on-line discussions on what we should and should not be
modeling.

In this respect, I send below our (Jesus Salgado and myself) own ideas
with regards to the 2-D versus N-D issue.

I still believe there is confusion on what we mean by a catalogue. In
our opinion, the n-dimensionality of a catalogue (and this is the sense
I tried to give to it in my initial mails) is in the fact that we allow
entries -in catalogues- to have entries inside as well, i.e., be
catalogues themselves. Also, in allowing different "entry types" to
coexist in the same catalogue.
 
The former approach would pose problems like infinite loops (an entry
can have references inside to a catalogue that might be the same as the
catalogue it belongs to).
The latter would give rise to multiple-inheritance problems (a source could
have been observed by several observations, and would therefore appear
as both an "event" (not an astroEvent, as pointed out by Ed) and an
AstroObject (through the inheritance of a Source)).

In summary, that type of abnormal behaviour coming from allowing entries
inside entries is the type of -probably wrongly called- N-dimensionality
I was talking about.
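
The infinite-loop problem mentioned above can be made concrete with a
small Python sketch. It assumes a hypothetical mapping, called contains
here, from each catalogue name to the catalogues its entries refer to;
the catalogue names and the find_cycle helper are invented for
illustration and are not part of the first-attempt model.

```python
def find_cycle(contains, start):
    """Depth-first walk over catalogue references.

    Returns the first chain of names that comes back to a catalogue
    already on the current path, or None if no such loop is reachable.
    """
    def visit(name, path):
        if name in path:
            # The catalogue is reached again through its own entries.
            return path[path.index(name):] + [name]
        for child in contains.get(name, []):
            found = visit(child, path + [name])
            if found:
                return found
        return None

    return visit(start, [])


# Hypothetical catalogues whose entries are themselves catalogues.
contains = {
    "survey": ["clusters"],
    "clusters": ["members"],
    "members": ["survey"],   # points back to the catalogue it belongs to
    "flat": [],              # a simple X-Y catalogue with no nested entries
}

loop = find_cycle(contains, "survey")
no_loop = find_cycle(contains, "flat")
```

A catalogue whose entries never refer to other catalogues, like "flat"
here, cannot produce such a loop.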

Other types of N-Dimensionality can find their place naturally in our
model of a catalogue. For example, in the first attempt data model I
sent, there is certainly N-dimensionality in the sense that, e.g., a
Source makes reference to Observation, StcWcs, and Quantity, which are
other Data Models being worked on. Therefore, other dimensions are
allowed through the entries themselves, but the dimensions are coming
from other Data Models and not the model of the catalogue.

Things like trees or directed graphs are, in my view, ways to
represent data, rather than data models in themselves. I appreciate that
some Astronomical objects are better represented in a tree-like form,
but again this is not the objective of the Catalogue Data Model, as we
are not trying to model all the objects in the universe.
In line with this, I have already raised my concern that we will
have to model Astronomical Objects on a one-by-one basis, i.e., we might
start by modelling the Source object (the initial idea of the whole of the
Catalogue DM) and then go ahead to model other objects, like Galaxy, or
Star, etc. Again, the modeling of the whole Universe in this sense might
take a long time, and whether it is worth the effort would have to be
weighed...

Also, I do not believe that VOTable is a Data Model for anything; it is
just a representation of data. VOTable is not a data model for a table,
it is just an agreed way to represent tables. Nor does it give any
information about _the model_ itself, only about how to represent it. Data
model and representation are separate things in my view.

Nor is it, in my understanding, the intention of the Catalogue Data
Model to give answers to querying problems, as mentioned by
some of you. The Data Models are just one part of the overall picture of
the VO, and they will have to be isolated enough from other parts of the
VO that they can be plugged in without friction. In this sense, it will
be up to the VOQL to define _how_ to access resources, and our models
will have to be independent enough to adapt to any way of querying
defined. How effectively joins work between different catalogues
will not depend, then, on the model but on how well the catalogue has
been structured.


As a summary of our view, I would like to call your attention to a very
easy Use Case proposed by Mireille that I believe is a very good example
of what a Catalogue Data Model might be used for.

This is the crossmatch of two sources found in two different catalogues.
An example of this could be a VO client that wants to be able to call
two different catalogues to check whether the source found by project X
in (RA, DEC) in the infrared corresponds to the source found around
(RA,DEC) in the radio project Y.

In my view, the steps that this tool would have to execute are the
following:

- Contact the "nearest" registry and ask for "Catalogue" resources (this
is the "Registry" part in the first attempt DM I sent. It is part of the
registry data of the Catalogue object itself).

- select the Infrared and Radio catalogues available (this is part of
the registry data of the Catalogue)

- select entries in the catalogue around (RA,DEC) with a certain size
(this is in the Coordinates part of the Sources object)

- from those entries, select the ones above certain likelihood
(this is part of the Quantity data -however this one is represented- of
the Source)

- run whatever algorithm is chosen to decide whether the sources
represent the same object (I guess this is up to the client; the
quantity and quality of those algorithms can vary)

- if there is an image of the source, overlay images from the different
sources in the different wavelengths (this is again at the entry
(Source) level).

- decide if the source emitting in the IR corresponds or not to the one
in the radio (this last one is a human action).

In this specific case, the "N-dimensionality" has come from the Registry
model, the Source model, the Quantity model and the STC model, whereas
we have always been working with only a couple of "2-D" catalogues (in
the sense of being an X-Y collection).
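
The selection and matching steps of this use case can be sketched in a
few lines of Python. This is only an illustration under stated
assumptions: the two catalogues are plain lists of dictionaries with
invented "id", "ra", "dec" and "likelihood" fields, the same-object test
is a simple angular tolerance chosen for the example, and the registry
lookup, image overlay and final human decision are left out.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))


def cone_select(catalogue, ra, dec, radius_deg, min_likelihood):
    """Entries around (ra, dec) within radius_deg and above a likelihood cut."""
    return [s for s in catalogue
            if s["likelihood"] >= min_likelihood
            and angular_separation_deg(ra, dec, s["ra"], s["dec"]) <= radius_deg]


def crossmatch(cat_x, cat_y, ra, dec, radius_deg, min_likelihood, tolerance_deg):
    """Pair up sources from two catalogues that lie within tolerance_deg."""
    near_x = cone_select(cat_x, ra, dec, radius_deg, min_likelihood)
    near_y = cone_select(cat_y, ra, dec, radius_deg, min_likelihood)
    return [(x["id"], y["id"]) for x in near_x for y in near_y
            if angular_separation_deg(x["ra"], x["dec"],
                                      y["ra"], y["dec"]) <= tolerance_deg]


# Invented sample data standing in for an infrared and a radio catalogue.
infrared = [
    {"id": "IR-1", "ra": 150.0010, "dec": 2.2000, "likelihood": 25.0},
    {"id": "IR-2", "ra": 150.0500, "dec": 2.2300, "likelihood": 30.0},
    {"id": "IR-3", "ra": 150.0012, "dec": 2.2001, "likelihood": 3.0},
]
radio = [
    {"id": "RAD-1", "ra": 150.0011, "dec": 2.2001, "likelihood": 40.0},
    {"id": "RAD-2", "ra": 150.2000, "dec": 2.4000, "likelihood": 50.0},
]

# Search a 0.1 degree cone, keep likelihood >= 10, match within 2 arcsec.
matches = crossmatch(infrared, radio, 150.0, 2.2, 0.1, 10.0, 2.0 / 3600.0)
```

Both inputs remain simple X-Y collections; everything beyond that comes
from the fields carried by each entry.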


Mireille asked me why in the model I sent, a Galaxy is separated from a
Source, as a Source "could be a Galaxy". In this very first attempt
model, there is a place reserved for "Type" in the Source Object. That
type would be a reference to the Galaxy object, in the case that the
source happened to be a Galaxy. However, in most of the cases there is
not enough information about a Source to know whether it is a Galaxy or
a Quasar or whatever (see, e.g., the 1-XMM Catalogue) and therefore, the
placeholder for Type can be left empty. In case of future better
knowledge of a specific source, data can be added to the Object in
particular which the source happened to be.
On the other hand, and in view of the fact that people wanted to have
all sorts of catalogues, and not only Sources, there could be a
catalogue of Galaxies (e.g., some of the Vizier ones) and in those, the
entries would just be Galaxies and therefore modeled after the Galaxy
object.


As a final summary, I believe that the first attempt data model covers
most of what has been said in all your mails, except for all details
regarding UCDs, Semantics etc., which I believe have to be tackled
later. It covers N-dimensionality through the link to other objects
while keeping the simplicity of modeling "simple X-Y" catalogues (as
opposed to "complex" catalogues of catalogues) (I have intentionally
left out 2-D wording here).


Cheers,
P.


-- 
Pedro Osuna Alcalaya

 
Software Engineer
European Space Astronomy Center
(ESAC/ESA)
e-mail: Pedro.Osuna@esa.int
Tel + 34 91 8131314
                                                                                
European Space Astronomy Center
European Space Agency
P.O. Box 50727
E-28080 Villafranca del Castillo
MADRID - SPAIN

 
=============================================================================


	From: 	Jonathan McDowell <jcm@head.cfa.harvard.edu>
To: 	Pedro.Osuna@sciops.esa.int
Subject: 	Catalog requirements
Date: 	Mon, 20 Sep 2004 11:16:31 -0400 (EDT)	
Pedro,
 I think I forgot to send you some possible requirements from Tony Linde
sent to me in July. Here they are, belatedly.
 Jonathan


> From: "Tony Linde" <ael@star.le.ac.uk>
> To: "'Jonathan McDowell'" <jcm@head.cfa.harvard.edu>
> Cc: "Andy Lawrence" <al@roe.ac.uk>
> Subject: Catalog/Tabular data model
> Date: Mon, 19 Jul 2004 15:55:29 +0100
> 
> Hi Jonathan,
> 
> Andy asked me to send you some requirements for a catalogue/tabular data
> model as needed by the Registry and for a prototype DM-based data exchange
> mechanism. 
> 
> I think the focus of this needs to be a small team, defining the
> requirements and working on a first draft, much as you've done in the past.
> I'd suggest you, me and someone from each of NVO, AstroGrid and CDS plus
> whoever is already working on this for your workgroup - what do you think?
> Personally, I think Mireille should be in it because of her work on IDHA or
> am I misreading that? I'm not sure yet who from AstroGrid should come in:
> Martin has done work on this but is rather confrontational; Elizabeth has
> done good work but is more used to the Solar area (though that might be
> beneficial).
> 
> The scope of the DM is that it ought to be able to model systems like LEDAS,
> VizieR, SDSS etc.
> 
> Some initial requirements might be:
> 
> 1. implementation agnostic (though I'm most interested in xsd).
> 
> 2. able to model any catalog and tabular based data (and the route by which
> it is known?), from the level of the data centre down to columns in the
> tables, via catalogs and other intermediate representations. 
> 
> 3. able to represent the structure of holdings (to the extent to which this
> is needed for querying etc).
> 
> 4. allows for the modelling of 'associated' metadata such as observation log
> details etc.
> 
> But all these ought to be developed by the working party we set up.
> 
> Anyway, let me know if you think this is enough to get started with and who
> we ought to approach to join the party.
> 
> Cheers,
> Tony. 
> 
> __
> Tony Linde                 
> Phone:  +44 (0)116 223 1292    Mobile: +44 (0)7753 603356
> Fax:    +44 (0)116 252 3311    Email:  ael@star.le.ac.uk
> Post:   Department of Physics & Astronomy,
>         University of Leicester
>         Leicester, UK   LE1 7RH
> 
> Project Manager,            Director,
> AstroGrid                   Leicester e-Science Centre
> http://www.astrogrid.org    http://www.e-science.le.ac.uk/
> 
=============================================================================