
My understanding of UCDs

This came up through my question in the plenary UCD session about how we can uniquely identify columns within a table. Basically, the answer was that UCDs will not solve this problem and are not intended to. This page summarises what I now understand to be the purpose of UCDs, and some of the implications.

UCDs as Data Types

The comment was made that UCDs can be considered as data types, so that a column in a table has a data type of, say, POS_EQ_RA. I assume that the reasons for treating UCDs as data types are to allow:

  • operations on columns: comparison, addition, subtraction, multiplication, etc., plus specific astronomical operations
  • conversion between data types: e.g. converting between equatorial and galactic coordinates

Do we thus need (or already have) some hierarchical structure of the UCDs based on allowable operations? In normal data types, we have numerical types, subdivided by integral and floating point, subdivided by storage size etc; one can add all numerical types but (generally) cannot add a number and a string (without pre-defining what such an addition will do).

Related to that: should we define the operations that can be performed on the individual data types (UCDs), the rules for those operations given specific types, and the type resulting from each operation?
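One way to picture such rules is a lookup table keyed by the operation and the operand UCDs. The sketch below is illustrative only: POS_EQ_RA comes from the discussion above, while the result name POS_EQ_RA_DIFF and the rule set itself are my assumptions, not agreed UCDs or agreed rules.

```python
# Hypothetical rule table: (operation, left UCD, right UCD) -> resulting UCD.
# POS_EQ_RA appears in the discussion; POS_EQ_RA_DIFF is an invented
# placeholder meaning "difference between two right ascensions".
RULES = {
    ("sub", "POS_EQ_RA", "POS_EQ_RA"): "POS_EQ_RA_DIFF",
    ("add", "POS_EQ_RA", "POS_EQ_RA_DIFF"): "POS_EQ_RA",
}


def result_ucd(op, left, right):
    """Return the UCD produced by `op`, or raise TypeError if no rule exists."""
    try:
        return RULES[(op, left, right)]
    except KeyError:
        raise TypeError(
            f"operation {op!r} is not defined for {left} and {right}"
        ) from None
```

With this table, subtracting one POS_EQ_RA column from another is allowed and yields the placeholder difference type, whereas adding POS_EQ_RA to POS_EQ_DEC is rejected, much as adding a number to a string would be.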

UCDs as Keywords

In this context, a UCD is part of the metadata for a table. It indicates the type of data held in the table, so having POS_EQ_RA associated with a table says that the table includes positional data in equatorial coordinates. That said, maybe the UCD for the table should be POS_EQ instead (since it is unlikely that it will have RA without DEC).

So the idea of being able to query which resources have POS_EQ* makes sense.
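As a rough sketch of such a keyword query, assuming the registry can be reduced to a mapping from resource name to the UCDs of its columns. The registry content and the name POS_GAL_LON are made up for illustration; only POS_EQ_RA and POS_EQ_DEC come from the discussion.

```python
from fnmatch import fnmatchcase

# Hypothetical registry content: resource name -> UCDs of its columns.
# POS_GAL_LON is an invented name standing in for a non-equatorial UCD.
REGISTRY = {
    "cat1": ["POS_EQ_RA", "POS_EQ_DEC"],
    "cat2": ["POS_EQ_DEC"],
    "cat3": ["POS_GAL_LON"],
}


def resources_with(pattern, registry):
    """Return sorted names of resources with at least one UCD matching `pattern`."""
    return sorted(
        name
        for name, ucds in registry.items()
        if any(fnmatchcase(ucd, pattern) for ucd in ucds)
    )
```

Here `resources_with("POS_EQ*", REGISTRY)` picks out cat1 and cat2 but not cat3.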

UCDs as Pointers into Data Model

This was a very interesting comment: that UCDs can be seen as pointers into the data model (DM). How this might be implemented, and how feasible it is, are still open questions. I guess there are two potential problem areas:

  • a UCD refers to multiple DM points (classes, objects or whatever they are called)
    this is likely but does indicate areas in which the UCDs are not the lowest level of metadata
  • one DM point is referred to by several UCDs
    if this occurs, it would indicate that the DM requires further analysis
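Both problem areas could be detected mechanically once a UCD-to-DM mapping exists. A minimal sketch, assuming the mapping is available as a list of (UCD, DM point) pairs; the DM point names used in the usage note are hypothetical, since the DM is not defined here.

```python
from collections import defaultdict


def mapping_problems(pairs):
    """Report UCDs with several DM points, and DM points with several UCDs.

    `pairs` is an iterable of (ucd, dm_point) tuples.
    Returns (ucd_to_many, dm_to_many), each a dict of sorted lists.
    """
    by_ucd = defaultdict(set)
    by_dm = defaultdict(set)
    for ucd, dm_point in pairs:
        by_ucd[ucd].add(dm_point)
        by_dm[dm_point].add(ucd)
    ucd_to_many = {u: sorted(d) for u, d in by_ucd.items() if len(d) > 1}
    dm_to_many = {d: sorted(u) for d, u in by_dm.items() if len(u) > 1}
    return ucd_to_many, dm_to_many
```

For example, if POS_EQ_RA were mapped to both a hypothetical "Position.ra" and a hypothetical "Pointing.ra", it would appear in the first result; the second result lists any DM point claimed by more than one UCD.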

I suspect that, as the DM expands and covers more areas of astronomy, we will need a more efficient version of UCDs that accurately maps to the DM; the current '_' separated textual names will have limited extensibility (even with the additional modifiers agreed at this meeting).

Unique Column Identification

Given that we cannot use UCDs as unique column identifiers, how do we do this?

It seems that the only possible unique identifier for a column in a table is the resourceID of the table (from the Registry) plus the columnName (for an explanation of resourceID, see the discussion on this in the Registry mailing list: http://www.ivoa.net/forum/registry/0091.htm and related messages).
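In code terms this is simply a composite key. A small sketch, assuming the resourceID is made up of an authorityID and a resourceKey as in the example query further down; the field names and the values in the usage note are my own placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnId:
    """Composite identifier: resourceID (authority + key) plus column name."""

    authority_id: str
    resource_key: str
    column_name: str
```

Because the dataclass is frozen it is hashable, so two references such as `ColumnId("example.authority", "example-key", "DEJ2000")` compare equal when all three parts match, and can be used as dictionary keys or set members.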

So, to summarise the discussion from the plenary session: a query can be sent to a table using UCDs, column names, or a mixture of both. If a UCD is included in a query, the data source can resolve it if only one column has that UCD, or if several columns have it but exactly one of them carries the MAIN modifier. Otherwise the query will fail.
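The resolution rule can be sketched as follows. This is only my reading of it: how the MAIN modifier is actually attached to a UCD is not specified here, so it is modelled as a boolean flag, and the column names other than DEJ2000 are invented.

```python
def resolve_ucd(ucd, columns):
    """Resolve `ucd` to a single column name, or raise LookupError.

    `columns` is a list of (column_name, ucd, is_main) tuples, where
    `is_main` is an assumed stand-in for the MAIN modifier.
    """
    matches = [col for col in columns if col[1] == ucd]
    if len(matches) == 1:
        # Only one column carries this UCD: unambiguous.
        return matches[0][0]
    main = [col for col in matches if col[2]]
    if len(main) == 1:
        # Several columns carry the UCD, but exactly one is marked MAIN.
        return main[0][0]
    raise LookupError(f"cannot resolve {ucd}: {len(matches)} candidate columns")


# Hypothetical table description; RAJ2000 and DEB1950 are made-up names.
COLUMNS = [
    ("RAJ2000", "POS_EQ_RA", False),
    ("DEJ2000", "POS_EQ_DEC", True),
    ("DEB1950", "POS_EQ_DEC", False),
]
```

With this table, POS_EQ_RA resolves to RAJ2000 (the only candidate) and POS_EQ_DEC resolves to DEJ2000 (the MAIN one); a UCD with no match, or with several matches and no single MAIN, raises an error, which corresponds to the query failing.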

Example query

A possible query structure (using xml-ised SQL) would be:

<query>
  <from>
    <resourceID asName="cat1">
      <authorityID>...</authorityID>
      <resourceKey>...</resourceKey>
    </resourceID>
    <resourceID asName="cat2">
      <authorityID>...</authorityID>
      <resourceKey>...</resourceKey>
    </resourceID>
  </from>
  <select>
    <field asName="pos-ra" ucd="POS_EQ_RA" />
    <field ucd="POS_EQ_DEC">
      <useColumn columnName="DEJ2000" inResource="cat2" />
    </field>
    <field ucd="..." />
  </select>
  <where> ... </where>
</query>

The asName attribute makes it possible to refer to an item later in the query structure. The ucd attribute is self-explanatory. The key aspect of the above query is the inclusion of the <useColumn ...> tag within the field tag, which allows a column to be identified where the UCD is not unique.
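To show how a data source might read such a query, here is a sketch using Python's standard xml.etree.ElementTree. The query is a cut-down version of the structure above; the authorityID and resourceKey values are placeholders of mine, since the original leaves them unspecified.

```python
import xml.etree.ElementTree as ET

# Cut-down query; "example.authority" and "example-key" are placeholder values.
QUERY = """
<query>
  <from>
    <resourceID asName="cat2">
      <authorityID>example.authority</authorityID>
      <resourceKey>example-key</resourceKey>
    </resourceID>
  </from>
  <select>
    <field asName="pos-ra" ucd="POS_EQ_RA" />
    <field ucd="POS_EQ_DEC">
      <useColumn columnName="DEJ2000" inResource="cat2" />
    </field>
  </select>
</query>
"""


def selected_fields(xml_text):
    """Return (ucd, pinned) per <field>; pinned is (resource, column) or None."""
    root = ET.fromstring(xml_text.strip())
    fields = []
    for field in root.findall("./select/field"):
        use = field.find("useColumn")
        pinned = None
        if use is not None:
            # The query names an explicit column, so no UCD resolution is needed.
            pinned = (use.get("inResource"), use.get("columnName"))
        fields.append((field.get("ucd"), pinned))
    return fields
```

Calling `selected_fields(QUERY)` gives one entry per field: POS_EQ_RA with no pinned column (left to the data source to resolve), and POS_EQ_DEC pinned to column DEJ2000 in cat2.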

Conclusion

I hope people will provide feedback on these comments on the mailing list. I reiterate that they are only my understanding of what was said and my view of the implications.

-- TonyLinde - 16 May 2003

Topic revision: r11 - 2003-05-16 - TonyLinde
 