VO data types

A review of the data types defined in the VO specifications.

This review looks specifically at the relationships between types, attributes and columns with similar names in different standards, and at how they relate to each other.

VODataService

The VODataService specification defines an XML schema for describing data collections and the services that access them.

This review refers to version 1.1 (20101202) of the specification.

The data types defined in VODataService are intended to be used to describe the data in VO data sets and the services and protocols used to access them.

DataType

DataType is an XML element defined in the VODataService specification that is used as a common base class for defining data types.

DataType defines the following attributes:

arraysize

The DataType arraysize attribute is defined in section 3.5 (Data Parameters) of the VODataService specification.

The specification text describes the arraysize attribute as follows:

  • "The arraysize attribute indicates the parameter is an array of values of the named type."
  • "the VOTable arraysize format (vs:ArrayShape): LxMxN..., where each x-delimited positive integer is a length along a dimension of a multi-dimensional array. A single integer indicates a one dimensional array. Instead of an integer, the last length can be set to "*" which indicates a variable length."
  • "The attribute's presence indicates that parameter holds an array values; the attribute's value indicates the length of the array along each dimension of the multi-dimensional array."
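
The format quoted above can be sketched as a small parser. This is an illustrative sketch only: parse_arraysize is a hypothetical helper, not part of any VO library, and it accepts only the form described in the quoted text (x-delimited positive integers, with "*" allowed as the last length).

```python
import re
from typing import List, Tuple

# x-delimited positive integers; the last length may be "*" (variable length).
_ARRAYSIZE = re.compile(r"^(?:[1-9][0-9]*x)*(?:[1-9][0-9]*|\*)$")


def parse_arraysize(value: str) -> Tuple[List[int], bool]:
    """Return (fixed dimension lengths, variable-length flag).

    Hypothetical helper written for this review; it is an assumption
    about how the quoted description could be implemented.
    """
    text = value.strip()
    if not _ARRAYSIZE.match(text):
        raise ValueError(f"not a valid arraysize: {value!r}")
    parts = text.split("x")
    variable = parts[-1] == "*"
    if variable:
        # Drop the "*" so that only the fixed lengths remain.
        parts = parts[:-1]
    return [int(p) for p in parts], variable
```

With this sketch, "3" is a one dimensional array of length 3, and "2x3x*" has two fixed dimensions followed by a variable one.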

delim

The DataType delim attribute is defined in section 3.5 (Data Parameters) of the VODataService specification.

The specification text describes the delim attribute as follows:

  • "the string that is used to delimit element of an array value when arraysize is not "1""

The specification text does not define a default value for the delim attribute.

The specification text encourages applications to allow optional spaces before and after the delimiter (e.g. "1, 5" when delim=",").

The XML schema defines a default value as a single white space " ".

    <xs:attribute name="delim" type="xs:string" default=" ">

The comments in the XML schema also encourage applications to allow optional spaces before and after the delimiter (e.g. "1, 5" when delim=",").

The XML schema itself does not attempt to encode that in XML schema notation.
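
Because the leniency is only described in prose, each application has to implement it. A minimal sketch of one possible reading follows. split_array_value is a hypothetical helper, and treating a run of white space as a single separator is an assumption rather than something the specification states.

```python
from typing import List


def split_array_value(text: str, delim: str = " ") -> List[str]:
    """Split a serialized array value into its elements.

    Hypothetical helper. The default delimiter of a single space
    mirrors the default declared in the XML schema.
    """
    if delim.strip() == "":
        # White space delimiter: assume any run of white space
        # counts as one separator.
        return text.split()
    # Other delimiters: split on the delimiter, then drop the
    # optional spaces before and after each element.
    return [item.strip() for item in text.split(delim)]
```

For example, split_array_value("1, 5", delim=",") gives the two elements "1" and "5".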

All of the examples we have found in the VO specifications use white space as the delimiter:

  • VOTable uses space as the delimiter for arrays of numeric values.
  • POINT
  • POLYGON

The delim attribute is not referred to by any of the other VO specifications.

extendedType

The DataType extendedType attribute is defined in section 3.5 (Data Parameters) of the VODataService specification.

The specification text describes the extendedType attribute as follows:

  • "The data value represented by this type can be interpreted as of a custom type identified by the value of this attribute. "
  • "The name implies a particular expected format for the data value that can be parsed into a value in memory."
  • " If an application does not recognize this extendedType, it should attempt to handle value assuming the type given by the element's value. "string" (or its equivalent) is a recommended default type."
  • " This element may make use of the extendedSchema attribute and/or any arbitrary (qualified) attribute to refine the identification of the type. "

Looking at the body of standards as a whole, we assume that the extendedType attribute is functionally equivalent to the xtype attribute defined in the something specification.

However, as far as we can tell, this is not explicitly stated anywhere, and there is no mapping defined between the extendedType | extendedSchema attribute pair defined in VODataService and the xtype attribute with a prefix defined in the something specification.

The VODataService specification does not provide an example of how the extendedType attribute could be used.

The extendedType attribute is not referred to by any of the other VO specifications.

extendedSchema

The DataType extendedSchema attribute is defined in section 3.5 (Data Parameters) of the VODataService specification.

The specification text describes the extendedSchema attribute as follows:

  • "An identifier for the schema that the value given by the extended attribute is drawn from."

The VODataService specification does not provide an example of how the extendedSchema attribute could be used.

The extendedSchema attribute is not referred to by any of the other VO specifications.

TableDataType

TableDataType is an XML element defined in the VODataService specification that is used as a common base class for defining data types for table columns.

TableDataType extends DataType.

VOTableType

VOTableType is an XML element defined in the VODataService specification that describes data types defined in the VOTable specification.

VOTableType inherits the following attributes from DataType:

  • arraysize
  • delim
  • extendedType
  • extendedSchema

VOTableType defines the following set of allowed values:

  • boolean
  • bit
  • unsignedByte
  • short
  • int
  • long
  • char
  • unicodeChar
  • float
  • double
  • floatComplex
  • doubleComplex

Notes:

VOTableType is described in section 3.5.3 of the specification.

In the VODataService specification, VOTableType is described as "data types that correspond to the parameter and column types defined in the VOTable schema"

In the VODataService XML schema, VOTableType is described as "a data type supported explicitly by the VOTable format".

The definition of VOTableType does not provide any further details about the sizes, ranges or content of the data types. It is left to the reader to refer to the VOTable specification for details about the data types.

In the bibliography, the reference to the VOTable specification explicitly refers to version 1.2 (20091130) of the specification. This has since been superseded by version 1.3 (20130920).

The definition of VOTableType states that string values of arbitrary length are represented by a data type of char with arraysize="*". This excludes the option of using unicodeChar as the data type with arraysize="*". It may be clearer to state explicitly that ASCII strings are represented by char with arraysize="*" and Unicode strings are represented by unicodeChar with arraysize="*".

TAPDataType

TAPDataType is an XML element defined in the VODataService specification that describes a base class for data types defined in the TAP ADQL specification.

TAPDataType defines the following attributes:

Notes:

The XML element name reflects the historical situation where the data types were originally defined in the TAP specification. The data type definitions have since been moved to the ADQL specification, but for compatibility reasons, the XML element name has not been changed.

size

The size attribute is defined as an attribute of the TAPDataType XML element in the VODataService specification.

The VODataService specification describes the size attribute as follows:

  • "The length of the variable-length data type."
  • "In the context of TAP, this attribute is only meaning when the data type is CHAR or BINARY; see discussion below."

This restriction seems to imply that CHAR and BINARY values have an inherent 'size' property, and are not treated as arrays of values, which have a different 'arraysize' property.

In the discussion that follows, the VODataService specification gives two examples which are equivalent:

 
    <dataType xsi:type="vs:VOTableType" arraysize="*"> char </dataType>
and
 
    <dataType xsi:type="vs:TAPType"> VARCHAR </dataType>
and a third example that describes a fixed length string, using the size rather than the arraysize attribute:
 
    <dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>

However, the VODataService specification does not explicitly explain the difference (if any) between:

 
    <dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>
and
 
    <dataType xsi:type="vs:TAPType" arraysize="8" > CHAR </dataType>

This distinction between CHAR, VARCHAR and BINARY values with a 'size' property, and arrays of numeric values with an 'arraysize' property, is possibly left over from previous versions of the VO specifications.
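
To make the ambiguity concrete, the sketch below interprets only the three declarations that the specification gives as examples, and refuses everything else. string_shape and StringShape are hypothetical names introduced for this review. Rejecting CHAR with arraysize is an assumption that reflects the missing explanation, not a rule taken from the specification.

```python
from typing import NamedTuple, Optional


class StringShape(NamedTuple):
    variable: bool          # True when the length is not fixed
    length: Optional[int]   # the fixed length, when one is declared


def string_shape(xsi_type: str, name: str,
                 arraysize: Optional[str] = None,
                 size: Optional[str] = None) -> StringShape:
    """Interpret a character dataType declaration (hypothetical helper)."""
    name = name.strip()
    if xsi_type == "vs:VOTableType" and name == "char" and arraysize == "*":
        return StringShape(True, None)
    if xsi_type == "vs:TAPType" and name == "VARCHAR":
        return StringShape(True, None)
    if (xsi_type == "vs:TAPType" and name == "CHAR"
            and size is not None and arraysize is None):
        return StringShape(False, int(size))
    # Everything else, including CHAR with arraysize, is left undecided
    # because the specification does not say how to read it.
    raise ValueError("declaration not covered by the specification examples")
```

Under this reading the first two examples produce the same result, the size="8" example produces a fixed length of 8, and the arraysize="8" form raises an error.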

The documentation element in the XML schema for TAPDataType describes the size attribute as follows:

  • "This corresponds to the size Column attribute in the TAP_SCHEMA and can be used with data types that are defined with a length (CHAR, BINARY)."

This establishes a 'forward' link from TAPDataType in the VODataService specification to TAP_SCHEMA.columns in the TAP specification.

The TAP_SCHEMA.columns table contains a size column. The text in the current working draft of the TAP specification describes this column as "retained for backwards compatibility to TAP-1.0".

The original text in version 1.0 of the TAP specification describes the size column as follows:

  • "The “size” gives the length of variable length datatypes, for example varchar(256);"

Neither version of the TAP specification contains a 'backward' link between the TAPDataType size attribute and the size column in the TAP_SCHEMA.columns table.

The size attribute is not referred to by any of the other VO specifications.

TAPType

TAPType is an XML element defined in the VODataService specification that describes data types defined in the TAP ADQL specification.

TAPType inherits the following attributes from DataType:

  • arraysize
  • delim
  • extendedType
  • extendedSchema

TAPType inherits the following attributes from TAPDataType:

  • size

TAPType defines the following set of allowed values:

  • BOOLEAN
  • SMALLINT
  • INTEGER
  • BIGINT
  • REAL
  • DOUBLE
  • TIMESTAMP
  • CHAR
  • VARCHAR
  • BINARY
  • VARBINARY
  • POINT
  • REGION
  • CLOB
  • BLOB

Notes:

TAPType is described in section 3.5.3 of the specification.

The XML element name reflects the historical situation where the data types were originally defined in the TAP specification. The data type definitions have since been moved to the ADQL specification, but for compatibility reasons, the XML element name has not been changed.

The definition of TAPType does not provide any further details about the sizes, ranges or content of the data types.

It is left to the reader to refer to the TAP ADQL specification for details about the data types.

The text at the end of section 3.5.3 on Table Column Data Types refers to a mapping between TAP_SCHEMA types and VOTable types in the TAP specification.

"Note that the TAP standard [TAP] defines an explicit mapping between TAP_SCHEMA types and VOTable types."

This mapping is no longer part of the TAP specification.

The definition of TAPType states that string values should be represented by a data type of VARCHAR. The definition does not say whether this should be accompanied by a size or arraysize attribute.

VOTable

The VOTable specification defines a common data exchange format for tabular data within the VO.

VOTableTypes

VOTableArrays

The VOTable specification and XML schema include an arraysize attribute, but they do not include a delim attribute.

Section 2.2 of the VOTable specification describes arrays of values using the arraysize attribute. However, it does not mention anything about a delimiter.

Section 5.1 of the VOTable specification describes the TABLEDATA serialization of arrays as follows:

"If a cell contains an array of numbers or a complex number, it should be encoded as multiple numbers separated by whitespace. However in the case of character and Unicode strings (declared in the corresponding FIELD as an array of char or unicodeChar datatype), no separator should exist."

It uses the following example to illustrate this:

    <TABLE>
      <FIELD name="aString" datatype="char" arraysize="10"/>
      <FIELD name="aShort"  datatype="short"/>
      <FIELD name="varInts" datatype="int"  arraysize="*"/>
      <FIELD name="Floats"  datatype="float" arraysize="3"/>
      <DATA><TABLEDATA>
        <TR> <TD>Apple</TD>  <TD/>       <TD>1 2 4 8 16</TD> <TD>1.62 4.56 3.44</TD> </TR>
        <TR> <TD>Orange</TD> <TD>15</TD> <TD>23 -11 9</TD>   <TD>2.33 4.66 9.53</TD> </TR>
      </TABLEDATA></DATA>
    </TABLE>
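
The rule quoted from section 5.1 can be sketched as a cell decoder. parse_cell is a hypothetical helper that covers only the data types used in the example above plus their close relatives. Returning None for an empty cell such as <TD/> is an assumption made for this sketch.

```python
from typing import List, Optional, Union

# Converters for the numeric types handled by this sketch.
_NUMERIC = {"short": int, "int": int, "long": int,
            "float": float, "double": float}

CellValue = Union[None, str, int, float, List[int], List[float]]


def parse_cell(text: Optional[str], datatype: str,
               arraysize: Optional[str] = None) -> CellValue:
    """Decode the text content of one TD cell (hypothetical helper)."""
    if text is None or text == "":
        # Assumption: an empty cell carries no value.
        return None
    if datatype in ("char", "unicodeChar"):
        # Character arrays are strings with no separator between elements.
        return text
    if datatype not in _NUMERIC:
        raise ValueError(f"datatype not handled by this sketch: {datatype}")
    convert = _NUMERIC[datatype]
    if arraysize is None:
        return convert(text)
    # Numeric arrays: elements are separated by white space.
    return [convert(token) for token in text.split()]
```

Applied to the first row of the example, the varInts cell decodes to a list of five integers, while the aString cell stays a single string.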

DALI

The DALI specification defines ...

TAP

The TAP specification defines ...

ADQL

The ADQL specification defines ...

xtype

The xtype attribute is defined in ...

The xtype attribute is referred to in ...

TAP_SCHEMA

The TAP_SCHEMA tables are defined in ...

The TAP_SCHEMA tables are referred to in ...

Proposed changes

Mark the VODataService size as deprecated and update documentation to reflect this.

Topic revision: r5 - 2017-08-07 - DaveMorris
 