VO data types

A review of the data types defined in the VO specifications.

Specifically, it looks at the relationships between types, attributes and columns with similar names in different standards and how they relate to each other.

VODataService

The VODataService specification defines an XML schema for describing data collections and the services that access them.

This review refers to version 1.1 (20101202) of the specification.

The data types defined in VODataService are intended to be used to describe the data in VO data sets and the services and protocols used to access them.

DataType element

The DataType XML element is defined in section 3.5 (Data Parameters) of the VODataService specification.

DataType defines the following attributes:

DataType arraysize

The DataType arraysize attribute is defined in section 3.5 (Data Parameters) of the VODataService specification.

The specification text describes the arraysize attribute as follows:

  • "The arraysize attribute indicates the parameter is an array of values of the named type."
  • "Its value describes the shape of the array, and the delim attribute may be used to indicate the delimiter that should appear between elements of an array value."
  • "The attribute's presence indicates that parameter holds an array values; the attribute's value indicates the length of the array along each dimension of the multi-dimensional array."

VODataService ArrayShape

The text of the VODataService specification describes the syntax for the arraysize attribute value as follows:

  • "the VOTable arraysize format (vs:ArrayShape): LxMxN..., where each x-delimited positive integer is a length along a dimension of a multi-dimensional array. A single integer indicates a one dimensional array. Instead of an integer, the last length can be set to "*" which indicates a variable length."

Note - The reference to "the VOTable arraysize format (vs:ArrayShape)" should probably be "the vs:ArrayShape format".

Beyond this description, the text of the VODataService specification does not formally define the ArrayShape string syntax.

The VODataService XML schema defines the ArrayShape string syntax as follows:

    <!--
      -  this definition is taken from the VOTable arrayDEF type
      -->
    <xs:simpleType  name="ArrayShape">
      <xs:annotation>
        <xs:documentation>
          An expression of a the shape of a multi-dimensional array
          of the form LxNxM... where each value between gives the
          integer length of the array along a dimension.  An
          asterisk (*) as the last dimension of the shape indicates 
          that the length of the last axis is variable or
          undetermined. 
        </xs:documentation>
      </xs:annotation>

      <xs:restriction base="xs:token">
        <xs:pattern  value="([0-9]+x)*[0-9]*[*]?"/>
      </xs:restriction>
    </xs:simpleType>

As the comment in the XML schema suggests, the ArrayShape string syntax defined in the VODataService schema is similar to, but not explicitly linked to, the arrayDEF string format defined in the VOTable specification.
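As a rough illustration, the ArrayShape pattern can be tried against a few candidate values. The sketch below uses Python's re.fullmatch to stand in for the whole-value matching of XML Schema patterns; the helper name is_array_shape and the sample values are illustrative assumptions, not part of the specification.

```python
import re

# Pattern copied from the vs:ArrayShape definition in the VODataService schema.
ARRAY_SHAPE = re.compile(r"([0-9]+x)*[0-9]*[*]?")


def is_array_shape(value):
    """Return True if the whole value matches the ArrayShape pattern."""
    # XML Schema patterns apply to the entire value, so fullmatch is used.
    return ARRAY_SHAPE.fullmatch(value) is not None


# Hypothetical sample values, chosen to probe the edges of the pattern.
for candidate in ["3", "3x4", "3x4x*", "*", "", "3x", "x3", "3 x 4"]:
    print(repr(candidate), is_array_shape(candidate))
```

Taken literally, the pattern also accepts the empty string and a value with a trailing x (for example "3x"), neither of which is mentioned in the prose description.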

The ArrayShape string syntax is used in several places in the VODataService XML schema to define the content of arraysize attributes on elements derived from DataType, including VOTableType and TAPType.

The ArrayShape string syntax is not used in any of the other VO specifications.

DataType delim

The DataType delim attribute is defined in section 3.5 (Data Parameters) of the VODataService specification.

The specification text describes the delim attribute as follows:

  • "the string that is used to delimit element of an array value when arraysize is not "1""

The specification text does not define a default value for the delim attribute.

The specification text encourages applications to allow optional spaces before and after the delimiter (e.g. "1, 5" when delim=",").

The XML schema defines a default value as a single white space " ".

    <xs:attribute name="delim" type="xs:string" default=" ">

The comments in the XML schema encourage applications to allow optional spaces before and after the delimiter (e.g. "1, 5" when delim=","), but that is not encoded in the XML schema itself.

The delim attribute is not referred to by any of the other VO specifications.

So far, the examples we have found in the other VO specifications all use white space as the delimiter:

  • The VOTable TABLEDATA serialization for arrays of numeric values explicitly uses white space as the delimiter.
  • The VOTable TABLEDATA serialization for floatComplex and doubleComplex explicitly uses white space as the delimiter.

  • POINT
  • POLYGON

DataType extendedType

The DataType extendedType attribute is defined in section 3.5 (Data Parameters) of the VODataService specification.

The specification text describes the extendedType attribute as follows:

  • "The data value represented by this type can be interpreted as of a custom type identified by the value of this attribute. "
  • "The name implies a particular expected format for the data value that can be parsed into a value in memory."
  • " If an application does not recognize this extendedType, it should attempt to handle value assuming the type given by the element's value. "string" (or its equivalent) is a recommended default type."
  • " This element may make use of the extendedSchema attribute and/or any arbitrary (qualified) attribute to refine the identification of the type. "

Looking at the body of standards as a whole, we assume that the extendedType attribute is functionally equivalent to the xtype attribute defined in the something specification.

However, as far as we can tell, this is not explicitly stated anywhere, and there is no mapping defined between the (extendedType | extendedSchema) attribute pair defined in VODataService and the (xtype with a prefix) attribute defined in the something specification.

The VODataService specification does not provide an example of how the extendedType attribute could be used.

The extendedType attribute is not referred to in any of the other VO specifications.

DataType extendedSchema

The DataType extendedSchema attribute is defined in section 3.5 (Data Parameters) of the VODataService specification.

The specification text describes the extendedSchema attribute as follows:

  • "An identifier for the schema that the value given by the extended attribute is drawn from."

The VODataService specification does not provide an example of how the extendedSchema attribute could be used.

The extendedSchema attribute is not referred to in any of the other VO specifications.

TableDataType element

The TableDataType XML element is defined in section 3.5.3 (Table Column Data Types) of the VODataService specification.

TableDataType extends DataType.

The comment in the XML schema describes TableDataType as:

  • "an abstract parent for a class of data types that can be used to specify the data type of a table column."

VOTableType element

The VOTableType XML element is defined in section 3.5.3 (Table Column Data Types) of the VODataService specification.

VOTableType inherits the following attributes from DataType:

  • arraysize
  • delim
  • extendedType
  • extendedSchema

VOTableType defines the following set of allowed values:

  • boolean
  • bit
  • unsignedByte
  • short
  • int
  • long
  • char
  • unicodeChar
  • float
  • double
  • floatComplex
  • doubleComplex

The specification text describes VOTableType as follows:

  • "data types that correspond to the parameter and column types defined in the VOTable schema"

The XML schema comments describe VOTableType as follows:

  • "a data type supported explicitly by the VOTable format".

The definition of VOTableType does not provide any further details about the sizes, ranges or content of the data types. It is left to the reader to refer to the VOTable specification for details about the data types.

Note - the bibliography reference to the VOTable specification explicitly refers to version 1.2 (20091130) of the specification, which has since been superseded by version 1.3 (20130920).

The definition of VOTableType states that string values of arbitrary length are represented by a data type of char with arraysize="*".

In order to support strings with Unicode characters, it may be clearer to explicitly state that ASCII strings should be represented by a data type of char with arraysize="*" and Unicode strings should be represented by a data type of unicodeChar with arraysize="*".
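A minimal sketch of that suggestion, assuming the choice is made per string value; the function name votable_string_type is hypothetical and the rule itself is this review's proposal, not text from the specification.

```python
def votable_string_type(text):
    """Pick a (datatype, arraysize) pair for a variable-length string."""
    # str.isascii() is True when every character is in the ASCII range
    # (and also for the empty string).
    datatype = "char" if text.isascii() else "unicodeChar"
    return datatype, "*"


print(votable_string_type("Orange"))
print(votable_string_type("\u042f"))  # Cyrillic uppercase Ya
```

In practice the decision would more likely be made once per column than per value, since the data type is declared in the column metadata.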

TAPDataType

The specification text does not describe the TAPDataType element directly.

The XML schema comments describe TAPDataType as follows:

  • "an abstract parent for the specific data types supported by the Table Access Protocol"

The TAPDataType element defines the following attributes:

  • size

Note - the TAPDataType element name reflects the historical situation where the data types were originally defined in the TAP specification. The data type definitions have since been moved to the ADQL specification, but for backward compatibility the XML element name has not been changed.

TAPType

The TAPType XML element is defined in section 3.5.3 (Table Column Data Types) of the VODataService specification.

TAPType inherits the following attributes from DataType:

  • arraysize
  • delim
  • extendedType
  • extendedSchema

TAPType inherits the following attributes from TAPDataType:

  • size

TAPType defines the following set of allowed values:

  • BOOLEAN
  • SMALLINT
  • INTEGER
  • BIGINT
  • REAL
  • DOUBLE
  • TIMESTAMP
  • CHAR
  • VARCHAR
  • BINARY
  • VARBINARY
  • POINT
  • REGION
  • CLOB
  • BLOB

The specification text describes TAPType as follows:

  • "data types that correspond column types defined in the Table Access Protocol (v1.0) [TAP]"

The explicit reference to version 1.0 of the TAP specification is no longer valid.

The TAPType element name reflects the historical situation where the data types were originally defined in the TAP specification. The data type definitions have since been moved to the ADQL specification, but for compatibility reasons, the XML element name has not been changed.

The definition of TAPType does not provide any further details about the sizes, ranges or content of the data types. It is left to the reader to refer to the TAP (now ADQL) specification for details about the data types.

The text at the end of the section refers to a mapping between TAP_SCHEMA types and VOTable types in the TAP specification.

  • "Note that the TAP standard [TAP] defines an explicit mapping between TAP_SCHEMA types and VOTable types."

This mapping is no longer part of the TAP specification.

The definition of TAPType states that string values should be represented by a data type of VARCHAR, but it does not say whether this should be accompanied by a size or arraysize attribute.

TAPType size

The size attribute is described as an attribute of the TAPType element in section 3.5.3 (Table Column Data Types) of the VODataService specification.

However, technically, in the XML schema size is an attribute of the abstract TAPDataType parent element, which is then inherited by TAPType.

The VODataService specification describes the size attribute as follows:

  • "The length of the variable-length data type."
  • "In the context of TAP, this attribute is only meaning when the data type is CHAR or BINARY; see discussion below."

This restriction seems to imply that CHAR and BINARY values have an inherent 'size' property, and are not treated as arrays of values, which have a different 'arraysize' property.

In the discussion that follows, the VODataService specification gives two examples which are equivalent:

 
    <dataType xsi:type="vs:VOTableType" arraysize="*"> char </dataType>
and
 
    <dataType xsi:type="vs:TAPType"> VARCHAR </dataType>

A third example describes a fixed-length string, using the size attribute rather than the arraysize attribute:

 
    <dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>

However, the VODataService specification does not explicitly explain the difference (if any) between:

 
    <dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>
and
 
    <dataType xsi:type="vs:TAPType" arraysize="8" > CHAR </dataType>

This distinction between CHAR, VARCHAR and BINARY values with a 'size' property and arrays of numeric values with an 'arraysize' property is possibly left over from previous versions of the VO specifications.
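To make the two attributes easier to compare side by side, the dataType examples above can be read with a generic XML parser. The sketch below is only an illustration: the wrapper element column and the function name describe are assumptions, added so that the xsi prefix is declared and the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSI_TYPE = "{" + XSI + "}type"


def describe(fragment):
    """Return the type information carried by one dataType fragment."""
    # Wrap the fragment so the xsi prefix is declared (wrapper is hypothetical).
    root = ET.fromstring('<column xmlns:xsi="' + XSI + '">' + fragment + "</column>")
    data_type = root.find("dataType")
    return {
        "kind": data_type.get(XSI_TYPE),
        "name": data_type.text.strip(),
        "size": data_type.get("size"),
        "arraysize": data_type.get("arraysize"),
    }


print(describe('<dataType xsi:type="vs:VOTableType" arraysize="*"> char </dataType>'))
print(describe('<dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>'))
print(describe('<dataType xsi:type="vs:TAPType" arraysize="8" > CHAR </dataType>'))
```

A consumer that wants a single "length" for a string column has to decide for itself how to reconcile the size and arraysize values, because the specification does not say how they relate.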

The documentation element in the XML schema for TAPDataType describes the size attribute as follows:

  • "This corresponds to the size Column attribute in the TAP_SCHEMA and can be used with data types that are defined with a length (CHAR, BINARY)."

This establishes a reference link from VODataService TAPDataType to TAP_SCHEMA.columns in the TAP specification.

In the TAP specification the corresponding size column is described as:

  • "retained for backwards compatibility to TAP-1.0"

The original text in version 1.0 of the TAP specification describes the size column as follows:

  • "The “size” gives the length of variable length datatypes, for example varchar(256);"

Neither version of the TAP specification contains a reference from the size column back to TAPDataType in the VODataService specification.

The size attribute is not referred to by any of the other VO specifications.

VOTable

The VOTable specification defines an XML based serialization format for exchanging tabular data within the VO.

VOTableTypes

Section 2.1 (Primitives) of the VOTable specification defines the following data types and their corresponding FITS data type and size in bytes:

| datatype      | Meaning           | FITS | Bytes |
| boolean       | Logical           | L    | 1     |
| bit           | Bit               | X    | *     |
| unsignedByte  | Byte (0 to 255)   | B    | 1     |
| short         | Short Integer     | I    | 2     |
| int           | Integer           | J    | 4     |
| long          | Long integer      | K    | 8     |
| char          | ASCII Character   | A    | 1     |
| unicodeChar   | Unicode Character |      | 2     |
| float         | Floating point    | E    | 4     |
| double        | Double            | D    | 8     |
| floatComplex  | Float Complex     | C    | 8     |
| doubleComplex | Double Complex    | M    | 16    |

Section 6 (Definitions of Primitive Datatypes) of the VOTable specification describes the representation of these primitives in the BINARY, BINARY2 and TABLEDATA serializations.
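As a worked example of how the byte sizes in the Primitives table combine with an arraysize value, the sketch below estimates the length of a fixed-size array cell. It assumes the elements are simply packed back to back; the function name fixed_array_bytes is hypothetical, and bit is left out because its size is listed as "*".

```python
from math import prod

# Sizes in bytes, taken from the Primitives table (bit omitted, listed as "*").
PRIMITIVE_BYTES = {
    "boolean": 1, "unsignedByte": 1, "short": 2, "int": 4, "long": 8,
    "char": 1, "unicodeChar": 2, "float": 4, "double": 8,
    "floatComplex": 8, "doubleComplex": 16,
}


def fixed_array_bytes(datatype, arraysize):
    """Return the byte length of a fixed-size array, or None if variable."""
    if arraysize.endswith("*"):
        # A trailing asterisk marks a variable-length last dimension.
        return None
    dimensions = [int(length) for length in arraysize.split("x")]
    return prod(dimensions) * PRIMITIVE_BYTES[datatype]


print(fixed_array_bytes("float", "3"))
print(fixed_array_bytes("doubleComplex", "2x3"))
print(fixed_array_bytes("int", "*"))
```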

VOTable boolean

VOTable bit

VOTable unsignedByte

VOTable short

VOTable int

VOTable long

VOTable char

VOTable float

VOTable double

VOTable unicodeChar

The description for the BINARY serialization of unicodeChar defines it as a Unicode (UCS-2) fixed width 2-byte character.

  • "Each Unicode character is represented in the BINARY/BINARY2 serialization by two bytes, using the big-endian UCS-2 encoding (ISO-10646-UCS-2)"

The UCS-2 character set includes all of the characters in the Basic Multilingual Plane (BMP), which contains characters for almost all modern languages.

The description for the TABLEDATA serialization includes an example showing how a unicodeChar that is outside the ASCII character set can be represented in an XML document by using a numeric character reference (NCR).

  • "The representation of a Unicode character in the TABLEDATA serialization follows the XML specifications, and e.g. the Cyrillic uppercase ``Ya'' can be written &#x042F; in UTF-8."

The reference to UTF-8 in the description of the TABLEDATA serialization may be misleading, because a UTF-8 XML document can contain the multi-byte Cyrillic uppercase ``Ya'' character, Я, shown in the example as-is, without needing to use a numeric character reference.

Declaring a UTF-8 encoding for a VOTable XML document containing TABLEDATA data may also be problematic:

    <?xml version="1.0" encoding="utf-8"?>

This would allow the XML document to contain characters that are beyond the range of the UCS-2 fixed-width character set.

Note - since 2005 it is no longer possible to encode all of the mandatory components defined in the official character set of the People's Republic of China, GB 18030-2005 (https://en.wikipedia.org/wiki/GB_18030#As_a_national_standard), in a fixed-width 2-byte character set. In addition, as of May 1, 2006, support for the GB 18030-2005 character set is officially required for all software products sold in the PRC.
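The mismatch between what an XML document can carry and what a 2-byte unicodeChar can hold can be shown with a short sketch. Python has no dedicated UCS-2 codec, so utf-16-be is used here as a stand-in (the two agree for characters inside the BMP); the function name fits_in_ucs2 and the sample characters are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def fits_in_ucs2(text):
    """Return True if every character lies in the Basic Multilingual Plane."""
    return all(ord(character) <= 0xFFFF for character in text)


ya = "\u042f"               # Cyrillic uppercase Ya, inside the BMP
outside_bmp = "\U00020000"  # a CJK ideograph outside the BMP

# An XML parser resolves a numeric character reference to the same character.
parsed = ET.fromstring("<TD>&#x042F;</TD>").text

print(parsed == ya)
print(len(ya.encode("utf-16-be")), fits_in_ucs2(ya))
print(len(outside_bmp.encode("utf-16-be")), fits_in_ucs2(outside_bmp))
```

The second character needs four bytes (a surrogate pair) in UTF-16, so it cannot be stored in a single fixed-width 2-byte unicodeChar element even though it is legal in the XML document.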

VOTable floatComplex

The description for the BINARY serialization of floatComplex defines it as a pair of 32-bit, single precision, floating point numbers.

  • "a sequence of pairs of 32-bit single precision floating point numbers in big-endian order"

The description for the TABLEDATA serialization of floatComplex defines it as a pair of floating point numbers separated by white space.

  • "two representations of a Single Precision Floating Point numbers separated by whitespace, representing the real and imaginary part respectively"

Note that this effectively fixes the delimiter for the TABLEDATA serialization to white space, regardless of the delim attribute set by the VODataService description of the source data table.

VOTable doubleComplex

The description for the BINARY serialization of doubleComplex defines it as a pair of 64-bit, double precision, floating point numbers.

  • "a sequence of pairs of 64-bit double precision floating point numbers in big-endian order"

The description for the TABLEDATA serialization of doubleComplex defines it as a pair of floating point numbers separated by white space.

  • "two representations of a Double Precision Floating Point numbers separated by whitespace, representing the real and imaginary part respectively"

Note that this effectively fixes the delimiter for the TABLEDATA serialization to white space, regardless of the delim attribute set by the VODataService description of the source data table.
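The two serializations of a complex value described above could be decoded roughly as follows. This is a sketch for a single value only; the function names are hypothetical, and the struct format strings ">ff" and ">dd" are this review's reading of "pairs of big-endian floating point numbers".

```python
import struct


def parse_tabledata_complex(text):
    """Decode one complex value from TABLEDATA text (real, then imaginary)."""
    real, imaginary = (float(token) for token in text.split())
    return complex(real, imaginary)


def unpack_float_complex(data):
    """Decode one floatComplex from 8 bytes of big-endian binary data."""
    real, imaginary = struct.unpack(">ff", data)
    return complex(real, imaginary)


def unpack_double_complex(data):
    """Decode one doubleComplex from 16 bytes of big-endian binary data."""
    real, imaginary = struct.unpack(">dd", data)
    return complex(real, imaginary)


print(parse_tabledata_complex("1.5 -2.0"))
print(unpack_float_complex(struct.pack(">ff", 1.5, -2.0)))
print(unpack_double_complex(struct.pack(">dd", 1.5, -2.0)))
```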

VOTableArrays

The VOTable specification and schema include an arraysize attribute, but not a delim attribute.

Section 2.2 of the VOTable specification uses a number of examples to show how a combination of datatype and arraysize attributes can be used to describe arrays of values in the metadata for a FIELD.

Section 5.1 of the VOTable specification describes the TABLEDATA serialization of arrays as follows:

  • "If a cell contains an array of numbers or a complex number, it should be encoded as multiple numbers separated by whitespace. However in the case of character and Unicode strings (declared in the corresponding FIELD as an array of char or unicodeChar datatype), no separator should exist."

It uses the following example to illustrate the difference between arrays of numbers and arrays of characters:

    <TABLE>
      <FIELD name="aString" datatype="char" arraysize="10"/>
      <FIELD name="aShort"  datatype="short"/>
      <FIELD name="varInts" datatype="int"  arraysize="*"/>
      <FIELD name="Floats"  datatype="float" arraysize="3"/>
      <DATA><TABLEDATA>
        <TR> <TD>Apple</TD>  <TD/>       <TD>1 2 4 8 16</TD> <TD>1.62 4.56 3.44</TD> </TR>
        <TR> <TD>Orange</TD> <TD>15</TD> <TD>23 -11 9</TD>   <TD>2.33 4.66 9.53</TD> </TR>
      </TABLEDATA></DATA>
    </TABLE>
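The rule quoted above (white space between numbers, no separator inside strings) can be sketched as a small decoder for the example table. The fragment is reproduced without an XML namespace and with a space between the datatype and arraysize attributes of the last FIELD so that it parses; the names decode_cell and decode_table are hypothetical, and only the data types used in the example are handled.

```python
import xml.etree.ElementTree as ET

# Example table; the missing XML namespace is an assumption of this sketch.
EXAMPLE = """
<TABLE>
  <FIELD name="aString" datatype="char" arraysize="10"/>
  <FIELD name="aShort"  datatype="short"/>
  <FIELD name="varInts" datatype="int"  arraysize="*"/>
  <FIELD name="Floats"  datatype="float" arraysize="3"/>
  <DATA><TABLEDATA>
    <TR> <TD>Apple</TD>  <TD/>       <TD>1 2 4 8 16</TD> <TD>1.62 4.56 3.44</TD> </TR>
    <TR> <TD>Orange</TD> <TD>15</TD> <TD>23 -11 9</TD>   <TD>2.33 4.66 9.53</TD> </TR>
  </TABLEDATA></DATA>
</TABLE>
"""

CONVERTERS = {"short": int, "int": int, "float": float}


def decode_cell(text, datatype, arraysize):
    """Decode one TD value according to its FIELD metadata."""
    if datatype in ("char", "unicodeChar"):
        # Character arrays are strings: no separator between elements.
        return text or ""
    if text is None or not text.strip():
        return None  # empty cell
    convert = CONVERTERS[datatype]
    if arraysize is None:
        return convert(text)
    # Numeric arrays: elements are separated by white space.
    return [convert(token) for token in text.split()]


def decode_table(xml_text):
    """Return the rows of a TABLEDATA table as lists of decoded values."""
    table = ET.fromstring(xml_text)
    fields = [(f.get("datatype"), f.get("arraysize")) for f in table.findall("FIELD")]
    return [
        [decode_cell(td.text, dt, size) for td, (dt, size) in zip(tr.findall("TD"), fields)]
        for tr in table.iter("TR")
    ]


for row in decode_table(EXAMPLE):
    print(row)
```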

VOTable arraysize

The text of the VOTable specification does not explicitly define the arraysize attribute.

The text of the VOTable specification does not link the VOTable arraysize attribute with the DataType arraysize attribute defined in the VODataService specification.

VOTable arrayDEF

The text of the VOTable specification does not explicitly define the format of the arraysize attribute value.

The VOTable XML schema defines the arrayDEF string syntax as follows:

    <xs:simpleType  name="arrayDEF">
      <xs:restriction base="xs:token">
        <xs:pattern  value="([0-9]+x)*[0-9]*[*]?(s\W)?"/>
      </xs:restriction>
    </xs:simpleType>

However, the arrayDEF string syntax is not used in the definition of the arraysize attribute:

    <xs:complexType name="Field">
      ....
      <xs:attribute name="arraysize" type="xs:string"/>
      ....
    </xs:complexType>

The only reference to the VOTable arrayDEF string syntax in the other VO specifications is a comment in the definition of the ArrayShape in the VODataService schema.

The text of the VOTable specification does not link the VOTable arrayDEF string syntax with the ArrayShape string syntax defined in the VODataService schema.

The arrayDEF string syntax is not used anywhere in the VOTable XML schema.

The arrayDEF string syntax is not used in any of the other VO specifications.
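To see where the two patterns differ, both can be applied to the same values. The sketch below uses Python's re module as a stand-in for XML Schema pattern matching; the \W class is not defined identically in the two regex dialects, so the comparison should only be trusted for simple ASCII values like these. The function name compare and the sample values are illustrative assumptions.

```python
import re

# Patterns copied from the two schemas.
ARRAY_SHAPE = re.compile(r"([0-9]+x)*[0-9]*[*]?")        # VODataService ArrayShape
ARRAY_DEF = re.compile(r"([0-9]+x)*[0-9]*[*]?(s\W)?")    # VOTable arrayDEF


def compare(value):
    """Return (matches ArrayShape, matches arrayDEF) for one value."""
    return (
        ARRAY_SHAPE.fullmatch(value) is not None,
        ARRAY_DEF.fullmatch(value) is not None,
    )


for candidate in ["3x4*", "10", "10s,", "10s"]:
    print(repr(candidate), compare(candidate))
```

The only difference is the optional trailing (s\W) group: arrayDEF accepts everything ArrayShape accepts, plus values that end in an "s" followed by one non-word character.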

DALI

The DALI specification defines ...

TAP

The TAP specification defines ...

ADQL

The ADQL specification defines ...

xtype

The xtype attribute is defined in ...

The xtype attribute is referred to in ...

TAP_SCHEMA

The TAP_SCHEMA tables are defined in ...

The TAP_SCHEMA tables are referred to in ...

Topic revision: r7 - 2017-08-09 - DaveMorris
 