Tables containing large numbers of vectors are becoming common (e.g., the collections of low-resolution spectra within Gaia DR3 or the Digitised Byurakan Surveys). Giving TAP users a toolset for working with arrays on the server side is therefore highly desirable and would considerably extend what ADQL can do in server-side analyses. This page is an attempt to define a baseline feature set for that.

TAP servers supporting these operations should declare that by defining a language feature. As long as no IVOA specification exists for array operations, use the VECTORMATH key from GAVO's ADQL extensions standards record, like this:

    <languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
      <feature>
        <form>VECTORMATH</form>
        <description>You can compute with vectors here. See
          https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
          for an overview of the functions and operators available.
        </description>
      </feature>
    </languageFeatures>
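A client might want to check for this declaration before sending vector queries. The following is a minimal sketch of such a check in Python, run against the fragment above only. The function name `declares_vectormath` is made up for illustration, and the sketch assumes the elements carry no XML namespace, as in the fragment; a real capability document wraps the fragment in further elements.

```python
import xml.etree.ElementTree as ET

# The fragment from the text (description shortened). Real capability
# documents embed this in further elements; that is not modelled here.
FRAGMENT = """
<languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
    <description>You can compute with vectors here.</description>
  </feature>
</languageFeatures>
"""

EXTRA_KEYWORDS = "ivo://org.gavo.dc/std/exts#extra-adql-keywords"


def declares_vectormath(xml_text):
    """Return True if xml_text contains a VECTORMATH feature declaration.

    Hypothetical helper; assumes un-namespaced element names.
    """
    root = ET.fromstring(xml_text)
    # iter() also yields the root element itself, so this works for the
    # bare fragment as well as for an enclosing document.
    for features in root.iter("languageFeatures"):
        if features.get("type") != EXTRA_KEYWORDS:
            continue
        for form in features.iter("form"):
            if (form.text or "").strip() == "VECTORMATH":
                return True
    return False


print(declares_vectormath(FRAGMENT))
```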

To access an element of a vector, write vec**[element-index]**, where vec is the vector and element-index is an integer-valued expression. In keeping with common SQL practices (and regrettably working against most programming languages), indexes in ADQL are 1-based (rather than 0-based). That is, the first element of an array with N elements has the index 1 and the last element has the index N.

Again in keeping with common SQL practices, accessing elements outside of that range gives NULL.
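As a small illustration of these indexing rules, here is a Python model of element access. It is only a sketch of the semantics described above: `arr_get` is a made-up name, and Python's `None` stands in for SQL NULL.

```python
def arr_get(vec, element_index):
    """Model vec[element_index] with 1-based indexing.

    Returns None (standing in for SQL NULL) for any index outside
    1..N, where N is the number of elements of vec.
    """
    if 1 <= element_index <= len(vec):
        # Shift to Python's 0-based indexing.
        return vec[element_index - 1]
    return None


flux = [10.0, 20.0, 30.0]
print(arr_get(flux, 1))   # first element
print(arr_get(flux, 3))   # last element
print(arr_get(flux, 0))   # outside 1..N
print(arr_get(flux, 4))   # outside 1..N
```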

- vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
- vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
- vec1*vec2 is the component-wise product of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
- vec1/vec2 is the component-wise quotient of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
- scal*vec and vec*scal both denote the multiplication of a vector by a scalar.
- vec/scal is the equivalent of (1/scal)*vec for a scalar. This is always floating point division, never integer division.
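The operator semantics listed above can be modelled in a few lines of Python. This is a sketch for illustration, not the way any TAP server implements them; all function names are made up, and division by zero is not modelled.

```python
import math
import operator

NAN = float("nan")


def componentwise(op, vec1, vec2):
    """Apply op element by element; pad with NaN up to the longer length."""
    shorter = min(len(vec1), len(vec2))
    longer = max(len(vec1), len(vec2))
    result = [op(vec1[i], vec2[i]) for i in range(shorter)]
    # Positions present in only one of the vectors become NaN.
    result.extend([NAN] * (longer - shorter))
    return result


def vec_add(vec1, vec2):
    return componentwise(operator.add, vec1, vec2)


def vec_sub(vec1, vec2):
    return componentwise(operator.sub, vec1, vec2)


def vec_mul(vec1, vec2):
    return componentwise(operator.mul, vec1, vec2)


def scal_mul(scal, vec):
    """Model both scal*vec and vec*scal."""
    return [scal * x for x in vec]


def vec_div_scal(vec, scal):
    """Model vec/scal; Python's / is always floating point division."""
    return [x / scal for x in vec]


print(vec_add([1, 2, 3], [10, 20]))
print(scal_mul(2, [1, 2, 3]))
print(vec_div_scal([1, 2, 3], 2))
```

The quotient vec1/vec2 can be modelled the same way with `operator.truediv`; whether integer vectors divide as floating point numbers there is not stated above, so the sketch leaves it out.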

- arr_dot(vec1,vec2) is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.
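The following Python sketch models the arr_dot behaviour just described, including the NaN result for vectors of unequal length. It only illustrates the semantics under the rule stated above and is not taken from any implementation.

```python
import math


def arr_dot(vec1, vec2):
    """Scalar product; NaN if the vectors differ in length.

    Padding the shorter vector with NaN makes at least one product
    NaN, and hence the whole sum, so we can return NaN directly.
    """
    if len(vec1) != len(vec2):
        return float("nan")
    return sum(a * b for a, b in zip(vec1, vec2))


print(arr_dot([1, 2, 3], [4, 5, 6]))
print(math.isnan(arr_dot([1, 2], [1, 2, 3])))
```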

The following functions work like SQL aggregate functions, except that they aggregate over the elements of a single array. They ought to return the type of the elements of the argument (real, double precision, integer).

- arr_avg(arr) returns the arithmetic mean of arr's elements
- arr_max(arr) returns the largest element of arr
- arr_min(arr) returns the smallest element of arr
- arr_sum(arr) returns the sum of arr's elements
- arr_count(arr) returns the number of elements in the array (the “array length”, where NaN elements count). This is always an integer, regardless of the array type.
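For illustration, these element-wise aggregates correspond to the following Python functions. This is a sketch only: it does not model the return types (Python's `/` always yields a float), and it makes no statement about how NaN elements or empty arrays should behave in arr_avg, arr_max, arr_min and arr_sum, since the text above does not specify that.

```python
def arr_avg(arr):
    """Arithmetic mean of the elements."""
    return sum(arr) / len(arr)


def arr_max(arr):
    """Largest element."""
    return max(arr)


def arr_min(arr):
    """Smallest element."""
    return min(arr)


def arr_sum(arr):
    """Sum of the elements."""
    return sum(arr)


def arr_count(arr):
    """Number of elements; NaN elements count, too."""
    return len(arr)


flux = [1.0, 2.0, 6.0]
print(arr_avg(flux), arr_max(flux), arr_min(flux), arr_sum(flux))
print(arr_count([1.0, float("nan")]))
```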

The following standard ADQL aggregate functions, applied to arrays, work component-wise:

- AVG
- MIN
- MAX
- SUM
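In contrast to the arr_* functions, these aggregates combine the arrays of several rows into one array. The Python sketch below models that; `aggregate_componentwise` is a made-up name, and the sketch assumes that all arrays in a group have the same length, because the behaviour for unequal lengths is not defined above.

```python
def aggregate_componentwise(rows, func):
    """Apply func to each component across all rows.

    rows is a list of equal-length arrays, one per table row.
    Assumption: unequal lengths are rejected, as the text does not
    say what should happen in that case.
    """
    lengths = {len(row) for row in rows}
    if len(lengths) != 1:
        raise ValueError("need at least one row and equal array lengths")
    # zip(*rows) yields the first components of all rows, then the
    # second components, and so on.
    return [func(column) for column in zip(*rows)]


def mean(values):
    return sum(values) / len(values)


rows = [[1.0, 2.0], [3.0, 6.0]]
print(aggregate_componentwise(rows, sum))    # models SUM
print(aggregate_componentwise(rows, mean))   # models AVG
print(aggregate_componentwise(rows, min))    # models MIN
print(aggregate_componentwise(rows, max))    # models MAX
```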

arr_map(expr_over_x, arr) computes a new array by binding each element of arr to x in turn and then computing expr_over_x.

expr_over_x is an ADQL numeric_value_expression that can use column references as usual, except that the name x is reserved for the evaluation.

For instance, arr_map(power(10, x), mags) will return an array [power(10, mags[1]), power(10, mags[2]), power(10, mags[3])...].
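In Python terms, arr_map behaves like a list comprehension over arr. The sketch below uses a Python function in place of expr_over_x, which is an analogy only: in ADQL, the expression is written inline with the reserved name x rather than passed as a function.

```python
def arr_map(expr_over_x, arr):
    """Bind each element of arr to x in turn and evaluate expr_over_x.

    expr_over_x is modelled as a one-argument Python callable here.
    """
    return [expr_over_x(x) for x in arr]


mags = [0.0, 1.0, 2.0]
# Corresponds to arr_map(power(10, x), mags) in the text.
print(arr_map(lambda x: 10 ** x, mags))
```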

Admittedly, the artificial "x" here is not pretty. The clean solution would be to define some sort of lambda calculus for ADQL ("first class functions"), but that's almost certainly overdoing it (although: does anyone do that in SQL?).

Perhaps it is preferable to use the array name itself, as in arr_map(power(10, mags), mags)? That would at least not clobber other names that SQL might want to use somewhere else. In his implementation, at least, Markus had to massage these column references on the translator level anyway. On the other hand, one might then be tempted to leave out the second argument altogether, and that would require a **lot** more thought: first, as regards finding arrays in the expression (do we want to require translators to be able to do that?), and then as regards what should happen if there are multiple arrays.

The SQL part of an implementation of this in PostgreSQL is in the DaCHS //adql RD, in the create_array_operator script. The functionality can be tried out at the TAP service at http://dc.g-vo.org/tap. Suitable tables (i.e., tables with vector-like data) include sdssdr16.main, gaia.dr2epochflux, onebigb.ssa, and dfbsspec.spectra.

Test cases for implementors can be derived from sqlarraytest.py.

Topic revision: r10 - 2023-09-05 - MarkusDemleitner

Copyright © 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
