International Virtual Observatory Alliance

Difference: ADQLVectorMath (1 vs. 13)

Revision 132024-11-28 - TamaraCivera

META TOPICPARENT	name="ADQL"

A proposal for vector math in ADQL

With tables containing massive amounts of vectors becoming common (e.g., the collections of low-resolution spectra within Gaia DR3 or the Digitised Byurakan Surveys), giving TAP users a toolset to do server-side work with arrays becomes highly desirable and will significantly enhance the power of ADQL to do server-side analyses. This is an attempt to provide a baseline feature set for that.

TAP servers supporting this should declare that by defining a language feature. While no IVOA specification exists for array operations, use the VECTORMATH key from GAVO's ADQL extensions standards record, like this:

Deleted:

<
<

<languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
      <description>You can compute with vectors here. See
        https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
        for an overview of the functions and operators available.
      </description>
   </feature>
</languageFeatures>

Element Access

To access an element of a vector, write [element-index], where element-index is an integer-valued expression. In keeping with common SQL practices (and regrettably working against most programming languages), indexes in ADQL are 1-based (rather than 0-based). That is, the first element of an array with N elements has the index 1 and the last element has the index N.

Again in keeping with common SQL practices, accessing elements outside of that range gives NULL.

It is also possible to access to multiple elements of a vector (as a sub-array). To access, write [lower-bound:upper-bound], where lower and upper bounds are integer-valued expressions and are both included. Lower-bound must be equal or greater to 1 due to indexes in ADQL are 1-based and if upper-bound is greater to the lenght of the array not error is going to be returned.

Basic Math

vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
vec1*vec2 is the component-wise product of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
vec1/vec2 is the component-wise quotient of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
scal*vec and vec*scal is the scalar multiplication of a vector.
vec/scal is the equivalent of (1/scal)*vec for a scalar. This is always floating point division, never integer division.

Vector computations

arr_dot(vec1,vec2) is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.

Array Aggregation

These are functions that work like SQL aggregate functions, just on the elements of arrays. These ought to return the types of the elements of the argument (real, double precision, integers).

arr_avg(arr) returns the arithmetic mean of arr's elements
arr_max(arr) returns the largest element of arr
arr_min(arr) returns the smallest element of arr
arr_sum(arr) returns the sum of arr's elements
arr_count(arr) returns the number of elements in the array (the “array length”, where NaN elements count). This is always an integer, regardless of the array type.

Aggregate Functions for Arrays

The following standard ADQL aggregate functions, applied to arrays, work component-wise:

When aggregates are computed over arrays of different lengths, the result undefined for now. [Options would be erroring out, extending with NaN – i.e., extended items are NaN –, or extending with NULL – i.e., extended items are ignored. Postgres chooses the third option for their MIN and MAX, and it's most straightforward in implementation, so it's also what DaCHS does. But it's not necessarily a good idea].

Array Map

arr_map(expr_over_x, arr) computes a new array by binding each element of arr to x in turn and then computing expr_over_x.

expr_over_x is an ADQL numeric_value_expression that can use column references as usual, except that the name x is reserved for the evaluation.

For instance, arr_map(power(10, x), mags) will return an array [power(10, mags[1]), power(10, mags[2]), power(10, mags[3])...].

Admittedly, the artificial "x" here is not pretty. The clean solution would be to define some sort of lambda calculus for ADQL ("first class functions"), but that's almost certainly overdoing it (although: does anyone do that in SQL?).

Perhaps it is preferable to use the array name itself, as in arr_map(power(10, mags), mags)? That would at least not clobber other names that SQL might want to use somewhere else? In implementation, at least Markus had to massage these column references on the translator level anyway. On the other hand, one might be tempted then to leave out the second argument at all, and that would require a lot more thought, first, as regards finding arrays in the expression (do we want to require translators to be able to do that?), and then what should happen if there are multiple arrays.

Other array functions

Other functions related with vectors are:

arr_contains(element, arr) returns 1 if the element is in the array or otherwise 0 (implemented at CEFCA).
arr_contains_null(element) returns 1 if the array has 1 or more elements with null value or otherwise 0 (implemented at CEFCA).

Implementation Status

Changed:

<
< The SQL part of an implementation of this in postgresql is in DaCHS //adql RD, the create_array_operator script. The functionality can be tried out at the TAP service at http://dc.g-vo.org/tap. Suitable tables (i.e., with vector-like data) include sdssdr16.main, gaia.dr2epochflux, onebigb.ssa, or dfbsspec.spectra. In this implementation, array multiple access (sub-arrays) and function arr_in are not implemented.

>
> The SQL part of an implementation of this in postgresql is in DaCHS //adql RD, the create_array_operator script. The functionality can be tried out at the TAP service at http://dc.g-vo.org/tap. Suitable tables (i.e., with vector-like data) include sdssdr16.main, gaia.dr2epochflux, onebigb.ssa, or dfbsspec.spectra. In this implementation, array multiple access (sub-arrays) and functions arr_contains and arr_contains_null are not implemented.

Test cases for implementors can be derived from sqlarraytest.py.

This vector proposal has been also implemented at CEFCA as a part of the CEFCA Catalogues Portal. The functionality can be tried at the TAP service of the different projects/data releases offered.

<-- 
 
 
 Set ALLOWTOPICRENAME = TWikiAdminGroup
 
 
-->

Revision 122024-11-27 - TamaraCivera

  META TOPICPARENT 
  name="ADQL"  

 A proposal for vector math in ADQL 

With tables containing massive amounts of vectors becoming common (e.g., the collections of low-resolution spectra within Gaia DR3 or the Digitised Byurakan Surveys), giving TAP users a toolset to do server-side work with arrays becomes highly desirable and will significantly enhance the power of ADQL to do server-side analyses. This is an attempt to provide a baseline feature set for that.

TAP servers supporting this should declare that by defining a language feature. While no IVOA specification exists for array operations, use the VECTORMATH key from GAVO's ADQL extensions standards record, like this:
- META TOPICPARENT
+  name="ADQL"
->
>
 <languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
      <description>You can compute with vectors here. See
        https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
        for an overview of the functions and operators available.
      </description>
   </feature>
</languageFeatures>


 Element Access 

To access an element of a vector, write [element-index], where element-index is an integer-valued expression. In keeping with common SQL practices (and regrettably working against most programming languages), indexes in ADQL are 1-based (rather than 0-based). That is, the first element of an array with N elements has the index 1 and the last element has the index N.

Again in keeping with common SQL practices, accessing elements outside of that range gives NULL.

It is also possible to access to multiple elements of a vector (as a sub-array). To access, write [lower-bound:upper-bound], where lower and upper bounds are integer-valued expressions and are both included. Lower-bound must be equal or greater to 1 due to indexes in ADQL are 1-based and if upper-bound is greater to the lenght of the array not error is going to be returned.
 Basic Math 
 

 vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1*vec2 is the component-wise product of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1/vec2 is the component-wise quotient of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  scal*vec and vec*scal is the scalar multiplication of a vector.
  vec/scal is the equivalent of (1/scal)*vec for a scalar. This is always floating point division, never integer division.
 

 Vector computations 
 

 arr_dot(vec1,vec2) is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.
 

 Array Aggregation 

These are functions that work like SQL aggregate functions, just on the elements of arrays. These ought to return the types of the elements of the argument (real, double precision, integers).
 

 arr_avg(arr) returns the arithmetic mean of arr's elements
  arr_max(arr) returns the largest element of arr
  arr_min(arr) returns the smallest element of arr
  arr_sum(arr) returns the sum of arr's elements
  arr_count(arr) returns the number of elements in the array (the “array length”, where NaN elements count). This is always an integer, regardless of the array type.
 

 Aggregate Functions for Arrays 

The following standard ADQL aggregate functions, applied to arrays, work component-wise:
 

 AVG
  MIN
  MAX
  SUM
 
When aggregates are computed over arrays of different lengths, the result undefined for now. [Options would be erroring out, extending with NaN – i.e., extended items are NaN –, or extending with NULL – i.e., extended items are ignored. Postgres chooses the third option for their MIN and MAX, and it's most straightforward in implementation, so it's also what DaCHS does. But it's not necessarily a good idea].

 Array Map 

arr_map(expr_over_x, arr) computes a new array by binding each element of arr to x in turn and then computing expr_over_x.

expr_over_x is an ADQL numeric_value_expression that can use column references as usual, except that the name x is reserved for the evaluation.

For instance, arr_map(power(10, x), mags) will return an array [power(10, mags[1]), power(10, mags[2]), power(10, mags[3])...].

Admittedly, the artificial "x" here is not pretty. The clean solution would be to define some sort of lambda calculus for ADQL ("first class functions"), but that's almost certainly overdoing it (although: does anyone do that in SQL?).

Perhaps it is preferable to use the array name itself, as in arr_map(power(10, mags), mags)? That would at least not clobber other names that SQL might want to use somewhere else? In implementation, at least Markus had to massage these column references on the translator level anyway. On the other hand, one might be tempted then to leave out the second argument at all, and that would require a lot more thought, first, as regards finding arrays in the expression (do we want to require translators to be able to do that?), and then what should happen if there are multiple arrays.
 Other array functions 

Other functions related with vectors are:
-<
<
+ arr_in(element, arr) returns true if the element is in the array or otherwise false.
->
>
+ arr_contains(element, arr) returns 1 if the element is in the array or otherwise 0 (implemented at CEFCA).
->
>
+ arr_contains_null(element) returns 1 if the array has 1 or more elements with null value or otherwise 0 (implemented at CEFCA).
  Implementation Status 

The SQL part of an implementation of this in postgresql is in DaCHS //adql RD, the create_array_operator script. The functionality can be tried out at the TAP service at http://dc.g-vo.org/tap. Suitable tables (i.e., with vector-like data) include sdssdr16.main, gaia.dr2epochflux, onebigb.ssa, or dfbsspec.spectra. In this implementation, array multiple access (sub-arrays) and function arr_in are not implemented.

Test cases for implementors can be derived from sqlarraytest.py.

This vector proposal has been also implemented at CEFCA as a part of the CEFCA Catalogues Portal. The functionality can be tried at the TAP service of the different projects/data releases offered.

<-- 
 
 
 Set ALLOWTOPICRENAME = TWikiAdminGroup
 
 
-->

Revision 112023-10-17 - TamaraCivera

  META TOPICPARENT 
  name="ADQL"  

 A proposal for vector math in ADQL 

With tables containing massive amounts of vectors becoming common (e.g., the collections of low-resolution spectra within Gaia DR3 or the Digitised Byurakan Surveys), giving TAP users a toolset to do server-side work with arrays becomes highly desirable and will significantly enhance the power of ADQL to do server-side analyses. This is an attempt to provide a baseline feature set for that.

TAP servers supporting this should declare that by defining a language feature. While no IVOA specification exists for array operations, use the VECTORMATH key from GAVO's ADQL extensions standards record, like this:
- META TOPICPARENT
+  name="ADQL"
-<
<
 <languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
      <description>You can compute with vectors here. See
        https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
        for an overview of the functions and operators available.
      </description>
   </feature>
</languageFeatures>


 Element Access 

To access an element of a vector, write [element-index], where element-index is an integer-valued expression. In keeping with common SQL practices (and regrettably working against most programming languages), indexes in ADQL are 1-based (rather than 0-based). That is, the first element of an array with N elements has the index 1 and the last element has the index N.

Again in keeping with common SQL practices, accessing elements outside of that range gives NULL.
->
>
+It is also possible to access to multiple elements of a vector (as a sub-array). To access, write [lower-bound:upper-bound], where lower and upper bounds are integer-valued expressions and are both included. Lower-bound must be equal or greater to 1 due to indexes in ADQL are 1-based and if upper-bound is greater to the lenght of the array not error is going to be returned.
  Basic Math 
 

 vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1*vec2 is the component-wise product of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1/vec2 is the component-wise quotient of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  scal*vec and vec*scal is the scalar multiplication of a vector.
  vec/scal is the equivalent of (1/scal)*vec for a scalar. This is always floating point division, never integer division.
 

 Vector computations 
 

 arr_dot(vec1,vec2) is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.
 

 Array Aggregation 

These are functions that work like SQL aggregate functions, just on the elements of arrays. These ought to return the types of the elements of the argument (real, double precision, integers).
 

 arr_avg(arr) returns the arithmetic mean of arr's elements
  arr_max(arr) returns the largest element of arr
  arr_min(arr) returns the smallest element of arr
  arr_sum(arr) returns the sum of arr's elements
  arr_count(arr) returns the number of elements in the array (the “array length”, where NaN elements count). This is always an integer, regardless of the array type.
 

 Aggregate Functions for Arrays 

The following standard ADQL aggregate functions, applied to arrays, work component-wise:
 

 AVG
  MIN
  MAX
  SUM
 
When aggregates are computed over arrays of different lengths, the result undefined for now. [Options would be erroring out, extending with NaN – i.e., extended items are NaN –, or extending with NULL – i.e., extended items are ignored. Postgres chooses the third option for their MIN and MAX, and it's most straightforward in implementation, so it's also what DaCHS does. But it's not necessarily a good idea].

 Array Map 

arr_map(expr_over_x, arr) computes a new array by binding each element of arr to x in turn and then computing expr_over_x.

expr_over_x is an ADQL numeric_value_expression that can use column references as usual, except that the name x is reserved for the evaluation.

For instance, arr_map(power(10, x), mags) will return an array [power(10, mags[1]), power(10, mags[2]), power(10, mags[3])...].

Admittedly, the artificial "x" here is not pretty. The clean solution would be to define some sort of lambda calculus for ADQL ("first class functions"), but that's almost certainly overdoing it (although: does anyone do that in SQL?).

Perhaps it is preferable to use the array name itself, as in arr_map(power(10, mags), mags)? That would at least not clobber other names that SQL might want to use somewhere else? In implementation, at least Markus had to massage these column references on the translator level anyway. On the other hand, one might be tempted then to leave out the second argument at all, and that would require a lot more thought, first, as regards finding arrays in the expression (do we want to require translators to be able to do that?), and then what should happen if there are multiple arrays.
->
>
+ Other array functions
->
>
+Other functions related with vectors are:
 

 arr_in(element, arr) returns true if the element is in the array or otherwise false.
  Implementation Status
-<
<
+The SQL part of an implementation of this in postgresql is in DaCHS //adql RD, the create_array_operator script. The functionality can be tried out at the TAP service at http://dc.g-vo.org/tap. Suitable tables (i.e., with vector-like data) include sdssdr16.main, gaia.dr2epochflux, onebigb.ssa, or dfbsspec.spectra.
->
>
+The SQL part of an implementation of this in postgresql is in DaCHS //adql RD, the create_array_operator script. The functionality can be tried out at the TAP service at http://dc.g-vo.org/tap. Suitable tables (i.e., with vector-like data) include sdssdr16.main, gaia.dr2epochflux, onebigb.ssa, or dfbsspec.spectra. In this implementation, array multiple access (sub-arrays) and function arr_in are not implemented.
 Test cases for implementors can be derived from sqlarraytest.py.
->
>
+This vector proposal has been also implemented at CEFCA as a part of the CEFCA Catalogues Portal. The functionality can be tried at the TAP service of the different projects/data releases offered.
 <-- 
 
 
 Set ALLOWTOPICRENAME = TWikiAdminGroup
 
 
-->
-<
<

Revision 102023-09-05 - MarkusDemleitner

  META TOPICPARENT 
  name="ADQL"  

 A proposal for vector math in ADQL 

With tables containing massive amounts of vectors becoming common (e.g., the collections of low-resolution spectra within Gaia DR3 or the Digitised Byurakan Surveys), giving TAP users a toolset to do server-side work with arrays becomes highly desirable and will significantly enhance the power of ADQL to do server-side analyses. This is an attempt to provide a baseline feature set for that.

TAP servers supporting this should declare that by defining a language feature. While no IVOA specification exists for array operations, use the VECTORMATH key from GAVO's ADQL extensions standards record, like this:

<languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
      <description>You can compute with vectors here. See
        https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
        for an overview of the functions and operators available.
      </description>
   </feature>
</languageFeatures>


 Element Access 

To access an element of a vector, write [element-index], where element-index is an integer-valued expression. In keeping with common SQL practices (and regrettably working against most programming languages), indexes in ADQL are 1-based (rather than 0-based). That is, the first element of an array with N elements has the index 1 and the last element has the index N.

Again in keeping with common SQL practices, accessing elements outside of that range gives NULL.

 Basic Math 
 

 vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1*vec2 is the component-wise product of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1/vec2 is the component-wise quotient of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  scal*vec and vec*scal is the scalar multiplication of a vector.
  vec/scal is the equivalent of (1/scal)*vec for a scalar. This is always floating point division, never integer division.
 

 Vector computations 
 

 arr_dot(vec1,vec2) is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.
 

 Array Aggregation 

These are functions that work like SQL aggregate functions, just on the elements of arrays. These ought to return the types of the elements of the argument (real, double precision, integers).
 

 arr_avg(arr) returns the arithmetic mean of arr's elements
  arr_max(arr) returns the largest element of arr
  arr_min(arr) returns the smallest element of arr
  arr_sum(arr) returns the sum of arr's elements
- META TOPICPARENT
+  name="ADQL"
->
>
+ arr_count(arr) returns the number of elements in the array (the “array length”, where NaN elements count). This is always an integer, regardless of the array type.
  Aggregate Functions for Arrays 

The following standard ADQL aggregate functions, applied to arrays, work component-wise:
 

 AVG
  MIN
  MAX
  SUM
-<
<
+When aggregates are computed over arrays of different lengths, the result undefined for now. [Options would be erroring out, extending with NaN – i.e., extended items are NaN –, or extending with NULL – i.e., extended items are ignored. Postgres chooses the third option for their MIN and MAX, and it's most straightforward in implementation, so it's also what DaCHS does. But it's not necessarily a good idea].
->
>
+When aggregates are computed over arrays of different lengths, the result undefined for now. [Options would be erroring out, extending with NaN – i.e., extended items are NaN –, or extending with NULL – i.e., extended items are ignored. Postgres chooses the third option for their MIN and MAX, and it's most straightforward in implementation, so it's also what DaCHS does. But it's not necessarily a good idea].
  Array Map 

arr_map(expr_over_x, arr) computes a new array by binding each element of arr to x in turn and then computing expr_over_x.

expr_over_x is an ADQL numeric_value_expression that can use column references as usual, except that the name x is reserved for the evaluation.

For instance, arr_map(power(10, x), mags) will return an array [power(10, mags[1]), power(10, mags[2]), power(10, mags[3])...].

Admittedly, the artificial "x" here is not pretty. The clean solution would be to define some sort of lambda calculus for ADQL ("first class functions"), but that's almost certainly overdoing it (although: does anyone do that in SQL?).

Perhaps it is preferable to use the array name itself, as in arr_map(power(10, mags), mags)? That would at least not clobber other names that SQL might want to use somewhere else? In implementation, at least Markus had to massage these column references on the translator level anyway. On the other hand, one might be tempted then to leave out the second argument at all, and that would require a lot more thought, first, as regards finding arrays in the expression (do we want to require translators to be able to do that?), and then what should happen if there are multiple arrays.

 Implementation Status 

The SQL part of an implementation of this in postgresql is in DaCHS //adql RD, the create_array_operator script. The functionality can be tried out at the TAP service at http://dc.g-vo.org/tap. Suitable tables (i.e., with vector-like data) include sdssdr16.main, gaia.dr2epochflux, onebigb.ssa, or dfbsspec.spectra.

Test cases for implementors can be derived from sqlarraytest.py.

<-- 
 
 
 Set ALLOWTOPICRENAME = TWikiAdminGroup
 
 
-->
->
>

Revision 92022-10-07 - MarkusDemleitner

  META TOPICPARENT 
  name="ADQL"  

 A proposal for vector math in ADQL 

With tables containing massive amounts of vectors becoming common (e.g., the collections of low-resolution spectra within Gaia DR3 or the Digitised Byurakan Surveys), giving TAP users a toolset to do server-side work with arrays becomes highly desirable and will significantly enhance the power of ADQL to do server-side analyses. This is an attempt to provide a baseline feature set for that.

TAP servers supporting this should declare that by defining a language feature. While no IVOA specification exists for array operations, use the VECTORMATH key from GAVO's ADQL extensions standards record, like this:

<languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
      <description>You can compute with vectors here. See
        https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
        for an overview of the functions and operators available.
      </description>
   </feature>
</languageFeatures>


 Element Access 

To access an element of a vector, write [element-index], where element-index is an integer-valued expression. In keeping with common SQL practices (and regrettably working against most programming languages), indexes in ADQL are 1-based (rather than 0-based). That is, the first element of an array with N elements has the index 1 and the last element has the index N.

Again in keeping with common SQL practices, accessing elements outside of that range gives NULL.

 Basic Math 
 

 vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1*vec2 is the component-wise product of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1/vec2 is the component-wise quotient of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  scal*vec and vec*scal is the scalar multiplication of a vector.
  vec/scal is the equivalent of (1/scal)*vec for a scalar. This is always floating point division, never integer division.
 

 Vector computations 
 

 arr_dot(vec1,vec2) is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.
 

 Array Aggregation 

These are functions that work like SQL aggregate functions, just on the elements of arrays. These ought to return the types of the elements of the argument (real, double precision, integers).
 

 arr_avg(arr) returns the arithmetic mean of arr's elements
  arr_max(arr) returns the largest element of arr
  arr_min(arr) returns the smallest element of arr
  arr_sum(arr) returns the sum of arr's elements
 

 Aggregate Functions for Arrays 

The following standard ADQL aggregate functions, applied to arrays, work component-wise:
- META TOPICPARENT
+  name="ADQL"
-<
<
+ ARG
->
>
+ AVG
  MIN
  MAX
  SUM
 
When aggregates are computed over arrays of different lengths, the result undefined for now. [Options would be erroring out, extending with NaN – i.e., extended items are NaN –, or extending with NULL – i.e., extended items are ignored. Postgres chooses the third option for their MIN and MAX, and it's most straightforward in implementation, so it's also what DaCHS does. But it's not necessarily a good idea].

 Array Map 

arr_map(expr_over_x, arr) computes a new array by binding each element of arr to x in turn and then computing expr_over_x.

expr_over_x is an ADQL numeric_value_expression that can use column references as usual, except that the name x is reserved for the evaluation.

For instance, arr_map(power(10, x), mags) will return an array [power(10, mags[1]), power(10, mags[2]), power(10, mags[3])...].

Admittedly, the artificial "x" here is not pretty. The clean solution would be to define some sort of lambda calculus for ADQL ("first class functions"), but that's almost certainly overdoing it (although: does anyone do that in SQL?).

Perhaps it is preferable to use the array name itself, as in arr_map(power(10, mags), mags)? That would at least not clobber other names that SQL might want to use somewhere else? In implementation, at least Markus had to massage these column references on the translator level anyway. On the other hand, one might be tempted then to leave out the second argument at all, and that would require a lot more thought, first, as regards finding arrays in the expression (do we want to require translators to be able to do that?), and then what should happen if there are multiple arrays.

 Implementation Status 

The SQL part of an implementation of this in postgresql is in DaCHS //adql RD, the create_array_operator script. The functionality can be tried out at the TAP service at http://dc.g-vo.org/tap. Suitable tables (i.e., with vector-like data) include sdssdr16.main, gaia.dr2epochflux, onebigb.ssa, or dfbsspec.spectra.
-<
<
+Test cases for implementors can be derived from
->
>
+Test cases for implementors can be derived from sqlarraytest.py.
-<
<
+sqlarraytest.py.
 <-- 
 
 
 Set ALLOWTOPICRENAME = TWikiAdminGroup
 
 
-->

Revision 82022-08-05 - MarkusDemleitner

  META TOPICPARENT 
  name="ADQL"  

 A proposal for vector math in ADQL 

With tables containing massive amounts of vectors becoming common (e.g., the collections of low-resolution spectra within Gaia DR3 or the Digitised Byurakan Surveys), giving TAP users a toolset to do server-side work with arrays becomes highly desirable and will significantly enhance the power of ADQL to do server-side analyses. This is an attempt to provide a baseline feature set for that.

TAP servers supporting this should declare that by defining a language feature. While no IVOA specification exists for array operations, use the VECTORMATH key from GAVO's ADQL extensions standards record, like this:

<languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
      <description>You can compute with vectors here. See
        https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
        for an overview of the functions and operators available.
      </description>
   </feature>
</languageFeatures>


 Element Access 

To access an element of a vector, write [element-index], where element-index is an integer-valued expression. In keeping with common SQL practices (and regrettably working against most programming languages), indexes in ADQL are 1-based (rather than 0-based). That is, the first element of an array with N elements has the index 1 and the last element has the index N.

Again in keeping with common SQL practices, accessing elements outside of that range gives NULL.

 Basic Math
- META TOPICPARENT
+  name="ADQL"
-<
<
+ vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1*vec2 is the component-wise product of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1/vec2 is the component-wise quotient of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
->
>
+ vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1*vec2 is the component-wise product of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1/vec2 is the component-wise quotient of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  scal*vec and vec*scal is the scalar multiplication of a vector.
  vec/scal is the equivalent of (1/scal)*vec for a scalar. This is always floating point division, never integer division.
 

 Vector computations
-<
<
+* arr_scalprod(vec1,vec2) is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.
->
>
+ arr_dot(vec1,vec2) is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.
  Array Aggregation 

These are functions that work like SQL aggregate functions, just on the elements of arrays. These ought to return the types of the elements of the argument (real, double precision, integers).
 

 arr_avg(arr) returns the arithmetic mean of arr's elements
  arr_max(arr) returns the largest element of arr
  arr_min(arr) returns the smallest element of arr
  arr_sum(arr) returns the sum of arr's elements
 

 Aggregate Functions for Arrays 

The following standard ADQL aggregate functions, applied to arrays, work component-wise:
 

 ARG
  MIN
  MAX
  SUM
-<
<
+When aggregates are computed over arrays of different lengths, the result undefined for now. [Options would be erroring out, extending with NaN – i.e., extended items are NaN –, or extending with NULL – i.e., extended items are ignored. Postgres chooses the third option for their MIN and MAX, and it's most straightforward in implementation, so it's also what DaCHS does. But it's not necessarily a good idea].
->
>
+When aggregates are computed over arrays of different lengths, the result undefined for now. [Options would be erroring out, extending with NaN – i.e., extended items are NaN –, or extending with NULL – i.e., extended items are ignored. Postgres chooses the third option for their MIN and MAX, and it's most straightforward in implementation, so it's also what DaCHS does. But it's not necessarily a good idea].
  Array Map 

arr_map(expr_over_x, arr) computes a new array by binding each element of arr to x in turn and then computing expr_over_x.

expr_over_x is an ADQL numeric_value_expression that can use column references as usual, except that the name x is reserved for the evaluation.

For instance, arr_map(power(10, x), mags) will return an array [power(10, mags[1]), power(10, mags[2]), power(10, mags[3])...].

Admittedly, the artificial "x" here is not pretty. The clean solution would be to define some sort of lambda calculus for ADQL ("first class functions"), but that's almost certainly overdoing it (although: does anyone do that in SQL?).

Perhaps it is preferable to use the array name itself, as in arr_map(power(10, mags), mags)? That would at least not clobber other names that SQL might want to use somewhere else? In implementation, at least Markus had to massage these column references on the translator level anyway. On the other hand, one might be tempted then to leave out the second argument at all, and that would require a lot more thought, first, as regards finding arrays in the expression (do we want to require translators to be able to do that?), and then what should happen if there are multiple arrays.

 Implementation Status 

The SQL part of an implementation of this in postgresql is in DaCHS //adql RD, the create_array_operator script. The functionality can be tried out at the TAP service at http://dc.g-vo.org/tap. Suitable tables (i.e., with vector-like data) include sdssdr16.main, gaia.dr2epochflux, onebigb.ssa, or dfbsspec.spectra.
->
>
+Test cases for implementors can be derived from
sqlarraytest.py.
 <-- 
 
 
 Set ALLOWTOPICRENAME = TWikiAdminGroup
 
 
-->

Revision 72022-08-04 - MarkusDemleitner

  META TOPICPARENT 
  name="ADQL"  

 A proposal for vector math in ADQL 

With tables containing massive amounts of vectors becoming common (e.g., the collections of low-resolution spectra within Gaia DR3 or the Digitised Byurakan Surveys), giving TAP users a toolset to do server-side work with arrays becomes highly desirable and will significantly enhance the power of ADQL to do server-side analyses. This is an attempt to provide a baseline feature set for that.

TAP servers supporting this should declare that by defining a language feature. While no IVOA specification exists for array operations, use the VECTORMATH key from GAVO's ADQL extensions standards record, like this:

<languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
      <description>You can compute with vectors here. See
        https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
        for an overview of the functions and operators available.
      </description>
   </feature>
</languageFeatures>


 Element Access 

To access an element of a vector, write [element-index], where element-index is an integer-valued expression. In keeping with common SQL practices (and regrettably working against most programming languages), indexes in ADQL are 1-based (rather than 0-based). That is, the first element of an array with N elements has the index 1 and the last element has the index N.

Again in keeping with common SQL practices, accessing elements outside of that range gives NULL.

 Basic Math 
 

 vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1*vec2 is the component-wise product of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
- META TOPICPARENT
+  name="ADQL"
-<
<
+ vec1*vec2 is the component-wise quotien of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
->
>
+ vec1/vec2 is the component-wise quotient of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  scal*vec and vec*scal is the scalar multiplication of a vector.
  vec/scal is the equivalent of (1/scal)*vec for a scalar. This is always floating point division, never integer division.
 

 Vector computations 

* arr_scalprod(vec1,vec2) is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.

 Array Aggregation 

These are functions that work like SQL aggregate functions, just on the elements of arrays. These ought to return the types of the elements of the argument (real, double precision, integers).
 

 arr_avg(arr) returns the arithmetic mean of arr's elements
  arr_max(arr) returns the largest element of arr
  arr_min(arr) returns the smallest element of arr
  arr_sum(arr) returns the sum of arr's elements
 

 Aggregate Functions for Arrays 

The following standard ADQL aggregate functions, applied to arrays, work component-wise:
 

 ARG
  MIN
  MAX
  SUM
 
When aggregates are computed over arrays of different lengths, the result undefined for now. [Options would be erroring out, extending with NaN – i.e., extended items are NaN –, or extending with NULL – i.e., extended items are ignored. Postgres chooses the third option for their MIN and MAX, and it's most straightforward in implementation, so it's also what DaCHS does. But it's not necessarily a good idea].
-<
<
  Array Map
-<
<
+arr_map(expr_over_x, arr) computes a new array by binding each element
->
>
+arr_map(expr_over_x, arr) computes a new array by binding each element of arr to x in turn and then computing expr_over_x.
-<
<
+of arr to x in turn and then computing expr_over_x.
-<
<
+expr_over_x is an ADQL numeric_value_expression that can use column
->
>
+expr_over_x is an ADQL numeric_value_expression that can use column references as usual, except that the name x is reserved for the evaluation.
-<
<
+references as usual, except that the name x is reserved for the
evaluation.
-<
<
+For instance, arr_map(power(10, x), mags) will return an array
->
>
+For instance, arr_map(power(10, x), mags) will return an array [power(10, mags[1]), power(10, mags[2]), power(10, mags[3])...].
-<
<
+[power(10, mags[1]), power(10, mags[2]), power(10, mags[3])...].
-<
<
+Admittedly, the artificial "x" here is not pretty.  The clean solution
->
>
+Admittedly, the artificial "x" here is not pretty. The clean solution would be to define some sort of lambda calculus for ADQL ("first class functions"), but that's almost certainly overdoing it (although: does anyone do that in SQL?).
-<
<
+would be to define some sort of lambda calculus for ADQL ("first class
functions"), but that's almost certainly overdoing it (although: does
anyone do that in SQL?).
-<
<
+Perhaps it is preferable to use the array name itself, as in
->
>
+Perhaps it is preferable to use the array name itself, as in arr_map(power(10, mags), mags)? That would at least not clobber other names that SQL might want to use somewhere else? In implementation, at least Markus had to massage these column references on the translator level anyway. On the other hand, one might be tempted then to leave out the second argument at all, and that would require a lot more thought, first, as regards finding arrays in the expression (do we want to require translators to be able to do that?), and then what should happen if there are multiple arrays.
-<
<
+arr_map(power(10, mags), mags)?  That would at least not clobber other
names that SQL might want to use somewhere else?  In implementation, at
least Markus had to massage these column references on the translator
level anyway.  On the other hand, one might be tempted then to leave out
the second argument at all, and that would require a lot more thought,
first, as regards finding arrays in the expression (do we want to
require translators to be able to do that?), and then what should
happend if there are multiple arrays.
  Implementation Status 

The SQL part of an implementation of this in postgresql is in DaCHS //adql RD, the create_array_operator script. The functionality can be tried out at the TAP service at http://dc.g-vo.org/tap. Suitable tables (i.e., with vector-like data) include sdssdr16.main, gaia.dr2epochflux, onebigb.ssa, or dfbsspec.spectra.

<-- 
 
 
 Set ALLOWTOPICRENAME = TWikiAdminGroup
 
 
-->

Revision 62022-08-01 - MarkusDemleitner

META TOPICPARENT	name="ADQL"

A proposal for vector math in ADQL

<languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
      <description>You can compute with vectors here. See
        https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
        for an overview of the functions and operators available.
      </description>
   </feature>
</languageFeatures>

Element Access

Again in keeping with common SQL practices, accessing elements outside of that range gives NULL.

Basic Math

vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
vec1*vec2 is the component-wise product of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
vec1*vec2 is the component-wise quotien of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
scal*vec and vec*scal is the scalar multiplication of a vector.
vec/scal is the equivalent of (1/scal)*vec for a scalar. This is always floating point division, never integer division.

Vector computations

Changed:

<
< * arr_scalprod(vec1,vec2) is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.

>
> * arr_scalprod(vec1,vec2) is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.

Array Aggregation

These are functions that work like SQL aggregate functions, just on the elements of arrays. These ought to return the types of the elements of the argument (real, double precision, integers).

arr_avg(arr) returns the arithmetic mean of arr's elements
arr_max(arr) returns the largest element of arr
arr_min(arr) returns the smallest element of arr
arr_sum(arr) returns the sum of arr's elements

Aggregate Functions for Arrays

The following standard ADQL aggregate functions, applied to arrays, work component-wise:

Deleted:

<
<

Deleted:

<
<

Implementation Status

Changed:

>
>

Array Map

Added:

>
>

arr_map(expr_over_x, arr) computes a new array by binding each element of arr to x in turn and then computing expr_over_x.

expr_over_x is an ADQL numeric_value_expression that can use column references as usual, except that the name x is reserved for the evaluation.

For instance, arr_map(power(10, x), mags) will return an array [power(10, mags[1]), power(10, mags[2]), power(10, mags[3])...].

Implementation Status

The SQL part of an implementation of this in postgresql is in DaCHS //adql RD, the create_array_operator script. The functionality can be tried out at the TAP service at http://dc.g-vo.org/tap. Suitable tables (i.e., with vector-like data) include sdssdr16.main, gaia.dr2epochflux, onebigb.ssa, or dfbsspec.spectra.

<-- 
 
 
 Set ALLOWTOPICRENAME = TWikiAdminGroup
 
 
-->

Revision 52022-07-27 - MarkusDemleitner

  META TOPICPARENT 
  name="ADQL"  

 A proposal for vector math in ADQL 

With tables containing massive amounts of vectors becoming common (e.g., the collections of low-resolution spectra within Gaia DR3 or the Digitised Byurakan Surveys), giving TAP users a toolset to do server-side work with arrays becomes highly desirable and will significantly enhance the power of ADQL to do server-side analyses. This is an attempt to provide a baseline feature set for that.

TAP servers supporting this should declare that by defining a language feature. While no IVOA specification exists for array operations, use the VECTORMATH key from GAVO's ADQL extensions standards record, like this:

<languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
      <description>You can compute with vectors here. See
        https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
        for an overview of the functions and operators available.
      </description>
   </feature>
</languageFeatures>


 Element Access 

To access an element of a vector, write [element-index], where element-index is an integer-valued expression. In keeping with common SQL practices (and regrettably working against most programming languages), indexes in ADQL are 1-based (rather than 0-based). That is, the first element of an array with N elements has the index 1 and the last element has the index N.

Again in keeping with common SQL practices, accessing elements outside of that range gives NULL.

 Basic Math 
 

 vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
- META TOPICPARENT
+  name="ADQL"
-<
<
+ vec1*vec2 is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.
->
>
+ vec1*vec2 is the component-wise product of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
->
>
+ vec1*vec2 is the component-wise quotien of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  scal*vec and vec*scal is the scalar multiplication of a vector.
  vec/scal is the equivalent of (1/scal)*vec for a scalar. This is always floating point division, never integer division.
->
>
+ Vector computations 

  * arr_scalprod(vec1,vec2) is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.
  Array Aggregation 

These are functions that work like SQL aggregate functions, just on the elements of arrays. These ought to return the types of the elements of the argument (real, double precision, integers).
 

 arr_avg(arr) returns the arithmetic mean of arr's elements
  arr_max(arr) returns the largest element of arr
  arr_min(arr) returns the smallest element of arr
-<
<
+ arr_stddev(arr) returns the standard deviation of arr's elements
  arr_sum(arr) returns the sum of arr's elements
 

 Aggregate Functions for Arrays 

The following standard ADQL aggregate functions, applied to arrays, work component-wise:
 

 ARG
  MIN
  MAX
-<
<
+ STDDEV
  SUM
-<
<
+When aggregates are computed over arrays of different lengths, the
result undefined for now.  [Options would be erroring out, extending
with NaN – i.e., extended items are NaN –, or extending with NULL –
i.e., extended items are ignored.  Postgres chooses the third option for
their MIN and MAX, and it's most straightforward in implementation, so
it's also what DaCHS does.  But it's not necessarily a good idea].
->
>
+When aggregates are computed over arrays of different lengths, the result undefined for now. [Options would be erroring out, extending with NaN – i.e., extended items are NaN –, or extending with NULL – i.e., extended items are ignored. Postgres chooses the third option for their MIN and MAX, and it's most straightforward in implementation, so it's also what DaCHS does. But it's not necessarily a good idea].

 Implementation Status 

The SQL part of an implementation of this in postgresql is in DaCHS //adql RD, the create_array_operator script.  The functionality can be tried out at the TAP service at http://dc.g-vo.org/tap.  Suitable tables (i.e., with vector-like data) include sdssdr16.main, gaia.dr2epochflux, onebigb.ssa, or dfbsspec.spectra.
 <-- 
 
 
 Set ALLOWTOPICRENAME = TWikiAdminGroup
 
 
-->

Revision 42022-07-08 - MarkusDemleitner

META TOPICPARENT	name="ADQL"

A proposal for vector math in ADQL

<languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
      <description>You can compute with vectors here. See
        https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
        for an overview of the functions and operators available.
      </description>
   </feature>
</languageFeatures>

Element Access

Again in keeping with common SQL practices, accessing elements outside of that range gives NULL.

Basic Math

vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
vec1*vec2 is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.
scal*vec and vec*scal is the scalar multiplication of a vector.

Changed:

<
<

vec/scal is the equivalent of (1/scal)*vec for a scalar. This is always floating point division, never integer division.

>
>

vec/scal is the equivalent of (1/scal)*vec for a scalar. This is always floating point division, never integer division.

Array Aggregation

These are functions that work like SQL aggregate functions, just on the elements of arrays. These ought to return the types of the elements of the argument (real, double precision, integers).

arr_avg(arr) returns the arithmetic mean of arr's elements
arr_max(arr) returns the largest element of arr
arr_min(arr) returns the smallest element of arr
arr_stddev(arr) returns the standard deviation of arr's elements
arr_sum(arr) returns the sum of arr's elements

Added:

>
>

Aggregate Functions for Arrays

The following standard ADQL aggregate functions, applied to arrays, work component-wise:

ARG
MIN
MAX
STDDEV
SUM

<-- 
 
 
 Set ALLOWTOPICRENAME = TWikiAdminGroup
 
 
-->

Revision 32022-07-06 - MarkusDemleitner

META TOPICPARENT	name="ADQL"

A proposal for vector math in ADQL

Changed:

<
< With tables containing massive amounts of vectors (e.g., the collections of low-resolution spectra within Gaia DR3 or the Digitised Byurakan Surveys), giving TAP users a toolset to do server-side work with arrays becomes highly desirable and will significantly enhance the power of ADQL to do server-side analyses. This is an attempt to provide a baseline feature set for that.

>
> With tables containing massive amounts of vectors becoming common (e.g., the collections of low-resolution spectra within Gaia DR3 or the Digitised Byurakan Surveys), giving TAP users a toolset to do server-side work with arrays becomes highly desirable and will significantly enhance the power of ADQL to do server-side analyses. This is an attempt to provide a baseline feature set for that.

<languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
      <description>You can compute with vectors here. See
        https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
        for an overview of the functions and operators available.
      </description>
   </feature>
</languageFeatures>

Added:

>
>

Element Access

Again in keeping with common SQL practices, accessing elements outside of that range gives NULL.

Basic Math

vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.

Changed:

<
<

vec1*vec2 is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.

>
>

vec1*vec2 is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector. That is, the scalar product of vectors of unequal length is NaN.

scal*vec and vec*scal is the scalar multiplication of a vector.

Added:

>
>

vec/scal is the equivalent of (1/scal)*vec for a scalar. This is always floating point division, never integer division.

Array Aggregation

These are functions that work like SQL aggregate functions, just on the elements of arrays. These ought to return the types of the elements of the argument (real, double precision, integers).

arr_avg(arr) returns the arithmetic mean of arr's elements
arr_max(arr) returns the largest element of arr
arr_min(arr) returns the smallest element of arr
arr_stddev(arr) returns the standard deviation of arr's elements
arr_sum(arr) returns the sum of arr's elements

<-- 
 
 
 Set ALLOWTOPICRENAME = TWikiAdminGroup
 
 
-->

Revision 22022-06-30 - MarkusDemleitner

  META TOPICPARENT 
  name="ADQL"  

 A proposal for vector math in ADQL 

With tables containing massive amounts of vectors (e.g., the collections of low-resolution spectra within Gaia DR3 or the Digitised Byurakan Surveys), giving TAP users a toolset to do server-side work with arrays becomes highly desirable and will significantly enhance the power of ADQL to do server-side analyses. This is an attempt to provide a baseline feature set for that.

TAP servers supporting this should declare that by defining a language feature. While no IVOA specification exists for array operations, use the VECTORMATH key from GAVO's ADQL extensions standards record, like this:

<languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
      <description>You can compute with vectors here. See
        https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
        for an overview of the functions and operators available.
      </description>
   </feature>
</languageFeatures>


 Basic Math
- META TOPICPARENT
+  name="ADQL"
-<
<
+ vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NULLs to the length of the longer vector.
  vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NULLs to the length of the longer vector.
->
>
+ vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
  vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NaNs to the length of the longer vector.
->
>
+ vec1*vec2 is the scalar product of two vectors. Where vec1 and vec2 have unequal length, the shorter vector is padded with NaNs to the length of the longer vector.  That is, the scalar product of vectors of unequal length is NaN.
  scal*vec and vec*scal is the scalar multiplication of a vector.
 <-- 
 
 
 Set ALLOWTOPICRENAME = TWikiAdminGroup
 
 
-->

Revision 12022-06-30 - MarkusDemleitner

META TOPICPARENT	name="ADQL"

A proposal for vector math in ADQL

With tables containing massive amounts of vectors (e.g., the collections of low-resolution spectra within Gaia DR3 or the Digitised Byurakan Surveys), giving TAP users a toolset to do server-side work with arrays becomes highly desirable and will significantly enhance the power of ADQL to do server-side analyses. This is an attempt to provide a baseline feature set for that.

<languageFeatures type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
  <feature>
    <form>VECTORMATH</form>
      <description>You can compute with vectors here. See
        https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQLVectorMath
        for an overview of the functions and operators available.
      </description>
   </feature>
</languageFeatures>

Basic Math

vec1+vec2 is the component-wise sum of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NULLs to the length of the longer vector.
vec1-vec2 is the component-wise difference of two vectors. Where vec1 and vec2 have unequal length, the result is padded with NULLs to the length of the longer vector.

<-- 
 
 
 Set ALLOWTOPICRENAME = TWikiAdminGroup
 
 
-->

View topic | History: r13 < r12 < r11 < r10 | More topic actions...