IVOA Web>IvoaVOEvent>TimeSeries2020 (2021-04-13, GiuliaIafrate)

TimeSeries 2020 proposal

A new proposal for serialising time series from Ada Nebot, intended to meet a specific limited set of requirements.

The LaTeX source for the document is currently hosted on GitHub here (in Ada's private space) because this is not an official note yet.

The latest PDF version is here: TimeSeries.pdf

This page is to provide an initial discussion forum about the proposed idea. We will collect the comments together and transfer them into issues on GitHub once we find a permanent home in the main IVOA-std section of GitHub.

To start the discussion I have transferred below a number of comments that we have received via email.

If you have comments about the proposed note please add them to this page.

There will be a hackathon discussion about this proposal on Wednesday 5th Feb 17:30 - 19:00 at the WP4 Technology Forum in Strasbourg.

On 2020-01-24 09:13, Jesus Salgado wrote:

Unfortunately, I will not be able to attend this time but I could connect remotely.

On 2020-01-15 17:06, Mireille LOUYS wrote:

I can contribute too, keeping in mind that this "Lightcurve DM effort" will be a testbed for a wider-scope DM as proposed by Laurent for CAB-MSD (Component and Association Based Model for Source Data).

On 2020-01-16 11:42, François Bonnarel wrote:

Yes. We can go this way

We agree to participate.

On 2020-01-24 11:50, Pierre Fernique wrote:

I'm a little bit surprised by the document proposal. My concern is that the document is trying to sell us FILTERSYS as analogous to TIMESYS or COOSYS when it is not the same approach, at least from the XML serialization point of view. TIMESYS and COOSYS are high-level XML entities, without hierarchy, and they impose the units and the vocabulary used inside. In the document, FILTERSYS is an annotation of a GROUP (via name), potentially hierarchical. This approach bothers me because it confuses the discussion. For me, if we want to use GROUP then we must not try to draw a wacky similarity with TIMESYS and COOSYS, and should fully assume the GROUP approach.

My second comment is on the use of "name" as a "tag" for a GROUP. It gives the name a different role when used in a GROUP than when used in a FIELD or in a PARAM. I think this is a consequence of the GROUP approach.

So I would suggest avoiding the word FILTERSYS if we want to use GROUP.

Or, if we really want FILTERSYS, we must assume something like this (Ada's example):

    <FILTERSYS ID="phot_sys" uniqueIdentifier="Palomar/ZTF.g/Vega" 
           zeroPointFlux="3963.97" magnitudeSystem="Vega" 
           effectiveWavelength="4722.74" />

On 2020-01-28 15:02, ada nebot wrote:

One of the things that I talked about with Dave, and that will be discussed there, is the possibility of substituting the GROUP with a new element to be added to future versions of VOTable. That would simplify annotation, at the expense of imposing units for the described attributes.

On 2020-01-28 16:15, Jesus Salgado wrote:

I have been reading the proposal and it is clear and simple so it is a very good starting point in my view. I think a standard like this (simple in format but powerful in content) will be very useful for the community and for data providers.

The only question I have from the initial reading is whether there is a reason to use "ref" in FIELD instead of FIELDref in the GROUP. It looks the same, but in most of the standards the reference goes from the group to the table, and I cannot remember whether there is any difference.

About the substitution of GROUP: if we can propose a modification of VOTable in some way, it makes sense to explain the problem I found when I tried to serialize the Gaia time series (some of you already know it, but maybe others do not).

We tried to use a table that conceptually was like:

time   filter  wavelength  flux
12313  B       0.123       0.4
12314  V       0.234       0.5

........................ (random numbers)

How to annotate this in VOTable? Ideally, in a DB you could use B and V as a foreign key to another table that contains the filter characterization (a reference at row level), but VOTable annotates at column level (a reference at column level via ref in FIELD or FIELDref).

First, I proposed to Gaia a multi-table response, one table per filter (quite similar to the second example in Ada's proposal). However, we were not sure whether any application would be able to read it. Also, these time series are difficult to interpret as spectral data.

Maybe, if we are going to promote the multi-table format as a standard, the impact is reduced as applications would be adapted.

Another option was to use different columns, with empty values where applicable:

time   filterB  WavelengthB  filterV  WavelengthV  flux
12313  B        0.123        null     null         0.4
12314  null     null         V        0.234        0.5

(this is horrible)

And the last option is to add the characterization annotation to every row:

time   filter  wavelength  flux  linktoFilterProfileService
12313  B       0.123       0.4   http://filterProfileService?filter=B
12314  V       0.234       0.5   http://filterProfileService?filter=V

(maybe there are more options but these were the ones discussed more or less)

The last one is quite a verbose way to express time series (I have added only one extra value, but the table grows with every additional column).

It works in the sense that new points can easily be added like a stream, it can be seen as a time series or a spectrum without too much effort (so it can be opened by spectral applications) and it does not contain empty values. The worst part is that it does not make use of any of the good VOTable annotation features... it is just a table.

After discussions, that last format was conceptually the approach we followed as it allows users, at least, to open it in several applications.

Now, if we could consider the possibility of modifying VOTable, I think the most valuable change would be the possibility to annotate at row level, like a foreign key.

This is a tricky change and it would need to be thought through carefully but, for the particular case of time series, conceptually it would be something like:

    <GROUP>
        <ROWRef ref="FilterColumn" key="filterB"/>
        <!-- ... all the characterization metadata for filter B ... -->
    </GROUP>
    <GROUP>
        <ROWRef ref="FilterColumn" key="filterV"/>
        <!-- ... all the characterization metadata for filter V ... -->
    </GROUP>

    <FIELD ID="time">..</FIELD>
    <FIELD ID="FilterColumn">..</FIELD>
    <FIELD ID="flux">..</FIELD>

12313	filterB	0.4
12314	filterV	0.5
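A minimal Python sketch of how a client might resolve such row-level references. It assumes the conceptual markup above: ROWRef is not an existing VOTable element, and the `ref` attribute, the GROUP names, the PARAM name and the helper functions are all hypothetical. The wavelength values are the placeholder numbers from the tables earlier in this message.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: ROWRef is NOT part of VOTable, it only mirrors the idea above.
DOC = """
<TABLE>
  <GROUP name="filterB-metadata">
    <ROWRef ref="FilterColumn" key="filterB"/>
    <PARAM name="effectiveWavelength" value="0.123"/>
  </GROUP>
  <GROUP name="filterV-metadata">
    <ROWRef ref="FilterColumn" key="filterV"/>
    <PARAM name="effectiveWavelength" value="0.234"/>
  </GROUP>
</TABLE>
"""

# Data rows as in the example: time, FilterColumn, flux.
ROWS = [
    {"time": 12313, "FilterColumn": "filterB", "flux": 0.4},
    {"time": 12314, "FilterColumn": "filterV", "flux": 0.5},
]


def build_row_lookup(table):
    """Map (column ID, key value) to the PARAM name/value pairs of its GROUP."""
    lookup = {}
    for group in table.findall("GROUP"):
        rowref = group.find("ROWRef")
        if rowref is None:
            continue  # an ordinary GROUP without a row-level reference
        params = {p.get("name"): p.get("value") for p in group.findall("PARAM")}
        lookup[(rowref.get("ref"), rowref.get("key"))] = params
    return lookup


def metadata_for_row(row, lookup):
    """Collect the metadata of every GROUP whose key matches this row."""
    merged = {}
    for (column, key), params in lookup.items():
        if row.get(column) == key:
            merged.update(params)
    return merged


lookup = build_row_lookup(ET.fromstring(DOC))
annotated = [dict(row, **metadata_for_row(row, lookup)) for row in ROWS]
```

Each entry of `annotated` is the original row plus the characterization metadata selected by the value in its FilterColumn cell, which is the row-level behaviour that column-level annotation cannot express.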

(or even from one table to another, avoiding the use of groups, exactly like a foreign key)

    <TABLE ID="Filters">
        <FIELD ID="name">..</FIELD>
        <FIELD ID="FilterColumn">..</FIELD>
        <FIELD ID="linktoFilterProfileService">..</FIELD>
        <FIELD ID="zeroPoint">..</FIELD>
    </TABLE>

B	filterB	http://filterProfileService?filter=B	1.32
V	filterV	http://filterProfileService?filter=V	1.12

    <TABLE ID="Values">
        <FIELD ID="time"/>
        <FIELD ID="FilterColumn" FOREIGN="Filters.FilterColumn"/>
        <FIELD ID="flux"/>
    </TABLE>

12313	filterB	0.4
12314	filterV	0.5
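As an illustration only, the two-table layout can be read like a relational join. The Python sketch below uses the rows from the example; the function name `join_on_foreign_key` is hypothetical, and it assumes both tables have already been parsed into lists of dictionaries.

```python
# Rows of the "Filters" table from the example above.
filters = [
    {"name": "B", "FilterColumn": "filterB",
     "linktoFilterProfileService": "http://filterProfileService?filter=B",
     "zeroPoint": 1.32},
    {"name": "V", "FilterColumn": "filterV",
     "linktoFilterProfileService": "http://filterProfileService?filter=V",
     "zeroPoint": 1.12},
]

# Rows of the "Values" table from the example above.
values = [
    {"time": 12313, "FilterColumn": "filterB", "flux": 0.4},
    {"time": 12314, "FilterColumn": "filterV", "flux": 0.5},
]


def join_on_foreign_key(values, filters, key="FilterColumn"):
    """Attach the matching filter row to each measurement row."""
    # The filter table has to be held in memory before the values are read.
    index = {f[key]: f for f in filters}
    joined = []
    for row in values:
        if row[key] not in index:
            raise KeyError(f"no filter row for key {row[key]!r}")
        merged = dict(row)
        merged.update({k: v for k, v in index[row[key]].items() if k != key})
        joined.append(merged)
    return joined


joined = join_on_foreign_key(values, filters)
```

A client that ignores the relation can still read `values` on its own; only a client that wants the filter characterization needs the join.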

I think Francois showed me, many years ago, a very complex VOTable serialization that had something similar (with valid VOTable keys but with some non-standard logic). Maybe Francois could clarify whether something like this was already tried in an experimental way at CDS.

On 2020-01-28 17:18, Laurent Michel wrote:

I understand Ada's motivation to promote this solution, since we still have no recommended TS model. Thus let’s work with what we have. Honestly, I am torn between this pragmatism and the conviction that models remain indispensable, but there is a point on which I agree: we must not get stuck in political considerations.

The current proposal works fine for simple TS, but it has several limitations I would like to point out:
* It is stated nowhere that the VOTable is a TS. I think that, if we have to move toward VOTable 1.5, it would be nice to reserve a little GROUP (or something) to say what is in the VOTable.
* There is no way to clearly identify which TIME column is the independent axis (see below).
* The proposed syntax cannot deal with tables mixing time data with different filters or attached to different sources.
* There is no way to put together data spread out over several tables.

I worked out all of these items with my proposal, so before following this proposal, I would like to make sure it's worth the pain.

Bullets 2, 3 and 4 are consequences of the GROUP approach.
* The links between model and data go from the FIELD to the GROUP. So, to retrieve the TIME column of a TS, we have to look for all FIELDs pointing to TIMESYS and infer which one is the right one.
* GROUP elements or attributes provide no way to filter or to group data by filter, by source ID or whatever. This would be difficult to implement for the same reason as above: the semantic links go from data to annotation.
* GROUPs are attached to one table and cannot see data outside that table.
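The first point can be illustrated with a small Python sketch. The VOTable fragment is invented for the purpose (no XML namespace, made-up FIELD IDs), and `fields_referencing_timesys` is a hypothetical helper; it only shows that following `ref` back to TIMESYS can return more than one candidate column.

```python
import xml.etree.ElementTree as ET

# Invented fragment: two time-like columns both refer to the same TIMESYS.
VOTABLE = """
<VOTABLE>
  <RESOURCE>
    <TIMESYS ID="ts" timeorigin="2455197.5" timescale="TCB" refposition="BARYCENTER"/>
    <TABLE>
      <FIELD ID="obs_time" name="obs_time" datatype="double" unit="d"
             ucd="time.epoch" ref="ts"/>
      <FIELD ID="proc_time" name="proc_time" datatype="double" unit="d"
             ucd="time.processing" ref="ts"/>
      <FIELD ID="flux" name="flux" datatype="double" ucd="phot.flux"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def fields_referencing_timesys(root):
    """Return the IDs of all FIELDs whose ref points at a TIMESYS element."""
    timesys_ids = {t.get("ID") for t in root.iter("TIMESYS") if t.get("ID")}
    return [f.get("ID") for f in root.iter("FIELD") if f.get("ref") in timesys_ids]


candidates = fields_referencing_timesys(ET.fromstring(VOTABLE))
# Two candidates come back; nothing in the ref link alone says which one
# is the independent axis of the time series.
```

Here the client still has to fall back on something else (a UCD, a utype, or a convention) to pick the independent axis.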

Groups do, however, have a big advantage: they have been part of the VOTable standard for almost two decades and all parsers know them. It should be noted that they are used as UTYPE containers rather than as hierarchical structures.

My proposal, namely VODML lite, resolves the issues mentioned here in a compact way, but it is a new syntax that requires new parsers. I wrote two prototypes (Java and Python) but they are not complete enough to say that everything is ready.

My feeling is that it would be a mistake to push the GROUP approach beyond what it has been designed for. Just limit the scope of the note to the simplest cases and leave the door open to something else for more complex situations.

On 2020-01-28 17:26, NEBOT GOMEZ-MORAN Ada (OBS) wrote:

One of the things you (Jesus) mention is referencing rows. I had a look at that too. As I understand it, this is possible according to VOTable Section 4.10. But exactly how it works and how to use it looked relatively complicated to me, since I haven’t seen any example; it relies on a relation between two tables and on using groups in a particular way.

As I understand it, we can define a first table with the filter information and then another table with the information on the key and its possible values (matching those of a row).

After I talked to Dave, Pierre Fernique and Francois about that option, they pointed out to me that it would be more complicated, in particular for applications that combine time series.

That’s the main reason why I chose the several table option. Easier for clients (a priori).

But I agree adding some functionality to be able to select elements with a specific value in a row would be useful and deserves some exploring.

On 2020-01-28 17:49, Jesus Salgado wrote:

I totally overlooked the foreignKey point (!) with the table reference and it is exactly what I was looking for in a valid VOTable. I am not sure why this is not used more often.

It would be interesting to know why Dave, Pierre and Francois think that it would be more complicated as, in principle, any application would only try to read the main results table and only "clever" applications would need to read the filter characterization. I think this could be because you need to save in memory the filters table to use it for the second table (?)

On 2020-01-29 07:55, NEBOT GOMEZ-MORAN Ada (OBS) wrote:

I forgot to answer the question you (Jesus) asked about referencing GROUPs. There are two ways: FIELDref to the columns, or from the column to the GROUP by using ref=ID.

In this proposal I used the second case, in line with how the elements COOSYS and TIMESYS are referenced. This allows the element to be defined externally if we find consensus. This element could be used for SED annotation as well, which is a plus for having it as an element external to a time series. Its scope is broader.

Although there seems to be a solution for referencing rows, it is unclear to me right now how to annotate that, and the timeline for applications to support it is also unclear. Perhaps we should propose the multiple-table option for now. This is in line with what I wrote in the note. We know there is a possible solution, but it looks a bit more complicated.
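To make the two referencing directions concrete, here is a small Python sketch. The table fragment is invented (the column IDs `mag_g` and `mag_r` are hypothetical, and no XML namespace is used); it shows one FIELD linked by a FIELDref inside the GROUP and another linked by `ref` on the FIELD itself.

```python
import xml.etree.ElementTree as ET

# Invented fragment showing both directions for the same GROUP.
TABLE = """
<TABLE>
  <GROUP ID="phot_sys" name="filtersys">
    <PARAM name="magnitudeSystem" datatype="char" arraysize="*" value="Vega"/>
    <FIELDref ref="mag_g"/>
  </GROUP>
  <FIELD ID="mag_g" name="mag_g" datatype="float" unit="mag"/>
  <FIELD ID="mag_r" name="mag_r" datatype="float" unit="mag" ref="phot_sys"/>
</TABLE>
"""


def fields_linked_to_group(table, group_id):
    """Return (IDs linked by FIELDref in the GROUP, IDs linked by ref on the FIELD)."""
    group = next(g for g in table.iter("GROUP") if g.get("ID") == group_id)
    via_fieldref = [fr.get("ref") for fr in group.findall("FIELDref")]
    via_ref = [f.get("ID") for f in table.findall("FIELD") if f.get("ref") == group_id]
    return via_fieldref, via_ref


via_fieldref, via_ref = fields_linked_to_group(ET.fromstring(TABLE), "phot_sys")
```

With FIELDref the GROUP has to live where it can name the columns; with `ref` on the FIELD the annotation can be defined elsewhere, which is the property mentioned above for COOSYS and TIMESYS.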

In any case, I propose we add this point to the discussion for the Tech forum. That would allow others to contribute to the discussion.

Also, I would like to involve a broader community, so I think it would be good to move this Note to GitHub for collaborative work and send an email to the TDIG. This can be done now, unless you see some reason not to. I can add there the issues we have talked about and possible solutions, and then wait to see the reactions.

On 2020-01-29 09:18, Jesus Salgado wrote:

Fully clear. It makes total sense.

On 2020-02-02 23:11, François Bonnarel wrote:

I sent you an example which consists of a couple of changes to Ada's approach.

I added some inputs coming from cab-msd, mainly the fact that a TimeSeries is attached to a source (or a target, but this makes no real difference).

A dummy simple ScalarTimeSeries model is assumed and should be proposed with the note.

On 2020-01-28 18:18, Laurent Michel wrote:

> * It is stated nowhere that the VOTable is a TS. I think that, if we
> have to move toward VOTable 1.5, it would be nice to reserve a little
> GROUP (or something) to say what is in the VOTable.

I added a GROUP derived from cab-msd, Laurent.

> * There is no way to clearly identify which TIME column is the
> independent axis (see below).

An attribute of the ScalarTimeSeries model can say that. It is rendered here by a utype.

> * The proposed syntax cannot deal with table mixing time data with
> different filters or attached to different sources.

Yes, this requires indexing and joins on some columns. See my answer to Jesus tomorrow.

> * The links between model and data go from the FIELD to the GROUP.
> So, to retrieve the TIME column of a TS, we have to look for all
> FIELDs pointing to TIMESYS and infer which one is the right one.

I think the reference to TIMESYS is there to find out the time frame used for the time column. It should not be the only thing used to identify the time column.

> * GROUP elements or attributes provide no way to filter or to group
> data by filter or by source ID or whatever. This would be difficult to
> implement for the same reason as above: semantic links from data to
> annotation.

Yes, this requires an additional indexing mechanism.

> * GROUP are attached to one table and couldn’t see data out of that
> table.

Although it's probably the most common usage, GROUPs can however be defined outside TABLEs and even outside RESOURCEs according to the XML schema. So ....

> My feeling is that it would be a mistake to push the GROUP approach
> beyond what it has been designed for. Just limit the scope of the note
> to simplest cases and let the door open for something else for more
> complex situations.

Yes, I agree. More complex situations, like TimeSeries of objects more complex than a couple of scalar parameters, require your approach.

Notes from DaveMorris made during the ESCAPE WP4 Technology Forum hackathon discussion on Wednesday 5th Feb 17:30 - 19:00.

If we use FILTERSYS, the units would need to be fixed in the specification.
The idea of allowing units as attributes to the FILTERSYS would not work.
The multiple table solution proposed for Gaia data should be replaced by an example using ForeignKey.
The GROUP works, but to avoid confusion don't call the group 'filtersys'.

Describing filters depends on many things.
Using the GROUP mechanism is more flexible than a standardised FILTERSYS element.
MireilleLouys

In VizieR the filter descriptions may be assigned by the archive curators (CDS) rather than by the authors of the original data.
We would need an attribute in the GROUP that indicates who assigned the filter description.
However, the list of PARAM in the GROUP should be fixed.
Leaving it as an open list could become confusing.
GillesLandais

If we can get the foreignKey relationship to work, then this meets the Gaia use case.
JesusSalgado

Aladin would need extra code to understand the foreignKey relationship.
Without it, it would treat them as two separate tables.
If we have just one filter, then having two tables with a foreignKey relationship is extra overhead.
PierreFernique

Relying on UCD and simple utype only works for the simplest cases.
We will very soon need utype from data models.
Need to clearly specify the scope of this proposal.
MireilleLouys

How do we associate a magnitude and its error as a pair?
What if we have two magnitudes and two errors?
We would need an extra GROUP for each pair.
GillesLandais
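A rough Python sketch of the "one GROUP per pair" idea, under stated assumptions: the GROUP names, FIELD IDs and UCD strings are invented for illustration, and `value_error_pairs` is a hypothetical helper that treats the member whose UCD starts with `stat.error` as the error column.

```python
import xml.etree.ElementTree as ET

# Invented fragment: two magnitudes, two errors, one GROUP per pair.
TABLE = """
<TABLE>
  <GROUP name="g_measure">
    <FIELDref ref="mag_g"/>
    <FIELDref ref="err_g"/>
  </GROUP>
  <GROUP name="r_measure">
    <FIELDref ref="mag_r"/>
    <FIELDref ref="err_r"/>
  </GROUP>
  <FIELD ID="time" name="time" datatype="double" ucd="time.epoch"/>
  <FIELD ID="mag_g" name="mag_g" datatype="float" ucd="phot.mag"/>
  <FIELD ID="err_g" name="err_g" datatype="float" ucd="stat.error;phot.mag"/>
  <FIELD ID="mag_r" name="mag_r" datatype="float" ucd="phot.mag"/>
  <FIELD ID="err_r" name="err_r" datatype="float" ucd="stat.error;phot.mag"/>
</TABLE>
"""


def value_error_pairs(table):
    """Map each value column ID to its error column ID, one pair per GROUP."""
    ucd_by_id = {f.get("ID"): f.get("ucd", "") for f in table.findall("FIELD")}
    pairs = {}
    for group in table.findall("GROUP"):
        refs = [fr.get("ref") for fr in group.findall("FIELDref")]
        errors = [r for r in refs if ucd_by_id.get(r, "").startswith("stat.error")]
        values = [r for r in refs if r not in errors]
        # Only accept unambiguous groups: exactly one value and one error.
        if len(values) == 1 and len(errors) == 1:
            pairs[values[0]] = errors[0]
    return pairs


pairs = value_error_pairs(ET.fromstring(TABLE))
```

Without the GROUPs, the two error columns carry identical UCDs and there is nothing to say which magnitude each one belongs to.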

Questions about how we annotate the VOTable with links to data models.
Same questions as before.

Can we do a really simple implementation now, and add links to data models later?
If we really do need a ScalarTimeSeries model, then we need to do that now.
What is cab-msd ?
DaveMorris

We have examples for time and flux, we need examples with position.
MireilleLouys

Examples with source position are here TS-ada-modifFB_Bis.xml
FrancoisBonnarel

Solar System use cases may exchange "source" for "target".
BaptisteCecconi

Notes from MireilleLouys made during the ESCAPE WP4 Technology Forum hackathon discussion on Wednesday 5th Feb 17:30 - 19:00.

Time domain discussion / serialisation proposal by AdaNebot / IVOA note

PierreFernique, FrancoisBonnarel, JesusSalgado, DaveMorris, BaptisteCecconi, MireilleLouys, GillesLandais

DaveMorris: proposed process: discuss this note on the wiki, then work to finalise it on the IVOA GitHub and circulate it on the Time Domain, DAL and DM lists.

COOSYS is a special entity.

FILTERSYS as a new entity: this depends on the definitions of a service (identifiers); the PARAMs describing the filter and PhotCal depend on PhotDMv1-1 and may be updated.

It is better to use the GROUP with a reserved name:

PierreFernique: otherwise each change to FILTERSYS would require an update of VOTable.

MireilleLouys: the GROUP strategy is more flexible.

Two-table strategy: one table for the light curve measurements and one filter table. Each photometric measurement points to a row in the filter table (foreign key). Robust?

GillesLandais: VizieR: the filter is not always part of the original data. It is part of the value added by the curator.

JesusSalgado: the connection between photometric points needs a foreign key that indexes rows. GROUP IDs are available in VOTable to mimic foreign keys. Check the VOTable spec.

Multi-filter case: how to avoid duplication of tables? Two tables: one for the data, with one column pointing to rows in the filter table.

FrancoisBonnarel: check in SIA experiments

Gaia time series:

3 implementation strategies across various archives (ESAC, GAVO, CDS):
1 table per filter: too redundant
one column with the filter ref as a value: attached to the mag column
3 filters as 3 columns

Here the UCD is enough to recognize columns, except in the multiple-filter case: the em band is not always available as a UCD.

t, mag, mag, err, err: column grouping is required to attach the error columns properly.

Where does the position belong?

FrancoisBonnarel's example: added as a PARAM.

NB: if we put it in a FIELD, it is better to jump to the next combination data set: to a table of time series.

Conclusion: could we proceed with this note? All agreed to proceed; we need to experiment further with examples.

Radio, high-energy bands?
Experience with non-optical filters
Radial velocity time series?

and provide more use cases to see the limits

Meeting ended at 18:30

Comment from MarkTaylor - 2020-02-10:

Regarding the foreign key proposal: I've got no objection to storing the information in multiple tables using the relational conventions discussed in VOTable section 4.10. However, it's not likely that topcat would pay much attention to them.

The semantic sense of this relational linkage is to reference from a cell in table A a structured data object (a row in table B). Topcat does not currently have UI for representing structured data objects in table cells, only primitives, Strings and arrays. It would also present difficulties in de/serialisation of such tables, since topcat conceptually treats tables individually rather than in collections.

That might not matter: topcat's not really in a position to do much with filter values anyway. It looks to me like the filter (meta-)data is only going to be useful to a photometry-aware consumer, which I don't think topcat is going to be. The foreign key values would still give topcat enough information to, e.g., plot different filters in different colours. But something like saving a time-series table from topcat to disk, or forwarding it via SAMP to another client, with metadata intact, would be hard to get working.
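As a rough illustration of that last point, the Python sketch below splits rows by the foreign key value alone and assigns one colour per filter, without reading any filter metadata. The third row and the colour names are made up for the example, and `split_by_filter` is a hypothetical helper, not anything topcat provides.

```python
from collections import defaultdict

# (time, foreign key, flux); the third row is invented to give one series two points.
rows = [
    (12313, "filterB", 0.4),
    (12314, "filterV", 0.5),
    (12315, "filterB", 0.45),
]


def split_by_filter(rows):
    """Group (time, flux) points by the value of the foreign key column."""
    series = defaultdict(list)
    for time, filter_key, flux in rows:
        series[filter_key].append((time, flux))
    return dict(series)


series = split_by_filter(rows)

# One colour per distinct key; the colour names are arbitrary placeholders.
palette = ["blue", "orange", "green"]
colours = {key: palette[i % len(palette)] for i, key in enumerate(sorted(series))}
```

The key is treated as an opaque label here, so the plot works even if the client never resolves it against the filter table.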

Associating filter information with column or table metadata would certainly fit topcat's view of the world better than providing it in row-level linkage of multiple tables. But that doesn't necessarily mean it's the right thing to do. It might help to have more concrete views of what behaviour you want or expect from clients based on this filter metadata.

Comments from MireilleLouys - 2020-02-07:

Title : Lightcurve data representation in VOTable: The skinny profile for data and metadata

Abstract: we should mention: This proposal applies to light curves, the 1D-table time series use cases. Time series of higher dimensions (time series of images, spectra, cubes, etc.) will be covered by the general TimeSeries data model based on the Cube DM.

Acknowledgements: This note is inspired by a previous annotation strategy developed for SED (Derrière et al. ...) [14] Derriere, S (2010) Providing Photometric Data Measurements Description in VOTables, IVOA Note, https://wiki.ivoa.net/internal/IVOA/PhotometryDataModel/NOTE-PPDMDesc-0.1-20101202.pdf

p.2 Introduction: "time series of tabular data"? Not clear to me. --> Proposal: simple time series where a set of measures is gathered in a 1D vector for each time stamp, typically light curves or radial velocity curves. Time series of images or arrays with wider dimensions are not covered here.

Sec 2.3, p.4 comments: I think we should insist on the role of FILTERSYS but not set it as a standard. The filter description currently relies on an external service, the SVO profile service, mainly for optical observations. This cannot be a definition in the VOTable standard. Some other filter features might be of interest for other regimes.

I like the "GROUP name='filtersys'" strategy, because it is very flexible and it allows other filter features to be added if necessary for the use case (e.g. an access URL to the transmission curve). It corresponds to a special serialized block from PhotDMv1-1 including the following necessary classes of PhotDMv1-1 with some of their attributes:

    PhotometryFilter
        identifier
        spectral.Location.Value (i.e. effectiveWavelength)
    PhotCal
        MagnitudeSystem.type
        zeroPoint.flux.value

We miss the PhotometricSystem class + PhotometricSystem.type = (0 energy counter, 1 photon counter) --> this can also be a PARAM.

    <PARAM
        name="PhotSystemDesc"
        ucd=""
        utype="phfdm:PhotometricSystem.description"
        unit=""
        datatype="char"
        arraysize="*"
        value="2MASS"
        />

This GROUP for filtersys should have a new utype = "phot:PhotSys" or "phot:PhotCalibration" minted for this specific simple time series use case (photometricPoint is not correct here).

2.4 Points vs time table: this is the measurements part, represented as a TABLE.

One FIELD should have ucd="time.epoch;meta.main" (it represents the independent axis).
One other FIELD will have a UCD specific to the measure.

We need to define profiles to distinguish:

timed trajectory (Position vs time) ucd="pos.xxx" ref='coossys'
lightcurve ucd="phot.xxx" ref='filtersys'
radial velocity curve : ucd=spectral.veloc.xxx ref="??"
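A minimal Python sketch of how a client might apply these profiles, assuming the FIELD UCDs have already been read into a dictionary. The function names are hypothetical. The `pos.` and `phot.` prefixes follow the list above; for radial velocity the sketch assumes the UCD1+ word `spect.dopplerVeloc`, whereas the list writes it in shorthand.

```python
# Profile prefixes, matched case-insensitively against the primary UCD word.
# "spect.dopplerveloc" is an assumption standing in for the shorthand above.
PROFILE_PREFIXES = [
    ("pos.", "timed trajectory"),
    ("phot.", "light curve"),
    ("spect.dopplerveloc", "radial velocity curve"),
]


def find_time_axis(fields):
    """Return the single field tagged time.epoch;meta.main, or None."""
    matches = [
        name for name, ucd in fields.items()
        if {"time.epoch", "meta.main"} <= set(ucd.lower().split(";"))
    ]
    return matches[0] if len(matches) == 1 else None


def classify_profile(fields):
    """Guess the profile from the primary UCD word of the dependent columns."""
    time_axis = find_time_axis(fields)
    if time_axis is None:
        return None  # no unambiguous independent axis
    for name, ucd in fields.items():
        if name == time_axis:
            continue
        primary = ucd.split(";")[0].strip().lower()
        for prefix, profile in PROFILE_PREFIXES:
            if primary.startswith(prefix):
                return profile
    return None


lightcurve = {"t": "time.epoch;meta.main", "mag": "phot.mag;em.opt.V",
              "mag_err": "stat.error;phot.mag"}
trajectory = {"t": "time.epoch;meta.main", "ra": "pos.eq.ra", "dec": "pos.eq.dec"}
no_main = {"t": "time.epoch", "mag": "phot.mag"}
```

With these inputs, `classify_profile(lightcurve)` gives "light curve", `classify_profile(trajectory)` gives "timed trajectory", and `classify_profile(no_main)` gives None because no column carries `meta.main`.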

These UCDs seem enough to distinguish between the 3 profiles, but if necessary the profile can also be specified with a PARAM added to the table. f.i.

For a light curve, there is no position information for one point in our example. Are we sure we always see the exact same position? If yes, the position information can be included in extra columns which refer to COOSYS.

Comments from MireilleLouys - 2020-02-07:

Now I see the predefined structure with groups and fields, to match some representation in the spirit of CAB-MSD.

These are actually the tree leaves of the CAB-MSD tree representation. When we reuse the utypes from the Spectrum DM (to be transformed into the corresponding VODML-ids), then the profile for metadata labels actually fits. The same goes for CharDM.

Utypes for the various time series profiles: we can reuse the Spectrum DM utypes or the corresponding VODML-ids.

    utype=Data.FluxAxis.Location.value with ucd="phot.*"
    utype=Data.FluxAxis.Accuracy.StatError with ucd ="stat.error;phot.*"

    utype=Data.SpatialAxis.Location.refval with ucd="pos.*" for a position point
    utype=Data.SpatialAxis.Accuracy.StatError with ucd="stat.error;pos.*" for position error

    utype=Data.RedshiftAxis.Location.refval with ucd="spectral.doppler.veloc"
    utype=Data.RedshiftAxis.Accuracy.StatError with ucd="stat.error;spectral.doppler.veloc"

the tree here is

   detected object ( Source)
           PhotCalibration
           flux ---> meas:Flux  or spec:Data.Fluxaxis
           time ---> meas:TimeStamp  or char:Data.TimeAxis
           etc ...

For me the mapping strategy appears similar to, and compatible with, the VODML lite strategy. And this note is the first step, for simple data sets ...

Comments from BaptisteCecconi - 2020-02-08:

I agree with MireilleLouys about the proposed new title, since this document really applies to light curves and not to time series in general. The proposal deals only with photometric/magnitude scalar data time series. Any other scalar parameter time series (e.g., any non-photometric derived parameters) would not fit into this.

Concerning Solar System time series, the proposed scheme can be applied to non-resolved (or poorly resolved) telescopic observations of Solar System bodies (mostly small bodies). For other kinds of observations, e.g., in-situ measurements of physical parameters (magnetic field strength, plasma density, wind velocity...), there are 2 places where we need to be more flexible: the COOSYS must be able to refer to any Solar System reference frame (usually referred to by a known name); and the time-varying parameter description, which must be either simplified or extended to be much more flexible.