(r3) TimeSeries2020 < IVOA

IVOA Web>IvoaVOEvent>TimeSeries2020 (revision 3)~~EditAttach~~

TimeSeries 2020 proposal

A new proposal for serialising time series from Ada Nebot, intended to meet a specific limited set of requirements.

The LaTex source for the document is currently hosted on GitHub here (in Ada's private space) because this is not an official note yet.

The latest PDF version is here TimeSeries.pdf

This page is to provide an initial discussion forum about the proposed idea. We will collect the comments together and transfer them into issues on GitHub once we find a permanent home in the main IVOA-std section of GitHub.

To start the discussion I have transferred below a number of comments that we have received via email.

If you have comments about the proposed note please add them to this page.

There will be a hackathon discussion about this proposal on Wednesday 5th Feb 17:30 - 19:00 at the WP4 Technology Forum in Strasbourg.

On 2020-01-24 09:13, Jesus Salgado wrote:

Unfortunately, I will not be able to attend this time but I could connect remotely.

On 2020-01-15 17:06, Mireille LOUYS wrote:

I can contribute too, with the focus in mind that this "Lightcurve DM effort" will be a testbed for a wider scope DM as proposed by Laurent for CAB-MSD ( Component and association based- Model for source data).

On 2020-01-16 11:42, François Bonnarel wrote:

Yes. We can go this way

We agree to participate.

On 2020-01-24 11:50, Pierre Fernique wrote:

I'm a little bit surprised with the document proposal. My concern is that the document is trying to sell us FILTERSYS for TIMESYS or COOSYS when it is not the same approach - at least on the serialization XML point of view. TIMESYS / COOSYS are high level XML entities, without hierarchy, and they impose the units and the vocabulary used inside. In the document, FILTERSYS is an annotation of a GROUP (via name), potentially hierarchical. This approach bothers me because it confuses the discussion. For me, if we want to do GROUP then we must not try to make a wacky similarity with TIMESYS and COOSYS and fully assume the GROUP approach.

My second comment is the use of "name" as "tagging" of a GROUP. It gives a different role from the name when used in a GROUP compared to its use in a FIELD or in a PARAM. I think this is a consequence of the GROUP approach.

So, I would suggest to avoid FILTERSYS word if we want to use GROUP.

Or if we really want FILTERSYS, we must assume something like this (ADA example) :

    <FILTERSYS ID="phot_sys" uniqueIdentifier="Palomar/ZTF.g/Vega" 
           zeroPointFlux="3963.97" magnitudeSystem="Vega" 
           effectiveWavelength="4722.74" />

On 2020-01-28 15:02, ada nebot wrote:

One of the things that I talked with Dave, and that will be discussed there, is the possibility of substituting the GROUP for a new element to be added to future versions on VOTable. That would simplify annotation, at the expense of imposing units for the described attributes.

On 2020-01-28 16:15, Jesus Salgado wrote:

I have been reading the proposal and it is clear and simple so it is a very good starting point in my view. I think a standard like this (simple in format but powerful in content) will be very useful for the community and for data providers.

The only question I have from the initial reading is if there is a reason to use "ref" in FIELD instead of FIELDref in the GROUP (it looks the same but in most of the standards the ref was done from the group to the table and I cannot remember if there is any difference)

About the substitution of GROUP, if we can propose a modification of VOTable in some way it could make sense to explain the problem I found when I tried to serialize the Gaia time series (some of you already know it but, maybe, others not).

We tried to use a table that conceptually was like:

time	filter	wavelength	flux
12313	B	0.123	0.4
12314	V	0.234	0.5

........................ (random numbers)

How to annotate this in VOTable? Ideally, in a DB you could use B and V as a foreign key to another table that contains filters characterization (reference at row level) but VOTable annotates at column level (reference at column level by ref in FIELD or FIELDref)

First thing, I proposed to Gaia a multi-table response, one per filter. (quite similar to second example in Ada's proposal) However, we were not sure if any application should be able to read it. Also, these time series are difficult to be interpreted as spectral data

Maybe, if we are going to promote the multi-table format as a standard, the impact is reduced as applications would be adapted.

Another option was to use different columns, with empty values where applicable:

time	filterB	WavelenghtB	filterV	WavelenghtV	flux
12313	B	0.123	null	null	0.4
12314	null	null	V	0.234	0.5

(this is horrible)

And last option is to add in every row the characterization annotation:

time	filter	wavelenght	flux	linktoFilterProfileService
12313	B	0.123	0.4	http://filterProfileService?filter=B
12314	V	0.234	0.5	http://filterProfileService?filter=V

(maybe there are more options but these were the ones discussed more or less)

Last one is a quite verbose way to express time series (I have put only one extra value but the table grows more and more for more columns).

It works in the way that new points can be easily added like a stream, it can be seen as a a time series or a spectrum without too much effort (so it can be opened by spectral applications) and it does not contain empty values. The worst part is that it does not make use of any of the good VOTable annotation features... it is just a table.

After discussions, that last format was conceptually the approach we followed as it allows users, at least, to open it in several applications.

Now, if we could consider the possibility to modify VOTable, I think the more valuable change would be the possibility to annotate at row level, like a foreign key.

This is a tricky change and it would need to be thought carefully but, for the particular case of times series, conceptually it would be something like:

    <GROUP>
        <ROWRef="FilterColumn" key="filterB"/>
        ... all the characterization metadata for filter B ...
    </GROUP>
    <GROUP>
        <ROWRef="FilterColumn" key="filterV"/>
        ... all the characterization metadata for filter V ...
    </GROUP>

    <FIELD ID=time>..</FIELD>
    <FIELD ID=FilterColumn>..</FIELD>
    <FIELD ID=flux>..</FIELD>

12313	filterB	0.4
12314	filterV	0.5

(or, even, from one table to another preventing the use of groups, totally like a foreign key)

    <TABLE ID=Filters>
        <FIELD ID=name>..</FIELD>
        <FIELD ID=FilterColumn>..</FIELD>
        <FIELD ID=linktoFilterProfileService>..</FIELD>
        <FIELD ID=zeroPoint>..</FIELD>
    </TABLE>

B	filterB	http://filterProfileService?filter=B	1.32
V	filterV	http://filterProfileService?filter=V	1.12

    <TABLE ID=Values>
        <FIELD ID=time>
        <FIELD ID=FilterColumn FOREIGN=Filters.FilterColumn>
        <FIELD ID=flux>
    </TABLE>

12313	filterB	0.4
12314	filterV	0.5

I think Francois showed me a lot of years ago a very complex VOTable serialization that had something similar (with valid VOTable keys but with some non-standard logic). Maybe Francois could clarify is something like this was already tried in an experimental way at CDS.

On 2020-01-28 17:18, Laurent Michel wrote:

I understand the motivation of Ada to promote this solution since we still have no recommended TS model. Thus let’s work with what we have. Honestly, I am balanced between this pragmatism and the conviction that models remain indispensable, but there a point on which I agree, and that is that we must not get stuck in political considerations.

The current proposal works fine for simple TS but it has several limitations I would like to point out: * It is stated nowhere that the VOTable is a TS. I think that, if we have to move toward VOTable 1.5, it would be nice to reserve a little GROUP (or something) to say what is in the VOTable. * There is no way to clearly identify wich the TIME column is the independent axis (see bellow). * The proposed syntax cannot deal with table mixing time data with different filters or attached to different sources. * There no way to put together data spread out on several tables.

I worked out all of these items with my proposal, so before to follow the proposal, I would like to make sure it's worth the pain.

Bullets 2, 3, 4 are consequences of the GROUP approach. * The links between model and data goes from the FIELD to the GROUP. So, to retrieve the TIME column of a TS, we have to look for all FIELDS pointing to TIMSYS and to infer which one the good one. * GROUP éléments or attributes provide no way to filter or to group data by filter or by source ID or whatever. This would be difficult to implement for the same reason as above: semantic links from data to annotation. * GROUP are attached to one table and couldn’t see data out of that table.

Groups have however a big advantage, they are parts of the Votable standards for decades (almost 2) and all parsers know them. It is to noted that they are rather used as UTYPE containers than as hierarchical structure.

My proposal, namely VODML lite, resolves the issues mentioned here in a compact way but this is a new syntax that requires new parsers. I wrote two prototypes (Java and Python) but this not completed enough to say that everything is ready.

My feeling is that it would be a mistake to push the GROUP approach beyond what it has been designed for. Just limit the scope of the note to simplest cases and let the door open for something else for more complex situations.

On 2020-01-28 17:26, NEBOT GOMEZ-MORAN Ada (OBS) wrote:

One of the things you (Jesus) mention is referencing rows. I had a look at that too. As I understand this is possible according to VOTable Section 4.10. But the exact way this works and how to use this looked relatively complicated to me since I haven’t seen any example, but it relies on relation between two tables and using groups in a particular way.

As I understand we can define a first table with the info of filters and then another table with the info on the key and possible values (matching those of a row).

After talking to Dave, Pierre Fernique and Francois about that option they pointed to me this option would be more complicated. In particular for applications to combine time series.

That’s the main reason why I chose the several table option. Easier for clients (a priori).

But I agree adding some functionality to be able to select elements with a specific value in a row would be useful and deserves some exploring.

On 2020-01-28 17:49, Jesus Salgado wrote:

I totally overlooked the foreignKey point (!) with the table reference and it is exactly what I was looking for in a valid VOTable. I am not sure why this is not used more often.

It would be interesting to know why Dave, Pierre and Francois think that it would be more complicated as, in principle, any application would only try to read the main results table and only "clever" applications would need to read the filter characterization. I think this could be because you need to save in memory the filters table to use it for the second table (?)

On 2020-01-29 07:55, NEBOT GOMEZ-MORAN Ada (OBS) wrote:

I forgot to answer the question you (Jesus) asked about referencing GROUPs. There are two ways: FIELDref to the columns, or from the column to the GROUP by using ref=ID.

In this proposal I used the second case in agreement to how the elements COOSYS and TIMESYS are referenced. This allows the element to be defined externally if we find consensus. This element could be used for SED annotation as well, which is a plus for having it as external element of a time series. It’s scope is broader.

Although there seems to be a solution for referencing rows, it is unclear right now for me how to annotate that and unclear the time line for applications to work around it. Perhaps we should propose this multiple table for now. This is in line with what I wrote in the note. We know there is a possible solution, but it looks a bit more complicated.

In any case, I propose we add this point to the discussion for the Tech forum. That would allow others to contribute to the discussion.

Also, I would like to involve a broader community so I think it would be good to move this Note to GitHub for a collaborative work and send an email to the TDIG. This can be done now, unless you see some reason why not to. I can add there the issues we have talked about, and possible solutions and then wait to see reactions.

On 2020-01-29 09:18, Jesus Salgado wrote:

Fully clear. It makes total sense.

On 2020-02-02 23:11, François Bonnarel wrote:

I sent you an example which is made of a couple of changes inside Ada's approach.

I made some inputs coming from cab-msd, mainly the fact that a TimeSeries is attached to a source (or a target, bu this makes bo real difference)

A dummy simple ScalarTimeSeries model is assumed and should be proposed with the note

Le 28/01/2020 à 18:18, Laurent Michel a écrit :

> * It is stated nowhere that the VOTable is a TS. I think that, if we
> have to move toward VOTable 1.5, it would be nice to reserve a little
> GROUP (or something) to say what is in the VOTable.

I added a GROUP derived from cabb-msd Laurent.

> * There is no way to clearly identify wich the TIME column is the
> independent axis (see bellow).

An attribute of the ScalarTimeSeries model can say that. It is rendered here by a utype

> * The proposed syntax cannot deal with table mixing time data with
> different filters or attached to different sources.

Yes this requires indexation and joints on some columns. See my answer to Jesus tommorrow.

> * The links between model and data goes from the FIELD to the GROUP.
> So, to retrieve the TIME column of a TS, we have to look for all
> FIELDS pointing to TIMSYS and to infer which one the good one.

I think reference to TIMESYS is to find out the Time frame used for the time column. It should not be the only thing to identify the time column.

> * GROUP éléments or attributes provide no way to filter or to group
> data by filter or by source ID or whatever. This would be difficult to
> implement for the same reason as above: semantic links from data to
> annotation.

Yes, this requires additional indexing mechanism.

> * GROUP are attached to one table and couldn’t see data out of that
> table.

Although it's probably the most common usage, GROUPS can however be defined outside TABLES and even outside RESOUCES according to the xml schema. So ....

> My feeling is that it would be a mistake to push the GROUP approach
> beyond what it has been designed for. Just limit the scope of the note
> to simplest cases and let the door open for something else for more
> complex situations.

Yes I agree . More complex situations like TimeSeries of objects more complex than a couple of scalar parameters require your approach.