A new proposal for serialising time series from Ada Nebot, intended to meet a specific, limited set of requirements.
The LaTeX source for the document is currently hosted on GitHub here (in Ada's private space) because this is not an official note yet.
The latest PDF version is here: TimeSeries.pdf
This page provides an initial discussion forum for the proposed idea. We will collect the comments and transfer them into issues on GitHub once we find a permanent home in the main IVOA-std section of GitHub.
To start the discussion I have transferred below a number of comments that we have received via email.
If you have comments about the proposed note please add them to this page.
There will be a hackathon discussion about this proposal on Wednesday 5th Feb 17:30 - 19:00 at the WP4 Technology Forum in Strasbourg.
On 2020-01-24 09:13, Jesus Salgado wrote:
Unfortunately, I will not be able to attend this time but I could connect remotely.
On 2020-01-15 17:06, Mireille LOUYS wrote:
I can contribute too, keeping in mind that this "Lightcurve DM effort" will be a testbed for a wider-scope DM, as proposed by Laurent for CAB-MSD (Component and Association Based Model for Source Data).
On 2020-01-16 11:42, François Bonnarel wrote:
Yes, we can go this way.
We agree to participate.
On 2020-01-24 11:50, Pierre Fernique wrote:
I'm a little bit surprised by the document proposal. My concern is that the document is trying to sell us FILTERSYS as analogous to TIMESYS or COOSYS when it is not the same approach, at least from the XML serialization point of view. TIMESYS / COOSYS are high-level XML entities, without hierarchy, and they impose the units and the vocabulary used inside. In the document, FILTERSYS is an annotation of a GROUP (via its name), potentially hierarchical. This approach bothers me because it confuses the discussion. For me, if we want to use GROUP, then we must not try to draw a wacky analogy with TIMESYS and COOSYS, and we must fully commit to the GROUP approach.
My second comment is about the use of "name" as a "tag" for a GROUP. It gives name a different role in a GROUP than in a FIELD or a PARAM. I think this is a consequence of the GROUP approach.
So I would suggest avoiding the word FILTERSYS if we want to use GROUP.
Or, if we really want FILTERSYS, we must go all the way with something like this (Ada's example):
<FILTERSYS ID="phot_sys" uniqueIdentifier="Palomar/ZTF.g/Vega"
zeroPointFlux="3963.97" magnitudeSystem="Vega"
effectiveWavelength="4722.74" />
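The contrast Pierre draws can be made concrete with a small sketch. The Python below (standard library only, XML namespaces omitted) reads the same filter properties from the FILTERSYS element in Ada's example and from a hypothetical GROUP name="filtersys" built from PARAMs. The GROUP markup and its units (Jy, Angstrom) are illustrative assumptions, not taken from the note. With the element, a client gets bare values whose units would have to be fixed by the specification; with the GROUP, each PARAM can carry its own unit.

```python
import xml.etree.ElementTree as ET

# Ada's proposed FILTERSYS element (attributes copied from the example above).
FILTERSYS_XML = """
<FILTERSYS ID="phot_sys" uniqueIdentifier="Palomar/ZTF.g/Vega"
           zeroPointFlux="3963.97" magnitudeSystem="Vega"
           effectiveWavelength="4722.74"/>
"""

# Hypothetical GROUP equivalent: same values, but each PARAM may carry a unit.
# The units below are illustrative assumptions.
GROUP_XML = """
<GROUP ID="phot_sys" name="filtersys">
  <PARAM name="uniqueIdentifier" datatype="char" arraysize="*" value="Palomar/ZTF.g/Vega"/>
  <PARAM name="zeroPointFlux" datatype="double" unit="Jy" value="3963.97"/>
  <PARAM name="magnitudeSystem" datatype="char" arraysize="*" value="Vega"/>
  <PARAM name="effectiveWavelength" datatype="double" unit="Angstrom" value="4722.74"/>
</GROUP>
"""


def read_filtersys(elem):
    """Element form: (value, unit) pairs; the unit is never in the document."""
    return {k: (v, None) for k, v in elem.attrib.items() if k != "ID"}


def read_filter_group(elem):
    """GROUP form: one PARAM per property, unit taken from the PARAM if present."""
    return {p.get("name"): (p.get("value"), p.get("unit")) for p in elem.findall("PARAM")}


element_meta = read_filtersys(ET.fromstring(FILTERSYS_XML.strip()))
group_meta = read_filter_group(ET.fromstring(GROUP_XML.strip()))

for key in sorted(group_meta):
    print(f"{key}: element={element_meta[key]} group={group_meta[key]}")
```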
On 2020-01-28 15:02, ada nebot wrote:
One of the things I talked about with Dave, and that will be discussed there, is the possibility of replacing the GROUP with a new element to be added to future versions of VOTable. That would simplify annotation, at the expense of imposing units for the described attributes.
On 2020-01-28 16:15, Jesus Salgado wrote:
I have been reading the proposal and it is clear and simple so it is a very good starting point in my view. I think a standard like this (simple in format but powerful in content) will be very useful for the community and for data providers.
The only question I have from the initial reading is whether there is a reason to use "ref" in FIELD instead of FIELDref in the GROUP (they look equivalent, but in most standards the reference goes from the GROUP to the table, and I cannot remember whether there is any difference).
About replacing GROUP: if we can propose a modification of VOTable in some way, it could make sense to explain the problem I found when I tried to serialize the Gaia time series (some of you already know it, but maybe others do not).
We tried to use a table that conceptually looked like this:
time  | filter | wavelength | flux
12313 | B      | 0.123      | 0.4
12314 | V      | 0.234      | 0.5
........................ (random numbers)
How do we annotate this in VOTable? Ideally, in a DB you could use B and V as a foreign key into another table that contains the filter characterization (a reference at row level), but VOTable annotates at column level (references at column level via ref in FIELD or via FIELDref).
First, I proposed to Gaia a multi-table response, one table per filter (quite similar to the second example in Ada's proposal). However, we were not sure whether applications would be able to read it. Also, these time series are difficult to interpret as spectral data.
Maybe, if we are going to promote the multi-table format as a standard, the impact is reduced, as applications would be adapted.
Another option was to use different columns, with empty values where applicable:
(this is horrible)
The last option is to add the characterization annotation to every row:
(maybe there are more options, but these were the ones discussed, more or less)
The last one is quite a verbose way to express time series (I have added only one extra value, but the table grows more and more with more columns).
It works in the sense that new points can easily be added like a stream, it can be seen as a time series or a spectrum without too much effort (so it can be opened by spectral applications), and it does not contain empty values. The worst part is that it does not make use of any of the good VOTable annotation features... it is just a table.
After discussion, that last format was conceptually the approach we followed, as it at least allows users to open it in several applications.
Now, if we could consider modifying VOTable, I think the most valuable change would be the possibility to annotate at row level, like a foreign key.
This is a tricky change and it would need careful thought, but for the particular case of time series, it would conceptually be something like:
<GROUP>
<ROWRef ref="FilterColumn" key="filterB"/>
... all the characterization metadata for filter B ...
</GROUP>
<GROUP>
<ROWRef ref="FilterColumn" key="filterV"/>
... all the characterization metadata for filter V ...
</GROUP>
<FIELD ID="time">..</FIELD>
<FIELD ID="FilterColumn">..</FIELD>
<FIELD ID="flux">..</FIELD>
12313 | filterB | 0.4
12314 | filterV | 0.5
(or even from one table to another, avoiding GROUPs entirely, exactly like a foreign key)
<TABLE ID="Filters">
<FIELD ID="name">..</FIELD>
<FIELD ID="FilterColumn">..</FIELD>
<FIELD ID="linktoFilterProfileService">..</FIELD>
<FIELD ID="zeroPoint">..</FIELD>
</TABLE>
<TABLE ID="Values">
<FIELD ID="time">..</FIELD>
<FIELD ID="FilterColumn" FOREIGN="Filters.FilterColumn">..</FIELD>
<FIELD ID="flux">..</FIELD>
</TABLE>
12313 | filterB | 0.4
12314 | filterV | 0.5
I think Francois showed me, many years ago, a very complex VOTable serialization that had something similar (with valid VOTable keys but with some non-standard logic). Maybe Francois could clarify whether something like this was already tried experimentally at CDS.
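As a rough illustration of the foreign-key idea (not of the VOTable Section 4.10 syntax, which is more involved), the plain-Python sketch below joins rows of the hypothetical Values table to the hypothetical Filters table on FilterColumn, the way a database would. The filter rows and zeroPoint numbers are dummy placeholders, in the spirit of the "random numbers" above. Only a "clever" client that builds this lookup gets the filter characterization; a simple client can still read the Values rows on their own.

```python
# Dummy rows of the hypothetical "Filters" table (placeholder values).
FILTERS = [
    {"name": "B", "FilterColumn": "filterB", "zeroPoint": 1.0},
    {"name": "V", "FilterColumn": "filterV", "zeroPoint": 2.0},
]

# Rows of the "Values" table: (time, foreign key into Filters, flux).
VALUES = [(12313, "filterB", 0.4), (12314, "filterV", 0.5)]


def index_by_key(rows, key):
    """Build the key -> row lookup a client would keep in memory."""
    index = {}
    for row in rows:
        if row[key] in index:
            raise ValueError(f"duplicate key {row[key]!r}")
        index[row[key]] = row
    return index


def resolve(values, filters):
    """Attach the referenced Filters row to each measurement."""
    index = index_by_key(filters, "FilterColumn")
    resolved = []
    for time, fkey, flux in values:
        if fkey not in index:
            raise KeyError(f"dangling foreign key {fkey!r}")
        resolved.append({"time": time, "flux": flux, "filter": index[fkey]})
    return resolved


for point in resolve(VALUES, FILTERS):
    print(point["time"], point["flux"], point["filter"]["name"], point["filter"]["zeroPoint"])
```

Keeping the Filters lookup in memory is exactly the extra cost Jesus suspects later in the thread; the Values rows themselves stay readable without it.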
On 2020-01-28 17:18, Laurent Michel wrote:
I understand Ada's motivation to promote this solution, since we still have no recommended TS model. Thus let's work with what we have. Honestly, I am torn between this pragmatism and the conviction that models remain indispensable, but there is one point on which I agree: we must not get stuck in political considerations.
The current proposal works fine for simple TS, but it has several limitations I would like to point out:
* It is stated nowhere that the VOTable is a TS. I think that, if we have to move toward VOTable 1.5, it would be nice to reserve a little GROUP (or something) to say what is in the VOTable.
* There is no way to clearly identify which TIME column is the independent axis (see below).
* The proposed syntax cannot deal with tables mixing time data with different filters or attached to different sources.
* There is no way to put together data spread out over several tables.
My proposal addresses all of these items, so before following this proposal, I would like to make sure it's worth the pain.
Bullets 2, 3 and 4 are consequences of the GROUP approach:
* The links between model and data go from the FIELD to the GROUP. So, to retrieve the TIME column of a TS, we have to look for all FIELDs pointing to TIMESYS and infer which one is the right one.
* GROUP elements or attributes provide no way to filter or to group data by filter, by source ID or whatever. This would be difficult to implement for the same reason as above: semantic links from data to annotation.
* GROUPs are attached to one table and cannot see data outside that table.
Groups do, however, have a big advantage: they have been part of the VOTable standard for decades (almost two), and all parsers know them. It should be noted that they are used more as UTYPE containers than as hierarchical structures.
My proposal, namely VODML lite, resolves the issues mentioned here in a compact way, but it is a new syntax that requires new parsers. I wrote two prototypes (Java and Python), but they are not complete enough to say that everything is ready.
My feeling is that it would be a mistake to push the GROUP approach beyond what it was designed for. Just limit the scope of the note to the simplest cases and leave the door open for something else in more complex situations.
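Laurent's point about locating the TIME column can be sketched in a few lines. The Python below (standard library only, XML namespaces omitted) uses a hypothetical table in which two FIELDs reference the same TIMESYS; following the ref alone returns both, so some other signal is needed to pick the independent axis. Using the meta.main UCD word as that tie-breaker is an assumption made for this sketch, not something the proposal specifies.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: two time columns share one TIMESYS.
VOTABLE = """
<VOTABLE>
  <RESOURCE>
    <TIMESYS ID="time_frame" timeorigin="2400000.5" timescale="TCB" refposition="BARYCENTER"/>
    <TABLE>
      <FIELD name="obs_time" datatype="double" unit="d" ucd="time.epoch;meta.main" ref="time_frame"/>
      <FIELD name="proc_time" datatype="double" unit="d" ucd="time.epoch" ref="time_frame"/>
      <FIELD name="mag" datatype="double" unit="mag" ucd="phot.mag"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def time_columns(root):
    """All FIELDs whose ref points at some TIMESYS element."""
    timesys_ids = {t.get("ID") for t in root.iter("TIMESYS")}
    return [f for f in root.iter("FIELD") if f.get("ref") in timesys_ids]


def independent_axis(root):
    """Pick the independent time axis, using meta.main to break ties."""
    candidates = time_columns(root)
    if len(candidates) == 1:
        return candidates[0]
    # Following ref alone is ambiguous here; look at the UCD words instead.
    main = [f for f in candidates if "meta.main" in (f.get("ucd") or "").split(";")]
    if len(main) != 1:
        raise ValueError("cannot infer the independent time axis")
    return main[0]


root = ET.fromstring(VOTABLE.strip())
print([f.get("name") for f in time_columns(root)])
print(independent_axis(root).get("name"))
```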
On 2020-01-28 17:26, NEBOT GOMEZ-MORAN Ada (OBS) wrote:
One of the things you (Jesus) mention is referencing rows. I had a look at that too. As I understand it, this is possible according to VOTable Section 4.10. However, exactly how it works and how to use it looked relatively complicated to me, since I haven't seen any example; it relies on a relation between two tables and on using GROUPs in a particular way.
As I understand it, we can define a first table with the filter information and then another table with the information on the key and its possible values (matching those of a row).
After I talked to Dave, Pierre Fernique and Francois about that option, they pointed out that it would be more complicated, in particular for applications combining time series.
That’s the main reason why I chose the several table option. Easier for clients (a priori).
But I agree adding some functionality to be able to select elements with a specific value in a row would be useful and deserves some exploring.
On 2020-01-28 17:49, Jesus Salgado wrote:
I totally overlooked the foreignKey point (!) with the table reference and it is exactly what I was looking for in a valid VOTable. I am not sure why this is not used more often.
It would be interesting to know why Dave, Pierre and Francois think it would be more complicated since, in principle, any application would only try to read the main results table and only "clever" applications would need to read the filter characterization. I think this could be because you need to keep the filter table in memory to use it with the second table (?)
On 2020-01-29 07:55, NEBOT GOMEZ-MORAN Ada (OBS) wrote:
I forgot to answer the question you (Jesus) asked about referencing GROUPs. There are two ways: a FIELDref inside the GROUP pointing to the columns, or a reference from the column to the GROUP using ref=ID.
In this proposal I used the second, consistent with how the COOSYS and TIMESYS elements are referenced. This allows the element to be defined externally if we find consensus. This element could be used for SED annotation as well, which is a plus for having it as an element external to the time series. Its scope is broader.
Although there seems to be a solution for referencing rows, it is currently unclear to me how to annotate it, and the timeline for applications to support it is also unclear. Perhaps we should propose the multiple-table option for now. This is in line with what I wrote in the note. We know there is a possible solution, but it looks a bit more complicated.
In any case, I propose we add this point to the discussion for the Tech forum. That would allow others to contribute to the discussion.
Also, I would like to involve a broader community, so I think it would be good to move this Note to GitHub for collaborative work and send an email to the TDIG. This can be done now, unless you see some reason not to. I can add there the issues we have talked about and possible solutions, and then wait to see reactions.
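The two linking styles Ada describes can both be resolved by a client. The hypothetical Python sketch below (standard library only, XML namespaces omitted) collects the FIELDs associated with a GROUP ID, following ref attributes from FIELD to GROUP and FIELDref elements from GROUP to FIELD. The element IDs and names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical table using both linking styles for one GROUP:
# mag_g points at the GROUP with ref (as COOSYS/TIMESYS are referenced),
# mag_r is listed by a FIELDref inside the GROUP.
VOTABLE = """
<TABLE>
  <GROUP ID="phot_sys" name="filtersys">
    <FIELDref ref="mag_r"/>
  </GROUP>
  <FIELD ID="mag_g" name="mag_g" datatype="double" ref="phot_sys"/>
  <FIELD ID="mag_r" name="mag_r" datatype="double"/>
  <FIELD ID="time" name="time" datatype="double"/>
</TABLE>
"""


def fields_for_group(table, group_id):
    """IDs of FIELDs linked to the GROUP, whichever direction the link goes."""
    fields = {f.get("ID"): f for f in table.iter("FIELD")}
    group = next((g for g in table.iter("GROUP") if g.get("ID") == group_id), None)
    if group is None:
        raise KeyError(group_id)
    # Way 1: FIELD -> GROUP through the FIELD's ref attribute.
    linked = {fid for fid, f in fields.items() if f.get("ref") == group_id}
    # Way 2: GROUP -> FIELD through FIELDref children.
    linked |= {fr.get("ref") for fr in group.iter("FIELDref") if fr.get("ref") in fields}
    return sorted(linked)


table = ET.fromstring(VOTABLE.strip())
print(fields_for_group(table, "phot_sys"))
```

A client has to implement both directions to be safe, which is part of why the choice between them matters for interoperability.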
On 2020-01-29 09:18, Jesus Salgado wrote:
Fully clear. It makes total sense.
On 2020-02-02 23:11, François Bonnarel wrote:
I sent you an example that makes a couple of changes within Ada's approach.
I included some input from CAB-MSD, mainly the fact that a TimeSeries is attached to a source (or a target, but this makes no real difference).
A dummy, simple ScalarTimeSeries model is assumed and should be proposed with the note.
On 28/01/2020 at 18:18, Laurent Michel wrote:
> * It is stated nowhere that the VOTable is a TS. I think that, if we have to move toward VOTable 1.5, it would be nice to reserve a little GROUP (or something) to say what is in the VOTable.
I added a GROUP derived from Laurent's CAB-MSD.
> * There is no way to clearly identify which TIME column is the independent axis (see below).
An attribute of the ScalarTimeSeries model can say that. It is rendered here by a utype.
> * The proposed syntax cannot deal with tables mixing time data with different filters or attached to different sources.
Yes, this requires indexing and joins on some columns. See my answer to Jesus tomorrow.
> * The links between model and data go from the FIELD to the GROUP. So, to retrieve the TIME column of a TS, we have to look for all FIELDs pointing to TIMESYS and infer which one is the right one.
I think the reference to TIMESYS is there to find out the time frame used for the time column. It should not be the only way to identify the time column.
> * GROUP elements or attributes provide no way to filter or to group data by filter, by source ID or whatever. This would be difficult to implement for the same reason as above: semantic links from data to annotation.
Yes, this requires an additional indexing mechanism.
> * GROUPs are attached to one table and cannot see data outside that table.
Although that is probably the most common usage, GROUPs can be defined outside TABLEs and even outside RESOURCEs according to the XML schema. So...
> My feeling is that it would be a mistake to push the GROUP approach beyond what it was designed for. Just limit the scope of the note to the simplest cases and leave the door open for something else in more complex situations.
Yes, I agree. More complex situations, like TimeSeries of objects more complex than a couple of scalar parameters, require your approach.
Notes from DaveMorris made during the ESCAPE WP4 Technology Forum hackathon discussion on Wednesday 5th Feb 17:30 - 19:00.
- If we use FILTERSYS, the units would need to be fixed in the specification.
- The idea of allowing units as attributes of the FILTERSYS would not work.
- The multiple table solution proposed for Gaia data should be replaced by an example using ForeignKey.
- The GROUP works, but to avoid confusion don't call the group 'filtersys'.
- Describing filters depends on many things.
- Using the GROUP mechanism is more flexible than a standardised FILTERSYS element.
- MireilleLouys
- In VizieR, the filter descriptions may be assigned by the archive curators (CDS) rather than by the authors of the original data.
- We would need an attribute in the GROUP that indicates who assigned the filter description.
- However, the list of PARAMs in the GROUP should be fixed.
- Leaving it as an open list could become confusing.
- GillesLandais
- If we can get the foreignKey relationship to work, then this meets the Gaia use case.
- JesusSalgado
- Aladin would need extra code to understand the foreignKey relationship.
- Without it, it would treat them as two separate tables.
- If we have just one filter, then having two tables with a foreignKey relationship is extra overhead.
- PierreFernique
- Relying on UCD and simple utype only works for the simplest cases.
- We will very soon need utypes from data models.
- Need to clearly specify the scope of this proposal.
- MireilleLouys
- How do we associate a magnitude and its error as a pair?
- What if we have two magnitudes and two errors?
- Would need an extra GROUP for each pair.
- GillesLandais
- Questions about how we annotate the VOTable with links to data models.
- Same questions as before.
- Can we do a really simple implementation now, and add links to data models later?
- If we really do need a ScalarTimeSeries model, then we need to do that now.
- What is cab-msd?
- DaveMorris
- We have examples for time and flux, we need examples with position.
- MireilleLouys
Notes from MireilleLouys made during the ESCAPE WP4 Technology Forum hackathon discussion on Wednesday 5th Feb 17:30 - 19:00.
Time domain discussion / serialisation proposal by AdaNebot / IVOA note
Participants: PierreFernique, FrancoisBonnarel, JesusSalgado, DaveMorris, BaptisteCecconi, MireilleLouys, GillesLandais
DaveMorris: proposed process: discuss this note on the wiki, then work to finalise it on the IVOA GitHub and circulate it on the Time Domain, DAL and DM lists.
- COOSYS is a special entity
- FILTERSYS as a new entity: this depends on the definitions of a service (identifiers); the PARAMs describing the filter and PhotCal depend on PhotDMv1-1 and may be updated
It is better to use the GROUP with a reserved name:
PierreFernique: otherwise each change to FILTERSYS would require an update of VOTable
MireilleLouys: the GROUP strategy is more flexible
Two-table strategy: one table for the light curve measurements and one filter table.
Each photometric measurement points to a row in the filter table (foreign key). Robust?
GillesLandais: VizieR: the filter is not always part of the original data; it is part of the value added by the curator.
JesusSalgado: the connection between photometric points needs a foreign key to index rows.
GROUP IDs are available in VOTable to mimic foreign keys; check the VOTable spec.
Multi-filter case: how to avoid duplicating tables?
Two tables: one for the data, with one column pointing to rows in the filter table.
FrancoisBonnarel: check in SIA experiments
Gaia time series:
- 3 implementation strategies across various archives (ESAC, GAVO, CDS):
- 1 table per filter: too redundant
- one column with the filter reference as a value: attached to the mag column
- 3 filters as 3 columns
Here the UCD is enough to recognize columns, except in the multi-filter case: the em band is not always available as a UCD.
With columns t, mag, mag, err, err, column grouping is required to attach the error columns properly.
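The "t mag mag err err" case can be illustrated with a short sketch. In the hypothetical Python example below (standard library only, XML namespaces omitted), both error columns share the UCD stat.error;phot.mag, so UCDs alone cannot say which error belongs to which magnitude; one GROUP of FIELDrefs per value/error pair resolves this. The GROUP name "measurement" and the column names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical 5-column table: t, mag, mag, err, err.
# Each value/error pair is tied together by its own GROUP of FIELDrefs.
VOTABLE = """
<TABLE>
  <FIELD ID="t" name="t" datatype="double" ucd="time.epoch;meta.main"/>
  <FIELD ID="mag1" name="mag1" datatype="double" ucd="phot.mag"/>
  <FIELD ID="mag2" name="mag2" datatype="double" ucd="phot.mag"/>
  <FIELD ID="err1" name="err1" datatype="double" ucd="stat.error;phot.mag"/>
  <FIELD ID="err2" name="err2" datatype="double" ucd="stat.error;phot.mag"/>
  <GROUP name="measurement"><FIELDref ref="mag1"/><FIELDref ref="err1"/></GROUP>
  <GROUP name="measurement"><FIELDref ref="mag2"/><FIELDref ref="err2"/></GROUP>
</TABLE>
"""


def value_error_pairs(table):
    """Return (value_id, error_id) for each GROUP holding exactly one of each."""
    ucd = {f.get("ID"): f.get("ucd") or "" for f in table.iter("FIELD")}
    pairs = []
    for group in table.iter("GROUP"):
        refs = [fr.get("ref") for fr in group.iter("FIELDref")]
        errors = [r for r in refs if ucd[r].startswith("stat.error")]
        values = [r for r in refs if r not in errors]
        if len(values) == 1 and len(errors) == 1:
            pairs.append((values[0], errors[0]))
    return pairs


table = ET.fromstring(VOTABLE.strip())
# By UCD alone, both error columns match both magnitudes:
print([f.get("name") for f in table.iter("FIELD") if f.get("ucd") == "stat.error;phot.mag"])
print(value_error_pairs(table))
```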
Where does the position belong?
FrancoisBonnarel example: added as a PARAM.
NB: if we put it in a FIELD, it is better to move on to the next kind of combined data set: a table of time series.
Conclusion: can we proceed with this note?
All agreed to proceed; we need to experiment further with examples:
- radio and high-energy bands?
- experience with non-optical filters
- radial velocity time series?
and provide more use cases to see the limits.
Meeting ended at 18:30
Comment from MarkTaylor - 2020-02-10:
Regarding the foreign key proposal: I've got no objection to storing the information in multiple tables using the relational conventions discussed in VOTable section 4.10. However, it's not likely that topcat would pay much attention to them.
The semantic sense of this relational linkage is to reference from a cell in table A a structured data object (a row in table B). Topcat does not currently have UI for representing structured data objects in table cells, only primitives, Strings and arrays. It would also present difficulties in de/serialisation of such tables, since topcat conceptually treats tables individually rather than in collections.
That might not matter: topcat's not really in a position to do much with filter values anyway. It looks to me like the filter (meta-)data is only going to be useful to a photometry-aware consumer, which I don't think topcat is going to be. The foreign key values would still give topcat enough information to, e.g., plot different filters in different colours. But something like saving a time-series table from topcat to disk, or forwarding it via SAMP to another client, with metadata intact, would be hard to get working.
Associating filter information with column or table metadata would certainly fit topcat's view of the world better than providing it in row-level linkage of multiple tables. But that doesn't necessarily mean it's the right thing to do. It might help to have more concrete views of what behaviour you want or expect from clients based on this filter metadata.
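Mark's remark that the foreign key values alone would let a client plot different filters in different colours amounts to partitioning rows on the key column, without ever reading the filter table. A minimal Python sketch, using the data rows from Jesus's example plus one dummy row (the third row is invented for illustration):

```python
from collections import defaultdict

# (time, filter key, flux); the third row is a dummy value for illustration.
ROWS = [(12313, "filterB", 0.4), (12314, "filterV", 0.5), (12315, "filterB", 0.45)]


def split_by_key(rows, key_index=1):
    """Partition rows by the foreign key value alone, without the filter table."""
    series = defaultdict(list)
    for row in rows:
        series[row[key_index]].append(row)
    return dict(series)


for key, rows in sorted(split_by_key(ROWS).items()):
    # A plotting client could assign one colour per key here.
    print(key, [(t, flux) for t, _, flux in rows])
```

Nothing here needs the referenced filter rows, which matches Mark's point that the photometric metadata only matters to a photometry-aware consumer.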
Comments from MireilleLouys - 2020-02-07:
Title: Lightcurve data representation in VOTable: The skinny profile for data and metadata
Abstract: we should mention:
This proposal applies to light curves, i.e. the 1D-table time series use cases.
Time series of higher dimensions (time series of images, spectra, cubes, etc.) will be covered by the general TimeSeries data model based on the Cube DM.
Acknowledgements:
This note is inspired by a previous annotation strategy developed for SED (Derrière et al. ...)
[14] Derrière, S. (2010), Providing Photometric Data Measurements Description in VOTables, IVOA Note, https://wiki.ivoa.net/internal/IVOA/PhotometryDataModel/NOTE-PPDMDesc-0.1-20101202.pdf
p.2 Introduction
"time series of tabular data"? Not clear to me.
--> proposal:
Simple time series where a set of measurements is gathered in a 1D vector for each time stamp, typically light curves or radial velocity curves.
Time series of images or arrays with more dimensions are not covered here.
Sec 2.3, p. 4, comments:
I think we should insist on the role of FILTERSYS but not make it a standard.
The filter description currently relies on an external service, the SVO profile service, mainly for optical observations. This cannot be a definition in the VOTable standard.
Some other filter features might be of interest for other regimes.
I like the "GROUP name='filtersys'" strategy, because it is very flexible and allows other filter features to be added if necessary for the use case (e.g. an access URL to the transmission curve).
It corresponds to a special serialized block from PhotDMv1-1, including the following necessary classes of PhotDMv1-1 with some of their attributes:
- PhotometryFilter: identifier, spectral.Location.Value (i.e. effectiveWavelength)
- PhotCal: MagnitudeSystem.type, zeroPoint.flux.value
We are missing the PhotometricSystem class:
+ PhotometricSystem.type = (0: energy counter, 1: photon counter) --> can also be a PARAM
<PARAM
name="PhotSystemDesc"
ucd=""
utype="phfdm:PhotometricSystem.description"
unit=""
datatype="char"
arraysize="*"
value="2MASS"
/>
This GROUP for filtersys should have a new utype, "phot:PhotSys" or "phot:PhotCalibration", minted for this specific simple time series use case (photometricPoint is not correct here).
2.4 Points vs time Table
These are the measurement part, represented in a TABLE.
- one FIELD should have ucd="time.epoch;meta.main" (it represents the independent axis)
- another FIELD will have a UCD specific to the measurement.
We need to define profiles to distinguish:
- timed trajectory (position vs time): ucd="pos.xxx" ref="coosys"
- light curve: ucd="phot.xxx" ref="filtersys"
- radial velocity curve: ucd="spectral.veloc.xxx" ref="??"
These UCDs seem enough to distinguish between the 3 profiles, but if necessary the profile can also be specified with a PARAM added to the table.
For instance, for a light curve, there is no position information for individual points in our example.
Are we sure we always observe exactly the same position?
If so, the position information can be included in extra columns that refer to COOSYS.
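Mireille's suggestion that UCDs suffice to tell the profiles apart can be sketched as a small classifier. The Python below is hypothetical: it assumes exactly one time column with time.epoch;meta.main, ignores stat.error columns, and maps the leading UCD atom of the remaining columns to a profile. Mireille's "xxx" placeholders are replaced here by real UCD1+ words (for example spect.dopplerVeloc), which is an assumption about how they would be filled in.

```python
# Hypothetical mapping from the leading UCD atom of the measured column to a profile.
PROFILES = {"pos": "timed trajectory", "phot": "light curve", "spect": "radial velocity curve"}


def ucd_words(ucd):
    """Split a UCD string like 'stat.error;phot.mag' into its words."""
    return [w.strip() for w in (ucd or "").split(";") if w.strip()]


def classify(fields):
    """fields: list of (name, ucd) pairs; returns the profile name."""
    times = [name for name, ucd in fields
             if ucd_words(ucd)[:1] == ["time.epoch"] and "meta.main" in ucd_words(ucd)]
    if len(times) != 1:
        raise ValueError("need exactly one time.epoch;meta.main column")
    found = set()
    for name, ucd in fields:
        words = ucd_words(ucd)
        if name in times or not words or words[0] == "stat.error":
            continue  # skip the time axis and the error columns
        atom = words[0].split(".")[0]
        if atom in PROFILES:
            found.add(PROFILES[atom])
    if len(found) != 1:
        raise ValueError(f"ambiguous or unknown profile: {sorted(found)}")
    return found.pop()


print(classify([("t", "time.epoch;meta.main"), ("mag", "phot.mag;em.opt.V"),
                ("mag_err", "stat.error;phot.mag")]))
print(classify([("t", "time.epoch;meta.main"), ("rv", "spect.dopplerVeloc")]))
```

A light curve that also carries per-point position columns comes out as ambiguous here, which is the situation where an explicit PARAM naming the profile would help.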
Comments from MireilleLouys - 2020-02-07:
Now I see the predefined structure with GROUPs and FIELDs, matching a representation in the spirit of CAB-MSD.
These are actually the leaves of the CAB-MSD tree representation.
When we reuse the utypes from the Spectrum DM (to be transformed into the corresponding VODML IDs), the profile for metadata labels actually fits.
The same holds for CharDM.
Utypes for the various time series profiles: we can reuse the Spectrum DM utypes, or the corresponding VODML IDs:
utype=Data.FluxAxis.Location.value with ucd="phot.*"
utype=Data.FluxAxis.Accuracy.StatError with ucd ="stat.error;phot.*"
utype=Data.SpatialAxis.Location.refval with ucd="pos.*" for a position point
utype=Data.SpatialAxis.Accuracy.StatError with ucd="stat.error;pos.*" for position error
utype=Data.RedshiftAxis.Location.refval with ucd="spectral.doppler.veloc"
utype=Data.RedshiftAxis.Accuracy.StatError with ucd="stat.error;spectral.doppler.veloc"
The tree here is:
detected object (Source)
  PhotCalibration
  flux ---> meas:Flux or spec:Data.FluxAxis
  time ---> meas:TimeStamp or char:Data.TimeAxis
  etc.
To me, the mapping strategy appears similar to and compatible with the VODML lite strategy.
And this note is the first step for simple data sets...
Comments from BaptisteCecconi - 2020-02-08:
I agree with MireilleLouys about the proposed new title, since this document really applies to light curves and not to time series in general. The proposal deals only with photometric/magnitude scalar time series. Any other scalar parameter time series (e.g., any non-photometric derived parameters) would not fit into it.
Concerning Solar System time series, the proposed scheme can be applied to unresolved (or poorly resolved) telescopic observations of Solar System bodies (mostly small bodies). For other kinds of observations, e.g., in-situ measurements of physical parameters (magnetic field strength, plasma density, wind velocity...), there are 2 places where we need to be more flexible: the COOSYS must be able to refer to any Solar System reference frame (usually referred to by a known name), and the description of the time-varying parameter must be either simplified or extended to be much more flexible.