TimeSeries 2020 proposal

A new proposal for serialising time series from Ada Nebot, intended to meet a specific, limited set of requirements. The LaTeX source for the document is currently hosted on GitHub here (in Ada's private space) because this is not an official note yet. The latest PDF version is here: TimeSeries.pdf

This page provides an initial discussion forum for the proposed idea. We will collect the comments together and transfer them into issues on GitHub once we find a permanent home in the main IVOA-std section of GitHub.

To start the discussion I have transferred below a number of comments that we have received via email. If you have comments about the proposed note please add them to this page.

There will be a hackathon discussion about this proposal on Wednesday 5th Feb 17:30 - 19:00 at the WP4 Technology Forum in Strasbourg.

On 2020-01-24 09:13, Jesus Salgado wrote:
Unfortunately, I will not be able to attend this time but I could connect remotely.

On 2020-01-15 17:06, Mireille LOUYS wrote:
I can contribute too, with the focus in mind that this "Lightcurve DM effort" will be a testbed for a wider scope DM as proposed by Laurent for CAB-MSD (Component and Association Based Model for Source Data).

On 2020-01-16 11:42, François Bonnarel wrote:
Yes. We can go this way. We agree to participate.

On 2020-01-24 11:50, Pierre Fernique wrote:
I'm a little bit surprised by the document proposal. My concern is that the document is trying to sell us FILTERSYS as an equivalent of TIMESYS or COOSYS when it is not the same approach, at least from the XML serialization point of view. TIMESYS / COOSYS are high level XML entities, without hierarchy, and they impose the units and the vocabulary used inside. In the document, FILTERSYS is an annotation of a GROUP (via name), potentially hierarchical. This approach bothers me because it confuses the discussion.

For me, if we want to use GROUP then we must not try to force a shaky similarity with TIMESYS and COOSYS, and we should fully commit to the GROUP approach.

My second comment is about the use of "name" as the "tagging" of a GROUP. It gives name a different role in a GROUP than it has in a FIELD or in a PARAM. I think this is a consequence of the GROUP approach.

So, I would suggest avoiding the word FILTERSYS if we want to use GROUP. Or, if we really want FILTERSYS, we must commit to something like this (Ada's example):

<FILTERSYS ID="phot_sys" uniqueIdentifier="Palomar/ZTF.g/Vega" zeroPointFlux="3963.97" magnitudeSystem="Vega" effectiveWavelength="4722.74" />

On 2020-01-28 15:02, ada nebot wrote:
One of the things that I talked about with Dave, and that will be discussed there, is the possibility of substituting the GROUP with a new element to be added to future versions of VOTable. That would simplify annotation, at the expense of imposing units for the described attributes.

On 2020-01-28 16:15, Jesus Salgado wrote:
I have been reading the proposal and it is clear and simple, so it is a very good starting point in my view. I think a standard like this (simple in format but powerful in content) will be very useful for the community and for data providers.

The only question I have from the initial reading is whether there is a reason to use "ref" in FIELD instead of FIELDref in the GROUP (it looks the same, but in most of the standards the reference goes from the group to the table and I cannot remember if there is any difference).

About the substitution of GROUP: if we can propose a modification of VOTable in some way, it could make sense to explain the problem I found when I tried to serialize the Gaia time series (some of you already know it but, maybe, others do not). We tried to use a table that conceptually was like:
<GROUP>
  <ROWRef="FilterColumn" key="filterB"/>
  ... all the characterization metadata for filter B ...
</GROUP>
<GROUP>
  <ROWRef="FilterColumn" key="filterV"/>
  ... all the characterization metadata for filter V ...
</GROUP>
<FIELD ID=time>..</FIELD>
<FIELD ID=FilterColumn>..</FIELD>
<FIELD ID=flux>..</FIELD>

<TABLE ID=Filters>
  <FIELD ID=name>..</FIELD>
  <FIELD ID=FilterColumn>..</FIELD>
  <FIELD ID=linktoFilterProfileService>..</FIELD>
  <FIELD ID=zeroPoint>..</FIELD>
</TABLE>

<TABLE ID=Values>
  <FIELD ID=time>
  <FIELD ID=FilterColumn FOREIGN=Filters.FilterColumn>
  <FIELD ID=flux>
</TABLE>
On 2020-01-28 17:18, Laurent Michel wrote:
I understand Ada's motivation for promoting this solution, since we still have no recommended TS model. Thus let's work with what we have. Honestly, I am torn between this pragmatism and the conviction that models remain indispensable, but there is a point on which I agree, and that is that we must not get stuck in political considerations.

The current proposal works fine for simple TS but it has several limitations I would like to point out:

* It is stated nowhere that the VOTable is a TS. I think that, if we have to move toward VOTable 1.5, it would be nice to reserve a little GROUP (or something) to say what is in the VOTable.
* There is no way to clearly identify which TIME column is the independent axis (see below).
* The proposed syntax cannot deal with tables mixing time data with different filters or attached to different sources.
* There is no way to put together data spread out over several tables.

I worked out all of these items with my proposal, so before following the proposal, I would like to make sure it's worth the pain. Bullets 2, 3, 4 are consequences of the GROUP approach:

* The links between model and data go from the FIELD to the GROUP. So, to retrieve the TIME column of a TS, we have to look for all FIELDs pointing to TIMESYS and infer which one is the right one.
* GROUP elements or attributes provide no way to filter or to group data by filter, by source ID or whatever. This would be difficult to implement for the same reason as above: the semantic links go from data to annotation.
* GROUPs are attached to one table and cannot see data outside that table.

GROUPs have however a big advantage: they have been part of the VOTable standard for decades (almost 2) and all parsers know them. It should be noted that they are used as UTYPE containers rather than as hierarchical structures.

My proposal, namely VODML lite, resolves the issues mentioned here in a compact way, but this is a new syntax that requires new parsers. I wrote two prototypes (Java and Python) but they are not complete enough to say that everything is ready.

My feeling is that it would be a mistake to push the GROUP approach beyond what it has been designed for. Just limit the scope of the note to the simplest cases and leave the door open for something else for more complex situations.

On 2020-01-28 17:26, NEBOT GOMEZ-MORAN Ada (OBS) wrote:
One of the things you (Jesus) mention is referencing rows. I had a look at that too. As I understand it, this is possible according to VOTable Section 4.10. But the exact way this works and how to use it looked relatively complicated to me, since I haven't seen any example; it relies on a relation between two tables and on using groups in a particular way. As I understand it, we can define a first table with the info on the filters and then another table with the info on the key and possible values (matching those of a row).

After I talked to Dave, Pierre Fernique and Francois about that option, they pointed out to me that it would be more complicated, in particular for applications combining time series. That's the main reason why I chose the several-table option: easier for clients (a priori). But I agree that adding some functionality to select elements with a specific value in a row would be useful and deserves some exploring.

On 2020-01-28 17:49, Jesus Salgado wrote:
I totally overlooked the foreignKey point (!) with the table reference, and it is exactly what I was looking for in a valid VOTable. I am not sure why this is not used more often.

It would be interesting to know why Dave, Pierre and Francois think that it would be more complicated as, in principle, any application would only try to read the main results table and only "clever" applications would need to read the filter characterization. I think this could be because you need to keep the filters table in memory to use it for the second table (?)
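As a rough illustration of the point about keeping the filters table in memory, the sketch below joins a values table to a filters table on a shared key column. It does not read VOTable at all: the rows are plain Python dictionaries, the column names are borrowed from the conceptual tables earlier in the thread, and the numbers are made-up placeholders, so this only shows the shape of the work a "clever" client would have to do.

```python
# Minimal sketch of a foreign-key join between a Values table and a Filters
# table. All rows and numbers are illustrative placeholders, not real data.

def index_filters(filter_rows, key="FilterColumn"):
    """Build an in-memory lookup from key value to filter row."""
    return {row[key]: row for row in filter_rows}


def join_values_with_filters(value_rows, filter_rows, key="FilterColumn"):
    """Attach the matching filter row (or None) to every value row."""
    lookup = index_filters(filter_rows, key)
    joined = []
    for row in value_rows:
        merged = dict(row)  # copy so the input rows are left untouched
        merged["filter"] = lookup.get(row[key])
        joined.append(merged)
    return joined


# Hypothetical content for the two tables.
filters = [
    {"FilterColumn": "filterB", "name": "B", "zeroPoint": 1.0},
    {"FilterColumn": "filterV", "name": "V", "zeroPoint": 2.0},
]
values = [
    {"time": 1.0, "FilterColumn": "filterB", "flux": 10.0},
    {"time": 2.0, "FilterColumn": "filterV", "flux": 20.0},
    {"time": 3.0, "FilterColumn": "filterR", "flux": 30.0},  # no such filter
]

joined = join_values_with_filters(values, filters)
```

Only the small filters table needs to be held in memory; the values rows can be processed one at a time. A client that ignores the filters table can still read the values table unchanged.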
On 2020-01-29 07:55, NEBOT GOMEZ-MORAN Ada (OBS) wrote:
I forgot to answer the question you (Jesus) asked about referencing GROUPs. There are two ways: a FIELDref from the GROUP to the columns, or a reference from the column to the GROUP using ref=ID. In this proposal I used the second, in agreement with how the elements COOSYS and TIMESYS are referenced. This allows the element to be defined externally if we find consensus. This element could be used for SED annotation as well, which is a plus for having it as an element external to a time series. Its scope is broader.

Although there seems to be a solution for referencing rows, it is unclear to me right now how to annotate that, and the timeline for applications to handle it is also unclear. Perhaps we should propose the multiple-table option for now. This is in line with what I wrote in the note. We know there is a possible solution, but it looks a bit more complicated. In any case, I propose we add this point to the discussion for the Tech forum. That would allow others to contribute to the discussion.

Also, I would like to involve a broader community, so I think it would be good to move this Note to GitHub for collaborative work and send an email to the TDIG. This can be done now, unless you see some reason why not to. I can add there the issues we have talked about and possible solutions, and then wait to see reactions.

On 2020-01-29 09:18, Jesus Salgado wrote:
Fully clear. It makes total sense.

On 2020-02-02 23:11, François Bonnarel wrote:
I sent you an example which is made of a couple of changes inside Ada's approach. I added some inputs coming from cab-msd, mainly the fact that a TimeSeries is attached to a source (or a target, but this makes no real difference). A dummy simple ScalarTimeSeries model is assumed and should be proposed with the note.

On 28/01/2020 at 18:18, Laurent Michel wrote:
> * It is stated nowhere that the VOTable is a TS. I think that, if we
> have to move toward VOTable 1.5, it would be nice to reserve a little
> GROUP (or something) to say what is in the VOTable.

I added a GROUP derived from cab-msd, Laurent.

> * There is no way to clearly identify which TIME column is the
> independent axis (see below).

An attribute of the ScalarTimeSeries model can say that. It is rendered here by a utype.

> * The proposed syntax cannot deal with tables mixing time data with
> different filters or attached to different sources.

Yes, this requires indexing and joins on some columns. See my answer to Jesus tomorrow.

> * The links between model and data go from the FIELD to the GROUP.
> So, to retrieve the TIME column of a TS, we have to look for all
> FIELDs pointing to TIMESYS and infer which one is the right one.

I think the reference to TIMESYS is there to find out the time frame used for the time column. It should not be the only thing that identifies the time column.

> * GROUP elements or attributes provide no way to filter or to group
> data by filter, by source ID or whatever. This would be difficult to
> implement for the same reason as above: the semantic links go from data to
> annotation.

Yes, this requires an additional indexing mechanism.

> * GROUPs are attached to one table and cannot see data outside that
> table.

Although it is probably the most common usage, GROUPs can however be defined outside TABLEs and even outside RESOURCEs according to the XML schema. So ....

> My feeling is that it would be a mistake to push the GROUP approach
> beyond what it has been designed for. Just limit the scope of the note
> to the simplest cases and leave the door open for something else for more
> complex situations.

Yes, I agree. More complex situations, like TimeSeries of objects more complex than a couple of scalar parameters, require your approach.

Notes from DaveMorris made during the ESCAPE WP4 Technology Forum hackathon discussion on Wednesday 5th Feb 17:30 - 19:00.
Notes from MireilleLouys made during the ESCAPE WP4 Technology Forum hackathon discussion on Wednesday 5th Feb 17:30 - 19:00.

Time domain discussion / serialisation proposal by AdaNebot / IVOA note

PierreFernique, FrancoisBonnarel, JesusSalgado, DaveMorris, BaptisteCecconi, MireilleLouys, GillesLandais

DaveMorris: proposed process: discuss this note on the wiki, then work to finalise it on the IVOA GitHub and circulate it on the time domain, DAL and DM lists.
Comment from MarkTaylor - 2020-02-10:

Regarding the foreign key proposal: I've got no objection to storing the information in multiple tables using the relational conventions discussed in VOTable section 4.10. However, it's not likely that topcat would pay much attention to them. The semantic sense of this relational linkage is to reference from a cell in table A a structured data object (a row in table B). Topcat does not currently have UI for representing structured data objects in table cells, only primitives, Strings and arrays. It would also present difficulties in de/serialisation of such tables, since topcat conceptually treats tables individually rather than in collections.

That might not matter: topcat's not really in a position to do much with filter values anyway. It looks to me like the filter (meta-)data is only going to be useful to a photometry-aware consumer, which I don't think topcat is going to be. The foreign key values would still give topcat enough information to, e.g., plot different filters in different colours. But something like saving a time-series table from topcat to disk, or forwarding it via SAMP to another client, with metadata intact, would be hard to get working.

Associating filter information with column or table metadata would certainly fit topcat's view of the world better than providing it in row-level linkage of multiple tables. But that doesn't necessarily mean it's the right thing to do. It might help to have more concrete views of what behaviour you want or expect from clients based on this filter metadata.
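To make the "plot different filters in different colours" case concrete, here is a minimal sketch of what a client that is not photometry-aware could still do with the key column alone. It is not topcat code; the row layout and the column names (time, flux, FilterColumn) are assumptions taken from the conceptual tables earlier on this page, and the numbers are placeholders.

```python
from collections import defaultdict

# Minimal sketch: split a mixed-filter table into one series per key value,
# using only the foreign key column and no filter metadata at all.

def split_by_filter(rows, key="FilterColumn"):
    """Return {key value: [(time, flux), ...]} preserving row order."""
    series = defaultdict(list)
    for row in rows:
        series[row[key]].append((row["time"], row["flux"]))
    return dict(series)


# Hypothetical rows of a values table mixing two filters.
rows = [
    {"time": 1.0, "FilterColumn": "filterB", "flux": 10.0},
    {"time": 2.0, "FilterColumn": "filterV", "flux": 20.0},
    {"time": 3.0, "FilterColumn": "filterB", "flux": 11.0},
]

series = split_by_filter(rows)
```

Each entry of series can then be handed to a plotting layer as a separate data set. Nothing here needs the zero point or effective wavelength, which is consistent with the remark that those only matter to a photometry-aware consumer.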
Comments from MireilleLouys - 2020-02-07:
Title :
Lightcurve data representation in VOTable:
The skinny profile for data and metadata
Abstract :
we should mention:
This proposal applies to light curves, the 1D table time series use-cases.
Time series of higher dimensions (time series for images, spectra, cubes, etc.) will be covered by the TimeSeries general data model based on the Cube DM.
Acknowledgements:
This note is inspired by a previous annotation strategy developed for SED
(Derrière et al. ...)
[14] Derriere, S (2010) Providing Photometric Data Measurements
Description in VOTables, IVOA Note,
https://wiki.ivoa.net/internal/IVOA/PhotometryDataModel/NOTE-PPDMDesc-0.1-20101202.pdf
p.2 Introduction
time series of tabular data ? not clear to me.
--> proposal
Simple time series where a set of measures are gathered in a 1D vector for each time stamp, typically light curves or radial velocity curves.
Time series of images or arrays with wider dimensions are not covered here.
Sec 2.3 p 4
comments:
I think we should insist on the role of FILTERSYS but not set it as a standard.
The filter description currently relies on an external service, the SVO profile service, mainly for optical observations. This cannot be a definition in the VOTable standard.
Some other filter features might be of interest for other regimes.
I like the "GROUP name ='filtersys'" strategy, because it is very flexible and it allows other filter features to be added if necessary for the use-case (e.g. access URL to the transmission curve).
It corresponds to a special serialized block from PhotDM including the following necessary classes of PhotDM with some of their attributes:
PhotometryFilter: identifier, spectral.Location.Value (i.e. effectiveWavelength)
PhotCal: MagnitudeSystem.type, zeroPoint.flux.value
We miss the PhotometricSystem class:
+PhotometricSystem.type = (0 energy counter, 1 photon counter) --> can also be a PARAM
<PARAM name="PhotSystemDesc" ucd="" utype="phfdm:PhotometricSystem.description" unit="" datatype="char" arraysize="*" value="2MASS" />
This GROUP for filtersys should have a new utype = "phot:PhotSys" or "phot:PhotCalibration" minted for this specific simple time series use-case
(photometricPoint is not correct here)
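The "GROUP name ='filtersys'" strategy, combined with a ref=ID link from the column to the GROUP, could be consumed by a client roughly as sketched below. This is only an assumption-laden illustration: the fragment is a hand-written, namespace-free stand-in for a VOTable, the PARAM names are borrowed from the attribute names in the FILTERSYS example quoted earlier on this page, and filtersys_for_fields is a hypothetical helper, not part of any library. A real VOTable carries an XML namespace that the lookups would need to account for.

```python
import xml.etree.ElementTree as ET

# Hand-written, namespace-free fragment (an assumption for illustration only).
VOTABLE_FRAGMENT = """
<TABLE>
  <GROUP ID="phot_sys" name="filtersys">
    <PARAM name="uniqueIdentifier" datatype="char" arraysize="*" value="Palomar/ZTF.g/Vega"/>
    <PARAM name="zeroPointFlux" datatype="double" value="3963.97"/>
    <PARAM name="magnitudeSystem" datatype="char" arraysize="*" value="Vega"/>
    <PARAM name="effectiveWavelength" datatype="double" value="4722.74"/>
  </GROUP>
  <FIELD name="time" datatype="double"/>
  <FIELD name="mag" datatype="double" ref="phot_sys"/>
</TABLE>
"""


def filtersys_for_fields(xml_text):
    """Map each FIELD name to the filtersys PARAMs its ref attribute points to."""
    root = ET.fromstring(xml_text.strip())
    # Collect every GROUP tagged name="filtersys", keyed by its ID.
    groups = {}
    for group in root.iter("GROUP"):
        if group.get("name") == "filtersys" and group.get("ID"):
            groups[group.get("ID")] = {
                param.get("name"): param.get("value")
                for param in group.findall("PARAM")
            }
    # Follow the link from each FIELD to its GROUP, if any.
    result = {}
    for field in root.iter("FIELD"):
        ref = field.get("ref")
        if ref in groups:
            result[field.get("name")] = groups[ref]
    return result


annotated = filtersys_for_fields(VOTABLE_FRAGMENT)
```

Because extra PARAMs are simply picked up by name, adding a further filter feature (for example a transmission curve URL) would not require changing this reader, which is the flexibility argued for above.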
2.4 Points vs time Table
These are the measurements, represented as in a TABLE.
For a light curve, there is no position information for one point in our example.
Are we sure we always see the exact same position?
If yes, the position information can be included in extra columns which refer to COOSYS.
Comments from MireilleLouys - 2020-02-07:

Now I see the predefined structure with groups and fields, to match some representation in the spirit of cab-MSD. These are actually the tree leaves of the CAB-MSD tree representation. When we reuse the utypes from Spectrum DM (to be transformed into the corresponding VODML-ids), the profile for metadata labels actually fits. Same for CharDM.

Utypes for the various time series profiles:
DM: we can reuse the SpectrumDM utypes / or corresponding VODML ids

utype=Data.FluxAxis.Location.value with ucd="phot.*"
utype=Data.FluxAxis.Accuracy.StatError with ucd="stat.error;phot.*"
utype=Data.SpatialAxis.Location.refval with ucd="pos.*" for a position point
utype=Data.SpatialAxis.Accuracy.StatError with ucd="stat.error;pos.*" for position error
utype=Data.RedshiftAxis.Location.refval with ucd="spectral.doppler.veloc"
utype=Data.RedshiftAxis.Accuracy.StatError with ucd="stat.error;spectral.doppler.veloc"

The tree here is:
detected object (Source)
PhotCalibration
flux ---> meas:Flux or spec:Data.Fluxaxis
time ---> meas:TimeStamp or char:Data.TimeAxis
etc ...

For me the mapping strategy appears similar to and compatible with the vodmlite strategy. And this note is the first step for simple data sets ...
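The utype and UCD pairs listed above can be read as a small lookup table. The sketch below is one possible reading, under the assumption that the trailing ".*" in a UCD is meant as a wildcard and can be treated as a shell-style glob; the function name utype_for_ucd and the test UCDs are illustrative and not part of the note.

```python
from fnmatch import fnmatchcase

# Utype / UCD pattern pairs copied from the list above. Treating "*" as a
# glob wildcard is an assumption made for this sketch.
UTYPE_BY_UCD_PATTERN = {
    "phot.*": "Data.FluxAxis.Location.value",
    "stat.error;phot.*": "Data.FluxAxis.Accuracy.StatError",
    "pos.*": "Data.SpatialAxis.Location.refval",
    "stat.error;pos.*": "Data.SpatialAxis.Accuracy.StatError",
    "spectral.doppler.veloc": "Data.RedshiftAxis.Location.refval",
    "stat.error;spectral.doppler.veloc": "Data.RedshiftAxis.Accuracy.StatError",
}


def utype_for_ucd(ucd):
    """Return the first utype whose UCD pattern matches, or None."""
    for pattern, utype in UTYPE_BY_UCD_PATTERN.items():
        if fnmatchcase(ucd, pattern):
            return utype
    return None


flux_utype = utype_for_ucd("phot.mag;em.opt.V")
pos_err_utype = utype_for_ucd("stat.error;pos.eq.ra")
unknown_utype = utype_for_ucd("time.epoch")
```

Since the error patterns start with "stat.error;" they never collide with the plain "phot.*" or "pos.*" patterns, so the order of the entries does not affect the result for this particular list.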