DataLink-1.0 Next

This topic collects proposals for modifications of the DataLink-1.0 specification in order to improve the next revision of the specification.

Errata to the DataLink-1.0 recommendation can be found on the devoted DataLink-1.0 Errata page.

Possible Errata

The following are acknowledged mistakes in DataLink-1.0. Errata could be pushed through, or they could just get fixed at the next version.

  • sec 4.3 'GROUP name="input"' should read 'GROUP name="inputParams"'. See mailing list (at the end of that message).

  • Bibliography entry [4] points to the wrong place, should be RFC2045(?). See mailing list.

Implementation feedback from TOPCAT (-- FrancoisBonnarel - 2020-05-13)

These points were (mostly) taken from a presentation in Victoria 2018, written up here as requested by the DAL chair. They can be taken into account when preparing a subsequent version of the standard, though not all of them necessarily should lead to changes.

Service Descriptor Context
Service descriptors come in two different contexts:
  1. {links}-response: explicitly referenced from a row of a {links}-response table
  2. standalone: implicitly referenced from all rows of a generic VOTable

They have to be treated differently in the different cases (neither is a special case of the other), since in the standalone case the service descriptor applies equally to all rows, while in the {links}-response case it only applies to those rows from which it is explicitly referenced. This makes it somewhat complicated to handle them since you need to determine the context first, but there's probably nothing that can be done about that while maintaining backward compatibility. However, it would be useful to spell out this distinction in the document; it took me quite a while to work it out. See mailing list.
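As an illustration of the context check described above, here is a minimal Python sketch. It assumes that a service descriptor is a RESOURCE with type="meta" and utype="adhoc:service", and that a {links}-response table can be recognised by the presence of a service_def column; the function names are hypothetical and the recognition rule is a simplification rather than anything prescribed by the standard.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return an element tag without its XML namespace prefix."""
    return tag.rsplit("}", 1)[-1]


def descriptor_scope(votable_xml):
    """Map each service descriptor ID to "all" (standalone context) or to
    the list of {links}-response row indices that reference it."""
    root = ET.fromstring(votable_xml)
    ids = [
        r.get("ID")
        for r in root.iter()
        if local(r.tag) == "RESOURCE"
        and r.get("type") == "meta"
        and r.get("utype") == "adhoc:service"
    ]
    refs = {}            # descriptor ID -> indices of referencing rows
    links_table = False  # set once a service_def column is found
    for table in root.iter():
        if local(table.tag) != "TABLE":
            continue
        names = [f.get("name") for f in table if local(f.tag) == "FIELD"]
        if "service_def" not in names:
            continue
        links_table = True
        col = names.index("service_def")
        rows = [tr for tr in table.iter() if local(tr.tag) == "TR"]
        for i, tr in enumerate(rows):
            cells = [td for td in tr if local(td.tag) == "TD"]
            ref = (cells[col].text or "").strip()
            if ref:
                refs.setdefault(ref, []).append(i)
    if not links_table:
        # Assumption: no service_def column means the standalone context.
        return {d: "all" for d in ids}
    return {d: refs.get(d, []) for d in ids}
```

A client would have to run a check of this kind before deciding how to present each descriptor, which is the complication noted above.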

Standalone Example
The document discusses {links}-response documents in some detail, and implies but does not give much explicit discussion of Service Descriptors in standalone VOTable documents. I think an additional example with a title like "Per-row service reference" along the lines of what is generated by the ARI-Gaia TAP service (see example) would help. In some sense the same ground is covered by the existing examples 4.2 "Service descriptor for the {links} capability" and 4.5 "Custom Access Data Service". However, given the context of the document, 4.2 looks like it is specific to {links} services (though the pattern isn't really) and 4.5 looks a bit daunting. I personally think that decorating generic VOTables with service descriptors to indicate associated parameter-less per-row services in this way is one of the most useful things about the DataLink document.

Service Descriptor metadata
If you have a standalone service descriptor in a generic VOTable document, clients will typically need more metadata than the accessURL, standardID and resourceIdentifier discussed in sec 4.1 and shown in the existing examples, to communicate to the user what's going to happen if they follow the implied link. This is especially true if there are multiple such service descriptors per table. A name and description of such services can be included by using the name attribute and DESCRIPTION child of the service descriptor RESOURCE element. That is permitted given the VOTable schema, but not mentioned in this document. I suggest including such usages in the examples given here, and encouraging service descriptors to add these items where appropriate. I further suggest adding an (optional) name="contentType" PARAM alongside the existing ones in table 3 to supply MIME type where known.
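To make the suggestion concrete, the following Python sketch builds a service descriptor RESOURCE carrying a name attribute, a DESCRIPTION child and the suggested contentType PARAM. The helper name, the example URL and the example values are hypothetical, and contentType is only the PARAM proposed above, not part of DataLink-1.0.

```python
import xml.etree.ElementTree as ET


def build_descriptor(name, description, access_url, content_type=None):
    """Build a service descriptor RESOURCE with human-readable metadata."""
    res = ET.Element(
        "RESOURCE", {"type": "meta", "utype": "adhoc:service", "name": name}
    )
    # DESCRIPTION goes first, as the VOTable schema requires.
    ET.SubElement(res, "DESCRIPTION").text = description
    params = {"accessURL": access_url}
    if content_type is not None:
        # Proposed optional PARAM; an assumption, not in DataLink-1.0.
        params["contentType"] = content_type
    for pname, value in params.items():
        ET.SubElement(
            res,
            "PARAM",
            {"name": pname, "datatype": "char", "arraysize": "*", "value": value},
        )
    return res


descriptor = build_descriptor(
    "preview", "Quick-look JPEG of the source",
    "http://example.org/preview", "image/jpeg",
)
print(ET.tostring(descriptor, encoding="unicode"))
```

A client can then show the name and description to the user before the link is followed, instead of a bare URL.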

DataLink Recognition
A pattern suggested by the document (and used, e.g. by the ESA gaiadr2.gaia_source catalogue) is to include a column in certain tables that contains a URL pointing to a {links}-response table corresponding to that row. But there is no way in VOTable to mark up such a column so that clients know that's what it is. Not sure what to do about this; it's actually a more general problem about marking up URL-bearing VOTable columns with content types. See mailing list.

Service Descriptor positioning
There is no prescription in the document for how to arrange service descriptor and result table RESOURCEs within a VOTable document. One difficulty is that if streaming VOTables, it is sometimes necessary to know about the Service Descriptors before the table rows are encountered. Given that, it would be nice to require or recommend putting the service descriptor RESOURCE(s) before the table RESOURCE. Another is that in the case of more than one TABLE per document, there is no way in general to tell which service descriptors correspond to which table. This latter point may however be out of scope for this standard.
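The streaming difficulty can be illustrated with a short Python sketch. It reads a VOTable incrementally and reports which service descriptors were seen before the first TABLE element starts; anything placed later would be unknown to a streaming client at the time it handles the rows. The function name is hypothetical, and the descriptor test (type="meta", utype="adhoc:service") is the same assumption a client would normally make.

```python
import io
import xml.etree.ElementTree as ET


def descriptors_before_first_table(stream):
    """Return the IDs of service descriptors seen before the first TABLE."""
    seen = []
    # "start" events fire as soon as an opening tag (with attributes) is read.
    for _event, elem in ET.iterparse(stream, events=("start",)):
        name = elem.tag.rsplit("}", 1)[-1]  # drop any namespace
        if name == "TABLE":
            break  # rows follow; stop reading here
        if (
            name == "RESOURCE"
            and elem.get("type") == "meta"
            and elem.get("utype") == "adhoc:service"
        ):
            seen.append(elem.get("ID"))
    return seen


doc = (
    b'<VOTABLE><RESOURCE type="meta" utype="adhoc:service" ID="svc"/>'
    b"<RESOURCE><TABLE/></RESOURCE></VOTABLE>"
)
print(descriptors_before_first_table(io.BytesIO(doc)))
```

With the descriptor RESOURCE placed after the table RESOURCE, the same call returns an empty list, which is the case the recommendation is meant to avoid.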

Row Correspondence
It would be useful to be able to identify a row in one {links}-response table that "corresponds" to a row in another, related, {links}-response table. For instance, if a user is browsing a (parent) table for which each row references a different {links}-response table, and has selected the 1/4-scale-JPEG-preview link in the {links}-response for one parent table row, it would be convenient to be directed to the 1/4-scale-JPEG-preview link when she selects another parent table row, rather than having to search for it again. Clients can attempt to do this at present, e.g. by looking at the semantics and description columns, but it's a bit haphazard. I suggest a new (optional?) column named something like link_code that can be assessed for equality in order to identify corresponding rows.
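A minimal Python sketch of how a client might use such a column follows. It assumes rows are available as dictionaries keyed by column name; link_code is the hypothetical column suggested above, and the fallback on semantics and description stands in for the haphazard matching clients can attempt today.

```python
def find_corresponding(selected, other_rows):
    """Find the row in other_rows that corresponds to the selected row."""
    code = selected.get("link_code")
    if code:
        # Preferred: exact equality on the proposed link_code column.
        for row in other_rows:
            if row.get("link_code") == code:
                return row
    # Fallback heuristic: same semantics and same description.
    key = (selected.get("semantics"), selected.get("description"))
    for row in other_rows:
        if (row.get("semantics"), row.get("description")) == key:
            return row
    return None


selected = {"semantics": "#preview", "description": "1/4-scale JPEG",
            "link_code": "jpeg-quarter"}
others = [
    {"semantics": "#preview", "description": "Full-size JPEG",
     "link_code": "jpeg-full"},
    {"semantics": "#preview", "description": "Quarter-scale JPEG preview",
     "link_code": "jpeg-quarter"},
]
match = find_corresponding(selected, others)
```

In this example the descriptions differ between the two tables, so only the link_code comparison finds the corresponding row.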

-- MarkTaylor - 2018-06-13

Proposed Features

Suggestion for revision of DataLink-1.0, in terms of new features.

Notes by MarcoMolinaro and FrancoisBonnarel from a splinter session held during the Paris Interop meeting on May 16th, 5:30-7 PM

Around 15 IVOA partners discussed DataLink evolution proposals. Among those people were Pat Dowler, Markus Demleitner, Laurent Michel, Mark Taylor, Tom McGlynn, Alberto Micol, Marco Molinaro, Anais Oberto, Gregory Mantelet, François Bonnarel.

Apologies to anyone we forgot. Please add your name above.

The starting points were the feedback discussions we had during the last years in the DAL working group.

The main issues have been summarized in this IVOA note : http://www.ivoa.net/documents/Notes/RecentDALProtocolsFeedback/index.html

A proposal for changes has been presented at College Park : https://wiki.ivoa.net/internal/IVOA/InterOpNov2018DAL/DataLink-next.pdf

An attempt for a new draft is now available here

These are the changes which have been discussed (the item numbers are those used in the College Park presentation):

1 and 2) - Extension of the scope of DataLink {links} response to items which are different from datasets discovered in whatever way.

This point comes from data providers willing to use DataLink for attaching datasets or additional information to sources in catalogues or other items in service responses. This new usage has to be reflected in introduction and use cases.

The discussion on this has been moved to two github/ivoa-std/DataLink issues : https://github.com/ivoa-std/DataLink/issues/6 and https://github.com/ivoa-std/DataLink/issues/7

3 ) - The extension of the scope makes the linkage to the {links} response occur in contexts not planned by the original spec. Besides the access.format/access.reference pair of FIELDs/PARAMs, which can be used in ObsCore-like contexts, the only generic solution previously proposed to address this response in a VOTable was to use a service descriptor RESOURCE to define the URL of the {links} service, with a reference for the ID param. There is a proposal to also use the LINK element inside a FIELD with a new content-type="application/x-votable+xml;content=datalink", where the FIELD directly contains the URL of the {links} endpoint. A new section for that seems too much; this will go in an appendix.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/31
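As a sketch of how a client could recognise such columns, the Python function below lists the FIELDs whose LINK child carries the proposed content-type. The media type string is the one from the proposal above, not an adopted standard; the function and column names are hypothetical, and whitespace and case are normalised before comparison as a convenience.

```python
import xml.etree.ElementTree as ET

# Proposed (not yet standard) media type marking a {links} URL column.
DATALINK_TYPE = "application/x-votable+xml;content=datalink"


def local(tag):
    """Return an element tag without its XML namespace prefix."""
    return tag.rsplit("}", 1)[-1]


def normalize(media_type):
    """Drop whitespace and lower-case a media type for comparison."""
    return "".join(media_type.split()).lower()


def datalink_columns(votable_xml):
    """Return names of FIELDs marked as holding {links} endpoint URLs."""
    root = ET.fromstring(votable_xml)
    columns = []
    for field in root.iter():
        if local(field.tag) != "FIELD":
            continue
        for child in field:
            if (
                local(child.tag) == "LINK"
                and normalize(child.get("content-type", "")) == DATALINK_TYPE
            ):
                columns.append(field.get("name"))
                break
    return columns
```

A table viewer could call datalink_columns() once per table and then offer to open the cell values of those columns as {links}-response documents.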

4 ) - The DataLink {links} response can be discovered and used outside a service query. It can be useful to recognize that it is a {links} response by means of an INFO tag.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/17

5 ) - Allowing fragments in the access_url seems to be a sensible thing to do, considering multi-extension FITS, tar files, HDF5 and other structured data available. The issues to be solved, however, relate to giving the client enough information to consume this solution. Prototyping on direct use cases could help. It is an open question whether the {links} response should be used to provide one row, with its own semantics and description, for each subpart (which would then only be retrievable in that way), or whether each subpart should be retrievable via an extension of SODA. (See SODA 1.1)

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/15
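On the client side, separating the fragment from the rest of an access_url is straightforward, as the Python sketch below shows; what the fragment means for a given format is exactly the information the client would still be missing. The function name, the example URL and the fragment syntax are hypothetical.

```python
from urllib.parse import urldefrag


def split_access_url(access_url):
    """Split an access_url into (resource URL, fragment or None)."""
    url, fragment = urldefrag(access_url)
    return url, (fragment or None)


# Hypothetical fragment naming one extension of a multi-extension FITS file.
resource, part = split_access_url("http://example.org/data/obs.fits#2")
```

The client would download the resource URL as usual and would then need some per-format convention to interpret the fragment value.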

6 ) - The "description" column in the {links} response needs to be a SHOULD in order to properly label the various links made available, especially when they share the same semantics. This is very useful for the end user of a {links} response table.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/16

7 ) - There is an obvious need for new vocabulary. See: https://wiki.ivoa.net/twiki/bin/view/IVOA/UpdateDatalinkTerms But the semantics/vocabulary discussion is detached from the DataLink specification revision. I.e. it's fine to discuss it, but not within the scope of the document revision.

The discussion actually occurred in the following threads:

http://mail.ivoa.net/pipermail/dal/2019-October/008191.html

http://mail.ivoa.net/pipermail/dal/2019-October/008200.html

http://mail.ivoa.net/pipermail/dal/2019-October/008202.html

8 ) - In order to connect a table RESOURCE to its service descriptor RESOURCE, Mark Taylor proposed adopting a nested resource schema. Considering we're currently mainly in the situation of one table response per query, it doesn't seem critical at this stage, but it needs more discussion and testing. Tom McGlynn proposed another solution, which was not well captured by the editor. Tom, could you please add your proposal here?

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/20

9 ) - We propose to add a free-text name to the service descriptor resource, as a SHOULD requirement, to help identify the offered services.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/21

10 ) - We propose to add an optional "content-type" resource descriptor PARAMETER to identify the expected media type of the offered linked dataset/resource. This can also be considered as a new SODA-1.1 input parameter for driving format conversion.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/22

11 ) - It COULD be useful to provide a human-readable description of a service descriptor. This will be done by using a DESCRIPTION element inside the service descriptor RESOURCE.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/23

12 ) - A self-describing service provides a service descriptor when queried with no input parameter. If queried with only the single identifier PARAMETER, the service descriptor it provides restricts parameter ranges (MIN/MAX) or OPTIONS to values adapted to the queried dataset or item.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/25

13 ) - ReST Interface descriptors. These could be useful for VOSPACE or any URL with variable sections. It may be better to refer to the existing Recommendation (https://tools.ietf.org/html/rfc6570 ) discussing this than to reinvent a ReST descriptor on our own. Actual implementation may be postponed until use cases/prototypes are made available.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/27
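To give an idea of what relying on RFC 6570 would mean for a client, here is a minimal Python sketch of Level 1 template expansion only (simple {var} substitution with percent-encoding). The function name and the example template are hypothetical; the higher template levels and any mapping onto service descriptor PARAMs are left out.

```python
import re
from urllib.parse import quote


def expand_level1(template, values):
    """Expand RFC 6570 Level 1 expressions ({var}) in a URI template."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            return ""  # undefined variables expand to nothing
        # Percent-encode everything outside the unreserved set.
        return quote(str(values[name]), safe="")

    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


url = expand_level1(
    "http://example.org/vospace/nodes/{node}", {"node": "my data"}
)
```

Here url becomes http://example.org/vospace/nodes/my%20data; a descriptor would only need to carry the template and the list of variable names.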

14 ) - DataLink recognition outside a response from a protocol. There was some discussion of the new solution proposed in the Note (a new content-type in the LINK element), especially since the identification of this link column (sort of) replicates the content that can be provided by a proper service descriptor.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/29

Extra#0 : Should we suppress the availability endpoint in Section 2 (which, according to Markus, never works)? Should we focus only on {links} capabilities attached to other services (and not on standalone DataLink services)? This point is still controversial.

The discussion on this has been moved to the two github/ivoa-std/DataLink issues: https://github.com/ivoa-std/DataLink/issues/13 and https://github.com/ivoa-std/DataLink/issues/14

Extra#1 : from A. Micol. Addition of a "category" column to identify different offered datasets. Isn't that tackled by new semantics terms? There is reluctance to add too many columns belonging to other protocols (ObsCore dataproduct_type). Alberto should add his proposal to the -Next page. Discussion to follow.

The discussion actually occurred in the following threads:

http://mail.ivoa.net/pipermail/dal/2019-October/008191.html

http://mail.ivoa.net/pipermail/dal/2019-October/008200.html

http://mail.ivoa.net/pipermail/dal/2019-October/008202.html

Extra#2 : use case for an additional boolean column to quickly identify link elements that require authorization (see below, PatrickDowler).

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/33

-- FrancoisBonnarel - 2019-07-20

Links and Authorization

We could add non-normative advice to include a column in the {links} resource named "readable" with values true|false. The values predict whether the client will (using the current anonymous or authenticated identity) be allowed to use the link (e.g. download the data). This saves clients the annoyance of trying and getting a 403 Permission Denied.

-- PatrickDowler - 2019-05-16

New issues after virtual interop

A recent semantic discussion addressed the use case of linking sibling or alternate science datasets to the main item. Eventually, it was decided that the right place to specify the dataproduct_type of the datasets is a standardized media type parameter in the content_type FIELD. This has to be explained in the section. See PR #43

On December 6th 2019 Ada Nebot proposed to refurbish the semantic model of DataLink. My guess is that this enhancement is probably for version 2. Ada:

_As I see it, the things we are discussing concerning Datalink fall into 4 independent levels or categories:

  • Level 0 - Data-format (fits, VOTable, PDF, png, …)
  • Level 1 - Data-type (tabular, image, spectrum, cube, text, …)
  • Level 2 - Data-information (Documentation, Calibration, Log, Preview, …)
  • Level 3 - Data-relation (Derived from, Progenitor of, Sibling of, ...)

I see these as orthogonal levels since a list of links can be of any type (level 1) with any kind of format (level 0), any kind of relation (level 3) and could have any type of associated information to describe it (level 2).

Today the list of links returned by datalink is described in the columns content-type and semantics. These two columns cover the above levels only up to some degree.

  • Content-type: covers level 0 mainly, with some exceptions such as VOTable (which is also level 1).
  • Semantics: covers level 2 mainly (e.g. preview), but also level 3 (e.g. derivation, progenitor).

Datalink at the moment has no field properly covering level 1 and applications (—> users) would benefit from having that well covered.

So, in my opinion, if I had to redo Datalink I would keep these different levels separated instead of putting everything into the semantics field. But applications might have a different point of view here —> Shouldn't we add Apps to this discussion?

Timeseries would be in level 3, since it is a relation. And I don’t think we would need the use of sibling or progenitor or anything like that for timeseries. What we need is to be able to say:

"This list of links are time-series of tabular type"

"This list of links are time-series of spectrum type"

But if we were to add terms such as sibling and so on, there is already an IVOA relationship vocabulary: http://ivoa.net/rdf/voresource/relationship_type/2016-08-17/relationship_type.html_

Markus wrote

" If the client submits more ID values than a service is prepared to process, the service should process ID values up to the limit and must include an overflow indicator in the output as described in DALI. The service must not truncate the output within the set of rows (links) for a single ID value if the request exceeds such an input limit."

Control by MAXREC or by telling the client there is an overflow?

None of these is in 1.0, nor in 1.1 at the moment. Mark Taylor seems to be OK with QUERY_STATUS OVERFLOW. Alberto and ESO have another solution for tuning the number of output lines.

Thoughts?

-- MarkusDemleitner - 2020-05-13 answered

" If the client submits more ID values than a service is prepared to process, the service should process ID values up to the limit and must include an overflow indicator in the output as described in DALI. The service must not truncate the output within the set of rows (links) for a single ID value if the request exceeds such an input limit."

For the record, that's not my text, that's Datalink REC-1.0, p. 10. The context is this thread on the DAL list: http://mail.ivoa.net/pipermail/dal/2020-March/008318.html.

Control by MAXREC or by telling the client there is an overflow? None of these are in 1.0, nor in 1.1 at the moment. Mark Taylor seems to be OK with QUERY_STATUS OVERFLOW. Alberto and ESO have another solution for tuning the number of output lines.

Thoughts?

I think I agree with Mark's assessment in the cited thread: QUERY_STATUS=OK and QUERY_STATUS=ERROR aren't useful in Datalink, and hence we shouldn't put them in. Also, it doesn't seem there's much of a place for MAXREC in Datalink.

Hence, I think there's no immediate need for changes in the spec text, let alone the spec content from this issue.

One might argue for writing something like:

No QUERY_STATUS INFOs with values other than OVERFLOW should be produced by datalink services.

That's probably benign, since we can't change the overflow indication in DALI anyway when it is directly referenced by implemented standards, and thus we can hard-code QUERY_STATUS here, too. It would perhaps have saved me a bit of bafflement. On the other hand: has this ever baffled anyone else? And so badly as to justify more spec text?

Similarly, perhaps it is worth saying somewhere that DALI MAXREC doesn't apply to Datalink, but I couldn't say where that text would fit without seeming odd itself.

So... I think my vote would be for closing this issue without action.
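For reference, the input-limit behaviour quoted above from REC-1.0 can be sketched in Python as follows. The function name and the lookup callable are hypothetical; the INFO string shows the DALI overflow indicator that a service would write after the table when the limit is exceeded.

```python
# DALI overflow indicator, written after the table in the VOTable output.
OVERFLOW_INFO = '<INFO name="QUERY_STATUS" value="OVERFLOW"/>'


def process_ids(ids, lookup, max_ids):
    """Return (rows, overflow) for a {links} request.

    lookup(id_value) is a hypothetical callable returning every link row
    for one ID. IDs beyond max_ids are dropped as a whole, so the rows of
    an accepted ID are never truncated.
    """
    rows = []
    for id_value in ids[:max_ids]:
        rows.extend(lookup(id_value))
    overflow = len(ids) > max_ids
    return rows, overflow


def demo_lookup(id_value):
    """Toy lookup returning two links per ID."""
    return [(id_value, "#this"), (id_value, "#preview")]


rows, overflow = process_ids(["a", "b", "c"], demo_lookup, max_ids=2)
```

In this example the service returns the four rows for "a" and "b", skips "c" entirely, and would append OVERFLOW_INFO because overflow is True. No MAXREC-style row limit is involved.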

Topic attachments
DataLink.pdf (r1, 402.2 K, 2019-07-22 06:43, FrancoisBonnarel): DataLink 1.1 preliminary internal working draft
Topic revision: r7 - 2020-05-13 - FrancoisBonnarel
 