DataLink-1.0 Next

This topic collects proposals for modifications of the DataLink-1.0 specification in order to improve the next revision of the specification.

Errata to the DataLink-1.0 recommendation can be found on the dedicated DataLink-1.0 Errata page.

Possible Errata

The following are acknowledged mistakes in DataLink-1.0. Errata could be pushed through, or they could just get fixed at the next version.

  • sec 4.3 'GROUP name="input"' should read 'GROUP name="inputParams"'. See mailing list (at the end of that message).

  • Bibliography entry [4] points to the wrong place, should be RFC2045(?). See mailing list.

Implementation feedback from TOPCAT

These points were (mostly) taken from a presentation in Victoria 2018, written up here as requested by the DAL chair. They can be taken into account when preparing a subsequent version of the standard, though not all of them necessarily should lead to changes.

Service Descriptor Context
Service descriptors come in two different contexts:
  1. {links}-response: explicitly referenced from a row of a {links}-response table
  2. standalone: implicitly referenced from all rows of a generic VOTable
They have to be treated differently in the different cases (neither is a special case of the other), since in the standalone case the service descriptor applies equally to all rows, while in the {links}-response case it only applies to those rows from which it is explicitly referenced. This makes it somewhat complicated to handle them since you need to determine the context first, but there's probably nothing that can be done about that while maintaining backward compatibility. However, it would be useful to spell out this distinction in the document; it took me quite a while to work it out. See mailing list.
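As an illustration of the distinction, here is a minimal Python sketch of one way a client might decide which context a service descriptor is in. It assumes that a descriptor counts as being in the {links}-response context when its ID appears in a service_def column of some table in the same document, and as standalone otherwise. The sample document is hypothetical and heavily trimmed (no XML namespace, two columns only); a real client would need to handle namespaces and binary serializations.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed {links}-response document used only for illustration.
LINKS_RESPONSE = """
<VOTABLE>
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="ID" datatype="char" arraysize="*"/>
      <FIELD name="service_def" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA>
        <TR><TD>ivo://example/obs1</TD><TD>cutout</TD></TR>
        <TR><TD>ivo://example/obs1</TD><TD></TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
  <RESOURCE type="meta" utype="adhoc:service" ID="cutout"/>
</VOTABLE>
"""


def descriptor_contexts(votable_xml):
    """Return a list of (descriptor ID, context) pairs for one document.

    Assumption: a descriptor is in the 'links-response' context if some row
    references its ID through a service_def column, else it is 'standalone'.
    """
    root = ET.fromstring(votable_xml.strip())

    # Collect every non-empty value found in any service_def column.
    referenced = set()
    for table in root.iter("TABLE"):
        names = [field.get("name") for field in table.findall("FIELD")]
        if "service_def" not in names:
            continue
        col = names.index("service_def")
        for tr in table.iter("TR"):
            cells = tr.findall("TD")
            if col < len(cells):
                value = (cells[col].text or "").strip()
                if value:
                    referenced.add(value)

    # Classify each service descriptor RESOURCE.
    contexts = []
    for res in root.iter("RESOURCE"):
        if res.get("utype") == "adhoc:service":
            sid = res.get("ID")
            kind = "links-response" if sid in referenced else "standalone"
            contexts.append((sid, kind))
    return contexts
```

In the {links}-response case the client would then apply the descriptor only to the referencing rows; in the standalone case it would apply it to every row.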

Standalone Example
The document discusses {links}-response documents in some detail, and implies but does not give much explicit discussion of Service Descriptors in standalone VOTable documents. I think an additional example with a title like "Per-row service reference" along the lines of what is generated by the ARI-Gaia TAP service (see example) would help. In some sense the same ground is covered by the existing examples 4.2 "Service descriptor for the {links} capability" and 4.5 "Custom Access Data Service". However, given the context of the document, 4.2 looks like it is specific to {links} services (though the pattern isn't really) and 4.5 looks a bit daunting. I personally think that decorating generic VOTables with service descriptors to indicate associated parameter-less per-row services in this way is one of the most useful things about the DataLink document.

Service Descriptor metadata
If you have a standalone service descriptor in a generic VOTable document, clients will typically need more metadata than the accessURL, standardID and resourceIdentifier discussed in sec 4.1 and shown in the existing examples, in order to communicate to the user what is going to happen if they follow the implied link. This is especially true if there are multiple such service descriptors per table. A name and description of such services can be included by using the name attribute and DESCRIPTION child of the service descriptor RESOURCE element. That is permitted by the VOTable schema, but not mentioned in this document. I suggest including such usages in the examples given here, and encouraging service descriptors to add these items where appropriate. I further suggest adding an (optional) name="contentType" PARAM alongside the existing ones in table 3 to supply the MIME type where known.
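For illustration, the following Python sketch shows how a client might pick up these items from a service descriptor. The descriptor shown is hypothetical: the service name, description text and URL are invented, the contentType PARAM is only the suggestion made above and not part of DataLink-1.0, and the XML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical service descriptor carrying the suggested extra metadata.
DESCRIPTOR = """
<RESOURCE type="meta" utype="adhoc:service" name="preview">
  <DESCRIPTION>Quarter-scale JPEG preview of the source</DESCRIPTION>
  <PARAM name="accessURL" datatype="char" arraysize="*"
         value="http://example.org/preview"/>
  <PARAM name="contentType" datatype="char" arraysize="*"
         value="image/jpeg"/>
</RESOURCE>
"""


def describe_service(resource_xml):
    """Extract user-facing metadata from one service descriptor RESOURCE."""
    res = ET.fromstring(resource_xml.strip())
    # Index the direct PARAM children by their name attribute.
    params = {p.get("name"): p.get("value") for p in res.findall("PARAM")}
    description = (res.findtext("DESCRIPTION") or "").strip()
    return {
        "name": res.get("name"),             # may be None if absent
        "description": description or None,  # None if no DESCRIPTION child
        "accessURL": params.get("accessURL"),
        "contentType": params.get("contentType"),
    }
```

A client could use the name and description to label a menu entry or button, and the contentType to decide how to display the result.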

DataLink Recognition
A pattern suggested by the document (and used, e.g. by the ESA gaiadr2.gaia_source catalogue) is to include a column in certain tables that contains a URL pointing to a {links}-response table corresponding to that row. But there is no way in VOTable to mark up such a column so that clients know that's what it is. Not sure what to do about this; it's actually a more general problem about marking up URL-bearing VOTable columns with content types. See mailing list.

Service Descriptor positioning
There is no prescription in the document for how to arrange service descriptor and result table RESOURCEs within a VOTable document. One difficulty is that if streaming VOTables, it is sometimes necessary to know about the Service Descriptors before the table rows are encountered. Given that, it would be nice to require or recommend putting the service descriptor RESOURCE(s) before the table RESOURCE. Another is that in the case of more than one TABLE per document, there is no way in general to tell which service descriptors correspond to which table. This latter point may however be out of scope for this standard.
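The streaming difficulty can be made concrete with a small Python sketch. It parses a document incrementally and reports, for each table row, how many service descriptors have been fully read by the time that row is encountered. The two sample documents are hypothetical and trimmed (no namespace, one row each).

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical documents differing only in the position of the descriptor.
DESCRIPTOR_FIRST = (
    b'<VOTABLE><RESOURCE type="meta" utype="adhoc:service" ID="s"/>'
    b'<RESOURCE type="results"><TABLE><DATA><TABLEDATA>'
    b'<TR><TD>1</TD></TR></TABLEDATA></DATA></TABLE></RESOURCE></VOTABLE>'
)
DESCRIPTOR_LAST = (
    b'<VOTABLE><RESOURCE type="results"><TABLE><DATA><TABLEDATA>'
    b'<TR><TD>1</TD></TR></TABLEDATA></DATA></TABLE></RESOURCE>'
    b'<RESOURCE type="meta" utype="adhoc:service" ID="s"/></VOTABLE>'
)


def stream_rows(votable_bytes):
    """Yield (cells, descriptors_seen) for each row, in document order."""
    seen = 0
    stream = io.BytesIO(votable_bytes)
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "RESOURCE" and elem.get("utype") == "adhoc:service":
            seen += 1  # this descriptor is now completely parsed
        elif elem.tag == "TR":
            cells = [td.text for td in elem.findall("TD")]
            yield cells, seen
            elem.clear()  # release the row, as a streaming client would
```

With the descriptor first, every row is delivered with the descriptor already known; with the descriptor last, a streaming client sees the rows before it knows that any service applies to them.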

Row Correspondence
It would be useful to be able to identify a row in one {links}-response table that "corresponds" to a row in another, related, {links}-response table. For instance, if a user is browsing a (parent) table for which each row references a different {links}-response table, and has selected the 1/4-scale-JPEG-preview link in the {links}-response for one parent table row, it would be convenient to be directed to the 1/4-scale-JPEG-preview link when she selects another parent table row, rather than having to search for it again. Clients can attempt to do this at present, e.g. by looking at the semantics and description columns, but it's a bit haphazard. I suggest a new (optional?) column named something like link_code that can be assessed for equality in order to identify corresponding rows.
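A minimal Python sketch of how a client might use such a column follows. It assumes rows have already been parsed into dictionaries keyed by column name, and that the column is called link_code as floated above; both are assumptions for illustration only.

```python
def find_corresponding(selected_row, other_rows):
    """Return the row in other_rows with the same link_code, or None.

    selected_row: dict for the link the user chose in one {links}-response.
    other_rows: list of dicts from a related {links}-response.
    """
    code = selected_row.get("link_code")
    if code is None:
        return None  # no code available, so no reliable correspondence
    for row in other_rows:
        if row.get("link_code") == code:
            return row
    return None
```

Because the comparison is plain equality on an opaque value, the client does not need to interpret the semantics or description columns to find the matching link.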

-- MarkTaylor - 2018-06-13

Proposed Features

Suggestion for revision of DataLink-1.0, in terms of new features.

Notes by MarcoMolinaro and FrancoisBonnarel from a splinter session held during the Paris Interop meeting on 16 May, 5:30-7 PM

Around 15 IVOA partners discussed DataLink evolution proposals. Among them were Pat Dowler, Markus Demleitner, Laurent Michel, Mark Taylor, Tom McGlynn, Alberto Micol, Marco Molinaro, Anais Oberto, Gregory Mantelet, François Bonnarel

Apologies to anyone we forgot. Please add your name above.

The starting points were the feedback discussions we have had over the last few years in the DAL working group.

The main issues have been summarized in this IVOA note:

A proposal for changes was presented at College Park:

An attempt at a new draft is now available here

These are the changes which have been discussed (the item numbers are those used in the College Park presentation):

1 and 2) - Extension of the scope of the DataLink {links} response to items other than datasets, however they were discovered.

This point comes from data providers wishing to use DataLink for attaching datasets or additional information to sources in catalogues or to other items in service responses. This new usage has to be reflected in the introduction and use cases.

3 ) - The extension of the scope makes the linkage to the {links} response occur in contexts not planned by the original spec. Besides the access.format/access.reference pair of FIELDs/PARAMs, which can be used in ObsCore-like contexts, the only generic solution previously proposed to address this response in a VOTable was to use a service descriptor RESOURCE to define the URL of the {links} service, with a reference for the ID param. There is a proposal to also use the LINK element inside a FIELD with a new content-type = "" where the FIELD directly contains the URL of the {links} endpoint. A new section for that seems too much. This will come in an appendix.

4 ) - The DataLink {links} response can be discovered and used outside a service query. It can be useful to identify it as a {links} response by means of an INFO tag.

5 ) - Allowing fragments in the access_url seems to be a sensible thing to do considering multi-extension FITS, tar files, HDF5 and other structured data available. The issues to be solved, however, relate to providing the client with enough information to consume this solution. Prototyping on direct use cases could help. It is an open question whether the {links} response is used to get one row with a specific semantics and description for each subpart, but only retrievable, or whether each subpart can be retrieved via an extension of SODA. (See SODA 1.1)

6 ) - The "description" column in the {links} response needs to be a SHOULD in order to properly label the various links made available, especially when they share the same semantics. This is very useful for the end user of a {links} response table.

7 ) - There is an obvious need for new vocabulary. See: But the semantics/vocabulary discussion is detached from the DataLink specification revision. I.e. it's fine to discuss it, but not within the scope of the document revision.

8 ) - In order to connect the table RESOURCE to the service descriptor RESOURCE, Mark Taylor proposed adopting a nested resource schema. Considering that we are currently mainly in the situation of one table response per query, it doesn't seem critical at this stage, but it needs more discussion and testing. Tom McGlynn proposed another solution which was not well captured by the editor. Tom, could you please add your proposal here?

9 ) - We propose to add a free-text name to the service descriptor resource to help identify the offered services, as a SHOULD requirement.

10 ) - We propose to add an optional "content-type" service descriptor PARAM to identify the expected media type of the offered linked dataset/resource. This can also be considered as a new SODA-1.1 input parameter for driving format conversion.

11 ) - It should be useful to provide a human-readable description of a service descriptor. This will be done by using a DESCRIPTION element inside the service descriptor RESOURCE.

12 ) - A self-describing service provides a service descriptor when queried with no input parameter. If queried with only the single identifier parameter, the returned service descriptor restricts parameter ranges (MIN/MAX) or OPTIONS to values adapted to the queried dataset or item.

13 ) - ReST interface descriptors. These could be useful for VOSpace or any URL with variable sections. It may be better to refer to the existing Recommendation ( ) discussing this than to reinvent a ReST descriptor on our own. Actual implementation may be postponed until use cases/prototypes are made available.

14 ) - DataLink recognition outside a response from a protocol. There was some discussion of the new solution proposed in the Note (new content-type = "" in the LINK element), especially when the identification of this link column (sort of) replicates the content that can be provided by a proper service descriptor.

Extra#1 : from A. Micol. Addition of a "category" column to identify the different offered datasets. Isn't that tackled by new semantics terms? There is reluctance to add too many columns belonging to other protocols (ObsCore data_product_type). Alberto should add his proposal to the -Next page. Discussion to follow.

Extra#2 : use case for an additional boolean column to quickly identify link elements that require authorization (see below, PatrickDowler).
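To illustrate the client-side issue raised in item 5, here is a minimal Python sketch that separates an access_url into the retrievable URL and its fragment. The example URL and the idea that the fragment names a FITS extension are assumptions for illustration; how a fragment should be interpreted for each format is exactly the information the client would need from the specification.

```python
from urllib.parse import urldefrag


def split_access_url(access_url):
    """Return (url_without_fragment, fragment_or_None)."""
    base, fragment = urldefrag(access_url)
    return base, fragment or None
```

Since HTTP clients do not send the fragment to the server, a call such as split_access_url("http://example.org/data/obs.fits#3") gives the URL to download and leaves the client to locate the subpart inside the retrieved file.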

-- FrancoisBonnarel - 2019-07-20

Links and Authorization

We could add non-normative advice to include a column named "readable" in the {links} resource, with values true|false. The values predict whether the client will (using the current anonymous or authenticated identity) be allowed to use the link (e.g. download the data). This saves clients the annoyance of trying and getting a 403 Permission Denied.
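As a sketch of how a client might use such a column, the following Python function filters parsed {links} rows. It assumes rows are dictionaries keyed by column name with readable already converted to a Python boolean, and it treats a missing or null value as "unknown, so still worth trying"; that last choice is an assumption, not part of the proposal.

```python
def usable_links(rows):
    """Keep links not explicitly marked as unreadable for this identity."""
    return [row for row in rows if row.get("readable") is not False]
```

Rows that are filtered out could still be shown to the user, for example greyed out with a hint that authentication is required.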

-- PatrickDowler - 2019-05-16

Topic revision: r4 - 2019-07-20 - FrancoisBonnarel