DataLink-1.0 Next

This topic collects proposals for modifications of the DataLink-1.0 specification in order to improve the next revision of the specification.

Errata to the DataLink-1.0 Recommendation can be found on the dedicated DataLink-1.0 Errata page.

Possible Errata

The following are acknowledged mistakes in DataLink-1.0. Errata could be pushed through, or the mistakes could simply be fixed in the next version.

  • sec 4.3 'GROUP name="input"' should read 'GROUP name="inputParams"'. See mailing list (at the end of that message).

  • Bibliography entry [4] points to the wrong place, should be RFC2045(?). See mailing list.

Implementation feedback from TOPCAT (-- MarkTaylor - 2018-06-13 )

These points were (mostly) taken from a presentation in Victoria 2018, written up here as requested by the DAL chair. They can be taken into account when preparing a subsequent version of the standard, though not all of them necessarily should lead to changes.

Service Descriptor Context
Service descriptors come in two different contexts:
  1. {links}-response: explicitly referenced from a row of a {links}-response table
  2. standalone: implicitly referenced from all rows of a generic VOTable

They have to be treated differently in the different cases (neither is a special case of the other), since in the standalone case the service descriptor applies equally to all rows, while in the {links}-response case it only applies to those rows from which it is explicitly referenced. This makes it somewhat complicated to handle them since you need to determine the context first, but there's probably nothing that can be done about that while maintaining backward compatibility. However, it would be useful to spell out this distinction in the document; it took me quite a while to work it out. See mailing list.
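The distinction can be made concrete with a small Python sketch. It relies on a client-side heuristic of our own, not something the document specifies: a table carrying the core DataLink-1.0 columns is treated as a {links}-response. In that context a descriptor applies only to rows whose service_def value equals the descriptor RESOURCE's ID. Otherwise it applies to every row. Rows are modelled as plain dicts.

```python
# Sketch: which rows of a table does a given service descriptor apply to?
# Assumption: a table is treated as a {links}-response when it carries the
# core DataLink-1.0 columns below; this is a client-side heuristic only.
LINKS_COLUMNS = {"ID", "access_url", "service_def", "error_message", "semantics"}


def is_links_response(column_names):
    """Heuristic test for a {links}-response table."""
    return LINKS_COLUMNS.issubset(column_names)


def rows_for_descriptor(column_names, rows, descriptor_id):
    """Return indices of the rows that a service descriptor applies to.

    rows is a list of dicts mapping column name to cell value;
    descriptor_id is the ID attribute of the service descriptor RESOURCE.
    """
    if is_links_response(column_names):
        # {links}-response context: only rows whose service_def references it
        return [i for i, row in enumerate(rows)
                if row.get("service_def") == descriptor_id]
    # standalone context: the descriptor applies equally to every row
    return list(range(len(rows)))


cols = ["ID", "access_url", "service_def", "error_message", "semantics"]
print(rows_for_descriptor(cols, [{"service_def": None}, {"service_def": "soda"}], "soda"))  # [1]
print(rows_for_descriptor(["ra", "dec"], [{}, {}], "preview"))  # [0, 1]
```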

Standalone Example
The document discusses {links}-response documents in some detail, and implies but does not give much explicit discussion of Service Descriptors in standalone VOTable documents. I think an additional example with a title like "Per-row service reference" along the lines of what is generated by the ARI-Gaia TAP service (see example) would help. In some sense the same ground is covered by the existing examples 4.2 "Service descriptor for the {links} capability" and 4.5 "Custom Access Data Service". However, given the context of the document, 4.2 looks like it is specific to {links} services (though the pattern isn't really) and 4.5 looks a bit daunting. I personally think that decorating generic VOTables with service descriptors to indicate associated parameter-less per-row services in this way is one of the most useful things about the DataLink document.
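Here is a minimal Python sketch of how a client might use such a per-row service reference. It assumes the usual adhoc:service RESOURCE with an accessURL PARAM and an inputParams GROUP whose PARAM refs a table FIELD. The service URL, the source_id column and its srcid ID are invented for illustration. Only the TABLEDATA serialization is handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Invented standalone VOTable: one results table plus one per-row service.
# Namespaces are omitted here; local() below also copes with namespaced tags.
VOTABLE = """<VOTABLE>
  <RESOURCE type="meta" utype="adhoc:service" name="preview">
    <PARAM name="accessURL" datatype="char" arraysize="*"
           value="https://example.org/preview"/>
    <GROUP name="inputParams">
      <PARAM name="source_id" datatype="long" ref="srcid" value=""/>
    </GROUP>
  </RESOURCE>
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="source_id" ID="srcid" datatype="long"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <DATA><TABLEDATA>
        <TR><TD>101</TD><TD>10.5</TD></TR>
        <TR><TD>202</TD><TD>11.0</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def children(elem, name):
    return [c for c in elem if local(c.tag) == name]


def per_row_urls(votable_text):
    """Build one service URL per table row from a standalone descriptor."""
    root = ET.fromstring(votable_text)
    descriptor = next(r for r in root.iter()
                      if local(r.tag) == "RESOURCE"
                      and r.get("utype") == "adhoc:service")
    access_url = next(p.get("value") for p in children(descriptor, "PARAM")
                      if p.get("name") == "accessURL")
    inputs = [p for g in children(descriptor, "GROUP")
              if g.get("name") == "inputParams"
              for p in children(g, "PARAM")]
    table = next(t for t in root.iter() if local(t.tag) == "TABLE")
    field_ids = [f.get("ID") for f in children(table, "FIELD")]
    urls = []
    for tr in (e for e in table.iter() if local(e.tag) == "TR"):
        cells = [td.text or "" for td in children(tr, "TD")]
        query = {}
        for p in inputs:
            ref = p.get("ref")
            if ref is not None and ref in field_ids:
                # per-row value taken from the referenced column
                query[p.get("name")] = cells[field_ids.index(ref)]
            elif p.get("value"):
                # fixed value supplied by the descriptor itself
                query[p.get("name")] = p.get("value")
        urls.append(access_url + "?" + urlencode(query))
    return urls


print(per_row_urls(VOTABLE))
# ['https://example.org/preview?source_id=101', 'https://example.org/preview?source_id=202']
```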

Service Descriptor metadata
If you have a standalone service descriptor in a generic VOTable document, clients will typically need more metadata than the accessURL, standardID and resourceIdentifier discussed in sec 4.1 and shown in the existing examples. They need it to tell the user what will happen if they follow the implied link. This is especially true if there are multiple such service descriptors per table. A name and description of such services can be included by using the name attribute and DESCRIPTION child of the service descriptor RESOURCE element. That is permitted by the VOTable schema, but not mentioned in this document. I suggest including such usages in the examples given here, and encouraging service descriptors to add these items where appropriate. I further suggest adding an (optional) name="contentType" PARAM alongside the existing ones in table 3 to supply the MIME type where known.
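This is a sketch of how a client might gather the suggested user-facing metadata from a descriptor. It assumes a name attribute, a DESCRIPTION child and the contentType PARAM proposed above. The descriptor content is invented for illustration.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def describe_service(resource):
    """Collect name, DESCRIPTION and contentType from a descriptor RESOURCE."""
    info = {"name": resource.get("name"), "description": None, "contentType": None}
    for child in resource:
        kind = local(child.tag)
        if kind == "DESCRIPTION":
            info["description"] = (child.text or "").strip()
        elif kind == "PARAM" and child.get("name") == "contentType":
            info["contentType"] = child.get("value")
    return info


# Invented descriptor carrying the suggested user-facing metadata
DESCRIPTOR = """<RESOURCE type="meta" utype="adhoc:service" name="Preview image">
  <DESCRIPTION>A quick-look PNG of the source neighbourhood.</DESCRIPTION>
  <PARAM name="accessURL" datatype="char" arraysize="*" value="https://example.org/preview"/>
  <PARAM name="contentType" datatype="char" arraysize="*" value="image/png"/>
</RESOURCE>"""

print(describe_service(ET.fromstring(DESCRIPTOR)))
```

A client could show the name as a menu label, the description as a tooltip, and use the contentType to decide in advance how to handle the result.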

DataLink Recognition
A pattern suggested by the document (and used, e.g. by the ESA gaiadr2.gaia_source catalogue) is to include a column in certain tables that contains a URL pointing to a {links}-response table corresponding to that row. But there is no way in VOTable to mark up such a column so that clients know that's what it is. Not sure what to do about this; it's actually a more general problem about marking up URL-bearing VOTable columns with content types. See mailing list.

Service Descriptor positioning
The document does not prescribe how to arrange service descriptor and result table RESOURCEs within a VOTable document. One difficulty is that, when streaming VOTables, it is sometimes necessary to know about the service descriptors before the table rows are encountered. Given that, it would be nice to require or recommend putting the service descriptor RESOURCE(s) before the table RESOURCE. Another difficulty is that, when a document contains more than one TABLE, there is no general way to tell which service descriptors correspond to which table. This latter point may, however, be out of scope for this standard.
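The streaming concern can be checked mechanically. This Python sketch treats any RESOURCE with utype="adhoc:service" as a service descriptor, which is an assumption. It streams the document with ElementTree.iterparse and reports whether every descriptor starts before the first DATA element, i.e. before any rows would be delivered.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def descriptors_precede_data(votable_bytes):
    """Stream a VOTable and check that every service descriptor RESOURCE
    starts before the first DATA element (i.e. before any table rows)."""
    data_seen = False
    for _event, elem in ET.iterparse(io.BytesIO(votable_bytes), events=("start",)):
        tag = local(elem.tag)
        if tag == "DATA":
            data_seen = True
        elif (tag == "RESOURCE" and elem.get("utype") == "adhoc:service"
              and data_seen):
            # a descriptor that only turns up after rows have been streamed
            return False
    return True


GOOD = (b'<VOTABLE><RESOURCE utype="adhoc:service"/>'
        b'<RESOURCE><TABLE><DATA/></TABLE></RESOURCE></VOTABLE>')
BAD = (b'<VOTABLE><RESOURCE><TABLE><DATA/></TABLE></RESOURCE>'
       b'<RESOURCE utype="adhoc:service"/></VOTABLE>')
print(descriptors_precede_data(GOOD), descriptors_precede_data(BAD))  # True False
```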

Row Correspondence
It would be useful to be able to identify a row in one {links}-response table that "corresponds" to a row in another, related, {links}-response table. For instance, if a user is browsing a (parent) table for which each row references a different {links}-response table, and has selected the 1/4-scale-JPEG-preview link in the {links}-response for one parent table row, it would be convenient to be directed to the 1/4-scale-JPEG-preview link when she selects another parent table row, rather than having to search for it again. Clients can attempt to do this at present, e.g. by looking at the semantics and description columns, but it's a bit haphazard. I suggest a new (optional?) column named something like link_code that can be assessed for equality in order to identify corresponding rows.
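Here is a sketch of how a client could use the proposed column. The column name link_code is the hypothetical name suggested here; DataLink-1.1 later introduced local_semantics for this purpose. The rows and URLs are invented.

```python
def corresponding_row(selected_row, other_rows, key="link_code"):
    """Return the row of another {links}-response whose key column equals the
    key of the row the user picked, or None if there is no such row."""
    code = selected_row.get(key)
    if code is None:
        return None  # no code on the selected link: nothing to match on
    return next((row for row in other_rows if row.get(key) == code), None)


links_a = [{"link_code": "full", "access_url": "https://example.org/a.fits"},
           {"link_code": "preview-q", "access_url": "https://example.org/a_q.jpg"}]
links_b = [{"link_code": "preview-q", "access_url": "https://example.org/b_q.jpg"},
           {"link_code": "full", "access_url": "https://example.org/b.fits"}]
print(corresponding_row(links_a[1], links_b)["access_url"])
# https://example.org/b_q.jpg
```

Passing key="local_semantics" gives the same behaviour for the column as finally adopted.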

-- MarkTaylor - 2018-06-13

Proposed Features

Suggestions for the revision of DataLink-1.0, in terms of new features.

Notes by MarcoMolinaro and FrancoisBonnarel from a splinter session held during the Paris Interop meeting on 16 May, 5:30-7 PM.

Around 15 IVOA partners discussed DataLink evolution proposals. Among them were Pat Dowler, Markus Demleitner, Laurent Michel, Mark Taylor, Tom McGlynn, Alberto Micol, Marco Molinaro, Anais Oberto, Gregory Mantelet and François Bonnarel.

Apologies to anyone we forgot; please add your name above.

The starting points were the feedback discussions held in the DAL working group over the last few years.

The main issues have been summarized in this IVOA Note: http://www.ivoa.net/documents/Notes/RecentDALProtocolsFeedback/index.html

A proposal for changes was presented at College Park: https://wiki.ivoa.net/internal/IVOA/InterOpNov2018DAL/DataLink-next.pdf

An attempt at a new draft is now available here

These are the changes that were discussed (the item numbers are those used in the College Park presentation):

1 and 2) - Extension of the scope of DataLink {links} responses to items other than datasets, however those items were discovered.

This point comes from data providers wishing to use DataLink to attach datasets or additional information to sources in catalogues, or to other items in service responses. This new usage has to be reflected in the introduction and use cases.

The discussion on this has been moved to two github/ivoa-std/DataLink issues : https://github.com/ivoa-std/DataLink/issues/6 and https://github.com/ivoa-std/DataLink/issues/7

3 ) - The extension of the scope means that references to {links} responses occur in contexts not planned for by the original spec. Besides the access.format/access.reference pair of FIELDs/PARAMs, which can be used in ObsCore-like contexts, the only generic solution previously proposed for referencing such a response from a VOTable was a service descriptor RESOURCE. That RESOURCE defines the URL of the {links} service, with a reference for the ID param. There is a proposal to also use the LINK element inside a FIELD, with a new content-type="application/x-votable+xml;content=datalink", where the FIELD directly contains the URL of the {links} endpoint. A new section for that seems too much; this will go in an appendix.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/31

4 ) - The DataLink {links} response can be discovered and used outside a service query. It can be useful to mark it as a {links} response with an INFO tag.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/17

5 ) - Allowing fragments in the access_url seems sensible, given multi-extension FITS, tar files, HDF5 and other structured data. The remaining issues concern giving the client enough information to consume this solution. Prototyping on direct use cases could help. It is an open question which of two approaches to take. In the first, the {links} response provides one row per subpart, each with its own semantics and description, and the subpart is retrievable only as such. In the second, each subpart can be retrieved via an extension of SODA (see SODA 1.1).

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/15
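Whatever convention is eventually chosen, one practical point for clients is that the fragment is never sent to the server. The HTTP request goes to the part before "#", and the client must keep the fragment for local interpretation. Here is a minimal Python sketch using the standard library's urldefrag. The URL is hypothetical. The meaning of "#2" (imagined here as an extension index) is exactly what remains unspecified.

```python
from urllib.parse import urldefrag


def split_access_url(access_url):
    """Split an access_url into the part that is fetched over HTTP and the
    fragment, which HTTP clients keep locally and never send to the server."""
    url, fragment = urldefrag(access_url)
    return url, (fragment or None)


print(split_access_url("https://example.org/data/obs42.fits#2"))
# ('https://example.org/data/obs42.fits', '2')
print(split_access_url("https://example.org/data/obs42.fits"))
# ('https://example.org/data/obs42.fits', None)
```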

6 ) - The "description" column in the {links} response needs to become a SHOULD, so that the various links made available are properly labelled, especially when they share the same semantics. This is very useful for the end user of a {links} response table.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/16

7 ) - There is an obvious need for new vocabulary. See: https://wiki.ivoa.net/twiki/bin/view/IVOA/UpdateDatalinkTerms But the semantics/vocabulary discussion is detached from the DataLink specification revision. I.e. it's fine to discuss it, but not within the scope of the document revision.

The discussion actually occurred in the following threads:

http://mail.ivoa.net/pipermail/dal/2019-October/008191.html

http://mail.ivoa.net/pipermail/dal/2019-October/008200.html

http://mail.ivoa.net/pipermail/dal/2019-October/008202.html

8 ) - In order to connect a table RESOURCE to the corresponding service descriptor RESOURCE, Mark Taylor proposed adopting a nested RESOURCE schema. Since we are currently mainly in the situation of one table per query response, this doesn't seem critical at this stage, but it needs more discussion and testing. Tom McGlynn proposed another solution, which the editor did not capture well. Tom, could you please add your proposal here?

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/20

9 ) - We propose adding a free-text name to the service descriptor RESOURCE to help identify the offered services, with a SHOULD requirement.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/21

10 ) - We propose adding an optional "content-type" service descriptor PARAMETER to identify the expected media type of the offered linked dataset/resource. This could also be considered as a new SODA-1.1 input parameter for driving format conversion.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/22

11 ) - It COULD be useful to provide a human-readable description of a service descriptor. This will be done by using a DESCRIPTION element inside the service descriptor RESOURCE.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/23

12 ) - A self-describing service provides a service descriptor when queried with no input parameters. If it is queried with only the identifier PARAMETER, the service descriptor it returns restricts parameter ranges (MIN/MAX) or OPTIONS to values suited to the queried dataset or item.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/25
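To illustrate what a client would do with such a restricted descriptor, here is a Python sketch that reads MIN/MAX and OPTION values from the inputParams GROUP. The descriptor, identifier and BAND/POL values are invented for illustration, loosely modelled on SODA parameters. Values are returned as strings; converting them according to the PARAM datatype is left out.

```python
import xml.etree.ElementTree as ET

# Invented descriptor, as a self-describing service might return it after
# being queried with a single ID.
DESCRIPTOR = """<RESOURCE type="meta" utype="adhoc:service">
  <GROUP name="inputParams">
    <PARAM name="ID" datatype="char" arraysize="*" value="ivo://example.org/obs?42"/>
    <PARAM name="BAND" datatype="double" arraysize="2" unit="m" value="">
      <VALUES><MIN value="4.5e-7"/><MAX value="9.1e-7"/></VALUES>
    </PARAM>
    <PARAM name="POL" datatype="char" arraysize="*" value="">
      <VALUES><OPTION value="I"/><OPTION value="Q"/></VALUES>
    </PARAM>
  </GROUP>
</RESOURCE>"""


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def param_constraints(descriptor_xml):
    """Map each inputParams PARAM name to its MIN/MAX and OPTION values."""
    root = ET.fromstring(descriptor_xml)
    result = {}
    for group in root.iter():
        if local(group.tag) != "GROUP" or group.get("name") != "inputParams":
            continue
        for param in group:
            if local(param.tag) != "PARAM":
                continue
            entry = {}
            for values in param:
                if local(values.tag) != "VALUES":
                    continue
                for v in values:
                    kind = local(v.tag)
                    if kind in ("MIN", "MAX"):
                        entry[kind.lower()] = v.get("value")
                    elif kind == "OPTION":
                        entry.setdefault("options", []).append(v.get("value"))
            result[param.get("name")] = entry
    return result


print(param_constraints(DESCRIPTOR))
# {'ID': {}, 'BAND': {'min': '4.5e-7', 'max': '9.1e-7'}, 'POL': {'options': ['I', 'Q']}}
```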

13 ) - ReST interface descriptors. These could be useful for VOSpace or any URL with variable sections. It may be better to refer to the existing standard on this topic (https://tools.ietf.org/html/rfc6570) than to reinvent a ReST descriptor of our own. Actual implementation may be postponed until use cases/prototypes are available.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/27
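For reference, the simplest part of RFC 6570 is small enough to sketch in Python. This is Level 1, simple string expansion of {var). Each value is percent-encoded so that only unreserved characters remain, and undefined variables expand to the empty string. The sketch covers only Level 1, with no operators or modifiers. The VOSpace-like template is hypothetical.

```python
import re
from urllib.parse import quote

# Level 1 expressions only: {varname}, no operators such as {+x} or {?x}
_EXPR = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand_level1(template, variables):
    """Expand RFC 6570 Level 1 expressions of the form {var}."""
    def substitute(match):
        value = variables.get(match.group(1))
        # undefined variables expand to the empty string; defined ones are
        # percent-encoded, leaving only unreserved characters (A-Z a-z 0-9 - . _ ~)
        return "" if value is None else quote(str(value), safe="")
    return _EXPR.sub(substitute, template)


print(expand_level1("https://example.org/vospace/nodes/{container}/{file}",
                    {"container": "my data", "file": "a.fits"}))
# https://example.org/vospace/nodes/my%20data/a.fits
```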

14 ) - DataLink recognition outside a response from a protocol. There was some discussion of the newly proposed solution from the Note (a new content-type in the LINK element). The discussion focused on the fact that identifying this link column (sort of) duplicates what a proper service descriptor can already provide.

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/29

Extra#0 : Should we suppress the availability endpoint in Section 2 (which, according to Markus, does not work)? Should we focus only on {links} capabilities attached to other services (and not on standalone DataLink services)? This point is still controversial.

The discussion on this has been moved to the two github/ivoa-std/DataLink issues: https://github.com/ivoa-std/DataLink/issues/13 and https://github.com/ivoa-std/DataLink/issues/14

Extra#1 : from A. Micol. Addition of a "category" column to identify the different datasets offered. Isn't that handled by the new semantics terms? There is reluctance to add too many columns belonging to other protocols (ObsCore dataproduct_type). Alberto should add his proposal to the -Next page. Discussion to follow.

The discussion actually occurred in the following threads:

http://mail.ivoa.net/pipermail/dal/2019-October/008191.html

http://mail.ivoa.net/pipermail/dal/2019-October/008200.html

http://mail.ivoa.net/pipermail/dal/2019-October/008202.html

Extra#2 : use case for an additional boolean column to quickly identify link elements that require authorization (see below, PatrickDowler).

The discussion on this has been moved to the following github/ivoa-std/DataLink issue: https://github.com/ivoa-std/DataLink/issues/33

-- FrancoisBonnarel - 2019-07-20

Links and Authorization

We could add non-normative advice to include a column in the {links} response named "readable" with values true|false. The values predict whether the client, using its current anonymous or authenticated identity, will be allowed to use the link (e.g. download the data). This spares clients the annoyance of trying and getting a 403 Permission Denied.

-- PatrickDowler - 2019-05-16
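Here is a client-side sketch of how the proposed column could be used. The "readable" column is only the proposal above; DataLink-1.1 eventually added link_auth and link_authorized instead. Treating a missing or null value as "unknown" is our assumption. Boolean cells are accepted in the VOTable TABLEDATA spellings (true/false, T/F, 1/0, ? for null).

```python
TRUE_SPELLINGS = {"true", "t", "1"}
NULL_SPELLINGS = {"", "?"}


def is_readable(row):
    """Interpret the proposed 'readable' column: True, False, or None if unknown."""
    value = row.get("readable")
    if value is None or str(value).strip() in NULL_SPELLINGS:
        return None  # unknown: the client has to try the link
    return str(value).strip().lower() in TRUE_SPELLINGS


def usable_links(rows):
    """Drop links predicted to fail (e.g. with 403); keep True and unknown."""
    return [row for row in rows if is_readable(row) is not False]


rows = [{"access_url": "https://example.org/public.fits", "readable": "true"},
        {"access_url": "https://example.org/proprietary.fits", "readable": "F"},
        {"access_url": "https://example.org/unknown.fits"}]
print([r["access_url"] for r in usable_links(rows)])
# ['https://example.org/public.fits', 'https://example.org/unknown.fits']
```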

New issues after virtual interop (gathered by -- FrancoisBonnarel - 2020-05-13)

A recent semantics discussion addressed the use case of linking sibling or alternate science datasets to the main item. In the end it was decided that the dataproduct_type of the datasets should be specified by a standardized media type parameter in the content_type FIELD. This has to be explained in the section. See PR #43.

On 6 December 2019 Ada Nebot proposed refurbishing the semantic model of DataLink. My guess is that this enhancement is probably for version 2. Ada:

As I see it, the things we are discussing concerning Datalink fall into 4 independent levels or categories:
  • Level 0 - Data-format (fits, VOTable, PDF, png, …)
  • Level 1 - Data-type (tabular, image, spectrum, cube, text, …)
  • Level 2 - Data-information (Documentation, Calibration, Log, Preview, …)
  • Level 3 - Data-relation (Derived from, Progenitor of, Sibling of, …)

I see these as orthogonal levels since a list of links can be of any type (level 1) with any kind of format (level 0), any kind of relation (level 3) and could have any type of associated information to describe it (level 2).

Today the list of links returned by datalink is described in the columns content-type and semantics. These two columns cover the above levels only up to some degree.

Content-type: covers level 0 mainly, with some exceptions such as VOTable (which is also level 1).

Semantics: covers level 2 mainly (e.g. preview), but also level 3 (e.g. derivation, progenitor).

Datalink at the moment has no field properly covering level 1 and applications (—> users) would benefit from having that well covered.

So, in my opinion, if I had to redo Datalink I would keep these different levels separated instead of putting everything into the semantics field. But applications might have a different point of view here —> Shouldn't we add Apps to this discussion?

Timeseries would be in level 3, since it is a relation. And I don’t think we would need sibling or progenitor or anything like that for timeseries. What we need is to be able to say:

"This list of links are time-series of tabular type"

"This list of links are time-series of spectrum type"

…

But if we were to add terms such as sibling and so on, there is already an IVOA relationship vocabulary: http://ivoa.net/rdf/voresource/relationship_type/2016-08-17/relationship_type.html

Markus wrote

" If the client submits more ID values than a service is prepared to process, the service should process ID values up to the limit and must include an overflow indicator in the output as described in DALI. The service must not truncate the output within the set of rows (links) for a single ID value if the request exceeds such an input limit."

Control by MAXREC, or by telling the client there is an overflow?

None of these is in 1.0, nor in 1.1 at the moment. Mark Taylor seems to be OK with QUERY_STATUS OVERFLOW; Alberto and ESO have another solution for tuning the number of output lines.

Thoughts?

-- MarkusDemleitner - 2020-05-13 answered

" If the client submits more ID values than a service is prepared to process, the service should process ID values up to the limit and must include an overflow indicator in the output as described in DALI. The service must not truncate the output within the set of rows (links) for a single ID value if the request exceeds such an input limit."

For the record, that's not my text, that's Datalink REC-1.0, p. 10. The context is this thread on the DAL list: http://mail.ivoa.net/pipermail/dal/2020-March/008318.html.

Control by MAXREC, or by telling the client there is an overflow? None of these is in 1.0, nor in 1.1 at the moment. Mark Taylor seems to be OK with QUERY_STATUS OVERFLOW; Alberto and ESO have another solution for tuning the number of output lines.

Thoughts ?

I think I agree with Mark's assessment in the cited thread: QUERY_STATUS=OK and QUERY_STATUS=ERROR aren't useful in Datalink, and hence we shouldn't put them in. Also, it doesn't seem there's much of a place for MAXREC in Datalink.

Hence, I think there's no immediate need for changes in the spec text, let alone the spec content from this issue.

One might argue for writing something like:

No QUERY_STATUS INFOs with values other than OVERFLOW should be produced by datalink services.

That's probably benign, since we can't change the overflow indication in DALI anyway when it is directly referenced by implemented standards, and thus we can hard-code QUERY_STATUS here, too. It would perhaps have saved me a bit of bafflement. On the other hand: has this ever baffled anyone else? And so badly as to justify more spec text?

Similarly, perhaps it is worth saying somewhere that DALI MAXREC doesn't apply to Datalink, but I couldn't say where that text would fit without seeming odd itself.

So... I think my vote would be for closing this issue without action.
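On the client side, the behaviour quoted from REC-1.0 suggests a simple pattern, sketched here in Python. It assumes that the overflow indicator is the DALI INFO with name="QUERY_STATUS" and value="OVERFLOW", and that every processed ID yields at least one row (possibly an error row). The client detects the indicator, then resubmits the IDs that produced no rows at all. The response text and identifiers are invented.

```python
import xml.etree.ElementTree as ET

# Invented {links} response from a service that stopped after one ID.
RESPONSE = """<VOTABLE>
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="ID" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA>
        <TR><TD>ivo://example.org/obs?1</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
    <INFO name="QUERY_STATUS" value="OVERFLOW"/>
  </RESOURCE>
</VOTABLE>"""


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def has_overflow(votable_text):
    """True if the document carries an INFO QUERY_STATUS=OVERFLOW anywhere."""
    root = ET.fromstring(votable_text)
    return any(local(el.tag) == "INFO"
               and el.get("name") == "QUERY_STATUS"
               and el.get("value") == "OVERFLOW"
               for el in root.iter())


def unprocessed_ids(submitted_ids, links_rows):
    """IDs with no rows at all in the response; since the service must not
    truncate within a single ID, these are the ones to resubmit."""
    answered = {row.get("ID") for row in links_rows}
    return [i for i in submitted_ids if i not in answered]


if has_overflow(RESPONSE):
    rows = [{"ID": "ivo://example.org/obs?1"}]
    print(unprocessed_ids(["ivo://example.org/obs?1", "ivo://example.org/obs?2"], rows))
    # ['ivo://example.org/obs?2']
```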

Changes in DataLink-1.1 (around May 2022)

• added optional content_qualifier to describe link target content with terms from the product-type vocabulary

• added optional link_auth and link_authorized to signal whether authentication is necessary to use the link

• clarified use of multiple ID values and possible OVERFLOW

• clarified use of utype for self-describing service descriptors

• clarified use of semantics

• generalized by adding use cases for links to content other than data files

• added the use of LINK to convey that a table column contains DataLink request URLs

• service descriptors can include a contentType param to describe service output and should include a name and description

• service descriptors can include exampleURL param(s) with working example and description

• VOSI-availability and VOSI-capabilities endpoints are now optional

Revised list of changes (January the 4th/5th/6th 2023) and implementation review

* List of changes of version 1.1 with respect to version 1.0 (2015) and corresponding DataLink issues on github repository.

See https://github.com/ivoa-std/DataLink for details.

  • generalize the spec scope by adding use cases for links to starting points different from data files (issues DataLink #6 and #7)
  • getcapability is now optional (issue DataLink #13)
  • getavailability is removed because it is no longer a VOSI recommendation (issue DataLink #14)
  • the old Resource term has been replaced by endpoint each time it's appropriate (issue DataLink #11)
  • INFO tag recommended at the beginning of the response, declaring the VOTable's compatibility with DataLink (issue DataLink #17)
  • more emphasis on the value of filling in the free-text, human-readable description FIELD (issue DataLink #16)
  • URLs with fragments are now allowed in the links (issue DataLink #15)
  • the control of limits in the response is done following the DALI rules (issue DataLink #45)
  • changes in service descriptors
    • descriptor RESOURCE name (issue DataLink #21)
    • descriptor RESOURCE DESCRIPTION (issue DataLink #23)
    • content-type PARAM to describe the media type of the service response (issue DataLink #22)
    • if this content-type is text/html, the result should go to the web browser, whether it is an HTML document or a web interface (issue DataLink #52)
    • nesting of service descriptors recommended (issue DataLink #20)
    • better description of the inputParams section in the spec (issue DataLink #80)
    • optional example URL added (issue DataLink #55)
  • self-description of services better described (issue DataLink #25)
  • authorization/authentication of linked resources announced by optional FIELDs link_auth and link_authorized (issue DataLink #33)
  • new FIELD content_qualifier to qualify/type the content returned by the link, whether a dataset or something else (issue DataLink #42)
  • new FIELD local_semantics added to match similar links in {links} responses for different IDs; the PR was still to be added at the time of writing (issue DataLink #88)
  • new ways to identify {links} URLs in a VOTable: a LINK in the FIELD with content-type="application/x-votable+xml;content=datalink", or a combination of two utyped FIELDs, Access.reference and Access.format="application/x-votable+xml;content=datalink" (issues DataLink #29 and #59)
  • wrong ObsCore UCD on obs_publisher_did in one example is fixed (issue DataLink #90)
  • clarification of semantics and product_type vocabulary URI (issue DataLink #67)
  • distinction of standard SODA services from generic or custom Access Data services is clarified (issue DataLink #61)
  • ucd in spectral cut out custom services example changed (issue DataLink #60)
  • set utypes on some inputParams in service descriptors (issue DataLink #84). One use case is SODA, where input PARAMs force the ObsCore parameter values of the cutout.
  • the text now states that ranges of values and enumerated values can be inferred from the metadata included in the data discovery table the user is coming from (issue DataLink #85).
  • The required content type of the HTTP response (application/x-votable+xml;content=datalink) prevents web browsers from executing XSLT, whereas text/xml would still be valid for VOTable. Two solutions were discussed: (1) replace MUST by SHOULD, with the additional option of providing the DataLink standard ID in an INFO tag in the document; or (2) rely on the Accept header of the query (whether it accepts VOTable, text/html, etc.). Solution 1 was eventually integrated (issue DataLink #91).

-- FrancoisBonnarel - 2023-01-17
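The recognition mechanisms listed above (issues #17, #29 and #59) can be sketched in a few lines of Python. The sketch assumes the INFO element is written as name="standardID" with a value beginning ivo://ivoa.net/std/DataLink#links-. It handles only the LINK-in-FIELD variant, not the utype-based Access.reference/Access.format pair. Both sample documents are invented.

```python
import xml.etree.ElementTree as ET

DATALINK_MIME = "application/x-votable+xml;content=datalink"
STANDARD_ID_PREFIX = "ivo://ivoa.net/std/DataLink#links-"  # assumed form

LINKS_DOC = """<VOTABLE><RESOURCE type="results">
  <INFO name="standardID" value="ivo://ivoa.net/std/DataLink#links-1.0"/>
  <TABLE><FIELD name="ID" datatype="char" arraysize="*"/></TABLE>
</RESOURCE></VOTABLE>"""

CATALOGUE = """<VOTABLE><RESOURCE type="results"><TABLE>
  <FIELD name="dl_url" datatype="char" arraysize="*">
    <LINK content-type="application/x-votable+xml;content=datalink"/>
  </FIELD>
  <FIELD name="ra" datatype="double"/>
</TABLE></RESOURCE></VOTABLE>"""


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def normalise(mime):
    return (mime or "").replace(" ", "").lower()


def is_links_document(root):
    """Check for an INFO declaring a DataLink {links} standardID."""
    return any(local(el.tag) == "INFO" and el.get("name") == "standardID"
               and (el.get("value") or "").startswith(STANDARD_ID_PREFIX)
               for el in root.iter())


def datalink_columns(root):
    """Names of FIELDs whose LINK child marks their values as {links} URLs."""
    names = []
    for field in root.iter():
        if local(field.tag) != "FIELD":
            continue
        for link in field:
            if (local(link.tag) == "LINK"
                    and normalise(link.get("content-type")) == DATALINK_MIME):
                names.append(field.get("name"))
    return names


print(is_links_document(ET.fromstring(LINKS_DOC)))   # True
print(datalink_columns(ET.fromstring(CATALOGUE)))    # ['dl_url']
```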

List of changes following the RFC period and TCG vote

  • The ID and semantics must make sense when the DataLink service sends an error message (DataLink PR #108)
  • The standard version ID in capability has been upgraded to ivo://ivoa.net/std/DataLink#links-1. (DataLink PR #109)
  • The DataLink recognition mechanisms section initially written for this spec has been removed from the text, and an implementation review Note has been published (https://github.com/ivoa/DataLinkRecImplNote).
  • An explanation of { } syntax has been added

-- FrancoisBonnarel - 2024-03-14

* discussed, but not yet integrated in 1.1:

  • service descriptor: the URL could benefit from being templated (for example, to pick up part of the path from the user or from the table, and not only parameters). Proposals have been made, but this was deferred to the next version until some services implement it (issue DataLink #27). Probably DataLink 2.0.
  • DataLink underlying data model: is the content_type, content_qualifier, semantics triplet enough? Should we distinguish conveyed "information" from conveyed "relationship"? (issue DataLink #44). This is a discussion for DataLink 2.0.
  • Service descriptor inputParams enhancing distinction between required and optional parameters (issue DataLink #51). No obvious solution has been found at the moment.
  • Service Descriptor to be removed from DataLink and pushed to VOTable (issue DataLink #53). This is a major revision of DataLink but also of VOTable. Version 2.0. This may also be completed by introduction of the templating mechanism or even more by the use of PDL inside inputParams.
  • content parameter in the DataLink MIME type (issue DataLink #82). This apparently has to be solved in VOTable before DataLink-1.1 becomes a REC.
  • UCD for the ID column in the {links} service output: it is not the same as the UCD of the ID parameter in any SODA service referring to the ID column in the {links} table. Is that an issue? Apparently not (issue DataLink #89).
  • interface version attribute in the spec example: should it be #links-1.1 or #links-1.0? (DataLink #96; see also #62)

* Implementation : server side.

GAVO implemented the following changes: content_qualifier and local_semantics, plus additional service descriptor features such as DESCRIPTION, name, content_type, etc.

As examples, here are a few {links} responses for various ObsCore/SIA/SSA tables on the GAVO server:

rosat.images : http://dc.zah.uni-heidelberg.de/rosat/q/dl/dlmeta?ID=ivo%3a%2f%2forg.gavo.dc%2f~%3frosat%2fimage_data%2frda_4%2fwg400138p_n1_p1_r2_f2_p1%2frp400138n00_im3.fits.gz where some links have content_qualifier = #image, others have content_qualifier = #event or #timeseries

lamost6.lrs : http://dc.zah.uni-heidelberg.de/lamost6/q/sdl_lrs/dlmeta?ID=ivo%3a%2f%2forg.gavo.dc%2f~%3flamost6%2flrs%2f373006237 where some links have content_qualifier = #spectrum

califadr3.cubes : http://dc.zah.uni-heidelberg.de/califa/q3/dl/dlget?ID=ivo%3A//org.gavo.dc/~%3Fcalifa/datadr3/COMB/ARP220.COMB.rscube.fits where some links have content_qualifier = #cube

ppakm31.maps : http://dc.zah.uni-heidelberg.de/ppakm31/q/cdl/dlmeta?ID=ivo%3a%2f%2forg.gavo.dc%2f~%3fppakm31%2fdata%2fPPAK_M31_F5_cube.fits where some links have local_semantics filled in and content_qualifier equal to #image or #cube

All these are combined with various semantics or content_type values

* implementation : client side

The TOPCAT prototype (http://andromeda.star.bristol.ac.uk/releases/topcat/pre/topcat-full_datalink11.jar) displays the additional service descriptor features and makes use of the additional {links} table FIELDs such as content_qualifier, authorization and local_semantics. The tool's behavior adapts to the content of these new FIELDs. For example, the actions TOPCAT suggests for the links depend not only on content_type but also on content_qualifier.

Aladin Desktop is going to adapt to these new features too.
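Here is a sketch of the kind of dispatch described above, where content_qualifier refines what content_type alone says. The action names are hypothetical and do not reproduce TOPCAT's actual behavior. content_qualifier values are accepted either as "#term" (as in the GAVO examples) or as a full product-type vocabulary URI.

```python
def qualifier_term(value):
    """Reduce '#image' or a full vocabulary URI to the bare term 'image'."""
    if not value:
        return None
    return value.rsplit("#", 1)[-1] or None


def suggest_action(content_type, content_qualifier=None):
    """Choose a (hypothetical) client action for one {links} row."""
    ctype = (content_type or "").replace(" ", "").lower()
    if "content=datalink" in ctype:
        return "open as {links} table"
    mime = ctype.split(";", 1)[0]
    term = qualifier_term(content_qualifier)
    if mime == "text/html":
        return "open in web browser"
    if mime == "application/fits":
        # FITS alone says little; the qualifier tells image from spectrum
        if term in ("image", "cube"):
            return "send to image viewer"
        if term in ("spectrum", "timeseries"):
            return "send to plotting tool"
        return "download"
    if mime.startswith("image/"):
        return "show preview"
    return "download"


print(suggest_action("application/fits", "#cube"))  # send to image viewer
```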

-- FrancoisBonnarel - 2023-01-06

DataLink11RFC

Topic attachments
  • DataLink-20200505.pdf (382.0 K, 2020-05-14 16:58, FrancoisBonnarel): DataLink 1.1 version for virtual interop May 2020
  • DataLink.pdf (402.2 K, 2019-07-22 06:43, FrancoisBonnarel): DataLink 1.1 preliminary internal working draft

Topic revision: r18 - 2024-03-14 - FrancoisBonnarel
 
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.