DataLink 1.1 Proposed Recommendation: Request for Comments


Introduction


DataLink describes the linking of data discovery metadata to access to the data itself, further detailed metadata, related resources, and to services that perform operations on the data.

The main changes in v1.1 are:

  • Generalize by adding use cases for links to content other than data files
  • VOSI-availability and VOSI-capabilities endpoints are now optional
  • Service descriptors can include exampleURL and contentType param(s), as well as DESCRIPTION, name, etc.
  • Added optional link_auth and link_authorized to signal whether authentication is necessary to use the link
  • INFO element with standardID mandatory in {links} response
  • Added content_qualifier FIELD to inform on the nature of the link target
  • Added local_semantics to identify similar links in the same DataLink service for different IDs
  • Mechanisms to recognize {links} endpoints outside ObsCore

Latest version of DataLink can be found at:

The GitHub repository for issues and source can be found at:

Detailed discussion towards 1.1 can also be found on this IVOA TWiki page (last update by FrancoisBonnarel - 2023-05-04):

Reference Interoperable Implementations

Server side

GAVO has implemented the following changes: content_qualifier and local_semantics, plus additional service descriptor features such as DESCRIPTION, name, content_type, etc.

As examples, here are a couple of {links} responses for various ObsCore/SIA/SSA tables on the GAVO server:

All of these are combined with various semantics or content_type values.

-- MarkusDemleitner - 2023-07-10


CADC has implemented the following in https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/caom2ops:

  • INFO element with standardID in links response
  • new optional fields in links response: local_semantics (no content yet but can be populated with default vocab in most cases), content_qualifier (no content, not likely to use), link_auth, link_authorized
  • contentType param in service descriptors (where applicable)
IRIS image: https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/caom2ops/datalink?ID=ivo://cadc.nrc.ca/IRIS?f212h000/IRAS-25um shows link_auth=optional and link_authorized=true because one can authenticate but the data is public.

New CFHT data: anonymous use of https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/caom2ops/datalink?ID=ivo://cadc.nrc.ca/CFHT?2773629/2773629o shows link_auth=optional and link_authorized=false because the data is still proprietary and the caller is anonymous; an authorized user making the same call will see link_authorized=true. That case is hard to demonstrate for a general audience.

The core CADC implementation is available as a library (cadc-datalink-server) in MavenCentral with source code at https://github.com/opencadc/dal.git; the caom2-specific logic is available in a library (caom2-datalink-server) with source at https://github.com/opencadc/caom2service.git -- the core lib is also used in ALMA DataLink service but may not yet be released with the latest features.

Client side

TOPCAT v4.8-8 and later display the additional service descriptor features and make use of additional links table FIELDs such as content_qualifier, authorization and local_semantics; the tool's behavior adapts to the content of these new FIELDs. For example, the Activation Actions suggested by TOPCAT for a link depend not only on content_type but also on content_qualifier, and local_semantics is used to guess which link a user is interested in based on previous selections.

AladinDesktop is going to adapt to those new features too. (see prototype screenshot)

The CADC DownloadManager (https://github.com/opencadc/apps.git) includes a simple DataLink client class so it can resolve publisherID values into 1..* URLs for download; this code hasn't changed as a result of DataLink 1.1. The CADC AdvancedSearch web portal makes calls to the above caom2ops/datalink service to find previews and download info for each row (publisherID): it uses link_authorized to decide whether to display the download options, which prevents users from selecting downloads/links for which they are not authorized and whose requests would be rejected later.
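As a rough illustration of the link_authorized check described above, the following Python sketch filters the rows of a {links} response before offering downloads. It is not the CADC portal code: the function name, the dict-based row format and the example.org URLs are assumptions made for this example, and treating a missing link_authorized value as "unknown, still offer the link" is one possible policy for an optional column.

```python
def downloadable_links(rows):
    """Keep only the rows the caller may use.

    rows: iterable of dicts keyed by {links} column names (a hypothetical
    in-memory form of the response table). link_authorized is optional,
    so a missing value (None) is treated as unknown and the row is kept.
    """
    return [row for row in rows if row.get("link_authorized") is not False]


# Hypothetical rows; the URLs are placeholders, not real services.
rows = [
    {"access_url": "https://example.org/public.fits",
     "link_auth": "optional", "link_authorized": True},
    {"access_url": "https://example.org/proprietary.fits",
     "link_auth": "optional", "link_authorized": False},
    {"access_url": "https://example.org/unknown.fits"},
]

for row in downloadable_links(rows):
    print(row["access_url"])
```

Only the proprietary row is hidden here; a client that prefers to be strict could instead drop rows where the value is missing.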

Implementations Validators

The following validators are available for DataLink:

  • datalinklint, which is part of STILTS. STILTS v3.4-9 contains DL 1.1 validation features, but later versions (at time of writing, the post-3.4-9 pre-release) are recommended because they are slightly updated for PR-DataLink-1.1-20231108.
  • Show my DataLink, which is part of DaCHS



Comments from the IVOA Community during RFC/TCG review period: 2023-04-21 - 2023-05-18

The comments from the TCG members during the RFC/TCG review should be included in the next section.

In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your Wiki Name so that authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.

Additional discussion about any of the comments or responses can be conducted on the WG mailing list. However, please be sure to enter your initial comments here for full consideration in any future revisions of this document.

Community Comment by Markus Demleitner

(1) The standard ID: I'm pretty sure we discussed that before, but I'm really unsure how we came to the conclusion that even Datalink 1.1 still has the ivoid of ivo://ivoa.net/std/DataLink#links-1.0.

Yes, we have to do crazy stuff like that for the schema URIs due to the way XML element names are compared. But there is in general no analogous need with ivoids, because we control the rules how to compare them in what situations.

Does anyone remember why we went for links-1.0 here? If not, I'd suggest links-1. I volunteer for adding a brief explanation about how clients should disregard the minor version for normal operations.

(2) I am entirely unhappy with section 3.1.1, starting with its title, which probably should be something like "Datalinks in VOTable columns". And then the first paragraph should probably say something more concrete like perhaps "Columns containing datalinks SHOULD be marked with a UCD of X.Y.Z and a LINK-typed child in its FIELD like this:

   <LINK whatever="blabla"/>"

And the second paragraph I'd say doesn't belong here at all (it could go to, perhaps, 1.2.7 or a use case discussing datalinks as primary results if we think we need to be explicit about this).

There are use cases behind this. When a datalink {links} response is hooked to table rows outside the context of ObsTAP/SIA2, how do we generate/recognize the DataLink URL?

Of course we can use the Service Descriptor with the single ID parameter if the DataLink can be parametrized by an "id" from one of the columns. But in that case the descriptor would be doing exactly the same as the LINK element proposed here, which is included in the appropriate FIELD and is much less verbose. And it conforms to the VOTable standard. The FIELD itself should not be described by a datalink UCD because it is probably generally an id.

The second paragraph refers to use cases where the URL is not built from the content of one FIELD, but is ad hoc and should itself be the content of a FIELD. Using the same utypes as those used in ObsCore responses seems reasonable. This is for example adapted to SIA1 or SSA responses. I think this has nothing to do with recursive datalink.

We may try to rephrase all this if this is unclear, but the intent has to be kept.

-- FrancoisBonnarel - 2023-08-12

(3) In 3.2, it says:

If an error occurs while processing an ID value, there SHOULD be at least one row for that ID value and an error_message

The way the pyvo datalink client is written, we have to make that an unconditional MUST, or pyvo will keep requesting any failing ID (and frankly I'm unsure how else to implement this given multi-ID and overflows): it will only remove an ID from its list of ids to query if it gets at least one row back for it. Perhaps:

A service MUST return at least one row for each ID passed in.

[ceterum censeo we should have let ID be single-valued; it would have made everything soo much simpler and nothing really much harder/slower]
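The client behaviour behind item (3) can be sketched in Python as follows. This is a simplified illustration, not pyvo's actual code: fetch_all, the query callable, the max_rounds guard and the two toy services are all hypothetical names introduced for this example. It shows why a service that returns no row at all for a failing ID leaves a multi-ID client unable to tell "failed" from "not yet answered because of overflow".

```python
def fetch_all(ids, query, max_rounds=10):
    """Query until every ID has produced at least one row.

    query(ids) returns a list of (id, row) pairs; it stands in for a
    multi-ID {links} request whose response may be truncated.
    max_rounds is a safety guard added for this sketch.
    """
    pending = list(ids)
    results = []
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        rows = query(pending)
        results.extend(rows)
        # An ID is only dropped once at least one row came back for it.
        seen = {row_id for row_id, _ in rows}
        pending = [i for i in pending if i not in seen]
    return results, pending, rounds


def compliant(ids):
    # Returns an error row for the failing ID, as the proposed MUST requires.
    return [(i, "error_message" if i == "bad" else "link") for i in ids]


def silent(ids):
    # Returns nothing at all for the failing ID.
    return [(i, "link") for i in ids if i != "bad"]


print(fetch_all(["a", "bad"], compliant))
print(fetch_all(["a", "bad"], silent, max_rounds=3))
```

With the compliant service the loop ends after one round; with the silent one the failing ID stays pending until the guard stops the loop.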

(4) 3.2.2, second paragraph: I had to puzzle quite a bit about this, starting with wondering what a "dereferenceable URL" might be. I'd suggest to replace the entire paragraph with "Access URLs may have fragment parts, which could, for instance, refer to id-ed elements within XML documents or extensions within FITS files. As in URIs in general, the interpretation of a fragment identifier depends on the media type."

"dereferenceable" was used in the sense that the resource can be fully accessed by HTTP, which is not the case for URNs in general or for URLs with fragments. For the latter, the client is supposed to interpret the fragment. See: https://en.wikipedia.org/wiki/URI_fragment

Apart from that I agree with your rephrasing. -- FrancoisBonnarel - 2023-08-14

This will also drop the "No other additional parameters or client handling are allowed." -- if this forbids query strings on access URIs, I'd strongly disagree. If this means something else, we'd have to write that something else.

In version 1.0 we could read:

"The access_url column contains a URL to download a single resource. The URL in this column must be usable as-is with no additional parameters or client handling; it can be a link to a dynamic resource (e.g. preview generation)."

This statement was not consistent with the allowance of fragments, hence the new statement. I can rephrase it in the upcoming PR. -- FrancoisBonnarel - 2023-08-14
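The split between what goes to the server and what the client must interpret, as discussed in item (4), can be illustrated with Python's standard urllib.parse.urldefrag. The example.org URLs and the fragment names are made up for this sketch; how a fragment is interpreted depends on the media type of the retrieved resource.

```python
from urllib.parse import urldefrag

# Hypothetical access_url values; the hosts and fragments are placeholders.
urls = [
    "https://example.org/data/cube.fits#ext2",
    "https://example.org/get?ID=ivo://example/obs1#spectrum",
    "https://example.org/preview.png",
]

for access_url in urls:
    # The part before "#" (query string included) is what is requested
    # over HTTP; the fragment is never sent to the server.
    request_url, fragment = urldefrag(access_url)
    print(request_url, "| fragment:", fragment or "(none)")
```

Note that the query string stays part of the requested URL; only the fragment is left to the client.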

(5) In my editorial PR, I've dropped a paragraph on semantics for error_message rows. This is now sufficiently addressed above that passage.

(6) sect. 3.2.9 content_qualifier: I think we should at least name the motivating use case a bit more precisely here, as in, perhaps: "It aids clients in presenting to the user the same sort of link as they go from one dataset to another within a service. For instance, suppose a service serves both continuum and line cubes. Using content_qualifier, users can configure their clients such that, as they change to a new data set, they always see the line cube even when the semantics and content_type columns agree for both types of data." Or so.

OK for this change. I will adopt it in the next PR. -- FrancoisBonnarel - 2023-08-14

(7) Sect. 4.8: Sorry, you cannot introduce a utype ("adhoc:this") in a section called "Example: X". If you are really, really sure these "self-describing" things are useful, put them into a section of their own.

Me, I've frankly never really understood where you want to go with this, and I think there's no implementation doing any of this, so perhaps we should drop the whole thing. But if we don't drop it and somehow nonchalantly mention it in an example, at least don't introduce a new utype here. What's wrong with the name="this" you had before? You see, having two different mechanisms for what to my knowledge hasn't been implemented even once seems a bit excessive.

When dropping adhoc:this, don't forget that it is referenced in sect 4.1.

The autodescription motivation may be explained earlier in the section. For "adhoc:this" I remember Pat advocating for this. If we motivate earlier then we can restrict to a pure example here. FrancoisBonnarel - 2023-08-14

(8) I have not looked at the DataLinkImp source that's also present in the repo. If you think this ought to become a document, please extract it to a different repo; ivoatex is not designed to support two documents in one repo.

You are right. The note repo will be created in github.com/ivoa. -- FrancoisBonnarel - 2023-08-18

I've also collected a few rather editorial changes in https://github.com/ivoa-std/DataLink/pull/108



Comments from TCG member during the RFC/TCG Review Period: 2023-04-21 - 2023-05-18

WG chairs or vice chairs must read the Document, provide comments if any (including on topics not directly linked to the Group matters) or indicate that they have no comment.

IG chairs or vice chairs are also encouraged to do the same, although their input is not compulsory.

TCG Chair & Vice Chair

With positive review by the TCG with a comments & feedback period successfully completed, the TCG chair/Vice Chair approve as well.

Applications Working Group

No comment on the document; we appreciate the presence of examples that clarify the usage and implementation.

--
DataLink is in use, and its usage will increase for external web services such as simulated data and for output formats that are not IVOA formats (HAPI time series, OGC formats ...).

Suggestions: maybe update the DataLink page with examples of implementations, refer to that page in the document, and encourage working/interest groups to contribute examples as Markus did.

Data Access Layer Working Group

Some minor edits only, otherwise this update looks sound.

  1. Section references would be useful in the v1.1 changes list - PR#105 raised and merged
  2. Some minor grammar updates - PR #107 raised

-- JamesDempsey - 2023-07-03

Data Model Working Group

I have a rather rudimentary understanding of DataLink, VOSI and DALI, so there are some details that I'm glossing over in my read.

I don't see any real issues/conflicts with the DM group work. However, I have 2 points/questions to raise:

  1. local_semantics: This is an identifier from a local vocabulary to help identify/select rows at a finer level than possible with just the other tags (semantics, content_type, content_qualifier). I'm guessing this is for something like ObsCore's dataproduct_subtype. My question is that I don't really understand what the value is... is it just the tag? or URI for the local vocabulary + tag? The example serializations are no help since the DaCHS ones seem to resolve into a pretty format and I can't see the actual datalink content, and the CADC examples don't populate this field. I'd like to see, either in the document or examples, something more concrete.
I don't think the authors discussed this point very much. IMHO both forms would be acceptable. Simple terms are enough to associate the results, but a local vocabulary URI + tag allows linking the term to definitions and relationships, so it is richer. Examples will be given in the text. -- FrancoisBonnarel - 2023-08-18
  2. Product Type vocabulary: This directly affects the DM group, it'd be used in the Dataset model and ObsCore could be updated to use it as well. The link in the standard resolves to a 2021 version of the vocabulary. At the interop, a 2023 version was discussed which looked like it had some issues. Which vocabulary would support this REC?
    1. 2023-07-28: The referenced vocabulary now resolves to a version dated 2023-06-26 (though the event-list discussion was just going on this week). The elements and definitions in this list appear compatible with DM group usage in the ObsCore and Dataset models. -- MarkCresitelloDittmar - 2023-07-28
Yes, the product-type IVOA vocabulary is what should be used from now onwards in the Dataset DM, the next version of ObsCore, as well as DataLink content_qualifier or maybe also registry standards. -- FrancoisBonnarel - 2023-08-18

-- MarkCresitelloDittmar - 2023-06-21

Followup on revised document:

I see the items above have been addressed satisfactorily, I see no additional issues with the revised document.

-- MarkCresitelloDittmar - 2023-11-11

Grid & Web Services Working Group

Possible backward compatibility drawbacks in VOSpace (VOSpace implementation can use a DataLink to reference data location):

  • The new VOTable columns content_qualifier, local_semantics, link_auth and link_authorized (pp. 15, 16) could break backward compatibility.
  • On p. 17 it is stated: "From version 1.1 of this standard the {links} response must include this INFO ...."
  • On p. 24, in "Example service descriptor for VOSpace 2.0", the attributes "datatype" and "arraysize" are added to <PARAM>.

    -- SaraBertocco - 2023-11-10

Registry Working Group

No particular remark pertaining to Registry standards.

Semantics Working Group

No issue for Semantics at this point.

Data Curation & Preservation Interest Group

Education Interest Group

Knowledge Discovery Interest Group

Operations Interest Group

I don't strictly speaking speak for Operations IG as of this week, but since I did most of the review before my term expired, I'll fill it in here; the TCG can decide whether this counts as an Ops endorsement or not.

As one of the authors I'm basically happy with this document, but I will draw attention to one or two issues.

  • Section 2.2 defines the standardID for this standard as ivo://ivoa.net/std/DataLink#links-1.0, followed by the comment "Note this is applicable to endpoints following any version 1.* of the DataLink standard, to avoid backward compatibility problems." In my opinion the backward compatibility problems are not sufficient to justify this choice, and the minor version should be reflected in this standard ID, i.e. it should be "...#links-1.1". This has been discussed in the open github Issue #96, and other authors seem to agree. A fix will require at least an update to the StandardsRegExt record, and also changes in the document to places where the key is referenced, including Section 2.2 and, especially, Section 3.3.1 as well as related example text. This change would amongst other things make it possible for validators to check which minor version they are supposed to be validating against. PatDowler has volunteered to write a Pull Request addressing this issue, but I can have a go if he doesn't.
  • Section 3.3.1 REQUIRES an INFO defining a suitable standardID for links response tables. The example shows such an INFO element as a child of the RESOURCE/@type="results" element, but it's not clear what restrictions there are on the location - does it have to go there, or can it be elsewhere in the VOTable? This should be clarified. If it's not required to be a child of the results resource, the example text in this section should probably be cut down.
This is done in consistency with DALI. DALI seems to insist that INFO elements should be in the primary RESOURCE (name="results"), and that other RESOURCEs may be in the VOTable. We may be more explicit on this. see next PR -- FrancoisBonnarel - 2023-08-18
  • Section 3.2.2: The final sentence says "No other additional parameters or client handling are allowed." I don't understand what is meant here. Should this sentence be removed?
See my answer to Markus above. And next PR. -- FrancoisBonnarel - 2023-08-18
  • As mentioned in Issue #82, the recommended MIME type application/x-votable+xml;content=datalink is using a content-type parameter for VOTable not endorsed in the VOTable standard, which is a bit questionable; but this is not new in this version of DataLink and it will hopefully be addressed in VOTable 1.5, so turning a blind eye is probably OK.

-- MarkTaylor - 2023-05-15
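To make the validator use case mentioned above concrete, here is a minimal Python sketch (Python 3.8+ for the "{*}" namespace wildcard) that looks for the standardID INFO as a child of the results RESOURCE and extracts the version suffix. The sample VOTable, the function name and the INFO attribute layout (name="standardID", value=...) are assumptions for illustration, and the "#links-1.1" value reflects the proposal made in this comment rather than a settled identifier.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal {links} response; the standardID value is an assumption.
SAMPLE = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">
  <RESOURCE type="results">
    <INFO name="standardID" value="ivo://ivoa.net/std/DataLink#links-1.1"/>
  </RESOURCE>
</VOTABLE>"""


def links_standard_id(votable_xml):
    """Return (standardID, version) from the results RESOURCE, or (None, None)."""
    root = ET.fromstring(votable_xml)
    # "{*}" matches any namespace, so the VOTable schema version does not matter.
    for resource in root.findall(".//{*}RESOURCE"):
        if resource.get("type") != "results":
            continue
        # Only direct INFO children of the results RESOURCE are considered.
        for info in resource.findall("{*}INFO"):
            if info.get("name") == "standardID":
                value = info.get("value", "")
                # The text after "#links-" carries the version, e.g. "1.1".
                _, sep, version = value.partition("#links-")
                return value, (version if sep else None)
    return None, None


print(links_standard_id(SAMPLE))
```

A validator could use the returned version to choose which minor version of the standard to check against; if the INFO were allowed elsewhere in the VOTable, the search would need to be widened accordingly.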

Radio Astronomy Interest Group

Solar System Interest Group

Nothing to add.

-- Anne Raugh - 2023-11-10

Theory Interest Group

Time Domain Interest Group

Just one suggestion, perhaps it would be a good idea to explain the notation with braces (e.g. {link}) in the "Conformance-related definitions" section (page 3).

PierreFernique - 2024-11-08

I have added this text to that section (in PR, will merge before REC):

"This document uses curly braces (e.g. {name}) to refer to a named concept
such as a web service endpoint where the text requires a logical name but
the actual names in a service implementing the standard are not restricted."

-- PatrickDowler - 2023-11-11

Standards and Processes Committee


TCG Vote : 2023-05-19 - 2023-06-01

If you have minor comments (typos) on the last version of the document please indicate it in the Comments column of the table and post them in the TCG comments section above with the date.

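| Group | Yes | No | Abstain | Comments |
| TCG | * |  |  |  |
| Apps | * |  |  |  |
| DAL | * |  |  |  |
| DM | * |  |  |  |
| GWS | * |  |  |  |
| Registry | * |  |  |  |
| Semantics | * |  |  |  |
| DCP | * |  |  |  |
| Edu |  |  |  |  |
| KDIG | * |  |  |  |
| Ops | * |  |  |  |
| Radio |  |  |  |  |
| SSIG | * |  |  |  |
| Theory |  |  |  |  |
| TD | * |  |  |  |
| StdProc |  |  |  |  |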



Topic revision: r35 - 2023-11-16 - MarkTaylor
 