Public discussion page for the IVOA Datalink Proposed Recommendation.
Latest version of the IVOA Datalink can be found at:
(Indicate here the links to at least two Reference Interoperable Implementations)
The CADC implementation of DataLink provides one or more downloads per input ID and provides links to prototype AccessData services for most science data. Example invocation:
http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/caom2ops/datalink?ID=caom:IRIS/f212h000/IRAS-25um
The DataLink service descriptor resource is included in VOTables from DataLink, TAP, and SIA services whenever an appropriate identifier column is included in the output. In the output from example above, there is a service descriptor for the prototype AccessData services (currently custom, so there is no standardID).
Update: The CADC DataLink service has been updated to the post-RFC period PR: uses the core vocabulary in the semantics field and never returns an access_url and service_def in the same row.
For SIA and TAP requests, the service descriptor tells the caller how to invoke the associated DataLink service itself. Example invocation of SIAv2 query:
http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/sia/v2query?MAXREC=1&POS=Circle+180+5+0.2
The service descriptor is the second resource in the VOTable.
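As a rough illustration of what a client does with such a response, the sketch below picks service descriptor resources out of a VOTable. It assumes the descriptor is a RESOURCE with type="meta" and utype="adhoc:service" that carries standardID and accessURL PARAMs plus an inputParams GROUP; the sample document, the example.org URL, and the function names are invented for this example and are not taken from the CADC output.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed response: one results resource, one descriptor.
SAMPLE = """<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE type="results">
    <TABLE>
      <FIELD ID="primaryID" name="obs_publisher_did" datatype="char" arraysize="*"/>
    </TABLE>
  </RESOURCE>
  <RESOURCE type="meta" utype="adhoc:service">
    <PARAM name="standardID" datatype="char" arraysize="*"
           value="ivo://ivoa.net/std/DataLink#links-1.0"/>
    <PARAM name="accessURL" datatype="char" arraysize="*"
           value="http://example.org/datalink"/>
    <GROUP name="inputParams">
      <PARAM name="ID" datatype="char" arraysize="*" value="" ref="primaryID"/>
    </GROUP>
  </RESOURCE>
</VOTABLE>"""


def local(tag):
    """Strip an XML namespace prefix of the form {uri}name."""
    return tag.rsplit("}", 1)[-1]


def service_descriptors(votable_xml):
    """Return one dict per RESOURCE whose utype is adhoc:service."""
    root = ET.fromstring(votable_xml)
    found = []
    for res in root.iter():
        if local(res.tag) != "RESOURCE" or res.get("utype") != "adhoc:service":
            continue
        desc = {"inputParams": {}}
        for child in res:
            if local(child.tag) == "PARAM":
                # Top-level PARAMs such as standardID and accessURL.
                desc[child.get("name")] = child.get("value")
            elif local(child.tag) == "GROUP" and child.get("name") == "inputParams":
                # Remember which column (by FIELD ID) feeds each input parameter.
                for param in child:
                    if local(param.tag) == "PARAM":
                        desc["inputParams"][param.get("name")] = param.get("ref")
        found.append(desc)
    return found


descriptors = service_descriptors(SAMPLE)
```

The function does not depend on the descriptor being the second resource; it looks at the utype, so it would also cope with responses that order their resources differently.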
GAVO's server package DaCHS contains an implementation of Datalink both in auxiliary resources for SSAP and SIAP and standalone. The Datalink cores section of the reference manual discusses how operators define datalink behaviour (this includes the definition of data processing services and hence is longer than it would need to be for datalink proper).
At the Heidelberg data center, several services use these facilities, among them ivo://org.gavo.dc/califa/q2/s, ivo://org.gavo.dc/feros/q/ssa, ivo://org.gavo.dc/mlqso/q/s, and ivo://org.gavo.dc/theossa/q/ssa. There are also pointers to datalink documents in the obscore table (try something like select * from ivoa.obscore where access_format like '%datalink%' on ivo://org.gavo.dc/tap).
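For readers who want to run that query from a script, here is a minimal sketch of building a synchronous TAP request URL. The base URL is a placeholder (the registry identifier ivo://org.gavo.dc/tap has to be resolved to an access URL first, which is not shown), and tap_sync_url is a name made up for this example.

```python
from urllib.parse import urlencode


def tap_sync_url(base_url, adql):
    """Build a TAP sync query URL; base_url is the service's TAP root."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    # urlencode takes care of the quotes, the '*' and the '%' in the ADQL.
    return base_url.rstrip("/") + "/sync?" + urlencode(params)


# Placeholder base URL (assumption); substitute the resolved TAP endpoint.
ADQL = "select * from ivoa.obscore where access_format like '%datalink%'"
url = tap_sync_url("http://example.org/tap", ADQL)
```

Rows returned by such a query carry the datalink document URL in access_url, so the client can fetch the links table directly from there.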
Recent versions of SPLAT, a spectral analysis tool, use datalink to discover how to do server-side manipulations of spectra they retrieve. You can try this over SSAP on, e.g., the TheoSSA, califa ssa, and Flash/Heros services. SPLAT can also get the access information to download a spectrum from a Datalink table, which can be tested on the ObsCore results from the CADC service. -- MargaridaCastroNeves - 2014-11-17
(If any, indicate here the links to Implementations Validators)
A validator is being developed by VO Paris.
NOTE: I will use bullets and italics for the official responses (from the editor, pending agreement of the authors so some might change in the next few days)
All along the doc: several [ref] to fix.
page 1. http://www.ivoa.net/documents/DataLink/20140228/index.html is duplicated
page 5. 2nd p. I would stress the fact that the service descriptor resource describes how to query a service. It does not describe in detail, for example, what the service returns.
Page 7. 3rd p. At the end of the paragraph: "[The s]"
Page 7. 4th p. Is this actually the same use case of 1.2.1?
Page 8. 2nd p. At the end of the paragraph: "Providers should be able to describe [...]"
Page 11 Remove block describing param REQUEST, since it is no longer required.
Page 15. 2nd paragraph "This resource [is] typically describes.."
->
Author Response(2014 July 17th) by MarkusDemleitner
The problem is that there are two usages for the service descriptors:
(a) as part of a datalink response, where there is, as you say, access_url in the datalink table as for any other data link;
(b) as part of a DAL response (say, a SIAP table), where you say "go here for postprocessed (cutout, resampled...) data" -- that's the thing with the PARAM name="ID" ref="". In this case, no external access URL is available and hence the GROUP must contain it.
One could stipulate that service descriptors within datalink documents have no accessURL PARAM and the others do, but I'd say that's an implementation complication that's not really warranted. ->
-->
Answer (2014 July 21st) by JoseEnriqueRuiz
Ok, but what to do in the potential case of having different values for the access_url field and the accessURL PARAM? This case is explicitly described in the docs, and as I understand it the proposed solution is to use the access_url value to call the service (or at least it is not clear enough for me). If this is true, I would prefer to use the value given in the accessURL PARAM instead, as would be the case for calling a service after a DAL response (case b).
* Jose Enrique pointed out a difficulty with the "service_def" paragraph: the potential inconsistency (or redundancy) between the access_url in the {links} resource response table and the accessURL in the service descriptor. The accessURL in the service descriptor should be generic. In the response table the access_url is always attached in some way to the dataset, either implicitly or by fixing some id parameter. Maybe we should use an example based on an HTTP GET parameter instead of REST?
Page 16. http://www.ivoa.net/rdf/datalink does not exist. 404
->
Author Response(2014 July 17th) by MarkusDemleitner
In this case, a 404 is almost fine, as the URL really only defines a name space, and in this role there's no requirement that it resolves to anything at all. In our case, though, we promise there's an RDF file there that would let people figure out semantic relationships between the various terms that are there (e.g., a "flatfield" is some kind of "file used in data reduction").
Things still work with the 404, but it'd suck if it were there at REC time. So, is anyone actively working on getting the vocabulary in? Can we discuss it a bit, too?
Is this the same point on which you already asked (or was it François? someone else?) for contributions in terms of which predicates this vocabulary should contain as a minimum set? It was some time ago, I don't remember properly. I think we need it if we want client applications to act smart upon links; leaving it totally free to the providers can make this field quite useless.
Can we build on the existing ivoa.net/rdf vocabularies? E.g. could the 'flatfield' example fit the obs.calib.flat@en UCD vocab concept?
(BTW: should the link of the vocabulary point to http://www.ivoa.net/rdf/Vocabularies/vocabularies-YYYYMMDD/datalink or maybe http://www.ivoa.net/rdf/datalink/YYYYMMDD/datalink or similar?)
-->
Page 16. 3.2.7 content_type and content_length. In the case where the link is a pointer to an ad-hoc service, it may happen that content_type and content_length cannot be defined before calling it with the specific input params chosen by the user. I'm thinking of a service that generates images on-the-fly, where based on the input params the resulting image may be very different in size, and its format may be png, jpeg or fits. Which values should content_type and content_length have in these cases? Blank?
->
Author Response(2014 July 17th) by MarkusDemleitner
Yes to blank/null. I'd argue that is implied in the required=no in Table 1. I seem to remember there once was prose making this a bit more explicit in previous versions; I'm not sure how much I miss it now.
->
Page 16. 3.2.8 content_length I would use unit="Kbyte", much more practical and user-friendly.
->
Author Response(2014 July 17th) by MarkusDemleitner
This is a protocol, and hence users will not usually see the raw table, and hence the unit chosen doesn't really matter; it's up to the clients to format and display this information, if at all. Except with Kbyte we wouldn't leave the realm of 32-bit integers quite as quickly.
Which made me notice we don't define the type of content_length yet. I think we should at least make a recommendation. My first choice would be "long", which in VOTable is a 64 bit integer, and unit="byte" will do fine then [quick: how many 2014 hard drives can you fill before VOTable longs wrap over with the number of bytes stored? Assuming one hard drive weighs 100g, express the mass of that storage cluster in solar masses].
If people are worried about interoperability of such longs and were to advocate int, I'd say unit should be kbyte (decimal prefix) with commercial rounding or so.
For float, it wouldn't really matter and I'd go for byte again.
So, which would it be?
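To put numbers on the int-versus-long question and the quiz above, here is a small back-of-the-envelope calculation. The 4 TB drive capacity and the solar mass value are assumptions added for this example; the 100 g per drive comes from the comment above.

```python
# Largest values representable in VOTable "int" (32 bit) and "long" (64 bit).
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

# With unit="byte", an int tops out at about 2.1 GB; with unit="kbyte"
# (decimal prefix) the same int reaches about 2.1 TB.
int_limit_bytes = INT32_MAX
int_limit_kbyte_in_bytes = INT32_MAX * 1000

# The quiz: how many drives does it take to overflow a 64 bit byte count?
DRIVE_BYTES = 4e12        # assumption: a 4 TB drive, typical for 2014
DRIVE_MASS_KG = 0.1       # 100 g per drive, as stated above
SOLAR_MASS_KG = 1.989e30  # assumption: nominal solar mass

drives = INT64_MAX / DRIVE_BYTES
cluster_mass_kg = drives * DRIVE_MASS_KG
solar_masses = cluster_mass_kg / SOLAR_MASS_KG

print(f"int/byte limit:  {int_limit_bytes / 1e9:.2f} GB")
print(f"int/kbyte limit: {int_limit_kbyte_in_bytes / 1e12:.2f} TB")
print(f"drives to overflow a long: {drives:.3g}")
print(f"cluster mass: {cluster_mass_kg:.3g} kg = {solar_masses:.3g} solar masses")
```

Under these assumptions the answer is roughly 2.3 million drives, about 230 tonnes, or on the order of 1e-25 solar masses, so long/byte leaves ample headroom, whereas int/byte already fails for files above about 2 GB.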
->
-->
Group member response (2014 July 18th) by MarcoMolinaro
I'd vote for long/byte. If we're to use kilobytes for the unit, however, we have to decide between: kbyte and Kibyte, at least to follow VOUnits (since it's a recommendation).
->
Author Response(2014 July 17th) by MarkusDemleitner
With not-found situations the server may want to add some explanation ("This identifier is not from this site" versus "We seem to have lost this file"). We should at least provide it with a means to do this, hence the NotFoundError.
Whether it's a good idea to mandate at least one row per ID (up to the match limit) and have errors in every case may not be quite as clear-cut. I have to say I'm on the side of one row per ID, but I don't have terribly strong arguments for that. Well, of course there's the general rule that silent failures are bad. Except when they aren't and silent failures are what preserves what's left of the user's sanity. Hm. No easy answer.
Note from PatrickDowler: In the previous version these messages were changed from *Error to *Fault (following VOSpace style); I failed to note this change in the change log, but it was in the first PR. A typical implementation of this would be to declare exceptions with these names and in Java (at least) Error has a very specific meaning such that declaring a normal application exception with that name was very wrong.
Answer (2014 July 21st) by JoseEnriqueRuiz
Ok, though some could argue that DAL services in general do not provide this kind of message for "zero records found" responses in multiple-valued param queries. I guess they simply skip to the next value.
If I follow your arguments, I would say we could have different explanations for errors found when creating different links for the same ID. (i.e. some services not designed to work with a specific dataset)
->
Author Response(2014 July 17th) by MarkusDemleitner
No -- and you're right, that should be made clearer at this point.
Should we add use="required" to PARAM tags describing mandatory input params?
->
Author Response(2014 July 17th) by MarkusDemleitner
use="required" isn't available in VOTable. And I'd argue that's not a big loss anyway, as typically relations between parameters are more complex than that ("if you give RA_MAX, you cannot give any of PIX_*"). We know how to say these complex things in PDL, and I'd hope in a future version we can add VO-DML-based PDL annotation to the PARAMs that would be able to express this kind of thing.
->
--> Answer (2014 July 21st) by JoseEnriqueRuiz
Ok, fair enough
-->
I would add one example of ref="columnID" (other than the obs_publish_id) to one or several PARAM tags describing an input param whose value is taken from the tabular data present in
I would stress the fact that the Service Descriptor syntax also allows providing default values, which facilitates its use for a client.
Page 24. 3rd p. (it is related to photometric or flux calibration).
->
Author Response(2014 July 17th) by MarkusDemleitner
I'm not sure I find this convincing -- for one, most of the services described by datalink groups probably will put out data that's not obviously tabular in nature (i.e., images and such). For two, the output column metadata in tabular data should really, really be contained in the response (as in VOTable and to some degree FITS binary), which is where the clients should get it from.
For discovering services by output table structure ("which services return normalised fluxes?"), that's admittedly not good enough, but that's a Registry problem (which I still don't consider terribly relevant to *data*link).
->
-->
Answer (2014 July 21st) by JoseEnriqueRuiz
For one, in my view, this is not a reason to forbid this optional use. I could say many DataLink services will not provide links to adhoc services, and this does not forbid the use of the adhoc services description syntax in the DataLink response when it is needed. For two, yes, VOTables should be accurately self-described, though I do not see why this should go against describing them also in services as their output.
I see DataLink as a very powerful way to discover generic adhoc services not present in the VO Registry. In this sense, I find DataLink somehow related to service discovery usecases. This is why we are talking about things like [use=required] (present in VOSI-capabilities but not in VOTable), VO-DML-based PDL annotation, and descriptions of service outputs.
In my opinion, the description of a service would benefit from a syntax that also allows the description of its outputs (going beyond the human-readable text in the description field of the DataLink response), and for tabular outputs the solution is quite straightforward and simple, so why forbid it?
We have in DALI the MAXREC=0 mechanism to provide a description of service outputs, where the service is not required to execute any specific request (just a means to provide a simple hard-coded description of the tabular outputs). I guess this mechanism has been adopted and approved because there are use cases behind it, and I guess they may also be valid for DataLink.
-->
In the same spirit, I think we should agree on an optional mechanism to provide a detailed description of the number and nature of the links given by the datalink service (rows of the response VOTable), in the case where this response is always the same for any ID.
->
Author Response(2014 July 17th) by MarkusDemleitner
This sounds interesting and the first requirement that might necessitate a registry extension for datalink. I don't think anyone is wild about having to define one, and the document has been careful not to introduce some dependency on it, but if we collect use cases that call for it, it's probably not prohibitively hard to do, either. What use cases do you have in mind that would be solved by such a description?
->
-->
Answer (2014 July 21st) by JoseEnriqueRuiz
Well, I'm not going that far.. (registry extension) I think this could be solved just adopting the
For example, consider three different data providers as three SIA services. One person would like to know that for the first SIA service the complementary DataLink provides a set of links with progenitors and provenance metadata, for the second SIA service the proposed DataLink service has a very different nature, providing cutouts and one specific analysis service, while the third DAL service offers only related bibliography through a different DataLink service. The different natures of these DataLink services could be known in advance, before actually calling the DataLink services.
The specific nature of complementary DataLink services should not be at all restricted or categorised, just think on any potentially accessible resource in the web that could be linked, even outside the VO-world: related bibliography (ADS), SIMBAD or NED objects in the FoV, non-VO services like those coming from SDSS or SkyView, or even simple doc-like HTML pages..
-->
1) General comment: Despite the introduction, the difference between the two datalink methods has not been clear to me. Both are called "datalink" and it is difficult to understand the difference between two methods with the same name. I suspect that there were various author points of view and no definitive choice. Explaining why and when we have to use one method or the other would be helpful for future implementors.
->
Author Response(2014 July 21st) by MarkusDemleitner
I'm not quite sure what you mean by "two methods" -- standalone datalink vs. service descriptor in DAL? If so, I wouldn't call that two methods, but we should obviously do a better job laying out how things are supposed to work in both access scenarios. Do you have (possibly high-level) suggestions on how we could improve the text?
->
-->
Answer (2014 July 21st) by PierreFernique
I am just citing the first sentence of the introduction, "DataLink defines two distinct but related data-linking mechanisms" (well, mechanisms... methods...): "service descriptor resource" and "links". And after this introduction, it is not clear to me where and when we have to choose the first "mechanism", or the other one, or both. Maybe a simple example could help the reader.
-->
Response (2014 July 27th) by FrancoisBonnarel
I am convinced that putting the two "linking" mechanisms at exactly the same level does not clarify the whole thing. Clarifying this requires some changes in the introduction. Norman also pointed this out before the interop and I supported the idea of changing the introduction at that time. The introduction should start by introducing the {links} resource. A few examples of usage should be given. The DataLink name should be restricted to this resource. And service descriptor is ..... "Service descriptor".
The service descriptor should be introduced historically like Pat did, in his response on the list, starting from the need to declare the DataLink service in a DAL query response. A few other examples could be given for service descriptors to illustrate where they are useful.
2) Technical questions: Concerning the second method (with the PARAM definitions by GROUP): The possibilities opened by this method are very promising. At first view, it seems simple and flexible, and very useful for building associated user forms on the fly.
a) However, I do not see how to describe REST links, or any URL for which the prefix depends on the parameter values. Maybe I'm wrong, but it seems that this method can only build URLs on this template: http://static_url_prefix?param1=val1&param2=val2... However, it is quite common that some servers provide their collections using this basic URL template: http://host/variable_path/datasetID (VOSpace links?). It would be great if such URLs could also be described by this datalink method.
->
Group member Response (2014 July 18th ) by MarcoMolinaro
I think this is a good point, and maybe it doesn't affect only Datalink, query interfaces from most of the protocols work in the HTTP-GET way. I don't see how this can be answered now, but maybe we can take it into account for future revisions at a higher level than the simple protocol.
->
->
Author Response(2014 July 21st) by MarkusDemleitner
Hm. I'm not wild about this -- much as I appreciate good-looking URLs, I think allowing this is not going to make a better standard. In particular, I'd claim that even if people actually ran such services already, they'd have to write wrapper code anyway in order to make it datalink compliant. That wrapper code would have to do the conversion from IVOID (which should be what's passed in through ID) to their local datasetID, and then going from HTTP parameter to URL part should be straightforward (i.e., of the order of two lines of code). I don't think a complication of the standard is warranted there.
->
-->
Answer (2014 July 21st) by PierreFernique
I have to say that I'm not a very keen supporter of the REST paradigm, but as the document seems to follow this recommendation (introduction page 5), and as VOSpace is RESTful, it is surprising that links to REST servers will not be supported "natively" by the protocol. And in any case, a wrapper is generally a "last resort" solution, rarely implemented directly on the server side, and very badly maintained in the long term (my experience).
More generally, as I partially said, the GROUP mechanism does not take into account any variable URL prefix (before the '?'), nor HTTP parameters without any value (flags) or combination of values in the same parameter. It will be a potential issue.
A few existing URLs possibly used in a datalink response for which the GROUP mechanism won't work (red fields)...
1) any VOSpace URLs 2) Dedicated "static" HTTP trees
http://www.cadc.hia.nrc.gc.ca/data/pub/HSTCA/u21x0102t_prev.jpg
Note: The above URL is to a custom CADC service for archive data delivery, which has nice pretty URLs. In our DataLink prototype you would see this URL as an access_url value in the links table and would not have to construct it. -- PatrickDowler - 2014-09-22
http://alasky.u-strasbg.fr/SDSS/DR9/color/Moc.fits
3) A lot of cone search servers:
http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/sia/CFHT/query?POS=83.633083,22.0145&SIZE=0.2333&FORMAT=image/fits&VERB=2
http://wfaudata.roe.ac.uk/ukidssdr9-siap/?POS=83.633083,22.0145&SIZE=0.233&FORMAT=image/fits
http://vo.imcce.fr/webservices/skybot/skybotconesearch_query.php?&-ra=83.633083&-dec=22.0145&-size=28.0,28.0&-mime=votable&-out=basic&-loc=500&-search=Asteroids+and+Planets&-filter=120+arcsec
4) Dedicated services:
http://alasky.u-strasbg.fr/footprints/cats/vizier/B/DENIS?product=MOC&nside=512
5) TAP services :
http://geadev.esac.esa.int/tap-dev/tap/run/tap/sync?REQUEST=doQuery&LANG=ADQL-2.0&QUERY=SELECT+TOP+1000+*+FROM+gums.mw+WHERE+1%3DCONTAINS%28POINT%28%27ICRS%27%2C+alpha%2C+delta%29%2C+CIRCLE%28%27ICRS%27%2C+80.89417%2C+-69.75611%2C+0.2333+%29%29
Author response (2014 July 27th) by FrancoisBonnarel
Service descriptors and RESTful services: Pierre is right that the current version of the description doesn't allow describing all sorts of services, for example RESTful interfaces. Including metadata in the service descriptor to describe template roots or paths in RESTful URLs is probably out of the scope of the first version (but maybe not of the second one, as Pat said in his response). However, right now we should at least add a "URL type" PARAMETER in the descriptor, beside . I see at least three possible values for this param: HTTP GET, RestFull, Mixed. The current version of the draft only details the "HTTP GET" case with the "InputParams" GROUP.
Group member Response (2014 July 18th) by MarcoMolinaro
I'd limit it to recommending proper HTTP encoding, that should be sufficient for HTTP GET requests ('&' is not the only possible separator, ';' can also be used for nested GET...I don't see where we can use this latter case, but...)
->
Author Response(2014 July 21st) by MarkusDemleitner
What you're saying is that we should say, at some suitable point:
Parameters not used in a service invocation should not be passed at all, rather than with empty values.
I'd somewhat have expected that to be implied -- do you really think people would pass what in effect are empty strings? I'm not opposed to the prose, just a bit surprised that it might be necessary.
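The rule suggested here (leave unused parameters out rather than sending them empty) is easy to apply on the client side. The following is only a sketch of that one rule; build_query is a made-up helper name, the example.org URL is a placeholder, and the parameter names and values are borrowed from the footprint service URL quoted further down.

```python
from urllib.parse import urlencode


def build_query(access_url, params):
    """Append only the parameters that actually carry a value."""
    used = {k: v for k, v in params.items() if v not in (None, "")}
    if not used:
        return access_url
    # Respect access URLs that already contain a query part.
    separator = "&" if "?" in access_url else "?"
    return access_url + separator + urlencode(used)


# INST is left empty by the user, so it is dropped from the request.
url = build_query(
    "http://example.org/footprints",
    {"POS": "83.633083,22.0145", "SIZE": "0.4666", "INST": "", "LEVEL": "Best"},
)
```

A service that treats a present-but-empty parameter as a flag would need a different rule, which is the case raised in the reply below.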
->
--> Answer (2014 July 21st) by PierreFernique
I totally agree that an HTTP parameter with an empty string is rare, but mainly because, in this case, the parameter is just fully removed ("&param=" is removed). I suggested being able to support this common case.
http://masthla.stsci.edu/hla/Footprints/aptfootprint/Footprints.svc/Footprints?POS=83.633083,22.0145&SIZE=0.4666666666666667,0.4666666666666667&INST=&LEVEL=Best
-->
3) My wish. In IVOA we frequently use VOTable as a container (SIA, SSA, TAP, ObsTAP, and now Datalink), but without a magic code or any signature to recognize that this VOTable is a Datalink result, or a TAP result or whatever. And concretely it is a nightmare for clients that support several of these protocols simultaneously. I recommend introducing in our VOTable protocols (at least the new protocols) a signature which could be a simple INFO tag (ex:
->
Group member Response (2014 July 18th) by MarcoMolinaro
I agree on this wish. S*AP, TAP (e.g.) are interrogated and answer directly, but Datalink opens up the scenario. In principle a client will always be able to know in advance what type of service it is querying (Datalink provides standardid), but a specific signature can turn out to be useful.
->
->
Author Response(2014 July 21st) by MarkusDemleitner
Although I'm always a bit nervous when we put the same information in two places (in this case, the content-type header of the HTTP response and then later the HTTP payload), I think I like that a lot, mostly because I expect people might store datalink responses and re-use them later, when there's no HTTP header any more.
So, I'd say we should have a new subsection in Sect 3 (I'd say it should become 3.2, but if people are worried about renumbering subsections at this late stage, 3.5 would be ok with me, too):
3.2 Protocol declaration
To help clients dispatch between various internal recipients of VOTables even in the absence of HTTP header information, datalink responses serialised in VOTables MUST contain an INFO element with a name SERVICE_PROTOCOL and a content of "datalink" as an immediate child of the VOTable element; the strings are interpreted case-sensitively. Services SHOULD declare the version of this document they conform to in the value attribute of the INFO element.
VOTable responses from datalink 1.0 services would thus contain:
<INFO name="SERVICE_PROTOCOL" value="1.0">datalink</INFO>
(This is fashioned after SSAP) What does everyone think? Should something like this get into VOSI? The "dispatch according to content"-thing appears to be something quite frequent, and offering a general, non-heuristic method to do it sounds like good sense to me.
<INFO name="SERVICE_PROTOCOL">ivo://ivoa.net/std/DataLink#links-1.0</INFO>
That enables us to use the same mechanism to refer to new types of services we haven't invented yet.
<INFO name="SERVICE_PROTOCOL">http://wiki.ivoa.net/twiki/bin/view/IVOA/NotInventedYet20140926</INFO>
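To show how a client might use the proposed declaration for dispatching, here is a minimal sketch. It only implements the wording proposed above (an INFO named SERVICE_PROTOCOL as an immediate child of the VOTABLE element, compared case-sensitively); the function name and the sample documents are made up, and the proposal itself was still under discussion at this point.

```python
import xml.etree.ElementTree as ET


def service_protocol(votable_xml):
    """Return (content, value) of a top-level SERVICE_PROTOCOL INFO, or None."""
    root = ET.fromstring(votable_xml)
    for child in root:  # immediate children of VOTABLE only
        name = child.tag.rsplit("}", 1)[-1]  # ignore any XML namespace
        if name == "INFO" and child.get("name") == "SERVICE_PROTOCOL":
            return (child.text or "").strip(), child.get("value")
    return None


# Hypothetical minimal responses for illustration.
DECLARED = (
    '<VOTABLE version="1.3">'
    '<INFO name="SERVICE_PROTOCOL" value="1.0">datalink</INFO>'
    '<RESOURCE type="results"/>'
    "</VOTABLE>"
)
NESTED_ONLY = (
    '<VOTABLE version="1.3"><RESOURCE type="results">'
    '<INFO name="SERVICE_PROTOCOL" value="1.0">datalink</INFO>'
    "</RESOURCE></VOTABLE>"
)

declared = service_protocol(DECLARED)
nested = service_protocol(NESTED_ONLY)
```

With the standardID-style variant the same function returns the identifier as the content and None as the value, so a client could dispatch on either form.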
Question (2014 July 21st) by JoseEnriqueRuiz If "empty value" means "ALL values", one question arises here: how does one make a query to gather those datasets with param=empty/NULL/blank? I do not know if there are use cases for this :-/
"no HTTP header": That's the point. Also, the client may use an HTTP API that does not provide easy access to the HTTP header fields. Markus's INFO tag: sounds good.
-->
* Editor Response (2014 July 21st) by PatrickDowler
I think the idea of describing service output in the DataLink service descriptor is interesting and it is something I thought about. The current use cases revolve around two things:
1. The {links} response solves a variety of discover->download issues such as (i) multiple files per dataset, (ii) alternate representations like previews, (iii) related resources (other sources of metadata, services that can act on the data).
2. The service descriptor was originally conceived to solve the problem of getting from a discovered ID (eg in a TAP or SIAv2 query response) to the {links} resource itself without having to resolve the ID via registry lookup....
We quickly realised that with minimal additional metadata we could use the same mechanism to go from the discovered values to any service that took them as input (the 3 service params and the inputParams)...
We also realised that the {links} response, since it is a votable, could also use service descriptors to describe services (typically lower level access services). Of course, one can put such links directly in the data discovery response if the cardinality of their discovered records and services matches (eg if one identifier in the discovery response can be used to call a service, then you can tell the client about it). That's the whole thing about links: you can add them anywhere you have an identifier that can be used someplace else! But that was not new spec, just new usage.
But, this is all aimed (currently) at forward-linking and how to describe the call to the service. We have not tried to describe what the service will actually do nor the response it might create. For now (1.0) I don't think we need it for the use cases at hand. Further, since we probably do want to add it later, I feel strongly that we should not add any simplistic form that we might regret. As has been mentioned elsewhere, PDL, VO-DML, and several other new-ish things cover some common ground and we should take the time to consider them and prototype.
I think that means adding description of the output in DataLink-1.1
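The forward-linking step described in this response (take a value from a discovered row and feed it to the service named by the descriptor) can be sketched as follows. The dictionary layout for the descriptor, the column ID primaryID, and the example.org access URL are all assumptions made for this example; only the ID value is taken from the CADC invocation near the top of the page.

```python
from urllib.parse import urlencode


def call_url(descriptor, row):
    """Build the service call for one result row.

    descriptor: {"accessURL": str, "inputParams": {name: {"ref": ..., "value": ...}}}
    row: mapping from FIELD ID to the cell value of that column in this row
    """
    params = {}
    for name, spec in descriptor["inputParams"].items():
        if spec.get("ref"):
            # Value comes from the referenced column of the current row.
            params[name] = row[spec["ref"]]
        elif spec.get("value"):
            # Fixed or default value supplied by the descriptor itself.
            params[name] = spec["value"]
    return descriptor["accessURL"] + "?" + urlencode(params)


# Hypothetical descriptor and row, shaped as described above.
descriptor = {
    "accessURL": "http://example.org/datalink",
    "inputParams": {"ID": {"ref": "primaryID", "value": ""}},
}
row = {"primaryID": "caom:IRIS/f212h000/IRAS-25um"}
url = call_url(descriptor, row)
```

The same function works whether the descriptor points at a {links} resource or at some other service that accepts the discovered values, which is the generalisation described above.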
* September 10th 2014: a compilation of Discussion among authors on top of FrancoisBonnarel remarks (July 27th)
> 2 ) service descriptors and RESTFULL services : Pierre is right that
> the current version of the description doesn't allow to describe all
> sort of services, for example REStfull interfaces. Including in the
Answer by MarkusDemleitner
Well -- that's for the service descriptor, and this is obviously only relevant when a descriptor is embedded in DAL responses. In datalink responses you're free to pass whatever URLs you fancy.
For DAL responses, the things described are in all likelihood post-processing services (e.g., cutout, recalibration), and I doubt many services exist that do these kinds of things with a URL schema violating what the service descriptor can do.
On the other hand, you're explicitly free to describe a datalink service itself in your service descriptor -- if your data structure is sufficiently complex, that's what I'd say you should do; that's what datalink is for: decoupling the service interface from the actual representation.
Which is to say: I don't believe we should make things more "flexible" here. Flexibility is a liability for implementations, and a liability for interoperability. I think we should have a much stronger case before adding features, much less just announcing them.
Answer to MarkusDemleitner by LaurentMichel
I agree that it is not the right time to start a discussion about a general mechanism for building URL templates, but the draft cannot ignore the existence of RESTful encoding. The strong reason for that is that REST is used by at least 2 VO standards likely to be involved in datalink responses: VOSpace and UWS. This point could be addressed either by adding a URLType as FB proposed (see above, July 27th) or by making the standardID mandatory even for free services. I prefer this second solution since it avoids possible inconsistencies with URLType.
> 3 ) Jose Enrique pointed a difficulty with the "service_def" paragraph.
> The potential inconsistency (or redundancy) between the access_url in
> the {links} resource response table and the accessURL in the service
> descriptor.
> The accessURL in the service descriptor should be generic.
Answer by MarkusDemleitner
I'm not sure I understand what you're saying here -- I believe we essentially have three options:
(1) force access_url (table) == accessURL (param)
(2) accessURL given, access_url NULL
(3) access_url given, accessURL not in the service descriptor within a datalink response.
Although it sucks to have the same information in two places, (1) from my implementation experience is the most straightforward, and I believe we should simply mandate that. (2) and (3) I could live with. What I'm firmly against is implying that there may be situations in which access_url!=accessURL -- that way lies madness.
> By the way, as a kind of shortcut, the "service descriptor resource"
Answer by MarkusDemleitner
Hm -- I don't like the "By the way" here, even in the introduction. I agree, though, that saying there are two distinct but related data-linking mechanisms probably is confusing.
>However right now in the descriptor we should at least add a "URL type" PARAMETER in the descriptor, beside . I see at least three possible
>values for this param: HTTP GET, RestFull, Mixed. Current version of the draft only details the "HTTP GET" case with the "InputParams" GROUP.
Answer by MarkusDemleitner
As I said, I'm against over-generalising the protocol, and in particular, building in things that appear to claim we support something that might come in a future standard -- or might, as so often in VO standards, not.
-> Discussion LaurentMichel / MarkusDemleitner
Laurent: I still don't think that supporting RESTful URLs can be considered an over-generalisation.
Markus: Hm -- it's URL templating, something we've never really tried in the VO as far as I'm aware, and something that has quite a few opportunities to mess up. After all, there are many, many URL schemes, and most I've seen are fairly funky...
Laurent: I insist a little bit just to keep open the possibility to quickly address VOSpace records with Datalink.
Finally, replacing http://server/service?ID=paramValue&action=download with something like http://server/service/$ID/download, where $ID is replaced with paramValue, does not look that complex.
I've no ambition to template any sort of URL, just GET-HTTP and REST.
Markus: Hm -- the devil usually lies in the details -- for instance, how is $ID-pathcomp to be interpreted? $ID_pathcomp? Should the value of the ID param still be passed as an HTTP parameter? What's to happen if a parameter referenced in the URL is not a string? Are there any special rules for quoting these things? And so forth.
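To make the two URL styles and Markus's questions concrete, here is a minimal Python sketch. It is only an illustration: the helper names get_style_url and rest_style_url are hypothetical, the use of Python's string.Template as the templating syntax is an assumption (nothing in the draft defines one), and percent-encoding every value as a single path component is just one possible answer to the quoting question.

```python
from string import Template
from urllib.parse import quote, urlencode


def get_style_url(access_url, params):
    """Classic HTTP GET style: parameters go into the query string."""
    return access_url + "?" + urlencode(params)


def rest_style_url(template, params):
    """Hypothetical REST style: parameters are substituted into the path.

    Assumption: every value is converted to a string and percent-encoded
    as one path component, so '/' and '?' inside a value are escaped.
    """
    safe = {name: quote(str(value), safe="") for name, value in params.items()}
    return Template(template).substitute(safe)


print(get_style_url("http://server/service",
                    {"ID": "paramValue", "action": "download"}))
print(rest_style_url("http://server/service/$ID/download",
                     {"ID": "ivo://example.org/data?x"}))

# The delimiter ambiguity: with this particular templating syntax,
# "$ID-pathcomp" refers to ID, but "$ID_pathcomp" refers to a
# different (undefined) parameter named ID_pathcomp.
print(Template("$ID-pathcomp").substitute(ID="x"))
try:
    Template("$ID_pathcomp").substitute(ID="x")
except KeyError as missing:
    print("undefined template parameter:", missing)
```

Another templating syntax would resolve these cases differently, which is the reason the rules would have to be written down before clients could rely on them.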
Laurent: You are right, that is why the URL building mode must be specified somewhere else. This point could be tricky and cannot be sorted out for this document. That is why I suggest (with François) reserving a field specifying how the URL must be constructed, and working in GETHTTP mode until a proper way to build REST URLs is specified.
Markus: Meaning: If we write this into a standard (and I admit there seems to be a use case), we should have implementations first to see what can go wrong.
Laurent: A datalink pointing onto a VOSpace, I can do that.
Laurent: The document must have a little room for this possibility even if the way to do the URL encoding is not achieved in the first version.
Markus: Could you live with the language on telling clients to ignore services with an empty accessURL?
Laurent: In general, I'm suspicious about the idea of triggering a client action from the absence of a parameter.
Markus: ...but I'd say in this case there's not much that can go wrong -- implementors just need to be aware that empty access URLs may turn up in the future. Given that they are in no risk of erroneously operating a service they have no access URL for, at least there won't be silent failures either way.
Laurent: right
Markus: That way, we can later use that as a sentinel that more complex URL building mechanics will be required, and nothing will break.
Laurent: I definitely prefer to say somewhere that the URL is HTTP-GET, REST or something else. From an implementer's point of view, the right place to state that is the standardID. It is already used to know how to build URLs for VO services and its scope can be extended to non-VO URLs. If we agree on that, we can postpone the definition of the new supported standardIDs, and DATALINK will work as it is already defined by the 1.0 standard.
Markus: I wouldn't have a problem with that, either. But someone would have to write some prose urging client authors to check the standard id (case-insensitively, and possibly ignoring minor versions if they don't care).
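As a rough illustration of the check Markus describes, the following Python sketch compares two standardID values case-insensitively and, optionally, ignores the minor version. The function name same_standard is hypothetical, and the assumption that the version appears as a trailing "-MAJOR.MINOR" suffix (as in ivo://ivoa.net/std/DataLink#links-1.0) is mine; identifiers that follow another convention would need different handling.

```python
import re


def same_standard(found, expected, ignore_minor=False):
    """Compare two standardID strings case-insensitively.

    Assumption: a version, if present, is a trailing "-MAJOR.MINOR"
    suffix; with ignore_minor=True only the major number is compared.
    """
    a, b = found.strip().lower(), expected.strip().lower()
    if ignore_minor:
        a = re.sub(r"(-\d+)\.\d+$", r"\1", a)
        b = re.sub(r"(-\d+)\.\d+$", r"\1", b)
    return a == b


print(same_standard("IVO://ivoa.net/std/datalink#links-1.0",
                    "ivo://ivoa.net/std/DataLink#links-1.0"))
print(same_standard("ivo://ivoa.net/std/DataLink#links-1.1",
                    "ivo://ivoa.net/std/DataLink#links-1.0",
                    ignore_minor=True))
```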
Laurent: As far as I know, there is no standard ID referring to an external URL (e.g. ivo://ivoa.net/nostd/url). I've no idea what is legal to do here. As I said, the first version of the protocol would just have to mention the role of the extended standardID and state that GETHTTP is taken by default. Meanwhile I'll have a look at a possible formalism for non-standard standardIDs.
->
Back to initial MarkusDemleitner answer to FrancoisBonnarel for URLType proposal
If we believe there's an actual place for URL templating, then we should say (provided we go for access_url==accessURL above): "If accessURL is NULL or missing, clients must ignore the service definition. This is an extension mechanism that might, for instance, be used for more complex ways of URL generation in future standards."
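On the client side, the "ignore descriptors without an accessURL" rule could look roughly like the Python sketch below. It assumes service descriptors are RESOURCE elements with utype="adhoc:service" carrying a PARAM named accessURL, as in the examples on this page; the function name usable_descriptors and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def usable_descriptors(votable_xml):
    """Map descriptor ID to accessURL, skipping descriptors whose
    accessURL is missing or empty (the proposed sentinel)."""
    usable = {}
    for elem in ET.fromstring(votable_xml).iter():
        if local_name(elem.tag) != "RESOURCE":
            continue
        if elem.get("utype") != "adhoc:service":
            continue
        params = {child.get("name"): child.get("value")
                  for child in elem if local_name(child.tag) == "PARAM"}
        access_url = (params.get("accessURL") or "").strip()
        if access_url:
            usable[elem.get("ID")] = access_url
        # else: silently ignore; a future standard may define how to
        # build the URL for such a descriptor.
    return usable


SAMPLE = """<VOTABLE>
  <RESOURCE type="meta" utype="adhoc:service" ID="plain">
    <PARAM name="accessURL" datatype="char" arraysize="*"
           value="http://example.org/service" />
  </RESOURCE>
  <RESOURCE type="meta" utype="adhoc:service" ID="future">
    <PARAM name="accessURL" datatype="char" arraysize="*" value="" />
  </RESOURCE>
</VOTABLE>"""

print(usable_descriptors(SAMPLE))
```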
If we really went for URL templating later, we could then say something along the lines of "for your crazy URL scheme, define accessURL-template and make accessURL null. Then do $FOO in your template yadda...".
But while we're talking: I have a really bad feeling about passing this on without more client prototypes, at least. We have something in SPLAT that we'll need to review against the current spec, in particular as regards telling datalink from data processing services (I seem to remember Margarida had an issue there, but she's on vacation right now). Who else has clients running? Non-trivial ones, even? If nobody has, how can we make them happen?
[Disclosure: Shortly before Madrid, I've started a bit of javascript that would provide a SAMP-enabled datalink+data processing client in the Browser; but when it didn't finish for Madrid for this reason and that, I let it slip again. If someone felt like this is a good idea, I'd gladly pull it out again].
Answer by PatrickDowler to the whole discussion
-- 1. parameterised URLs are not going to help you use VOSpace -- there is a lot more to RESTful service invocation than URL templates.
-- 2. the more important thing we are missing is any way to describe authentication requirements as auth pretty much always means different URLs (usually different scheme and/or path).
These two points are related as they both come up when one is trying to deal with RESTful web services: it is with restful services that the access_url in the table will usually contain something different (longer) than the accessURL in the service descriptor. One solution would be to have the {links} table contain either an access_url or a service_def but not both; in the latter case, the client would have to use the service_def to find a descriptor resource and construct the URL. This would get rid of a redundancy that leads to the conflict in #3 below, and that is one that comes up with REST services like VOSpace.
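The "either an access_url or a service_def, but not both" rule can be sketched from the client's point of view as follows. This is an illustration only: rows are modelled as plain dictionaries, the name resolve_link is hypothetical, and any other kind of row the final document may allow is not handled.

```python
def resolve_link(row, descriptors):
    """Decide how to use one row of the {links} table.

    row         -- dict with optional 'access_url' and 'service_def' keys
    descriptors -- dict mapping descriptor IDs to descriptor objects
    """
    access_url = row.get("access_url")
    service_def = row.get("service_def")
    if bool(access_url) == bool(service_def):
        raise ValueError("expected exactly one of access_url or service_def")
    if access_url:
        # Direct link: use the URL as given.
        return {"kind": "direct", "url": access_url}
    # Service link: the URL has to be constructed from the descriptor.
    return {"kind": "service", "descriptor": descriptors[service_def]}


descriptors = {"proc": {"accessURL": "http://example.org/proc"}}
print(resolve_link({"access_url": "http://example.org/file.fits"}, descriptors))
print(resolve_link({"service_def": "proc"}, descriptors))
```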
As for REST services, this is quite a complex issue and I think we can't do much better than give the descriptor with standardID and accessURL (for that capability) and rely on the client knowing all that the standardID entails. Even then, it isn't so simple...
For vospace, one would have to convey the standardID and accessURL for the {nodes} resource and the standardID and accessURL for the {transfers} resource. These have different semantics and the client needs both in principle. So, in this case maybe the right way to use "datalink" is to put two rows in the {links} table, each with the same vos URI, and with service_def values to indicate the two capabilities being described. One could only describe the normal VOSpace params (eg for {nodes} that is limit, uri, detail, and view, iirc). There is no feasible way to convey the calling semantics.
Now, to actually describe a REST service, one needs something fancier, eg WADL or things like that as discussed in GWS. The problem with those, and with PDL, is that we cannot embed them inside VOTables. We could/should discuss (soon) whether VOTable needs to allow a kind of resource that can embed such descriptors, or such things need to be designed to be embeddable in VOTable in a formal way... but that is not something for DataLink-1.0 ... for now, we should give advice and improve the vospace example.
It would be nice to be able to describe multiple capabilities in a single service_def. One thing that is missing from everywhere except VOSI capabilities is a way to describe the authentication requirements of an interface. In VODataService, I recall that the cardinality is:
1 service ... N capability (standardID) ... M interface (accessURL)
whereas our service descriptor only supports:
1 service ... 1 capability (standardID) ... 1 accessURL
and our accessURL doesn't have any additional metadata; the biggest thing missing for practical use is info about authentication, but we'll have to live with that until 1.1
An addendum regarding URL templating: the point of templating is (generally) to let clients construct a URL which they know will go direct to a resource. For that you need things like WADL, which aren't necessarily pretty, and which introduce yet another document and yet another technology. However you don't necessarily need templating, if your single/starting {link} URL produces a document which points to the other ones. If a {link} document says "_here_ is the image, here is the background, ..." then you don't need templates -- the client just needs to 'follow its nose', in exactly the same way that a human does on an HTML page. This is the 'Linked Data' idea, and is easy. -- NormanGray, 2014-10-21
WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or not the Standard.
IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.
Requires a second reference implementation and any implementation validators to be described above. Otherwise approved.
-- SeverinGaudet - 2014-10-02
The document is reasonably clear and readable. I do have a few concerns that I'd like to discuss before approval, recognizing with apologies that some of these would have been more constructive earlier in the process.
Reference Implementations
Creating reference implementations helps ensure that a spec is unambiguous, complete, practical and interoperable. With those benefits in mind, I'd love to see some enhancements to the reference implementations and/or the descriptions of them with respect to this spec.
In Heidelberg we discussed mechanisms to provide a more direct and efficient access to certain products, with the primary example being preview images. Basically, given a data discovery response, a client should be able to quickly know URLs to preview images for any or all of the result rows. Ideally these previews could be offered in various sizes to suit a client's needs (e.g., thumbnail for showing many at a time, medium for showing several on a page, and large for a browser-friendly higher resolution display of a single preview). Scrolling through the results of a query in the MAST data discovery portal shows how a client might make use of this: http://mast.stsci.edu
I'm disappointed that this didn't end up being one of the motivating use cases. I realize it may be too late in the game to advocate for this again, but in reading the discussions above about templating URLs, etc., it seems like there may be room to work through some discussion on this.
What I'm actually hoping is that the use case I mentioned is actually supported by this document, and that I just can't figure out how. If someone could describe such an example, that would be wonderful.
-- TomDonaldson - 2014-11-05
The link from a record in the response to a preview can be conveyed to clients using a service descriptor.
1. set an ID attribute on a field with an identifier, e.g.: <FIELD ID="IDVALUE" ... />
2. include a service descriptor for a custom "service" that returns the preview in the discovery response, e.g.:
<RESOURCE type="meta" utype="adhoc:service" ID="previews">
  <PARAM name="accessURL" value="http://example.com/previews" />
  <GROUP name="inputParams">
    <PARAM name="IDENT" datatype="char" arraysize="*" value="" ref="IDVALUE" />
    <PARAM name="SIZE" >
      <VALUES>
        <OPTION value="small" />
        <OPTION value="medium" />
        <OPTION value="large" />
      </VALUES>
    </PARAM>
  </GROUP>
</RESOURCE>
3. This tells the client they can create URLs of the form: http://example.com/previews?IDENT=<value from the IDVALUE column>&SIZE=<small|medium|large>
It does not tell them what this service "means". That's hard (maybe in a future version). You can add UCDs to the input params to help say what they mean... The values for SIZE are arbitrary; here I chose words, but one could have made that param datatype="integer", or something else. Once a few places actually do that and if we can agree on what the SIZE parameter should be, then that service could be a standard and thus get a standardID to describe it... For a custom service, you probably want to adorn that service descriptor with some additional descriptive text (INFO element?) as allowed by VOTable.
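The three steps above can be sketched from the client side in Python. This is an illustration under assumptions, not a reference client: the function name preview_url is hypothetical, a result row is modelled as a dictionary keyed by FIELD ID, and the query parameter names are taken from the PARAM names in the descriptor.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DESCRIPTOR = """<RESOURCE type="meta" utype="adhoc:service" ID="previews">
  <PARAM name="accessURL" value="http://example.com/previews" />
  <GROUP name="inputParams">
    <PARAM name="IDENT" datatype="char" arraysize="*" value="" ref="IDVALUE" />
    <PARAM name="SIZE">
      <VALUES>
        <OPTION value="small" /><OPTION value="medium" /><OPTION value="large" />
      </VALUES>
    </PARAM>
  </GROUP>
</RESOURCE>"""


def preview_url(descriptor_xml, row, **choices):
    """Build a service URL for one result row.

    row     -- dict mapping FIELD IDs (e.g. "IDVALUE") to cell values
    choices -- values for input params that are not bound to a column
    """
    resource = ET.fromstring(descriptor_xml)
    access_url = resource.find("PARAM[@name='accessURL']").get("value")
    query = {}
    for param in resource.findall("GROUP[@name='inputParams']/PARAM"):
        name = param.get("name")
        if param.get("ref"):
            # Value comes from the referenced column of the result row.
            query[name] = row[param.get("ref")]
        elif name in choices:
            allowed = [o.get("value") for o in param.findall("VALUES/OPTION")]
            if allowed and choices[name] not in allowed:
                raise ValueError(f"{name} must be one of {allowed}")
            query[name] = choices[name]
    return access_url + "?" + urlencode(query)


print(preview_url(DESCRIPTOR, {"IDVALUE": "obs-42"}, SIZE="small"))
```

Because the parameter names and allowed values are read from the descriptor, the same code keeps working if a service chooses different names or sizes.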
The less efficient way to do this is to use the links response and tag the link's semantics value as #preview. It is more explicit, it allows for static URLs or service descriptors, it allows for a text description clients could display, but it is an extra call. You can send multiple ID values to the links resource in one call, so it isn't 1 extra call per discovered dataset, but it is work to implement and more things have to happen while (I assume) trying to display results. -- PatrickDowler - 2014-11-28
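For comparison, the links-based route could be sketched like this in Python. The base URL is a placeholder, the function names links_url and preview_links are hypothetical, and sending several identifiers as repeated ID parameters in one GET request is an assumption about how "multiple ID values in one call" is realised.

```python
from urllib.parse import urlencode


def links_url(base_url, ids):
    """One request for many datasets: repeat the ID parameter."""
    return base_url + "?" + urlencode([("ID", ident) for ident in ids])


def preview_links(rows):
    """Pick the access URLs of rows tagged as previews.

    rows -- parsed {links} rows as dicts with 'ID', 'semantics', 'access_url'
    """
    return {row["ID"]: row["access_url"]
            for row in rows
            if row.get("semantics") == "#preview" and row.get("access_url")}


print(links_url("http://example.org/datalink/links", ["a1", "b2"]))
rows = [
    {"ID": "a1", "semantics": "#this", "access_url": "http://example.org/a1.fits"},
    {"ID": "a1", "semantics": "#preview", "access_url": "http://example.org/a1.png"},
]
print(preview_links(rows))
```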
Nice standard. Just a couple of very small comments.
-- JesusSalgado - 2015-01-28
Approved -- AndreSchaaff - 2015-01-12
* Sect. 1: "The current version provides no way to describe the output of a service, but this may be added in a future (minor) revision of this specification." -- this is potentially of major importance in the use of the standard. Is there no way to have some mechanism already in v1.0? Even some information on MIME type and data type (image, spectrum, catalog) would let the client plot the output or send it via SAMP.
if the form should be
http://foo.bar/datalink?ID=ivo://example.org/data?
put an example in the text
Markus & Pierre
Thanks. -- PatrickDowler - 2014-11-28
The document looks good -- the right length, and clear.
I've just looked at the DataLink PR document with a particular REST-shaped question in mind. I found that I couldn't get a satisfactory answer from the document.
I wanted to ask: would it be possible for a client to ask for the links response in a format other than VOTable, via an Accept header, and would it be permissible for a service to provide it in a different format? In my mind, obviously, is allowing a service to provide a Linked Data style response, meaning that the response is in one or other RDF syntax. (DataLink is a poster-child Linked Data application -- note, I'm not suggesting that it's a priority to do this, but I would hope that the spec would make it permissible for an intern to implement it one afternoon in future)
1. A REST-style GET of this URL would imply that the client could make its GET request with an Accept header. If that's 'application/x-votable+xml', that's fine, but it should be at least permissible to give a different Accept header (such as text/turtle, for example). If the service can't supply that, it's supposed to reply '406 Not Acceptable'. I can see that it would be permissible to request a links resource with ?RESPONSEFORMAT=text/turtle, and permissible for a service to reply with such content (so the answer to my original question is 'partly yes'). Is it permitted, however, for a service to respect the Accept header? (this would probably be a more normal pattern in a Linked Data context). My reading of Sect. 3.3 "Unless the incoming request included a RESPONSEFORMAT parameter requesting a different format, the content-type header of the response MUST be application/x-votable+xml" is that the answer is no. The discussion above includes suggestions that implementers should be aware of, and respect, well-known HTTP semantics, which I take to mean allowing the full range of HTTP interactions, including Accept-based retrievals (pace this discussion, it might be worth a remark in the document to remind server- and client-side implementors that there's this slightly different style possible). The cross-reference to DALI (specifically its Sect 4.2) implies I think that the 406 error would be permitted (as being a normal HTTP error code).
http://example.org/foo/links
, should I get a 406 response, rather than a 200 VOTable? (I think the answer is 'yes'; and as noted above this would be permissible).
3. I will ritually remark that the x-* media subtype is deprecated, and that the process for registering new subtypes (such as application/votable) is intended to be streamlined compared to what it was before.
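To illustrate the Accept-based pattern raised in point 1, here is a small Python sketch of what a service that chose to honour the Accept header might do. It is not behaviour taken from the PR document: the function name negotiate, the list of supported formats, and the rule that an explicit RESPONSEFORMAT takes precedence over Accept are all assumptions, and the Accept parsing is deliberately simplified.

```python
VOTABLE = "application/x-votable+xml"
SUPPORTED = (VOTABLE, "text/turtle")  # hypothetical service capabilities


def negotiate(accept_header=None, response_format=None, supported=SUPPORTED):
    """Return (HTTP status, content type) for a links request."""
    if response_format in supported:
        # Assumption: an explicit RESPONSEFORMAT wins over Accept.
        return 200, response_format
    if response_format is not None:
        raise ValueError("unsupported RESPONSEFORMAT; error handling not sketched")
    if not accept_header:
        return 200, VOTABLE  # default required by Sect. 3.3

    # Simplified Accept parsing: media ranges with optional q-values.
    ranges = []
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        quality = 1.0
        for extra in parts[1:]:
            if extra.startswith("q="):
                try:
                    quality = float(extra[2:])
                except ValueError:
                    quality = 0.0
        if parts[0]:
            ranges.append((quality, parts[0].lower()))
    ranges.sort(key=lambda pair: -pair[0])  # stable: keeps header order

    for quality, media in ranges:
        if quality <= 0:
            continue
        for candidate in supported:
            if media in ("*/*", candidate):
                return 200, candidate
            if media.endswith("/*") and candidate.startswith(media[:-1]):
                return 200, candidate
    return 406, None  # Not Acceptable


print(negotiate("text/turtle"))
print(negotiate("application/json"))
print(negotiate("text/html;q=0.9, */*;q=0.1"))
```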
We checked that the DataLink P.R. fulfils the needs of the Theory I.G.
That is the case.
Approved
-->