RegTAP 1.1 Proposed Recommendation: Request for Comments

Summary

Registries provide a mechanism with which VO applications can discover and select resources - first and foremost data and services - that are relevant for a particular scientific problem. The RegTAP specification defines an interface for searching this resource metadata based on the IVOA's TAP protocol. It specifies a set of tables that comprise a useful subset of the information contained in the registry records, as well as the tables' data content in terms of the XML VOResource data model. The general design of the system is geared towards allowing easy authoring of queries.

RegTAP 1.1, as a minor version increment over RegTAP 1.0, is backward compatible. The main differences between 1.1 and 1.0 are as follows:

  • new allowed res_detail values for testQueryStrings
  • more generic type information in schema tables
  • mapped terms in fields for dates and resource relationships to a vocabulary for DataCite compatibility
  • case-insensitive query support from ADQL 2.1
  • new columns for service mirrors, rights, and authentication and authorization of protected data and services
  • new table for alternate identifiers, supporting DOIs, ORCIDs, bibcodes, and future identification schemes

The latest version of RegTAP 1.1 can be found at: A slightly updated, unofficial version including the changes made after RFC comments from Ops, GWS, and Semantics is available from http://docs.g-vo.org/RegTAP.pdf.

Reference Interoperable Implementations

Two separate reference implementations of the server-side architecture exist: one at GAVO (and other archives using GAVO's codebase) and one at STScI.

The TOPCAT client is interoperable with both reference implementations, though it does not use any of the new 1.1 features yet. Likewise, Aladin's registry bundle is being generated from a RegTAP 1.1 query.

Implementations Validators

The RegTAP validator (currently at http://docs.g-vo.org/regtap-val; should this move into the VCS for the standard?) has been updated to cover the main new features.

For reviewers, here's a set of RegTAP queries exercising the main user-visible new features (TAP access URLs above):

alt_identifiers – find VO resources and their titles that have DOIs:

select ivoid, res_title, alt_identifier
from rr.resource
natural join rr.alt_identifier
where alt_identifier like 'doi:%'

rights_uri – find VO resources that have a CC license declared:

select ivoid, res_title, rights_uri
from rr.resource
where rights_uri like 'http://creativecommons.org/%'

or find what license URIs are already in use:

select distinct rights_uri from rr.resource

mirror_url – find mirrors available for a known access URL (in this case, indicating that the service is available through HTTPS, too):

select ivoid, mirrors.mirror_url
from rr.interface as intfs
join rr.interface as mirrors
using (ivoid,intf_index, cap_index)
where intfs.access_url='http://dc.zah.uni-heidelberg.de/antares/q/cone/form'

authenticated_only – find resources unavailable without authentication (note that we do not claim that's enough to actually operate them; the use case at this point is filtering them out with a view to a VO that has more of them):

select distinct ivoid from rr.interface where authenticated_only=1

vocabulary mapping – use just a single term to find the data collections served by a service:

select res_title
from rr.resource as res
natural join rr.relationship as rel
where relationship_type='isservedby'
  and rel.related_id='ivo://nasa.heasarc/services/xamin'
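
For reviewers who would rather script these checks, the queries above could be sent to a TAP service's synchronous endpoint roughly as follows. This is only a sketch and not part of the RFC: the base URL is a placeholder (substitute the TAP access URL of one of the reference implementations), the helper names are made up for illustration, and only the standard TAP request parameters REQUEST, LANG and QUERY are used.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder, not a real service: substitute a RegTAP 1.1 TAP access URL.
TAP_BASE_URL = "http://example.org/tap"

# The alt_identifier example query from this page.
ALT_IDENTIFIER_QUERY = """
select ivoid, res_title, alt_identifier
from rr.resource
natural join rr.alt_identifier
where alt_identifier like 'doi:%'
"""


def build_sync_url(base_url, adql):
    """Return the URL of a synchronous TAP request running an ADQL query."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        # Collapse line breaks for a compact URL. This is safe here because
        # no string literal in the query contains runs of whitespace.
        "QUERY": " ".join(adql.split()),
    }
    return base_url.rstrip("/") + "/sync?" + urlencode(params)


def run_query(base_url, adql, timeout=30):
    """Fetch the raw response body for a query; needs network access."""
    with urlopen(build_sync_url(base_url, adql), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Only prints the request URL; call run_query() against a real service.
    print(build_sync_url(TAP_BASE_URL, ALT_IDENTIFIER_QUERY))
```

Building the URL separately from fetching it makes it easy to paste the same request into a browser or a TAP client when comparing the two reference implementations.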



Comments from the IVOA Community during RFC/TCG review period: 2019-06-15 - 2019-07-31

The comments from the TCG members during the RFC/TCG review should be included in the next section.

In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your Wiki Name so that authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.

Additional discussion about any of the comments or responses can be conducted on the WG mailing list. However, please be sure to enter your initial comments here for full consideration in any future revisions of this document.



Comments from TCG members during the RFC/TCG Review Period: TCG_start_date - TCG_end_date

WG chairs or vice chairs must read the Document, provide comments if any (including on topics not directly linked to the Group matters) or indicate that they have no comment.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair

Applications Working Group

(Review based on the unofficial updated version at http://docs.g-vo.org/RegTAP.pdf)

The changes since v1.0 seem reasonable and well-described, and the document overall seems to be in good shape.

One comment/suggestion:

OAI-PMH is mentioned several times, but I didn’t see a reference to official documentation on that protocol. Should we include such a reference, perhaps being specific about the protocol version if that is important?

  • Good point. There's now a reference to the standard where it's first mentioned (early in the introduction) -- MarkusDemleitner - 2019-09-09
-- TomDonaldson - 2019-08-15

Data Access Layer Working Group

After a clarification that led to the re-introduction of ivo_nocasematch in place of ILIKE, and to disentangling RegTAP from ADQL where their query function definitions would clash, we support this specification. -- MarcoMolinaro - 2019-08-16
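
As a rough illustration of what such a function does, ivo_nocasematch can be emulated on SQLite from Python. This is a sketch under assumptions rather than a reference implementation: it assumes the function takes a value and a LIKE-style pattern and returns 1 on a case-insensitive match and 0 otherwise, it returns 0 for NULL arguments as a simplification, and the table and its rows are invented. A real service would more likely map the function onto its database's native case-insensitive matching.

```python
import re
import sqlite3


def like_to_regex(pattern):
    """Translate an SQL LIKE pattern (% and _ wildcards, no escape character) into a regex."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")
        elif char == "_":
            parts.append(".")
        else:
            parts.append(re.escape(char))
    return "".join(parts)


def ivo_nocasematch(value, pattern):
    """Return 1 if value matches the LIKE pattern regardless of case, else 0."""
    if value is None or pattern is None:
        return 0  # simplification: SQL would yield NULL here
    flags = re.IGNORECASE | re.DOTALL
    return 1 if re.fullmatch(like_to_regex(pattern), value, flags) else 0


# Toy stand-in for rr.resource; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.create_function("ivo_nocasematch", 2, ivo_nocasematch)
conn.execute("create table resource (ivoid text, res_title text)")
conn.executemany(
    "insert into resource values (?, ?)",
    [
        ("ivo://example/a", "GAIA source catalogue"),
        ("ivo://example/b", "Gaia DR2 light curves"),
        ("ivo://example/c", "Hipparcos main catalogue"),
    ],
)

matches = [
    row[0]
    for row in conn.execute(
        "select ivoid from resource"
        " where 1 = ivo_nocasematch(res_title, '%gaia%')"
        " order by ivoid"
    )
]
print(matches)
```

With the toy rows above, the query picks up both spellings of "Gaia" and skips the Hipparcos entry.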

Data Model Working Group

My first comment is that the specification is quite complete and detailed. However, reading it in full to get a global view, I had the feeling that sections 3-7 look more like implementation details or appendices, and that the real meat of the specification starts in section 8. It might be better to move that material forward and to reference the implementation details of the later sections within the text wherever relevant.

I imagine there are historical reasons for this structure, which was already present in version 1.0. So, for a possible future version 2.0, this is just a suggestion for rearranging the text so that developers can easily discover what needs to be implemented (introduction->use cases description->tables->data content rules->use cases implementation solution->implementation notes and appendices).

  • In specs, it's always hard to decide whether to introduce the basic concepts first and present the interesting stuff later (which minimises forward references but has the negative effect you describe) or delay the definitions of the terms you're using in the central part until later (which keeps the boring stuff out of the way but uses lots of odd words undefined if people read sequentially). Either way: If we change it, it's material for the next iteration at the earliest. -- MarkusDemleitner - 2019-09-09

Focusing on the 1.1 changes from version 1.0, I do not see any data model impact that could block this update. I only have four comments (not change requests) on points that look a little open to me.

Section 4: From the text, it looks as if there is a need for service operators to update the translations of the IVOA vocabularies at their services using vocabularies outside the spec. The fact that there is no definition of how to treat deprecated terms, or of which vocabulary version is implemented, could be an issue. I understand this is a more global problem than RegTAP, but I wonder if something can be done to minimise the impact within RegTAP, for example with vocabulary version metadata at the service level.

  • That VOResource offloaded some term lists into vocabularies may be read as: these things are intended to change without having to change specs. In effect, they're external metadata parameterising the spec, and both servers and clients will have to deal with such changes (which mostly will be trivial: think subject keywords in a browsing interface, for instance). In the case of the relationship translations, I doubt we'll see more deprecations, so here, that's probably an academic discussion. In general, however, I'd say yes, implementors depending on the content of vocabularies should indeed update from ivoa.net/rdf now and then. But before we can require that, Semantics has to come up with a stable spec saying how they'll do that, which will certainly come too late for this spec. -- MarkusDemleitner - 2019-09-09

Section 8.2: I checked VOResource 1.1 about the role_name format and the only reference that I find is "This should be exactly one name, preferably last name first (as in "van der Waals, Johannes Diderik")." I think this is not strict enough to be used effectively. This is probably a matter for VOResource, but also for the service validator when a new record is inserted. Any suggestion on what has to be changed (and in which spec) to prevent this problem?

  • I agree it would be a good idea if the VOResource validator were amended to guess the format names are written in and perhaps produce warnings if they look "wrong". I'll take it to the RofR people when I next talk to them (but as you say it goes beyond RegTAP, although it of course impacts RegTAP's usability for author searches). -- MarkusDemleitner - 2019-09-09

Section 8.8: "Clients not prepared to authenticate to services should always include a authenticated_only=0 condition". In the future, instead of a single value (0 or 1), it would be even more useful to point in some way to the list of authentication mechanisms in the IVOA Single-Sign-On Profile (e.g. 0 for non-authenticated, 1-7 for the security methods described in the SSO spec and, maybe, 8 for other). Perhaps the security method metadata is not expected to be handled here.

  • Well -- we still don't know what securityMethod content will be there, which is why we're leaving all this open. When we have use cases for discovery in the presence of authentication, we'll see, but I'm pretty sure we won't want to touch authenticated_only even then. I guess essentially all interesting use cases will require more metadata. But then you notice I only talk about authenticated_only=0 in the current spec, so in principle all other values are "reserved" right now. Do you think we need to stress this? -- MarkusDemleitner - 2019-09-09

Finally, once ILIKE is part of ADQL, should we have a new version of RegTAP pointing to this new ADQL version, or is this ADQL deviation going to be maintained here? I think it would be cleaner (and better) to point to the new ADQL, but this could have some impact depending on the changes in the new ADQL (and its implementations).

  • I basically agree, and the original 1.1 draft had already deprecated ivo_nocasematch. I am also entirely sure that ivo_nocasematch's implementation will always be simple ILIKE. Still, people from DAL have brought forward concerns that ivo_nocasematch nicely isolates them from the concerns for Registry. Well, I'll not quarrel either way, since I believe it is a minor matter not hurting much either way. Anyone feeling more strongly about this is cordially invited back for 1.2... Oh, and thanks for your review! -- MarkusDemleitner - 2019-09-09

In summary, the specification is mature and implementable, so the Data Model WG approves it.

-- JesusSalgado - 2019-09-05

Grid & Web Services Working Group

Some minor comments

  • On pages 4 and 5, you refer to "amount" of records or rows. I would avoid this kind of sentence in a standard doc.
    • Hm – not quite sure what you're referring to here, as the text doesn't say "amount" anywhere (at least back to rev. 5428). Or are you objecting to "At the time of writing, there are roughly 20000..."? There, I'd say giving an idea of the order of magnitude this was written for might be a nice service. This and the following replies: -- MarkusDemleitner - 2019-08-13
  • Figure 1. The Caption refers to some tagging that is not clear to me. I would say that this picture is the architectural diagram for this specific standard.
    • Ah, yes – the caption referred to the old arch diagram. Good catch. I'm now writing “IVOA Architecture diagram with the IVOA Registry Relational Specification (shown as ``RegTAP'') and the related standards.”
  • Page 4. About the SOAP implementation, the sentence is quite generic; maybe you can drop it.
    • Are you referring to “Built on SOAP and an early draft of...”? If so, I'd rather keep it, since it explains why we abandoned RegTAP's (incompatible) predecessor, which might otherwise seem just a whim.
  • page 5 terminology: relational registry == rr. It is clear but better to specify the prefix.
    • Now mentioning the schema name (rr) in the first paragraph of the section.
  • As you discuss SecurityMethod with some details, I would include a reference to the "standard."
    • Now saying “usually taken from the SSO document \citet{2017ivoa.spec.0524T}”; the trouble is that it's still not clear where we'll say what the content of SecurityMethod actually is, so the reference is only half pertinent.
  • Page 9, first line: the formatting splits "xs:token" across two lines, which could be misleading (xs:token becomes xs:to-ken). This is a "nano" comment, but it would be better to keep it on one line.
    • Frankly, I'm happy that the beast hyphenates this (rather than letting it stick out). And with token, I'd consider the risk of misunderstandings minimal.
  • Page 12: some prefixes refer to two versions of the same schema (e.g., ssap and vs); maybe it is not necessary to cite both.
    • Well, the situation that a single canonical prefix corresponds to two different namespace URIs is ugly enough to be explicit about it. Incidentally, we won't have any more of that thanks to the XML versioning policies. Let's hope the old namespace URIs with shared prefixes will die out soon (so we can finally forget about this ugliness).
  • I agree with Mark that it could be useful to summarize the Appendix.
    • It's there now. All changes mentioned here went in in Volute rev. 5571.
-- GiulianoTaffoni - 2019-08-12

Registry Working Group

With the removal of the proposed ADQL 2.1 ILIKE dependency in favor of a note about the existence of the future function, I approve of the current version of the document (rev 5576) -- TheresaDower - 2019-09-10

Semantics Working Group

Remarks from Carlo Maria Zwölf

  • Page 4
    • "Even if it were, data discovery would at least be fairly time consuming if
      each client had to query dozens or, potentially, hundreds of publishing
      registries". --> If a client performs queries in parallel, the number of
      registries does not matter. In some cases a central service gathering all the
      information could be a bottleneck of the IVOA infrastructure. A discussion
      of these aspects would be welcome. It is also important to mention sync issues
      between the publishing registries and the RegistryOfRegistries.
    • "this first attempt which was quickly" --> check if 'which' should be removed
    • "The simplification yields 14 tables" --> I would suggest to replace by "The simplification
      yields to a schema composed of 14 tables"
    • TAP_SCHEMA is not defined. What does this stand for? Why is it in red? What is the implied
      color convention?
  • Page 5
    • "The largest table, table_column, has about a million rows at the time of
      writing". --> A standard document is neutral about the practical and specific
      implementation. To highlight this neutrality, I would suggest writing
      something like 'If we use this standard for describing all the resources available in the publishing
      registries, then the table table_column would contain about a million rows'
    • "table_column" --> why in green. What is the color convention?
  • Page 6
    • Caption of figure 1 : "Relational Registry" tagging does not appear in the
      figure, as it is said in the caption. Please highlight the current described
      standard in red on the figure.
    • "...table using xpaths into the registry documents. This document should not
      in general" --> Does 'this document' refer to the registry document or the
      current document? It is ambiguous.
  • Page 15
    • "In both TAP_SCHEMA and the VODataService tableset, this schema MUST be
      associated with a utype matching the data model identifier given in sect. 7"
      --> It would be clearer to write "In both TAP_SCHEMA and the VODataService
      tableset, the rr schema MUST be
      associated with a utype matching the data model identifier given in sect. 7"
    • "On the values in the utype columns within TAP_SCHEMA except for the schema
      utype, see section 6." --> Please check this sentence. It is not clear to me.

In general --> Please define the color convention and the differences between the red & green parts of the text.

Summary response

  • As to typos and style proposals not mentioned below: Should all be fixed in Volute rev. 5574 – thanks for the careful reading.
  • I've added a few words on the typographic conventions at the end of sect. 1.1. They were, indeed, not obvious. I hope the logic is a bit clearer now.
  • As to a discussion of the general registry design, I think this document isn't quite the right place to do that; if a deeper analysis of these fundamental questions is desired, I'd say it should go to Registry Interfaces.
  • As to writing "If we use this standard..." -- given that this is an update and RegTAP has been in fairly active use for quite some time now, I'd say that'd be a bit too self-deprecating. I'd be ok with "The largest table in existing RegTAP 1.0 services at the time of writing" or something like that if that's actually preferred. On the question of whether to have size hints at all here see the response to the GWS comments.
-- MarkusDemleitner - 2019-08-14

Data Curation & Preservation Interest Group

Education Interest Group

Knowledge Discovery Interest Group

Solar System Interest Group

Theory Interest Group

Time Domain Interest Group

Operations

  • As of 2019-06-13, the "latest" version pointed at from this RFC page is http://ivoa.net/documents/RegTAP/20190503/index.html, but the current latest version is actually http://www.ivoa.net/documents/RegTAP/20190529/index.html. It's 20190529 that I'm commenting on here.
  • Fig 1: the caption says '(tagged with "Relational Registry")' . I can't work out what this note means - should it say something like '(tagged as "RegTAP")' ?
  • Appendix D: In this version (20190529) all the changes since REC-1.0 have been condensed into a single section (D.1) to avoid extra content that only serves historical interest of VO archaeologists; I think this is good practice. By the same token, does it make sense to delete all the sections (D.2-D.9 in 20190529) that detail how it got to REC-1.0? As an issue affecting preparation of all REC-track documents, this is probably something which also ought to be discussed at TCG or SDP level.
    • I'm not really leaning one way or another, but I'm not sure what condensing the history from... well, nothing, really, to 1.0 would entail. I could condense the history from the first draft to 1.0, but is that really something that's useful to people? I'd tend to leave things as they are for now and see what TCG and/or SDP have to say for the next version. -- MarkusDemleitner - 2019-08-14
  • Appendix D.1: One change in the text which I don't see noted here is changes to the standard_id constraints in the example queries from section 10, e.g. 10.3 in REC-1.0 has WHERE standard_id='ivo://ivoa.net/std/sia' , and in PR-1.1-20190529 has WHERE standard_id LIKE 'ivo://ivoa.net/std/sia%' . I appreciate this may be a result of changes elsewhere in the standards landscape rather than in RegTAP itself, but it would be useful for people making use of these example queries if the change log summarised what's changed in recommended practice since REC-RegTAP-1.0. (There's also a correction to the argument order in the ivo_hashlist_has invocation in section 10.3 which should probably be logged for completeness).
  • This RFC page says "The TOPCAT client is aware of RegTAP 1.1 features and is interoperable with both reference implementations" . Yes it is interoperable, because of the 1.1->1.0 backward compatibility. However, can you refresh my memory about what 1.1 features it's aware of? I'm not denying it! Just can't remember what I may have claimed.
    • Ouch. I think I was thinking about the authentication experiments when I wrote that, but these have disappeared again. So, I guess we'd need to tone the language down a bit here. If you'd ask me what I think TOPCAT should take up, it would be mirror_url to provide failover for registries and GloTS. And perhaps other things, as that would perhaps make it attractive for VizieR to put in their mirror URLs. -- MarkusDemleitner - 2019-08-14
  • Plus a few typos:
    • Sec 4.1: "whereever" -> "wherever"
    • Sec 6: "specifiation" -> "specification"
    • Sec 8.1, vr:organisation: "is to be references by IVOID" -> "is to be referenced by IVOID" ?
    • Sec 8.10: "slight denormalization of the vr:Relationship type: Whereas..." -> "slight denormalization of the vr:Relationship type: whereas..." ?
      • Typos fixed in Volute rev. 5575.
-- MarkTaylor - 2019-06-13

Mark, I updated the link. Thanks and apologies.

As for the TOPCAT support, I'm unaware of specific 1.1 features and will leave that with other comments for Markus to respond to as author; possibly the intention was noting the schema additions didn't break anything, with altIdentifier queries working and no broken baked-in example queries? Those are the main points that came to my mind when testing my reference implementation with TOPCAT. -- TheresaDower - 2019-07-24

Thanks for updates - all looks OK now. I will think about mirror_url implementation. -- MarkTaylor - 2019-08-14

Standards and Processes Committee


TCG Vote : Vote_start_date - Vote_end_date

If you have minor comments (typos) on the last version of the document please indicate it in the Comments column of the table and post them in the TCG comments section above with the date.

Group      Yes  No  Abstain  Comments
TCG        *
Apps       *
DAL        *
DM         *
GWS        *
Registry   *
Semantics  *
DCP
KDIG
SSIG       *
Theory
TD
Ops        *
StdProc
Topic revision: r23 - 2019-10-09 - BaptisteCecconi
 