
Revision 22 - 2017-02-06 - MireilleLouys

 

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.
  The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "SimDAL Search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL Repositories and in the registries.

  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An Implementation Note was written for the Simulation Data Model; it presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard is accepted, we plan to write an Implementation Note presenting how to use SimDAL to publish different categories of simulations / numerical models.

  • A typo: in the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" appears twice.

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP/TAP schemas. TAP has not been chosen as a solution because it does not fulfill the requirements for Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing the simulated objects (> 100 000 in one of the SimDAL implementations), and these numbers are growing as numerical models progress.

Using TAP would require the properties to be stored as columns of a relational table, since TAP is strongly coupled to SQL and therefore to relational databases. The majority of the RDBMSs currently in use (PostgreSQL, MySQL) cannot manage such high-dimensional data (i.e. that many table columns). High-dimensional data and their use are much better served by other types of storage architecture, which publishers cannot use with TAP (or only with great difficulty) when no SQL compatibility layer/adapter is available.

Note that if the definition of SimDAL has taken so long, it is because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems / storage architectures. The conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not feel lost.

Concerning the SimDAL Repository part:
First, note that SimDAL components (and among them the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model (SimDM), so it is only with SimDAL Repositories that a search for resources can be done using the SimDM semantics. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes that are published in the VO. IVOA registries do not have functionalities to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry W.G.) showed that some parts of these serializations could be transformed and ingested in the IVOA registries. Nevertheless, this would lose the relationships between SimDM classes, and so lose the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not let clients fully benefit from the SimDM XML serializations, although most scientific use cases would require fine-grained search in these serializations to discover protocols and projects of interest efficiently. This was a deliberate choice for version 1.0 of SimDAL: in the coming months/years we do not expect many registered IVOA theory services, so it should be easy for users to discover theoretical services with the SimDAL Repositories as presently defined. As more and more theoretical services are registered, finer-grained search will become necessary. SimDAL Repositories as defined in version 1.0, storing the full XML serializations of projects and protocols, contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included; a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive them properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the InterOp in Sesto, in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations counts as a validator. So a client, rather than a simple validator, has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order of the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access Data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before doing a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copy-pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent)

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of end points. You're defining about as many end point types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, xpath or xquery or whatever?

In short, I believe you should split up this document into three pieces, each of which would work out to be more handleable.

Answer: Yes, we have several API endpoints. Of course, we tried our best to keep this number as small as possible while satisfying as many use cases as possible.


It is a big standard, which is why it took so long to release a first version. The other VO (DAL) standards basically serve a single, fixed, underlying data model. As you know, SimDM, which SimDAL serves, is a meta-model, i.e. we are serving an unlimited variety of underlying data models. This brings some additional use cases that we had to address by adding a complete API.


Several very capable people have addressed the SimDAL problem over the past years, and a lot of progress has been made thanks to them, even though the result is still not perfect. The fact is that many people have been waiting for years for a way to publish their theoretical data, and we think we have to give them a solution now.
We need a basis, SimDAL v1, that will be implemented by several teams and will fulfill the most common use cases (80 %), on real data in production environments (not theoretical/potential use cases or issues). Then, we hope these teams will collaborate with the IVOA to provide feedback for a v2 that will add features not planned in the first version.

As for xpath: we made SimDAL partly because we realized that understanding/querying SimDM is too high a prerequisite for a scientist/developer, and that we had to design as straightforward a query API as possible, even at the price of not covering 100% of the use cases for now.

We have not split SimDAL because, for the first version of the standard, we do not want to break the workflow logic that carries the user from theoretical project discovery to data retrieval, at the price of developers having to deal with a bigger standard. Instead, we tried to be clear in the document so that readers understand they do not have to implement all three parts. For the next version of the standard, once several teams are familiar with it, splitting the document into several more specific standards can be discussed.

Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.
Answer: Thank you for offering to help with the proofreading! The LaTeX file can be downloaded from the src link on the SimDAL webpage: http://www.ivoa.net/documents/SimDAL/index.html


  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).
Answer: Done. Section 2.2 no longer mentions use cases but requirements.

  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).
Answer: We simplified as much as possible.

  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.
Answer: You are perfectly right. That is corrected.

  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?
Answer: Ok. We clarified this.

  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.
Answer: That is corrected


  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says: "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: Does the table in the error case have a name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?
Answer: You are right, thank you, we have made the update in the document.
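For reference, the DALI convention discussed in this thread can be sketched as follows: a single INFO element named QUERY_STATUS with value ERROR inside the results RESOURCE, whose text is a message a client can display as-is. This is a minimal Python illustration using only the standard library, not the normative SimDAL error format; the VOTable 1.3 namespace, the helper names and the example message are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumption: VOTable 1.3 namespace; a service may declare another version.
VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"
ET.register_namespace("", VOTABLE_NS)  # serialize VOTable elements without a prefix


def q(tag: str) -> str:
    """Qualify a tag name with the VOTable namespace."""
    return f"{{{VOTABLE_NS}}}{tag}"


def dali_error_votable(message: str) -> str:
    """Build a minimal DALI-style error response: one QUERY_STATUS INFO
    element with value ERROR inside the results RESOURCE."""
    votable = ET.Element(q("VOTABLE"), version="1.3")
    resource = ET.SubElement(votable, q("RESOURCE"), type="results")
    info = ET.SubElement(resource, q("INFO"), name="QUERY_STATUS", value="ERROR")
    info.text = message  # human-readable text a client can display directly
    return ET.tostring(votable, encoding="unicode")


def query_status(document: str) -> tuple[str, str]:
    """Return (status, message) from the first QUERY_STATUS INFO element."""
    root = ET.fromstring(document)
    for info in root.iter(q("INFO")):
        if info.get("name") == "QUERY_STATUS":
            return info.get("value", ""), info.text or ""
    raise ValueError("no QUERY_STATUS INFO element found")


doc = dali_error_votable("Unknown parameter: projects")
print(query_status(doc))  # ('ERROR', 'Unknown parameter: projects')
```

A client that only looks for QUERY_STATUS can report the failure without knowing anything SimDAL-specific, which is the interoperability argument made above.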

  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.
Answer: Datalink would have made the standard heavier, whereas we just need a tiny subset of Datalink: the subset that Datalink was when we were involved in its definition back in 2012, when F. Bonnarel started thinking about it. Adopting Datalink is something we have in mind for a future version of the standard.

  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.
Answer: The foreign-key mechanism used in VOTable responses is directly taken from the VOTable 1.3 document. It is a neat and tidy way to link our tables while leaving room for some extensibility. We define it in the document because it is not formally defined in the VOTable document, only mentioned as an example/possible use of the GROUP element.
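Whichever encoding is kept, what a client ultimately does with such a declaration is a join between the two tables. The Python sketch below groups links rows under the results row they refer to. The row dictionaries and the column names ident and access_url are hypothetical; fk_column stands for whatever links column a foreign-key GROUP (or a fixed column-name convention, as suggested above) designates.

```python
from collections import defaultdict


def attach_links(results, links, fk_column, key_column="ident"):
    """Group rows of a links table under the results row they point to.

    fk_column:  column of the links table holding the foreign key
    key_column: column of the results table it refers to (assumed 'ident')
    Links whose key matches no results row are ignored here.
    """
    by_key = defaultdict(list)
    for link in links:
        by_key[link[fk_column]].append(link)
    # Every results row gets its (possibly empty) list of links.
    return {row[key_column]: by_key.get(row[key_column], []) for row in results}


results = [{"ident": "proj-1"}, {"ident": "proj-2"}]
links = [{"ident": "proj-1", "access_url": "https://example.org/data/1"}]
grouped = attach_links(results, links, fk_column="ident")
```

Fixing the column names, as Markus proposes, would turn fk_column into a constant; declaring them in a GROUP keeps it a parameter that the client reads from the response.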

  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.
Answer: These fields are defined for each API endpoint; see for example the result table of {search}, subsection "response schema". All FIELDs are described, and those that are not mandatory are explicitly declared as optional.


  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).
Answer: OK. We had not specified anything because our idea is "do not bother with the lifetime of the pagination, we take care of it for you so that it is long-lived enough for interactive browsing in your current session". This has now been clarified (section 3.6).
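From the client side, following next_page links and reacting to an expired link can be sketched as below. This is a minimal Python sketch under stated assumptions: a page is modelled as a dict with a rows list and an optional next_page URL, fetch stands for whatever retrieves a page, and PageExpired is a hypothetical signal for a link the service no longer honours.

```python
from typing import Any, Callable, Iterator, Optional


class PageExpired(Exception):
    """Hypothetical signal that a next_page link is no longer valid."""


def iter_rows(first_page: dict, fetch: Callable[[str], dict]) -> Iterator[Any]:
    """Yield all rows of a paginated result by following next_page links.

    Assumptions: a page is a dict with a 'rows' list and an optional
    'next_page' URL; fetch(url) returns the next page, or raises
    PageExpired if the service no longer honours that link.
    """
    page: Optional[dict] = first_page
    while page is not None:
        yield from page["rows"]
        next_url = page.get("next_page")
        # No next_page link means this was the last page.
        page = fetch(next_url) if next_url else None


# Usage with an in-memory stand-in for the service:
pages = {"page2": {"rows": [{"id": 3}], "next_page": "page3"},
         "page3": {"rows": [{"id": 4}]}}
first = {"rows": [{"id": 1}, {"id": 2}], "next_page": "page2"}
print([row["id"] for row in iter_rows(first, pages.__getitem__)])  # [1, 2, 3, 4]
```

Letting the expiry surface as a distinct exception allows an interactive client to re-submit the original query instead of silently showing a truncated result.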

  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.
Answer: We thought the same while writing/implementing the standard, but decided that being consistent with the majority of existing APIs would be of some use to users. After some more thinking, however, we agree with you and have simply removed it to make the standard more straightforward to use.


  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).
Answer: Indeed, we meant optionally. This has been clarified.
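The timestamp form agreed above (seconds, optionally followed by a decimal point and fractional seconds, plus the optional trailing "Z" suggested in the comment) can be checked with a short regular expression. This Python sketch only checks the shape of the string, not calendar validity (it would accept month 13), and DALI 1.1 remains the normative reference; the function name is hypothetical.

```python
import re

# Shape check only: YYYY-MM-DDThh:mm:ss, then optional fractional seconds,
# then an optional "Z" (UTC). Calendar validity is not checked.
TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"  # date and time to the second
    r"(?:\.\d+)?"                             # optional ".SSS..." fraction
    r"Z?"                                     # optional UTC designator
)


def is_timestamp(value: str) -> bool:
    """Return True if value has the timestamp shape described above."""
    return TIMESTAMP.fullmatch(value) is not None


for sample in ["2016-07-08T12:30:00", "2016-07-08T12:30:00.125Z", "2016-07-08 12:30:00"]:
    print(sample, is_timestamp(sample))
```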


  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".
Answer: Done


  • In 4.1, I guess I'd rather start with "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused about the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) later said under "Parameter".
Answer: Ok. We modified the presentation of the section.


  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?
Answer: Indeed. Ok.


  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).
Answer: We already have the datatype in the FIELD element; do you think we have to add more?


  • In 4.2, is the project parameter mandatory?
Answer: No it is not. We now say it explicitly.


  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".
Answer: We corrected the typo. Thank you.

  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to be repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)
Answer: The answer is no. In this first version of SimDAL, each parameter can be passed only once. We now say so in the text.
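A service can enforce the one-instance-per-parameter rule at the point where it parses the query string. The sketch below is a hypothetical Python helper built on the standard urllib.parse.parse_qs; the error wording is an assumption, and reporting the error back to the client (normally through the DALI error mechanism) is left to the caller.

```python
from urllib.parse import parse_qs


def single_valued_params(query_string: str) -> dict[str, str]:
    """Parse a query string, rejecting any parameter given more than once."""
    parsed = parse_qs(query_string, keep_blank_values=True)
    repeated = sorted(name for name, values in parsed.items() if len(values) > 1)
    if repeated:
        raise ValueError(f"parameters may not be repeated: {', '.join(repeated)}")
    # Each parameter occurs exactly once, so keep its single value.
    return {name: values[0] for name, values in parsed.items()}


print(single_valued_params("project=p1&q=h2"))
```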

  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.
Answer: Ok, document updated
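The model proposed in the comment, a relation of which a view is a projection of a selected subset, can be written down without any reference to files or VOTables. The Python sketch below is illustrative only; the helper name and the column names id, T and nH are hypothetical.

```python
from typing import Callable, Iterable, Sequence

Row = dict[str, object]


def view(rows: Iterable[Row], columns: Sequence[str],
         where: Callable[[Row], bool] = lambda row: True) -> list[Row]:
    """Return the view of a relation: rows satisfying `where` (selection),
    restricted to `columns` (projection), independent of how rows are stored."""
    return [{c: row[c] for c in columns} for row in rows if where(row)]


relation = [{"id": 1, "T": 10.0, "nH": 1e3},
            {"id": 2, "T": 50.0, "nH": 1e4}]
hot = view(relation, ["id", "T"], where=lambda r: r["T"] > 20)
```

Whether the rows come from a flat file, a database or an array store does not change this definition, which is the decoupling the SimDAL views aim for.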


  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).
Answer: Ok, this is related to our answer to the previous question about foreign_keys - see above.


  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to. There's just too many flavours of JSON out there.
Answer: You are right, thank you for pointing this to us, we have added the reference.
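As an illustration of why pinning down the JSON flavour matters: Python's own json module, for instance, accepts NaN and Infinity, which RFC 8259 JSON does not allow. The sketch below parses a query dictionary strictly and requires a top-level object. It additionally rejects duplicate keys, which RFC 8259 only discourages; the example keys field and op are hypothetical, not the SimDAL query grammar.

```python
import json


def _reject_constant(name: str):
    # json.loads calls this for NaN, Infinity and -Infinity, which Python's
    # parser accepts but which are not valid RFC 8259 JSON.
    raise ValueError(f"non-standard JSON constant: {name}")


def _reject_duplicates(pairs):
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate keys in JSON object")
    return dict(pairs)


def parse_query(text: str) -> dict:
    """Parse a JSON query strictly; require a top-level object (dictionary)."""
    value = json.loads(text, parse_constant=_reject_constant,
                       object_pairs_hook=_reject_duplicates)
    if not isinstance(value, dict):
        raise ValueError("query must be a JSON object")
    return value


print(parse_query('{"field": "T", "op": ">"}'))
```

Malformed input such as single-quoted keys raises json.JSONDecodeError, a subclass of ValueError, so callers need only one error path.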


  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?
Answer: We have done the update following your recommendation.


  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".
Answer: Yes, thank you this is clarified (section 6.3)
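The point about the results element being filled in only eventually is the usual UWS polling pattern: a client submits the job, then watches its phase until it stops changing. The Python sketch below is hypothetical: get_phase stands for whatever fetches the job's phase from the service, and treating COMPLETED, ERROR and ABORTED as the final phases is an assumption for illustration.

```python
import time
from typing import Callable

# Assumption: phases after which the job will not change on its own.
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED"}


def wait_for_job(get_phase: Callable[[], str], poll_seconds: float = 1.0,
                 max_polls: int = 600) -> str:
    """Poll a job's phase until it reaches a terminal phase.

    get_phase is caller-supplied (e.g. a GET on the job's phase resource);
    results should only be read once the returned phase is COMPLETED.
    """
    for _ in range(max_polls):
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        time.sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal phase")
```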


  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist -- where does the SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?
Answer: You are right about the JCL, we have removed the mention. We also made the 1st point clearer. Thank you.

  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
Answer: Done. Thank you.

-- MarkusDemleitner - 2016-09-13

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair (Matthew Graham, Pat Dowler)

Applications Working Group (Pierre Fernique, Tom Donaldson)

Approved. I have no expertise in Sim, but as an independent feature, the specification seems reasonable and consistent.

-- TomDonaldson - 2016-10-22

Data Access Layer Working Group (François Bonnarel, Marco Molinaro)

Discovery of and access to simulation data have been a long-term VO goal. It is nice to see that this goal is now ready to be achieved, as the first implementations prove. It is especially nice to see that DALI consistency has been much improved in the latest versions. We validate this specification for recommendation.

In the long term, we recommend that authors and editors pay attention to a couple of points for better integration into the full DAL landscape and better interoperability between observational and simulation data:

- convergence of DataLink techniques, which may be reached in the next versions of both SimDAL and DataLink.

- convergence in query languages, which may also result from efforts on the SimDAL side as well as on the Simple Access or TAP side for future versions.

- complementarity and coupling of SKOS technology and UCD technology for coding the semantics looks important to us and is probably possible. But this is more in the hands of the Semantics Working Group, which is collaborating with TIG on that.

-- FrancoisBonnarel , MarcoMolinaro - 2016-11-24

Data Model Working Group (Mark Cresitello-Dittmar, Laurent Michel)

I reviewed the 20161014 version of the document. I'm wondering whether this is the correct one, as some of my comments appear earlier, already marked as addressed.

Two broad comments:

1) I would like to see some statement about the completeness of the implementations.. do they implement all features of the spec?

Answer: All mandatory APIs are implemented in the Paris implementation. This implementation does not (yet) implement the optional {cutouts-preview}. The SVO implementation implements most of the mandatory APIs (see the implementation documentation and tutorial for detailed information about the implementation).

Answer (Carlos Rodrigo): The SVO implementation (http://svo2.cab.inta-csic.es/theory/simdal1/) includes all three types of services: repository, search and dataAccess. We implemented all the mandatory resources and some optional ones (for instance, {cutouts-preview}). It is presented on the web page with explanations of what we consider a typical workflow and an example of use, because we thought that, as SimDAL has many endpoints, this would be more illustrative. In each case, however, links are given to the URLs of the real implementation of each resource.


2) In reading it, there seems to be a large overlap with existing standards ( DALI, DATALINK, etc) which was also mentioned by others. Since DALI is supposed to be the basis for DAL protocols, I'd like to see this spec. be expressed more directly in relation to it rather than statements of compatibility. In reading the responses to similar comments above, and speaking with Francois about this, I understand that this is intentional for this version. A subsequent version can integrate the spec into the DALI family. So, I will not block progress on this count.

Answer: Indeed. As explained in another answer concerning Datalink, SimDAL does not use it since it would have made the standard heavier whereas we just need a tiny subset of datalink. As you say, that is something we have in mind for a future version of the standard. Concerning DALI, SimDAL is DALI-compliant.

General:

Section 1: pg 5 = The final lines state who should implement each of the components, but in very fuzzy terms (eg: "most of the time the SimDAL Search component"). It would be more clear if it stated specifically who would implement each part. This same fuzzy language is repeated in a couple places (pg 8)

Answer: Thank you for pointing this out. We have reworked the part you are talking about to make it clearer.

pg 6: Architecture diagram does not match the one in the master IVOA Architecture document.

Answer: Thank you. We will check with the DAL chairs to correct this IVOA Architecture figure.

S2.3 - pg10: "the pivot format": as Markus stated, I have no idea what that means; perhaps it is common knowledge to the target audience?

Answer: OK, we realize that this is a tricky term, so we have removed it completely.

S3.0 - pg11: Question marks should be confirmed and removed. 1. "SimDAL components are exposed with APIs following a REST design (?) that conforms to the DALI resource description (?)."

Answer: Sorry, that was an error in the generation of the bibliography in the published archive. It is now corrected.

S3.3 - pg12: Question mark in place of reference to JSON standard

Answer: Sorry, that was an error in the generation of the bibliography in the published archive. It is now corrected.

S3.5 - pg15: another question mark in place of reference (VOTable)

Answer: Sorry, this was caused by an erroneous generation of the bibliography in the published archive. It is now corrected.

S3.6 - pg15: "must provide sufficient lifetime for interactive browsing of the pages". This is a rather vague statement for a requirement.

Answer: We thought that it is no more arbitrary than stating a fixed time such as 10 seconds or 10 minutes. Moreover, this "sufficient lifetime" is often specific to the service and to the type of data or simulation, so the publisher is best placed to choose it.

S4.1 - pg17: the description for the 'q' parameter says "The search logic is up to the publisher..". How is a user supposed to know what to put in the field if the logic is up to the publisher? The example q='n(h2)' matches both 'N(H2)' and 'n(H2)', but only because this provider decided to make the search case-insensitive?

Answer: That is correct. The user learns how the "q" parameter is handled from the publisher's documentation of its extra features. We understand that this could be tricky, so we have added better explanations in the document. Thank you for pointing this out.
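To make the point concrete, here is a minimal Python sketch of one search policy a publisher might choose for `q`: case-insensitive substring matching over property names. The function name `search_properties` and the sample property list are hypothetical; SimDAL itself leaves the matching logic to the publisher, which is exactly why clients have to rely on the publisher's documentation.

```python
def search_properties(q, property_names):
    """Return the property names that contain q, ignoring case.

    Hypothetical publisher policy: SimDAL leaves the 'q' logic to the publisher.
    """
    needle = q.casefold()
    return [name for name in property_names if needle in name.casefold()]


properties = ["N(H2)", "n(H2)", "T_kin", "n(CO)"]
print(search_properties("n(h2)", properties))  # ['N(H2)', 'n(H2)']
```

Another publisher could just as well choose exact or token-based matching, so the same `q` value may return different results from different services.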

S5.6 - pg37: "the same way than the one" => "the same way as the one"

Answer: Thank you. We corrected the document.

S5.6 - pg37: "but in another cases it would" => "but in other cases it would"

Answer: Thank you. We corrected the document.

S6.3 - pg44: UWS extension: "For various reasons but in particular because of security concerns." I don't know much about this content, but it sounds like it is pointing to security concerns with the content of the UWS standard document itself.

Answer: Thank you. We reformulated this point.

Appendix B - pg48: "and some developpers" => "and some developers"

Answer: Thank you. We corrected the document.

Appendix B - pg48: "to have much more properties" => "to have many more properties"

Answer: Thank you. It is now also corrected.

Appendix B - pg48: "the APIs with the willing of putting" => "with the intent" or "with the hope"..

Answer: Thank you. Done.

Appendix B - pg48: "should know about technics" => "the technical aspects"?

Answer: Thank you. This is now corrected.

Appendix B - pg48: "end user to not needing to worry" => "end user to not worry"

Answer: Thank you. We corrected the document.

-- MarkCresitelloDittmar - 2016-10-26

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-11-04

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

I approve this document.

-- BrianMajor - 2016-12-09

Registry Working Group ( Markus Demleitner, Theresa Dower )

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

Answer: We organized a meeting in Paris in March 2015/16 dedicated to the registration of SimDAL components in the registries. Several authors of SimDAL, as well as representatives of the DAL WG (Marco) and of the Registry WG (Markus), were present.

Several options and questions were identified during the meeting. After this meeting, discussions between the authors of the SimDAL standard concluded that some important use cases require SimDAL repositories:

1) fine-grained discovery of SimDAL services requires detailed descriptions of theoretical services following the SimDM serializations, including their semantic aspects (SKOS).

2) these SimDM descriptions / serializations must be centralized (for example, to have a unique description of a code that publishers of simulations produced by this code can refer to).

The need for such a mechanism (SimDM repositories) to discover and describe theoretical services in the VO was identified long ago, before and during the definition of SimDM. Nevertheless, SimDAL components must be registered in the IVOA registries.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Answer: Yes, this makes the text more precise, thank you. We modified the document.

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Answer: During the meeting in Paris in February 2016, we started to look at how to register SimDAL components in the IVOA registries. Two solutions appeared:

1) to map as much as possible of the SimDM serializations (protocol.xml, project.xml) into the IVOA registries using extensions. This solution requires deprecating some SimDM concepts (notions of code version, owner, etc.), slightly modifying the registries (or taking some liberties, such as storing SKOS concepts in UCD fields), developing an XSLT transformation of the SimDM serializations, etc.

2) to do a simple registration of SimDAL components in the IVOA registries without extensions, using the classical registration fields: title, description, keywords, publishers, etc.

Solution 1 is more complex to set up: it requires developments, a prototype to check that it works properly, and tests of the implications for end users. It may be nicer from an IVOA point of view, but it is not required by the scientific use cases defined by the Theory I.G. Nevertheless, we decided to try it, with a deadline of the end of March 2016 to get an operational XSLT mapping from SimDM serializations to the registries. Without progress by the end of March, we would move to solution 2 to avoid delaying the release of the SimDAL standard, which several teams in the astrophysics community have asked for. Solution 1 would then be investigated for SimDAL version 2.0.

No progress had been made by the end of March, so we moved to solution 2. This solution does not need any specific descriptions, since it is a standard registration of services in the IVOA registries. It fulfills all the scientific goals.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

Answer: Ok, fixed in the document. Thank you for pointing this out.
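For illustration, a service could build its standardIds by substituting the resource name for the `{views}` placeholder and reject anything that is not a legal URI. This Python sketch assumes the corrected form looks like `ivo://ivoa.net/std/SimDALSearch#views-1.0`; the normative identifiers are the ones given in sect. 3.6 of the revised document, and `make_standard_id` is a hypothetical helper. The check only tests the character set allowed by RFC 3986 (which excludes curly braces); it is not a full URI parser.

```python
import re

# Characters RFC 3986 allows in a URI: unreserved, reserved (gen-delims and
# sub-delims) and '%' for percent-encoding. Curly braces are not among them.
_URI_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def make_standard_id(resource, version="1.0",
                     base="ivo://ivoa.net/std/SimDALSearch"):
    """Build a standardId such as ivo://ivoa.net/std/SimDALSearch#views-1.0.

    The fragment layout "<resource>-<version>" is an assumption of this sketch.
    """
    std_id = f"{base}#{resource}-{version}"
    if not _URI_CHARS.fullmatch(std_id):
        raise ValueError(f"not a legal URI: {std_id!r}")
    return std_id


print(make_standard_id("views"))  # ivo://ivoa.net/std/SimDALSearch#views-1.0
```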

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

Answer: A client knows that a collection resource supports pagination by checking whether its representation (a VOTable) contains the special pagination LINK elements (i.e., LINK elements with the content roles described in the "Pagination" section). Thus, a service may implement pagination but decide not to use it for some collections while using it for others. This is now explained in the document.
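A client-side check along these lines could look as follows. It is a sketch only: the content-role values `next_page` and `previous_page` are placeholders borrowed from the wording of the review comments, not necessarily the roles defined in the "Pagination" section, and the sample VOTable and URL are invented. Since a service may paginate some collections and not others, the check is applied to each response rather than once per service.

```python
import xml.etree.ElementTree as ET

# Placeholder roles; the normative values are defined in the SimDAL
# "Pagination" section.
PAGINATION_ROLES = {"next_page", "previous_page"}


def local_name(tag):
    """Drop the XML namespace: '{http://...}LINK' -> 'LINK'."""
    return tag.rsplit("}", 1)[-1]


def pagination_links(votable_xml):
    """Return {content-role: href} for every pagination LINK in the VOTable."""
    root = ET.fromstring(votable_xml)
    links = {}
    for elem in root.iter():
        if local_name(elem.tag) == "LINK":
            role = elem.get("content-role")
            if role in PAGINATION_ROLES:
                links[role] = elem.get("href")
    return links


def supports_pagination(votable_xml):
    """True if this particular response carries pagination links."""
    return bool(pagination_links(votable_xml))


sample = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">
  <RESOURCE type="results">
    <LINK content-role="next_page" href="https://example.org/simdal/views?page=2"/>
    <TABLE name="results"/>
  </RESOURCE>
</VOTABLE>"""
print(pagination_links(sample))
```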

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

Answer: Representing tree structures in relational databases (i.e., as sets of flat key-value structures) has been done for years and is well understood. Of course, this brings some query complexity for reconstructing the tree.

We do not expect hierarchies to be used here; the full information (and the various hierarchies) is actually kept in the SimDM serializations (mostly those of the project and protocol SimDM packages). This is intended to be a simple (flat), yet informative, view of the SimDM serializations (mainly their classes).

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).

In the VO Registry, the really complex point is the proliferation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling,...) would strengthen our confidence in this part of the standard.

Answer: Yes, that's correct. Thank you for pointing this out, we have updated the document to fix this.

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Semantics Working Group ( Mireille Louys, Alberto Accomazzi )

I have read the specification carefully (2016-10-14) and proposed a few typo corrections for the Appendix A and B namely, directly to the authors/editors.

Answer: Thanks. We have addressed the typo corrections in the January version.

Table labels and captions might help to clarify the goal of such table examples for the reader and to reference them more easily. The Acknowledgment section does not mention the support of national projects; this could be added.

Answer: The projects are now acknowledged in the head of the document.
  The document is very rich and covers the use cases precisely and with working examples. It is didactic and useful as a guide for implementation. The SKOS VO theory vocabulary has been defined previously in the SimDM specification already and is appropriately cited in the specification.

The heterogeneity and scale of possible simulation parameters is very large and deserves a specific vocabulary, as defined in the VO Theory interest group. The overlap of such a vocabulary with Unified Content Descriptors is not so important. When dealing with the comparison of simulated observations and genuine observations, a mapping between SKOS concepts and UCDs might help to homogenize the metadata.

-- MireilleLouys - 2017-01-10

-- Answers: David Languignon and Franck Le Petit (24th January 2017); a new version of the document has been uploaded.
After reviewing version 20170130 (http://ivoa.net/documents/SimDAL/20170130/PR-SimDAL-1.0-20170130.pdf) of the document, I support the recommendation of this specification, which is rich and helpful as a guideline for publishing services for simulated data.

-- MireilleLouys - 2017-02-06

 

Education Interest Group ( Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( Carlos Rodrigo )

When I read the introduction of section 5 (SimDAL Search), I get the impression that an unaware reader could conclude that it is impossible to define views containing properties and links to more than one object type. For instance, it explains that, if you have a simulation that provides both the 3D structure of a star and its oscillation spectrum, you have to define two different views. This, in fact, is a misunderstanding.

First, having a simulation that generates different output files for a given set of inputs is quite a common case. And having a different view for each object type (output data file), while possible, would make it almost impossible for a user to search on metadata and get the two, or more, final files corresponding to that "experiment" (linking different views is very inefficient).

Second, it is perfectly possible and natural in SimDAL to define a view that links together all the output files of the same experiment and points to the corresponding DataAccess services providing each dataset.

By the way, this is not a "new idea"; it's something that has been discussed and agreed on before. It is just not clear in the document.

In my opinion, it should be explicitly said that views containing different object types can be defined. And it would be nice to give a simple example of a view schema for such a case.

Answer: Thank you Carlos, we have done the update in the document.

Standards and Processes Committee ( Françoise Genova)



Revision 21 - 2017-01-24 - FranckLePetit

 

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "simdal search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and in the registries.

  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An implementation note was written for the Simulation Data Model; it presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard is accepted, we plan to write an Implementation Note presenting how to use SimDAL to publish different categories of simulations / numerical models.

  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP/TAP schemas. TAP has not been chosen as a solution because it does not fulfill the requirements for Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing the simulated objects (more than 100 000 in one of the SimDAL implementations). These numbers are growing with the progress of numerical models.

Using TAP would require storing the properties as table columns in a relational database, since TAP is strongly coupled to SQL and hence to relational databases. This is simply not possible with the majority of the RDBMSs currently in use (PostgreSQL, MySQL), which cannot manage such high-dimensional data (i.e., such large numbers of table columns). High-dimensional data and their use are much better served by other types of storage architectures, which publishers cannot use with TAP (or only with great difficulty, which makes little sense) unless they have an SQL compatibility layer or adapter.

Note that the definition of SimDAL took so long because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP has been tested on various data management systems and storage architectures. The conclusion of these implementations is that TAP is not an option. The view-based solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so a publisher can choose the technology they prefer depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not be lost.
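To illustrate why storing one column per property breaks down, the following Python/SQLite sketch stores properties in an entity-attribute-value (key-value) layout instead: one row per (object, property) pair, so the number of columns stays fixed however many properties a model has. For comparison, PostgreSQL caps a table at 1600 columns. Table and column names are hypothetical, and this is not meant to describe how any particular SimDAL implementation stores its data.

```python
import sqlite3

N_PROPERTIES = 120_000  # far beyond the per-table column limits of common RDBMSs

conn = sqlite3.connect(":memory:")
# Entity-attribute-value layout: one row per (object, property) pair,
# so the schema has 3 columns whatever the number of properties.
conn.execute(
    "CREATE TABLE property_value ("
    " object_id INTEGER NOT NULL,"
    " property TEXT NOT NULL,"
    " value REAL,"
    " PRIMARY KEY (object_id, property))"
)
rows = ((obj, f"p{i}", float(i * obj))
        for obj in (1, 2) for i in range(N_PROPERTIES))
conn.executemany("INSERT INTO property_value VALUES (?, ?, ?)", rows)
conn.commit()


def objects_where(conn, prop, minimum):
    """Ids of the simulated objects whose property `prop` is >= minimum."""
    cur = conn.execute(
        "SELECT object_id FROM property_value"
        " WHERE property = ? AND value >= ? ORDER BY object_id",
        (prop, minimum),
    )
    return [row[0] for row in cur]


n_columns = len(conn.execute("PRAGMA table_info(property_value)").fetchall())
print(n_columns, objects_where(conn, "p50000", 60000.0))  # 3 [2]
```

The price of this layout is that a query constraining several properties needs one join or subquery per property; that is the kind of storage detail SimDAL views keep behind the standard interface.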

Concerning the SimDAL Repository part:
First, note that SimDAL components (and among them the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model. So it is only with SimDAL Repositories that a search for resources can be done using the SimDM semantics. Moreover, SimDAL Repositories are the places where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes that are published in the VO. IVOA registries do not have the functionality to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry W.G.) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. Nevertheless, this would be done at the cost of losing the relationships between SimDM classes, and so losing the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not take full advantage of the SimDM XML serializations, even though most scientific use cases would require fine-grained search in these serializations to efficiently discover protocols and projects of interest. This was a choice made for version 1.0 of SimDAL. Indeed, in the coming months / years we do not expect many registered IVOA theory services, so it should be easy for users to discover theoretical services with the SimDAL Repositories as presently defined. Nevertheless, as more and more theoretical services are registered, finer-grained search will become necessary. SimDAL Repositories as defined in version 1.0, which store the full XML serializations of projects and protocols, contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included; a real one should be used. In view of the unusual content of this standard, as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive them properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the InterOp in Sesto, in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator, but said that a client compatible with the reference implementations would count as a validator. So, a client rather than a simple validator has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order in the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before doing a {search} or ask for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent)

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of endpoints. You're defining about as many endpoint types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, xpath or xquery or whatever?

In short, I believe you should split up this document into three pieces, each of which would work out to be more handleable.

Answer: Yes, we have several API endpoints. Of course, we did our best to keep this number as small as possible while satisfying as many use cases as possible.


This is a big standard, which is why it took so long to release a first version. The other VO (DAL) standards basically serve a single, fixed, underlying data model. As you know, SimDM, which SimDAL serves, is a meta-model, i.e., we are serving an unlimited number of underlying data models. This brings some additional use cases that we had to address by adding a complete API.


Several very smart people addressed the SimDAL problem in past years, and a lot of progress has been made thanks to them, even if the result is still not 100% perfect. The fact is that a lot of people have been waiting for years for a way to publish their theoretical data, and we think we have to give them a solution now.
We need a basis, SimDAL v1, that will be implemented by several teams and will fulfill the most common use cases (80 %) on real data in production environments (not theoretical/potential use cases/issues). Then, we hope these teams will collaborate with the IVOA to provide feedback for a v2 that will add features not planned in the first version.

As for xpath, we made SimDAL partly because we realized that understanding/querying SimDM is too high a prerequisite for a scientist/developer, and that we had to design a query API that is as straightforward as possible, even if it comes at the price of not covering 100% of the use cases for now.

We have not split SimDAL because, for the first version of the standard, we do not want to break the workflow logic that carries the user from theoretical project discovery to data retrieval, at the price of developers having to deal with a bigger standard. Instead, we tried to make the document clear, so that the reader understands they do not have to implement all three parts. For the next version of the standard, once several teams are familiar with it, splitting the document into several more specific standards can be discussed.

Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.
Answer: Thank you for offering to help with the proofreading! The LaTeX file can be downloaded under the src link on the SimDAL webpage: http://www.ivoa.net/documents/SimDAL/index.html


  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).
Answer: Done. Section 2.2 no longer mentions use cases but requirements.

  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).
Answer: We simplified as much as possible.

  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.
Answer: You are perfectly right. That is corrected.

  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?
Answer: Ok. We clarified this.

  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.
Answer: That is corrected


  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says: "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: does the table in the error case have the name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?
Answer: You are right, thank you, we have made the update in the document.
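Following the reviewer's suggestion, an error response could keep DALI's `QUERY_STATUS` INFO (carrying a message immediately displayable to the user) while listing the individual errors as rows of a table named `results` with an `error_msg` (string) and an `error_code` (integer) column. The Python sketch below builds such a document with the standard library; the exact element layout and the error codes in the example call are assumptions for illustration, not the normative SimDAL encoding.

```python
import xml.etree.ElementTree as ET

VOT_NS = "http://www.ivoa.net/xml/VOTable/v1.3"
ET.register_namespace("", VOT_NS)  # serialize VOTable as the default namespace


def _q(tag):
    """Qualify a tag name with the VOTable namespace."""
    return f"{{{VOT_NS}}}{tag}"


def error_votable(errors, summary):
    """Build an error VOTable.

    errors:  list of (error_msg, error_code) pairs, one table row each.
    summary: user-displayable text for the DALI QUERY_STATUS INFO element.
    """
    votable = ET.Element(_q("VOTABLE"), version="1.3")
    resource = ET.SubElement(votable, _q("RESOURCE"), type="results")
    status = ET.SubElement(resource, _q("INFO"), name="QUERY_STATUS", value="ERROR")
    status.text = summary
    table = ET.SubElement(resource, _q("TABLE"), name="results")
    ET.SubElement(table, _q("FIELD"), name="error_msg", datatype="char", arraysize="*")
    ET.SubElement(table, _q("FIELD"), name="error_code", datatype="int")
    tabledata = ET.SubElement(ET.SubElement(table, _q("DATA")), _q("TABLEDATA"))
    for msg, code in errors:
        row = ET.SubElement(tabledata, _q("TR"))
        ET.SubElement(row, _q("TD")).text = msg
        ET.SubElement(row, _q("TD")).text = str(code)
    return ET.tostring(votable, encoding="unicode")


print(error_votable([("Unknown view 'foo'", 1), ("Missing parameter 'q'", 2)],
                    "2 errors: unknown view, missing parameter"))
```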

  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.
Answer: DataLink would have made the standard heavier, whereas we only need a tiny subset of DataLink: the subset that was DataLink when we were involved in its definition back in 2012, when F. Bonnarel started thinking about it. This is something we have in mind for a future version of the standard.

  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.
Answer: The foreign-key mechanism is used in VOTable responses and is directly taken from the VOTable 1.3 document. It is a neat and tidy way to link our tables while allowing some extensibility. We define it in the document because it is not formally defined in the VOTable document, just mentioned as an example/possible use of the GROUP element.

  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.
Answer: These fields are defined for each API endpoint; see for example the results table in {search}, subsection "Response schema". All FIELDs are described, and those that are not mandatory are explicitly declared as optional.


  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).
Answer: We had not specified anything because our intent is "do not bother with the lifetime of the pagination; we take care of it so that it is long-lived enough for interactive browsing in your current session". This is now stated explicitly (section 3.6).

  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.
Answer: We thought the same while writing and implementing the standard, but decided it would be useful to users to be consistent with the majority of existing APIs. After further thought, we agree with you that it should simply be removed to make the standard more straightforward to use.


  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).
Answer: Indeed, we meant "optionally". This has been clarified.
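A minimal Python sketch of the timestamp syntax as described in this exchange: date and time to whole seconds, optionally followed by a decimal point and fractional seconds, and optionally a trailing "Z" as the reviewer suggests for alignment with DALI 1.1. The function name is hypothetical, and the check is purely syntactic (it does not reject, for example, month 13).

```python
import re

# Assumed pattern: YYYY-MM-DDThh:mm:ss, then optional ".fraction", then optional "Z".
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z?")


def is_simdal_timestamp(value: str) -> bool:
    """Return True if `value` matches the assumed timestamp syntax exactly."""
    # fullmatch anchors both ends, so any trailing text is rejected.
    return TIMESTAMP_RE.fullmatch(value) is not None


for sample in ("2016-09-13T10:00:00", "2016-09-13T10:00:00.25Z", "2016-09-13 10:00"):
    print(sample, is_simdal_timestamp(sample))
```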


  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".
Answer: Done


  • In 4.1, I guess I'd rather start with "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused about the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) later said under "Parameter".)
Answer: Ok. We modified the presentation of the section.


  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?
Answer: Indeed. Ok.


  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).
Answer: We already give the datatype in the FIELD element. Do you think we need to add more?
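To illustrate the reviewer's point: a datatype of char alone does not say that a column holds a timestamp, whereas DALI 1.1 signals this with xtype="timestamp". The sketch below builds such FIELD elements with Python's standard library. The field names ident and created come from the discussion of section 3.3; the helper name is hypothetical, and element names are kept upper-case (FIELD), as VOTable requires.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def make_field(name: str, datatype: str, arraysize: Optional[str] = None,
               xtype: Optional[str] = None) -> ET.Element:
    """Build a VOTable FIELD element, adding optional attributes only when given."""
    attrs = {"name": name, "datatype": datatype}
    if arraysize is not None:
        attrs["arraysize"] = arraysize
    if xtype is not None:
        attrs["xtype"] = xtype
    # VOTable element names are upper-case: FIELD, not field.
    return ET.Element("FIELD", attrs)


fields = [
    make_field("ident", "char", arraysize="*"),
    make_field("created", "char", arraysize="*", xtype="timestamp"),
]
for field in fields:
    print(ET.tostring(field, encoding="unicode"))
```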


  • In 4.2, is the project parameter mandatory?
Answer: No it is not. We now say it explicitly.


  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".
Answer: We corrected the typo. Thank you.

  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to be repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)
Answer: No. In this first version of SimDAL, each parameter can be passed only once. We now say so in the text.
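A minimal server-side sketch of the "each parameter at most once" rule, assuming parameters arrive as a URL query string. The function name and the parameter values are hypothetical; q and project are the parameter names discussed above (project being optional).

```python
from typing import Dict
from urllib.parse import parse_qs


def single_valued_params(query_string: str) -> Dict[str, str]:
    """Parse a query string, rejecting any parameter given more than once.

    Assumes the SimDAL 1.0 rule stated above: each parameter may appear only once.
    """
    parsed = parse_qs(query_string, keep_blank_values=True)
    repeated = sorted(name for name, values in parsed.items() if len(values) > 1)
    if repeated:
        raise ValueError("parameters may not be repeated: " + ", ".join(repeated))
    # Every list now has exactly one element.
    return {name: values[0] for name, values in parsed.items()}


print(single_valued_params("q=n(h2)&project=pdr"))
try:
    single_valued_params("project=a&project=b")
except ValueError as err:
    print(err)
```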

  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.
Answer: Ok, document updated
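One way to read the reviewer's suggestion, sketched in Python under our own assumptions: treat the published data as a relation (a set of rows sharing one schema) and a view as a selection followed by a projection of that relation. Serialisation (tab-separated file, VOTable, SQL result) then becomes a separate concern. Column names and data are hypothetical, and this is not the model defined in the SimDAL document.

```python
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, object]


def view(relation: List[Row], columns: Sequence[str],
         where: Optional[Callable[[Row], bool]] = None) -> List[Row]:
    """Rows satisfying `where` (selection), restricted to `columns` (projection)."""
    selected = relation if where is None else [row for row in relation if where(row)]
    return [{col: row[col] for col in columns} for row in selected]


# Hypothetical relation: one row per simulated object, with one data file per row.
models = [
    {"model_id": 1, "mass": 1.0, "age_gyr": 4.6, "file": "model1.dat"},
    {"model_id": 2, "mass": 2.0, "age_gyr": 1.1, "file": "model2.dat"},
    {"model_id": 3, "mass": 0.8, "age_gyr": 9.0, "file": "model3.dat"},
]

# The same view, whatever format it is later serialised to.
print(view(models, ["model_id", "file"], where=lambda row: row["mass"] >= 1.0))
```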


  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).
Answer: Ok, this is related to our answer to the previous question about foreign_keys - see above.


  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to). There's just too many flavours of JSON out there.
Answer: You are right, thank you for pointing this to us, we have added the reference.
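Once a concrete JSON standard is referenced, a server can parse query dictionaries strictly. RFC 8259 and ECMA-404 are the usual candidates, but this page does not say which one was added. The Python sketch below rejects the NaN/Infinity extensions that Python's json module accepts by default, rejects duplicate keys, and requires a top-level object. The query keys in the example are hypothetical and do not reflect the SimDAL query language.

```python
import json
from typing import Any, Dict, List, Tuple


def _reject_constant(name: str) -> Any:
    # Python's json accepts NaN/Infinity by default; RFC 8259 JSON does not.
    raise ValueError(f"non-standard JSON constant: {name}")


def _reject_duplicate_keys(pairs: List[Tuple[str, Any]]) -> Dict[str, Any]:
    # RFC 8259 says object names SHOULD be unique; be strict about it here.
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate key in JSON object")
    return dict(pairs)


def parse_query(text: str) -> Dict[str, Any]:
    """Parse a query as strict JSON and require a top-level object (dictionary)."""
    value = json.loads(text, parse_constant=_reject_constant,
                       object_pairs_hook=_reject_duplicate_keys)
    if not isinstance(value, dict):
        raise ValueError("query must be a JSON object")
    return value


print(parse_query('{"mass": {"min": 1.0, "max": 2.0}}'))
```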


  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?
Answer: We have done the update following your recommendation.


  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".
Answer: Yes, thank you this is clarified (section 6.3)
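To make the clarified behaviour concrete: the job creation request does not yet carry results. The client polls the UWS job until it reaches a final phase and only then reads the results. In this Python sketch, get_job is a hypothetical callable that fetches and parses the job document (no HTTP is performed here). The phase names are standard UWS ones, and the result name is hypothetical.

```python
import time
from typing import Callable, Dict, List

TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED"}


def wait_for_results(get_job: Callable[[], Dict], interval: float = 1.0,
                     max_polls: int = 60) -> List[str]:
    """Poll a job until it reaches a terminal phase; return its results on success.

    `get_job` is a hypothetical callable returning the job as a dict with
    "phase" and "results" keys.
    """
    for _ in range(max_polls):
        job = get_job()
        phase = job["phase"]
        if phase == "COMPLETED":
            return job.get("results", [])
        if phase in TERMINAL_PHASES:
            raise RuntimeError(f"job ended in phase {phase}")
        time.sleep(interval)  # still PENDING, QUEUED, EXECUTING, ...
    raise TimeoutError("job did not finish within the polling budget")


# Simulated job history: the results only appear once the job has completed.
states = iter([
    {"phase": "QUEUED", "results": []},
    {"phase": "EXECUTING", "results": []},
    {"phase": "COMPLETED", "results": ["cutout.fits"]},
])
print(wait_for_results(lambda: next(states), interval=0))
```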


  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist -- where does the SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?
Answer: You are right about the JCL, we have removed the mention. We also made the 1st point clearer. Thank you.

  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
Answer: Done. Thank you.

-- MarkusDemleitner - 2016-09-13

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( _Matthew Graham, Pat Dowler )

Applications Working Group ( _Pierre Fernique, Tom Donaldson )

Approved. I have no expertise in Sim, but as an independent feature, the specification seems reasonable and consistent.

-- TomDonaldson - 2016-10-22

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Discovery of and access to simulation data have been a long-term VO goal. It is nice to see that this goal is now within reach, as the first implementations prove. It is especially nice to see that consistency with DALI has been much improved in the latest versions. We validate this specification for recommendation.

In the long term, we recommend that authors and editors pay attention to a couple of points for better integration into the full DAL landscape and better interoperability between observational and simulation data:

- convergence of DataLink techniques, which may be reached in the next versions of both SimDAL and DataLink.

- convergence in query languages which may also result from efforts on SimDAL side as well as Simple access or TAP side for future versions

- complementarity and coupling of SKOS technology and UCD technology for encoding semantics looks important to us and is probably possible. But this is more in the hands of the Semantics Working Group, which is collaborating with TIG on that.

-- FrancoisBonnarel , MarcoMolinaro - 2016-11-24

Data Model Working Group ( _Mark Cresitello-Dittmar, Laurent Michel )

I reviewed the 20161014 version of the document. I'm wondering if this is correct, as some of my comments appear earlier with comments that they have been addressed.

Two broad comments:

1) I would like to see some statement about the completeness of the implementations: do they implement all features of the spec?

Answer: All mandatory APIs are implemented in the Paris implementation. This implementation does not (yet) implement the optional {cutouts-preview}. The SVO implementation implements most of the mandatory APIs (see the implementation documentation and tutorial for detailed information about the implementation).

Answer (Carlos Rodrigo): The SVO implementation (http://svo2.cab.inta-csic.es/theory/simdal1/) includes all three types of services: repository, search and dataAccess. We implemented all the mandatory resources and some optional ones (for instance, {cutouts-preview}). It is presented on the web page with explanations of what we consider a typical workflow and an example of use, because we thought that, as SimDAL has many endpoints, this would be more illustrative. In each case, however, links are given to the URLs of the real implementation of each resource.


2) In reading it, there seems to be a large overlap with existing standards ( DALI, DATALINK, etc) which was also mentioned by others. Since DALI is supposed to be the basis for DAL protocols, I'd like to see this spec. be expressed more directly in relation to it rather than statements of compatibility. In reading the responses to similar comments above, and speaking with Francois about this, I understand that this is intentional for this version. A subsequent version can integrate the spec into the DALI family. So, I will not block progress on this count.

Answer: Indeed. As explained in another answer concerning Datalink, SimDAL does not use it since it would have made the standard heavier whereas we just need a tiny subset of datalink. As you say, that is something we have in mind for a future version of the standard. Concerning DALI, SimDAL is DALI-compliant.

General:

Section 1: pg 5 = The final lines state who should implement each of the components, but in very fuzzy terms (eg: "most of the time the SimDAL Search component"). It would be more clear if it stated specifically who would implement each part. This same fuzzy language is repeated in a couple places (pg 8)

Answer: Thank you for pointing this out. We have reworked the part you are referring to, to make it clearer.

pg 6: Architecture diagram does not match the one in the master IVOA Architecture document.

Answer: Thank you. We will work with the DAL chairs to correct this IVOA Architecture figure.

S2.3 - pg10: "the pivot format" as Markus stated, I have no idea what that means, perhaps it is common knowledge to the target audience?

Answer: OK, we realize that this is a tricky term, so we have removed it completely.

S3.0 - pg11: Question marks should be confirmed and removed. 1. "SimDAL components are exposed with APIs following a REST design (?) that conforms to the DALI resource description (?)."

Answer: Sorry, that was an erroneous generation of the bibliography in the published archive. It is now corrected.

S3.3 - pg12: Question mark in place of reference to JSON standard

Answer: Sorry, that was an erroneous generation of the bibliography in the published archive. It is now corrected.

S3.5 - pg15: another question mark in place of reference (VOTable)

Answer: Sorry, that was an erroneous generation of the bibliography in the published archive. It is now corrected.

S3.6 - pg15: "must provide sufficient lifetime for interactive browsing of the pages". This is a rather vague statement for a requirement.

Answer: We thought that it is no more arbitrary than specifying a fixed time such as 10 seconds or 10 minutes. Moreover, this "sufficient lifetime" is often so specific to the service, type of data or simulation that the publisher is best placed to decide it.

S4.1 - pg17: the description for 'q' parameter, "The search logic is up to the publisher.." How is a user supposed to know what to put in the field if the logic is up to the publisher? The example q='n(h2)' matches both 'N(H2)' and n(H2)', but only because this provider decided to make the search case insensitive?

Answer: That is correct. Users learn how the "q" parameter is handled from the publisher's documentation of the extra features. But we understand that this could be tricky, so we added better explanations in the document. Thank you for pointing this out.
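For instance, a publisher might document that q is matched as a case-insensitive substring. That rule is what makes q='n(h2)' match both 'N(H2)' and 'n(H2)' in the reviewer's example. The Python sketch below implements that one hypothetical choice. Other publishers may match differently, which is why the documentation matters.

```python
def matches_q(q: str, candidate: str) -> bool:
    """Case-insensitive substring match: one possible, publisher-defined rule."""
    # casefold() is a more aggressive lower() intended for caseless matching.
    return q.casefold() in candidate.casefold()


names = ["N(H2)", "n(H2)", "n(CO)"]
print([name for name in names if matches_q("n(h2)", name)])
```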

S5.6 - pg37: "the same way than the one" => "the same way as the one"

Answer: Thank you. We corrected the document.

S5.6 - pg37: "but in another cases it would" => "but in other cases it would"

Answer: Thank you. We corrected the document.

S6.3 - pg44: UWS extension.. "For various reasons but in particular because of security concerns." I don't know much about what this content, but it sounds like it is indicating security concerns with the content of the UWS document standard.

Answer: Thank you. We reformulated this point.

Appendix B - pg48: "and some developpers" => "and some developers"

Answer: Thank you. We corrected the document.

Appendix B - pg48: "to have much more properties" => "to have many more properties"

Answer: Thank you. It is now also corrected.

Appendix B - pg48: "the APIs with the willing of putting" => "with the intent" or "with the hope"..

Answer: Thank you. Done.

Appendix B - pg48: "should know about technics" => "the technical aspects"?

Answer: Thank you. This is now corrected.

Appendix B - pg48: "end user to not needing to worry" => "end user to not worry"

Answer: Thank you. We corrected the document.

-- MarkCresitelloDittmar - 2016-10-26

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-11-04

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

I approve this document.

-- BrianMajor - 2016-12-09

Registry Working Group ( _Markus Demleitner, Theresa Dower )

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

Answer: We organized a meeting in Paris in March 2015/16 dedicated to the question of registering SimDAL components in the registries. Several authors of SimDAL as well as representatives of the DAL WG (Marco) and the Registry WG (Markus) were present.

Several options and questions were identified during the meeting. After this meeting, discussions between the authors of the SimDAL standard concluded that some important use cases require SimDAL repositories:

1) fine discovery of SimDAL services requires the detailed description of theoretical services following SimDM serializations, including their semantic aspects (SKOS).

2) these SimDM descriptions / serializations must be centralized (for example to have a unique description of a code to which publishers of simulations produced by this code can refer to).

The need for such a way (using SimDM repositories) to discover and describe theoretical services in the VO was identified long ago, before and during the definition of SimDM. Nevertheless, SimDAL components must be registered in the IVOA registries.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Answer: Yes, this precision adds clarification, thank you. We modified the document.

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Answer: During the meeting in Paris in February 2016, we started to look at how to register SimDAL components in the IVOA registries. Two solutions appeared:

1) to map as much as possible of the SimDM serializations (protocol.xml, project.xml) into the IVOA registries using extensions. This solution requires deprecating some SimDM concepts (notions of code version, owner, etc.), slight modifications of the registries (or taking some liberties, such as storing SKOS concepts in UCD fields), developing an XSLT transformation of SimDM serializations, etc.

2) to do a simple registration of SimDAL components in the IVOA registries without extensions using classical fields for registrations: title, description, keywords, publishers, etc ...

Solution 1 is more complex to set up: it requires development work, a prototype to check that it works properly, and tests of the implications for end users. It may be nicer from an IVOA point of view but is not required by the scientific use cases defined by the Theory I.G. Nevertheless, we decided to try it, with a deadline at the end of March 2016 to get an operational XSLT mapping from SimDM serializations to registries. Without progress by the end of March, we would move to solution 2 to avoid delaying the release of the SimDAL standard, which several teams in the astrophysics community have asked for. Solution 1 would then be investigated for SimDAL version 2.0.

No progress was made by the end of March, so we moved to solution 2. This solution does not need any specific descriptions since it is a standard registration of services in the IVOA registries. It fulfills all the scientific goals.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

Answer: Ok, fixed in the document. Thank you for pointing this out.
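The reviewer's objection can be checked mechanically. RFC 3986 never allows "{" or "}" to appear unencoded in a URI. The Python sketch below performs a character-level check only, not full URI parsing. The braced identifier is the example quoted in the comment; the brace-free spelling is a hypothetical correction and not necessarily the one the document now uses.

```python
import string
from typing import Set

# Characters RFC 3986 allows to appear literally in a URI: unreserved characters,
# gen-delims, sub-delims, and "%" for percent-encoded octets.
ALLOWED_URI_CHARS = set(
    string.ascii_letters + string.digits + "-._~" + ":/?#[]@" + "!$&'()*+,;=" + "%"
)


def illegal_uri_chars(uri: str) -> Set[str]:
    """Return the characters of `uri` that RFC 3986 never allows unencoded."""
    return {ch for ch in uri if ch not in ALLOWED_URI_CHARS}


print(illegal_uri_chars("ivo://ivoa.net/std/SimDALSearch#{views}-1.0"))
print(illegal_uri_chars("ivo://ivoa.net/std/SimDALSearch#views-1.0"))
```

This does not catch characters that are legal but misplaced (such as a second "#"), for which a real URI parser is needed.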

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

Answer: A client knows that a collection resource supports pagination by looking in the resource representation (a VOTable) for the special pagination LINK elements (that is, LINK elements with the special content roles described in the "Pagination" section). Thus, a service may implement pagination but decide not to use it for some collections while using it for others. This is now explained in the document.
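The detection rule described in this answer can be sketched in Python: scan the returned VOTable for LINK elements carrying a pagination content role. The role values next_page and previous_page are assumed from the wording of the section 3.4 comment, and the sample document and example.org URL are hypothetical. An empty result means the collection is not paginated.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Assumed role values, taken from the next_page/previous_page wording above.
PAGINATION_ROLES = {"next_page", "previous_page"}


def pagination_links(votable_xml: str) -> Dict[str, str]:
    """Return {content-role: href} for pagination LINK elements; empty if none."""
    root = ET.fromstring(votable_xml)
    links = {}
    for elem in root.iter():
        # Drop any namespace, e.g. "{http://www.ivoa.net/xml/VOTable/v1.3}LINK" -> "LINK".
        local_name = elem.tag.rsplit("}", 1)[-1]
        role = elem.get("content-role")
        if local_name == "LINK" and role in PAGINATION_ROLES:
            links[role] = elem.get("href")
    return links


# Hypothetical first page of a paginated collection (example.org is a placeholder).
SAMPLE_PAGE = """<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE type="results">
    <TABLE name="results">
      <FIELD name="ident" datatype="char" arraysize="*"/>
      <LINK content-role="next_page" href="https://example.org/simdal/search?page=2"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

links = pagination_links(SAMPLE_PAGE)
print("paginated" if links else "not paginated", links)
```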

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

Answer: Representing tree structures in relational databases (i.e. as sets of flat key-value structures) has been done for years and is well understood. Of course, this adds some query complexity when reconstructing the tree.

We do not expect hierarchy to be used here: the full information (and the various hierarchies) is actually kept in the SimDM serializations (mostly those of the project and protocol SimDM packages). This is intended to be a simple (flat), yet informative, view of the SimDM serializations (mainly of their classes).
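The general technique referred to in this answer, storing a tree as flat key-value pairs and rebuilding it on demand, can be sketched in Python using dotted path keys. The dotted-key encoding and the sample data are our own assumptions for illustration. The answer states that SimDAL keeps its hierarchies in the SimDM serializations rather than in this flat view.

```python
from typing import Any, Dict


def flatten(tree: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into sub-trees
        else:
            flat[path] = value
    return flat


def unflatten(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the nested dicts from dotted keys (inverse of flatten)."""
    tree: Dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


# Hypothetical protocol description; not an actual SimDM serialization.
protocol = {"code": {"name": "ExampleCode", "version": "2.1"}, "algorithm": {"type": "N-body"}}
flat = flatten(protocol)
print(flat)
print(unflatten(flat) == protocol)
```

The reconstruction step is where the extra query complexity mentioned in the answer appears. This simple encoding also cannot represent keys containing "." or empty sub-trees.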

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).

In the VO Registry, the real complex point is the proliferation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling,...) would strengthen our confidence in this part of the standard.

Answer: Yes, that's correct. Thank you for pointing this out, we have updated the document to fix this.

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

I have read the specification carefully (2016-10-14) and proposed a few typo corrections for the Appendix A and B namely, directly to the authors/editors.

Answer: Thanks. We have addressed the typo corrections in the January version.
Table labels and captions might help to clarify the goal of such table examples for the reader and make them easier to reference. The Acknowledgment section does not mention the support of national projects; this could be mentioned.
Answer: The projects are now acknowledged in the head of the document.
 The document is very rich and covers the use cases precisely and with working examples. It is didactic and useful as a guide for implementation. The SKOS VO theory vocabulary has been defined previously in the SimDM specification already and is appropriately cited in the specification.

The heterogeneity and scale of possible simulation parameters are very large and deserve a specific vocabulary, as defined in the VO Theory interest group. The overlap of such a vocabulary with Unified Content Descriptors is limited. When comparing simulated observations with genuine observations, a mapping between SKOS concepts and UCDs might help to homogenize the metadata.

-- MireilleLouys - 2017-01-10

-- Answers: David Languignon and Franck Le Petit (24th January 2017)
 

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

When I read the introduction of section 5 (SimDAL Search), I get the impression that an unaware reader could conclude that it is impossible to define views containing properties and links to more than one object type. For instance, it explains that, if you have a simulation that provides both the 3D structure of a star and its oscillation spectrum, you have to define two different views. This is, in fact, a misunderstanding.

First, a simulation that generates different output files for a given set of inputs is quite a common case. Having a different view for each object type (output data file), while possible, would make it almost impossible for users to search on metadata and get the two or more final files corresponding to that "experiment" (linking between different views is very inefficient).

Second, it is perfectly possible and natural in SimDAL to define a view that links together all the output files of the same experiment and points to the corresponding DataAccess services providing each dataset.

By the way, this is not a "new idea": it is something that has been discussed and agreed on before. It is just not clear in the document.

In my opinion, it should be said explicitly that views containing different object types can be defined. And it would be nice to give a simple example of a view schema for such a case.

Answer: Thank you Carlos, we have done the update in the document.
 

Standards and Processes Committee ( Françoise Genova)



Revision 20 - 2017-01-10 - MireilleLouys

 

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.
  The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "simdal search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and registries.

  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An implementation note was written for the Simulation Data Model. It presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard is accepted, we plan to write an Implementation Note presenting how to use SimDAL to publish different categories of simulations / numerical models.

  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP/TAP schemas. TAP has not been chosen as a solution because it does not fulfill the requirements for Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a lot of properties characterizing simulated objects (> 100 000 in one of the SimDAL implementations). These numbers are growing with the progress of numerical models.

Using TAP would require storing the properties as table columns in a relational database, since TAP is strongly coupled to SQL and hence to the relational model. This is simply not possible with the majority of the RDBMSs currently in use (PostgreSQL, MySQL), which cannot manage such high-dimensional data, i.e. such numbers of table columns. High-dimensional data and their use are much better served by other types of storage architectures, which publishers could not use with TAP (or only with great difficulty) when those architectures lack an SQL compatibility layer or adapter.

Note that if the definition of SimDAL has taken so long, it is because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems / storage architectures. The conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer depending on the particularities of their data)
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not be lost.

Concerning the SimDAL Repository part:
First, note that SimDAL components (among them the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model. So it is only with SimDAL Repositories that a search for resources can be done using the SimDM semantics. Moreover, SimDAL Repositories are places where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of theoretical projects and codes that are published in the VO. IVOA registries do not have functionalities to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. Nevertheless, this would lose the relationships between SimDM classes, and so the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not make full use of the SimDM XML serializations, even though most scientific use cases would require fine-grained search in these serializations to efficiently discover protocols and projects of interest. This was a choice for version 1.0 of SimDAL. Indeed, in the coming months / years we do not expect to have many registered IVOA theory services, so it should be easy for users to discover theoretical services with the SimDAL Repositories as presently defined. Nevertheless, as more and more theoretical services are registered, finer-grained search will become necessary. SimDAL Repositories as defined in version 1.0, storing the full XML serializations of projects and protocols, contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included, a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of the DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive it properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the InterOp in Sesto, in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations could serve as a validator. So a client, rather than a simple validator, has been developed. It is compatible with both reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order in the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before doing a {search} or ask for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent)

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of end points. You're defining about as many end point types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, xpath or xquery or whatever?

In short, I believe you should split up this document into three pieces, each of which would work out to be more handleable.

Answer: Yes, we have several API endpoints. Of course, we tried our best to keep this number as small as possible while satisfying as many use cases as possible.


It is a big standard; that is why it took so long to release a first version of it. The other VO (DAL) standards basically serve a single, fixed, underlying data model. As you know, SimDM, which SimDAL serves, is a meta-model, i.e. we are serving an infinity of underlying data models. This brings some additional use cases that we had to address by adding a complete API.


Several very smart people addressed the SimDAL problem in past years, and a lot of progress has been made thanks to them, even though the result is still not 100% perfect. The fact is that many people have been waiting for years for a way to publish their theoretical data, and we think we have to give them a solution now.
We need a basis, SimDAL v1, that will be implemented by several teams and will fulfill the most common use cases (80%) on real data in production environments (not theoretical/potential use cases/issues). Then, we hope these teams will collaborate with the IVOA to provide feedback for a v2 that will add features not planned in the first version.

As for xpath, we made SimDAL partly because we realized that understanding/querying SimDM is too high a prerequisite for a scientist/developer, and that we had to design a query API that is as straightforward as possible, even if this comes at the price of not covering 100% of use cases for now.

We have not split SimDAL because, for the first version of the standard, we do not want to break the workflow logic that carries the user from theoretical project discovery to data retrieval, at the price of developers having to deal with a bigger standard. Instead, we tried to be clear in the document so that readers understand they do not have to implement all three parts. For the next version of the standard, once several teams are familiar with it, splitting the document into several more specific standards can be discussed.

Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.
Answer: Thank you for offering to help with the proofreading! The LaTeX file can be downloaded from the src link on the SimDAL webpage: http://www.ivoa.net/documents/SimDAL/index.html


  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).
Answer: Done. Section 2.2 no longer mentions use cases but requirements.

  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).
Answer: We simplified as much as possible.

  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.
Answer: You are perfectly right. That is corrected.

  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?
Answer: Ok. We clarified this.

  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.
Answer: That is corrected


  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says: "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: Does the table in the error case have a name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?
Answer: You are right, thank you, we have made the update in the document.
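For illustration only, a minimal Python sketch of an error response combining both points above: a DALI-style INFO element (name="QUERY_STATUS", value="ERROR") placed before the TABLE, whose text is directly displayable to a user, plus a table whose rows hold a string-valued error_msg column and an integer-valued error_code column. The table name "results", the use of the first message as the INFO text, and the helper names are assumptions, not taken from the SimDAL text.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"
ET.register_namespace("", VOTABLE_NS)  # serialize VOTable elements without a prefix


def _q(tag: str) -> str:
    """Qualify a tag name with the VOTable namespace."""
    return "{%s}%s" % (VOTABLE_NS, tag)


def error_votable(errors) -> str:
    """Serialize (error_msg, error_code) pairs as an error VOTable (hypothetical helper).

    The INFO text (first message) is meant for direct display to users;
    every pair also becomes one row of the error table.
    """
    votable = ET.Element(_q("VOTABLE"), version="1.3")
    resource = ET.SubElement(votable, _q("RESOURCE"), type="results")
    # DALI: INFO name="QUERY_STATUS" value="ERROR", placed before the TABLE.
    info = ET.SubElement(resource, _q("INFO"), name="QUERY_STATUS", value="ERROR")
    info.text = errors[0][0] if errors else "Unknown error"
    table = ET.SubElement(resource, _q("TABLE"), name="results")  # name is an assumption
    ET.SubElement(table, _q("FIELD"), name="error_msg", datatype="char", arraysize="*")
    ET.SubElement(table, _q("FIELD"), name="error_code", datatype="int")
    rows = ET.SubElement(ET.SubElement(table, _q("DATA")), _q("TABLEDATA"))
    for msg, code in errors:
        tr = ET.SubElement(rows, _q("TR"))
        ET.SubElement(tr, _q("TD")).text = msg
        ET.SubElement(tr, _q("TD")).text = str(code)
    return ET.tostring(votable, encoding="unicode")
```

A DALI-only client still finds a displayable status in the INFO element, while a SimDAL-aware client can read every error from the table.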

  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.
Answer: Datalink would have made the standard heavier, whereas we just need a tiny subset of datalink: the subset that was Datalink when we were involved in its definition back in 2012, when F.Bonnarel started thinking about it. That is something we have in mind for a future version of the standard.

  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.
Answer: The foreign-key mechanism is used in VOTable responses and is directly taken from the VOTable 1.3 document. It is a neat and tidy way to link our tables while allowing some extensibility. We define it in the document because it is not formally defined in the VOTable document, only mentioned as an example/possible use of the GROUP element.

  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.
Answer: These fields are defined in each API endpoints, see for example the result table in {search}, subsection response schema. All FIELDs are described and those that are not mandatory are explicitly declared as optional.


  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).
Answer: Ok. We had not specified anything because our idea is "do not bother with the lifetime of the pagination; we take care of it so that it is long-lived enough for you to do interactive browsing in your current session". This is now stated explicitly (section 3.6).

  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.
Answer: We thought the same while writing/implementing the standard, but decided it would be of some use to users, to be consistent with the majority of existing APIs. But after some more thought recently, we agree with you that this should simply be removed to make the standard more straightforward to use.


  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).
Answer: Indeed, we meant "optionally". This has been clarified.
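A minimal sketch of a validator for the timestamp form discussed here, assuming the rule is YYYY-MM-DDThh:mm:ss, optionally followed by a decimal point and fractional seconds, optionally followed by "Z". DALI 1.1 may allow further forms (e.g. date-only values) that this sketch rejects; the function name is hypothetical.

```python
import re
from datetime import datetime

# Assumed rule: YYYY-MM-DDThh:mm:ss, then optional ".fraction", then optional "Z".
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?")


def is_valid_timestamp(value: str) -> bool:
    """Return True if value has the assumed form and denotes a real calendar date/time."""
    if TIMESTAMP_RE.fullmatch(value) is None:
        return False
    try:
        # Once the pattern matched, the first 19 characters are YYYY-MM-DDThh:mm:ss.
        datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True
```

Checking the calendar fields separately catches values such as month 13, which the pattern alone would accept.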


  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".
Answer: Done


  • In 4.1, I guess I'd rather start with the "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused about the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) later said under "Parameter".)
Answer: Ok. We modified the presentation of the section.


  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?
Answer: Indeed. Ok.


  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).
Answer: We already have the datatype in the FIELD element, do you think we have to add more ?


  • In 4.2, is the project parameter mandatory?
Answer: No it is not. We now say it explicitly.


  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".
Answer: We corrected the typo. Thank you.

  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to be repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)
Answer: The answer is no. In this first version of SimDAL, each parameter can be passed only once. We now say so in the text.
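A server-side check for this rule can be sketched as follows. It assumes DALI-style case-insensitive parameter names, so "project" and "PROJECT" count as the same parameter; the function name is hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qsl


def repeated_parameters(query_string: str) -> list:
    """Return the (lower-cased) names of parameters that appear more than once."""
    # Assumption: parameter names are case-insensitive, as in DALI.
    counts = Counter(name.lower() for name, _ in parse_qsl(query_string, keep_blank_values=True))
    return sorted(name for name, n in counts.items() if n > 1)
```

A service would answer with an error response whenever the returned list is non-empty.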

  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.
Answer: Ok, document updated


  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).
Answer: Ok, this is related to our answer to the previous question about foreign_keys - see above.


  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to). There are just too many flavours of JSON out there.
Answer: You are right, thank you for pointing this to us, we have added the reference.
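As a sketch of why the reference matters: Python's json module accepts NaN, Infinity and -Infinity by default, although they are not valid JSON under RFC 8259. The hypothetical helper below parses a query strictly and requires a top-level JSON object; it does not model the SimDAL query dictionaries themselves.

```python
import json


def _reject_constant(name: str):
    # Called by json.loads for NaN, Infinity and -Infinity, which RFC 8259 does not allow.
    raise ValueError("invalid JSON constant: " + name)


def parse_query(text: str) -> dict:
    """Parse a query string as strict JSON and require a top-level object."""
    value = json.loads(text, parse_constant=_reject_constant)
    if not isinstance(value, dict):
        raise ValueError("query must be a JSON object")
    return value
```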


  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?
Answer: We have done the update following your recommendation.


  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".
Answer: Yes, thank you this is clarified (section 6.3)
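The asynchronous flow can be sketched as a client-side polling loop. In a real client, get_phase would fetch the job's phase resource (the job URL followed by /phase, as UWS defines); here it is an injected callable so the sketch runs without a service. The sketch assumes the job has already been started (PHASE=RUN); the results element is only meaningful once a terminal phase such as COMPLETED is reached. Function and parameter names are hypothetical.

```python
import time

# UWS phases after which a job no longer changes (ARCHIVED exists in UWS 1.1).
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED", "ARCHIVED"}


def wait_for_job(get_phase, poll_interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll a UWS job until it reaches a terminal phase and return that phase.

    get_phase: callable returning the current phase string (hypothetical hook).
    """
    waited = 0.0
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if waited >= timeout:
            raise TimeoutError(f"job still in phase {phase} after {waited} s")
        sleep(poll_interval)
        waited += poll_interval
```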


  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist -- where does SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?
Answer: You are right about the JCL, we have removed the mention. We also made the 1st point clearer. Thank you.

  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
Answer: Done. Thank you.

-- MarkusDemleitner - 2016-09-13

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( Matthew Graham, Pat Dowler )

Applications Working Group ( Pierre Fernique, Tom Donaldson )

Approved. I have no expertise in Sim, but as an independent feature, the specification seems reasonable and consistent.

-- TomDonaldson - 2016-10-22

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Discovery and Access to simulation data has been a long-term VO goal. It is nice to see that this goal is now ready to be achieved, as the first implementations prove. It is especially nice to see that DALI consistency has been much improved in the last versions. We validate this specification for recommendation.

On the long term we recommend authors and editors to pay attention to a couple of points for better integration to the full DAL landscape and better interoperability between observational and simulation data :

- convergence of DataLink techniques, which may be reached in next versions of both SimDAL and DataLink.

- convergence in query languages which may also result from efforts on SimDAL side as well as Simple access or TAP side for future versions

- complementarity and coupling of SKOS technology and UCD technology for coding the semantics looks important to us and is probably possible. But this is more in the hands of the Semantics Working Group, which is collaborating with TIG on that.

-- FrancoisBonnarel , MarcoMolinaro - 2016-11-24

Data Model Working Group ( Mark Cresitello-Dittmar, Laurent Michel )

I reviewed the 20161014 version of the document. I'm wondering if this is correct, as some of my comments appear earlier with comments that they have been addressed.

Two broad comments:

1) I would like to see some statement about the completeness of the implementations: do they implement all features of the spec?

Answer: All mandatory APIs are implemented in the Paris implementation. This implementation does not (yet) implement the optional {cutouts-preview}. The SVO implementation implements most of the mandatory APIs (see the implementation documentation and tutorial for detailed information about the implementation).

Answer (Carlos Rodrigo): The SVO implementation (http://svo2.cab.inta-csic.es/theory/simdal1/) includes all three types of services: repository, search and dataAccess. We implemented all the mandatory resources and some optional ones (for instance, {cutouts-preview}). It is presented on the web page with explanations of what we consider a typical workflow and an example of use because, as SimDAL has many endpoints, we thought it more illustrative to do it this way. But in each case, links are given to the URLs of the real implementation of each resource.


2) In reading it, there seems to be a large overlap with existing standards ( DALI, DATALINK, etc) which was also mentioned by others. Since DALI is supposed to be the basis for DAL protocols, I'd like to see this spec. be expressed more directly in relation to it rather than statements of compatibility. In reading the responses to similar comments above, and speaking with Francois about this, I understand that this is intentional for this version. A subsequent version can integrate the spec into the DALI family. So, I will not block progress on this count.

Answer: Indeed. As explained in another answer concerning Datalink, SimDAL does not use it since it would have made the standard heavier whereas we just need a tiny subset of datalink. As you say, that is something we have in mind for a future version of the standard. Concerning DALI, SimDAL is DALI-compliant.

General:

Section 1: pg 5 = The final lines state who should implement each of the components, but in very fuzzy terms (eg: "most of the time the SimDAL Search component"). It would be more clear if it stated specifically who would implement each part. This same fuzzy language is repeated in a couple places (pg 8)

Answer: Thank you for pointing this out. We have reworked the part you are talking about to make it clearer.

pg 6: Architecture diagram does not match the one in the master IVOA Architecture document.

Answer: Thank you. We will work with the DAL chairs to correct this IVOA Architecture figure.

S2.3 - pg10: "the pivot format" as Markus stated, I have no idea what that means, perhaps it is common knowledge to the target audience?

Answer: Ok, we realize that this is a tricky term, so we have removed it completely.

S3.0 - pg11: Question marks should be confirmed and removed. 1. "SimDAL components are exposed with APIs following a REST design (?) that conforms to the DALI resource description (?)."

Answer: Sorry, that was an erroneous generation of the bibliography in the published archive. It is now corrected.

S3.3 - pg12: Question mark in place of reference to JSON standard

Answer: Sorry, that was an erroneous generation of the bibliography in the published archive. It is now corrected.

S3.5 - pg15: another question mark in place of reference (VOTable)

Answer: Sorry, that was an erroneous generation of the bibliography in the published archive. It is now corrected.

S3.6 - pg15: "must provide sufficient lifetime for interactive browsing of the pages". This is a rather vague statement for a requirement.

Answer: We thought that it is no more arbitrary than stating any random time like 10 seconds or 10 minutes. Moreover, this "sufficient lifetime" is often so specific to the service/type of data/simulation that the publisher is best placed to determine it.

S4.1 - pg17: the description for 'q' parameter, "The search logic is up to the publisher.." How is a user supposed to know what to put in the field if the logic is up to the publisher? The example q='n(h2)' matches both 'N(H2)' and n(H2)', but only because this provider decided to make the search case insensitive?

Answer: That is correct. The user knows about how the "q" parameter is handled through publisher documentation of the extra features. But we understand that it could be tricky so that we added better explanations in the document. Thank you for pointing this.
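To make the case-insensitivity point concrete, here is one possible publisher-side logic for q: a case-insensitive substring match. It is only an assumption about how a given publisher might behave; the function name, record identifiers and texts are hypothetical.

```python
def full_text_search(q: str, records: dict) -> list:
    """Return ids of records whose text contains q, ignoring case (one possible publisher logic)."""
    needle = q.casefold()
    return [rid for rid, text in records.items() if needle in text.casefold()]


# Hypothetical records: identifier -> searchable text.
records = {
    "p1": "Column density N(H2) in PDR models",
    "p2": "Profiles of n(H2) in molecular clouds",
    "p3": "CO line emission grid",
}
```

With this logic, full_text_search("n(h2)", records) returns ["p1", "p2"]. In common astrophysical notation N(H2) and n(H2) denote different quantities (column and volume density), which is exactly why the publisher has to document the matching behaviour.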

S5.6 - pg37: "the same way than the one" => "the same way as the one"

Answer: Thank you. We corrected the document.

S5.6 - pg37: "but in another cases it would" => "but in other cases it would"

Answer: Thank you. We corrected the document.

S6.3 - pg44: UWS extension: "For various reasons but in particular because of security concerns." I don't know much about this content, but it sounds like it is indicating security concerns with the content of the UWS document standard.

Answer: Thank you. We reformulated this point.

Appendix B - pg48: "and some developpers" => "and some developers"

Answer: Thank you. We corrected the document.

Appendix B - pg48: "to have much more properties" => "to have many more properties"

Answer: Thank you. It is now also corrected.

Appendix B - pg48: "the APIs with the willing of putting" => "with the intent" or "with the hope"..

Answer: Thank you. Done.

Appendix B - pg48: "should know about technics" => "the technical aspects"?

Answer: Thank you. This is now corrected.

Appendix B - pg48: "end user to not needing to worry" => "end user to not worry"

Answer: Thank you. We corrected the document.

-- MarkCresitelloDittmar - 2016-10-26

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-11-04

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

I approve this document.

-- BrianMajor - 2016-12-09

Registry Working Group ( Markus Demleitner, Theresa Dower )

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

Answer: We organized a meeting in Paris in March 2015/16 dedicated to the question of registering SimDAL components in the registries. Several authors of SimDAL as well as representatives of the DAL WG (Marco) and Registry WG (Markus) were present.

Several options and questions were identified during the meeting. After this meeting, discussions between the authors of the SimDAL standard concluded that some important use cases require SimDAL repositories:

1) fine discovery of SimDAL services requires the detailed description of theoretical services following SimDM serializations including its semantics aspects (SKOS).

2) these SimDM descriptions / serializations must be centralized (for example to have a unique description of a code to which publishers of simulations produced by this code can refer to).

The need for such a mechanism (SimDM repositories) to discover and describe theoretical services in the VO was identified a long time ago, before and during the definition of SimDM. Nevertheless, SimDAL components must be registered in the IVOA registries.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Answer: Yes, this precision adds clarification, thank you. We modified the document.
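For example, a client could look for such capabilities through a RegTAP service. The sketch below only builds a synchronous TAP query URL and does not contact any service. The endpoint URL is hypothetical, and the standard_id prefix is assumed from the example form "ivo://ivoa.net/std/SimDALSearch#...", lower-cased because RegTAP stores identifiers in lower case.

```python
from urllib.parse import urlencode

# Hypothetical RegTAP synchronous endpoint; substitute a real registry TAP service.
REGTAP_SYNC_URL = "https://example.org/tap/sync"


def simdal_discovery_url(standard_id_prefix: str = "ivo://ivoa.net/std/simdalsearch") -> str:
    """Build a TAP sync URL listing capabilities whose standard_id starts with the prefix."""
    adql = (
        "SELECT ivoid, standard_id, access_url "
        "FROM rr.capability NATURAL JOIN rr.interface "
        f"WHERE standard_id LIKE '{standard_id_prefix}%'"
    )
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return REGTAP_SYNC_URL + "?" + urlencode(params)
```

Matching on a prefix rather than an exact value lets one query pick up every capability fragment under the (assumed) SimDAL Search standard id.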

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Answer: During the meeting in Paris in February 2016, we started to look at how to register SimDAL components in the IVOA registries. Two solutions appeared:

1) to map as much as possible of the SimDM serializations (protocol.xml, project.xml) into the IVOA registries using extensions. This solution requires deprecating some SimDM concepts (notions of code version, owner, etc.), slight modifications of registries (or taking some liberties, such as storing SKOS concepts in UCD fields), developing an XSLT transformation of SimDM serializations, etc.

2) to do a simple registration of SimDAL components in the IVOA registries without extensions using classical fields for registrations: title, description, keywords, publishers, etc ...

Solution 1 is more complex to set up: it requires developments, a prototype to check that it works properly, and tests of the implications for end users. It may be nicer from an IVOA point of view but is not required by the scientific use cases defined by the Theory I.G. Nevertheless, we decided to try it, but we set a deadline of the end of March 2016 to get an operational XSLT mapping from SimDM serializations to registries. Without progress by the end of March, we would move to solution 2 to avoid delaying the release of the SimDAL standard, which several teams in the astrophysics community are asking for. Solution 1 would then be investigated for SimDAL version 2.0.

No progress was made by the end of March, so we moved to solution 2. This solution does not need any specific descriptions since it is a standard registration of services in the IVOA registries. It fulfills all the scientific goals.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

Answer: Ok, fixed in the document. Thank you for pointing this out.
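A quick way to check a candidate standardID: RFC 3986 only allows unreserved characters, percent-encodings, sub-delimiters, ":", "@", "/" and "?" in a URI fragment, so "{views}-1.0" fails while "views-1.0" passes. The base#endpoint-version layout below is assumed from the example quoted in the comment above, and the helper names are hypothetical.

```python
import re

# RFC 3986 fragment: *( unreserved / pct-encoded / sub-delims / ":" / "@" / "/" / "?" ).
_FRAGMENT_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def is_valid_fragment(fragment: str) -> bool:
    """Return True if fragment only uses characters RFC 3986 allows in a URI fragment."""
    return _FRAGMENT_RE.fullmatch(fragment) is not None


def standard_id(base: str, endpoint: str, version: str = "1.0") -> str:
    """Build a standardID of the assumed form base#endpoint-version, rejecting invalid fragments."""
    fragment = f"{endpoint}-{version}"
    if not is_valid_fragment(fragment):
        raise ValueError("fragment contains characters not allowed in a URI: " + fragment)
    return f"{base}#{fragment}"
```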

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

Answer: A client knows that a collection resource supports pagination by checking whether its representation (a VOTable) contains the special pagination LINK elements (that is, LINK elements with the content roles described in the "Pagination" section). Thus, a service may implement pagination but use it for some collections and not for others. This is now explained in the document.
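The discovery rule described in this answer can be sketched with Python's standard ElementTree module. This is a minimal sketch under assumptions: the content-role values `next_page` and `previous_page` are taken from the reviewer's wording elsewhere on this page and may differ from the normative names in the "Pagination" section.

```python
import xml.etree.ElementTree as ET

# Assumed content-role values for pagination links; check the "Pagination"
# section of the specification for the normative names.
PAGINATION_ROLES = {"next_page", "previous_page"}


def pagination_links(votable_xml: str) -> dict:
    """Return {content-role: href} for the pagination LINK elements of a VOTable."""
    root = ET.fromstring(votable_xml)
    links = {}
    for elem in root.iter():
        # Compare local names so the check works with or without the VOTable namespace.
        if elem.tag.rsplit("}", 1)[-1] == "LINK":
            role = elem.get("content-role")
            if role in PAGINATION_ROLES:
                links[role] = elem.get("href")
    return links


def supports_pagination(votable_xml: str) -> bool:
    """True if this particular collection response carries pagination links."""
    return bool(pagination_links(votable_xml))
```

Since a service may paginate some collections and not others, a client would apply this check to each collection response rather than once per service.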

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

Answer: Representing tree structures in relational databases (i.e. as sets of flat key-value structures) has been common practice for years and is well mastered. Of course, this brings some query complexity when reconstructing the tree.

We do not expect hierarchies to be used here: the full information (including the various hierarchies) is kept in the SimDM serializations (mostly those of the project and protocol SimDM packages). This is intended to be a simple (flat), yet informative, view of the SimDM serializations (mainly of their classes).
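To make this answer concrete, here is a minimal Python sketch of storing a tree as flat key-value pairs and rebuilding it. The dotted-path encoding is one common choice, assumed here for illustration only (it requires that keys contain no dots); it is not the scheme used by SimDAL implementations, and the attribute names in the usage note are hypothetical.

```python
def flatten(tree: dict, prefix: str = "") -> dict:
    """Flatten a nested dict into {dotted.path: leaf_value} pairs."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: dict) -> dict:
    """Rebuild the nested dict from dotted-path keys (the 'tree reconstruction' step)."""
    tree: dict = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree
```

For instance, `{"project": {"publisher": {"name": "X"}}}` flattens to `{"project.publisher.name": "X"}`; the reconstruction step is where the extra query complexity mentioned above appears.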

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).

In the VO Registry, the real complex point is the proliferation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling,...) would strengthen our confidence in this part of the standard.

Answer: Yes, that's correct. Thank you for pointing this out, we have updated the document to fix this.

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28
 

Semantics Working Group ( Mireille Louys, Alberto Accomazzi )

I have read the specification carefully (2016-10-14) and sent a few typo corrections for Appendices A and B directly to the authors/editors.

Table labels and captions would help clarify the purpose of the example tables for the reader and make them easier to reference. The Acknowledgments section does not mention the support of national projects; this could be added.

The document is very rich and covers the use cases precisely and with working examples. It is didactic and useful as a guide for implementation. The SKOS VO theory vocabulary was already defined in the SimDM specification and is appropriately cited in the specification.

The heterogeneity and scale of possible simulation parameters are very large and deserve a specific vocabulary, as defined in the VO Theory interest group. The overlap of such a vocabulary with Unified Content Descriptors (UCDs) is small. When comparing simulated observations with genuine observations, a mapping between SKOS concepts and UCDs might help to homogenize the metadata.

-- MireilleLouys - 2017-01-10

 

Education Interest Group ( Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( Carlos Rodrigo )

When I read the introduction of section 5 (SimDAL Search), I get the impression that an unaware reader could conclude that it is impossible to define views containing properties and links for more than one object type. For instance, it explains that if a simulation provides both the 3D structure of a star and its oscillation spectrum, two different views have to be defined. This is, in fact, a misunderstanding.

First, a simulation that generates different output files for a given set of inputs is quite common. Having a different view for each object type (output data file), while possible, would make it almost impossible for a user to search the metadata and get the two (or more) final files corresponding to that "experiment" (linking between different views is very inefficient).

Second, it is perfectly possible and natural in SimDAL to define a view that links together all the output files for the same experiment and points to the corresponding DataAccess services providing each dataset.

By the way, this is not a "new idea"; it is something that has been discussed and agreed upon before. It is just not clear in the document.

In my opinion, it should be said explicitly that views containing different object types can be defined. And it would be nice to give a simple example of a view schema for such a case.

Answer: Thank you Carlos, we have done the update in the document.
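Carlos's suggestion can be illustrated with a minimal Python sketch of a single view in which each row describes one experiment and carries links to several output object types. The row layout, the `mass` property and the link fields are all hypothetical; the normative view schema format is the one defined in the specification.

```python
from dataclasses import dataclass


@dataclass
class ExperimentRow:
    """One row of a hypothetical SimDAL view: one experiment, several output object types."""
    ident: str
    mass: float            # hypothetical input property, e.g. stellar mass in solar masses
    structure_link: str    # hypothetical link to the 3D stellar structure (DataAccess service)
    oscillation_link: str  # hypothetical link to the oscillation spectrum (DataAccess service)


def find_outputs(view: list, min_mass: float, max_mass: float) -> list:
    """Select experiments on a metadata property and return all their output links at once."""
    return [
        (row.ident, [row.structure_link, row.oscillation_link])
        for row in view
        if min_mass <= row.mass <= max_mass
    ]
```

A single search on such a view returns, for each matching experiment, both output files, which is exactly what separate per-object-type views would make hard.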

Standards and Processes Committee ( Françoise Genova)



Revision 19 - 2016-12-09 - BrianMajor

 

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "simdal search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and registries.

  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An implementation note was written for the Simulation Data Model; it presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard is accepted, we plan to write an Implementation Note presenting how to use SimDAL to publish different categories of simulations and numerical models.

  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP/TAP schemas. TAP has not been chosen as a solution because it does not fulfill the requirements for Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing the simulated objects (more than 100 000 in one of the SimDAL implementations). These numbers are growing with the progress of numerical models.

Using TAP would require storing the properties as columns of a table in a relational database, since TAP is strongly coupled to SQL and hence to relational databases. This is simply not possible with most of the RDBMSs currently in use (PostgreSQL, MySQL), which cannot handle such high-dimensional data (i.e. so many table columns); PostgreSQL, for instance, limits a table to 1600 columns. High-dimensional data are much better served by other types of storage architecture, which publishers cannot use with TAP (or only with great difficulty) when these architectures have no SQL compatibility layer or adapter.

Note that the definition of SimDAL took so long because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems and storage architectures. The conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not feel lost.
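The decoupling argument above can be sketched in Python. Purely as an illustration, properties are stored here as rows of an entity-attribute-value table (so the number of properties is not bounded by a column limit), and a flat view exposing only the requested properties is built on demand. SQLite is used only to keep the sketch self-contained; the table name `props` and the property names are hypothetical, and SimDAL leaves the storage technology to each publisher.

```python
import sqlite3


def make_store(rows):
    """Create an in-memory entity-attribute-value store: one row per (object, property)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE props (object_id INTEGER, name TEXT, value REAL)")
    conn.executemany("INSERT INTO props VALUES (?, ?, ?)", rows)
    return conn


def build_view(conn, properties):
    """Return one flat row per object, with one column per requested property.

    Properties an object does not have come back as None.
    """
    if not properties:
        raise ValueError("at least one property is required")
    # One conditional aggregate per requested property; names are bound as SQL parameters.
    columns = ", ".join("MAX(CASE WHEN name = ? THEN value END)" for _ in properties)
    sql = f"SELECT object_id, {columns} FROM props GROUP BY object_id ORDER BY object_id"
    return conn.execute(sql, list(properties)).fetchall()
```

The client only ever sees the flat "virtual table" returned by `build_view`; how the properties are actually stored stays a publisher's choice.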

Concerning the SimDAL Repository part:
First, note that SimDAL components (including the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model. So it is only with SimDAL Repositories that a search for resources can be done using the SimDM semantics. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes published in the VO. IVOA registries cannot store and query such serializations, whereas SimDAL Repositories can.
Discussions with Markus (for the Registry W.G.) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. However, this would lose the relationships between SimDM classes, and hence the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not let users fully benefit from the SimDM XML serializations, although most scientific use cases would require fine-grained search in these serializations to discover protocols and projects of interest efficiently. This was a deliberate choice for version 1.0 of SimDAL. Indeed, in the coming months or years we do not expect many registered IVOA theory services, so it should be easy for users to discover theoretical services with the SimDAL Repositories as presently defined. However, as more and more theoretical services are registered, finer-grained search will become necessary. SimDAL Repositories as defined in version 1.0, which store the full XML serializations of projects and protocols, contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included, a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.
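For reference, a minimal Python sketch of the usual DALI practice mentioned in this comment: a single INFO element with name="QUERY_STATUS" and value="ERROR" inside the results RESOURCE, whose text is a message suitable for display to the user. The function name and message are illustrative; how SimDAL additionally reports multiple errors is defined in the specification.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def dali_error_votable(message: str) -> str:
    """Serialize a DALI-style error document: one QUERY_STATUS INFO in the results RESOURCE."""
    ET.register_namespace("", VOTABLE_NS)  # write VOTable elements without a prefix
    votable = ET.Element(f"{{{VOTABLE_NS}}}VOTABLE", version="1.3")
    resource = ET.SubElement(votable, f"{{{VOTABLE_NS}}}RESOURCE", type="results")
    info = ET.SubElement(resource, f"{{{VOTABLE_NS}}}INFO", name="QUERY_STATUS", value="ERROR")
    # DALI: the INFO content should be a message suitable for display to the user.
    info.text = message
    return ET.tostring(votable, encoding="unicode")
```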

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive it properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the Sesto InterOp in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations would serve as a validator. So a client, rather than a simple validator, has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order of the top menu: first search in the Repository, then do a SimDAL Search, and finally search in Access data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before doing a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent)

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of end points. You're defining about as many end point types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, xpath or xquery or whatever?

In short, I believe you should split up this document into three pieces, each of which would work out to be more handleable.

Answer: Yes, we have several API endpoints. Of course, we tried our best to keep this number as small as possible while satisfying as many use cases as possible.


This is a big standard, which is why releasing a first version was so hard and took so long. The other VO (DAL) standards basically serve a single, fixed underlying data model. As you know, SimDM, which SimDAL serves, is a meta-model, i.e. we are serving an unlimited number of underlying data models. This brings additional use cases that we had to address by adding a complete API.


Several very smart people have addressed the SimDAL problem in past years, and a lot of progress has been made thanks to them, even if the result is still not 100% perfect. The fact is that many people have been waiting for years for a way to publish their theoretical data, and we think we have to give them a solution now.
We need a basis, SimDAL v1, that will be implemented by several teams and will fulfill the most common use cases (80%), and do so on real data in production environments (not on theoretical or potential use cases and issues). Then, we hope these teams will collaborate with the IVOA to provide feedback for a v2 that will add features not planned in the first version.

As for XPath, we designed SimDAL partly because we realized that understanding and querying SimDM is too high a prerequisite for a scientist or developer, and that we had to design a query API that is as straightforward as possible, even at the price of not covering 100% of the use cases for now.

We have not split SimDAL because, for the first version of the standard, we did not want to break the workflow logic that carries the user from theoretical project discovery to data retrieval, at the price of developers having to deal with a bigger standard. Instead, we tried to make the document clear enough that readers understand they do not have to implement all three parts. For the next version of the standard, once several teams are familiar with it, splitting the document into several more specific standards can be discussed.

Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.
Answer: Thank you for offering to help with the proofreading! The LaTeX file can be downloaded from the src link on the SimDAL web page: http://www.ivoa.net/documents/SimDAL/index.html


  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).
Answer: Done. Section 2.2 now describes requirements rather than use cases.

  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).
Answer: We simplified as much as possible.

  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.
Answer: You are perfectly right. That is corrected.

  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?
Answer: Ok. We clarified this.

  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.
Answer: That is corrected


  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says: "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: Does the table in the error case have a name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?
Answer: You are right, thank you, we have made the update in the document.

  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.
Answer: Datalink would have made the standard heavier, whereas we just need a tiny subset of Datalink: the subset that was Datalink when we were involved in its definition back in 2012, when F. Bonnarel started thinking about it. Adopting Datalink is something we have in mind for a future version of the standard.

  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.
Answer: The foreign-key mechanism is used in VOTable responses and is taken directly from the VOTable 1.3 document. It is a neat and tidy way to link our tables while allowing some extensibility. We define it in the document because it is not formally defined in the VOTable document, only mentioned as an example of a possible use of the GROUP element.

  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.
Answer: These fields are defined for each API endpoint; see for example the result table of {search}, subsection "Response schema". All FIELDs are described, and those that are not mandatory are explicitly declared as optional.


  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).
Answer: OK. We had not specified anything because our idea is "do not bother with the lifetime of the pagination; we take care of it for you so that it is long-lived enough for interactive browsing in your current session". This is now specified (section 3.6).

  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.
Answer: We thought the same while writing and implementing the standard, but decided it would be useful to users to be consistent with the majority of existing APIs. After some more thinking recently, however, we agree with you that this option should simply be removed to make the standard more straightforward to use.


  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).
Answer: Indeed, we meant "optionally". This has been clarified.
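A minimal Python sketch of a parser for the timestamp format discussed here. It assumes an ISO 8601 base form `YYYY-MM-DDThh:mm:ss`, optionally followed by a decimal point and fractional seconds, with an optional trailing "Z" as the reviewer suggests; this grammar is an assumption for illustration, and the normative one is given in the specification.

```python
import re
from datetime import datetime

# YYYY-MM-DDThh:mm:ss, optionally followed by fractional seconds and an optional "Z".
_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z?")


def parse_timestamp(text: str) -> datetime:
    """Parse a timestamp in the assumed format into a naive datetime."""
    if not _TIMESTAMP.fullmatch(text):
        raise ValueError(f"not a valid timestamp: {text!r}")
    base, _, frac = text.rstrip("Z").partition(".")
    # strptime also rejects out-of-range values such as month 13.
    value = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    if frac:
        # datetime keeps microsecond precision; extra digits are truncated.
        value = value.replace(microsecond=int(frac[:6].ljust(6, "0")))
    return value
```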


  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".
Answer: Done


  • In 4.1, I guess I'd rather start with "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused about the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) later said under "Parameter".
Answer: Ok. We modified the presentation of the section.


  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?
Answer: Indeed. Ok.


  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).
Answer: We already give the datatype in the FIELD element; do you think we need to add more?


  • In 4.2, is the project parameter mandatory?
Answer: No it is not. We now say it explicitly.


  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".
Answer: We corrected the typo. Thank you.

  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to be repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)
Answer: No. In this first version of SimDAL, a given parameter (e.g. "project") can be passed only once. We now say so in the text.
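A service can enforce this single-valued parameter rule with the standard library's `urllib.parse.parse_qs`, which collects repeated parameters into lists. A minimal sketch; the function name and the choice to raise `ValueError` (which a service would turn into an error response) are assumptions.

```python
from urllib.parse import parse_qs


def single_valued_params(query_string: str) -> dict:
    """Parse a query string, rejecting any parameter that is given more than once."""
    parsed = parse_qs(query_string, keep_blank_values=True)
    repeated = sorted(name for name, values in parsed.items() if len(values) > 1)
    if repeated:
        raise ValueError(f"parameters may not be repeated: {', '.join(repeated)}")
    return {name: values[0] for name, values in parsed.items()}
```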

  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.
Answer: Ok, document updated


  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).
Answer: Ok, this is related to our answer to the previous question about foreign_keys - see above.


  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to). There are just too many flavours of JSON out there.
Answer: You are right, thank you for pointing this to us, we have added the reference.


  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?
Answer: We have done the update following your recommendation.


  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".
Answer: Yes, thank you this is clarified (section 6.3)
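The clarification above means a client has to wait for the job to reach a final phase before reading its results. A minimal Python polling sketch, assuming the standard UWS phase resource (`{job-url}/phase`, returning the phase as plain text) and the UWS terminal phases; the helper names, poll interval and timeout are illustrative choices.

```python
import time
from typing import Callable
from urllib.request import urlopen

# UWS phases after which the job will not change any more.
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED", "ARCHIVED"}


def http_phase_getter(job_url: str) -> Callable[[], str]:
    """Return a function that reads the job phase from the UWS `{job_url}/phase` resource."""
    def get_phase() -> str:
        with urlopen(f"{job_url}/phase") as response:
            return response.read().decode("utf-8")
    return get_phase


def wait_for_job(get_phase: Callable[[], str], poll_interval: float = 1.0,
                 timeout: float = 60.0) -> str:
    """Poll the job phase until it is terminal; return that phase or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        phase = get_phase().strip().upper()
        if phase in TERMINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {phase} after {timeout} s")
        time.sleep(poll_interval)
```

Only once `wait_for_job(http_phase_getter(job_url))` returns "COMPLETED" would a client read the job's results list.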


  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist -- where does the SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?
Answer: You are right about the JCL, we have removed the mention. We also made the 1st point clearer. Thank you.

  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
Answer: Done. Thank you.

-- MarkusDemleitner - 2016-09-13

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair (Matthew Graham, Pat Dowler)

Applications Working Group (Pierre Fernique, Tom Donaldson)

Approved. I have no expertise in Sim, but as an independent feature, the specification seems reasonable and consistent.

-- TomDonaldson - 2016-10-22

Data Access Layer Working Group (François Bonnarel, Marco Molinaro)

Discovery of and access to simulation data has been a long-term VO goal. It is nice to see that this goal is now ready to be achieved, as the first implementations prove. It is especially nice to see that DALI consistency has been much improved in the latest versions. We validate this specification for Recommendation.
 
In the long term, we recommend that authors and editors pay attention to a couple of points for better integration into the full DAL landscape and better interoperability between observational and simulation data:
 
- convergence of DataLink techniques, which may be reached in the next versions of both SimDAL and DataLink.
 
- convergence in query languages, which may also result from efforts on the SimDAL side as well as on the Simple Access or TAP side for future versions.
 
- the complementarity and coupling of SKOS technology and UCD technology for encoding semantics looks important to us and is probably possible. But this is more in the hands of the Semantics Working Group, which is collaborating with the TIG on that.
-- FrancoisBonnarel, MarcoMolinaro - 2016-11-24
 

Data Model Working Group (Mark Cresitello-Dittmar, Laurent Michel)

I reviewed the 20161014 version of the document. I'm wondering whether this is the correct version, as some of my comments appear earlier on this page with answers stating that they have been addressed.

Two broad comments:

1) I would like to see some statement about the completeness of the implementations: do they implement all features of the spec?

Answer: All mandatory APIs are implemented in the Paris implementation, which does not (so far) implement the optional {cutouts-preview}. The SVO implementation implements most of the mandatory API (see the implementation documentation and tutorial for detailed information about the implementation).

Answer (Carlos Rodrigo): The SVO implementation (http://svo2.cab.inta-csic.es/theory/simdal1/) includes all three types of services: repository, search and dataAccess. We implemented all the mandatory resources and some optional ones (for instance, {cutouts-preview}). It is presented on the web page with explanations of what we consider a typical workflow and an example of use, because, as SimDAL has many endpoints, we thought it was more illustrative to do so. In each case, links are given to the URLs of the real implementation of each resource.


2) In reading it, there seems to be a large overlap with existing standards (DALI, DataLink, etc.), which was also mentioned by others. Since DALI is supposed to be the basis for DAL protocols, I'd like to see this spec expressed more directly in relation to it rather than through statements of compatibility. In reading the responses to similar comments above, and speaking with Francois about this, I understand that this is intentional for this version. A subsequent version can integrate the spec into the DALI family. So, I will not block progress on this count.

Answer: Indeed. As explained in another answer concerning DataLink, SimDAL does not use it, since it would have made the standard heavier whereas we only need a tiny subset of DataLink. As you say, that is something we have in mind for a future version of the standard. Concerning DALI, SimDAL is DALI-compliant.

General:

Section 1: pg 5 = The final lines state who should implement each of the components, but in very fuzzy terms (e.g. "most of the time the SimDAL Search component"). It would be clearer if it stated specifically who would implement each part. The same fuzzy language is repeated in a couple of places (pg 8).

Answer: Thank you for pointing this out. We have reworked the part you mention to make it clearer.

pg 6: Architecture diagram does not match the one in the master IVOA Architecture document.

Answer: Thank you. We will work with the DAL chairs to correct this IVOA Architecture figure.

S2.3 - pg10: "the pivot format": as Markus stated, I have no idea what that means; perhaps it is common knowledge to the target audience?

Answer: OK, we realize that this is a confusing term, so we have removed it completely.

S3.0 - pg11: Question marks should be confirmed and removed. 1. "SimDAL components are exposed with APIs following a REST design (?) that conforms to the DALI resource description (?)."

Answer: Sorry, the bibliography was generated incorrectly in the published archive. This is now corrected.

S3.3 - pg12: Question mark in place of reference to JSON standard

Answer: Sorry, the bibliography was generated incorrectly in the published archive. This is now corrected.

S3.5 - pg15: another question mark in place of reference (VOTable)

Answer: Sorry, the bibliography was generated incorrectly in the published archive. This is now corrected.

S3.6 - pg15: "must provide sufficient lifetime for interactive browsing of the pages". This is a rather vague statement for a requirement.

Answer: We thought that it is no more arbitrary than stating a fixed time such as 10 seconds or 10 minutes. Moreover, this "sufficient lifetime" is often very specific to the service, type of data or simulation, which the publisher is best placed to judge.

S4.1 - pg17: the description for the 'q' parameter says "The search logic is up to the publisher..". How is a user supposed to know what to put in the field if the logic is up to the publisher? The example q='n(h2)' matches both 'N(H2)' and 'n(H2)', but only because this provider decided to make the search case-insensitive?

Answer: That is correct. Users learn how the "q" parameter is handled from the publisher's documentation of the extra features. We understand that this could be tricky, so we have added better explanations to the document. Thank you for pointing this out.
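The example in the comment corresponds to one possible publisher-defined rule: case-insensitive substring matching. The sketch below shows only that one choice, which is exactly why clients need the publisher's documentation; `T_kin` and `n(CO)` are hypothetical extra property names.

```python
def match_properties(q, names):
    """Property names matching q under one hypothetical publisher rule:
    case-insensitive substring search (str.casefold also handles non-ASCII case)."""
    needle = q.casefold()
    return [name for name in names if needle in name.casefold()]


names = ["N(H2)", "n(H2)", "T_kin", "n(CO)"]
print(match_properties("n(h2)", names))  # ['N(H2)', 'n(H2)']
```

Another publisher could just as well use exact, case-sensitive matching, in which case the same q would return nothing.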

S5.6 - pg37: "the same way than the one" => "the same way as the one"

Answer: Thank you. We corrected the document.

S5.6 - pg37: "but in another cases it would" => "but in other cases it would"

Answer: Thank you. We corrected the document.

S6.3 - pg44: UWS extension: "For various reasons but in particular because of security concerns." I don't know much about this content, but it sounds like it indicates security concerns with the content of the UWS standard itself.

Answer: Thank you. We reformulated this point.

Appendix B - pg48: "and some developpers" => "and some developers"

Answer: Thank you. We corrected the document.

Appendix B - pg48: "to have much more properties" => "to have many more properties"

Answer: Thank you. It is now also corrected.

Appendix B - pg48: "the APIs with the willing of putting" => "with the intent" or "with the hope"..

Answer: Thank you. Done.

Appendix B - pg48: "should know about technics" => "the technical aspects"?

Answer: Thank you. This is now corrected.

Appendix B - pg48: "end user to not needing to worry" => "end user to not worry"

Answer: Thank you. We corrected the document.

-- MarkCresitelloDittmar - 2016-10-26

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-11-04

Grid & Web Services Working Group (Brian Major, Giuliano Taffoni)

I approve this document.

-- BrianMajor - 2016-12-09

 

Registry Working Group (Markus Demleitner, Theresa Dower)

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

Answer: We organized a meeting in Paris in March 2015/16 dedicated to the question of registering SimDAL components in the registries. Several authors of SimDAL as well as representatives of the DAL WG (Marco) and the Registry WG (Markus) were present.

Several options and questions were identified during the meeting. After this meeting, discussions between the authors of the SimDAL standard concluded that some important use cases require SimDAL repositories:

1) fine-grained discovery of SimDAL services requires the detailed description of theoretical services following SimDM serializations, including their semantic aspects (SKOS).

2) these SimDM descriptions / serializations must be centralized (for example to have a unique description of a code to which publishers of simulations produced by this code can refer to).

The need for such a mechanism (SimDM repositories) to discover and describe theoretical services in the VO was identified a long time ago, before and during the definition of SimDM. Nevertheless, SimDAL components must be registered in the IVOA registries.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Answer: Yes, this precision adds clarification, thank you. We modified the document.
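Discovery by standard id can be illustrated with a RegTAP query, where standard ids are stored lower-cased in `rr.capability.standard_id`. This is a sketch only: the default prefix `ivo://ivoa.net/std/simdalsearch` is an assumption loosely based on the SimDALSearch URI discussed in the Registry comments, and the authoritative values are those given in sect. 3.6.

```python
def simdal_discovery_adql(standard_id_prefix="ivo://ivoa.net/std/simdalsearch"):
    """ADQL for a RegTAP service: access URLs of all capabilities whose
    standard id starts with the given (hypothetical default) prefix."""
    prefix = standard_id_prefix.lower()  # RegTAP stores standard_id in lower case
    literal = prefix.replace("'", "''")  # escape quotes for an ADQL string literal
    return (
        "SELECT ivoid, standard_id, access_url "
        "FROM rr.capability NATURAL JOIN rr.interface "
        f"WHERE standard_id LIKE '{literal}%'"
    )


print(simdal_discovery_adql())
```

The resulting string would be sent to a Registry TAP service's sync endpoint with LANG=ADQL and QUERY set to it; the prefix match covers every versioned or fragment-qualified id under that base.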

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Answer: During the meeting in Paris in February 2016, we started to look at how to register SimDAL components in the IVOA registries. Two solutions appeared:

1) to map as much as possible of the SimDM serializations (protocol.xml, project.xml) into the IVOA registries using extensions. This solution requires deprecating some SimDM concepts (notions of code version, owner, etc.), slight modifications of the registries (or taking some liberties, such as storing SKOS concepts in UCD fields), developing an XSLT transformation of SimDM serializations, etc.

2) to do a simple registration of SimDAL components in the IVOA registries without extensions using classical fields for registrations: title, description, keywords, publishers, etc ...

Solution 1 is more complex to set up: it requires development work, a prototype to check that it works properly, and tests of the implications for end users. It may be nicer from an IVOA point of view but is not required by the scientific use cases defined by the Theory IG. Nevertheless, we decided to try it, but we set a deadline of the end of March 2016 to get an operational XSLT mapping from SimDM serializations to the registries. If there was no progress by the end of March, we would move to solution 2 to avoid delaying the release of the SimDAL standard, which several teams in the astrophysics community have asked for. Solution 1 would then be investigated for SimDAL version 2.0.

No progress was made by the end of March, so we moved to solution 2. This solution does not need any specific descriptions, since it is a standard registration of services in the IVOA registries. It fulfills all the scientific goals.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

Answer: Ok, fixed in the document. Thank you for pointing this out.
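A cheap character-level check catches the problem raised above. This sketch tests only that a string uses characters RFC 3986 allows literally in a URI (unreserved, reserved, and `%` for percent-encoding); it is not a full URI parser, and `ivo://ivoa.net/std/SimDALSearch#views-1.0` is just a hypothetical brace-free form, not necessarily the one adopted in the document.

```python
import re

# Characters RFC 3986 allows to appear literally in a URI:
# unreserved, gen-delims, sub-delims, plus "%" for percent-encoding.
_URI_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def has_only_uri_chars(uri):
    """True if every character of uri may appear literally in a URI."""
    return _URI_CHARS.fullmatch(uri) is not None


print(has_only_uri_chars("ivo://ivoa.net/std/SimDALSearch#{views}-1.0"))  # False
print(has_only_uri_chars("ivo://ivoa.net/std/SimDALSearch#views-1.0"))    # True
```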

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

Answer: A client knows that a collection resource supports pagination by checking whether its representation (a VOTable) contains the special pagination LINK elements (that is, LINK elements with the content roles described in the "Pagination" section). Thus, a service may implement pagination but decide not to use it for some collections while using it for others. This is now explained in the document.
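The per-collection check described in this answer can be sketched as a scan of the VOTable for LINK elements carrying pagination content roles. The role values in `PAGINATION_ROLES` are placeholders: the actual values are defined in the SimDAL "Pagination" section and are not reproduced on this page.

```python
import xml.etree.ElementTree as ET

# Placeholder role names: substitute the content-role values defined
# in the SimDAL "Pagination" section.
PAGINATION_ROLES = {"next", "previous"}


def pagination_links(votable_xml, roles=PAGINATION_ROLES):
    """Return {content-role: href} for pagination LINK elements in a VOTable.

    An empty result means this particular collection is not paginated.
    """
    root = ET.fromstring(votable_xml)
    links = {}
    for elem in root.iter():
        # Compare local names so the check works with or without a namespace.
        if elem.tag.rsplit("}", 1)[-1] == "LINK":
            role = elem.get("content-role")
            if role in roles:
                links[role] = elem.get("href")
    return links
```

Because the check runs on each response, a client handles services that paginate some collections and not others without any separate capability declaration.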

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

Answer: Representing tree structures in relational databases (i.e., as sets of flat key-value structures) has been common practice for years and is well understood. Of course, this adds some query complexity when reconstructing the tree.

We do not expect hierarchy to be used here; the full information (and the various hierarchies) is actually kept in the SimDM serializations (mostly those of the project and protocol SimDM packages). This is intended to be a simple (flat), yet informative, view of the SimDM serializations (mainly of their classes).
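The flattening discussed in this exchange can be sketched as follows: nested metadata becomes dotted-path keys, and reading it back is the "tree reconstruction" step. This assumes keys never contain dots and ignores empty sub-trees; the example metadata is hypothetical.

```python
def flatten(tree, prefix=""):
    """Turn a nested dict into flat {'a.b.c': value} pairs."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat):
    """Rebuild the nested dict from dotted keys (the tree reconstruction)."""
    tree = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


meta = {"project": {"name": "demo", "protocol": {"code": "demo-code", "version": "1.2"}}}
print(flatten(meta))
```

The flat form is easy to store and search as key-value rows; the cost, as the answer notes, is the extra work of rebuilding the hierarchy when it is needed.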

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).
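The difference between an authority and a publisher's full identifier shows up as soon as an IVOA identifier is split. A sketch using Python's `urllib.parse.urlsplit`; the identifiers are hypothetical.

```python
from urllib.parse import urlsplit


def split_ivoid(ivoid):
    """Split an IVOA identifier into (authority, resource key)."""
    parts = urlsplit(ivoid)
    if parts.scheme.lower() != "ivo" or not parts.netloc:
        raise ValueError(f"not an IVOA identifier: {ivoid!r}")
    return parts.netloc, parts.path.lstrip("/")


# Two hypothetical publishers registered under the same authority:
a = split_ivoid("ivo://example.org/org/observatory-a")
b = split_ivoid("ivo://example.org/org/institute-b")
print(a[0] == b[0])  # True: the authority alone does not tell them apart
```

Only the full identifier distinguishes the two organizations, which is why the comment suggests "the IVOA Identifier of the publisher" rather than the authority.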

In the VO Registry, the truly complex point is the proliferation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling, ...) would strengthen our confidence in this part of the standard.

Answer: Yes, that's correct. Thank you for pointing this out, we have updated the document to fix this.

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Semantics Working Group (Mireille Louys, Alberto Accomazzi)

Education Interest Group (Massimo Ramella, Sudhanshu Barway)

Time Domain Interest Group (John Swinbank, Dave Morris)

Data Curation & Preservation Interest Group (Françoise Genova)

Operations Interest Group (Tom McGlynn, Mark Taylor)

Knowledge Discovery Interest Group (Kaï Polsterer)

Theory Interest Group (Carlos Rodrigo)

When I read the introduction of section 5 (SimDAL Search), I get the impression that an unaware reader could conclude that it is impossible to define views containing properties and links for more than one object type. For instance, it explains that if you have a simulation that provides both the 3D structure of a star and its oscillation spectrum, you have to define two different views. This, in fact, is a misunderstanding.

First, having a simulation that generates different output files for a given set of inputs is quite a common case. And having a different view for each object type (output data file), while possible, would make it almost impossible for a user to search on metadata and get the two, or more, final files corresponding to that "experiment" (linking between different views is very inefficient).

Second, it is perfectly possible and natural in SimDAL to define a view that links together all the output files for the same experiment and points to the corresponding DataAccess services providing each dataset.

By the way, this is not a "new idea"; it is something that has been discussed and agreed on before. It is just not clear in the document.

In my opinion, it should be said explicitly that views containing different object types can be defined. And it would be nice to give a simple example of a view schema for such a case.
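The idea of one view spanning several object types can be sketched with a toy in-memory view. The column names, URLs and the `ViewRow` type are hypothetical and not taken from the SimDAL view schema; the sketch only shows that when each row describes one run and carries one link per output object type, a single metadata search returns every output of each matching run, with no joins across views.

```python
from dataclasses import dataclass


@dataclass
class ViewRow:
    """One row of a hypothetical view mixing two object types."""
    run_id: str
    stellar_mass: float    # input property, in solar masses
    structure_url: str     # DataAccess link: 3D structure of the star
    oscillation_url: str   # DataAccess link: oscillation spectrum of the same star


def select_runs(rows, min_mass, max_mass):
    """Metadata search over one view: each match yields all of its outputs."""
    return [
        (row.run_id, row.structure_url, row.oscillation_url)
        for row in rows
        if min_mass <= row.stellar_mass <= max_mass
    ]


RUNS = [
    ViewRow("run-1", 1.0, "https://simdal.example.org/access/run-1/structure",
            "https://simdal.example.org/access/run-1/oscillations"),
    ViewRow("run-2", 2.5, "https://simdal.example.org/access/run-2/structure",
            "https://simdal.example.org/access/run-2/oscillations"),
]
print(select_runs(RUNS, 0.5, 1.5))
```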

Answer: Thank you Carlos, we have done the update in the document.

Standards and Processes Committee (Françoise Genova)



Revision 18 - 2016-11-24 - FrancoisBonnarel

 

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "simdal search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and registries.

  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An implementation note was written for the Simulation Data Model. It presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard is accepted, we plan to write an Implementation Note that will present how to use SimDAL to publish different categories of simulations and numerical models.

  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP and TAP schemas. TAP was not chosen as a solution because it does not fulfill the requirements of Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing the simulated objects (more than 100,000 in one of the SimDAL implementations). These numbers are growing with the progress of numerical models.

Using TAP would require storing the properties as columns of a table in a relational database, since TAP is tightly coupled to SQL and hence to relational storage. This is simply not possible with most RDBMSs currently in use (e.g. PostgreSQL, MySQL), which cannot manage such high-dimensional data (i.e., that many table columns). High-dimensional data are much better served by other types of storage architecture, which publishers could not use with TAP (or only with great difficulty) when those architectures have no SQL compatibility layer or adapter.

Note that if defining SimDAL has taken so long, it is because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems and storage architectures. The conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not be lost.
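To illustrate the column-count argument: PostgreSQL, for instance, caps a table at 1600 columns, so one column per property cannot hold 100,000 properties. A common workaround in any engine is a long ("key-value") layout with one row per (object, property) pair. The sketch uses SQLite only to stay self-contained; it is not the storage model SimDAL prescribes (storage is left to the publisher), and the object and property names are hypothetical.

```python
import sqlite3

# Long layout: one row per (object, property) instead of one column per
# property, so the number of properties is not bounded by a column limit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE property_value (object_id TEXT, property TEXT, value REAL)")
conn.executemany(
    "INSERT INTO property_value VALUES (?, ?, ?)",
    [
        ("cloud-1", "n(H2)", 1.0e4),
        ("cloud-1", "T_kin", 20.0),
        ("cloud-2", "n(H2)", 1.0e2),
        ("cloud-2", "T_kin", 50.0),
    ],
)


def objects_where(conn, prop, min_value):
    """Objects whose given property is at least min_value."""
    cur = conn.execute(
        "SELECT object_id FROM property_value "
        "WHERE property = ? AND value >= ? ORDER BY object_id",
        (prop, min_value),
    )
    return [object_id for (object_id,) in cur]


print(objects_where(conn, "n(H2)", 1.0e3))  # ['cloud-1']
```

The trade-off is that a query combining several properties needs one self-join (or a pivot) per property, which is part of why such data are often better served by non-relational storage.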

Concerning the SimDAL Repository part:
First, note that SimDAL components (including the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model, so only SimDAL Repositories allow searching for resources using the SimDM semantics. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes published in the VO. IVOA registries do not have the functionality to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. Nevertheless, this would lose the relationships between SimDM classes, and thus the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not make full use of the SimDM XML serializations, although most scientific use cases would require fine-grained search in these serializations to discover protocols and projects of interest efficiently. This was a deliberate choice for version 1.0 of SimDAL: in the coming months and years we do not expect many registered IVOA theory services, so it should be easy for users to discover theoretical services with the SimDAL Repositories as presently defined. However, as more and more theoretical services are registered, finer-grained search will become necessary. Because SimDAL Repositories as defined in version 1.0 store the full XML serializations of projects and protocols, they contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included; a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.
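For reference, the DALI pattern the comment describes puts a single INFO element with `name="QUERY_STATUS"` and `value="ERROR"` in the results RESOURCE, with the message as its content. A minimal sketch assuming VOTable 1.3 and Python's ElementTree; the error message is made up.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def dali_error_votable(message):
    """Serialise a DALI-style error document: one INFO with
    name="QUERY_STATUS", value="ERROR" inside the results RESOURCE."""
    # Declaring the namespace as a plain attribute keeps the output unprefixed.
    votable = ET.Element("VOTABLE", version="1.3", xmlns=VOTABLE_NS)
    resource = ET.SubElement(votable, "RESOURCE", type="results")
    info = ET.SubElement(resource, "INFO", name="QUERY_STATUS", value="ERROR")
    info.text = message
    return ET.tostring(votable, encoding="unicode")


print(dali_error_votable("Unknown view: 42"))
```

Several errors, if a service needs to report them, would then have to be folded into that one message rather than listed as table rows.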

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.
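A client reading FIELD metadata shows why both points matter: it looks up the `name` attribute (ID is a separate, optional attribute), and since XML element names are case-sensitive, a lower-case `<field>` is simply not a VOTable FIELD. A sketch with hypothetical column names:

```python
import xml.etree.ElementTree as ET


def field_names(votable_xml):
    """FIELD name attributes of the first TABLE in a VOTable document.

    Element names are compared case-sensitively, as XML requires, so a
    lower-case <field> element is not picked up.
    """
    root = ET.fromstring(votable_xml)
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == "TABLE":
            return [
                child.get("name")
                for child in elem
                if child.tag.rsplit("}", 1)[-1] == "FIELD"
            ]
    return []
```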

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive them properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the InterOp in Sesto in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations would count as a validator. So a client, rather than a simple validator, has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order of the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access Data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before doing a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent)

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of endpoints. You're defining about as many endpoint types as the entire rest of the VO combined. Perhaps that's OK, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, XPath or XQuery or whatever?
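To picture the suggestion: with an XPath-style query, a search such as "protocols that declare parameter X" needs no dedicated endpoint. The sketch uses the limited XPath subset in Python's ElementTree on a made-up document; the element and attribute names are hypothetical, not the SimDM serialization schema, and full XPath or XQuery would need a separate library.

```python
import xml.etree.ElementTree as ET

# Made-up document loosely shaped like serialized protocol descriptions;
# element and attribute names are hypothetical, not the SimDM schema.
DOC = """<protocols>
  <protocol name="code-a">
    <parameter name="metallicity"/>
    <parameter name="mass"/>
  </protocol>
  <protocol name="code-b">
    <parameter name="density"/>
  </protocol>
</protocols>"""


def protocols_with_parameter(xml_text, parameter):
    """Names of protocols declaring the given parameter (XPath predicate)."""
    root = ET.fromstring(xml_text)
    return [
        protocol.get("name")
        for protocol in root.findall("protocol")
        if protocol.find(f"parameter[@name='{parameter}']") is not None
    ]


print(protocols_with_parameter(DOC, "mass"))  # ['code-a']
```

The answer below explains why the authors instead chose a simpler dedicated query API for version 1.0.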

In short, I believe you should split up this document into three pieces, each of which would be more manageable.

Answer: Yes, we have several API endpoints. Of course, we tried our best to keep this number as small as possible while satisfying as many use cases as possible.


This is a big standard, which is why it was so hard and took so long to release a first version of it. The other VO (DAL) standards basically serve a single, fixed, underlying data model. As you know, SimDM, which SimDAL serves, is a meta-model, i.e., we are serving an infinite number of underlying data models. This brings some additional use cases that we had to address by adding a complete API.


Several very smart people have addressed the SimDAL problem in past years, and a lot of progress has been made thanks to them, even if the result is still not 100% perfect. The fact is that many people have been waiting for years for a way to publish their theoretical data, and we think we have to give them a solution now.
We need a basis, SimDAL v1, that will be implemented by several teams and will fulfill the most common use cases (80%), on real data in production environments (not theoretical or potential use cases and issues). Then, we hope these teams will collaborate with the IVOA to provide feedback for a version 2 that will add some features not planned in the first version.

As for XPath, we made SimDAL partly because we realized that understanding and querying SimDM is too high a prerequisite for a scientist or developer, and that we had to design a query API that is as straightforward as possible, even if that comes at the price of not covering 100% of the use cases for now.

We have not split SimDAL because, for the first version of the standard, we did not want to break the workflow logic that carries the user from theoretical project discovery to data retrieval, at the price of developers having to deal with a bigger standard. Instead, we tried to make it clear in the document that readers do not have to implement all three parts. For the next version of the standard, once several teams are familiar with it, splitting the document into several more specific standards can be discussed.

Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.
Answer: Thank you for offering to help with the proofreading! The LaTeX file can be downloaded from the src link on the SimDAL webpage: http://www.ivoa.net/documents/SimDAL/index.html


  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).
Answer: Done. Section 2.2 now refers to requirements rather than use cases.

  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).
Answer: We simplified as much as possible.

  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.
Answer: You are perfectly right. That is corrected.

  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?
Answer: Ok. We clarified this.

  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.
Answer: That is corrected


  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says: "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: Does the table in the error case have a name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?
Answer: You are right, thank you, we have made the update in the document.
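The error layout suggested in the comment above could look roughly like the following. This is a minimal sketch, not the layout of the published standard: it assumes a RESOURCE of type "results" holding the DALI QUERY_STATUS INFO (whose text is the first, user-displayable message) and a table named "results" with the error_msg and error_code columns proposed above.

```python
import xml.etree.ElementTree as ET

# VOTable 1.3 namespace; the table layout below is a hypothetical illustration.
VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def build_error_votable(errors):
    """Return a VOTable error document as a string.

    errors: non-empty list of (message, code) pairs. The first message becomes
    the user-displayable INFO text; every pair becomes a table row.
    """
    root = ET.Element("VOTABLE", version="1.3", xmlns=VOTABLE_NS)
    resource = ET.SubElement(root, "RESOURCE", type="results")
    # DALI: the QUERY_STATUS INFO carries a message suitable for display.
    info = ET.SubElement(resource, "INFO", name="QUERY_STATUS", value="ERROR")
    info.text = errors[0][0]
    table = ET.SubElement(resource, "TABLE", name="results")
    ET.SubElement(table, "FIELD", name="error_msg", datatype="char", arraysize="*")
    ET.SubElement(table, "FIELD", name="error_code", datatype="int")
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for message, code in errors:
        row = ET.SubElement(tabledata, "TR")
        ET.SubElement(row, "TD").text = message
        ET.SubElement(row, "TD").text = str(code)
    return ET.tostring(root, encoding="unicode")
```

Keeping the full list in the table while the INFO text holds a single message lets DALI-aware clients display something useful without knowing about the extra table.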

  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.
Answer: DataLink would have made the standard heavier, whereas we only need a tiny subset of DataLink: the subset that made up DataLink when we were involved in its definition back in 2012, when F. Bonnarel started thinking about it. This is something we have in mind for a future version of the standard.

  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.
Answer: The foreign-key mechanism is used in VOTable responses and is taken directly from the VOTable 1.3 document. It is a neat and tidy way to link our tables while leaving room for some extensibility. We define it in the document because it is not formally defined in the VOTable document, only mentioned as an example of a possible use of the GROUP element.

  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.
Answer: These fields are defined for each API endpoint; see for example the result table in {search}, subsection response schema. All FIELDs are described, and those that are not mandatory are explicitly declared as optional.


  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).
Answer: OK, we had not specified anything because our idea is "do not bother with the lifetime of the pagination, we take care of it for you so that it is long-lived enough for interactive browsing in your current session". This has now been made explicit (section 3.6).

  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.
Answer: We thought the same while writing and implementing the standard, but decided it would be useful for users to be consistent with the majority of existing APIs. After some more thought, however, we agree with you and have simply removed this to make the standard more straightforward to use.


  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).
Answer: Indeed, we meant "optionally". This has been clarified.
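A validator for the timestamp form discussed above could be sketched as follows. It assumes the form described in the comment (date and time, optional fractional seconds, optional trailing "Z") rather than reproducing the DALI 1.1 grammar verbatim; in particular it requires the time part, which DALI may treat differently.

```python
import re
from datetime import datetime

# YYYY-MM-DDThh:mm:ss, optionally followed by ".fraction", optionally by "Z".
_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z?", re.ASCII)


def is_valid_timestamp(value: str) -> bool:
    """Check the shape with the regex, then the field ranges with strptime."""
    if not _TIMESTAMP.fullmatch(value):
        return False
    try:
        # The first 19 characters are YYYY-MM-DDThh:mm:ss.
        datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True
```

The two-step check keeps the regex simple while still rejecting impossible dates such as month 13 or 30 February.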


  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".
Answer: Done


  • In 4.1, I guess I'd rather start with "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused about the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) later said under "Parameter".
Answer: Ok. We modified the presentation of the section.


  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?
Answer: Indeed. Ok.


  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).
Answer: We already have the datatype in the FIELD element, do you think we have to add more ?


  • In 4.2, is the project parameter mandatory?
Answer: No it is not. We now say it explicitly.


  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".
Answer: We corrected the typo. Thank you.

  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to be repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)
Answer: The answer is no. In this first version of SimDAL, each parameter can be passed only once. We now say this in the text.
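On the service side, the "each parameter only once" rule could be enforced roughly as below. This is a sketch using the Python standard library's query-string parser; the function name and the error message are hypothetical, and the standard does not prescribe how the violation is reported.

```python
from urllib.parse import parse_qs


def parse_single_valued(query: str) -> dict:
    """Parse a query string, rejecting any parameter given more than once."""
    values = parse_qs(query, keep_blank_values=True)  # name -> list of values
    repeated = sorted(name for name, vals in values.items() if len(vals) > 1)
    if repeated:
        raise ValueError("Parameters may only be given once: " + ", ".join(repeated))
    return {name: vals[0] for name, vals in values.items()}
```

A service would typically turn the ValueError into an error VOTable rather than silently keeping the first or last value.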

  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.
Answer: Ok, document updated
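Read in the relational terms suggested in the comment, a view is a selection of rows followed by a projection onto some columns. The sketch below shows only that model, with a hypothetical in-memory relation; it is not how any SimDAL implementation stores its data.

```python
from typing import Callable, Iterable, List

Row = dict  # one tuple of the relation: column name -> value


def view(relation: Iterable[Row], columns: List[str],
         predicate: Callable[[Row], bool] = lambda row: True) -> List[Row]:
    """Selection followed by projection, keeping duplicates as SQL does."""
    return [{col: row[col] for col in columns} for row in relation if predicate(row)]


# Hypothetical relation; the column names are illustrative only.
MODELS = [
    {"model_id": "m1", "mass": 1.0, "metallicity": 0.02, "age": 4.6},
    {"model_id": "m2", "mass": 2.0, "metallicity": 0.01, "age": 1.2},
    {"model_id": "m3", "mass": 2.0, "metallicity": 0.02, "age": 0.8},
]
```

For example, `view(MODELS, ["model_id", "age"], lambda r: r["mass"] == 2.0)` keeps m2 and m3 with only their identifier and age. The same definition holds whatever storage backs the relation, which is the decoupling the comment asks for.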


  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).
Answer: Ok, this is related to our answer to the previous question about foreign_keys - see above.


  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to). There are just too many flavours of JSON out there.
Answer: You are right, thank you for pointing this to us, we have added the reference.
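The "flavours of JSON" problem is real even within one library. Python's json module, for example, accepts NaN and Infinity by default, which RFC 8259 does not allow. The sketch below shows a stricter parser for query dictionaries. The example query `{"mass": {"gt": 1.0}}` is hypothetical, and rejecting duplicate keys is a policy choice (RFC 8259 only says names SHOULD be unique).

```python
import json


def _reject_constant(name):
    # json calls this for NaN, Infinity and -Infinity, which RFC 8259 forbids.
    raise ValueError(f"Non-standard JSON constant: {name}")


def _no_duplicates(pairs):
    # Policy choice: refuse objects that repeat a key.
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"Duplicate key: {key}")
        obj[key] = value
    return obj


def parse_query(text: str) -> dict:
    """Parse a query dictionary, requiring a top-level JSON object."""
    result = json.loads(text, parse_constant=_reject_constant,
                        object_pairs_hook=_no_duplicates)
    if not isinstance(result, dict):
        raise ValueError("Query must be a JSON object")
    return result
```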


  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?
Answer: We have done the update following your recommendation.


  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".
Answer: Yes, thank you this is clarified (section 6.3)
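Since the results only appear once the job has finished, a client typically polls the job phase. Below is a minimal sketch, assuming the job has already been started and that `get_phase` is a caller-supplied function (hypothetical here) returning the current UWS phase string. UWS 1.1 also offers a blocking WAIT parameter that can replace fixed-interval polling.

```python
import time
from typing import Callable

# UWS phases after which a started job will not change state on its own.
FINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED", "ARCHIVED"}


def wait_for_job(get_phase: Callable[[], str], interval: float = 2.0,
                 timeout: float = 600.0, sleep=time.sleep) -> str:
    """Poll until the job reaches a final phase; return that phase."""
    deadline = time.monotonic() + timeout
    while True:
        phase = get_phase()
        if phase in FINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still in phase {phase} after {timeout} s")
        sleep(interval)
```

Only after this returns COMPLETED should the client expect the results element of the UWS document to be filled in.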


  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist -- where does the SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?
Answer: You are right about the JCL, we have removed the mention. We also made the 1st point clearer. Thank you.

  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
Answer: Done. Thank you.

-- MarkusDemleitner - 2016-09-13

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( Matthew Graham, Pat Dowler )

Applications Working Group ( Pierre Fernique, Tom Donaldson )

Approved. I have no expertise in Sim, but as an independent feature, the specification seems reasonable and consistent.

-- TomDonaldson - 2016-10-22


Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Discovery of and access to simulation data have long been a VO goal. It is nice to see that this goal is now ready to be achieved, as the first implementations prove. It is especially nice to see that DALI consistency has been much improved in the latest versions. We validate this specification for Recommendation.

In the long term, we recommend that authors and editors pay attention to a couple of points for better integration into the full DAL landscape and better interoperability between observational and simulation data:

- convergence of DataLink techniques, which may be reached in the next versions of both SimDAL and DataLink.

- convergence in query languages, which may also result from efforts on the SimDAL side as well as on the Simple Access or TAP side in future versions.

- complementarity and coupling of SKOS technology and UCD technology for coding the semantics looks important to us and is probably possible. But this is more in the hands of the Semantics Working Group, which is collaborating with the TIG on that.

-- FrancoisBonnarel , MarcoMolinaro - 2016-11-24

 

Data Model Working Group ( Mark Cresitello-Dittmar, Laurent Michel )

I reviewed the 20161014 version of the document. I'm wondering if this is correct, as some of my comments appear earlier with comments that they have been addressed.

Two broad comments:

1) I would like to see some statement about the completeness of the implementations: do they implement all features of the spec?

Answer: All mandatory APIs are implemented in the Paris implementation. This implementation does not (yet) implement the optional {cutouts-preview} API. The SVO implementation implements most of the mandatory APIs (see the implementation documentation and tutorial for detailed information about the implementation).

Answer (Carlos Rodrigo): The SVO implementation (http://svo2.cab.inta-csic.es/theory/simdal1/) includes all three types of services: repository, search and dataAccess. We implemented all the mandatory resources and some optional ones (for instance, {cutouts-preview}). It is presented on the web page with explanations of what we consider a typical workflow and an example of use, because we thought that, as SimDAL has many endpoints, this was more illustrative. In each case, links are given to the URLs of the real implementation of each resource.


2) In reading it, there seems to be a large overlap with existing standards ( DALI, DATALINK, etc) which was also mentioned by others. Since DALI is supposed to be the basis for DAL protocols, I'd like to see this spec. be expressed more directly in relation to it rather than statements of compatibility. In reading the responses to similar comments above, and speaking with Francois about this, I understand that this is intentional for this version. A subsequent version can integrate the spec into the DALI family. So, I will not block progress on this count.

Answer: Indeed. As explained in another answer concerning Datalink, SimDAL does not use it since it would have made the standard heavier whereas we just need a tiny subset of datalink. As you say, that is something we have in mind for a future version of the standard. Concerning DALI, SimDAL is DALI-compliant.

General:

Section 1: pg 5 = The final lines state who should implement each of the components, but in very fuzzy terms (eg: "most of the time the SimDAL Search component"). It would be more clear if it stated specifically who would implement each part. This same fuzzy language is repeated in a couple places (pg 8)

Answer: Thank you for pointing this out. We have reworked the part you are referring to, to make it clearer.

pg 6: Architecture diagram does not match the one in the master IVOA Architecture document.

Answer: Thank you. We will work with the DAL chairs to correct the IVOA Architecture figure.

S2.3 - pg10: "the pivot format" as Markus stated, I have no idea what that means, perhaps it is common knowledge to the target audience?

Answer: OK, we realize that this is a tricky term, so we have removed it completely.

S3.0 - pg11: Question marks should be confirmed and removed. 1. "SimDAL components are exposed with APIs following a REST design (?) that conforms to the DALI resource description (?)."

Answer: Sorry, this was caused by an erroneous generation of the bibliography in the published archive. It is now corrected.

S3.3 - pg12: Question mark in place of reference to JSON standard

Answer: Sorry, this was caused by an erroneous generation of the bibliography in the published archive. It is now corrected.

S3.5 - pg15: another question mark in place of reference (VOTable)

Answer: Sorry, this was caused by an erroneous generation of the bibliography in the published archive. It is now corrected.

S3.6 - pg15: "must provide sufficient lifetime for interactive browsing of the pages". This is a rather vague statement for a requirement.

Answer: We thought that it is no more arbitrary than stating a fixed time such as 10 seconds or 10 minutes. Moreover, this "sufficient lifetime" is often very specific to the service, type of data, or simulation, and the publisher is best placed to determine it.

S4.1 - pg17: the description for 'q' parameter, "The search logic is up to the publisher.." How is a user supposed to know what to put in the field if the logic is up to the publisher? The example q='n(h2)' matches both 'N(H2)' and 'n(H2)', but only because this provider decided to make the search case insensitive?

Answer: That is correct. Users learn how the "q" parameter is handled from the publisher's documentation of the extra features. But we understand that this could be tricky, so we have added better explanations to the document. Thank you for pointing this out.
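The behaviour in the reviewer's example corresponds to one possible publisher choice: case-insensitive substring matching. A sketch of that choice follows; it is only an illustration, since the standard leaves the search logic to the publisher.

```python
def matches(q: str, value: str) -> bool:
    """Case-insensitive substring match (one possible publisher choice)."""
    return q.casefold() in value.casefold()


def search(q: str, values):
    """Return the values matched by q, in their original order."""
    return [v for v in values if matches(q, v)]
```

With this rule, `search("n(h2)", ["N(H2)", "n(H2)", "T_kin"])` returns both H2 entries. A publisher using exact or case-sensitive matching would return only `n(H2)`, which is why the chosen rule needs to be documented.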

S5.6 - pg37: "the same way than the one" => "the same way as the one"

Answer: Thank you. We corrected the document.

S5.6 - pg37: "but in another cases it would" => "but in other cases it would"

Answer: Thank you. We corrected the document.

S6.3 - pg44: UWS extension.. "For various reasons but in particular because of security concerns." I don't know much about this content, but it sounds like it is indicating security concerns with the content of the UWS document standard.

Answer: Thank you. We reformulated this point.

Appendix B - pg48: "and some developpers" => "and some developers"

Answer: Thank you. We corrected the document.

Appendix B - pg48: "to have much more properties" => "to have many more properties"

Answer: Thank you. This is now also corrected.

Appendix B - pg48: "the APIs with the willing of putting" => "with the intent" or "with the hope"..

Answer: Thank you. Done.

Appendix B - pg48: "should know about technics" => "the technical aspects"?

Answer: Thank you. This is now corrected.

Appendix B - pg48: "end user to not needing to worry" => "end user to not worry"

Answer: Thank you. We corrected the document.

-- MarkCresitelloDittmar - 2016-10-26

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-11-04

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( Markus Demleitner, Theresa Dower )

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

Answer: We organized a meeting in Paris in March 2015/16 dedicated to the question of registering SimDAL components in the registries. Several authors of SimDAL as well as representatives of the DAL WG (Marco) and Registry WG (Markus) were present.

Several options and questions were identified during the meeting. After this meeting, discussions between the authors of the SimDAL standard concluded that some important use cases require SimDAL repositories:

1) fine-grained discovery of SimDAL services requires a detailed description of theoretical services following SimDM serializations, including their semantic aspects (SKOS).

2) these SimDM descriptions / serializations must be centralized (for example to have a unique description of a code to which publishers of simulations produced by this code can refer to).

The need for such a way (using SimDM repositories) to discover and describe theoretical services in the VO was identified a long time ago, before and during the definition of SimDM. Nevertheless, SimDAL components must be registered in the IVOA registries.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Answer: Yes, this precision adds clarification, thank you. We modified the document.

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Answer: During the meeting in Paris in February 2016, we started to look at how to register SimDAL components in the IVOA registries. Two solutions appeared:

1) to map as much as possible of the SimDM serializations (protocol.xml, project.xml) into the IVOA registries using extensions. This solution requires deprecating some SimDM concepts (notions of code version, owner etc …), slight modifications of the registries (or taking some liberties, such as storing SKOS concepts in UCD fields), developing an XSLT transformation of SimDM serializations, etc.

2) to do a simple registration of SimDAL components in the IVOA registries without extensions using classical fields for registrations: title, description, keywords, publishers, etc ...

Solution 1 is more complex to set up: it requires development, a prototype to check that it works properly, and tests of the implications for end users. It may be nicer from an IVOA point of view but is not required by the scientific use cases defined by the Theory IG. Nevertheless, we decided to try it, but we set a deadline of the end of March 2016 to get an operational XSLT mapping from SimDM serializations to registries. Without progress by the end of March, we would move to solution 2 to avoid delaying the release of the SimDAL standard, which several teams in the astrophysics community are asking for. Solution 1 would then be investigated for SimDAL version 2.0.

No progress had been made by the end of March, so we moved towards solution 2. This solution does not need any specific descriptions since it is a standard registration of services in the IVOA registries. It fulfills all the scientific goals.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

Answer: Ok, fixed in the document. Thank you for pointing this out.
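The character-level problem raised above can be checked mechanically. The sketch below tests only whether each character is permitted anywhere in an RFC 3986 URI (unreserved, reserved, or "%" for percent-encoding); it does not validate the full URI grammar. The corrected identifier in the usage line is hypothetical, not the one now in the document.

```python
import string

# Characters RFC 3986 permits in a URI: unreserved, gen-delims, sub-delims,
# and "%" for percent-encoding. Curly braces are not among them.
_ALLOWED = set(
    string.ascii_letters + string.digits + "-._~"  # unreserved
    + ":/?#[]@"                                     # gen-delims
    + "!$&'()*+,;="                                 # sub-delims
    + "%"
)


def invalid_uri_chars(uri: str) -> set:
    """Return the characters in uri that can never appear in an RFC 3986 URI."""
    return {c for c in uri if c not in _ALLOWED}
```

For instance, `invalid_uri_chars("ivo://ivoa.net/std/SimDALSearch#{views}-1.0")` reports `{"{", "}"}`, while a hypothetical brace-free form such as `ivo://ivoa.net/std/SimDALSearch#views-1.0` passes.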

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

Answer: A client knows that a collection resource supports pagination by checking whether its representation (a VOTable) contains the special pagination LINK elements (that is, LINK elements with the special content roles described in the "Pagination" section). Thus, a service may implement pagination but decide not to use it for some collections while using it for others. This is now explained in the document.
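A client-side check along the lines of this answer could look like the sketch below. It assumes the pagination roles are the literal content-role values "next_page" and "previous_page", taken from the reviewer's wording; the actual values are those defined in the SimDAL "Pagination" section.

```python
import xml.etree.ElementTree as ET

# Assumed content-role values; check the SimDAL "Pagination" section.
PAGINATION_ROLES = ("next_page", "previous_page")


def pagination_links(votable_xml: str) -> dict:
    """Map pagination role -> href; an empty dict means no pagination here."""
    root = ET.fromstring(votable_xml)
    links = {}
    for elem in root.iter():
        # Compare local names so namespaced and plain VOTables both work.
        if elem.tag.rsplit("}", 1)[-1] == "LINK":
            role = elem.get("content-role")
            if role in PAGINATION_ROLES:
                links[role] = elem.get("href")
    return links
```

Because the check runs per response, it naturally supports a service that paginates some collections and not others, as the answer describes.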

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

Answer: Representing tree structures in relational databases (i.e., as sets of flat key-value structures) has been done for years and is well understood. Of course, this brings some query complexity when reconstructing the tree.

We do not expect hierarchy to be used here; the full information (and the various hierarchies) is actually kept in the SimDM serializations (mostly of the project and protocol SimDM packages). This is intended to be a simple (flat), yet informative, view of the SimDM serializations (mainly of their classes).
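The flat key-value representation and the tree reconstruction mentioned in this answer can be illustrated with dotted paths. This is a minimal sketch assuming that keys never contain "." and that empty sub-trees need not survive the round trip; the key names in the test are hypothetical, not SimDM terms.

```python
def flatten(tree: dict, prefix: str = "") -> dict:
    """Turn nested dicts into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: dict) -> dict:
    """Rebuild the nested dicts from dotted paths."""
    tree = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree
```

The reconstruction step is the extra query complexity the answer refers to: every read of a sub-tree has to collect and re-nest all keys sharing a prefix.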

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).

In the VO Registry, the real complex point is the proliferation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling,...) would strengthen our confidence in this part of the standard.

Answer: Yes, that's correct. Thank you for pointing this out, we have updated the document to fix this.

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Semantics Working Group ( Mireille Louys, Alberto Accomazzi )

Education Interest Group ( Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( Carlos Rodrigo )

When I read the introduction of section 5 (SimDAL Search), I get the impression that an unaware reader could conclude that it is impossible to define views containing properties and links to more than one object type. For instance, it explains that, if you have a simulation that provides both the 3D structure of a star and its oscillation spectrum, you have to define two different views. This, in fact, is a misunderstanding.

First, a simulation that generates different output files for a given set of inputs is quite a common case. And having a different view for each object type (output data file), while possible, would make it almost impossible for a user to search on metadata and get the two, or more, final files corresponding to that "experiment" (linking between different views is very inefficient).

Second, it is perfectly possible and natural in SimDAL to define a view that links together all the output files of the same experiment and points to the corresponding DataAccess services providing each dataset.

By the way, this is not a "new idea"; it is something that has been discussed and agreed before. It is just not clear in the document.

In my opinion, it should be said explicitly that views containing different object types can be defined. And it would be nice to give a simple example of a view schema for such a case.

Answer: Thank you Carlos, we have done the update in the document.
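The single-view layout Carlos describes can be sketched as one row per experiment, with one link column per output object type. Everything below (column names, URLs, the selection on initial mass) is hypothetical and only meant to show why one metadata query then returns every output file of each matching experiment.

```python
# Hypothetical view rows: each experiment (one set of input parameters)
# carries one link column per output object type.
VIEW_ROWS = [
    {"experiment_id": "exp-1", "initial_mass": 1.0,
     "structure_3d_link": "https://example.org/access/exp-1/structure",
     "oscillation_spectrum_link": "https://example.org/access/exp-1/spectrum"},
    {"experiment_id": "exp-2", "initial_mass": 2.5,
     "structure_3d_link": "https://example.org/access/exp-2/structure",
     "oscillation_spectrum_link": "https://example.org/access/exp-2/spectrum"},
]

LINK_COLUMNS = ("structure_3d_link", "oscillation_spectrum_link")


def outputs_for(rows, min_mass):
    """Return, per matching experiment, the links to all of its output files."""
    return {
        row["experiment_id"]: [row[col] for col in LINK_COLUMNS]
        for row in rows
        if row["initial_mass"] >= min_mass
    }
```

With separate views per object type, the same answer would require joining the views on the experiment identifier, which is the inefficiency the comment points out.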

Standards and Processes Committee ( Françoise Genova)



Revision 17 - 2016-11-23 - CarlosRodrigo

 

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "simdal search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and registries.

  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An implementation note was written for the Simulation Data Model. It presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL. Once the standard is accepted, we plan to write an Implementation Note that will present how to use SimDAL to publish different categories of simulations / numerical models.

  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP/TAP schemas. TAP has not been chosen as a solution because it does not fulfill the requirements for Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a lot of properties characterizing simulated objects (> 100 000 in one of the SimDAL implementations). These numbers are growing with the progress of numerical models.

With TAP, these properties would have to be stored as columns of a table in a relational database, since TAP is tightly coupled to SQL and therefore to relational databases. This is simply not possible with most of the RDBMSs currently in use (PostgreSQL, MySQL), which cannot manage tables with that many columns. Such high-dimensional data are much better served by other storage architectures, which publishers cannot use with TAP (or only with great difficulty) unless they have an SQL compatibility layer.

Note that if the definition of SimDAL has taken so long, it is because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems and storage architectures, and the conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology best suited to the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not be lost.

Concerning the SimDAL Repository part:
First, note that SimDAL components (among them the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model (SimDM), so only SimDAL Repositories allow a search for resources using the SimDM semantics. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes published in the VO. IVOA registries do not have functionalities to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. Nevertheless, this would lose the relationships between SimDM classes, and hence the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not make full use of the SimDM XML serializations, although most scientific use cases would require fine-grained search in these serializations to discover protocols and projects of interest efficiently. This was a deliberate choice for version 1.0 of SimDAL: in the coming months and years we do not expect many registered IVOA theory services, so it should be easy for users to discover theoretical services with SimDAL Repositories as presently defined. As more theoretical services are registered, finer-grained search will become necessary. SimDAL Repositories as defined in version 1.0, which store the full XML serializations of projects and protocols, contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included, a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.
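For reference, the DALI convention Mark refers to puts a single INFO element with name="QUERY_STATUS" in the results RESOURCE (ahead of any TABLE), with value="ERROR" and a user-displayable message as its content. A minimal Python sketch of such an error document using only the standard library; the message text is a placeholder, and the sketch is not taken from the SimDAL document itself:

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"

def dali_error_votable(message: str) -> str:
    """Return a minimal DALI-style VOTable error document.

    One INFO element named QUERY_STATUS with value ERROR sits inside the
    results RESOURCE; its text is a message meant for display to users.
    """
    votable = ET.Element("VOTABLE", {"version": "1.3", "xmlns": VOTABLE_NS})
    resource = ET.SubElement(votable, "RESOURCE", {"type": "results"})
    info = ET.SubElement(resource, "INFO",
                         {"name": "QUERY_STATUS", "value": "ERROR"})
    info.text = message
    return ET.tostring(votable, encoding="unicode")
```

Several errors could still be folded into the one message string, which keeps the document readable by existing DALI clients.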

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.
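Both problems noted for section 4.2 can be caught mechanically. Below is a minimal validator sketch, assuming (as the section 4.2 text states) that the response schema is keyed on FIELD IDs: it flags elements whose local name is not a VOTable element name (names are case-sensitive) and FIELDs that carry no ID. The element list is written from memory of VOTable 1.3 and should be checked against the specification before real use:

```python
import xml.etree.ElementTree as ET

# Element names defined by VOTable 1.3 (case-sensitive); verify against the spec.
VOTABLE_ELEMENTS = {
    "VOTABLE", "RESOURCE", "DESCRIPTION", "DEFINITIONS", "INFO", "PARAM",
    "TABLE", "FIELD", "GROUP", "FIELDref", "PARAMref", "VALUES", "MIN",
    "MAX", "OPTION", "LINK", "DATA", "TABLEDATA", "TR", "TD", "BINARY",
    "BINARY2", "FITS", "STREAM", "COOSYS",
}

def local_name(tag: str) -> str:
    # Strip a "{namespace}" prefix if present.
    return tag.rsplit("}", 1)[-1]

def check_votable(text: str) -> list[str]:
    """Return human-readable problems found in a VOTable document."""
    problems = []
    for elem in ET.fromstring(text).iter():
        name = local_name(elem.tag)
        if name not in VOTABLE_ELEMENTS:
            problems.append(f"unknown element <{name}> (names are case-sensitive)")
        elif name == "FIELD" and elem.get("ID") is None:
            problems.append(f"FIELD name={elem.get('name')!r} has no ID attribute")
    return problems
```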

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive it properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the InterOp in Sesto in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations counts as a validator. So a client, rather than a simple validator, has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order in the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access Data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before doing a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent)

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of end points. You're defining about as many end point types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, XPath or XQuery or whatever?

In short, I believe you should split up this document into three pieces, each of which would work out to be more handleable.

Answer: Yes, we have several API endpoints. Of course, we did our best to keep this number as small as possible while satisfying as many use cases as possible.


It is a big standard, which is why it took so long to release a first version. The other VO (DAL) standards basically serve a single, fixed, underlying data model. As you know, SimDM, which SimDAL serves, is a meta-model, i.e. we are serving an unbounded number of underlying data models. This brings additional use cases that we had to address by adding a complete API.


Several very smart people have addressed the SimDAL problem in past years, and a lot of progress has been made thanks to them, even if the result is still not 100% perfect. The fact is that many people have been waiting for years for a way to publish their theoretical data, and we think we have to give them a solution now.
We need a basis, SimDAL v1, that will be implemented by several teams and will fulfill the most common use cases (80 %), on real data in production environments (not theoretical or potential use cases and issues). We then hope these teams will collaborate with the IVOA to provide feedback for a version 2 that will add features not planned in the first version.

As for XPath: we designed SimDAL partly because we realized that understanding and querying SimDM is too high a prerequisite for a scientist or developer, and that we had to design a query API as straightforward as possible, even at the price of not covering 100% of use cases for now.

We have not split SimDAL because, for the first version of the standard, we do not want to break the workflow logic that carries the user from theoretical project discovery to data retrieval, at the price of developers having to deal with a bigger standard. Instead, we tried to make clear in the document that a reader does not have to implement all three parts. For the next version of the standard, once several teams are familiar with it, splitting the document into several more specific standards can be discussed.

Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.
Answer: Thank you for offering to help with the proofreading! The LaTeX file can be downloaded from the src link on the SimDAL webpage: http://www.ivoa.net/documents/SimDAL/index.html


  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).
Answer: Done. Section 2.2 no longer mentions use cases but requirements.

  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).
Answer: We simplified as much as possible.

  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.
Answer: You are perfectly right. That is corrected.

  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?
Answer: Ok. We clarified this.

  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.
Answer: That is corrected


  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says. "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: Does the table in the error case have a name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?
Answer: You are right, thank you, we have made the update in the document.

  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.
Answer: Datalink would have made the standard heavier, whereas we only need a tiny subset of Datalink: the subset that Datalink was when we were involved in its definition back in 2012, when F. Bonnarel started thinking about it. Adopting Datalink is something we have in mind for a future version of the standard.

  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.
Answer: The foreign-key mechanism is used in VOTable responses and is taken directly from the VOTable 1.3 document. It is a neat and tidy way to link our tables while allowing some extensibility. We define it in the document because it is not formally defined in the VOTable document, only mentioned as an example of a possible use of the GROUP element.

  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.
Answer: These fields are defined in each API endpoints, see for example the result table in {search}, subsection response schema. All FIELDs are described and those that are not mandatory are explicitly declared as optional.


  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).
Answer: OK. We had not specified anything because our idea is "do not bother with the lifetime of the pagination; we take care of it so that it lives long enough for you to browse interactively in your current session". This has now been clarified (section 3.6).

  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.
Answer: We thought the same while writing and implementing the standard, but decided it would be of some use to users to be consistent with the majority of existing APIs. After some more thought recently, we agree with you and simply remove this option to make the standard more straightforward to use.
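Markus's two pagination suggestions (a page size chosen by the service, and an explicit answer when a next_page/previous_page link has expired) can be sketched as a small server-side cache. Everything below is hypothetical: the class, the one-hour default lifetime and the expires_in_s field are not defined by SimDAL; they only show one way a service could report an expired link and communicate an estimated validity span, in the spirit of OAI-PMH resumption tokens:

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable

class PageExpired(Exception):
    """Raised when a (hypothetical) pagination token is no longer valid."""

@dataclass
class PageStore:
    """Hypothetical server-side cache of paginated results (not part of SimDAL)."""
    lifetime_s: float = 3600.0      # how long a cached result stays valid
    page_size: int = 100            # chosen by the service, not the client
    clock: Callable[[], float] = time.monotonic
    _results: dict = field(default_factory=dict)

    def start(self, rows: list) -> str:
        """Cache a full result and return a token identifying it."""
        token = uuid.uuid4().hex
        self._results[token] = (self.clock() + self.lifetime_s, rows)
        return token

    def page(self, token: str, number: int) -> dict:
        """Return one page plus its remaining lifetime, or raise PageExpired."""
        entry = self._results.get(token)
        if entry is None or self.clock() > entry[0]:
            self._results.pop(token, None)
            raise PageExpired("pagination link expired; please re-run the query")
        expires_at, rows = entry
        start = number * self.page_size
        return {
            "rows": rows[start:start + self.page_size],
            "has_next": start + self.page_size < len(rows),
            "expires_in_s": expires_at - self.clock(),
        }
```

Reporting the remaining lifetime with each page lets a client warn its user before a link goes stale, without the standard having to fix a number.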


  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).
Answer: Indeed, we meant "optionally". This has been clarified.
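The agreed reading of the timestamp rule in 3.7 is: date and time, optionally followed by a decimal point and fractional seconds, plus (as Markus suggests) an optional trailing "Z". A minimal Python check for that form, assuming YYYY-MM-DDThh:mm:ss as the base; DALI 1.1 remains the authoritative definition:

```python
import re
from datetime import datetime

# YYYY-MM-DDThh:mm:ss, then an optional .fraction, then an optional trailing Z.
TIMESTAMP_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z?", re.ASCII
)

def is_valid_timestamp(value: str) -> bool:
    """Syntax check plus a calendar check (rejects e.g. month 13)."""
    if not TIMESTAMP_RE.fullmatch(value):
        return False
    try:
        # The first 19 characters are the date and whole-second time.
        datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True
```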


  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".
Answer: Done


  • In 4.1, I guess I'd rather start with "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused about the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) later said under "Parameter".)
Answer: Ok. We modified the presentation of the section.


  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?
Answer: Indeed. Ok.


  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).
Answer: We already have the datatype in the FIELD element, do you think we have to add more ?


  • In 4.2, is the project parameter mandatory?
Answer: No it is not. We now say it explicitly.


  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".
Answer: We corrected the typo. Thank you.

  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)
Answer: The answer is no. In this first version of SimDAL, only a single "parameter" can be passed. We now say it in the text.
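A service (or a validator) could enforce the single-value rule as sketched below. The helper and exception names are hypothetical; parameter names are compared exactly as given, so a service that treats names case-insensitively would need to fold them first:

```python
from urllib.parse import parse_qs

class RepeatedParameter(ValueError):
    """Raised when a single-valued parameter appears more than once."""

def single_valued_params(query_string: str) -> dict[str, str]:
    """Parse a query string, rejecting repeated parameters.

    Hypothetical helper: in SimDAL 1.0 each parameter may be given once.
    """
    parsed = parse_qs(query_string, keep_blank_values=True)
    repeated = sorted(name for name, values in parsed.items() if len(values) > 1)
    if repeated:
        raise RepeatedParameter(
            f"parameter(s) given more than once: {', '.join(repeated)}")
    return {name: values[0] for name, values in parsed.items()}
```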

  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.
Answer: Ok, document updated
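The reading Markus suggests treats the data as a relation (a set of rows over named columns), with a view obtained by selecting rows and projecting onto a subset of columns. The minimal Python sketch below illustrates that mathematical reading only. It is not SimDAL's normative definition of views, and the property names are hypothetical:

```python
from typing import Callable, Iterable

Row = dict[str, object]

def make_view(relation: Iterable[Row], columns: list[str],
              where: Callable[[Row], bool] = lambda row: True) -> list[tuple]:
    """Select the rows satisfying `where`, then project them onto `columns`."""
    return [tuple(row[c] for c in columns) for row in relation if where(row)]

# A hypothetical relation: one row per model, one key per property.
models = [
    {"model_id": "m1", "n_h": 1.0e2, "g0": 1.0},
    {"model_id": "m2", "n_h": 1.0e4, "g0": 1.0},
    {"model_id": "m3", "n_h": 1.0e4, "g0": 10.0},
]
```

Because this definition only refers to rows and columns, it holds whether the publisher stores the relation in an RDBMS, in flat files or in another storage architecture. That is the decoupling the authors describe.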


  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).
Answer: Ok, this is related to our answer to the previous question about foreign_keys - see above.


  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to). There are just too many flavours of JSON out there.
Answer: You are right, thank you for pointing this to us, we have added the reference.
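Markus's point about JSON flavours is easy to illustrate. Python's json module, for instance, accepts NaN and Infinity by default, although they are not part of the JSON grammar in RFC 8259, and it silently keeps the last of duplicate keys. A sketch of a stricter parser a service might apply to query dictionaries; the function names and the example dictionary content are hypothetical:

```python
import json

def _reject_constant(token: str):
    # json calls this for NaN, Infinity and -Infinity, which RFC 8259 excludes.
    raise ValueError(f"non-standard JSON constant: {token}")

def _no_duplicate_keys(pairs):
    # RFC 8259 says names SHOULD be unique; treat duplicates as an error here.
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result

def parse_strict_json(text: str):
    """Parse text as JSON, rejecting common non-standard extensions."""
    return json.loads(text, parse_constant=_reject_constant,
                      object_pairs_hook=_no_duplicate_keys)
```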


  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?
Answer: We have done the update following your recommendation.


  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".
Answer: Yes, thank you this is clarified (section 6.3)
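The point clarified in 6.3 is that the job-creation request returns a UWS job resource whose results are filled in only once the job has completed, so clients poll the job phase. A minimal sketch; fetch_job is a hypothetical callable standing in for an HTTP GET of the job resource (returning its phase and result URLs), which keeps the example runnable without a network:

```python
import time
from typing import Callable

# Terminal UWS phases (ARCHIVED was introduced in UWS 1.1).
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED", "ARCHIVED"}

def wait_for_results(fetch_job: Callable[[], dict], interval_s: float = 1.0,
                     timeout_s: float = 60.0, sleep=time.sleep) -> list[str]:
    """Poll a UWS job until it reaches a terminal phase.

    fetch_job is a hypothetical stand-in for GET {job-url}; it must return
    a dict with "phase" and "results" (a list of result URLs).
    """
    waited = 0.0
    while True:
        job = fetch_job()
        if job["phase"] in TERMINAL_PHASES:
            break
        if waited >= timeout_s:
            raise TimeoutError(f"job still {job['phase']} after {waited:.0f} s")
        sleep(interval_s)
        waited += interval_s
    if job["phase"] != "COMPLETED":
        raise RuntimeError(f"job ended in phase {job['phase']}")
    return job["results"]
```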


  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist -- where does the SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?
Answer: You are right about the JCL, we have removed the mention. We also made the 1st point clearer. Thank you.

  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
Answer: Done. Thank you.

-- MarkusDemleitner - 2016-09-13

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( Matthew Graham, Pat Dowler )

Applications Working Group ( Pierre Fernique, Tom Donaldson )

Approved. I have no expertise in Sim, but as an independent feature, the specification seems reasonable and consistent.

-- TomDonaldson - 2016-10-22

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( Mark Cresitello-Dittmar, Laurent Michel )

I reviewed the 20161014 version of the document. I'm wondering if this is correct, as some of my comments appear earlier with comments that they have been addressed.

Two broad comments:

1) I would like to see some statement about the completeness of the implementations. Do they implement all features of the spec?

Answer: All mandatory APIs are implemented in the Paris implementation, which does not (yet) implement the optional {cutouts-preview}. The SVO implementation implements most of the mandatory API (see the implementation documentation and tutorial for detailed information about the implementation).

Answer (Carlos Rodrigo): The SVO implementation (http://svo2.cab.inta-csic.es/theory/simdal1/) includes all three types of services: repository, search and dataAccess. We implemented all the mandatory resources and some optional ones (for instance, {cutouts-preview}). It is presented in the web page with explanations of what we consider a typical workflow and an example of use, because we thought that, as SimDAL has many endpoints, it was more illustrative to do so. In each case, however, links are given to the URLs of the real implementation of each resource.


  2) In reading it, there seems to be a large overlap with existing standards ( DALI, DATALINK, etc) which was also mentioned by others. Since DALI is supposed to be the basis for DAL protocols, I'd like to see this spec. be expressed more directly in relation to it rather than statements of compatibility. In reading the responses to similar comments above, and speaking with Francois about this, I understand that this is intentional for this version. A subsequent version can integrate the spec into the DALI family. So, I will not block progress on this count.

Answer: Indeed. As explained in another answer concerning Datalink, SimDAL does not use it since it would have made the standard heavier whereas we just need a tiny subset of datalink. As you say, that is something we have in mind for a future version of the standard. Concerning DALI, SimDAL is DALI-compliant.

General:

Section 1: pg 5 = The final lines state who should implement each of the components, but in very fuzzy terms (e.g., "most of the time the SimDAL Search component"). It would be more clear if it stated specifically who would implement each part. This same fuzzy language is repeated in a couple places (pg 8)

Answer: Thank you for pointing this out. We have reworked the part you are talking about to make it clearer.

pg 6: Architecture diagram does not match the one in the master IVOA Architecture document.

Answer: Thank you. We will contact the DAL chairs to correct this IVOA Architecture figure.

S2.3 - pg10: "the pivot format" as Markus stated, I have no idea what that means, perhaps it is common knowledge to the target audience?

Answer: OK, we realize that this is a tricky term, so we have removed it completely.

S3.0 - pg11: Question marks should be confirmed and removed. 1. "SimDAL components are exposed with APIs following a REST design (?) that conforms to the DALI resource description (?)."

Answer: Sorry, that was an erroneous generation of the bibliography in the published archive. It is now corrected.

S3.3 - pg12: Question mark in place of reference to JSON standard

Answer: Sorry, that was an erroneous generation of the bibliography in the published archive. It is now corrected.

S3.5 - pg15: another question mark in place of reference (VOTable)

Answer: Sorry, that was an erroneous generation of the bibliography in the published archive. It is now corrected.

S3.6 - pg15: "must provide sufficient lifetime for interactive browsing of the pages". This is a rather vague statement for a requirement.

Answer: We thought that it is no more arbitrary than stating a fixed time such as 10 seconds or 10 minutes. Moreover, this "sufficient lifetime" is often so specific to the service, type of data or simulation that the publisher is best placed to judge it.

S4.1 - pg17: the description for 'q' parameter, "The search logic is up to the publisher.." How is a user supposed to know what to put in the field if the logic is up to the publisher? The example q='n(h2)' matches both 'N(H2)' and n(H2)', but only because this provider decided to make the search case insensitive?

Answer: That is correct. Users learn how the "q" parameter is handled through the publisher's documentation of its extra features. But we understand that this could be tricky, so we added better explanations in the document. Thank you for pointing this out.
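To make the q discussion concrete: the behaviour in the example, where q='n(h2)' matches both 'N(H2)' and 'n(H2)', is exactly what a simple case-insensitive substring match produces. The sketch below shows one possible publisher policy (a full-text search engine would behave differently), with a hypothetical function name:

```python
def full_text_match(q: str, candidates: list[str]) -> list[str]:
    """Case-insensitive substring search; one possible publisher policy."""
    needle = q.casefold()
    return [c for c in candidates if needle in c.casefold()]
```

In the usual notation, N(H2) (a column density) and n(H2) (a volume density) are different quantities, so a policy like this one conflates them. That is one more reason why the matching logic has to be documented by each publisher.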

S5.6 - pg37: "the same way than the one" => "the same way as the one"

Answer: Thank you. We corrected the document.

S5.6 - pg37: "but in another cases it would" => "but in other cases it would"

Answer: Thank you. We corrected the document.

S6.3 - pg44: UWS extension: "For various reasons but in particular because of security concerns." I don't know much about this content, but it sounds like it is indicating security concerns with the content of the UWS document standard.

Answer: Thank you. We reformulated this point.

Appendix B - pg48: "and some developpers" => "and some developers"

Answer: Thank you. We corrected the document.

Appendix B - pg48: "to have much more properties" => "to have many more properties"

Answer: Thank you. This is now corrected as well.

Appendix B - pg48: "the APIs with the willing of putting" => "with the intent" or "with the hope"..

Answer: Thank you. Done.

Appendix B - pg48: "should know about technics" => "the technical aspects"?

Answer: Thank you. This is now corrected.

Appendix B - pg48: "end user to not needing to worry" => "end user to not worry"

Answer: Thank you. We corrected the document.

-- MarkCresitelloDittmar - 2016-10-26

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-11-04

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( Markus Demleitner, Theresa Dower )

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

Answer: We organized a meeting in Paris in March 2015/16 dedicated to the question of registering SimDAL components in the registries. Several authors of SimDAL as well as representatives of the DAL WG (Marco) and the Registry WG (Markus) were present.

Several options and questions were identified during the meeting. After this meeting, discussions between the authors of the SimDAL standard concluded that some important use cases require SimDAL repositories:

1) fine discovery of SimDAL services requires the detailed description of theoretical services following SimDM serializations, including their semantic aspects (SKOS).

2) these SimDM descriptions / serializations must be centralized (for example to have a unique description of a code to which publishers of simulations produced by this code can refer to).

The need for such a way (using SimDM repositories) to discover and describe theoretical services in the VO was identified a long time ago, before and during the definition of SimDM. Nevertheless, SimDAL components must be registered in the IVOA registries.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Answer: Yes, this precision adds clarification, thank you. We modified the document.
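As an illustration of discovery "by looking for capabilities with the standard ids", a RegTAP query can select services by the standard_id of their capabilities. The Python sketch below only builds the ADQL string. The default prefix is an assumption extrapolated from the SimDALSearch example in this page, not the list of identifiers in the corrected document. The lower-casing reflects RegTAP's case normalization of identifier columns, which should be double-checked against the RegTAP specification:

```python
def simdal_capability_query(prefix: str = "ivo://ivoa.net/std/simdal") -> str:
    """Build a RegTAP ADQL query listing capabilities whose standard_id
    starts with `prefix`, together with their access URLs.

    The default prefix is an assumption, not taken from the SimDAL text.
    """
    escaped = prefix.lower().replace("'", "''")  # ADQL string-literal escaping
    return (
        "SELECT ivoid, standard_id, access_url "
        "FROM rr.capability NATURAL JOIN rr.interface "
        f"WHERE standard_id LIKE '{escaped}%'"
    )
```

The resulting string can be submitted to any RegTAP service with an ordinary TAP client.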

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Answer: During the meeting in Paris in February 2016, we started to look at how to register SimDAL components in the IVOA registries. Two solutions appeared:

1) to map the SimDM serializations (protocol.xml, project.xml) onto the IVOA registries as far as possible, using extensions. This solution requires deprecating some SimDM concepts (notions of code version, owner, etc.), slightly modifying the registries (or taking liberties such as storing SKOS concepts in UCD fields), developing an XSLT transformation of SimDM serializations, etc.

2) to do a simple registration of SimDAL components in the IVOA registries without extensions, using the classical registration fields: title, description, keywords, publishers, etc.

Solution 1 is more complex to set up: it requires development work, a prototype to check that it works properly, and tests of the implications for end users. It may be nicer from an IVOA point of view but is not required by the scientific use cases defined by the Theory IG. Nevertheless, we decided to try it, but set a deadline of the end of March 2016 to get an operational XSLT mapping from SimDM serializations to the registries. Without progress by the end of March, we would move to solution 2 to avoid delaying the release of the SimDAL standard, which several teams in the astrophysics community are asking for. Solution 1 would then be investigated for SimDAL version 2.0.

No progress was made by the end of March, so we moved to solution 2. This solution does not need any specific descriptions since it is a standard registration of services in the IVOA registries. It fulfills all the scientific goals.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

Answer: Ok, fixed in the document. Thank you for pointing this out.
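To make the point about curly braces concrete, here is a minimal Python sketch that checks a standardId against the characters RFC 3986 permits unencoded in a URI, and expands the `{views}` placeholder into a bare endpoint name. The expanded form `ivo://ivoa.net/std/SimDALSearch#views-1.0` and the helper names are assumptions for illustration; the normative identifiers are those given in sect. 3.6 of the revised document.

```python
import re

# Characters RFC 3986 allows in a URI: unreserved, gen-delims, sub-delims
# and '%' for percent-encoding. Curly braces are not among them.
_URI_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def is_valid_uri_text(uri: str) -> bool:
    """True if every character of `uri` may appear unencoded in a URI."""
    return _URI_CHARS.fullmatch(uri) is not None


def expand_standard_id(template: str, endpoint: str) -> str:
    """Replace the first {placeholder} in a documentation template with
    a bare endpoint name (hypothetical helper)."""
    # A function replacement avoids backslash handling in `endpoint`.
    return re.sub(r"\{[^}]*\}", lambda _m: endpoint, template, count=1)


template = "ivo://ivoa.net/std/SimDALSearch#{views}-1.0"
print(is_valid_uri_text(template))            # False: contains '{' and '}'
concrete = expand_standard_id(template, "views")
print(concrete, is_valid_uri_text(concrete))  # ivo://ivoa.net/std/SimDALSearch#views-1.0 True
```

The point of the sketch is only that the braces are documentation notation for a placeholder and must never reach the registered identifier.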

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

Answer: A client knows that a collection resource supports pagination by checking whether its representation (a VOTable) contains the special pagination LINK elements, i.e. LINK elements with the content roles described in the "Pagination" section. Thus, a service may implement pagination but decide not to use it for some collections while using it for others. This is now explained in the document.
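A client-side check along those lines could look like the following Python sketch. It assumes that the pagination LINK elements carry the content roles `next_page` and `previous_page` (the names used in the review comments on sect. 3.4); the exact role strings are those defined in the "Pagination" section, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed role names; the normative strings are in the "Pagination" section.
PAGINATION_ROLES = {"next_page", "previous_page"}


def _local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag."""
    return tag.rsplit("}", 1)[-1]


def pagination_links(votable_xml: str) -> dict:
    """Map each pagination content-role found in the VOTable to its href."""
    root = ET.fromstring(votable_xml)
    links = {}
    for element in root.iter():
        if _local_name(element.tag) != "LINK":
            continue
        role, href = element.get("content-role"), element.get("href")
        if role in PAGINATION_ROLES and href:
            links[role] = href
    return links


def supports_pagination(votable_xml: str) -> bool:
    """A collection is paginated if its representation has such a LINK."""
    return bool(pagination_links(votable_xml))
```

A client would call `pagination_links` on each page it receives and follow the `next_page` link until no such link is present.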

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

Answer: Representing tree structures in relational databases (i.e. as sets of flat key-value structures) has been done for years and is well understood. Of course, this adds some query complexity for the tree reconstruction.

We do not expect hierarchy to be used here: the full information (and the various hierarchies) is kept in the SimDM serializations (mostly those of the SimDM project and protocol packages). This is intended to be a simple (flat), yet informative, view of the SimDM serializations (mainly of their classes).
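For illustration only, the following Python sketch shows the kind of flattening meant here: a nested description is turned into flat key-value pairs with dotted paths, and the tree is rebuilt from them. The keys are hypothetical and are not SimDM attribute names; the sketch also assumes that keys never contain a dot and that there are no empty sub-trees (those would be dropped).

```python
def flatten(tree: dict, prefix: str = "") -> dict:
    """Turn nested dicts into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: dict) -> dict:
    """Rebuild the nested structure from dotted paths."""
    tree = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = tree
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return tree
```

The reconstruction step in `unflatten` is the extra query-side work the answer refers to; a flat view only needs `flatten`.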

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).

In the VO Registry, the real complex point is the proliferation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling,...) would strengthen our confidence in this part of the standard.

Answer: Yes, that's correct. Thank you for pointing this out; we have updated the document to fix this.

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

When I read the introduction of section 5 (SimDAL Search), I get the impression that an unaware reader could understand that it is impossible to define views containing properties and links to more than one object type. For instance, it explains that if you have a simulation that provides both the 3D structure of a star and its oscillation spectrum, you have to define two different views. This, in fact, is a misunderstanding.

First, having a simulation that generates different output files for a given set of inputs is quite a common case. And having a different view for each object type (output data file), while possible, would make it almost impossible for a user to search on metadata and get the two, or more, final files corresponding to that "experiment" (linking between different views is very inefficient).

Second, it is perfectly possible and natural in SimDAL to define a view that links together all the output files of the same experiment and points to the corresponding DataAccess services providing each dataset.

By the way, this is not a "new idea"; it is something that has been discussed and agreed upon before. It is just not clear in the document.

In my opinion, it should be said explicitly that views containing different object types can be defined. And it would be nice to give a simple example of a view schema for such a case.

Answer: Thank you Carlos, we have done the update in the document.

Standards and Processes Committee ( Françoise Genova)



Revision 16 - 2016-11-04 - FranckLePetit

 
META TOPICPARENT name="TemplateRFC"

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "simdal search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and registries.

  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An Implementation Note was written for the Simulation Data Model; it presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard is accepted, we plan to write an Implementation Note presenting how to use SimDAL to publish different categories of simulations / numerical models.

  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" appears twice.

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP/TAP schemas. TAP was not chosen because it does not fulfill the requirements for Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing the simulated objects (> 100 000 in one of the SimDAL implementations). These numbers are growing with the progress of numerical models.

With TAP, we would need to store these properties as columns of a table in a relational database, which is simply not possible with the majority of the RDBMSs currently in use (PostgreSQL, MySQL): they cannot manage such high-dimensional data, i.e. that many table columns. Since TAP is strongly coupled to SQL, and therefore to relational databases, adopting it would force publishers to use such systems. High-dimensional data are much better served by other types of storage architectures, which publishers cannot use with TAP (or only with great difficulty) when those architectures lack an SQL compatibility layer.

Note that if the definition of SimDAL took so long, it is because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems / storage architectures. The conclusion of these implementations was that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema) so that publishers already familiar with the VO should not be lost.

Concerning the SimDAL Repository part:
First, note that SimDAL components (and among them the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model. So it is only with SimDAL Repositories that a search for resources can be done using the SimDM semantics. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes that are published in the VO. IVOA registries do not have the functionality to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. Nevertheless, this would lose the relationships between SimDM classes, and so lose the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not take full advantage of the SimDM XML serializations, even though most scientific use cases would require fine-grained search in these serializations to efficiently discover protocols and projects of interest. This was a deliberate choice for version 1.0 of SimDAL. Indeed, in the coming months / years we do not expect many registered IVOA theory services, so it should be easy for users to discover theoretical services with the SimDAL Repositories as presently defined. However, as more and more theoretical services are registered, finer-grained search will become necessary. SimDAL Repositories as defined in version 1.0, which store the full XML serializations of projects and protocols, contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included; a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive them properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the InterOp in Sesto in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations would serve as a validator. So a client, rather than a simple validator, has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order in the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before doing a {search} or ask for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent)

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of end points. You're defining about as many end point types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, xpath or xquery or whatever?

In short, I believe you should split up this document into three pieces, each of which would work out to be more handleable.

Answer: Yes, we have several API endpoints. Of course, we tried our best to keep this number as small as possible while satisfying as many use cases as possible.


That is a big standard, which is why it took so long to release a first version. The other VO (DAL) standards basically serve a single, fixed, underlying data model. As you know, SimDM, which SimDAL serves, is a meta-model, i.e. we are serving an unlimited number of underlying data models. This brings additional use cases that we had to address by adding a complete API.


Several very smart people addressed the SimDAL problem in past years, and a lot of progress has been made thanks to them, even if the result is still not 100% perfect. The fact is that a lot of people have been waiting for years for a way to publish their theoretical data, and we think we have to give them a solution now.
We need a basis, SimDAL v1, that will be implemented by several teams and will fulfill the most common use cases (80 %), on real data in production environments (not theoretical/potential use cases/issues). Then, we hope these teams will collaborate with the IVOA to provide feedback for a v2 that will add features not planned in the first version.

As for XPath, we designed SimDAL partly because we realized that understanding/querying SimDM is too high a prerequisite for a scientist/developer, and that we had to design a query API as straightforward as possible, even at the price of not covering 100% of the use cases for now.

We have not split SimDAL because, for the first version of the standard, we do not want to break the workflow logic that carries the user from theoretical project discovery to data retrieval, at the price of developers having to deal with a bigger standard. Instead, we tried to make the document clear so that readers understand they do not have to implement all three parts. For the next version of the standard, once several teams are familiar with it, splitting the document into several more specific standards can be discussed.

Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.
Answer: Thank you for offering to help with the proofreading! The LaTeX file can be downloaded from the src link on the SimDAL webpage: http://www.ivoa.net/documents/SimDAL/index.html


  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).
Answer: Done. Section 2.2 no longer mentions use cases but requirements.

  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).
Answer: We simplified as much as possible.

  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.
Answer: You are perfectly right. That is corrected.

  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?
Answer: Ok. We clarified this.

  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.
Answer: That is corrected


  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says: "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: Does the table in the error case have a name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?
Answer: You are right, thank you, we have made the update in the document.
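A minimal Python sketch of an error response following these suggestions: a DALI-style INFO element with name QUERY_STATUS and value ERROR whose text is directly displayable, plus a table whose rows hold a string error_msg and an integer error_code. Using the first message as the INFO text and naming the table `results` are assumptions, not requirements taken from the document.

```python
import xml.etree.ElementTree as ET

VOT_NS = "http://www.ivoa.net/xml/VOTable/v1.3"
ET.register_namespace("", VOT_NS)  # serialize VOTable elements unprefixed


def _q(tag: str) -> str:
    """Qualify a tag name with the VOTable namespace."""
    return f"{{{VOT_NS}}}{tag}"


def error_votable(errors: list) -> str:
    """Build an error document from (error_msg, error_code) pairs."""
    votable = ET.Element(_q("VOTABLE"), version="1.3")
    resource = ET.SubElement(votable, _q("RESOURCE"), type="results")
    # DALI-style status; the text should be readable as-is by a user.
    info = ET.SubElement(resource, _q("INFO"), name="QUERY_STATUS", value="ERROR")
    info.text = errors[0][0] if errors else "Unknown error"
    table = ET.SubElement(resource, _q("TABLE"), name="results")
    ET.SubElement(table, _q("FIELD"), name="error_msg", datatype="char", arraysize="*")
    ET.SubElement(table, _q("FIELD"), name="error_code", datatype="int")
    rows = ET.SubElement(ET.SubElement(table, _q("DATA")), _q("TABLEDATA"))
    for message, code in errors:
        row = ET.SubElement(rows, _q("TR"))
        ET.SubElement(row, _q("TD")).text = message
        ET.SubElement(row, _q("TD")).text = str(code)
    return ET.tostring(votable, encoding="unicode")
```

A DALI-only client can stop at the INFO element, while a SimDAL-aware client can read every (error_msg, error_code) row.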

  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.
Answer: Datalink would have made the standard heavier, whereas we only need a tiny subset of Datalink: the subset that was Datalink when we were involved in its definition back in 2012, when F. Bonnarel started thinking about it. Adopting Datalink is something we have in mind for a future version of the standard.

  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.
Answer: The foreign-key mechanism used in the VOTable responses is taken directly from the VOTable 1.3 document. It is a neat and tidy way to link our tables while allowing some extensibility. We define it in the document because it is not formally defined in the VOTable document, just mentioned as a kind of example of a possible use of the GROUP element.

  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.
Answer: These fields are defined in each API endpoints, see for example the result table in {search}, subsection response schema. All FIELDs are described and those that are not mandatory are explicitly declared as optional.


  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).
Answer: Ok, we had not specified anything because our idea is "do not bother with the lifetime of the pagination, we take care of it for you so that it is long-lived enough for interactive browsing in your current session". This is now stated explicitly (section 3.6).

  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.
Answer: We thought the same while writing/implementing the standard, but decided it would be of some use to users and consistent with the majority of existing APIs. After some more thought recently, we agree with you that it should simply be removed to make the standard more straightforward to use.


  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).
Answer: Indeed, we meant "optionally". This has been clarified.
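The timestamp form discussed here (seconds, optionally followed by a decimal point and fractions of seconds, and an optional trailing "Z") can be checked with a short Python sketch. It deliberately covers only the full date-and-time form; DALI 1.1 also allows date-only values, which this hypothetical helper rejects, and the "Z" is accepted but the returned datetime is naive (taken as UTC).

```python
import re
from datetime import datetime

# YYYY-MM-DDThh:mm:ss, optional fraction of seconds, optional trailing Z.
_TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(\.\d+)?Z?")


def parse_timestamp(text: str) -> datetime:
    """Parse a SimDAL-style timestamp into a naive datetime (UTC assumed)."""
    match = _TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid timestamp: {text!r}")
    # strptime also rejects out-of-range fields such as month 13.
    value = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    if match.group(2):
        # Keep up to microsecond precision; further digits are truncated.
        micro = match.group(2)[1:7].ljust(6, "0")
        value = value.replace(microsecond=int(micro))
    return value
```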


  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".
Answer: Done


  • In 4.1, I guess I'd rather start with a "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused about the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) later said under "Parameter".)
Answer: Ok. We modified the presentation of the section.


  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?
Answer: Indeed. Ok.


  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).
Answer: We already have the datatype in the FIELD element, do you think we have to add more ?


  • In 4.2, is the project parameter mandatory?
Answer: No it is not. We now say it explicitly.


  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".
Answer: We corrected the typo. Thank you.

  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to be repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)
Answer: No. In this first version of SimDAL, each parameter can be passed only once. We now say so in the text.
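On the service side, enforcing the one-value-per-parameter rule is straightforward; the following Python sketch rejects any query string in which a parameter appears more than once. The function name and the choice of raising ValueError are assumptions; how the service reports the error to the client (e.g. via the VOTable error document of sect. 3.2) is left out.

```python
from urllib.parse import parse_qs


def single_valued_params(query_string: str) -> dict:
    """Return {name: value}, refusing parameters given more than once."""
    parsed = parse_qs(query_string, keep_blank_values=True)
    repeated = sorted(name for name, values in parsed.items() if len(values) > 1)
    if repeated:
        raise ValueError("parameters may only be given once: " + ", ".join(repeated))
    return {name: values[0] for name, values in parsed.items()}
```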

  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.
Answer: Ok, document updated


  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).
Answer: Ok, this is related to our answer to the previous question about foreign_keys - see above.


  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to). There are just too many flavours of JSON out there.
Answer: You are right, thank you for pointing this to us, we have added the reference.


  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?
Answer: We have done the update following your recommendation.


  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".
Answer: Yes, thank you; this has been clarified (section 6.3).


  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist -- where does the SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?
Answer: You are right about the JCL, we have removed the mention. We also made the 1st point clearer. Thank you.

  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
Answer: Done. Thank you.

-- MarkusDemleitner - 2016-09-13

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( Matthew Graham, Pat Dowler )

Applications Working Group ( Pierre Fernique, Tom Donaldson )

Approved. I have no expertise in Sim, but as an independent feature, the specification seems reasonable and consistent.

-- TomDonaldson - 2016-10-22

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( Mark Cresitello-Dittmar, Laurent Michel )

I reviewed the 20161014 version of the document. I'm wondering whether this is the correct version, as some of my comments appear earlier on this page with answers saying they have been addressed.

Two broad comments:

1) I would like to see some statement about the completeness of the implementations: do they implement all features of the spec?

Added:
>
>
Answer: All mandatory APIs are implemented in the Paris implementation, which does not (yet) implement the optional {cutouts-preview}. The SVO implementation implements most of the mandatory APIs (see its documentation and tutorial for detailed information about the implementation).
 2) In reading it, there seems to be a large overlap with existing standards (DALI, DataLink, etc.), which was also mentioned by others. Since DALI is supposed to be the basis for DAL protocols, I'd like to see this spec expressed more directly in relation to it rather than through statements of compatibility. In reading the responses to similar comments above, and speaking with Francois about this, I understand that this is intentional for this version. A subsequent version can integrate the spec into the DALI family. So, I will not block progress on this count.
Added:
>
>
Answer: Indeed. As explained in another answer concerning Datalink, SimDAL does not use it since it would have made the standard heavier whereas we just need a tiny subset of datalink. As you say, that is something we have in mind for a future version of the standard. Concerning DALI, SimDAL is DALI-compliant.
 General:

Section 1: pg 5 = The final lines state who should implement each of the components, but in very fuzzy terms (e.g., "most of the time the SimDAL Search component"). It would be clearer if it stated specifically who would implement each part. This same fuzzy language is repeated in a couple of places (pg 8).

Changed:
<
<
pg 6: Architecture diagram does not match the one in the master IVOA Architecture document.
>
>
Answer: Thank you for pointing this out. We have reworked the part you mention to make it clearer.
 
Added:
>
>
pg 6: Architecture diagram does not match the one in the master IVOA Architecture document.

Thank you. We will check with the DAL chairs about correcting this IVOA Architecture figure.

 S2.3 - pg10: "the pivot format": as Markus stated, I have no idea what that means; perhaps it is common knowledge to the target audience?
Added:
>
>
Answer: OK, we realize that this is a confusing term, so we have removed it completely.
 S3.0 - pg11: Question marks should be confirmed and removed. 1. "SimDAL components are exposed with APIs following a REST design (?) that conforms to the DALI resource description (?)."
Added:
>
>
Answer: Sorry, the bibliography was generated incorrectly in the published archive. This is now corrected.
 S3.3 - pg12: Question mark in place of reference to JSON standard
Added:
>
>
Answer: Sorry, the bibliography was generated incorrectly in the published archive. This is now corrected.
 S3.5 - pg15: another question mark in place of reference (VOTable)
Added:
>
>
Answer: Sorry, the bibliography was generated incorrectly in the published archive. This is now corrected.
 S3.6 - pg15: "must provide sufficient lifetime for interactive browsing of the pages". This is a rather vague statement for a requirement.
Added:
>
>
Answer: We thought that it is no more arbitrary than specifying a fixed time such as 10 seconds or 10 minutes. Moreover, this "sufficient lifetime" is often very specific to the service, data type, or simulation, and the publisher is best placed to determine it.
 S4.1 - pg17: the description for the 'q' parameter, "The search logic is up to the publisher..." How is a user supposed to know what to put in the field if the logic is up to the publisher? The example q='n(h2)' matches both 'N(H2)' and 'n(H2)', but only because this provider decided to make the search case insensitive?
Added:
>
>
Answer: That is correct. Users learn how the "q" parameter is handled from the publisher's documentation of the extra features. We understand that this could be tricky, so we have added better explanations to the document. Thank you for pointing this out.
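For illustration only, one policy a publisher might choose for q is a case-insensitive substring match on property names, which reproduces the n(h2) example in the comment above. The function match_q and the property names are hypothetical; SimDAL itself leaves the matching logic to the publisher.

```python
def match_q(q: str, property_names: list[str]) -> list[str]:
    """Return the property names matching q, ignoring case.

    Hypothetical publisher-defined policy: q matches any property name that
    contains it as a substring after Unicode case folding.
    """
    needle = q.casefold()
    return [name for name in property_names if needle in name.casefold()]


if __name__ == "__main__":
    names = ["N(H2)", "n(H2)", "T_kin"]  # hypothetical property names
    print(match_q("n(h2)", names))
```

A publisher using exact, case-sensitive matching would instead return only 'n(H2)' for q='n(H2)', which is why the policy has to be documented.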
 S5.6 - pg37: "the same way than the one" => "the same way as the one"
Added:
>
>
Answer: Thank you. We corrected the document.
 S5.6 - pg37: "but in another cases it would" => "but in other cases it would"
Added:
>
>
Answer: Thank you. We corrected the document.
 S6.3 - pg44: UWS extension.. "For various reasons but in particular because of security concerns." I don't know much about this content, but it sounds like it indicates security concerns with the content of the UWS standard document.
Added:
>
>
Answer: Thank you. We reformulated this point.
 Appendix B - pg48: "and some developpers" => "and some developers"
Added:
>
>
Answer: Thank you. We corrected the document.
 Appendix B - pg48: "to have much more properties" => "to have many more properties"
Added:
>
>
Answer: Thank you. It is now also corrected.
 Appendix B - pg48: "the APIs with the willing of putting" => "with the intent" or "with the hope"..
Added:
>
>
Answer: Thank you. Done.
 Appendix B - pg48: "should know about technics" => "the technical aspects"?
Added:
>
>
Answer: Thank you. This is now corrected.
 Appendix B - pg48: "end user to not needing to worry" => "end user to not worry"
Added:
>
>
Answer: Thank you. We corrected the document.
 -- MarkCresitelloDittmar - 2016-10-26
Added:
>
>
Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-11-04
 

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( Markus Demleitner, Theresa Dower )

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

Answer: We organized a meeting in Paris in March 2015/16 dedicated to the question of registering SimDAL components in the registries. Several authors of SimDAL as well as representatives of the DAL WG (Marco) and the Registry WG (Markus) were present.

Several options and questions were identified during the meeting. After this meeting, discussions between the authors of the SimDAL standard concluded that some important use cases require SimDAL repositories:

1) fine-grained discovery of SimDAL services requires the detailed description of theoretical services following SimDM serializations, including their semantic aspects (SKOS).

2) these SimDM descriptions / serializations must be centralized (for example, to have a unique description of a code that publishers of simulations produced by this code can refer to).

The need for such a mechanism (SimDM repositories) to discover and describe theoretical services in the VO was identified a long time ago, before and during the definition of SimDM. Nevertheless, SimDAL components must be registered in the IVOA registries.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Answer: Yes, this precision adds clarification, thank you. We modified the document.

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Answer: During the meeting in Paris in February 2016, we started to look at how to register SimDAL components in the IVOA registries. Two solutions appeared:

1) to map as much as possible of the SimDM serializations (protocol.xml, project.xml) into the IVOA registries using extensions. This solution requires deprecating some SimDM concepts (notions of code version, owner, etc.), slight modifications of the registries (or taking some liberties, such as storing SKOS concepts in UCD fields), developing an XSLT transformation of SimDM serializations, etc.

2) to do a simple registration of SimDAL components in the IVOA registries without extensions using classical fields for registrations: title, description, keywords, publishers, etc ...

Solution 1 is more complex to set up: it requires development work, a prototype to check that it works properly, and tests of the implications for end users. It may be nicer from an IVOA point of view but is not required by the scientific use cases defined by the Theory IG. Nevertheless, we decided to try it, with a deadline of the end of March 2016 to get an operational XSLT mapping from SimDM serializations to the registries. Without progress by the end of March, we would move to solution 2 to avoid delaying the release of the SimDAL standard, which several teams in the astrophysics community are asking for. Solution 1 would then be investigated for SimDAL version 2.0.

No progress had been made by the end of March, so we moved to solution 2. This solution does not need any specific description since it is a standard registration of services in the IVOA registries. It fulfills all the scientific goals.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

Answer: Ok, fixed in the document. Thank you for pointing this out.
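A client or validator could catch this class of error with a purely lexical check. The minimal sketch below, with a hypothetical has_only_uri_chars helper, accepts only characters permitted by RFC 3986 (unreserved and reserved characters plus percent-encoded octets); it does not check URI syntax beyond that, and the brace-free identifier in the example is an assumed corrected form, not one quoted from the specification.

```python
import re

# RFC 3986 character repertoire: unreserved, gen-delims, sub-delims, and
# percent-encoded octets. Curly braces belong to none of these sets.
_URI_CHARS = re.compile(r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")


def has_only_uri_chars(uri: str) -> bool:
    """Return True if every character of uri is allowed by RFC 3986."""
    return _URI_CHARS.fullmatch(uri) is not None


if __name__ == "__main__":
    # Template placeholder left inside the identifier: rejected.
    print(has_only_uri_chars("ivo://ivoa.net/std/SimDALSearch#{views}-1.0"))
    # Hypothetical brace-free form: accepted.
    print(has_only_uri_chars("ivo://ivoa.net/std/SimDALSearch#views-1.0"))
```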

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

Answer: A client knows that a collection resource supports pagination by checking whether its representation (a VOTable) contains the special pagination LINK elements, that is, LINK elements with the content roles described in the "Pagination" section. Thus, a service may implement pagination but decide not to use it for some collections while using it for others. This is now explained in the document.
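A client-side sketch of this discovery rule: scan the returned VOTable for LINK elements carrying a pagination content-role. The role names in PAGINATION_ROLES, the helper name and the example.org URL are placeholders; the real content-role values are those listed in the specification's "Pagination" section.

```python
import xml.etree.ElementTree as ET

# Placeholder content-role values; the actual ones are defined in the
# "Pagination" section of the SimDAL specification.
PAGINATION_ROLES = frozenset({"first", "previous", "next", "last"})


def pagination_links(votable_xml: str, roles=PAGINATION_ROLES) -> dict:
    """Map each pagination content-role found in the VOTable to its href.

    An empty dict means this collection does not use pagination.
    """
    root = ET.fromstring(votable_xml)
    links = {}
    for elem in root.iter():
        # Compare local names so the check works with or without a namespace.
        if elem.tag.rsplit("}", 1)[-1] != "LINK":
            continue
        role, href = elem.get("content-role"), elem.get("href")
        if role in roles and href:
            links[role] = href
    return links


# Hypothetical first page of a paginated collection.
SAMPLE = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">
  <RESOURCE type="results">
    <LINK content-role="next" href="https://example.org/simdal/views?page=2"/>
    <TABLE name="results"/>
  </RESOURCE>
</VOTABLE>"""

if __name__ == "__main__":
    print(pagination_links(SAMPLE))
```

Because the check is done per response, the same service can return paginated and non-paginated collections, as the answer describes.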

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

Answer: Representing tree structures in relational databases (that is, as sets of flat key-value structures) has been common practice for years and is well understood. Of course, this brings some query complexity when reconstructing the tree.

We do not expect hierarchy to be used here: the full information (and the various hierarchies) is actually kept in the SimDM serializations (mostly those of the SimDM project and protocol packages). This is intended to be a simple (flat), yet informative, view of the SimDM serializations (mainly their classes).
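To illustrate the trade-off mentioned in this answer, the sketch below stores a tree as flat (node_id, parent_id, key, value) rows, an adjacency-list layout, and then rebuilds it; the rebuild step is where the extra query complexity lies. The function names and the sample protocol description are hypothetical, and the sketch assumes leaf values are never None, since None marks inner nodes.

```python
from itertools import count


def flatten(tree: dict) -> list:
    """Turn a nested dict into flat (node_id, parent_id, key, value) rows.

    Inner nodes are stored with value None; leaves carry their scalar value.
    """
    rows = []
    ids = count(1)

    def walk(node: dict, parent_id) -> None:
        for key, value in node.items():
            node_id = next(ids)
            if isinstance(value, dict):
                rows.append((node_id, parent_id, key, None))
                walk(value, node_id)
            else:
                rows.append((node_id, parent_id, key, value))

    walk(tree, None)
    return rows


def rebuild(rows: list) -> dict:
    """Reconstruct the nested dict from the flat rows."""
    children = {}
    for node_id, parent_id, key, value in rows:
        children.setdefault(parent_id, []).append((node_id, key, value))

    def build(parent_id) -> dict:
        # A None value marks an inner node whose children must be collected.
        return {
            key: build(node_id) if value is None else value
            for node_id, key, value in children.get(parent_id, [])
        }

    return build(None)


if __name__ == "__main__":
    # Hypothetical fragment of a protocol description.
    protocol = {"protocol": {"name": "code-A", "parameter": {"name": "density", "unit": "cm-3"}}}
    rows = flatten(protocol)
    for row in rows:
        print(row)
    assert rebuild(rows) == protocol
```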

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).

In the VO Registry, the real complex point is the proliferation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling,...) would strengthen our confidence in this part of the standard.

Answer: Yes, that's correct. Thank you for pointing this out, we have updated the document to fix this.

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Semantics Working Group ( Mireille Louys, Alberto Accomazzi )

Education Interest Group ( Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( Carlos Rodrigo )

When I read the introduction of section 5 (SimDAL Search), I get the impression that an unaware reader could conclude that it is impossible to define views containing properties and links for more than one object type. For instance, it explains that, if you have a simulation that provides both the 3D structure of a star and its oscillation spectrum, you have to define two different views. This, in fact, is a misunderstanding.

First, having a simulation that generates different output files for a given set of inputs is quite a common case. Having a different view for each object type (output data file), while possible, would make it almost impossible for a user to search on metadata and get the two, or more, final files corresponding to that "experiment" (linking different views is very inefficient).

Second, it is perfectly possible and natural in SimDAL to define a view that links together all the output files of the same experiment and points to the corresponding Data Access services providing each dataset.

By the way, this is not a "new idea"; it is something that has been discussed and agreed before. It is just not clear in the document.

In my opinion, it should be explicitly stated that views containing different object types can be defined, and it would be nice to give a simple example of a view schema for such a case.

Answer: Thank you Carlos, we have done the update in the document.

Standards and Processes Committee ( Françoise Genova)



Revision 15 - 2016-10-26 - MarkCresitelloDittmar

 
META TOPICPARENT name="TemplateRFC"

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "simdal search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and registries.

  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An Implementation Note was written for the Simulation Data Model; it shows how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard is accepted, we plan to write an Implementation Note presenting how to use SimDAL to publish different categories of simulations and numerical models.

  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP and TAP schemas. TAP was not chosen as a solution because it does not fulfill the requirements for Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body, SPH and MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing the simulated objects (more than 100 000 in one of the SimDAL implementations), and these numbers are growing with the progress of numerical models.

Using TAP would require storing the properties as columns of a relational table, since TAP is strongly coupled to SQL and therefore to relational databases. Managing such high-dimensional data (i.e., that many table columns) is simply not possible in the majority of RDBMSs currently in use, such as PostgreSQL (limited to 1600 columns per table) and MySQL. High-dimensional data are much better served by other types of storage architecture, which publishers cannot use with TAP (or only with great difficulty) when no SQL compatibility layer or adapter is available.

If the definition of SimDAL has taken so long, it is because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems and storage architectures; the conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema) so that publishers already familiar with the VO should not be lost.
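One generic way to store objects with very many properties without one column per property is the entity-attribute-value (EAV) layout: each property value is a row, and only the properties a query needs are pivoted into columns. The sketch below uses Python's built-in sqlite3 module; it illustrates the general technique, not the storage actually used by the SimDAL implementations, and the table, function and property names are hypothetical.

```python
import sqlite3


def make_store(rows):
    """Create an in-memory EAV table with one row per (object, property, value)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE property_values (object_id INTEGER, property TEXT, value REAL)")
    conn.executemany("INSERT INTO property_values VALUES (?, ?, ?)", rows)
    return conn


def pivot(conn, properties):
    """Return one row per object with one column per requested property.

    Only the requested properties become columns, so the number of stored
    properties is not bounded by the database's column limit. Property names
    are bound as parameters; column aliases are generated (c0, c1, ...).
    """
    columns = ", ".join(
        f"MAX(CASE WHEN property = ? THEN value END) AS c{i}"
        for i in range(len(properties))
    )
    sql = (
        f"SELECT object_id, {columns} FROM property_values "
        "GROUP BY object_id ORDER BY object_id"
    )
    return conn.execute(sql, list(properties)).fetchall()


if __name__ == "__main__":
    # Hypothetical objects, each shown with two of its (potentially many) properties.
    store = make_store([
        (1, "n(H2)", 1e4), (1, "T_kin", 50.0),
        (2, "n(H2)", 1e5), (2, "T_kin", 80.0),
    ])
    print(pivot(store, ["n(H2)", "T_kin"]))
```

The pivot query plays a role similar to a SimDAL view: a flat, virtual table exposing a chosen subset of properties, whatever the underlying storage.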

Concerning the SimDAL Repository part:
First, note that SimDAL components (and among them the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model (SimDM), so a search for resources using the SimDM semantics can only be done with SimDAL Repositories. Moreover, SimDAL Repositories are places where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of theoretical projects and codes that are published in the VO. IVOA registries do not have functionalities to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. Nevertheless, this would mean losing the relationships between SimDM classes, and so losing the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not allow users to fully benefit from the SimDM XML serializations, even though most scientific use cases would require fine-grained search in these serializations to efficiently discover protocols and projects of interest. This was a deliberate choice for version 1.0 of SimDAL: in the coming months and years we do not expect many registered IVOA theory services, so it should be easy for users to discover theoretical services with the SimDAL Repositories as presently defined. As more theoretical services are registered, finer-grained search will become necessary. SimDAL Repositories as defined in version 1.0, storing the full XML serializations of projects and protocols, contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included, a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive them properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the InterOp in Sesto, in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations could serve as one. So a client, rather than a simple validator, has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order of the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before doing a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent)

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of end points. You're defining about as many end point types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, xpath or xquery or whatever?

In short, I believe you should split up this document into three pieces, each of which would work out to be more handleable.

Answer: Yes, we have several API endpoints. Of course, we tried our best to keep this number as small as possible while satisfying as many use cases as possible.


It is a big standard, which is why it took so long to release a first version. The other VO (DAL) standards basically serve a single, fixed, underlying data model. As you know, SimDM, which SimDAL serves, is a meta-model, i.e., we are serving an unlimited number of underlying data models. This brings additional use cases that we had to address by adding a complete API.


Several very capable people have addressed the SimDAL problem over the past years, and a lot of progress has been made thanks to them, although the result is still not 100% perfect. The fact is that many people have been waiting for years for a way to publish their theoretical data, and we think we have to give them a solution now.
We need a basis, SimDAL v1, that will be implemented by several teams and will fulfill the most common use cases (80%) on real data in production environments (not theoretical or potential use cases and issues). Then, we hope these teams will collaborate with the IVOA to provide feedback for a v2 that will add some features not planned in the first version.

As for XPath, we made SimDAL partly because we realized that understanding and querying SimDM is too high a prerequisite for a scientist or developer, and that we had to design a query API that is as straightforward as possible, even at the price of not covering 100% of the use cases for now.

We have not split SimDAL because, for the first version of the standard, we do not want to break the workflow logic that carries the user from theoretical project discovery to data retrieval, at the price of developers having to deal with a bigger standard. Instead, we tried to make it clear in the document that readers do not have to implement all three parts. For the next version of the standard, once several teams are familiar with it, splitting the document into several more specific standards can be discussed.

Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.
Answer: Thank you for offering to help with the proofreading! The LaTeX file can be downloaded from the src link on the SimDAL webpage: http://www.ivoa.net/documents/SimDAL/index.html


  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).
Answer: Done. Section 2.2 no longer mentions use cases but requirements.

  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).
Answer: We simplified as much as possible.

  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.
Answer: You are perfectly right. That is corrected.

  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?
Answer: Ok. We clarified this.

  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.
Answer: This has been corrected.


  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says: "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: does the table in the error case have the name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?
Answer: You are right, thank you, we have made the update in the document.
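To make the suggested layout concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes a RESOURCE of type "results" carrying a DALI-style INFO name="QUERY_STATUS" value="ERROR" whose text is displayable as-is, followed by a TABLE with a char column error_msg and an int column error_code, one row per error. The function name build_error_votable and the table name "results" are hypothetical; this illustrates the reviewer's proposal, not the exact schema adopted in the SimDAL document.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def _q(tag: str) -> str:
    """Qualify a VOTable element name with the VOTable 1.3 namespace."""
    return f"{{{VOTABLE_NS}}}{tag}"


def build_error_votable(summary: str, errors: list[tuple[str, int]]) -> str:
    """Build a hypothetical SimDAL error document.

    summary: user-displayable text for the DALI-style QUERY_STATUS INFO.
    errors: (error_msg, error_code) pairs, written as one table row each.
    """
    ET.register_namespace("", VOTABLE_NS)  # serialise VOTable as the default namespace
    votable = ET.Element(_q("VOTABLE"), version="1.3")
    resource = ET.SubElement(votable, _q("RESOURCE"), type="results")
    # A single INFO whose text can be shown to the user directly, as DALI asks.
    info = ET.SubElement(resource, _q("INFO"), name="QUERY_STATUS", value="ERROR")
    info.text = summary
    # Hypothetical table holding the detailed list of errors.
    table = ET.SubElement(resource, _q("TABLE"), name="results")
    ET.SubElement(table, _q("FIELD"), name="error_msg", datatype="char", arraysize="*")
    ET.SubElement(table, _q("FIELD"), name="error_code", datatype="int")
    rows = ET.SubElement(ET.SubElement(table, _q("DATA")), _q("TABLEDATA"))
    for msg, code in errors:
        tr = ET.SubElement(rows, _q("TR"))
        ET.SubElement(tr, _q("TD")).text = msg
        ET.SubElement(tr, _q("TD")).text = str(code)
    return ET.tostring(votable, encoding="unicode")
```

Keeping the summary in the INFO lets a generic DALI client display something useful without knowing about the error table, while a SimDAL-aware client can still read the individual (error_msg, error_code) rows.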

  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.
Answer: Datalink would have made the standard heavier, whereas we just need a tiny subset of Datalink: the subset that was Datalink when we were involved in its definition back in 2012, when F. Bonnarel started thinking about it. That is something we have in mind for a future version of the standard.

  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.
Answer: The foreign-key mechanism is used in VOTable responses and is directly taken from the VOTable 1.3 document. It is a neat and tidy way to link our tables while allowing some extensibility. We define it in the document because it is not formally defined in the VOTable document, just mentioned as an example/possible use of the GROUP element.

  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.
Answer: These fields are defined for each API endpoint; see for example the result table in {search}, subsection response schema. All FIELDs are described, and those that are not mandatory are explicitly declared as optional.


  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).
Answer: Ok, we have not specified anything because our idea is "do not bother with the lifetime of the pagination, we take care of it for you so that it is long-lived enough for you to do interactive browsing in your current session". This has been clarified (section 3.6).
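The standard leaves the reporting of an expired link to the service. As a minimal client-side sketch, assuming each page can be fetched as (rows, next_page URL or None), a client could walk a paginated collection as follows. The function iterate_pages, the fetch_page callable and the example.org URLs are hypothetical; a real client would fetch VOTables over HTTP and surface whatever error the service returns for an expired link.

```python
from typing import Callable, Iterator, Optional

# A page is (rows, URL of the next page or None when this is the last page).
Page = tuple[list[dict], Optional[str]]


def iterate_pages(first_url: str, fetch_page: Callable[[str], Page]) -> Iterator[dict]:
    """Yield every row of a paginated collection by following next_page links."""
    seen: set[str] = set()
    url: Optional[str] = first_url
    while url is not None:
        if url in seen:
            # Guard against a misbehaving service that links pages in a cycle.
            raise RuntimeError(f"pagination loop detected at {url}")
        seen.add(url)
        rows, url = fetch_page(url)
        yield from rows


# Hypothetical in-memory stand-in for a service: URL -> (rows, next_page URL).
PAGES: dict[str, Page] = {
    "https://example.org/views?page=1": ([{"id": 1}, {"id": 2}], "https://example.org/views?page=2"),
    "https://example.org/views?page=2": ([{"id": 3}], None),
}

ids = [row["id"] for row in iterate_pages("https://example.org/views?page=1", PAGES.__getitem__)]
```

Because the generator fetches lazily, an interactive client only requests the next page when the user scrolls to it, which is the browsing pattern the answer above assumes.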

  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.
Answer: We thought the same while writing/implementing the standard, but decided it would be of some use to users and consistent with the majority of existing APIs. After some more thinking recently, however, we agree with you that it should simply be removed to make the standard more straightforward to use.


  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).
Answer: Indeed, we meant "optionally". This has been clarified.
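A minimal validator sketch for the clarified rule, assuming an ISO 8601 form YYYY-MM-DDThh:mm:ss (the exact date part of section 3.7 is an assumption here), optionally followed by a decimal point and fractional seconds, and optionally terminated by "Z" as the reviewer suggests. The names TIMESTAMP_RE and is_valid_timestamp are hypothetical.

```python
import re

# Assumed syntax: YYYY-MM-DDThh:mm:ss, optionally followed by a decimal point
# and fractional seconds, optionally terminated by "Z" (UTC).
# re.ASCII keeps \d from matching non-ASCII digits.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?", re.ASCII)


def is_valid_timestamp(value: str) -> bool:
    """Syntax-only check: the whole string must match TIMESTAMP_RE."""
    return TIMESTAMP_RE.fullmatch(value) is not None
```

This is a syntax check only: a value such as 2016-13-40T25:00:00 would still pass, so a service would additionally need a range check, for example by parsing the value with the datetime module.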


  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".
Answer: Done


  • In 4.1, I guess I'd rather start with the "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused about the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) later said under "Parameter".)
Answer: Ok. We modified the presentation of the section.


  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?
Answer: Indeed. Ok.


  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).
Answer: We already have the datatype in the FIELD element, do you think we have to add more ?


  • In 4.2, is the project parameter mandatory?
Answer: No it is not. We now say it explicitly.


  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".
Answer: We corrected the typo. Thank you.

  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to be repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)
Answer: The answer is no. In this first version of SimDAL, each parameter can be passed only once. We now say so in the text.
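On the service side, the "each parameter only once" rule can be enforced while parsing the query string. A minimal sketch with the standard urllib.parse.parse_qs; the function name parse_single_valued is hypothetical, and how the resulting error is reported to the client (for instance in the VOTable error format) is left to the service.

```python
from urllib.parse import parse_qs


def parse_single_valued(query_string: str) -> dict[str, str]:
    """Parse a query string, rejecting any parameter that appears more than once."""
    params = parse_qs(query_string, keep_blank_values=True)
    repeated = sorted(name for name, values in params.items() if len(values) > 1)
    if repeated:
        raise ValueError("parameters may be given only once: " + ", ".join(repeated))
    return {name: values[0] for name, values in params.items()}
```

Rejecting repeated parameters explicitly, rather than silently keeping the first or last value, tells a client that expected multi-valued behaviour that this version does not offer it.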

  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.
Answer: Ok, document updated
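To illustrate the reviewer's suggestion of defining the data structure abstractly: a view can be modelled as a selection of rows from a relation followed by a projection onto some of its columns, independent of any serialisation. A minimal sketch with rows as Python dicts; the function view and all column names are hypothetical. Note that a single row may carry links to several outputs of the same experiment.

```python
from typing import Callable, Iterable

Row = dict[str, object]


def view(relation: Iterable[Row], columns: list[str],
         where: Callable[[Row], bool] = lambda row: True) -> list[tuple]:
    """Select the rows satisfying `where`, then project them onto `columns`.

    Duplicates are kept (bag semantics, as in SQL) rather than removed as in
    set-based relational algebra.
    """
    return [tuple(row[c] for c in columns) for row in relation if where(row)]


# Hypothetical relation: one row per experiment, with illustrative column names.
experiments = [
    {"experiment_id": 1, "mass": 1.0, "metallicity": 0.02, "structure_url": "s1", "spectrum_url": "p1"},
    {"experiment_id": 2, "mass": 1.5, "metallicity": 0.02, "structure_url": "s2", "spectrum_url": "p2"},
    {"experiment_id": 3, "mass": 1.5, "metallicity": 0.004, "structure_url": "s3", "spectrum_url": "p3"},
]

solar = view(experiments, ["experiment_id", "mass"], lambda r: r["metallicity"] == 0.02)
```

Whether the relation lives in a tab-separated file, an RDBMS or another store is then an implementation detail; the VOTable is only the serialisation of the view's result.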


  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).
Answer: Ok, this is related to our answer to the previous question about foreign_keys - see above.


  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to). There are just too many flavours of JSON out there.
Answer: You are right, thank you for pointing this to us, we have added the reference.


  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?
Answer: We have done the update following your recommendation.


  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".
Answer: Yes, thank you this is clarified (section 6.3)
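Since the job creation request does not return the filled-in results immediately, a client typically polls the job's phase and reads the results only once execution has finished. A minimal sketch, assuming a hypothetical get_phase callable that would read the job's UWS phase resource; the phase names are those defined by UWS.

```python
import time
from typing import Callable

# UWS phases in which a job has finished executing (ARCHIVED was added in UWS 1.1).
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED", "ARCHIVED"}


def wait_for_job(get_phase: Callable[[], str], poll_interval: float = 1.0,
                 timeout: float = 60.0) -> str:
    """Poll a job's phase until it is terminal; return that phase."""
    deadline = time.monotonic() + timeout
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in phase {phase} after {timeout} s")
        time.sleep(poll_interval)
```

Only when the returned phase is COMPLETED would the client go on to read the results list of the UWS document. UWS 1.1 services can also offer a blocking WAIT parameter that avoids busy polling.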


  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist: where does SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?
Answer: You are right about the JCL, we have removed the mention. We also made the 1st point clearer. Thank you.

  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
Answer: Done. Thank you.

-- MarkusDemleitner - 2016-09-13

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair (Matthew Graham, Pat Dowler)

Applications Working Group (Pierre Fernique, Tom Donaldson)

Approved. I have no expertise in Sim, but as an independent feature, the specification seems reasonable and consistent.

-- TomDonaldson - 2016-10-22

Data Access Layer Working Group (François Bonnarel, Marco Molinaro)

Data Model Working Group (Mark Cresitello-Dittmar, Laurent Michel)

I reviewed the 20161014 version of the document. I'm wondering if this is correct, as some of my comments appear earlier, with answers stating that they have been addressed.

Two broad comments:

1) I would like to see some statement about the completeness of the implementations: do they implement all features of the spec?

2) In reading it, there seems to be a large overlap with existing standards (DALI, DataLink, etc.), which was also mentioned by others. Since DALI is supposed to be the basis for DAL protocols, I'd like to see this spec expressed more directly in relation to it rather than through statements of compatibility. In reading the responses to similar comments above, and speaking with Francois about this, I understand that this is intentional for this version. A subsequent version can integrate the spec into the DALI family. So, I will not block progress on this count.

General:

Section 1 - pg5: The final lines state who should implement each of the components, but in very fuzzy terms (e.g., "most of the time the SimDAL Search component"). It would be clearer if it stated specifically who would implement each part. This same fuzzy language is repeated in a couple of places (pg8).

pg 6: Architecture diagram does not match the one in the master IVOA Architecture document.

S2.3 - pg10: "the pivot format": as Markus stated, I have no idea what that means; perhaps it is common knowledge to the target audience?

S3.0 - pg11: Question marks should be confirmed and removed. 1. "SimDAL components are exposed with APIs following a REST design (?) that conforms to the DALI resource description (?)."

S3.3 - pg12: Question mark in place of reference to JSON standard

S3.5 - pg15: another question mark in place of reference (VOTable)

S3.6 - pg15: "must provide sufficient lifetime for interactive browsing of the pages". This is a rather vague statement for a requirement.

S4.1 - pg17: the description of the 'q' parameter says "The search logic is up to the publisher.." How is a user supposed to know what to put in the field if the logic is up to the publisher? The example q='n(h2)' matches both 'N(H2)' and 'n(H2)', but only because this provider decided to make the search case-insensitive?
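To illustrate the point: the example behaviour follows from one particular publisher choice, case-insensitive substring matching. A minimal sketch of that choice; the functions matches and search are hypothetical and do not describe any specific SimDAL implementation.

```python
def matches(q: str, text: str) -> bool:
    """One possible publisher-defined rule: case-insensitive substring match."""
    return q.casefold() in text.casefold()


def search(q: str, entries: list[str]) -> list[str]:
    """Return the entries matched by q, in their original order."""
    return [entry for entry in entries if matches(q, entry)]
```

With this rule, search("n(h2)", ["N(H2)", "n(H2)", "n(CO)"]) returns the two H2 entries; under an exact-match rule, the same q would match neither, which is why the reviewer asks for the logic to be pinned down.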

S5.6 - pg37: "the same way than the one" => "the same way as the one"

S5.6 - pg37: "but in another cases it would" => "but in other cases it would"

S6.3 - pg44: UWS extension: "For various reasons but in particular because of security concerns." I don't know much about this content, but it sounds like it is indicating security concerns with the content of the UWS document standard.

Appendix B - pg48: "and some developpers" => "and some developers"

Appendix B - pg48: "to have much more properties" => "to have many more properties"

Appendix B - pg48: "the APIs with the willing of putting" => "with the intent" or "with the hope"..

Appendix B - pg48: "should know about technics" => "the technical aspects"?

Appendix B - pg48: "end user to not needing to worry" => "end user to not worry"

-- MarkCresitelloDittmar - 2016-10-26

 

Grid & Web Services Working Group (Brian Major, Giuliano Taffoni)

Registry Working Group (Markus Demleitner, Theresa Dower)

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

Answer: We organized a meeting in Paris in March 2015/16 dedicated to the question of the registration of SimDAL components in the registries. Several authors of SimDAL as well as representatives of the DAL WG (Marco) and Registry WG (Markus) were present.

Several options and questions were identified during the meeting. After this meeting, discussions between the authors of the SimDAL standard concluded that some important use cases require SimDAL repositories:

1) fine discovery of SimDAL services requires the detailed description of theoretical services following SimDM serializations, including their semantic aspects (SKOS).

2) these SimDM descriptions / serializations must be centralized (for example to have a unique description of a code to which publishers of simulations produced by this code can refer to).

The need for such a means (SimDM repositories) to discover and describe theoretical services in the VO was identified long ago, before and during the definition of SimDM. Nevertheless, SimDAL components must be registered in the IVOA registries.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Answer: Yes, this precision adds clarification, thank you. We modified the document.

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Answer: During the meeting in Paris in February 2016, we started to look at how to register SimDAL components in the IVOA registries. Two solutions appeared:

1) to map as much as possible of the SimDM serializations (protocol.xml, project.xml) into the IVOA registries using extensions. This solution requires deprecating some SimDM concepts (notions of code version, owner, etc.), slightly modifying the registries (or taking some liberties, such as storing SKOS concepts in UCD fields), developing an XSLT transformation of SimDM serializations, etc.

2) to do a simple registration of SimDAL components in the IVOA registries without extensions using classical fields for registrations: title, description, keywords, publishers, etc ...

Solution 1 is more complex to set up: it requires development work, a prototype to check that it works properly, and tests of the implications for end users. It may be nicer from an IVOA point of view but is not required by the scientific use cases defined by the Theory I.G. Nevertheless, we decided to try it, with a deadline of the end of March 2016 to get an operational XSLT mapping from SimDM serializations to the registries. Without progress by the end of March, we would move to solution 2, so as not to delay the release of the SimDAL standard that several teams in the astrophysics community are asking for. Solution 1 would then be investigated for SimDAL version 2.0.

No progress had been made by the end of March, so we moved to solution 2. This solution does not need any specific descriptions since it is a standard registration of services in the IVOA registries. It fulfills all the scientific goals.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

Answer: Ok, fixed in the document. Thank you for pointing this out.
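A simple character-level check catches this class of problem before a standardId is published. The sketch below accepts only the characters RFC 3986 allows literally (unreserved and reserved sets) plus well-formed percent-encoded octets; it is not a full URI grammar check. The function name has_only_uri_chars is hypothetical, and the brace-free identifier in the usage note is an illustrative form, not necessarily the one adopted in the document.

```python
import string

# Characters RFC 3986 allows literally: unreserved, gen-delims and sub-delims.
# "%" is allowed only as the start of a percent-encoded octet.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
RESERVED = set(":/?#[]@" + "!$&'()*+,;=")


def has_only_uri_chars(uri: str) -> bool:
    """Check that uri uses only characters RFC 3986 permits (syntax check only)."""
    i = 0
    while i < len(uri):
        ch = uri[i]
        if ch == "%":
            # A percent sign must be followed by exactly two hexadecimal digits.
            if len(uri) < i + 3 or not all(c in string.hexdigits for c in uri[i + 1:i + 3]):
                return False
            i += 3
            continue
        if ch not in UNRESERVED and ch not in RESERVED:
            return False
        i += 1
    return True
```

With this check, "ivo://ivoa.net/std/SimDALSearch#{views}-1.0" is rejected because of the curly braces, while a brace-free form such as "ivo://ivoa.net/std/SimDALSearch#views-1.0" passes.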

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

Answer: A client knows that a collection resource supports pagination by looking in the resource representation (a VOTable) for the special pagination LINK elements (that is, LINK elements with the special content roles described in the "Pagination" section). Thus, a service may implement pagination but decide not to use it for some collections while using it for others. This is now explained in the document.
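The discovery rule described in the answer can be sketched as follows with Python's standard xml.etree.ElementTree. The role names next_page and previous_page are an assumption taken from the links mentioned in the review; the exact content-role values are those defined in the Pagination section of the standard. The function name pagination_links is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed role names; check the Pagination section of the standard for the exact values.
PAGINATION_ROLES = {"next_page", "previous_page"}


def pagination_links(votable_xml: str) -> dict[str, str]:
    """Return {content-role: href} for the pagination LINKs found in a VOTable."""
    root = ET.fromstring(votable_xml)
    links = {}
    for elem in root.iter():
        # Compare local names so the check works with or without the VOTable namespace.
        if elem.tag.rsplit("}", 1)[-1] == "LINK":
            role = elem.get("content-role")
            if role in PAGINATION_ROLES and elem.get("href"):
                links[role] = elem.get("href")
    return links
```

An empty result means the collection is not paginated in this response, which matches the answer's point that a service may paginate some collections and not others.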

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

Answer: Representing tree structures in relational databases (i.e., as sets of flat key-value structures) has been common practice for years and is well understood. Of course, this brings some query complexity for reconstructing the tree.

We do not expect hierarchies to be used here; the full information (and the various hierarchies) is actually kept in the SimDM serializations (mostly those of the project and protocol SimDM packages). This is intended to be a simple (flat), yet informative, view of the SimDM serializations (mainly of their classes).

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).

In the VO Registry, the real complex point is the proliferation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling,...) would strengthen our confidence in this part of the standard.

Answer: Yes, that's correct. Thank you for pointing this out, we have updated the document to fix this.

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Semantics Working Group (Mireille Louys, Alberto Accomazzi)

Education Interest Group (Massimo Ramella, Sudhanshu Barway)

Time Domain Interest Group (John Swinbank, Dave Morris)

Data Curation & Preservation Interest Group (Françoise Genova)

Operations Interest Group (Tom McGlynn, Mark Taylor)

Knowledge Discovery Interest Group (Kaï Polsterer)

Theory Interest Group (Carlos Rodrigo)

When I read the introduction of section 5 (SimDAL Search), I get the impression that an unaware reader could conclude that it is impossible to define views containing properties and links to more than one object type. For instance, it explains that if you have a simulation that provides both the 3D structure of a star and its oscillation spectrum, you have to define two different views. This is, in fact, a misunderstanding.

First, a simulation that generates different output files for a given set of inputs is quite a common case. Having a different view for each object type (output data file), while possible, would make it almost impossible for a user to search on metadata and get the two or more final files corresponding to that "experiment" (linking between different views is very inefficient).

Second, it is perfectly possible and natural in SimDAL to define a view that links together all the output files of the same experiment and points to the corresponding Data Access services providing each dataset.

By the way, this is not a "new idea"; it is something that has been discussed and agreed upon before. It is just not clear in the document.

In my opinion, it should be stated explicitly that views containing different object types can be defined. It would also be nice to give a simple example of a view schema for such a case.

Answer: Thank you Carlos, we have done the update in the document.

Standards and Processes Committee (Françoise Genova)



Revision 14 - 2016-10-22 - TomDonaldson

 

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the SimDAL Search services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and registries.

  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An implementation note was written for the Simulation Data Model. It presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard is accepted, we plan to write an Implementation Note that will present how to use SimDAL to publish different categories of simulations / numerical models.

  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP/TAP schemas. TAP has not been chosen as a solution because it does not fulfill the requirements for Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a large number of properties characterizing simulated objects (> 100 000 in one of the SimDAL implementations). These numbers are growing with the progress of numerical models.

Using TAP would require storing these properties as columns of a table in a relational database, since TAP is tightly coupled to SQL and hence to relational databases. This is simply not possible with the majority of the RDBMSs currently in use (PostgreSQL, MySQL), which cannot manage such high-dimensional data (i.e., so many table columns). High-dimensional data and their use are much better served by other types of storage architecture, which publishers could not use with TAP (or only with great difficulty, which makes no sense) unless they have an SQL compatibility layer or adapter.

Note that the definition of SimDAL has taken so long because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems and storage architectures. The conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer depending on the particularities of their data)
2 - it is as similar as possible to TAP (virtual table + view schema) so that publishers already familiar with the VO should not be lost.

Concerning the SimDAL Repository part:
First, note that SimDAL components (and among them the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model. So it is only with SimDAL Repositories that a search for resources can be done using the SimDM semantics. Moreover, SimDAL Repositories are the places where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes that are published in the VO. IVOA registries do not have functionalities to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry W.G.) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. Nevertheless, this would be done at the cost of losing the relationships between SimDM classes, and so losing the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not allow users to fully benefit from the SimDM XML serializations, even though most scientific use cases would require fine-grained search in these serializations to efficiently discover protocols and projects of interest. This was a choice made for version 1.0 of SimDAL. Indeed, in the coming months or years we do not expect many IVOA theory services to be registered, so it should be easy for users to discover theoretical services with the SimDAL Repositories as presently defined. Nevertheless, as more and more theoretical services are registered, finer-grained search will become necessary. SimDAL Repositories as defined in version 1.0, which store the full XML serializations of projects and protocols, contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of the Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included; a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive them properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the Sesto InterOp in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations is a validator. So a client, rather than a simple validator, has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order in the top menu: Search in the Repository, then do a SimDAL Search, and finally search in Access data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before running a {search} or requesting the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent)

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of end points. You're defining about as many end point types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, xpath or xquery or whatever?

In short, I believe you should split up this document into three pieces, each of which would work out to be more handleable.

Answer: Yes, we have several API endpoints. Of course, we tried our best to keep this number as small as possible while satisfying as many use cases as possible.


It is a big standard, which is why it took so long and was so hard to release a first version of it. The other VO (DAL) standards basically serve a single, fixed, underlying data model. As you know, SimDM, which SimDAL serves, is a meta-model, i.e. we are serving an infinite number of underlying data models. This brings some additional use cases that we had to address by adding a complete API.


Several very smart people have addressed the SimDAL problem over the past years, and a lot of progress has been made thanks to them, even though the result is still not 100% perfect. The fact is that many people have been waiting for years for a way to publish their theoretical data, and we think we have to give them a solution now.
We need a basis, SimDAL v1, that will be implemented by several teams and will fulfill the most common use cases (80%) on real data in production environments (rather than theoretical or potential use cases and issues). Then, we hope these teams will collaborate with the IVOA to provide feedback for a v2 that will add features not planned in the first version.

As for xpath, we designed SimDAL partly because we realized that understanding and querying SimDM is too high a prerequisite for a scientist or developer, and that we had to design a query API that is as straightforward as possible, even if this comes at the price of not covering 100% of the use cases for now.

We have not split SimDAL because, for the first version of the standard, we do not want to break the workflow logic that carries the user from theoretical project discovery to data retrieval, at the price of developers having to deal with a bigger standard. Instead, we tried to make clear in the document that readers do not have to implement all three parts. For the next version of the standard, once several teams are familiar with it, splitting the document into several more specific standards can be discussed.

Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.
Answer: Thank you for offering to help with the proofreading! The LaTeX file can be downloaded from the src link on the SimDAL webpage: http://www.ivoa.net/documents/SimDAL/index.html


  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).
Answer: Done. Section 2.2 no longer mentions use cases but requirements.

  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).
Answer: We simplified as much as possible.

  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.
Answer: You are perfectly right. That is corrected.

  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?
Answer: Ok. We clarified this.

  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.
Answer: That is corrected


  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says: "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: does the table in the error case have a name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?
Answer: You are right, thank you, we have made the update in the document.
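The DALI-compatible error encoding discussed in this comment can be illustrated from the client side. This is a minimal sketch, assuming a VOTable 1.3 response with an INFO element named QUERY_STATUS carrying a user-displayable message, plus a TABLE whose rows hold the error_msg and error_code columns named in the comment; the helper name parse_error_response and the sample messages and codes are hypothetical.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the XML namespace from a tag, e.g. '{ns}TD' -> 'TD'."""
    return tag.rsplit("}", 1)[-1]


def parse_error_response(xml_text):
    """Return (status, user_message, [(error_msg, error_code), ...]).

    Hypothetical helper: assumes a DALI-style INFO name="QUERY_STATUS"
    and a TABLE whose FIELDs include error_msg and error_code.
    """
    root = ET.fromstring(xml_text)
    status, message = None, None
    for info in root.iter():
        if _local(info.tag) == "INFO" and info.get("name") == "QUERY_STATUS":
            status = info.get("value")
            message = (info.text or "").strip()
            break
    errors = []
    for table in root.iter():
        if _local(table.tag) != "TABLE":
            continue
        names = [f.get("name") for f in table if _local(f.tag) == "FIELD"]
        if "error_msg" not in names or "error_code" not in names:
            continue
        i_msg, i_code = names.index("error_msg"), names.index("error_code")
        for tr in table.iter():
            if _local(tr.tag) != "TR":
                continue
            cells = [(td.text or "").strip() for td in tr if _local(td.tag) == "TD"]
            errors.append((cells[i_msg], int(cells[i_code])))
    return status, message, errors


# Illustrative response; the message texts and code values are placeholders.
sample = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">
  <RESOURCE type="results">
    <INFO name="QUERY_STATUS" value="ERROR">Invalid request: 2 errors</INFO>
    <TABLE name="results">
      <FIELD name="error_msg" datatype="char" arraysize="*"/>
      <FIELD name="error_code" datatype="int"/>
      <DATA><TABLEDATA>
        <TR><TD>Unknown parameter: foo</TD><TD>1</TD></TR>
        <TR><TD>Missing parameter: q</TD><TD>2</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

status, message, errors = parse_error_response(sample)
```

Keeping the INFO text displayable means a generic DALI client can show `message` directly, while a SimDAL-aware client can also list the individual rows in `errors`.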

  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.
Answer: Datalink would have made the standard heavier, whereas we only need a tiny subset of Datalink: the subset that Datalink consisted of when we were involved in its definition back in 2012, when F. Bonnarel started thinking about it. Using Datalink is something we have in mind for a future version of the standard.

  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.
Answer: The foreign-key mechanism is used in VOTable responses and is directly taken from the VOTable 1.3 document. It is a neat and tidy way to link our tables while allowing some extensibility. We define it in the document because it is not formally defined in the VOTable document, only mentioned as a kind of example/possible use of the GROUP element.

  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.
Answer: These fields are defined for each API endpoint; see for example the result table in {search}, subsection "response schema". All FIELDs are described, and those that are not mandatory are explicitly declared as optional.


  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).
Answer: OK. We had not specified anything because our idea is "do not bother with the lifetime of the pagination; we take care of it for you so that it is long-lived enough for interactive browsing in your current session". This has now been specified (section 3.6).

  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.
Answer: We thought the same while writing/implementing the standard, but decided it would be of some use to users to be consistent with the majority of existing APIs. However, after some more thought recently, we agree with you that this should simply be removed to make the standard more straightforward to use.


  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).
Answer: Indeed, we meant "optionally". This has been clarified.
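The timestamp form discussed here can be checked with a short validator. This is a sketch, assuming both the date and the time part are required, that fractional seconds are optional, and that a trailing "Z" is optionally allowed as the reviewer suggests; the normative grammar is the one in the specification, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Hypothetical check for the timestamp form discussed in this comment:
# YYYY-MM-DDThh:mm:ss, optionally followed by a decimal point and
# fractional seconds, optionally terminated by "Z".
TIMESTAMP_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?", flags=re.ASCII
)


def is_valid_timestamp(value: str) -> bool:
    if TIMESTAMP_RE.fullmatch(value) is None:
        return False
    try:
        # Validate calendar and clock ranges on the fixed-width prefix.
        datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True
```

A decimal point with no digits after it is rejected, which matches the reading of "optionally followed by a decimal point and fractions of seconds".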


  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".
Answer: Done


  • In 4.1, I guess I'd rather start with a "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused about the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) later said under "Parameter".)
Answer: Ok. We modified the presentation of the section.


  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?
Answer: Indeed. Ok.


  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).
Answer: We already have the datatype in the FIELD element, do you think we have to add more ?


  • In 4.2, is the project parameter mandatory?
Answer: No it is not. We now say it explicitly.


  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".
Answer: We corrected the typo. Thank you.

  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to be repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)
Answer: The answer is no. In this first version of SimDAL, a given parameter can only be passed once. We now say so in the text.
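Since each parameter may be given only once in this version, a service needs to detect repeated parameters. A minimal sketch of that check, assuming a GET request whose query string has already been extracted; the function name is hypothetical and the parameter values are placeholders.

```python
from urllib.parse import parse_qs


def repeated_parameters(query_string: str) -> list:
    """Return, sorted, the parameter names given more than once."""
    params = parse_qs(query_string, keep_blank_values=True)
    return sorted(name for name, values in params.items() if len(values) > 1)


# A service could answer a non-empty result with an error response.
repeats = repeated_parameters("project=p1&project=p2&q=star")
```

This sketch compares names case-sensitively; if parameter names are meant to be case-insensitive, they would need to be normalised before counting.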

  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.
Answer: Ok, document updated
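The reading suggested in this comment, an underlying relation of which a view is a selected and projected subset, can be illustrated independently of any serialisation. A minimal sketch, assuming the relation is held as a list of Python dicts; the column names and the view function are hypothetical, not taken from the specification.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Row = Dict[str, object]


def view(relation: Iterable[Row], columns: Sequence[str],
         predicate: Callable[[Row], bool] = lambda row: True) -> List[Row]:
    """Keep the rows satisfying `predicate`, then project each one onto
    `columns`. Like SQL (and unlike pure relational algebra), duplicate
    rows are kept."""
    return [{c: row[c] for c in columns} for row in relation if predicate(row)]


# Hypothetical relation: one tuple per experiment, placeholder values.
experiments = [
    {"experiment_id": "exp-1", "mass": 0.8, "metallicity": 0.02, "code": "X"},
    {"experiment_id": "exp-2", "mass": 1.5, "metallicity": 0.01, "code": "X"},
]

heavy = view(experiments, ["experiment_id", "mass"], lambda r: r["mass"] > 1.0)
```

Whether the relation lives in a database, in files or elsewhere does not change this definition, which is the point of separating the model from its implementation.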


  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).
Answer: Ok, this is related to our answer to the previous question about foreign_keys - see above.


  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to). There are just too many flavours of JSON out there.
Answer: You are right, thank you for pointing this to us, we have added the reference.


  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?
Answer: We have done the update following your recommendation.


  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".
Answer: Yes, thank you this is clarified (section 6.3)
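To make the point concrete: the response to job creation need not contain results yet, so a client has to wait until the job reaches a terminal phase. A minimal polling sketch, assuming the UWS execution phase names; fetch_phase and wait_for_job are hypothetical names, and transport (HTTP requests, authentication) is left out.

```python
import time
from typing import Callable

# UWS phases after which the job will not run any further.
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED", "ARCHIVED"}


def wait_for_job(fetch_phase: Callable[[], str], poll_interval: float = 2.0,
                 timeout: float = 600.0) -> str:
    """Poll a UWS job until it reaches a terminal phase and return it.

    `fetch_phase` is a hypothetical callable; a real client might GET the
    job's phase resource and return its text body.
    """
    deadline = time.monotonic() + timeout
    while True:
        phase = fetch_phase().strip().upper()
        if phase in TERMINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {phase} after {timeout} s")
        time.sleep(poll_interval)


# Simulated phase sequence for a job that eventually completes.
_phases = iter(["PENDING", "QUEUED", "EXECUTING", "COMPLETED"])
final_phase = wait_for_job(lambda: next(_phases), poll_interval=0.0)
```

Only when the returned phase is COMPLETED would the client go on to read the results element of the job.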


  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist -- where does the SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?
Answer: You are right about the JCL, we have removed the mention. We also made the 1st point clearer. Thank you.

  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
Answer: Done. Thank you.

-- MarkusDemleitner - 2016-09-13

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( Matthew Graham, Pat Dowler )

Applications Working Group ( Pierre Fernique, Tom Donaldson )

Approved. I have no expertise in Sim, but as an independent feature, the specification seems reasonable and consistent.

-- TomDonaldson - 2016-10-22

 

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( Markus Demleitner, Theresa Dower )

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

Answer: We organized a meeting in Paris in March 2015/16 dedicated to the question of registering SimDAL components in the registries. Several authors of SimDAL, as well as representatives of the DAL WG (Marco) and the Registry WG (Markus), were present.

Several options and questions were identified during the meeting. After this meeting, discussions between the authors of the SimDAL standard concluded that some important use cases require SimDAL repositories:

1) fine-grained discovery of SimDAL services requires the detailed description of theoretical services following SimDM serializations, including their semantic aspects (SKOS).

2) these SimDM descriptions/serializations must be centralized (for example, to have a unique description of a code to which publishers of simulations produced by this code can refer).

The need for such a way (using SimDM repositories) to discover and describe theoretical services in the VO was identified a long time ago, before and during the definition of SimDM. Nevertheless, SimDAL components must be registered in the IVOA registries.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Answer: Yes, this precision adds clarification, thank you. We modified the document.

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Answer: During the meeting in Paris in February 2016, we started to look at how to register SimDAL components in the IVOA registries. Two solutions appeared:

1) to map as much as possible of the SimDM serializations (protocol.xml, project.xml) into the IVOA registries using extensions. This solution requires deprecating some SimDM concepts (notions of code version, owner, etc.), slight modifications of the registries (or taking some liberties, such as storing SKOS concepts in UCD fields), developing an XSLT transformation of SimDM serializations, etc.

2) to do a simple registration of SimDAL components in the IVOA registries without extensions using classical fields for registrations: title, description, keywords, publishers, etc ...

Solution 1 is more complex to set up: it requires development work, a prototype to check that it works properly, and tests of the implications for end users. It may be nicer from an IVOA point of view but is not required by the scientific use cases defined by the Theory I.G. Nevertheless, we decided to try it, with a deadline of the end of March 2016 to get an operational XSLT mapping from SimDM serializations to the registries. Without progress by the end of March, we would move to solution 2 to avoid delaying the release of the SimDAL standard, which has been requested by several teams in the astrophysics community. Solution 1 would then be investigated for SimDAL version 2.0.

No progress had been made by the end of March, so we moved towards solution 2. This solution does not need any specific descriptions since it is a standard registration of services in the IVOA registries. It fulfills all the scientific goals.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

Answer: Ok, fixed in the document. Thank you for pointing this out.
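Curly braces fall outside the characters RFC 3986 allows in a URI. A minimal character-level check, as a sketch: it accepts unreserved and reserved characters plus percent-encoded octets, but it does not validate full URI syntax. The brace-free identifier in the usage line is only an illustration; the normative standardIds are those given in the document.

```python
import re

# Characters RFC 3986 permits in a URI: unreserved, reserved (gen-delims
# and sub-delims), plus "%" only when it starts a percent-encoded octet.
_URI_CHARS = re.compile(
    r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*"
)


def uses_only_uri_characters(uri: str) -> bool:
    """Character-level check only; not a full URI syntax validator."""
    return _URI_CHARS.fullmatch(uri) is not None


bad = uses_only_uri_characters("ivo://ivoa.net/std/SimDALSearch#{views}-1.0")
good = uses_only_uri_characters("ivo://ivoa.net/std/SimDALSearch#views-1.0")
```

A check of this kind is cheap to add to a validator and catches template placeholders that leak into published identifiers.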

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

Answer: A client knows that a collection resource supports pagination by looking in the resource representation (a VOTable) for the special pagination LINK elements (i.e. LINK elements with the special content roles described in the "Pagination" section). Thus, a service may implement pagination but decide not to use it for some collections while using it for others. This is now explained in the document.
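The discovery rule in this answer, where pagination support is signalled per response by LINK elements with particular content roles, can be sketched as follows. This assumes the role values are next_page and previous_page, as in the reviewer's wording earlier on this page; the normative values are those in the Pagination section, and the function name and example.org URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed content-role values; the normative ones are those defined in
# the Pagination section of the specification.
PAGINATION_ROLES = {"next_page", "previous_page"}


def pagination_links(votable_xml: str) -> dict:
    """Map each pagination content-role found in a VOTable to its href.

    An empty result means this particular collection is not paginated.
    """
    links = {}
    for el in ET.fromstring(votable_xml).iter():
        if el.tag.rsplit("}", 1)[-1] == "LINK":
            role = el.get("content-role")
            if role in PAGINATION_ROLES:
                links[role] = el.get("href")
    return links


page = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">
  <RESOURCE type="results">
    <LINK content-role="next_page" href="https://example.org/simdal/projects?page=2"/>
    <TABLE name="results"><FIELD name="ident" datatype="char" arraysize="*"/></TABLE>
  </RESOURCE>
</VOTABLE>"""

links = pagination_links(page)
```

Because the check is made on each response, a client handles paginated and non-paginated collections from the same service without any prior configuration.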

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

Answer: Representing tree structures in relational databases (i.e. as sets of flat key-value structures) has been done for years and is well mastered. Of course, this brings some query complexity for reconstructing the tree.

We do not expect hierarchy to be used here; the full information (and the various hierarchies) is actually kept in the SimDM serializations (mostly of the project and protocol SimDM packages). This is intended to be a simple (flat), yet informative, view of the SimDM serializations (mainly of their classes).
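The flat key-value representation of a tree mentioned in this answer can be illustrated with a minimal sketch, assuming dotted paths as keys, keys that contain no dots, and no empty sub-trees (which this encoding would lose). The names and values are hypothetical and are not the SimDM serialisation.

```python
def flatten(tree: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {"a.b.c": value} key-value pairs."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: dict) -> dict:
    """Rebuild the nested structure from dotted keys."""
    tree = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


# Hypothetical protocol description; names and values are placeholders.
protocol = {"code": {"name": "ExampleCode", "version": "2.1"},
            "algorithm": {"name": "SPH"}}
pairs = flatten(protocol)
```

Searches run directly on the flat pairs; reconstructing the tree is the extra step a client pays for when it needs the hierarchy back.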

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).

In the VO Registry, the real complex point is the proliferation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling,...) would strengthen our confidence in this part of the standard.

Answer: Yes, that's correct. Thank you for pointing this out, we have updated the document to fix this.

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Semantics Working Group ( Mireille Louys, Alberto Accomazzi )

Education Interest Group ( Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( Carlos Rodrigo )

When I read the introduction of section 5 (SimDAL Search), I get the impression that an unaware reader could understand that it is impossible to define views containing properties and links to more than one object type. For instance, it explains that, if you have a simulation that provides both the 3D structure of a star and its oscillation spectrum, you have to define two different views. This, in fact, is a misunderstanding.

First, having a simulation that generates different output files for a given set of inputs is quite a common case. And having different views for each object type (output data file), while possible, would make it almost impossible for a user to make a search on metadata and get the two, or more, final files corresponding to that "experiment" (linking different views is very inefficient).

Second, it is perfectly possible and natural in SimDAL to define a view that links together all the output files for the same experiment and points to the corresponding DataAccess services providing each dataset.

By the way, this is not a "new idea"; it is something that has been discussed and agreed before. It is just not clear in the document.

In my opinion, it should be explicitly stated that views containing different object types can be defined. And it would be nice to give a simple example of a view schema for such a case.

Answer: Thank you Carlos, we have done the update in the document.
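As an illustration of the kind of example requested here, a single view can carry the search metadata of an experiment together with links to each of its outputs. A minimal sketch, assuming a simulation that produces a 3D star structure and an oscillation spectrum as in the comment; the column names, identifiers and example.org access URLs are hypothetical.

```python
# Hypothetical view schema: one row per experiment, with the input
# parameters as searchable columns and one access-URL column per output
# object type produced by that experiment.
VIEW_COLUMNS = ["experiment_id", "stellar_mass", "structure_3d_url",
                "oscillation_spectrum_url"]

view_rows = [
    dict(zip(VIEW_COLUMNS, ["exp-001", 1.0,
                            "https://example.org/access/exp-001/structure",
                            "https://example.org/access/exp-001/spectrum"])),
    dict(zip(VIEW_COLUMNS, ["exp-002", 1.6,
                            "https://example.org/access/exp-002/structure",
                            "https://example.org/access/exp-002/spectrum"])),
]


def outputs_for(rows, min_mass):
    """Search on metadata and return every output of each matching
    experiment in one step, without joining separate per-object views."""
    return {r["experiment_id"]: (r["structure_3d_url"],
                                 r["oscillation_spectrum_url"])
            for r in rows if r["stellar_mass"] >= min_mass}


hits = outputs_for(view_rows, 1.5)
```

With one view per object type, the same result would require matching rows across two views on the experiment identifier, which is the inefficiency the comment points out.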

Standards and Processes Committee ( Françoise Genova)



Revision 13 2016-10-18 - DavidLanguignon

 

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "simdal search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and registries.

  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An implementation note was written for the Simulation Data Model. It presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard is accepted, we plan to write an Implementation Note presenting how to use SimDAL to publish different categories of simulations / numerical models.

  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP/TAP schemas. TAP has not been chosen as a solution because it does not fulfill the requirements for Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing simulated objects (> 100 000 in one of the SimDAL implementations). These numbers are growing with the progress of numerical models.

Using TAP would require storing the properties as table columns in a relational database (TAP is strongly coupled to SQL, and therefore to the relational model), but such high-dimensional data (i.e. tables with so many columns) cannot be managed by the majority of the RDBMSs currently in use (PostgreSQL, MySQL). High-dimensional data and their use are much better served by other types of storage architecture, which publishers could not use with TAP (or only with great difficulty, i.e. it would be nonsense) when those architectures lack SQL compatibility or an SQL adapter.

The definition of SimDAL took a long time because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems and storage architectures. The conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer, depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not be lost.

Concerning the SimDAL Repository part:
First, note that SimDAL components (among them the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model (SimDM), so only SimDAL Repositories allow a search for resources using the SimDM semantics. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions
of the theoretical projects and codes that are published in the VO. IVOA registries have no functionality to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. However, this would lose the relationships between SimDM classes, and with them the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not make full use of the SimDM XML serializations, although most scientific use cases would require fine-grained search in these serializations to discover protocols and projects of interest efficiently. This was a deliberate choice for version 1.0 of SimDAL: in the coming months and years we do not expect many IVOA theory services to be registered, so it should be easy for users to discover theoretical services with SimDAL Repositories as presently defined. As more theoretical services are registered, however, finer-grained search will become necessary. SimDAL Repositories as defined in version 1.0 store the full XML serializations of projects and protocols, and therefore contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included, a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive it properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the Sesto InterOp in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations would serve as one. So a client, rather than a simple validator, has been developed. It is compatible with both reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order of the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before running a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copy-pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent)

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of end points. You're defining about as many end point types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, xpath or xquery or whatever?

In short, I believe you should split up this document into three pieces, each of which would work out to be more handleable.

Answer: Yes, we have several API endpoints. Of course, we tried our best to keep this number as small as possible while satisfying as many use cases as possible.


It is a big standard; that is why it took so long to release a first version of it. The other VO (DAL) standards basically serve a single, fixed underlying data model. As you know, SimDM, which SimDAL serves, is a meta-model, i.e. we are serving an infinity of underlying data models. This brings additional use cases that we had to address by adding a complete API.


Several very smart people have addressed the SimDAL problem in past years, and much progress has been made thanks to them, even if the result is still not 100% perfect. The fact is that many people have been waiting for years for a way to publish their theoretical data, and we think we have to give them a solution now.
We need a basis, SimDAL v1, that will be implemented by several teams and will fulfill the most common use cases (80%), on real data in production environments (not theoretical/potential use cases or issues). Then we hope these teams will collaborate with the IVOA to provide feedback for a v2 that will add features not planned in the first version.

As for XPath: we designed SimDAL partly because we realized that understanding and querying SimDM is too high a prerequisite for a scientist or developer, and that we had to design a query API as straightforward as possible, even at the price of not covering 100% of the use cases for now.

We have not split SimDAL because, for the first version of the standard, we do not want to break the workflow logic that carries users from theoretical project discovery to data retrieval, at the price of developers having to deal with a bigger standard. Instead, we tried to make the document clear so that readers understand they do not have to implement all three parts. For the next version of the standard, once several teams are familiar with it, splitting the document into several more specific standards can be discussed.

Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.
Answer: Thank you for offering to help with the proofreading! The LaTeX file can be downloaded from the src link on the SimDAL webpage: http://www.ivoa.net/documents/SimDAL/index.html


  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).
Answer: Done. Section 2.2 no longer mentions use cases but requirements.

  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).
Answer: We simplified as much as possible.

  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.
Answer: You are perfectly right. That is corrected.

  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?
Answer: Ok. We clarified this.

  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.
Answer: That is corrected


  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says: "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: Does the table in the error case have a name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?
Answer: You are right, thank you, we have made the update in the document.
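A minimal Python sketch of the error document suggested above, for illustration only: the INFO element follows the DALI convention (name="QUERY_STATUS", value="ERROR", inside a RESOURCE of type "results") and carries a user-displayable summary, while a TABLE named "results" lists the individual (error_msg, error_code) rows. The VOTable 1.3 namespace, the summary wording and the function name error_votable are assumptions, not text from the SimDAL document.

```python
import xml.etree.ElementTree as ET

# Assumed VOTable version; adjust the namespace if another version is used.
VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"
ET.register_namespace("", VOTABLE_NS)


def _q(tag):
    """Qualify a tag name with the VOTable namespace."""
    return f"{{{VOTABLE_NS}}}{tag}"


def error_votable(errors):
    """Build a VOTable error document from (error_msg, error_code) pairs (hypothetical helper)."""
    votable = ET.Element(_q("VOTABLE"), version="1.3")
    resource = ET.SubElement(votable, _q("RESOURCE"), type="results")
    # DALI: the INFO text should be directly displayable to a user.
    summary = errors[0][0] if len(errors) == 1 else f"{len(errors)} errors occurred"
    info = ET.SubElement(resource, _q("INFO"), name="QUERY_STATUS", value="ERROR")
    info.text = summary
    # One row per error: string-valued error_msg, integer-valued error_code.
    table = ET.SubElement(resource, _q("TABLE"), name="results")
    ET.SubElement(table, _q("FIELD"), name="error_msg", datatype="char", arraysize="*")
    ET.SubElement(table, _q("FIELD"), name="error_code", datatype="int")
    tabledata = ET.SubElement(ET.SubElement(table, _q("DATA")), _q("TABLEDATA"))
    for msg, code in errors:
        tr = ET.SubElement(tabledata, _q("TR"))
        ET.SubElement(tr, _q("TD")).text = msg
        ET.SubElement(tr, _q("TD")).text = str(code)
    return ET.tostring(votable, encoding="unicode")
```

A client that only knows DALI can still read the INFO element and show its text, ignoring the table.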

  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.
Answer: DataLink would have made the standard heavier, whereas we only need a tiny subset of it: the subset that DataLink was when we were involved in its definition back in 2012, when F. Bonnarel started thinking about it. Adopting DataLink is something we have in mind for a future version of the standard.

  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.
Answer: The foreign-key mechanism is used in VOTable responses and is taken directly from the VOTable 1.3 document. It is a neat and tidy way to link our tables while allowing some extensibility. We define it in the document because it is not formally defined in the VOTable document, only mentioned as an example of a possible use of the GROUP element.

  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.
Answer: These fields are defined for each API endpoint; see, for example, the results table in {search}, subsection "response schema". All FIELDs are described, and those that are not mandatory are explicitly declared as optional.


  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).
Answer: OK. We had not specified anything because our idea is: "do not bother with the lifetime of the pagination; we take care of it so that it lives long enough for interactive browsing in your current session". This is now stated explicitly (section 3.6).

  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.
Answer: We thought the same while writing and implementing the standard, but decided it would be useful to users to be consistent with the majority of existing APIs. After further thought, however, we agree with you and have removed this option to make the standard more straightforward to use.


  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).
Answer: Indeed, we meant "optionally". This has been clarified.
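A small Python sketch of the timestamp form discussed above, assuming the base form YYYY-MM-DDThh:mm:ss, optionally followed by a decimal point and fractional seconds, and optionally by a trailing "Z" as Markus suggests. The pattern and function name are illustrative; the check is purely syntactic and does not validate ranges (e.g. month 13 would pass).

```python
import re

# Assumed form: YYYY-MM-DDThh:mm:ss, optional ".fraction", optional trailing "Z".
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z?")


def is_simdal_timestamp(value: str) -> bool:
    """Return True if value has the timestamp syntax discussed above (hypothetical helper)."""
    return TIMESTAMP_RE.fullmatch(value) is not None
```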


  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".
Answer: Done


  • In 4.1, I guess I'd rather start with "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused about the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) later said under "Parameter".
Answer: Ok. We modified the presentation of the section.


  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?
Answer: Indeed. Ok.


  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).
Answer: We already have the datatype in the FIELD element, do you think we have to add more ?


  • In 4.2, is the project parameter mandatory?
Answer: No it is not. We now say it explicitly.


  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".
Answer: We corrected the typo. Thank you.

  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to be repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)
Answer: The answer is no: in this first version of SimDAL, each parameter can be passed only once. We now say so in the text.
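One way a service might enforce the "each parameter at most once" rule, as a minimal Python sketch using the standard library's query-string parser. The function name and the choice to raise ValueError (which the service would then report as an error document) are assumptions for illustration.

```python
from urllib.parse import parse_qs


def single_valued_params(query_string: str) -> dict:
    """Parse a query string, rejecting any parameter given more than once (hypothetical helper)."""
    parsed = parse_qs(query_string, keep_blank_values=True)
    repeated = sorted(name for name, values in parsed.items() if len(values) > 1)
    if repeated:
        raise ValueError("parameters may not be repeated: " + ", ".join(repeated))
    # Every parameter now has exactly one value.
    return {name: values[0] for name, values in parsed.items()}
```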

  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.
Answer: Ok, document updated
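To make Markus's reading concrete (the underlying data is a relation, and a view selects rows and projects them onto a subset of its attributes), here is a minimal Python sketch that is independent of any serialisation. The relation is modelled as a list of dicts; the property names are invented. Unlike strict relational projection, duplicates are kept, as in SQL.

```python
from typing import Callable, Iterable, Optional


def view(relation: Iterable[dict], columns: list[str],
         where: Optional[Callable[[dict], bool]] = None) -> list[dict]:
    """Select the rows satisfying `where` (if given), then project onto `columns`."""
    return [
        {col: row[col] for col in columns}
        for row in relation
        if where is None or where(row)
    ]
```

For example, `view(rel, ["id", "mass"])` keeps only two attributes of every row, and a `where` predicate restricts which rows appear.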


  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).
Answer: Ok, this is related to our answer to the previous question about foreign_keys - see above.


  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to). There's just too many flavours of JSON out there.
Answer: You are right, thank you for pointing this to us, we have added the reference.
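As an illustration of the "flavours of JSON" problem, Python's json module accepts NaN and Infinity by default, although RFC 8259 JSON does not. A minimal sketch of a stricter parser for a query follows; it also rejects duplicate keys, which RFC 8259 only discourages ("SHOULD be unique"), so that part is a design choice. The requirement that the query be a JSON object, and all names here, are assumptions.

```python
import json


def _reject_constant(name: str):
    # Called by json for "NaN", "Infinity" and "-Infinity", which RFC 8259 does not allow.
    raise ValueError(f"non-standard JSON constant: {name}")


def _reject_duplicates(pairs):
    # Stricter than RFC 8259 requires: refuse objects with repeated member names.
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate keys in JSON object")
    return dict(pairs)


def parse_query(text: str) -> dict:
    """Parse a JSON query, rejecting extensions that some parsers accept (hypothetical helper)."""
    value = json.loads(text, parse_constant=_reject_constant,
                       object_pairs_hook=_reject_duplicates)
    if not isinstance(value, dict):
        raise ValueError("the query must be a JSON object")
    return value
```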


  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?
Answer: We have done the update following your recommendation.


  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".
Answer: Yes, thank you this is clarified (section 6.3)
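On the client side, "eventually" means polling the job until it reaches a terminal phase. A minimal Python sketch, assuming the caller supplies `get_phase` (for instance, a GET on the UWS job's phase resource) and that the job has already been started; the function name, interval and timeout are illustrative. The sleep function is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

# Terminal UWS execution phases (ARCHIVED was added in UWS 1.1).
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED", "ARCHIVED"}


def wait_for_job(get_phase: Callable[[], str], poll_interval: float = 2.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a job's phase until it is terminal, or raise TimeoutError (hypothetical helper)."""
    waited = 0.0
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if waited >= timeout:
            raise TimeoutError(f"job still {phase} after {timeout} s")
        sleep(poll_interval)
        waited += poll_interval
```

Only once the returned phase is COMPLETED does the results list of the job document become meaningful.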


  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist -- where does the SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?
Answer: You are right about the JCL, we have removed the mention. We also made the 1st point clearer. Thank you.

  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
Answer: Done. Thank you.

-- MarkusDemleitner - 2016-09-13

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( _Matthew Graham, Pat Dowler )

Applications Working Group ( _Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( _Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( _Markus Demleitner, Theresa Dower )

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

Answer: We organized a meeting in Paris in March 2015/16 dedicated to the question of registering SimDAL components in the registries. Several authors of SimDAL as well as representatives of the DAL WG (Marco) and Registry WG (Markus) were present.

Several options and questions were identified during the meeting. After this meeting, discussions between the authors of the SimDAL standard concluded that some important use cases require SimDAL repositories:

1) fine-grained discovery of SimDAL services requires the detailed description of theoretical services following SimDM serializations, including their semantic aspects (SKOS).

2) these SimDM descriptions / serializations must be centralized (for example to have a unique description of a code to which publishers of simulations produced by this code can refer to).

The need for such a way (using SimDM repositories) to discover and describe theoretical services in the VO was identified a long time ago, before and during the definition of SimDM. Nevertheless, SimDAL components must be registered in the IVOA registries.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Answer: Yes, this precision adds clarification, thank you. We modified the document.
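For illustration, a Registry lookup "by standard id" can be expressed as an ADQL query against the RegTAP tables rr.capability and rr.interface, which RegTAP stores with standard_id in lower case. A minimal Python sketch that only builds the query string; the prefix "ivo://ivoa.net/std/SimDALSearch" comes from the example in 3.6, but the helper name and the use of a prefix match (to cover all SimDAL Search endpoints) are assumptions. The query would be sent to any RegTAP service's TAP endpoint.

```python
def regtap_capability_query(standard_id_prefix: str) -> str:
    """Build ADQL listing services whose capability standardId starts with a prefix (hypothetical)."""
    # RegTAP lower-cases standard_id; also escape single quotes for the ADQL literal.
    prefix = standard_id_prefix.lower().replace("'", "''")
    return (
        "SELECT ivoid, standard_id, access_url "
        "FROM rr.capability NATURAL JOIN rr.interface "
        f"WHERE standard_id LIKE '{prefix}%'"
    )
```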

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Answer: During the meeting in Paris in February 2016, we started to look at how to register SimDAL components in the IVOA registries. Two solutions appeared:

1) to map as much as possible of the SimDM serializations (protocol.xml, project.xml) into the IVOA registries using extensions. This solution requires deprecating some SimDM concepts (notions of code version, owner, etc.), slightly modifying the registries (or taking some liberties, such as storing SKOS concepts in UCD fields), developing an XSLT transformation of SimDM serializations, etc.

2) to do a simple registration of SimDAL components in the IVOA registries without extensions using classical fields for registrations: title, description, keywords, publishers, etc ...

Solution 1 is more complex to set up: it requires developments, a prototype to check that it works properly, and tests of the implications for end users. It may be nicer from an IVOA point of view but is not required by the scientific use cases defined by the Theory IG. Nevertheless, we decided to try it, with a deadline at the end of March 2016 to get an operational XSLT mapping from SimDM serializations to the registries. Without progress by the end of March, we would move to solution 2, to avoid delaying the release of the SimDAL standard requested by several teams in the astrophysics community. Solution 1 would then be investigated for SimDAL version 2.0.

No progress was made by the end of March, so we moved to solution 2. This solution does not need any specific descriptions, since it is a standard registration of services in the IVOA registries. It fulfills all the scientific goals.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

Answer: Ok, fixed in the document. Thank you for pointing this out.
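A simple character-level check that would catch such standardIds: a minimal Python sketch listing characters that RFC 3986 does not allow literally in a URI (curly braces among them). It does not check full URI syntax, and the brace-free identifier in the test is a hypothetical form, not the one chosen in the document.

```python
# Characters RFC 3986 allows literally: unreserved, reserved (gen-delims and
# sub-delims), plus "%" for percent-encoding.
_URI_CHARS = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~:/?#[]@!$&'()*+,;=%"
)


def invalid_uri_chars(uri: str) -> set:
    """Return the characters in uri that may not appear literally in a URI (hypothetical helper)."""
    return {ch for ch in uri if ch not in _URI_CHARS}
```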

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

Answer: A client knows that a collection resource supports pagination by looking in the resource representation (a VOTable) for the special pagination LINK elements, i.e. LINK elements with the special content roles described in the "Pagination" section. Thus, a service may implement pagination but decide not to use it for some collections while using it for others. This is now explained in the document.
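A client-side sketch of this detection in Python: scan the returned VOTable for LINK elements carrying a pagination content role; an empty result means the collection is not paginated. The role names next_page and previous_page are assumed from the discussion of 3.4 above and should be checked against the "Pagination" section; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed pagination content-role values (see lead-in).
PAGINATION_ROLES = {"next_page", "previous_page"}


def pagination_links(votable_xml: str) -> dict:
    """Map content-role -> href for pagination LINK elements (empty if not paginated)."""
    links = {}
    for elem in ET.fromstring(votable_xml).iter():
        # Compare local names so this works with or without the VOTable namespace.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "LINK" and elem.get("content-role") in PAGINATION_ROLES:
            links[elem.get("content-role")] = elem.get("href")
    return links
```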

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

Answer: Representing tree structures in relational databases (i.e. as sets of flat key-value structures) has been done for years and is well understood. Of course, this adds some query complexity when the tree has to be reconstructed.

We do not expect hierarchy to be used here; the full information (and the various hierarchies) is kept in the SimDM serializations (mostly of the project and protocol SimDM packages). This is intended to be a simple (flat), yet informative, view of the SimDM serializations (mainly of their classes).
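One common way to keep hierarchy in a flat key-value structure is to encode the path in the key. A minimal Python sketch of that encoding, for illustration only (the answer above says SimDAL does not expect hierarchy here); the dotted-path convention and example keys are assumptions, and lists are not handled.

```python
def flatten(tree: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted-path keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-trees, extending the path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat
```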

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).

In the VO Registry, the real complex point is the proliferation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling,...) would strengthen our confidence in this part of the standard.

Answer: Yes, that's correct. Thank you for pointing this out, we have updated the document to fix this.

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

When I read the introduction of section 5 (SimDAL Search) I get the impression that an unaware reader could understand that it is impossible to define views containing properties and links to more than one different object type. For instance, it explains that, if you have a simulation that provides both the 3D structure of a star and its oscillation spectrum, you have to define two different views. This, in fact, is a misunderstanding.

First, having a simulation that generates different output files for a set of given inputs is quite a common case. And having different views for each object type (output data file), while possible, would make it almost impossible that a user can make a search on metadata and get the two, or more, final files corresponding to that "experiment" (the links between different views is very inefficient).

Second, it is perfectly possible and natural in SimDAL to define a view that links together all the output files for the same experiment and points to the corresponding DataAccess services providing each dataset.

By the way, this is not a "new idea"; it is something that has been discussed and agreed upon before. It is just not clear in the document.

In my opinion, it should be said explicitly that views containing different object types can be defined. It would also be nice to give a simple example of a view schema for such a case.
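To make the point about multi-object views concrete, here is a minimal Python sketch of the idea (not the SimDAL view schema format itself): each row of a hypothetical view describes one experiment, carries its input parameters, and links to every output dataset, so a single metadata search returns all the files of each matching experiment. All field names and URLs are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentRow:
    """Hypothetical view row: one experiment (one set of inputs), several outputs."""
    experiment_id: str
    mass_msun: float  # hypothetical input parameter
    links: dict = field(default_factory=dict)  # output name -> data access URL (made up)

VIEW = [
    ExperimentRow("exp-1", 1.0, {
        "star_3d_structure": "https://example.org/access?id=exp-1-structure",
        "oscillation_spectrum": "https://example.org/access?id=exp-1-spectrum",
    }),
    ExperimentRow("exp-2", 1.5, {
        "star_3d_structure": "https://example.org/access?id=exp-2-structure",
        "oscillation_spectrum": "https://example.org/access?id=exp-2-spectrum",
    }),
]

def search(view, min_mass):
    """Filter on metadata once; every matching row yields all of its output files."""
    return [(row.experiment_id, row.links) for row in view if row.mass_msun >= min_mass]

for exp_id, links in search(VIEW, 1.2):
    print(exp_id, sorted(links))
```

With two separate views, the same search would require a second lookup to join the structure and the spectrum of each experiment.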

Answer: Thank you, Carlos; we have updated the document.
 

Standards and Processes Committee ( Françoise Genova)



Revision 12 2016-10-12 - CarlosRodrigo

 

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "simdal search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and registries.

  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An implementation note was written for the Simulation Data Model; it presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard is accepted, we plan to write an Implementation Note presenting how to use SimDAL to publish different categories of simulations / numerical models.

  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP and TAP schemas. TAP has not been chosen as a solution because it does not fulfill the requirements for Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing the simulated objects (more than 100,000 in one of the SimDAL implementations). These numbers are growing with the progress of numerical models.

To use TAP, which is strongly coupled to SQL and therefore to relational databases, we would need to store the properties as columns of a relational table. This is simply not possible with the majority of the RDBMSs currently in use (PostgreSQL, MySQL), which cannot manage such high-dimensional data (i.e., that many table columns). High-dimensional data are much better served by other types of storage architectures, which publishers could not use with TAP (or only with great difficulty) unless they have an SQL compatibility layer or adapter.

The definition of SimDAL took a long time because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems and storage architectures. The conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not be lost.
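A minimal sketch of why this decoupling helps, assuming an entity-attribute-value layout chosen here purely for illustration (SimDAL does not prescribe any storage): properties are stored as (object, property, value) rows rather than one column per property, so their number is not bounded by per-table column limits (PostgreSQL, for example, allows at most 1600 columns per table), and a flat view is assembled on demand. The table, the property names, and the `view` helper are hypothetical.

```python
import sqlite3

# Hypothetical storage: one row per (object, property) pair instead of one
# column per property, so adding properties never changes the table schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE props (object_id TEXT, name TEXT, value REAL)")
conn.executemany(
    "INSERT INTO props VALUES (?, ?, ?)",
    [
        ("cloud-1", "density", 1.0e3), ("cloud-1", "temperature", 20.0),
        ("cloud-2", "density", 1.0e4), ("cloud-2", "temperature", 50.0),
    ],
)

def view(conn, columns):
    """Expose a flat virtual table with the requested properties as columns.

    Clients only see the view; how it is built from storage is a server-side
    detail that could be swapped for a completely different backend.
    """
    rows = {}
    placeholders = ",".join("?" * len(columns))
    query = (f"SELECT object_id, name, value FROM props "
             f"WHERE name IN ({placeholders}) ORDER BY object_id")
    for obj, name, value in conn.execute(query, columns):
        rows.setdefault(obj, dict.fromkeys(columns))[name] = value
    return [(obj, *(vals[c] for c in columns)) for obj, vals in rows.items()]

print(view(conn, ["temperature", "density"]))
```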

Concerning the SimDAL Repository part:
First, note that SimDAL components (and among them the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model (SimDM), so only SimDAL Repositories allow a search for resources using the SimDM semantics. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes published in the VO. IVOA registries have no functionality to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. Nevertheless, this would lose the relationships between SimDM classes, and thus the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not let users benefit fully from the SimDM XML serializations, although most scientific use cases would require fine-grained search in these serializations to discover protocols and projects of interest efficiently. This was a choice made for version 1.0 of SimDAL. Indeed, in the coming months and years we do not expect many IVOA theory services to be registered, so it should be easy for users to discover theoretical services with the SimDAL Repositories as presently defined. As more theoretical services are registered, however, finer-grained search will become necessary. SimDAL Repositories as defined in version 1.0, storing the full XML serializations of projects and protocols, contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of the Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included, a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.
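For reference, a minimal sketch of how a DALI-aware client typically reads the status of a response: it looks for the INFO element named QUERY_STATUS inside the RESOURCE of type "results". The sketch assumes the VOTable 1.3 namespace, and the error message in the sample document is invented.

```python
import xml.etree.ElementTree as ET

VOT_NS = "http://www.ivoa.net/xml/VOTable/v1.3"

def query_status(votable_xml):
    """Return (status, message) from the QUERY_STATUS INFO of the results RESOURCE."""
    root = ET.fromstring(votable_xml)
    for resource in root.iter(f"{{{VOT_NS}}}RESOURCE"):
        if resource.get("type") != "results":
            continue
        for info in resource.findall(f"{{{VOT_NS}}}INFO"):
            if info.get("name") == "QUERY_STATUS":
                return info.get("value"), (info.text or "").strip()
    return None, ""  # no DALI status found

# Sample error document (message text is hypothetical).
ERROR_DOC = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">
  <RESOURCE type="results">
    <INFO name="QUERY_STATUS" value="ERROR">Unknown view identifier</INFO>
  </RESOURCE>
</VOTABLE>"""

print(query_status(ERROR_DOC))  # ('ERROR', 'Unknown view identifier')
```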

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive it properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the InterOp in Sesto in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations would serve as a validator. So a client, rather than a simple validator, has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order of the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before running a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent)

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of end points. You're defining about as many end point types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, xpath or xquery or whatever?

In short, I believe you should split up this document into three pieces, each of which would work out to be more handleable.

Answer: Yes, we have several API endpoints. Of course, we tried our best to keep this number as small as possible while satisfying as many use cases as possible.
 
That is a big standard, which is why it took so long and was so hard to release a first version of it. The other VO (DAL) standards basically serve a single, fixed, underlying data model. As you know, SimDM, which SimDAL serves, is a meta-model, i.e. we are serving an infinity of underlying data models. This brings some additional use cases that we had to address by adding a complete API.


Several very smart people addressed the SimDAL problem in past years, and a lot of progress has been made thanks to them, even though the result is still not 100% perfect. The fact is that many people have been waiting for years for a way to publish their theoretical data, and we think we have to give them a solution now.
We need a basis, SimDAL v1, that will be implemented by several teams and will fulfill the most common use cases (80%), on real data in production environments (not theoretical or potential use cases and issues). Then, we hope these teams will collaborate with the IVOA to provide feedback for a v2 that will add features not planned in the first version.

As for XPath, we made SimDAL partly because we realized that understanding and querying SimDM is too high a prerequisite for a scientist or developer, and that we had to design a query API that is as straightforward as possible, even if this comes at the price of not covering 100% of use cases for now.

We have not split SimDAL because, for the first version of the standard, we do not want to break the workflow logic that carries users from theoretical project discovery to data retrieval, at the price of developers having to deal with a bigger standard. Instead, we tried to make the document clear so that readers understand they do not have to implement all three parts. For the next version of the standard, once several teams are familiar with it, splitting the document into several more specific standards can be discussed.

Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.
Answer: Thank you for offering to help with the proofreading! The LaTeX file can be downloaded from the src link on the SimDAL webpage: http://www.ivoa.net/documents/SimDAL/index.html


  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).
Answer: Done. Section 2.2 no longer mentions use cases but requirements.

  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).
Answer: We simplified as much as possible.

  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.
Answer: You are perfectly right. That is corrected.

  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?
Answer: Ok. We clarified this.

  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.
Answer: That is corrected


  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says: "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: Does the table in the error case have the name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?
Answer: You are right, thank you, we have made the update in the document.
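One possible way to reconcile both requirements, sketched under assumptions (the column names come from the review above; the summary text, messages, and codes are hypothetical): keep a DALI-style QUERY_STATUS INFO with a user-displayable message, and list the individual errors as rows of a table.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"

def error_votable(errors):
    """Build an error document from a list of (error_msg, error_code) pairs.

    The QUERY_STATUS INFO carries text that can be shown to users as-is (as DALI
    recommends); the table then lists every individual error.
    """
    votable = ET.Element("VOTABLE", version="1.3", xmlns=VOTABLE_NS)
    resource = ET.SubElement(votable, "RESOURCE", type="results")
    info = ET.SubElement(resource, "INFO", name="QUERY_STATUS", value="ERROR")
    info.text = errors[0][0] if len(errors) == 1 else f"{len(errors)} errors occurred"
    table = ET.SubElement(resource, "TABLE")
    ET.SubElement(table, "FIELD", name="error_msg", datatype="char", arraysize="*")
    ET.SubElement(table, "FIELD", name="error_code", datatype="int")
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for msg, code in errors:
        row = ET.SubElement(tabledata, "TR")
        ET.SubElement(row, "TD").text = msg
        ET.SubElement(row, "TD").text = str(code)
    return ET.tostring(votable, encoding="unicode")

print(error_votable([("Unknown parameter 'foo'", 1), ("Missing parameter 'q'", 2)]))
```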

  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.
Answer: DataLink would have made the standard heavier, whereas we only need a tiny subset of it: the subset that DataLink was when we were involved in its definition back in 2012, when F. Bonnarel started thinking about it. Adopting DataLink is something we have in mind for a future version of the standard.
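To illustrate how small the gap is, here is a hypothetical mapping from a SimDAL links row to the DataLink 1.0 column set (ID, access_url, service_def, error_message, semantics, description, content_type, content_length). The SimDAL-side column names are assumptions for the sketch; `#this` is the DataLink core vocabulary term for the dataset itself.

```python
# Hypothetical SimDAL "links" row (column names assumed for illustration).
simdal_link = {"ident": "exp-1", "url": "https://example.org/data/exp-1.fits",
               "name": "density cube", "content_type": "application/fits"}

DATALINK_COLUMNS = ("ID", "access_url", "service_def", "error_message",
                    "semantics", "description", "content_type", "content_length")

def to_datalink_row(link, semantics="#this"):
    """Map one SimDAL links row onto the DataLink 1.0 columns; unknown values stay None."""
    row = dict.fromkeys(DATALINK_COLUMNS)
    row.update(ID=link["ident"], access_url=link["url"], semantics=semantics,
               description=link.get("name"), content_type=link.get("content_type"))
    return row

print(to_datalink_row(simdal_link))
```

A client that already parses DataLink tables could then consume such rows without SimDAL-specific glue code.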

  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.
Answer: The foreign-key mechanism is used in VOTable responses and is taken directly from the VOTable 1.3 document. It is a neat and tidy way to link our tables while allowing some extensibility. We define it in the document because it is not formally defined in the VOTable document, only mentioned as a kind of example or possible use of the GROUP element.

  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.
Answer: These fields are defined in each API endpoints, see for example the result table in {search}, subsection response schema. All FIELDs are described and those that are not mandatory are explicitly declared as optional.


  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).
Answer: OK. We had not specified anything because our idea is "do not bother with the lifetime of the pagination; we take care of it for you so that it is long-lived enough for interactive browsing in your current session". This has now been specified (section 3.6).

  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.
Answer: We thought the same while writing and implementing the standard, but decided it would be of some use to users, for consistency with the majority of existing APIs. However, after some more thought, we agree with you that it should simply be removed to make the standard more straightforward to use.


  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).
Answer: Indeed, we meant "optionally". This has been clarified.
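A minimal validation sketch for the format discussed above: date and time to the second, optionally followed by fractional seconds and, as suggested in the review, an optional trailing "Z". It checks only the shape of the string, not calendar validity (it would accept a month of 13).

```python
import re

# YYYY-MM-DDThh:mm:ss, optionally followed by fractional seconds and an optional "Z".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z?")

def is_timestamp(text):
    """True if text has the timestamp shape described above (shape only)."""
    return TIMESTAMP.fullmatch(text) is not None

for s in ["2016-09-28T12:00:00", "2016-09-28T12:00:00.25",
          "2016-09-28T12:00:00.25Z", "2016-09-28T12:00:00."]:
    print(s, is_timestamp(s))
```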


  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".
Answer: Done


  • In 4.1, I guess I'd rather start with the "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused by the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) said later under "Parameter".)
Answer: Ok. We modified the presentation of the section.


  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?
Answer: Indeed. Ok.


  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).
Answer: We already have the datatype in the FIELD element, do you think we have to add more ?
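What the reviewer seems to ask for is the xtype attribute on top of datatype: in DALI, timestamp values are declared as char arrays with xtype="timestamp". A small sketch producing such FIELD elements (the `field` helper is hypothetical; ident and created are the field names discussed above):

```python
import xml.etree.ElementTree as ET

def field(name, datatype, **extra):
    """Return a serialized VOTable FIELD element with the given attributes."""
    return ET.tostring(ET.Element("FIELD", name=name, datatype=datatype, **extra),
                       encoding="unicode")

print(field("ident", "char", arraysize="*"))
print(field("created", "char", arraysize="*", xtype="timestamp"))
```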


  • In 4.2, is the project parameter mandatory?
Answer: No it is not. We now say it explicitly.


  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".
Answer: We corrected the typo. Thank you.

  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to be repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)
Answer: No. In this first version of SimDAL, each parameter can be passed only once. We now say so in the text.
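Services can enforce this rule when parsing the request. A minimal Python sketch (the function name and error text are hypothetical) that rejects any parameter given more than once:

```python
from urllib.parse import parse_qs

def single_valued_params(query_string):
    """Parse a query string, rejecting any parameter that appears more than once."""
    params = parse_qs(query_string, keep_blank_values=True)
    repeated = sorted(name for name, values in params.items() if len(values) > 1)
    if repeated:
        raise ValueError(f"parameters may appear only once: {', '.join(repeated)}")
    return {name: values[0] for name, values in params.items()}

print(single_valued_params("project=abc&q=star"))
```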

  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.
Answer: Ok, document updated
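One possible reading of the model the reviewer describes, sketched in Python under that assumption: the underlying data form a relation (a set of tuples over named attributes), and a view is a selection followed by a projection onto some attributes. Attribute names and values are made up.

```python
# A relation: attribute names plus a set of tuples (rows). Values are made up.
ATTRS = ("id", "mass", "metallicity", "age")
RELATION = {
    ("m1", 1.0, 0.02, 4.6),
    ("m2", 1.5, 0.01, 2.0),
    ("m3", 1.5, 0.02, 1.8),
}

def view(relation, attrs, keep, predicate=lambda row: True):
    """Select the tuples satisfying predicate, then project onto the kept attributes."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx)
            for row in relation if predicate(dict(zip(attrs, row)))}

print(sorted(view(RELATION, ATTRS, ("id", "age"), lambda r: r["mass"] == 1.5)))
```

As in relational algebra, projection removes duplicate tuples: projecting onto mass alone yields two tuples, not three. Nothing in this definition depends on ASCII files, SQL, or VOTable.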


  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).
Answer: Ok, this is related to our answer to the previous question about foreign_keys - see above.


  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to). There are just too many flavours of JSON out there.
Answer: You are right, thank you for pointing this to us, we have added the reference.
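The current JSON specifications are RFC 8259 and ECMA-404. As a hedged illustration of why pinning one down matters, Python's `json.loads` accepts `NaN` and `Infinity` by default, which RFC 8259 does not allow; a strict parser can reject them. The query structure in the example and the duplicate-key rule (RFC 8259 only says names SHOULD be unique) are assumptions of this sketch.

```python
import json

def _reject_constant(name):
    # Called by json.loads for NaN, Infinity and -Infinity, which RFC 8259 excludes.
    raise ValueError(f"non-standard JSON constant: {name}")

def _reject_duplicate_keys(pairs):
    # RFC 8259 only says names SHOULD be unique; rejecting duplicates is a design choice.
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate key in JSON object")
    return dict(pairs)

def parse_query(text):
    """Parse a query document as strict JSON and require an object at top level."""
    value = json.loads(text, parse_constant=_reject_constant,
                       object_pairs_hook=_reject_duplicate_keys)
    if not isinstance(value, dict):
        raise ValueError("query must be a JSON object")
    return value

print(parse_query('{"mass": {"gt": 1.0}}'))  # hypothetical query structure
```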


  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?
Answer: We have done the update following your recommendation.


  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".
Answer: Yes, thank you this is clarified (section 6.3)
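A hedged client-side sketch of the "eventually" semantics: in plain UWS, a job only has its results once its phase reaches COMPLETED, so clients poll the job's phase. Here `get_phase` is a caller-supplied (hypothetical) function that would, for example, read the job's phase resource over HTTP; injecting it, together with `sleep`, keeps the sketch runnable without a server.

```python
import time

# Phases after which a UWS job no longer changes (ARCHIVED exists since UWS 1.1).
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED", "ARCHIVED"}

def wait_for_job(get_phase, poll_interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll a UWS job until it reaches a terminal phase or the timeout expires."""
    waited = 0.0
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if waited >= timeout:
            raise TimeoutError(f"job still {phase} after {timeout} s")
        sleep(poll_interval)
        waited += poll_interval

# Simulated job history; a real client would query the service instead.
phases = iter(["PENDING", "QUEUED", "EXECUTING", "COMPLETED"])
print(wait_for_job(lambda: next(phases), sleep=lambda s: None))
```

Only when the returned phase is COMPLETED should the client read the job's results list.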


  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist -- where does the SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?
Answer: You are right about the JCL, we have removed the mention. We also made the 1st point clearer. Thank you.

  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
Answer: Done. Thank you.

-- MarkusDemleitner - 2016-09-13

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( _Matthew Graham, Pat Dowler )

Applications Working Group ( _Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( _Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( _Markus Demleitner, Theresa Dower )

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

Answer: We organized a meeting in Paris in March 2015/16 dedicated to the question of registering SimDAL components in the registries. Several authors of SimDAL, as well as representatives of the DAL WG (Marco) and the Registry WG (Markus), were present.

Several options and questions were identified during the meeting. After this meeting, discussions between the authors of the SimDAL standard concluded that some important use cases require SimDAL repositories:

1) fine discovery of SimDAL services requires the detailed description of theoretical services following SimDM serializations including its semantics aspects (SKOS).

2) these SimDM descriptions / serializations must be centralized (for example, to have a unique description of a code to which publishers of simulations produced by this code can refer).

The need for such a way (using SimDM repositories) to discover and describe theoretical services in the VO was identified a long time ago, before and during the definition of SimDM. Nevertheless, SimDAL components must be registered in the IVOA registries.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Answer: Yes, this precision adds clarification, thank you. We modified the document.

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Answer: During the meeting in Paris in February 2016, we started to look at how to register SimDAL components in the IVOA registries. Two solutions appeared:

1) to map as much as possible of the SimDM serializations (protocol.xml, project.xml) onto the IVOA registries using extensions. This solution requires deprecating some SimDM concepts (notions of code version, owner, etc.), slightly modifying the registries (or taking some liberties, such as storing SKOS concepts in UCD fields), developing an XSLT transformation of SimDM serializations, etc.

2) to do a simple registration of SimDAL components in the IVOA registries without extensions using classical fields for registrations: title, description, keywords, publishers, etc ...

Solution 1 is more complex to set up: it requires developments, a prototype to check that it works properly, and tests of the implications for end users. It may be nicer from an IVOA point of view but is not required by the scientific use cases defined by the Theory IG. Nevertheless, we decided to try it, with a deadline of the end of March 2016 to get an operational XSLT mapping from SimDM serializations to the registries. Without progress by the end of March, we would move to solution 2 to avoid delaying the release of the SimDAL standard that several teams in the astrophysics community are asking for. Solution 1 would then be investigated for SimDAL version 2.0.

No progress was made by the end of March, so we moved to solution 2. This solution does not need any specific descriptions, since it is a standard registration of services in the IVOA registries. It fulfills all the scientific goals.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

Answer: Ok, fixed in the document. Thank you for pointing this out.

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

Answer: A client knows that a collection resource supports pagination by checking whether its representation (a VOTable) contains the special pagination LINK elements, i.e. LINK elements carrying the content roles described in the "Pagination" section. Thus, a service may implement pagination but decide not to use it for some collections while using it for others. This is now explained in the document.
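The client-side check described in this answer can be sketched as follows. The content-role values `next_page` and `previous_page` are an assumption based on the link names used elsewhere in this discussion; the normative values are those of the "Pagination" section.

```python
import xml.etree.ElementTree as ET

# Assumed content-role values for the pagination LINK elements.
PAGINATION_ROLES = {"next_page", "previous_page"}


def pagination_links(votable_xml: str) -> dict:
    """Return {content-role: href} for every pagination LINK in a VOTable."""
    root = ET.fromstring(votable_xml)
    links = {}
    for element in root.iter():
        # Drop any XML namespace so "{ns}LINK" and "LINK" both match.
        local_name = element.tag.rsplit("}", 1)[-1]
        role = element.get("content-role")
        if local_name == "LINK" and role in PAGINATION_ROLES:
            links[role] = element.get("href")
    return links


def supports_pagination(votable_xml: str) -> bool:
    """A collection is paginated if its representation carries pagination LINKs."""
    return bool(pagination_links(votable_xml))
```

Because the check is made per response, it naturally handles a service that paginates some collections and not others.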

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

Answer: Representing tree structures in relational databases (i.e., as sets of flat key-value structures) has been common practice for years and is well mastered. Of course, this adds some query complexity when reconstructing the tree.

We do not expect hierarchy to be used here; the full information (including the various hierarchies) is kept in the SimDM serializations (mostly those of the project and protocol SimDM packages). This is intended to be a simple (flat), yet informative, view of the SimDM serializations (mainly of their classes).
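The trade-off described in this answer (a flat key-value view, with the complexity moved to tree reconstruction) can be sketched as follows. The dotted-path key naming is an illustrative assumption, not something SimDAL specifies, and it assumes keys do not themselves contain dots.

```python
def flatten(tree: dict, prefix: str = "") -> list:
    """Turn a nested dict into flat (path, value) rows, e.g. ("code.version", "2.1")."""
    rows = []
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            rows.extend(flatten(value, path))
        else:
            rows.append((path, value))
    return rows


def unflatten(rows: list) -> dict:
    """Rebuild the nested dict from (path, value) rows (the tree reconstruction step)."""
    tree = {}
    for path, value in rows:
        node = tree
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree
```

Note that an empty sub-dictionary produces no row and is therefore lost in the round trip; the flat view is a projection of the tree, not a full replacement for it.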

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).

In the VO Registry, the real complex point is the proliferation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling,...) would strengthen our confidence in this part of the standard.

Answer: Yes, that's correct. Thank you for pointing this out, we have updated the document to fix this.

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

When I read the introduction of section 5 (SimDAL Search), I get the impression that an unaware reader could conclude that it is impossible to define views containing properties of, and links to, more than one object type. For instance, it explains that if a simulation provides both the 3D structure of a star and its oscillation spectrum, you have to define two different views. This is, in fact, a misunderstanding.

First, a simulation that generates different output files for a given set of inputs is quite a common case. Having a different view for each object type (output data file), while possible, would make it almost impossible for a user to search on metadata and get the two or more final files corresponding to that "experiment" (linking between different views is very inefficient).

Second, it is perfectly possible and natural in SimDAL to define a view that links together all the output files of the same experiment and points to the corresponding DataAccess services providing each dataset.

By the way, this is not a "new idea"; it is something that has been discussed and agreed before. It is just not clear in the document.

In my opinion, it should be stated explicitly that views containing different object types can be defined. It would also be nice to give a simple example of a view schema for such a case.
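A minimal sketch of the case described in this comment: one view row per experiment, carrying the experiment's properties and one access link per output object type, so that a single metadata search returns every file of a matching experiment. The class, column names, object-type labels and URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentRow:
    """One row of a hypothetical view spanning several object types."""
    experiment_id: str
    properties: dict                           # inputs / simulated-object properties
    links: dict = field(default_factory=dict)  # object type -> Data Access URL


def search(view_rows: list, **criteria) -> list:
    """Return rows whose properties equal every given criterion."""
    return [row for row in view_rows
            if all(row.properties.get(k) == v for k, v in criteria.items())]


experiments_view = [
    ExperimentRow(
        experiment_id="exp-001",
        properties={"star_mass": 1.0, "metallicity": 0.02},
        links={
            "3d_structure": "https://example.org/access/exp-001/structure",
            "oscillation_spectrum": "https://example.org/access/exp-001/spectrum",
        },
    ),
]
```

A single call such as `search(experiments_view, star_mass=1.0)` returns the experiment together with links to both its 3D structure and its oscillation spectrum, with no join between separate views.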

 

Standards and Processes Committee ( Françoise Genova )



Revision 11 2016-09-28 - FranckLePetit

 

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "simdal search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and in registries.

  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
 
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An Implementation Note was written for the Simulation Data Model; it presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard has been accepted, we plan to write an Implementation Note presenting how to use SimDAL to publish different categories of simulations and numerical models.


 
  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP/TAP schemas. TAP has not been chosen as a solution because it does not fulfill the requirements for Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing the simulated objects (> 100 000 in one of the SimDAL implementations), and these numbers are growing with the progress of numerical models.

Storing such data the TAP way would require the properties to be columns of a table in a relational database, which is simply not possible with so many columns for the majority of the RDBMSs currently in use (e.g. PostgreSQL, MySQL). Using an RDBMS would be unavoidable with TAP, since TAP is strongly coupled to SQL and hence to the relational model. High-dimensional data and their use are much better served by other types of storage architectures, which publishers could not use with TAP (or only with great difficulty) unless they have an SQL compatibility layer or adapter.

Note that the definition of SimDAL took so long because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems and storage architectures; the conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer, depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not be lost.

Concerning the SimDAL Repository part:
First, note that SimDAL components (including the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model, so a search for resources using the SimDM semantics can only be done with SimDAL Repositories. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes that are published in the VO. IVOA registries do not have the functionality to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. Nevertheless, this would come at the cost of losing the relationships between SimDM classes, and hence the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not allow users to benefit fully from the SimDM XML serializations, although most scientific use cases would require fine-grained search in these serializations to discover protocols and projects of interest efficiently. This was a deliberate choice for version 1.0 of SimDAL. Indeed, in the coming months/years we do not expect many registered IVOA theory services, so it should be easy for users to discover theoretical services with the SimDAL Repositories as presently defined. Nevertheless, as more and more theoretical services are registered, finer-grained search will become necessary. SimDAL Repositories as defined in version 1.0, storing the full XML serializations of projects and protocols, contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included, a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive it properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the InterOp in Sesto in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations counts as a validator. So a client, rather than a simple validator, has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order in the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before doing a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.
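The three-step workflow above can be summarized as a chain in which each step consumes the URIs returned by the previous one. In the sketch below, the three callables stand in for the Repository search, SimDAL Search and Data Access calls; they are hypothetical placeholders and not part of the client mentioned above.

```python
from typing import Callable, List


def discover_and_fetch(
    keywords: str,
    repository_search: Callable[[str], List[str]],  # keywords -> SimDAL Search service URIs
    simdal_search: Callable[[str], List[str]],      # Search service URI -> Data Access URIs
    data_access: Callable[[str], bytes],            # Data Access URI -> dataset bytes
) -> List[bytes]:
    """Follow repository -> search -> access, passing URIs from one step to the next."""
    datasets = []
    for search_uri in repository_search(keywords):
        for access_uri in simdal_search(search_uri):
            datasets.append(data_access(access_uri))
    return datasets
```

Automating the hand-off this way replaces the manual copy-and-paste of URIs between steps.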

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent)

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of end points. You're defining about as many end point types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, xpath or xquery or whatever?

In short, I believe you should split up this document into three pieces, each of which would work out to be more handleable.

Answer: Yes, we have several API endpoints. Of course, we did our best to keep this number as small as possible while satisfying as many use cases as possible.


It is a big standard, which is why releasing a first version took so long and was so hard. The other VO (DAL) standards basically serve a single, fixed, underlying data model. As you know, SimDM, which SimDAL serves, is a meta-model, i.e. we are serving an unlimited number of underlying data models. This brings additional use cases that we had to address by adding a complete API.


Several very smart people addressed the SimDAL problem in past years, and a lot of progress has been made thanks to them, even if the result is still not 100% perfect. The fact is that many people have been waiting for years for a way to publish their theoretical data, and we think we have to give them a solution now.
We need a basis, SimDAL v1, that will be implemented by several teams and will fulfill the most common use cases (80 %), on real data in production environments (not theoretical/potential use cases or issues). Then, we hope these teams will collaborate with the IVOA to provide feedback for a v2 that will add features not planned in the first version.

As for xpath, we made SimDAL partly because we realized that understanding/querying SimDM is too high a prerequisite for a scientist/developer, and that we had to design a query API that is as straightforward as possible, even if this comes at the price of not covering 100% of the use cases for now.

We have not split SimDAL because, for the first version of the standard, we do not want to break the workflow logic that carries the user from theoretical project discovery to data retrieval, at the price of developers having to deal with a bigger standard. Instead, we tried to make the document clear so that readers understand they do not have to implement all three parts. For the next version of the standard, once several teams are familiar with it, splitting the document into several more specific standards can be discussed.

 Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.
Answer: Thank you for offering to help with the proofreading! The LaTeX file can be downloaded from the src link on the SimDAL web page: http://www.ivoa.net/documents/SimDAL/index.html
 

 
  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).
Answer: Done. Section 2.2 no longer mentions use cases but requirements.
 
  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).
Answer: We simplified as much as possible.
 
  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.
Answer: You are perfectly right. That is corrected.
 
  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?
Answer: Ok. We clarified this.
 
  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.
Answer: This has been corrected.
 

 
  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says. "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: Does the table in the error case have a name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?
Answer: You are right, thank you, we have made the update in the document.
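A sketch of an error response along the lines discussed in this exchange: a DALI-style `INFO name="QUERY_STATUS" value="ERROR"` whose text is immediately displayable, followed by a table with a string-valued `error_msg` column and an integer-valued `error_code` column. The table layout and the wording of the multi-error summary are assumptions, not the normative SimDAL schema.

```python
import xml.etree.ElementTree as ET

VOT_NS = "http://www.ivoa.net/xml/VOTable/v1.3"
ET.register_namespace("", VOT_NS)  # serialize VOTable elements without a prefix


def _q(tag: str) -> str:
    """Qualify a tag name with the VOTable namespace."""
    return f"{{{VOT_NS}}}{tag}"


def error_votable(errors: list) -> str:
    """Build an error VOTable from a non-empty list of (error_msg, error_code) pairs."""
    root = ET.Element(_q("VOTABLE"), version="1.3")
    resource = ET.SubElement(root, _q("RESOURCE"), type="results")

    # DALI: the INFO text should be a message suitable for display to the user.
    info = ET.SubElement(resource, _q("INFO"), name="QUERY_STATUS", value="ERROR")
    first_msg = errors[0][0]
    info.text = first_msg if len(errors) == 1 else f"{len(errors)} errors; first: {first_msg}"

    # Assumed detail table: one row per error.
    table = ET.SubElement(resource, _q("TABLE"))
    ET.SubElement(table, _q("FIELD"), name="error_msg", datatype="char", arraysize="*")
    ET.SubElement(table, _q("FIELD"), name="error_code", datatype="int")
    tabledata = ET.SubElement(ET.SubElement(table, _q("DATA")), _q("TABLEDATA"))
    for msg, code in errors:
        row = ET.SubElement(tabledata, _q("TR"))
        ET.SubElement(row, _q("TD")).text = msg
        ET.SubElement(row, _q("TD")).text = str(code)
    return ET.tostring(root, encoding="unicode")
```

Keeping the INFO element DALI-shaped means a generic DALI client can still show a sensible message, while SimDAL-aware clients can read the full list of errors from the table.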
 
  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.
Answer: DataLink would have made the standard heavier, whereas we only need a tiny subset of it: the subset that DataLink consisted of when we were involved in its definition back in 2012, when F. Bonnarel started thinking about it. Adopting DataLink is something we have in mind for a future version of the standard.
 
  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.
Answer: The foreign-key mechanism used in VOTable responses is taken directly from the VOTable 1.3 document. It is a neat and tidy way to link our tables while allowing some extensibility. We define it in the document because it is not formally defined in the VOTable document, only mentioned as an example/possible use of the GROUP element.
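Whatever annotation mechanism is chosen, what a client ultimately does with the foreign-key declaration is join the links table to the results table. A minimal sketch, assuming the key column is `ident` on both sides; the other link column names are hypothetical.

```python
from collections import defaultdict


def attach_links(results: list, links: list, key: str = "ident") -> list:
    """Attach to each results row the links rows that share its key value."""
    links_by_key = defaultdict(list)
    for link in links:
        links_by_key[link[key]].append(link)
    # .get() avoids inserting empty entries for rows that have no links.
    return [{**row, "links": links_by_key.get(row[key], [])} for row in results]
```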
 
  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.
Answer: These fields are defined for each API endpoint; see for example the result table in {search}, subsection "response schema". All FIELDs are described, and those that are not mandatory are explicitly declared as optional.
 

 
  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).
Answer: OK. We had not specified anything because our idea is: "do not bother with the lifetime of the pagination; we take care of it for you, so that it lives long enough for you to do interactive browsing in your current session". This has now been specified (section 3.6).
 
  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.
Answer: We thought the same while writing/implementing the standard, but decided it would be useful to users to be consistent with the majority of existing APIs. After further thought, however, we agree with you and will simply remove this option to make the standard more straightforward to use.
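With the page size left to the service, a client simply follows `next_page` links until none is returned. In this sketch the HTTP request is abstracted into `fetch`, a hypothetical callable returning the rows of one page and the next page URL (or `None`); the guard against revisiting a URL is a defensive choice, not a SimDAL requirement.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One page: its rows and the URL of the next page (None on the last page).
Page = Tuple[List[dict], Optional[str]]


def iterate_rows(first_url: str, fetch: Callable[[str], Page],
                 max_pages: int = 1000) -> Iterator[dict]:
    """Yield every row of a paginated collection by following next_page links."""
    url: Optional[str] = first_url
    visited = set()
    while url is not None:
        if url in visited or len(visited) >= max_pages:
            raise RuntimeError(f"pagination loop or too many pages at {url!r}")
        visited.add(url)
        rows, url = fetch(url)
        yield from rows
```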
 

 
  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).
Answer: Indeed, we meant "optionally". This has been clarified.
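A parser for the reading agreed above, fractional seconds optional, plus the optional trailing "Z" the reviewer suggests. The base form `YYYY-MM-DDThh:mm:ss` is an assumption about the section 3.7 syntax, and timestamps are treated as UTC, as in DALI.

```python
import re
from datetime import datetime, timezone

# Assumed form: YYYY-MM-DDThh:mm:ss, optionally followed by a decimal point
# and fractional seconds, optionally followed by "Z".
_TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(\.\d+)?Z?")


def parse_timestamp(text: str) -> datetime:
    """Parse a timestamp of the assumed form, treating it as UTC."""
    match = _TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid timestamp: {text!r}")
    base, fraction = match.groups()
    value = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    # Keep at most microsecond precision (truncating, not rounding).
    micro = int(fraction[1:7].ljust(6, "0")) if fraction else 0
    return value.replace(microsecond=micro, tzinfo=timezone.utc)
```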
 

 
  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".
Answer: Done
 

 
  • In 4.1, I guess I'd rather start with "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused about the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) later said under "Parameter".)
Answer: Ok. We modified the presentation of the section.
 

 
  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?
Answer: Indeed. Ok.
 

 
  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).
Answer: We already have the datatype in the FIELD element; do you think we need to add more?
 

 
  • In 4.2, is the project parameter mandatory?
Answer: No it is not. We now say it explicitly.
 

 
  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".
Answer: We corrected the typo. Thank you.
 
  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)
Answer: No. In this first version of SimDAL, each parameter can be passed only once. We now state this in the text.
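On the service side, the no-repetition rule amounts to a simple check on the query string. A minimal sketch using the standard library; how a service should report the violation (error code, message) is not specified here and is left as an assumption.

```python
from urllib.parse import parse_qs


def single_valued_params(query_string: str) -> dict:
    """Parse a query string, rejecting any parameter that appears more than once."""
    parsed = parse_qs(query_string, keep_blank_values=True)
    repeated = sorted(name for name, values in parsed.items() if len(values) > 1)
    if repeated:
        raise ValueError(f"parameters may not be repeated: {', '.join(repeated)}")
    return {name: values[0] for name, values in parsed.items()}
```

For example, `project=p1&q=star` parses to two single values, while `project=p1&project=p2` is rejected.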
 
  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.
Answer: Ok, document updated
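One serialization-independent way to state the model, following the reviewer's reading (not quoted from the document): the underlying data is a relation, i.e. a set of tuples over named attributes, and a view is a selection followed by a projection onto some of those attributes. The sketch keeps duplicate rows, as SQL does (bag semantics); all names are illustrative.

```python
from typing import Callable, Iterable, List, Sequence


def view(relation: Iterable[dict], attributes: Sequence[str],
         predicate: Callable[[dict], bool] = lambda row: True) -> List[dict]:
    """Select the rows satisfying predicate, then project them onto attributes."""
    return [{name: row[name] for name in attributes}
            for row in relation if predicate(row)]
```

Whether the relation is stored as a tab-separated file, a database table or something else is then purely an implementation detail.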
 

 
  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).
Answer: Ok, this is related to our answer to the previous question about foreign_keys - see above.
 

 
  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to). There's just too many flavours of JSON out there.
Answer: You are right, thank you for pointing this to us, we have added the reference.
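The flavour issue is real in practice: Python's `json` module, for instance, accepts `NaN` and `Infinity` by default, which strict JSON (e.g. RFC 8259) does not allow. A minimal sketch of a strict parser for a query dictionary; the query content in the example is hypothetical.

```python
import json


def _reject_constant(name: str):
    """Called by json for NaN, Infinity and -Infinity, which strict JSON forbids."""
    raise ValueError(f"non-standard JSON constant: {name}")


def parse_query(text: str) -> dict:
    """Parse a query that must be a strict-JSON object (dictionary)."""
    query = json.loads(text, parse_constant=_reject_constant)
    if not isinstance(query, dict):
        raise ValueError("the query must be a JSON object")
    return query
```

Syntax errors surface as `json.JSONDecodeError`, a subclass of `ValueError`, so callers can handle all rejections the same way.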
 

 
  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?
Answer: We have done the update following your recommendation.
 

 
  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".
Answer: Yes, thank you this is clarified (section 6.3)
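Because the job creation request returns the UWS job before its results element is filled, a client typically polls the job phase until it reaches a terminal state. In this sketch the HTTP call is abstracted into a hypothetical `get_phase` callable; the phase names are those of UWS. UWS 1.1 also offers a blocking `WAIT` parameter, which can replace fixed-interval polling.

```python
import time
from typing import Callable

# UWS phases after which the job will not progress further on its own.
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED", "ARCHIVED"}


def wait_for_job(get_phase: Callable[[], str], poll_interval: float = 2.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Poll a UWS job's phase until it is terminal; return that phase."""
    deadline = clock() + timeout
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if clock() >= deadline:
            raise TimeoutError(f"job still {phase} after {timeout} s")
        sleep(poll_interval)
```

Only once the returned phase is `COMPLETED` should the client read the results element of the job.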
 

 
  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist -- where does the SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?
Added:
>
>
Answer: You are right about the JCL; we have removed the mention. We have also made the first point clearer. Thank you.
 
  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
Added:
>
>
Answer: Done. Thank you.
 -- MarkusDemleitner - 2016-09-13
Added:
>
>
Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28
 

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( Matthew Graham, Pat Dowler )

Applications Working Group ( Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( Markus Demleitner, Theresa Dower )

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

Added:
>
>
Answer: We organized a meeting in Paris in March 2015/16 dedicated to the question of registering SimDAL components in the registries. Several authors of SimDAL, as well as representatives of the DAL WG (Marco) and the Registry WG (Markus), were present.

Several options and questions were identified during the meeting. After this meeting, discussions between the authors of the SimDAL standard concluded that some important use cases require SimDAL repositories:

1) Fine-grained discovery of SimDAL services requires detailed descriptions of theoretical services following the SimDM serializations, including their semantic aspects (SKOS).

2) These SimDM descriptions/serializations must be centralized (for example, to have a unique description of a code that publishers of simulations produced by this code can refer to).

The need for such a mechanism (SimDM repositories) to discover and describe theoretical services in the VO was identified long ago, before and during the definition of SimDM. Nevertheless, SimDAL components must also be registered in the IVOA registries.

 When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."
Added:
>
>
Answer: Yes, this clarification is helpful, thank you. We have modified the document.
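Looking for capabilities by standard id could, for instance, be done with a RegTAP query. The sketch below only builds the ADQL and the parameters of a synchronous TAP request; the registry endpoint is left out, and the prefix `ivo://ivoa.net/std/simdal` is an assumption (only `ivo://ivoa.net/std/SimDALSearch` appears in this discussion, and RegTAP stores standard ids in lower case):

```python
from urllib.parse import urlencode

# Hypothetical prefix assumed to be shared by all SimDAL standard ids;
# RegTAP lower-cases standard_id, so the pattern is lower case too.
SIMDAL_STD_PREFIX = "ivo://ivoa.net/std/simdal"


def simdal_capability_query(prefix: str = SIMDAL_STD_PREFIX) -> str:
    """Build an ADQL query listing services with SimDAL-like capabilities."""
    return (
        "SELECT ivoid, standard_id, access_url "
        "FROM rr.capability NATURAL JOIN rr.interface "
        f"WHERE standard_id LIKE '{prefix}%'"
    )


def tap_sync_params(adql: str) -> str:
    """URL-encode the parameters of a synchronous TAP request."""
    return urlencode({"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql})
```

The encoded string would be sent to the `/sync` endpoint of whichever RegTAP service the client uses.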
 Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.
Changed:
<
<
Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.
>
>
Answer: During the meeting in Paris in February 2016, we started to look at how to register SimDAL components in the IVOA registries. Two solutions appeared:
 
Changed:
<
<
In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.
>
>
1) Map as much of the SimDM serializations (protocol.xml, project.xml) as possible into the IVOA registries using extensions. This solution requires deprecating some SimDM concepts (notions of code version, owner, etc.), slight modifications of the registries (or taking some liberties, such as storing SKOS concepts in UCD fields), developing an XSLT transformation of the SimDM serializations, etc.
 
Added:
>
>
2) Do a simple registration of SimDAL components in the IVOA registries, without extensions, using the classical registration fields: title, description, keywords, publishers, etc.

Solution 1 is more complex to set up: it requires development work, a prototype to check that it works properly, and tests of the implications for end users. It may be nicer from an IVOA point of view, but it is not required by the scientific use cases defined by the Theory IG. Nevertheless, we decided to try it, with a deadline of the end of March 2016 to get an operational XSLT mapping from SimDM serializations to registries. Without progress by the end of March, we would move to solution 2, to avoid delaying a release of the SimDAL standard that several teams in the astrophysics community are asking for. Solution 1 would then be investigated for SimDAL version 2.0.

No progress had been made by the end of March, so we moved to solution 2. This solution does not need any specific description, since it is a standard registration of services in the IVOA registries. It fulfills all the scientific goals.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

Answer: OK, this has been fixed in the document. Thank you for pointing it out.
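The reason for the veto can be checked mechanically. This minimal sketch only tests characters against the RFC 3986 character set (it is not a full URI parser); the brace-free form `ivo://ivoa.net/std/SimDALSearch#views-1.0` is a hypothetical illustration, not necessarily the identifier used in the corrected document:

```python
import string

# Characters RFC 3986 allows literally in a URI: unreserved, gen-delims,
# sub-delims, plus "%" for percent-encoding. Curly braces are not among them.
_UNRESERVED = string.ascii_letters + string.digits + "-._~"
_RESERVED = ":/?#[]@" + "!$&'()*+,;="
URI_CHARS = frozenset(_UNRESERVED + _RESERVED + "%")


def invalid_uri_chars(uri: str) -> set[str]:
    """Return the characters in uri that RFC 3986 never allows literally."""
    return {c for c in uri if c not in URI_CHARS}
```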

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

Answer: A client knows that a collection resource supports pagination by checking whether its representation (a VOTable) contains the special pagination LINK elements (that is, LINK elements with the special content roles described in the "Pagination" section). Thus, a service may implement pagination but decide not to use it for some collections while using it for others. This is now explained in the document.
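The discovery rule described in this answer could be sketched as follows. The content-role values `next_page` and `previous_page` are an assumption borrowed from the wording of the comments on section 3.4; the normative values are the ones in the "Pagination" section of the document:

```python
import xml.etree.ElementTree as ET

# Assumed content-role values; the actual values are those defined in the
# "Pagination" section of the SimDAL document.
PAGINATION_ROLES = {"next_page", "previous_page"}


def pagination_links(votable_xml: str) -> dict[str, str]:
    """Map pagination content-role values to their href in a VOTable."""
    root = ET.fromstring(votable_xml)
    links: dict[str, str] = {}
    for elem in root.iter():
        # Compare local names so both namespaced and namespace-less
        # VOTables are handled.
        if elem.tag.rsplit("}", 1)[-1] != "LINK":
            continue
        role = elem.get("content-role")
        href = elem.get("href")
        if role in PAGINATION_ROLES and href:
            links[role] = href
    return links


def supports_pagination(votable_xml: str) -> bool:
    """A collection is paginated if at least one pagination LINK is present."""
    return bool(pagination_links(votable_xml))
```

Since the check is per response, a client would run it on each collection it retrieves rather than once per service.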

 In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.
Added:
>
>
Answer: Representing tree structures in relational databases (i.e., as sets of flat key-value structures) has been common practice for years and is well understood. Of course, this brings some query complexity for reconstructing the tree.

We do not expect hierarchies to be used here; the full information (including the various hierarchies) is actually kept in the SimDM serializations (mostly those of the project and protocol SimDM packages). This is intended to be a simple (flat), yet informative, view of the SimDM serializations (mainly of their classes).
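As an illustration of the flat key-value point (and of the reconstruction cost mentioned above), here is a minimal sketch that encodes a nested description as dotted key paths and rebuilds it. The attribute names are hypothetical, not SimDM terms, and the sketch assumes keys contain no dots and that there are no empty sub-trees:

```python
from typing import Any


def flatten(tree: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted key paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: dict[str, Any]) -> dict[str, Any]:
    """Rebuild the nested structure from dotted key paths."""
    tree: dict[str, Any] = {}
    for path, value in flat.items():
        node = tree
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree
```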

 In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).

In the VO Registry, the real complex point is the proliferation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling,...) would strengthen our confidence in this part of the standard.

Added:
>
>
Answer: Yes, that's correct. Thank you for pointing this out, we have updated the document to fix this.

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-09-28

 

Semantics Working Group ( Mireille Louys, Alberto Accomazzi )

Education Interest Group ( Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( Carlos Rodrigo )

Standards and Processes Committee ( Françoise Genova)



Revision 10 - 2016-09-13 - MarkusDemleitner

 

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true; at least the "simdal search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and registries.

  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An implementation note was written for the Simulation Data Model; it presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard is accepted, we plan to write an Implementation Note presenting how to use SimDAL to publish different categories of simulations/numerical models.

  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP/TAP schemas. TAP has not been chosen as a solution because it does not fulfill the requirements for Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing the simulated objects (more than 100,000 in one of the SimDAL implementations), and these numbers are growing with progress in numerical modelling.

Publishing such data through TAP would require storing the properties as columns of a table in a relational database, since TAP is tightly coupled to SQL and hence to relational systems. This is simply not possible with most RDBMSs currently in use (e.g., PostgreSQL, MySQL), which cannot manage such high-dimensional data (i.e., that many table columns). High-dimensional data are much better served by other types of storage architectures, which publishers cannot use with TAP (or only with great difficulty, which makes no sense) when those architectures have no SQL compatibility layer or adapter.
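A quick check of the argument, as a sketch: the per-table column limits below are the ones documented for PostgreSQL (1600) and MySQL (4096, or 1017 with InnoDB) in recent versions; they may differ in other versions or engines. With one column per property, 100,000 properties exceed all of them:

```python
# Documented per-table column limits (may vary by version/engine).
COLUMN_LIMITS = {"PostgreSQL": 1600, "MySQL": 4096, "MySQL/InnoDB": 1017}


def engines_that_fit(n_properties: int) -> list[str]:
    """Return the engines whose limit allows one column per property."""
    return [name for name, limit in COLUMN_LIMITS.items() if n_properties <= limit]
```

For example, `engines_that_fit(100_000)` returns an empty list, while `engines_that_fit(2000)` returns only `["MySQL"]`.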

Note that the definition of SimDAL took so long because many technological solutions were tested (and implemented) before arriving at the present proposal. Among them, TAP was tested on various data management systems/storage architectures. The conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not feel lost.
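The decoupling described in point 1 can be pictured as a small interface that each publisher implements on top of its own storage, while clients only ever see a virtual table plus its schema. This is a hypothetical sketch (`ViewBackend`, `InMemoryBackend` and `render_view` are illustrative names, not SimDAL API), and it serializes to tab-separated text for brevity where a SimDAL service would return a VOTable:

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Column:
    name: str
    datatype: str  # VOTable datatype, e.g. "char" or "double"


class ViewBackend(Protocol):
    """Hypothetical interface a publisher implements for its own storage."""

    def schema(self) -> list[Column]: ...

    def rows(self) -> Iterable[tuple]: ...


class InMemoryBackend:
    """One possible backend; a key-value or array store could expose the same methods."""

    def __init__(self, columns: list[Column], data: list[tuple]) -> None:
        self._columns = columns
        self._data = data

    def schema(self) -> list[Column]:
        return list(self._columns)

    def rows(self) -> Iterable[tuple]:
        return iter(self._data)


def render_view(backend: ViewBackend) -> str:
    """Serialize any backend as a tab-separated virtual table with a header row."""
    lines = ["\t".join(c.name for c in backend.schema())]
    lines += ["\t".join(str(v) for v in row) for row in backend.rows()]
    return "\n".join(lines)
```

The client-facing code (`render_view`) never depends on how a backend stores its data, which is the property point 1 is after.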

Concerning the SimDAL Repository part:
First, note that SimDAL components (and among them the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model, so a search for resources using the SimDM semantics can only be done with SimDAL Repositories. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes published in the VO. IVOA registries do not have functionalities to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. However, this would lose the relationships between SimDM classes, and hence the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not take full advantage of the SimDM XML serializations, even though most scientific use cases would require fine-grained searches in these serializations to efficiently discover protocols and projects of interest. This was a choice made for version 1.0 of SimDAL. Indeed, in the coming months/years we do not expect many IVOA theory services to be registered, so it should be easy for users to discover theoretical services with the SimDAL Repositories as presently defined. However, as more and more theoretical services are registered, finer-grained search will become necessary. SimDAL Repositories as defined in version 1.0, which store the full XML serializations of projects and protocols, contain all the information, and the standardized relationships between these pieces of information, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included; a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.
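For reference, the usual DALI way of reporting an error is a single INFO element named QUERY_STATUS with value ERROR inside the results RESOURCE, whose text is meant to be shown to users. A minimal sketch, assuming VOTable 1.3 and leaving out any SimDAL-specific content:

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def dali_error_votable(message: str) -> str:
    """Build a minimal DALI-style error document.

    The error is reported by an INFO element named QUERY_STATUS with
    value="ERROR" inside the results RESOURCE; its text is a message
    suitable for display to users.
    """
    ET.register_namespace("", VOTABLE_NS)  # serialize without an ns0: prefix
    root = ET.Element(f"{{{VOTABLE_NS}}}VOTABLE", {"version": "1.3"})
    resource = ET.SubElement(root, f"{{{VOTABLE_NS}}}RESOURCE", {"type": "results"})
    info = ET.SubElement(
        resource, f"{{{VOTABLE_NS}}}INFO", {"name": "QUERY_STATUS", "value": "ERROR"}
    )
    info.text = message
    return ET.tostring(root, encoding="unicode")
```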

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive them properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the Sesto InterOp in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as TCG chair) asked for a validator, but said that a client compatible with the reference implementations would serve as a validator. So a client, rather than a simple validator, has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order of the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before doing a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copy-pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

Added:
>
>
(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent)
  Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of end points. You're defining about as many end point types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, xpath or xquery or whatever?

In short, I believe you should split up this document into three pieces, each of which would work out to be more handleable.

Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.

  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).

  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).

  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.

  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?

  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.

  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says: "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: Does the table in the error case have a name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?

  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.

  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.

  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.

  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).

  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.

  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).

  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".

  • In 4.1, I guess I'd rather start with "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused about the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) later said under "Parameter".

  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?

  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).

  • In 4.2, is the project parameter mandatory?

  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".

  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to be repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)

  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.

  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).

  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to). There are just too many flavours of JSON out there.

  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?

  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".

  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist -- where does the SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?

  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
-- MarkusDemleitner - 2016-09-13
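For the section 3.7 timestamp item in the list above, here is a minimal sketch of a validator. It assumes the base form YYYY-MM-DDThh:mm:ss (the exact wording of section 3.7 is not quoted here), optional fractional seconds, an optional trailing Z, and that timestamps are in UTC:

```python
import re
from datetime import datetime, timezone

# Assumed format: YYYY-MM-DDThh:mm:ss, optionally followed by a decimal
# point and fractional seconds, and optionally by a trailing "Z".
TIMESTAMP_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(\.\d+)?Z?")


def parse_timestamp(text: str) -> datetime:
    """Parse a timestamp in the assumed format, interpreting it as UTC."""
    match = TIMESTAMP_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid timestamp: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    fraction = match.group(7)
    # datetime supports microseconds only, so extra digits are truncated.
    micro = int((fraction[1:] + "000000")[:6]) if fraction else 0
    # datetime() itself rejects out-of-range values such as month 13.
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=timezone.utc)
```

Both "2016-09-13T12:30:00" and "2016-09-13T12:30:00.25Z" are accepted; a space instead of "T", or a numeric UTC offset, is rejected.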

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( Matthew Graham, Pat Dowler )

Applications Working Group ( Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( Markus Demleitner, Theresa Dower )

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.
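The objection above can be checked mechanically. RFC 3986 does not allow unencoded "{" or "}" anywhere in a URI. Below is a rough Python sketch of such a check. The function name is hypothetical. It only tests the ivo:// prefix and the character repertoire, not full URI syntax. The brace-free form `#views-1.0` used in the test is an assumption about what was intended.

```python
import re

# Characters RFC 3986 permits literally in a URI (unreserved, gen-delims,
# sub-delims), or a percent-encoded octet; "{" and "}" are not among them.
_URI_CHARS = re.compile(r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")


def is_plausible_standard_id(uri: str) -> bool:
    """Rough check: ivo:// prefix and only characters legal in a URI."""
    return uri.startswith("ivo://") and _URI_CHARS.fullmatch(uri) is not None
```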

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.
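For context, here is roughly what a client must do when a service does paginate its results through next_page links. Without a way to discover pagination support up front, the client cannot know whether a single response is complete. Below is a minimal Python sketch under stated assumptions. `fetch` is a hypothetical callable that returns each page as a dict with a `next_page` entry, which is None on the last page. A real client would extract that link from the VOTable response instead.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iterate_pages(first_url: str, fetch: Callable[[str], Page]) -> Iterator[Page]:
    """Yield successive result pages by following next_page links.

    fetch is a hypothetical callable that retrieves and parses one page.
    The set of visited URLs guards against a service that links back to an
    earlier page, which would otherwise make the loop run forever.
    """
    visited = set()
    url: Optional[str] = first_url
    while url is not None:
        if url in visited:
            raise RuntimeError(f"pagination cycle detected at {url}")
        visited.add(url)
        page = fetch(url)
        yield page
        url = page.get("next_page")
```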

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).

In the VO Registry, the really complex point is the propagation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling,...) would strengthen our confidence in this part of the standard.

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

Standards and Processes Committee ( Françoise Genova)



Revision 9 - 2016-09-13 - MarkusDemleitner

 

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "simdal search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and registries.

 
  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.

Answer: Indeed. This has been corrected.

 
  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.

Answer: This would be useful. An implementation note was written for the Simulation Data Model; it presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard is accepted, we plan to write an Implementation Note presenting how to use SimDAL to publish different categories of simulations / numerical models.

 
  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice

Answer: Thank you. This has been corrected.


Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP/TAP schemas. TAP has not been chosen as a solution because it does not fulfill the requirements of Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing simulated objects (more than 100 000 in one of the SimDAL implementations), and these numbers are growing with progress in numerical models.

Using TAP would require the properties to be table columns in a relational database, since TAP is tightly coupled to SQL and hence to relational databases. This is simply not possible for the majority of the RDBMS currently in use (PostgreSQL, MySQL), which cannot manage tables with that many columns. High-dimensional data and their use are much better served by other types of storage architectures, which publishers cannot use with TAP (or only with great difficulty) when they do not have an SQL compatibility layer/adapter.

If the definition of SimDAL has taken so long, it is because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems / storage architectures. The conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer, depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not be lost.

Concerning the SimDAL Repository part:
First, note that SimDAL components (among them the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model (SimDM). So it is only with SimDAL Repositories that a search for resources can be done using the SimDM semantics. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes that are published in the VO. IVOA registries do not have functionalities to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. Nevertheless, this would lose the relationships between SimDM classes, and so lose the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not let clients fully benefit from the SimDM XML serializations, even though most scientific use cases would require fine-grained search in these serializations to efficiently discover protocols and projects of interest. This was a deliberate choice for version 1.0 of SimDAL: in the coming months/years we do not expect many registered IVOA theory services, so it should be easy for users to discover theoretical services with the SimDAL Repositories as presently defined. When more and more theoretical services are registered, finer-grained search will become necessary. Since SimDAL Repositories as defined in version 1.0 store the full XML serializations of projects and protocols, they contain all the information, and the standardized relationships between pieces of information, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included; a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive them properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the Sesto InterOp in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations would serve as a validator. So a client, rather than a simple validator, has been developed. It is compatible with both reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order in the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before doing a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09
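For reference, the DALI convention mentioned in the Section 3.2 comment above puts a single INFO element with name="QUERY_STATUS" and value="ERROR" in the results RESOURCE, with a human-readable message as its content. Below is a minimal Python sketch that builds such a document with the standard library. The function name is hypothetical. The VOTable 1.3 namespace is an assumption about the version in use. The sketch does not cover the multiple-error table discussed in the comments.

```python
import xml.etree.ElementTree as ET

# VOTable 1.3 namespace (assumed version for this sketch).
VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def error_votable(message: str) -> str:
    """Return a DALI-style error document: one QUERY_STATUS INFO in the results RESOURCE."""
    ET.register_namespace("", VOTABLE_NS)
    votable = ET.Element(f"{{{VOTABLE_NS}}}VOTABLE", version="1.3")
    resource = ET.SubElement(votable, f"{{{VOTABLE_NS}}}RESOURCE", type="results")
    info = ET.SubElement(resource, f"{{{VOTABLE_NS}}}INFO",
                         name="QUERY_STATUS", value="ERROR")
    # The INFO content should be a message suitable for display to the user.
    info.text = message
    return ET.tostring(votable, encoding="unicode")
```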


Comments from Markus Demleitner

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of end points. You're defining about as many end point types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too. Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, xpath or xquery or whatever?

In short, I believe you should split up this document into three pieces, each of which would work out to be more handleable.

Individual issues:

  • I couldn't find the document source, so I couldn't fix a number of typos and editorial glitches (e.g., "SimDAL as then" -> "SimDAL and the", two instances of "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" in the Introduction, "fine grain" -> "fine-grained" throughout). If you tell me where the source is, I'd volunteer for another round of proofreading.

  • sect. 2.1 ends with a pointer to use cases in Appendix A. The text continues with "use cases" in 2.2. It would help the comprehension of the document if the reason for this distribution of use cases were made clear (actually, I think 2.2 could be re-formulated a bit so they actually become requirements rather than use cases).

  • I think the document would profit from a bit of de-duplication (e.g., the affirmation that "only a few data centers" would implement a repository is made at least twice; the different URI forms for-id/3 vs. views?id=3).

  • Starting p. 8, there are references to "UML classes". I don't think the "UML" should be there. Perhaps just "classes" is enough, or one needs a different terminology. That (initial) modelling has been done in UML is, I think, of no import for this specification, and indeed I would hope that future versions of SimDM will come in VO-DML.

  • On p. 8, there's "ex star, cloud, halo" -- I'd much rather see "e.g.," than "ex"; in general, I think it would be better if the term "object" could be avoided here (if it indeed refers to "astronomical object"). Does SimDM perhaps already offer precise terms for what's meant here?

  • On p. 9, the "pivot format" (incidentally, I'm not sure I understand why it is called "pivot" -- perhaps a brief explanation could help?) is defined as consisting of several files which are given as what looks like file names. It is not clear to me whether these file names are part of the standard, and if so, how multiple experiment files are to be stored under one name. If, as I suppose, these are generic identifiers for "sub-formats", I think you shouldn't use file name-like names for these but instead use more format-like names for them. But perhaps all that should rather be part of SimDM.

  • On p. 10, I think "with a list of couple (error messages, error code)" should be "containing rows consisting of a string-valued column error_msg and an integer-valued column error_code." or so. Also, DALI says: "The content of the INFO element conveying the status should be a message suitable for display to the user describing the status." Of course, this cannot convey multiple error messages, but for improved compatibility with DALI I think you should keep the INFO text as something immediately displayable to a user. Also: Does the table in the error case have a name results as well or does it not? If it does, then perhaps the text currently in the paragraph "Result" on p. 11 should be an introductory paragraph to 3.2?

  • In 3.3, you define the "links" table; this is pretty much a stripped-down datalink table -- why do you not simply use datalink itself? It would have the advantage that client authors might already have code to parse and display datalink tables, and they'll curse you if they have to unnecessarily write some glue code just to shoehorn your table into their datalink data structures. Getting your SimDAL-specific terms into the Datalink vocabulary should not be a big deal.

  • I have a heartfelt dislike for your "foreign-key" GROUP in 3.3. My preference would be to just fix the column name(s) in results and links and be done with it. If, on the other hand, you want to establish a general mechanism for declaring foreign key relationships, don't do it here, do it in VOTable or in the VO-DML mapping document. We should do this properly; if every standard starts to ad-hoc this kind of annotation, VOTable will become an unimplementable, contradictory mess.

  • In general, after reading 3.3, I'd not be sure what I'm supposed to return in results. ident, yes. created? MUST? SHOULD? Just as an example? And as a client, what am I supposed to do with the results table? Just display it as an opaque table? You have a couple of words on "general" return fields quite a bit later; perhaps the document would profit if you pulled that part up a bit or at least referenced that from here.

  • In 3.4, I think you should give some explicit guidance as to what to say when a next_page/previous_page link has expired, be it because the query result was cached somewhere, be it because the underlying result has changed. In that same vein, you might consider recommending that services communicate an (estimated) validity span of the pagination link (see, e.g., OAI-PMH for how they did that).

  • Still in 3.4, I'd say there's not enough value in letting clients specify the page size to justify the complication in implementation. Let the service decide on the page size and trust that it's not so large as to overwhelm the client. Pagination is hard enough to get actually (!) right even without extra tricks.

  • In 3.7, you currently say "eventually followed by a decimal point and fractions of seconds"; I think you intend this to be "optionally followed", right? If not, I'd be severely concerned. In this context I think you should allow an optional "Z" at the end for compliance with other timestamp formats in the VO (ideally, just reference DALI 1.1 here).

  • In 4.1, you are claiming {search} were "search for concepts" -- as far as I can make out, this is just a full-text search. If so, I'd say just say so: "perform full-text searches".

  • In 4.1, I guess I'd rather start with a "formal" definition of the query parameters and then go on with all the explanation. I was a bit confused by the talk about q and att. (And I'd then remove the "Note" about att, too, as it only repeats what's (now) later said under "Parameter".)

  • In 4.1, you say that without a document schema "it is up to the client to understand what the attributes are and what they mean." I think that's misleading. The client simply has no way to figure out what attributes there are, no? Wouldn't "the metadata schema has to be communicated by non-standard means" or something like that be more appropriate?

  • In 4.1 and following, I think you should give the VOTable types you expect in the response schemas ("text" is fine, since I don't think you should mandate arraysize="*" on char fields; however, if, e.g., created is a timestamp, I think you should mandate the corresponding xtype).

  • In 4.2, is the project parameter mandatory?

  • In 4.3, the query example uses "projects" as the parameter name, whereas the defined parameter name is "project".

  • Which brings me to a general point: I think SimDAL should say somewhere whether its parameters are supposed to be repeatable (i.e.: can I pass multiple "project" parameters to, say, ?)

  • In 5, you say, at various places "These views can be seen as ASCII tab separated files.", "That is what would be done when performing a SQL query on a single flat table", "This server-side file is abstracted, in a VO context, as a VOTable.", "It aims at untying the standard and the implementation details." So -- I have to say I'm fairly confused. From what I can fathom from this, you're saying the underlying data structure is a relation, and a view is a projection of a subset of that relation? Whatever it is, I think you should define the basic data structure without reference to any specific serialisation, just in terms of the underlying mathematical model, exactly to untie model and implementation.

  • On p. 26, you give a VOTable group to declare foreign keys, which is fairly related to the foreign-key from 3.3, but has some additional PARAMs, but doesn't have a name and doesn't use ref. I appreciate that the use case is a bit different here, but couldn't there be one common mechanism for "foreign-key-like relationship between entities declared in some VOTable"? Sure, this might make the 3.3 GROUP a bit clumsier, and perhaps the typedness of the FIELDref is lost, but I'd consider this a small price to pay for at least internal consistency of SimDAL (of course, I'm still all for trying to do without some generic foreign-key mechanism defined in a DAL standard; let's have that in VOTable).

  • On p. 26, "Query Language", you should reference the concrete JSON standard people should implement against (or reference some Javascript specification and say which nonterminal your dictionaries should conform to). There are just too many flavours of JSON out there.

  • The {fields} end point apparently uses a REQUEST parameter and is polymorphic on it (REQUEST=search has a q parameter, REQUEST=schema has a field parameter). Isn't that a bit at odds with the rest of the design, where you have different endpoints for different functionalities? Why don't you split up these two functionalities into two endpoints (or, conversely, join a few other endpoints and use REQUEST to dispatch between different sub-functions; obviously, that's not my preference)?

  • In 6.3., a reader might suppose the first job creation request already returns the UWS document with the results element filled out. I'd suggest putting an "eventually" or something like this into "It returns a UWS resource".

  • p. 41f, "UWS extension" stipulates that SimDAL "differs" from UWS in two points. First, there's no joblist -- where does the SimDAL say that? If the sentence itself is the norm, this should be made much clearer. Also, I don't think there's a necessity to even outlaw it, since I'd expect most people would use off-the-shelf UWS components anyway that have their own ways of dealing with the "security" issues you cite. The second point, the use of JSON as a JCL, I cannot see as a difference from UWS, which does not specify the JCL in the first place. Which stipulation of UWS do you see violated there?

  • Finally, even in a technical text, the pervasive use of male-only forms reads a bit odd and cranky these days. Just use plural forms and don't worry about it ("...the final user can get a hint about if he is asking for too many..." -> "users can get a hint whether they are asking for too many...").
-- MarkusDemleitner - 2016-09-13
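The 3.7 comment above amounts to a simple grammar: date, "T", time, optional fractional seconds, optional trailing "Z". Below is a minimal Python sketch of a parser for that reading. The exact grammar is an assumption drawn from the comment. The normative form would be whatever SimDAL, or DALI 1.1 if referenced, finally specifies. A "Z" suffix is treated as UTC, and the function returns a naive datetime.

```python
import re
from datetime import datetime

# Date and time, then optional fractional seconds, then an optional "Z" (UTC).
_TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z)?")


def parse_timestamp(value: str) -> datetime:
    """Parse YYYY-MM-DDThh:mm:ss[.fff...][Z] into a naive datetime (UTC assumed)."""
    match = _TIMESTAMP.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid timestamp: {value!r}")
    # strptime also rejects impossible values such as month 13.
    result = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    fraction = match.group(2)
    if fraction:
        # datetime only holds microseconds: pad or truncate to six digits.
        result = result.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    return result
```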
 

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( _Matthew Graham, Pat Dowler )

Applications Working Group ( _Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( _Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( _Markus Demleitner, Theresa Dower )

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document. Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that # is a globally unique identifier (albeit one for which only the publisher part properly resolves, but that's fine).

In the VO Registry, the really complex point is the propagation of information from the publishers to the searchable registries. This problem must surely exist in the proposed SimDAL system, as the number of publishers is apparently expected to be much larger than the number of repositories. Some indication of how the initial metadata transfer and subsequent updates should be performed (file format, transfer modalities, signaling,...) would strengthen our confidence in this part of the standard.

 

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

Standards and Processes Committee ( Françoise Genova)



Revision 8 - 2016-08-27 - FranckLePetit

 

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "simdal search" services can also be found using the Registries. This should be clarified.

Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and registries.

 
  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.
Answer: Indeed. This has been corrected.

 
  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.
Answer: This would be useful. An implementation note was written for the Simulation Data Model; it presents how to map the DM onto different kinds of simulations. We plan to do the same for SimDAL: once the standard is accepted, we plan to write an Implementation Note presenting how to use SimDAL to publish different categories of simulations / numerical models.

 
  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice
Answer: Thank you. This has been corrected.


 

Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP/TAP schemas. TAP has not been chosen as a solution because it does not fulfill the requirements of Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing simulated objects (more than 100 000 in one of the SimDAL implementations), and these numbers are growing with progress in numerical models.

Using TAP would require storing these properties as table columns in a relational database, because TAP is tightly coupled to SQL and hence to the relational model. This is simply not possible with the majority of the RDBMSs currently in use (e.g. PostgreSQL, MySQL), which cannot manage such high-dimensional data (i.e. that many table columns). High-dimensional data are much better served by other types of storage architecture, which publishers cannot use with TAP (or only with great difficulty) unless an SQL compatibility layer or adapter is available.

Note that if the definition of SimDAL has taken so long, it is because many technological solutions were tested (and implemented) before reaching the present proposal. TAP, in particular, was tested on various data management systems and storage architectures, and the conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not be lost.
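To make the decoupling argument concrete, here is a minimal Python sketch, assuming a simple in-memory store. It is not the SimDAL API: `TripleStore`, `View` and the `abundance_*` property names are hypothetical. Properties are kept as (object, property, value) entries, so their number is not bounded by a table's column limit (PostgreSQL, for instance, caps a table at 1600 columns), and a "view" declares its columns independently of how the publisher stores the data.

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical storage layer: one entry per (object_id, property, value),
    so adding a property never requires adding a table column."""

    def __init__(self):
        self._rows = defaultdict(dict)  # object_id -> {property: value}

    def put(self, object_id, prop, value):
        self._rows[object_id][prop] = value

    def objects(self):
        return self._rows.items()


class View:
    """Hypothetical virtual table: its column list (the view schema) is
    declared separately from the storage technology underneath."""

    def __init__(self, store, columns):
        self.store = store
        self.columns = list(columns)

    def rows(self):
        # Materialize only the requested columns; missing values become None.
        for object_id, props in sorted(self.store.objects()):
            yield (object_id, *(props.get(c) for c in self.columns))


store = TripleStore()
# Illustrative only: far more properties than an RDBMS table could hold as columns.
for i in range(100_001):
    store.put("cloud-1", f"abundance_{i}", float(i))
store.put("cloud-2", "abundance_7", 3.5)

view = View(store, ["abundance_7", "abundance_100000"])
print(list(view.rows()))  # [('cloud-1', 7.0, 100000.0), ('cloud-2', 3.5, None)]
```

A publisher could swap `TripleStore` for any other backend (HDF5 files, a key-value database, ...) without changing what clients see through the view, which is the point of benefit 1 above.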

Concerning the SimDAL Repository part:
First, note that SimDAL components (including the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model (SimDM), so only SimDAL Repositories allow a search for resources using the SimDM semantics. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes published in the VO. IVOA registries have no functionality to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. However, this would lose the relationships between SimDM classes, and hence the hierarchy of the model and part of the SimDM semantics.
At present, the SimDAL Repository search API does not take full advantage of the SimDM XML serializations, even though most scientific use cases would require fine-grained search in these serializations to discover protocols and projects of interest efficiently. This was a deliberate choice for version 1.0 of SimDAL: in the coming months and years we do not expect many theory services to be registered in the IVOA, so users should be able to discover theoretical services easily with SimDAL Repositories as currently defined. As more and more theoretical services are registered, finer-grained search will become necessary. Because SimDAL Repositories as defined in version 1.0 store the full XML serializations of projects and protocols, they contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.
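The following sketch illustrates why keeping the full hierarchical serializations matters for fine-grained search. It is a toy example under stated assumptions: the element and attribute names (`project`, `protocol`, `algorithm`, `type`) and the `ivo://example/...` identifiers are hypothetical and do not follow the real SimDM XML schema. The search relies on the protocol/algorithm nesting, which a flattened registry record would not preserve.

```python
import xml.etree.ElementTree as ET

# Toy stand-ins for stored serializations; names are hypothetical, NOT real SimDM.
SERIALIZATIONS = {
    "ivo://example/project/1": """
        <project name="ISM clouds">
          <protocol name="PDR-code">
            <algorithm type="radiative transfer"/>
            <algorithm type="chemistry network"/>
          </protocol>
        </project>""",
    "ivo://example/project/2": """
        <project name="Galaxy mergers">
          <protocol name="Nbody-code">
            <algorithm type="tree gravity"/>
          </protocol>
        </project>""",
}


def find_projects_using(algorithm_type):
    """Return IDs of projects whose protocol declares the given algorithm type."""
    hits = []
    for ident, text in SERIALIZATIONS.items():
        root = ET.fromstring(text.strip())
        # Walk the hierarchy: project -> protocol -> algorithm.
        for alg in root.iterfind("./protocol/algorithm"):
            if alg.get("type") == algorithm_type:
                hits.append(ident)
                break
    return hits


print(find_projects_using("chemistry network"))  # ['ivo://example/project/1']
```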

  • Section 1.1: Only a specimen IVOA architecture diagram is included, a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.
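For reference, a minimal Python sketch of the DALI-style error document described in the comment above, as we understand the convention: a RESOURCE with `type="results"` holding a single INFO element with `name="QUERY_STATUS"` and `value="ERROR"`, the error text being the INFO content rather than rows in a TABLE. The function name and the sample messages are hypothetical; joining several messages into one multi-line text is an assumption of this sketch, not a SimDAL rule.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def dali_error_votable(message):
    """Build an error VOTable with a single QUERY_STATUS=ERROR INFO element."""
    ET.register_namespace("", VOTABLE_NS)  # serialize VOTable as the default namespace
    votable = ET.Element(f"{{{VOTABLE_NS}}}VOTABLE", version="1.3")
    resource = ET.SubElement(votable, f"{{{VOTABLE_NS}}}RESOURCE", type="results")
    info = ET.SubElement(resource, f"{{{VOTABLE_NS}}}INFO",
                         name="QUERY_STATUS", value="ERROR")
    # The error text is the INFO content; no TABLE of (error_msg, error_code) rows.
    info.text = message
    return ET.tostring(votable, encoding="unicode")


print(dali_error_votable("Unknown view: abundances\nInvalid limit: -1"))
```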

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.
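Both issues in this comment are easy to check mechanically. The sketch below, a hypothetical helper rather than part of any existing validator, flags element names that are not upper-case (VOTable element names such as VOTABLE, RESOURCE, TABLE and FIELD are upper-case) and lists each FIELD's `name` and `ID` attributes, which are distinct and should not be conflated in the text. The sample document and the `simulation_id`/`snapshot` names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    # Strip a "{namespace}" prefix if present.
    return tag.rsplit("}", 1)[-1]


def lowercase_elements(text):
    """Element names that are not upper-case, in document order."""
    return [_local(e.tag) for e in ET.fromstring(text).iter()
            if _local(e.tag) != _local(e.tag).upper()]


def field_keys(text):
    """(name, ID) for each FIELD element; ID is None when only a name is given."""
    return [(e.get("name"), e.get("ID")) for e in ET.fromstring(text).iter()
            if _local(e.tag) == "FIELD"]


SAMPLE = """<VOTABLE version="1.3"><RESOURCE><TABLE>
<FIELD name="simulation_id" datatype="char" arraysize="*"/>
<field name="snapshot" datatype="int"/>
</TABLE></RESOURCE></VOTABLE>"""

print(lowercase_elements(SAMPLE))  # ['field']
print(field_keys(SAMPLE))          # [('simulation_id', None)]
```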

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive it properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the Sesto InterOp in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations would serve as one. Therefore a client, rather than a simple validator, has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order of the top menu: Search in the Repository, then do a SimDAL Search, and finally search in Access Data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before running a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.
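The three-step chaining described above can be sketched as follows. This is a hypothetical illustration, not the code of the SimDAL client: the three functions are stubs that return canned answers instead of issuing HTTP requests, and all `example.org` URLs are placeholders. It only shows how the URIs returned by one step feed the next, which is what the copy-paste in the client does by hand.

```python
def repository_search(repository_uri, keyword):
    """Step 1 (stub): query a SimDAL Repository; returns SimDAL Search URIs.
    repository_uri is unused here; a real client would send the query to it."""
    catalogue = {"PDR": ["https://example.org/simdal/search/pdr"]}
    return catalogue.get(keyword, [])


def simdal_search(search_uri, constraint):
    """Step 2 (stub): query a SimDAL Search service; returns Data Access URIs.
    The constraint is ignored by this stub."""
    return [search_uri.replace("/search/", "/access/") + "?id=42"]


def data_access(access_uri):
    """Step 3 (stub): retrieve data from a SimDAL Data Access service."""
    return {"source": access_uri, "payload": b"..."}


def run_pipeline(repository_uri, keyword, constraint):
    """Chain the steps: each step's output URIs are the next step's inputs."""
    results = []
    for search_uri in repository_search(repository_uri, keyword):
        for access_uri in simdal_search(search_uri, constraint):
            results.append(data_access(access_uri))
    return results


print(run_pipeline("https://example.org/simdal/repository", "PDR", {"Av": 10}))
```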

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( _Matthew Graham, Pat Dowler )

Applications Working Group ( _Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( _Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( _Markus Demleitner, Theresa Dower )

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

Standards and Processes Committee ( Françoise Genova)



Revision 7 - 2016-08-11 - EnriqueSolano

 
META TOPICPARENT name="TemplateRFC"

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22


Comments from Enrique Solano

Added:
>
>

  • "First SimDAL Repositories store codes and theoretical projects descriptions. They can be used by clients to discover theoretical services"
    • As it is written now, it seems to me that SimDAL Repositories are the only way to discover theoretical services. This is not true: at least the "simdal search" services can also be found using the Registries. This should be clarified.
  • "Finally, SimDAL Data Access services are dedicated to retrieve raw data."
    • Only raw data? This is not true. I would remove "raw" from the sentence.
  • The inclusion of an Appendix describing some implementations and showing how these services work in real life would be more than desirable. This was done with SSAP and it was very useful.
  • A typo: In the Introduction, the sentence "It is a fine grain registry for numerical codes and simulations in the Virtual Observatory" is repeated twice

 

Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP and TAP schemas. TAP was not chosen because it does not fulfill the requirements of the theory community. Theoretical services will publish very different kinds of numerical models and simulations (N-body, SPH and MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing the simulated objects (more than 100,000 in one of the SimDAL implementations), and these numbers keep growing as numerical models progress.

Using TAP would require storing these properties as table columns in a relational database, because TAP is tightly coupled to SQL and hence to the relational model. This is simply not possible with the majority of the RDBMSs currently in use (e.g. PostgreSQL, MySQL), which cannot manage such high-dimensional data (i.e. that many table columns). High-dimensional data are much better served by other types of storage architecture, which publishers cannot use with TAP (or only with great difficulty) unless an SQL compatibility layer or adapter is available.

Note that if the definition of SimDAL has taken so long, it is because many technological solutions were tested (and implemented) before reaching the present proposal. TAP, in particular, was tested on various data management systems and storage architectures, and the conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not be lost.

Concerning the SimDAL Repository part:
First, note that SimDAL components (including the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model (SimDM), so only SimDAL Repositories allow a search for resources using the SimDM semantics. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes published in the VO. IVOA registries have no functionality to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. However, this would lose the relationships between SimDM classes, and hence the hierarchy of the model and part of the SimDM semantics.
At present, the SimDAL Repository search API does not take full advantage of the SimDM XML serializations, even though most scientific use cases would require fine-grained search in these serializations to discover protocols and projects of interest efficiently. This was a deliberate choice for version 1.0 of SimDAL: in the coming months and years we do not expect many theory services to be registered in the IVOA, so users should be able to discover theoretical services easily with SimDAL Repositories as currently defined. As more and more theoretical services are registered, finer-grained search will become necessary. Because SimDAL Repositories as defined in version 1.0 store the full XML serializations of projects and protocols, they contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

Deleted:
<
<
    • Answer to the answer: Well, ok. If you've found by experience that TAP is insufficient for SimDAL's requirements I believe you, and I'm well prepared also to believe that inventing some custom discovery/search API may be a better way to proceed than trying to generalise the more mainstream VO technologies to cope with SimDAL's specific requirements. However, a couple of comments on that:
      • One downside of this approach is that you don't benefit from the large amount of scrutiny and testing that have gone into existing TAP/Registry protocols, and this means that you may have to take extra care in specifying exactly how the APIs you're defining here are supposed to behave (this is one reason that validators are valuable tools, to check where that hasn't happened). One example that springs to mind: it looks like the intention of the JSON query language described in sec 5.1 is that constraints in the "where" list are ANDed together. But I don't see that written down explicitly (apologies if I've missed it), and I don't see any provision for OR logic. I also don't see discussion here of case sensitivity for field names. If things like this are not specified explicitly it may inhibit interoperability in implementations.
      • Your response here is not what's written down in Appendix B, which mentions other reasons for avoiding TAP. It would be useful if you summarised in Appendix B all the reasons that SimDAL decided on non-TAP/non-Registry solutions.
      • -- MarkTaylor - 2016-08-09
 
  • Section 1.1: Only a specimen IVOA architecture diagram is included, a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive it properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the Sesto InterOp in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations would serve as one. Therefore a client, rather than a simple validator, has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order of the top menu: Search in the Repository, then do a SimDAL Search, and finally search in Access Data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before running a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( _Matthew Graham, Pat Dowler )

Applications Working Group ( _Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( _Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( _Markus Demleitner, Theresa Dower )

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

Standards and Processes Committee ( Françoise Genova)



Revision 6 - 2016-08-09 - MarkTaylor

 
META TOPICPARENT name="TemplateRFC"

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:

The notion of views may present similarities with TAP and TAP schemas. TAP was not chosen because it does not fulfill the requirements of the theory community. Theoretical services will publish very different kinds of numerical models and simulations (N-body, SPH and MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing the simulated objects (more than 100,000 in one of the SimDAL implementations), and these numbers keep growing as numerical models progress.

Using TAP would require storing these properties as table columns in a relational database, because TAP is tightly coupled to SQL and hence to the relational model. This is simply not possible with the majority of the RDBMSs currently in use (e.g. PostgreSQL, MySQL), which cannot manage such high-dimensional data (i.e. that many table columns). High-dimensional data are much better served by other types of storage architecture, which publishers cannot use with TAP (or only with great difficulty) unless an SQL compatibility layer or adapter is available.

Note that if the definition of SimDAL has taken so long, it is because many technological solutions were tested (and implemented) before reaching the present proposal. TAP, in particular, was tested on various data management systems and storage architectures, and the conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology they prefer depending on the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not be lost.

Concerning the SimDAL Repository part:
First, note that SimDAL components (including the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model (SimDM), so only SimDAL Repositories allow a search for resources using the SimDM semantics. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes published in the VO. IVOA registries have no functionality to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. However, this would lose the relationships between SimDM classes, and hence the hierarchy of the model and part of the SimDM semantics.
At present, the SimDAL Repository search API does not take full advantage of the SimDM XML serializations, even though most scientific use cases would require fine-grained search in these serializations to discover protocols and projects of interest efficiently. This was a deliberate choice for version 1.0 of SimDAL: in the coming months and years we do not expect many theory services to be registered in the IVOA, so users should be able to discover theoretical services easily with SimDAL Repositories as currently defined. As more and more theoretical services are registered, finer-grained search will become necessary. Because SimDAL Repositories as defined in version 1.0 store the full XML serializations of projects and protocols, they contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of their Search API.

Added:
>
>
    • Answer to the answer: Well, ok. If you've found by experience that TAP is insufficient for SimDAL's requirements I believe you, and I'm well prepared also to believe that inventing some custom discovery/search API may be a better way to proceed than trying to generalise the more mainstream VO technologies to cope with SimDAL's specific requirements. However, a couple of comments on that:
      • One downside of this approach is that you don't benefit from the large amount of scrutiny and testing that have gone into existing TAP/Registry protocols, and this means that you may have to take extra care in specifying exactly how the APIs you're defining here are supposed to behave (this is one reason that validators are valuable tools, to check where that hasn't happened). One example that springs to mind: it looks like the intention of the JSON query language described in sec 5.1 is that constraints in the "where" list are ANDed together. But I don't see that written down explicitly (apologies if I've missed it), and I don't see any provision for OR logic. I also don't see discussion here of case sensitivity for field names. If things like this are not specified explicitly it may inhibit interoperability in implementations.
      • Your response here is not what's written down in Appendix B, which mentions other reasons for avoiding TAP. It would be useful if you summarised in Appendix B all the reasons that SimDAL decided on non-TAP/non-Registry solutions.
      • -- MarkTaylor - 2016-08-09
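The two ambiguities raised above (how "where" constraints combine, and whether field names are case sensitive) can be illustrated with a small evaluator. This is a hedged sketch only: the JSON shape (`field`, `op`, `value` keys) and the field names are hypothetical and are not the exact SimDAL 1.0 syntax of sec 5.1. The sketch makes both behaviours explicit: constraints are ANDed (an empty list matches everything), and case sensitivity is a parameter. It has no way to express OR, which mirrors the gap noted in the comment.

```python
import json
import operator

# Hypothetical query shape, NOT the exact SimDAL JSON syntax.
QUERY = json.loads("""
{"where": [
    {"field": "Density", "op": ">", "value": 1000},
    {"field": "temperature", "op": "<=", "value": 50}
]}
""")

OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}


def matches(row, where, case_sensitive=False):
    """True if the row satisfies every constraint (constraints are ANDed)."""
    if not case_sensitive:
        row = {k.lower(): v for k, v in row.items()}
    for c in where:
        name = c["field"] if case_sensitive else c["field"].lower()
        # An unknown field fails the constraint instead of raising an error.
        if name not in row or not OPS[c["op"]](row[name], c["value"]):
            return False
    return True


rows = [{"density": 5000, "temperature": 20},
        {"density": 5000, "temperature": 80},
        {"density": 10, "temperature": 20}]
print([matches(r, QUERY["where"]) for r in rows])  # [True, False, False]
```

With `case_sensitive=True`, no row matches because "Density" differs from "density": two implementations that silently choose differently would return different results for the same query, which is exactly the interoperability risk described above.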
 
  • Section 1.1: Only a specimen IVOA architecture diagram is included, a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive it properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the Sesto InterOp in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as chair of the TCG) asked for a validator but said that a client compatible with the reference implementations would serve as one. Therefore a client, rather than a simple validator, has been developed. It is compatible with the two reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order of the top menu: Search in the Repository, then do a SimDAL Search, and finally search in Access Data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before running a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( _Matthew Graham, Pat Dowler )

Applications Working Group ( _Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( _Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( _Markus Demleitner, Theresa Dower )

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

Standards and Processes Committee ( Françoise Genova)



Revision 5 - 2016-08-09 - FranckLePetit

 
META TOPICPARENT name="TemplateRFC"

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

Changed:
<
<
  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
  • Section 1.1: Only a specimen IVOA architecture diagram is included, a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
>
>
  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
Answer:
Deleted:
<
<
  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes.
  • Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive it properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
 
Added:
>
>
The notion of views has similarities with TAP and the TAP schema. TAP was not chosen as a solution because it does not fulfill the requirements for Theory. Theoretical services will publish very different kinds of numerical models and simulations (N-body / SPH / MHD simulations, asteroseismology models, radiative transfer codes, astrochemistry models, ...). Some of these theoretical results have a very large number of properties characterizing the simulated objects (more than 100 000 in one of the SimDAL implementations), and these numbers are growing as numerical models progress.

Using TAP would require storing these properties as table columns in a relational database, since TAP is strongly coupled to SQL and therefore to relational databases. However, the majority of the RDBMSs currently in use (PostgreSQL, MySQL) cannot manage such high-dimensional data (i.e. that many table columns). High-dimensional data are much better served by other types of storage architectures, which publishers cannot use with TAP (or only with great, unreasonable difficulty) when those architectures have no SQL compatibility layer or adapter.
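To make the column-count argument concrete, here is a small back-of-the-envelope sketch. It assumes, for illustration only, that each property would become one column of a single relational table, and it uses the per-table column limits documented by PostgreSQL (1600) and by MySQL's InnoDB engine (1017). These are upper bounds; the effective limit can be lower depending on column types. The helper names are hypothetical, and the 100 000 figure is the order of magnitude quoted above for one SimDAL implementation.

```python
import math

# Per-table column limits documented by the database vendors (upper bounds;
# the effective limit can be lower depending on column types and row size).
COLUMN_LIMITS = {
    "PostgreSQL": 1600,
    "MySQL (InnoDB)": 1017,
}

def fits_in_one_table(n_properties: int, column_limit: int) -> bool:
    """True if a one-column-per-property table stays within the column limit."""
    return n_properties <= column_limit

def tables_needed(n_properties: int, column_limit: int) -> int:
    """Minimum number of tables if the properties had to be split across tables."""
    return math.ceil(n_properties / column_limit)

n_properties = 100_000  # order of magnitude quoted for one SimDAL implementation
for engine, limit in COLUMN_LIMITS.items():
    print(f"{engine}: fits={fits_in_one_table(n_properties, limit)}, "
          f"tables needed={tables_needed(n_properties, limit)}")
```

With 100 000 properties neither engine can hold them in one table (the sketch reports 63 tables for PostgreSQL and 99 for InnoDB), so a one-column-per-property layout would have to be split across many tables.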

Note that if defining SimDAL has taken so long, it is because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems and storage architectures. The conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so publishers can choose the technology best suited to the particularities of their data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not be lost.
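The decoupling described in point 1 can be illustrated with a minimal sketch. Everything in it is hypothetical: `StorageBackend`, `View`, `RowStore` and `ColumnStore` are illustrative names, not part of the SimDAL specification, and the property names `model`, `density` and `temperature` are invented. The only point is that the same view schema (a virtual table) can be answered from two different storage layouts, so the client-facing interface does not depend on how the publisher stores the data.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Mapping

class StorageBackend(ABC):
    """Hypothetical interface a publisher implements for its own storage technology."""

    @abstractmethod
    def rows(self) -> Iterable[Mapping[str, object]]:
        """Yield one mapping (property name -> value) per simulated object."""

class RowStore(StorageBackend):
    """Stand-in for a record-oriented store: a list of dicts."""

    def __init__(self, records: List[Dict[str, object]]):
        self.records = records

    def rows(self):
        return iter(self.records)

class ColumnStore(StorageBackend):
    """Stand-in for a column-oriented store: one list of values per property."""

    def __init__(self, columns: Dict[str, List[object]]):
        self.columns = columns

    def rows(self):
        names = list(self.columns)
        for values in zip(*(self.columns[name] for name in names)):
            yield dict(zip(names, values))

class View:
    """Virtual table: a fixed view schema (exposed column names) over any backend."""

    def __init__(self, schema: List[str], backend: StorageBackend):
        self.schema = schema
        self.backend = backend

    def select(self, **constraints) -> List[Dict[str, object]]:
        """Return the view columns of every row matching all equality constraints."""
        return [
            {column: row.get(column) for column in self.schema}
            for row in self.backend.rows()
            if all(row.get(key) == value for key, value in constraints.items())
        ]

row_store = RowStore([
    {"model": "m1", "density": 1e3, "temperature": 10.0},
    {"model": "m2", "density": 1e4, "temperature": 50.0},
])
column_store = ColumnStore({
    "model": ["m1", "m2"],
    "density": [1e3, 1e4],
    "temperature": [10.0, 50.0],
})

schema = ["model", "temperature"]
results = [View(schema, backend).select(model="m2") for backend in (row_store, column_store)]
print(results)
```

Both backends yield `[{'model': 'm2', 'temperature': 50.0}]` for the same view query. A real publisher would implement the backend on top of whatever technology suits the data, ideally pushing constraints down to the store rather than scanning every row as this sketch does.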

Concerning the SimDAL Repository part:
First, note that SimDAL components (among them the SimDAL Repositories) are registered in the IVOA registries.
Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation Data Model (SimDM), so only SimDAL Repositories allow a search for resources using the SimDM semantics. Moreover, SimDAL Repositories are where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of the theoretical projects and codes published in the VO. IVOA registries have no functionality to store and query such serializations, whereas SimDAL Repositories do.
Discussions with Markus (for the Registry WG) showed that some parts of these serializations could be transformed and ingested into the IVOA registries. However, this would lose the relationships between SimDM classes, and hence the hierarchy of the model and part of the SimDM semantics.
Presently, the SimDAL Repository search API does not take full advantage of the SimDM XML serializations, even though most scientific use cases would require fine-grained search in these serializations to discover protocols and projects of interest efficiently. This was a deliberate choice for version 1.0 of SimDAL: in the coming months/years we do not expect many registered IVOA theory services, so users should easily be able to discover theoretical services with the SimDAL Repositories as presently defined. As more theoretical services are registered, however, finer-grained search will become necessary. Because SimDAL Repositories as defined in version 1.0 store the full XML serializations of projects and protocols, they contain all the information, and the standardized relationships within it, needed to answer these use cases. It will then be time to extend the capabilities of the Search API.

  • Section 1.1: Only a specimen IVOA architecture diagram is included, a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
Answer: Indeed. The diagram has been replaced. If a diagram with all the standards is required, it will be introduced in the corrected version of the document.

  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
Answer: Thank you. That has been corrected.
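For reference, here is a minimal sketch of the DALI convention the comment refers to, in which an error is reported as a single INFO element with name QUERY_STATUS and value ERROR inside a RESOURCE of type "results", with no TABLE. It reflects our reading of DALI, assumes the VOTable 1.3 namespace, and uses hypothetical function names; it does not reproduce the corrected SimDAL text, which is not shown on this page.

```python
import xml.etree.ElementTree as ET

# Assumption: VOTable 1.3 namespace.
VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"
ET.register_namespace("", VOTABLE_NS)  # serialize VOTable elements without a prefix

def q(tag: str) -> str:
    """Qualify a VOTable element name with its namespace."""
    return f"{{{VOTABLE_NS}}}{tag}"

def dali_error_votable(message: str) -> str:
    """Build a minimal error document: one QUERY_STATUS INFO in a results RESOURCE, no TABLE."""
    votable = ET.Element(q("VOTABLE"), version="1.3")
    resource = ET.SubElement(votable, q("RESOURCE"), type="results")
    info = ET.SubElement(resource, q("INFO"), name="QUERY_STATUS", value="ERROR")
    info.text = message
    return ET.tostring(votable, encoding="unicode")

def read_query_status(document: str):
    """Return (status, message) from the first QUERY_STATUS INFO, or None if there is none."""
    root = ET.fromstring(document)
    for info in root.iter(q("INFO")):
        if info.get("name") == "QUERY_STATUS":
            return info.get("value"), (info.text or "").strip()
    return None

doc = dali_error_votable("Unknown parameter: FOO")
print(doc)
print(read_query_status(doc))
```

A client can then test the QUERY_STATUS value before looking for a results table at all.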

  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes. Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
Answer: Thank you. Also corrected.
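Both corrections can be checked mechanically. The following is a hypothetical sketch of such a check, not an existing IVOA validator: it flags VOTable element names written in the wrong case (e.g. `field` instead of `FIELD`) and FIELD elements without a `name` attribute. The element list is a subset of VOTable 1.3 element names; `FIELDref` and `PARAMref` are legitimately mixed case, so a naive "must be upper case" test would be wrong. A complete check would validate the document against the VOTable XML schema.

```python
import xml.etree.ElementTree as ET

# Subset of VOTable 1.3 element names; note the two mixed-case ones.
VOTABLE_ELEMENTS = {
    "VOTABLE", "RESOURCE", "DESCRIPTION", "DEFINITIONS", "COOSYS", "INFO",
    "PARAM", "GROUP", "FIELD", "FIELDref", "PARAMref", "VALUES", "MIN", "MAX",
    "OPTION", "LINK", "TABLE", "DATA", "TABLEDATA", "TR", "TD",
    "BINARY", "BINARY2", "STREAM", "FITS",
}
# Map the lower-case spelling of each known element to its correct spelling.
CANONICAL = {name.lower(): name for name in VOTABLE_ELEMENTS}

def local_name(tag: str) -> str:
    """Strip an XML namespace, if any: '{uri}FIELD' -> 'FIELD'."""
    return tag.rsplit("}", 1)[-1]

def lint_votable(document: str) -> list:
    """Report wrongly-cased VOTable element names and FIELDs lacking a name attribute."""
    problems = []
    for elem in ET.fromstring(document).iter():
        tag = local_name(elem.tag)
        expected = CANONICAL.get(tag.lower())
        if expected is not None and tag != expected:
            problems.append(f"<{tag}> should be spelled <{expected}>")
        if tag == "FIELD" and "name" not in elem.attrib:
            problems.append(f"<FIELD> without a name attribute: {elem.attrib}")
    return problems

SAMPLE = (
    '<VOTABLE><RESOURCE><TABLE>'
    '<field name="a" datatype="int"/>'
    '<FIELD ID="b" datatype="int"/>'
    '<FIELDref ref="b"/>'
    '</TABLE></RESOURCE></VOTABLE>'
)
problems = lint_votable(SAMPLE)
for problem in problems:
    print(problem)
```

On this sample the sketch reports two problems: the lower-case `<field>` and the FIELD that has an ID but no name; the mixed-case `FIELDref` is accepted.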

  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive it properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.
Answer: At the Sesto InterOp in May 2015, when the procedure to finalize SimDAL was launched, Severin Gaudet (as TCG chair) asked for a validator, but said that a client compatible with the reference implementations would serve as a validator. So a client, rather than a simple validator, has been developed. It is compatible with both reference implementations.

We tested the client (https://app.ism.obspm.fr/simdal-client/) and it seems to work properly.
A few comments on its use:
1 - To search for simulations, follow the order of the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before running a {search} or requesting the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

 -- MarkTaylor - 2016-07-14
Added:
>
>
Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09
 

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( _Matthew Graham, Pat Dowler )

Applications Working Group ( _Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( _Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( _Markus Demleitner, Theresa Dower )

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

Standards and Processes Committee ( Françoise Genova)



Revision 4 - 2016-07-14 - MarkTaylor

 
META TOPICPARENT name="TemplateRFC"

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Added:
>
>
 

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Added:
>
>

Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.

  • This document departs from usual VO procedures in various ways, apparently reinventing the capabilities of TAP and the Registry for its own purposes. There is a rationale provided in Appendix B for avoiding use of TAP, which I'm not sure I find convincing, but I haven't gone into the requirements of simulation data access carefully enough to want to comment further on that.
  • Section 1.1: Only a specimen IVOA architecture diagram is included, a real one should be used. In view of the unusual content of this standard as I discussed above, there should be some more detailed discussion here of which IVOA standards this document uses, which ones it avoids in favour of its own ways of doing similar things, and why.
  • Section 3.2: The use of VOTable to encode errors here says it follows DALI, but in fact it looks different from the usual way that DALI-compliant services do it. The specification in this document encodes errors as a sequence of multiple (error_msg,error_code) pairs as rows within a TABLE, while DALI encodes an error as a single INFO element outside the TABLE element. I suspect this is a misunderstanding of DALI intention, but maybe it's deliberate because of the need to report sequences of errors rather than single ones. It should either be changed to match standard DALI practice, or if not it should be clear from the text that this is not DALI standard.
  • Section 4.2: "The response schema of the results table is (FIELD IDs):" but the following table has FIELDs with name attributes as listed rather than ID attributes.
  • Some of the VOTable samples use lower-case element names, which is not permitted in VOTable.
  • There are reference implementations listed, which is good. However, I don't see any validators. I played around a bit with the implementations (not really understanding how to drive it properly); quite a few links in the obspm implementation lead to error pages. Validation tools should be provided by this stage of the review process, and ought to help in identifying missing/broken functionality like that I currently see in the obspm implementation.

-- MarkTaylor - 2016-07-14

 

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( _Matthew Graham, Pat Dowler )

Applications Working Group ( _Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( _Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( _Markus Demleitner, Theresa Dower )

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

Standards and Processes Committee ( Françoise Genova)



Revision 3 - 2016-07-08 - FrancoisBonnarel

 
META TOPICPARENT name="TemplateRFC"

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Changed:
<
<

Comments from the IVOA Community and TCG members during RFC period: 2016-07-07 - 2016-08-22

>
>

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

 
Changed:
<
<

Comments from TCG members during the TCG Review Period: 2016-07-07 - 2016-08-22

>
>

Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

  WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( _Matthew Graham, Pat Dowler )

Applications Working Group ( _Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( _Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( _Markus Demleitner, Theresa Dower )

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

Standards and Processes Committee ( Françoise Genova)



Revision 2 - 2016-07-08 - FranckLePetit

 
META TOPICPARENT name="TemplateRFC"

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Changed:
<
<
  • [[http://....][XXXX] (yyyy)
>
>
Deleted:
<
<
 

Comments from the IVOA Community and TCG members during RFC period: 2016-07-07 - 2016-08-22

Comments from TCG members during the TCG Review Period: 2016-07-07 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( _Matthew Graham, Pat Dowler )

Applications Working Group ( _Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( _Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( _Markus Demleitner, Theresa Dower )

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

Standards and Processes Committee ( Françoise Genova)



Revision 1 - 2016-07-06 - FrancoisBonnarel

 
META TOPICPARENT name="TemplateRFC"

SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

  • [[http://....][XXXX] (yyyy)

Comments from the IVOA Community and TCG members during RFC period: 2016-07-07 - 2016-08-22

Comments from TCG members during the TCG Review Period: 2016-07-07 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard.

IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( _Matthew Graham, Pat Dowler )

Applications Working Group ( _Pierre Fernique, Tom Donaldson )

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( _Mark Cresitello-Dittmar, Laurent Michel )

Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( _Markus Demleitner, Theresa Dower )

Semantics Working Group ( _Mireille Louys, Alberto Accomazzi )

Education Interest Group ( _Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( _John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( _Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( _Carlos Rodrigo )

Standards and Processes Committee ( Françoise Genova)


 
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.