

Data Access Layer Interface

DALI is a base set of requirements and rules that all DAL services will follow. The goal is not to define what a service must do, but rather to specify various common service resources or operations so that if a service specification includes a common operation, it will do so by referencing the DALI specification. This will make it much easier for common service features to be defined once and implemented the same way wherever they are needed.


The latest draft is 20121014, but since it did not appear on the Documents page, here it is:

WD-DALI-1.0-20121014.pdf

Discussion of the 2012-02 draft

The draft is at http://www.ivoa.net/Documents/DALI/20120202/WD-DALI-1.0-20120202.pdf

Three points from Markus, 2012-04-17

Feel free to hack your comments into the text -- MarkusDemleitner - 17 Apr 2012

On 3.1.3 Multiple Values, 3.1.4 Qualifiers

Well, I'm still opposed to the whole idea of syntax in parameters. Here's why:

(1) Rich parameter syntax hasn't worked well for SSAP -- most services either don't interpret the syntax at all or at least not nearly consistently. Care to see how many support teff_min and teff_max rather than doing the slash syntax on Teff? Also, it's at least very hard to figure out what part of the "PQL" syntax a given parameter supports.

(2) Enumerations are a fairly rare special case. Many interesting values people want to query against are real values, and you'd much rather have ranges than enumerations. So, do ranges go into DALI as well?

(3) If we think enumerations are actually that valuable, they work fine by just repeating parameter names without any syntax at all, with simple HTTP quoting rules sufficing. [Btw I've not found a reference that said in HTTP URLs repeated parameters were equivalent to commas in values]

(4) If we still think we want enumerations, we need to provide quoting rules, i.e., you need to say how you'll encode the list of strings (python syntax) ["23,3", "this, and not something else"]. Welcome to escaping hell. Suggesting "ah, percent-encode the commas then" is, I think, inviting trouble since getting the decoding steps right will evolve to be a major challenge (and you'll have to percent-encode embedded percents, too).

(5) Suppose you still want syntax, I'm sure people will get confused by whitespace if these things are entered into UIs: If I write "folk, classical" in my form, and the application sends "folk,%20classical", what's supposed to happen?

(6) Defining a classifier syntax with some suggestion it might be used to specify coordinate systems in some syntax not defined any closer is inviting horrible confusion. People will be tempted to use this stuff for nothing or everything, and there will be lots of conflicting syntaxes that few, if any, clients and servers implement correctly. Writing a "common parser" for these values will be effectively impossible, and IMHO that's even less DALI stuff than enumerations.

(7) If you still believe we want qualifiers, at least provide clear syntax that allows (a) embedded semicolons in values and qualifiers and (b) that allows parsers to ignore what they don't understand and maybe give some structured representation of the qualifier(s) to higher levels.
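Points (3) and (4) above can be illustrated with a minimal Python sketch. The parameter name GENRE and the two helper functions are hypothetical and only serve the illustration; the point is that repeating the parameter name round-trips through ordinary URL quoting, while a comma-joined value cannot be split back naively.

```python
from urllib.parse import parse_qs, urlencode

# The example list from point (4); both strings contain commas.
VALUES = ["23,3", "this, and not something else"]


def roundtrip_repeated(values, name="GENRE"):
    """Send one name=value pair per list item and parse them back."""
    query = urlencode([(name, v) for v in values])
    return parse_qs(query)[name]


def roundtrip_comma_joined(values, name="GENRE"):
    """Join the items with commas into one value, then split naively."""
    query = urlencode({name: ",".join(values)})
    return parse_qs(query)[name][0].split(",")


if __name__ == "__main__":
    print(roundtrip_repeated(VALUES))      # the original two strings
    print(roundtrip_comma_joined(VALUES))  # four fragments, list is lost
```

With repeated names the server gets the two original strings back; with the comma-joined form it sees four fragments unless an extra escaping layer is defined and implemented identically on both sides.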

In sum: I'm sure we just should strike sections 3.1.3 and 3.1.4. Maybe a recommendation for parameter naming would be nice ("implement ranges by appending _min and _max to the parameter names, and interpret missing range limits when another one is present as open ranges"). I really don't see how we can sensibly say anything about declaring coordinate systems generically.
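A rough sketch of how a service might interpret the suggested _min/_max naming convention, assuming parameters have already been parsed into a plain dictionary of strings. The helper names get_range and in_range are made up for this example, and teff is borrowed from the SSAP remark above.

```python
from typing import Mapping, Optional, Tuple

Range = Tuple[Optional[float], Optional[float]]


def get_range(params: Mapping[str, str], name: str) -> Optional[Range]:
    """Build a (lower, upper) range from name_min and name_max.

    A missing limit is returned as None, meaning the range is open on
    that side. If neither limit is given, no constraint applies.
    """
    lower = params.get(name + "_min")
    upper = params.get(name + "_max")
    if lower is None and upper is None:
        return None
    return (
        float(lower) if lower is not None else None,
        float(upper) if upper is not None else None,
    )


def in_range(value: float, rng: Optional[Range]) -> bool:
    """Check a value against a possibly open range (limits inclusive)."""
    if rng is None:
        return True
    lower, upper = rng
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


if __name__ == "__main__":
    rng = get_range({"teff_min": "5000"}, "teff")
    print(rng, in_range(6000.0, rng), in_range(4000.0, rng))
```

No in-value syntax is needed here: each limit is an ordinary parameter, so the usual HTTP quoting rules are all a client has to know.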

If consensus cannot be reached on this, we need to (a) define sane escaping rules and (b) define some mechanism how clients and users can discover what kind of syntax is supported on a given parameter.

On 3.2.5 UPLOAD

If the spec remains as it is, we need a better specification of syntax and semantics. Since we're changing the definition from what TAP did anyway (TAP had a ; to separate pairs), I'd argue that's fair.

(1) We should make clear that tablename must be a simple, C-like identifier (rather than, e.g., a delimited identifier that's not ruled out in TAP). I'd then prefer whitespace to a comma to separate name from URI, but I'd not fight.

(2) We should state what happens if on an async service UPLOAD is re-posted -- do the new pairs get added or replaced?

(3) "if the service refuses to accept the entire table, it must respond with an error" -- should we make clear here that the error needs to become immediately visible on POSTing? In async services, that error might only become visible at execution time [e.g., in DaCHS, uploaded tables are temporary in the DB, i.e., upload and execution must take place within a single connection].

(4) The inline-upload solution with param: is not a joy to implement, and I'd contend it's not exactly pretty either. For one, you need to inspect all UPLOAD parameters to be able to identify a given MIME part as an upload (rather than a parameter). This distinction is important since usually, you will store parameters in the database, which is something you may not want to do with potentially large inline uploads. Plus, it's nice if you can process the request body in a stream, which isn't possible if you need to know the whole thing before deciding what to do with a parameter.

In the end, I'd much rather we said on UPLOAD something along the lines of:

UPLOAD -- a request to a service or POSTing to parameters may carry one or more uploads. In this case, the request body must be a multipart/form-data document. [Currently, this is only true with inline uploads; reference HTML REC, chapter 17 here]. The control name of a part containing an upload must always be UPLOAD. The part furthermore contains a header X-Upload-To: specifying the table identifier the upload should use [defined somewhere else to match [A-Za-z_][A-Za-z_0-9]*]. The table content is either defined in an X-Source-URI: header (in which case the part has empty content), or as the content of the part. In that case, the client MUST transfer a content-type. Concrete protocols SHOULD allow VOTable uploads with a MIME type of application/x-votable+xml (or whatever) and are encouraged to conclusively enumerate the allowed upload formats for interoperability.

This isn't well worked out; I'd provide better prose if people agree something like this is where we want to go.
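To make the proposal above a bit more concrete, here is a minimal sketch of how a server could pick uploads out of a multipart/form-data body using Python's standard email parser. The X-Upload-To and X-Source-URI headers are the ones proposed above, not part of any published specification; the function name read_uploads, the example boundary, and the example URI are assumptions for illustration. A real service would stream the body rather than hold it in memory.

```python
import re
from email import message_from_bytes

# Table identifier pattern as given in the proposal above.
TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def read_uploads(content_type: str, body: bytes):
    """Return (table, source_uri, mime_type, payload) for each upload part."""
    # The email parser expects the Content-Type header in front of the body.
    head = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    msg = message_from_bytes(head + body)
    if not msg.is_multipart():
        raise ValueError("request body is not a multipart document")
    uploads = []
    for part in msg.get_payload():
        # Only parts whose control name is UPLOAD are uploads.
        if part.get_param("name", header="content-disposition") != "UPLOAD":
            continue
        table = part.get("X-Upload-To", "").strip()
        if not TABLE_NAME.fullmatch(table):
            raise ValueError("bad table identifier: %r" % table)
        uri = part.get("X-Source-URI")
        if uri is not None:
            # Upload by reference: the part itself carries no content.
            uploads.append((table, uri.strip(), None, None))
        else:
            # Inline upload: the proposal requires a content type.
            if "Content-Type" not in part:
                raise ValueError("inline upload without content type")
            uploads.append((table, None, part.get_content_type(),
                            part.get_payload(decode=True)))
    return uploads


EXAMPLE_TYPE = "multipart/form-data; boundary=xyz"
EXAMPLE_BODY = (
    b"--xyz\r\n"
    b'Content-Disposition: form-data; name="UPLOAD"\r\n'
    b"X-Upload-To: t1\r\n"
    b"X-Source-URI: http://example.org/t1.xml\r\n"
    b"\r\n"
    b"\r\n"
    b"--xyz\r\n"
    b'Content-Disposition: form-data; name="UPLOAD"\r\n'
    b"X-Upload-To: t2\r\n"
    b"Content-Type: application/x-votable+xml\r\n"
    b"\r\n"
    b"<VOTABLE/>\r\n"
    b"--xyz--\r\n"
)

if __name__ == "__main__":
    for upload in read_uploads(EXAMPLE_TYPE, EXAMPLE_BODY):
        print(upload)
```

Because each part announces itself as an upload through its own control name and headers, the server can decide what to do with a part as soon as its headers arrive, without inspecting other parameters first.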

4.4.3 Additional Information

Unfortunately, the one thing on most scientists' minds is citations, citations, citations. To accommodate this, having VO clients automatically generate reference lists for data sets would be useful. SSAP can already do this on a row level; for more generic protocols, we won't reach row level. Still, we can do better than we currently are. I suggest to add in 4.4.3:

Services SHOULD include INFO elements with a name "source" as children of the top-level RESOURCE element. The value attribute is set to a bibcode that should be referenced when the data contained contributed to a published result. The INFO element's content may be a formatted reference. It is explicitly allowed to include more than one such INFO element, though services are advised to exercise moderation.

This information should be used by clients to allow the automatic generation of reference lists, usually by resolving the bibcodes at ADS.
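As a sketch of both sides of this suggestion, the following Python writes such an INFO element and collects the bibcodes again on the client. It ignores VOTable namespaces for brevity, the function names are invented for the example, and the bibcode is a made-up placeholder rather than a real reference.

```python
import xml.etree.ElementTree as ET

# Placeholder only; not a real bibcode.
PLACEHOLDER_BIBCODE = "2099Test..000....0X"


def add_source_info(resource, bibcode, reference=None):
    """Service side: attach an INFO name="source" child to a RESOURCE."""
    info = ET.SubElement(resource, "INFO", name="source", value=bibcode)
    if reference:
        # Optional formatted reference as the element content.
        info.text = reference
    return info


def collect_bibcodes(resource):
    """Client side: gather bibcodes from the INFO name="source" children."""
    return [
        info.get("value")
        for info in resource.findall("INFO")
        if info.get("name") == "source"
    ]


if __name__ == "__main__":
    resource = ET.Element("RESOURCE", type="results")
    add_source_info(resource, PLACEHOLDER_BIBCODE, "A formatted reference")
    print(ET.tostring(resource, encoding="unicode"))
    print(collect_bibcodes(resource))
```

A client could then hand the collected bibcodes to a resolver such as ADS to build the reference list.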

Discussion Topics for the Pune2011 interop

1. Standard resources: sync, async, availability, capabilities, tables

- these are in fact 5 capabilities, but in future we leave the actual names free (to be specified in the VOSI-capabilities + registry record)

TBD - value in fixing the resource names for the VOSI resources themselves and only leaving the 1+ DAL service resources free to be named?

2. Standard job parameters

REQUEST, VERSION, FORMAT, MAXREC, RUNID

3. rules for literal values

a. dates: one variant of ISO8601 only

b. ranges: does this actually belong in PQL?

c. lists: agreed to follow the HTTP spec, which says that multiple occurrences of a parameter and a single occurrence with a comma-separated list of values are equivalent

4. Can individual params (here or in PQL) define something that conflicts with 3? e.g. UPLOAD violates 3c

5. overflow in VOTable - as specified in TAP

6. UPLOAD: table upload (re: object-list query science use case)?

a. the UPLOAD param specified here (as in TAP)

b. how to reference the uploaded table(s) specified in PQL? TAP or ADQL? (next revision)

7. error responses

a. VOTable error document

b. when are HTTP status codes (and text/plain) appropriate?

Topic revision: r7 - 2012-10-21 - PatrickDowler
 