Registry Interface Working Draft Discussion Topic
How should we handle namespace prefixes in ADQL queries?
The RI specifies that we use ADQL to pass registry queries; in place of a column name, we use a restricted XPath notation to indicate which element in the schema we wish to search against. Strictly speaking, the elements in the XPath should be qualified with namespace prefixes. Since ADQL/x is encoded in XML, it is straightforward to define the prefixes within the XML query; however, this poses two problems:
- The prefixes appear within values (currently as the value of an attribute called xpathName). This means standard XML parsing software cannot be transparently used to resolve the prefixes to namespaces.
- The purpose of the restricted form of XPath is to eliminate the need to parse and interpret the components of an XPath. (See section 2.1.1 of the RI spec.) This makes it straightforward to convert ADQL to a non-XML database query (mainly SQL) through a simple, static lookup table. Allowing arbitrary prefixes breaks this feature.
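To make the second point concrete, here is a minimal sketch of the kind of static lookup an SQL implementation might use. The XPaths, the column names, and the function name are all hypothetical; none of them are taken from the RI spec. The point is only that the whole XPath string is used as a key, with no parsing.

```python
# Hypothetical lookup table: whole restricted XPath -> SQL column name.
# Neither the paths nor the column names come from the RI spec.
XPATH_TO_COLUMN = {
    "vr:title": "res_title",
    "vr:identifier": "ivo_id",
    "vr:curation/vr:publisher": "publisher",
}


def xpath_to_column(xpath_name: str) -> str:
    """Return the SQL column for an xpathName value by exact-match lookup."""
    try:
        return XPATH_TO_COLUMN[xpath_name]
    except KeyError:
        raise ValueError(f"unsupported XPath: {xpath_name!r}") from None


print(xpath_to_column("vr:curation/vr:publisher"))
```

A query that binds some other prefix to the same namespace (say "v:title" instead of "vr:title") would miss this table and raise ValueError, which is the breakage described above.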
I see three possible solutions (correct me if I've gotten any of this wrong):
- We go with the strict solution where prefixes are explicitly defined. All databases would have to substitute in their internally recognized prefixes on the fly.
If you use a Web Services toolkit like Axis or .Net, namespace prefix definitions are usually stripped away by the time the specific implementation gets hold of the query. To address this, we can alter the interface of the search() method to include an explicit argument that defines the namespace prefixes.
- Advantages:
- This is the most "correct" use of XPath.
- This is perhaps easier to support for XML databases as long as the prefix mappings can be captured in the conversion to XQuery.
- Disadvantages:
- This does require internally processing the XPath in most cases (for both XML and SQL DBs); however, once the mapping is known, a simple global substitution can accomplish this.
- Require the use of standard prefixes for standard VOResource schemas (e.g. vr for the VOResource core, vs for VODataService, etc.). This allows straightforward conversion to both SQL and XQuery, assuming an XQuery database uses the same prefixes.
- Advantages:
- This is still "correct" use of XPath.
- This is easier for XML databases to support, while SQL implementations can still do the simple XPath-to-column-name lookup that was originally envisioned.
- Disadvantages:
- XQuery databases that don't use the "standard" prefixes would have to convert; this would be a simple substitution.
- There is no mechanism for supporting namespaces for non-standard schemas.
- Require that no namespace prefixes be used. XQuery implementations would have to insert prefixes. This would most easily be done through a static look-up on the entire XPath (just like an SQL implementation would do).
- Advantages:
- This is equally easy/difficult for both XML & SQL databases to support: both must do a look-up substitution of whole XPaths with no internal parsing.
- This supports both standard and non-standard schemas equally.
- XPath values are simpler without the prefixes, and therefore less prone to error.
- Disadvantages:
- A no-prefix XPath might map to multiple possible prefixed ones. [If this is an issue, I believe this can be handled easily enough for XML databases and likely will not be an issue for SQL databases.]
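For comparison, the processing implied by the first and third choices can be sketched roughly as follows. This is an illustration under assumptions, not a proposed implementation: the namespace URIs, the "ext" prefix, the table contents, and the function names are hypothetical, and the explicit prefix map stands in for the extra search() argument suggested under the first choice.

```python
import re

# Hypothetical namespace URIs, for illustration only.
NS_CORE = "http://example.org/VOResource"
NS_DATA = "http://example.org/VODataService"

# Prefixes a (hypothetical) registry uses internally, keyed by namespace URI.
INTERNAL_PREFIX = {NS_CORE: "vr", NS_DATA: "vs"}

# One restricted XPath step: optional "@", optional "prefix:", local name.
_STEP = re.compile(r"^(@?)(?:([A-Za-z_][\w.-]*):)?([A-Za-z_][\w.-]*)$")


def normalize_prefixes(xpath_name, query_prefixes):
    """Choice 1: swap the prefixes declared by the query for internal ones.

    query_prefixes maps prefix -> namespace URI, as an explicit argument
    to search() might supply it.
    """
    steps = []
    for step in xpath_name.split("/"):
        match = _STEP.match(step)
        if match is None:
            raise ValueError(f"not a restricted XPath step: {step!r}")
        at, prefix, local = match.groups()
        if prefix is None:
            steps.append(step)  # unprefixed step is left untouched
            continue
        uri = query_prefixes[prefix]  # KeyError if the prefix is undeclared
        steps.append(f"{at}{INTERNAL_PREFIX[uri]}:{local}")
    return "/".join(steps)


# Choice 3: whole unprefixed XPath -> candidate prefixed XPaths.
# The second entry shows the ambiguity noted as a disadvantage.
UNPREFIXED_TO_PREFIXED = {
    "curation/publisher": ["vr:curation/vr:publisher"],
    "table/column": ["vs:table/vs:column", "ext:table/ext:column"],
}


def add_prefixes(xpath_name):
    """Choice 3: static look-up on the entire XPath, with no parsing."""
    return UNPREFIXED_TO_PREFIXED.get(xpath_name, [])


print(normalize_prefixes("v:curation/v:publisher", {"v": NS_CORE}))
print(add_prefixes("table/column"))
```

Under the first choice the XPath has to be taken apart step by step; under the third it is only ever used as a key. When one unprefixed path has several candidates, an XML database could, for example, search against all of them.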
--
RayPlante - 31 Mar 2005
If you have an opinion, question, or comment about this topic, feel free to append your discussion below. Be sure to indicate your name as the author.
I prefer choice 3 for the advantages listed above, and I don't think the disadvantage will be a difficult implementation issue.
--
RayPlante - 31 Mar 2005