Discussion on VOUnits

This page was created to discuss the VOUnits 1.0 document and to reach a consensus on undecided issues by listing pros and cons for each item. For each item on the page, various options are presented, and community members are encouraged to express their opinions by appending their wikiname to the preferred option and/or commenting.

See Unity library notes.

What is the goal of the 'new' VOUnits unit syntax?

We could aim either for a minimal grammar (in which case the CDS grammar is probably the most nearly minimal of the three), or for a lenient but still well-defined one (in which case the FITS grammar covers almost all bases, even though it's not a strict superset of the others). I'd push for the latter: it's well-defined, but most nearly compatible with everyone, plus it's flexible enough that people don't have to try to memorise what is and isn't permitted (principle of least surprise).

In case you're interested, the only thing stopping the FITS grammar being a superset of the others is that the CDS and OGIP grammars both permit multiple solidi in an expression (W/Hz/m**2), and the FITS grammar doesn't (good for FITS!).

Proposal: I think we should simply adopt the FITS specification, possibly after tidying it up by formalising the grammar. At least, we should take this as a starting point, and aim to make minimal deltas to that. -- NormanGray - 09 Jan 2012

Hm -- I've seen the VOUnits as a companion to the VOTable spec so far, in the sense of "this is what you have to understand in the unit attribute". The typical use case is: you have two VOTables from two different services; join them as well as you can. Utypes are one part of this, UCDs maybe another, but giving machines a chance to bring together data in different units is probably the lowest-hanging fruit.

So, parsing these things is only a starting point; you need to do something with the parse trees and with whatever is in the nodes of the tree. Therefore, I'd say let's keep the number of things that can be in there at the minimum absolutely necessary for what we're trying to do. Every little decoration, every baroque variation, will somehow have to be reflected in lots of VOTable clients.

And the other side of the equation is: What for? If people actually fear moving their "old" unit strings to the "new" syntax is too much work -- well, Norman's software probably covers 90% of what an automatic translator could do (where I'd expect 95% is the best machines could achieve given the mess that's lurking in the unit strings out there in the wild). -- MarkusDemleitner

I'm doing some work for ROE right now, looking at assigning UCDs to database schemas, and (opportunely) the unity library is helping here, by doing what Markus suggests, and spotting commonalities in unit dimensions. I expect to add to that, and incorporate material from the QUDT units ontology, in the near future.

As part of that, the process very quickly spots unit strings which don't parse, and it's very easy to include a lookup of mappings such as "mags per square Arcseconds" -> "mag/arcsec^2", which can be incorporated into files with quite manageable labour.
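A minimal sketch of such a lookup, assuming a plain dictionary keyed on the raw string. Only the single mapping shown comes from the comment above; the names `UNIT_FIXES` and `normalise_unit` are hypothetical and are not part of the Unity library.

```python
# Hypothetical table of unparseable unit strings and their replacements.
# Only this one entry is taken from the discussion; a real table would be
# built up by hand as unparseable strings are found.
UNIT_FIXES = {
    "mags per square Arcseconds": "mag/arcsec^2",
}


def normalise_unit(raw: str) -> str:
    """Return the mapped replacement if one is known, else the stripped input."""
    cleaned = raw.strip()
    return UNIT_FIXES.get(cleaned, cleaned)
```

Strings without an entry pass through unchanged, so the lookup can sit in front of any parser without affecting units that already parse.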

As to how to use the grammars, I certainly don't expect users to deal with parse trees. The API documentation shows Java and C interfaces. These are intended to free users from having to care much about the actual structure of the units. In consequence, it doesn't much matter if we have a reasonably sophisticated units grammar: as long as it's well-enough defined that the library can parse it, the users don't have to care. We can therefore go for a fairly big grammar (such as the FITS one), on the grounds that this means that a larger fraction of what users guess will work, will work in fact. -- NormanGray - 27 Jan 2012

Symbols

Electric resistance

The IAU recommends using the upper-case Greek letter Omega, but this is not allowed with the character set adopted for VOUnits. OGIP uses ohm (lower-case o), while StdCats and FITS recommend Ohm (upper-case O).

The upper-case version seems more appropriate, since the unit is named after Georg Simon Ohm.

Options for VOUnits:

  1. Accept both ohm and Ohm. pro: remains compatible with all current use. con: adds complexity to parsing.
  2. Only allow Ohm. pro: easier parsing. con: translation needed for OGIP (but is this unit really used in astronomy?).

Discussion:

Only Ohm; we have too many ambiguities already, and I doubt anyone cares enough to quarrel right now -- MarkusDemleitner - 02 Jan 2012

Micro-arcseconds

This is not yet widely used, but it probably will be with future astrometric missions like GAIA. We have arcsec and mas (while the IAU recommends nanoradians, nrad).

Options for VOUnits:

  1. Follow only IAU style: nrad, prad...
  2. uas
  3. uarcsec

Discussion:

uarcsec; there's a chance parsers can get by without a special case then. mas is a lost cause anyway -- MarkusDemleitner - 02 Jan 2012
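To illustrate the "no special case" point, here is a rough sketch of generic prefix splitting. The prefix and base-unit tables are small illustrative subsets, and `split_prefix` is a hypothetical helper, not anything defined by VOUnits or an existing library.

```python
# Illustrative subsets only; real tables would come from the VOUnits document.
PREFIXES = {"u": 1e-6, "m": 1e-3, "k": 1e3}
BASE_UNITS = {"arcsec", "m", "s", "rad"}


def split_prefix(symbol):
    """Return (prefix, base) for a recognised symbol, or None otherwise."""
    if symbol in BASE_UNITS:
        return ("", symbol)
    prefix, base = symbol[:1], symbol[1:]
    if prefix in PREFIXES and base in BASE_UNITS:
        return (prefix, base)
    return None
```

With these tables, `uarcsec` splits into `u` + `arcsec` by the generic rule, whereas `mas` and `uas` are not recognised unless `as` (or the whole symbol) is added as a special case.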

Celsius

Do we keep a degC symbol?

Discussion:

Dropped deprecated symbols (Table 5)

Any objections to removing these from the standard? dyn, cal, bar, atm, Gal, eotvos, gamma, oersted

Discussion:

Other symbols (Table 6)

Is it OK to reuse the FITS 2010 notations? (FITS v3.0, section 4.3, W.D. Pence et al., A&A 524, A42, 2010. doi:10.1051/0004-6361/201015362)

Discussion: I think we should more-or-less adopt the FITS specification, clarified where appropriate, and adjusted where necessary. -- NormanGray

More than one solidus

The FITS specification and the draft VOUnit specification both say "The IAU style manual forbids the use of more than one slash (/) character in a units string. However, since normal mathematical precedence rules apply in this context, more than one slash may be used but is discouraged." There's no way of expressing 'discouraged' in a grammar -- something either is a valid string or it isn't. I suggest that the VOUnit grammar simply forbid multiple solidi. -- NormanGray - 20 Dec 2011
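As a rough illustration of what "forbid multiple solidi" would mean as a check, the sketch below counts division solidi in a unit string. It assumes that a solidus inside a rational exponent such as (3/2) should not count as a unit division; that assumption, and the names `solidus_count` and `obeys_single_solidus_rule`, are mine and are not taken from either specification. A real implementation would enforce this in the grammar itself rather than with a regular expression.

```python
import re

# Assumption: a rational exponent such as (3/2) contains a '/' that is not a
# unit division, so it is removed before counting.
_RATIONAL_POWER = re.compile(r"\(\s*[+-]?\d+\s*/\s*\d+\s*\)")


def solidus_count(unit: str) -> int:
    """Count '/' characters that are not part of a rational exponent."""
    return _RATIONAL_POWER.sub("", unit).count("/")


def obeys_single_solidus_rule(unit: str) -> bool:
    """True if the string uses at most one division solidus."""
    return solidus_count(unit) <= 1
```

Under this check `W/Hz/m**2` is rejected, while `W/(Hz m**2)` is accepted.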

Discussion:

Fractional powers and brackets

I wasn't able to deduce from the FITS spec whether m^1.5 was legal or not, because the power is not surrounded by brackets. I think it's meant to be illegal (and the grammar in the Unity library reflects this), but the VOUnit spec should be more precise here. I don't think it much matters which way we decide. -- NormanGray - 20 Dec 2011

Discussion:


Special cases

Quantity with no unit

For example, a character string or a dimensionless ratio. We suggest using an empty string. Other possibilities are one or three dashes, or blanks.

Discussion:

I'd definitely go for an empty string; the only other interpretation for an empty string could be a null value, and that's far less useful. -- MarkusDemleitner - 02 Jan 2012

Percent

This means a factor of 0.01, with no unit. Do we allow "%"?

Discussion:

Unknown unit

What to do when we know the unit is not known?
  1. Use a question mark: ?
  2. Use an empty string
  3. Other notation?

Discussion:

Does this have to be specified in the VOUnit document? I don't think so, and believe it can be a library matter.

I suggest we simply define three conformance levels: that a unit string is 'as recommended', 'is acceptable' (meaning that it uses only recognised but not necessarily recommended units), and 'is parseable' (meaning that it conforms to the grammar, but that there are either unrecognised units or units used in a disrecommended way, such as with inappropriate SI prefixes). Since the document is defining a grammar rather than an API, it's not reasonable for it to define error behaviour. -- NormanGray - 20 Dec 2011
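The three levels could be sketched roughly as follows. This assumes the unit string has already been parsed into a list of base-unit symbols, and it ignores the SI-prefix check mentioned above. The unit sets are made-up placeholders rather than the actual VOUnits tables, and all names are hypothetical.

```python
from enum import Enum


class Conformance(Enum):
    PARSEABLE = 1    # conforms to the grammar, but has unrecognised units
    ACCEPTABLE = 2   # only recognised units, not all of them recommended
    RECOMMENDED = 3  # only recommended units


# Placeholder sets for illustration; not the real VOUnits unit tables.
RECOMMENDED_UNITS = {"m", "s", "Hz", "W"}
RECOGNISED_UNITS = RECOMMENDED_UNITS | {"erg", "Angstrom"}


def classify(symbols):
    """Classify the base-unit symbols of an already-parsed unit string."""
    if all(s in RECOMMENDED_UNITS for s in symbols):
        return Conformance.RECOMMENDED
    if all(s in RECOGNISED_UNITS for s in symbols):
        return Conformance.ACCEPTABLE
    return Conformance.PARSEABLE
```

A string that fails to parse at all falls outside these levels, and how to report that would be left to the library.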

Well, to me that's basically a NULL unit. I'd go with Norman in suggesting to communicate null values "out of band", i.e., by deferring to the container format or an API. If it turns out we need in-band signalling of this after all, I'd go for the question mark; I cannot see this notation colliding with anything useful, and it's clear enough -- MarkusDemleitner - 02 Jan 2012


Mathematical expressions

Any objections to following the FITS guidelines?

Regrets that VOUnits don't include trigonometric functions?

Discussion:

Is there any chance we can agree on one and only one notation for a given operator? I'd say, let's have one multiplication operator (my order of preference: blank, asterisk, dot) and in particular one exponentiation operator (** only if the * is not used for multiplication, and I'd say joining the strings is out, at least if "expr" (whatever that is) rather than exclusively an integer is allowed in the exponent) -- MarkusDemleitner - 02 Jan 2012

That depends on what the goal of the exercise is. As above, I suggest simply adopting a cleaned-up FITS spec.

'expr' is only implicitly defined in the FITS spec, but I've taken it to be

numeric_power:    INTEGER 
                | OPEN_P INTEGER CLOSE_P 
                | OPEN_P FLOAT CLOSE_P 
                | OPEN_P INTEGER DIVISION INTEGER CLOSE_P 
                ;
...which seems reasonable. -- NormanGray - 09 Jan 2012
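For anyone wanting to try this reading out, the rule above can be approximated with a regular expression. The token definitions for INTEGER and FLOAT are not given on this page, so the ones below (optionally signed digits, and digits with a decimal point) are assumptions, as is the function name.

```python
import re

# Assumed token shapes; the page does not define INTEGER or FLOAT.
INTEGER = r"[+-]?\d+"
FLOAT = r"[+-]?\d+\.\d+"

# One alternative per production of numeric_power, in the same order.
NUMERIC_POWER = re.compile(
    rf"(?:{INTEGER}"                 # INTEGER
    rf"|\({INTEGER}\)"               # OPEN_P INTEGER CLOSE_P
    rf"|\({FLOAT}\)"                 # OPEN_P FLOAT CLOSE_P
    rf"|\({INTEGER}/{INTEGER}\))"    # OPEN_P INTEGER DIVISION INTEGER CLOSE_P
)


def is_numeric_power(text: str) -> bool:
    """True if the whole string matches one numeric_power alternative."""
    return NUMERIC_POWER.fullmatch(text) is not None
```

Under these assumptions a bare `1.5` is rejected while `(1.5)` and `(3/2)` are accepted, which matches the reading that m^1.5 is illegal without brackets.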

AOB?


Topic revision: r7 - 2012-01-27 - NormanGray
 