Difference: ObsCoreExtensionForRadioData (15 vs. 16)

Revision 16, 2024-03-11 - FrancoisBonnarel

 

ObsCore for radio data:

This project is also discussed on GitHub.

Some considerations: Authors: A. Zanichelli, V. Galluzzi, M. Molinaro (INAF)

Here below we report some considerations mainly focused on single dish data with respect to the two draft documents: IVOA Obscore Extension for Radio data Version 1.0 (IVOA Note 2022-10-14) and Pulsar and FRB Radio Data Discovery and Access Version 1.0 (IVOA Note 2022-09-22).

These comments are intended for further discussion within the IVOA Radio Interest Group.

dataproduct_type

Current values for dataproduct_type as in the preliminary document Data Product Type vocabulary do not seem suitable to describe single dish observational products in order to allow efficient/successful data discovery.

Following the same “parent / narrower Term” classification, we propose the value “sdradio” to be used 1) as a parent Term for any type of single dish data or 2) as parent Term associated with a set of more specific, narrower terms identifying more precisely the various data products coming from the possible observing modes.

The value “sdradio” identifies the electromagnetic domain of the data product. We would prefer not to use a more generic “singledish”, which would be more strictly related to the instrument than to the physical observable (also, single dish instruments are not used in the radio domain only).

FB answer I think you are right that the data product type of single dish data has to be discussed. However, I don't think creating a specific data product type for radio single dish data is consistent with the current concept of dataproduct_type, which is more about the dataset structure with respect to the nature of the axes (because it is important for the tools which may display, render or analyse them). So I think, according to the following discussion, things like "cube", "spectra", "timeseries" or "dynamic spectra" should be OK. We are still looking for a term for spectropolarimetry products. My feeling is that, to better describe single dish datasets, we need new parameters such as observing type or modes or scan modes or whatever. FrancoisBonnarel - 2023-02-09

BC answer I agree with François, the dataproduct_type is about the organization of the data dimensionalities, its axes, etc, not the way it is recorded, nor the spectral range. Mixing the "radio domain", "instrument type" and "data dimensionalities" will make it difficult to separate the various semantic components. BaptisteCecconi - 2023-02-09

The optional, free-text dataproduct_subtype parameter could be used for a more detailed description of the data content. A better solution could be to use the sky_scan_mode parameter proposed in Table 1 of IVOA Obscore Extension for Radio data Version 1.0 (IVOA Note 2022-10-14). This last parameter offers the advantage of using a predefined vocabulary, thus avoiding the use of free text.

FB answer Yes, dataproduct_subtype is free text. As written above, we probably have to introduce new parameters in this extension. FrancoisBonnarel - 2023-02-09

ML answer about scan mode and other information about the way the data were obtained: these metadata belong to the observing configuration applied in the instrument to obtain the data. It makes a category by itself. This need is also characterised for high energy data, and it is worthwhile to describe those parameters separately from the data product type. It is no longer a core property in terms of data discovery, but it is very useful to radio astronomers. MireilleLouys - 2023-02-09

BC answer There are already 3 types of pointing listed in the "ObsLocTAP" standard, in a "tracking_type" keyword. The current values are: "tracking", "solar-system-object-tracking", "fixed-az-el-transit". This seems to call for a list of external terms, which would be maintained with the Semantics WG. BaptisteCecconi - 2023-02-09

The following figure shows the main single dish observing modes:



The following table summarizes the possible values for dataproduct_type and narrower/parent terms associations.

| Term | Parent | dataproduct_subtype |
| sdradio | | |
| spatial-profile | #sdradio | skydip |
| map (*) | #cube, #sdradio | on-the-fly map, raster map |
| on-source | #spectrum, #sdradio | frequency switching, position switching, tracking |
| crosscan (**) | #spectrum, #sdradio | on-the-fly cross scan, raster cross scan |

FB answer for sdradio: I would say this is more something like an "observation_type". sdradio will differ from interferometry. FrancoisBonnarel - 2023-02-09

(*) Single dish radio maps cannot be considered as “image” dataproducts. Data are typically written in (a) table(s), each row containing coordinate positions, timestamp and raw intensity (raw counts) and further processing is required to obtain a proper image. Also, in the more general case data are not acquired on a regular 2D grid in a single map. Typical observations consist of more than one map, to be combined to recover the final image. Maps can be obtained in spectropolarimetric mode, so the most appropriate parent term seems to be “cube”.

FB answer sure this looks like a (sparse) cube; FrancoisBonnarel - 2023-02-10

(**) In principle the crosscan can be executed in raster mode instead of on-the-fly. For this reason the narrower term has been left more generic and the specific description is delegated to dataproduct_subtype.

Note that INAF has no Phased Array Feed receivers onboard the radio telescopes so we are not taking into account cases specific to beamforming techniques. Thus, more values could be needed. Do we have any PAF expert in the RadioIG?

FB answer I think LOFAR and NenuFAR people do have this; Yan? Baptiste? Alan? FrancoisBonnarel - 2023-02-10

BC answer About phased arrays, yes, I think Alan and myself can give some inputs. BaptisteCecconi - 2023-02-10

This approach has some advantages: the narrower terms are in principle usable also in other spectral domains, associated with appropriate parent values/dataproduct_subtype. A query may happen in a two-level mode: a generic one can be done on “sdradio” getting back all the data products associated to any narrower term; alternatively a more detailed query can be done directly on one of the narrower terms.
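The two-level query described above can be sketched as follows. This is only a minimal illustration, assuming the parent/narrower associations listed in the table of this section; the names `PARENTS`, `narrower_terms` and `select` are hypothetical and do not belong to any IVOA tool or service.

```python
# Parent terms for each proposed narrower term (taken from the table above).
PARENTS = {
    "spatial-profile": {"sdradio"},
    "map": {"cube", "sdradio"},
    "on-source": {"spectrum", "sdradio"},
    "crosscan": {"spectrum", "sdradio"},
}


def narrower_terms(term):
    """Return the term itself plus every term that lists it as a parent."""
    matches = {term}
    for child, parents in PARENTS.items():
        if term in parents:
            matches.add(child)
    return matches


def select(records, term):
    """Keep the records whose dataproduct_type is the term or a narrower one."""
    accepted = narrower_terms(term)
    return [r for r in records if r["dataproduct_type"] in accepted]
```

A generic query would call `select(records, "sdradio")`, while a detailed one would call, for example, `select(records, "crosscan")`.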

FB answer if we consider all this is done by a (some) new parameter(s) to describe the observation (and not the product type) do we prefer several parameters or one single parameter with a hierarchy of terms ? FrancoisBonnarel - 2023-02-10

We are aware that this proposal is somewhat different from the general VO approach because it is strongly related to a particular instrument/telescope design. However, we are motivated by the need to make single dish data discoverable in an effective manner, which could hardly be achieved by using the current ObsCore dataproduct_type values.

FB answer In other words : we have to distinguish the description of the observation (which is something like a provenance) from the type of the data which is important for the usage of the data; so really, again, I think we need a (some) new parameters in the extension FrancoisBonnarel - 2023-02-10

ML answer I suggest having a special extension for observing configuration. This would also fit other domains like X-rays, high energy, etc. MireilleLouys - 2023-02-10



em_xel and spectral MOCs

Single dish radio data may contain a multifrequency setup, that is many spectral windows disjointed on the spectral axis and with different resolutions (i.e. different numbers of spectral channels). In such a case em_xel can be computed but could lead to an incorrect interpretation of the actual spectral sampling of the dataset. In this respect, current efforts towards the creation of energy/frequency MOCs by the IVOA Applications Working Group could represent a solution. We note that frequency MOCs would also offer a comprehensive representation of the em_min, em_max and overall spectral coverage/properties of the dataset.
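The issue can be illustrated with a short sketch. The two spectral windows below are invented purely for illustration, and `summarise` is a hypothetical helper; the point is only that the scalar values hide the gap between the windows.

```python
C = 299_792_458.0  # speed of light in m/s

# Hypothetical spectral windows: (f_min in Hz, f_max in Hz, number of channels).
windows = [
    (22.0e9, 22.1e9, 2048),
    (23.6e9, 23.8e9, 1024),
]


def summarise(windows):
    """Collapse a multi-window setup into ObsCore-style scalar values."""
    f_lo = min(w[0] for w in windows)
    f_hi = max(w[1] for w in windows)
    covered = sum(w[1] - w[0] for w in windows)  # assumes non-overlapping windows
    return {
        "em_min": C / f_hi,  # ObsCore gives em_min/em_max as wavelengths in metres
        "em_max": C / f_lo,
        "em_xel": sum(w[2] for w in windows),
        "covered_fraction": covered / (f_hi - f_lo),
    }
```

When `covered_fraction` is well below 1, most of the em_min to em_max range was not observed at all, which a set of intervals or a frequency MOC would make explicit.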

A further extension of the work on energy/frequency MOCs could be developed to include also the polarimetric information. In fact, two polarisation states in the same dataset could in principle have different frequency setups.

FB answer From the ObsCore/Characterisation point of view what you describe is the "support" concept, which is more accurate than "bounds" (= min/max). A support is either a detailed spatial field of view or a set of intervals. MOC can be used to render those things. It would not be a new extension concept but a coding format for it. In VOTable it would be rendered by the xtype attribute (xtype="moc" or "stmoc" or "emoc"). FrancoisBonnarel - 2023-02-10

BC answer Maybe a solution would be to have a generic "multi_dim_coverage" column (or just "coverage"), in which there would be a MOC (with spatial, temporal and/or spectral domains, defined by an xtype in the column header). This may cover the other comment later in the text about the variation of the field of view across the spectral window. This should be further discussed, of course, but it might be useful here. BaptisteCecconi - 2023-02-10

Therefore, ObsCore mapping on radio data in general represents a proof of concept for current developments on the MOC standard.

An example of multifrequency setup is shown in the following figure: a spectroscopic observation in the so-called “zoom mode” with the Xarcos spectrograph, delivering the two circular polarisations for each spectral window.



Comments on other ObsCore parameters

We collected two examples of ObsCore fields whose interpretation appears to be different from the original IVOA prescriptions.

Facility_name and instrument

These parameters in our view should have the same meaning irrespective of whether they refer to space-, atmosphere- or ground-based instrumentation. Typically a spacecraft hosts a number of different instruments, similarly to what happens with ground-based telescopes. The same applies to modern balloon-borne experiments. Facilities like the ISS can be seen as equivalent to ground-based observatories in the sense that they host different telescopes/experiments.

We propose that facility_name identifies the (observatory + telescope) hosting the instrument used to acquire the dataset, while instrument describes the acquisition system used among all those available on that telescope.

For instance: facility_name=ESO-VISTA, instrument=VIRCAM

For the radio domain, generally speaking an instrument would be composed of a number of tokens, e.g. specific filters between the frontend and the backend, the frontend and backend themselves, as well as the beamformer/correlator used.

For single dish data we are currently describing the acquisition system with the combination frontend+backend.

FB answer I think we have to follow what you recommend here; FrancoisBonnarel - 2023-02-10

s_fov

In the draft IVOA Obscore Extension for Radio data Version 1.0 s_fov is defined as “A typical value for the field of view size … λ/D where λ is the mid value of the spectral range and D is the diameter of the telescope or the largest diameter of the array antennae or telescopes.”

This appears in contrast with the ObsCore definition: “The s_fov column (1D size of the field of view) contains the approximate size of the region covered by the data product. For a circular region, this is the diameter (not the radius). For most data products the value given should be large enough to include the entire area of the observation; coverage within the bounded region need not be complete, for example if the specified FOV encompasses a rotated rectangular region. For observations which do not have a well-defined boundary, e.g. radio or high energy observations, a characteristic value should be given.”

FB answer This definition is loose enough that the mid value could also be valid. The point to discuss would be "what do we need this value for"; then we can decide which definition we take in our radio case. FrancoisBonnarel - 2023-02-10

BC answer The radio extension of ObsTAP already includes s_fov_min and s_fov_max, which is good. For compatibility and consistency with other services, s_fov should be filled with the "representative" value, that is, the value that the provider considers most representative of the fov. The provider should have a good idea of the typical value that a user would use to filter data. BaptisteCecconi - 2023-02-10

The radio extension value is in fact computed by means of the mid value of the wavelength range, while the ObsCore value is computed by means of the maximum value of the wavelength range.

Also, we note that for low frequency aperture arrays whose stations are composed of dipole antennas the diameter D could not be given simply as a dish size but may be defined as a function of the number of dipoles, geometry, etc.
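The difference between the candidate s_fov values can be made concrete with a small sketch of the λ/D estimate. It assumes a single aperture scale D in metres and takes the "mid value" as the mean of the extreme wavelengths; both choices are assumptions made for illustration, and the function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def fov_deg(wavelength_m, aperture_m):
    """Characteristic field of view lambda/D, converted from radians to degrees."""
    return math.degrees(wavelength_m / aperture_m)


def fov_values(f_min_hz, f_max_hz, aperture_m):
    """Return the (min, mid, max) field of view in degrees for a frequency range."""
    lam_min = C / f_max_hz  # shortest wavelength gives the smallest field of view
    lam_max = C / f_min_hz  # longest wavelength gives the largest field of view
    lam_mid = 0.5 * (lam_min + lam_max)  # assumption: mid value taken in wavelength
    return tuple(fov_deg(lam, aperture_m) for lam in (lam_min, lam_mid, lam_max))
```

For a wide band the three values differ appreciably (over one octave the maximum is twice the minimum), which is why the choice of definition matters for discovery queries.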

o_ucd

In the current UCD vocabulary (UCD1+ Controlled Vocabulary 1.4 https://ivoa.net/documents/UCD1+/20210616/index.html) there appear to be no primary words suitable to describe raw single dish radio data.

For pulsar data and transient radio data, o_ucd=stat.Fourier could be used, as proposed for visibility data in the Obscore extension for radio data document (v 1.0).

The use of o_ucd=phot.flux.density for raw single dish data does not seem appropriate, since the single dish measured quantity is expressed in raw counts. These counts come from the digitisation of a voltage signal generated in the receiver chain by the incoming electromagnetic field.

FB answer there is phot.count : wouldn't that be ok for raw single dish data ? FrancoisBonnarel - 2023-02-10

MM answer I think it wouldn't, because it's not photons that are recorded by the ADC conversion of the EM field. This looks semantically different. But I need @alessandra.zanichelli@inaf.it to check if my comment is right. MarcoMolinaro - 2023-02-10

BC answer There might indeed be some semantics issue here. We had this discussion a few years ago in the Semantics WG, and the proposed solution was to use "phot.flux.density" for both photometric flux density and EM wave flux density, since there would be no sense in having 2 terms in this case. I would say that the raw counts issue is different: "phot.count" means "Flux expressed in counts" (and this is really counting photon hits), whereas the output of an RF ADC is not photon hit counts. I'm submitting a new term for ADU (i.e., analogue to digital converter units) to the UCD group. BaptisteCecconi - 2023-02-10





Specific comments on IVOA Obscore Extension for Radio data Version 1.0 (IVOA Note 2022-10-14)

  • Introduction (page 3) and thereon: when referring to single dish data format we would avoid being too specific in referring to the SDFITS format. SDFITS is a registered standard but is not the only possible data format. Other FITS flavours which follow the FITS prescriptions (without being registered standards) are delivered by the various telescopes.

It was given as an example; we may add others to be more realistic. FrancoisBonnarel - 2023-02-10

  • Sect 2.1:

  1. single dish may be equipped with multifeed receivers, we would avoid mentioning the acquisition of signal through a central beam only because this is a particular case of a more general scenario.

  2. there seems to be a mix between the definitions of s_fov and primary beam. In ObsCore s_fov is defined as the “approximate size of the region covered by the data product”. This is not the primary beam size of the antenna, for instance a map observation has an s_fov described by the approximate size of the mapped region on the sky. See also note on Sect 4.1 below.

This part has to be rewritten. FrancoisBonnarel - 2023-02-10

  3. the typical case is not the one stated at the beginning of the section (spectrum), since the acquisition of emission for each spectral sample in the spectral band and polarization can generally be done in more complex modes (for instance, map). The following sentences summarize the variety of cases well. We should rephrase the content, starting from the general statement and later describing specific cases.

OK FrancoisBonnarel - 2023-02-10

  4. single dish multifeed receivers typically allow larger spatial regions to be covered, simultaneously acquiring spectra from different positions on the sky that still share the same spectral setup.

OK FrancoisBonnarel - 2023-02-10

  • Sect 3: a typo in the first sentence: for radio data (not only for visibilities)

OK FrancoisBonnarel - 2023-02-10

  • Sect 3.2: a corresponding definition for single dish data should be given.

Of course. FrancoisBonnarel - 2023-02-10

Also: with ultrawide band receivers (for instance: 20 GHz bandwidth at 7mm) it may happen that the number of spectral windows (each with its own setup) largely increases, thus translating into a multiplication of entry lines in ObsCore for the same observation. How do we plan to deal with these cases? Are we happy to have a large number of records in such cases?

I think it's difficult to avoid. Or we have to group together several spectral windows and use the multi-interval support concept. FrancoisBonnarel - 2023-02-10

Something like a spectral-coverage MOC could be useful here, but that would be something for a future ObsCore update as it isn't really radio-specific. I think it makes sense to describe data with small "holes" in the spectral coverage using a single ObsCore entry (e.g. 16 MHz "bands" separated by something like 1 MHz), but describe data with larger holes by multiple entries (e.g. 1GHz@4GHz + 1GHz@15GHz + 1GHz@22GHz). That would help discovery because people may discard individual entries with a small amount of bandwidth on grounds of not providing enough sensitivity. MarkKettenis - 2023-03-01
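The grouping suggested in the last comment could be sketched as below. This is only an illustration: the function name `group_windows` is hypothetical, and the gap threshold is a free parameter that a data provider would have to choose.

```python
def group_windows(windows, max_gap_hz):
    """Merge (f_min, f_max) windows separated by at most max_gap_hz.

    Each returned group would correspond to one ObsCore entry.
    """
    groups = []
    for f_min, f_max in sorted(windows):
        if groups and f_min - groups[-1][1] <= max_gap_hz:
            # Small hole: extend the current group.
            groups[-1][1] = max(groups[-1][1], f_max)
        else:
            # Large hole (or first window): start a new group.
            groups.append([f_min, f_max])
    return [tuple(g) for g in groups]
```

With a threshold of a few MHz, 16 MHz bands separated by 1 MHz collapse into a single entry, whereas windows at 4, 15 and 22 GHz remain three separate entries.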

  • Sect 3.3: the definition of D as the largest diameter of the array antennae or telescopes is not correct for every telescope type. In fact, there is no main mirror size in the case of dipole arrays or beamformed data. Instead, D should be defined as the measuring system aperture scale, as already written in Sect 2.

OK FrancoisBonnarel - 2023-02-10

  • Sect 3.4: it should be clarified that s_resolution is the best (smallest) spatial resolution achieved during your observation. The minimum of the spectral range should be used instead. This holds true for any radio data. See also note on Sect 4.1 below.

I think it's the same discussion as for the s_fov family. What do we expect from such a value? FrancoisBonnarel - 2023-02-10

  • Sect 3.5: last sentence could be changed to mention MOCs: “the use of MOC for s_region is strongly encouraged to gather the more accurate description of the spatial coverage”

See my above comment: MOC is an xtype. FrancoisBonnarel - 2023-02-10

  • Sect 3.6: see note above on single dish values for o_ucd.

OK FrancoisBonnarel - 2023-02-10

  • Sect 4.1: by ObsCore definition s_fov coincides with s_fov_max. We suggest introducing s_fov_mid and s_fov_min. Similarly, s_resolution coincides with s_resolution_min and we suggest introducing s_resolution_mid and s_resolution_max.


We agree that in the radio case we need three concepts for fov and resolution. Which ones we call s_resolution and s_fov has to be discussed according to what usage we want to make of them (discovery or description). FrancoisBonnarel - 2023-02-10

-- AlessandraZanichelli VincenzoGalluzzi MarcoMolinaro - 2022-11-25

Comments on ObsCore extension and Pulsar/FRB draft documents. John Tobin

See also commented document

1. spelling of names/definitions/utypes - antennae -> antennas, excentricity -> eccentricity or ellipticity, hence uv_distribution_exc might not be appropriately named.

OK, thanks. FrancoisBonnarel - 2023-02-10

2. uv_distribution_fill - this definition seemed quite confusing to me: as written, it seems like you would always get an answer of 1/N_samples. I think it needs to be a summation over i,j of the cells with n_points >= 1, divided by n_cells. Maybe I am reading the definition incorrectly. The other issue with this definition is that it does not account for the fact that a dataset can have a large number of channels, where each is actually its own uv point. Each entry in a VO Table will be split into some number of channels, so this might need to be addressed and perhaps requires its own field. Finally, the uv filling factor will also be different depending on whether a user has continuum or spectral line observations in mind: continuum will have multi-frequency synthesis, which implicitly increases its uv-coverage, while spectral line applications will implicitly have worse uv-coverage.

I see your point, and would like to have comments coming from our Astron and JIVE colleagues who originally proposed to characterize the uv coverage this way. FrancoisBonnarel - 2023-02-10
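One possible reading of the corrected definition suggested in point 2 (number of occupied cells divided by the total number of cells) is sketched below. This is not the definition given in the draft; the grid extent, the grid size and the function name `uv_fill` are assumptions made for illustration.

```python
def uv_fill(points, uv_max, n_side):
    """Fraction of cells in an n_side x n_side grid holding at least one uv point.

    The grid spans [-uv_max, uv_max] on both axes; points outside are ignored.
    """
    cell = 2.0 * uv_max / n_side
    occupied = set()
    for u, v in points:
        if abs(u) > uv_max or abs(v) > uv_max:
            continue
        # Clamp so that a point exactly on the upper edge falls in the last cell.
        i = min(int((u + uv_max) / cell), n_side - 1)
        j = min(int((v + uv_max) / cell), n_side - 1)
        occupied.add((i, j))
    return len(occupied) / (n_side * n_side)
```

Because occupied cells are counted only once, adding more samples to an already sampled cell does not change the result. To account for the channel issue raised above, the caller would pass one (u, v) point per channel rather than one per baseline.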

3. uv_distance_max, uv_distance_min; This might not quite be fine-grained enough because you might have one really long baseline and one very short baseline, but an array is actually configured somewhere in between. Perhaps also adding a 75th percentile baseline and 50th percentile baseline distance would be useful to add to this since those values would provide more information about where most of the uv-coverage is concentrated.

Good point, we were already wondering how to estimate "effective numbers" for these two quantities in order to avoid "outliers". Your percentile is an interesting proposal to investigate. Or can we find another significant minimum and maximum estimation? FrancoisBonnarel - 2023-02-10

Well, for dense-core arrays, there might be very few "outlier" baselines, but those are a very significant addition to the core. Hence, we (NenuFAR team) would like to keep the min and max values as they are. Remember that this metadata should be filled for each observation, hence those values should contain the actual baseline min and max values for an observation, not a generic value for the instrument. Since we are building data discovery metadata, the uv coverage keywords should be consistent with each shared dataset. BaptisteCecconi - 2023-02-10

The 75th percentile uv distance is not a generic value, but could be calculated for each dataset. I get your point that dense core arrays will have fewer outlier baselines, but the density of uv points will be such that the beam one gets from imaging a dataset would be more reliably characterized by something like the 75th percentile baseline rather than the max uv distance. I think there would be value in min, max, and something in between like the 75th percentile. JohnTobin - 2023-02-10

MK answer This is likely to be an issue for VLBI in particular, where such outlier baselines are somewhat common. That said, I think users probably will use these parameters to pre-select candidate observations but will always need to look closely at the actual UV-coverage and/or UV-distance plots to determine whether the selected observations actually do meet the scientific requirements. -- MarkKettenis - 2023-08-28
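The per-dataset statistics discussed in this thread could be computed along the following lines. This is a simplified sketch: it uses unprojected antenna separations in metres instead of projected uv distances, and the function name and the returned keys (`p50`, `p75`) are hypothetical, not proposed column names.

```python
import itertools
import math
import statistics


def baseline_summary(positions):
    """Min, median, 75th percentile and max baseline length.

    positions: list of (x, y) antenna coordinates in metres (a simplification:
    real uv distances are projected on the sky plane).
    """
    lengths = sorted(
        math.dist(a, b) for a, b in itertools.combinations(positions, 2)
    )
    # Quartiles with linear interpolation between the sorted data points.
    _q1, q2, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    return {"min": lengths[0], "p50": q2, "p75": q3, "max": lengths[-1]}
```

Keeping min and max alongside the percentiles, as suggested above, retains the actual extreme baselines of each observation while also indicating where most of the uv coverage is concentrated.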

4. sky_scan_mode - this has applications beyond just single-dish as listed because interferometry data also have single-pointing, mosaic, and on the fly mosaics. Also, for single-dish there may be other modes that are not covered like drift-scan.

OK, thanks for that. In that case we have to restructure the text. FrancoisBonnarel - 2023-02-10

5. s_resolution_beam_dirty - it is unclear what is intended here, whether it's to be a map of the dirty beam or the resolution of the dirty beam. If it's the resolution, this is somewhat redundant because the resolution of a cleaned map is derived from a Gaussian fit to the central core of the dirty beam, and there would be a degeneracy with what is provided by s_resolution. The dirty beam image, or psf image as CASA refers to it, is not always archived. ALMA and the NRAO do not include it in their standard image products for instance, so it is unclear how readily available this information would be for most archives.

Obviously the idea was to add a FIELD containing a link to the dirty beam map. The idea was to display it to help the user figure out the quality level of the data. And this is not a queryable column, of course. FrancoisBonnarel - 2023-02-10

Comments on ObsCore extension. Andreas Wicenec

I agree with the comments JohnTobin made above, in particular wrt. uv stats. In addition I would like to raise the following points:

1. Just a very general comment on use cases: I'm not quite sure whether this was discussed, but what are use cases to search for a visibility data set using an antenna diameter or the minimum and maximum uv distance? In many interferometric arrays there are multiple different antennas and specifying or searching for a single diameter is thus not really useful. The uv distances (as well as eccentricity) are highly variable even within a single, long observation and yes, it seems straightforward to calculate them, but in fact it is not, if a dataset is of the order of many TB or even PB. Thus the question about the usefulness of providing these values in the first place. Who would use these values for queries and why? The explanations in the text are kind of fine for UV snapshots, but for observations spanning many hours, many channels and an inhomogeneous distribution of baseline lengths, they get far less useful, to the degree that they might be misleading or plainly not comparable between datasets from different arrays.

2. Related to this, the description of the antenna diameter in the table should also mention that this might be the maximum diameter in case of multiple different ones, like it is in the text (2.2).

3. Another, high level question is more about wording: When reducing a visibility data set it is possible to change the resolution and sacrifice frequency for spatial resolution to a certain degree. It is also possible to change the phase centre as well as the FOV, i.e. the pointing of the final data product. Thus quite a number of the values are more or less just boundaries and in some cases not even very strict ones. I guess one example would be the FOV, since that depends a lot on how far out you are performing the imaging, 1st, 2nd, 3rd null, and that is pretty much up to the user to decide. In that sense I think we need to have more constrained descriptions, else these values will not be consistent and comparable, even if we allow for min and max.

4. In the second sentence of 2 the case wavelength vs frequency is correctly made, but then, in particular for the FOV and resolution description the document is referring to wavelength. I think all of this should be described in terms of frequency.

5. Measurement sets allow for an almost insane flexibility and this extension will never be able to account for all of the possible variations, else it would need to replicate the MS data model. Thus I would opt for an extension which is as lightweight as possible and fully driven by actual real-life use cases of queries people would be performing on visibility data. -- AndreasWicenec - 2023-07-26

MK answer Point 1. covers quite a lot of territory. The original idea was to characterize the UV coverage of an interferometric observation with a few numbers that can be used in an ADQL query with the goal of selecting observations that are likely to meet requirements in terms of resolution, largest angular scale and image fidelity. I think the current proposal probably has too many parameters now and that some of the proposed parameters are essentially trying to describe the same properties in slightly different ways.

I don't think the size of the dataset is all that relevant; reconstructing the UV coverage from the UVW values that are part of the dataset is one possibility but there are other ways to do this. The original code developed by @matmanc for LOFAR calculated this from scratch based on a description of the observation, for example. And even if it is reconstructed from the dataset, the UVW metadata will be several orders of magnitude smaller than the visibility data itself, and for a well designed data format (e.g. the MeasurementSet) it will be possible to read this data without the need to look at the actual visibilities.

Unless you're looking at sources that are variable on the timescale of the observation, the difference between snapshots and longer observations doesn't really matter, at least as long as the time-dependence of the UV coverage doesn't affect the calibratability of the data too much.

I agree that antenna diameters are not really meaningful for (inhomogeneous) interferometric arrays. But it probably is something users would want to know for single dish observations?

Points 2-5 are all very sensible. Especially point 5. Trying to capture all the details is simply not possible.

-- MarkKettenis - 2023-08-28
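The kind of ADQL pre-selection mentioned in the answer above might look like the sketch below. The column names are those discussed on this page; the table name `ivoa.obs_radio`, the use of `obs_publisher_did` as identifier and the helper `build_adql` are assumptions made only for illustration.

```python
def build_adql(min_uv_max, max_uv_min, min_fill, table="ivoa.obs_radio"):
    """Build an ADQL query selecting observations by uv-coverage parameters.

    The table name is a placeholder; the uv_* column names follow the draft.
    """
    return (
        "SELECT obs_publisher_did, uv_distance_min, uv_distance_max, "
        "uv_distribution_fill "
        f"FROM {table} "
        f"WHERE uv_distance_max >= {min_uv_max} "
        f"AND uv_distance_min <= {max_uv_min} "
        f"AND uv_distribution_fill >= {min_fill}"
    )
```

Such a query would only pre-select candidates; as noted in the discussion, the actual UV-coverage plots would still need to be inspected.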

New vocabulary proposal for ObsCore extension. Authors: A. Zanichelli, V. Galluzzi, M. Molinaro (INAF)

We report some considerations mainly focused on single dish data with respect to the draft documents: IVOA Obscore Extension for Radio data Version 1.0 (IVOA Note 2023-02-13). These considerations are intended for further discussion within the IVOA Radio Interest Group.

dataproduct_type and sky_scan_mode

We are proposing a new schema that, while trying to better follow IVOA ObsCore prescriptions, still poses some open questions to be discussed.

Current values for dataproduct_type are taken from the preliminary document Data Product Type vocabulary. Note that the current definition of spectrum in the Data Product Type vocabulary should be extended to include raw counts (not only flux or magnitudes). For instance: “A scalar observable given as a function of a spectral coordinate”.

We assume that multi-feed/multi-beam capabilities can be exploited in case of imaging observations (raster map, otf map). For non-imaging observing modes (ON, ON-OFF, otf cross scan, skydip) single feed capabilities are typically used (in the case of a multi feed receiver, data are recorded from a single/central feed).

In the following table the proposed new values for dataproduct_type and sky_scan_mode are marked in yellow. Note: for dataproduct_type we are using the new definition of “measurements” from the Data Product Type vocabulary (preliminary): https://www.ivoa.net/rdf/product-type/2021-11-18/product-type.html as “Generic tabular data not fitting any of the other terms. Because of its lack of specificity, this term should generally be avoided, and new, more precise terms should be introduced instead.”. The definition of “measurements” in ObsCore-1.1 (REC): https://ivoa.net/documents/ObsCore/20170509/index.html as “A list of derived measurements gathered in a particular original dataset of one of the previous sort after some analysis processing, like a source list, or more generally a list of ‘results’ attached to such datasets” would not be appropriate and would imply the definition of a further value for dataproduct_type. We are also supporting the use of the value “spatial profile” for dataproduct_type (currently under discussion). Both “measurements” and “spatial profile” are written in square brackets to account for their preliminary status/meaning.

We note that sky_scan_mode applies to all radio data, not only single-dish ones. Also, sky_scan_mode cannot describe the frequency switching mode. We propose to change from sky_scan_mode to a more general scan_mode. Alternatively, in order to consider the space and frequency domains separately we propose to keep sky_scan_mode and to add a further term frequency_scan_mode (equal to 'fixed' or 'switching' depending on the cases).
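The second alternative, with the space and frequency domains described separately, can be sketched as a pair of values. This is only an illustration of the idea; the class `ScanConfig` is hypothetical, and the allowed frequency_scan_mode values are the two mentioned above.

```python
from dataclasses import dataclass

FREQUENCY_SCAN_MODES = ("fixed", "switching")


@dataclass(frozen=True)
class ScanConfig:
    sky_scan_mode: str                  # spatial strategy, e.g. "on-source"
    frequency_scan_mode: str = "fixed"  # "fixed" or "switching"

    def __post_init__(self):
        # Reject values outside the small controlled list.
        if self.frequency_scan_mode not in FREQUENCY_SCAN_MODES:
            raise ValueError(
                f"unknown frequency_scan_mode: {self.frequency_scan_mode!r}"
            )
```

A frequency switching observation would then be described as `ScanConfig("on-source", "switching")`, while a raster map keeps the default `ScanConfig("raster map")`.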

| mode | axes | dataproduct_type | note | sky_scan_mode |
| ON total power | degenerate in: x, y, freq | [measurements] | (a) | on-source (g) |
| ON spectral | freq, degenerate in: x, y | spectrum | | on-source (g) |
| ON-OFF total power | x,y, degenerate in: freq | [measurements] | (b) | on-off |
| ON-OFF spectral | x,y,freq | cube | (c) | on-off |
| frequency switching | freq, degenerate in: x and y | spectrum | (d) | on-source (g) |
| on-the-fly cross scan total power | x,y, degenerate in: freq | [spatial profile] | | on-the-fly cross scan |
| on-the-fly cross scan spectral | x,y,freq | cube | (c) | on-the-fly crosscan |
| raster map total power | x,y, degenerate in: freq | map | (e) | raster map |
| raster map spectral | x,y,freq | cube | (c) | raster map |
| otf map total power | x,y, degenerate in: freq | map | (e) | on the fly map |
| otf map spectral | x,y,freq | cube | (c) | on the fly map |
| skydip total power | x, y, degenerate in: freq | [spatial profile] | | skydip |
| skydip spectral | x,y,freq | [spatial profile] | (f) | skydip |

(a) in this case “measurements” seems to be the most appropriate value for dataproduct_type.

(b) ON-OFF total power: two points are measured, no spectral info. One of the two measurements (OFF) may have very little scientific content if used by itself.

(c) sparse cube. Only the cross in a cross scan (or the ON and OFF positions in an ON-OFF) is sampled. An otf spectropolarimetric map taken with a peculiar scanning strategy (for instance a spiral pattern) may result in a sparse cube. A spectral skydip may be considered a sparse cube, but see note (f)

(d) frequency switching has dataproduct_type=spectrum. The resulting spectrum may contain gap(s) if the frequency switching encompasses non-overlapping bandwidths.

(e) In the case of total power raster or otf map we propose to introduce a new value dataproduct_type=map. In fact the frequency axis is degenerate, so “cube” is not appropriate, and at the same time this is not an “image” as intended in the VO dataproduct_type Document. A single dish total power radio map cannot be considered an “image” dataproduct because data are typically written in table(s), each row containing spatial coordinates, timestamp and raw intensity (raw counts). Further processing is required to obtain a proper 2D image. Also, in the more general case data are not acquired on a regular 2D grid (for instance, if one uses a spiral scanning pattern). “map” should be the parent term for “image”.

(f) in principle, a spectral skydip could be considered as a collection of spectra (e.g. a sparse cube). However, the relevant information (atmospheric opacity) is contained in the spatial profile derived from the observation, hence the choice of dataproduct_type=spatial profile also for spectral skydip.

(g) sky_scan_mode value “on-source” should be added for ON and frequency switching observations.

The following figure illustrates the main single dish observing modes.

Note that INAF has no Phased Array Feed receivers onboard the radio telescopes so we are not taking into account cases specific to beamforming techniques. Thus, more values could be needed. Do we have any PAF experts in the RadioIG?

-- AlessandraZanichelli VincenzoGalluzzi MarcoMolinaro - 2023-08-25
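As a reading aid, the table above can be written down as a small lookup structure. This is only a sketch: the class and function names are hypothetical, the bracketed dataproduct_type values keep their preliminary status, and the spelling of the cross scan value is normalised here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeDescription:
    axes: str
    dataproduct_type: str
    sky_scan_mode: str


# Rows copied from the proposal table; bracketed values are preliminary terms.
# "on-the-fly cross scan" is spelled consistently here (normalisation assumed).
SINGLE_DISH_MODES = {
    "ON total power": ModeDescription("degenerate in: x, y, freq", "[measurements]", "on-source"),
    "ON spectral": ModeDescription("freq, degenerate in: x, y", "spectrum", "on-source"),
    "ON-OFF total power": ModeDescription("x, y, degenerate in: freq", "[measurements]", "on-off"),
    "ON-OFF spectral": ModeDescription("x, y, freq", "cube", "on-off"),
    "frequency switching": ModeDescription("freq, degenerate in: x, y", "spectrum", "on-source"),
    "on-the-fly cross scan total power": ModeDescription("x, y, degenerate in: freq", "[spatial profile]", "on-the-fly cross scan"),
    "on-the-fly cross scan spectral": ModeDescription("x, y, freq", "cube", "on-the-fly cross scan"),
    "raster map total power": ModeDescription("x, y, degenerate in: freq", "map", "raster map"),
    "raster map spectral": ModeDescription("x, y, freq", "cube", "raster map"),
    "otf map total power": ModeDescription("x, y, degenerate in: freq", "map", "on the fly map"),
    "otf map spectral": ModeDescription("x, y, freq", "cube", "on the fly map"),
    "skydip total power": ModeDescription("x, y, degenerate in: freq", "[spatial profile]", "skydip"),
    "skydip spectral": ModeDescription("x, y, freq", "[spatial profile]", "skydip"),
}


def describe(mode: str) -> ModeDescription:
    """Return the proposed ObsCore description for a single dish observing mode."""
    return SINGLE_DISH_MODES[mode]
```

For instance, `describe("otf map spectral").dataproduct_type` gives `"cube"`, and both skydip variants map to the preliminary `[spatial profile]`.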

s_fov

The field of view of an interferometric array can be further restricted by bandwidth and/or time smearing, that is, by the frequency resolution and integration time chosen when correlating the data. This is especially true for VLBI, where these correlator parameters almost always restrict the field of view to only part of the primary beam of the individual antennas. This is probably worth adding to the description of s_fov.

-- MarkKettenis -2023-08-29
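To give a feel for the orders of magnitude involved, here is a rough sketch using the usual textbook rules of thumb, which are not taken from the draft: bandwidth smearing becomes significant at a radius of about (synthesized beam) x (frequency / channel width), and time smearing at about (synthesized beam) / (Earth rotation rate x integration time). The function names and the example correlator settings are hypothetical.

```python
EARTH_ROTATION_RAD_PER_S = 7.2921e-5  # sidereal rotation rate of the Earth


def bandwidth_smearing_radius(beam, freq_hz, channel_width_hz):
    """Approximate radius (same unit as `beam`) where bandwidth smearing sets in."""
    return beam * freq_hz / channel_width_hz


def time_smearing_radius(beam, integration_s):
    """Approximate radius (same unit as `beam`) where time smearing sets in."""
    return beam / (EARTH_ROTATION_RAD_PER_S * integration_s)


def smearing_limited_radius(beam, freq_hz, channel_width_hz, integration_s):
    """The tighter of the two order-of-magnitude limits."""
    return min(
        bandwidth_smearing_radius(beam, freq_hz, channel_width_hz),
        time_smearing_radius(beam, integration_s),
    )


# Hypothetical VLBI-like setup: 1 mas beam, 5 GHz, 0.5 MHz channels, 2 s integrations.
radius_mas = smearing_limited_radius(1.0, 5.0e9, 0.5e6, 2.0)
```

With these made-up numbers the usable radius is a few arcseconds, far smaller than the primary beam of a typical dish, which is the effect a description of s_fov would need to account for.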

Discussion after release of 2023 November version.

initial list of changes

new dataproduct types : the current ObsCore list seems to be insufficient : a proposal is made to solve that

instrument types, scan_modes and tracking_modes parameters are proposed to tackle various Observation configuration or provenance details

frequency characterization : solution is proposed to take into account concerns about potential inconsistent differences with wavelength characterization

TAP management and standardID for the extended ObsCore list of attributes : a solution is proposed as a single table, but with a possible implementation as a view on top of two tables

miscellaneous small changes

-- FrancoisBonnarel -2023-11-08

f_min / f_max / f_resolution

You'll not be surprised that I still think sect. 4.2, f_min and f_max, is a bad idea and you should rather require an ivo_specconv function as discussed before; it'll also immediately placate folks who want Hz, MHz, GHz, or THz instead of the kHz you went for.

-- MarkusDemleitner -2023-11-09

(1) The longer I think about them, the less I like f_min and f_max. If you look at use case 1.3: "range inside the 1 to 1.5 GHz band" -- and then people have to write f_min > 1000 AND f_max < 1500 and thus do some conversion anyway, and to the relatively random unit MHz on top. Please let's reconsider this; I have sympathies for not wanting to write the λ-ν conversions manually, but if

1= ivo_interval_overlaps( em_min, em_max, ivo_specconv(1.5, "GHz", "m"), ivo_specconv(1, "GHz", "m"))

doesn't work for you, let's think again and figure out something that's less verbose. But let's not define something parallel to em_min and em_max with an even more random unit than m.

-- MarkusDemleitner -2023-12-11
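For readers who want to see what the conversion amounts to, here is a minimal sketch of the arithmetic behind such a query. It is not an implementation of ivo_specconv or ivo_interval_overlaps; the helper names are hypothetical and only the relation λ = c/ν is used. Note that the bounds swap: the higher frequency gives the lower wavelength.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def band_to_em_interval(f_lo_hz, f_hi_hz):
    """Convert a frequency band to (em_min, em_max) in metres; the bounds swap."""
    return C_M_PER_S / f_hi_hz, C_M_PER_S / f_lo_hz


def intervals_overlap(a_min, a_max, b_min, b_max):
    """True if the closed intervals [a_min, a_max] and [b_min, b_max] intersect."""
    return a_min <= b_max and b_min <= a_max


def interval_inside(a_min, a_max, b_min, b_max):
    """True if [a_min, a_max] lies entirely within [b_min, b_max]."""
    return b_min <= a_min and a_max <= b_max


# The 1 to 1.5 GHz band of use case 1.3, expressed in metres.
band_min, band_max = band_to_em_interval(1.0e9, 1.5e9)

# Two made-up datasets described by (em_min, em_max) in metres.
inside = interval_inside(0.21, 0.25, band_min, band_max)
partial = intervals_overlap(0.25, 0.35, band_min, band_max)
```

The distinction between "overlaps" and "inside" matters for use case 1.3, which asks for ranges inside the band rather than merely touching it.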

That is quite a mouthful... but it does bother me as well to provide what is essentially the same information twice. And I agree that the arbitrary units are problematic (the current draft specifies f_min/max as using "MHz" but f_resolution as using "kHz"). We do need to retain f_resolution though, as em_res_power simply varies too much for low-frequency observations that span a large fractional bandwidth. Radio observations typically have a fixed frequency resolution (and therefore varying resolving power) across the band.

-- MarkKettenis -2023-12-13
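A small worked example of this point, with made-up numbers: for a fixed channel width the resolving power R = ν/Δν scales linearly with frequency, so a band spanning a factor of two in frequency has a factor of two spread in R. The function name is hypothetical.

```python
def resolving_power(freq_hz, f_resolution_hz):
    """Resolving power R = nu / delta_nu for a fixed channel width."""
    return freq_hz / f_resolution_hz


# Hypothetical low-frequency observation: 100-200 MHz band, 10 kHz channels.
f_resolution_hz = 10.0e3
r_low = resolving_power(100.0e6, f_resolution_hz)   # bottom of the band
r_high = resolving_power(200.0e6, f_resolution_hz)  # top of the band
```

A single em_res_power value cannot represent both ends of such a band, whereas f_resolution is the same everywhere.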

If that's a concern in practice, we can have a more specific function for matching radio intervals in obscore tables; but to design these, having a few clearer use cases would be useful. Perhaps

1= ivo_has_radio_interval(1, 1.5, "GHz")

(that has built-in knowledge about em_min and em_max) is justifiable?

-- MarkusDemleitner -2023-12-13

- I would advocate (very) strongly to not record the same information (em_min/max, f_min/max) in the same table but make it easy to query on energy, wavelength, or frequency by adding convenience functions, such as Markus proposes at some point. For the raw table information let's stick to one (1) physics representation.

-- MarjoleinVerkouter -2023-12-18

Yes I think it's reasonable to use the same unit, Hz for all the frequency fields. We will change that in the draft.

That's much simpler than to find the right multiple for any quantity and sub domain. You are right!

As for the frequency characterization, I would advocate :

Why are we building an ObsCore extension for radio data ?

We are indeed speaking of data discovery.

"core ObsCore" metadata help to discover any kind of datasets in any spectral domain. But in the radio domain some specificities are not well enough taken into account by the standard.

And this is specifically the case for raw data (visibilities in the radio case)

The consequence is that the result of a query is only roughly matching some of the discovery tasks.

The basic idea with the ObsCore extensions is the same as the one we followed when creating some Optional fields in the original ObsCore. Not forcing anything but ALLOWING to add details in order to better tackle some specific needs.

When the CSP identified the rather low uptake of VO services in the radio domain and defined fixing that as an IVOA priority, and when in parallel the Euro-VO Asterics project (+ESCAPE) held several sessions around this, the first thing we heard from many, many radio astronomers and potential users was :

"Hahh, gosh .... Wavelengths !!!"

After explaining to these colleagues why we definitely needed a common language for everybody and why wavelength is the minimal lingua franca, we also considered providing them with a couple of additional fields useful for them.

In practice I imagine many radio archives start by storing their metadata in frequencies and transform them into wavelengths using functions, views or whatever to be consistent with ObsCore.

So conversion UDFs probably exist anyway. But this is implementation.

But if we provide an extension, then this extension should be easy to use for queries and for metadata visualisation for the users. And also for client developers.

Nobody is forced to use a radio extension. But if people are in the spirit of using it then this has to be easy and readable for them.

Last thing : if we have a parameter-based interface to ObsCore (and extensions) in the near future (see https://github.com/ivoa-std/DAP), frequency characterization will be provided with an optional parameter and with standard column names anyway.

-- FrancoisBonnarel 2023-12-22

ALLOWING to add details

Uh... let's be very careful with language in the vicinity of "allowing" and "optional".

If the obscore extensions are supposed to work for global discovery (i.e., one query is executable on all compliant services), then all fields people may write constraints against (and that means: by default all of them) need to be mandatory for the ivoa.obs_radio table.

Without such a requirement, an all-VO query would first have to work out where some column is available and then decide whether to re-write the query and drop some constraint or whether to skip the service in question. That would be painful for client writers and mystifying for their users.

Ceterum censeo Optional Features Are A Bane.

So: either we have f_min/_max or we don't.

"Hahh, gosh .... Wavelengths !!!"

And right they were: We should have used energies, which not only (like frequencies) are independent of the medium but also work for massive messengers.

But we didn't, and we can't sensibly fix that by adding extra columns. I, for one, would totally be in favour of planning a transition to energies over a few versions of obscore, but that's nothing an extension could do.

In practice I imagine many radio archives start by storing their metadata in frequencies and transform them into wavelengths using functions, views or whatever to be consistent with ObsCore.

For ingestion, I claim it really doesn't matter; the ingestion rules are written once, and very quickly on top.

No, the question is: Can we make writing obscore queries more pleasant across the electromagnetic spectrum? I can see how f_min/_max help a bit there, but it's just a bit (because everyone still has their non-Hz native units, including wavelengths ("21 cm", "submillimeter")). And to me the massive denormalisation is too high a price for what little it buys.

Anyway, if you really can't find it in yourself to simply drop the two columns, at least say something like: "Non-NULL f_min MUST be equal to c/em_min and f_max MUST be equal to c/em_max, with c=299792458 m/s; implementations are advised to ensure this by using, for instance, views."

But don't you agree that, written like this, this definitely looks like a bad idea? I notice in passing that in this way, it might be that

em_min between 1e-2 and 2e-2

is fast but the -- according to the above stipulation -- equivalent

f_max between 14989622900.0 and 29979245800.0

is not. That's because the query planner may very well not be smart enough to see it could use an index on em_min when the query it sees after expanding the view statement is against 299792458/em_min[1]

Nobody is forced to use a radio extension....

Well, but we certainly would like them to use it if they have radio data, right?

-- MarkusDemleitner 2024-01-03
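The view-based stipulation can be tried out in a few lines. The sketch below uses Python's sqlite3 as a stand-in for a real TAP backend; the table name obscore_base, the view name obs_radio and the three rows are made up for illustration. It only demonstrates that the two constraints select the same rows, not how any particular query planner would execute them.

```python
import sqlite3

C_M_PER_S = 299792458.0  # speed of light, as in the proposed wording

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE obscore_base ("
    " obs_publisher_did TEXT PRIMARY KEY, em_min REAL, em_max REAL)"
)
conn.executemany(
    "INSERT INTO obscore_base VALUES (?, ?, ?)",
    [
        ("ivo://example/a", 0.015, 0.03),   # em_min inside 1e-2..2e-2
        ("ivo://example/b", 0.20, 0.30),    # em_min too long
        ("ivo://example/c", 0.005, 0.008),  # em_min too short
    ],
)

# f_min and f_max are derived columns, so they cannot drift from em_min/em_max.
conn.execute(
    "CREATE VIEW obs_radio AS SELECT obs_publisher_did, em_min, em_max,"
    f" {C_M_PER_S} / em_max AS f_min, {C_M_PER_S} / em_min AS f_max"
    " FROM obscore_base"
)

by_wavelength = conn.execute(
    "SELECT obs_publisher_did FROM obs_radio"
    " WHERE em_min BETWEEN 1e-2 AND 2e-2 ORDER BY 1"
).fetchall()

by_frequency = conn.execute(
    "SELECT obs_publisher_did FROM obs_radio"
    " WHERE f_max BETWEEN 14989622900.0 AND 29979245800.0 ORDER BY 1"
).fetchall()
```

Both queries return the same single row here. Whether the second one can use an index on em_min depends entirely on the database engine, which is the performance concern raised above.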

table and standardID management

Comment on the Document

Doc : The ObsCore extension for radio (including or not visibility data) described above SHOULD be added to the main ObsCore table.

This is painful for data centres that have only a few radio items and millions of non-radio items (such as me). In effect, you'd be forcing me to do a massive denormalisation of my data. And yes, people shouldn't do SELECT *, but they do, and then they have all these extra columns with NULLs in them for no benefit at all, in particular because you acknowledge it'd be a joined view anyway.

On the other hand, if there were some overriding reason to have the columns in the main obscore table, then don't make it SHOULD. If this is supposed to work at all, it must be a MUST -- software doing radio obscore queries has to be able to rely on it if it's necessary for some of your usecases.

But as I said: I don't believe it's necessary, and then we shouldn't do it, neither as SHOULD nor as MUST (I take the liberty of citing my own https://blog.g-vo.org/requirements-and-validators.html in support of this argument).

Doc : In practice a table containing only the extension attributes MAY be added to the same schema.

That's where I believe the MUST needs to sit. That way, your sample queries will have a JOIN (my advice based on RegTAP experience: make it a NATURAL JOIN and leave it to the implementors what the join columns actually are). Side benefit: discovery rules are a lot simpler.

You'd just be saying:

Register the obscore extension as a VODataService CatalogService with a tableset only containing the ivoa.obsradio [or whatever; feel free to suggest a different name. I'd also be open to loosening naming schemes for ivoa.obscore, but I don't think the radio extension is the place to start that discussion] table. Assign a table utype of ivo://ivoa.net/std/obscore#radioext-1.0 to that table. For later extensibility, discover radio-extended obscore tables using this utype (rather than the table name). In RegTAP, you would find TAP services having radio extensions like this:

SELECT access_url, table_name FROM rr.capability NATURAL JOIN rr.interface NATURAL JOIN rr.res_table WHERE standard_id='ivo://ivoa.net/std/tap' AND table_utype like 'ivo://ivoa.net/std/obscore#radioext-1.%'

I'd do a PR for that if you don't flame me too hard.

Also, if someone gives me radio data needing these extra columns, I'm happy to do a reference implementation.

-- MarkusDemleitner 2023-11-09
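To illustrate what a query against a separate extension table could look like, here is a small sketch using Python's sqlite3 in place of a TAP service. The table names obscore and obs_radio (without the ivoa schema), the choice of obs_publisher_did as the only shared column, and all rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Main table: every dataset, radio or not, with no radio-only columns.
    CREATE TABLE obscore (
        obs_publisher_did TEXT PRIMARY KEY,
        dataproduct_type TEXT, s_ra REAL, s_dec REAL);
    -- Extension table: one row per radio dataset only.
    CREATE TABLE obs_radio (
        obs_publisher_did TEXT PRIMARY KEY,
        f_resolution REAL, scan_mode TEXT);

    INSERT INTO obscore VALUES
        ('ivo://example/optical-1', 'image', 10.0, 20.0),
        ('ivo://example/radio-1', 'cube', 10.1, 20.1),
        ('ivo://example/radio-2', 'visibility', 150.0, -30.0);
    INSERT INTO obs_radio VALUES
        ('ivo://example/radio-1', 1.0e4, 'raster map'),
        ('ivo://example/radio-2', 5.0e5, 'on-source');
    """
)

# NATURAL JOIN matches on the shared column name, so the query does not
# need to spell out which column links the two tables.
rows = conn.execute(
    "SELECT obs_publisher_did, dataproduct_type, f_resolution"
    " FROM obscore NATURAL JOIN obs_radio"
    " WHERE f_resolution < 1e5 ORDER BY obs_publisher_did"
).fetchall()
```

The optical dataset never carries NULL-filled radio columns, and a plain query on the main table returns only the core columns.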

instrumentation details

Another somewhat questionable example is use case 1.10. The minimum number of antennas for a "good" observation really is instrument-specific and the maximum distance between antennas really is just a very poor way of expressing a resolution or uv-coverage constraint for which we already have columns. So unless someone can come up with a scientific use case for these parameters, I think they should be dropped from the extension.

-- MarkKettenis 2023-11-10

Apologies that I have not been following this in detail but whilst I agree with Mark that 1.10 is not a realistic use case, selection by uv coverage is, but just the number of antennas and extrema of baseline lengths is not enough. On the other hand, often all archives provide is baseline length (or even just antenna positions), frequency, pointing direction and observation duration. Metrics related to uv coverage density can then easily be calculated (as in the L5 and L80 etc. metrics in the ALMA archive) but I don't know if this is commonly searchable directly for any archive. So it is not just a matter of what is commonly searched for, but also what archives provide and how much there can be an interface to convert the latter to the former. -- AnitaRichards 2023-11-10

The columns the DaCHS extension now offers are f_resolution, instrument_ant_diameter, instrument_ant_max_dist, instrument_ant_min_dist, instrument_ant_number, instrument_feed, obs_publisher_did, s_fov_max, s_fov_min, s_maximum_angular_scale, s_resolution_max, s_resolution_min, scan_mode, t_exp_max, t_exp_mean, t_exp_min, tracking_mode, uv_distance_max, uv_distance_min, uv_distribution_ecc, uv_distribution_fill

(select column_name from tap_schema.columns where table_name='ivoa.obs_radio' order by column_name)

This doesn't mean I'm convinced all of them should be in. But it does mean that I'm pretty sure all others should go from table 1; in particular, the "via DataLink" things I think are just confusing in there.

-- MarkusDemleitner 2023-12-11

I raised my concerns about the instrument_* columns during the InterOp. Those really are instrument specific and therefore not very useful in generic queries one would issue to multiple TAP services. It may make more sense to provide this information in an observatory specific table (see for example the presentation by Greg Sleap on how the MWA presents information like this).

-- MarkKettenis 2023-12-13

- Given that ObsCore is meant for data discovery, any instrument specific details should be removed as much as possible, to make sure that a cross-ObsTAP-service-query will Just Work ™. That there may be a (defined) mechanism to get to instrument specifics would be important to have - e.g. for visibility data the u,v-plane characterisation. These could be standard extensions of ObsCore but could live outside the ObsCore standard itself if you catch my drift.

- Very much related to that, columns like instrument_ant_diameter, instrument_ant_number, or almost all instrument_ant_* for that matter, are unhelpful for any array of radio receivers, as I have argued before. In my opinion they'd fall under an instrument characterisation. I'm also thinking of dipole or large-N-small-D arrays here, in the event they'd want to publish some visibility data.

Possibly a proper instrument characterisation mechanism can be thought of, but in the end, only two things are important for interoperability I think: what was the instrument signature at the time of observation and has it been removed from the published data.

My .02 arbitrary currency units, obviously.

MarjoleinVerkouter 2023-12-18

Humm, before ruling out all this, I just want to explain where it came from. This was not a fantasy of the editor or first authors.

To help a user to discover quality data we have to give a way to estimate this quality, by trying to characterize the original raw data.

Of course the uv plane characterization is the best we can do, but if we cannot provide it, what can be done? Nothing?

Our background was this rather old but very comprehensive note by Anita Richards

https://wiki.ivoa.net/internal/IVOA/SiaInterface/Anita-InterferometryVO.pdf

The whole section 1.1 is to be read again.

But I would like to mention here these excerpts :

> Component telescope properties, such as location and diameter, can be used to deduce other properties if these are not provided explicitly. Should these be included in this model or in Provenance/Observation (and/or a link to the array web page in the absence of the latter models)

and

> Diagnostics of the uv plane coverage.
> – Most conventional radio archives provide links to plots (such as Fig. 1) of visibility amplitude as a function of uv distance. This is in dimensionless units of (projected baseline length)/(observing wavelength), or SI multiples, written as e.g. Mλ (mega-lambda).
> – Table 1.1 provides a machine-readable quantification of Fig. 1.
> – The range of the maximum and minimum uv distances present in a data set is the simplest fairly accurate quantifier. The example shown in Table 1.1 gives 114.93–3553.52 kλ.
> – The range of intermediate spacings available as well as the limits of uv coverage affect the data quality. There is no commonly-used universal quantifier for this although the coverage could be described in more detail using the finer levels of Characterization. Astronomers usually inspect plots such as Fig. 1 or the dirty beam (see Section 1.2 and Fig. 2). The time axis also provides some information, see below.
> – A crude estimate of coverage can be obtained from the number of participating telescopes, the maximum and minimum baselines on the ground in m or km and the duration of the observation
(last item bolded by me)

So if the data provider is not able to quantify the uv plane quality, some of these instrumental details seem to be useful
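As a sketch of how such a crude estimate could be derived, the uv distance defined in the excerpt is simply baseline length divided by observing wavelength. The helper names and the example array below are hypothetical, and ground baselines are used in place of projected ones, so the result is only an upper bound on what the data actually contain.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def uv_distance_klambda(baseline_m, freq_hz):
    """uv distance in kilo-lambda: baseline length over observing wavelength."""
    wavelength_m = C_M_PER_S / freq_hz
    return baseline_m / wavelength_m / 1.0e3


def uv_range_klambda(min_baseline_m, max_baseline_m, freq_hz):
    """Crude (uv_distance_min, uv_distance_max) from ground baseline extrema."""
    return (
        uv_distance_klambda(min_baseline_m, freq_hz),
        uv_distance_klambda(max_baseline_m, freq_hz),
    )


# Hypothetical array: baselines from 100 m to 10 km, observed at 1.4 GHz.
uv_min, uv_max = uv_range_klambda(100.0, 10_000.0, 1.4e9)
```

For this made-up array the range is roughly 0.47 to 47 kλ, obtained from nothing more than baseline extrema and observing frequency.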

In addition, immediately before we started the radio interest group in the VO, there was initial work on this in the Asterics / ESCAPE projects, which happened to be a first contribution.

A dedicated meeting on "radio astronomy data in the VO" was held in Strasbourg in February 2019.

Several radio astronomers gave talks about their needs and requirements for data discovery.

Look at Katarina Lutz's presentation here :

https://www.asterics2020.eu/dokuwiki/lib/exe/fetch.php?media=open:wp4:wp4techforum5:talk_klutz.pdf

or Yelena Stein's one

https://www.asterics2020.eu/dokuwiki/lib/exe/fetch.php?media=open:wp4:voradio_stein_.pdf

I copy paste you two interesting slides (one from each of these two talks) below

So my conclusion : I understand we still have to discuss the details and science cases. But do not exclude instrumental and configuration details too early.

-- FrancoisBonnarel 2024-03-08

 

science cases

It is good to have these ADQL examples. But I think we need a scientific use case associated with these ADQL examples as well. I still think the current set of additional columns is larger than it should be as there is overlap between e.g. the ones that characterize the uv coverage and the ones that characterize the field of view and/or resolution. I think we should drop the ones for which we do not have a scientific use case (including an ADQL example).

-- MarkKettenis 2023-11-10

I have to say I find the use cases relatively unconvincing because at this point they take some constraints out of thin air and then repeat them three times with slightly different syntaxes. That's not very helpful for working out why anything in the extension is the way it is.

I'd find it a lot more convincing if the use cases stated a scientifically meaningful discovery problem -- for instance, why would one look for a "dataset with a field of view larger than 0.5 degree"?

Also, at this point I don't believe in any use case involving target_name -- we don't have reliable rules for how to write them (and likely will never have them for "ordinary" objects). Don't fool people into believing they could do all-VO searches using target_name. Convert all of them into positional queries, because that's the only interoperable technique for this kind of thing at this point (but of course it's fine and even welcome to mention object names in the use case formulation).

For solar system objects with their fast-changing positions, the whole problem might pose itself differently, but then that's more epn-tap's problem and arguably doesn't belong here.

Of course, all the queries will have to be re-done if PR #43 is merged, but that's minor, and I'd volunteer. But before such an update, let's have (perhaps fewer but) more meaningful use cases, preferably giving, in sum, justification to each and every column in the extension.

-- MarkusDemleitner 2023-12-11

utypes in ObsCore extension

As usual, I have utype quibbles; in particular, it looks funny if there suddenly are underscore-separated words in there ("Provenance.Observation.tracking_mode") where I think all other utypes (including some here) are CamelCase.

Also as usual, I don't think the column utypes here have any discernable function -- can't we just drop them altogether? I'd be willing to bet that nothing negative happens if we do. (yeah, we're doing something with the table utype, so that's different) -- MarkusDemleitner 2023-12-11

Do you mean you do not want to use utypes for the extension table 'ivoa.obs_radio', or also for 'ivoa.obscore'?

That sounds strange to me to omit utypes in the table for radio ( or time) extension . The quantities that are stored in the columns need to belong to some more general schema carrying the context. This should be understandable by humans and connect to existing concepts detailed in the data models.

If we would omit them in the table extension, then the role of the columns (or fields) cannot be compared to the one in ivoa.obscore .

The peculiarity of the Obscore DM is that this data model relies on some ideas and objects defined in the Characterisation data model for the physical properties of a data set. But it does not re-use the full structure of those objects, only the main properties that help for data discovery.

I think the relationship between a column and a data model element is useful for clarification and consistency checking. The Obscore authors did not have in mind that we would use MIVOT in ObsCore to carry this information in the tables, and further in the TAP response.

Up to now the utype is a light-weight annotation that is useful enough, and used in simple services and ObsTAP. It is not sure yet whether we would gain much by applying MIVOT, but it may be worth trying.

-- MireilleLouys 2023-12-13

I don't want to make this a major point, but I cannot fail to notice that utypes do not have any operational role in either obscore or obs_radio -- there is nothing any software does with them, or any functionality that wouldn't be available if they weren't there. To me, that mildly poses the question of why we bother with them.

We could discuss that in itself, but more pragmatically: do the utypes really help to link to that context? I mean, not even I know what to look up where when I see the string

Provenance.Observation.sky_scan_mode

-- and certainly no machine could do that; so, why not just state whatever context link there may be in the spec rather than bothering implementations and validators with it?

But my point was still a lot more trivial: I'd much rather drop that string altogether than quarrel over whether that ought to be SkyScanMode, which I think would make it a bit more consistent with the other obscore utypes.

But feel free to drop the whole thread: While in general I much prefer it when what we require is something we actually need in a standard so it works (long version of that thought: https://blog.g-vo.org/requirements-and-validators.html) -- and that is certainly not true for the column utypes here --, I feel this isn't a good place to re-think age-old VO practices, either.

-- MarkusDemleitner 2023-12-14

Science cases discussion.

interferometry

single dish

META FILEATTACHMENT attachment="ObsCoreExtensionForRadioData_jt.pdf" attr="" comment="John Tobin comments inside document" date="1683695788" name="ObsCoreExtensionForRadioData_jt.pdf" path="ObsCoreExtensionForRadioData_jt.pdf" size="569392" user="FrancoisBonnarel" version="1"
META FILEATTACHMENT attachment="ObsCoreExtensionForRadioData-draft_1.pdf" attr="" comment="Draft Sepetmber 28th" date="1696003316" name="ObsCoreExtensionForRadioData-draft_1.pdf" path="ObsCoreExtensionForRadioData-draft_1.pdf" size="619745" user="FrancoisBonnarel" version="1"
META FILEATTACHMENT attachment="ObsCoreExtensionForRadioData-2023-11-07.pdf" attr="" comment="Modifications to take into account August 29th running meeting included" date="1699472515" name="ObsCoreExtensionForRadioData-2023-11-07.pdf" path="ObsCoreExtensionForRadioData-2023-11-07.pdf" size="588448" user="FrancoisBonnarel" version="1"
META FILEATTACHMENT attachment="ADQLusecasesForObsCoreExtensionForRadioData.pdf" attr="" comment="firts attempt to write use cases as ADQL queries examples" date="1699472566" name="ADQLusecasesForObsCoreExtensionForRadioData.pdf" path="ADQLusecasesForObsCoreExtensionForRadioData.pdf" size="200662" user="FrancoisBonnarel" version="1"
 