Validator
A VOParquet validator is available:
- The STILTS command parqlint is available in STILTS v3.5-2 and later.
- Invoke it like this:
stilts parqlint <voparquet-filename>
(Note that not all STILTS installations come with the parquet libraries; you may need to run java -jar topcat-extra.jar -stilts ... instead.)
- It checks that the key-value entries look OK, does full votlint validation on the embedded VOTable, and reports any discrepancies between the VOTable metadata and the parquet data.
- It has a few optional parameters; see the docs for more details.
- One option useful for debugging is the votable parameter, which lets you test an external data-less VOTable as if it were attached to the parquet file.
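For scripted checks (for example over a directory of files), the invocation above can be wrapped in a few lines of Python. This is only a sketch under stated assumptions: the helper names parqlint_command and run_parqlint are hypothetical, the votable parameter is assumed to follow the usual STILTS name=value syntax, and stilts is assumed to be on the PATH with the parquet libraries available. Check the parqlint docs for the exact parameter forms.

```python
import shutil
import subprocess


def parqlint_command(parquet_file, votable=None, stilts="stilts"):
    """Build the argument list for a parqlint run (hypothetical helper)."""
    cmd = [stilts, "parqlint"]
    if votable is not None:
        # Assumption: parameters use the usual STILTS name=value form.
        cmd.append(f"votable={votable}")
    cmd.append(parquet_file)
    return cmd


def run_parqlint(parquet_file, votable=None):
    """Run parqlint if stilts is installed; return None otherwise."""
    if shutil.which("stilts") is None:
        return None
    return subprocess.run(
        parqlint_command(parquet_file, votable),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Show the command that would be run, without executing it.
    print(" ".join(parqlint_command("skysim10.parquet")))
```

Building the argument list separately from running it makes it easy to log or inspect the command before executing anything.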
Parquet and DALI
The current draft of DALI 1.2 adds a new row to the RESPONSEFORMAT table, introducing the alias parquet for the parquet MIME type application/vnd.apache.parquet. This means that clients of DALI services offering parquet output can request that format by supplying the parameter RESPONSEFORMAT=parquet. See DALI PR#43.
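As an illustration of what such a request could look like, the sketch below builds a synchronous TAP query URL that asks for parquet output. The service base URL https://example.org/tap and the helper name build_sync_query_url are hypothetical, and whether a given service accepts the parquet alias depends on its DALI support.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sync_query_url(base_url, adql, response_format="parquet"):
    """Return a TAP sync query URL requesting the given output format."""
    params = {
        "LANG": "ADQL",
        "QUERY": adql,
        # Alias for application/vnd.apache.parquet in the DALI 1.2 draft.
        "RESPONSEFORMAT": response_format,
    }
    return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"


# Hypothetical service URL, for illustration only.
url = build_sync_query_url(
    "https://example.org/tap",
    "SELECT TOP 10 * FROM tap_schema.tables",
)
print(url)
```

A client that prefers not to rely on the alias could pass the full MIME type as response_format instead.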
Documents and Presentations
Meetings
5 November 2024 19:00 UTC Online meeting
22 Zoom participants
Agenda
The purpose of the meeting is to learn about the different Parquet-related efforts currently under way and to identify synergies that could be channeled into possible new IVOA standards.
Meeting minutes
Active groups using Parquet technology:
- Jeff Burke (CADC) - Parquet format for TAP
- Mario Juric (U of Washington) - Upload and download large Rubin catalogues
- Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
- Jos De Bruijne (ESA) - ESA Gaia DR4 - Use Parquet format (instead of the current CSV format)
- Pierre Le Sidaner (Observatoire de Paris) - Also Gaia mission. Appreciates data access by column
- Gregory Dubois-Felsmann (Rubin) - Rubin makes very heavy use of Parquet internally and externally, including for catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes
- Trey Roby (IPAC) - Firefly reads and writes Parquet files. Very good file. Pushing the edge of how big the files can be.
- Mark Taylor presentation on adding VOTable rich metadata in Parquet
Questions and comments on the presentation
- Gregory D-F: VOTable metadata is important for DataLink; no objection to adding encoding; version support is a must. Valid VOTable metadata must comply with the spec. Parquet is not great for working with multiple datasets, but VOTable should support multiple tables. What happens if there is a conflict between VOTable metadata and Parquet metadata?
- Mark T.: Cannot have an empty data element. VOTable Resource has a flag that can be used. Main table can be of type result and other resources can have other types (metadata).
- Trey R.: How to resolve inconsistencies between Parquet and VOTable. Parquet should have the last word.
- Mario J: Important to specify how to deal with inconsistencies between VOTable metadata and Parquet metadata.
- Brigitta S: Prefers Parquet support first, before generalization (FITS). Example files would be useful for implementations such as astroquery, e.g. multiple tables in the same file.
- Jos DB: Compression algorithms. Should start thinking about this, as it is one of the core strengths of Parquet.
- Gregory D-F: We need to define new TAP result formats. Is there a Parquet MIME type?
- A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.
- Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet, then exploring how to generalize it to other types (FITS) and turning it into a standard.
Actions:
- Mark T. and Gregory D-F will start drafting the note
- Mark T. will make his presentation available so that others can comment on it - done
- Mark T. will present progress during the Apps session in Malta
- Apps WG to support the effort: create a TWiki page with useful info, organize meetings post-Interop as required, etc.
Attachments
- votparquet-telecon-2024-11-05.pdf (uploaded by MarkTaylor)
- Bulk_download_for_DR4-public.pdf - Gaia DR4 bulk download plans (uploaded by MarkTaylor)
- skysim10.parquet (uploaded by MarkTaylor)