< < | Parquet In IVOA |
> > | Parquet in IVOA |
< < |
November 5 2024 7:00PM UTC Online meeting |
| |
< < | 22 Zoom participants |
> > | |
< < | Agenda: |
| |
< < | The purpose of the meeting is to learn about the different Parquet-related efforts currently under way and identify synergies that could be channeled into new IVOA standards. |
> > | VOParquet Note |
< < | Agenda |
| |
< < | - Around the table short description of the groups and Parquet-related projects |
> > | A Note is under preparation at https://github.com/ivoa/voparquet (formatted versions available: html, pdf). |
| |
< < | - Mark Taylor presentation on VOTable metadata with Parquet files |
> > | The Note describes how VOTable is used within Parquet files to associate rich semantic (VOTable) metadata with the (Parquet) data. |
| |
< < | - Jos de Bruijne: compression algorithms in Parquet |
> > | Implementation Status |
| |
< < | - Path forward - Malta Interop |
> > | The following implementations of the VOParquet convention exist. |
> > | Please add to this list if you know of others. |
| |
< < | Meeting minutes: |
> > | |
> > |
- Writes VOParquet metadata into output parquet files by default; uses VOParquet metadata from input files when reading if present.
- Believed fully compliant with VOParquet note as at 17 Dec 2024.
- A small example output file is attached: skysim10.parquet
| |
< < | Active groups using Parquet technology:
- Jeff Burke (CADC) - Parquet format for TAP
- Mario Juric (U of Washington) - Upload and download large Rubin catalogues
- Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
> > | Documents and Presentations |
< < |
- Jos de Bruijne (ESA) - ESA Gaia DR4 - Use Parquet format (instead of the current CSV format)
- Pierre Le Sidaner (Observatoire de Paris) - Also Gaia mission. Appreciates data access by column.
- Gregory Dubois-Felsmann (Rubin) - Rubin makes very heavy use of Parquet internally and externally. Catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes.
- Trey Roby (IPAC) - Firefly reads and writes Parquet files. Very good file format. Pushing the limits of how big the files can be.
Mark Taylor presentation on adding VOTable rich metadata in Parquet |
| |
< < | Questions and comments on the presentation: |
> > |
> > | Meetings |
| |
< < | Gregory D-F: VOTable metadata is important for DataLink. No objection to adding encoding; version support is a must. Valid VOTable metadata must comply with the spec. Parquet is not great for working with multiple datasets, but VOTable should support multiple tables. What happens if there’s a conflict between VOTable metadata and Parquet metadata? |
> > | 5 November 2024 19:00 UTC Online meeting |
| |
< < | Mark T.: Cannot have an empty data element. VOTable Resource has a flag that can be used. Main table can be of type result and other resources can have other types (metadata). |
> > | 22 Zoom participants |
| |
< < | Trey R.: How to resolve inconsistencies between Parquet and VOTable. Parquet should have the last word. |
> > | Agenda |
| |
< < | Mario J: Important to specify how to deal with inconsistencies between VOTable metadata and Parquet metadata. |
> > | The purpose of the meeting is to learn about the different Parquet-related efforts currently under way and identify synergies that could be channeled into new IVOA standards. |
| |
< < | Brigitta S: Prefers Parquet support first, before generalization (FITS). Example files would be useful for implementations such as astroquery, e.g. multiple tables within the same file. |
> > |
- Around the table short description of the groups and Parquet-related projects
> > | |
| |
< < | Jos DB: Compression algorithms. We should start thinking about these, as compression is one of the core strengths of Parquet. |
> > | Meeting minutes |
| |
< < | Gregory D-F: We need to define new TAP result formats. Is there a Parquet MIME type? |
> > | Active groups using Parquet technology: |
> > |
- Jeff Burke (CADC) - Parquet format for TAP
- Mario Juric (U of Washington) - Upload and download large Rubin catalogues
- Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
- Jos de Bruijne (ESA) - ESA Gaia DR4 - Use Parquet format (instead of the current CSV format)
- Pierre Le Sidaner (Observatoire de Paris) - Also Gaia mission. Appreciates data access by column.
- Gregory Dubois-Felsmann (Rubin) - Rubin makes very heavy use of Parquet internally and externally. Catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes.
- Trey Roby (IPAC) - Firefly reads and writes Parquet files. Very good file format. Pushing the limits of how big the files can be.
- Mark Taylor presentation on adding VOTable rich metadata in Parquet
| |
< < | A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way. |
> > | Questions and comments on the presentation |
| |
< < | Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet, then exploring how to generalize it to other types (FITS) and turning it into a standard.
Actions: |
> > |
- Gregory D-F: VOTable metadata is important for DataLink. No objection to adding encoding; version support is a must. Valid VOTable metadata must comply with the spec. Parquet is not great for working with multiple datasets, but VOTable should support multiple tables. What happens if there’s a conflict between VOTable metadata and Parquet metadata?
- Mark T.: Cannot have an empty data element. VOTable Resource has a flag that can be used. Main table can be of type result and other resources can have other types (metadata).
> > |
- Trey R.: How to resolve inconsistencies between Parquet and VOTable. Parquet should have the last word.
- Mario J: Important to specify how to deal with inconsistencies between VOTable metadata and Parquet metadata.
- Brigitta S: Prefers Parquet support first, before generalization (FITS). Example files would be useful for implementations such as astroquery, e.g. multiple tables within the same file.
- Jos DB: Compression algorithms. We should start thinking about these, as compression is one of the core strengths of Parquet.
- Gregory D-F: We need to define new TAP result formats. Is there a Parquet MIME type?
- A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.
- Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet, then exploring how to generalize it to other types (FITS) and turning it into a standard.
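
Several comments above concern inconsistencies between the VOTable metadata and the Parquet metadata, with one suggestion that Parquet should have the last word. The following is only a sketch of what that rule could look like for column datatypes; it is not taken from the VOParquet Note. The example VOTable header, the simplified Parquet schema dictionary, the type mapping and the `reconcile` function are all assumptions made for illustration, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-less VOTable header of the kind that might be embedded
# in a Parquet file; the columns and values are invented for illustration.
VOTABLE_XML = """<VOTABLE version="1.4" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="ra" datatype="double" unit="deg" ucd="pos.eq.ra"/>
      <FIELD name="dec" datatype="float" unit="deg" ucd="pos.eq.dec"/>
      <FIELD name="flag" datatype="int"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

# Hypothetical, simplified view of the Parquet schema: column name -> physical type.
PARQUET_SCHEMA = {"ra": "DOUBLE", "dec": "DOUBLE", "mag": "FLOAT"}

# Assumed mapping from Parquet physical types to VOTable datatypes.
PARQUET_TO_VOTABLE = {
    "DOUBLE": "double",
    "FLOAT": "float",
    "INT32": "int",
    "INT64": "long",
    "BOOLEAN": "boolean",
}


def reconcile(parquet_schema, votable_xml):
    """Merge VOTable FIELD metadata onto Parquet columns; Parquet wins on conflict."""
    root = ET.fromstring(votable_xml)
    fields = {}
    for element in root.iter():
        # Strip any XML namespace so the sketch works for any VOTable version.
        if element.tag.rsplit("}", 1)[-1] == "FIELD":
            fields[element.get("name")] = dict(element.attrib)

    columns, warnings = [], []
    for name, parquet_type in parquet_schema.items():
        meta = dict(fields.get(name, {"name": name}))
        expected = PARQUET_TO_VOTABLE.get(parquet_type)
        declared = meta.get("datatype")
        if expected is not None and declared != expected:
            if declared is not None:
                warnings.append(
                    f"{name}: VOTable says {declared}, Parquet says {expected}; using Parquet"
                )
            meta["datatype"] = expected  # Parquet has the last word
        columns.append(meta)

    for name in fields:
        if name not in parquet_schema:
            warnings.append(f"{name}: described in VOTable but absent from Parquet; ignored")
    return columns, warnings


columns, warnings = reconcile(PARQUET_SCHEMA, VOTABLE_XML)
for warning in warnings:
    print("WARNING:", warning)
```

In this sketch the semantic attributes (unit, UCD) are always taken from the VOTable, while the datatype and the set of columns follow the Parquet schema. A reader could equally choose to reject inconsistent files outright.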
| |
< < | - Mark T. and Gregory D-F will start drafting the note |
> > | Actions: |
| |
< < | - Mark T. will make his presentation available so that others can comment on it - done |
> > |
- Mark T. and Gregory D-F will start drafting the note
> > |
- Mark T. will make his presentation available so that others can comment on it - done
- Mark T. will present progress during the Apps session in Malta
- Apps WG to support the efforts: create Twiki page with useful info, organize meetings post Interop as required, etc.
| |
< < | - Mark T. will present progress during the Apps session in Malta
- Apps WG to support the efforts: create Twiki page with useful info, organize meetings post Interop as required, etc. |
| <--
attachment="votparquet-telecon-2024-11-05.pdf" attr="" comment="" date="1730846262" name="votparquet-telecon-2024-11-05.pdf" path="votparquet-telecon-2024-11-05.pdf" size="142840" user="MarkTaylor" version="1" |
attachment="Bulk_download_for_DR4-public.pdf" attr="" comment="Gaia DR4 bulk download plans" date="1730887269" name="Bulk_download_for_DR4-public.pdf" path="Bulk_download_for_DR4-public.pdf" size="925116" user="MarkTaylor" version="1" |
> > |
attachment="skysim10.parquet" attr="" comment="" date="1734439551" name="skysim10.parquet" path="skysim10.parquet" size="2993" user="MarkTaylor" version="1" |
| |