Difference: ParquetInAstronomy (8 vs. 9)

Revision 9 - 2024-12-17 - MarkTaylor

 
META TOPICPARENT name="IvoaApplications"
Changed:
<
<

Parquet In IVOA

>
>

Parquet in IVOA

Deleted:
<
<

November 5 2024 7:00PM UTC Online meeting

 
Changed:
<
<
22 Zoom participants
>
>
Deleted:
<
<

Agenda:

 
Changed:
<
<
The purpose of the meeting is to learn about different current Parquet-related efforts currently under way and identify synergies that can be channeled into possibly new IVOA standards.
>
>

VOParquet Note

Deleted:
<
<

Agenda

 
Changed:
<
<
- Around the table short description of the groups and Parquet-related projects
>
>
A Note is under preparation at https://github.com/ivoa/voparquet (formatted versions available: html, pdf).
 
Changed:
<
<
- Mark Taylor presentation on VOTable metadata with Parquet files
>
>
This describes the use of VOTable within Parquet files to associate rich semantic (VOTable) metadata with the (Parquet) data.
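The following Python sketch gives a rough illustration of the idea. It builds a VOTable document that describes the table's columns (FIELD elements, with no DATA element) and pairs it with a version string, ready to be stored as key-value metadata in a Parquet file footer. The key names `IVOA.VOTable-Parquet.version` and `IVOA.VOTable-Parquet.content`, the version value "1.0" and the helper function names are assumptions made for illustration. The Note in the repository linked above is the authority on the actual convention.

```python
import xml.etree.ElementTree as ET

# ASSUMED key names and version value, based on our reading of the draft
# VOParquet Note; check https://github.com/ivoa/voparquet for the real ones.
VERSION_KEY = "IVOA.VOTable-Parquet.version"
CONTENT_KEY = "IVOA.VOTable-Parquet.content"
VERSION = "1.0"

# VOTable 1.3 and later versions share this namespace URI.
VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"
ET.register_namespace("", VOTABLE_NS)  # serialize without an ns0: prefix


def votable_header(table_name, columns):
    """Hypothetical helper: a VOTable with FIELDs but no DATA element.

    columns: list of dicts with 'name' and 'datatype', optionally 'unit', 'ucd'.
    """
    root = ET.Element(f"{{{VOTABLE_NS}}}VOTABLE", version="1.4")
    resource = ET.SubElement(root, f"{{{VOTABLE_NS}}}RESOURCE", type="results")
    table = ET.SubElement(resource, f"{{{VOTABLE_NS}}}TABLE", name=table_name)
    for col in columns:
        attrs = {"name": col["name"], "datatype": col["datatype"]}
        for optional in ("unit", "ucd"):
            if optional in col:
                attrs[optional] = col[optional]
        ET.SubElement(table, f"{{{VOTABLE_NS}}}FIELD", attrs)
    # No DATA element: the rows themselves live in the Parquet column chunks.
    return ET.tostring(root, encoding="unicode")


def voparquet_kv_metadata(table_name, columns):
    """Hypothetical helper: key-value pairs to attach to the Parquet footer."""
    return {
        VERSION_KEY: VERSION,
        CONTENT_KEY: votable_header(table_name, columns),
    }
```

With pyarrow, the returned pairs could be merged into a table's existing schema metadata (for example with `Table.replace_schema_metadata`) before writing with `pyarrow.parquet.write_table`. The sketch uses only the standard library so that it runs on its own.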
 
Changed:
<
<
- Jos de Bruijne Parquet Compression algorithms in Parquet
>
>

Implementation Status

 
Changed:
<
<
- Path forward - Malta Interop
>
>
The following implementations of the VOParquet convention exist.
Added:
>
>
Please add to this list if you know of others.
 
Changed:
<
<

Meeting minutes:

>
>
Added:
>
>
    • Writes VOParquet metadata into output Parquet files by default; when reading, uses VOParquet metadata from input files if present.
    • Believed fully compliant with the VOParquet Note as at 17 Dec 2024.
    • A small example output file is attached: skysim10.parquet
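On the reading side, an implementation that uses VOParquet metadata "when reading if present" must first detect the metadata and otherwise fall back to the plain Parquet schema. The following minimal Python sketch shows that detection step. The input is a key-value dict such as pyarrow returns from `pyarrow.parquet.read_schema(path).metadata` (bytes keys and values, or None). The key names `IVOA.VOTable-Parquet.version` and `IVOA.VOTable-Parquet.content` and the version value "1.0" are assumptions that should be checked against the Note. The helper name is hypothetical.

```python
import warnings
import xml.etree.ElementTree as ET

# ASSUMED key names and version value; check the current VOParquet Note.
VERSION_KEY = "IVOA.VOTable-Parquet.version"
CONTENT_KEY = "IVOA.VOTable-Parquet.content"
KNOWN_VERSIONS = {"1.0"}
VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def _as_str(value):
    # pyarrow exposes footer metadata keys and values as bytes
    return value.decode("utf-8") if isinstance(value, bytes) else value


def read_voparquet_fields(kv_metadata):
    """Hypothetical helper: map column name -> FIELD attributes.

    Returns None when the file carries no VOParquet metadata, so that the
    caller can fall back to the plain Parquet schema.
    """
    kv = {_as_str(k): _as_str(v) for k, v in (kv_metadata or {}).items()}
    if CONTENT_KEY not in kv:
        return None
    version = kv.get(VERSION_KEY)
    if version not in KNOWN_VERSIONS:
        warnings.warn(f"unrecognised VOParquet version {version!r}; reading anyway")
    root = ET.fromstring(kv[CONTENT_KEY])
    ns = {"v": VOTABLE_NS}
    return {f.get("name"): dict(f.attrib)
            for f in root.findall(".//v:TABLE/v:FIELD", ns)}
```

Returning None rather than raising an error keeps ordinary Parquet files (for example ones carrying only pandas metadata) readable. Warning on an unknown version, instead of rejecting the file, is a design choice and is not taken from the Note.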
 
Changed:
<
<
Active groups using Parquet techonology:
  • Jeff Burke (CADC) - Parquet format for TAP
  • Mario Juric (U of Washington) - Upload and download large Ruben catalogues
  • Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
>
>

Documents and Presentations

Deleted:
<
<
  • Jos De Bruijne (ESA) - ESA GAIA DR4 - Use Parquet format (instead of the current CSV format)
  • Pierre Le Sidaner (Observatoire de Paris) - Also GAIA mission. Appreciate data access by column
  • Gregory Dubois-Felsmann (Ruben) - Ruben very heavy use of Parquet internally and externally. Catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes
  • Trey Roby (IPAC) - Firefly read and write Parquet file. Very good file. Pushing the edge of how big the files can be.
Mark Taylor presentation on adding VOTable rich metadata in Parquet
 
Changed:
<
<
Questions and comments on the presentation:
>
>

Added:
>
>

Meetings

 
Changed:
<
<
Grogoy D-F: VOTable metadata is important for DataLink, no objection for adding encoding, version support is a must. Valid VOTable medata must comply with the spec. Parquet is not great to work multiple datasets but VOTable should support multiple tables. What happens if there’s a conflict between VOTable metadata nd Parquet metadata.
>
>

5 November 2024 19:00 UTC Online meeting

 
Changed:
<
<
Mark T.: Cannot have an empty data element. VOTable Resource has a flag that can be used. Main table can be of type result and other resources can have other types (medatadat).
>
>
22 Zoom participants
 
Changed:
<
<
Trey R.: How to resolve inconsistencies between Parquet and VOTable. Parquet should have the last word.
>
>

Agenda

 
Changed:
<
<
Mario J: Important to specify how to do deal with inconsistencies between VOTabe metadata and Parquet metadata
>
>
The purpose of the meeting is to learn about the Parquet-related efforts currently under way and to identify synergies that could be channeled into new IVOA standards.
 
Changed:
<
<
Brigitta S: Prefers Parquet support first before generalization (FITS) Example files would be useful for implementation such as astroquery - ex multiple tables with the same file.
>
>
  • Around the table short description of the groups and Parquet-related projects
Added:
>
>
 
Changed:
<
<
Jos DB: Compression algs. Should start thinking about this as it's one of the core strengths of Parquet
>
>

Meeting minutes

 
Changed:
<
<
Gregory D-F. We need to define new TAP result formats. Is there a Parquet MIME type?
>
>
Active groups using Parquet technology:
Added:
>
>
  • Jeff Burke (CADC) - Parquet format for TAP
  • Mario Juric (U of Washington) - Upload and download large Rubin catalogues
  • Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
  • Jos De Bruijne (ESA) - ESA Gaia DR4 - plans to use Parquet format (instead of the current CSV format)
  • Pierre Le Sidaner (Observatoire de Paris) - Also Gaia mission. Appreciates data access by column
  • Gregory Dubois-Felsmann (Rubin) - Rubin makes very heavy use of Parquet, internally and externally, e.g. for catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes
  • Trey Roby (IPAC) - Firefly reads and writes Parquet files. Very good format. Pushing the limits of how large the files can be.
  • Mark Taylor presentation on adding VOTable rich metadata in Parquet
 
Changed:
<
<
A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.
>
>

Questions and comments on the presentation

 
Changed:
<
<
Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet and then explore generalizing it for other types (FITS) and turn it into a standard.

Actions:

>
>
  • Gregory D-F: VOTable metadata is important for DataLink; no objection to adding encoding; version support is a must. Valid VOTable metadata must comply with the spec. Parquet is not great for working with multiple datasets, but VOTable should support multiple tables. What happens if there’s a conflict between VOTable metadata and Parquet metadata?
  • Mark T.: Cannot have an empty DATA element. The VOTable RESOURCE has a flag that can be used: the main table can be in a RESOURCE of type "results", and other resources can have other types (metadata).
Added:
>
>
  • Trey R.: How should inconsistencies between Parquet and VOTable be resolved? Parquet should have the last word.
  • Mario J.: Important to specify how to deal with inconsistencies between VOTable metadata and Parquet metadata.
  • Brigitta S.: Prefers Parquet support first, before generalization (e.g. to FITS). Example files would be useful for implementations such as astroquery, e.g. multiple tables in the same file.
  • Jos DB: Compression algorithms: we should start thinking about these, as compression is one of the core strengths of Parquet.
  • Gregory D-F: We need to define new TAP result formats. Is there a Parquet MIME type?
  • A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.
  • Marco M.: Proposed the faster route of writing an IVOA Note for VOTable and Parquet, then exploring generalization to other formats (e.g. FITS) and turning it into a standard.
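The meeting left open the conflict question raised in these comments: what a reader should do when the VOTable FIELDs and the Parquet columns disagree. The Python sketch below illustrates only the policy Trey R. suggested, in which the Parquet schema is authoritative. Every Parquet column is kept in Parquet order, VOTable metadata is attached only where a FIELD matches, and mismatches in either direction are reported rather than silently dropped. Matching FIELDs to columns by name is itself an assumption, since matching by position is another plausible rule. The function name is hypothetical.

```python
def reconcile(parquet_columns, votable_fields):
    """Hypothetical sketch of a 'Parquet has the last word' policy.

    parquet_columns: column names in Parquet schema order (authoritative).
    votable_fields:  dict of column name -> FIELD attribute dict.
    Returns (merged, problems): one merged dict per Parquet column, plus a
    list of human-readable descriptions of each mismatch found.
    """
    merged, problems = [], []
    for name in parquet_columns:
        if name in votable_fields:
            # Rich VOTable metadata is used, but the Parquet name wins
            merged.append({**votable_fields[name], "name": name})
        else:
            merged.append({"name": name})  # no rich metadata for this column
            problems.append(f"column {name!r} has no VOTable FIELD")
    for name in votable_fields:
        if name not in parquet_columns:
            problems.append(f"FIELD {name!r} has no Parquet column; ignored")
    return merged, problems
```

A stricter policy could also compare the FIELD datatype with the Parquet physical type. That is omitted here because the mapping between the two type systems is itself something the Note would need to specify.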
 
Changed:
<
<
- Mark T. and Gregory F-B will start drafting the note
>
>

Actions:

 
Changed:
<
<
- Mark T. will make his presentation available so that others can comment on it - done
>
>
  • Mark T. and Gregory D-F will start drafting the note
Added:
>
>
  • Mark T. will make his presentation available so that others can comment on it - done
  • Mark T. will present progress during the Apps session in Malta
  • Apps WG to support the effort: create a TWiki page with useful info, organize meetings after the Interop as required, etc.
 
Deleted:
<
<
- Mark T. will present progress during the Apps session in Malta

- Apps WG to support the efforst: create Twiki page with useful info, organize meetings post Interop as required, etc.

 

META FILEATTACHMENT attachment="votparquet-telecon-2024-11-05.pdf" attr="" comment="" date="1730846262" name="votparquet-telecon-2024-11-05.pdf" path="votparquet-telecon-2024-11-05.pdf" size="142840" user="MarkTaylor" version="1"
META FILEATTACHMENT attachment="Bulk_download_for_DR4-public.pdf" attr="" comment="Gaia DR4 bulk download plans" date="1730887269" name="Bulk_download_for_DR4-public.pdf" path="Bulk_download_for_DR4-public.pdf" size="925116" user="MarkTaylor" version="1"
Added:
>
>
META FILEATTACHMENT attachment="skysim10.parquet" attr="" comment="" date="1734439551" name="skysim10.parquet" path="skysim10.parquet" size="2993" user="MarkTaylor" version="1"
 
 
This site is powered by the TWiki collaboration platform. Copyright © 2008-2025 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.