Difference: ParquetInAstronomy (1 vs. 10)

Revision 10 2024-12-19 - BrianMajor

 
META TOPICPARENT name="IvoaApplications"

Parquet in IVOA

Changed:
<
<
>
>

 

VOParquet Note

A Note is under preparation at https://github.com/ivoa/voparquet (formatted versions available: html, pdf).

This describes use of VOTable within Parquet files to associate rich semantic (VOTable) metadata with the (parquet) data.
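As a rough illustration of the convention described above, the sketch below pulls an embedded VOTable out of a Parquet file's key-value metadata and lists the column metadata it declares. The key name `IVOA.VOTable-Parquet.content` is assumed here and should be checked against the current draft of the note; the helper name `votable_fields` and the sample metadata are made up for illustration. With pyarrow, the `file_metadata` mapping would typically come from `pyarrow.parquet.read_metadata(path).metadata`, which maps bytes keys to bytes values.

```python
import xml.etree.ElementTree as ET

# Assumed key name; verify against the current draft of the VOParquet note.
VOTABLE_KEY = b"IVOA.VOTable-Parquet.content"


def votable_fields(file_metadata):
    """Return a list of per-column attribute dicts from the embedded VOTable.

    file_metadata is the Parquet file-level key-value metadata as a
    bytes -> bytes mapping. Returns None if no VOTable is embedded.
    """
    xml_bytes = (file_metadata or {}).get(VOTABLE_KEY)
    if xml_bytes is None:
        return None
    root = ET.fromstring(xml_bytes)
    fields = []
    for elem in root.iter():
        # Compare local names so that any VOTable namespace version matches.
        if elem.tag.rsplit("}", 1)[-1] == "FIELD":
            fields.append(dict(elem.attrib))
    return fields


# Hypothetical metadata, standing in for what a VOParquet writer might embed.
sample = {
    VOTABLE_KEY: b"""<VOTABLE version="1.4" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
      <RESOURCE><TABLE>
        <FIELD name="ra" datatype="double" unit="deg" ucd="pos.eq.ra"/>
        <FIELD name="dec" datatype="double" unit="deg" ucd="pos.eq.dec"/>
      </TABLE></RESOURCE>
    </VOTABLE>"""
}

for field in votable_fields(sample):
    print(field["name"], field.get("unit"), field.get("ucd"))
```

A reader that finds no such key can fall back to the plain Parquet schema, which is why the helper returns None instead of raising.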

Implementation Status

Changed:
<
<
The following implementations of the VOParquet convention exist.
>
>
The following implementations of the VOParquet convention exist. Please add to this list if you know of others.
Deleted:
<
<
Please add to this list if you know of others.
 
  • TOPCAT/STILTS/STIL: prototype implementation at topcat-extra_metaparq.jar.
    • Writes VOParquet metadata into output parquet files by default; uses VOParquet metadata from input files when reading if present.
    • Believed fully compliant with VOParquet note as at 17 Dec 2024.
    • A small example output file is attached: skysim10.parquet
Changed:
<
<
>
>
Added:
>
>
    • For queries where RESPONSEFORMAT specifies Parquet, a Parquet file is produced with the metadata in an enclosed VOTable, as per the VOParquet note.
    • If the server encounters an error while constructing and streaming out the Parquet file, the client will receive a text/plain message describing the error. Since the output stream is already open at this point (and the response code cannot be modified), we think this is the best the service can do, but we are open to other suggestions.
    • Will be available in all instances of CADC TAP services including YouCat for user-managed tables.
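Because a mid-stream failure cannot change the response code, a client may want to check that what it received is structurally a complete Parquet file before trying to read it. The sketch below relies only on the Parquet file layout (a leading `PAR1` magic, and a trailing 4-byte little-endian footer length followed by `PAR1`). How the error text is delivered relative to bytes already sent is not stated here, so the assumption is simply that a failed response will not end with a valid footer. The function name `looks_like_complete_parquet` and the sample payloads are hypothetical.

```python
PARQUET_MAGIC = b"PAR1"


def looks_like_complete_parquet(payload: bytes) -> bool:
    """Check the Parquet magic bytes and that the footer length is plausible.

    This is a structural check only; it does not parse or validate the footer.
    """
    # Smallest possible layout: leading magic + footer length + trailing magic.
    if len(payload) < 12:
        return False
    if not payload.startswith(PARQUET_MAGIC):
        return False
    if not payload.endswith(PARQUET_MAGIC):
        return False
    # The 4 bytes before the trailing magic hold the footer length.
    footer_len = int.from_bytes(payload[-8:-4], "little")
    # The footer must fit between the leading magic and the trailing 8 bytes.
    return footer_len <= len(payload) - 12


# Fabricated payloads for illustration; neither is a real Parquet file.
complete = b"PAR1" + b"\x00" * 10 + b"abc" + (3).to_bytes(4, "little") + b"PAR1"
failed = b"PAR1" + b"\x00" * 10 + b"ERROR: query failed"

print(looks_like_complete_parquet(complete))
print(looks_like_complete_parquet(failed))
```

A response that fails this check can then be inspected as text to recover the error message, under the assumption above.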
 

Documents and Presentations

Changed:
<
<
>
>
 
Changed:
<
<
>
>

Deleted:
<
<

 

Meetings

5 November 2024 19:00 UTC Online meeting

22 Zoom participants

Agenda

The purpose of the meeting is to learn about the different Parquet-related efforts currently under way and to identify synergies that could be channeled into possible new IVOA standards.

Meeting minutes

Active groups using Parquet technology:

  • Jeff Burke (CADC) - Parquet format for TAP
  • Mario Juric (U of Washington) - Upload and download large Rubin catalogues
  • Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
  • Jos De Bruijne (ESA) - ESA GAIA DR4 - Use Parquet format (instead of the current CSV format)
  • Pierre Le Sidaner (Observatoire de Paris) - Also GAIA mission. Appreciate data access by column
  • Gregory Dubois-Felsmann (Rubin) - Rubin very heavy use of Parquet internally and externally. Catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes
  • Trey Roby (IPAC) - Firefly read and write Parquet file. Very good file. Pushing the edge of how big the files can be.
  • Mark Taylor presentation on adding VOTable rich metadata in Parquet

Questions and comments on the presentation

  • Gregory D-F: VOTable metadata is important for DataLink; no objection to adding encoding; version support is a must. Valid VOTable metadata must comply with the spec. Parquet is not great for working with multiple datasets, but VOTable should support multiple tables. What happens if there is a conflict between VOTable metadata and Parquet metadata?
  • Mark T.: Cannot have an empty data element. VOTable Resource has a flag that can be used. Main table can be of type result and other resources can have other types (metadata).
  • Trey R.: How to resolve inconsistencies between Parquet and VOTable. Parquet should have the last word.
  • Mario J: Important to specify how to deal with inconsistencies between VOTable metadata and Parquet metadata
  • Brigitta S: Prefers Parquet support first, before generalization (FITS). Example files would be useful for implementations such as astroquery, e.g. multiple tables within the same file.
  • Jos DB: Compression algs. Should start thinking about this as it's one of the core strengths of Parquet
  • Gregory D-F. We need to define new TAP result formats. Is there a Parquet MIME type?
  • A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.
  • Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet and then explore generalizing it for other types (FITS) and turn it into a standard.

Actions:

  • Mark T. and Gregory D-F will start drafting the note
  • Mark T. will make his presentation available so that others can comment on it - done
  • Mark T. will present progress during the Apps session in Malta
  • Apps WG to support the effort: create Twiki page with useful info, organize meetings post Interop as required, etc.
Changed:
<
<
 
META FILEATTACHMENT attachment="votparquet-telecon-2024-11-05.pdf" attr="" comment="" date="1730846262" name="votparquet-telecon-2024-11-05.pdf" path="votparquet-telecon-2024-11-05.pdf" size="142840" user="MarkTaylor" version="1"
META FILEATTACHMENT attachment="Bulk_download_for_DR4-public.pdf" attr="" comment="Gaia DR4 bulk download plans" date="1730887269" name="Bulk_download_for_DR4-public.pdf" path="Bulk_download_for_DR4-public.pdf" size="925116" user="MarkTaylor" version="1"
META FILEATTACHMENT attachment="skysim10.parquet" attr="" comment="" date="1734439551" name="skysim10.parquet" path="skysim10.parquet" size="2993" user="MarkTaylor" version="1"

Revision 9 2024-12-17 - MarkTaylor

 
META TOPICPARENT name="IvoaApplications"
Changed:
<
<

Parquet In IVOA

>
>

Parquet in IVOA

Deleted:
<
<

November 5 2024 7:00PM UTC Online meeting

 
Changed:
<
<
22 Zoom participants
>
>
Deleted:
<
<

Agenda:

 
Changed:
<
<
The purpose of the meeting is to learn about different current Parquet-related efforts currently under way and identify synergies that can be channeled into possibly new IVOA standards.
>
>

VOParquet Note

Deleted:
<
<

Agenda

 
Changed:
<
<
- Around the table short description of the groups and Parquet-related projects
>
>
A Note is under preparation at https://github.com/ivoa/voparquet (formatted versions available: html, pdf).
 
Changed:
<
<
- Mark Taylor presentation on VOTable metadata with Parquet files
>
>
This describes use of VOTable within Parquet files to associate rich semantic (VOTable) metadata with the (parquet) data.
 
Changed:
<
<
- Jos de Bruijne Parquet Compression algorithms in Parquet
>
>

Implementation Status

 
Changed:
<
<
- Path forward - Malta Interop
>
>
The following implementations of the VOParquet convention exist.
Added:
>
>
Please add to this list if you know of others.
 
Changed:
<
<

Meeting minutes:

>
>
Added:
>
>
    • Writes VOParquet metadata into output parquet files by default; uses VOParquet metadata from input files when reading if present.
    • Believed fully compliant with VOParquet note as at 17 Dec 2024.
    • A small example output file is attached: skysim10.parquet
 
Changed:
<
<
Active groups using Parquet techonology:
  • Jeff Burke (CADC) - Parquet format for TAP
  • Mario Juric (U of Washington) - Upload and download large Ruben catalogues
  • Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
>
>

Documents and Presentations

Deleted:
<
<
  • Jos De Bruijne (ESA) - ESA GAIA DR4 - Use Parquet format (instead of the current CSV format)
  • Pierre Le Sidaner (Observatoire de Paris) - Also GAIA mission. Appreciate data access by column
  • Gregory Dubois-Felsmann (Ruben) - Ruben very heavy use of Parquet internally and externally. Catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes
  • Trey Roby (IPAC) - Firefly read and write Parquet file. Very good file. Pushing the edge of how big the files can be.
Mark Taylor presentation on adding VOTable rich metadata in Parquet
 
Changed:
<
<
Questions and comments on the presentation:
>
>

Added:
>
>

Meetings

 
Changed:
<
<
Grogoy D-F: VOTable metadata is important for DataLink, no objection for adding encoding, version support is a must. Valid VOTable medata must comply with the spec. Parquet is not great to work multiple datasets but VOTable should support multiple tables. What happens if there’s a conflict between VOTable metadata nd Parquet metadata.
>
>

5 November 2024 19:00 UTC Online meeting

 
Changed:
<
<
Mark T.: Cannot have an empty data element. VOTable Resource has a flag that can be used. Main table can be of type result and other resources can have other types (medatadat).
>
>
22 Zoom participants
 
Changed:
<
<
Trey R.: How to resolve inconsistencies between Parquet and VOTable. Parquet should have the last word.
>
>

Agenda

 
Changed:
<
<
Mario J: Important to specify how to do deal with inconsistencies between VOTabe metadata and Parquet metadata
>
>
The purpose of the meeting is to learn about different current Parquet-related efforts currently under way and identify synergies that can be channeled into possibly new IVOA standards.
 
Changed:
<
<
Brigitta S: Prefers Parquet support first before generalization (FITS) Example files would be useful for implementation such as astroquery - ex multiple tables with the same file.
>
>
  • Around the table short description of the groups and Parquet-related projects
Added:
>
>
 
Changed:
<
<
Jos DB: Compression algs. Should start thinking about this as it's one of the core strengths of Parquet
>
>

Meeting minutes

 
Changed:
<
<
Gregory D-F. We need to define new TAP result formats. Is there a Parquet MIME type?
>
>
Active groups using Parquet techonology:
Added:
>
>
  • Jeff Burke (CADC) - Parquet format for TAP
  • Mario Juric (U of Washington) - Upload and download large Rubin catalogues
  • Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
  • Jos De Bruijne (ESA) - ESA GAIA DR4 - Use Parquet format (instead of the current CSV format)
  • Pierre Le Sidaner (Observatoire de Paris) - Also GAIA mission. Appreciate data access by column
  • Gregory Dubois-Felsmann (Rubin) - Rubin very heavy use of Parquet internally and externally. Catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes
  • Trey Roby (IPAC) - Firefly read and write Parquet file. Very good file. Pushing the edge of how big the files can be.
  • Mark Taylor presentation on adding VOTable rich metadata in Parquet
 
Changed:
<
<
A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.
>
>

Questions and comments on the presentation

 
Changed:
<
<
Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet and then explore generalizing it for other types (FITS) and turn it into a standard.

Actions:

>
>
  • Gregory D-F: VOTable metadata is important for DataLink, no objection for adding encoding, version support is a must. Valid VOTable medata must comply with the spec. Parquet is not great to work multiple datasets but VOTable should support multiple tables. What happens if there’s a conflict between VOTable metadata nd Parquet metadata.
  • Mark T.: Cannot have an empty data element. VOTable Resource has a flag that can be used. Main table can be of type result and other resources can have other types (medatadat).
Added:
>
>
  • Trey R.: How to resolve inconsistencies between Parquet and VOTable. Parquet should have the last word.
  • Mario J: Important to specify how to do deal with inconsistencies between VOTabe metadata and Parquet metadata
  • Brigitta S: Prefers Parquet support first before generalization (FITS) Example files would be useful for implementation such as astroquery - ex multiple tables with the same file.
  • Jos DB: Compression algs. Should start thinking about this as it's one of the core strengths of Parquet
  • Gregory D-F. We need to define new TAP result formats. Is there a Parquet MIME type?
  • A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.
  • Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet and then explore generalizing it for other types (FITS) and turn it into a standard.
 
Changed:
<
<
- Mark T. and Gregory F-B will start drafting the note
>
>

Actions:

 
Changed:
<
<
- Mark T. will make his presentation available so that others can comment on it - done
>
>
  • Mark T. and Gregory D-F will start drafting the note
Added:
>
>
  • Mark T. will make his presentation available so that others can comment on it - done
  • Mark T. will present progress during the Apps session in Malta
  • Apps WG to support the efforst: create Twiki page with useful info, organize meetings post Interop as required, etc.
 
Deleted:
<
<
- Mark T. will present progress during the Apps session in Malta

- Apps WG to support the efforst: create Twiki page with useful info, organize meetings post Interop as required, etc.

 

META FILEATTACHMENT attachment="votparquet-telecon-2024-11-05.pdf" attr="" comment="" date="1730846262" name="votparquet-telecon-2024-11-05.pdf" path="votparquet-telecon-2024-11-05.pdf" size="142840" user="MarkTaylor" version="1"
META FILEATTACHMENT attachment="Bulk_download_for_DR4-public.pdf" attr="" comment="Gaia DR4 bulk download plans" date="1730887269" name="Bulk_download_for_DR4-public.pdf" path="Bulk_download_for_DR4-public.pdf" size="925116" user="MarkTaylor" version="1"
Added:
>
>
META FILEATTACHMENT attachment="skysim10.parquet" attr="" comment="" date="1734439551" name="skysim10.parquet" path="skysim10.parquet" size="2993" user="MarkTaylor" version="1"
 

Revision 8 2024-11-06 - AdrianDamian

 
META TOPICPARENT name="IvoaApplications"

Parquet In IVOA

November 5 2024 7:00PM UTC Online meeting

22 Zoom participants

Agenda:

The purpose of the meeting is to learn about different current Parquet-related efforts currently under way and identify synergies that can be channeled into possibly new IVOA standards.

Agenda

- Around the table short description of the groups and Parquet-related projects

- Mark Taylor presentation on VOTable metadata with Parquet files

- Jos de Bruijne Parquet Compression algorithms in Parquet

- Path forward - Malta Interop

Meeting minutes:

Active groups using Parquet techonology:

  • Jeff Burke (CADC) - Parquet format for TAP
  • Mario Juric (U of Washington) - Upload and download large Ruben catalogues
  • Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
  • Jos De Bruijne (ESA) - ESA GAIA DR4 - Use Parquet format (instead of the current CSV format)
  • Pierre Le Sidaner (Observatoire de Paris) - Also GAIA mission. Appreciate data access by column
  • Gregory Dubois-Felsmann (Ruben) - Ruben very heavy use of Parquet internally and externally. Catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes
  • Trey Roby (IPAC) - Firefly read and write Parquet file. Very good file. Pushing the edge of how big the files can be.
Mark Taylor presentation on adding VOTable rich metadata in Parquet

Questions and comments on the presentation:

Grogoy D-F: VOTable metadata is important for DataLink, no objection for adding encoding, version support is a must. Valid VOTable medata must comply with the spec. Parquet is not great to work multiple datasets but VOTable should support multiple tables. What happens if there’s a conflict between VOTable metadata nd Parquet metadata.

Mark T.: Cannot have an empty data element. VOTable Resource has a flag that can be used. Main table can be of type result and other resources can have other types (medatadat).

Trey R.: How to resolve inconsistencies between Parquet and VOTable. Parquet should have the last word.

Mario J: Important to specify how to do deal with inconsistencies between VOTabe metadata and Parquet metadata

Brigitta S: Prefers Parquet support first before generalization (FITS) Example files would be useful for implementation such as astroquery - ex multiple tables with the same file.

Jos DB: Compression algs. Should start thinking about this as it's one of the core strengths of Parquet

Gregory D-F. We need to define new TAP result formats. Is there a Parquet MIME type?

A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.

Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet and then explore generalizing it for other types (FITS) and turn it into a standard.

Actions:

- Mark T. and Gregory F-B will start drafting the note

- Mark T. will make his presentation available so that others can comment on it - done

Changed:
<
<
- Mark T. wil present in the Apps session in Malta
>
>
- Mark T. will present progress during the Apps session in Malta
  - Apps WG to support the efforst: create Twiki page with useful info, organize meetings post Interop as required, etc.


META FILEATTACHMENT attachment="votparquet-telecon-2024-11-05.pdf" attr="" comment="" date="1730846262" name="votparquet-telecon-2024-11-05.pdf" path="votparquet-telecon-2024-11-05.pdf" size="142840" user="MarkTaylor" version="1"
META FILEATTACHMENT attachment="Bulk_download_for_DR4-public.pdf" attr="" comment="Gaia DR4 bulk download plans" date="1730887269" name="Bulk_download_for_DR4-public.pdf" path="Bulk_download_for_DR4-public.pdf" size="925116" user="MarkTaylor" version="1"

Revision 7 2024-11-06 - MarkTaylor

 
META TOPICPARENT name="IvoaApplications"

Parquet In IVOA

November 5 2024 7:00PM UTC Online meeting

22 Zoom participants

Agenda:

The purpose of the meeting is to learn about different current Parquet-related efforts currently under way and identify synergies that can be channeled into possibly new IVOA standards.

Agenda

- Around the table short description of the groups and Parquet-related projects

- Mark Taylor presentation on VOTable metadata with Parquet files

- Jos de Bruijne Parquet Compression algorithms in Parquet

- Path forward - Malta Interop

Meeting minutes:

Active groups using Parquet techonology:

  • Jeff Burke (CADC) - Parquet format for TAP
  • Mario Juric (U of Washington) - Upload and download large Ruben catalogues
  • Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
  • Jos De Bruijne (ESA) - ESA GAIA DR4 - Use Parquet format (instead of the current CSV format)
  • Pierre Le Sidaner (Observatoire de Paris) - Also GAIA mission. Appreciate data access by column
  • Gregory Dubois-Felsmann (Ruben) - Ruben very heavy use of Parquet internally and externally. Catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes
  • Trey Roby (IPAC) - Firefly read and write Parquet file. Very good file. Pushing the edge of how big the files can be.
Mark Taylor presentation on adding VOTable rich metadata in Parquet

Questions and comments on the presentation:

Grogoy D-F: VOTable metadata is important for DataLink, no objection for adding encoding, version support is a must. Valid VOTable medata must comply with the spec. Parquet is not great to work multiple datasets but VOTable should support multiple tables. What happens if there’s a conflict between VOTable metadata nd Parquet metadata.

Mark T.: Cannot have an empty data element. VOTable Resource has a flag that can be used. Main table can be of type result and other resources can have other types (medatadat).

Trey R.: How to resolve inconsistencies between Parquet and VOTable. Parquet should have the last word.

Mario J: Important to specify how to do deal with inconsistencies between VOTabe metadata and Parquet metadata

Brigitta S: Prefers Parquet support first before generalization (FITS) Example files would be useful for implementation such as astroquery - ex multiple tables with the same file.

Jos DB: Compression algs. Should start thinking about this as it's one of the core strengths of Parquet

Gregory D-F. We need to define new TAP result formats. Is there a Parquet MIME type?

A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.

Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet and then explore generalizing it for other types (FITS) and turn it into a standard.

Actions:

- Mark T. and Gregory F-B will start drafting the note

- Mark T. will make his presentation available so that others can comment on it - done

- Mark T. wil present in the Apps session in Malta

- Apps WG to support the efforst: create Twiki page with useful info, organize meetings post Interop as required, etc.


META FILEATTACHMENT attachment="votparquet-telecon-2024-11-05.pdf" attr="" comment="" date="1730846262" name="votparquet-telecon-2024-11-05.pdf" path="votparquet-telecon-2024-11-05.pdf" size="142840" user="MarkTaylor" version="1"
Added:
>
>
META FILEATTACHMENT attachment="Bulk_download_for_DR4-public.pdf" attr="" comment="Gaia DR4 bulk download plans" date="1730887269" name="Bulk_download_for_DR4-public.pdf" path="Bulk_download_for_DR4-public.pdf" size="925116" user="MarkTaylor" version="1"
 

Revision 6 2024-11-05 - AdrianDamian

 
META TOPICPARENT name="IvoaApplications"

Parquet In IVOA

November 5 2024 7:00PM UTC Online meeting

22 Zoom participants

Agenda:

The purpose of the meeting is to learn about different current Parquet-related efforts currently under way and identify synergies that can be channeled into possibly new IVOA standards.

Added:
>
>

Agenda

 
Deleted:
<
<
Agenda:
 - Around the table short description of the groups and Parquet-related projects

- Mark Taylor presentation on VOTable metadata with Parquet files

- Jos de Bruijne Parquet Compression algorithms in Parquet

Changed:
<
<
- Path forward - Malta Interop
>
>
- Path forward - Malta Interop
 
Changed:
<
<
Short minutes:
>
>

Meeting minutes:

  Active groups using Parquet techonology:
  • Jeff Burke (CADC) - Parquet format for TAP
  • Mario Juric (U of Washington) - Upload and download large Ruben catalogues
  • Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
  • Jos De Bruijne (ESA) - ESA GAIA DR4 - Use Parquet format (instead of the current CSV format)
  • Pierre Le Sidaner (Observatoire de Paris) - Also GAIA mission. Appreciate data access by column
  • Gregory Dubois-Felsmann (Ruben) - Ruben very heavy use of Parquet internally and externally. Catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes
Changed:
<
<
  • Trey Roby (IPAC) - Firefly read and write Parquet file. Very good file. Pushing the edge of how big the files can be.
>
>
  • Trey Roby (IPAC) - Firefly read and write Parquet file. Very good file. Pushing the edge of how big the files can be.
 Mark Taylor presentation on adding VOTable rich metadata in Parquet

Questions and comments on the presentation:

Grogoy D-F: VOTable metadata is important for DataLink, no objection for adding encoding, version support is a must. Valid VOTable medata must comply with the spec. Parquet is not great to work multiple datasets but VOTable should support multiple tables. What happens if there’s a conflict between VOTable metadata nd Parquet metadata.

Mark T.: Cannot have an empty data element. VOTable Resource has a flag that can be used. Main table can be of type result and other resources can have other types (medatadat).

Trey R.: How to resolve inconsistencies between Parquet and VOTable. Parquet should have the last word.

Changed:
<
<
Mario J: Important to specify how to do deal with inconsistencies between VOTabe metadata and Parquet metadata
>
>
Mario J: Important to specify how to do deal with inconsistencies between VOTabe metadata and Parquet metadata
  Brigitta S: Prefers Parquet support first before generalization (FITS) Example files would be useful for implementation such as astroquery - ex multiple tables with the same file.

Jos DB: Compression algs. Should start thinking about this as it's one of the core strengths of Parquet

Gregory D-F. We need to define new TAP result formats. Is there a Parquet MIME type?

A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.

Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet and then explore generalizing it for other types (FITS) and turn it into a standard.

Added:
>
>

Actions:

 
Changed:
<
<
Actions:
- Mark T. and Gregory F-B will start drafting the note
>
>
- Mark T. and Gregory F-B will start drafting the note
 
Changed:
<
<
- Mark T. will make is presentation available so that others can comment on it - done
>
>
- Mark T. will make his presentation available so that others can comment on it - done
  - Mark T. wil present in the Apps session in Malta

- Apps WG to support the efforst: create Twiki page with useful info, organize meetings post Interop as required, etc.


META FILEATTACHMENT attachment="votparquet-telecon-2024-11-05.pdf" attr="" comment="" date="1730846262" name="votparquet-telecon-2024-11-05.pdf" path="votparquet-telecon-2024-11-05.pdf" size="142840" user="MarkTaylor" version="1"

Revision 5 2024-11-05 - MarkTaylor

 
META TOPICPARENT name="IvoaApplications"

Parquet In IVOA

November 5 2024 7:00PM UTC Online meeting

22 Zoom participants

Agenda:

The purpose of the meeting is to learn about different current Parquet-related efforts currently under way and identify synergies that can be channeled into possibly new IVOA standards.

Agenda:

- Around the table short description of the groups and Parquet-related projects

Changed:
<
<
- Mark Taylor presentation on VOTable metadata with Parquet files
>
>
- Mark Taylor presentation on VOTable metadata with Parquet files
  - Jos de Bruijne Parquet Compression algorithms in Parquet

- Path forward - Malta Interop

Short minutes:

Active groups using Parquet techonology:

  • Jeff Burke (CADC) - Parquet format for TAP
  • Mario Juric (U of Washington) - Upload and download large Ruben catalogues
  • Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
  • Jos De Bruijne (ESA) - ESA GAIA DR4 - Use Parquet format (instead of the current CSV format)
  • Pierre Le Sidaner (Observatoire de Paris) - Also GAIA mission. Appreciate data access by column
  • Gregory Dubois-Felsmann (Ruben) - Ruben very heavy use of Parquet internally and externally. Catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes
  • Trey Roby (IPAC) - Firefly read and write Parquet file. Very good file. Pushing the edge of how big the files can be.
Mark Taylor presentation on adding VOTable rich metadata in Parquet

Questions and comments on the presentation:

Grogoy D-F: VOTable metadata is important for DataLink, no objection for adding encoding, version support is a must. Valid VOTable medata must comply with the spec. Parquet is not great to work multiple datasets but VOTable should support multiple tables. What happens if there’s a conflict between VOTable metadata nd Parquet metadata.

Mark T.: Cannot have an empty data element. VOTable Resource has a flag that can be used. Main table can be of type result and other resources can have other types (medatadat).

Trey R.: How to resolve inconsistencies between Parquet and VOTable. Parquet should have the last word.

Mario J: Important to specify how to do deal with inconsistencies between VOTabe metadata and Parquet metadata

Brigitta S: Prefers Parquet support first before generalization (FITS) Example files would be useful for implementation such as astroquery - ex multiple tables with the same file.

Jos DB: Compression algs. Should start thinking about this as it's one of the core strengths of Parquet

Gregory D-F. We need to define new TAP result formats. Is there a Parquet MIME type?

A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.

Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet and then explore generalizing it for other types (FITS) and turn it into a standard.

Actions:
- Mark T. and Gregory F-B will start drafting the note

Changed:
<
<
- Mark T. will make is presentation available so that others can comment on it
>
>
- Mark T. will make is presentation available so that others can comment on it - done
  - Mark T. wil present in the Apps session in Malta

- Apps WG to support the efforst: create Twiki page with useful info, organize meetings post Interop as required, etc.

Added:
>
>
META FILEATTACHMENT attachment="votparquet-telecon-2024-11-05.pdf" attr="" comment="" date="1730846262" name="votparquet-telecon-2024-11-05.pdf" path="votparquet-telecon-2024-11-05.pdf" size="142840" user="MarkTaylor" version="1"

Revision 4 2024-11-05 - AdrianDamian

 
META TOPICPARENT name="IvoaApplications"
Changed:
<
<

Parquet In Astronomy

>
>

Parquet In IVOA

 

November 5 2024 7:00PM UTC Online meeting

Added:
>
>
22 Zoom participants
 

Agenda:

The purpose of the meeting is to learn about different current Parquet-related efforts currently under way and identify synergies that can be channeled into possibly new IVOA standards.

Changed:
<
<
A possible agenda:
>
>
Agenda:
  - Around the table short description of the groups and Parquet-related projects
Changed:
<
<
- Challenges and wish lists. Questions to the community
>
>
- Mark Taylor presentation on VOTable metadata with Parquet files
 
Changed:
<
<
- Path forward - Malta Interop and beyond
>
>
- Jos de Bruijne Parquet Compression algorithms in Parquet
 
Changed:
<
<
- <your specific agenda item - please send it to us>
>
>
- Path forward - Malta Interop
 
Added:
>
>
Short minutes:

Active groups using Parquet techonology:

  • Jeff Burke (CADC) - Parquet format for TAP
  • Mario Juric (U of Washington) - Upload and download large Ruben catalogues
  • Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
  • Jos De Bruijne (ESA) - ESA GAIA DR4 - Use Parquet format (instead of the current CSV format)
  • Pierre Le Sidaner (Observatoire de Paris) - Also GAIA mission. Appreciate data access by column
  • Gregory Dubois-Felsmann (Ruben) - Ruben very heavy use of Parquet internally and externally. Catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes
  • Trey Roby (IPAC) - Firefly read and write Parquet file. Very good file. Pushing the edge of how big the files can be.
Mark Taylor presentation on adding VOTable rich metadata in Parquet

Questions and comments on the presentation:

Grogoy D-F: VOTable metadata is important for DataLink, no objection for adding encoding, version support is a must. Valid VOTable medata must comply with the spec. Parquet is not great to work multiple datasets but VOTable should support multiple tables. What happens if there’s a conflict between VOTable metadata nd Parquet metadata.

Mark T.: Cannot have an empty data element. VOTable Resource has a flag that can be used. Main table can be of type result and other resources can have other types (medatadat).

Trey R.: How to resolve inconsistencies between Parquet and VOTable. Parquet should have the last word.

Mario J: Important to specify how to do deal with inconsistencies between VOTabe metadata and Parquet metadata

Brigitta S: Prefers Parquet support first before generalization (FITS) Example files would be useful for implementation such as astroquery - ex multiple tables with the same file.

Jos DB: Compression algs. Should start thinking about this as it's one of the core strengths of Parquet

Gregory D-F. We need to define new TAP result formats. Is there a Parquet MIME type?

A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.

Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet and then explore generalizing it for other types (FITS) and turn it into a standard.

Actions:
- Mark T. and Gregory F-B will start drafting the note

- Mark T. will make is presentation available so that others can comment on it

- Mark T. wil present in the Apps session in Malta

- Apps WG to support the efforst: create Twiki page with useful info, organize meetings post Interop as required, etc.

 
Deleted:
<
<

Revision 3 2024-10-25 - AdrianDamian

 
META TOPICPARENT name="IvoaApplications"

Parquet In Astronomy

November 5 2024 7:00PM UTC Online meeting

Agenda:

Changed:
<
<
The purpose of the meeting is to learn about different current Parquet-related efforts currently under way and identify synergies/possible IVOA standards.
>
>
The purpose of the meeting is to learn about different current Parquet-related efforts currently under way and identify synergies that can be channeled into possibly new IVOA standards.
  A possible agenda:
Changed:
<
<
- Around the table short presentation of the groups and projects
>
>
- Around the table short description of the groups and Parquet-related projects
  - Challenges and wish lists. Questions to the community
Changed:
<
<
- Path forward
>
>
- Path forward - Malta Interop and beyond
 
Changed:
<
<
- <your agenda item>
>
>
- <your specific agenda item - please send it to us>
 
Added:
>
>

Revision 2 2024-10-24 - AdrianDamian

 
META TOPICPARENT name="IvoaApplications"

Parquet In Astronomy

Changed:
<
<

November 5 2024 Online meeting

>
>

November 5 2024 7:00PM UTC Online meeting

 

Agenda:

Changed:
<
<
- Around the table intro and short intro on the Parquet application and current issues
>
>
The purpose of the meeting is to learn about different current Parquet-related efforts currently under way and identify synergies/possible IVOA standards.
 
Changed:
<
<
- Issues discussions
>
>
A possible agenda:
 
Changed:
<
<
- Synergies
>
>
- Around the table short presentation of the groups and projects
 
Changed:
<
<
- Plan for going forward
>
>
- Challenges and wish lists. Questions to the community
 
Changed:
<
<

>
>
- Path forward
Added:
>
>
- <your agenda item>
 

Revision 1 2024-10-22 - AdrianDamian

 
META TOPICPARENT name="IvoaApplications"

Parquet In Astronomy

November 5 2024 Online meeting

Agenda:

- Around the table intro and short intro on the Parquet application and current issues

- Issues discussions

- Synergies

- Plan for going forward


 