Science Platform: Towards Data Science
ADASS 2019 - Groningen - 9-10-2019
Room 1-2
Participants: ~50
Discussion Topics
• What is your science platform?
• How can our platforms work together or interoperate?
• How are we approaching HPC or cloud resources (public or private)?
• The role of IVOA standards.
• Let's make our platforms sustainable.
Participant Presentations
- JJ Kavelaars (CADC) Slides
- Gerard Lemson (JHU - IDIES) Slides
- Steven Crawford (STScI) Slides
- Andre Schaaff (CDS) Slides
- Christine Banek (LSST)
- Petr Skoda (VO-CLOUD) Slides
- Zheng Mayer (ASTRON, ESCAPE WP5) Slides
- Sonia Zorba (IA2) Slides
- Dave Morris + Marco Molinaro (Virtual Observatory) Slides
ADASS 2019 BoF Summary Session Slides
Discussion
CADC introduces the CANFAR platform and the ARCADE science platform built on top of it. ARCADE is a desktop-oriented science platform for data reduction, designed for ALMA users.
The CADC platform is "entirely" based on IVOA standards. It can assign DOIs to VOSpace nodes.
Q: How can we implement efficient data transfer? Can we introduce an offline data transfer service?
SciServer is a user-oriented platform that implements resource sharing, agnostic storage to host and share datasets, and query and analysis tools. It can be used as a collaborative workspace. It can be installed with K8s and Helm. It can be used to explore CPU and GPU computing, or to access Dask or SQL Server 2019 Big Data Studio.
Q: How can I make my SP interoperate? In practice, how can a user run a container based on my image on another science platform?
Q: Sustainability problem: I have a finite resource pool. Can I make my SP deployable on commercial clouds so that users pay for it? Is this a functional approach to sustainability?
STScI presents an open-source-oriented science platform that makes use of GitHub and the AWS cloud. It uses K8s, Helm and Docker, and provides JupyterHub and JupyterLab to users.
Q: Security: How do we secure the environment from external and internal sources? How do we control exclusive access to data in the environment?
Q: Collaboration: How do we provide an environment that enables collaboration while remaining secure?
Q: Observability: What are the important metrics to monitor? How do we gain real-time insight into the system?
Q: Cost: How do we control costs for different users and different use cases? How do we provide access for different users?
CDS science platform concepts: a science platform could provide access to SIMBAD and VizieR through APIs, visualisation through Aladin Lite (ipyaladin), Python tools for HiPS and MOC, and computational facilities to cross-match (X-Match) catalogues.
Comment: it must be well framed and sized, complementary to the existing services, and developed with scientists for scientists => scientific sustainability.
Q: storage space is necessary and must be allocated
Q: Can we accept external containers?
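The catalogue cross-match facility mentioned in the CDS concepts can be illustrated with a minimal sketch. It assumes two small in-memory catalogues of (RA, Dec) pairs in degrees and a match radius in arcseconds; the function names are hypothetical, and the brute-force O(N×M) loop is for illustration only, not a description of how the CDS X-Match service works.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions, all in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # min() guards against rounding pushing h slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def crossmatch(cat_a, cat_b, radius_arcsec):
    """For each source in cat_a, keep its nearest cat_b source within radius_arcsec.

    Returns a list of (index_a, index_b, separation_arcsec) tuples.
    """
    matches = []
    for i, (ra_a, dec_a) in enumerate(cat_a):
        best = None
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b) * 3600.0
            if sep <= radius_arcsec and (best is None or sep < best[2]):
                best = (i, j, sep)
        if best is not None:
            matches.append(best)
    return matches
```

For example, `crossmatch([(10.0, 20.0)], [(10.0003, 20.0)], radius_arcsec=2.0)` pairs the two sources at a separation of about 1.01 arcsec. A production service would use a spatial index (for example HEALPix cells) instead of comparing every pair.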
The LSST science platform provides access to LSST data and processing. There are three interactive ways of exploring the data: a portal (discovery, structured workflows between datasets, queries), a notebook environment (IPython, contemporary hub technologies), and public APIs (VO and REST).
VO-CLOUD is an example of a science platform that implements batch processing based on the IVOA UWS standard; it focuses on interactive machine learning ("active learning").
The European-funded ESCAPE project is developing a science platform on top of the European Open Science Cloud (EOSC). It will include VO data access standards, AARC-compliant authentication and authorisation, and containers from the ESCAPE software marketplace and the EOSC marketplace.
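On the VO side, a public API of this kind is commonly exposed as an IVOA Table Access Protocol (TAP) service. A minimal sketch of building a synchronous ADQL query URL with the TAP 1.0 `REQUEST=doQuery` and `LANG=ADQL` parameters; the endpoint `https://example.org/tap` and the table `demo.objects` are hypothetical and do not come from the LSST platform.

```python
from typing import Optional
from urllib.parse import urlencode


def tap_sync_url(service_url: str, adql: str, maxrec: Optional[int] = None) -> str:
    """Build a GET URL for a synchronous TAP query (parameter names as in TAP 1.0)."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    if maxrec is not None:
        # MAXREC asks the service to cap the number of returned rows.
        params["MAXREC"] = str(maxrec)
    return service_url.rstrip("/") + "/sync?" + urlencode(params)


# Hypothetical service and table, for illustration only.
url = tap_sync_url("https://example.org/tap", "SELECT TOP 10 ra, dec FROM demo.objects", maxrec=10)
```

The resulting URL can be fetched with any HTTP client; client libraries such as pyvo wrap the same protocol.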
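UWS describes each batch job as a resource that moves through a fixed set of phases. A minimal sketch of that life cycle as a state machine, assuming a simplified subset of the UWS phases (HELD, SUSPENDED, ARCHIVED and UNKNOWN are left out); the class name `UwsJob` is hypothetical and this is not VO-CLOUD's implementation.

```python
# Simplified subset of the IVOA UWS job phases; HELD, SUSPENDED,
# ARCHIVED and UNKNOWN are deliberately left out of this sketch.
ALLOWED_TRANSITIONS = {
    "PENDING": {"QUEUED", "ABORTED"},
    "QUEUED": {"EXECUTING", "ABORTED"},
    "EXECUTING": {"COMPLETED", "ERROR", "ABORTED"},
    "COMPLETED": set(),
    "ERROR": set(),
    "ABORTED": set(),
}


class UwsJob:
    """Tracks the phase of one batch job; a new job starts in PENDING."""

    def __init__(self, job_id: str):
        self.job_id = job_id
        self.phase = "PENDING"

    def move_to(self, new_phase: str) -> None:
        # Reject any transition not listed in ALLOWED_TRANSITIONS.
        if new_phase not in ALLOWED_TRANSITIONS[self.phase]:
            raise ValueError(f"illegal transition {self.phase} -> {new_phase}")
        self.phase = new_phase

    @property
    def finished(self) -> bool:
        # Terminal phases have no outgoing transitions.
        return not ALLOWED_TRANSITIONS[self.phase]
```

In UWS the client starts a job by POSTing `PHASE=RUN` to the job's phase resource, which corresponds to `move_to("QUEUED")` here; the server then drives the remaining transitions while the client polls.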
Q: What can the IVOA provide to simplify the design and implementation?
The Virtual Observatory point of view (not the IVOA's) identifies some important items for building real interoperability, in particular what is missing and what is already available:
- data discovery and access, open VO standards
- interoperable AAI solutions (IVOA GWS working on it ... )
- metadata to characterize network proximity
- IVOA metadata EXIST for describing data; we NEED them for code
- there probably already are some standards for this
- we should find/evaluate and adopt them rather than invent new ones
Sonia Zorba (IA2) discussed the use of OAuth tokens for authentication and introduced the problems of using tokens outside the browser and web environment. We must also support the CLI and terminals, which is crucial for astronomers.
Q: Should we build complex clients or use a separate AA mechanism for the CLI?
Q: What about using certificates? PKI is still widely supported and secure.
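One simple pattern for the command-line case is sketched below, assuming the user obtains a token through the platform's web login and exports it as an environment variable. The variable name `SP_TOKEN` and the URLs are hypothetical, and the sketch does not settle the open questions above about how tokens are issued or refreshed.

```python
import os
import urllib.request


def authorized_request(url: str, token_env: str = "SP_TOKEN") -> urllib.request.Request:
    """Build (but do not send) a request carrying an OAuth 2.0 bearer token.

    The token is read from a hypothetical environment variable that the user
    filled in after logging in through the platform's web page.
    """
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"no token found in ${token_env}")
    # RFC 6750 bearer-token header, the same one a browser-based client would send.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

Passing the result to `urllib.request.urlopen` sends the request. The design choice is that the CLI never handles passwords or browser redirects; it only forwards a token obtained elsewhere.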
Conclusions
We identify three different types of science platform:
- Application-oriented: platforms that provide access to tools and visualisation.
- Project-oriented: access to data, software, pipelines and containers for a specific project.
- User-oriented: users can also provide their own software and data.
We identify some common concepts:
- Interoperability among science platforms aims at enabling direct re-use of data/service/code/workflow resources among standalone platforms
- "ecosystem is better than monoculture", so we should build interoperability from the ecosystem idea
- Interoperability means:
- interoperable data resources
- interoperable code resources
- moving them around in a realistic and reasonable way
- The use of FAIR principles
Actions:
- Common Resource access control?
- Run my platform on your k8s cluster?
We identify research areas for the future:
- Interoperable Authentication and Authorisation
- Data access and proximity standards: how far away is my data? Should I move the data or the software?
- Software registry: how can I find containers?
We need to identify standard procedures that allow users to move from one platform to another, or to use two platforms at the same time.
The IVOA has a central role, but we do not want to "reinvent the wheel", so let's also look at what large commercial data providers are doing.
Continue the discussion within the IVOA framework, and also at ADASS in a couple of years.
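The "move data or software" question can be made concrete with a toy cost model. This is a sketch under strong assumptions: the function and its parameters are hypothetical, a single link bandwidth is assumed for both transfers, and storage, compute cost and access policies are ignored. Network-proximity metadata of the kind listed earlier could supply the inputs.

```python
def what_to_move(data_bytes: float, image_bytes: float,
                 bandwidth_bytes_per_s: float, remote_startup_s: float = 0.0) -> str:
    """Toy proximity estimate: ship whichever of data or software arrives sooner.

    Assumes one bandwidth for both transfers; remote_startup_s models the
    extra time needed to start the container on the remote platform.
    """
    data_transfer_s = data_bytes / bandwidth_bytes_per_s
    software_transfer_s = image_bytes / bandwidth_bytes_per_s + remote_startup_s
    return "software" if software_transfer_s < data_transfer_s else "data"
```

At 1 Gbit/s (125 MB/s), moving 10 TB of data takes 80,000 s (about 22 hours) while a 2 GB container image takes 16 s, so `what_to_move(10e12, 2e9, 125e6)` returns `"software"`.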