Joshua Fraustro - Generation of Code from OpenAPI

Presentation Summary

Goal
Evaluate various Python server generators for OpenAPI specifications and determine which is best suited for developers who need to quickly implement a protocol such as those defined by the IVOA.

Context
• Motivation: Generating server code from OpenAPI specs simplifies development.
• Why Python? Widely known, well liked, and used across many projects.
• Why focus on servers (not clients)? Client generators are plentiful; server generators are few or inadequate. Generated server code is also adopted reluctantly, both for software-security reasons and because teams are wary of relying on outside developers to support critical software.
• Common issue: These tools often generate broken or poorly documented code and do not follow best practices.

Evaluation Criteria
• Ease of use (e.g. Docker support, setup)
• Code quality (typing, logic separation, tests)
• Specification compatibility (OpenAPI 3.1)
• Documentation availability
• Project health (GitHub commits, open issues)

Tested Generators (Summary)
1. openapi-generators/Python-AIOHTTP
   ▪ Easy to use
   ◦ – Broken code, syntax errors
   ◦ – No real OpenAPI 3.1 support
2. openapi-generators/Python-BluePlanet
   ▪ Good structure, Dockerfile, README
   ◦ – Strangely similar to the previous one (same engine?)
3. openapi-generators/python-flask
   • All three use the same underlying generator engine.
   • All share the same fundamental limitations.
   • The only real difference is the target framework.
4. openapi-generators/Python-FastAPI
   ▪ Uses Pydantic, enforces parameter constraints
   ◦ – Lacks documentation
   ◦ Best among the tested tools
5. FastAPI-Code-Generator (independent project)
   ▪ Supports custom code templates
   ◦ – Many bugs and open GitHub issues
   ◦ – Only handles query/path parameters, not complex payloads
6. swagger-codegen/Python-Flask (Swagger/SmartBear)
   ◦ – Same underlying template engine as the previous ones, same bugs
   ◦ – Server generation is now behind a paywall

Conclusion
• FastAPI is the strongest current option, but no solution is ideal.
• Python server generators for OpenAPI are immature, often poorly documented, and buggy.
• Code generation is a nice-to-have, but often not worth the effort.
• For now, FastAPI with manual adjustments is the most reliable choice.

Final Questions
1. Could we reuse generated code/documentation to contribute back to these generators?
   Yes, it is technically feasible. It was not the main focus of the presentation (which assumed the perspective of a lone developer), but as a team, contributing improvements to these generators is definitely possible.
2. Are there any packages that generate routes dynamically at runtime instead of generating code ahead of time?
   Some libraries (e.g. Django REST Framework) offer similar patterns, but most OpenAPI generators do not produce actual server logic – just routing scaffolds. Dynamic solutions could be worth exploring in future work.

Mark Taylor - IVOA Authentication

Presentation Summary

Goal
Solve the problem: "How does a VO client know where and how to authenticate with a VO service?"

Problem Breakdown
• A client may start with:
  ◦ A known TAP service URL, or
  ◦ A direct URL to a resource (e.g. a VOTable) that might be protected.
• The client must discover:
  ◦ Whether authentication is required, and
  ◦ How to authenticate.

Challenges in Practice
• Web clients often have out-of-band knowledge (e.g. preconfigured URLs).
• Headless clients (e.g. Python scripts) often don't – they have just a URL.
• Existing standards don't handle dynamic discovery of auth mechanisms well.

Proposed Solution
• Formalized in a working draft called IAP – Interoperable Authentication Protocol (on GitHub).
• Not a new auth system – it fills the gap between a VO client and standard auth mechanisms.
• VO services aren't required to support IAP – it's opt-in and non-breaking.

How It Works
• Relies on the HTTP standard: WWW-Authenticate headers (aka challenges).
• A client probes a resource (e.g. via HEAD or GET) and receives:
  ◦ 200 OK: resource is public
  ◦ 401 Unauthorized: authentication is required, challenges included
  ◦ Proposal: even 200 OK responses can include optional challenges (e.g. for services with guest/privileged modes)

Supported Authentication Schemes
• Standard:
  ◦ Basic – username/password via HTTP
• Proposed by IAP:
  ◦ Cookie – client is directed to a login form and receives a cookie
  ◦ X.509 – certificate-based
• To be added:
  ◦ Bearer tokens (e.g. OAuth2) – ongoing discussion

Current Implementations
• Servers:
  ◦ ESAC: uses the cookie scheme (e.g. Gaia archive)
  ◦ CADC: supports X.509
  ◦ DaCHS: uses Basic auth
• Clients:
  ◦ Java library auth (used in TOPCAT and CASSIS)

Ongoing Work
• Extend to support OAuth2 + OpenID Connect:
  ◦ Requires domain scoping to avoid token misuse
  ◦ Must work in headless contexts (e.g. cloud scripts)
• Investigate naming:
  ◦ Alternatives like VOAuth or AuthIO
  ◦ Current name: IAP (started as a typo but stuck)
• Standards implications:
  ◦ HEAD requests should be supported in VO services
  ◦ May deprecate outdated elements like securityMethod in SSO

Final Audience Questions
Q1: Is it standard HTTP behavior to return 200 OK with WWW-Authenticate challenges?
Answer: It's unusual but completely legal. The RFC explicitly allows it. Clients not looking for the challenge can safely ignore it.
Q2: Can we send both Basic and Cookie challenges together in one response, and let the client choose?
Answer: Yes, multiple WWW-Authenticate headers can be included. The client can pick the scheme it supports (e.g. browsers default to Basic). This approach is recommended for interoperability.

Adrian Damian - Towards Federation of CADC AA&I

Current Context (A&A, CADC)
• The CADC has been using its own Access Control system for many years, based on Role-Based Access Control.
• It manages 9,000–10,000 user accounts.
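Returning briefly to the probe-and-challenge flow from Mark Taylor's IAP summary above, it can be sketched with the Python standard library. The `ivoa_cookie`/`ivoa_x509` scheme names and the helper names below are illustrative, not the draft's exact identifiers.

```python
# Sketch of IAP-style discovery: probe a URL, collect any WWW-Authenticate
# challenges, and pick a mutually supported scheme. Scheme names such as
# "ivoa_cookie" are illustrative, not the draft's exact identifiers.
import urllib.error
import urllib.request

def probe(url):
    """HEAD the resource; return (status, list of WWW-Authenticate values)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.headers.get_all("WWW-Authenticate") or []
    except urllib.error.HTTPError as err:
        # A 401 response still carries the challenges the client needs.
        return err.code, err.headers.get_all("WWW-Authenticate") or []

def challenge_schemes(header_values):
    """Extract the scheme token that leads each challenge value."""
    return [value.split(None, 1)[0] for value in header_values if value.strip()]

def choose_scheme(status, header_values, supported=("Basic", "ivoa_cookie")):
    if status == 200 and not header_values:
        return None  # public resource, nothing to do
    for scheme in challenge_schemes(header_values):
        if scheme in supported:
            return scheme
    raise RuntimeError("no mutually supported authentication scheme")
```

A 200 OK with optional challenges (the guest/privileged case) falls out naturally: the header list is non-empty, so a scheme can still be selected.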
• Authentication methods:
  ◦ Username/password
  ◦ Cookies
  ◦ X.509 certificates
  ◦ A basic OpenID endpoint (used by services such as image repositories)
• Supports both web and command-line applications.
• SSO (Single Sign-On) works across three organizational domains.

Authorization
• Primarily based on group membership using a Group Membership Service (GMS).
• Supports nested groups (groups of groups).
• Works across distributed cloud infrastructures.
• Requires local POSIX identities for users on some platforms.
• Previously supported X.509 proxy certificates, now deprecated → delegation is now limited.

Shift to Federated Identity
• Need to support identities across multiple enterprises/domains (e.g. SKA, Rubin).
• Moving toward OpenID Connect (OIDC) technologies for modern federated identity.
• Goal: maximum flexibility in access policies, despite the lack of standardization.

Evaluated Solutions
• CILogon: academic federation (used by Rubin).
• Indigo IAM: proxy for identity providers (used by SKA).
• Keycloak: open-source, widely adopted system.
None fully met the requirements → decision to enhance the internal CADC system instead.

Paradigm Shift Considerations
• Moving from X.509 to OIDC represents a significant shift:
  ◦ X.509 = passport: long-lived, secure, government-backed, awkward but powerful.
  ◦ OIDC = ski pass: short-lived, domain-specific, easy to use but less secure by default.

Implementation Guidelines
1. Accept only signed JWTs that clearly indicate the issuer.
2. Only trusted IdPs (e.g. SKA, Rubin) will be allowed.
3. Automatic CADC account provisioning when a local POSIX identity is needed.
4. UI support for logging in via external IdPs (already done in other IdP proxies).
5. Existing CADC users may link their external identities.
6. Plan for a credential exchange service to map tokens/certificates across domains.
7. Support for refresh tokens for long-running jobs (as access tokens are short-lived).
8. Continued use of internal X.509 certificates for internal workflows.
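Guideline 1 above (accept only signed JWTs that clearly indicate the issuer) can be sketched with the standard library. A real deployment would verify RS256 signatures against the IdP's published JWKS, typically via a library such as PyJWT; the HS256 shared-secret variant and the issuer URLs below are illustrative only.

```python
# Sketch of guideline 1: accept only signed JWTs from trusted issuers.
# HS256 (shared secret) keeps the example stdlib-only; production code
# would check an RS256 signature against the IdP's published keys.
import base64
import hashlib
import hmac
import json

TRUSTED_ISSUERS = {"https://ska.example/idp", "https://rubin.example/idp"}  # illustrative

def _b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def sign_token(claims: dict, secret: bytes) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signing_input = header + b"." + payload
    sig = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()

def validate_token(token: str, secret: bytes) -> dict:
    header_part, payload_part, sig_part = token.split(".")
    signing_input = (header_part + "." + payload_part).encode()
    expected = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(expected.decode(), sig_part):
        raise ValueError("bad signature")  # unsigned/tampered JWTs rejected
    claims = json.loads(_b64url_decode(payload_part))
    if claims.get("iss") not in TRUSTED_ISSUERS:
        raise ValueError("untrusted issuer")  # only trusted IdPs allowed
    return claims
```

The two `ValueError` branches correspond directly to guidelines 1 and 2: reject anything unsigned or tampered with, then reject any issuer outside the allow-list.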
Conclusion
• Federated identity is essential for upcoming collaborations and systems.
• As the standards are immature, temporary solutions may be needed.
• User experience remains a central priority.
• Invitation to collaborate with others working on similar identity-federation challenges.

Brian Major – Firefly on CANFAR

Success Story: Firefly on CADC
• Firefly has been successfully deployed as an interactive web container within CADC CANFAR.
• Available to:
  ◦ All CADC CANFAR users
  ◦ SKA SRCNet nodes
  ◦ Potentially in Element Notebooks
• Thanks to contributions from several collaborators (Shiny, Steven, Gregory, Lloyd).

Tool Overview
• Firefly: visualization tool from Caltech; powerful for discovery via TAP, registries, etc.
• Integrated into science platforms.
• Also shown working in test environments (e.g. Steven's examples with images).

Interoperability Considerations

1. Authentication Token Forwarding
• Challenge: how to authorize access to protected services (e.g. TAP) from within Firefly.
• Solution: use the Firefly Token Relay plugin:
  ◦ Forwards session cookies from CANFAR to services within the same domain.
  ◦ Works because token = cookie in their environment.
• Limitations:
  ◦ Only works within the same domain.
  ◦ Raises broader questions:
    ▪ Which credentials should be forwarded?
    ▪ Should web tools handle authentication the way TOPCAT does, with WWW-Authenticate challenges?
• Open question: how to support authentication to multiple organizations in tools like Firefly that expect AAI to be handled at a higher layer?

2. Interactive Container Standardization
Focus: web-based interactive containers, not just general compute containers.
• Firefly is one of several interactive container types (others: JupyterLab, Spark, Carta).
• CADC uses a "contributed container type" to generalize container behavior.
• Key considerations:
  ◦ Networking/port: Firefly defaults to port 8080; this is arbitrary and should ideally be image-defined.
  ◦ Path: tools should be able to self-discover the path they run under, rather than requiring predefined paths.
  ◦ Startup behavior: use container entry points instead of custom slug runners.
• Recommendations:
  ◦ Let tools define their own port.
  ◦ Avoid hardcoding paths like /root.
  ◦ Use container-native mechanisms for startup.
• Standardization would improve inter-platform portability, even if current local solutions work well.

3. Cross-Tool & Cross-Platform Interaction
• Firefly has an API, usable from other tools like Jupyter notebooks (as shown at the ADASS demo in Malta).
• Idea: build a "platform interface" that allows:
  ◦ Discovering other running sessions (e.g. Firefly, JupyterLab, Carta)
  ◦ Accessing their APIs dynamically
• Goal: enable distributed data processing by coordinating existing live tools, not launching new sessions.
• The pattern is already in use (e.g. Fabric's session discovery API). Proposal: extend it to enable runtime cross-platform coordination.

Audience Q&A Highlights
Q1: Coordination with SKA on software discoverability?
Answer: Yes – Brian confirms they are discussing this actively and see value in aligning standardization efforts with IVOA initiatives.
Q2: Could Firefly use the IAP (Interoperable Authentication Protocol) like TOPCAT?
Answer: Yes – while Firefly is web-based, IAP could apply as long as tokens (e.g. cookies) are available in the browser context. One issue: Firefly might not know it has credentials – but browser-supplied cookies may solve this transparently.

Marcos Lopez-Caniego – ESA DataLabs

What Is ESA Datalabs?
• A science platform co-located with ESA archives to provide compute and collaborative tools alongside the data.
• Goal: shift from "search, download, process, publish" to "search, process, publish" – especially important for petabyte-scale missions like Euclid.
• Encourages users to analyze data where it resides, rather than downloading it.

Platform Components
• Mission-specific containers (mostly Jupyter + Python) preloaded with tools.
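Looping back to the interactive-container recommendations above (tool-defined port, self-discovered base path), here is a minimal sketch of how a containerized tool might pick these up at startup. The environment variable names are hypothetical, not an actual CANFAR or Firefly convention.

```python
# Sketch: a containerized web tool discovering its own port and base path
# from the environment at startup, instead of hardcoding 8080 or /root.
# SESSION_PORT and SESSION_BASE_PATH are hypothetical variable names.
import os

def session_config(env=os.environ):
    port = int(env.get("SESSION_PORT", "8080"))    # image default, overridable
    base_path = env.get("SESSION_BASE_PATH", "/")  # platform injects real path
    if not base_path.startswith("/"):
        base_path = "/" + base_path
    return port, base_path

if __name__ == "__main__":
    port, base_path = session_config()
    print(f"serving under {base_path} on port {port}")
```

Launched via a container-native entry point, the platform only has to inject two environment variables rather than patch paths or ports per tool.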
• A Container Editor (currently being upgraded) allows users to:
  ◦ Build custom Jupyter/Bash images.
  ◦ Add metadata, set sharing permissions, and choose licenses.
• Standalone apps (e.g. DS9, TOPCAT) complement the container environment.
• Access to extensive mission datasets, including:
  ◦ JWST, HST, Solar System, exoplanets
  ◦ Euclid (e.g. Data Release Q1)
• Shared workspaces for collaborative research across missions like Euclid and JWST.

Use Cases
• Euclid Key Release 1 analyses (e.g. galaxy strong lensing) were performed within the platform.
• Demonstrated example: a container for Euclid Q1:
  ◦ Lean, fast-loading Kubernetes containers
  ◦ Built-in notebooks for data overview, usage instructions, and astroquery integration
  ◦ Users access data via astroquery, not by navigating file systems manually
  ◦ Encourages downloading outputs only (not raw data)

Data Science Integration
• LLM-based idea incubation: users propose data science projects.
  ◦ Promising ideas are prototyped within Datalabs.
  ◦ Some enter production via platforms like SASKA or other backend services.

Technical Features
• Conda environments in containers to manage dependencies.
• Access to raw data in read-only volumes, even for restricted datasets (with credentials).
• Encourages reuse of standard notebooks, copied into user workspaces for modification.
• The Astroquery.sara module helps locate file paths and metadata programmatically.

Performance & Scalability
• Users have performed massive queries (e.g. 1.5M cutouts/day) without stressing the archive servers.
• No major performance issues observed so far.

Challenges & Future Developments
• Need for improved integration with Euclid sensor heads and the upcoming GP (General Purpose) Data Space Platform.
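The astroquery-based access pattern described under Technical Features can be illustrated with a generic TAP cone search. The table and column names are placeholders, not the actual Euclid Q1 schema, and the TAP URL in the commented-out usage is hypothetical.

```python
# Sketch of programmatic data access in the astroquery style: build an
# ADQL cone-search query instead of browsing the file system manually.
# Table and column names are placeholders, not the real Euclid schema.

def cone_search_adql(table, ra_deg, dec_deg, radius_deg, columns="*"):
    """Return an ADQL cone-search query around (ra_deg, dec_deg)."""
    return (
        f"SELECT {columns} FROM {table} "
        f"WHERE 1=CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )

query = cone_search_adql("euclid.q1_mer_catalogue", 150.1, 2.2, 0.05)

# With a TAP endpoint available, astroquery's generic TAP client could run it:
# from astroquery.utils.tap.core import TapPlus
# tap = TapPlus(url="https://example.esa.int/tap")  # hypothetical URL
# job = tap.launch_job(query)
# results = job.get_results()
```

Because the query runs server-side, only the matched rows come back to the user workspace, which is what lets workloads like 1.5M cutouts/day avoid stressing the archive.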
• Plans to combine:
  ◦ VO-compliant APIs
  ◦ Astroquery path resolution
  ◦ SARA Labs collaboration tools
• Pushing forward with federation between science platforms and archives, and potentially:
  ◦ A shared registry of containers/software
  ◦ Fine-grained access controls (e.g. within mixed public/private volumes)
  ◦ New cross-matching/catalog tools

Not a Replacement, but a Complement
• ESA Datalabs is not a replacement for the archives – it extends them with compute and collaboration.
• Offers "laptop-level resources" in the cloud with direct access to mission data and tools.

Audience Q&A Highlights
Q1: Can users modify archive data, or do they copy it into other areas?
Answer: Most archive data is read-only. Users access data directly via terminal/scripts or astroquery. Proprietary and public data are kept in separate volumes to manage access permissions.
Q2: Has analysis workload ever impacted the archive infrastructure?
Answer: Not significantly. Even heavy use (e.g. 1.5M cutouts/day) has not caused issues. The system is performant and resilient.

Powered by: otter.AI and ChatGPT