ZHANG, Zhen - Experiences and lessons learned from EP scientific workflow

Mission Context and Team Roles
The China-VO team developed a complex scientific workflow to process data from EP, a space science mission performing high-sensitivity observations of high-energy transient events. The data originates from the satellite's instruments and reaches the ground through its telemetry and transmission systems.

Key collaborators:
• NAOC (National Astronomical Observatories): WXT data analysis & archiving
• IHEP (Institute of High Energy Physics): FXT data processing tools
• NSSC (National Space Science Center): Data conversion and transmission

Each institution contributed distinct software modules, which the China-VO team integrated into a single workflow.

Workflow Design and Function
• Processes data from Level 0 to Level 3.
• Handles raw telemetry, event correlation, auxiliary data matching, and product generation.
• Supports parallel execution, backtracking, and dynamic updates during execution.
• Capable of producing high-throughput scientific products (100+ TB generated to date).

Key Challenges and Solutions

1. Integrability
• Problem: Integration across organizations led to frequent dependency bugs.
• Solution:
◦ Use containerization (Docker) as the primary integration method.
◦ Apply virtualization and Kubernetes + Argo for deployment and scheduling.
◦ Outcome: Independent modules, smoother integration, reduced conflicts.

2. Schedulability & DAG Limitations
• Problem: Standard Directed Acyclic Graphs (DAGs) could not model the dynamic parts of the workflow:
◦ Backtracking required when auxiliary data was insufficient.
◦ Indefinite parallelism when the number of outputs was unknown.
◦ Priority execution needed for time-sensitive modules.
• Solution:
◦ Implement a control/execution separation architecture.
◦ Use a message bus to decouple communication and scheduling.
◦ Introduce pre-/post-processing "aspects" for modular control.
◦ A sketch of this pattern follows after this section.

3. Extensibility
• Problem: Frequent algorithm updates across teams required agile deployment.
• Solution:
◦ New algorithms are packaged as containers and injected into the workflow.
◦ Teams can also build private message listeners for advanced use cases.

4. Traceability
• Problem: Fault tracing was difficult when outputs did not match expectations.
• Solution:
◦ Data and activities are richly described using structured metadata:
▪ Data: observation ID, CMOS ID, version, type (e.g. light curve, image)
▪ Activities: parent activity, input parameters, logs, runtime info
◦ Metadata enables data lineage, debugging, and reproducibility. A metadata sketch also follows after this section.
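The control/execution separation and message-bus decoupling from challenges 2 and 3 can be pictured with a minimal Python sketch. This is not the China-VO implementation: the MessageBus and Executor classes, the topic name, and the pre/post aspect hooks are illustrative assumptions standing in for the containerized modules scheduled with Kubernetes + Argo.

```python
"""Minimal sketch of control/execution separation over a message bus.

Illustrative only: the bus, topic names, and aspect hooks are assumptions,
not the actual China-VO EP workflow code.
"""
import queue
import threading
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(order=True)
class Activity:
    priority: int                           # lower value runs earlier (time-sensitive modules)
    name: str = field(compare=False)
    params: dict = field(compare=False, default_factory=dict)


class MessageBus:
    """Topic-based priority queues decoupling the controller from executors."""

    def __init__(self) -> None:
        self._topics: Dict[str, "queue.PriorityQueue[Activity]"] = {}

    def _topic(self, topic: str) -> "queue.PriorityQueue[Activity]":
        return self._topics.setdefault(topic, queue.PriorityQueue())

    def publish(self, topic: str, activity: Activity) -> None:
        self._topic(topic).put(activity)

    def consume(self, topic: str) -> Activity:
        return self._topic(topic).get()


class Executor(threading.Thread):
    """Executes activities from one topic; pre/post 'aspects' wrap each run."""

    def __init__(self, bus: MessageBus, topic: str,
                 handler: Callable[[Activity], None],
                 pre: Optional[List[Callable[[Activity], None]]] = None,
                 post: Optional[List[Callable[[Activity], None]]] = None) -> None:
        super().__init__(daemon=True)
        self.bus, self.topic, self.handler = bus, topic, handler
        self.pre, self.post = pre or [], post or []

    def run(self) -> None:
        while True:
            act = self.bus.consume(self.topic)
            for aspect in self.pre:     # e.g. check auxiliary data, record a start timestamp
                aspect(act)
            self.handler(act)           # in the real system: launch a containerized module
            for aspect in self.post:    # e.g. write metadata, decide whether to backtrack
                aspect(act)


if __name__ == "__main__":
    # The controller only publishes activities; a new listener can be attached
    # at any time without touching the controller (extensibility).
    bus = MessageBus()
    Executor(bus, "wxt.level1",
             handler=lambda a: print("run:", a.name, a.params),
             pre=[lambda a: print("pre-aspect:", a.name)],
             post=[lambda a: print("post-aspect:", a.name)]).start()
    bus.publish("wxt.level1", Activity(priority=0, name="event-correlation"))
    time.sleep(0.5)                     # give the daemon executor time to run
```

Because the controller never calls an executor directly, priorities, backtracking decisions, and newly injected containers can all be handled by whoever consumes the topic, which is the point of the separation described in the talk.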
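To make the traceability point concrete, here is a hedged sketch of what such structured metadata records might look like. The field names follow the talk's list (observation ID, CMOS ID, version, type; parent activity, input parameters, logs, runtime info), but the record layout and the lineage walk are illustrative assumptions, not the actual EP archive schema.

```python
"""Illustrative metadata records for data lineage (not the actual EP schema)."""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataProduct:
    obs_id: str                          # observation ID
    cmos_id: str                         # CMOS detector ID
    version: int
    dtype: str                           # e.g. "light_curve", "image"
    produced_by: Optional[str] = None    # ID of the activity that created it


@dataclass
class ActivityRecord:
    activity_id: str
    parent_activity: Optional[str]       # link that enables backtracking
    input_params: Dict[str, str] = field(default_factory=dict)
    input_products: List[str] = field(default_factory=list)
    log_path: str = ""
    runtime_info: Dict[str, str] = field(default_factory=dict)


def trace_lineage(product: DataProduct,
                  activities: Dict[str, ActivityRecord]) -> List[ActivityRecord]:
    """Walk parent links from the producing activity back to the root."""
    chain: List[ActivityRecord] = []
    current = activities.get(product.produced_by or "")
    while current is not None:
        chain.append(current)
        current = activities.get(current.parent_activity or "")
    return chain


if __name__ == "__main__":
    # Usage: list every processing step behind a suspicious light curve.
    acts = {
        "act-001": ActivityRecord("act-001", None, {"level": "0->1"}),
        "act-002": ActivityRecord("act-002", "act-001", {"level": "1->2"}),
    }
    lc = DataProduct("obs-0001", "cmos-07", version=2, dtype="light_curve",
                     produced_by="act-002")
    for step in trace_lineage(lc, acts):
        print(step.activity_id, step.input_params)
```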
Impact and Achievements

For the Workflow Platform
• Over 300 algorithm updates
• Integration time cut from hours to minutes
• Triggered over 60,000 times

For Scientific Output
• Over 100 TB of products generated
• At least 12 scientific publications

Design Principles
• Containerized modules: allow tech-stack flexibility and team autonomy
• Decoupled control & execution: dynamic scheduling and fault tolerance
• Message queue: supports runtime updates and extensibility
• Metadata-rich architecture: traceability and reproducibility

Future Directions

VO Integration
• Treat each workflow activity as a VO (Virtual Observatory) resource
• Assign VO identifiers
• Standardize metadata and link to provenance records

Upcoming Missions Supported
• DSL (Discovering the Sky at the Longest Wavelengths): a distributed satellite interferometer
• SPO (Solar Polar-Orbit Observatory): observes the solar polar regions from a high-inclination orbit

Questions & Answers
Q1: It is very nice to see this separation between execution and control, but I don't know whether it can be done easily with legacy software. You have to divide execution from control, yet legacy workflows mix the two, so splitting them is not easy. Did you design these workflows from scratch with control and execution separated, or did you work with existing workflows? The answer would tell us whether we could do something similar with ours.
A1: The framework was developed on top of existing frameworks, which provided the basic architecture. On top of that we added our own components, such as the communication layer, the aspects, and the controller, and then defined the metadata for data and activities to improve traceability.

Baptiste Cecconi - Workflow orchestration for radio interferometric imaging using EXTRACT

Presentation Summary

Project Context: What is EXTRACT?
• Goal: Develop a data-mining software platform for handling extreme data (large, complex, diverse, distributed).
• Computing continuum: Integrates Edge, Cloud, and HPC (High Performance Computing) resources.
• Partners: Universities, data centers, and SMEs across Europe.

What Is "Extreme Data"?
• Huge data volumes (e.g., TB/hour)
• High velocity & variability
• Heterogeneous formats, often incomplete or noisy
• Geographically distributed sources

Use Cases: PER (Personalised Evacuation Route) and TASKA (Transient Astrophysics with the Square Kilometre Array)
• Domain: Radio astronomy – time-variable sources, e.g., the Sun
• Instruments: Involves NenuFAR, a French low-frequency radio telescope generating high-volume and high-velocity data
◦ Frequency range: 20–80 MHz
◦ Data rate: ~15 TB/hour at the edge
◦ After on-site reduction: ~1 PB/year sent to a distributed data center

Workflow Challenges in Astronomy
• Distributed data must be moved efficiently across sites (multi-site storage)
• No unified orchestration platform for cloud, edge, and HPC systems
• Scientists run many pipelines with varying tools
• Need for transparent and user-friendly orchestration

Demonstrated Solution

Key Workflow Components:
1. Data reading and partitioning
2. Calibration (on calibration sources and targets)
3. Imaging (final output generation)

How it Works:
• User submits workflows via Jupyter notebooks
• Processing occurs remotely on Kubernetes clusters (e.g., OVH Cloud, EGI, local clusters)
• Data and results are stored in object storage buckets
• The system:
◦ Partitions and distributes tasks
◦ Merges results
◦ Returns final data products to the user
A minimal sketch of this three-step flow follows below.
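The following Python sketch illustrates the partition → calibrate → image flow described above. It is not the EXTRACT platform code: the function names, the bucket layout, and the use of local multiprocessing in place of distributed Kubernetes workers are all assumptions made for illustration.

```python
"""Hedged sketch of a partition -> calibrate -> image driver.

Stands in for the EXTRACT orchestration: names, bucket layout, and local
multiprocessing (instead of Kubernetes pods) are assumptions.
"""
from dataclasses import dataclass
from multiprocessing import Pool
from typing import List


@dataclass
class Partition:
    bucket: str        # object-storage bucket holding the visibilities
    key: str           # object key for this chunk (e.g. one time/frequency slice)


def read_and_partition(bucket: str, n_chunks: int) -> List[Partition]:
    """Step 1: split the input dataset into independently processable chunks."""
    return [Partition(bucket, f"raw/chunk-{i:04d}.ms") for i in range(n_chunks)]


def calibrate(part: Partition) -> Partition:
    """Step 2: calibrate one chunk against calibrator sources and the target.

    In the real platform this would run inside a container on a remote
    Kubernetes cluster and write its output back to object storage.
    """
    return Partition(part.bucket, part.key.replace("raw/", "cal/"))


def image(parts: List[Partition]) -> str:
    """Step 3: merge the calibrated chunks into a final image product."""
    return f"s3://{parts[0].bucket}/products/final_image.fits"


if __name__ == "__main__":
    # Driver, as it might be launched from a Jupyter notebook cell.
    chunks = read_and_partition("nenufar-demo", n_chunks=8)
    with Pool() as pool:                  # stands in for distributed workers
        calibrated = pool.map(calibrate, chunks)
    print("final product:", image(calibrated))
```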
Demonstration Video (Described):
• Live Jupyter session running on OVH Kubernetes
• Fully remote processing (not on the local machine)
• Interactive, minimal setup: only requires credentials and the input location
• Workflow split into modular steps (e.g., data read → calibration → imaging)

Platform Architecture (Technical Notes)
• Kubernetes-based orchestration
• Object storage for data staging and persistence
• Modular workflow execution using predefined components
• Interactive mode + batch processing mode (for large datasets)

Current Status & Next Steps

Achieved:
• Tested on multiple cloud environments (OVH, EGI, Kubernetes clusters)
• Proven modularity and scalability of tasks (3-step example pipeline)
• Notebook interface validated for astrophysical processing

In Progress:
• Data cataloging and automated cleanup steps
• HPC integration for hybrid workflows
• Enhanced scheduling for multi-cluster coordination
• Full provenance tracking (smart recomputation when inputs change; see the sketch at the end of these notes)

Real-Time (Edge) Processing with AI
• TASKA use case: real-time detection of transient signals directly on the receiver (edge computing)
• No hardware modification required
• AI helps detect bursts and anomalies on the fly

Final Thoughts
• The platform enables cloud-native astronomy workflows across distributed systems.
• It supports use cases that do not require exascale computing, offering a complementary solution to large-scale HPC efforts (e.g., France's national initiatives).
• Strong potential for application to other observatories (e.g., LOFAR, SKA).

Notable Q&A & Comments
• VS Code + Jupyter proved very effective for running and demonstrating workflows.
• Participants appreciated the clarity of the remote execution; processing was confirmed to happen in the cloud, not locally.
• The Kubernetes-based deployment was considered smooth and user-friendly.
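The "smart recomputation when inputs change" item listed under In Progress can be pictured with generic content-hash caching: a step reruns only when its inputs or parameters have changed. The run_step helper, cache layout, and fingerprinting below are assumptions for illustration, not the provenance mechanism actually being built in EXTRACT.

```python
"""Generic sketch of hash-based smart recomputation (not EXTRACT's implementation)."""
import hashlib
import json
import pickle
from pathlib import Path
from typing import Any, Callable, List

CACHE_DIR = Path(".workflow_cache")          # assumed local cache location


def _fingerprint(step_name: str, params: dict, input_files: List[str]) -> str:
    """Hash the step identity, its parameters, and the content of its inputs."""
    h = hashlib.sha256(step_name.encode())
    h.update(json.dumps(params, sort_keys=True).encode())
    for path in input_files:
        h.update(Path(path).read_bytes())
    return h.hexdigest()


def run_step(step_name: str, func: Callable[..., Any],
             params: dict, input_files: List[str]) -> Any:
    """Re-run `func` only if the fingerprint of its inputs/parameters changed."""
    CACHE_DIR.mkdir(exist_ok=True)
    cached = CACHE_DIR / f"{step_name}-{_fingerprint(step_name, params, input_files)}.pkl"
    if cached.exists():                      # nothing changed: reuse the stored result
        return pickle.loads(cached.read_bytes())
    result = func(*input_files, **params)    # something changed: recompute and cache
    cached.write_bytes(pickle.dumps(result))
    return result
```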