
KDD - GWS Joint Session Schedule - AI in Astronomy and Impact on IVOA Standards - IVOA Nov 2024 Interoperability Meeting

Draft Schedule

Time: Saturday, November 16, 2024 14:00-15:30

Location: Aula Magna

Speaker	Title	Time	Material	Abstract
Andre Schaaff	NLP-chatbot R&D at CDS	10'+2'	pdf	Over the past years, the CDS has carried out long-term R&D work on Natural Language Processing applied to the querying of astronomical data services. The motivation was to enable new ways of interaction, especially a chatbot, as an alternative to traditional query forms, with the aim of returning query results that satisfy professional astronomers. The Virtual Observatory (VO) brought us standards like TAP, UCDs, ..., implemented in the CDS services, which help us query our services and open the door to querying the whole VO. We will give a quick reminder and status of this chatbot work. In 2023 we started to explore how to improve it with the OpenAI API. We are now branching out from this initial work to study how to apply it to improving our services, as part of a wider use of AI. We will give a first overview of this new R&D study.
Sebastian Trujillo Gomez	‘Spherinator + HiPSter: from the known unknowns to the unknown unknowns’	10'+2'	slides	Current applications of machine learning to astrophysics focus on teaching machines to perform domain-expert tasks accurately and efficiently across enormous datasets. Although essential in the big data era, this approach is limited by our own intuitions and expectations, and provides at most only answers to the ‘known unknowns’. To address this, we are developing a new conceptual framework and software tools to help astronomers maximize scientific breakthroughs by letting the machine learn unbiased interpretable representations of complex data ranging from observational surveys to simulations. Our tools automatically learn low-dimensional representations of complex objects such as galaxies in multimodal data (e.g. images, spectra, datacubes, simulated point clouds, etc.), and provide interactive explorative access to arbitrarily large datasets using a simple graphical interface. Our framework is designed to be interpretable, work seamlessly across datasets regardless of their origin, and provide a path towards discovering the ‘unknown unknowns’.
Giuseppe Riccio	Integrating AI tools in data analysis frameworks: the Vera Rubin LSST and Euclid cases	10'+2'	pdf pptx	Data analytics frameworks offer useful solutions for connecting to a large number of huge repositories and for providing a set of tools that support scientists in their research on the huge quantity of exceptional-quality data produced by the ever-increasing number of sophisticated astronomical instruments. For a framework to interface with as many archives as possible and to provide a large number of tools, a very high level of standardization is needed, both for repositories and for the I/O of analysis methods. Moreover, integrating advanced data-driven science methodologies for the automatic exploration of data is becoming mandatory to cope with the huge amount of available data. As part of the LSST and Euclid projects, we have developed a portable and modular web application (and its “euclidized” version), designed to provide an efficient and intuitive software infrastructure to analyze data acquired and stored in their official repositories. It is able to retrieve and analyze both housekeeping and scientific data, providing standard statistical and plotting tools, as well as machine/deep learning and data mining techniques and methods. Moreover, we foresee integrating an LLM to simplify some time-consuming configuration operations that are currently the user's responsibility.
John Abela	The Computational Evolution of Human Intelligence in AI	10'+2'	pptx	The question of whether human intelligence is Turing-computable—replicable on a machine with sufficient complexity—has divided thinkers and researchers for decades. This talk explores two opposing views: either intelligence can be fully understood, quantified, and recreated through algorithms, or it encompasses qualities beyond the reach of computational methods. The recent evolution of large language models (LLMs), capable of producing nuanced, human-like responses, lends credibility to the theory that intelligence may indeed be algorithmic in nature. These models, leveraging enormous capacities, mimic aspects of human cognition, suggesting that machine replication of intelligence is within reach, at least in theory. I will examine the trajectory of AI through advances in LLMs and other architectures, highlighting how they support the hypothesis of intelligence as an emergent property of computational complexity. This perspective aligns with the Turing-computable hypothesis, suggesting that with sufficient resources and sophisticated architectures, machines may achieve levels of understanding and creativity once thought exclusive to human minds. In doing so, the discussion confronts the philosophical implications of these advancements, asking whether AI can not only emulate but embody facets of human-like intelligence.
Sara Shishehchi (remote)	Leveraging Large Language Model (LLM)-based Agents with Multiple Tool Integration for Enhanced Search in the Canadian Astronomy Data Centre	10'+2'	pptx Google Doc	Searching for data, including images, using the advanced search tool on the Canadian Astronomy Data Centre (CADC) website can be difficult for users, as it requires knowledge of the ADQL language and involves multiple steps to narrow and refine search queries. The goal of this project is to leverage Large Language Models (LLMs) and autonomous agents to create a chatbot that assists users in searching for images in the CADC database using natural language. Our LLM-based agent accepts queries in English, converts them to ADQL code, and returns the results after executing the query against the database. The system is designed to handle common user errors, such as spelling mistakes, incorrect column names, and incorrect values. In such cases, the chatbot suggests a shortlist of similar but correct values that the user might have intended. The user's feedback is then collected to retrieve the correct content. This robustness was achieved by incorporating Retrieval-Augmented Generation (RAG) and semantic search tools, which verify query components with the user before execution and test them against the database. To evaluate the performance of our system, we created a dataset of questions across different categories: standard questions, spelling errors, incorrect columns, and incorrect values. The system demonstrates 80-90% accuracy on benchmarks, which is a significant improvement over existing systems built using OpenAI's custom GPT, which achieved less than 20% accuracy on the same tests. Our solution streamlines the search process for CADC users, making data retrieval more efficient and accessible.
Panel: Andre Schaaff, Sebastian Trujillo Gomez, Giuseppe Riccio, John Abela, Chenzhou Cui; Moderators: Yihan Tao, Sara Bertocco, Jesus Salgado	Discussion on the use of AI in astronomy and its impact on IVOA standards	30'	slides

Notes: TBD


Attachments

Topic attachments
Type	Attachment	History	Size	Date	Who
pdf	IVOA-Malta-KDIG-GWS-161124-ASchaaff.pdf	r1	12601.0 K	2024-11-16 - 11:35	YihanTao
pdf	IVOA_-_2024_11_16_-_Malta_FINAL.pdf	r1	1652.1 K	2024-11-16 - 08:09	YihanTao
pptx	IVOA_-_2024_11_16_-_Malta_FINAL.pptx	r1	15178.7 K	2024-11-16 - 08:42	YihanTao
pptx	John.Abela.IVOA.November.2024.pptx	r1	7955.3 K	2024-11-16 - 08:25	YihanTao
pptx	Shishehchi_LLM-BasedAgents.pptx	r1	1656.0 K	2024-11-16 - 08:24	YihanTao

Topic revision: r13 - 2024-11-16 - JesusSalgado
