A Python based TAP Server at Caltech/IPAC-NExScI



Participants: 52

Notes



NASA Exoplanet Science Institute python TAP server - intro slides (Bruce B.)
2 deployments at time of meeting: open source GPLv3, ADQL to SQL w/ spatial indexing, support KOA data policy.
KOA & NExScI moving from webUI to API based query infrastructure.
New NEID EPRV instrument support using TAP, next archive.
NExScI hosts the NASA Exoplanet Archive for all known exoplanets, with mission data from Kepler, K2, TESS, ...
KOA API based architecture: web, python and custom interfaces on top of the TAP service. Allow data discovery through VO standard.
TAP server quite lightweight in python & C spatial indexing code.
ADQL to SQL converter currently supports Oracle, but the SQL translation should help minimize RDBMS requirements.
The first implementation has no INTERSECT, UPLOAD or JOIN support.
Proprietary access (and download): cookie based authN. Modify query on the fly to show only accessible records.
Current status: extensively tested, including with PyVO, TAP+ & TOPCAT. astroquery testing underway for KOA.
Code in a GitHub repo, adding docs & tests before releasing it (contact Bruce if interested)
Multiple DBMS support: Oracle done, PostgreSQL, SQLite etc, later, requires a different DB API.
Also other environments, like CGI through Apache for Oracle (currently) or NGINX based in the future.

TAP/ADQL feedback (J. Good):
    executionduration must be integer, but sub-second timing available
    polygon searches in ADQL: treat as convex hull
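
A minimal sketch of what "treat as convex hull" could mean for a polygon constraint. It uses Andrew's monotone chain on planar (ra, dec) pairs, which is only an illustration: it assumes a small polygon far from the poles and from the RA wrap at 0/360 degrees, and it is not taken from the NExScI code. The function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the convex hull of (ra, dec) pairs in counter-clockwise order.

    Planar approximation only (assumption): no handling of RA wrap or poles.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    # Build the lower hull, dropping points that make a clockwise or flat turn.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper hull the same way, walking the points in reverse.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]
```

With a concave input such as (0,0), (4,0), (4,4), (2,1), (0,4), the inner vertex (2,1) is dropped and the four outer corners remain.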

Albert M.: the query is modified on the fly for proprietary filtering. Easy for a simple query, but it quickly becomes complex for joins and subqueries.
answer: (luckily) we have simple tables. Collaborate on a solution for complex queries?
What about use cases with mixed private/public observation content?
At KOA, PIs usually get data from one instrument per program.
Pat D.: in CAOM, access grants are part of the model, so they're part of the table. Conditions are put on columns w/o indexing, so they do not load the CPU much.
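
The on-the-fly filtering discussed above can be sketched as follows, for the simple single-table case only. This is an assumption-laden illustration, not the KOA implementation: the column names `is_public` and `progid`, the function `restrict_table`, and the regex-based rewrite are all hypothetical, and the sketch assumes the user query carries no bind parameters of its own and does not alias the table.

```python
import re
import sqlite3  # used only to try the rewritten query against an in-memory table


def restrict_table(sql, table, user_programs):
    """Replace `FROM/JOIN <table>` with a subquery that hides inaccessible rows.

    `table` must come from trusted configuration, never from user input.
    Returns the rewritten SQL and the list of bind parameters for it.
    """
    if user_programs:
        placeholders = ", ".join("?" for _ in user_programs)
        predicate = f"is_public = 1 OR progid IN ({placeholders})"
    else:
        predicate = "is_public = 1"

    # The subquery keeps the original table name as its alias, so column
    # references in the user's query still resolve.
    filtered = f"(SELECT * FROM {table} WHERE {predicate}) AS {table}"
    pattern = re.compile(rf"\b(FROM|JOIN)\s+{re.escape(table)}\b", re.IGNORECASE)
    rewritten, count = pattern.subn(lambda m: f"{m.group(1)} {filtered}", sql)

    # One copy of the parameters per substituted occurrence.
    return rewritten, list(user_programs) * count
```

A text-level rewrite like this breaks down with aliases, subqueries, or table names inside string literals, which is the complexity Albert M. points to; a parser-level rewrite or access columns in the data model (as in CAOM) avoids that.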

Tom D. about spatial code.
answer: it is treated outside the database.

Baptiste C.: how about setting up epn_core table/view(s) in TAP for your exo-planetary data?
answer: decision upon exoplanetary archive, need enquiry, also for solar system observations at KOA.
[EPN-TAP is on its REC path way, starting soon as a proper WD]

Christine B.: how hard is it to inject a join? Try to avoid it, especially if it fails: it confuses the user. Are you using the Oracle spatial library? ($$$)
answer: no, we have had our own spatial library for a long time. Support for joins could be more important to the community than supporting other DBMSs (asking for opinions on that).

Andrej B.: why use C for spatial queries rather than existing Python libraries, like healpy?
answer: not seeing the benefit.
- pixels? are they in DB or outside?
answer: DB keeps only metadata.

Anastasia G.: have their own Python TAP running on PostgreSQL & MariaDB. Looking forward to the GitHub repo being made public. Our code: https://github.com/django-daiquiri , the TAP functionality is in the daiquiri/tap package.
(answer) ask Bruce for accessing the repo.

Tom D.: how do you see the code evolving? A public GitHub repo for use, or a shareable development effort?
(answer) looking for open development, working on CI, looking for the repo to be ready for the community to contribute.

Mark T.: did you consider existing TAP server codes?
(answer) yes, but they didn't do what we really wanted. Also TAPlib, but that's Java and we're not comfortable with that language.

- used TOPCAT to test; also used the STILTS validator?
(answer) didn't know about the validator within.

Tom D.: does anybody have a sense of the best implementations or designs for streaming back large amounts of records? Caching & chunking, ...
C. Banek: XML is not great for streaming
Tom: it's also a matter of VOTable, not only XML
Mark: TOPCAT will stream without waiting for all elements to be closed
Bruce: we have small datasets, didn't go into streaming
Igor C.: experimented with this a couple of decades ago. It could be that the DBMS is not able to stream at all. Need to check your engine configuration.
Dave: experimented with delivering the response to a Kafka stream. Interesting experiment. The client consumed that, not XML.
HEASARC streams out responses, and users have not reported any specific problem.
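
A minimal sketch of the chunking idea from the discussion above: rows are pulled from a DB-API cursor in batches and serialized piece by piece, so the full result never sits in memory on the server side. CSV is used only to keep the sketch short (streaming VOTable raises the XML issues noted above), `stream_csv` is a hypothetical name, and, per Igor's remark, whether the driver itself buffers the whole result set depends on the engine configuration.

```python
import csv
import io
import sqlite3  # stand-in engine; any DB-API 2.0 cursor offers fetchmany()


def stream_csv(cursor, chunk_size=1000):
    """Yield a CSV response in pieces: the header first, then batches of rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")

    # Column names come from the DB-API cursor description.
    writer.writerow([col[0] for col in cursor.description])
    yield buf.getvalue()

    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        # Reuse the buffer so only one chunk is held at a time.
        buf.seek(0)
        buf.truncate(0)
        writer.writerows(rows)
        yield buf.getvalue()
```

A web framework can hand such a generator to its streaming response type, so the client starts receiving rows while the query is still being read.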

Tom McG.: -missed completely the question-

Theresa D.: what's the status with NGINX?
(answer): future development, not yet started
Anastasia: using NGINX, not too different from Apache, easier to configure. A container is available that also includes the NGINX part.

Dave: about the spatial indexing. Is anybody interested in testing the behaviour of the available indexing solutions?
Tom: suspecting minor mismatches, but would be nice to figure out how much _minor_.
Dave: try to collect the tests in a repo; would people contribute?
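
One possible shape for such tests, as a sketch only: compute a brute-force cone search with an exact angular separation and diff it against the row ids returned by each indexed implementation. All names here are hypothetical, and rows are assumed to be (id, ra, dec) tuples in degrees.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (Vincenty form, stable at all scales)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    dra = r2 - r1
    num = math.hypot(
        math.cos(d2) * math.sin(dra),
        math.cos(d1) * math.sin(d2) - math.sin(d1) * math.cos(d2) * math.cos(dra),
    )
    den = math.sin(d1) * math.sin(d2) + math.cos(d1) * math.cos(d2) * math.cos(dra)
    return math.degrees(math.atan2(num, den))


def brute_force_cone(rows, ra0, dec0, radius_deg):
    """Reference answer: ids of all rows within radius_deg of (ra0, dec0)."""
    return {
        rid for rid, ra, dec in rows
        if angular_separation_deg(ra0, dec0, ra, dec) <= radius_deg
    }


def compare(reference_ids, indexed_ids):
    """Report rows the indexed search missed and rows it returned in excess."""
    return {
        "missing": reference_ids - indexed_ids,
        "extra": indexed_ids - reference_ids,
    }
```

Counting the "missing" and "extra" sets over a shared catalogue would put a number on how minor the mismatches are.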

Running Questions


1. Dave Morris: What license would the code be released under?
2. Anastasia Galkin: Would you please share a link to the GitHub repository? Or is it not there yet? (notes above)
3. Christine Banek: What if you’re doing a join on the same table ? (from chat)
4. T. McG: Has the TAP service been validated using TAPLint?



Scige: just curious about the datalink "walker":
1- even without loops... isn't this potentially infinite?
2- maybe keep a history of "visited" links; to keep memory usage low, this list could be stored as a hashmap?
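
A minimal sketch of the two suggestions above: a breadth-first walk that records visited links in a hash-based set (so loops are followed only once) and a depth cap (so a loop-free but unbounded chain still terminates). `walk_links` and the `get_links` callable are hypothetical; in practice `get_links` would fetch and parse a datalink response.

```python
from collections import deque


def walk_links(start, get_links, max_depth=5):
    """Visit every link reachable from `start` once, up to max_depth hops away."""
    visited = {start}            # hash-based set: O(1) membership checks
    queue = deque([(start, 0)])
    order = []

    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue             # depth cap bounds loop-free but endless chains
        for nxt in get_links(url):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, depth + 1))
    return order
```

If the visited set itself grows too large, storing a fixed-size hash of each link instead of the full string is one way to reduce memory, at the cost of a small collision risk.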