AM : Alberto Micol
BC : Baptiste Cecconi
BM : Brent Miszalski
FB : François Bonnarel
GDF : Gregory Dubois-Felsmann
GM : Gregory Mantelet
JD : James Dempsey
JS : Jesus Salgado
MM : Marco Molinaro
PD : Pat Dowler
 

Introduction 
------------
FB: This is supposed to be a follow-up to the running meeting we had recently, where I presented what was new lately in DAP and SODA. So today I sent you the links to the previous material, and it's also on the page. It presents the current status, not of the merged PRs, but of the GitHub "artifacts". So if you want you can look at the proposals. And there is also the link to the RM notes, because some of the points were already discussed.

So what I want to do now is different: to tackle the controversial points, whether they came from the running meeting, came up in the meantime, or came up before.

There is a DataLink implementation note published on GitHub just after the release of DataLink 1.1 in December; I don't think many people noticed it, but we can discuss for a few minutes whether we can publish it. It's just a note.

- For SIA and DAP, I don't have a lot. I think for DAP there is only one point, which is the one-step access to the SODA URL, where there are at least two possible solutions. So that's what I want to explain.
- And then for SODA we have more:

* We have the MOC parameter proposal, which people have asked to be removed.
* The pixel parameter format, which is under discussion.
* The DP type parameter to reduce the dimensionality, which is also contested in some way.
* The RESPONSEFORMAT from and to HiPS, for which there is no real controversy, but I think we have to clarify a few things.
* There is also a new proposal, with implementations, for alternative coordinate systems from our colleague Robert Butora, who is online.
* And then there was a question about SODA and the data model.

DataLink implementation note
-----------------------------

FB : So the implementation note is something which was actually extracted from a previous version of Datalink because it was non-normative and it's actually doing two things. It explains how we can discover Datalink URLs outside the context of ObsCore, basically. And the second thing is giving some suggestions on how a data provider should fill the Datalink table fields.

So I don't know if you had a look at it. The only thing I want to know is if you think it's useful, and if we can publish it as an IVOA note. For example, Mark Taylor said that the second part, where I try to explain how you could fill the DataLink fields, is redundant with the spec. This was also motivated by a discussion I had in Tucson with Gregory and Trey. So I don't know.

Maybe you should just have a look. Or if somebody has comments right now, please speak now. If not, just have a look and tell me.

DAP discussion
--------------

FB : So now I can go to DAP. I think I have only one point for that, or two points. Gregory made the comment yesterday that in the current proposed PR we have some things which are errata that have already been accepted.

So we should have a basic document with the errata accepted before going to that. So that's okay. I will do this ASAP.

GDF : Just so that the DAP changes are easier to identify. 

FB : Then the only question which I think is controversial is how to access the SODA cutout in one shot. So cutouts, or transformations of the datasets, or both, existed in the old SIA1.

Of course, with the development of SODA, DataLink and SIA or ObsTAP, we don't have this anymore in the current version. So I think it's a need which has been expressed several times. But how can we do that? There is one simple solution which benefits from the fact that the input parameters of SIA and SODA are basically very similar.

There are very few differences, except that SODA presently has fewer parameters than SIA. So if you discover some datasets with the SIA parameters, you can immediately copy them to your SODA query. And so you will first get the discovery of the relevant datasets.

But at the same time, you can force a cutout in the same dataset you just discovered. 

And there are two ways to do that.

- The first way is, well, I just copy-paste the parameters of the SIA query into a SODA query URL, and I put this in the access_url field.
- The second solution, which I also wrote up (as you can see, the current proposed PR really has the two solutions as alternatives), is what I understood from Pat, who said: well, we have the service descriptor. So what we can do is add the service descriptor to the ObsCore response, but fill all the parameters with the input values from the SIA query. Then, from your client interface, you just click, and basically you have the same result.

So those are the two solutions. Did I understand correctly what you wanted, Pat?

PD :  Yeah, roughly.

I mean, the advantage of the service descriptor is that the query can be for some region, and then you get responses, and then the cutouts are obviously going to be in that region, but there might be multiple of them. They might be smaller. So providing the service descriptor lets the client do one query, and then multiple different cutout requests.

Depending on what they're trying to do. And of course, this could be done now, and you could pre-fill the values, or you could put it in the options, kind of the way it's documented in DataLink or SODA with n-axes, something like that. So I kind of prefer the service descriptor because it gives the client the ability to do those things directly from the query response, but do multiple things, not just the hard-coded one.

If you hard-code it in the URL, then that should be opaque to the client. They shouldn't really try to go in and mess with that. 
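
Editor's sketch of the first solution FB describes above (copying the SIA query parameters into a SODA query URL that is returned in access_url). This is only an illustration under stated assumptions: the endpoint `https://example.org/soda/sync`, the dataset identifier and the function name are hypothetical, and the list of shared parameters is limited to POS, BAND, TIME and POL.

```python
from urllib.parse import urlencode

# Hypothetical SODA sync endpoint; a real service would advertise its own.
SODA_SYNC = "https://example.org/soda/sync"

# SIA query parameters that SODA also accepts as cutout parameters
# (assumed subset for this sketch).
SHARED_PARAMS = ("POS", "BAND", "TIME", "POL")


def one_shot_cutout_url(sia_params, dataset_id):
    """Copy the SIA constraints that SODA understands into a SODA query URL."""
    query = [("ID", dataset_id)]
    for name in SHARED_PARAMS:
        if name in sia_params:
            query.append((name, sia_params[name]))
    return SODA_SYNC + "?" + urlencode(query)


# The discovery query; DPTYPE is an SIA-only constraint and is not copied.
sia_query = {"POS": "CIRCLE 10.68 41.27 0.1", "BAND": "0.21 0.22", "DPTYPE": "cube"}
url = one_shot_cutout_url(sia_query, "ivo://example/archive?obs1")
print(url)
```

As PD notes, a client should treat a URL built this way as opaque; the service descriptor remains the route for a client that wants to choose its own cutout parameters.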

FB: So Gregory and then James.


GDF : So I have sort of an insight to share from some struggles we've had with this in thinking about like how Firefly should work and in the IRSA interfaces and in the Rubin interfaces where we've done different things for different reasons. When you do a cutout, so in the IRSA interface, in some cases, there's a place where you put in your search location, and then there's a radius of searching, and then there's a cutout size, and they're distinct things. And because of the way that SIA works where it's intersects, that's ultimately what it's doing, if you want to be sure that your target is in the image that you're going to get back, you're pushed towards having a small radius of the search circle in the parameter, but that may not be what you want for the size of the cutout.

You might want a big cutout. So I'm a little wary about this mechanism in the first place, which is why I liked that in the draft of DAP, you added this parameter that would control whether it would do this or not, essentially, right? I forget, what did you call that parameter? I can't remember the name of it.
FB :  Retrievemode, I think.

GDF : Yeah, something like that, right? It came from SIAv1, I think, as a concept. So if you use that parameter, and you say, no, I mean it, I want you to make cutouts of the same radius as the thing that I gave you, then the DataLink point I wanted to raise is, should the #this semantics be applied to the cutout at that point? Because that meant that they were asking for a cutout, and the way they get the cutout is with this. This is a thing that happens in Firefly right now in the Rubin interface.

You get back a DataLink table that tells you where the whole image is, and it also tells you where the cutout service is. I'm going to provide a service descriptor, and Firefly goes, oh, well, there's the #this, so I'm going to go retrieve the #this. And so there's the whole 50 megapixel image there in your session when what you wanted was a cutout, right? And so we need to think about what's a good way to tell client software what it should do, so that the client doesn't have to do some kind of magic, do-what-I-mean intuition thing to figure out whether it should load the actual image or the cutout.

Now, Firefly puts a limit, like if access_estsize is bigger than something, then it won't try. But a lot of things don't have access_estsize, so that's the thing that I thought about. And this may not be a very good, like, I don't know, meeting room discussion, but I was going to write something up about that for you.

FB : Okay, and James?

JD:  So I have a lot of concerns with the second solution. The reason being that we already have a way to do whatever we want to produce an arbitrary cutout, and that is SIA, DataLink, SODA.

And we've got clients which work for that, and that is well covered. But what we don't have is a migration path from SIA1. And if we want this to be an SIA1 migration path, the clients that are currently using SIA1 are expecting two GET commands: a GET to do the search, and a GET to get the cutout.

And I absolutely agree that in that retrieve mode where this is the cutout, what you should be doing is providing a means to get to that particular cutout. And the one parameter that you then add on top of SIA1 is whether you want a PNG or a FITS file, in the usual case. No argument, there are other situations.

So I can write a client for the second solution with no problems, that's fine, but I just don't see it fulfilling the goal of having a replacement for that SIA1 one-shot cutout.

I really argue we should keep it really simple. I've absolutely seen situations where you have a different search radius versus cutout radius, but if someone wants to do that they can just modify the URL they're working with. 

PD : So when you say modify the URL you're talking about that access URL equals that. 

JD: Yeah, yeah, and type the radius. 

PD : But do they know that's SODA or is it an opaque URL? 
JD : Well, yeah, 

PD : because at that point it can be just an opaque URL. That's right, yeah, absolutely. 

GDF : I think it's important that it can be an opaque URL.

PD : Yeah, yeah, and so modifying is not, but I understand the direct thing, so that means that when you query, you have in mind that this is exactly what you want to do, so there has to be some sort of mechanism to control what kind of access URL you get: whether you get a download URL, a cutout URL, or a DataLink URL.

JD : Yes, yeah. 

PD : So what do you do in the scenario where, for the row in ObsCore, normally, let's say in our service, you would take the ID, call DataLink, and you get #this and an auxiliary, say it's some sort of weight map? Ideally I feel like the correct thing for the user might be that they would do a cutout in the science data, and they should chop out the equivalent piece of the weight map. And so to interpret that other thing, maybe, maybe not, like I don't know what they're trying to do, which is kind of my problem with pre-coding. 

GDF : That's exactly what the SPHEREx cutout service has to do, exactly what you said. 

PD : Yeah, and so if you have that URL, how do I convert those two things? 

JD : Yeah, I think if we're talking about a simple one-request URL, you're just getting back a FITS image or a PNG, and that's what the current SIA1 does, and if that's the part we're trying to cover, then I think that's probably what we should try to restrict ourselves to. But I will note, on a similar note, that the CASDA cutout service takes the FITS image but also appends the BEAMS table, for instance, for that kind of image, so you can interpret what the beam is for the different slices. So you can get an MEF, just a multi-HDU FITS. Yeah, yeah. 

FB : Well, you are right. The motivation for proposing the first solution was that I heard several times, and I feel myself, that something from SIA1 was missing. Yeah. 
But first of all, as Gregory pointed out, this will not be the only behavior; it's just one choice, like before in SIA1, where you had, what was it called, mosaic, cutout or archive modes, so you have to choose which one you want. And second, for what Pat is looking for, I think if we have the retrievemode and get this kind of URL, this doesn't prevent or forbid us from adding the service descriptor in the same response; it is allowed. And in that case, if your main point is that people have to be able to choose their parameters, it's totally possible. 

So in the classical way, you have a direct link to the full dataset, and the service descriptor can be either in the first response or in the DataLink response; both are possible. And with that solution, the service descriptor with free parameters could still be there, so people could still click on the access URL to get back the cutout, or go to the service descriptor.

JD :  And I have no problems with that, I think as long as we're just satisfying the path. 

FB:  And of course, the URL above will be there only by choice of the user, if you choose the retrievemode like this. 

JD : If I've asked for "give me a simple cutout", then there is no problem with the simple access URL, which gives them that. I think we've satisfied the requirement. 


FB : Yes, Gregory?

GDF : I mean, if we're talking about trying to recover SIAv1 behavior, I think there are a lot of use cases in SIAv1 where I would have asked for a cutout to be made on the fly in this one shot thing, where I would have used a value of intersects that's different from the overlap one, which is the only thing that you can do in SIAv2. Because if your goal is I want every image in which I can find a hundred arc second radius thing around my favorite extended source, you can't get, that's not what you're going to get from SIAv2. You're going to get things where it's just barely at the edge, right? Because it always does overlap.

In SIAv1, you would have said I want the disk to be contained within the image so that the full area of the cutout is filled from that image. That would be a very common thing to do. And I wonder if we need to take a step back and say, I wasn't here when you decided not to carry that parameter forward to SIAv2. 

So I don't know what the argument against carrying it forward was. 

PD : So you're saying have a way for the user to specify contains instead of overlap?

GDF:  Yeah, yeah. Because then that makes this much more useful.

They work together. They're very compatible. 
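
Editor's sketch of the distinction GDF draws between "overlaps" (the only behaviour in SIAv2) and "contains" (available through the SIAv1 intersect modes). To keep it short, image footprints are approximated as circles, which is an assumption of this sketch and not how a real service would model them; the function names and dictionary keys are hypothetical.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))


def overlaps(image, circle):
    """True if the search circle touches the (circular) image footprint."""
    sep = angular_separation(image["ra"], image["dec"], circle["ra"], circle["dec"])
    return sep < image["radius"] + circle["radius"]


def contains(image, circle):
    """True if the whole search circle lies inside the image footprint."""
    sep = angular_separation(image["ra"], image["dec"], circle["ra"], circle["dec"])
    return sep + circle["radius"] <= image["radius"]


image = {"ra": 10.0, "dec": 0.0, "radius": 0.5}
edge_target = {"ra": 10.45, "dec": 0.0, "radius": 0.1}   # near the image edge
inner_target = {"ra": 10.2, "dec": 0.0, "radius": 0.1}   # well inside the image

print(overlaps(image, edge_target), contains(image, edge_target))
print(overlaps(image, inner_target), contains(image, inner_target))
```

With "overlaps" alone, the edge target matches even though a cutout of the requested radius would be only partly filled from that image, which is the situation GDF describes.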

FB : I think our main motivation to go to SIAv2 was, first, to have something closer to ObsCore.

GDF : Oh no, that of course. 

FB: And second, we wanted to tackle not only spatial parameters, but also band and time. 

GDF: No, I get what the positive reasons were, but I don't know why you didn't reach out.

FB : I remember fairly well, I was with Pat in all that. The counterpart was that we had to do something rather primitive, basic, in order to go fast. So that's the main reason why we descoped a lot of things which were in SIAv1, because it would have been too difficult in the context of all these dimensions.

Because SIAv1 was doing this only on the spatial axis, but we wanted to add the time axis, the polarization.

PD :  I think the other difference is that at the point where we created this, we had ObsTAP, which means that there was a way to do that thing, which was to write a TAP query. And the more we add to this, it's just filling the space between SIA and TAP.

GDF : Well, okay, so there are a couple things there. So that depends on the data publisher actually having an ObsTap service.

PD : True, yeah.

GDF : Which is connected to a mental model that says, oh, there must be a database behind there that looks like ObsTap. But in fact, Rubin's in the middle of implementing SIAv2 over the Butler. And the Butler doesn't have these tables.


So it's making that stuff up on the fly, but it can still do the queries. And so the Butler is perfectly capable of doing the math to do contains and overlaps and intercepts and all those things. 

PD :I mean, it's true that that's one of the reasons to have, yeah.

GDF : And you wouldn't, you CADC, wouldn't be harmed by this because the way you implement SIAv2 is actually by proxying it over TAP anyway. And so you can just write the ADQL that corresponds to those relationships. So I kind of want to at least throw this out there that maybe if we're going to do this, it might be good to do it together with the intersects mode selector.

Because I think they work really, really well together. Should I write a proposal?

FB : Yeah. So is it enough for today for this one? I didn't have anything else about DAP, where I see that the extension from SIA to more dataset types was rather well accepted.

And I don't remember any big discussion apart from that one. 

GDF : So I have a, like a data link intersection with ObsCore queries thing that, I don't know if there's any other business moment here, I could do it at the end. After this.

It's a concern that I have about how they work together. Now that that's something that occurred to me, I shared it with TAP already. It has to do with the format specifier.

FB : Yes. 

GDF: And I don't know if, would you like me to wait till the end or do you, or is this a good time before we switch to SODA?

FB : Do you think it could be long? 

GDF: I think I can say it quickly and then we can decide to discuss it. Say it and yes, I would like.

Yeah. I think it's short. Yeah.

So the format specifier of SIAv2 is, and I'm not confusing format and responseformat, right? Format meaning the format of the dataset.
FB :  Yes. 

GDF: Is explicitly defined as having a relationship with the access_format value in the hypothetical underlying ObsCore table.

Right. Okay. That's what it's, I mean, it says that in the standard.

Yeah. It says that you're looking at access_format to determine: is it FITS, is it Parquet, is it CSV, whatever, right? But if your service, like CADC's and Rubin's, uses DataLink, access_format is always DataLink. And so you've rendered the format control completely useless, you know? And the value of using DataLink is so enormously large compared to this problem that I can live with it.

I mean, I could not build the Rubin interfaces I want to build without data link. But it's really a shame to not be able to have the user be able to say, I only want fits or I only want... 
FB : So, Marco? 

MM: This is something which occurs to me, and maybe what I'm saying is a little bit silly. Should the database inform how to map to the #this?

GDF:  I know that's what I'm saying. Yeah. But that's not what the standard says. The standard is absolutely unambiguous.

That's something really... Yeah. The standard says it's against the access format in the table that comes back from the query. And the access format in the table that comes back from the query is data link, data link, data link, data link, data link.

It's useless. 

PD : I think we didn't expect DataLink to be so successful. 

BC : So you mean that there is nothing in the access URL, it's only data link? The idea... 

GDF : No, no, no.

Access URL is whatever the URL for the links service is. Yeah, the idea is... Format is just whatever the thing is. It's the application.

FB: The idea was... And I think Markus, who is in his bed now, I think, has in his service, in the GAVO service, really... For example, for images which are not too big. Yeah.

It's image/fits or application/fits or whatever. And for larger datasets, he has the DataLink answer because he cannot provide them directly. Yeah.

GDF : Because that's what he wants to do, but we don't want to do that. We want it to use data link for every single image. 

FB : Yeah, I know.

So in that case, we have to change ObsCore. 

GDF : Do we, or do we change SIAv2? Because it's totally useless, right? 

PD : It's just the meaning of the format parameter. Yeah.

GDF : I bet you no one is going to... This is part of the... 
FB : In SIAv2, this is part of the ObsCore standard. 

GDF: No, it's just the way it was written. It could be written... Was it you who said? No, it was you who said this.

You could just rewrite SIAv2 to say that if the access URL and the access format are DataLink, then the format parameter is to be evaluated against the content type of the thing that's the #this. It's very well defined. It's a completely unambiguous formal specification.
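
Editor's sketch of the rule GDF proposes here, combined with the "any one of them matches" reading he suggests later for rows with several #this links. The row layout is hypothetical: `this_content_types` stands for whatever implementation-specific knowledge the service has about the content types of the #this links behind a DataLink URL; it is not an ObsCore column.

```python
# Media type used in access_format when access_url points to a DataLink service.
DATALINK = "application/x-votable+xml;content=datalink"


def matches_format(row, wanted):
    """Evaluate a FORMAT constraint against one ObsCore-like row.

    Direct links are compared on access_format, as the standard says today.
    DataLink rows are compared on the content types of their #this links
    (hypothetical, service-internal information); one match is enough.
    """
    if row["access_format"] != DATALINK:
        return row["access_format"] == wanted
    return wanted in row["this_content_types"]


rows = [
    {"id": "direct-fits", "access_format": "application/fits",
     "this_content_types": []},
    {"id": "datalink-fits", "access_format": DATALINK,
     "this_content_types": ["application/fits"]},
    {"id": "datalink-parquet", "access_format": DATALINK,
     "this_content_types": ["application/x-parquet"]},
]

selected = [r["id"] for r in rows if matches_format(r, "application/fits")]
print(selected)
```

The response rows for DataLink-backed datasets would still show the DataLink media type in access_format, which is the oddity PD points out below.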

AM: One thing I don't understand. If a user sees and the machine sees this format, which is not a data link, but it's whatever it is, how will it know that the link is a data link or the link is directly a link to the... 

GDF: Because, okay, so we're leaving this difficult problem to the implementer. We might decide to add a hidden column to our ObsCore table, for instance, that says what the actual format is.

I prefer to add a second column. This is what I'm... No, no, no, but I don't want to tell people that they have to add a column to the ObsCore standard in order to do this. Yeah.

Right? I don't want to add anything to ObsCore in a formal... 

AM : That's kind of a confusion. It's a no, right? We're not downloading data? 

GDF : No, it's not. It's an implementation detail.

PD : It doesn't have to be in the table. It's odd if your query says, show me things with image/fits, and the response doesn't have image/fits in it. 

GDF : No, but if the response is just data link, data link, data link, we know what that means.

PD : I agree that it makes sense, but it's odd when the response doesn't have the thing you were searching for because of the indirection. 

GDF : Okay, I mean, we could... It's still more useful than it is now. We could add something, but I mean, then the problem is the people who don't do this.

I'm looking at MAST. They currently don't use it, or they're migrating to that point, I guess. They would be repeating it. You would have two columns that would say FITS, FITS, FITS, FITS, FITS, FITS, right? 

FB :  There is something I don't understand, if you do that, how can you distinguish that the access URL is going to data link instead of... 

JS : But data link is a data point. It's a quantification. 

GDF : Nobody would ask that. I'm not making any... Yeah.

JS : I think there are two different points. We are mixing two different things in one single field. One thing is what we are going to obtain at the location of the DataLink, and the other thing is whether the link that is inside ObsCore is a direct link or a DataLink.

GDF : No, I understand that. That is what the problem is. 

JS : It's like what is missing is a new field in ObsCore the data link is inside this particular record.

JS : It's a direct link or it's a data link. I'm just saying the format for the fits or whatever is going to be produced at that location.

GDF:  But that's not that compatible.

What I'm proposing is... 

JS :That is correct. 

BC : In EPNCore, we solved that so that the access URL should lead to the format that is in access format. So if it is FITS, then it says FITS.

And we have a data link URL that goes to the...

GDF :  Well, I understand. I mean, that might have been a good idea, but it's... It's not the way of solving what people do or have to do. It is correct.

GDF : It would be wrenchingly non-backwardly compatible if you changed what ObsCore... 

JS : It's a matter of... 

JS: So maybe sometime you have to modify the standard. I think this is...

GDF :  I'm only trying to change SIAv2. I'm not trying to change ObsCore.

And then there's something here. I just don't want to drive like that. But I find this true.

AM : If you solve the data link as a format and the same column, what is the main... The real format of the FITS.

GDF:  If that's not going to break all existing clients, that's maybe... What do you think about that, Pat?

PD :  I mean, I've thought about changing the access URL and the format there to... 

I just thought about changing the access URL to point on our package service and have the service descriptor to get to the data link service. Because currently they both go to data link.

GDF : So you would not be able to do that because you wouldn't be able to have different data link endpoints for different missions. 

PD : Yeah, that would be... That's harder to refer back from service descriptors. There's no... 

GDF : I mean, I don't think there's any way.

We need to double that direction. 

PD Yeah, possibly. Possibly.

GDF : Yeah.  I think that would be...

 PD : You couldn't do that. But if I did... Say I decided I'm going to output a tar file of all the files, then people don't want to search for tar either.

They want to know, are the things in the tar FITS files or HDF5 ones? 

GDF : Yeah, so you have the same problem. 

PD :So there can be other levels of indirection where there are formats that are intervening that aren't meaningful that nobody cares about. And that's the problem here.

That's really just... 

GDF : Well, maybe we need a data product format column. 

JS : I just think there's a prototype that's using the data link but shouldn't be forced to use the data link. 

Because in fact, both the data link and the link, there are different options.

And there are different formats. So there is not a format that we can put in the ObsCore table, because it is inconsistent. 

GDF: No, I don't understand.

We said specifically the format of this. 

JS : Yeah, but in the data link, there are different options inside the data link. We'll have to access the data table.

FB : This is not unique. So we have to... You mean if there are multiple...

JS :  It's not unique for all the different links. 

GDF: You mean if there's more than one this? 

FB: Oh yes, you can have... Yeah.

GDF : So in this case, it makes sense with the... Well, but then we could define it to mean that if you have multiple #this links and any one of them matches, then you get the whole thing, you get it. But like, I mean, I don't know.

I know Pat's use case for multiple #this links is that they're all associated with the same observation. Yeah, I think they're all parts of it.

PD :  Like someone decided to tile the thing into multiple files.

GDF : So I promised François that we would not take a lot of time. So we know what the problem is. And I think this is a very real problem.

For our services, it makes the format parameter useless.

PD :  Yeah, Markus kind of proposed that searching on that as a query parameter isn't practically very useful, because people will find out what the format is and decide later whether or not they want to download it.

So searching for FITS isn't sort of... It's not a scientifically motivated part of the query. So I thought Markus... 

GDF : I would not agree with that. Like, searching for Parquet is an extremely useful thing to do in the context of, like, Mario's thing.

We need to move this into the list and talk about it in a running meeting. 

FB : Yeah, so please write issues or emails. 

GDF : What is the preferred thing you say? So email to... 

GM : You can  create an  issue and send an email  to... 


GDF : I'll do that. Okay, all right. That's great. Thank you for the time. 

SODA
-----

FB : So the Soda discussion. 

Dropping MOC
---------------
So in the current artifact, there is a proposal for a MOC parameter.

I think during the running meeting, many people, including people who are here, said it was useless and too difficult to manage, and that we should drop it. So just... Well, the only con of removing it is that the parameter was supposed to provide cutouts by reusing the responses of other services, like we do for the other parameters. But I think people thinking about implementation said that this kind of matching could be done by the client instead of by the server.

And I see Marco was one of them. So should we drop it? Everybody thinks we should drop it? Okay, we'll drop. 

The pixel cutouts.
-----------------------

This is something which we also discussed at the very beginning, when we said, well, we will add that in the second version. The main point we were afraid of is the syntax. So currently, I've written something with the CFITSIO syntax, which is well known.

But Markus objected that it's difficult with that to provide ranges. So, if I understood correctly, because he didn't write anything down, what he proposed is to have as many pixel parameters as we have axes, and just write the intervals there.

This would allow us to have the values, the possible ranges, in the service descriptor. And also, because this use case came from enough colleagues, the possibility to skip, to take one pixel out of every N, or in this case out of every two, to reduce the size and get a quick view. So that's what I understood.

GDF : The two is a downsampling? Sorry? The two in the square brackets. 

FB : You pick one pixel out of every two. It's downsampling.

So this functionality has been implemented by Robert, who is online. The Galactea prototype used also for SKA. 
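
Editor's sketch of how a CFITSIO-style pixel section with a step, as discussed here, maps onto array slices. It covers only the `first:last[:step]` and `*[:step]` forms, with 1-based inclusive pixel numbers; reversed ranges and HDU selection, which CFITSIO also supports, are deliberately left out. The function name is hypothetical and this is not the syntax of any agreed SODA parameter.

```python
def parse_section(section, axis_lengths):
    """Turn a CFITSIO-style section such as "[1:100:2,*]" into Python slices."""
    body = section.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("section must be enclosed in square brackets")
    specs = [s.strip() for s in body[1:-1].split(",")]
    if len(specs) != len(axis_lengths):
        raise ValueError("one range per axis is required")
    slices = []
    for spec, length in zip(specs, axis_lengths):
        parts = spec.split(":")
        if parts[0] == "*":
            # "*" selects the whole axis; "*:N" keeps every Nth pixel of it.
            first, last = 1, length
            step = int(parts[1]) if len(parts) > 1 else 1
        else:
            if len(parts) < 2:
                raise ValueError(f"range {spec!r} needs first:last")
            first, last = int(parts[0]), int(parts[1])
            step = int(parts[2]) if len(parts) > 2 else 1
        if not (1 <= first <= last <= length) or step < 1:
            raise ValueError(f"invalid range {spec!r} for axis of length {length}")
        # FITS pixel numbers are 1-based and ranges include both ends.
        slices.append(slice(first - 1, last, step))
    return slices


row = list(range(1, 11))                    # pixel numbers 1..10 on one axis
(every_other,) = parse_section("[1:10:2]", [10])
print(row[every_other])                     # one pixel out of every two
print(parse_section("[3:6,*:2]", [10, 4]))
```

The slices are returned in the order the axes appear in the section (fastest-varying axis first, as in FITS); an array library that stores the slowest axis first would need them reversed.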

GDF : And these are FITS pixel numbers? They're one based? Sorry? Are these FITS pixel numbers that are one based? 

FB : Yeah. So are we able to? 

PD : CFITSIO syntax also lets you select HDUs, which you need to do in general for formulating some kinds of cutouts: if the piece that you want spans multiple HDUs, then you have to specify the HDU and the range of pixels in that HDU, and then a different HDU and the range of pixels in that one. So separating these into different parameters means that you need to specify the HDU, and you'd only be able to cut out of one HDU per request. So CFITSIO syntax does a lot more than that separate-parameter formulation can do.

FB : My counter-argument to Markus was not that one, which I think is a good point. It was that, in any case, because we also propose this metadata parameter, it would be possible to retrieve the FITS header, the WCS. And so we will have the information on the ranges of pixels anyway.

PD : Yeah. 
FB : So let's leave this open for the moment, but there is not much support for Markus's solution here. 

PD : For people with multi-extension FITS files, the separate pixel formulation is really substandard.

GDF : So you're suggesting that if someone used that CFITSIO syntax, what they would get is a cutout with the same number of extensions as the number of extensions that they touched, rather than some mosaic or something? 

PD : Yeah. Yeah.

FB : So another question came by Robert, who is still online. Bravo, Robert. Yes.

He asked me, what would happen if we have in the same query world coordinate parameters, POS or CIRCLE, BAND, and the pixels? So my guess is that it would be like when we have multiple parameters. The current specification says that, in that case, support is optional. So not all the services do that.

And if they do it, we have the question of how we manage these responses, which are basically several cutouts. And there is a GitHub issue on that. So pixels are, in some way, another representation of these filtering parameters.

So it's the same issue, actually. So enough for this one. James, sorry.

JD : I was going to say, well, currently, the CASDA service has channels as another way of pulling out data on the spectral axis. So if we've got a 3,000 channel cube, you might be after a slab of 100 channels. So one use case, if we go to the pixels parameter, is providing world coordinates for the target, and then providing pixels for the channels you're after.

PD : So then you're kind of coupling. But you want those to be coupled together, then. 

JD : It's an AND, yeah. They're not just two separate cutouts. Yep. You don't want one, sort of, one way and one the other.

Exactly, that's right. Yeah, that's what you don't want. There would be surprises in that case.

FB : Yeah. What I remember about this question of pixel cutouts came immediately when we discussed the first version of SODA. And the decision was to push that to the next one, just because we were afraid to have too much discussion on the syntax.

 Is it true? 
 
 PD : Yeah. 
 
GDF : I just want to say, for multiple projects, for Rubin and SpherEx, we do need a solution to this. And we'll just invent one if this doesn't come along.

So it would be great to have a standardized way to do it. 

JD : Is that pixels plus coordinates, or just the pixels themselves? Do you think that's the solution?

GDF :  Well, I'm assuming we already had coordinates. 

JD : Yeah, we already have.

GDF : Yes, yeah, I think I got it. Yeah, so. 

changing the Data Product type
----------------------------------

FB : The other discussion which came from the running meeting was the DP type parameter, which is also borrowed from SIA.

What does that mean in the SODA context? We already have services. But the spec is rather fuzzy about this. But I think the expectation is that if you are cutting out in an image, you get an image. If you are cutting out in a cube, you get a cube. But we have services which actually reduce from a cube to either an image or a spectrum, or a time series if it's a time cube. So the current proposal, the artifact, reads that the DP type parameter allows reducing the dimension by picking the corresponding coverage, so the spatial or the spectral coverage, and summing inside it.

And Markus said, well, I don't like this DP type. Just add a summation parameter or sum. I don't know exactly what he said.

And I don't remember why he didn't like the DP type. I don't know if somebody remembers that. That was in the running meeting.
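
The reduction described in the artifact can be illustrated on a toy cube. A sketch in plain Python, assuming the axis order (channel, y, x) and plain summation; `reduce_cube` and its `dptype` argument are illustrative names, not taken from the specification:

```python
def reduce_cube(cube, dptype):
    """Collapse a cube indexed as cube[channel][y][x] by plain summation.

    dptype="image"    sums over channels -> 2-D list [y][x]
    dptype="spectrum" sums over the sky  -> 1-D list [channel]
    dptype="cube"     returns the cube unchanged
    """
    if dptype == "cube":
        return cube
    if dptype == "image":
        ny, nx = len(cube[0]), len(cube[0][0])
        return [[sum(plane[y][x] for plane in cube) for x in range(nx)]
                for y in range(ny)]
    if dptype == "spectrum":
        return [sum(sum(row) for row in plane) for plane in cube]
    raise ValueError(f"unsupported dptype: {dptype}")

# Toy cube: 3 channels of 2x2 pixels.
cube = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
print(reduce_cube(cube, "image"))     # [[111, 222], [333, 444]]
print(reduce_cube(cube, "spectrum"))  # [10, 100, 1000]
```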

GDF : So the pixels parameter, if you could wildcard axes in it, I don't know the CFITSIO parameter by heart. 

PD : Yeah, you can. 

GDF : Then you could do this with pixels.

You would just say 1 to 1,000 colon 1,000. And right? In pixels. Yeah.

And you know the size of the image because you've got the image reference from ObsCore in the first place. So you know what the XEL or whatever is for that axis, the spectral axis or whatever. I mean, it's not very direct.

But it's a capability that's already there. As soon as you have that and you have pixels, you can do it. You don't need an extra parameter.

PD : I think that the third number in the CFITSIO syntax is striding, not to have to say. 

GDF : Oh, that's not what I understood him to say. Oh, no.

FB : Yeah, it's picking every nth pixel. 

GDF : Oh, I thought it was downsampling. It isn't.

Because you'd have to specify whether you're summing or averaging. I misunderstood your downsampling. That's what I meant to say is what's striding.
 OK, well, that actually changes my attitude towards the thing in the first place because I think downsampling is more useful than striding. Yeah, it's just that there are multiple mathematical operations in downsampling. So I don't know if we want to pick one because it's like picking wavelength in meters, right? If we pick one, no one will be happy.
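
The distinction drawn here can be made concrete. A small sketch, using plain Python lists and illustrative function names: striding keeps every n-th sample and discards the rest, while downsampling combines each block of samples, and the result depends on which operation is chosen:

```python
def stride(values, step):
    """CFITSIO-style increment: keep every `step`-th sample, discard the rest."""
    return values[::step]

def downsample(values, factor, how="sum"):
    """Combine each block of `factor` samples into one output sample."""
    blocks = [values[i:i + factor] for i in range(0, len(values), factor)]
    if how == "sum":
        return [sum(b) for b in blocks]
    if how == "mean":
        return [sum(b) / len(b) for b in blocks]
    raise ValueError(f"unknown operation: {how}")

spectrum = [1, 2, 3, 4, 5, 6]
print(stride(spectrum, 2))              # [1, 3, 5]
print(downsample(spectrum, 2, "sum"))   # [3, 7, 11]
print(downsample(spectrum, 2, "mean"))  # [1.5, 3.5, 5.5]
```

The three outputs differ, which is why a downsampling parameter would need to say which operation is meant.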

We have. So that's fair. 

FB : It's not in my slides, but there is a proposal for reducing or changing the resolution.

But I didn't know that it was controversial. I didn't add it to my slides.

GDF :  No, I mean, that issue is real.

PD : Absolutely. Yeah, I anticipate that there isn't one mathematical operation to downsample. So we need to pick the one they like.

FB : Yes. But it was not done. Yeah, this DP type is really to pick up a region and you build a spectrum by just adding this.

GDF : So we've got the same problem just with this, right?

PD : It's an abstraction. But saying I want an image out of that cube is an abstract way of saying downsample the third axis. Right.

But not saying how. And so people won't really go for that. 

GDF : But that same logic would apply to having it be a factor, right? Yeah.

So if you're allowing it here, I mean, to be consistent, it should be allowed there as well. Yeah.

FB :  So again, if you see some inconsistencies, please write


GDF? :The nice thing with DP type, so this is my point of view, what I want to use instead of summation or whatever, is that actually what we are doing when we, it was already the case. 

FB : Oh yes, it was already the case for SODA 1.0. You are actually forcing the ObsCore parameters of the output of the response. Because the result you get has its own, it's a new data set.

It has its own ObsCore description. And what we are actually doing is forcing that. So DP type is related to data product type.

And in that case, you are forcing the output. So it's also consistent with ObsCore. 

GDF : So like if I started with a cube and I put DP type spectrum, I would be saying I want a skewer through the cube that gives me a- Yeah.

JD: The problem with this is that if you look at, say, DP type in ObsCore, you've got image or cube for the quantum meters at this point in time. So there's others as well, or spectrum. But if you look at, just to take our service as an example, we then go to, for an image, we'll then go to data product subtype.

And we might have moment zero maps, moment one maps, moment two maps. We might have weight maps and those sorts of things. So there's a whole raft of things that have come out of this that we calculated from that cube, noise mapping.

And so the DP type is, let's say it's necessary, but not sufficient. It gets you part of the way, but doesn't get you all the way. What actual action you're trying to request.

GDF : I mean, I don't think there's anything wrong with our individual sites adding optional parameters to our services we have. This is just supposed to be a common core.

JD : Yeah, but the problem is that it doesn't specify what you want.

GDF : It says, I want to take a cube, I want an image out of that. But what image do you want? Well, no, I'm saying on your service, your service could have a data product subtype parameter. 

JD : Sure, absolutely.

But I guess I'm saying is that I don't see you can build a service which just takes I want an image from this cube and gives you back a result of the size of the image. 

FB : Currently, what I wrote was inspired by the fact that you have one example which you provide. I have seen your hand up, Jesus.

You have in your service, in your CASDA service, you have an example where you provide both the cutout which is a cube and the cutout which is a spectrum. Yes. And when I produced this text, I was thinking of your service.

JD : Yeah, I understood. 

FB : I see it exists. And what is written at the moment, it's just you sum up.

So it's the very basic thing. And for channels, summing up some channels, including everything, maybe it makes sense, I think.

JD : You could probably, I mean, you could define them, I guess.

JS :And then the question is that if there are plans of not doing looping, so the operations that are not cutout as such, that implies that a person that could even change the format of the output of the sample operation because it will not work. 

GDF : Yeah, some kind of operator parameter. 

JS : In principle, again , so that is not only for cutouts.

It's for anything, yeah, any transformations. In this case, so that is described even the signature of the cutout, how to do a cutout in the test. And maybe we need to be able to do that.

Basically, telling how to describe the API operation that you want to invoke. The input and output maybe could be the one missing the selected. And the output could be something that you can express.

But what you try to have is everything. So this is important because it's the starting of this kind of problem, this kind of path. So you can try to do a lot of cutouts from cube to cube and image to image or something.

And so whatever, but to be able to even to change the format of the operation implies a different format.

FB :  We have something for format, response format, yeah.

JS :  But even also the kind of thing of the API signature because you can discover at the cube and you can discover that it's an operation.

So the structure of this data cube, you can integrate like a soda thing. But you would need the user even to decide the parameter to be used to do the structuring. So there are things you can do.

FB : Yeah, but we have to see to compromise. 

JS : So cutout only is definitely too low.

FB : And then when we designed SODA 1.0, we already knew that it was also the same thing.

We had to provide the minimum as soon as possible. It was already difficult. So, but we expected to give more functionality.

So we don't have to try to standardize any kind of server-side operation you can do. I think it's too heavy to standardize everything. 

JS : No, but that was the way to describe.

So I think that idea is instead of keeping the document or the different operations, try to describe how to describe it. Yeah. Okay.

GDF : I had a similar use case to his. In the Rubin world, there are a lot of data products that we only keep around for 30 days after they have been taken and then they get deleted. And then we'll have services which we haven't built yet to recreate them on demand.

And so I need to implement these by about a year from now. And one example is to start with a calibrated single-epoch image and produce a difference image from it. And that's exactly that data product subtype use case because, in fact, that's a data product subtype in the Rubin world.

And so that might be a way to do it. Now, I had not actually thought of doing this with a parameter, I think. I was just going to have a service that just does that.

You know, it takes some delete parameters, but it's just, it's the only function of that endpoint is to compute difference images. But it's an interesting idea to do it that way. I think I'm a little bit on Francois' side though here.

I'm like, I'm worried about trying to make some kind of metalanguage that lets us describe like all those use cases that we've fantasized about for Soda because they really are very different from each other. I almost like to put the effort in DALI, maybe just to have a richer set of standardized parameters that we could draw from. When building unique services, you know, which typically like our cutout or our data product recreation service is going to be a UWS service.

And then it needs a bunch of parameters to tell it what to do. And the more I can pull those from the DALI vocabulary, the more likely it is that IVOA will be able to have an easy time invoking it. So that's kind of the direction that I would go to do these more complex things, rather than to just define them as capabilities of Soda.

FB : Yes, Marco.

MM : Yeah, just a quick clarification. So the issue is that on one side, the data set will never be deep, so you want to do some operation, server-side operations in Soda, and you'll have something else.

And so one parameter is not enough. Possibly we need a couple of them. Can we restrict the cases where the data set itself plus these two parameters can at least address some initial use cases? And the second parameter would be a pointer to a controlled list of operations that you want to do, that are described somewhere else.

But it's somewhere else that describes these operations, the same or greater than one. So the list of operations, just to start, that can take not too many parameters, something that will be inconsistent with everything that you are cutting, slicing, transporting. 

FB : I think I remember that before we separated SIA and Soda, there was a very old version of SIA2 written by Doug Tody, where there is some language like this.

MM : It was possibly a method, and then something that's related. But- 

FB : Do you remember? We may have a look. I don't know.

I think I can find it again. 

MM : Because what I'm worried about, maybe I didn't understand the point, is that if we rely on data product subtypes of some sort, that gets only to the point of the data provider and will never get this kind of information. But if the providers use the same operation, but then makes it for a list that is understood by everybody.

FB : But James, for your service, which produces a spectrum on a given spatial region, you are just summing? Sorry, we're just- Are you just summing all the spectra in the array?

JD : I think so. I've got a vague recollection from where I've been that there was a beam aspect to it as well. But it's been a while.

So I'll have to look. And in all fairness, I will say I don't think that service is very heavily used. Because the method in which you get a spectrum out is very, very important for science.

And unless that's documented properly, and I'm quite happy to say I haven't done that, then people are not going to trust it. So, yeah.

BM(?) :  Could I add, so the ESO HDRL library, the high level data reduction library, has a lot of methods for taking cubes and turning them into images.

There's different ways of collapsing them with different statistics and things. So, unless you want to generate all those different options, it can get pretty complicated. 

JD: What was the library?

BM(?) :  HDRL. High level data reduction library. 

GDF : So like, sum, mean, truncated, median, all the obvious things. But yeah, it's clicking in there or... Yeah, same, yeah.
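
To illustrate why the choice of statistic matters when collapsing an axis, here is a minimal sketch using only the Python standard library. The operator names in `COLLAPSE_OPS` are illustrative; no controlled list of operations has been agreed on, and this is not the HDRL API:

```python
import statistics

# Illustrative operator names; no controlled vocabulary exists yet.
COLLAPSE_OPS = {
    "sum": sum,
    "mean": statistics.mean,
    "median": statistics.median,
}

def collapse_pixel(samples, op):
    """Collapse the values of one pixel along the reduced axis."""
    if op not in COLLAPSE_OPS:
        raise ValueError(f"operation not in the controlled list: {op}")
    return COLLAPSE_OPS[op](samples)

# One sky pixel seen in five channels, the last one being an outlier.
samples = [1, 2, 3, 4, 100]
for name in COLLAPSE_OPS:
    print(name, collapse_pixel(samples, name))
```

For this pixel the three operators give 110, 22 and 3, so a client cannot interpret the collapsed image without knowing which one the service applied.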

MM : I don't know what you're thinking. I mean, we don't need to go to the world. We can start with what we need.

GDF : I mean, one possibility is to say what we're trying to provide here is a, I don't know, it's up to the data provider to use their best judgment for what is best for their data in responding to this and to document it very well, as you said, right? So that when you go to that service, there's a link to documentation or something so that you know what you're going to get. But it's your discretion. And one of the reasons I like the double indirection of DataLink is that you can have a different service for every mission, right? You can have the WISE cutout service and the UVEX cutout service and the SpherEx cutout service.

They all do different things, but they respond to the same parameters. That might be something where we would then ask, is that, leave it up to the discretion of the data publisher. Does the community care enough for us to want to put in the work to make that happen? Would they trust that we've chosen good defaults? I guess then you could say, well, your mission-specific service can publish in ways we've already defined in IVOA what its other options are.

They're just, they're not interoperable options, but at least they're discoverable. 

JD : I think the comment that was made about the five-year hours is an important one. That's one of those things that might combine, it might be used by users from a large number of different data providers.

If we, to be successful, we'd have to be able to make things easier for five years. That's the example.

controlling the RESPONSEFORMAT
---------------------------------

FB : Okay, so this one, RESPONSEFORMAT, is strongly coming from CDS and also from the context of the SRC network.

So the response format is what you should be able to choose. So basic things: instead of application/fits, you could have PNG or JPEG. So I think this is understandable.

Another use case, which we already prototyped at CDS, sits on top of the so-called HiPStoFITS service. HiPStoFITS is not a standard API, but we could build a SODA interface on top of it. So you discover in some way HiPSes, which are specific spatial data sets in some way.

And from that, you produce an image, a FITS image. But the reverse can also be true. That is that you discover some cube and you want to produce on the fly a HiPS from it.

This second service we already have; Thomas Boch implemented it. And we also think it would be interesting to have a SODA interface on top of this. This means that in that case, it first produces the basic, first channel of the cube at a low resolution.

And then you can download and on the fly get whatever channel and whatever resolution you want at some place in the cube. And if you want to manage that with a SODA interface, you do that through the response format. And what you get is just the URL to the head directory of your produced HiPS.

BM : We've done some internal dynamic generation of HiPS at Data Central. And it's really fantastic for web services.

So in terms of the response, it might be useful to also add, for example, the order of the HiPS that is there, whether it's FITS or PNG. There are a few extra metadata that would be useful to return as well as the directory. So that, for example, if you're sending that response that you get to Aladin Lite or something to generate the HiPS, instead of having to parse the properties file, you want to just use those main metadata to make it just as a convenience thing.

 But it's a fantastic idea and I really support it. 
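
BM's suggestion amounts to returning a few core values next to the directory URL so that a client does not have to read the properties file itself. A minimal sketch, assuming the `hips_order` and `hips_tile_format` keywords of the HiPS properties file; the shape of the returned dictionary, the URL and the sample values are invented for illustration:

```python
def parse_hips_properties(text):
    """Parse the 'key = value' lines of a HiPS properties file."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def summarize_hips(base_url, properties_text):
    """Build a hypothetical response carrying the URL plus core metadata."""
    props = parse_hips_properties(properties_text)
    return {
        "url": base_url,
        "order": int(props["hips_order"]),
        "tile_formats": props["hips_tile_format"].split(),
    }

example = """
# invented sample values
hips_order       = 5
hips_tile_format = png fits
hips_frame       = equatorial
"""
print(summarize_hips("https://example.org/hips/cube123/", example))
```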

FB : Well, what is really fantastic is to have such on-the-fly HiPS things. Here, the proposal is having this in the context of SODA by just finding out the appropriate parameters to generate that or to run that behind.

GDF : I'm just asking, thinking out loud, what do you get from standardizing this that you don't get from just returning a self-describing service descriptor in a links table that describes this service and gives all of its parameters? Because that's how you would discover it, anyway, probably, I think. 

FB : You will discover, well, you discover a huge cube and-

GDF :  Right, and then the data publisher- 

FB : For example, with  ObsCore. Yeah.

And one of the services you provide is this on-the-fly generation. 

GDF : Right, but it's absolutely useful, but you can already provide a service descriptor that's self-describing with a lot of information about all the parameters that it has and all sorts of options, the things you were talking about, you know, mean, median, sum, all those things that might be relevant to this. You could document those in the service descriptor, things that we would never standardize in SODA because they're just way too down in the weeds.

So why- I'm not being negative. I want to understand what's the extra value of it actually being standardized? Because Firefly already would give you a nice little form to fill out with all those parameters. It makes menus out of the options and everything just straight off of the service descriptor.

So, and PyVO does similar things. 

FB : Well, in the document, we already have all the parameters for the spatial resolution ,for the projection, but in that case, the projection should be only HPX, of course. So I thought it was possible to standardize this.

GDF : No, we could. I'm just wondering, like, what do we get from standardizing? I mean, I need this for SpherEx too. We're going to build something very similar for SpherEx.

So I'm happy to participate in the standards process, 

FB : but- I mean, I may be short  in the answer. Except that, well, that was in the context of the SRC network where we saw that SODA is the generic term to provide this kind of operations. And it seemed not too difficult to do, but maybe I missed something.

GDF : I'm not against it. I'm just, like, I'm wondering if, like, we have to then have a discussion about how we meet your needs and SpherEx's. And by the time we actually come up with the standard, which we can do.

FB : Of course, we can already do it with a simple service descriptor. That's true. The same for HiPStoFITS. 

So, I don't know for this one.

JD:  I mean, HipSToFits is going to respond to Gregory's comment. 

The head directory actually includes all the files, and it includes the properties file.

BM : Yes, that's what I'm saying. So it would be included by default if you didn't have that. Yeah, but the thing is, if you have some web service, you don't necessarily want to parse the properties file to work out those very core properties.

Like what the index level is, what format it is. Because to load that up in Aladin Lite, you need that. Oh, you need to supply those.

Yeah, gotcha. That's why. So they're not necessarily optional metadata.

FB : Yeah, but at the moment, well, it's not with the SODA interface. But Aladin Lite, in the context always of the SRC network, already accesses such an on-the-fly service on top of cubes. You know, there is a small subset of test data sets coming from the Pathfinders.

We play with that all the time. So the same data sets where Robert is playing with the new SODA, we are also playing with on-the-fly HIPS and things like that. That's really cool.

INAF and CDS are in the same prototyping team. So we play together. And Jesus looks at us.

If we are doing stupid things or not. 

JD : Just don't even go to that second slide. Yeah, there is.

FB : I think I have two slides. So this one is the coordinate system change, also introduced by our INAF colleague. Robert is still there.

RB : Yeah, sure. 

changing coordinate system
----------------------------
FB : So, it's not in the current document because this came rather recently. It is a proposal for having POS-sys and BAND-sys parameters to change the coordinate system of the spatial parameters and the spectral quantity. Currently it works with ICRS and Galactic.

There is also a grid, but I must confess I didn't understand what it was doing. And bands, but Marco or Robert can explain it to us. And there is also BAND-sys, which is very important, I think.

Alberto is still there? No. Alberto Micol? Because he had strong demands for ESO to have a capability to extract to make cutouts on velocity cubes where the spectral axis is in velocity. So, with that functionality developed by Robert, you can really do it.

So, we have to think if we have to standardize. So, the first thing, the spatial thing, is that you should be able to provide the controls for your cutout in Galactic coordinates, for example. And the second thing is you should be able to give velocity limits for providing your cutouts.

Oh, it's to change how POS is interpreted, not to change the WCS to be a galactic coordinates. Yeah, the input parameters. So, I must confess I am a little bit reluctant to have these new parameters and try to see if we could have the same functionality with the current parameters.

So, for the position, we already have a string type parameter which is the POS. Because POS, well, we have several parameters for the spatial constraints. So, either we have purely numerical parameters such as circle, polygon, etc.

But we also kept the POS one. So, maybe refining POS a little bit and allowing the coordinate system inside it is enough. You see, it's already a string.

Yes, James? 

JD : There's a footnote in SODA 1 which says about POS. The future version of the specification may allow the use of other reference systems, specifically making systems of data. 

PD : Now, in that context, sorry, I wrote that.

JD : No, it's okay. It's a chronic chapter for us. Yeah, we've just used that as a, I don't know whether cover is the best word, but as a means of providing in Galactic coordinates for cubes which are in Galactic coordinates themselves.

And so, allowing cutouts of those cubes. 

PD : So, I think what I had in mind there is that, so the files themselves have their kind of native coordinate system. And if you could tell the user what the native coordinate system is, then you really just need one flag that says, I want to work in the native coordinates of the thing.

And that potentially reinterprets any of the cutout parameters to be in the native coordinates of the data. So, you have to be able to tell them what those are, including the units and everything, whatever's in the FITS WCS. So, if it's a velocity cube, then you don't have to say it.

GM : This is something that really reminds me of what happens in ADQL when duplicating. And we decided that, well, it also reminds me of the STC-S syntax. Yeah, yeah, I kind of got it.

PD : Really want to bring it back. Yeah, exactly. So, but there, when you're querying a TAP service, the metadata tells you what the quantities are.

And you're working in the native quantities. And that's sort of what this would do is let you work in the native quantities instead of the interoperable ones, but not all of the other arbitrary ones. 

GDF : So, we would define, like, for instance, that POS would leave the constraints on the numerical values of the POS parameters when the native switch is in place.

Yeah. Because a lot of the coordinates have a different convention for the range. Right.

PD : So, yeah, by default, it's ICRS and wavelength and MJD. Yeah. But if you put native equals true, then it's whatever the data is.

You have to be able to find out what that is. 

GM : I would tend to go for the solution of having POS and POS-sys separate, as it actually is in TAP. As you said, it runs in the data.

This column is expressing these questions. So, that's all right. You say.

FB : So, if Marco has an answer.

MM :  I think I inferred that they did. But as far as you're able to pay, well, then it's good.

Because in this case, the difference between or other discovery services and that's a different thing. While using POS-systems, it's not actually robust, it only has a different name, but that's the change. You have to, you know, have a way to convey to the client what kind of coordinate systems you are supporting, at least.

And it lets the user ask maybe something that creates an issue on the outside, transforming, whatever. So one of the things that a lot of people ask is that you might have an equatorial north pole where you don't have a simple bounding box. 

GDF : I guess I'm worried, like, if you think about P3T and trying to write rigorous specifications, if you turn the native flag on, you're basically saying all bets are off with what the valid value of POS is.

So I would be happier if we put an opaque native coordinates cut parameter, and you can put any string you want there. And if you do that, it's interpreted in whatever the native system is. But you don't reuse POS because you've destroyed the meaning of POS.

If someone has some bizarre coordinate system, how do you know that circle is even meaningful? Or range, right? It's not even defined. If it's in velocity, it doesn't have a finite range. 

MM : You don't understand it right. 

You want to keep it as it is, what we have, but introduce an extra parameter that is sort of freeform, and it's the native. Well, it's not freeform. It is freeform because what you've already done when you say native true is you've just turned POS into a free format.

You've already gone, like, I'm sorry, I don't know what to tell you. 

PD : You've changed the limits, for sure. 

GDF: There are all kinds of other coordinate systems. 

There are some that are unbounded. Like velocity. It's not bounded.

Sure. It just seems like we've already introduced a parameter native. You just say native equals and then whatever this thing is.

The same string that you want the user to be able to put in POS, if you say native equals that string, it does that. And then POS still has a rigorous definition.
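
GDF's alternative can be sketched as follows: POS keeps its strict validation, and a separate, opaque parameter carries the native-system constraint. The parameter name `NATIVE`, the sample native string and the function names are hypothetical, and only the CIRCLE form of POS is handled to keep the example short:

```python
def parse_pos_circle(pos):
    """Validate 'CIRCLE lon lat radius' under the default ICRS rules."""
    parts = pos.split()
    if len(parts) != 4 or parts[0] != "CIRCLE":
        raise ValueError("expected 'CIRCLE <lon> <lat> <radius>'")
    lon, lat, radius = (float(p) for p in parts[1:])
    if not 0.0 <= lon <= 360.0:
        raise ValueError("longitude must be within [0, 360] degrees")
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be within [-90, 90] degrees")
    if radius < 0.0:
        raise ValueError("radius must not be negative")
    return lon, lat, radius

def parse_request(params):
    """Keep POS strict; pass a hypothetical NATIVE string through untouched."""
    result = {}
    if "POS" in params:
        result["pos"] = parse_pos_circle(params["POS"])
    if "NATIVE" in params:
        # Opaque by design: only the service knows the native system.
        result["native"] = params["NATIVE"]
    return result

print(parse_request({"POS": "CIRCLE 10 20 0.5"}))
print(parse_request({"NATIVE": "velocity -50 50"}))  # invented native syntax
```

With this split, a generic validator can still reject an out-of-range POS, while native constraints are left to the service that defines them.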

FB:   For velocity, I didn't make the point here, but I think I said already to Robert that we don't know which is the reference frequency. 

And I think the answer of Robert is it's in the FITS header, but it's not enough for me. We should, if we do this way, we should have this somewhere. 
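
FB's concern can be shown numerically: the same velocity interval maps to very different wavelength intervals depending on the rest frequency. A minimal sketch, assuming the radio velocity convention, f = f0 (1 - v/c), and using the HI 21 cm and CO(1-0) rest frequencies as examples; the function names are illustrative:

```python
C_M_S = 299_792_458.0  # speed of light in m/s

def radio_velocity_to_wavelength(v_km_s, rest_freq_hz):
    """Radio convention: f = f0 * (1 - v/c); returns the wavelength in metres."""
    freq_hz = rest_freq_hz * (1.0 - (v_km_s * 1000.0) / C_M_S)
    return C_M_S / freq_hz

def velocity_range_to_band(v_lo_km_s, v_hi_km_s, rest_freq_hz):
    """Return (min, max) wavelength in metres for a velocity interval."""
    w1 = radio_velocity_to_wavelength(v_lo_km_s, rest_freq_hz)
    w2 = radio_velocity_to_wavelength(v_hi_km_s, rest_freq_hz)
    return min(w1, w2), max(w1, w2)

HI_REST_HZ = 1420.40575177e6   # HI 21 cm line
CO10_REST_HZ = 115.2712018e9   # CO(1-0) line

print(velocity_range_to_band(-100.0, 100.0, HI_REST_HZ))
print(velocity_range_to_band(-100.0, 100.0, CO10_REST_HZ))
```

Without the rest frequency (and the velocity convention) being stated somewhere the client can read, a velocity limit cannot be turned into an unambiguous spectral interval.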

Extracting WCS, metadata
------------------------

PD : Yeah, I think that has to be in like, do you have a slide about extracting WCS? 

FB : I have not a lot about this.

Because there was no discussion recently about this, I proposed a metadata parameter, which does basically what you already have in your service. I remember you have had this for several years. So that is for the FITS header, but I imagine also that, for metadata, you could come from another starting point than SIA or ObsTAP.

And in that case, you would be happy to get the ObsCore description of the dataset. Imagine you find your soda in a link in a publication or whatever, could be useful to get ObsCore, or CAOM. So these are metadata. 

The data model for the data, this is something else. I don't know if we should force that for the full dataset if there is a data model for the representation of the data.

PD :  This I'm less certain about.

I mean, FITS headers or HDF5 metadata extraction could be sort of a low-level function that could be implemented. 

FB : The FITS headers definitely are something we should have. Well, if we are starting from other things than ObsCore, I think ObsCore is also useful.

Of course, if you are coming from ObsTAP or SIA, it's not useful. You already have it. So that was my last slide.

We have a recording, so I will try to make... I don't think anybody took notes.

GM : We tried to take some notes. I think it was hard.

FB : So I will try to make something with the recording too.