AM : Alberto Micol
BC : Baptiste Cecconi
FB : François Bonnarel
GDF : Gregory Dubois-Felsmann
GM : Gregory Mantelet
JD : James Dempsey
JS : Jesus Salgado
MM : Marco Molinaro
PD : Pat Dowler

Introduction
------------

FB : This is supposed to be a follow-up to the running meeting we had recently, where I presented what was new recently in DAP and SODA. Today I sent you the links to the previous material, and they are also on the page. It presents the current status, not of the merged PRs, but of the GitHub "artifacts". So if you want, you can look at the proposals. There is also the link to the RM notes, because some of the points were already discussed. What I want to do now is different: tackle the controversial points, whether they come from the running meeting, came up in the meantime, or came up before. There is a DataLink implementation note published on GitHub just after the release of DataLink 1.1 in December; I don't think many people noticed it, but we can spend a few minutes discussing whether we can publish it. It's just a note.

- For SIA and DAP, I don't have a lot. I think for DAP there is only one point, which is the one-step access to a SODA URL, where there are at least two possible solutions. So that's what I want to explain.
- And then for SODA we have more.
  * We have the MOC parameter proposal, which has been asked to be removed.
  * The pixel parameter format, which is discussed.
  * The DP type parameter to reduce the dimensionality is also contested in some way.
  * The RESPONSEFORMAT from and to HiPS, for which there is no real controversy, but I think we have to clarify a few things.
  * There is a new proposal, with implementations, for alternative coordinate systems from our colleague Robert Butora, who is online.
  * And then there was a question about SODA and the data model.

DataLink implementation note
-----------------------------

FB : So the implementation note is something which was actually extracted from a previous version of DataLink because it was non-normative, and it's actually doing two things. It explains how we can discover DataLink URLs outside the context of ObsCore, basically. And the second thing is giving some suggestions on how a data provider should fill the DataLink table fields. So I don't know if you had a look at it. The only thing I want to know is if you think it's useful, and if we can publish it as an IVOA Note. For example, Mark Taylor said that the second part, where I try to explain how you could fill the DataLink fields, is redundant with the spec. This was also motivated by a discussion I had in Tucson with Gregory and Trey. So I don't know. Maybe you should just have a look. Or if somebody has comments right now, please speak now. If not, just have a look and tell me.

DAP discussion
--------------

FB : So now I can go to DAP. I think I have only one point for that, or two points. Gregory made the comment yesterday that in the current proposed PR we have some things which are errata that have already been accepted. So we should have a base document with the accepted errata before going to that. So that's okay. I will do this ASAP.

GDF : Just so that the DAP changes are easier to identify.

FB : Then the only question which I think is controversial is how to access the SODA cutout in one shot. Cutouts or transformations of the datasets, or both, existed in the old SIA-1. Of course, with the development of SODA, DataLink and SIA or ObsTAP, we don't have this anymore in the current version. So I think it's a need which has been expressed several times. But how can we do that? There is one simple solution which benefits from the fact that the input parameters of SIA and SODA are basically very similar. Very few differences, except that SODA presently has fewer parameters than SIA.
So if you discover some datasets with the SIA parameters, you can immediately copy that to your SODA query. And so you will get first the discovery of the relevant datasets. But at the same time, you can force a cutout in the same dataset you just discovered. And there are two ways to do that.

- The first way is, well, I just copy and paste the parameters of the SIA query into a SODA query URL, and this I put in the access_url FIELD. I also wrote up the second solution; as you can see in the current proposed PR, there are really two alternative solutions.
- This second solution is what I understood from Pat, who said, well, we have the service descriptor. So what we can do is add the service descriptor in the ObsCore response, but fill all the parameters with the input values from the SIA query. And if you get this, from your client interface you click, and basically you have the same result.
- So those are the two solutions. Did I understand well what you wanted, Pat?

PD : Yeah, roughly. I mean, the advantage of the service descriptor is that the query can be for some region, and then you get responses, and then the cutouts are obviously going to be in that region, but there might be multiple of them. It might be smaller. So providing the service descriptor, that lets the client do one query, and then multiple different cutout requests, depending on what they're trying to do. And of course, this could be done now, and you could pre-fill the values, or you could put it in the options, kind of the way it's documented in DataLink or SODA with n-axes, something like that. So I kind of prefer the service descriptor because it gives the client the ability to do those things directly from the query response, but do multiple things, not just the hard-coded one. If you hard-code it in the URL, then that should be opaque to the client. They shouldn't really try to go in and mess with that.

FB : So Gregory and then James.
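As an illustration of the first solution only, here is a minimal Python sketch of copying the SIA query constraints into a SODA URL that a service could place in access_url. It is not taken from the PR. The endpoint `https://example.org/soda/sync`, the dataset identifier, the function name `soda_url_from_sia` and the choice of `POS`, `BAND`, `TIME` and `POL` as the shared parameters are assumptions made for the example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Parameters assumed (for this sketch) to have the same meaning in SIA and SODA.
SHARED_PARAMS = ("POS", "BAND", "TIME", "POL")


def soda_url_from_sia(soda_base, dataset_id, sia_params):
    """Build a one-shot cutout URL by copying shared SIA constraints to SODA.

    soda_base  -- hypothetical SODA sync endpoint
    dataset_id -- identifier of the discovered dataset, sent as the SODA ID
    sia_params -- dict of the parameters the user sent in the SIA query
    """
    query = [("ID", dataset_id)]
    for name in SHARED_PARAMS:
        if name in sia_params:
            query.append((name, sia_params[name]))
    # Parameters that only make sense for discovery (e.g. MAXREC) are dropped.
    return soda_base + "?" + urlencode(query)


sia_params = {"POS": "CIRCLE 10.0 20.0 0.1", "BAND": "5e-7 6e-7", "MAXREC": "100"}
url = soda_url_from_sia(
    "https://example.org/soda/sync", "ivo://example.org/data?obs1", sia_params
)
# Decode the URL again to check which parameters were carried over.
roundtrip = parse_qs(urlparse(url).query)
print(url)
```

In the second solution the same values would instead be pre-filled into the input parameters of a service descriptor, leaving the client free to change them before calling SODA.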

GDF : So I have sort of an insight to share from some struggles we've had with this in thinking about how Firefly should work, in the IRSA interfaces and in the Rubin interfaces, where we've done different things for different reasons. When you do a cutout, so in the IRSA interface, in some cases, there's a place where you put in your search location, and then there's a radius of searching, and then there's a cutout size, and they're distinct things. And because of the way that SIA works, where it's intersects, that's ultimately what it's doing, if you want to be sure that your target is in the image that you're going to get back, you're pushed towards having a small radius of the search circle in the parameter, but that may not be what you want for the size of the cutout. You might want a big cutout. So I'm a little crazy about this mechanism in the first place, which is why I liked that in the draft of DAP, you added this parameter that would control whether it would do this or not, essentially, right? I forget, what did you call that parameter? I can't remember the name of it.

FB : Retrievemode, I think.

GDF : Yeah, something like that, right? Came from SIAv1, I think, as a concept. So if you use that parameter, and you say, no, I mean it, I want you to make cutouts of the same radius as the thing that I gave you, then the DataLink point I wanted to raise is, should the #this semantics be applied to the cutout at that point? Because that meant that they were asking for a cutout, and the way they get the cutout is with this. This is a thing that happens in Firefly right now in the Rubin interface. You get back a DataLink table that tells you where the whole image is, and it also tells you where the cutout service is. I'm going to provide a service descriptor, and Firefly goes, oh, well, there's the #this, so I'm going to go retrieve the #this. And so there's the whole 50 megapixel image there in your session when what you wanted was a cutout, right?
And so we need to think about what's a good way to tell client software what it should do, so that the client doesn't have to do some kind of magic, do-what-I-mean intuition thing to figure out whether it should load the actual image or the cutout. Now, Firefly puts a limit, like if XSS size is bigger than something, then it won't try. But a lot of things don't have XSS size, so that's the thing that I thought about. And this may not be a very good, like, I don't know, meeting room discussion, but I was going to write something up about that for you.

FB : Okay, and James?

JD : So I have a lot of concerns with the second solution. The reason being that we already have a way to do whatever we want to produce an arbitrary cutout, and that is SIA, DataLink, SODA. And we've got clients which work for that, and that is well covered. But what we don't have is a migration path from SIA 1. And if we want this to be an SIA 1 migration path, the clients that are currently using SIA 1 are expecting two GET commands: a GET command to do the search, and a GET command to get the cutout. And I absolutely agree that in that retrieve mode where this is the cutout, what you should be doing is providing a means to get to that particular cutout. And I mean, the one parameter that you then add in on top of SIA 1 is whether you want a PNG or a FITS file, in the usual case. No arguments, there are other situations. So I actually don't see, I mean I can write a client for the second thing with no problems, that's fine, but I just don't see it fulfilling the goal of having a replacement for that SIA1 one-shot cutout thing. I really argue we should keep it really simple, keep it, and you know if someone wants to modify, I've absolutely seen a situation where you have a different search versus cutout radius, but if someone wants to do that they can just modify the URL they're working with.

PD : So when you say modify the URL you're talking about that access URL equals that.

JD : Yeah, yeah, and type the radius.

PD : But do they know that's SODA or is it an opaque URL?

JD : Well, yeah,

PD : because at that point it can be just an opaque URL. That's right, yeah, absolutely.

GDF : I think it's important that it can be an opaque URL.

PD : Yeah, yeah, and so modifying is not, but I understand the, like, the direct thing, so that means that when you query, you have in mind that this is exactly what you want to do, so there has to be some sort of a mechanism to control what kind of access URL you get: whether you get a download URL, a cutout URL, or a DataLink URL.

JD : Yes, yeah.

PD : So what would you do, what do you do in the scenario where the row in ObsCore, normally let's say in our service where you would take the ID, call DataLink, and you get #this and an #auxiliary, say it's some sort of weight map, and ideally I feel like the correct thing for the user might be, they would do a cutout in the science data, and they should chop out the equivalent piece of the weight map. And so to interpret that other thing, maybe, maybe not, like I don't know what they're trying to do, which is kind of my problem with pre-coding.

GDF : That's exactly what the SPHEREx cutout service has to do, exactly what you said.

PD : Yeah, and so if you have that URL, how do I convert those two things?

JD : Yeah, I think if we're talking about it's a simple to request URL, you're just getting back a FITS image or a PNG, and that's what the current SIA1 does, and if that's the part we're trying to cover, then I think that's probably what we should try to restrict ourselves to. But I will note that, you know, on a similar note, the CASDA cutout service takes the FITS image, but it also appends the BEAMS table, for instance, for that kind of image, so you can interpret what the BEAM is for the different slices, so you can get an MDF and put it in there, or just a, yeah, just a multi-HDU FITS. Yeah, yeah.

FB : Well, you are right.
The motivation to propose the first solution was, I heard several times, and I feel myself, that something from SIA1 was missing. Yeah. But first of all, this behavior, as Gregory pointed out, will not be the only behavior, it's just one choice, like before in SIA1 where you had, what was it called, mosaic, cutout or archive modes, so you have to choose which one you want. And second, for what Pat is looking for, I think if we have the retrievemode and get this kind of URL, this doesn't prevent or forbid us from adding the service descriptor in the same response; it is allowed. And in that case, if your main point is that people have to be able to choose their parameters, it's totally possible. So in the classical way, you have either a direct link to the full dataset and the service descriptor can be there in the first response, or it can be in the DataLink response; both are possible. And with that solution, the service descriptor with three parameters could still be there, so people could still click on the access URL and get back the cutout, or go to the service descriptor.

JD : And I have no problems with that, I think as long as we're just satisfying the path.

FB : And of course, the URL above will be there only at the user's choice, if you choose the retrievemode like this.

JD : If I've asked for, give me a simple cutout, then there is no problem with the simple access URL, which gives them that. I think we've satisfied the requirement.

FB : Yes, Gregory?

GDF : I mean, if we're talking about trying to recover SIAv1 behavior, I think there are a lot of use cases in SIAv1 where I would have asked for a cutout to be made on the fly in this one-shot thing, where I would have used a value of intersects that's different from the overlap one, which is the only thing that you can do in SIAv2.
Because if your goal is, I want every image in which I can find a hundred arcsecond radius thing around my favorite extended source, that's not what you're going to get from SIAv2. You're going to get things where it's just barely at the edge, right? Because it always does overlap. In SIAv1, you would have said I want the disk to be contained within the image so that the full area of the cutout is filled from that image. That would be a very common thing to do. And I wonder if we need to take a step back and say, I wasn't here when you decided not to carry that parameter forward to SIAv2. So I don't know what the argument against carrying it forward was.

PD : So you're saying have a way for the user to specify contains instead of overlap?

GDF : Yeah, yeah. Because then that makes this much more useful. They work together. They're very compatible.

FB : I think our main motivation to go to SIAv2 was, first, to have something closer to ObsCore.

GDF : Oh no, that of course.

FB : And second, we wanted to tackle not only spatial parameters, but also band and time.

GDF : No, I get what the positive reasons were, but I don't know why you didn't reach out.

FB : I remember fairly well, I was with Pat in all that. The counterpart was that we had to do something rather primitive, basic, in order to go fast. So that's the main reason why we scoped out a lot of things which were in SIAv1, because it would be too difficult in the context of all these dimensions. Because SIAv1 was doing this only on the spatial axis, but we wanted to add the time axis, the polarization.

PD : I think the other difference is that at the point where we created this, we had ObsTAP, which means that there was a way to do that thing, which was to write a TAP query. And the more we add to this, it's just filling the space between SIA and TAP.

GDF : Well, okay, so there are a couple things there. So that depends on the data publisher actually having an ObsTAP service.

PD : True, yeah.

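To make the difference between "overlap" and "contains" concrete, here is a small Python sketch that writes the two spatial predicates as ADQL against an ObsCore table, in the spirit of the TAP query mentioned above. It is only an illustration: the function names and the `mode` values are hypothetical and are not a proposed parameter, and it assumes the service exposes `ivoa.obscore` with an `s_region` column and accepts ADQL 2.0 geometry functions.

```python
def spatial_predicate(ra, dec, radius, mode="overlaps"):
    """Return an ADQL condition relating s_region to a search circle (degrees).

    mode="overlaps": the image footprint merely touches the circle
                     (the only behaviour available in SIAv2).
    mode="contains": the whole circle lies inside the image footprint,
                     so a cutout of that size is completely filled.
    """
    circle = f"CIRCLE('ICRS', {ra}, {dec}, {radius})"
    if mode == "overlaps":
        return f"INTERSECTS(s_region, {circle}) = 1"
    if mode == "contains":
        # ADQL CONTAINS(a, b) is true when a is entirely inside b.
        return f"CONTAINS({circle}, s_region) = 1"
    raise ValueError(f"unknown mode: {mode}")


def obscore_query(ra, dec, radius, mode="overlaps"):
    """Build a full ObsCore image query using the chosen spatial predicate."""
    return (
        "SELECT obs_publisher_did, access_url, access_format "
        "FROM ivoa.obscore "
        "WHERE dataproduct_type = 'image' AND "
        + spatial_predicate(ra, dec, radius, mode)
    )


overlaps_adql = obscore_query(10.0, 20.0, 0.03, mode="overlaps")
contains_adql = obscore_query(10.0, 20.0, 0.03, mode="contains")
print(contains_adql)
```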
GDF : Which is connected to a mental model that says, oh, there must be a database behind there that looks like ObsTAP. But in fact, Rubin's in the middle of implementing SIAv2 over the Butler. And the Butler doesn't have these tables. So it's making that stuff up on the fly, but it can still do the queries. And so the Butler is perfectly capable of doing the math to do contains and overlaps and intersects and all those things.

PD : I mean, it's true that that's one of the reasons to have, yeah.

GDF : And you, CADC, wouldn't be harmed by this because the way you implement SIAv2 is actually by proxying it over TAP anyway. And so you can just write the ADQL that corresponds to those relationships. So I kind of want to at least throw this out there, that maybe if we're going to do this, it might be good to do it together with the intersects mode selector. Because I think they work really, really well together. Should I write a proposal?

FB : Yeah. So is it enough for today for this one? I didn't have anything else about DAP; as I see it, the generalisation from SIA to more dataset types was broadly accepted. And I don't remember any big discussion apart from that one.

GDF : So I have a, like a DataLink intersection with ObsCore queries thing that, I don't know if there's any other business moment here, I could do it at the end. After this. It's a concern that I have about how they work together. It's something that occurred to me, I shared it with TAP already. It has to do with the format specifier.

FB : Yes.

GDF : And I don't know if, would you like me to wait till the end, or is this a good time before we switch to SODA?

FB : Do you think it could be long?

GDF : I think I can say it quickly and then we can decide to discuss it. Say it and yes, I would like. Yeah. I think it's short. Yeah. So the format specifier of SIAv2 is, and I'm not confusing format and responseformat, right? Format meaning the format of the dataset.

FB : Yes.

GDF : Is explicitly defined as having a relationship with the access format value in the hypothetical underlying ObsCore table. Right. Okay. That's what it's, I mean, it says that in the standard. Yeah. It says that you're, you know, you're looking at access format to determine is it FITS, is it Parquet, is it CSV, whatever, right? But if your service, like CADC's and Rubin's, uses DataLink, access format is always DataLink. And so you've rendered the format control completely useless, you know? And the value of using DataLink is so enormously large compared to this problem that, like, I can live with it. I mean, I could not build the Rubin interfaces I want to build without DataLink. But it's really a shame to not be able to have the user say, I only want FITS or I only want...

FB : So, Marco?

MM : This is something which occurs to me, and maybe what I'm saying is a little bit silly. Should the database inform how to map to the #this?

GDF : I know, that's what I'm saying. Yeah. But that's not what the standard says. The standard is absolutely unambiguous. That's something really... Yeah. The standard says it's against the access format in the table that comes back from the query. And the access format in the table that comes back from the query is DataLink, DataLink, DataLink, DataLink, DataLink. It's useless.

PD : I think we didn't expect DataLink to be so successful.

BC : So you mean that there is nothing in the access URL, it's only DataLink? The idea...

GDF : No, no, no. Access URL is whatever the URL for the links service is. Yeah, the idea is... Format is just whatever the thing is. It's the application.

FB : The idea was... And I think Markus, who is in his bed now, I think... In his service, in the GAVO service, really... For example, for images which are not too big. Yeah. It's the image, application/fits or whatever.
And for larger datasets, he has the DataLink answer because he cannot provide them directly. Yeah.

GDF : Because that's what he wants to do, but we don't want to do that. We want to use DataLink for every single image.

FB : Yeah, I know. So in that case, we have to change ObsCore.

GDF : Do we, or do we change SIAv2? Because it's totally useless, right?

PD : It's just the meaning of the format parameter. Yeah.

GDF : I bet you no one is going to... This is part of the...

FB : In SIAv2, this is part of the ObsCore standard.

GDF : No, it's just the way it was written. It could be written... Was it you who said? No, it was you who said this. You could just rewrite SIAv2 to say, if the access URL and the access format are DataLink, then the format parameter is to be evaluated against the content type of the thing that's the #this. It's very well defined. It's a completely unambiguous formal specification.

AM : One thing I don't understand. If a user sees, and the machine sees, this format, which is not DataLink, but it's whatever it is, how will it know whether the link is a DataLink or the link is directly a link to the...

GDF : Because, okay, so we're leaving this difficult problem to the implementer. We might decide to add a hidden column to our ObsCore table, for instance, that says what the actual format is. I prefer to add a second column. This is what I'm... No, no, no, but I don't want to tell people that they have to add a column to the ObsCore standard in order to do this. Yeah. Right? I don't want to add anything to ObsCore in a formal...

AM : That's kind of a confusion. It's a no, right? We're not downloading data?

GDF : No, it's not. It's an implementation detail.

PD : It doesn't have to be in the table. It's odd if you query and your query says, show me things with image/fits, and the response doesn't have image/fits in it.

GDF : No, but if the response is just DataLink, DataLink, DataLink, we know what that means.

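The rule GDF describes can be sketched in a few lines of Python. This is only an illustration of the idea, not text from any standard: the function names are hypothetical, the links are plain dictionaries standing in for rows of a DataLink response, and treating the record as a match when any `#this` link matches is an assumption of this sketch.

```python
# Media type that ObsCore services use in access_format to signal a DataLink response.
DATALINK_MIME = "application/x-votable+xml;content=datalink"


def effective_formats(access_format, links):
    """Return the formats a FORMAT constraint would be compared against.

    If access_format is not DataLink, it is used directly (current behaviour).
    If it is DataLink, the content_type of every '#this' link is used instead.
    """
    if access_format.replace(" ", "").lower() != DATALINK_MIME:
        return [access_format]
    return [link["content_type"] for link in links if link["semantics"] == "#this"]


def matches_format(wanted, access_format, links):
    """True if the requested format matches any effective format (assumed rule)."""
    return any(
        fmt.lower() == wanted.lower()
        for fmt in effective_formats(access_format, links)
    )


links = [
    {"semantics": "#this", "content_type": "image/fits"},
    {"semantics": "#preview", "content_type": "image/png"},
]
print(matches_format("image/fits", DATALINK_MIME, links))
```

How a service obtains the `#this` content type without calling its own links service for every row (for example, a hidden column) is left to the implementer, as in the discussion.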
PD : I agree that it makes sense, but it's odd when the response doesn't have the thing you were searching for because of the indirection.

GDF : Okay, I mean, we could... It's still more useful than it is now. We could add something, but I mean, then the problem is the people who don't do this. I'm looking at MAST. They currently don't use it, or they're migrating to the point, I guess. They would be repeating it. You have two columns that would say FITS, FITS, FITS, FITS, FITS, FITS, right?

FB : There is something I don't understand: if you do that, how can you distinguish that the access URL is going to DataLink instead of...

JS : But data link is a data point. It's a quantification.

GDF : Nobody would ask that. I'm not making any... Yeah.

JS : I think there are two different points. We are mixing two different things in one single field. One thing is what we are going to obtain at the location of the DataLink, and the other thing is whether the link that is inside ObsCore is a direct link or a DataLink.

GDF : No, I understand that. That is what the problem is.

JS : It's like what is missing is a new field in ObsCore saying whether a DataLink is inside this particular record.

JS : It's a direct link or it's a DataLink. I'm just saying the format for the FITS or whatever is going to be produced at that location.

GDF : But that's not that compatible. What I'm proposing is...

JS : That is correct.

BC : In EpnCore, we solved that so that the access URL should lead to the format that is in access format. So if it is FITS, then it says FITS. And we have a DataLink URL that goes to the...

GDF : Well, I understand. I mean, that might have been a good idea, but it's... It's not the way of solving what people do or have to do. It is correct.

GDF : It would be wrenchingly non-backward-compatible if you changed what ObsCore...

JS : It's a matter of...

JS : So maybe sometimes you have to modify the standard. I think this is...

GDF : I'm only trying to change SIAv2. I'm not trying to change ObsCore.
And then there's something here. I just don't want to drive like that. But I find this true.

AM : If you solve the DataLink as a format and the same column, what is the main... The real format of the FITS.

GDF : If that's not going to break all existing clients, that's maybe... What do you think about that, Pat?

PD : I mean, I've thought about changing the access URL and the format there to... I just thought about changing the access URL to point to our package service and have the service descriptor to get to the DataLink service. Because currently they both go to DataLink.

GDF : So you would not be able to do that because you wouldn't be able to have different DataLink endpoints for different missions.

PD : Yeah, that would be... That's harder to refer back from service descriptors. There's no...

GDF : I mean, I don't think there's any way. We need to double that direction.

PD : Yeah, possibly. Possibly.

GDF : Yeah. I think that would be...

PD : You couldn't do that. But if I did... Say I decided I'm going to output a tar file of all the files, then people don't want to search for tar either. They want to know, is the thing in the tar FITS files or HDF5 ones?

GDF : Yeah, so you have the same problem.

PD : So there can be other levels of indirection where there are formats that are intervening that aren't meaningful, that nobody cares about. And that's the problem here. That's really just...

GDF : Well, maybe we need a data product format column.

JS : I just think there's a prototype that's using DataLink but shouldn't be forced to use DataLink. Because in fact, both the DataLink and the link, there are different options. And there are different formats. So there is not a format that we can put in the ObsCore table, because it is inconsistent.

GDF : No, I don't understand. We said specifically the format of #this.

JS : Yeah, but in the DataLink, there are different options inside the DataLink. We'll have to access the data table.

FB : This is not unique.
So we have to... You mean if there are multiple...

JS : It's not unique for all the different links.

GDF : You mean if there's more than one #this?

FB : Oh yes, you can have... Yeah.

GDF : So in this case, it makes sense with the... Well, but then we could define it to mean, if you have multiple #this links and any one of them matches, then you get the whole thing, you get it. It's not... It shows this. But like, I mean, I don't know. I know Pat's use case for multiple #this links is that they're all associated with the same observation. Yeah, I think they're all parts of it.

PD : Like someone decided to tile the thing into multiple files.

GDF : So I promised François that we would not take a lot of time. So we know what the problem is. And I think this is a very real problem. It makes it for our services. It makes the format parameter useless.

PD : Yeah, Markus kind of proposed that searching on that as a query parameter isn't practically very useful, because people will find out what the format is and decide whether or not they want to download it later. So searching for FITS isn't sort of... It's not a scientifically motivated part of the query. So I thought Markus...

GDF : I would not agree with that. Like, searching for Parquet is an extremely useful thing to do in the context of, like, Mario's thing. We need to move this onto the list and talk about it in a running meeting.

FB : Yeah, so please write issues or emails.

GDF : What is the preferred thing, you say? So email to...

GM : You can create an issue and send an email to...

GDF : I'll do that. Okay, all right. That's great. Thank you for the time.

SODA
-----