VOEvent Transport Protocol Agreement
Version 1.1DRAFT

IVOA Technical Note 2006-04-12

Rob Seaman, National Optical Astronomy Observatory, USA
Alasdair Allan, University of Exeter, UK
Scott Barthelmy, NASA Goddard Spaceflight Center, USA
Andrew Drake, California Institute of Technology, USA
Matthew Graham, California Institute of Technology, USA
Robert White, Los Alamos National Laboratory, USA
Phillip Warner, National Optical Astronomy Observatory, USA

Rob Seaman, seaman@noao.edu


Blah, blah, blah...

1. Introduction

Blah, blah, blah...

2. VOEvent Sky Transient Alerts

The transmission of a VOEvent packet announces that an astronomical "event" has occurred, or provides information contingent to a previous VOEvent. As discussed in the Sky Event Reporting Metadata (VOEvent) specification, the packet may include information regarding the "who, what, where, when, how & why" of the event. This technical note discusses usage conventions intended to ensure interoperability and optimize the balance between robust packet transport and simplicity of implementation.

2.1 Publishing VOEvent Packets

It is expected that VOEvent packets will be used as the basis of an information infrastructure of VOEvent "Brokers" (Publishers, Relays, Repositories and other VOEvent aware components) that will hold VOEvent packets keyed by their identifiers. These Brokers may harvest packets from each other, so that a packet may be held in more than one Repository. In addition to the harvesting protocol, there will be three ways for clients to interact with the database servers:

Transport mechanisms may include options as diverse as E-mail and cellphone, as well as "push" or "pull" web services. The IVOA Events Working Group is responsible for suggesting best practices and ensuring interoperability. Broadly speaking, the creator of a packet will submit it with an empty identifier to a publisher, who will check syntax and respond with the same packet, but with the identifier filled in.

The subscription mechanism is expected to be the chief way in which users will be informed of new events. A subscription to an event service is a filter on the stream of events that an event registry processes: whenever certain criteria are met for an incoming event, the subscriber is notified by a transport mechanism that the subscriber has chosen. The filter may involve the curation part of the event (e.g., "all events published by the Swift spacecraft"), the location ("anything in M31"), or the detailed metadata of the event itself ("whenever the cosmic ray energy is greater than 3 TeV").
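A subscription filter of this kind can be thought of as a predicate applied to each incoming event. The following is only a minimal sketch: the `Event` fields, the parameter name `energy_TeV` and the publisher names are hypothetical simplifications for illustration, not part of the VOEvent schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    # Hypothetical, heavily simplified view of a parsed VOEvent packet.
    publisher: str
    params: Dict[str, float] = field(default_factory=dict)


Filter = Callable[[Event], bool]


def by_publisher(name: str) -> Filter:
    """Filter on the curation part of the event."""
    return lambda ev: ev.publisher == name


def param_greater_than(key: str, threshold: float) -> Filter:
    """Filter on detailed metadata; events lacking the parameter never match."""
    return lambda ev: ev.params.get(key, float("-inf")) > threshold


def matching(events: List[Event], flt: Filter) -> List[Event]:
    """Return the events for which a subscriber would be notified."""
    return [ev for ev in events if flt(ev)]


stream = [
    Event("swift", {"energy_TeV": 0.5}),
    Event("cosmic.example", {"energy_TeV": 4.2}),
]
swift_only = matching(stream, by_publisher("swift"))
high_energy = matching(stream, param_greater_than("energy_TeV", 3.0))
```

A positional filter ("anything in M31") would follow the same pattern, given some agreed representation of sky coordinates.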

The discovery of a new celestial phenomenon may be Nobel-prize material, and it is hoped that a VOEvent packet will be the chosen medium for its announcement. The astronomical community generally prefers open systems — VOEvent packets do not convey intellectual property (IP) restrictions on the data they contain. Organizations can work within a closed system of clients and servers if privacy is required. This solution is simpler and more effective than demanding that all servers understand a schema for IP restriction.

2.2 VO Identifiers

VOEvent benefits from the IVOA identifier syntax developed for the VO registry. These identifiers are required to begin with "ivo://", and are meant to stand in for a particular metadata packet, obtainable from a VO registry. A registered VOEvent packet is one that has a valid identifier — meaning that a registry exists that can resolve that identifier to the full VOEvent packet. The identifiers thus provide a citation mechanism — a way to express that one VOEvent packet is a follow-up in some fashion of a previous packet.

VO identifiers may also be used for efficiency. One section of the VOEvent schema [5] is about curation (who is responsible for this candidate discovery), and that section may be replaced by a VO identifier which points to the relevant organization. If a group creates similar VOEvent packets regularly, it would be preferable to use the VO identifier in each packet rather than sending the whole list of people and contacts each time.

For these reasons, VOEvent packets will often contain VO identifiers, as defined and discussed in [16]. These take the general form "ivo://authorityID/resourceKey", and are references to metadata packets that may be found at a VO registry or VOEvent database.

The lookup procedure is similar to looking up a URL on the world wide web: each registry controls a number of authorityIDs. These are like domain names on the net: each is resolved to exactly one endpoint machine through a system of distributed knowledge. Once that machine is discovered, it should be able to resolve the secondary part of the identifier, the resourceKey. Indeed, the machine that holds an authorityID has made a promise to continue to resolve all the resourceKeys that it has issued. The corresponding organization has the responsibility for ensuring that VOEvents once issued remain available indefinitely via their VO identifiers.
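As an illustration of the two-part structure described above, the sketch below splits an identifier of the general form "ivo://authorityID/resourceKey" into its parts. The function name `split_ivorn` is hypothetical, and the sketch does not attempt the full syntax checks of the IVOA Identifiers standard.

```python
from typing import Tuple


def split_ivorn(ivorn: str) -> Tuple[str, str]:
    """Split 'ivo://authorityID/resourceKey' into (authorityID, resourceKey)."""
    prefix = "ivo://"
    if not ivorn.startswith(prefix):
        raise ValueError("VO identifiers must begin with 'ivo://'")
    # Everything up to the first '/' is the authorityID; the rest is the key.
    authority, _, resource_key = ivorn[len(prefix):].partition("/")
    if not authority:
        raise ValueError("missing authorityID")
    return authority, resource_key


authority, key = split_ivorn("ivo://uk.org.estar/estar.broker#ack")
```

A resolver would first locate the registry responsible for `authority` and then ask it to resolve `key`.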

An International Virtual Observatory Resource Name (or IVORN) is used to identify each VOEvent packet. All such IVORNs must be of the form:


For example, for eSTAR this would be:

    AUTHORITY_ID = uk.org.estar

With events (re)published from several sources:

    PUBLISHER = estar.ex
    PUBLISHER = gcn.gsfc
    PUBLISHER = pl.edu.ogle

Note the somewhat unfortunate fact that the PUBLISHER section of the IVORN corresponds to the Author role, while the AUTHORITY_ID corresponds to the Publisher role. Both AUTHORITY_ID (Publisher) and PUBLISHER (Author) are entities that may be entered into a VO Registry.

2.3 Authentication and Authorization

VOEvents provide a mechanism for alerting members of the astronomical community to time-critical celestial phenomena. As a result of such an alert, significant hardware, software and personnel assets of the community may be retargeted to investigate those phenomena. The scientific and financial costs of such retargeting may be large, but the potential scientific gains are larger. The success of VOEvent — and of observations of astronomical transients in general — depends on minimizing both intentional and unintentional "noise" associated with this communications channel. All of the familiar internet security worries apply to VOEvents.

Authorization is a question of controlling who may receive the information contained in an event stream or in a specific VOEvent. This is perceived to be purely a function of the transport layer through which VOEvents are delivered from publisher to subscriber. As such, VOEvent access authorization is outside the scope of this specification. On the other hand, the authentication of messages is a matter of extending trust outward from the publisher of a packet to any client who later receives that packet. How may a subscriber be confident that the apparent origin of a packet was the actual origin of the packet? How may a subscriber be confident that the packet was not modified in transit?

Where authorization is concerned with limiting user access, authentication is concerned precisely with enabling trust in unfettered access. It is likely that some future version of VOEvent will benefit from a general purpose VO authentication standard. VOEvent packets will often be distributed through unofficial as well as official channels — for example, one astronomer may forward a VOEvent of interest to another via E-mail. This is not only behavior that cannot be avoided, it is behavior that should be encouraged via solid support for authentication.

Some predictions are clear regarding future VO security standards and practices — others are hazy at best. The VO is likely to adopt widely recognized network standards such as SSL [32] and S-HTTP [31] to secure transport channels. SAML [30] may be used to distribute security assertions based on X.509 [34] certificates. The precise semantics are unclear, however, for providing support for these standards within VO documents. It is non-trivial (or impossible) to directly embed a digital signature within a document since the signature changes the document (see [27]). The identifier of a document might be used to retrieve a security certificate from a remote database via a chain of registries; alternately, a document might contain an explicit pointer to such a certificate, e.g., to a PGP [29] digital signature. In the case of VOEvent, diverse possibilities suggest themselves. References to external security certificates could be provided via explicit <Reference> elements, via "followup" <Citations>, via some extension to the <Who> curation schema, via a VOEvent id database or registry query, or perhaps via an entirely new <Authentication> sub-element that responds to a broader VO standard. It is premature for the current version of the VOEvent specification to mandate future usage in this area.

3. Architectural Roles

Figure 1: The VOEvent network comprises components filling a half dozen simple roles.

One may distinguish between the roles of Publisher and Subscriber and the publish and subscribe methods that might be implemented for various VOEvent component classes in an object oriented programming paradigm. One might, for instance, encapsulate the roles of Publisher, Subscriber, Relay and Repository as classes. Each object instantiated as one of those classes would then offer either or both of the publish and subscribe methods as appropriate. For instance, a Publisher implements only the publish method, while a Relay implements both methods.
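The class-based view of the roles might be sketched as follows. This is an in-memory illustration only: the `attach` helper and the use of plain strings as packets are assumptions made for the example, and no network transport is involved.

```python
from typing import Callable, List

Packet = str
Handler = Callable[[Packet], None]


class Publisher:
    """Offers only the publish method."""

    def __init__(self) -> None:
        self._sinks: List[Handler] = []

    def attach(self, sink: Handler) -> None:
        # Hypothetical helper: register a callable to receive each packet.
        self._sinks.append(sink)

    def publish(self, packet: Packet) -> None:
        for sink in self._sinks:
            sink(packet)


class Subscriber:
    """Offers only the subscribe method."""

    def __init__(self) -> None:
        self.received: List[Packet] = []

    def subscribe(self, source: Publisher) -> None:
        source.attach(self.received.append)


class Relay(Publisher):
    """Offers both methods: whatever it receives, it publishes onward."""

    def subscribe(self, source: Publisher) -> None:
        source.attach(self.publish)


origin = Publisher()
relay = Relay()
end_user = Subscriber()
relay.subscribe(origin)
end_user.subscribe(relay)
origin.publish("<VOEvent role='test'/>")
```

After the final call, `end_user.received` holds the packet that travelled from the Publisher through the Relay.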

4. Transport Protocols

Nothing has yet been said about the underlying transport protocol(s). It is likely that different projects will continue to experiment with new protocols for quite some time. Interoperability requires at least two parties to agree on a protocol, of course, and it is anticipated that a small number of options will grab the lion's share of the VOEvent "market". To bootstrap the system, two reference transport protocols have been designated, one option for Push architectures and one for Pull messaging architectures.

4.1 Reference Push Protocol

A raw socket connection is established via a specified port. Negotiation for a port address is deemed too machine dependent (for instance, when passing through a firewall). After establishing the connection, each packet will be emitted by the publisher, preceded by a 4-byte, network-ordered signed integer giving the size of the packet. An action item is to decide between ASCII and UTF-8 encoding. Once established, the socket remains connected indefinitely. Responsibility for timeouts related to dropped communications links is reserved to each end of each channel.

This protocol is certainly not sufficient for all purposes, but it is simple enough to require minimal effort for any project to implement, which will aid the goal of growing the network until it reaches critical mass. Since we're about as far from the holy grail of guaranteed delivery of messages as you can get, some additional features are desirable to add robustness to the packet streams.

Software acting as a Publisher (or Relay) should offer a TCP server. Subscribers (or Relays) should subscribe to a Publisher's feed by opening a continuous connection as a client to the server's socket. This connection is bi-directional. This means that many clients can connect to one server (Publisher) without the Publisher having to keep track of who is connected. This avoids the trap of central control and means the network can form on an ad-hoc basis. If the Publisher wishes to limit distribution, it can do so by whitelisting IP addresses at a local firewall (e.g., RAPTOR currently implements this method to limit connections to its Publisher).

Before writing a VOEvent message to its socket, a Publisher will write a 4-byte network-ordered integer (corresponding to a network-ordered long on 32-bit platforms, although not on 64-bit platforms) which contains the length in bytes of the VOEvent message that follows. The Publisher then writes the VOEvent message as ASCII text (at least in the current implementations).
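The length-prefixed framing described above can be sketched with the standard `socket` and `struct` modules. This is an illustrative sketch rather than a reference implementation: the function names are hypothetical, the signed format `"!i"` is chosen to match the "signed int" wording earlier in this section, and a local socket pair stands in for a real Publisher/Subscriber connection.

```python
import socket
import struct


def send_packet(sock: socket.socket, message: str, encoding: str = "ascii") -> None:
    """Write a 4-byte network-ordered length, then the message bytes."""
    payload = message.encode(encoding)
    # "!i" means network (big-endian) byte order, 4-byte signed integer.
    sock.sendall(struct.pack("!i", len(payload)) + payload)


def _recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes; recv() may return fewer per call."""
    data = bytearray()
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("socket closed before full packet arrived")
        data.extend(chunk)
    return bytes(data)


def recv_packet(sock: socket.socket, encoding: str = "ascii") -> str:
    """Read one length-prefixed message from the socket."""
    (length,) = struct.unpack("!i", _recv_exact(sock, 4))
    if length < 0:
        raise ValueError("negative packet length")
    return _recv_exact(sock, length).decode(encoding)


# Local demonstration using a connected pair of sockets.
publisher_end, subscriber_end = socket.socketpair()
send_packet(publisher_end, '<VOEvent role="test"/>')
echoed = recv_packet(subscriber_end)
publisher_end.close()
subscriber_end.close()
```

Reading in a loop until the full count arrives matters in practice, since a single `recv` call is not guaranteed to return a whole packet.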

It should be noted that the standard will not change if the VOEvent message is written as UTF-8, as the "length of message" network-ordered integer will simply be a larger number to accommodate the larger number of bytes taken by the message. Although not currently standard, this may imply that a tag denoting the encoding of the message (UTF-8 or ASCII) should perhaps head up all messages passed over the wire. This is not currently the case with any of the current implementations.
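The point about UTF-8 can be made concrete: the length prefix counts bytes, not characters, so it must be computed after encoding. In the small sketch below the message content is invented for illustration.

```python
import struct

# One non-ASCII character (U+00EF) that occupies two bytes in UTF-8.
message = '<Param name="note" value="na\u00efve"/>'

utf8_payload = message.encode("utf-8")

# The prefix carries the byte count of the encoded payload.
header = struct.pack("!i", len(utf8_payload))

char_count = len(message)
byte_count = len(utf8_payload)
```

Here `byte_count` exceeds `char_count` by one, and it is `byte_count` that goes on the wire.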

As outlined below, a VOEvent message of role="ack" is written by the client to the same socket connection it received the original message on. This is also headed up with a 4-byte network-ordered integer giving the length of the ACK message.

Similarly, role="iamalive" messages are written to the server socket by the Publisher at periodic intervals chosen by the Publisher. These are handled in the same manner as all other VOEvent messages. However, the client (Subscriber) replies to these messages with messages of role="iamalive" rather than with an ACK message.
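The reply rule in the preceding two paragraphs reduces to a choice based on the role attribute of the incoming message. A minimal sketch, assuming the role is carried as an attribute on the root element as in the examples in this note (the function name `reply_role` is hypothetical):

```python
import xml.etree.ElementTree as ET


def reply_role(packet_xml: str) -> str:
    """Return the role a Subscriber's reply should carry for this packet."""
    root = ET.fromstring(packet_xml)
    # iamalive messages are answered in kind; everything else gets an ack.
    return "iamalive" if root.get("role") == "iamalive" else "ack"


heartbeat = '<VOEvent role="iamalive" id="ivo://example.org/pub#1" version="1.1"/>'
alert = '<VOEvent role="test" id="ivo://example.org/pub#2" version="1.1"/>'
heartbeat_reply = reply_role(heartbeat)
alert_reply = reply_role(alert)
```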

4.1.1 ACK Packets

Upon successful receipt of a packet, the receiving component will return a role="ack" packet to the sender. At its simplest, this will be an empty VOEvent element:

    <VOEvent id=same_as_original_packet role="ack" version=... />
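One way such an empty acknowledgement might be generated from a received packet is sketched below. It copies the id and version attributes from the original and sets the role; the function name `build_ack` is hypothetical, and XML namespace handling is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_ack(original_xml: str) -> str:
    """Build an empty role="ack" VOEvent element echoing the original id."""
    original = ET.fromstring(original_xml)
    ack = ET.Element(
        "VOEvent",
        {
            "id": original.get("id", ""),
            "role": "ack",
            "version": original.get("version", ""),
        },
    )
    return ET.tostring(ack, encoding="unicode")


ack_xml = build_ack(
    '<VOEvent id="ivo://example.org/stream#1" role="test" version="1.1"/>'
)
```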

4.1.2 I_AM_ALIVE Packets

On an arbitrary schedule, each broker emits role="iamalive" packets to other brokers (meaning other VOEvent participants, perhaps including individual subscribers). Subsequent discussions have clarified the format of these packets to include a timestamp and explicit publisher ID:

    <?xml version='1.0' encoding='UTF-8'?>
    <VOEvent role="iamalive" id="ivo://talons.lanl/001" version="1.1">

Upon receipt, the subscribing entities will respond with the same packet, but with an updated <PublisherID>. The timestamp will allow calculation of roundtrip network latencies.

We are continuing to discuss possibilities for ensuring multihop, end-to-end route validation. This might include a time-to-live field similar to IP packet routing. These mechanisms will likely continue to be layered upon any future messaging protocols supported by VOEvent, at least optionally. Overtly these apply only to push protocols, of course, but the goal of ensuring a reliable packet distribution mechanism is critical whether push or pull is used.

The simplest implementation should have the Subscriber echo the Publisher's I_AM_ALIVE message, although more informatively they could add additional meta-data such as the time of receipt and other relevant information.
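The roundtrip latency calculation mentioned above might look like the following sketch. It assumes the timestamp in the packet is an ISO 8601 string with an explicit UTC offset; the exact timestamp format is not fixed by this note, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def roundtrip_seconds(sent_timestamp: str, received_at: datetime) -> float:
    """Seconds between the timestamp in an iamalive packet and its echo's arrival."""
    sent = datetime.fromisoformat(sent_timestamp)
    return (received_at - sent).total_seconds()


# Timestamp the Publisher placed in its iamalive packet (assumed format).
sent = "2006-03-22T23:40:42+00:00"
# Moment the echoed packet arrived back at the Publisher.
echo_arrival = datetime(2006, 3, 22, 23, 40, 44, 500000, tzinfo=timezone.utc)

latency = roundtrip_seconds(sent, echo_arrival)
```

Because both times are taken on the Publisher's own clock, the result does not depend on clock synchronization between the two ends.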

4.2 Reference Pull Protocol

In short: RSS 2 with enclosures. We'll undoubtedly have more to say about this choice after we gain more experience. The current implementations offer VOEvent messages via the RSS2.0 standard, making use of the <enclosure> tag (for those RSS clients which know what to do with an <enclosure> tag) in addition to the usual item tags. For more on the <enclosure> tag see http://blogs.law.harvard.edu/tech/rss and for "use cases" see http://www.thetwowayweb.com/payloadsforrss

The RSS file is made available via HTTP; limitation of distribution can be done by any of the methods available to a standard HTTP server. The RSS feed should be served as MIME type application/rdf+xml by the HTTP server. Note the use of the application/xml+voevent MIME type for the <enclosure> tag. This is somewhat controversial and may require some discussion.
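On the client side, a pull Subscriber would fetch the feed and pick out the enclosures carrying VOEvent packets. The sketch below parses an in-memory feed rather than fetching one over HTTP; the feed content, the example.org URLs and the function name are invented for illustration, while the `url`, `length` and `type` attributes are those defined for enclosures by RSS 2.0.

```python
import xml.etree.ElementTree as ET
from typing import List

VOEVENT_MIME = "application/xml+voevent"

# Hypothetical feed used only for this demonstration.
FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Event Feed</title>
    <item>
      <pubDate>Wed, 22 Mar 2006 23:40:42 GMT</pubDate>
      <enclosure url="http://example.org/voevent/1.xml" length="1351"
                 type="application/xml+voevent" />
    </item>
    <item>
      <pubDate>Wed, 22 Mar 2006 23:30:42 GMT</pubDate>
      <enclosure url="http://example.org/audio.mp3" length="10"
                 type="audio/mpeg" />
    </item>
  </channel>
</rss>"""


def voevent_enclosure_urls(feed_xml: str) -> List[str]:
    """Return the URLs of all enclosures whose type marks them as VOEvent."""
    # Parse from bytes so the encoding declaration in the feed is honoured.
    root = ET.fromstring(feed_xml.encode("utf-8"))
    return [
        enclosure.get("url", "")
        for enclosure in root.iter("enclosure")
        if enclosure.get("type") == VOEVENT_MIME
    ]


urls = voevent_enclosure_urls(FEED)
```

Each returned URL would then be fetched with an ordinary HTTP GET to obtain the packet itself.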

5. Examples

These examples should be taken as suggestions for review by each project for their own purposes. Interoperability with VOEvent Brokers operated by other community members should be carefully verified.

Figure 2: The network of VOEvent compliant Brokers passes sky transient alerts from originating sites to Subscribers.

5.1 Example ACK Packet

An ack packet might also contain "normal" VOEvent content:

    <VOEvent role="ack" version="1.1" id="ivo://uk.org.estar/estar.broker#ack"
	    <Param value="stored" name="{LOCAL_FILE||DB_HANDLE||REST_ENDPOINT}" />

5.2 Example I_AM_ALIVE Packet

    <VOEvent role="iamalive" id="ivo://uk.org.estar/estar.broker#"
      version="1.1" xmlns="http://www.ivoa.net/xml/VOEvent/v1.1">

5.3 RSS 2.0 Feed

An (extract from a) RSS2.0 compliant feed is shown below:

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0"

	<title>eSTAR Event Feed</title>
	    This is an RSS2.0 feed from eSTAR of VOEvent notices
	    brokered through the eSTAR agent network.
	<pubDate>Wed, 22 Mar 2006 23:40:44 GMT</pubDate>
	<lastBuildDate>Wed, 22 Mar 2006 23:40:44


		Received packet (via eSTAR) at
		Packet role was 'test'
	    <pubDate>Wed, 22 Mar 2006 23:40:42 GMT</pubDate>
	    <enclosure length="1351"
	    type="application/xml+voevent" />

		Received packet (via eSTAR) at
		Packet role was 'test'
	    <pubDate>Wed, 22 Mar 2006 23:30:42 GMT</pubDate>
	    <enclosure length="1351"
		type="application/xml+voevent" />


6. References

    Some event alert networks:
  1. ATEL: The Astronomer's Telegram
  2. CBAT: Central Bureau for Astronomical Telegrams
  3. eSTAR: eScience Telescopes for Astronomical Research
  4. GCN: The Gamma-Ray Burst Coordinates Network
  5. rtVO: The real-time Virtual Observatory
  6. VOEvent: IVOA Sky Transient Metadata
    Some surveys reporting events (or planning to):
  7. LIGO: Laser Interferometer Gravitational Wave Observatory
  8. LSST: Large Synoptic Survey Telescope
  9. Palomar-QUEST: A case study in designing sky surveys in the VO era
  10. Pan-STARRS: the Panoramic Survey Telescope & Rapid Response System
  11. RAPTOR: RAPid Telescopes for Optical Response
  12. Swift: Catching Gamma-Ray Bursts on the Fly
    Robotic telescope infrastructure:
  13. RoboNet: RoboNet-1.0
  14. ROBOT: A list of robotic telescope projects
  15. RTML: Remote Telescope Markup Language
    VO standards:
  16. ID: IVOA Identifiers
  17. RM: Resource Metadata for the Virtual Observatory
  18. STC: Space-Time Coordinates Metadata for the Virtual Observatory
  19. UCD: Unified Content Descriptor
  20. VOConcepts: a proposed UCD for Astronomical Objects, Events, and Processes
  21. VOTable: Format Definition
    Astronomical resources:
  22. NED: NASA/IPAC Extragalactic Database
  23. SIMBAD: Set of Identifications, Measurements and Bibliography for Astronomical Data
  24. TYCHO: De Stella Nova
  25. UNITS: Standards for Astronomical Catalogues: Units
  26. UTC: the future of Coordinated Universal Time
    Computing resources:
  27. Checksum: FITS Checksum Proposal
  28. ISO 8601: standard representation of dates and times
  29. PGP: Pretty Good Privacy
  30. SAML: Security Assertion Markup Language
  31. S-HTTP: Secure HyperText Transfer Protocol
  32. SSL: Secure Sockets Layer
  33. XML: Extensible Markup Language
  34. X.509: Public Key Certificate Infrastructure

Appendix: Notes

A.1 A SOAP Based Transport Protocol

Subscribers make a SOAP client connection to the Publisher's SOAP server and call a method to register "interest" with the Publisher in receiving messages. This registration of interest includes a SOAP end point to which VOEvent messages can be passed back to the Subscriber. The Publisher will (synchronously) acknowledge the registration with a VOEvent message of type="ack".

When a Publisher wishes to emit a VOEvent message it will make a SOAP client connection to all Subscribers who have registered interest in receiving messages and pass the VOEvent message to the Subscriber's SOAP server. As part of this synchronous transaction the Subscriber will acknowledge this message with a VOEvent message of type="ack".

The Publisher will periodically emit VOEvent messages of type="iamalive" which will be handed to the Subscribers in a similar manner to the above, except that the synchronous reply from the Subscriber is of type="iamalive".

This push method is (obviously) not as fast as the VANILLA TCP transport protocol outlined above, but does fill a niche.

It should be noted that this still allows ad-hoc networks to form, since registration of interest can (at least) be handled automatically. Limitation of distribution can be done in a similar manner to VANILLA TCP, using white lists and a local firewall. However, the current implementation uses SOAP over HTTP, and authorisation is done using a username and password via HTTP cookies appended to the SOAP message.

The eSTAR Broker (as currently implemented) also offers a SOAP endpoint which allows Publishers which do not offer VANILLA TCP and RSS2.0 to "inject" VOEvent messages into the existing network (the Broker offers these "mandatory services" for the Publisher as a by product of republishing the message). This is the "eSTAR native" way of publishing a VOEvent message.

A.2 Various Comments

If eSTAR then relays the TALONS GCN interpretations, should eSTAR then add onto the ID number or in some way indicate in the body that they are the relay point?

With GCNs you normally increase the "hop count" field if you are relaying the message.

The terminology can get a bit sticky here unless we are careful. Single software systems can be both a subscriber and a publisher rolled into one (i.e., a relay). The software developer must be careful not to broadcast the "iamalive" from "both ends". Standards would indicate that the server (he/she who creates the socket) is responsible for generating the iamalives and the client (he/she who connects to the socket) must echo the iamalive or risk getting cut off. Also, the client can break the connection if it has not received an "iamalive" from the server within a set period. However, only the server should be the source of the iamalives.
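The client-side rule (break the connection if no iamalive arrives within a set period) could be tracked with a small watchdog such as the one sketched here. The class and method names are hypothetical, the timeout value is arbitrary, and the clock is injectable so that the behaviour can be demonstrated without waiting.

```python
import time
from typing import Callable


class AliveWatchdog:
    """Tracks how long it has been since the server's last iamalive."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()

    def note_iamalive(self) -> None:
        """Call whenever an iamalive is received from the server."""
        self._last_seen = self._clock()

    def expired(self) -> bool:
        """True once the server has been silent for longer than the timeout."""
        return self._clock() - self._last_seen > self._timeout_s


# Demonstration with a fake clock that is advanced by hand.
fake_now = [0.0]
watchdog = AliveWatchdog(60.0, clock=lambda: fake_now[0])
fake_now[0] = 59.0
still_ok = not watchdog.expired()
fake_now[0] = 61.0
timed_out = watchdog.expired()
```

A client would check `expired()` in its receive loop and close the socket when it returns true; the server never needs an equivalent, since it alone originates the iamalives.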

Creating the additional time stamp for meta-data should not be a requirement of an overall standard but could be requested by a "publisher" if they wanted their clients to add that information.

TALONS does keep track and it is extremely easy with TCP/IP and does not interfere with the connections. It does provide the server with the capability to layer security and screen who is connecting and bump bad or unknown connections. This can make the term "subscriber" really mean something. Through this method we know exactly who is on and what they are getting.

Open subscription servers where anyone can connect would be a good idea, but they could be tricky from a network security standpoint. Even if you go this route you will still want to set up IPChains or IP Table security to protect the system, otherwise you will be very vulnerable to hacking attacks of a variety of types.

Open subscription could be done easier via the Pull method (RSS Feeds) or other means of scrubbing websites such as VOEventNet offers. Use of manual alert or observation insertion WebPages to send in data.

We might want to start thinking about UTF-8 as a standard and move out of ascii. There is little work to do a conversion and once we start getting events from Asia it might be a necessity.

Maybe need a "hop count" in the tag along with the new roles for "relayed" messages.

A smart relay is a stateful component which knows which source events to send to which sinks.

Although is "stateful" the right word in this context? It may be stateless (except for the knowledge of which feeds to relay); once aggregated, the feeds are sent to all sinks (Subscribers). There is no attempt to retain (Repository) or filter (Broker) the feed on a per-sink basis.

Good discussion point. What is state to VOEvent? A first cut might be the list of filtering rules corresponding to each subscriber. As with TCP/IP packet networking, this might also correspond to the equivalent of the DNS and routing tables. If not, then VOEvent is a more brutally simple entity than the underlying internet in that the routing has to be explicitly realized in the "wiring" of the VOEvent network. I think the ideal of VOEvent would be that a new source of alerts (e.g., Pan-STARRS) or a new sink of alerts (e.g., KESTREL son of RAPTOR) could simply be plugged into the GVN with the ease of sticking a conference room CAT-5 into the back of your laptop. We're trying to invent the VO's version of DHCP.

The word "state", however, tends to imply some kind of network memory or hysteresis - that is, not just an uploaded set of per-subscriber rules, but a short latency learning algorithm based on received packet content history. There might also be a suggestion that the filtering/followup multi-input/multi-output rules used for programming a broker might need to be conveyed via the VOEvent packet stream itself, rather than assuming that some web form be used to break the paradigm.

It seems to me that the VOEvent state needs to be distributed throughout the network in some fashion. So, twenty-three subscribers sign up for a variety of packet types through a particular Broker. These rules imply that that particular Broker only needs to receive a subset of all packets, so the "upstream" brokers don't need to forward other types of packets. Etc., etc., etc. It starts to sound a lot like a network of TCP/IP switches - with similar need for the same "interesting" computer science routing algorithms.

In any event, the question is not whether Relays have state, but rather whether those that don't would benefit from being described differently. The point isn't that there might be different flavors of Publishers, Subscribers, Repositories, as well as of Relays - but whether the architecture will benefit from expanding the list of fundamental components. So far I haven't heard anything to convince me of any such benefit. Relays can be simple, or they can be arbitrarily complicated, but they all implement both the publish and subscribe methods and may physically connect to arbitrary numbers of packet inputs and packet outputs.

Relays are VOEvent's neurons, each with an indeterminate number of synapses to plug into other neurons.

The exercise we are engaged in is different. We're trying to identify the simplest complete and self-consistent set of components we need to describe our interoperating system architecture.

Consider Greg Aldering of the SN Factory as the author of sky transient alerts. Aldering arranges for information describing his alerts to be forwarded to a VOEvent Publisher. This Publisher may have been located via the VO registries or more likely he had his folks install some tarball one of us provided. The Publisher receives semantic information via some indeterminate interface(s) and constructs a conforming VOEvent packet. The Publisher assigns an ID. The Publisher may sign the packet. The Publisher then calls its "publish" method to disseminate the packet out one or more connections to entities that are subscribed. (May need a word for this: a VOEvent "channel"?)

The subscribing entities may include Repositories, Relays, complex "Brokers", or simple Subscribers. The act of "Publishing" involves no decision-making - alert in, packet out. If filtering is needed (which will often be the case), a Relay component provides this. Often, one imagines, the original Publishing entity will actually be a complex Broker, not a simple Publisher, and will have integrated Relay functions such as filtering, but whether or not this Broker can be separated into two clean components, this is architecturally what it is.

Greg acts as an Author. LANL, eSTAR, VOEventNET, GCN(2) operate Publishers. The act of "Publishing" is different than the act (method) of "publishing" (and maybe we should change one of the names). The former corresponds to formatting a conforming packet and assigning an ID. The latter corresponds to emitting a conforming packet via a VOEvent channel.

A Relay "subscribes" to conforming VOEvent packets over its input channels and "publishes" packets (typically unmodified, albeit often filtered by content) over its output channels.