Proposal for handling TIME in Met-Ocean Web Map Services: only encode one run in a Capabilities document
Note that, following Adrian Custer's comments below, it is perhaps more accurate to summarize this proposal as "Encode separate runs as separate Layers". These Layers may or may not be in separate Capabilities docs.
Briefly, we need to find a mechanism to handle forecast model runs, which have the following characteristics:
- The runs overlap in time, i.e. yesterday's 48-hour forecast is valid for the same time as today's 24-hour forecast
- Each run might be of a different length: some forecasting centres run long-range forecasts every 6 hours and shorter-range forecasts in the intervening 3 hours
- (Also, runs might go wrong: they may stop unexpectedly at some point)
- It's possible (I think?) for different runs to use different spatial resolutions
- Although this probably constitutes a different model
A collection of forecast model runs therefore conceptually has more than one time dimension, as discussed elsewhere on this wiki. Furthermore, due to different run lengths and possible missing values, the dimensions could be "ragged".
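To make the overlap and raggedness concrete, here is a minimal Python sketch. The run start times, run lengths, 3-hour time step and the `validity_times` helper are all invented for illustration; they are not taken from any real forecasting system.

```python
from datetime import datetime, timedelta

def validity_times(run_start, length_hours, step_hours=3):
    """Return the validity times covered by one run (hypothetical helper)."""
    return [run_start + timedelta(hours=h)
            for h in range(0, length_hours + 1, step_hours)]

# Hypothetical runs: long-range runs every 6 hours, a shorter run in between.
runs = {
    datetime(2010, 1, 14, 0): validity_times(datetime(2010, 1, 14, 0), 48),
    datetime(2010, 1, 14, 3): validity_times(datetime(2010, 1, 14, 3), 12),
    datetime(2010, 1, 14, 6): validity_times(datetime(2010, 1, 14, 6), 48),
}

# Overlap: the same validity time is covered by several runs.
target = datetime(2010, 1, 14, 12)
covering = [run for run, times in runs.items() if target in times]

# Raggedness: the runs do not all have the same number of time steps.
lengths = {run: len(times) for run, times in runs.items()}
```

Here the collection is indexed by two times (run start and validity time), and the second axis has a different extent for each run.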
I believe it is desirable to require as little modification to the WMS specification as possible, to allow standard clients to interoperate. The addition of extra DIMENSIONs is allowable by the specification but these dimensions are unlikely to be understood by standard clients. The addition of extra operations to the WMS interface (e.g. GetDimension) is a possible solution, but would be an addition to (not an interpretation of) the WMS specification.
The proposal is simple. Since many of the problems are caused by attempting to encode collections of model runs in a single Capabilities document, we propose to encode only one run of a model in a single Capabilities document. (Note: encoding each run as a separate Layer in the same Capabilities doc would have the same overall effect.)
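As a rough sketch of what "one run per Layer" could look like, the Python below builds a single Layer element carrying one TIME dimension in the start/end/period form used by WMS 1.3.0. The `run_layer` function, the layer naming scheme and the model name are assumptions made for this example, not part of the proposal.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

ISO_FMT = "%Y-%m-%dT%H:%M:%SZ"

def run_layer(model, run_start, length_hours, step_hours=3):
    """Build a <Layer> for one run with a single standard TIME dimension.

    Hypothetical helper: the Name/Title conventions are illustrative only.
    """
    end = run_start + timedelta(hours=length_hours)
    layer = ET.Element("Layer")
    ET.SubElement(layer, "Name").text = (
        f"{model}_{run_start.strftime('%Y%m%dT%H')}"
    )
    ET.SubElement(layer, "Title").text = (
        f"{model}, run starting {run_start.strftime(ISO_FMT)}"
    )
    # One TIME dimension per layer, expressed as start/end/period.
    dim = ET.SubElement(layer, "Dimension",
                        {"name": "time", "units": "ISO8601"})
    dim.text = (f"{run_start.strftime(ISO_FMT)}/"
                f"{end.strftime(ISO_FMT)}/PT{step_hours}H")
    return layer

layer = run_layer("example_model", datetime(2010, 1, 14, 0), 48)
xml_text = ET.tostring(layer, encoding="unicode")
```

A run that stops early would simply advertise a shorter interval in its own Layer, without affecting the description of any other run.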
The advantages include:
- There is only one TIME dimension per layer, so we can use the standard dimension definition
- No modifications to the WMS specification are required (not to handle TIME anyway). Hence standard time-aware WMS clients would be able to interpret the TIME dimension correctly
- It doesn't matter if different runs are of different length, or in fact if they differ in any other way, since they will be described in different documents.
- Individual Capabilities documents can be small
- Capabilities documents can easily be dynamic to reflect the "live" state of a run: only the TIME dimension needs to be updated
- (Solutions that package many related runs into a single Capabilities doc will find this harder)
- One could easily generate a Capabilities document for the "best estimates" timeseries.
- i.e. the series of past analyses, plus the latest forecast
- This can be a static endpoint
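The "best estimates" time series mentioned above could be assembled along the following lines. This is only a sketch: it assumes that the analysis of a run is its first time step (t+0), and the `best_estimate` function and the dictionary layout are hypothetical.

```python
from datetime import datetime, timedelta

def best_estimate(runs):
    """Combine past analyses with the latest forecast.

    `runs` maps run start time -> list of validity times (assumed layout).
    Returns (validity_time, run_start) pairs: the analysis (t+0) of each
    earlier run, followed by every time step of the latest run.
    """
    starts = sorted(runs)
    latest = starts[-1]
    # Assumption: the analysis of a run is the time step at the run start.
    series = [(s, s) for s in starts[:-1] if s in runs[s]]
    series += [(t, latest) for t in sorted(runs[latest])]
    return series

def steps(start):
    """Hypothetical 12-hour run with 6-hourly time steps."""
    return [start + timedelta(hours=h) for h in (0, 6, 12)]

# Three runs, six hours apart.
starts = [datetime(2010, 1, 14, h) for h in (0, 6, 12)]
series = best_estimate({s: steps(s) for s in starts})
```

The resulting series has a single, monotonically increasing time axis, so it can be described with one ordinary TIME dimension.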
There are, of course, disadvantages, notably:
- Large numbers of Capabilities documents will be generated.
- New service endpoints will appear frequently (say, every 3 hours)
- WMS does not help much in linking these documents together, so some other entity (a catalogue maybe) needs to associate the documents for runs of the same model with one another.
This proposal essentially trades off the sophistication of the WMS against the sophistication of catalogues. It means that catalogues need to be able to handle large numbers of frequently-changing service endpoints, and be aware that Capabilities documents are "grouped". The catalogue would probably need to understand the semantics of the time dimension to allow users to search for information about a particular validity time, returning a set of runs that cover this time. Special clients (e.g. Met workstations) that wish to take advantage of the relationship between model runs would need to communicate with these special catalogues; however, the interaction between these "special" clients and the WMS servers themselves would be entirely standard (as far as TIME is concerned anyway).
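The kind of catalogue query described here might look something like the sketch below. The `RunRecord` fields, the `runs_covering` function and the example.org URLs are all hypothetical; a real catalogue would have its own data model and query interface.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class RunRecord:
    """One catalogue entry per run (hypothetical record layout)."""
    model: str
    run_start: datetime
    time_start: datetime   # first validity time advertised by the run
    time_end: datetime     # last validity time advertised by the run
    capabilities_url: str

def runs_covering(records, model, validity_time):
    """Return the runs of `model` whose TIME extent includes validity_time,
    most recent run first."""
    hits = [r for r in records
            if r.model == model
            and r.time_start <= validity_time <= r.time_end]
    return sorted(hits, key=lambda r: r.run_start, reverse=True)

# Illustrative catalogue contents; the URLs are placeholders.
records = [
    RunRecord("example_model", datetime(2010, 1, 14, 0),
              datetime(2010, 1, 14, 0), datetime(2010, 1, 16, 0),
              "http://example.org/wms/run-20100114T00"),
    RunRecord("example_model", datetime(2010, 1, 14, 3),
              datetime(2010, 1, 14, 3), datetime(2010, 1, 14, 15),
              "http://example.org/wms/run-20100114T03"),
    RunRecord("example_model", datetime(2010, 1, 14, 6),
              datetime(2010, 1, 14, 6), datetime(2010, 1, 16, 6),
              "http://example.org/wms/run-20100114T06"),
]

hits = runs_covering(records, "example_model", datetime(2010, 1, 15, 0))
urls = [r.capabilities_url for r in hits]
```

Once the client has the list of endpoints, each request to a WMS server uses the ordinary TIME parameter; only the catalogue needs to know that the documents are related.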
There are likely to be "community-specific" catalogues (for the met-ocean community) and "general-purpose" ones (like GEOSS). I imagine that the general-purpose catalogues (for wide audiences) would contain perhaps only the Capabilities documents representing the best-estimates time series, or some other slowly-changing endpoint. The met-ocean catalogues would need to handle the fast "churn" of Capabilities documents, maybe holding large volumes of documents. (Old forecast runs can of course be expired from the catalogue at some sensible rate.)
I believe that it is easier to handle the extra sophistication we need at the catalogue level, not at the level of WMS. But there may well be other issues I haven't considered. Discussion is of course extremely welcome!
- 14 Jan 2010
John Caron has created a very handy diagram in a poster that shows several ways in which users like to specify time in requests to a server of collections of forecast model outputs. The poster is available at:
This diagram is a big help to me in clarifying the challenge of dealing with time in data requests to a forecast model output server. It should also be noted that, in cases where the request is for "Model Run Datasets" (i.e. a vertical slice in the diagram), the THREDDS Data Server essentially implements Jon Blower's proposed approach. That is, each Forecast Dataset has its own capabilities list. But it's also good to keep in mind the other types of requests for which the TDS can provide a capabilities list.
- 15 Jan 2010
This proposal does not seem to solve much, and it introduces new complexity: a myriad of services being spawned, and recourse to catalogue servers to track those services.
The proposal is simple. Since many of the problems are caused by attempting to encode collections of model runs in a single Capabilities document, we propose to encode only one run of a model in a single Capabilities document.
The only solution this approach actually solves is the size of the capabilities document and the benefit of that cannot realistically be evaluated yet. A semantically equivalent solution would be to have a single service which offered each model run as a separate layer. That would be entirely reasonable for non-archival services which might keep only the runs from the last fifty or hundred hours. At worst, this would only be a hundred "Layers" although, with nesting of elements, it would probably be possible to reduce this. So there seems to be little need to spawn a new service endpoint for every run; the capabilities document can merely be updated. So, at the cost of a little XML, we can maintain the 'service' notion as commonly used. Note that this solution does not preclude the need for DIMENSION elements in the capabilities documents since the model runs will still have various environmental parameters being modeled.
However, more fundamentally, before worrying about where to put the semantics, we need to figure out what those semantics might actually be. What exactly will the client need to figure out to be able to use one or more MetOcean WMS servers? Only after we know that can we realistically consider the alternatives. This proposal neither resolves how a client would figure out the various services had similar data, nor what elements would be described for each layer, so the proposal has not gotten us very far.
So our focus needs to be on determining the semantics which need to be communicated; then we can evaluate where to put such semantics.
-- AdrianCuster - 18 Jan 2010
Thanks for your response Adrian.
The only solution this approach actually solves is the size of the capabilities document and the benefit of that cannot realistically be evaluated yet.
The intention was to avoid the necessity of coming up with a new way to encode multiple TIME dimensions. You're right that we can modify the granularity of the situation by putting multiple runs in the same document (as long as they are separate Layers), so perhaps this proposal should have been called "one Layer per run" or something like that. But experience with a few systems (THREDDS, ncWMS, COWS) suggests that having one endpoint per run is a reasonable approach, although it's not really the crux of the matter. (You might find when you work through the situation completely that a Capabilities document for even a single run will get pretty large, if all modelled phenomena are encoded, with multiple possibilities for the elevation layer, for all times.)
I should also state that this proposal allows data providers to express correctly the situation where different runs of the same forecast model have different lengths in time, perhaps because some runs failed, or perhaps because different runs deliberately proceed for different lengths of time. Some proposals involving new time DIMENSIONs cannot express this.
There will always have to be catalogues to organize Capabilities documents, so there will have to be trade-offs around granularity, and the catalogues will have to have some semantic sophistication.
Note that this solution does not preclude the need for DIMENSION elements in the capabilities documents since the model runs will still have various environmental parameters being modeled.
I haven't seen a need for other DIMENSION elements, although perhaps they are necessary (I'm not convinced there are other true DIMENSIONS apart from elevation and time, although there may be multiple ways of expressing elevation). But anyway I believe the proposal is still useful to ensure that TIME at least can be handled correctly by existing clients.
However, more fundamentally, before worrying about where to put the semantics, we need to figure out what those semantics might actually be.
Correct, this was - as requested - a proposal about the TIME dimension only. There are lots of other semantic elements we need to worry about for WMS, but I did not deal with these.
-- JonBlower - 19 Jan 2010