CETIS 2006 Architecture

The architecture of services and mashups session will focus on the concrete deployment of (web) services in an institution. Any type of data or function is fair game, and any service integration technique will be considered.

To start off, we'll take stock of recent SOA work: which e-framework service genres have been 'done', where they are used in typical institutional architectures, and what role they play there.

Then we'll move on to the core: where data integration needs to go next. For that, participants will be asked for their three most pressing data integration issues. We'll collate, discuss and rank them, before exploring solutions.

The exploration of solutions will be centred on the complexity of the technique: starting with in-browser, presentation-level mashups, moving via deeper data integration to full-on SOAP and WS* based solutions. This amounts to weighing development and deployment pain against operational and sustainability gain. Other aspects to be examined include service deployment topologies such as an Enterprise Service Bus (ESB), hub and spoke, or just plain point to point.

The planned outcome of the session, therefore, includes:
 * An up-to-date picture of which services are deployed between which systems
 * A list of services that need development and deployment next
 * An outline of the optimum service integration type for these services

= Outcomes =

The service architectures and mashups session covered three things: identifying which service genres (i.e. functionality) had already been done, identifying service genres that need more development, and appraising the role mash-ups can play in service deployment.

Course advertising
People used the eXchanging Course-Related Information (XCRI) schema in course advertising workflows even before it was properly finalised. Reasons why it spread as rapidly as it did are various, but include:
 * The fact that it addressed the evident and well known problem of manual replication of course advertising data in many different places
 * The fact that it concerned data that is meant to be published, rather than data that needs to be guarded against unauthorised use.
 * The fact that it's used primarily for display means that differences between initial implementations are not necessarily fatal for interoperability.
 * The fact that the source data usually sits in file stores or simple (MS Access) databases means that developing a webservice on top is relatively straightforward and doesn't require vendor involvement.
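As a sketch of how straightforward such a service can be, the snippet below pulls course rows out of a simple database and serialises them as XML that a web framework could serve. The table, column names and course data are entirely invented for illustration, and sqlite3 stands in for the kind of MS Access database mentioned above.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Throwaway course table: a stand-in for the simple departmental
# database (e.g. MS Access) that typically holds this data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (code TEXT, title TEXT)")
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [("CS101", "Introduction to Computing"),
                  ("HI202", "Early Modern History")])

def courses_as_xml(connection):
    """Serialise all course rows as a simple XML document."""
    root = ET.Element("courses")
    for code, title in connection.execute("SELECT code, title FROM course"):
        course = ET.SubElement(root, "course", {"code": code})
        ET.SubElement(course, "title").text = title
    return ET.tostring(root, encoding="unicode")

xml_out = courses_as_xml(conn)
print(xml_out)
```

Because the data is meant to be published anyway, a read-only endpoint like this needs no authorisation layer, which is much of why it is cheap to build.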

One issue, as with many of these types of data, can be the institutional change management required to access the source data, and then publish it via a web service.

Course discovery and entry requirements
Away from the course advertising side, new services are also being developed to tackle the problem of how best to match learners with courses. Making use of an XCRI based course repository, a wizard has been developed that guides the learner through the application process, partially informed by the learner's profile. It involves a service that is capable of searching the repository, and then some more specific logic aimed at providing parts of the required course application entry fields as well as guidance for filling them in.
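A toy sketch of the repository search at the heart of such a wizard: the records, field names and profile structure below are invented for illustration, not taken from XCRI, but they show the shape of keyword search plus profile-informed ranking.

```python
# Illustrative course records; the field names are assumptions.
COURSES = [
    {"title": "Marine Biology BSc", "entry": "Biology A-level"},
    {"title": "History BA", "entry": "Essay-based A-level"},
    {"title": "Biochemistry BSc", "entry": "Biology and Chemistry A-levels"},
]

def search_courses(keyword, profile=None):
    """Search the course list, optionally ranking matches against a
    learner profile (here just a set of subjects already held)."""
    keyword = keyword.lower()
    hits = [c for c in COURSES if keyword in c["title"].lower()]
    if profile:
        # Courses whose entry requirements mention a subject the
        # learner already has come first: a crude stand-in for the
        # wizard's profile-matching logic.
        hits.sort(key=lambda c: not any(s.lower() in c["entry"].lower()
                                        for s in profile))
    return hits

results = search_courses("bsc", profile={"Biology"})
for course in results:
    print(course["title"])
```

The real wizard would, of course, query the repository over a web service rather than an in-memory list; the ranking step is where the learner-profile logic lives.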

An open question remained about how to deal with bad data that is propagated through services of this kind. Other than that, lessons learned include putting the user first, and only then building the infrastructure to make sure needs are met.

More widely, it was felt that the e-framework's role in initial service development was to record service technology that is reasonably stable and consolidated, after experimentation 'in the wild'. In this, it should focus on tools and methods of direct use to implementors now, but most of all, it should remain iterative and follow experience closely. "The journey is more important than the destination". In that sense, it is likely that any institution will have many potentially interesting databases that can be opened up with web services fairly easily. The most useful are likely to become apparent over time, and are prime candidates for dissemination via the e-framework knowledge base.

Authentication and authorisation
Implementing Shibboleth remains challenging, but it is getting deployed now. Areas that are currently explored include the ins and outs of a phased roll-out of Shibboleth using different technologies. After all, web services are not the only data integration method, and other approaches will continue to be supported as well.

Doing SOA, even when it is limited to authentication and authorisation, tends to unearth all sorts of unknown data. For example, car park user data can include a number of people not known anywhere else. More widely, it is recognised that Shibboleth doesn't have all the answers with regard to identity management, particularly concerning a learner's or staff member's wider online identities, of which the institutional, Shibboleth-managed identity is just one. The role of meta ID management frameworks such as Microsoft's InfoCard could be investigated.

Federated search & service coordination
Integrating federated search in many different applications via web services is getting to be comparatively easy now. Coordinating or orchestrating several services is not. Disaggregating a repository's functionality into services, and then combining them in new ways, for example, is still very much cutting edge work. Newer BPEL tools make the task more approachable, but sometimes a monolithic application's business logic is simply easier to use.
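As an example of how approachable the search side has become, the snippet below builds a standard SRU searchRetrieve request URL, the sort of call a federated search client repeats across targets. The repository endpoint and query here are placeholders, not a real service.

```python
from urllib.parse import urlencode

def sru_search_url(base, query, start=1, maximum=10):
    """Build an SRU 1.1 searchRetrieve URL for one search target."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return base + "?" + urlencode(params)

# Hypothetical endpoint; a federated search would loop over many such bases.
url = sru_search_url("http://repository.example.ac.uk/sru",
                     'dc.title = "metadata"')
print(url)
```

The hard part is not issuing these calls but orchestrating what comes back, which is exactly the gap the BPEL tooling mentioned above tries to fill.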

More widely, it should be borne in mind that a lot of leading companies are not ready to deploy SOA just yet. The question is, though, what counts as 'deploying SOA': does it have to be the full scale, one-fell-swoop implementation of WS* services across the enterprise, or can it also be the piecemeal adoption of various simple, lower-case web services?

Timetabling and calendar
The general area of calendaring, timetabling and scheduling was the clearest data integration pain point. Laborious manual methods, many sources of data in various formats controlled by different groups, non-trivial business logic and more make it a difficult domain. Some differentiation should be applied, though. Scheduling of all resources in a general sense could well encompass more permutations than can be computed. Timetabling of classes in rooms is slightly more tractable, given that there are some programmes that can do it, assuming there are actually enough resources to meet demand. It still wouldn't be a problem that can be solved by merely stitching a few services together, though.

Exposing the raw calendaring data of people as well as other resources via simple, standard protocols such as iCalendar is a much more doable first step. From there, use cases such as those around personal learning environments can be addressed comparatively easily. There is still the issue of how to deal with confidential data, and, to a lesser extent, tentative vs. confirmed events.
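As a hedged illustration of that first step, the function below emits a single iCalendar VEVENT, using the standard STATUS and CLASS properties to carry exactly the tentative/confirmed and confidentiality distinctions just mentioned. The UID and times are invented.

```python
def vevent(uid, start, end, summary, confirmed=True, public=True):
    """Emit one iCalendar VEVENT as CRLF-separated lines.

    STATUS distinguishes tentative from confirmed events; CLASS
    marks whether the event may be published or is confidential.
    """
    return "\r\n".join([
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTART:{start}",
        f"DTEND:{end}",
        f"SUMMARY:{summary}",
        "STATUS:" + ("CONFIRMED" if confirmed else "TENTATIVE"),
        "CLASS:" + ("PUBLIC" if public else "CONFIDENTIAL"),
        "END:VEVENT",
    ])

# Hypothetical lecture slot, not yet confirmed but publishable.
card = vevent("lec-042@example.ac.uk", "20061114T090000Z",
              "20061114T100000Z", "Intro lecture", confirmed=False)
print(card)
```

A feed of such events, wrapped in a VCALENDAR, is enough for most calendar clients to subscribe to, which is what makes this step so much more doable than full scheduling.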

Student destination data
As with course information, student careers and first destination data is gathered in many different formats and used in many different places. Furthermore, it is public data that needs to be disseminated, which makes it easy to implement technically, and also easy to 'sell' politically.

Person / Group / Membership
In many institutions, different systems can be authoritative for different kinds of groups, sometimes even at different times of the year. The VLE, for example, could well hold the most accurate set of course group data in September, before the student record service catches up in November. Also, there is an enduring need to find individual students or groups on the fly. For that reason, some extensible means of adding group semantics to Enterprise, and a general Enterprise query service could be desirable.
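To illustrate what a simple Enterprise query might look like on the consuming side, the sketch below parses a much-simplified fragment in the general shape of IMS Enterprise membership data and pulls out the members of a named group. The element names, group and person ids here are illustrative, not normative.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative fragment loosely shaped after IMS
# Enterprise membership data; ids are invented.
DOC = """
<enterprise>
  <membership>
    <sourcedid><id>CS101-2006</id></sourcedid>
    <member><sourcedid><id>s1234567</id></sourcedid></member>
    <member><sourcedid><id>s7654321</id></sourcedid></member>
  </membership>
</enterprise>
"""

def members_of(xml_text, group_id):
    """Return the person ids enrolled in the named group."""
    root = ET.fromstring(xml_text)
    for membership in root.findall("membership"):
        if membership.findtext("sourcedid/id") == group_id:
            return [m.findtext("sourcedid/id")
                    for m in membership.findall("member")]
    return []

ids = members_of(DOC, "CS101-2006")
print(ids)
```

The point of a general query service would be that the same request could be answered by whichever system is authoritative for that group at that time of year.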

Mashups
This lightweight data integration technique probably needs a little more discussion to establish a satisfactory definition. Questions that are still open include:
 * Do the services that you are mashing up have to be outside of your control?
 * Does the mash-up need to add value, and if so, how do you measure that?
 * Does the mash-up need to transform data, or is mere aggregation good enough?

One important difference between workflow and mashups is that the latter has no state transfer. That is, it's no solution for service orchestration. Also, unlike the traditional WS* stack, there's little provision for security, which makes it more immediately suited for public data. The main thing, though, is to get the data out, because people will find good use for it.