Additional aggregation work for UKOER pilot programme

As part of the additional technical work for the UKOER pilot programme that CETIS undertook in the summer of 2010, we investigated three approaches to providing aggregations of resources representing the whole or a subset of the UKOER programme output (possibly along with resources from other sources). Aggregation here does not mean RSS aggregation: Steeple's Ensemble and Nottingham's Xpert (to name just two examples) had already shown how that could be used in this context. This work therefore focussed on the aggregation of services, or more specifically cross-searching. Three approaches were tried:
 * 1) A Google custom search of sites known to be relevant
 * 2) Querying relevant services and aggregating the results through a Yahoo pipe
 * 3) Querying relevant services through their APIs and aggregating the results as a webpage.

A presentation based on this work was given by Lisa Rogers at the Open Educational Resources International Symposium (UKOER10), London, July 2010.

Also in connection with this work, a developer event, "OER Gathering", was organised as part of CETIS's ongoing core activities.

Google Custom Search
A Google Custom Search Engine (Google CSE) allows one to use Google to search only certain selected pages. The pages to be searched can be specified individually or as URL patterns identifying part of a site or an entire site. Furthermore, the search query can be modified by adding terms to those entered by the user.

The custom search engine can be accessed through a search box that can be hosted on Google or embedded in a web page, blog etc. Likewise, the search results page can be presented on Google or embedded in another site. Embedding of both the search box and the results page utilises JavaScript hosted on the Google site.

The pages to be searched can be specified in three ways: directly, by entering the URL patterns via the Google CSE interface; in an XML or TSV (tab-separated values) file which is uploaded to the Google CSE site; or as a feed from any external site. This latter option offers powerful possibilities for dynamic or collective creation of custom search engines, especially since Google provides a JavaScript snippet which will use the links on a page as the list of URLs to search. So, for example, a group of people could have access to a wiki on which they list the sites they wish to search, and thus build a CSE for their shared interest, whatever that may be.
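The "links on a page" approach can be sketched in a few lines of Python. This is an illustration only, not Google's own snippet: it harvests the hrefs from a stand-in wiki page held in a string (in practice one would fetch the group's real page with urllib.request), producing the list of sites a shared CSE would cover.

```python
# Sketch of the "linked CSE" idea: harvest the links on a shared wiki page
# and use them as the list of sites a custom search engine should cover.
# The wiki page below is a stand-in; in practice you would fetch it with
# urllib.request.urlopen() from wherever the group maintains its list.
from html.parser import HTMLParser

class LinkHarvester(HTMLParser):
    """Collects the href of every <a> element on a page."""
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value.startswith("http"):
                    self.urls.append(value)

wiki_page = """
<html><body>
  <ul>
    <li><a href="http://open.jorum.ac.uk/">Jorum Open</a></li>
    <li><a href="http://www.humbox.ac.uk/">HumBox</a></li>
  </ul>
</body></html>
"""

harvester = LinkHarvester()
harvester.feed(wiki_page)
print(harvester.urls)
# → ['http://open.jorum.ac.uk/', 'http://www.humbox.ac.uk/']
```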

A refinement that is sometimes useful is to label the pages or sites that are searched. Labels might refer to sub-topics of the theme of the custom search engine or to some other facet such as resource type. So a custom search engine for engineering OERs might label pages by branch of engineering {mechanical, electronic, civil, chemical, ...} or by the type of resource to be found {presentation, image, movie, simulation, article, ...}. In practice, whatever the categorisation chosen for labels, there will often be pages or sites that mix resources from different categories, so use of this feature requires thought as to how to handle this.
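The labelling idea, and the mixed-site problem it raises, can be shown with a small sketch. The sites and labels below are purely illustrative (they are not the real CSE configuration): each site carries one or more facet labels, and a facet restricts results to the sites carrying that label, so a mixed site surfaces under more than one facet.

```python
# Illustrative label-based refinement: each searched site is tagged with one
# or more facet labels, and results can then be narrowed to a facet.
# These sites and labels are examples, not the actual CSE configuration.
site_labels = {
    "http://www.slideshare.net/": {"presentation"},
    "http://www.flickr.com/": {"image"},
    "http://core.materials.ac.uk/": {"image", "simulation"},  # a mixed site
}

def sites_for_label(label):
    """Return the sites whose results a given facet label would keep."""
    return sorted(url for url, labels in site_labels.items() if label in labels)

print(sites_for_label("image"))
# → ['http://core.materials.ac.uk/', 'http://www.flickr.com/']
```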

A Google CSE for UKOER
Our example of a simple Google CSE can be found hosted on Google.

This works as a Google search limited to pages at the domains listed below, with the term '+UKOER' added to anything entered by the user. The '+' in the added term means that only those pages which contain the term UKOER are returned. This is possible because the programme mandated that all resources should be associated with the tag UKOER. Each site was labelled so that, after searching, the user could limit results to those found on any one site (e.g. just those on Jorum Open). The domains searched are:
 * http://open.jorum.ac.uk/
 * http://www.vimeo.com/
 * http://www.youtube.com/
 * http://www.slideshare.net/
 * http://www.scribd.com/
 * http://www.flickr.com/
 * http://repository.leedsmet.ac.uk/main/
 * http://openspires.oucs.ox.ac.uk/
 * http://unow.nottingham.ac.uk/
 * https://open.exeter.ac.uk/repository/
 * http://web.anglia.ac.uk/numbers/
 * http://www.multimediatrainingvideos.com/
 * http://www.cs.york.ac.uk/jbb
 * http://www.simshare.org.uk/
 * http://fetlar.bham.ac.uk/repository/
 * http://open.cumbria.ac.uk/moodle/
 * http://skillsforscientists.pbworks.com/
 * http://core.materials.ac.uk/search/
 * http://www.humbox.ac.uk/

These were chosen as they were known to be used by a number of UKOER projects for disseminating resources. We must stress that they are meant to be illustrative of sites where UKOER resources may be found; they are definitely not intended to be a complete or even a sufficient set of sites.
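The query modification described above is simple enough to sketch. The function below is an illustration of the logic, not code from the CSE itself: it appends '+UKOER' to whatever the user types, and optionally narrows to one host using Google's standard 'site:' operator, which is the effect of picking a single site label on the results page.

```python
# Sketch of the CSE's query refinement: '+UKOER' is appended to the user's
# terms so only pages carrying the programme's mandated tag are returned.
def build_query(user_terms, site=None):
    """Combine the user's terms with the fixed '+UKOER' refinement."""
    query = f"{user_terms} +UKOER"
    if site:
        # Google's standard 'site:' operator narrows results to one host,
        # much as choosing a single label does on the CSE results page
        query += f" site:{site}"
    return query

print(build_query("fluid dynamics"))
# → fluid dynamics +UKOER
print(build_query("dental", site="open.jorum.ac.uk"))
# → dental +UKOER site:open.jorum.ac.uk
```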

This is the simplest option: the configuration files are hosted on Google and managed through forms on the Google website. Expanding it to cover other websites requires being given permission to contribute by the original creator and then adding URLs as required.

Reflections
Setting up this search engine was almost trivially easy. Embedding it in a website is also straightforward (Google provides code snippets to cut and paste).

The approach will only be selective for OERs if those resources can be identified through a term or tag added to the user-entered search query, or if they can be selected through a specific URL pattern (including the case where a site consists wholly or predominantly of OERs). This was not always the case.

Importantly, not all expected results appear. This may be because the resources on these sites are not tagged as UKOER, or because the pages are not indexed by Google; sometimes, however, the omission seems inexplicable. For example, a search for "dental" limited to the Core Materials website yields the right results on Google, but the equivalent search on the CSE yields no results.

While hosting the configuration files off Google and editing them as XML files, or modifying them programmatically, allows some interesting refinements of the approach, we found this to be less easy. One difficulty is that the documentation on Google is somewhat fragmented and frequently confusing. Different parts of it seem to have been added by different people at different times, and it was often the case that a "link to more information" about something we were trying to do failed to resolve the difficulty that had been encountered. This was compounded by some unpredictable behaviour, which may have been caused by caching (perhaps in our serving of the configuration files, Google reading them, or Google serving the results) or by delays in updating the indexes for the search engine; this made testing changes to the configuration files difficult. These difficulties can be overcome, but we were unconvinced that there would be much benefit in this case and so concentrated our effort elsewhere.

Conclusions
If it works for the sites you are interested in, we recommend the simple Google custom search as a very quick method for providing a search over a subset of resources from across a specified range of hosts. We reserve judgement on the facility for creating dynamic search engines by hosting the configuration files on one's own server.

Yahoo pipes
Yahoo Pipes provides a simple way of aggregating, filtering and mashing up content from around the web. Through a simple graphical user interface, the user can specify data sources from the web and operations to be carried out on the data, allowing what has been referred to as programming without coding.

The data sources are feeds in the broadest sense: as well as RSS and Atom feeds, the application can read iCal, CSV (comma-separated values), arbitrary XML and JSON. Single values can also be provided by user input at a prompt. A wide range of operations may be performed on the data at the feed, item or string level; these include merging feeds, filtering on a data element, setting or modifying data elements, and performing queries using the Yahoo Query Language (YQL). Data is fetched and manipulated using modules, a full list of which is available from http://pipes.yahoo.com/pipes/docs?doc=modules. Modules may be linked together through the graphical user interface so that the output of one module becomes the input of another; in other words, data is piped from one operation to the next. The final output is a feed, which can be obtained in a range of formats similar to those listed above for input.

Importantly, "pipes" may be shared by making the source open and allowing it to be cloned by others. This allows the beginner to start making their own pipes by modifying someone else's pipe which does something similar to what they want to achieve.
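The merge-then-filter pattern at the heart of such a pipe can be sketched outside the Pipes interface. The Python below is an analogue, not Pipes itself: feed items are represented as plain dicts (in a real pipe they would come from fetched RSS or Atom feeds), merged, filtered on a tag element, and sorted newest first.

```python
# A minimal Python analogue of a Yahoo pipe that merges several feeds and
# filters the combined items on a data element. Items are plain dicts here;
# in a real pipe they would be entries from fetched RSS/Atom feeds.
from datetime import date

feed_a = [
    {"title": "Intro to statics", "tags": ["ukoer", "engineering"],
     "date": date(2010, 6, 1)},
    {"title": "Departmental minutes", "tags": ["admin"],
     "date": date(2010, 6, 3)},
]
feed_b = [
    {"title": "Cell biology slides", "tags": ["ukoer", "biology"],
     "date": date(2010, 6, 2)},
]

def pipe(*feeds, keep_tag="ukoer"):
    """Union the feeds, keep items carrying the wanted tag, newest first."""
    merged = [item for feed in feeds for item in feed]
    kept = [item for item in merged if keep_tag in item["tags"]]
    return sorted(kept, key=lambda item: item["date"], reverse=True)

for item in pipe(feed_a, feed_b):
    print(item["title"])
# → Cell biology slides
# → Intro to statics
```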

John Robertson has written from a beginner's point of view about how Pipes can be used to aggregate RSS feeds, whereas Tony Hirst has blogged about many more advanced uses of Yahoo Pipes; between them, these two illustrate how easy it is to get started with Pipes and how powerful they can be.