Feed deposit/Which DC elements

From CETISwiki

Barry Cornelius, OUCS, University of Oxford, mailto:barry.cornelius@oucs.ox.ac.uk
created: 13th May 2010

Introduction

The University of Oxford has been providing RSS feeds for several years. An example of the kind of RSS we generate for our normal RSS feeds is http://rss.oucs.ox.ac.uk/engfac/censorship-audio/rss20.xml. Early in 2010, we looked at how we could provide an RSS feed that could be consumed by JorumOpen. An example of what we are generating is http://rss.oucs.ox.ac.uk/engfac/censorship-audio/rss20.xml?destination=jorumopen. As documented at http://podcasts.ox.ac.uk/openspires.html, we currently have 45 feeds whose contents could be recorded in JorumOpen.

This document aims to point out some of the issues we have when providing RSS for JorumOpen. It would be great if there were a web page identifying the elements that should be used, which ones are mandatory/optional and what the elements should contain. The final section of this document aims to provide a suggestion for this but you may wish to argue with these suggestions!

Back in February 2010, in an e-mail, Laura Shaw gave an example of a test feed and when producing feeds for JorumOpen we based many of our decisions on that example.

I see the CETIS "Feed deposit" page http://wiki.cetis.ac.uk/Feed_deposit refers to the "Best Practices: RSS" web page that is at http://www.ocwconsortium.org/share/best-practices-rss.html. I will refer to the "Best Practices: RSS" web page below.

That in turn gives an example of an RSS 2.0 feed that is appropriate for JorumOpen. That feed is at http://openlearn.open.ac.uk/file.php/1/learningspace.xml. I will refer to this learningspace example below.

The last link on the web page http://community.jorum.ac.uk/course/view.php?id=49&topic=2 gives an example of an RSS feed which has been designed and successfully imported into JorumOpen. This feed was provided by the University of Nottingham. The feed is at http://community.jorum.ac.uk/mod/resource/view.php?id=248. I will also refer to this Nottingham example below.

Elements used for the feed

First, I will go through the various elements that are provided for the feed itself. So these will be child elements of the channel element. For each element, I will give an example from the Oxford feed http://rss.oucs.ox.ac.uk/engfac/censorship-audio/rss20.xml?destination=jorumopen


title

<title>Censorship in Literature in South Africa</title>

Should we be using title or dc:title? This applies both for the title of the feed and for the title of each item. The RSS specification says that a feed must have a title element and an item must have either a title or a description element. The feedvalidator does not object if the feed or an item has both title and dc:title elements.

The learningspace example has a title element for the feed and both title and dc:title for each item.

For both the channel and each item, Oxford has chosen to use RSS's title element rather than using the dc:title element. This is also what is done in the Nottingham example.

link

<link>http://rss.oucs.ox.ac.uk/engfac/censorship-audio/rss20.xml?destination=jorumopen</link>

What should we link to? We could either use the URL of the feed or the URL of a web page that is associated with the feed. For example, for our Admissions podcasts, we could use something like:
<link>http://rss.oucs.ox.ac.uk/offices/undergrad-podcasts/rss20.xml</link>
or:
<link>http://www.admissions.ox.ac.uk/podcasts/</link>
I guess it depends on how this information will be used by JorumOpen.

Both the learningspace and the Nottingham examples have a link element that points to a web page. However, Oxford has chosen to use the URL of the feed.

Aside: the feed validator will give a warning if the channel element does not contain an atom:link that gives the URL of the feed. An example is:
<atom:link rel="self" type="application/rss+xml" href="http://rss.oucs.ox.ac.uk/engfac/censorship-audio/rss20.xml?destination=jorumopen"/>
In its feeds for JorumOpen, Oxford includes the atom:link element as well as the link element. This is also done in the Nottingham example.
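
As a minimal sketch of the alternative arrangement (a web page in link, the feed URL in atom:link, as the learningspace and Nottingham examples do), the following Python builds such a channel with the standard library. The function name build_channel and the title string are placeholders chosen for illustration; nothing here is known to be what JorumOpen requires.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)


def build_channel(title, page_url, feed_url):
    """Return an rss element whose channel has a link and an atom:link rel="self"."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    # link points at the associated web page ...
    ET.SubElement(channel, "link").text = page_url
    # ... and atom:link carries the URL of the feed itself.
    ET.SubElement(
        channel,
        "{%s}link" % ATOM_NS,
        {"rel": "self", "type": "application/rss+xml", "href": feed_url},
    )
    return rss


rss = build_channel(
    "Admissions podcasts",  # placeholder title, not taken from the real feed
    "http://www.admissions.ox.ac.uk/podcasts/",
    "http://rss.oucs.ox.ac.uk/offices/undergrad-podcasts/rss20.xml",
)
print(ET.tostring(rss, encoding="unicode"))
```

With this arrangement the feed URL is still available to consumers through atom:link, so nothing is lost by pointing link at the web page.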

Aside: it may also be appropriate to identify the JorumOpen Collection that a feed belongs to. This could be done using:
<category domain="http://www.jisc.ac.uk/oer/Collections/">HE: Business and Administrative studies</category>

description

<description>The issues surrounding ... South Africa.</description>
<dc:description>The issues surrounding ... South Africa.</dc:description>

Although RSS has a description element, in an e-mail Gareth Waller said that the description should be in the relevant element of the metadata namespace. Hence, we concluded that we have to use dc:description.

However, the feed is invalid if we leave out the RSS description element. So for the feed Oxford has chosen to include both a description element and a dc:description element.

For each item, Oxford only provides a dc:description (as this is OK as far as validity is concerned).

For the feed the learningspace example has only a description element and for each item it provides both a description and a dc:description element. This is also done in the Nottingham example. So they do the complete opposite of what Oxford do.

I see the "Best Practices: RSS" web page says that the description element for both feed and each item can be the RSS description element. So there is no need to include a dc:description element. That is what we would prefer to do.
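
Because the example feeds disagree about where the description lives, a consumer could simply accept either element. The following is a minimal sketch of that idea in Python; it is a hypothetical consumer, not a description of how JorumOpen actually reads feeds, and the order of preference (RSS description first) is an assumption.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def get_description(element):
    """Return the RSS description of a channel or item, else its dc:description."""
    for tag in ("description", "{%s}description" % DC_NS):
        node = element.find(tag)  # direct children only
        if node is not None and node.text:
            return node.text.strip()
    return None


# A cut-down feed in the Oxford style: description on the channel,
# dc:description only on the item. The text values are placeholders.
SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>Censorship in Literature in South Africa</title>
<description>Feed description</description>
<item><title>Item one</title><dc:description>Item description</dc:description></item>
</channel></rss>"""

channel = ET.fromstring(SAMPLE).find("channel")
feed_desc = get_description(channel)
item_desc = get_description(channel.find("item"))
print(feed_desc, "/", item_desc)
```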

date

<dc:date>2009-11-18T09:33:41Z</dc:date>

The web page http://dublincore.org/documents/dces/ says that a dc:date element provides a point or period of time associated with an event in the lifecycle of the resource. Well that's a free for all!

Oxford's normal RSS feeds only have lastBuildDate (which is when the feed was last updated). So for the RSS feeds for JorumOpen we are providing a dc:date element that says when the feed was last updated.
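
RSS's lastBuildDate uses the RFC 822 date format, whereas the dc:date shown above is ISO 8601. A minimal sketch of the conversion in Python follows; the function name rfc822_to_dc_date is made up for illustration, and treating a date with no time zone as UTC is an assumption.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def rfc822_to_dc_date(value):
    """Convert an RFC 822 date (as in lastBuildDate) to an ISO 8601 UTC string."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:
        # Assumption: a date without zone information is taken to be UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(rfc822_to_dc_date("Wed, 18 Nov 2009 09:33:41 GMT"))
```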

The learningspace example has lastBuildDate, pubDate and dc:date elements that have the same contents and that seem to say when the feed was last updated.

Although the example given by Laura Shaw last February did not include a date for each item, Oxford have chosen to add a dc:date element to each item stating when the item was last updated. For each item the learningspace example has a pubDate element but no dc:date element. And the Nottingham example provides both.

So a JorumOpen feed from Oxford provides the date when the item was last updated. This might, for example, be the date when a correction was made to the title of the item. It is debatable as to how relevant this date is. We are proposing to acquire the date the recording was made. This then leads to numerous possibilities about what to expose in the RSS for each item:

  1. when the recording was made;
  2. when the recording was last updated;
  3. when the item of the feed was created;
  4. when the item of the feed was last updated.

We are currently providing only the last of these, but may replace it with the first if that date is known. Is it possible in DC to provide more than one date? If so, what is the RSS for this?
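
One possible answer: simple Dublin Core elements are repeatable, so an item can carry several dc:date elements, but nothing then says which date is which. The DCMI Metadata Terms vocabulary (namespace http://purl.org/dc/terms/) defines refinements such as created and modified that do make the distinction. The sketch below shows what emitting them could look like; whether JorumOpen would read these elements is not known, and the function name add_dates and the created value are placeholders.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DCTERMS_NS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("dcterms", DCTERMS_NS)


def add_dates(item, created=None, modified=None):
    """Attach distinguishable dates to an item element."""
    if created:
        ET.SubElement(item, "{%s}created" % DCTERMS_NS).text = created
    if modified:
        ET.SubElement(item, "{%s}modified" % DCTERMS_NS).text = modified
        # Keep the plain dc:date as well, matching the current Oxford feeds.
        ET.SubElement(item, "{%s}date" % DC_NS).text = modified
    return item


item = add_dates(
    ET.Element("item"),
    created="2009-10-01",  # placeholder recording date
    modified="2009-11-18T09:33:41Z",
)
print(ET.tostring(item, encoding="unicode"))
```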

contributor/publisher/creator

<dc:contributor>University of Oxford</dc:contributor>

For a feed, Oxford provide a dc:contributor element because the feed provided in Laura's example provided a dc:contributor element.

For a feed, both the learningspace and the Nottingham examples use dc:publisher instead of dc:contributor.

For a feed, the "Best Practices: RSS" web page also suggests using dc:publisher.

For each item, Oxford provide one dc:contributor element for each presenter, e.g.:
<dc:contributor>Mike Nicholson</dc:contributor>
<dc:contributor>Emma Coulston</dc:contributor>
<dc:contributor>Sinead Gallagher</dc:contributor>

For each item, the learningspace example provides:
<dc:publisher>The Open University</dc:publisher>
<dc:creator>The Open University</dc:creator>
Similar elements are also provided in the Nottingham example.

For each item, the "Best Practices: RSS" web page suggests using dc:publisher to name the site publishing the feed and dc:creator to identify "the person or entity primarily responsible for creating the course content".

So that would suggest that instead of:
<dc:contributor>Mike Nicholson</dc:contributor>
<dc:contributor>Emma Coulston</dc:contributor>
<dc:contributor>Sinead Gallagher</dc:contributor>
Oxford should be providing:
<dc:publisher>University of Oxford</dc:publisher>
<dc:creator>Mike Nicholson</dc:creator>
<dc:creator>Emma Coulston</dc:creator>
<dc:creator>Sinead Gallagher</dc:creator>

Another issue in this area is how the contents of the dc:publisher/dc:creator element should be formatted. At the moment, JorumOpen uses a mix of formats. On 11th February 2010, at http://community.jorum.ac.uk/mod/forum/discuss.php?d=92, Ed Bremner wrote: "Looking at resources within JorumOPEN, I see that there is little consistency with how the 'Author' field is used. I can find:

I can also find examples where the 2nd author field is used to hold institute name. Does JorumOPEN have any suggested best practice?"

The reply (from Nicoloa Siminson, Jorum Community Enhancement officer) was: "thanks for raising this issue! We're not suggesting any best practice at this stage - but we are currently undertaking a cataloguing evaluation for the metadata profile being used with JorumOpen, and I will pass your findings on to my colleagues. The evaluation is due to run until March this year, after which I hope we will be able to share our findings more widely, e.g. with the UKOER Programme projects."

The Nottingham example uses:
<dc:creator>Hoffmann Robert Dr</dc:creator>
but with multi-word surnames this will be ambiguous. Surely,
<dc:creator>Hoffmann, Dr Robert</dc:creator>
would be better.
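
A minimal sketch of producing names in this form, assuming the family name, given name and title are held as separate fields (which may not be true of any particular system). The function name sortable_person is made up for illustration.

```python
def sortable_person(family, given, title=""):
    """Format a name as 'Family, Title Given' so that it sorts by family name."""
    rest = " ".join(part for part in (title.strip(), given.strip()) if part)
    return f"{family.strip()}, {rest}" if rest else family.strip()


names = [
    sortable_person("Vaughan Williams", "Ralph", "Mr"),
    sortable_person("Hoffmann", "Robert", "Dr"),
]
for name in sorted(names):
    print(name)
```

Keeping the comma after the family name is what removes the ambiguity for multi-word surnames such as Vaughan Williams.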

Above, I said that the learningspace example uses:
<dc:publisher>The Open University</dc:publisher>
and that Oxford use:
<dc:contributor>University of Oxford</dc:contributor>
When an affiliation appears in a dc:publisher/dc:creator element, should one use:
<dc:publisher>Open University</dc:publisher>
<dc:publisher>Oxford University</dc:publisher>
so that in any list the item appears under O rather than T or U?
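
If that form were wanted, it could be derived mechanically for the common patterns. The sketch below is a heuristic only: it drops a leading "The" and turns "University of X" into "X University". The function name sortable_institution is made up, and the result may not match the name an institution would choose for itself.

```python
import re


def sortable_institution(name):
    """Rewrite an institution name so that it sorts under its distinctive word."""
    name = re.sub(r"^The\s+", "", name.strip())
    match = re.match(r"^University of (.+)$", name)
    if match:
        return f"{match.group(1)} University"
    return name


for original in ("The Open University", "University of Oxford"):
    print(original, "->", sortable_institution(original))
```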

language

<dc:language>en-gb</dc:language>

Although RSS has a language element, Laura's example seems to suggest that we use a dc:language element. This is also recommended in the "Best Practices: RSS" web page. The learningspace example provides both language and dc:language. Like Oxford, the Nottingham example just provides a dc:language element.


rights/license

<dc:rights>http://creativecommons.org/licenses/by-nc-sa/2.0/uk/</dc:rights>
<creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/2.0/uk/</creativeCommons:license>

In the past, I have queried whether it is a misuse of the dc:rights element to be using this element to convey licensing information. I thought its function was to convey copyright information. Aside: I think there has been a proposal to add a dc:license element to Dublin Core.

I would be happier to use:
<cc:license>http://creativecommons.org/licenses/by-nc-sa/2.0/uk/</cc:license>
and possibly
<dc:rights>Copyright. University of Oxford</dc:rights>
as well.

Today I've discovered that this is exactly what is recommended by the "Best Practices: RSS" web page. Yeah!

For the dc:rights element that appears in the channel element, Laura's example uses a URL whereas her example of dc:rights for the item uses a CDATA section containing some text coded in HTML (that includes the URL):
<dc:rights><![CDATA[ Licence info in a CDATA section with html <a target=\"blank\" href=\"http://creativecommons.org/licenses/by-nc-sa/2.0/uk/\">Creative Commons Attribution-NonCommercial-ShareAlike UK 2.0 Licence (BY-NC-SA)</a> ]]></dc:rights>

For both the feed and each item, Oxford is using:
<dc:rights>http://creativecommons.org/licenses/by-nc-sa/2.0/uk/</dc:rights>
We add this to the channel if all the recordings have this licence. We also add this to each item if the recording associated with the item has this licence. It is unclear to me as to what JorumOpen will do with a feed that has a mix of CC and non-CC recordings, i.e., what happens if the channel element does not have a dc:rights element. However, maybe we will only be supplying them with feeds where all the recordings have a CC licence.
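
The rule just described (dc:rights on the channel only when every recording has the licence, and on each item that has it) can be sketched as follows. The shape of the input, a list of dictionaries with title and licence keys, and the recording titles are assumptions made for illustration; this is not Oxford's actual feed generator.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)
DC_RIGHTS = "{%s}rights" % DC_NS
CC_LICENCE = "http://creativecommons.org/licenses/by-nc-sa/2.0/uk/"


def build_channel(recordings):
    """Build a channel, adding dc:rights according to each recording's licence."""
    channel = ET.Element("channel")
    # Channel-level dc:rights only if every recording carries the CC licence.
    if recordings and all(r["licence"] == CC_LICENCE for r in recordings):
        ET.SubElement(channel, DC_RIGHTS).text = CC_LICENCE
    for recording in recordings:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = recording["title"]
        if recording["licence"] == CC_LICENCE:
            ET.SubElement(item, DC_RIGHTS).text = CC_LICENCE
    return channel


all_cc = build_channel([
    {"title": "Recording A", "licence": CC_LICENCE},
    {"title": "Recording B", "licence": CC_LICENCE},
])
mixed = build_channel([
    {"title": "Recording A", "licence": CC_LICENCE},
    {"title": "Recording B", "licence": None},
])
print(all_cc.find(DC_RIGHTS) is not None, mixed.find(DC_RIGHTS) is not None)
```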

For both the feed and for each item, the learningspace example uses:
<dc:rights>Licensed under a Creative Commons Attribution - NonCommercial-ShareAlike 2.0 Licence - see http://creativecommons.org/licenses/by-nc-sa/2.0/uk/</dc:rights>
<cc:license>Licensed under a Creative Commons Attribution - NonCommercial-ShareAlike 2.0 Licence - see http://creativecommons.org/licenses/by-nc-sa/2.0/uk/</cc:license>

For the feed, the Nottingham example uses:
<copyright><![CDATA[Except for third party materials (materials owned by someone other than The University of Nottingham) and where otherwise indicated, the copyright in the content provided in this resource is owned by The University of Nottingham and licensed under a <a target="blank" href="http://creativecommons.org/licenses/by-nc-sa/2.0/uk/">Creative Commons Attribution-NonCommercial-ShareAlike UK 2.0 Licence (BY-NC-SA)</a>]]></copyright>
For each item, the Nottingham example uses:
<dc:rights><![CDATA[Except for third party materials (materials owned by someone other than The University of Nottingham) and where otherwise indicated, the copyright in the content provided in this resource is owned by The University of Nottingham and licensed under a <a target="blank" href="http://creativecommons.org/licenses/by-nc-sa/2.0/uk/">Creative Commons Attribution-NonCommercial-ShareAlike UK 2.0 Licence (BY-NC-SA)</a>]]></dc:rights>

Elements used for each item

So those are the elements that are provided for a feed. I'll now move on to look at elements that are provided for each item. Above we have already looked at the title, description, date, contributor/publisher/creator and rights/license elements and so I won't do those again!


subject

<dc:subject>censorship</dc:subject>
<dc:subject>South Africa</dc:subject>
<dc:subject>literature</dc:subject>
<dc:subject>apartheid</dc:subject>

Although RSS has a category element, we have to use the dc:subject element.

Oxford ought also to provide:
<dc:subject>ukoer</dc:subject>
The learningspace and Nottingham examples do this. Instead, at the moment, Oxford is providing:
<category domain="http://www.jisc.ac.uk/oer/">ukoer</category>
One of the advantages of RSS's category element is that you can tie a keyword to a taxonomy.

Again we also ought to be providing a JACS code:
<dc:subject>Q323</dc:subject>
Again I would prefer to provide:
<category domain="http://www.jisc.ac.uk/oer/jacs/">Q323</category>
The learningspace and Nottingham examples do not provide a JACS code.
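
A minimal sketch of emitting the subject information for an item follows. It writes every keyword, the ukoer tag and any JACS codes as dc:subject, and additionally repeats each JACS code as a category element with a domain. Emitting both forms is an assumption made for illustration, as is the function name add_subjects; the domain URL is the one used in the example above.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)
DC_SUBJECT = "{%s}subject" % DC_NS
JACS_DOMAIN = "http://www.jisc.ac.uk/oer/jacs/"


def add_subjects(item, keywords, jacs_codes=()):
    """Add dc:subject elements (keywords, ukoer, JACS codes) to an item."""
    values = list(keywords)
    if "ukoer" not in values:
        values.append("ukoer")
    values.extend(jacs_codes)
    for value in values:
        ET.SubElement(item, DC_SUBJECT).text = value
    # Also tie each JACS code to its taxonomy using RSS's category element.
    for code in jacs_codes:
        ET.SubElement(item, "category", {"domain": JACS_DOMAIN}).text = code
    return item


item = add_subjects(
    ET.Element("item"),
    ["censorship", "South Africa", "literature", "apartheid"],
    ["Q323"],
)
subjects = [s.text for s in item.findall(DC_SUBJECT)]
print(subjects)
```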

format

<dc:format>audio/mpeg</dc:format>

This is a MIME type. I presume it is the MIME type of the resource.


identifier

<dc:identifier>http://rss.oucs.ox.ac.uk/tag:2010-01-12:134103:320:engfac/censorship-audio</dc:identifier>

Oxford's normal RSS feeds already output an RSS guid element and so we use the contents of that. The feed validator at feedvalidator.org will give a warning if an item does not contain an RSS guid element. So each item of Oxford's JorumOpen RSS feeds has both a guid element and a dc:identifier element. This is also done in the learningspace example. The Nottingham example provides only a guid element.

link/relation

<link>http://www.ox.ac.uk/media/student_life_at_oxford.mp3</link>

This is a link to the resource.

<dc:relation>http://www.ox.ac.uk/admissions/undergraduate_courses/finding_out_more/podcasts/second_episode.html</dc:relation>

I think it is the URL of something that is related to the item. So I guess Oxford ought to provide what appears in the link element of an item of a normal Oxford RSS feed. Unfortunately, we are unable to do this as we don't have this information!

If we were able to do this, the result would be weird. Oxford's normal RSS feeds have:
<enclosure url="http://www.ox.ac.uk/media/student_life_at_oxford.mp3" length="13716212" type="audio/mpeg"/>
<link>http://www.ox.ac.uk/admissions/undergraduate_courses/finding_out_more/podcasts/second_episode.html</link>
Here the enclosure element points to the recording and the link element points to an associated web page.

For JorumOpen, we should be using:
<link>http://www.ox.ac.uk/media/student_life_at_oxford.mp3</link>
<dc:relation>http://www.ox.ac.uk/admissions/undergraduate_courses/finding_out_more/podcasts/second_episode.html</dc:relation>
Here the link element points to the recording whereas the dc:relation element points to the associated web page.

So the link element changes its purpose!
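
The change of roles can be sketched as a transformation from an item of a normal feed to an item for JorumOpen. This assumes the normal item has both an enclosure and a link; the function name to_jorumopen_item is made up, and leaving the enclosure element in place is an assumption.

```python
import copy
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_jorumopen_item(normal_item):
    """Return a copy in which link is the recording and dc:relation the web page."""
    item = copy.deepcopy(normal_item)
    enclosure = item.find("enclosure")
    if enclosure is None:
        return item  # no recording to point at, so leave the item unchanged
    link = item.find("link")
    page_url = link.text if link is not None else None
    if link is None:
        link = ET.SubElement(item, "link")
    link.text = enclosure.get("url")
    if page_url:
        ET.SubElement(item, "{%s}relation" % DC_NS).text = page_url
    return item


NORMAL_ITEM = """<item>
<enclosure url="http://www.ox.ac.uk/media/student_life_at_oxford.mp3" length="13716212" type="audio/mpeg"/>
<link>http://www.ox.ac.uk/admissions/undergraduate_courses/finding_out_more/podcasts/second_episode.html</link>
</item>"""

converted = to_jorumopen_item(ET.fromstring(NORMAL_ITEM))
print(ET.tostring(converted, encoding="unicode"))
```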


type

<dc:type>...</dc:type>

This describes the type of the content, e.g., Course. We are not providing this.


source

<dc:source>...</dc:source>

I'm not sure what this is. We are not providing this.

A suggestion

Earlier I said it would be useful if there were a web page identifying the elements that should be used, which ones are mandatory/optional and what the elements should contain. Here is a suggestion for this. You may wish to argue with these suggestions!

Feed

| element | type | contents | how many times |
|---|---|---|---|
| title | string | the title of the feed | once |
| link | URL | a web page associated with the feed | 0 or 1 |
| description | string | a description of the feed | 0 or 1 |
| dc:date | ISO 8601 | the date when the feed was last updated | once |
| dc:publisher | sortable institution string | the name of the institution | 0 or 1 |
| dc:language | RFC 5646 | the primary language of the feed | 0 or 1 |
| dc:rights | string | a copyright statement for the feed | 0 or 1 |
| cc:license | URL | the URL of a Creative Commons licence for the feed | 0 or 1 |


A sortable institution string is a string that sorts well, e.g., Oxford University rather than University of Oxford.

All the elements belong to the core RSS notation apart from those with a dc namespace (http://purl.org/dc/elements/1.1/) or a cc namespace (http://backend.userland.com/creativeCommonsRssModule).

Item

| element | type | contents | how many times |
|---|---|---|---|
| title | string | the title of the item | once |
| description | string | a description of the item | 0 or 1 |
| dc:date | ISO 8601 | the date when any aspect of the item was last updated | once |
| dc:publisher | sortable institution string | the name of the institution | 0 or 1 |
| dc:creator | sortable person string | the name of an author/creator/presenter | 0 or more |
| dc:language | RFC 5646 | the primary language of the resource | 0 or 1 |
| dc:rights | string | a copyright statement for the resource | 0 or 1 |
| cc:license | URL | the URL of a Creative Commons licence for the resource | 0 or 1 |
| dc:subject | comma-less string | a keyword | 0 or more |
| dc:subject | comma-less string | the string ukoer | once |
| dc:subject | comma-less string | a JACS code for the resource | 0 or more |
| dc:format | RFC 2046 | the MIME type of the resource | once |
| guid | string | a unique identifier for the item | once |
| link | URL | the URL of the resource | once |
| dc:relation | URL | the URL of a related item | 0 or more |
| dc:type | string | ? | 0 or more |
| dc:source | string | ? | 0 or 1 |


A sortable institution string is a string that sorts well, e.g., Oxford University rather than University of Oxford.

A sortable person string is a string that sorts well, e.g., Vaughan Williams, Mr Ralph rather than Mr Ralph Vaughan Williams.

All the elements belong to the core RSS notation apart from those with a dc namespace (http://purl.org/dc/elements/1.1/) or a cc namespace (http://backend.userland.com/creativeCommonsRssModule).
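
The "how many times" column for items lends itself to a mechanical check. The following is a minimal sketch of such a check in Python; it encodes the suggested counts only, does not validate the contents of any element, and is not an official JorumOpen validator. The names ITEM_RULES and check_item, and the sample item, are made up for illustration.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
CC = "{http://backend.userland.com/creativeCommonsRssModule}"

# (label, tag, minimum, maximum); a maximum of None means "no upper limit".
ITEM_RULES = [
    ("title", "title", 1, 1),
    ("description", "description", 0, 1),
    ("dc:date", DC + "date", 1, 1),
    ("dc:publisher", DC + "publisher", 0, 1),
    ("dc:creator", DC + "creator", 0, None),
    ("dc:language", DC + "language", 0, 1),
    ("dc:rights", DC + "rights", 0, 1),
    ("cc:license", CC + "license", 0, 1),
    ("dc:format", DC + "format", 1, 1),
    ("guid", "guid", 1, 1),
    ("link", "link", 1, 1),
    ("dc:relation", DC + "relation", 0, None),
    ("dc:type", DC + "type", 0, None),
    ("dc:source", DC + "source", 0, 1),
]


def check_item(item):
    """Return a list of problems found when comparing an item with the table."""
    problems = []
    for label, tag, lowest, highest in ITEM_RULES:
        count = len(item.findall(tag))
        if count < lowest or (highest is not None and count > highest):
            problems.append(f"{label}: found {count}")
    # dc:subject may repeat freely, but one of them must be the string ukoer.
    subjects = [s.text for s in item.findall(DC + "subject")]
    if subjects.count("ukoer") != 1:
        problems.append("dc:subject: 'ukoer' must appear exactly once")
    return problems


GOOD_ITEM = ET.fromstring(
    '<item xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<title>Example item</title>"
    "<dc:date>2009-11-18T09:33:41Z</dc:date>"
    "<dc:subject>ukoer</dc:subject>"
    "<dc:format>audio/mpeg</dc:format>"
    "<guid>example-guid</guid>"
    "<link>http://www.ox.ac.uk/media/student_life_at_oxford.mp3</link>"
    "</item>"
)
print(check_item(GOOD_ITEM))
```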

It would also be good to have:

| element | type | contents | how many times |
|---|---|---|---|
| dc:date | ISO 8601 | the date when the resource was first made | once |