This is a quick overview of the BBC programme metadata that is publicly available for re-use.
We thought we’d share this on our blog, rather than leaving it hidden in project-only email discussions. Hopefully it’ll prove useful and interesting for others! Thanks to Chris Newell from the BBC for the original details.
The BBC’s schedules and metadata for all BBC programmes are publicly available at: http://www.bbc.co.uk/programmes (known informally as “slash programmes”).
To get machine-readable data, append .json or .xml to the URL of a schedule page, and .json or .rdf to the URL of a programme page. There’s some basic information for developers at http://www.bbc.co.uk/programmes/developers alongside a description of the “Programmes ontology”.
Within the programme metadata there are ‘genre’, ‘format’, ‘subject’, ‘people’ and ‘place’ properties. Genres come from a tightly controlled hierarchical taxonomy, and every programme description includes at least one genre. Formats also come from a tightly controlled taxonomy, but one that is not hierarchical.
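As a rough illustration of the suffix convention, here is a minimal Python sketch that builds the machine-readable URL for a page. The function name, the "kind" argument and the programme identifier "b0000000" are our own placeholders for illustration, not part of any BBC API; only the suffixes themselves come from the description above.

```python
# Suffixes described in this post: schedules offer JSON and XML,
# programme pages offer JSON and RDF.
ALLOWED_FORMATS = {
    "schedule": ("json", "xml"),
    "programme": ("json", "rdf"),
}


def machine_readable_url(page_url, fmt, kind):
    """Return page_url with the requested format suffix appended.

    `kind` is "schedule" or "programme" (a hypothetical label used only
    by this sketch to pick the set of allowed suffixes).
    """
    if kind not in ALLOWED_FORMATS:
        raise ValueError(f"unknown page kind: {kind!r}")
    if fmt not in ALLOWED_FORMATS[kind]:
        raise ValueError(f"{kind} pages do not offer .{fmt}")
    # Drop any trailing slash so we do not produce ".../.json".
    return page_url.rstrip("/") + "." + fmt


if __name__ == "__main__":
    # "b0000000" is a made-up placeholder, not a real programme identifier.
    page = "http://www.bbc.co.uk/programmes/b0000000"
    print(machine_readable_url(page, "rdf", "programme"))
```

The sketch only constructs the URL; fetching and parsing the response is left to whichever HTTP and JSON/XML/RDF libraries you prefer.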
The subject, people and place properties aren’t hierarchical and aren’t restricted in the same way as genres and formats – new subjects, people and places are created as required by new programmes. You can get more information about any of these tags using the URL in the rdf:resource attribute (in RDF terms, the value of the po:place, po:subject or po:person properties).
For example, in this tag…
<po:place rdf:resource="/programmes/topics/musikverein#place"/>
…there is more information at: http://www.bbc.co.uk/programmes/topics/musikverein.rdf
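The same lookup can be done in code. Below is a minimal Python sketch that reads the po:place, po:subject and po:person properties from a piece of programme RDF and turns each rdf:resource value into the matching .rdf URL, following the musikverein example above. The sample document (including its po:Episode wrapper) and the function name are assumptions made up for illustration; real programme RDF will contain far more than this.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag, urljoin

BASE = "http://www.bbc.co.uk"
PO_NS = "http://purl.org/ontology/po/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hand-written sample for illustration only, not real BBC output.
SAMPLE_RDF = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:po="{PO_NS}">
  <po:Episode>
    <po:place rdf:resource="/programmes/topics/musikverein#place"/>
  </po:Episode>
</rdf:RDF>"""


def tag_rdf_urls(rdf_xml):
    """Return the .rdf URL for every place, subject and person tag."""
    root = ET.fromstring(rdf_xml)
    urls = []
    for prop in ("place", "subject", "person"):
        for element in root.iter(f"{{{PO_NS}}}{prop}"):
            resource = element.get(f"{{{RDF_NS}}}resource")
            if resource is None:
                continue
            # Make the path absolute, then drop the "#place"-style fragment.
            absolute, _fragment = urldefrag(urljoin(BASE, resource))
            urls.append(absolute + ".rdf")
    return urls


if __name__ == "__main__":
    for url in tag_rdf_urls(SAMPLE_RDF):
        print(url)
```

We used the standard library’s ElementTree here to keep the sketch dependency-free; a dedicated RDF library would be a more robust choice for anything beyond a quick experiment.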
These pages include links to DBpedia if you want to gather more metadata.
There’s a lot more to explore in the BBC metadata universe, but we thought we’d start off by sharing these basics. As Vista-TV develops, we will be evaluating different mechanisms for enriching this metadata with information derived from anonymized viewing logs, subtitles, video and other sources.