SemTech 2011: Where Semantics Meet the Real World

Jun 09, 2011

What a difference a year makes. Talk at the 2010 Semantic Technology conference tended toward the theoretical, with speakers touting the benefits of semantic web solutions that seemed a perfect fit for publishers in a highly competitive marketplace: increased flexibility of content, streamlined production systems, enhanced content personalization, and improved archive utilization. But for the most part, speakers and sessions presented the opportunity from the vendor side of the equation.

At SemTech 2011, held this week in San Francisco, the picture was different. John O'Donovan, director of architecture and development for the UK's Press Association, delivered a keynote called "Semantics in News, Sport and Media: A Compelling Case and New Architectural Pattern for Semantics in Every Enterprise," which showed that the promise of semantics is finally making its way into publisher workflows and onto the end-user desktop. O'Donovan shared his experience using semantic technology as the chief architect of the British Broadcasting Corporation's (BBC) web presentation of World Cup 2010.

As O'Donovan explained the challenge, "We had content coming from multiple sources. We were looking to reduce costs in the face of increased complexity, and the content had to link together and drive traffic to the site." The World Cup site was intended to be large scale, with separate pages for each player, match, and team, and that information would be updated on a continuous basis. O'Donovan concluded by saying, "It would have been impossible to do this without semantics."

O'Donovan described a framework with two parts: a content store, where articles, maps, match results, photos, and videos are warehoused, and a separate metadata store, where the semantic attributes of those assets are kept. The metadata store holds "triples," statements that link a subject to an object through a predicate, with each triple describing one aspect of the subject. Triples can be added over time to account for additional facets of the assets in the content store as they emerge. Pages to match user queries are then created on the fly, based on real-time semantic understanding of the content store assets.
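To make the split between the content store and the metadata store concrete, here is a minimal sketch in Python. It is not the BBC's implementation: the class, the identifiers such as "article:1" and "player:A", and the predicates "mentions" and "depicts" are all hypothetical, chosen only to show how a page might be assembled from triples at request time.

```python
from collections import namedtuple

# A triple is one statement: subject --predicate--> object.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])


class MetadataStore:
    """Hypothetical in-memory triple store, kept separate from the content."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


# Content store: the assets themselves (illustrative placeholders).
content_store = {
    "article:1": "Match report",
    "article:2": "Team preview",
    "photo:7": "Goal celebration photo",
}

# Metadata store: what each asset is about.
metadata = MetadataStore()
metadata.add("article:1", "mentions", "player:A")
metadata.add("photo:7", "depicts", "player:A")
metadata.add("article:2", "mentions", "team:X")


def build_page(entity):
    """Assemble a page on the fly from every asset linked to the entity."""
    asset_ids = sorted({t.subject for t in metadata.match(obj=entity)})
    return [content_store[asset_id] for asset_id in asset_ids]


player_page = build_page("player:A")

# A new facet can be recorded later without touching the content store.
metadata.add("article:2", "mentions", "player:A")
updated_player_page = build_page("player:A")
```

Because the page is computed from the triples on each request, adding one statement to the metadata store is enough to make the team preview appear on the player's page.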

In a follow-up session called Dynamic Semantic Publishing, Jem Rayfield, senior technical architect in the Future Media and Technology division of the BBC, quantified the scope of the World Cup 2010 project, which took three months from start to finish: 750 pages averaging over 2 million views per day, with spikes of 6 to 10 million views per day.

One of the biggest challenges Rayfield mentioned was making it simple for journalists to annotate their stories. He demonstrated a tool in use at the BBC called Graffiti, which allows journalists to select from a pre-populated set of tags and disambiguate terms that may have multiple meanings. Semantics enables inferred tagging as well, to save the journalist time: if the journalist tags a player, then the team, match, and venue can be applied instantly. The semantic approach is now being used to underpin the BBC's web coverage of the Euro Cup 2012 as well as the 2012 Olympics in London, the complexity of which may make World Cup 2010 seem like just a warm-up.
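The inferred tagging Rayfield described, where tagging a player brings in the team, match, and venue, can be sketched as a walk over linked facts. This is an illustration only, not how Graffiti works internally: the facts, the predicate names, and the function infer_tags are hypothetical, and the example assumes each player maps to one team and each team to one match, which a real system would have to resolve from the story's context.

```python
# Hypothetical background facts as (subject, predicate, object) triples.
FACTS = {
    ("player:A", "playsFor", "team:X"),
    ("team:X", "playsIn", "match:1"),
    ("match:1", "heldAt", "venue:V"),
}


def infer_tags(tag, facts):
    """Follow facts outward from one tag and collect every entity reached."""
    inferred = []
    seen = {tag}
    frontier = [tag]
    while frontier:
        current = frontier.pop()
        # Sorting keeps the traversal order deterministic.
        for subject, _predicate, obj in sorted(facts):
            if subject == current and obj not in seen:
                seen.add(obj)
                inferred.append(obj)
                frontier.append(obj)
    return inferred


# The journalist applies a single tag; the rest are inferred.
journalist_tag = "player:A"
all_tags = [journalist_tag] + infer_tags(journalist_tag, FACTS)
```

With these sample facts, all_tags holds the player followed by the team, the match, and the venue, so the journalist supplies one tag and the other three come from the metadata.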

The BBC team, as well as media professionals such as Michael Dunn, chief technical officer of Hearst Interactive Media, and Fernando Carolo, product owner of search and semantic tools, traded tips on how to overcome management resistance to semantics during a panel discussion on The New Content Ecosystem, moderated by Rachel Lovinger of Razorfish. A presentation by Vishal Gupta of Elsevier explored the ways in which semantics are improving discovery across the entire corpus of ScienceDirect content, with technical partner PureDiscovery.

If the lineup at SemTech 2011 is any indication, publishers are now shaping the future of semantics as much as they are being influenced by it.