This is a pretty long post based on a chat with the BBC’s head of search. If you’re interested in search, though, I reckon it’s worth ploughing through. I really learnt a lot from talking to Matt McDonnell: he has a very interesting and very important job working right at the heart of the future BBC.
Search as a gateway to everything
Matt didn’t want me to call him ‘head of search’. It’s not his job title and it sounded like “hagiography” to him. Still, he is in charge of search and I reckon he has a reasonable claim to the title ‘most important person at the BBC’ right now. I’m pretty sure the BBC org chart doesn’t reflect that, though, and I’m also sure that there are plenty of BBC executives who’ve never heard of him.
As the old ways into BBC content fade, search becomes more important. It’s a reasonable assumption that search will be the primary gateway to all BBC content within a few years, including the stuff that goes out on the linear channels (BBC1, BBC2, Radio 1, Radio 2 etc.). The channels themselves are already losing their gateway function. Viewers and listeners are much less likely to use a channel as a way into an evening’s viewing than they were in the pre-digital era. Themes, personalities, strong programme brands: all are becoming more important than channels. This, for instance, is one of the reasons for the BBC’s growing investment in top talent: Jonathan Ross may be an expensive presenter but he’s pretty economical when considered as a gateway to BBC content (at least when he’s not on suspension for being an arse).
On iPlayer, for instance, the channels already play a reduced part in programme selection. Programmes are still organised by channel, but that’s an arbitrary echo of the BBC’s org chart: there’s no good reason to classify television content by linear channel once it’s online. Nervous channel controllers nevertheless insist on superimposing the channel name on shows that go out on iPlayer. They fear that their carefully commissioned and scheduled content has been stirred into an undifferentiated soup of shows and that the investment they’ve made in their channel’s brand will be wasted. But users conditioned by exposure to YouTube, MySpace and Google probably don’t even see the channel ident.
Likewise, the BBC’s homepage may be one of the most important in Britain, but a growing proportion of users don’t use it to locate content: they find what they want via a search, either using the site’s search field or by searching at Google, Yahoo or ask.com. Sitting next to Matt at his desk in White City, I found it revealing to watch his own navigation habits: every page he showed me was located via a search, even pages on his own site. No bookmarks, no browsing and no typing in the address field. When search is good enough, it replaces all three.
Matt’s just coming to the end of a big programme of work that will sharply reduce the emphasis on web search at bbc.co.uk. The fact is that the BBC’s early ambition to ‘own’ UK web search has probably held the Corporation back from implementing really good site search and useful content structure, so this is a big relief. And here’s a truly fascinating aside: when you search the web at bbc.co.uk, the top three results are often sites selected by BBC editors (here’s an example: asthma). Until recently these results were labelled as such (something like ‘best links’), but Matt’s team has just removed the label.
The high-quality, editor-selected results are still there, right at the top of the list, but since the label was removed the click-through rate for these links has actually gone up substantially! Users weren’t clicking on the hand-selected links because they suspected they might be sponsored links. They had learnt from exposure to Google and other search engines that the ‘special’ links at the top of the list are qualitatively different from the others, and were avoiding them for that reason. Fascinating and counter-intuitive.
Another major initiative from the search team involves the creation of ‘topics’ pages: useful pages of information assembled from BBC sources and elsewhere about specific subjects. Topics is still in beta: you can check out the handful of hand-coded topics pages here. Many more are planned and what’s fascinating is that about 95% of them will be automatically generated.
This is all pretty hardcore semantic web stuff. The BBC Topics system starts by crawling Wikipedia daily and pulling in pages created since the last visit. Wikipedia provides authority here: it confirms that a topic is real (not that it’s relevant or useful, just that it exists) and handles ‘disambiguation’, sorting out the 19 different places called Rome, for instance. When the system finds a new Wikipedia entry, it searches the BBC for content similar to that entry, using the Wikipedia text as a ‘training document’. If it finds none, no page is created: the topic obviously isn’t of sufficient relevance. If it finds content (news stories, programme pages, whatever), it generates a new topic page. John Muth, one of the developers working on the system, says he expects there to be tens of thousands of topic pages pretty soon after launch.
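The matching step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the BBC’s code: the post doesn’t say how ‘similar’ is measured, so TF-IDF cosine similarity stands in for it, and the threshold, the function names (`build_topic_pages` and friends) and the sample data are all hypothetical. The daily Wikipedia crawl is left out; new entries are simply passed in as a dictionary.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens: deliberately naive."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed IDF)."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_topic_pages(new_entries, bbc_items, threshold=0.1):
    """Hypothetical matcher.

    new_entries: {topic title: Wikipedia text} (the 'training documents')
    bbc_items:   {item id: text of a news story, programme page, etc.}
    Returns {topic title: [matching item ids, best first]}; a topic with
    no sufficiently similar BBC content gets no page at all.
    """
    titles, ids = list(new_entries), list(bbc_items)
    # One shared vocabulary so entry and item vectors are comparable.
    vecs = tfidf_vectors([new_entries[t] for t in titles] +
                         [bbc_items[i] for i in ids])
    entry_vecs, item_vecs = vecs[:len(titles)], vecs[len(titles):]
    pages = {}
    for title, ev in zip(titles, entry_vecs):
        scored = sorted(((cosine(ev, iv), i) for i, iv in zip(ids, item_vecs)),
                        reverse=True)
        matches = [i for score, i in scored if score >= threshold]
        if matches:  # no BBC content means no topic page
            pages[title] = matches
    return pages


new_entries = {
    "Asthma": "Asthma: chronic airway inflammation causing wheezing; inhalers help.",
    "Zorblax": "Zorblax: fictional boardgame.",
}
bbc_items = {
    "news-1": "Trial of new asthma inhalers reported.",
    "prog-7": "Gardeners visit walled garden in Kent.",
}
print(build_topic_pages(new_entries, bbc_items))  # → {'Asthma': ['news-1']}
```

The made-up ‘Zorblax’ entry exists on Wikipedia in this toy example but matches nothing at the BBC, so, as the post describes, it never becomes a page.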
The result will be thousands of new pages, an extraordinarily rich information asset that exposes a lot of authoritative BBC content that would otherwise have been neglected or even lost. This is going to be a real public service win and – let’s face it – a much better idea than trying to make bbc.co.uk a destination for web search. Live syndication of Wikipedia content will also mean that the topic pages improve as Wikipedia does (although pages needn’t use Wikipedia content). Further (the semantic web is a mighty rich and interwoven thing), people will be able to syndicate the BBC topics pages for their own use: they will be published under a Creative Commons licence like the hundreds of thousands of artist pages in the /music hierarchy. Tools will be provided and schools and libraries or even businesses will be able to build useful information resources of their own by tapping into this clever blend of content from the BBC and the commons.
And video too
This all gets even more exciting when you add the potential to search the hundreds of thousands of hours of video produced by the BBC annually. Matt’s team is currently testing a system that analyses video files, creating a transcript that can then be indexed and added to the web of content on the topic pages. The transcript can also be used to ‘chapterise’ the video itself so users can jump to a particular part of the video based on the transcript.
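A transcript becomes a navigation aid once each segment carries a timestamp. The sketch below assumes the speech-to-text pass yields (start time in whole seconds, text) pairs; it builds a simple word index and returns the points a viewer could jump to. The segment format and all function names are hypothetical, and real chapterisation would presumably group segments into named chapters rather than return individual offsets.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def index_transcript(segments):
    """Map each spoken word to the start times of the segments containing it.

    segments: list of (start_seconds, text) pairs, start_seconds an int.
    """
    index = defaultdict(set)
    for start, text in segments:
        for word in WORD.findall(text.lower()):
            index[word].add(start)
    return {word: sorted(times) for word, times in index.items()}


def jump_points(index, query):
    """Start times of segments in which every query word was spoken."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


def fmt(seconds):
    """Format whole seconds as mm:ss for a seek link."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


segments = [
    (0, "Tonight we look at the history of the licence fee."),
    (95, "Viewers explain what the licence fee pays for."),
    (210, "Next, a look at public service broadcasting."),
]
idx = index_transcript(segments)
print([fmt(t) for t in jump_points(idx, "licence fee")])  # → ['00:00', '01:35']
```

The same index that powers the jump links could be fed into the topic pages, which is the ‘web of content’ idea the post describes.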
Let’s face it: once the BBC’s audio and video content (obviously the Corporation’s crown jewels) has been opened up to search, there’s really no further argument. It’s game over. All other gateways to the BBC’s content will be officially obsolete and search will have won. Maybe I should keep my mouth shut.
I’ll admit things have been a bit quiet around here for a week or so, but lots has been going on in the background and I’ve got some interesting blog posts queued up: one is delayed while I wait for approval from its subject, which I suppose is the kind of thing you get at a place like the BBC!
In an hour or so I’m heading over to Television Centre to talk to Roly Keating who is the new Director of Archive Content. He and right-hand man Tony Ageh (Controller, Archive Development) are in the meeting-people-and-finding-out-what-they’ve-got phase of what I am pretty sure will be a fascinating period of change in the archives. Yesterday evening I asked my Twitter followers <waves> for some questions to ask Roly. They didn’t let me down:
- How’s he planning to license all that data he can open up?
- Does he feel like he’s got to play catch up to where the BBC were in 2003 when Dyke announced the Creative Archive?
- How does he plan to balance commercial against public interest in his valuation of archive content and its exploitation?
- What lessons can the BBC learn from the French INA archive of public service broadcast content?
- How will he make use of Tony’s unique talents…
- Ask him about the roof at Windmill Road?
- Can we have the BBC Archive digitised and streamable, with a seven-day viewing window that starts from first view? I’d even buy bits if I could.
- For instance, I’d buy old episodes of I’m Sorry I Haven’t A Clue rather than the rubbish anthologies.
- I’d also love to be able to datamine all the metadata and the subtitle tracks. As long as the subtitles weren’t the live ones!
- “Are we archiving everything now that we’ll wish we’d archived in the future, and doing it in the right ways?”
- Could you ask Roly how the BBC is going to ‘guarantee’ that the archives are still available in 100 and 500 years’ time?
Sophie Walpole suggested (via the handy medium of speech) that I might ask Roly “Is there any demand for archive content?” Adrian Woolard suggested (via Yammer) “Does he have any money?” and Ant Miller “Has he any idea how big a potential user community exists in academia, and what a great test bed they would be for new services?”
This, incidentally, is why I love Twitter. The other day I went to James Cridland’s fascinating Radio at The Edge conference and, along with about a dozen others in the audience, twittered away like mad. The resulting stream of updates (gathered together by the simple expedient of a hashtag) constitutes the best coverage of the event, if you ask me, and many people followed the event through the day just by watching the #rate tag.
I twittered throughout. I like twittering at live events but mostly like to provide a comic commentary (“get a load of that syrup”, that kind of thing), because Twitter obviously doesn’t allow for much depth of commentary, especially when you’re tapping away on a mobile phone.
So I’m not 100% convinced of the value of live twittering in news or information terms, although it’s probably true to say that I ‘broke’ Hunt’s endorsement of the licence fee, and of Ofcom’s view of Channel 4’s role in PSB, before any other media outlet: all of them waited until the end of the speech to file.
The main feature of Hunt’s speech was its utter blandness. Hunt is hardly a firebrand—in fact he seems to be rather uncomfortable pronouncing in the obligatory censorious tones on Brand/Ross—but I was quietly hoping for something a bit tougher than “The licence fee is here to stay because it works”.
Hunt essentially endorsed the PSB status quo: the licence fee is safe (although in the past Hunt has supported top slicing), C4’s plight was acknowledged and privatisation was ruled out, Ofcom’s view of the PSB landscape was supported. The only distinctive element was an endorsement of local and super-local media with a suggestion that the BBC might be required to support it.
I asked Hunt if he’d force the BBC to share assets with other broadcasters and media firms if his party were elected and he made a nod to Mark Thompson’s commitment to ‘partnerships’ but said it was time to do something concrete: to ‘talk turkey’ in fact.
Here are my tweets from the event.
How should the BBC have handled this incident? As your semi-official BBC openness monitor I think it appropriate that I chip in with some practical tips for dealing with maverick multi-million pound talent.
1. Let people listen to the show: Some people think the Daily Mail waited nearly a week to go large on the Ross/Brand story because by then the show had been removed from iPlayer. They think The Mail did this because the prank calls sound worse when read from a transcript than they do in their original context—giving The Mail greater control over the story. Putting the show up in a prominent place from day one and allowing it to stay up beyond the seven-day window would have neutralised that particular risk.
2. Respond openly and directly: using a blog, of course. This begins to seem like such basic stuff that I’m genuinely surprised this wasn’t done right. I’m pretty sure that if this had been Exxon or Philips or GSK a crisis response blog would have been live within ten minutes of the first complaint.
The BBC should keep such a blog live at all times just in case: it could be called something like ‘BBC Responds’. It should have an editor and all the managers involved should have author privileges (and it should be easy to assign new authors as the story develops). There’s an excellent precedent for using a blog in this way at the BBC News editors’ blog.
Like any news organisation, the BBC knew about The Mail’s story the night before it came out. Imagine how different this whole episode would have been if something as simple as this had been put up on a blog that evening:
The Daily Mail is running a story tomorrow about last week’s Russell Brand show. They’re focusing on recordings of prank calls made to actor Andrew Sachs’ answerphone. We’ve just spoken to the producers and, as of half an hour ago, the programme had received two complaints from listeners about the item. We’ll keep an eye on this story.
3. Use the blog properly. All the risks here flow from failure to communicate honestly with the people who care: refusal to provide a spokesperson, hesitation to find out what happened, complacency about the outcome, senior management silence. The risks produced by a quick response or a slightly too-frank blog post will always be dwarfed by the risks of doing nothing. Pasting up a press release or an official statement from the DG won’t do either. Requiring the managers who authorised the broadcast to explain themselves online in an informal way would.
4. Don’t listen to the lawyers. Lawyers who are (I’m guessing here) advising the BBC that a quick and honest response to this crisis would present risks should be told to shut up. If Davie had been live with a one paragraph blog post as soon as the Mail story broke, the BBC would have retained control of the story and avoided handing an already hostile press another win.
Better yet, if Ross and Brand had been required to explain themselves, to enter a dialogue with listeners, right there on the blog, we’d have understood how the gaffe occurred and they’d have more quickly understood the scale and meaning of the public’s objections.
(I made a couple of small changes and added paragraphs five and six to this entry this morning, 30 October, after I’d learnt that the item on the Russell Brand show had received two complaints from listeners before the Daily Mail story).
In a really interesting hour with Mark last week we covered a lot of ground, but two really important issues stood out, both of which I think are pretty newsworthy: access to attention data and the BBC’s speech radio archive.
The BBC is a lot like Tesco: continually accumulating useful data about you and your habits. And, like Tesco, the amount of data gathered can only increase, especially as bbc.co.uk moves to a sensible unified login scheme and adds personalisation to the site (the BBC equivalent of your Tesco Club Card).
Data captured from anonymous users is of limited value. Once users have been persuaded to log in, their data takes on a different character: data from different sites and services can be blended to form a much more complete picture. The various stations, sites and services Mark manages have the potential to capture, organise and store huge amounts of personal data:
- Tracks and artists you listen to
- Stations and programmes you like
- Genres and labels you prefer
- Comments and forum posts you’ve left
- Groups you’ve joined and created
- Annotations and metadata you’ve produced
This is the kind of stuff they call attention data. If you’ve used Last.FM you’ll understand what I mean: it’s the stuff they use to refine the selection of sounds you hear when you click play. The BBC’s attention data, though, has the potential to be much richer and more useful, since you’re likely to consume a much wider range of content via the various BBC outlets.
This data, though produced by you at your expense, is not currently yours. You can’t use it and you can’t delete it. You can’t even see it. Mark wants to change that. His vision is for a system that permits users of the BBC’s audio & music properties to get access to their own attention data and to put it to use. Mark wants a system that packages user data and presents it back to its owner in a form that can be used elsewhere: at Last.FM or iTunes for instance, or for use in a blog or as a kind of signature for social networking sites like Facebook and Bebo.
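To make the idea of ‘packaging’ concrete, here is a minimal sketch of an export that hands a listener’s play history back as a portable JSON document. The `ListenEvent` record, the schema and the `export_attention_data` function are invented for illustration; nothing here reflects an actual BBC, Last.FM or iTunes format.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass
class ListenEvent:
    """One play captured by an audio service (hypothetical record)."""
    artist: str
    track: str
    station: str


def export_attention_data(user_id, events, top_n=3):
    """Package a user's own listening history as portable JSON.

    The schema is invented for illustration only.
    """
    artists = Counter(e.artist for e in events)
    stations = Counter(e.station for e in events)
    package = {
        "user": user_id,
        "plays": [asdict(e) for e in events],  # raw data, so nothing is hidden
        "top_artists": [a for a, _ in artists.most_common(top_n)],
        "top_stations": [s for s, _ in stations.most_common(top_n)],
    }
    return json.dumps(package, indent=2)


events = [
    ListenEvent("Artist A", "Track 1", "Radio 1"),
    ListenEvent("Artist B", "Track 2", "Radio 3"),
    ListenEvent("Artist A", "Track 3", "Radio 1"),
]
doc = export_attention_data("listener-42", events)
print(json.loads(doc)["top_artists"])  # → ['Artist A', 'Artist B']
```

Plain JSON is used here because almost any service or script can read it, which leaves the decision about where the data goes next with its owner.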
What’s powerful about all this, of course, is that we can’t now imagine the uses that this data will be put to. We’ll just have to wait for the media and tech startups, the geeks and the musos to get hold of it and start to build applications around it.
I like this. This is how enlightened organisations will deal with customer data, especially this kind of behaviour data: information about how you use the systems you interact with. It’s an example of the BBC exposing the data it creates and making it available to users without making assumptions about what they’ll do with it.
User data is a valuable asset but it’s one that belongs to its subject – that’s you. Without wishing to wander too far off topic, Mark’s plan also hooks in nicely with the wider trend away from old-fashioned CRM (‘Customer Relationship Management’) to its much groovier, network-native successor VRM (‘Vendor Relationship Management’). In a VRM world your personal data is your own and you share it only with those you trust: VRM systems will allow you to rent your data to businesses who want to sell you stuff and withdraw it whenever you feel like it. It’s appropriate for the BBC to build a user-centric, VRM-style data infrastructure.
Speech radio archive
Mark also told me about his vision for the speech radio archive: for the enormous bank of great speech radio that goes out mainly on Radio 4 but also in other corners of the BBC, including local radio. He sees this archive as one of the glories of the Corporation and wants to see it made more widely available to licence fee-payers without the arbitrary restriction of the seven-day window.
There are significant barriers to achieving a universal speech radio archive: not least the multiple, overlapping rights of programme creators and BBC Worldwide’s interest in the more commercial output – which is mostly comedy, spoken word and drama. Mark sees these as significant but surmountable and points to the deals that have been done in other parts of the Corporation to secure blanket clearance from rights owners (BBC News has such an arrangement with contributors to news output).
In any case, I think the most likely outcome of a concerted effort to open up BBC speech radio is a sort of patchwork audio archive with some items ‘greyed out’ and only available commercially but the majority available for playback in perpetuity: the kind of pragmatic arrangement that only a free content zealot could object to. Bring it on.
I’ve been watching the Electric Proms on TV. The whole thing turns out to be very fine: another example of what you get when you crunch together the BBC’s guaranteed audience, unparalleled cultural clout and production values to die for. Would any artist in the world have refused the opportunity to appear?
So I’m left with a few questions: I suspect that some of my readers may be able to help with these:
- Who owns the rights to the concert recordings? Were they purchased on terms that will allow the BBC to make further use of them?
- Will they be available to licence fee-payers in perpetuity in some kind of archive or will they be subject to the seven-day catch-up window like last year’s? [answering my own question: this press release says the seven day window applies].
- Was there any discussion of trying out new licensing methods, or was the old recording industry model the only one on offer? Did the BBC consider using its influence to encourage more permissive/open arrangements?
- Has any of the secondary material—the BBC New Music Shorts, for example—been commissioned on a CC basis or similar?
- Does anyone care about all this? The concerts were great (so far I’ve loved The Streets and Nitin Sawhney best), but would anyone be bothered if they disappeared into a vault next week? Should we just accept that the old-world rights regime that’s so transparently broken elsewhere should apply here?
- If this year’s concerts go the way of last year’s and disappear in seven days, would the BBC consider trying a different model next year: explicitly advancing a CC-based festival, for instance. The Open Proms, maybe?
…with Mark Friend about attention data, syndication and opening the BBC’s huge archive of speech radio.
…producer based in Jersey suggests on the internal Yammer system that the corporation’s libel course be published on the website. He’s weary of deleting potentially libellous comments from the general public and thinks that encouraging people to read the BBC’s staff libel course might help…
…about cataloguing the BBC’s efforts at openness, using something like the ‘programmes ontology’ developed by Tom Scott? I want to leave behind something that might have some formal value: an organised catalogue of assets, content, code and activities that have been explicitly shared with the outside world would be handy, wouldn’t it?
Interesting chat in the Broadcast Centre cafeteria with Lucy Hooberman from BBC Research & Innovation (which is part of Future Media & Technology). She’s not an engineer: she’s one of the people in the organisation responsible for stimulating innovation and change. We talked about Common Platform. She’s been wondering what it is. Specifically, is it:
- A neutral record of change at the BBC?
- A campaign?
- A programme of activity aimed at driving that change?
Could it be all three?
Lucy also wondered how it might relate to the campaigning style of some of the proponents of Ofcom’s now-defunct Public Service Publisher concept a couple of years ago (I participated in Ofcom’s mock tender process for the PSP—fascinating). I think she’s warning me that campaigning for things is a good way to alienate people—even if the idea is the right one—especially at a complicated place like the BBC and sometimes in the wider industry.
Point taken, Lucy. As to those questions: I hope to record what I learn (confidentiality permitting). I also have a fairly explicit ‘agenda’: I believe the BBC should open up to the people and communities that fund it and that the Corporation has an enabling role to play in the economy and society (so I guess it’s a campaign of sorts). And I want to set up and leave behind some interesting activity, something that’ll go on working after I’ve gone. All three then!