SXSW 2005 Panel: The Semantic Web

Friday, March 18, 2005

[This is a (mostly complete) transcript of Mike Linksvayer, David Galbraith, Cameron Marlow, Steve Champeon, and Matthew Haughey talking about using data and research to improve design. Official details about the panel here.]

Haughey: [Details about the Creative Commons party.] This panel’s semantic web. For the past three years I’ve worked on a project with a bunch of people making a semantic web application. I’ve been a skeptic ever since then. Three years to do a tiny piece, so how’s the entire web going to be prescribed by really intense standards?

Haughey: I want to start off really quick. I pitched this as… I don’t do much semantic web stuff. Barely read. It’s a crazy world and there are so many standards on top of standards. But. [Gives definition really fast… Ack.] I also run Metafilter. I have tons of content. Millions of records in a database. I’m sort of a skeptic, I’ll say it out front. I just wanted to assemble some people and see what their thoughts were. We’ll start with introductions and a quick statement.

Galbraith: I was interested in tags and how they might enable some of the things in RSS. I’m a skeptic, as well. RDF is very old and there’s not a single RSS reader that will read metadata. Nor a simple way to create an RSS module without sitting down in a committee. Why can’t I filter stuff in an aggregator by price? Tagging. That seems like something people are willing to do. If I define a tag in Flickr or whatever. Tag it Apple. Is that the company? Fruit? So I was interested in letting people use name=value pairs. Fruit=apple. And that’s a mess. But if a bunch of people start tagging things intelligently, then maybe that tag will be bumped up into an RSS module. So I’m interested in the bottom-up approach.
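
[Editor’s sketch, not from the panel: one way the "fruit=apple" idea could work in practice. Tags are split into name=value pairs, and a name becomes a candidate for promotion into a module once enough distinct users have adopted it. The function names, the sample data, and the threshold of three users are all assumptions made for illustration.]

```python
def parse_tag(tag):
    """Split a 'name=value' tag into (name, value); plain tags get name None."""
    name, sep, value = tag.partition("=")
    if sep and name.strip() and value.strip():
        return name.strip().lower(), value.strip().lower()
    return None, tag.strip().lower()


def promotable_names(tagged_items, min_users=3):
    """Return tag names used by at least min_users distinct users (hypothetical rule)."""
    users_by_name = {}
    for user, tags in tagged_items:
        for tag in tags:
            name, _value = parse_tag(tag)
            if name is not None:
                users_by_name.setdefault(name, set()).add(user)
    return sorted(n for n, users in users_by_name.items() if len(users) >= min_users)


# Made-up sample data: (user, tags that user applied to some item).
items = [
    ("alice", ["fruit=apple", "red"]),
    ("bob", ["company=apple", "fruit=pear"]),
    ("carol", ["fruit=apple"]),
    ("dave", ["Fruit=Banana", "price=2"]),
]

print(promotable_names(items))
```

[Here only "fruit" clears the threshold. A bare "Apple" tag stays ambiguous, while "fruit=apple" and "company=apple" carry their own context.]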

Marlow: I’d love to say I’m not a skeptic. But my interest comes from my background in the field formerly known as knowledge representation, now known as the semantic web. It’s really building upon a whole literature of knowledge representation. Which never solved all the problems it was supposed to. I’m a skeptic because I’ve never really seen an application that takes advantage of this sort of knowledge. As a tool developer, I’m constantly looking for uses for people’s data. My conception of the SW is that it’s mostly defined by data that exists that computers can take advantage of. The reality is that it’s the glue that holds these different pieces of information together so the computer can make inferences about how to deal with them.

Linksvayer: Besides working at CC. I also did a thing called BT [?]. I wanted to draw parallels between SW and AI. AI was to produce intelligent machines. SW tech has been around for a relatively short period of time. May be used all over, but not recognized as such. Java. Applets would revolutionize the web. People did make many applets, but Java has been hugely successful as the 21st century version of COBOL. [Etc.] SW may be following a common path. Initial failure to succeed in the “sexy” domain, but useful for business purposes. Oracle has included a kind of SW in their recent products. [This is hard to follow, what he’s saying. Sorry.] But. There are reasons to think SW might be different. If you’re looking at data on the web, what is it but a giant data integration problem? A good example is MSpace. You can see that they’re using RDF to integrate data from a bunch of different sources. Users don’t see that, but it’s there. They’re pulling in a Classical music ontology by someone else from, where you’ll find dozens of schemas and RDF datasets developed by people from all over, detailing beer, wine, veg food, and stuff you would put in your mouth. [Matt shows MSpace browser on the screen.] By design the SW is meant to be decentralized and not controlled by the W3C.
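
[Editor’s sketch, not from the panel: a toy illustration of why RDF-style data suits integration. Each source publishes (subject, predicate, object) triples. Because both sources use the same identifier for the composer, merging them is just a set union, and a query can then follow a link from one source into the other. The example.org identifiers, the predicate names, and the helper functions are assumptions; this is plain Python, not a real RDF library and not how MSpace is implemented.]

```python
# Source A: a hypothetical catalogue of musical works.
source_a = {
    ("http://example.org/work/1", "title", "Symphony No. 5"),
    ("http://example.org/work/1", "composer", "http://example.org/person/beethoven"),
}

# Source B: a hypothetical, independently published dataset about people.
source_b = {
    ("http://example.org/person/beethoven", "name", "Ludwig van Beethoven"),
    ("http://example.org/person/beethoven", "born", "1770"),
}


def merge(*sources):
    """Merging triple sets is a plain union; shared identifiers line up for free."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def objects(graph, subject, predicate):
    """All objects for a given subject and predicate, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def composer_names(graph, work):
    """Follow work -> composer (from source A), then composer -> name (from source B)."""
    names = []
    for composer in objects(graph, work, "composer"):
        names.extend(objects(graph, composer, "name"))
    return names


graph = merge(source_a, source_b)
print(composer_names(graph, "http://example.org/work/1"))
```

[Neither source alone can answer "what is the name of the composer of work 1"; the merged graph can.]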

Champeon: I got into this business in 1993. Unemployed liberal arts major. Etc. I stumbled into SGML-based document conversion on the day I was laid off from the t-shirt printer. Which was great. They were trying to land some contract. We were tagging court data so it could be searched upon in various ways and I was really impressed by this because people put a lot of thought into these rules and how you could add structure and meaning to this information. We thought it would never last because it was too simple and generic. SGML was too complex, XML fixed much of that. I couldn’t get away from the idea that you should be able to mark up any language. What we got at first were people using the tags wrong, trying to lay out their page somehow and just screwing up. So they started talking about SW and I thought: Cool. They’ve fixed a lot of the problems. And then it started to get muddy. On the issue of what he was saying about “Apple.” I have an iPod. And there’s an integration kit for iPod and Mini, the car. Apple introduces the mini. Then the Mac mini. So you can’t find the Mini iPod integration kit anymore. So if the tech doesn’t give you a way to apply context, you’ll have trouble attaching meaning to a document in a larger context. Because the meaning of words evolves over time, as well. Context dictates the semantics you’re going to apply to any given chunk of data. So all of these further complexifications don’t tackle the core problem.

Haughey: I wanted to throw out a few topics. Semantic Web (upper case): A rich data description. Semantic web (lower-case): Adding meta-data to information. So I want to ask: Is lower-case SW a gateway drug to upper-case SW?

Galbraith: Yes, absolutely. RDF is defined as a data model. There’s nothing to say that in the grassroots sense things might gravitate towards a similar model. In a Darwinian sort of way. Initially the W3C did a very good job with some real successes with standards like HTML, but I think all the XHTML/CSS stuff is bullshit. There’s a fundamental structural problem. In SW, I think the premise is right.

Marlow: Riffing off of this, the web as it stands is a semantic web. Language. The reason we’re making this parallel web is because we think computers are too stupid to understand this language. But there are companies out there making leaps and bounds actually reading this language that people write. Just because the computer can’t understand it now isn’t a limitation on the data, it’s a limitation of the computer.

Linksvayer: I wanted to spell out something that keeps coming up. If you want to use the SW it requires going through a committee. That’s not the case at all. It’s designed to be decentralized. So the lower-case sw might be a good way of bootstrapping into integrations with the upper-case SW.

Haughey: I looked around the SW and it’s kind of born of an academic world. Do you think it’s being held back for lack of money?

Champeon: SGML was a defence dept thing that came out of IBM. People were involved because there was so much stuff to convert and they wanted to make a million dollars. When HTML hit, a lot of them went to the W3C with their pet project and ended up working on some extension or related technology that evolved from HTML. Or something.

Galbraith: The interesting stuff is happening in the enterprise at the moment. There’s an XML database that would make something like Technorati fantastic. But they’re using it for Boeing who have their own XML documents.

Champeon: Many companies that have lots of data are still the ones using it. Like IBM had ITDOC [?]. […] There’s the pure vision and then the awkward reality.

Haughey: Low-hanging fruit? Quick win?

Galbraith: Taking tags to the next level. Fruit=apple. Then tools to allow people to make their own RSS modules. And an aggregator that uses this information. None do right now. If you want Bloglines to display your RSS feed from a custom module, it’ll be blank. All that data is thrown away.
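
[Editor’s sketch, not from the panel: what an aggregator that keeps custom-module data might do. An RSS module is, in practice, a set of extra elements in their own XML namespace. The snippet reads a price element from a made-up module and filters items by it, which is the "filter by price" case raised earlier. The "shop" namespace URI, the element name, and the feed contents are all hypothetical.]

```python
import xml.etree.ElementTree as ET

# A made-up feed that uses a hypothetical "shop" module for prices.
FEED = """<rss version="2.0" xmlns:shop="http://example.org/shop-module">
  <channel>
    <title>Example listings</title>
    <item><title>Used iPod</title><shop:price>120</shop:price></item>
    <item><title>Mac mini</title><shop:price>499</shop:price></item>
    <item><title>No price given</title></item>
  </channel>
</rss>"""

# Prefix-to-URI mapping used when looking up namespaced elements.
NS = {"shop": "http://example.org/shop-module"}


def items_under(feed_xml, max_price):
    """Return (title, price) for items whose module price is at most max_price."""
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        price_text = item.findtext("shop:price", namespaces=NS)
        if price_text is None:
            continue  # item does not use the module; skip rather than guess
        try:
            price = float(price_text)
        except ValueError:
            continue  # unparseable price; ignore this item
        if price <= max_price:
            matches.append((item.findtext("title"), price))
    return matches


print(items_under(FEED, 200))
```

[A reader that ignores unknown namespaces would show all three items with no price at all; this one returns only the item priced at or under 200.]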

Marlow: I’d probably agree. The big win in the short term is taking the hype and the use of these data structures and creating some way of turning those into something real semantic web applications can use. The people who can do this right now may not be able to actually make the tag definitions. But if you let people at large make these definitions, then you’ll get a much nicer emergent system. Semantic middle-ware.

Linksvayer: The killer app is PageRank. Other things. CC, of course. You can look for works under terms you want. That’s all metadata. Some obvious short-term: decentralized social calendars and events. There’s a desire for those and a lot of work has been done. It just requires someone to build aggregators. And then, of course, data integration in general. That’s the killer app in a broad sense.

Haughey: Chicken-and-egg problems. You’ll come up with a new tag or use, then you have to get thousands of instances of that sitting around and then someone will make an app. Right? Doesn’t always really happen. CC, we just made the tool ourselves. Movable Type and TypePad automatically spit out RDF data. Why hasn’t someone built a search engine around that?

Marlow: I collect all that data on Blogdex. I try to avoid search at all costs, but I collect all of that data. And it’s useful. The fields that would be interesting to me, though, are not really filled in. No one uses keywords. I haven’t found any gains from this data I have.

Champeon: HTML works because it’s a layout language, a mark-up language — but it’s pretty useless as far as marking up language. No richness there, semantically. And no reason for someone writing a blog to add semantic information because they don’t really care. […]

Haughey: Last question. Google’s kind of skirted the whole SW. Ever see an SW Google Labs search engine? Or will they ever need the SW?

Linksvayer: Obviously they’re doing the right thing in concentrating on text. But they’ll only do it if there’s a huge amount to gain from it. To a degree they can drive whatever standards they want, if they declare they’re going to use a certain kind of data.

Marlow: And they have a specific way of turning non-textual features into a part of their ranking algorithm. But the core tech of Google is search, so I can’t envision them doing something wild and out-there.

Linksvayer: They can easily allow you to add different sorts of stuff into that one field, as they do.

Haughey: Questions?

[These are a bit too quick to write down clearly. One topic that came up was Dublin Core, which might be worth taking a look at.]

Galbraith: Flickr is an amazing thing. They’ve solved the problem of tagging. There are an average of five tags per item.

[Note: I might alter this transcript in the coming days to remove typos and clarify the content. I post it now just to make it available as soon as possible.]