
The Semantic Web

As with most of my posts, I will start discussing the Semantic Web at a very high level, geared towards gaining an overall understanding of what it means, why it’s relevant and what its implications are. Future posts will discuss where it is headed and the associated technologies.

In its simplest form, the Semantic Web is about giving meaning to the vast amount of data that lies on the Internet and, more importantly, making that data machine readable. Today the data makes sense only to humans; the goal is for machines to infer meaning from it and associate it with other data in order to draw conclusions or provide new services. Imagine that you are about to take a vacation. What do you do today? You go to Google, type in “vacation packages”, click on a few links to see what these packages entail, sort and search through the ones that fit your budget and then decide if any of the suggested packages is something that you’re interested in. Now imagine if there was an easy way to just tell the Internet, “find me the best vacation package under $3000”, and somehow the Internet knew enough about you to know where you’ve been recently (and hence not to present those destinations to you), what your general likes and dislikes are and, given a budget of less than $3000, which current packages make the most sense for you. Now that would be something, wouldn’t it? In essence, that is the power of the Semantic Web: drawing conclusions, forming relationships from given data sets and arriving at results.

Of course, this is a 10,000 ft view of the Semantic Web, so the above description is overly simplified. In reality, there are many issues associated with inferring relationships and arriving at conclusions. Take for example:

  • Suman Chaudhuri is the owner of this blog
  • The owner of this blog lives in Michigan

From these two statements, one can infer that Suman Chaudhuri (me) lives in Michigan. You (and machines) could not have known this by just processing each statement on its own, but combined, this conclusion can be reached. That is the power of the Semantic Web, and the more that machines can process and infer, the more powerful the Web grows. However, as I mentioned, it’s not always that simple. Take for example:

  • Suman lives in New York.
  • People from New York speak with a New York accent.

This would mean that Suman speaks with a New York accent. But the truth is that I was raised all over the world (England, the States, etc.) and although I do have an American accent, it is by no means close to a New York accent. The problem is that the second statement is a generalization, and a machine applying it literally has no way of knowing that it does not hold for every individual.
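To make the two examples concrete, here is a minimal sketch in Python. It is not how any real Semantic Web reasoner works; the triple layout, the predicate names and both rule functions are my own assumptions for illustration. It simply shows that the same mechanical rule-following produces a correct conclusion in the first case and a wrong one in the second.

```python
# Facts as (subject, predicate, object) triples, mirroring the first example.
# The predicate wording ("is", "lives in") is made up for this sketch.
facts = {
    ("Suman Chaudhuri", "is", "the owner of this blog"),
    ("the owner of this blog", "lives in", "Michigan"),
}

def infer_identity(triples):
    """If A is B, copy every other statement about B onto A (single pass)."""
    derived = set(triples)
    for a, pred, b in triples:
        if pred != "is":
            continue
        for subj, p, obj in triples:
            if subj == b and p != "is":
                derived.add((a, p, obj))
    return derived

def apply_generalization(triples):
    """Hypothetical rule: anyone who lives in New York has a New York accent.

    The rule is applied to every matching person, which is exactly why it
    reaches a wrong conclusion for people the generalization does not fit.
    """
    derived = set(triples)
    for subj, pred, obj in triples:
        if pred == "lives in" and obj == "New York":
            derived.add((subj, "speaks with", "a New York accent"))
    return derived

blog_conclusions = infer_identity(facts)
accent_conclusions = apply_generalization({("Suman", "lives in", "New York")})
```

After running this, `blog_conclusions` contains the new statement that Suman Chaudhuri lives in Michigan, while `accent_conclusions` contains the (false) statement that Suman speaks with a New York accent. The code cannot tell the difference between the two; only the quality of the rules and the data can.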

But I digress. This article is a high-level discussion of the Semantic Web. So what is needed to get to the next generation of the web, the so-called “Web 3.0”, where machines can understand and process data, make inferences and build on the idea of collaboration and social media (currently termed Web 2.0)? The data needs to be described, categorized and annotated. Meta-data is the key to the Semantic Web, as it is to SOA. So are ontologies: data models that describe the concepts related to a domain, along with the relationships that exist between those concepts.
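As a rough illustration of what “concepts plus relationships” can look like, here is a toy ontology for the vacation example, sketched in Python. Every concept and relationship name in it is hypothetical, and a real ontology language expresses far more than this; the point is only that broader concepts pass their relationships down to narrower ones.

```python
# Hypothetical mini-ontology for a travel domain.
# Concept -> its broader (parent) concept.
subclass_of = {
    "BeachResort": "VacationPackage",
    "SkiTrip": "VacationPackage",
    "VacationPackage": "Product",
}

# Relationship name -> (concept it applies to, concept it points at).
relationships = {
    "hasDestination": ("VacationPackage", "Place"),
    "hasPrice": ("Product", "Money"),
}

def ancestors(concept):
    """Walk up the subclass chain and return every broader concept in order."""
    found = []
    while concept in subclass_of:
        concept = subclass_of[concept]
        found.append(concept)
    return found

def applicable_relationships(concept):
    """Relationships a concept has directly or inherits from broader concepts."""
    lineage = [concept] + ancestors(concept)
    return sorted(
        name for name, (domain, _) in relationships.items() if domain in lineage
    )
```

With these definitions, asking for `applicable_relationships("SkiTrip")` yields both `hasDestination` and `hasPrice`, even though neither was stated about ski trips directly. That kind of inherited knowledge is what lets a machine treat a ski trip as something with a price and a destination.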

Two specifications that are important in this space are the Resource Description Framework (RDF) and the Web Ontology Language (OWL). RDF is a data model, commonly written in an XML syntax (RDF/XML), that describes relationships between resources as subject-predicate-object statements. OWL is used to describe ontologies. However, in my opinion, both RDF and OWL are pretty complex, and that is one of the reasons why adoption of the Semantic Web has been slow. To address this, people have come up with microformats. The idea is to embed meta-data directly into HTML and allow microformat-aware browsers to deduce information about the concepts within a domain. For example, you might have come across the hCard microformat, which captures a person’s contact information (first name, last name, phone, etc.) using class attributes on ordinary HTML tags. It is not as powerful as RDF or OWL, but its popularity lies in its simplicity. Having said that, there are a few drawbacks. Firstly, the onus is on us to categorize data using the right format. Secondly, the existing microformats come nowhere close to covering the myriad kinds of information that currently exist.
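To show how little machinery a microformat needs, here is a small sketch that pulls contact details out of an hCard using only Python’s standard `html.parser` module. The sample markup, the name and the phone number are made up, and the parser is a deliberate simplification: it handles only flat properties and ignores the nesting rules that a full hCard parser would follow.

```python
from html.parser import HTMLParser

# Hypothetical hCard snippet; the person and number are invented for illustration.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Doe</span>
  <span class="tel">555-0100</span>
</div>
"""

class HCardParser(HTMLParser):
    """Collect the text of elements whose class is a known hCard property."""

    # A small subset of hCard property names.
    PROPERTIES = {"fn", "tel", "email", "org"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # hCard property whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        matches = [c for c in classes if c in self.PROPERTIES]
        self._current = matches[0] if matches else None

    def handle_data(self, data):
        # Store the first non-blank text that follows a matching tag.
        if self._current and data.strip():
            self.card[self._current] = data.strip()
            self._current = None

parser = HCardParser()
parser.feed(SAMPLE)
card = parser.card
```

Here `card` ends up holding the formatted name under `fn` and the phone number under `tel`. Notice that the page still renders as normal HTML for a human reader; the class names are the only meta-data added.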

So we still have a long way to go before the world’s information is categorized and “Web 3.0”, or whatever it is called by then, comes to life. But the promise of a semantic web is a very powerful and intriguing concept nonetheless.

December 31, 2007 at 12:41 am 1 comment
