[ Originally written for Kuliza Technologies on June 14th, 2011 ]
Did the three big companies make the right decision in introducing Schema.org?
The Semantic Web is a web of information that is marked up with
machine-understandable metadata in addition to the human-readable web
content. Recently, Google, Yahoo and Microsoft collaborated to launch
Schema.org, their privately hosted semantic mark-up vocabulary.
This launch has been a hot topic of discussion in the Semantic Web
community, mainly because of the syntax the three companies chose for
the vocabulary. The major issue is that the terms in Schema.org are
expressed in Microdata syntax, as opposed to the currently popular RDFa
serialization of RDF. I am currently contributing open-source code to
the Semantic Web community through my project, which involves creating
an RDF vocabulary publishing platform, so I might appear a bit biased
towards RDFa over Microdata here.
A bit of history -
RDF is a knowledge representation framework that encodes data as
subject-predicate-object triples. When you combine triples, they form
graphs. Initially, the RDF/XML serialization format was used for
semantic mark-up, and it kept the semantic mark-up separate from the
HTML content. Over time, the Microformat syntax emerged, in which the
semantic metadata was integrated into the HTML itself. RDFa is another
serialization of RDF that follows the same approach as Microformats,
i.e., it embeds the metadata in the HTML content. Microdata is a set of
attributes, introduced with HTML5, which claims to improve upon RDFa.
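To make the triple model concrete, here is a minimal Python sketch of how
subject-predicate-object triples combine into a graph. The example.org
identifiers and the helper names (build_graph, employer_name) are made up
for illustration; only the schema.org property names come from the
vocabulary itself.

```python
from collections import defaultdict

# Hypothetical data: each tuple is one (subject, predicate, object) triple.
triples = [
    ("http://example.org/alice", "http://schema.org/name", "Alice"),
    ("http://example.org/alice", "http://schema.org/worksFor", "http://example.org/acme"),
    ("http://example.org/acme", "http://schema.org/name", "Acme Corp"),
]


def build_graph(triples):
    """Index triples by subject, so each subject becomes a node with outgoing edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


def employer_name(graph, person):
    """Follow worksFor, then name: a two-hop walk that only works once triples are combined."""
    for predicate, obj in graph.get(person, []):
        if predicate == "http://schema.org/worksFor":
            for predicate2, obj2 in graph.get(obj, []):
                if predicate2 == "http://schema.org/name":
                    return obj2
    return None


graph = build_graph(triples)
print(employer_name(graph, "http://example.org/alice"))
```

The second triple's object is the third triple's subject, and that shared
URI is what links the separate statements into a graph.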
An important thing to note here is that both RDFa and Microdata are
syntaxes. Both are Entity-Attribute-Value models that support using
URIs as universal identifiers. There also exists an algorithm for
converting Microdata to RDF. Schema.org, on the other hand, is a
vocabulary. A vocabulary has terms, which can be specified in any
syntax. The Schema.org terms were originally specified in Microdata
syntax.
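As a rough illustration of the syntax/vocabulary split, the sketch below
marks up the same Schema.org Person once in Microdata and once in RDFa
(using the RDFa 1.1 vocab attribute), then extracts the property-value
pairs from each. The sample person and the PropertyExtractor helper are
assumptions for this example. It is a deliberately simplified reader for
flat items with text values, not a conformant Microdata or RDFa parser.

```python
from html.parser import HTMLParser

# The same hypothetical item, expressed in two different syntaxes.
MICRODATA = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Alice</span>
  <span itemprop="jobTitle">Editor</span>
</div>
"""

RDFA = """
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Alice</span>
  <span property="jobTitle">Editor</span>
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs for a single property attribute name.

    Simplification: ignores nesting, URLs in href/src, and datatypes.
    """

    def __init__(self, prop_attr):
        super().__init__()
        self.prop_attr = prop_attr  # "itemprop" for Microdata, "property" for RDFa
        self.current = None         # property name of the element being read, if any
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        self.current = dict(attrs).get(self.prop_attr)

    def handle_data(self, data):
        if self.current and data.strip():
            self.pairs.append((self.current, data.strip()))
            self.current = None

    def handle_endtag(self, tag):
        self.current = None


def extract(html, prop_attr):
    parser = PropertyExtractor(prop_attr)
    parser.feed(html)
    parser.close()
    return parser.pairs


print(extract(MICRODATA, "itemprop"))
print(extract(RDFA, "property"))
```

Both calls yield the same name and jobTitle pairs: the vocabulary terms are
identical, and only the attributes that carry them differ.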
Can’t we just specify all the terms in RDFa syntax and continue using them?
The answer is yes, and as a matter of fact, the work is already in
progress as I write this post. Two people from the RDFa community,
Richard Cyganiak (my Google Summer of Code 2011 mentor) and Michael
Hausenblas, have developed an RDFS definition for the terms of
Schema.org and hosted it at
http://schema.rdfs.org/.
So what is the issue here?
Google has asked the web community to use either Microdata or RDFa, but not both on the same page, since mixing the syntaxes confuses its parsers:
“While it’s OK to use the new schema.org mark-up or continue to
use existing Microformat or RDFa mark-up, you should avoid mixing the
formats together on the same web page, as this can confuse our parsers.”
… “If you have already done mark-up and it is already being used by
Google, Microsoft, or Yahoo!, the mark-up format will continue to be
supported. Changing to the new mark-up format could be helpful over time
because you will be switching to a standard that is accepted across all
three companies, but you don’t have to do it.”
And then it adds:
“We will also be monitoring the web for RDFa and Microformat
adoption and if they pick up, we will look into supporting these
syntaxes.”
This sounds as if Google is pushing developers who care about SEO to
start using the Microdata syntax, a standard that is not in much use
yet, because it gets a sort of priority in Google’s parsing algorithms.
This takes away developers’ freedom to choose whatever syntax works
best for them. Although RDFa is a bit more complex than Microdata, it
covers more use cases, and some developers might be more comfortable
using it.
A few years ago, the web developer community was reluctant to mark up
its web content semantically. The Semantic Web community worked hard to
make web developers understand the future benefits of having linked
data all over the web. As a result, many developers slowly started
using RDFa and Microformats, and a recent survey showed that 4% of
websites used RDFa, which is more than any other syntax. See
http://tripletalk.files.wordpress.com/2011/01/rdfa-deployment.png for the comparison.
RDFa is being used by Drupal 7, Facebook’s Open Graph Protocol (OGP),
Best Buy, the e-commerce sites that use the GoodRelations vocabulary,
and many more major deployments globally.
And now Schema.org asks them to learn yet another syntax. Let’s face
it: if Google, Microsoft and Yahoo declare that they will support only
Microdata for parsing content on the web, most web developers, who are
mainly looking for SEO, will definitely follow. This would adversely
affect the growth of RDFa deployments.
Thus, a large portion of the Semantic Web community is not happy with
the decision. Some believe that the vocabulary provided by Schema.org
won’t suffice if you want to cover complex domains, since it is not
extensible.
Another matter of concern is that the W3C does not seem to have been
consulted at all while Schema.org was developed. Commercialization of
standards is never a good thing, and that’s what Schema.org does. In
fact, Manu Sporny, chairperson of the RDFa group at the W3C, has been
very vocal in opposing Schema.org, going so far as to say that he would
soon start a revolution against “The false choice” of using Microdata
in Schema.org. I have been following him on Twitter, where he has been
gathering support to put pressure on the three big companies. He
also believes that “Microdata doesn’t scale as easily as RDFa – early
successes will be followed by stagnation and vocabulary lock-in.”
The solutions -
The most obvious solution to this problem is for Google, Bing and
Yahoo to announce that they will treat RDFa and Microdata with equal
priority in their parsing algorithms.
Bing has already stated that it can parse a page that includes
multiple syntaxes. Google’s parsers, however, can’t do this, and Google
needs to add this capability to its parsing algorithms as soon as
possible.
However…
Schema.org does seem to have created a lot of negative buzz, but let’s
not forget that some kind of vocabulary standardization like this was
long overdue. Currently, for lack of a definite standard, it is
difficult for developers to decide which vocabulary to use for mark-up.
Schema.org does solve this problem, and it makes life easier for
developers as well as for search engines. As Google states:
“Creating a schema supported by all the major search engines
makes it easier for webmasters to add mark-up, which makes it easier for
search engines to create rich search features for users.”