I have been very busy lately, with no time to write. But I attended some great seminars in the meantime, such as Pierre De Wilde from TinkerPop talking at the GBI about the Property Graph model. He explained how we can build and query information in a graph database (‘traverse a graph’).
A graph database is composed of vertices and edges. Each vertex has a name (id) and some properties. Edges also have a name and can have properties and, because they have a direction, each edge has one identified outgoing vertex and one identified incoming vertex. In this example, the question
Who worked with author 1 on any of his books?
will look like this in the traversal language Gremlin:
This can be read as: take the graph, look up vertex 1 and follow all outgoing edges named ‘created’. From all the vertices you have reached, follow the incoming edges named ‘created’ and return the names (ids) of the vertices you reach. The result is 1, 6 and 4. Note that author 1 appears in his own result, because the traversal walks back along every ‘created’ edge, including his own.
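To make the two steps concrete, here is a minimal sketch in Python of the same traversal over a toy in-memory property graph. It is not Gremlin and not how TinkerPop is implemented. The class `PropertyGraph`, its methods `out` and `in_`, and the sample data (authors 1, 4 and 6, books 3 and 5) are all assumptions, chosen only so that the result matches the one above.

```python
class PropertyGraph:
    """A toy property graph: vertices with properties, directed labelled edges."""

    def __init__(self):
        self.vertices = {}  # vertex id -> dict of properties
        self.edges = []     # list of (tail id, label, head id, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, tail, label, head, **props):
        self.edges.append((tail, label, head, props))

    def out(self, vids, label):
        """From each vertex in vids, follow outgoing edges with this label."""
        return [h for v in vids
                for (t, l, h, _) in self.edges if t == v and l == label]

    def in_(self, vids, label):
        """From each vertex in vids, follow incoming edges with this label."""
        return [t for v in vids
                for (t, l, h, _) in self.edges if h == v and l == label]


# Hypothetical sample data: three authors and two books.
g = PropertyGraph()
for author in (1, 4, 6):
    g.add_vertex(author, kind="author")
for book in (3, 5):
    g.add_vertex(book, kind="book")
g.add_edge(1, "created", 3)
g.add_edge(6, "created", 3)
g.add_edge(4, "created", 3)
g.add_edge(4, "created", 5)

# Step 1: the books created by author 1.
books = g.out([1], "created")
# Step 2: everyone who created one of those books.
coauthors = g.in_(books, "created")
print(coauthors)  # [1, 6, 4]
```

Filtering author 1 out of `coauthors` would give only the other collaborators. The traversal described above does not do that, which is why 1 is part of the result.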
The DBpedia project is a great example of a graph database. It began in 2007 and now links more than 3.64 million things from Wikipedia data in a graph. For the metadata to be navigable, it has to be cleaned and standardised. What do I mean? If you have a property called ‘place of birth’ and another called ‘birth place’, you know they mean the same thing, and you would like both properties to be traversed when someone looks for the natives of a particular city or country.
There are some initiatives, like the ‘freeyourmetadata’ movement, that encourage you to publish your metadata in the best possible shape, so that you help build graphs that can be easily and fully traversed. DBpedia has to deal with the data that is already there in Wikipedia, so they created an ontology to tackle this quality problem and let you find results even through synonyms.
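The ‘place of birth’ versus ‘birth place’ problem can be sketched in a few lines of Python. This only illustrates the idea of mapping synonyms to one canonical property before querying. It is not how DBpedia's ontology mapping works, and the `CANONICAL` table, the function names and the sample records are all made up for the example.

```python
# Hypothetical synonym table: every spelling maps to one canonical property name.
CANONICAL = {
    "place of birth": "birthPlace",
    "birth place": "birthPlace",
}


def normalise(record):
    """Return a copy of the record with known synonyms renamed."""
    return {CANONICAL.get(key.strip().lower(), key): value
            for key, value in record.items()}


def natives(records, city):
    """Names of everyone whose (normalised) birth place is the given city."""
    return [r["name"] for r in map(normalise, records)
            if r.get("birthPlace") == city]


# Made-up records that use two different spellings for the same property.
people = [
    {"name": "A", "place of birth": "Ghent"},
    {"name": "B", "birth place": "Ghent"},
    {"name": "C", "birth place": "Lyon"},
]

print(natives(people, "Ghent"))  # ['A', 'B']
```

Without the normalisation step, a query on just one of the two spellings would silently miss either A or B.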
I particularly liked his mention of Tim Berners-Lee’s vision of Linked Data:
Internet = net of computers
World Wide Web = web of documents
… and the next step is
Giant Global Graph = graph of metadata
… also called the semantic web, open and linked data