Learning LOD

For a side summer project, I’ve taken on figuring out how to obtain and work with Linked Open Data (LOD). The Yale Center for British Art transformed their collection data into LOD a couple of years back, so they provide a nice local source for the data. As far as I can tell right now, the project entails learning (just enough of) SPARQL and RDF parsing, and possibly improving my Gephi or Cytoscape chops. The third part, of course, is figuring out what can be done with YCBA’s data and learning how to do it.

Off the cuff, I think these are the most relevant pre-existing factors affecting whether this post is comprehensible and useful for you:

  • MacBook Pro from late 2014; 2.2 GHz processor, 16 GB RAM
  • Python 2.7.6
  • pip 7.0.1
  • Experience writing code as well as working with relational databases and SQL
  • Experience administering a server
  • Perhaps most importantly, before this week I spent a little time with Bob DuCharme’s Learning SPARQL from O’Reilly. Also possibly available at a library near you.

What I did today to get meaningful results:
First I needed to give my Python the capability of addressing a SPARQL endpoint and parsing the results.
sudo pip install rdflib  # this also installs isodate, SPARQLWrapper, and html5lib
sudo pip install simplejson
Then I executed this Python fragment from Semantic Web:

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label
    WHERE { 
      <http://dbpedia.org/resource/Asturias> 
      rdfs:label
      ?label .
    }
""")

sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print(result["label"]["value"])

And got back these results:

Asturias
منطقة أستورياس
Asturien
Asturias
Asturies
Asturie
アストゥリアス州
Asturië (regio)
Asturia
Astúrias
Астурия
阿斯图里亚斯
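
The list above has one label per language, but the loop only prints the text. In the SPARQL JSON results format, a literal can also carry its language tag under an "xml:lang" key, so a rough sketch of pairing each label with its language might look like the following. The sample_results dictionary is hand-written for illustration (it is not a captured DBpedia response), and labels_by_language is a hypothetical helper name of my own.

```python
# Hand-written sample in the shape of the SPARQL JSON results format.
# It is an illustration only, not a captured DBpedia response.
sample_results = {
    "results": {
        "bindings": [
            {"label": {"type": "literal", "xml:lang": "en", "value": "Asturias"}},
            {"label": {"type": "literal", "xml:lang": "de", "value": "Asturien"}},
            {"label": {"type": "literal", "xml:lang": "it", "value": "Asturie"}},
        ]
    }
}


def labels_by_language(results, variable="label"):
    """Map each language tag to its label; untagged literals go under ''."""
    labels = {}
    for binding in results["results"]["bindings"]:
        term = binding[variable]
        # "xml:lang" is absent for literals without a language tag.
        labels[term.get("xml:lang", "")] = term["value"]
    return labels


# Print the labels sorted by language tag.
for lang, label in sorted(labels_by_language(sample_results).items()):
    print("%s: %s" % (lang, label))
```

Passing the real results object from the query above in place of sample_results should work the same way, assuming the endpoint returns language tags with the labels.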

With that in hand, I played around with the SPARQL query a bit to get other parts of the DBpedia entry (after a brief visit to Wikipedia to learn a bit more about Asturias).
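
As one example of that kind of playing around, here is a minimal sketch that asks for every property and value attached to the resource instead of just its labels. The helper names build_property_query and run_query and the limit parameter are my own assumptions for illustration; the SPARQLWrapper calls are the same ones used in the fragment above.

```python
ASTURIAS = "http://dbpedia.org/resource/Asturias"


def build_property_query(resource_uri, limit=25):
    """Return a SPARQL query listing property/value pairs for one resource."""
    return (
        "SELECT ?property ?value\n"
        "WHERE {\n"
        "  <%s> ?property ?value .\n"
        "}\n"
        "LIMIT %d"
    ) % (resource_uri, int(limit))


def run_query(query, endpoint="http://dbpedia.org/sparql"):
    """Send the query to the endpoint and return the list of result bindings."""
    # Imported here so build_property_query can be tried without the library.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()["results"]["bindings"]


# Only builds and prints the query text; nothing is sent over the network here.
print(build_property_query(ASTURIAS, limit=10))
```

Calling run_query(build_property_query(ASTURIAS, limit=10)) should return a list of bindings, each with a "property" and a "value" entry that can be read the same way as "label" in the loop above. The LIMIT clause is there only to keep the output manageable while exploring.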

This feels good enough for today. The next thing, I think, is to press on some more with Learning SPARQL and see what else I can ask DBpedia.

Update 12 June: Fixed omission of sudo before second package install. Added that I have Pip, and where to get it.