Rdflib Documentation
rdflib Documentation, Release 5.0.0-dev
RDFLib Team, Nov 01, 2018

Contents
1 Getting started
2 In depth
3 Reference
4 For developers
5 Indices and tables
Python Module Index

RDFLib is a pure Python package for working with RDF. RDFLib contains most things you need to work with RDF, including:

• parsers and serializers for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, RDFa and Microdata.
• a Graph interface which can be backed by any one of a number of Store implementations.
• store implementations for in-memory storage and persistent storage on top of the Berkeley DB.
• a SPARQL 1.1 implementation supporting SPARQL 1.1 Queries and Update statements.

CHAPTER 1 Getting started

If you have never used RDFLib, click through these:

1.1 Getting started with RDFLib

1.1.1 Installation

RDFLib is open source and is maintained in a GitHub repository. RDFLib releases, current and previous, are listed on PyPI.

The best way to install RDFLib is to use pip (sudo as required):

    $ pip install rdflib

Support is available through the rdflib-dev group: http://groups.google.com/group/rdflib-dev and on the IRC channel #rdflib on the freenode.net server.

The primary interface that RDFLib exposes for working with RDF is a Graph. The package uses various Python idioms that offer an appropriate way to introduce RDF to a Python programmer who hasn't worked with RDF before.

RDFLib graphs are not sorted containers; they have ordinary set operations (e.g. add() to add a triple) plus methods that search triples and return them in arbitrary order.

RDFLib graphs also redefine certain built-in Python methods in order to behave in a predictable way; they emulate container types and are best thought of as a set of 3-item triples:

    [
        (subject, predicate, object),
        (subject1, predicate1, object1),
        ...
        (subjectN, predicateN, objectN)
    ]

A tiny usage example:

    import rdflib

    g = rdflib.Graph()
    result = g.parse("http://www.w3.org/People/Berners-Lee/card")

    print("graph has %s statements." % len(g))
    # prints graph has 79 statements.

    for subj, pred, obj in g:
        if (subj, pred, obj) not in g:
            raise Exception("It better be!")

    s = g.serialize(format='n3')

A more extensive example:

    from rdflib import Graph, Literal, BNode, Namespace, RDF, URIRef
    from rdflib.namespace import DC, FOAF

    g = Graph()

    # Create an identifier to use as the subject for Donna.
    donna = BNode()

    # Add triples using store's add method.
    g.add((donna, RDF.type, FOAF.Person))
    g.add((donna, FOAF.nick, Literal("donna", lang="foo")))
    g.add((donna, FOAF.name, Literal("Donna Fales")))
    g.add((donna, FOAF.mbox, URIRef("mailto:[email protected]")))

    # Iterate over triples in store and print them out.
    print("--- printing raw triples ---")
    for s, p, o in g:
        print((s, p, o))

    # For each foaf:Person in the store print out its mbox property.
    print("--- printing mboxes ---")
    for person in g.subjects(RDF.type, FOAF.Person):
        for mbox in g.objects(person, FOAF.mbox):
            print(mbox)

    # Bind a few prefix, namespace pairs for more readable output
    g.bind("dc", DC)
    g.bind("foaf", FOAF)

    print(g.serialize(format='n3'))

Many more examples can be found in the examples folder in the source distribution.

1.2 Loading and saving RDF

1.2.1 Reading an NT file

RDF data has various syntaxes (xml, n3, ntriples, trix, etc.) that you might want to read. The simplest format is ntriples, a line-based format. Create the file demo.nt in the current directory with these two lines:

    <http://bigasterisk.com/foaf.rdf#drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>.
    <http://bigasterisk.com/foaf.rdf#drewp> <http://example.com/says> "Hello world".
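The two-line file described above can also be written from Python, which avoids copy-and-paste line-wrapping problems. This is only a minimal sketch using the standard library; the names DEMO_NT_LINES and write_demo_nt are illustrative and not part of RDFLib.

```python
from pathlib import Path

# The two N-Triples statements from the text, one statement per line.
# DEMO_NT_LINES and write_demo_nt are hypothetical helper names.
DEMO_NT_LINES = [
    '<http://bigasterisk.com/foaf.rdf#drewp> '
    '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://xmlns.com/foaf/0.1/Person>.',
    '<http://bigasterisk.com/foaf.rdf#drewp> '
    '<http://example.com/says> "Hello world".',
]


def write_demo_nt(path="demo.nt"):
    """Write the demo statements to `path` and return it as a Path."""
    target = Path(path)
    target.write_text("\n".join(DEMO_NT_LINES) + "\n", encoding="utf-8")
    return target


demo_path = write_demo_nt()
# Read the file back to confirm it holds one statement per line.
written = demo_path.read_text(encoding="utf-8").splitlines()
print(len(written), "lines written to", demo_path)
```

Each statement must stay on a single line, since ntriples is a line-based format.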
You need to tell RDFLib which format to parse: use the format keyword parameter of parse(), passing either a MIME type or the format name (see the list of available parsers). If you are not sure what format your file will be, you can use rdflib.util.guess_format(), which guesses based on the file extension.

In an interactive Python interpreter, try this:

    from rdflib import Graph

    g = Graph()
    g.parse("demo.nt", format="nt")

    len(g)  # prints 2

    import pprint
    for stmt in g:
        pprint.pprint(stmt)

    # prints:
    (rdflib.term.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
     rdflib.term.URIRef('http://example.com/says'),
     rdflib.term.Literal(u'Hello world'))
    (rdflib.term.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
     rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
     rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person'))

The final lines show how RDFLib represents the two statements in the file. The statements themselves are just length-3 tuples, and the subjects, predicates, and objects are all rdflib types.

1.2.2 Reading remote graphs

Reading graphs from the net is just as easy:

    g.parse("http://bigasterisk.com/foaf.rdf")
    len(g)  # prints 42

The format defaults to xml, which is the common format for .rdf files you'll find on the net.

RDFLib will also happily read RDF from any file-like object, i.e. anything with a .read method.

1.3 Creating RDF triples

1.3.1 Creating Nodes

RDF is a graph where the nodes are URI references, Blank Nodes or Literals, represented in RDFLib by the classes URIRef, BNode, and Literal. URIRefs and BNodes can both be thought of as resources, such as a person, a company, a web-site, etc. A BNode is a node where the exact URI is not known. URIRefs are also used to represent the properties/predicates in the RDF graph. Literals represent attribute values, such as a name, a date, a number, etc.
Nodes can be created by the constructors of the node classes:

    from rdflib import URIRef, BNode, Literal

    bob = URIRef("http://example.org/people/Bob")
    linda = BNode()  # a GUID is generated

    name = Literal('Bob')  # passing a string
    age = Literal(24)  # passing a python int
    height = Literal(76.5)  # passing a python float

Literals can be created from Python objects; this creates data-typed literals. For details on the mapping, see Literals.

For creating many URIRefs in the same namespace, i.e. URIs with the same prefix, RDFLib has the rdflib.namespace.Namespace class:

    from rdflib import Namespace

    n = Namespace("http://example.org/people/")

    n.bob  # = rdflib.term.URIRef(u'http://example.org/people/bob')
    n.eve  # = rdflib.term.URIRef(u'http://example.org/people/eve')

This is very useful for schemas where all properties and classes have the same URI prefix. RDFLib pre-defines Namespaces for the most common RDF schemas:

    from rdflib.namespace import RDF, FOAF

    RDF.type
    # = rdflib.term.URIRef(u'http://www.w3.org/1999/02/22-rdf-syntax-ns#type')

    FOAF.knows
    # = rdflib.term.URIRef(u'http://xmlns.com/foaf/0.1/knows')

1.3.2 Adding Triples

We already saw in Loading and saving RDF how triples can be added with the parse() function.

Triples can also be added with the add() function:

    Graph.add(triple)
        Add a triple with self as context

add() takes a 3-tuple of RDFLib nodes. Try the following with the nodes and namespaces we defined previously:

    from rdflib import Graph

    g = Graph()

    g.add((bob, RDF.type, FOAF.Person))
    g.add((bob, FOAF.name, name))
    g.add((bob, FOAF.knows, linda))
    g.add((linda, RDF.type, FOAF.Person))
    g.add((linda, FOAF.name, Literal('Linda')))

    print(g.serialize(format='turtle'))

outputs:

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xml: <http://www.w3.org/XML/1998/namespace> .

    <http://example.org/people/Bob> a foaf:Person ;
        foaf:knows [ a foaf:Person ;
                foaf:name "Linda" ] ;
        foaf:name "Bob" .

For some properties, only one value per resource makes sense (i.e. they are functional properties, or have a max-cardinality of 1). The set() method is useful for this:

    g.add((bob, FOAF.age, Literal(42)))
    print("Bob is", g.value(bob, FOAF.age))
    # prints: Bob is 42

    g.set((bob, FOAF.age, Literal(43)))  # replaces 42 set above
    print("Bob is now", g.value(bob, FOAF.age))
    # prints: Bob is now 43

rdflib.graph.Graph.value() is the matching query method. It returns a single value for a property, optionally raising an exception if there is more than one.

You can also add triples by combining entire graphs; see Set Operations on RDFLib Graphs.

Removing Triples

Similarly, triples can be removed by a call to remove():

    Graph.remove(triple)
        Remove a triple from the graph
        If the triple does not provide a context attribute, removes the triple from all contexts.

When removing, it is possible to leave parts of the triple unspecified (i.e. passing None). This removes all matching triples:

    g.remove((bob, None, None))  # remove all triples about bob

An example

LiveJournal produces FOAF data for their users, but they seem to use foaf:member_name for a person's full name. To align with data from other sources, it would be nice to have foaf:name act as a synonym for foaf:member_name (a poor man's one-way owl:equivalentProperty):

    from rdflib import Graph
    from rdflib.namespace import FOAF

    g = Graph()
    g.parse("http://danbri.livejournal.com/data/foaf")

    # Take a snapshot of the matches with list() so that the graph
    # is not modified while it is being iterated over.
    for s, _, n in list(g.triples((None, FOAF['member_name'], None))):
        g.add((s, FOAF['name'], n))

1.4 Navigating Graphs

An RDF Graph is a set of RDF triples. RDFLib tries to mirror exactly this, so the graph emulates a container type:

1.4.1 Graphs as Iterators