Semantic Web/RDF Library for C#/.NET
By Joshua Tauberer

SemWeb is my Semantic Web/RDF library written in C# for Mono or Microsoft's .NET 1.1/2.0. The library can be used for reading and writing RDF (XML, N3), keeping RDF in persistent storage (memory, MySQL, etc.), querying persistent storage via simple graph matching and SPARQL, and making SPARQL queries to remote endpoints. Limited RDFS and general-purpose inferencing is also possible. SemWeb's API is straightforward and flexible. (What is RDF?)

This is an open-source library, so patches are most welcome.

Latest Release: 1.061 - 6/19/08

1.061: Corrected rdfstorage.exe build and fixed RdfXmlReader for properties with datatypes, and other small things.

1.06: Many bug fixes to SPARQL queries (ORDER BY, XSD datatypes, lang()). Improvements to Store.Query and GraphMatch (reorder query smartly), Euler (allow variables as predicates in rules and major speedup). Other fixes for RdfReaders (allow DTDs, implement Dispose), SQLStore, SqliteStore, and SparqlProtocol. SPARQL build files are now included in the package. More details in the ChangeLog.

1.05: Many bug fixes and a SQL Server store contributed by Khaled Hammouda!

1.03: Various minor changes. The code is now dual-licensed.

1.02: This release has fixes and small enhancements to RdfWriter, RdfReader, N3Reader, and GraphMatch. SPARQL protocol MIME type handling is improved. Some general overview documentation has been added. Also, two fairly dangerous changes were made. GraphMatch now deals with limits in an adaptive way for intersective queries. The SPARQL engine has been updated to the latest upstream version in Ryan Levering's Subversion repository. 1.021: SPARQL fixes.

See the ChangeLog for details.

About the Library

SemWeb was first released in June 2005 and has been more recently tested with triple stores of over 1 billion statements (see this). The core features, like reading/writing RDF in XML and N3, persistent SQL-backed storage, and SPARQL queries, are pretty solid. Peripheral features like RDFS reasoning and backward-chaining reasoning are working, but less tested and less complete. The library has no particular tools for OWL schema. It operates at the level of RDF triples only.

SPARQL support is based on Ryan Levering's SPARQL implementation in Java (with some bug fixes and performance improvements that I hope to get upstream), converted to .NET with IKVM (IKVM was written by Jeroen Frijters). The Euler class for general-purpose inferencing is adapted from Jos De Roo's JavaScript Euler inferencing engine.

You may also be interested in LinqToRdf, a library that provides C# LINQ querying over RDF using this library.

License: SemWeb is licensed primarily under the terms of the GNU GPL (version 2 or later). Note that SemWeb includes some external components: the SPARQL engine (sparql-core.dll) is licensed under the GNU LGPL; the Euler proof mechanism (Euler.cs, which is a part of SemWeb.dll) is licensed under the W3C Software License; and IKVM (IKVM*.dll) is GPL-compatible. The source code which I wrote myself (which is most everything but the above, see the README for details) is dual-licensed under both the GPL and the Creative Commons Attribution license (which was the only license I listed from 2005 through 2007).

SemWeb was mentioned in this article in RedmondDeveloper.

SemWeb is used in F-Spot (Gnome photo management), Beagle (Gnome desktop search) for its experimental RDF access layer, and (at least at one time) Sentient Knowledge Explorer (a commercial data visualizer).

Download, Documentation, Etc.

Download

Source code, binaries (.dll assemblies, both .NET 1.1 and .NET 2.0 versions), and HTML documentation are included in the downloads:

Documentation

Development

Features

  • Straightforward and consistent API; really easy to deploy; no platform-specific dependencies.
  • RDF/XML: Reading and writing RDF/XML (including XMP). The reader is streaming, which means the entire document doesn't ever need to be loaded into memory. Parsing passes all W3C tests. Try out the library using the validator here.
  • Notation 3: Reading and writing NTriples, Turtle, and most of Notation 3 (all streaming at roughly 20,000 statements/sec).
  • SQL DB-backed persistent storage for SQL Server, MySQL, Sqlite, and PostgreSQL. (The MySQL store is the most thoroughly tested. Writing to the MySQL store runs at around 5,000-10,000 statements/sec using the DISABLEKEYS import method, scaling to at least a billion triples; see this.)
  • There is of course also an in-memory store.
  • Persistent storage supports an extended Select operation to query many things at once (much faster than making individual calls to the underlying database).
  • Reasoning: RDFS reasoning (though not complete) and rule-based reasoning based on the backward-chaining Euler engine, over any data.
  • 4-Tuples: Statements are quads, not triples. The fourth meta field can be used for application-specific purposes, like storing provenance, grouping statements, or storing N3 formulas.
  • Querying: Simple graph entailment tests and SPARQL queries over any data source, a remote SPARQL store (read-only persistent storage backed by a remote SPARQL-over-HTTP service and methods for making arbitrary SPARQL queries), and an ASP.NET SPARQL Protocol server handler.
  • Extensibility: Implementing new persistent storage or sources of statements is as simple as implementing an interface, either forward-only or persistent.
  • Experimental algorithms for finding MSGs and making graphs lean.
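
To make the "4-Tuples" and pattern-style querying items above more concrete, here is a minimal, library-independent sketch in Python. It is not SemWeb's API: the names `Quad` and `select` are hypothetical and chosen only for illustration. A statement is modeled as a four-field tuple, and a query is a template in which any field left as `None` acts as a wildcard.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Quad(NamedTuple):
    """A statement with a fourth 'meta' field (hypothetical model, not SemWeb's types)."""
    subject: str
    predicate: str
    obj: str
    meta: str  # application-specific: e.g. provenance or a grouping label


def select(quads: Iterable[Quad],
           subject: Optional[str] = None,
           predicate: Optional[str] = None,
           obj: Optional[str] = None,
           meta: Optional[str] = None) -> Iterator[Quad]:
    """Yield every quad that matches the template; None matches anything."""
    template = (subject, predicate, obj, meta)
    for quad in quads:
        if all(want is None or want == have
               for want, have in zip(template, quad)):
            yield quad


if __name__ == "__main__":
    data = [
        Quad("ex:alice", "foaf:knows", "ex:bob", "source:a.rdf"),
        Quad("ex:alice", "foaf:name", "Alice", "source:a.rdf"),
        Quad("ex:bob", "foaf:name", "Bob", "source:b.rdf"),
    ]
    # Use the meta field as provenance: everything that came from a.rdf.
    for quad in select(data, meta="source:a.rdf"):
        print(quad)
```

The same template idea covers ordinary triple queries (leave `meta` as `None`) and provenance or grouping queries (fix `meta` and leave the rest open).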

Benchmarks

I ran the following benchmarks with version 0.85 of this library. They used the LUBM benchmark data set with 50 universities: 6.9 million statements spread across 1,000 files, totaling 540MB. The benchmarks were performed on a single-processor 1.8GHz AMD with 1GB RAM, under the Mono 1.2.3 runtime in Fedora Core 6.

Loading The Data

Storage            Time                       Storage Size              Comments
(reading only)     5 min (23k stmts/sec)      n/a
N3 file on disk    9 min (12k stmts/sec)      n/a
mysql-5.0.27       30 min (3.8k stmts/sec)    769MB (117 bytes/stmt)
sqlite-3.3.3       72 min (1.6k stmts/sec)    1.1GB (162 bytes/stmt)    run with version 0.751 9/29/2006
postgresql-8.1     94 min (1.2k stmts/sec)    1.6GB (238 bytes/stmt)    run with version 0.751 9/29/2006
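
As a rough cross-check of the loading figures, the throughput and storage numbers can be recomputed from the statement count and the load times. The short Python sketch below does only that arithmetic; it assumes the rounded total of 6.9 million statements quoted above, so its results are approximate, and the function names are made up for this example.

```python
# Rounded total taken from the benchmark description above.
TOTAL_STATEMENTS = 6_900_000

# (label, load time in minutes, reported bytes per statement or None)
RUNS = [
    ("reading only", 5, None),
    ("N3 file on disk", 9, None),
    ("mysql", 30, 117),
    ("sqlite", 72, 162),
    ("postgresql", 94, 238),
]


def statements_per_second(total, minutes):
    """Average load rate over the whole run."""
    return total / (minutes * 60)


def estimated_size_gb(total, bytes_per_statement):
    """Total storage implied by the per-statement size, in decimal gigabytes."""
    return total * bytes_per_statement / 1e9


def summarize(runs=RUNS, total=TOTAL_STATEMENTS):
    """Return (label, rounded statements/sec, estimated GB or None) per run."""
    rows = []
    for label, minutes, bytes_per_stmt in runs:
        rate = round(statements_per_second(total, minutes))
        size = None
        if bytes_per_stmt is not None:
            size = estimated_size_gb(total, bytes_per_stmt)
        rows.append((label, rate, size))
    return rows


if __name__ == "__main__":
    for label, rate, size in summarize():
        size_text = "n/a" if size is None else f"{size:.2f} GB"
        print(f"{label:<16} {rate:>6} stmts/sec  {size_text}")
```

Because the statement count is rounded, small differences from the published rates and sizes are expected.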

Querying The Data

The LUBM suite contains 14 sample queries, of which only a subset can be answered by the library because the rest use reasoning not supported by the library. The RDFS class supports a subset of RDFS reasoning, and that was enough to answer queries 1, 3, 4, 5, 6, 7, 8, 9, and 14. Run against the MySQL data source, most of the queries take 2 seconds to answer, including program start-up time. The ones that have an enormous answer take 30-70 seconds.