SemWeb.NET: Semantic Web/RDF Library for C#/.NET

SemWeb.NET is a Semantic Web/RDF library written in C# for Mono or Microsoft's .NET. The library can be used for reading and writing RDF (XML, N3), keeping RDF in persistent storage (memory, MySQL, etc.), querying persistent storage via simple graph matching and SPARQL, and making SPARQL queries to remote endpoints. Limited RDFS and general-purpose inferencing are also possible. SemWeb's API is straightforward and flexible. (What is RDF?)

This is an open-source library, so patches are most welcome.

Library Status

The latest version is 1.0.7 dated May 18, 2010. Thanks to Ruben Verborgh for a number of contributions including a rewritten Notation 3 writer that supports writing graphs.

See the ChangeLog for details.

As of May 19, 2009 I'm taking an indefinite hiatus from this project. That means that while I'll try to apply any patches to fix existing bugs, I won't be actively developing the library further, and I won't be answering requests for help on the mail list. Over the last four years it's been fun to work on it, but I don't think there has been enough uptake of the Semantic Web in the .NET world (or otherwise) for me to justify spending more time on this when I have other things in life I'd rather be working on.

About the Library

SemWeb was first released in June 2005 and has more recently been tested with triple stores of over 1 billion statements (see this). The core features, like reading/writing RDF in XML and N3, persistent SQL-backed storage, and basic SPARQL queries, are pretty solid. Peripheral features like RDFS reasoning and backward-chaining reasoning are working, but less tested and less complete. The library has no particular tools for OWL schemas. It operates at the level of RDF triples only.

The embedded SPARQL library is a fork of Ryan Levering's SPARQL implementation, in Java, converted to .NET with IKVM (IKVM was written by Jeroen Frijters). The Euler class for general-purpose inferencing is adapted from Jos De Roo's JavaScript Euler inferencing engine.

You may also be interested in LinqToRdf, a library that provides C# LINQ querying over RDF using this library.

License: SemWeb is licensed primarily under the terms of the GNU GPL (version 2 or later). Note that SemWeb includes some external components: the SPARQL engine (sparql-core.dll) is licensed under the GNU LGPL; the Euler proof mechanism (Euler.cs, which is a part of SemWeb.dll) is licensed under the W3C Software License; and IKVM (IKVM*.dll) is GPL-compatible. The source code which I wrote myself (which is most everything but the above, see the README for details) is dual-licensed under both the GPL and the Creative Commons Attribution license (which was the only license I listed from 2005 through 2007).

SemWeb was mentioned in this article in RedmondDeveloper.

SemWeb is used in ROWLEX, F-Spot (Gnome photo management), Beagle (Gnome desktop search) for its experimental RDF access layer, and (at least at one time) Sentient Knowledge Explorer (a commercial data visualizer).

Download, Documentation, Etc.

Download

Source code, binaries (.dll assemblies, both .NET 1.1 and .NET 2.0 versions), and HTML documentation are included in the downloads:

The library is also packaged in Ubuntu as libsemweb1.0-cil but the latest version there is years old and has serious bugs.

Documentation

Development

  • ChangeLog (i.e. version history)
  • Browse Source Code Repository
  • Download latest code from the Subversion repository at svn://razor.occams.info/semweb
  • Yahoo! Groups Mail List
    • Because I no longer actively work on this project, I may not respond to all mail on the mail list.
    • Feel free to join, post questions/suggestions/bugs, and discuss.
    • I moderate the group to filter out spam.
    • Please understand the triples you are encoding in any examples by reading the N3 primer or using a validator before posting.
  • DOAP RDF file

Features

  • Straightforward API that is easy to deploy and is completely cross-platform.
  • RDF/XML: Reading and writing RDF/XML (including XMP). The reader is streaming, which means the entire document doesn't ever need to be loaded into memory. Parsing passes all W3C tests. Try out the library using the validator here.
  • Notation 3: Reading and writing NTriples, Turtle, and most of Notation 3 (all streaming at roughly 20,000 statements/sec).
  • Validation of IRIs and XSD datatyped literals during file reading (or when requested) and parsing of xsd:dateTime, date, and time into XsdDateTime instances.
  • Output in GraphViz dot format.
  • SQL DB-backed persistent storage for SQL Server, MySQL, SQLite, and PostgreSQL. The MySQL store (if not the others) scales to at least a billion triples (see this).
  • There is of course also an in-memory store.
  • Persistent storage supports an extended Select operation to query many things at once (much faster than making individual calls to the underlying database).
  • Reasoning: RDFS reasoning (though not complete) and rule-based reasoning based on the backward-chaining Euler engine, over any data.
  • 4-Tuples: Statements are quads, not triples. The fourth meta field can be used for application-specific purposes, like storing provenance, grouping statements, or storing N3 formulas.
  • Querying: Simple graph entailment tests and SPARQL queries over any data source, with translations of queries into SQL when possible. A remote SPARQL data source client is also available, as is an ASP.NET SPARQL Protocol server handler (see this example).
  • Queries over federated data sources simply by running any query on a Store instance that has had many SelectableSource objects added into it with AddSource.
  • Extensibility: Implementing new persistent storage or sources of statements is as simple as implementing an interface, either forward-only or persistent.
  • Experimental algorithms for finding MSGs and making graphs lean.
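
The quad model, wildcard selection, and federation described in the list above can be illustrated with a small language-neutral sketch. The Python below is not SemWeb's API: the names Quad, MemoryStore, FederatedStore, select, and add_source, as well as the http://example.org/ URIs, are hypothetical and chosen only to show the ideas.

```python
from collections import namedtuple

# A statement is a quad: the usual triple plus a fourth "meta" field.
# (Hypothetical names; this is not SemWeb's actual API.)
Quad = namedtuple("Quad", ["subject", "predicate", "obj", "meta"])


class MemoryStore:
    """Toy in-memory store that keeps quads in a list."""

    def __init__(self):
        self._quads = []

    def add(self, quad):
        self._quads.append(quad)

    def select(self, subject=None, predicate=None, obj=None, meta=None):
        """Yield quads matching the template; None acts as a wildcard."""
        template = (subject, predicate, obj, meta)
        for quad in self._quads:
            if all(t is None or t == v for t, v in zip(template, quad)):
                yield quad


class FederatedStore:
    """Toy federation: each select is forwarded to every added source."""

    def __init__(self):
        self._sources = []

    def add_source(self, source):
        self._sources.append(source)

    def select(self, **template):
        for source in self._sources:
            yield from source.select(**template)


EX = "http://example.org/"  # placeholder namespace

store_a = MemoryStore()
store_a.add(Quad(EX + "alice", EX + "knows", EX + "bob", EX + "graph1"))
store_a.add(Quad(EX + "alice", EX + "name", "Alice", EX + "graph1"))

store_b = MemoryStore()
store_b.add(Quad(EX + "bob", EX + "knows", EX + "carol", EX + "graph2"))

federated = FederatedStore()
federated.add_source(store_a)
federated.add_source(store_b)

# All "knows" statements across both sources; the meta field records
# which graph each statement came from.
knows = list(federated.select(predicate=EX + "knows"))
for quad in knows:
    print(quad.subject, "knows", quad.obj, "(from " + quad.meta + ")")
```

Here the meta field is used for provenance, one of the application-specific uses mentioned above; a template with every field left as None would return every statement from every source.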

Benchmarks

LUBM Load Time and Query Time Benchmarks

I ran the following benchmarks with version 0.85 of this library. They used the LUBM benchmark data for 50 universities: 6.9 million statements spread across 1,000 files totaling 540MB. The benchmarks were performed on a single-processor 1.8GHz AMD with 1GB RAM, under the Mono 1.2.3 runtime in Fedora Core 6.

Loading The Data

Storage           Time                      Storage Size             Comments
(reading only)    5 min (23k stmts/sec)     n/a
N3 file on disk   9 min (12k stmts/sec)     n/a
mysql-5.0.27      30 min (3.8k stmts/sec)   769MB (117 bytes/stmt)
sqlite-3.3.3      72 min (1.6k stmts/sec)   1.1GB (162 bytes/stmt)   run with version 0.751, 9/29/2006
postgresql-8.1    94 min (1.2k stmts/sec)   1.6GB (238 bytes/stmt)   run with version 0.751, 9/29/2006

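
The throughput figures in the table are the 6.9 million statements divided by the load time. The short Python sketch below redoes that arithmetic from the rounded minute counts reported above, so its results can differ slightly from the published figures (for example, 9 minutes works out closer to 12.8k than 12k statements per second). The variable and function names are illustrative only.

```python
# Recompute load throughput from the figures reported in the table.
TOTAL_STATEMENTS = 6_900_000  # LUBM data for 50 universities

# Load times in minutes, as reported (rounded) in the table.
load_minutes = {
    "(reading only)": 5,
    "N3 file on disk": 9,
    "mysql-5.0.27": 30,
    "sqlite-3.3.3": 72,
    "postgresql-8.1": 94,
}


def statements_per_second(total, minutes):
    """Average number of statements loaded per second."""
    return total / (minutes * 60)


throughput = {
    name: statements_per_second(TOTAL_STATEMENTS, minutes)
    for name, minutes in load_minutes.items()
}

for name, rate in throughput.items():
    print(f"{name}: {rate / 1000:.1f}k stmts/sec")
```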
Querying The Data

The LUBM suite contains 14 sample queries, of which only a subset can be answered by the library because the rest use reasoning not supported by the library. The RDFS class supports a subset of RDFS reasoning, and that was enough to answer queries 1, 3, 4, 5, 6, 7, 8, 9, and 14. Run against the MySQL data source, most of the queries take 2 seconds to answer, including program start-up time. The ones that have an enormous answer take 30-70 seconds.

Berlin SPARQL Benchmark

See this page for the results of the Berlin SPARQL Benchmark.