SemWeb is my Semantic Web/RDF library written in C# for Mono or
Microsoft's .NET 1.1/2.0. The library can be used for reading and
writing RDF (XML, N3), keeping RDF in persistent storage (memory, MySQL, etc.),
querying persistent storage via simple graph matching and SPARQL,
and making SPARQL queries to remote endpoints. Limited RDFS and general-purpose inferencing are also possible. SemWeb's API is
straightforward and flexible. (What is RDF?)

This is an open-source library, so patches are most welcome.

Latest Release: 1.063 - 11/20/08

1.0.6.3: Literal.FromValue is now correctly culture-insensitive, SPARQL default/named dataset is handled differently now, new RdfReader.Create(Uri) overload, improvements to SQLStore.Query, Store.Select for Meta, rdfstorage.exe tool. New documentation for SPARQL endpoints, rdfstorage.exe.

1.062: Changes related to recognizing NTriples format, validating datatyped literals, parsing and comparing xsd:date/dateTime in and out of SPARQL, GraphViz output, and other SPARQL fixes. Added new documentation for the BSBM benchmark, which is a good example of setting up an ASP.NET SPARQL endpoint.

1.061: Corrected rdfstorage.exe build and fixed RdfXmlReader for properties with datatypes, and other small things.

1.06: Many bug fixes to SPARQL queries (ORDER BY, XSD datatypes, lang()). Improvements to Store.Query and GraphMatch (reorder query smartly), Euler (allow variables as predicates in rules and major speedup). Other fixes for RdfReaders (allow DTDs, implement Dispose), SQLStore, SqliteStore, and SparqlProtocol. SPARQL build files are now included in the package.

See the ChangeLog for details.

About the Library

SemWeb was first released in June 2005 and has been more recently tested with
triple stores of over 1 billion statements (see this).
The core features, like reading/writing RDF in XML and N3, persistent
SQL-backed storage, and SPARQL queries, are pretty solid. Peripheral features like
RDFS reasoning and backward-chaining reasoning are working,
but less tested and less complete. The library has no particular tools
for OWL schema. It operates at the level of RDF triples only.

The embedded SPARQL library is a fork of Ryan Levering's
SPARQL implementation, in Java, converted to .NET with
IKVM (IKVM was written by Jeroen Frijters).
The Euler class for general-purpose inferencing is adapted from
Jos De Roo's JavaScript Euler inferencing
engine.

You may also be interested in LinqToRdf,
a library that provides C# LINQ querying over RDF using this library.

License: SemWeb is licensed primarily under the terms of the GNU GPL (version 2 or later).
Note that SemWeb includes some external components: the SPARQL engine
(sparql-core.dll) is licensed under the GNU LGPL; the Euler proof mechanism
(Euler.cs, which is a part of SemWeb.dll) is licensed under the W3C Software License; and IKVM (IKVM*.dll) is GPL-compatible.
The source code which I wrote myself (which is most everything
but the above, see the README for details) is dual-licensed under both the GPL and the Creative Commons
Attribution license (which was the only license I listed from 2005 through 2007).

SemWeb was mentioned in this article in RedmondDeveloper.

SemWeb is used in F-Spot (Gnome photo management), in Beagle (Gnome desktop search) for its experimental RDF access layer,
and (at least at one time) in Sentient Knowledge Explorer (a commercial data visualizer).

Download, Documentation, Etc.

Download

Source code, binaries (.dll assemblies, both .NET 1.1 and .NET 2.0
versions), and HTML documentation are included in the downloads:

Documentation

Development

Features

- Straightforward API that is easy to deploy and is completely cross-platform.
- RDF/XML: Reading
and writing RDF/XML (including XMP). The reader is streaming, which
means the entire document doesn't ever need to be loaded into
memory. Parsing passes all W3C tests. Try out the library using
the validator here.
- Notation 3: Reading
and writing NTriples, Turtle, and most of Notation 3
(all streaming at roughly 20,000 statements/sec).
- Validation of IRIs and XSD datatyped literals during file reading (or when requested) and parsing of xsd:dateTime, date, and time into XsdDateTime instances.
- Output in GraphViz dot format.
- SQL DB-backed persistent storage for SQL Server, MySQL, Sqlite, and PostgreSQL.
The MySQL store (if not the others) scales to at least a billion triples (see this).
- There is of course also an in-memory store.
- Persistent storage supports an extended Select operation
to query many things at once (much faster than making individual calls
to the underlying database).
- Reasoning: RDFS reasoning (though not complete) and
rule-based reasoning based on the backward-chaining Euler engine, over any data.
- 4-Tuples: Statements are quads, not triples. The fourth meta field
can be used for application-specific purposes, like storing provenance,
grouping statements, or storing N3 formulas.
- Querying: Simple graph entailment tests
and SPARQL
queries over any data source, with translations of queries into SQL when possible.
A remote SPARQL data source
client is also available,
as is an ASP.NET SPARQL Protocol server handler (see this example).
- Queries over federated data sources, simply by running any query on a Store instance that has had multiple SelectableSource objects added to it with AddSource.
- Extensibility: Implementing new persistent storage or sources of statements is as simple as
implementing an interface, either forward-only
or persistent.
- Experimental algorithms for finding MSGs
and making graphs lean.
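
To make the quad model and template-style selection described in the list above more concrete, here is a minimal, language-neutral sketch in Python. It does not use SemWeb's C# API: the `Quad` type, the `select` function, and the `ex:`/`foaf:` sample data are all hypothetical names made up for illustration. The idea shown is only that a statement carries a fourth meta field and that a query template treats unset fields as wildcards.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Quad(NamedTuple):
    """A statement with a fourth 'meta' field (hypothetical type, not SemWeb's)."""
    subject: str
    predicate: str
    obj: str
    meta: str


def select(quads: Iterable[Quad],
           subject: Optional[str] = None,
           predicate: Optional[str] = None,
           obj: Optional[str] = None,
           meta: Optional[str] = None) -> Iterator[Quad]:
    """Yield every quad matching the template; None acts as a wildcard."""
    template = (subject, predicate, obj, meta)
    for quad in quads:
        if all(want is None or want == have
               for want, have in zip(template, quad)):
            yield quad


# Sample data: the meta field here records which graph a statement came from.
store = [
    Quad("ex:alice", "foaf:knows", "ex:bob", "ex:graph1"),
    Quad("ex:alice", "foaf:name", '"Alice"', "ex:graph1"),
    Quad("ex:bob", "foaf:name", '"Bob"', "ex:graph2"),
]

names = list(select(store, predicate="foaf:name"))   # all name statements
from_graph1 = list(select(store, meta="ex:graph1"))  # grouped by meta field
```

Using the meta field as a grouping key, as in the last line, is one of the application-specific uses mentioned above (provenance or grouping of statements).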

Benchmarks

LUBM Load Time and Query Time Benchmarks

I ran the following benchmarks with version 0.85
of this library. The benchmarks used the LUBM suite,
with 50 universities, totaling 6.9 million statements spread
across 1,000 files totaling 540MB. The benchmarks were
performed on a single-processor 1.8GHz AMD with 1GB RAM,
under the Mono 1.2.3 runtime in Fedora Core 6.

Loading The Data

| Storage | Time | Storage Size | Comments |
|---|---|---|---|
| (reading only) | 5 min (23k stmts/sec) | n/a | |
| N3 file on disk | 9 min (12k stmts/sec) | n/a | |
| mysql-5.0.27 | 30 min (3.8k stmts/sec) | 769MB (117 bytes/stmt) | |
| sqlite-3.3.3 | 72 min (1.6k stmts/sec) | 1.1GB (162 bytes/stmt) | run with version 0.751, 9/29/2006 |
| postgresql-8.1 | 94 min (1.2k stmts/sec) | 1.6GB (238 bytes/stmt) | run with version 0.751, 9/29/2006 |

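
The per-second and per-statement figures in the table follow from the 6.9 million statement total. The short Python check below reproduces two of the rows. It assumes that "MB" in the table means 2^20 bytes, which is my assumption rather than something the text states; under that reading the MySQL figure comes out at about 117 bytes per statement. The function names are made up for this sketch.

```python
STATEMENTS = 6_900_000  # total LUBM statements loaded, as stated in the text


def rate_per_sec(minutes: float) -> float:
    """Statements processed per second for a load that took `minutes`."""
    return STATEMENTS / (minutes * 60)


def bytes_per_stmt(size_mib: float) -> float:
    """Average storage per statement, assuming the size is in units of 2^20 bytes."""
    return size_mib * 1024 * 1024 / STATEMENTS


reading_rate = rate_per_sec(5)     # "reading only" row: 5 minutes
mysql_rate = rate_per_sec(30)      # MySQL row: 30 minutes
mysql_bytes = bytes_per_stmt(769)  # MySQL row: 769MB on disk

print(f"reading: {reading_rate:.0f} stmts/sec")
print(f"mysql:   {mysql_rate:.0f} stmts/sec, {mysql_bytes:.0f} bytes/stmt")
```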
Querying The Data

The LUBM suite contains 14 sample queries, of which only a subset can
be answered by the library because the rest use reasoning not supported
by the library. The RDFS class supports a subset of RDFS reasoning, and
that was enough to answer queries 1, 3, 4, 5, 6, 7, 8, 9, and 14. Run
against the MySQL data source, most of the queries take 2 seconds to
answer, including program start-up time. The queries with very large
result sets take 30-70 seconds.

Berlin SPARQL Benchmark

See this page for the
results of the Berlin SPARQL Benchmark.