The ISLab Instance Matching Benchmark
Provided by the Information Systems
& Knowledge Management Lab
Dipartimento di Informatica e Comunicazione - Università degli Studi di Milano
via Comelico 39, 20135, Milano, Italy
Contact person for IIMB: Alfio Ferrara (ferrara at di.unimi.it)
The ISLab Instance Matching Benchmark (IIMB) is a benchmark built automatically by taking one data source and modifying it according to various criteria. It is generated using the ISLab Instance Matching Benchmark tool.
The testbed (download) provides OWL/RDF data about actors, sport persons, and business firms taken from the
You can access each data source directly at the URL http://islab.di.unimi.it/iimb/[dataset ID]/abox.owl, where [dataset ID]
ranges from 001 to 037.
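As an illustration of the URL pattern above, the following minimal Python sketch builds the ABox URL for each dataset ID. The function names abox_url and all_abox_urls are hypothetical helpers introduced here; only the URL pattern and the 001 to 037 range come from this page. The sketch only builds strings and does not check that the server is reachable.

```python
# Base URL taken from the pattern given on this page.
BASE_URL = "http://islab.di.unimi.it/iimb"


def abox_url(dataset_id):
    """Return the ABox URL for a dataset ID between 1 and 37 (hypothetical helper)."""
    if not 1 <= dataset_id <= 37:
        raise ValueError("dataset ID must be between 1 and 37")
    # IDs are zero-padded to three digits: 1 -> "001", 37 -> "037".
    return f"{BASE_URL}/{dataset_id:03d}/abox.owl"


def all_abox_urls():
    """Return the ABox URLs for all 37 modified datasets, in order."""
    return [abox_url(i) for i in range(1, 38)]


if __name__ == "__main__":
    for url in all_abox_urls():
        print(url)
```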
The dataset is organized as follows:
The main directory contains 37 sub-directories, the original ABox, and its associated TBox (abox.owl and tbox.owl).
The original ABox contains about 200 different instances.
Each sub-directory contains a modified ABox (abox.owl + tbox.owl) and the corresponding mapping to the instances in the original ABox (refalign.rdf).
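The layout described above can be sketched as a list of expected file paths, which may help when checking a local copy of the dataset. This is only a sketch under assumptions: the root directory name "iimb" is hypothetical, and the sub-directories are assumed to be named with the same three-digit IDs used in the URLs (001 to 037). The function expected_files is an illustrative helper, not part of the benchmark tool.

```python
from pathlib import PurePosixPath


def expected_files(root="iimb"):
    """List the files the described layout implies (hypothetical helper).

    Assumes sub-directories are named 001..037, matching the URL pattern.
    """
    root = PurePosixPath(root)
    # Original ABox and TBox sit in the main directory.
    files = [root / "abox.owl", root / "tbox.owl"]
    for i in range(1, 38):
        sub = root / f"{i:03d}"
        # Each sub-directory: modified ABox, TBox, and the reference mapping.
        files += [sub / "abox.owl", sub / "tbox.owl", sub / "refalign.rdf"]
    return [str(p) for p in files]


if __name__ == "__main__":
    for path in expected_files():
        print(path)
```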
The introduced modifications are the following:
- Directory 001: Contains a copy of the original ABox that is identical except for the instance IDs, which are randomly changed.
- Directory 002 - Directory 010: Value transformations (i.e., simulation of typographical errors, use of different standards for representing the same information). To simulate typographical errors, the property values of each instance are randomly modified. Modifications are applied to different subsets of the instances' property values and with different levels of difficulty (i.e., introducing a different number of errors).
- Directory 011 - Directory 019: Structural transformations (i.e., deletion of one or more values, transformation of datatype properties into object properties, separation of a single property into more properties).
- Directory 020 - Directory 029: Logical transformations (i.e., instantiation of identical individuals into different subclasses of the same class, instantiation of identical individuals into disjoint classes, instantiation of identical individuals into different classes of an explicitly declared class hierarchy).
- Directory 030 - Directory 037: Several combinations of the previous transformations.
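The directory ranges listed above can be captured in a small lookup, which may be convenient when grouping evaluation results by transformation type. This is a minimal sketch: the names CATEGORIES and transformation_category, and the short category labels, are hypothetical and chosen here for illustration; only the ID ranges come from the list above.

```python
# Directory ID ranges and the kind of modification each range contains,
# following the list on this page. Labels are illustrative shorthand.
CATEGORIES = [
    (range(1, 2), "identical copy"),
    (range(2, 11), "value transformations"),
    (range(11, 20), "structural transformations"),
    (range(20, 30), "logical transformations"),
    (range(30, 38), "combined transformations"),
]


def transformation_category(dataset_id):
    """Return the category label for a directory ID between 1 and 37."""
    for ids, label in CATEGORIES:
        if dataset_id in ids:
            return label
    raise ValueError("dataset ID must be between 1 and 37")


if __name__ == "__main__":
    for i in (1, 5, 15, 25, 33):
        print(f"{i:03d}: {transformation_category(i)}")
```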