Semantic Data Harmonisation


Information is typically mapped manually at the level of syntax. Whilst this approach has been very successful, especially when the information to be mapped is not too complex, there are several clear disadvantages:

  • An in-depth knowledge of both the source and destination schemas is required.
  • New schemas need to be studied in detail before creating mappings – this is a very expensive step that cannot be avoided.
  • Mappings are often tricky to understand and error-prone, as humans normally think in terms of the actual semantics instead of the syntax.
  • Mappings typically cannot be reused.
  • The mappings must be performed by a technician with schema knowledge, whereas the goal is for business people to map the information themselves.
  • Messages cannot be translated automatically because of mismatches and a lack of semantic understanding.

However, a company that wishes to trade electronically with other companies or public administrations wants to exchange different types of information, ranging from orders to company returns, as easily as possible. Today, the greatest barrier to this is not the cost of hardware, software or communications. Rather, it is the "interoperability" barrier, i.e. the difficulty of providing information to potential trading and manufacturing partners in a format their systems can understand. Companies do not have time to learn the technical specifications of XML, EDI or IDOCs. However, they do know the user concepts (semantics) of their information. They want something that allows information transformation based on their own knowledge only, and that is easy and inexpensive to use without reliance on technicians or consultants.

The aim is to address many of the aforementioned problems by allowing anyone to create mappings based upon semantics, even if they do not realise that they are using semantics. That is, instead of focussing on syntax, the user concentrates on identifying semantic assets and mapping those. For example, two concepts <Town> and <Ville> may be grouped into one logical semantic entity called "XX", which is mapped to a well-defined concept of an address. This could apply across language barriers but also within them, e.g. <Pavement> and <Sidewalk>.

The mapping process is typically based on ontologies, which are used to define and link these semantic assets. The link to the original syntax is still made, but this is completely transparent to the user. The link is necessary because the mapping between source and target needs to operate at the syntax level to extract information and then reformat it in the way required by the target.
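The idea of mapping through shared concepts rather than directly between tags can be illustrated with a minimal Python sketch. It is not taken from CREMA or any of the tools listed on this page: the concept names ("address.town", "street.footpath"), the annotation tables and both helper functions are hypothetical, and a real system would hold the concepts in an ontology rather than in dictionaries. The user only states which concept each tag stands for; the tag-to-tag (syntax-level) map is then derived automatically.

```python
# Hypothetical annotations: each schema-specific tag is linked to a shared concept.
# The concept names are invented for illustration only.
SOURCE_ANNOTATIONS = {"Town": "address.town", "Pavement": "street.footpath"}
TARGET_ANNOTATIONS = {"Ville": "address.town", "Sidewalk": "street.footpath"}


def build_syntax_map(source, target):
    """Derive a tag-to-tag map by matching the concepts both schemas share."""
    concept_to_target = {concept: tag for tag, concept in target.items()}
    return {
        tag: concept_to_target[concept]
        for tag, concept in source.items()
        if concept in concept_to_target
    }


def translate(record, syntax_map):
    """Rename the fields of a record; fields without a shared concept are dropped."""
    return {
        syntax_map[tag]: value
        for tag, value in record.items()
        if tag in syntax_map
    }


# The syntax-level link is computed from the annotations, not written by hand.
syntax_map = build_syntax_map(SOURCE_ANNOTATIONS, TARGET_ANNOTATIONS)
translated = translate({"Town": "Lyon", "Pavement": "2 m", "Fax": "n/a"}, syntax_map)
print(syntax_map)
print(translated)
```

Because each schema is annotated against the shared concepts only once, adding a third schema in this sketch would need one new annotation table rather than a hand-written mapping to every existing schema.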

Relationship to CREMA

CREMA states that it intends to apply its approach ‘to integrate data from concrete existing software systems [and] to do this a software application will be developed [...] which will enable a business analyst driven approach for the automatic linking of organisations data schema (databases, sensors, messages, spreadsheets, knowledge sources)’. Thus the purpose of the CREMA Task “Data Harmonisation Services” is to link these sources to the model identified in the CREMA Task on “CREMA Data Model”. The approach will be to use annotations, ontologies and semantic mapping techniques to rapidly construct the links and, from there, to generate standalone automatic transformation executables (manufacturing maps). These will be available in the CREMA map store.

General References


  1. L. Mazzola, P. Kapahnke, M. Vujic, M. Klusch, "CDM-Core: A Manufacturing Domain Ontology in OWL2 for Production and Maintenance", in Proc. 8th International Conference on Knowledge Engineering and Ontology Development (KEOD), Porto, Portugal, 2016
  2. P. Agarwal et al., “Approximate Incremental Big-Data Harmonization”, IEEE International Congress on Big Data, Santa Clara Marriott, CA, USA, 2013, pp. 118-125
  3. S. Abels, V. Chepegin, S. Campbell, “Semantic Interoperability for Technology-Enhanced Learning Platforms”, in Proc. 9th IEEE International Conference on Advanced Learning Technologies (ICALT2009), Riga, Latvia, 2009
  4. S. Abels, H. Sheikhhasan, S. Campbell, “STASIS - Creating an Eclipse Based Semantic Mapping Platform”, in Proc. e-Challenges 2008 Conference, Stockholm, Sweden, 2008


  1. Talend Data Integration
    The fastest, most cost effective way to connect data.

    Talend offers robust data integration in an open and scalable architecture to maximize its value to your business. As part of the Talend Data Fabric, Talend Data Integration provides the unified tools to integrate, cleanse, mask and profile all of your data, enabling you to turn data into decisions.

    - Develop and deploy 10 times faster
    - Increase trust in data
    - Lower cost of ownership


  1. HarmoSearch Project (2010-2013): Harmonised Semantic Meta-Search in Distributed Heterogeneous Databases
    The objective of HarmoSearch is to leverage an existing mediation and harmonisation service for the European tourism market, called Harmonise, by adding new components of clear market value that address specific user needs. The current version of Harmonise, Harmonise 2.0, is an online service for exchanging data with other partners without the need to change the local data schema. Research work is needed to develop a (semi-)automatic mapping tool, which allows users to generate mappings without any technological knowledge or mapping skills, and a semantic meta-search component, based on a semantic registry, to address scientific issues such as the mapping of search queries and intelligent routing to appropriate data sources. HarmoSearch is based on the work of past projects and activities, like Harmonise (IST 2000-29329), Harmo-TEN (eTEN C510828), a CEN Workshop Agreement (CWA 15992:2009 E), activities of the non-profit HarmoNET association, the project (eTEN C046229) and portals like e.g. This project brings together five SME partners with a history in the domain, some of which have already invested in the Harmonise system in the past. They would benefit from a broader use of the Harmonise system in general, but each partner also brings a specific need in addition. Even in a conservative business model that carefully evaluates future income, the project could generate additional sales on a five-year horizon, doubling the total project volume and exceeding the investment needed by the SME partners for this project. This does not yet consider the overall positive impact on the European tourism market of overcoming technical obstacles to new and improved services. Project results are first of all owned by the SME partners, while RTD partners have the right to use knowledge for their research work and other partners may use the outcomes for their networks (HarmoNET and
  2. EENVplus
  3. STASIS
This page was last changed on 9 June 2017, at 16:58.
