[S4.3] Heritage data-centric research: are FAIR data fair enough?

Date: Thursday, 4 October
Time: 09:30 - 13:00
Duration: Half day
Place: Cultural Conference Centre of Heraklion (CCCH)
Audience Type: Open
Type: Invited session

In the current trend towards e-Science, i.e. collaborative, computationally- or data-intensive research, archaeology is not a laggard. A number of initiatives are addressing how to manage and use data produced by heritage research, most notably ARIADNE in the archaeological domain (https://www.ariadne-infrastructure.eu). ARIADNE currently involves the most important research centres from all European countries in creating a comprehensive and integrated archaeological data infrastructure, which has already registered just under 2,000,000 archaeological datasets. This infrastructure is bringing archaeology out of the “long tail of science”, i.e. those disciplines that make little use of data-centric research. It is also revolutionising the concept of Big Data: not relatively few datasets, each with terabytes of numbers, as in nuclear physics, but millions of small datasets, all potentially relevant to a specific research question, of which a large (and unknown) majority is probably irrelevant altogether.

E-Science relies on the well-known FAIR principles (https://www.force11.org/fairprinciples), which state that data should be Findable, Accessible, Interoperable and Re-usable. While “F”, “A” and “I” mainly depend on the technical way in which data and metadata are generated, stored, managed and curated, “R” has less technical (but no less important) implications. It involves theoretical, methodological and epistemological aspects that have not received enough attention in the current debate. It has been argued that e-Science discovery could be modelled as a deterministic discovery process; even from this perspective, however, modelling the provenance of data alone is not sufficient: the provenance of the hypotheses and results generated by analysing the data needs to be modelled as well.

Thus, to reuse data in cultural heritage it is necessary to expand the “R” facet of the FAIR principles at least into R3: Re-usable, Relevant and Reliable. Judging relevance and reliability may appear straightforward to a human, but it is not so for machine processing. Data reliability depends on a chain of trust that needs to be adequately supported by documentation, and in this regard the CIDOC CRM may play a key role. Whereas citing previous discoveries published in journals and books has traditionally relied on the academic practice of peer review and on the authoritativeness of the author and of the publication, re-using data created by others still lacks a comparable established practice.

The session will discuss these aspects and propose ways to address the issue. Contributions will range from purely cultural heritage practice (“What would you need in order to rely on somebody else’s data?”) to semantics (“What would you suggest documenting in order to support reliability?”). Both aspects will be analysed in light of the CRM: does it already provide a sufficiently rich toolbox, or are additions required? If so, which ones?


Franco Niccolucci
PIN, Prato, Italy

Nicola Barbuti
University of Bari, Italy
About the speech:

The R4 to Identify Born and Digitized Cultural Heritage: Re-usable, Relevant, Reliable and Resistant

It is urgent to identify what, and how much, of the digital resources produced to date can be regarded as “born-digital and digitized cultural heritage”. This process requires clear and homogeneous criteria by which digital cultural entities can be distinguished from the formless mass of data produced every day. As the FAIR Principles alone do not seem sufficient for this purpose, we believe that the FAIR R should be quadrupled into R4: Re-usable, Relevant, Reliable and Resistant. We think that these requirements will give digital data the value of cultural heritage, as they exactly mirror the definition we can give of what we commonly consider tangible and intangible cultural heritage.

Martin Doerr
About the speech:

CRMInf: Supporting Facts by Arguments

In current practice, the maintainers of cultural heritage documentation databases mostly present facts as their best knowledge, adding some citations, but without analysing the reasons why a particular fact is believed or not. Archaeological records may contain more detailed justifications, but only in limited cases related to individual facts. On the other hand, computer scientists have developed advanced argumentation systems, but these are designed more to support expert dialogue than to justify and maintain the validity of facts in documentation systems. CRMInf is a CIDOC CRM-compatible extension designed for the latter purpose. It currently contains a basic model of the ways in which new knowledge is acquired, and it is being further specialized to support more directly the discourse on historical sources and scientific observations. We will present the theory underlying CRMInf, its current state of development, and implementation issues.

Achille Felicetti
PIN, Prato, Italy
Short Bio:

Achille Felicetti has a degree in Archaeology and a diploma in computer programming. He coordinated the teams working on the creation of semantic tools, such as the AMA mapping tool and the SAD semantic query system used in the EPOCH and COINS projects, as well as the semantic annotation tools of 3D-COFORM. He coordinated the development of the ARIADNE platform and portal for the interoperability of archaeological information and the definition of terminological tools and vocabularies for the standardization of archaeological information. He currently coordinates the team responsible for defining and applying the CIDOC CRM extensions CRMarchaeo, for encoding excavation information, and CRMtex, for modelling inscriptions and ancient texts. He is responsible for the development of annotation and NLP tools for knowledge extraction from textual archaeological documentation within the EOSCpilot initiative. Achille is also system administrator of the VAST-LAB laboratory at PIN and is interested in research on new technologies and in web application development. He has created various tool sets for building digital libraries and for managing cultural heritage datasets using semantic technologies. He is the author of many papers on conceptual modelling and on the application of semantic technologies to the standardisation, interoperability and management of archaeological documentation.

About the speech:

Heritage Science and Cultural Heritage: a CIDOC CRM-enabled Model for Integration and Interoperability

The main goal of our model is to collect provenance data of scientific datasets resulting from Heritage Science research, and to document them in a standard and accessible way. Our approach, inheriting and adapting the common logic and concepts of existing models and drawing on the semantic principles of CIDOC CRM, proposes a schema composed of reusable XML modules. These modules are intended to describe Heritage Science entities (including actors, devices, datasets, analyses and other events) in great detail, and are dynamically organised in a common framework through a set of internal links based on persistent identifiers. This structure implements a platform-independent meta-format that expresses the essence of the data without being bound to any specific system or software, and provides the confidence needed to re-use somebody else’s data.
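The linking mechanism described above (independent modules joined by persistent-identifier references) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the element names (`actor`, `device`, `analysis`, `dataset`), the `pid` and `ref` attributes and the `pid:` identifier scheme are hypothetical and are not taken from the schema presented in the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical modules: each describes one entity and carries its own
# persistent identifier in a "pid" attribute; links use a "ref" attribute.
# Names and identifiers are illustrative, not the talk's actual schema.
MODULES = [
    '<actor pid="pid:actor/001"><name>Analyst A</name></actor>',
    '<device pid="pid:device/042"><name>Portable XRF spectrometer</name></device>',
    '<analysis pid="pid:analysis/007">'
    '<carriedOutBy ref="pid:actor/001"/><usedDevice ref="pid:device/042"/>'
    '</analysis>',
    '<dataset pid="pid:dataset/123"><producedBy ref="pid:analysis/007"/></dataset>',
]


def load_modules(sources):
    """Parse each module on its own and index it by its PID."""
    index = {}
    for src in sources:
        elem = ET.fromstring(src)
        index[elem.get("pid")] = elem
    return index


def find_dangling(index):
    """Return (source PID, link element, target PID) for refs with no module."""
    dangling = []
    for pid, elem in index.items():
        for child in elem.iter():
            ref = child.get("ref")
            if ref is not None and ref not in index:
                dangling.append((pid, child.tag, ref))
    return dangling


def provenance(index, pid, depth=0, seen=None):
    """Follow ref links from an entity, returning an indented outline."""
    seen = set() if seen is None else seen
    if pid in seen:  # guard against circular references
        return []
    seen.add(pid)
    elem = index[pid]
    lines = ["  " * depth + f"{elem.tag} {pid}"]
    for child in elem:
        ref = child.get("ref")
        if ref in index:
            lines.extend(provenance(index, ref, depth + 1, seen))
    return lines


if __name__ == "__main__":
    modules = load_modules(MODULES)
    print("\n".join(provenance(modules, "pid:dataset/123")))
    print("dangling links:", find_dangling(modules))
```

Because each module is parsed on its own and joined only through identifiers, modules can be stored or exchanged separately; a dangling-reference check such as the hypothetical `find_dangling` is one simple way to see whether a provenance chain is complete before the data are re-used.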

Marianna Figuera
University of Catania, Italy
About the speech:

A Fuzzy Approach to Evaluate the Attributions’ Reliability in the Archaeological Sources

The problem of the relevance of archaeological sources could be addressed from a different perspective, by linking the concept of reliability to the subjectivity inherent in archaeological data. I would like to present a case study of the so-called small finds from Phaistos and Ayia Triada (Crete). The unusual nature of the finds analysed and the specific excavation history of the two sites led to the development of a procedure that uses a fuzzy approach to preserve the degree of uncertainty of the functional attributions. The concept of “probability of belonging” and the management of the sources’ attributions through multi-assignment could suggest a possible methodological approach to validating the relevance and reliability of archaeological data.
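The idea of multi-assignment with degrees of belonging can be sketched in Python as follows. This is not the procedure used in the case study: it assumes that the “probability of belonging” is treated as a fuzzy membership degree in [0, 1] (which, unlike a probability, need not sum to 1 across classes), and the find identifier, functional classes and values are hypothetical. Combining two sources’ attributions with the standard fuzzy min (intersection) and max (union) operators is likewise an illustrative choice.

```python
from dataclasses import dataclass, field


@dataclass
class Find:
    """A find with fuzzy (multi-)assignment to functional classes."""
    find_id: str
    # Degree of membership (0..1) in each functional class; degrees need
    # not sum to 1, so a find can belong partly to several classes at once.
    membership: dict[str, float] = field(default_factory=dict)

    def __post_init__(self):
        for cls, mu in self.membership.items():
            if not 0.0 <= mu <= 1.0:
                raise ValueError(f"membership of {cls!r} must be in [0, 1], got {mu}")

    def alpha_cut(self, alpha):
        """Classes to which the find belongs with degree >= alpha."""
        return sorted(c for c, mu in self.membership.items() if mu >= alpha)


def combine(a, b, mode="min"):
    """Combine two attributions: fuzzy intersection (min) or union (max).

    A class missing from one attribution is treated as degree 0.
    """
    op = {"min": min, "max": max}[mode]
    return {c: op(a.get(c, 0.0), b.get(c, 0.0)) for c in sorted(set(a) | set(b))}


if __name__ == "__main__":
    # Hypothetical attributions of the same find by two different sources.
    source_1 = {"loom weight": 0.7, "spindle whorl": 0.6, "bead": 0.2}
    source_2 = {"loom weight": 0.4, "spindle whorl": 0.8}
    find = Find("find-001", combine(source_1, source_2, "max"))
    print(find.alpha_cut(0.5))
```

Keeping every class above a threshold (the alpha-cut), rather than forcing a single label, is what preserves the uncertainty of an attribution instead of discarding it.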

Sorin Hermon
STARC, The Cyprus Institute, Nicosia, Cyprus
About the speech:

How FAIR are the FAIR principles for archaeological data?

The aim of the presentation is to discuss the added value of making archaeological data FAIR, in particular primary data collected during fieldwork, such as 3D models of excavation units, analytical measurements and geodetic data. The main argument of the discussion is that without a formal representation of data provenance, such data can be FAIR but of little use for archaeological research.

Olivier Marlet
University of Tours, France
About the speech:

Logicist writing for reliability of data-centric research in archaeology

Within the framework of the activities conducted by the MASA consortium (Memory of Archaeologists and Archaeological Sites) of the very large research infrastructure Huma-Num, the Laboratoire Archéologie et Territoires (University of Tours/CNRS, France), in collaboration with the MRSH (University of Caen/CNRS, France), produced a logicist publication of the results of the excavation of the Rigny cemetery. For this publication, Elisabeth Zadora-Rio formalized her archaeological reasoning according to the precepts of Jean-Claude Gardin, thus providing a clear structure for the chain of inferences leading from field observations to the most synthetic interpretations. The web application developed for it makes it possible to read the publication synthetically or to go deeper, down to the excavation data themselves, which are directly linked to ArSol, our online database.
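A logicist publication can be thought of as a chain of propositions, each inferred from earlier ones and ultimately resting on field observations. The Python sketch below shows how such a chain lets a reader drill down from an interpretation to the observations that support it. The data structure, the P0/P1/P2 labels and the example propositions are hypothetical and are not taken from the Rigny publication or from ArSol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    pid: str
    text: str
    # PIDs of the propositions this one is inferred from; an empty tuple
    # marks an initial proposition, i.e. a field observation.
    premises: tuple[str, ...] = ()


def supporting_observations(props, pid, seen=None):
    """Initial propositions reached by following premises back from pid."""
    seen = set() if seen is None else seen
    if pid in seen:  # each proposition is visited only once
        return []
    seen.add(pid)
    prop = props[pid]
    if not prop.premises:
        return [pid]
    found = []
    for premise in prop.premises:
        found.extend(supporting_observations(props, premise, seen))
    return found


def inference_depth(props, pid):
    """Longest number of inference steps between pid and the observations."""
    prop = props[pid]
    if not prop.premises:
        return 0
    return 1 + max(inference_depth(props, p) for p in prop.premises)


# Hypothetical example chain (not from the Rigny excavation).
CHAIN = {p.pid: p for p in [
    Proposition("P0.1", "Grave A is cut by grave B"),
    Proposition("P0.2", "The fill of grave B contains a coin of type X"),
    Proposition("P1.1", "Grave A is earlier than grave B", ("P0.1",)),
    Proposition("P1.2", "Grave B was filled no earlier than the minting of coin type X", ("P0.2",)),
    Proposition("P2.1", "Grave A precedes grave B, which post-dates the minting of coin type X",
                ("P1.1", "P1.2")),
]}

if __name__ == "__main__":
    print(supporting_observations(CHAIN, "P2.1"), inference_depth(CHAIN, "P2.1"))
```

In a web edition, each proposition identifier could correspond to a page or anchor, so that the same structure supports both a synthetic reading of the final propositions and a drill-down to the underlying data.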

Pierre-Yves Buard
University of Caen, France

Christian-Emil Smith Ore
University of Oslo, Norway
Short Bio:

Christian-Emil Ore is an associate professor and head of the Unit for Digital Documentation (EDD) at the University of Oslo and has worked with digital methods in the humanities for 25 years. Ore works along three main lines: methods for cultural heritage documentation, lexicography and corpora, and electronic text editions (medieval charters).

About the speech:

CIDOC-CRM as semantic glue for excavation data sets, site and monument registries and museum collections

In archaeology, a major issue over the last 10-15 years has been to rescue, preserve and give access to the datasets from archaeological excavations. The EU infrastructure project ARIADNE has been a driving force in this work, and as a result a very large number of archaeological datasets are now accessible. The goal is to apply the FAIR principles from e-Science to these archaeological datasets. Still, a huge number of datasets have been definitively lost or are not accessible or re-usable. A further issue is that too often there are only weak links between the excavations and their datasets on the one hand, and museum collections (find repositories), site and monument registries and publications on the other. To increase the FAIR-ness of the datasets, such links have to be strengthened or at least established. In Norway, a new infrastructure project, ADED (Archaeological Digital Excavation Documentation), was launched in 2018 with the objective of creating a repository for datasets and establishing the aforementioned links. In this infrastructure, the CIDOC-CRM suite will be applied as semantic glue. The presentation will use this concrete project as a basis for discussing the applicability of CIDOC-CRM to the challenge of increasing the FAIR-ness of the datasets.

Panos Constantopoulos
Athens University of Economics and Business, Athena Research Centre, Greece
About the speech:

Ontology-based research process documentation as a reusability enabler

The Scholarly Ontology (SO) is an ontology for modelling research processes, derived from CIDOC CRM. It has evolved as a generalization of the NeDiMAH Methods Ontology (NeMO) and enjoys extensive empirical grounding. Due to its cross-disciplinary character, the SO enables documenting and analyzing research processes unfolding in one or more domains and, correspondingly, associating data from disparate, domain-specific sources. The research process is addressed from four complementary perspectives: activity, procedure, resource and agency. We view the contextualized, structured, process-oriented documentation of scholarly work using SO as an enabler of the reusability of cultural heritage data.

Joseph Padfield
The National Gallery, London, UK
About the speech:

Putting theory into practice - Using a CIDOC-based venue ontology to describe the movement of paintings within the National Gallery

Using the CIDOC CRM together with the Venue ontology at the National Gallery (NG) makes it possible to consider how the CRM is used in practice. The work involves developing a simple internal persistent identifier (PID) system and incorporating it into a practical tool for capturing and recording the movement of paintings, thus documenting their provenance and their relationship with the individual parts of the gallery and with the whole.