Publication date: 12 October 2020
Contributors
Trond Aalberg (Norwegian University of Science and Technology, Norway)
Carlo Bianchini (Università degli studi di Pavia, Italy)
Marshall Breeding (independent consultant, USA)
Elena Corradini (Biblioteca comunale di Ala – Sistema bibliotecario trentino, Italy)
Karen Coyle (independent consultant, USA)
Marija Dalbello (Rutgers University, USA)
Claudio Forziati (Università degli studi di Napoli Federico II, Italy)
Mauro Guerrini (Università degli Studi di Firenze, Italy)
Antonella Iacono (Biblioteca civica di Biella, Italy)
Giovanni Michetti (Sapienza Università di Roma, Italy)
Maura Quaquarelli (Alma mater studiorum – Università di Bologna, Italy)
Roberto Raieli (Sapienza Università di Roma, Italy)
Riccardo Ridi (Università Ca’ Foscari, Venezia, Italy)
Gino Roncaglia (Università della Tuscia, Viterbo, Italy)
Lucia Sardo (Alma mater studiorum – Università di Bologna, Italy)
David Weinberger (writer, USA)
Paul Gabriele Weston (Università degli studi di Pavia, Italy)
Edited by
Italian Library Association (AIB) – Study Group on Cataloguing, Indexing, Linked Open Data and Semantic Web (CILW)
Aims and scope
The aim of this document is to share recommendations concerning theories and techniques, uses and developments, possibilities and risks of Semantic Web projects and Linked Data technologies, with a focus on their social usefulness, their value for culture, their importance for scientific research and academic studies.
Methodology
To this end, we asked the invited authors to share their vision of the topic and to summarize its crucial elements in defined points.
For each of the common crucial elements we also asked the authors to comment and provide further suggestions, in order to include different opinions on the convergent topics.
At the end of this process, we blended the different, and sometimes incompatible, viewpoints into a single document, divided into paragraphs and each paragraph into theses, which provide a unified definition and facilitate a global, shared conversation about the sense and use of the Semantic Web.
License
This work is licensed under the AIB Copyright statement.
Waiving the AIB Copyright statement, this work is licensed under the license Creative Commons — Attribution 2.0 Italy (CC BY 2.0 IT).
The Manifesto
We, the undersigned, believe that increasing access to information and
knowledge across society, assisted by the availability of information
and communications technologies (ICTs), supports sustainable
development and improves people’s lives.
(The Lyon Declaration, 2014)
Introducing the theme
The Web has become the largest assembly of public meaning in human history, yet that meaning is divided by differences in language, concepts, and norms.
The Semantic Web aims at making the Web’s meaning universally accessible and usable. It can be described as a global network of interlinked and semantically annotated data, enabled by a stack of formats and specifications for representing and sharing information in a world-wide context.
The Web was initially designed for pages intended for human eyes. The Semantic Web adds the cues required for machines to understand the information on those pages.
The Semantic Web that is emerging partially fulfils the vision of it as an extension and evolution of the document-centric Web towards a data-centric Web, but its use – and our needs and requirements for that use – is changing as we explore and learn its boundaries.
In parallel with the development of the technologies for the Semantic Web, there has also been a paradigm shift in the strategies for creating, sharing and managing data in many domains.
Rather than the ‘old-fashioned’ local and system-centric perspective, most information management initiatives now adopt open, decentralized, and global data-centric strategies in which the emphasis is on increasing the value of data by building common reference models and technologies.
To achieve these results, interoperability, reuse and integration must be the basic aims: the greatest benefit of the Semantic Web is not structuring meanings but making meanings interoperable – across systems, data structures, languages, cultures and thoughts, all of which are context-dependent.
The biggest challenge for the Semantic Web is creating self-explanatory data that can be clearly interpreted by parties other than their creators, because they are interlinked with contextual data. In this way, everyone can improve the quality of data.
The various relations in the Semantic Web can be developed only if ontologies appropriate to each data context are structured to support the network of links. Only then does navigation among them, and relation among different domains, become possible.
The conceptual development of an ontology is a complex issue even within a single domain. Community agreement is a possible starting point for finding the structures and terms that can represent resources.
Behind the simplicity of Linked Data lies the complex description of the data from which the basic triples are formed, and this complexity increases when the aim is to link data across different domains and datasets.
Principles
The Semantic Web, born from the ideas of Tim Berners-Lee, enriches and ‘magnifies’ the original democratic aspirations of the World Wide Web.
Berners-Lee also designed a data structure, called Linked Data, to capture and express the unconstrained information posted on the Web.
Linked Data enable information to be expressed in simple molecules – called triples – that state a relationship between two things. This relationship is designed to enable systems to connect what is known regardless of the languages used, and to name what those triples refer to. Unlike traditional data management systems, which can only record information they were set up to know about – like the fields of a fixed form – Linked Data can stretch to include unanticipated concepts as they are encountered.
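The triple mechanism described above can be sketched in a few lines of plain Python. This is only an illustrative toy – the `ex:` identifiers and predicates are hypothetical, not drawn from any real vocabulary:

```python
# A triple states a relationship between two things:
# (subject, predicate, object).
graph = set()

def add(subject, predicate, obj):
    """Record one statement as a triple."""
    graph.add((subject, predicate, obj))

# Statements about a book, named by URI-like identifiers (hypothetical).
add("ex:book/moby-dick", "ex:title", "Moby Dick")
add("ex:book/moby-dick", "ex:author", "ex:person/melville")

# Unlike the fixed fields of a form, the graph can later stretch to
# include a relationship nobody anticipated when the data was created:
add("ex:book/moby-dick", "ex:inspiredBy", "ex:event/essex-sinking")

# Everything now known about the book, regardless of who stated it:
about_book = {(p, o) for (s, p, o) in graph if s == "ex:book/moby-dick"}
print(len(about_book))  # prints 3
```

The point of the sketch is the absence of a schema: nothing had to be redesigned before the third, unanticipated statement could be added.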
As the new Semantic Web technology, Linked Data are developed bottom-up, implemented by a variety of actors across the Web as we know it.
Despite the diversity of their respective paths, a common framework emerges, in institutions and grassroots communities alike, starting from a shared vision and purpose.
The system is designed so that almost anyone can produce Linked Data that populate the network of the Semantic Web in the form of triples – the basic semantic structure – by adopting the four rules established by Tim Berners-Lee and by sharing the practices and protocols recommended by the W3C.
Semantic enrichment and ontologies support automatic inferencing by software agents such as search engines.
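The kind of automatic inference mentioned above can be illustrated with a toy RDFS-style subclass rule in plain Python. The class names and data are hypothetical, and a real reasoner works over the full RDFS/OWL semantics rather than this single rule:

```python
# Toy ontology: (subclass, superclass) assertions, as in rdfs:subClassOf.
subclass_of = {
    ("Novel", "Book"),
    ("Book", "Document"),
    ("Document", "Resource"),
}

# Explicitly stated types of individual resources (hypothetical data).
stated = {("ex:moby-dick", "Novel")}

def infer_types(stated, subclass_of):
    """Apply the rule: if x is a C and C is a subclass of D, then x is a D.
    Repeat until no new statement can be derived (a fixed point)."""
    inferred = set(stated)
    changed = True
    while changed:
        changed = False
        for x, c in list(inferred):
            for sub, sup in subclass_of:
                if c == sub and (x, sup) not in inferred:
                    inferred.add((x, sup))
                    changed = True
    return inferred

# The software agent now also knows the resource is a Book, a Document
# and a Resource, although only "Novel" was ever stated.
print(sorted(infer_types(stated, subclass_of)))
```

This is what allows a search engine to answer a query for "documents" with resources that were only ever described as novels.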
The Linked Data mechanism is the technology that allows the Semantic Web to be created, structured and supported, through URIs, XML, RDF, OWL, SKOS, SPARQL and other standards – all free, born and developed collaboratively – that essentially provide granularity, integration and sharing.
Linked Open Data (LOD) are Linked Data released under an open license, which supports their reusability and interoperability, that is the ability to exchange information at various levels:
– semantic: connecting meanings, namely the various ways in which communities refer to the same concept or to similar concepts;
– technological and technical: allowing the systems we use to manage, describe and process data to interact seamlessly;
– human: allowing all data-using communities to exchange their skills and knowledge freely and independently through shared specifications and data models;
– organizational: leveraging cooperation between communities that manage data in order to ensure quality, provenance, reliability, compatibility of licenses and usability of data.
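Semantic interoperability – connecting the ways different communities name the same concept – can be sketched as equivalence links between identifiers, in the spirit of `owl:sameAs`. All identifiers below are hypothetical:

```python
# Two communities describe the same person under different identifiers
# (all identifiers here are hypothetical).
triples = {
    ("libA:melville", "libA:name", "Melville, Herman"),
    ("libB:person42", "libB:born", "1819"),
    # An equivalence link - owl:sameAs in RDF - bridges the two datasets.
    ("libA:melville", "sameAs", "libB:person42"),
}

def merged_view(entity, triples):
    """Collect statements about an entity and everything declared equal to it.
    One pass suffices for this single link; chains would need a fixed point."""
    aliases = {entity}
    for s, p, o in triples:
        if p == "sameAs" and (s in aliases or o in aliases):
            aliases.update({s, o})
    return {(p, o) for (s, p, o) in triples if s in aliases and p != "sameAs"}

# The merged view combines community A's name with community B's birth year.
print(merged_view("libA:melville", triples))
```

Neither community had to change its own descriptions; the single equivalence link is enough to make both sets of statements usable together.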
LOD are not tied to any specific language or culture, and can grow without geopolitical, commercial or any other limits. The Semantic Web created through them is a set of unlimited connections without predefined schematizations. The Semantic Web finds its richness in being a decentralized and open system, in which different meanings can coexist without an a priori convergence or a universally shared schema of meaning.
The Semantic Web is part of a worldwide, general trend toward information sharing, which entails renouncing a proprietary perspective in order to create a common space. In this space, every cultural resource can circulate in digital format, as a representation of the original, material form, thanks to its fully shareable and reusable nature.
The purposes of connecting, sharing, reusing, cooperating, are not new to cultural institutions and stakeholders, but they are now part of a growing information landscape. Opening, connecting and sharing data is a very effective way to disseminate knowledge and democratize its use.
The most democratic utopia of the Semantic Web would therefore be that any human being, from anywhere in the world and from any level of society and education, could access resources by following the ramified paths of Linked Open Data until reaching the desired bibliographic information, a digital image, a project digitized by the copyright holder, or even information of which he or she was previously unaware.
Mines of meaning
Almost everything on the Web was placed there with a purpose, which means it expresses meaning. Even what was placed on the Web by a machine-generated event is there because it has been associated with some meaning.
The creator’s intentions may differ from the meaning attributed to something by whoever stumbles upon it on the Web. The mines of meaning are literally inexhaustible, because discovering meaning and creating it are bound in a tight and circular connection, with no functional difference between the two.
The Semantic Web is already a crucial part of the natural environment for digital knowledge, but due to the low-level representation of the data there is a need for layers of software between the data and the end user.
While we are happy to see, and embrace, the rise of additional standards and protocols for connecting distributed information, we believe the Semantic Web and LOD provide distinctive benefits for information embedded in Web pages, and for representing and interconnecting large, complex data as part of the open Web’s infrastructure.
Technology will help address some critical Semantic Web factors, such as SPARQL, the query language of RDF systems.
The main difficulties users encounter in creating SPARQL queries and running them on a system – namely, the need to know both the language and the ontologies used in the query – can be overcome with advanced tools capable of correlating user queries with the results produced by different data sources.
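At its core, a SPARQL engine matches triple patterns containing variables against the graph. A naive sketch of that matching step in plain Python – the data are hypothetical and the engine is a toy, not a real SPARQL implementation:

```python
# A tiny graph of hypothetical triples.
triples = {
    ("ex:moby-dick", "ex:author", "ex:melville"),
    ("ex:omoo", "ex:author", "ex:melville"),
    ("ex:melville", "ex:name", "Herman Melville"),
}

def match(pattern, triples):
    """Match one triple pattern, where terms starting with '?' are variables.
    Returns a list of variable bindings - the heart of a SPARQL SELECT."""
    results = []
    for triple in triples:
        binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    ok = False  # same variable bound to two values
                    break
                binding[term] = value
            elif term != value:
                ok = False  # constant does not match
                break
        if ok:
            results.append(binding)
    return results

# Roughly: SELECT ?book WHERE { ?book ex:author ex:melville }
books = {b["?book"] for b in match(("?book", "ex:author", "ex:melville"), triples)}
print(books)  # ex:moby-dick and ex:omoo
```

The difficulty the paragraph above describes is visible even in this toy: to write the pattern, the user must already know that the graph uses `ex:author` rather than some other predicate – exactly the ontology knowledge that friendlier tools aim to supply.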
Many independent tools for exploring, searching and making sense of Semantic Web data have been developed, such as interactive data visualization and querying in experimental domains. But we still have to develop friendly, usable tools that empower users to make use of these data, comparable to the web browsers that contributed to the success of the World Wide Web.
It is advisable to reach an international and mutual agreement among the communities that produce and disseminate knowledge, at least within their respective domains.
Beyond intellectual agreement, it is necessary to respect and consider the communities that will benefit from this knowledge, adopting solutions that can be useful for people.
In all this, data and knowledge as ‘commons’ seem inseparable from Open Access and Open Science, as well as from Open Source.
Data quality and structure
Because of the formalism and soundness of Semantic Web standards and supporting technologies, the Semantic Web has the potential to be adopted as the main solution for data management and dissemination in many domains and to be used for almost any kind of data.
The quality of the Semantic Web is an emerging topic that needs to be given high priority in future research and development.
Publishing data on the Semantic Web should also imply that the data meet basic quality requirements: simply transforming any local information into RDF does not necessarily produce truly reusable data.
Data quality has many dimensions, and so far the terminology for discussing quality – and the methodology for analysing it – is poorly defined.
On the one hand we need metrics and methods to assess and improve quality; on the other, we need to increase awareness of Semantic Web data quality.
An important lesson learned from the first decade of the Semantic Web is that universal meaning, linking and reuse of data depend heavily on a core set of commonly agreed upon and recognized concepts and types.
Numerous authoritative Semantic Web resources have emerged and provide us with trustworthy identifiers of concepts in many domains. However, the current set of such resources is far from sufficient and does not necessarily have the reliability and coverage needed to serve as a backbone for the Semantic Web.
Semantic systems on the Web carry a fundamental tension between providing enough structure for machines to forge connections among pieces, and not imposing an inviolable structure that would prevent the creation or discovery of new meanings and connections.
For each coherent set of information, the ideal level of granularity is the one that maximizes the possibilities of reuse in different contexts and of exploration of the information nodes along diversified paths. This should happen without compromising the readability of each set of information as a unitary document.
Finding such a balance is never easy, and at the moment it is impossible to imagine a machine or an abstract rule making such decisions without human intervention.
Mind the risks
If the Semantic Web means increasing the quality, standardization and interoperability of metadata on the Web, adding them to primary documents in order to facilitate their search, evaluation and use, then it can be a realistic and useful project to which librarians, archivists, curators and scholars can contribute significantly, thanks to their skills and values.
If the Semantic Web means replacing primary documents with granular data to be recombined at will, then it would be a senseless project: it is impossible to imagine an overall system of production, storage, communication, acquisition and use of knowledge that sets aside the document, that fundamental element of the organization and management of information.
If the Semantic Web means delegating completely – or, in any case, to a large extent – to algorithms and automatic mechanisms the bundling of data in order to obtain information structures more extensive than the original ones, and to infer evaluations and decisions from these new connections, then it can be a dangerous project.
Not only documents but also data may be neither objective nor neutral, and their choice, organization, contextualization, interpretation and evaluation are activities in which machines can help but must never replace human beings.
Human responsibility for the interpretation of documents and data was needed before the Web and the Semantic Web, and it will be needed in the future as well.
In any case, documents and data are mostly – if not always – neither objective nor neutral. Communication is always oriented: even a traffic sign is oriented, intended for a specific audience, and meaningful for a limited range of people.
The resource analysis process should lead to a new synthesis in which these extremely granular data lose neither their meaning nor their relationship with the objects and their original contexts.
This need for contextualization is strongly felt, yet the various professional communities would resist a request to aggregate data that each discipline describes too differently. A specific sector may need to develop specific sets of information to better represent a resource. These sets may then be variously combined and interpreted whenever used in other fields of knowledge.
In any case, Semantic Web architecture always implies an organizational choice about raw data and, therefore, a specific informational choice, even if it is neutral or agnostic from a domain point of view.
If, on the one hand, the LOD paradigm offers the basis to communities to share their heritage and culture, on the other hand every single community involved must play its own role in the process, by valuing its own authority, in order to reach a convergence that includes richness of data related to every identifiable, definite, and truthful context.
Aware of the risks posed by knowledge production and publication by virtually anyone, the Semantic Web project has been equipped from the outset with criteria of self-protection and control, which are currently being tested and investigated.
The three highest levels of the Semantic Web architecture indicated by Berners-Lee – unifying logic, proof and trust – would complete the architecture with the structures essential to give data, and the knowledge disseminated from them, trustworthiness, certainty and precision.
At the top, the search for credibility and authoritativeness of the producing sources, and of the data uploaded to the Web, forms the trust level, which will validate machine operations conducted through verified and trustworthy sources, thanks also to cryptography and digital signature tools.
The freedom of the Net allows everyone to publish anything. This can generate issues for the Semantic Web, where machines, unable to discern between true and false, need the trust level to be active in order to work on the available data.
Finally, among the risks we must note the fragility of a system based substantially on the persistence and immutability of identifiers (e.g. URIs). These identifiers are not inherently persistent: they persist only because people manage them, ensuring their persistence and uniqueness for as long as they are able to do so.
Open developments
The future of the Semantic Web is laden with expectations. It is promising, thanks to the many organizations that already own big datasets and decide to share them concretely, and to start new digitization projects for their own resources.
The entire knowledge community enables the dissemination of these same resources through the Web, while enriching them with new data and exploring their meaning through unexpected paths and relationships.
Various commercial organizations, including some publishers, have begun to believe in the Semantic Web as a perspective that will affect their future development, business models, and businesses. They are therefore seeking collaboration with cultural institutions, universities, research centers and the whole Web community in order to prepare for the future.
In fact, in recent years most new-generation systems and activities have increasingly been open by default.
In this respect, the debate is growing around the diffusion of Open Access, LOD, Open Science, and governmental as well as commercial Big Data. Scientific and industrial research must embrace this increasingly complex and open world, which circulates through the Internet and the Web.
For this reason, we can expect more attention to how library, archive and museum associations worldwide propose the adoption of Semantic Web principles by their members: in the capabilities of resource management systems and discovery systems, in other practical applications for cultural institutions, or through partnerships with global initiatives.
Not to forget
Data has value directly proportional to its use and re-use.
We can evolve from being on the Web to being of the Web, and the entire Web can be ours to use.
We need a Semantic Web because we live in a semantic world.
Meaning does not exist in things as such, but it is either generated or inferred through relationships and interactions among data.
We are never going to agree about what things mean on the Web, or off of it. This is a good thing.