Wiki source code of About
| author | version | line-number | content |
|---|---|---|---|
| 1 | == Goals and Objectives == | ||
| 2 | |||
| 3 | The overall goal of implementing the Semantic Knowledge Management System (SKMS) in statistical organizations is to increase the effectiveness and potential of statistical data use by ensuring unambiguous, semantically rich interpretation by both people and information systems. | ||
| 4 | |||
| 5 | Moving towards [[FAIR>>https://www.go-fair.org/||rel="noopener noreferrer" target="_blank"]] statistics and interoperable statistical data, SKMS focuses on the following objectives: | ||
| 6 | |||
| 7 | * Provide a shared semantic environment that brings together documents, glossaries, classifications, and standards into a unified, machine-readable framework (factory) for the creation, dissemination and interpretation of Linked Open Statistical Data (LOSD). | ||
| 8 | * Create an extensible, interconnected context for data modelling based on semantic assets via both machine-readable forms interpretable by information systems, and visual representations understandable by people. | ||
| 9 | * Enable the preparation and dissemination of LOSD and semantically rich metadata ([[“smart” metadata>>http://cosmos-conference.org/index.html||rel="noopener noreferrer" target="_blank"]]), in accordance with [[FAIR>>https://www.go-fair.org/||rel="noopener noreferrer" target="_blank"]] principles, ensuring semantic interoperability. | ||
| 10 | * Provide semantic assets for reuse to enhance the quality of linked data and metadata, improve comparability, and facilitate cross-domain integration. | ||
| 11 | * Foster collaboration between IT professionals and statistical experts to co-develop semantic models, align terminology and classifications, and prepare informative descriptions of indicators and LOSD sets, ensuring their relevance, usability, and operational value. | ||
| 12 | |||
| 13 | {{box cssClass="box_green"}} | ||
| 14 | == SDMX Implementation == | ||
| 15 | |||
| 16 | International standards like Statistical Data and Metadata eXchange (SDMX) have provided a robust foundation for metadata exchange in official statistics. However, our experience has revealed significant limitations that hinder semantic interoperability. SKMS addresses these gaps by integrating SDMX structures into a semantic interpretation environment via the Interoperability Basis platform. The platform supports semantic alignment, enrichment, and publication of data exchange standards using a knowledge management system, modeling tools, namespace control, and persistent URI infrastructure. | ||
| 17 | {{/box}} | ||
| 18 | |||
| 19 | == Linked Data == | ||
| 20 | |||
| 21 | The World Wide Web Consortium ([[W3C>>https://www.w3.org/||rel="noopener noreferrer" target="_blank"]]) recommends Linked Data as the most effective way to publish data on the Internet. Linked Data is developed according to the principles of the [[Semantic Web>>https://www.w3.org/standards/||rel="noopener noreferrer" target="_blank"]] — a global semantic infrastructure and a set of fundamental rules for representing data on the Internet in a way that allows information systems to interpret its meaning correctly. | ||
| 22 | |||
| 23 | Linked Open Statistical Data (LOSD) refers to statistical datasets published as Linked Data under an open license such as [[CC BY 4.0>>https://creativecommons.org/licenses/by/4.0/||rel="noopener noreferrer" target="_blank"]], promoting free reuse and wide dissemination. Interoperability is achieved by creating, exchanging, and using LOSD in ways that preserve the meaning and context of the data, regardless of the systems involved. | ||
| 24 | |||
| 25 | Human understanding and machine interpretation of statistical data are often difficult due to the lack of formalised domain knowledge and the absence of machine-readable, semantically enriched data. Poor semantic structure means that even published linked data can be hard to discover and to relate accurately to domain concepts. | ||
| 26 | |||
| 27 | The High-Level Group for the Modernisation of Official Statistics ([[HLG-MOS>>https://unece.org/statistics/networks-of-experts/high-level-group-modernisation-statistical-production-and-services||rel="noopener noreferrer" target="_blank"]]), under the United Nations Economic Commission for Europe ([[UNECE>>https://unece.org/ru||rel="noopener noreferrer" target="_blank"]]), addresses the challenges of data interoperability within national statistical systems. It develops and promotes methods, models (including semantic models such as ontologies), and standards through coordinated initiatives. One of these initiatives is the Data Governance Framework for Statistical Interoperability ([[DAFI>>https://unece.org/sites/default/files/2024-03/HLG2023%20DAFI%20Final_0.pdf]]), published in 2023. This framework provides a reference model for implementing governance programs that support the creation, sharing, and use of data in ways that preserve semantic meaning across systems. | ||
| 28 | |||
| 29 | Another priority of HLG-MOS is the development of rich (“[[smart>>http://cosmos-conference.org/index.html||rel="noopener noreferrer" target="_blank"]]”) metadata, that is, metadata that is standardised (understandable and reusable across contexts), active (capable of driving statistical processes), and aligned with the [[FAIR principles>>https://www.go-fair.org/||rel="noopener noreferrer" target="_blank"]]: Findable, Accessible, Interoperable, and Reusable. | ||
| 30 | |||
| 31 | We share these goals and move forward in step with HLG-MOS initiatives — SKMS already reflects key principles and objectives that resonate with this international agenda. | ||
| 32 | |||
| 33 | A key enabler of [[FAIR>>https://www.go-fair.org/||rel="noopener noreferrer" target="_blank"]] implementation in statistics is the use of semantic technologies for both data dissemination and the formalization of knowledge in the form of semantic models (semantic assets). Semantic assets (SAs) are reusable formal representations of data, such as (1) metadata schemas (e.g. XML or RDF), (2) core data models or common models, and (3) ontologies, thesauri, and reference data (e.g. code lists, taxonomies, glossaries). These assets are published as open data standards and used to develop knowledge management systems, harmonize indicators and classifications, and prepare LOSD. Semantic models support unambiguous interpretation, semantic search, and the discovery of data across disparate sources. | ||
| 34 | |||
| 35 | The adoption of LOSD creates new opportunities for discovering, searching, comparing, and integrating statistical data from multiple sources through [[Semantic Web>>https://www.w3.org/standards/||rel="noopener noreferrer" target="_blank"]] technologies, including semantic integration methods. This approach makes it possible to reach the highest level of data maturity in the 5-star model proposed by Tim Berners-Lee. | ||
| 36 | |||
| 37 | == Operational Cycle == | ||
| 38 | |||
| 39 | Effective implementation of LOSD requires an open, semantically rich interpretation environment. The set of systems and tools under the SKMS “umbrella” forms a unified terminological and methodological basis for developing rich semantic models and then enables their use for the preparation, dissemination, and interpretation of linked data and rich metadata. An important principle underlying the proposed methods and tools is collaboration between IT specialists and statistical experts. | ||
| 40 | |||
| 41 | The full operational cycle consists of seven stages: | ||
| 42 | |||
| 43 | 1. Collection and systematization of methodological documents (creation of an electronic library), adding annotations, discovering candidate terms, and primary markup with related terms and documents. Publishing documents in their original structured form with hypertext markup in a specialized "Methodology" section. | ||
| 44 | 1. Development of glossaries (the formation of detailed terminological articles) and indicator descriptions based on the analysis of methodological documents, followed by the generation of the corresponding semantic assets. Refinement of hypertext markup in accordance with the modelled glossaries. | ||
| 45 | 1. Publishing semantic assets generated in the SKMS. | ||
| 46 | 1. Development, alignment, and cataloging of the necessary SAs, code lists, or other models of statistical domains in accordance with semantic standards. | ||
| 47 | 1. Importing datasets from external sources or data warehouses (DWH). Transformation of datasets using the RDF Data Cube Vocabulary, followed by semantic enrichment. | ||
| 48 | 1. Visualization and validation of semantic models and LOSD sets. | ||
| 49 | 1. Construction of rich metadata that is transmitted for publishing in external analytical systems. | ||
| 50 | |||
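The dataset transformation in stage 5 can be illustrated with a minimal sketch. It is not the SKMS generator itself: the base URI, the dataset identifier, the area codes, and the values are hypothetical placeholders, and the choice of the SDMX-RDF dimension and measure properties is an assumption. Only the Python standard library is used, and each tabular row is rendered as one `qb:Observation` in Turtle.

```python
from urllib.parse import quote

# Hypothetical base namespace; a real deployment would use its own persistent URIs.
BASE = "https://example.org/losd/"

# Namespaces of the RDF Data Cube Vocabulary and the SDMX-RDF helper vocabularies.
PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def observation_to_turtle(dataset_id, row):
    """Render one tabular row as a qb:Observation in Turtle."""
    area, period, value = row["area"], row["period"], row["value"]
    obs_id = quote(f"{dataset_id}/{area}/{period}", safe="/")
    lines = [
        f"<{BASE}obs/{obs_id}> a qb:Observation ;",
        f"    qb:dataSet <{BASE}dataset/{quote(dataset_id)}> ;",
        f"    sdmx-dimension:refArea <{BASE}code/area/{quote(area)}> ;",
        f'    sdmx-dimension:refPeriod "{period}"^^xsd:gYear ;',
        f'    sdmx-measure:obsValue "{value}"^^xsd:decimal .',
    ]
    return "\n".join(lines)


def dataset_to_turtle(dataset_id, rows):
    """Serialize all rows of a dataset, preceded by the prefix declarations."""
    blocks = [observation_to_turtle(dataset_id, row) for row in rows]
    return PREFIXES + "\n" + "\n\n".join(blocks) + "\n"


# Made-up rows standing in for a dataset imported from an external source or DWH.
rows = [
    {"area": "AA", "period": "2022", "value": "4.9"},
    {"area": "AA", "period": "2023", "value": "4.7"},
]
turtle = dataset_to_turtle("unemployment-rate", rows)
print(turtle)
```

A complete cube would also need a `qb:DataStructureDefinition` describing its dimensions and measures. That part is omitted here to keep the sketch short.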
| 51 | SKMS is based on an XWiki extension for working with semantic technologies. It provides special templates for publishing documents, glossary terms, and indicator descriptions. Domain experts use these templates to formalize statistical knowledge and to provide a human-readable representation of it, which is fixed in SAs. The LOSD pipeline is supported by generators and constructors developed to automate the production of LOSD, semantic models, and semantically enriched metadata. SKMS may be integrated with a cataloging service that supports not only the organisation of semantic assets but also their visualization, access, and dissemination through standard interfaces such as OpenAPI and SPARQL endpoints. | ||
| 52 | |||
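As an illustration of access through a SPARQL endpoint, the sketch below prepares a query request in the style of the SPARQL 1.1 Protocol (a GET request with a `query` parameter). The endpoint URL is a hypothetical placeholder, and the query assumes that the published data uses the RDF Data Cube Vocabulary. No network call is made; the request object is only constructed.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint address; replace with the address of a real service.
ENDPOINT = "https://example.org/sparql"

# Count the observations in each published dataset.
QUERY = """PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?dataset (COUNT(?obs) AS ?observations)
WHERE { ?obs a qb:Observation ; qb:dataSet ?dataset . }
GROUP BY ?dataset
"""


def build_sparql_request(endpoint, query):
    """Build a GET request that carries the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SPARQL query results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method())
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. That step is left out because the endpoint above does not exist.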
| 53 | == Benefits == | ||
| 54 | |||
| 55 | Combining international experience and our own research, SKMS provides a semantically rich interpretation environment for statistical institutions to: | ||
| 56 | |||
| 57 | * Enhance the quality of statistical data and metadata | ||
| 58 | * Harmonize statistical terminology and classifications | ||
| 59 | * Align with [[FAIR>>https://www.go-fair.org/||rel="noopener noreferrer" target="_blank"]] principles | ||
| 60 | * Ensure semantic interoperability and reuse | ||
| 61 | * Facilitate accurate (meta)data interpretation | ||
| 62 | |||
| 63 | The use of SKMS brings the following benefits: | ||
| 64 | |||
| 65 | * Adoption of semantic modelling in statistical practice | ||
| 66 | * Generation of semantically rich metadata and LOSD sets | ||
| 67 | * Validation of results using visualization tools | ||
| 68 | |||
| 69 | == Key Users == | ||
| 70 | |||
| 71 | SKMS is designed to support a wide range of stakeholders engaged in the creation, management, dissemination, and use of statistical knowledge as well as the development of linked statistical data. | ||
| 72 | |||
| 73 | Each user group contributes to and benefits from the semantic foundation provided by SKMS: | ||
| 74 | |||
| 75 | * **National Statistical Offices**: expected to provide domain-specific documentation, develop national semantic assets, and integrate LOSD into their official dissemination platforms. They can use SKMS to align methodologies, harmonize indicators, and enhance the quality of statistical data and metadata. | ||
| 76 | * **International Organisations** (e.g. ILO, FAO, Eurostat, UNECE): contribute international classifications, standards, and glossaries, and can use SKMS to support semantic interoperability across countries. They benefit from improved alignment of national data and from the ability to publish reference models in a reusable semantic format. | ||
| 77 | * **Statistical Methodology Experts**: play a key role in reviewing and formalizing statistical definitions, ensuring conceptual clarity and consistency across indicators and classifications. Their contributions strengthen the semantic backbone of statistical domains. | ||
| 78 | * **Metadata and Knowledge Managers**: are responsible for curating glossaries, maintaining multilingual terminologies, and ensuring the semantic quality of published content. They use SKMS to build, manage, and share semantic models. | ||
| 79 | * **Data Integration and Interoperability Teams**: apply SKMS tools and semantic assets to link data across sources, map between standards, and ensure that contextual meaning is preserved in statistical exchanges. They help implement [[FAIR>>https://www.go-fair.org/]] principles. | ||
| 80 | |||
| 81 | Each of these user groups contributes to the ecosystem of Linked Open Statistical Data, enabling a sustainable and collaborative infrastructure for semantically enhanced statistics. | ||
| 82 | |||
| 83 | == Standards and Technologies Used == | ||
| 84 | |||
| 85 | The Semantic Knowledge Management System relies on a set of well-established [[Semantic Web>>https://www.w3.org/standards/||rel="noopener noreferrer" target="_blank"]] standards and vocabularies: | ||
| 86 | |||
| 87 | * [[**FOAF (Friend Of A Friend)**>>https://xmlns.com/foaf/spec/]] – a vocabulary of named properties and classes for describing people and their relationships, built using RDF and OWL. | ||
| 88 | * [[**vCard (The Electronic Business Card)**>>https://www.w3.org/TR/vcard-rdf/]] – a data format for representing and exchanging contact information about individuals and organizations (e.g. for phonebooks or email clients). | ||
| 89 | * [[**OWL (Web Ontology Language)**>>https://www.w3.org/OWL/]] – a language for defining and linking ontologies, supporting formal descriptions of concepts, properties, and relationships in the [[Semantic Web>>https://www.w3.org/standards/||rel="noopener noreferrer" target="_blank"]]. | ||
| 90 | * [[**Dublin Core™ Metadata Initiative (DCMI)**>>https://www.dublincore.org/specifications/dublin-core/dces/]] – a standard set of metadata terms used to describe a wide range of resources, including elements, encoding schemes, and syntax guidelines. | ||
| 91 | * [[**RDF 1.1 Concepts and Abstract Syntax**>>https://www.w3.org/TR/rdf11-concepts/]] – the foundational knowledge representation model of the [[Semantic Web>>https://www.w3.org/standards/||rel="noopener noreferrer" target="_blank"]], defining how RDF data is structured using triples. | ||
| 92 | * **RDFS (RDF Schema 1.1)** – a vocabulary extension to RDF, providing classes and properties for defining basic ontologies and structuring RDF resources. | ||
| 93 | [[https:~~/~~/www.w3.org/TR/rdf-schema/>>url:https://www.w3.org/TR/rdf-schema/||rel="noopener noreferrer" target="_blank"]] | ||
| 94 | * **RDF Data Cube Vocabulary** – a W3C vocabulary for publishing multidimensional statistical data in RDF, compatible with the SDMX cube model. | ||
| 95 | [[https:~~/~~/www.w3.org/TR/vocab-data-cube/>>url:https://www.w3.org/TR/vocab-data-cube/||rel="noopener noreferrer" target="_blank"]] | ||
| 96 | * **SDMX (Statistical Data and Metadata eXchange)** – an international standard for the exchange of statistical data and metadata, supported by key statistical organizations. | ||
| 97 | [[https:~~/~~/sdmx.org/>>url:https://sdmx.org/||rel="noopener noreferrer" target="_blank"]] | ||
| 98 | * **SKOS (Simple Knowledge Organization System)** – a W3C standard for representing knowledge organization systems such as thesauri, taxonomies, and classifications. | ||
| 99 | [[https:~~/~~/www.w3.org/TR/skos-reference/>>url:https://www.w3.org/TR/skos-reference/||rel="noopener noreferrer" target="_blank"]] | ||
| 100 | * **SKOS-XL (SKOS eXtension for Labels)** – an extension of SKOS that allows for richer descriptions and relationships between lexical labels. | ||
| 101 | [[https:~~/~~/www.w3.org/TR/skos-reference/skos-xl.html>>url:https://www.w3.org/TR/skos-reference/skos-xl.html||rel="noopener noreferrer" target="_blank"]] | ||
| 102 | * **XKOS (SKOS extension for statistical classifications)** – a vocabulary extending SKOS for describing statistical classifications and code lists, jointly developed by INSEE and Eurostat. | ||
| 103 | [[https:~~/~~/rdf-vocabulary.ddialliance.org/xkos.html>>url:https://rdf-vocabulary.ddialliance.org/xkos.html||rel="noopener noreferrer" target="_blank"]] |
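To show how some of these vocabularies fit together, here is a minimal sketch that serializes a small code list as a SKOS concept scheme in Turtle. The base URI, the scheme identifier, and the codes are hypothetical placeholders, and the sketch uses only core SKOS terms (no SKOS-XL or XKOS). It relies on the Python standard library only and is not the generator used in SKMS.

```python
# Hypothetical base namespace for code lists.
BASE = "https://example.org/codelist/"


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def concept_scheme_to_turtle(scheme_id, title, codes):
    """Serialize a code list as a SKOS concept scheme.

    codes is a list of (notation, label, parent_notation_or_None) tuples.
    """
    scheme = f"<{BASE}{scheme_id}>"
    out = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"{scheme} a skos:ConceptScheme ;",
        f'    skos:prefLabel "{escape_literal(title)}"@en .',
    ]
    for notation, label, parent in codes:
        out.append("")
        out.append(f"<{BASE}{scheme_id}/{notation}> a skos:Concept ;")
        out.append(f"    skos:inScheme {scheme} ;")
        out.append(f'    skos:notation "{escape_literal(notation)}" ;')
        if parent is None:
            # Codes without a parent are the top level of the hierarchy.
            out.append(f"    skos:topConceptOf {scheme} ;")
        else:
            out.append(f"    skos:broader <{BASE}{scheme_id}/{parent}> ;")
        out.append(f'    skos:prefLabel "{escape_literal(label)}"@en .')
    return "\n".join(out) + "\n"


# Made-up two-level classification used only for illustration.
codes = [
    ("A", "Example section A", None),
    ("A01", "Example group A01", "A"),
]
turtle = concept_scheme_to_turtle("demo-classification", "Demo classification", codes)
print(turtle)
```

The hierarchy is expressed with `skos:broader`. A statistical classification with explicit levels could additionally be described with XKOS, which this sketch does not attempt.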