Source: https://www.w3.org/Addressing/URL/uri-spec.html
Initial paper from Berners-Lee: https://www-sop.inria.fr/acacia/cours/essi2006/Scientific%20American_%20Feature%20Article_%20The%20Semantic%20Web_%20May%202001.pdf
The Semantic Web, sometimes known as Web 3.0 (not to be confused with Web3), is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.
To enable the encoding of semantics with the data, technologies such as Resource Description Framework (RDF) and Web Ontology Language (OWL) are used. These technologies are used to formally represent metadata.
A Gartner study stated that graph analytics are among the most underleveraged capabilities because organizations don’t understand the contrast and complementary nature of graph insights relative to relational analysis.
For humans and machines, knowledge graphs describe and contextualise micro and macro business complexity together, using domain models, enterprise architecture and digital twins.
Use an ontology to consistently and semantically document and manage
Use an ontology to consistently and semantically document and manage identifiers for shared entities in order to reuse them seamlessly across humans and machines
Use an ontology (e.g. DCAT) to consistently and semantically make lists of assets (including data sets)
Use an ontology to consistently and semantically document data architecture at conceptual, logical and physical levels.
Use an ontology to consistently and semantically document
Identification, extraction, mapping and disambiguation of PII.
Under GDPR, any person (applicant, employee, supplier...) can ask a company to identify their personal information in all systems and request that it be deleted.
After data discovery and extraction, storing and mapping the data in a KG supports the disambiguation of retrieved entities.
Many traditional technologies aren’t designed to connect the dots, so detecting money laundering schemes requires a tremendous amount of laborious effort. Teams of inspectors are burdened with manually going through gobs of data.
Using a graph to depict web pages and their keywords using web scraping and NLP.
KPIs differ between the B2B and B2C contexts even though the meaning of customer lifetime value (CLV) is identical.
CLV for B2B customers is frequently related to leasing an entire fleet and depends on macro-economic drivers and corporate finances, while B2C customers privately lease individual cars that fit their current stage of life. Specific customers are established as instances of the B2B or B2C class. Consequently, such context is highly relevant for deriving appropriate business rules to maximize the CLV of the customer base.
Making effective real-time recommendations depends on a data platform that understands the relationships between entities. Only graph technology efficiently tracks these relationships across user purchases, interactions and reviews.
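As a rough sketch of why relationships matter for recommendations, the snippet below walks user-to-item purchase links and scores items bought by users who share at least one purchase with the target user. The users, items and the `recommend` helper are hypothetical and only illustrate the idea; a real system would run this kind of traversal inside a graph database.

```python
from collections import Counter

# Hypothetical purchase edges: user -> set of items bought.
purchases = {
    "alice": {"tent", "stove"},
    "bob": {"tent", "lantern"},
    "carol": {"tent", "stove", "lantern", "map"},
}

def recommend(user):
    """Rank items bought by users who share at least one purchase with `user`."""
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        # Follow user -> item -> other user -> item paths.
        if other != user and owned & items:
            scores.update(items - owned)
    return [item for item, _ in scores.most_common()]

print(recommend("alice"))  # ['lantern', 'map']
```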
Use an ontology to consistently and semantically document business building blocks and how they interact together.
Definition: Collection of terms used for a specific purpose or belonging to a specific business domain. Terms can be organised as lists or taxonomies
Flat list of terms (graph nodes) with no specific structure.
A hierarchy of terms (graph nodes) with parent-child relationships (graph edges) between upper-level terms and the indented (lower-level) terms beneath them.
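A minimal sketch of such a hierarchy, assuming a simple child-to-parent mapping. The terms (Vehicle, Car, Truck, SUV) and the `ancestors` helper are illustrative assumptions, not taken from any real vocabulary.

```python
# Hypothetical taxonomy: each key is a child term, each value its parent term.
PARENT = {
    "Car": "Vehicle",
    "Truck": "Vehicle",
    "SUV": "Car",
}

def ancestors(term: str) -> list[str]:
    """Return the chain of broader terms, nearest parent first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain

print(ancestors("SUV"))  # ['Car', 'Vehicle']
```

A flat list of terms would simply be the set of keys and values with no parent links at all.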
Many companies currently face challenges caused by demographic changes: In the upcoming years employees will retire more frequently than graduates will join, an effect frequently described as ‘war for talent’. As a qualitative consequence, experience and knowledge will be lost. In such a situation, capturing and formalizing expert knowledge about a certain domain is highly beneficial to retain knowledge in a company while replacing retired employees. The use of ontologies and knowledge graphs therefore has significant advantages in their capability to store and retrieve domain knowledge in an understandable format.
Graphs are designed to express relatedness. Graph technology uncovers patterns that are difficult to detect using traditional representations such as tables.
A graph database can store complex, densely connected access control structures spanning billions of parties and resources. Its richly and variably structured data model supports both hierarchical and non-hierarchical structures, while its extensible property model allows for capturing rich metadata regarding every element in the system.
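To illustrate how an access check becomes a graph traversal, here is a small sketch in which parties, groups and resources are nodes and each edge means "is a member of" or "may access". All names and the `can_access` helper are hypothetical; a graph database would express the same reachability question as a path query.

```python
from collections import deque

# Hypothetical access graph: node -> nodes it is a member of or may access.
GRANTS = {
    "alice": ["engineering"],
    "engineering": ["staff", "source-repo"],
    "staff": ["wiki"],
    "bob": ["staff"],
}

def can_access(party, resource):
    """Breadth-first search: is `resource` reachable from `party`?"""
    seen, queue = {party}, deque([party])
    while queue:
        current = queue.popleft()
        if current == resource:
            return True
        for nxt in GRANTS.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(can_access("alice", "source-repo"))  # True
print(can_access("bob", "source-repo"))    # False
```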
Most enterprise manufacturers use vendor applications: CRM systems, work management systems, accounts payable, accounts receivable, point of sales systems, and so on. Because master data is spread across these applications, it is best stored and modelled as a graph: a native graph store holds interconnected master data that is neither purely linear nor hierarchical.
By nature, supply chain management is dynamic, with many moving parts, and bottlenecks may occur at any given point. The challenge is that traditional databases lack the capability to process this volume and detail of data accurately in real time.
Definition: A symbolic representation of information using visualization techniques.
A diagram derived from a knowledge graph is a subset of nodes and edges organised topologically in a predefined manner.
A complete URI consists of
The first element is the name of the scheme, separated from the rest of the object by a colon.
e.g. "https" in https://www.w3.org/Addressing/URL/uri-spec.html
What follows the colon is in a format that depends on the scheme. The path is interpreted in a manner dependent on the protocol being used. However, when it contains slashes, these must imply a hierarchical structure.
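The decomposition above can be sketched with Python's standard `urllib.parse` module, which splits a URI into its scheme and the scheme-dependent remainder. The variable names are illustrative only.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://www.w3.org/Addressing/URL/uri-spec.html")

# The scheme is everything before the first colon.
scheme = parts.scheme
# Slashes in the path imply a hierarchy, so the path splits into nested segments.
path_segments = [segment for segment in parts.path.split("/") if segment]

print(scheme)         # https
print(parts.netloc)   # www.w3.org
print(path_segments)  # ['Addressing', 'URL', 'uri-spec.html']
```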
More specific than URIs and GUPRIs
Global Unique Persistent Resolvable Identifier
Need syntactic and semantic governance of GUPRIs
Human understandable or not?
Contains # or /? https://www.w3.org/wiki/HashVsSlash
Non-human understandable URI: https://www.rcsb.org/structure/2C0K
Human understandable URI: https://dbpedia.org/page/Basel
Questions you need to ask
The domain knowledge connected to the data (many-to-many) must be stored in an abstract, machine readable, but humanly usable representation -> KG/ontologies
Definitions:
Fast processing of huge numbers of records when the structure/schema of the data is known ahead of time
Use an ontology to consistently and semantically document and manage
Use an ontology for any "meta" information you need to report on
The Semantic Web Stack illustrates the architecture of the Semantic Web.
The Semantic Web is a collaborative movement led by international standards body the World Wide Web Consortium (W3C). The standard promotes common data formats on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web, dominated by unstructured and semi-structured documents into a "web of data". The Semantic Web stack builds on the W3C's Resource Description Framework (RDF).
Year | Milestone | Country |
---|---|---|
1730 | | Switzerland and Sweden |
1870 | Library classification (Dewey) | USA |
1900 | Semantics, ontology and logic (Husserl) | Austria |
1960-1970 | Network databases (CODASYL) and semantic networks (Quillian et al.) | |
1970-1990 | Predicate logic as the foundation of knowledge representation (Hayes et al.) | |
1972 | Term coined by the linguist Edgar W. Schneider | Austria |
Late 1980s | Project called Knowledge Graphs, focusing on the design of semantic networks | |
1985 | | |
1997 onward | The Semantic Web, RDF, OWL, etc. (Lassila et al.) | Finland |
2005 | | |
2007 | | |
2012 | Google introduced their Knowledge Graph, incorporating content extracted from indexed web pages. Entity and relationship types associated with this knowledge graph have been further organized using terms from the schema.org vocabulary. | |
2019 | IEEE combined its annual international conferences on "Big Knowledge" and "Data Mining and Intelligent Computing" into the International Conference on Knowledge Graph. | |
Nowadays | Modern knowledge graphs and graph databases | |
Systems thinking is a way of making sense of the complexity of the world by looking at it in terms of wholes and relationships rather than by splitting it down into its parts.
It has been used as a way of exploring and developing effective action in complex contexts, enabling systems change. Systems thinking draws on and contributes to systems theory and the system sciences.
Populating the KG with discrete facts in the format of simple sentences:
Example of text
The forest is composed of tall trees. Green leaves growing on tree branches form a thick foliage that spans from the village to the hill like an impenetrable green wall.
Equivalent graph
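A minimal sketch of how the example text might be decomposed into simple subject-predicate-object sentences. This is one possible reading; the predicate names and the level of granularity are assumptions chosen for illustration.

```python
# Hypothetical decomposition of the example text into discrete facts.
facts = [
    ("forest", "is composed of", "trees"),
    ("trees", "are", "tall"),
    ("leaves", "are", "green"),
    ("leaves", "grow on", "branches"),
    ("branches", "are part of", "trees"),
    ("leaves", "form", "foliage"),
    ("foliage", "is", "thick"),
    ("foliage", "spans from", "village"),
    ("foliage", "spans to", "hill"),
]

# Graph nodes are every term that appears as a subject or an object;
# each fact is one labelled edge between two nodes.
nodes = {s for s, _, _ in facts} | {o for _, _, o in facts}

for s, p, o in facts:
    print(f"{s} {p} {o}.")
```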
For humans
For machines
Human programming using mathematical functions and variables
Machine learning (still based on algorithms created by humans)
is hardly
requires extra effort to generate knowledge
Knowledge: context about what questions or problems the information is contributing to answer or solve.
Information: context around data, usually answering the five Ws and one H: who, what, when, where, why and how. Additional information about FAIR, data productisation, governance... can be added as needed.
Data: only common data to be shared between different stakeholders/systems such as master or reference data. Usually no transactional data.
Several definitions. Focus on discrete mathematics.
A graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related".
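In the discrete-mathematics sense, this amounts to a set of vertices plus a set of vertex pairs. The sketch below uses made-up vertex names and a hypothetical `neighbours` helper to show the idea.

```python
# Objects (vertices) and the pairs that are "related" (edges); names are illustrative.
vertices = {"A", "B", "C", "D"}
edges = {("A", "B"), ("B", "C")}

def neighbours(v):
    """Vertices related to v, treating each pair as undirected."""
    return {b for a, b in edges if a == v} | {a for a, b in edges if b == v}

print(sorted(neighbours("B")))  # ['A', 'C']
print(neighbours("D"))          # set()
```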
Knowledge is a form of awareness and ability to adapt (to environment).
Can be segmented in
A knowledge graph is a knowledge base that uses a graph-structured model to identify objects (or actions) and link them together.
A database is an organised collection of data in a computer memory managed by a database management system (DBMS), the software enabling the management of the database content.
https://en.wikipedia.org/wiki/Database
A knowledge base is an organised collection of knowledge in a computer memory. A knowledge base is richer than a database: it contains more information.
https://en.wikipedia.org/wiki/Knowledge_base
Introduction to Knowledge Graphs
Cedric Berger
June 8th, 2024
contact@biomedima.org
Knowledge graphs are universal descriptors of systems of interacting elements.
Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the semantics underlying the used terminology.
Best for data integration
RDF stands for Resource Description Framework, a World Wide Web Consortium (W3C) standard created originally to model metadata.
Triple stores store and express information in a sentence-like structure of three parts: subject-predicate-object, denoted by two nodes connected by a single edge. For example: Josh likes bread.
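As a hedged illustration, the "Josh likes bread" sentence can be written as a triple of URIs and serialised as one N-Triples line. The `http://example.org/` namespace and the `to_ntriples` helper are assumptions made for this sketch; a real project would typically rely on an RDF library.

```python
EX = "http://example.org/"  # hypothetical namespace for this example

# Subject, predicate and object are each identified by a URI.
triple = (EX + "Josh", EX + "likes", EX + "bread")

def to_ntriples(t):
    """Serialise a (subject, predicate, object) tuple of URIs as one N-Triples line."""
    s, p, o = t
    return f"<{s}> <{p}> <{o}> ."

print(to_ntriples(triple))
# <http://example.org/Josh> <http://example.org/likes> <http://example.org/bread> .
```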
Best for data analytics
Property Graphs, also called Labeled Property Graphs (LPG), are a variant of graph databases where entities and their relationships have associated attributes.
Attributes can be any property that gives details of a data entity or a relationship. Property graphs get their name from their ability to attach properties, expressed as key-value pairs, to nodes and edges. For example: Mark wrote a book. In this case Mark is a data entity represented as a node in the graph. Its associated key-value pair could be Person: Author.
Property graphs are focused on offering faster querying and extensive storage.
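A minimal sketch of the "Mark wrote a book" example as a labeled property graph, assuming simple `Node` and `Edge` classes. The class names, labels and property values are hypothetical and do not correspond to any particular graph database's API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)  # key-value pairs on the node

@dataclass
class Edge:
    source: Node
    type: str
    target: Node
    properties: dict = field(default_factory=dict)  # key-value pairs on the edge

# Illustrative data only.
mark = Node("Person", {"name": "Mark", "role": "Author"})
book = Node("Book", {"title": "Untitled"})  # placeholder title
wrote = Edge(mark, "WROTE", book, {"contribution": "sole author"})

print(wrote.source.properties["name"], wrote.type, wrote.target.label)  # Mark WROTE Book
```

Unlike a bare triple, both the nodes and the relationship carry their own attributes.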
Differences | RDF Triple Stores | Labeled Property Graphs |
---|---|---|
Representation | Entities and relationships are represented in a subject-predicate-object structure. | Entities and relationships have associated attributes represented as key-value pairs. |
Querying Languages | Standard querying language SPARQL. | Every property graph implementation generally has its own language. |
Internal Structure | Entities and relationships in RDF do not have any internal structures and are only recognized by their URIs. | Entities and relationships in property graphs have an internal structure that goes beyond labels and includes properties as a part of their identity. |
Focus | RDF triple stores are focused on offering standardization and interoperability. | Property graphs are focused on data entities to enhance storage and speed up querying. |
Use Cases | Useful for any use case requiring knowledge graphs with slow-changing datasets. Perfect for scenarios requiring reasoning or inference or where information from other data stores is required. E.g., testing or evaluation. | Useful for any use case requiring large knowledge graphs with dynamic datasets that need deep traversal from time to time, e.g., social graphs. |
Wikidata
DBpedia
Linked Open Data
CELLAR
Open targets
Pharos
Hetionet
other life science knowledge graphs
AirBnB
Microsoft
Walmart
Uber
eBay
Humans think associatively but normal search applications index information hierarchically or rank results based on term frequency.