Knowledge Graphs

The Semantic Web, sometimes known as Web 3.0 (not to be confused with Web3), is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.

To enable the encoding of semantics with the data, technologies such as Resource Description Framework (RDF) and Web Ontology Language (OWL) are used. These technologies are used to formally represent metadata.

Wikipedia

Semantic web

Advantages of Property Graphs

  • Simplicity: Property graphs are simple and quick to set up and use. Knowledge graphs based on property graphs can be an excellent start for new users.
  • Easy Navigation: Property graphs are easy to traverse, without the constraints of a standard querying language.
  • Detailed: Properties associated with relationships in property graphs offer more detail about the data entities and their relationships without having to create extra nodes for every detail. The interpretation of the information is left up to the user.

Disadvantages of Property Graphs

  • Lack of global IDs: No URIs as in the Semantic Web. Unique identifiers are local to the property graph and have no meaning to any other database.
  • Lack of Interoperability: The lack of standardization (no shared schema language such as RDF, RDFS or OWL) makes it difficult to share or exchange data with different data stores and prevents graph merging.
  • No self-describing data: The concept of embedded or referenced ontology does not exist in labeled property graphs.
  • Vendor Lock-in: Organizations using a knowledge graph based on property graphs cannot integrate their information across multiple tools or systems. The possibility of being locked into a single property graph vendor is very high.
Labeled Property Graphs PROs and CONs

Advantages of RDF Graphs

  • Standardization: All RDF-based knowledge graphs use the same standard framework and formal semantics for storing and representing data along with a standard querying language. Data sharing between RDF data stores on the web is simplified thanks to RDF’s web-native syntax.
  • Interoperability: RDF Triple Stores follow a W3C-supported standard that allows interoperability among knowledge graphs. This interoperability allows RDF-based graphs to integrate and exchange information with each other.
  • Extensibility: RDF Graphs allow users to add new nodes and relationships, or even substructures, without requiring rebuilding the database.

Disadvantages of RDF Graphs

  • Deep Search Complexity: Performing a deep search in large RDF graphs is a complex task as it requires traversing through every relationship.
  • Strict Adherence to Standards: All information stored in RDF must be in the form of triples, meaning only two objects can be linked at a time, which can be limiting for many use cases.
Triple Stores PROs and CONs
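
The interoperability argument above can be illustrated with a minimal sketch in plain Python. It assumes two small graphs stored as sets of (subject, predicate, object) triples; the `https://example.org/vocab/` namespace and both facts are hypothetical and only serve the illustration. Because both graphs use the same global URI for Basel, merging is a simple set union.

```python
# Hypothetical vocabulary namespace, used for illustration only.
EX = "https://example.org/vocab/"
# A global identifier shared by both graphs.
BASEL = "https://dbpedia.org/page/Basel"

graph_a = {
    (BASEL, EX + "locatedIn", EX + "Switzerland"),
}
graph_b = {
    (BASEL, EX + "locatedIn", EX + "Switzerland"),  # same fact, same identifiers
    (BASEL, EX + "hasRiver", EX + "Rhine"),         # fact known only to graph B
}

# Merging is a set union: identical triples collapse, new ones are added.
merged = graph_a | graph_b

# All predicates now known about Basel, regardless of the graph of origin.
predicates_about_basel = sorted(p for s, p, o in merged if s == BASEL)
print(len(merged))
print(predicates_about_basel)
```

With identifiers that are only local to each store, the same union would silently produce two unrelated "Basel" nodes, which is the merging problem listed among the disadvantages of property graphs.
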
‎Intro to KGs.‎035.png
‎Intro to KGs.‎036.png
‎Intro to KGs.‎037.png
‎Intro to KGs.‎038.png
‎Intro to KGs.‎039.png
Gartner hype cycle 2022.png
Gartner hype cycle 2023.png
Gartner impact radar 2023.png

A Gartner study stated that graph analytics is among the most underleveraged capabilities because organizations don’t understand how graph insights contrast with and complement relational analysis.

Gartner impact radar 2024.png

For humans and machines, knowledge graphs describe and contextualise micro and macro business complexity together, using domain models, enterprise architecture and digital twins.

  • Alignment and engagement: by aligning understanding across the business of what each keystone concept means, and documenting it, you improve communication across teams, increase shared understanding of goals and strengthen stakeholders' sense of purpose at work.

  • Data Governance and Data integration: A knowledge graph helps you
    - contextualise your data assets and link them to data knowledge in order to adapt your governance
    - link existing data from different silos (organisational, systems, internal/external,...) to make your existing data more valuable and more useful.

  • Knowledge aggregation and artificial intelligence: Knowledge graphs accumulate institutional knowledge and expert experience as they grow. This opens the door to more reliable, ethical machine-learning for search and recommendations. KG can serve as curated and governed input to feed generative AI.

  • Data science and productisation: Knowledge graphs represent avatars of data assets and describe them in context, including provenance from the system-of-record, lineage through transformations and required product metadata, hence ensuring transparency and trust.
Enterprise architecture overview.png
FAIR cookbook lifescience data.png
Microsoft KG screenshot.png
Pasted image 20240605165155.png
Linkedin KG screenshot.png
Uber KG screenshot.png
Pasted image 20240605165453.png
eBay KG screenshot.png
Wallmart KG screenshot.png
Pharos.png
Airb&b KG screenshot.png

Use an ontology to consistently and semantically document and manage

  • lists of terms or values
  • taxonomies of terms or values
List name 1
Item 1.1
Item 1.2
Item 1.3
Item 1.4
Item 1.5
Taxonomy name 2
Item 2.1
Item 2.1.1
Item 2.2
Item 2.3
Item 2.3.1
Item 2.3.1.1
Item 2.4
Vocabulary management

Use an ontology to consistently and semantically document and manage identifiers for shared entities in order to reuse them seamlessly across humans and machines

Product
Identifier ABC
System A
Definition 1
Domain A
Identifier 123
System B
Definition 2
Domain B
Identifier XYZ
System C
Definition 1
Domain C
Master data management
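
A minimal sketch of the identifier mapping shown on this slide, assuming one shared URI for the Product entity (the `https://example.org/id/product` URI and the `LocalRecord`, `resolve` and `definitions` names are hypothetical). Each system keeps its local identifier, and the graph links all of them to the shared one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalRecord:
    """How one system identifies and defines the shared entity."""
    system: str
    identifier: str
    definition: str
    domain: str


# Hypothetical shared identifier for the "Product" entity.
PRODUCT_URI = "https://example.org/id/product"

# Values taken from the slide: three systems, three local identifiers.
records = {
    PRODUCT_URI: [
        LocalRecord("System A", "ABC", "Definition 1", "Domain A"),
        LocalRecord("System B", "123", "Definition 2", "Domain B"),
        LocalRecord("System C", "XYZ", "Definition 1", "Domain C"),
    ]
}


def resolve(system: str, identifier: str) -> Optional[str]:
    """Return the shared URI for a system-local identifier, if known."""
    for uri, local_records in records.items():
        for rec in local_records:
            if rec.system == system and rec.identifier == identifier:
                return uri
    return None


def definitions(uri: str) -> set:
    """Return the distinct definitions in use for a shared entity."""
    return {rec.definition for rec in records.get(uri, [])}


print(resolve("System B", "123"))
print(sorted(definitions(PRODUCT_URI)))
```

Listing the distinct definitions per shared entity is one simple way to surface where domains disagree on what "Product" means.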

Use an ontology (e.g. DCAT) to consistently and semantically make lists of assets (including data sets)

  • Inventories: for internal use (not for customers) describing what we have in stock
  • Catalogs: for external publication (for customers) to understand all products and/or services available for their consumption
Material
Mat. #1
Feature 1
Feature 2
Mat. #2
Mat. #3
Component
Comp. #1
Comp. #2
Products
Product #1
Feature 1
Feature 2
Feature 2
Feature 3
Services
Service #1
Feature 6
Feature 7
Feature 8
Catalogs and inventories

Use an ontology to consistently and semantically document data architecture at conceptual, logical and physical levels.

Conceptual object 1
Logical object 1.1
Logical object 1.2
Logical object 1.3
Physical object 1.2.1
Physical object 1.2.2
Physical object 1.2.3
Logical object 1.1.1
Logical object 1.1.2
Physical object 1.3.1
Data model management

Use an ontology to consistently and semantically document

  • where data comes from (system-of-record or source-of-truth)
  • where it goes (target migration system, data product, platform)
  • how it is processed or transformed along its journey
ETL
stored
API
is published
is published
Source-of-truth
Data platform
Archiving system
Data analytics
Dashboard
Provenance & lineage
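
A minimal lineage sketch in plain Python. The node and edge labels come from the slide, but how they are wired together is an assumption made for illustration, as is the `upstream` helper. The idea is that once the data journey is stored as edges, "where does this dashboard's data come from?" becomes a graph traversal.

```python
from collections import deque

# (source, transformation, target). The wiring is an assumption:
# the slide only lists the labels, not which edge connects which nodes.
lineage = [
    ("Source-of-truth", "ETL", "Data platform"),
    ("Data platform", "stored", "Archiving system"),
    ("Data platform", "API", "Data analytics"),
    ("Data analytics", "is published", "Dashboard"),
]


def upstream(node):
    """Return every system the given node ultimately depends on, nearest first."""
    parents = {}
    for src, _, dst in lineage:
        parents.setdefault(dst, []).append(src)
    seen, queue = [], deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


print(upstream("Dashboard"))
```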

Identification, extraction, mapping and disambiguation of PII.

As per GDPR, any person (applicant, employee, supplier...) can ask a company to identify her/his personal information in all systems and request it to be deleted.

After data discovery and extraction, storing and mapping the data in a KG supports the disambiguation of retrieved entities.

Compliance with GDPR

Many traditional technologies aren’t designed to connect the dots, so detecting money laundering schemes requires a tremendous amount of laborious effort. Teams of inspectors are burdened with manually going through gobs of data.

Anti-money laundering

Using a graph to depict web pages and their keywords using web scraping and NLP

  • Automatically extract web page structure and content
  • Store it in a KG
  • Map similar entities and create information clusters
  • Contrast different content using specific relationships
Web site scraping and disambiguation
EU_CELLAR.png

KPIs differ between the contexts of B2B and B2C customers even though the meaning of CLV is identical.

CLV for B2B customers is frequently related to leasing an entire fleet and depends on macro-economic drivers and corporate finances, while B2C customers privately lease individual cars that fit their personal stage of life. Specific customers are established as instances of the B2B or B2C class. Consequently, such context is highly relevant to derive appropriate business rules to maximize the CLV of the customer base.

Using Customer Lifetime Value (CLV) as a steering mechanism in a car leasing company

Making effective real-time recommendations depends on a data platform that understands the relationships between entities. Graph technology efficiently tracks these relationships across user purchases, interactions and reviews.

Real-time recommendation engine

Use an ontology to consistently and semantically document business building blocks and how they interact together.

are consumed by
is manufactured at
is located in
applies national
is subject to
imposing
is sensitive to
Product
Customer
Factory
Country
Regulation
Product color
Business architecture

Definition: Collection of terms used for a specific purpose or belonging to a specific business domain. Terms can be organised as lists or taxonomies.

List

Flat list of terms (graph nodes) with no specific structure.

Taxonomy

A hierarchy of terms (graph nodes) with parent-child relationships (graph edges) between upper (parent) and lower, indented (child) terms.

Vocabularies
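
The difference between a list and a taxonomy can be sketched in a few lines of Python. The item names are placeholders and the `parent_of` and `ancestors` names are hypothetical: a list has nodes only, while a taxonomy adds child-to-parent edges that can be walked.

```python
# Flat list: terms (nodes) only, no edges between them.
flat_list = ["Item 1.1", "Item 1.2", "Item 1.3"]

# Taxonomy: each entry is a child -> parent edge.
parent_of = {
    "Item 2.1.1": "Item 2.1",
    "Item 2.3.1": "Item 2.3",
    "Item 2.3.1.1": "Item 2.3.1",
}


def ancestors(term):
    """Walk parent edges upward and return the chain of broader terms."""
    chain = []
    while term in parent_of:
        term = parent_of[term]
        chain.append(term)
    return chain


print(ancestors("Item 2.3.1.1"))
print(ancestors("Item 1.1"))  # list items have no parent, so the chain is empty
```
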
Pasted image 20240606162910.png
Pasted image 20240604182107.png

Many companies currently face challenges caused by demographic changes: In the upcoming years employees will retire more frequently than graduates will join, an effect frequently described as ‘war for talent’. As a qualitative consequence, experience and knowledge will be lost. In such a situation, capturing and formalizing expert knowledge about a certain domain is highly beneficial to retain knowledge in a company while replacing retired employees. The use of ontologies and knowledge graphs therefore has significant advantages in their capability to store and retrieve domain knowledge in an understandable format.

Knowledge retention and talent war

Graphs are designed to express relatedness. Graph technology uncovers patterns that are difficult to detect using traditional representations such as tables.

Fraud detection

A graph database can store complex, densely connected access control structures spanning billions of parties and resources. Its richly and variably structured data model supports both hierarchical and non-hierarchical structures, while its extensible property model allows for capturing rich metadata regarding every element in the system.

Identity and access management

Most enterprise manufacturers use vendor applications: CRM systems, work management systems, accounts payable, accounts receivable, point of sales systems, and so on. Because master data is spread across these applications, you need to store and model it as a graph: a native graph stores interconnected master data that’s neither purely linear nor hierarchical.

Bill of material

By nature, supply chain management is dynamic, with many moving parts, and bottlenecks may occur at any given point. The challenge is that traditional databases cannot process the volume and detail of the data generated accurately and in real time.

Supply chain management

Definition: A symbolic representation of information using visualization techniques.

A diagram from a knowledge graph is a subset of nodes and edges organised topologically in a predefined manner.

Diagram

A complete URI consists of

  • a naming scheme specifier
  • followed by a string whose format is a function of the naming scheme.

Scheme

The first element is the name of the scheme, separated from the rest of the object by a colon,
e.g. "https" in https://www.w3.org/Addressing/URL/uri-spec.html

Path

What follows the colon, in a format depending on the scheme. The path is interpreted in a manner dependent on the protocol being used. However, when it contains slashes, these must imply a hierarchical structure.

WWW URI specifications
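
As a quick illustration, the scheme and path of the example URI can be separated with Python's standard library. This is a sketch using `urllib.parse.urlsplit`, which splits more finely than the two-part description above: it also isolates the host (`netloc`) from the path.

```python
from urllib.parse import urlsplit

uri = "https://www.w3.org/Addressing/URL/uri-spec.html"
parts = urlsplit(uri)

scheme = parts.scheme   # the part before the first colon
host = parts.netloc     # the authority that resolves the identifier
path = parts.path       # hierarchical part, segments separated by slashes

# The slashes in the path imply a hierarchy, which can be read as segments.
path_segments = [segment for segment in path.split("/") if segment]

print(scheme)
print(host)
print(path_segments)
```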

More specific than URIs and GUPRIs

Global Unique Persistent Resolvable Identifier

  • Global: globally global or global for your business?
  • Unique: who and how to ensure uniqueness?
  • Persistent: even through disasters?
  • Resolvable: As per FAIR principle A1?
  • Identifier: for machines and for humans?

Need syntactic and semantic governance of GUPRIs

GUPRI
‎Intro to KGs.‎024.png

Human understandable or not?
Contains # or /? https://www.w3.org/wiki/HashVsSlash

Examples:

Non-human understandable URI: https://www.rcsb.org/structure/2C0K
Human understandable URI: https://dbpedia.org/page/Basel

If you go for human understandable GUPRI

Questions you need to ask

  • Define your domain
  • Define prefix and suffix
  • Do you need to add a timestamp?
  • Do you need versioning?
GUPRI (syntax) governance
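
One possible way to turn these questions into a rule is sketched below. Everything in it is an assumption made for illustration: the `https://example.org` domain, the `prefix/local-name/vN` pattern, and the `slug`, `mint_gupri` and `register` helpers. A real syntax would come out of your own governance decisions.

```python
import re

BASE = "https://example.org"  # hypothetical domain
_assigned = set()             # URIs that have already been handed out


def slug(text):
    """Lower-case a label and replace anything non-alphanumeric with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def mint_gupri(prefix, local_name, version=None):
    """Build a human-understandable URI from a prefix, a name and an optional version."""
    uri = f"{BASE}/{slug(prefix)}/{slug(local_name)}"
    if version is not None:
        uri += f"/v{version}"
    return uri


def register(uri):
    """Record a URI as assigned; refuse to assign the same URI twice."""
    if uri in _assigned:
        raise ValueError(f"URI already assigned: {uri}")
    _assigned.add(uri)
    return uri


print(mint_gupri("Product", "Customer Lifetime Value", version=2))
```

Keeping the minting rule in one function makes the syntax decisions (domain, prefix, versioning) explicit and testable, and the registry is a simple stand-in for a uniqueness check.
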
  1. Persistent over time: based on governance of syntax and semantics
  2. Unique: once assigned, a URI shall not be reused or deleted
  3. Unambiguous: not open to more than one interpretation
  4. Human and machine readable: humans can read and understand them, and web browsers or other IT applications can interpret them
  5. Vendor and system agnostic: defined, assigned and governed by an internal team. We keep the full freedom to operate.
  6. Indicative about context: rich syntax referring to business context, system-of-record (SoR) and standards. They indicate what it is and what it contains
  7. Semantic enablers: URIs can describe themselves when they become URLs and resolve to a web page describing the thing they uniquely represent
  8. Link and buffer to external world: the pace at which data/information/knowledge evolves internally differs from that of the external world. When mapped to external entities, enterprise-specific URIs make it possible to decouple the enterprise DIK ecosystem from the real world. We can decide when to sync.
Advantages of using URIs
‎Intro to KGs.‎022.png
‎Intro to KGs.‎017.png
Merging_graphs_1.png
Merging_graphs_2.png
Merging_graphs_3.png
‎Intro to KGs.‎020.png

While data is stored in databases, where is the domain knowledge?

The domain knowledge connected to the data (many-to-many) must be stored in an abstract, machine readable, but humanly usable representation -> KG/ontologies

‎Intro to KGs.‎018.png
‎Intro to KGs.‎025.png
‎Intro to KGs.‎019.png

Federated search across different (relevant?) data sources

Data integration

  • Model-driven: no hard-coded rules
  • Quick and costless to modify

360 views

  • Customer 360
  • Product 360
  • ...

Data-driven business

  • To improve the entire value chain using data and insights generated from data analyses
  • A high degree of data quality and a well-founded understanding of the context of data and their business semantics are key drivers

Usages of KGs

Good use cases for relational DBs

Definitions:

  1. A formal, explicit specification of a shared conceptualisation.

  2. A systematic representation of knowledge, usually applied to one specific domain of knowledge/business, that defines concepts (graph nodes) and relationships (graph edges) between them.

  3. A map displaying concepts (data, information, knowledge, people, facilities, constraints... everything that is important for your concerns) based on conventions (a legend for representing roads, rivers, villages, mountains...).

  4. It is a digital map that machines can understand so that humans can query it.
Ontology

Data governance cases

Miscellaneous cases

Intro to KGs.014.png
‎Intro to KGs.‎021.png
‎Intro to KGs.‎016.png
  • Storing, filtering and sorting key-value pairs
  • Storing, filtering and sorting time series

Knowledge graph technology stack

Intro to KGs.026.png
Screenshot DB-egines KGs.png

Fast processing of huge numbers of records when the structure/schema of the data is known ahead of time

‎Intro to KGs.‎023.png

Represent complex dependencies

schema.org screenshot.png

Ask questions, get facts from the graph

personal KG book screenshot.png
‎Intro to KGs.‎029.png

Find and reduce blind spots

Facebook KG screenshot.png
Pasted image 20240604181407.png
Pasted image 20240604181213.png
LOD.png
Connecting neurons.png
Wikidata screenshot.png
DBpedia screenshot.png
Neuron network.png
Intro to KGs.027.png
Intro to KGs.028.png
Screenshot github awesome KGs.png
Screenshot Medium DG comprehensive comparison.png

Use an ontology to consistently and semantically document and manage

  • Data domains
  • Data products
  • Responsibilities (ownership)
Transformation 1
Transformation 2
Transformation 3
Data product A
Domain X
Data source 1
Data source 2
Data source 3
is responsible for
is consumed by
Owner
Data product A
System
Data productisation
Pasted image 20240610151947.png

Use an ontology for any "meta" information you need to report on

  • FAIR
  • Quality
  • GDPR
  • ...
Findable
F1
F2
F3
F4
Accessible
A1
A1.1
A1.2
A2
Interoperable
I1
I2
I3
Reusable
R1
R1.1
R1.2
R1.3
Data quality
Accurate
Complete
Consistent
Enduring
Available
Legible
Contemporaneous
Original
Metadata management
Pasted image 20240604181046.png
‎Intro to KGs.‎030.png
‎Intro to KGs.‎031.png

The Semantic Web Stack illustrates the architecture of the Semantic Web.

The Semantic Web is a collaborative movement led by international standards body the World Wide Web Consortium (W3C). The standard promotes common data formats on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web, dominated by unstructured and semi-structured documents into a "web of data". The Semantic Web stack builds on the W3C's Resource Description Framework (RDF).

Wikipedia

Pasted image 20231206154650.png

Semantic web stack

1730

Graph theory Euler and Taxonomy of plants and animals Linnaeus

Switzerland and Sweden

1870

Library classification Dewey

USA

1900

Semantics, ontology and logic Husserl

Austria

1960-1970

Network databases (CODASYL) and semantic networks Quillian et al.

1970-1990

Predicate logic as the foundation of knowledge representation Hayes et al.

1972

Term "knowledge graph" coined by the linguist Edgar W. Schneider

Austria

Late 1980s

Project called Knowledge Graphs, focusing on the design of semantic networks

1985

1997 onward

The semantic web, RDF, OWL, etc Lassila et al.

Finland

2005

2007

2012

Google introduced their Knowledge Graph incorporating content extracted from indexed web pages. Entity and relationship types associated with this knowledge graph have been further organized using terms from the schema.org vocabulary.

2019

IEEE combined its annual international conferences on "Big Knowledge" and "Data Mining and Intelligent Computing" into the International Conference on Knowledge Graph.

Nowadays

Modern knowledge graphs and graph databases

History of knowledge graphs

Second-hand knowledge for humans and machines

  • enables knowledge to be captured in a multi-dimensional and infinitely extendable manner
  • open standard; never outdated; always reusable
  • that can be retrieved when needed

A simplified visual/graphical representation of a complex reality

  • that supports different levels of complexity, disclosed on demand

Standards for integration of various proprietary data formats

  • quick and flexible: schema-free
  • based on a description (model) and not hard-coded rules

Systems thinking is a way of making sense of the complexity of the world by looking at it in terms of wholes and relationships rather than by splitting it down into its parts.

It has been used as a way of exploring and developing effective action in complex contexts, enabling systems change. Systems thinking draws on and contributes to systems theory and the system sciences.

Wikipedia

System thinking.png

System thinking

What knowledge graphs are good (and not good) at

History and usage of knowledge graphs

Key concepts about knowledge graphs

Who benefits from KGs?

Pasted image 20240604180936.png
Quotes.002.png

Benefits of using knowledge graphs

Intro to KGs.012.png
Intro to KGs.013.png
Intro to KGs.014.png
Intro to KGs.010.png
Intro to KGs.011.png
Intro to KGs.009.png
Intro to KGs.001.png

Populating the KG with discrete facts in the format of simple sentences:

Example of text
The forest is composed of high trees. Green leaves growing on tree branches form a thick foliage that spans from the village to the hill like an impenetrable green wall.

Equivalent graph

composed_by
is
has
has
grows_on
Forest
Tree
High
Leaf
Branch
Populating a knowledge graph
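
The same example can be written as a minimal Python sketch in which each simple sentence becomes one (subject, predicate, object) fact. The node and edge labels are those of the graph above; which nodes each edge connects is an assumption inferred from the text.

```python
from collections import defaultdict

# One discrete fact per simple sentence.
facts = [
    ("Forest", "composed_by", "Tree"),
    ("Tree", "is", "High"),
    ("Tree", "has", "Branch"),
    ("Tree", "has", "Leaf"),
    ("Leaf", "grows_on", "Branch"),
]

# Index the facts by subject so the neighbourhood of a node is easy to read.
graph = defaultdict(list)
for subject, predicate, obj in facts:
    graph[subject].append((predicate, obj))

# Every distinct subject or object is a node of the graph.
nodes = {s for s, _, _ in facts} | {o for _, _, o in facts}

print(sorted(nodes))
print(graph["Tree"])
```
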
Learning styles.png

For humans

  • by directly experiencing reality via sensors
  • by indirectly understanding descriptions of reality via sensors (mostly visual in our society)

For machines
Human programming using mathematical functions and variables

Machine learning (still based on algorithms created by humans)

  • supervised:
    • using a human-labelled training set of data and a feedback loop for correction
    • goal is to predict a (binary) outcome from input data
    • e.g. cat or no cat on image
  • unsupervised:
    • using unlabelled data; no feedback mechanism
    • goal is to identify unknown patterns in data based on similarities or relationships humans don't see
    • e.g. clustering
The process of gaining new knowledge (learning)

is hardly

  • understandable
  • trustable
  • FAIR
    • findable
    • accessible
    • interoperable
    • reusable

requires extra effort to generate knowledge

  • how to know if the data is useful to solve/answer a problem/question
  • data scientists are often said to spend 80% of their time understanding and preparing the data
Data without context
Intro to KGs.008.png
Intro to KGs.007.png
Intro to KGs.006.png
Intro to KGs.003.png

Knowledge: context about what questions or problems the information is contributing to answer or solve.

Information: context around data, usually answering the 5 Ws and how: who, what, when, where, why and how. Additional information about FAIR, data productisation, governance... can be added based on the needs.

Data: only common data to be shared between different stakeholders/systems such as master or reference data. Usually no transactional data.

What does a knowledge graph contain?
Intro to KGs.004.png
Intro to KGs.005.png
DIKW-Digital twining-KGs.004.png

Several definitions. Focus on discrete mathematics.

A graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related".

What is a graph?

Knowledge is a form of awareness and ability to adapt (to environment).

Can be segmented in

  • Fact-based knowledge (or true knowledge) e.g. humans have two eyes
  • Opinion-based (or guesswork knowledge) by virtue of justification e.g. I have two eyes, therefore I'm human
  • Belief-based knowledge (propositional knowledge) e.g. I'm a good person

Pasted image 20231110171832.png

What is knowledge?

A knowledge graph is a knowledge base that uses a graph-structured model to identify objects (or actions) and link them together.

Pasted image 20231110172142.png

What is a knowledge graph?

A database is an organised collection of data in a computer memory managed by a database management system (DBMS), the software enabling the management of the database content.
https://en.wikipedia.org/wiki/Database

A knowledge base is an organised collection of knowledge in a computer memory. A knowledge base is richer than a database: it contains more information.
https://en.wikipedia.org/wiki/Knowledge_base

What is a knowledge base?

What is a knowledge graph?

Content

  1. What is a knowledge graph
  2. Benefits of using knowledge graphs
  3. History and usage of knowledge graphs
  4. Key concepts about knowledge graphs
  5. What knowledge graphs are good (and not good) at
  6. Knowledge graph technology stack
  7. Why adopt knowledge graphs now

Introduction to Knowledge Graphs


Cedric Berger
June 8th, 2024
contact@biomedima.org

‎Quotes.‎002.png

Knowledge graphs are universal descriptors of systems of interacting elements.

Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the semantics underlying the used terminology.

Knowledge graphs as universal descriptors of complex systems

Universal support for description

Example

Populating a Knowledge Graph

Context

Data, Information, Knowledge

Knowledge and graph definitions

Rich context, rich knowledge

A natural pattern

Example of KG usefulness

History of KGs

Triple store (semantic) specificities

Nodes and edges

GUPRIs

Data vs. model

Model components

Good use cases for KGs

System architecture

Knowledge graph solutions

Best for data integration

RDF stands for Resource Description Framework, a World Wide Web Consortium (W3C) standard created originally to model metadata.

Triple stores store and express information in a sentence-like structure of three parts: subject-predicate-object, denoted by two nodes connected by a single edge. For example, Josh likes bread.

likes
Josh
bread
RDF graphs or triple store
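
A minimal in-memory sketch of the triple idea, in plain Python rather than a real triple store. The `match` helper is hypothetical and only imitates the spirit of a query pattern in which unspecified positions act as wildcards. The second triple is an invented fact added so that the query has something to filter out.

```python
# Each fact is one (subject, predicate, object) triple.
triples = {
    ("Josh", "likes", "bread"),
    ("Josh", "knows", "Mark"),  # hypothetical extra fact
}


def match(subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None matches anything."""
    return [
        (s, p, o)
        for (s, p, o) in sorted(triples)
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# "What does Josh like?"
print(match(subject="Josh", predicate="likes"))
```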

Semantic vs. property graphs

How to feed a knowledge graph

Best for data analytics

Property Graphs, also called Labeled Property Graphs (LPG), are a variant of graph databases where entities and their relationships have associated attributes.

Attributes can be any property that gives details of a data entity or a relationship. Property graphs get their name from their capability to include properties associated with nodes and edges, denoted as key-value pairs. For example, Mark wrote a book. In this case Mark is a data entity represented as a node in the graph. Its associated key-value pair could be Person: Author.

Property graphs are focused on offering faster querying and extensive storage.

Labeled Property Graphs
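
A minimal sketch of the "Mark wrote a book" example as a property graph held in plain Python dictionaries. The node identifiers, the book title, the `year` property and the `written_by` helper are hypothetical; the `Person: Author` pair is kept as given in the text.

```python
# Nodes carry key-value properties.
nodes = {
    "n1": {"properties": {"name": "Mark", "Person": "Author"}},
    "n2": {"properties": {"title": "A book"}},  # hypothetical title
}

# Edges carry a type and, unlike a plain triple, their own properties.
edges = [
    {"source": "n1", "target": "n2", "type": "WROTE",
     "properties": {"year": 2020}},  # hypothetical edge property
]


def written_by(author_name):
    """Return the titles of all nodes reached by a WROTE edge from the author."""
    author_ids = {
        node_id
        for node_id, node in nodes.items()
        if node["properties"].get("name") == author_name
    }
    return [
        nodes[edge["target"]]["properties"]["title"]
        for edge in edges
        if edge["type"] == "WROTE" and edge["source"] in author_ids
    ]


print(written_by("Mark"))
```

Note that the detail about when the book was written sits on the edge itself, with no extra node needed.
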
‎Intro to KGs.‎033.png

Why adopt knowledge graphs now?

RDF standard

1. Don't miss it

2. It makes your business smarter

3. It boosts your freedom to operate

4. You get a competitive advantage

A KG enables you to do things that you currently don't do or do poorly. In particular, KGs support data-centric data governance, data science (including AI) and monitoring of business value delivery.

A KG can be as big or as small as you want it to be to add value. Using open standards, it maps and represents inter-dependencies from vision and mission to keystone concepts, and you can expand it into an encyclopedia if you need to.

Our digital world is becoming more complex every day and we understand less and less of it. Without a proper digital transformation, the daily-growing digital complexity of modern businesses overwhelms human cognition.

Most of your competitors are already using knowledge graphs at the heart of their businesses. Their graphs can be good, bad, flimsy, or robust, but they have them and use them.

‎Intro to KGs.‎032.png
‎Intro to KGs.‎034.png
Differences between RDF Triple Stores and Labeled Property Graphs

  • Representation
    - RDF Triple Stores: Entities and relationships are represented in a subject-predicate-object structure.
    - Labeled Property Graphs: Entities and relationships have associated attributes represented as key-value pairs.
  • Querying Languages
    - RDF Triple Stores: Standard querying language SPARQL.
    - Labeled Property Graphs: Every property graph implementation generally has its own language.
  • Internal Structure
    - RDF Triple Stores: Entities and relationships in RDF do not have any internal structures and are only recognized by their URIs.
    - Labeled Property Graphs: Entities and relationships in property graphs have an internal structure that goes beyond labels and includes properties as a part of their identity.
  • Focus
    - RDF Triple Stores: RDFs are focused on offering standardization and interoperability.
    - Labeled Property Graphs: Property graphs are focused on data entities to enhance storage and speed up querying.
  • Use Cases
    - RDF Triple Stores: Useful for any use case requiring knowledge graphs with slow-changing datasets. Perfect for scenarios requiring reasoning or inference or where information from other data stores is required. E.g., testing or evaluation.
    - Labeled Property Graphs: Useful for any use case requiring large knowledge graphs with dynamic datasets that need deep traversal from time to time, e.g., social graphs.
Triple Stores vs. Property Graphs

No entity exists in complete isolation.

Everything is a part of a system

“The purpose of life is to connect, create a network, a system of communicating entities exchanging information”

Wikidata

DBpedia

Public/open

Linked Open Data

CELLAR

Open targets

Pharos

Hetionet

other life science knowledge graphs

Individuals

Organizations

Private

Google

Facebook

Airbnb

Microsoft

Walmart

Uber

eBay

LinkedIn

Generation of new insights

  • unforeseen relationships between entities
  • inference
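
Inference can be sketched as a rule that derives a fact nobody stored explicitly. The example below reuses relationship labels from the business architecture slide. The way they are chained, and the rule itself, are assumptions made for illustration: if a product is manufactured at a factory located in a country that applies a national regulation, then the product is subject to that regulation.

```python
# Explicitly stored facts as (subject, predicate, object) triples.
facts = {
    ("Product", "is manufactured at", "Factory"),
    ("Factory", "is located in", "Country"),
    ("Country", "applies national", "Regulation"),
}


def infer_subject_to(known):
    """Derive 'is subject to' facts by chaining three stored relationships."""
    inferred = set()
    for product, p1, factory in known:
        if p1 != "is manufactured at":
            continue
        for place, p2, country in known:
            if p2 != "is located in" or place != factory:
                continue
            for state, p3, regulation in known:
                if p3 == "applies national" and state == country:
                    inferred.add((product, "is subject to", regulation))
    return inferred


new_facts = infer_subject_to(facts)
print(new_facts)
```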

More relevant search by association

Humans think associatively but normal search applications index information hierarchically or rank results based on term frequency.

from any source
creating LOD to supplement Wikipedia
creating LOD from Wikipedia
a support for
from vision
uses tech
context
Who can claim that she/he, as well as machines, understands her/his business like this?