Harnessing the Power of Knowledge Graphs: Enriching an LLM with Structured Data | by Steve Hedden | Jul, 2023

A step-by-step guide to creating a knowledge graph and exploring its potential to enhance an LLM

Steve Hedden
Towards Data Science

For accompanying code, see notebook here.

In recent years, large language models (LLMs) have become ubiquitous. Perhaps the most famous LLM is ChatGPT, which was released by OpenAI in November 2022. ChatGPT is able to generate ideas, give personalized recommendations, understand complicated topics, act as a writing assistant, or help you build a model to predict the Academy Awards. Meta has announced its own LLM called LLaMA, Google has LaMDA, and there is even an open-source alternative, BLOOM.

LLMs have excelled in natural language processing (NLP) tasks like the ones listed above because LLMs have historically focused on unstructured data: data that does not have a pre-defined structure and is usually text-heavy. I asked ChatGPT, “why have LLMs historically focused on unstructured data?” The reply was:

“LLMs have historically focused on unstructured data due to its abundance, availability, and the challenges it poses. Unstructured data provides a vast source for training language models, allowing them to learn patterns, context, and semantics. LLMs excel in handling the complexities and nuances of unstructured data, capturing the subtleties of language and leveraging the rich context, opinions, and creative expressions it contains. While efforts are underway to bridge the gap between LLMs and structured data, the historical emphasis on unstructured data stems from its unique characteristics and the valuable information it offers, allowing LLMs to provide more human-like responses.”

Structured data, by contrast, is usually quantitative and well-organized, typically into rows and columns. As ChatGPT notes, there is still a gap between LLMs and structured data.

Knowledge graphs (KGs), on the other hand, are excellent at querying structured data. A knowledge graph is a

“directed labeled graph in which domain specific meanings are associated with nodes and edges. A node could represent any real-world entity, for example, people, company, computer, etc. An edge label captures the
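
To make the definition above concrete, here is a minimal sketch of a directed labeled graph in plain Python. It is not taken from the accompanying notebook. The triple layout, the edge labels (`released`, `is_a`), and the helper names `build_graph`, `objects`, and `subjects` are all assumptions made for illustration, and the toy data only restates facts mentioned earlier in this article.

```python
from collections import defaultdict

# Hypothetical toy data: each triple is (source node, edge label, target node).
TRIPLES = [
    ("OpenAI", "released", "ChatGPT"),
    ("ChatGPT", "is_a", "LLM"),
    ("LLaMA", "is_a", "LLM"),
    ("BLOOM", "is_a", "LLM"),
]


def build_graph(triples):
    """Index directed, labeled edges by their source node."""
    graph = defaultdict(list)
    for source, label, target in triples:
        graph[source].append((label, target))
    return graph


def objects(graph, source, label):
    """Return the targets reached from `source` by edges carrying `label`."""
    return [t for (l, t) in graph.get(source, []) if l == label]


def subjects(triples, label, target):
    """Return the sources of every edge carrying `label` that points at `target`."""
    return [s for (s, l, t) in triples if l == label and t == target]


graph = build_graph(TRIPLES)
print(objects(graph, "OpenAI", "released"))   # follow an edge forwards
print(subjects(TRIPLES, "is_a", "LLM"))       # follow an edge backwards
```

Because every edge has a direction and a label, a question such as "which entities are LLMs?" becomes a simple lookup over the triples rather than a search through free text. Dedicated graph libraries and query languages offer the same idea with much richer tooling.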
