Table of Contents
Mapping Mythology with GenAI: Building a Mahabharata Chatbot Using GraphRAG


Book a call
Editor’s Note: This blog is adapted from the talk by Siddhant Agarwal, Developer Relations lead for APAC at Neo4j. In this session, he shared how he combined GenAI, graph theory, and ancient Indian epics to create a knowledge-rich chatbot powered by Neo4j and GraphRAG—bringing the Mahabharata to life through data, relationships, and intelligent querying.
From Graphs to Epics: Where It All Began
Hi, I am Siddhant Agarwal. I lead DevRel for APAC at Neo4j, a native graph database company that’s been at the forefront of graph technology for nearly two decades. Before Neo4j, I worked at IBM, Google, and a few startups. And for the past 10+ years, I’ve been building communities, solving developer problems, and exploring the overlap between data and human context.
Neo4j is built on graph theory—nodes and relationships forming a web of connected information. It’s how LinkedIn shows you second-degree connections or how fraud detection systems map suspicious behavior. But this time, I wanted to use it for something... different.
I wanted to answer a simple question: What happens when Generative AI meets one of the most complex stories ever told—the Mahabharata?
Why Mahabharata? Because It’s a Network
The Mahabharata is not just an epic. It’s a data goldmine. With over 1.8 million words, 200+ key characters, and countless relationships—father-son, siblings, rivals, mentors—it reads like a massive interconnected web of knowledge. Perfect for a graph-based representation.
But traditional storage systems struggled to model it intuitively. Tables and joins made the data hard to explore, and building a chatbot on top of that structure would have been chaotic. That’s where Neo4j came in.
I started by manually mapping out nodes (characters) and edges (relationships). Using Cypher, Neo4j’s query language, I created 191 nodes and over 500 connections, each with meaningful metadata—names, roles, relationships, and more. But this was just the foundation.
Phase Two: Bringing the Epic to Life
The goal was not to stop at a static dataset. I wanted people to interact with the Mahabharata. So I built an early chatbot interface on top of Neo4j. You could ask it questions like, “Who were Ashwatthama’s parents?” and it would reply with one-word answers—rudimentary, but functional.
That version worked for conferences. But when someone asked, “Can it answer contextual questions too?”, I realized it needed to do more. I had only scratched the surface of what was possible.
Scaling the Dataset: 18 Books, 5,400 Pages, 10 M+ Characters
To make the chatbot truly comprehensive, I had to ingest the entire Mahabharata corpus—18 books, each around 300 pages. That’s 5,400 pages, 10.8 million characters, and 2.7 million tokens if you’re counting LLM compute costs.
Handling this volume with standard GenAI approaches would be expensive and inefficient. That’s when I turned to Vector RAG.
Beyond Vector Search: Enter GraphRAG
Vector RAG helps by chunking documents into small sections, converting them into embeddings, and storing them in a vector database. At query time, the LLM fetches relevant chunks and combines them with the prompt. But even that has limitations—mainly around context, relationship traversal, and answer depth.
GraphRAG solves this.
Instead of storing isolated vector chunks, GraphRAG builds knowledge graphs by extracting entities, mapping relationships, and connecting context-rich content across the dataset. This allows for semantic traversal, entity resolution, and far more accurate responses.
Neo4j’s LLM Graph Builder Tool made this seamless. I connected it to my instance, uploaded the PDFs, and the tool did the rest—no code, no APIs, no stress. The result: over 51,000 nodes and 500,000 relationships, all extracted from raw text and visually explorable in Neo4j.
The Chatbot, Reimagined
With the knowledge graph in place, I connected it to a GenAI interface using GPT-4. Users could now ask nuanced questions like “Why did Bhima take a vow of celibacy?” or “How did that affect the throne of Hastinapura?”—and get clear, relevant answers rooted in the Mahabharata's structure and narrative.
Even better, the tool allowed me to test different LLMs, customize chunking parameters, handle orphan nodes, de-duplicate overlapping entities (e.g., “Bangalore” vs. “BLR”), and add post-processing layers like community detection.
Full Control, Full Flexibility
For those who prefer working with code, we have launched the GraphRAG Python package, which gives you full programmatic control. It supports various retrievers—vector, cipher, full-text—and lets you fine-tune everything from schema extraction to token limits.
Neo4j also remains schema-flexible. You can dynamically modify or generate schemas as your data evolves—ideal for anyone building domain-specific graph applications.
What’s Next: Multilingual and Mixed Reality
The next steps? I am working on multilingual support to make the chatbot accessible in Indian languages, because you can’t build a mythological AI system in English alone. Beyond that, I am exploring VR interfaces that let you converse with characters like Arjuna or Krishna in a fully immersive environment.
This journey—from ancient epics to GenAI, from static data to dynamic intelligence—has only just begun.
Dive deep into our research and insights. In our articles and blogs, we explore topics on design, how it relates to development, and impact of various trends to businesses.