May 6, 2025

Mapping Mythology with GenAI: Building a Mahabharata Chatbot Using GraphRAG

Dive into a Mahabharata chatbot built with GraphRAG & Neo4j! GenAI meets epic data to deliver smart, contextual answers rooted in India’s greatest story.

Author

Prince Kumar Thakur, Technical Content Writer

Editor’s Note: This blog is adapted from the talk by Siddhant Agarwal, Developer Relations lead for APAC at Neo4j. In this session, he shared how he combined GenAI, graph theory, and ancient Indian epics to create a knowledge-rich chatbot powered by Neo4j and GraphRAG—bringing the Mahabharata to life through data, relationships, and intelligent querying.

From Graphs to Epics: Where It All Began

Hi, I am Siddhant Agarwal. I lead DevRel for APAC at Neo4j, a native graph database company that’s been at the forefront of graph technology for nearly two decades. Before Neo4j, I worked at IBM, Google, and a few startups. And for the past 10+ years, I’ve been building communities, solving developer problems, and exploring the overlap between data and human context.

Neo4j is built on graph theory—nodes and relationships forming a web of connected information. It’s how LinkedIn shows you second-degree connections or how fraud detection systems map suspicious behavior. But this time, I wanted to use it for something... different.

I wanted to answer a simple question: What happens when Generative AI meets one of the most complex stories ever told—the Mahabharata?

Why Mahabharata? Because It’s a Network

The Mahabharata is not just an epic. It’s a data goldmine. With over 1.8 million words, 200+ key characters, and countless relationships—father-son, siblings, rivals, mentors—it reads like a massive interconnected web of knowledge. Perfect for a graph-based representation.

But traditional storage systems struggled to model it intuitively. Tables and joins made the data hard to explore, and building a chatbot on top of that structure would have been chaotic. That’s where Neo4j came in.

I started by manually mapping out nodes (characters) and edges (relationships). Using Cypher, Neo4j’s query language, I created 191 nodes and over 500 connections, each with meaningful metadata—names, roles, relationships, and more. But this was just the foundation.
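As a minimal sketch of what that manual mapping looked like in Cypher, the following creates two character nodes with metadata and links them with a typed relationship. The labels, property names, and relationship type here are illustrative assumptions, not the exact schema used in the project:

```cypher
// Illustrative: character nodes with metadata, linked by a typed relationship
CREATE (drona:Character {name: 'Drona', role: 'Teacher of the princes'})
CREATE (ashwatthama:Character {name: 'Ashwatthama', role: 'Warrior'})
CREATE (drona)-[:FATHER_OF]->(ashwatthama)
```

Repeating this pattern across the cast produces the 191 nodes and 500+ relationships described above.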

Phase Two: Bringing the Epic to Life

The goal was not to stop at a static dataset. I wanted people to interact with the Mahabharata. So I built an early chatbot interface on top of Neo4j. You could ask it questions like, “Who were Ashwatthama’s parents?” and it would reply with one-word answers—rudimentary, but functional.
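Under the same assumed schema, a question like “Who were Ashwatthama’s parents?” reduces to a single graph pattern match. This is a hypothetical sketch of the query behind such an answer, not the chatbot’s actual implementation:

```cypher
// Hypothetical lookup behind "Who were Ashwatthama's parents?"
MATCH (parent:Character)-[:FATHER_OF|MOTHER_OF]->(c:Character {name: 'Ashwatthama'})
RETURN parent.name
```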

That version worked for conferences. But when someone asked, “Can it answer contextual questions too?”, I realized it needed to do more. I had only scratched the surface of what was possible.

Scaling the Dataset: 18 Books, 5,400 Pages, 10M+ Characters

To make the chatbot truly comprehensive, I had to ingest the entire Mahabharata corpus—18 books, each around 300 pages. That’s 5,400 pages, 10.8 million characters, and 2.7 million tokens if you’re counting LLM compute costs.
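The sizing above follows from simple arithmetic. The characters-per-page and characters-per-token figures below are assumptions (roughly 2,000 characters per page and the common rule of thumb of about 4 characters per English token), chosen to match the numbers quoted in the talk:

```python
# Back-of-the-envelope corpus sizing for the Mahabharata ingest
books = 18
pages_per_book = 300
pages = books * pages_per_book                 # 5,400 pages

chars_per_page = 2_000                         # assumption: ~2,000 characters/page
total_chars = pages * chars_per_page           # 10.8 million characters

chars_per_token = 4                            # rule of thumb for English text
total_tokens = total_chars // chars_per_token  # ~2.7 million tokens

print(pages, total_chars, total_tokens)
```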

Handling this volume with standard GenAI approaches would be expensive and inefficient. That’s when I turned to Vector RAG.

Beyond Vector Search: Enter GraphRAG

Vector RAG helps by chunking documents into small sections, converting them into embeddings, and storing them in a vector database. At query time, the system retrieves the most similar chunks and passes them to the LLM along with the prompt. But even that has limitations—mainly around context, relationship traversal, and answer depth.
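The chunk-embed-retrieve loop can be sketched in a few lines of plain Python. This toy version uses bag-of-words vectors and cosine similarity as a stand-in for a real embedding model and vector database; the documents and chunk size are invented for illustration:

```python
import re
from collections import Counter
from math import sqrt

def chunk(text: str, size: int = 6) -> list[str]:
    """Split text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector.
    A real pipeline would call an embedding model here."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

docs = [
    "Ashwatthama was the son of Drona and fought for the Kauravas.",
    "Bhishma took a vow of lifelong celibacy to secure his father's marriage.",
]
chunks = [c for d in docs for c in chunk(d)]
print(retrieve("Who was the son of Drona?", chunks))
```

Notice the limitation: each chunk is scored in isolation, so the retriever knows nothing about how Drona and Ashwatthama relate across chunks—exactly the gap GraphRAG addresses.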

GraphRAG solves this.

Instead of storing isolated vector chunks, GraphRAG builds knowledge graphs by extracting entities, mapping relationships, and connecting context-rich content across the dataset. This allows for semantic traversal, entity resolution, and far more accurate responses.
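The traversal a knowledge graph enables has no equivalent in a pile of isolated chunks. As a hedged sketch under the same assumed schema, this query walks up to two relationships out from one character and reports how each connected entity is reached:

```cypher
// Entities within two hops of Bhishma, with the connecting relationship types
MATCH path = (b:Character {name: 'Bhishma'})-[*1..2]-(other:Character)
RETURN other.name, [r IN relationships(path) | type(r)] AS how
```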

Neo4j’s LLM Graph Builder Tool made this seamless. I connected it to my instance, uploaded the PDFs, and the tool did the rest—no code, no APIs, no stress. The result: over 51,000 nodes and 500,000 relationships, all extracted from raw text and visually explorable in Neo4j.

The Chatbot, Reimagined

With the knowledge graph in place, I connected it to a GenAI interface using GPT-4. Users could now ask nuanced questions like “Why did Bhishma take a vow of celibacy?” or “How did that affect the throne of Hastinapura?”—and get clear, relevant answers rooted in the Mahabharata's structure and narrative.

Even better, the tool allowed me to test different LLMs, customize chunking parameters, handle orphan nodes, de-duplicate overlapping entities (e.g., “Bangalore” vs. “BLR”), and add post-processing layers like community detection.
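To make the de-duplication idea concrete, here is a deliberately simplistic Python illustration that collapses alias entities onto one canonical node name via a lookup table. The real tool uses far smarter matching (embeddings and LLM judgment rather than a hand-written alias map); this only shows the shape of the problem:

```python
# Toy entity de-duplication: map known aliases to a canonical name
# before merging nodes. The alias table is an invented example.
ALIASES = {"blr": "Bangalore", "bengaluru": "Bangalore"}

def canonical(name: str) -> str:
    """Resolve an entity name to its canonical form, if one is known."""
    return ALIASES.get(name.lower(), name)

entities = ["Bangalore", "BLR", "Bengaluru", "Hastinapura"]
deduped = sorted({canonical(e) for e in entities})
print(deduped)  # four surface forms collapse to two entities
```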

Full Control, Full Flexibility

For those who prefer working with code, we have launched the GraphRAG Python package, which gives you full programmatic control. It supports various retrievers—vector, Cypher, full-text—and lets you fine-tune everything from schema extraction to token limits.
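A minimal sketch of wiring a vector retriever into a RAG loop with the package looks roughly like this, following the shape of the package's documented quickstart. The connection URI, credentials, index name, and model are placeholders, and running it requires a live Neo4j instance with a populated vector index plus an OpenAI API key:

```python
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.retrievers import VectorRetriever

# Placeholder connection details—substitute your own instance
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

# Retrieve chunks from an assumed vector index over the graph
retriever = VectorRetriever(driver, index_name="chunk_embeddings",
                            embedder=OpenAIEmbeddings())

rag = GraphRAG(retriever=retriever, llm=OpenAILLM(model_name="gpt-4o"))
response = rag.search(query_text="Why did Bhishma take a vow of celibacy?")
print(response.answer)
```

Swapping in a different retriever class changes the retrieval strategy without touching the rest of the loop.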

Neo4j also remains schema-flexible. You can dynamically modify or generate schemas as your data evolves—ideal for anyone building domain-specific graph applications.

What’s Next: Multilingual and Mixed Reality

The next steps? I am working on multilingual support to make the chatbot accessible in Indian languages, because you can’t build a mythological AI system in English alone. Beyond that, I am exploring VR interfaces that let you converse with characters like Arjuna or Krishna in a fully immersive environment.

This journey—from ancient epics to GenAI, from static data to dynamic intelligence—has only just begun.
