May 4, 2026
OpenClaw: Build Your Autonomous Assistant | Deepak Chawla
Discover how Deepak Chawla explains OpenClaw for building autonomous AI assistants through data preparation, knowledge bases, AI engines, and agent automation.
Editor's Note: This blog post is adapted from a talk delivered at thegeekconf mini 2026 by Deepak Chawla, Founder of HiDevs. With over twelve years of experience in AI, ML, and data science, Deepak explores the three-stage framework he built across 200+ proofs of concept for 50+ enterprise companies. His session breaks down how anyone can build GenAI applications by mastering data preparation, knowledge base creation, and agent-based automation.
The Three-Stage GenAI Application Framework
There are three stages to developing any generative AI application. I arrived at this workflow after building 200+ proofs of concept for 50+ enterprise companies. Once you understand it, it covers up to 80% of GenAI application use cases, and anyone who wants to work with GenAI or AI agents can follow it.
1. Knowledge Base
Working with data means dealing with different sources, formats, data volumes, and storage systems. For example, building a customer support AI agent involves multiple data sources such as FAQs, customer support tickets, and product documentation. All of this data goes into the data preparation and processing engine.
Use-Case-Based Data Cleaning
Consider the Amazon reviews dataset: do you remove emojis during cleaning or not? The answer depends on the use case, so you cannot strip emojis by default. If the use case requires numeric-level decisions, you remove emojis, because the decision rests on numbers rather than sentiment. If the use case depends on sentiment, emojis carry signal and are worth keeping. So do not apply the same cleaning process every time. This is the first step of data preparation.
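A minimal sketch of use-case-based cleaning, assuming two hypothetical use-case labels, "numeric" and "sentiment". The emoji regex is a rough approximation of the main Unicode emoji blocks, not a complete emoji detector.

```python
import re

# Rough emoji pattern covering the main Unicode emoji blocks. This is a
# simplification: it ignores sequences such as skin-tone modifiers and ZWJ joins.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]+")


def clean_review(text: str, use_case: str) -> str:
    """Clean one review according to the use case.

    The use-case names "numeric" and "sentiment" are hypothetical labels.
    """
    if use_case not in {"numeric", "sentiment"}:
        raise ValueError(f"unknown use case: {use_case!r}")
    text = " ".join(text.split())  # whitespace normalization applies to every use case
    if use_case == "numeric":
        # Decisions rest on numbers, so emojis are dropped as noise.
        text = " ".join(EMOJI_PATTERN.sub(" ", text).split())
    # For "sentiment", emojis are kept because they carry sentiment signal.
    return text


print(clean_review("Great  product 😍 5/5", "numeric"))    # Great product 5/5
print(clean_review("Great  product 😍 5/5", "sentiment"))  # Great product 😍 5/5
```

The same raw review yields two different cleaned versions, which is the point: the cleaning rule is a parameter of the use case, not a fixed step.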
Data Structure and Format Unification
There are many different structures and formats: JSON, XML, PDF, CSV, PPT, and more. Based on your use case, you define the right structure. In a customer support chatbot, the sources collected at this stage arrive as PDFs, documents, and Excel files, and you convert them into a single file format, because you cannot feed a mix of PDFs and CSVs into one pipeline. Which format you choose, whether JSON, Excel, or sometimes a Google document, depends on the use case. JSON works well here because the answers are short, 30 words at most, and JSON's key-value pairs hold each question and its answer neatly.
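As an illustration, here is a minimal sketch that unifies two sources into one JSON array of question/answer records. The column and field names (`question`, `answer`, `subject`, `resolution`) are assumptions, and parsing PDFs or Excel files would need extra libraries that are left out here.

```python
import csv
import io
import json


def faq_csv_to_records(csv_text: str) -> list[dict]:
    """FAQ export with 'question' and 'answer' columns (assumed header names)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"source": "faq", "question": row["question"].strip(), "answer": row["answer"].strip()}
        for row in reader
    ]


def tickets_to_records(tickets: list[dict]) -> list[dict]:
    """Resolved tickets with hypothetical 'subject' and 'resolution' fields."""
    return [
        {"source": "ticket", "question": t["subject"].strip(), "answer": t["resolution"].strip()}
        for t in tickets
    ]


def unify(*record_lists: list[dict]) -> str:
    """Merge every source into one JSON array: the single format the pipeline reads."""
    merged = [record for records in record_lists for record in records]
    return json.dumps(merged, ensure_ascii=False, indent=2)


faq_csv = "question,answer\nHow do I reset my password?,Use the Forgot password link on the login page.\n"
tickets = [{"subject": "Where is my refund?", "resolution": "Check refund status on the Orders page."}]
print(unify(faq_csv_to_records(faq_csv), tickets_to_records(tickets)))
```

Each converter knows one source's quirks, while everything downstream only ever sees the same key-value shape.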
Chunking and Splitting
Once the data is structured and clean, you perform chunking and splitting. LLMs have a context window limit — whether 10 million, 1 million, 100K, or 120K tokens. Think of packers and movers: you cannot ship your entire house in a single box. The same constraint applies to LLMs. In an enterprise solution with 100 TB of data and a chatbot serving 100 million users 24/7, you cannot load everything at once. That is why you split and chunk.
Chunking is not uniform. Moving boxes come in different sizes, and some are marked "handle with care." The same applies here: plain text, HTML, Markdown, and source code each have their own splitting functions. You cannot use a single function across all formats, just as you cannot cut every vegetable and fruit with the same knife.
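Libraries such as LangChain ship ready-made splitters for these cases; the standard-library sketch below only illustrates the idea of choosing a splitter per format. The chunk size, overlap, and the three supported formats are illustrative assumptions.

```python
import re


def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Plain text: fixed-size character windows that overlap slightly."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Windows start only before len(text) - overlap, so no window is pure overlap.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def split_markdown(md: str) -> list[str]:
    """Markdown: split before each heading so every section stays whole."""
    return [s.strip() for s in re.split(r"(?m)^(?=#{1,6} )", md) if s.strip()]


def split_python(source: str) -> list[str]:
    """Python source: split before each top-level def or class."""
    return [s.strip() for s in re.split(r"(?m)^(?=(?:def|class) )", source) if s.strip()]


SPLITTERS = {"text": split_text, "markdown": split_markdown, "python": split_python}


def chunk(content: str, fmt: str) -> list[str]:
    """Pick the splitter that matches the content's format."""
    if fmt not in SPLITTERS:
        raise ValueError(f"no splitter registered for {fmt!r}")
    return SPLITTERS[fmt](content)


print(chunk("# Setup\nInstall it.\n## Usage\nRun it.", "markdown"))
# ['# Setup\nInstall it.', '## Usage\nRun it.']
```

Splitting Markdown at headings and code at function boundaries keeps related content in the same chunk, which a blind fixed-size cut would not guarantee.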
Vector Databases and Embedding Models
Why do we need a vector database instead of MongoDB, PostgreSQL, or MySQL? Foundation models are built on transformers, which come down to linear algebra and mathematics: they only understand numbers. That is why we convert data into numerical representations and store them in a vector database. Relational databases organize data in tables and columns, and MongoDB stores documents made of key-value pairs; neither is built around similarity search the way a vector database is. With embedding models, we convert text into vectors and search by similarity. This completes Stage 1: converting all data sources into a knowledge base. If you crack this stage, 80% of your GenAI use cases are solved.
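To make similarity search concrete, here is a toy in-memory vector store. The `embed` function is a stand-in assumption: it builds a sparse term-frequency vector, whereas a real embedding model returns a dense vector that captures meaning. The ranking by cosine similarity is the part that carries over to a real vector database.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical minimal store: keeps (vector, chunk) pairs and ranks by cosine."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str]] = []

    def add(self, chunk: str) -> None:
        self._items.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 2) -> list[tuple[float, str]]:
        q = embed(query)
        scored = [(cosine(q, vec), chunk) for vec, chunk in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
for doc in ["How to reset your password", "Refund policy for orders", "Shipping times for international orders"]:
    store.add(doc)
print(store.search("reset password", k=1))
```

A production system would replace `embed` with a real embedding model and the list scan with a vector database's index, but the contract stays the same: vectors in, nearest chunks out.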
2. The AI Engine
With the knowledge base ready, you submit a query. Using cosine similarity, you retrieve the most relevant chunks. Those chunks go into a prompt template alongside the LLM, memory, and human feedback, all within a single AI engine, or what LangChain calls a chain. You get the output and pass it through an evaluation step, which loops back to the human. The LLM itself accounts for only about 20% of the GenAI application pipeline; just as in data science, 80% of your time goes into the data. The fundamentals stay the same, and new tools keep getting added on top.
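The engine described above can be sketched as a small class. This is not LangChain's API: `Chain`, the prompt wording, and the `llm` and `evaluate` callables are hypothetical, so any model client or human review step can be plugged in. Retrieved chunks are passed in directly to keep the example focused on the template, memory, and evaluation loop.

```python
from dataclasses import dataclass, field
from typing import Callable

PROMPT_TEMPLATE = (
    "You are a customer support assistant.\n"
    "Answer using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Conversation so far:\n{history}\n\n"
    "Question: {question}\nAnswer:"
)


@dataclass
class Chain:
    """Hypothetical AI engine: chunks + template + memory -> LLM -> evaluation."""

    llm: Callable[[str], str]             # any function that maps a prompt to text
    evaluate: Callable[[str, str], bool]  # human reviewer or automated check
    memory: list[str] = field(default_factory=list)

    def run(self, question: str, chunks: list[str]) -> tuple[str, bool]:
        prompt = PROMPT_TEMPLATE.format(
            context="\n".join(f"- {c}" for c in chunks),
            history="\n".join(self.memory) or "(none)",
            question=question,
        )
        answer = self.llm(prompt)
        approved = self.evaluate(question, answer)  # feedback loops back to the human
        # Store the turn so later prompts include it as memory.
        self.memory.append(f"User: {question}\nAssistant: {answer}")
        return answer, approved


def fake_llm(prompt: str) -> str:
    """Placeholder model that echoes the first context line."""
    return prompt.split("Context:\n")[1].split("\n")[0].lstrip("- ")


chain = Chain(llm=fake_llm, evaluate=lambda q, a: bool(a))
print(chain.run("How do I reset my password?", ["Use the Forgot password link on the login page."]))
```

Keeping the model behind a plain callable mirrors the point that the LLM is only one part of the pipeline; swapping it changes nothing else in the chain.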
A proper understanding of your use case is critical here. Chunk size is not a fixed number. When you pack household items, not every box is the same size — some things need extra care. Here too, you define chunk size based on what the data contains and how the model consumes it.
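One way to see how the model's consumption constrains chunk size is a simple token budget. The sketch below assumes hypothetical numbers: the context window must hold the prompt template, the reserved answer, and every retrieved chunk.

```python
def max_chunks(context_window: int, prompt_tokens: int, output_tokens: int, chunk_tokens: int) -> int:
    """How many retrieved chunks fit once the template and answer are reserved."""
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    available = context_window - prompt_tokens - output_tokens
    return max(available // chunk_tokens, 0)


# Hypothetical 8,000-token window, 500-token template, 1,000 tokens reserved for the answer.
print(max_chunks(8000, 500, 1000, 512))  # 12
print(max_chunks(8000, 500, 1000, 256))  # 25
```

Halving the chunk size roughly doubles how many chunks you can retrieve, but each chunk then carries less context, so the right size depends on what the data contains.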
3. OpenClaw — Why the Hype
OpenClaw gets a lot of attention because it provides a chat interface that removes the need for any coding. Through its gateway, you put in your requirements and it builds internal agents, agent skills, memory, a knowledge base, and an identity — all from a single chat. India has one of the largest WhatsApp user bases in the world, so a chat-based interface spreads fast. That is where the virality comes from.
If you look at the logs behind OpenClaw, coding is still happening. You cannot see it, but that does not mean it is not there, just as Paris exists even though no one in this room has seen it. Do not change your understanding based on YouTube thumbnails. YouTube now runs A/B tests on thumbnails and switches to whichever gets the most engagement, so be careful about what you take from them.
What OpenClaw Enables
Recommended Alternatives
There are three options to consider: OpenClaw, NanoClaw, and NemoClaw. OpenClaw on a MacBook consumes the entire machine, so do not use anything else on it while it runs. It works for learning purposes, but I do not recommend it for production. NanoClaw is a lighter version that achieves most of what OpenClaw does. NemoClaw, from Nvidia, includes proper security layers. A dedicated Mac Mini provides a well-isolated environment and is a good option for running any of these.
Real-World Implementation — HiDevs CRM Agent
At HiDevs, we have a sales team and a lead generation team, each working across three partnership models — A, B, and C. A person talks to leads across all three models. To avoid duplicate outreach, we built a Telegram bot. At the end of each day, every team member logs what they discussed with each lead into the bot. Before approaching any new lead, they check with the bot whether someone else is already talking to that person. No wasted resources, no wasted time.
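The duplicate-outreach check can be sketched independently of Telegram. This is a hypothetical core, not the HiDevs implementation: `LeadRegistry`, `LeadLog`, and the rule that the first person to log a lead owns it are all assumptions, and the bot wiring is omitted.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LeadLog:
    lead: str
    owner: str
    model: str  # partnership model "A", "B" or "C"
    note: str
    day: date


class LeadRegistry:
    """Hypothetical bot core: daily logs plus a 'who is talking to this lead?' check."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}
        self._logs: list[LeadLog] = []

    def log(self, entry: LeadLog) -> None:
        key = entry.lead.strip().lower()
        self._owners.setdefault(key, entry.owner)  # assumption: first logger owns the lead
        self._logs.append(entry)

    def check(self, lead: str, asker: str) -> Optional[str]:
        """Return the colleague already talking to this lead, or None if it is free."""
        owner = self._owners.get(lead.strip().lower())
        return owner if owner and owner != asker else None


registry = LeadRegistry()
registry.log(LeadLog("Acme Corp", "Asha", "A", "Intro call", date(2026, 5, 1)))
print(registry.check("acme corp", "Ravi"))  # Asha
```

Normalizing lead names before lookup keeps "Acme Corp" and "acme corp" from being treated as two different leads.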
All those chats and leads go into our CRM — a Google Sheet, one of the best CRM tools available. Every Friday or Saturday morning, an automated email goes out: how many leads each person spoke with, who is close to closing, who needs a follow-up, who is not interested. 100% automation. The only thing every team member has to do is talk to the Telegram bot.
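The weekly email body can be derived from the sheet rows with a few lines of aggregation. This sketch assumes rows arrive in chronological order with hypothetical `owner`, `lead`, and `status` keys; reading the Google Sheet and sending the email are out of scope.

```python
from collections import Counter, defaultdict


def weekly_summary(rows: list[dict]) -> str:
    """Summarize each person's leads by their most recent status.

    Assumed status values: 'closing', 'follow_up', 'not_interested'.
    """
    latest: dict[str, dict[str, str]] = defaultdict(dict)  # owner -> {lead: latest status}
    for row in rows:  # later rows overwrite earlier ones for the same lead
        latest[row["owner"]][row["lead"]] = row["status"]
    lines = []
    for owner in sorted(latest):
        counts = Counter(latest[owner].values())
        lines.append(
            f"{owner}: leads {len(latest[owner])}, "
            f"closing {counts['closing']}, "
            f"follow-up {counts['follow_up']}, "
            f"not interested {counts['not_interested']}"
        )
    return "\n".join(lines)


rows = [
    {"owner": "Asha", "lead": "Acme", "status": "follow_up"},
    {"owner": "Asha", "lead": "Acme", "status": "closing"},
    {"owner": "Asha", "lead": "Globex", "status": "not_interested"},
    {"owner": "Ravi", "lead": "Initech", "status": "follow_up"},
]
print(weekly_summary(rows))
```

Counting each lead once by its latest status avoids inflating the numbers when a lead is discussed several times in the same week.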
Prompt Compression and LLM as Judge