May 4, 2026

OpenClaw: Build Your Autonomous Assistant | Deepak Chawla

Discover how Deepak Chawla explains OpenClaw for building autonomous AI assistants through data preparation, knowledge bases, AI engines, and agent automation.

Author

Apoorva Pathak, Content Writer

Editor's Note: This blog post is adapted from a talk delivered at thegeekconf mini 2026 by Deepak Chawla, Founder of HiDevs. With over twelve years of experience in AI, ML, and data science, Deepak explores the three-stage framework he built across 200+ proofs of concept for 50+ enterprise companies. His session breaks down how anyone can build GenAI applications by mastering data preparation, knowledge base creation, and agent-based automation.

I am Deepak Chawla, founder of HiDevs. We are building the world's largest generative AI workforce by ensuring everyone can work in the AI world and build something for themselves or their company. I have worked in AI, ML, and data science since 2014. On August 23rd, 2024, I left my high-paying job and started my own company to make sure everyone can build something using AI.

The Three-Stage GenAI Application Framework

There are three steps, or stages, to developing any generative AI application. This workflow is part of the proposition I created after doing 200+ proofs of concept for 50+ enterprise companies. Once understood, it covers up to 80% of GenAI application solutions and use cases, and anyone who wants to work with GenAI or AI agents can follow it.

1. Knowledge Base

Working with data requires working with different sources, formats, and levels of data capacity and storage. For example, building a customer support AI agent involves multiple data sources such as FAQs, customer support tickets, and product documentation. This data goes into the data preparation and processing engine.

Use-Case-Based Data Cleaning

Consider the Amazon reviews dataset: do you remove emojis during cleaning or not? The answer depends on the use case, so you cannot strip emojis by default. If the use case requires sentiment, emojis carry signal and should stay; if it requires only numeric-level decisions, you remove them, because numbers do not carry sentiment. Do not make your cleaning process constant. Everything depends on the use case. This is the first stage.
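The idea above can be sketched in a few lines. This is a minimal illustration, not a production cleaner: the emoji ranges and the `clean_review` helper are my own, and the flag makes the use-case decision explicit.

```python
import re

# Common emoji blocks; adjust the ranges per dataset.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols and pictographs
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\u2600-\u27BF"          # misc symbols, dingbats
    "]+"
)

def clean_review(text: str, keep_emojis: bool) -> str:
    """Clean one review; the emoji decision comes from the use case."""
    if not keep_emojis:  # numeric-only decisions: emojis carry no value
        text = EMOJI_PATTERN.sub("", text)
    return text.strip()

# Sentiment analysis: keep emojis, they carry signal.
print(clean_review("Great product 😍👍", keep_emojis=True))   # Great product 😍👍
# Numeric scoring: drop them.
print(clean_review("Great product 😍👍", keep_emojis=False))  # Great product
```

The flag is the point: the same dataset gets two different cleaning paths depending on what the downstream model needs.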

Data Structure and Format Unification

There are many different kinds of structures — JSON, XML, PDF, CSV, PPT, and more. Based on your use case, you define the right structure. In the case of a customer support chatbot, Stage 1 hands you a mix of file formats — PDFs, documents, and Excel files — and you convert them into a single format, because you cannot feed a mix of PDFs and CSVs into one pipeline. Which single format you choose, whether JSON, Excel, or sometimes a Google document, depends on the use case. JSON works well here because the answers are short, 30 words at most, and JSON stores key-value pairs that accommodate that.
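A small sketch of that unification step, with hypothetical source formats and field names: FAQs arriving as CSV and as plain "Q:/A:" text are normalised into one list of JSON key-value records, the single format the rest of the pipeline expects.

```python
import csv
import io
import json

def from_csv(raw: str) -> list[dict]:
    """CSV source with 'question' and 'answer' columns."""
    return [{"question": r["question"], "answer": r["answer"]}
            for r in csv.DictReader(io.StringIO(raw))]

def from_qa_text(raw: str) -> list[dict]:
    """Plain-text source with one 'Q: ... A: ...' pair per line."""
    records = []
    for line in raw.splitlines():
        if "Q:" in line and "A:" in line:
            q, a = line.split("A:", 1)
            records.append({"question": q.replace("Q:", "").strip(),
                            "answer": a.strip()})
    return records

csv_src = "question,answer\nHow do I reset my password?,Use the reset link."
txt_src = "Q: What is the refund window? A: 30 days."

# Every source funnels into the same JSON structure.
knowledge = from_csv(csv_src) + from_qa_text(txt_src)
print(json.dumps(knowledge, indent=2))
```

One converter per source format, one output shape: that is the whole unification contract.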

Chunking and Splitting

Once the data is structured and clean, you perform chunking and splitting. LLMs have a context window limit — whether 10 million, 1 million, 100K, or 120K tokens. Think of packers and movers: you cannot ship your entire house in a single box. The same constraint applies to LLMs. In an enterprise solution with 100 TB of data and a chatbot serving 100 million users 24/7, you cannot load everything at once. That is why you split and chunk.

Chunking is not uniform. Moving boxes come in different sizes — some marked handle with care. The same applies here. For plain text, HTML, Markdown, and source code, there are different splitting functions. You cannot use a single function across all formats, just as you cannot cut every vegetable and fruit with the same knife.
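The "different knives for different formats" point can be made concrete. This is a toy sketch with illustrative sizes, not a library implementation: plain prose gets a sliding window with overlap, while Markdown splits on headings so each chunk stays a coherent section.

```python
def split_plain_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Sliding-window split for plain prose; overlap preserves context."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def split_markdown(text: str) -> list[str]:
    """Split Markdown at headings so a section never straddles chunks."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

# One splitter per format, chosen at run time.
SPLITTERS = {"text": split_plain_text, "markdown": split_markdown}

doc = "# Intro\nWelcome.\n# Refunds\n30-day window."
print(SPLITTERS["markdown"](doc))  # two chunks, one per section
```

Libraries such as LangChain ship a family of splitters along exactly these lines; the dispatch table above is the same idea reduced to its core.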

Vector Databases and Embedding Models

Why do we need a vector database instead of MongoDB, PostgreSQL, or MySQL? Foundation models are built on transformers — algebra and mathematics — and they only understand numbers. That is why we convert data into numerical representations and store them in a vector database. Traditional relational databases use tables and columns, and MongoDB stores documents. Neither supports similarity search the way a vector database does. With embedding models, we convert text into vectors and search by similarity. This completes Stage 1 — converting all data sources into a knowledge base. If you crack this stage, 80% of your GenAI use cases are solved.
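The mechanics fit in a few lines. In this toy sketch the "embeddings" are tiny hand-made vectors (a real system would get them from an embedding model) and the "vector database" is a plain list, so the cosine-similarity search itself stays visible.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Stand-in "vector database": chunks stored beside their embeddings.
store = [
    ("Refunds are processed within 30 days.", [0.9, 0.1, 0.0]),
    ("Use the reset link to change your password.", [0.1, 0.9, 0.1]),
]

def search(query_vec: list[float], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

print(search([0.8, 0.2, 0.0]))  # the refund chunk ranks first
```

A real vector database (Pinecone, Chroma, pgvector, and so on) does this same ranking with indexes built for millions of vectors; the operation it accelerates is the one above.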

2. The AI Engine

With the knowledge base ready, you put in a query. Using cosine similarity, you pull the relevant chunks. Those chunks go into a prompt template alongside the LLM, memory, and human feedback — all within a single AI engine, or what LangChain calls a chain. You get the output and pass it through an evaluation step, which loops back to the human. The LLM itself accounts for only about 20% of the GenAI application pipeline; just as in data science, 80% of your time goes into the data. The fundamentals stay the same — the tools keep adding on top.
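The chain described above can be sketched as follows. Everything here is illustrative: `call_llm` is a placeholder where a real API client (OpenAI, a local model, or a LangChain chain) would go, and the template is a minimal example of grounding the model in retrieved context.

```python
# Prompt template: retrieved chunks are stitched in before the model call.
PROMPT_TEMPLATE = """Answer using only the context below.

Context:
{context}

Question: {question}
Answer:"""

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; echoes the question line."""
    question_line = prompt.splitlines()[-2]
    return "(model output for: " + question_line + ")"

def run_chain(question: str, retrieved_chunks: list[str]) -> str:
    """Retrieval output -> prompt template -> LLM: one pass of the chain."""
    prompt = PROMPT_TEMPLATE.format(
        context="\n".join(retrieved_chunks), question=question)
    return call_llm(prompt)

answer = run_chain("What is the refund window?",
                   ["Refunds are processed within 30 days."])
print(answer)
```

Memory and the evaluation loop from the talk would wrap around `run_chain`: memory prepends prior turns to the context, and the evaluator scores `answer` before it reaches the user.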

A proper understanding of your use case is critical here. Chunk size is not a fixed number. When you pack household items, not every box is the same size — some things need extra care. Here too, you define chunk size based on what the data contains and how the model consumes it.

3. OpenClaw — Why the Hype

OpenClaw gets a lot of attention because it provides a chat interface that removes the need for any coding. Through its gateway, you put in your requirements and it builds internal agents, agent skills, memory, a knowledge base, and an identity — all from a single chat. India has one of the largest WhatsApp user bases in the world, so a chat-based interface spreads fast. That is where the virality comes from.

If you look at the logs behind OpenClaw, coding is still happening. You cannot see it, but that does not mean it is not there — just as the fact that no one in the room has seen Paris does not mean Paris does not exist. Do not change your understanding based on YouTube thumbnails. YouTube now A/B tests thumbnails and keeps whichever gets the most engagement, so be careful about what you take from them.

What OpenClaw Enables

OpenClaw connects to a wide range of workflows — email management, flight check-in, and more. There is a misconception that OpenClaw is dangerous. That is half true. If you give it access to your entire system, then yes, there is risk. The access you grant determines the risk level.

There are three options to consider: OpenClaw, NanoClaw, and NamoClaw. OpenClaw on a MacBook consumes the entire machine — do not use anything else on it while it runs. For learning purposes it works, but I do not recommend it for production. NanoClaw is a lighter version that achieves most of what OpenClaw does. NamoClaw, from Nvidia, includes proper security layers. A Mac Mini provides a super isolated environment and is a good option for running any of these.

Real-World Implementation — HiDevs CRM Agent

At HiDevs, we have a sales team and a lead generation team, each working across three partnership models — A, B, and C. A person talks to leads across all three models. To avoid duplicate outreach, we built a Telegram bot. At the end of each day, every team member logs what they discussed with each lead into the bot. Before approaching any new lead, they check with the bot whether someone else is already talking to that person. No wasted resources, no wasted time.

All those chats and leads go into our CRM — a Google Sheet, one of the best CRM tools available. Every Friday or Saturday morning, an automated email goes out: how many leads each person spoke with, who is close to closing, who needs a follow-up, who is not interested. 100% automation. The only thing every team member has to do is talk to the Telegram bot.

Prompt Compression and LLM as Judge

We use prompt compression and a specific approach to summarization. We do not say "give me four to five lines of context." We define a list of ten rules that the summary must follow, based on our requirements. An LLM acts as a judge, checking the summary against those rules. We run two judges — a district court and a high court — to make sure quality does not slip. That is the multi-level agent connection we have built.
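Here is a minimal sketch of that judging pattern. The rules and the `judge` function are hypothetical stand-ins; in our real setup each judge is itself an LLM call scoring the summary against the rule list, but the two-level gate works the same way.

```python
# Illustrative rules: each is (name, check). A real rule list would be
# the ten requirement-specific rules the summary must satisfy.
RULES = [
    ("max_words", lambda s: len(s.split()) <= 30),
    ("mentions_lead", lambda s: "lead" in s.lower()),
    ("no_placeholder", lambda s: "TODO" not in s),
]

def judge(summary: str) -> list[str]:
    """Return the names of the rules the summary violates."""
    return [name for name, check in RULES if not check(summary)]

def two_level_review(summary: str) -> bool:
    """District court first, then high court; both must pass."""
    if judge(summary):          # district court: first pass
        return False
    return not judge(summary)   # high court: second, stricter pass

print(two_level_review("Spoke with the lead about model B pricing."))  # True
print(two_level_review("TODO write summary"))                          # False
```

Keeping the rules as data rather than prose is what makes the judge auditable: a failed summary comes back with the exact rule names it violated.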
