Query Anything, Anywhere: Meet Presto
Discover how Presto enables real-time, federated querying across all your data sources—used by Facebook, Uber & Airbnb. Fast, scalable, and fully open-source.
Editor’s Note: This blog is adapted from Saurabh Mahawar's talk on leveraging Presto, an open-source SQL query engine designed for petabyte-scale analytics across distributed data sources. In this session, he explained how Presto powers real-time querying without moving or duplicating data, along with its architecture, use cases, and open-source ecosystem.
From Bookshelves to Big Data
I am Saurabh, a Developer Relations Engineer at IBM, working on Data Lakehouse solutions. Let me break this down with a simple analogy: imagine managing a massive bookstore and needing to find a specific title, along with its author, price, and publication details. Now scale that challenge to millions of books. That’s what querying petabytes of data feels like without the right system.
This is where Presto makes a difference. Developed by Meta in 2012, Presto is an open-source, distributed SQL query engine that enables real-time analytics across multiple data sources, without moving or duplicating data. Whether your data is in MySQL, PostgreSQL, Hive, MongoDB, or S3, Presto connects directly and queries it where it lives. And the best part? If you know SQL, you already know how to use Presto.
From Facebook’s Challenge to Everyone’s Solution
Back in 2012, Facebook was hitting the limits of what Apache Hive could handle. As data volumes grew, Hive could not deliver the speed or flexibility needed for large-scale, real-time analytics, especially when querying across different data sources.
To solve this, Facebook built Presto: a distributed SQL query engine designed for fast, federated analytics without moving data. What started as an internal solution quickly proved its value. By 2013, Presto was open-sourced, and it didn’t take long for the broader tech community to take notice.
Today, companies like Uber, Airbnb, Adobe, and many others use Presto to run complex queries at scale across systems, formats, and infrastructure.
How Presto Works: A Smarter Way to Query
At the heart of Presto’s architecture is the Coordinator Node, the brain of the system. When a query is initiated, whether through tools like Tableau, Superset, or any other frontend, the coordinator handles parsing, planning, and distributing tasks across the system.
The actual computation is performed by Worker Nodes, which execute these tasks in parallel across distributed data sources. You can scale worker nodes as needed, but there’s always a single coordinator orchestrating the entire process.
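The coordinator/worker split described above is a classic scatter-gather pattern. Here is a toy sketch of that flow in Python; it is purely illustrative (real Presto workers exchange data pages over HTTP and run pipelined operators, not Python functions):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model of Presto's scatter-gather flow: a single "coordinator"
# splits a query into tasks, "workers" run them in parallel, and the
# coordinator merges the partial results.

def worker_task(partition):
    # Each worker scans its own data partition and returns a partial aggregate.
    return sum(partition)

def coordinator(partitions):
    # Fan tasks out to a pool of workers, then combine partial results.
    with ThreadPoolExecutor(max_workers=4) as workers:
        partials = workers.map(worker_task, partitions)
    return sum(partials)

partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(coordinator(partitions))  # prints 45, the total across all partitions
```

Adding capacity means adding more workers (more partitions processed concurrently), while the single coordinator remains the planning and merging point, which mirrors how a Presto cluster is scaled.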
Presto also uses Connectors to interface with different databases—MySQL, MongoDB, PostgreSQL, Hive, and more. These connectors convert Presto’s execution plan into native queries for each backend system. There’s no data duplication or movement—just fast, federated querying across systems, regardless of where your data lives.
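Concretely, each connector is registered through a small catalog properties file on the Presto nodes, and the file name becomes the catalog name you reference in SQL. A minimal sketch, with placeholder hosts and credentials:

```properties
# etc/catalog/mysql.properties -- queried as catalog "mysql"
connector.name=mysql
connection-url=jdbc:mysql://mysql.example.com:3306
connection-user=presto
connection-password=secret

# etc/catalog/mongodb.properties -- queried as catalog "mongodb"
connector.name=mongodb
mongodb.seeds=mongo.example.com:27017
```

With both catalogs defined, a single query can reference tables as `mysql.schema.table` and `mongodb.schema.table` side by side.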
A Real-World Example: Uber’s Pricing Engine
If you have ever booked an Uber and noticed the price change within seconds, that’s Presto working behind the scenes. Uber uses Presto to run real-time analytics based on driver availability, rider demand, and location heat maps. These pricing decisions happen in milliseconds, and they rely on Presto’s ability to query multiple sources with minimal latency.
Beyond pricing, Uber also uses Presto for fraud detection, ride analytics, and customer support workflows. It’s a core component of their data infrastructure.
Use Cases That Go Beyond BI Dashboards
Presto is not a database—it does not store data or perform CRUD operations. Instead, it’s a query engine optimized for analytical workloads (OLAP). You can use it for ETL validation, demand forecasting, user behavior analytics, and even reverse-engineering fraud patterns. Whether it’s a small team querying a few GBs or a billion-dollar enterprise analyzing petabytes, Presto scales flexibly—and it’s entirely open source.
A Demo: Running Multi-Source Analytics with Presto
For this meetup, I set up a demo with over 3 million records stored across MySQL and MongoDB. MySQL stored subscription and ad-related data, while MongoDB held real-time match viewing logs. Using Apache Zeppelin as the client, I queried users who watched at least one IPL match, along with their subscription type, match ID, and total watch time.
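The shape of that federated query can be sketched as a join across two catalogs. Since the demo cluster isn’t reproducible here, the snippet below uses two separate SQLite databases as stand-ins for the two sources; the SQL mirrors what you would send to Presto (with `mysql.*` and `mongodb.*` catalog prefixes instead), and all table and column names are invented for illustration:

```python
import sqlite3

# Source 1: subscriptions (playing the role of MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (user_id INT, plan TEXT)")
conn.executemany("INSERT INTO subscriptions VALUES (?, ?)",
                 [(1, "premium"), (2, "free")])

# Source 2: watch logs, attached as a second database
# (playing the role of MongoDB in the demo).
conn.execute("ATTACH DATABASE ':memory:' AS logs")
conn.execute("CREATE TABLE logs.watch_logs (user_id INT, match_id TEXT, minutes INT)")
conn.executemany("INSERT INTO logs.watch_logs VALUES (?, ?, ?)",
                 [(1, "IPL-01", 40), (1, "IPL-01", 20), (2, "IPL-02", 15)])

# One query joining both sources: users who watched at least one match,
# with their plan, the match ID, and total watch time.
rows = conn.execute("""
    SELECT s.user_id, s.plan, w.match_id, SUM(w.minutes) AS total_watch_time
    FROM subscriptions s
    JOIN logs.watch_logs w ON s.user_id = w.user_id
    GROUP BY s.user_id, s.plan, w.match_id
    ORDER BY s.user_id
""").fetchall()
print(rows)  # [(1, 'premium', 'IPL-01', 60), (2, 'free', 'IPL-02', 15)]
```

In Presto the join crosses machine and engine boundaries rather than two local files, but the query itself stays this simple: the connectors make each source look like just another catalog.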
Presto handled it seamlessly—running federated queries across both sources and returning structured results in seconds. This is what makes Presto powerful: the ability to unify diverse datasets without re-engineering your data pipeline.
The Ecosystem and Where It’s Headed
Presto is written in Java, but the community has been reengineering the worker node in C++ (the Prestissimo effort, built on the Velox library) for improved performance. Presto also supports Kubernetes deployment for cloud-native scalability. Major contributors include Meta, Uber, IBM, and Airbnb, but it’s an open ecosystem, and new contributors are always welcome.
If you are interested in analytics, distributed systems, or open-source infrastructure, I highly recommend diving into Presto. It’s flexible enough to run on a single machine and powerful enough to support global-scale businesses.
Final Thoughts: Query at Source. Operate at Scale.
In modern data environments, fragmentation is the norm. Presto eliminates the need for costly data movement by enabling fast, federated querying directly on source systems, regardless of scale or complexity.
It’s not a convenience layer; it’s infrastructure-critical. For teams dealing with diverse data ecosystems, Presto provides a consistent, SQL-first interface that delivers performance without architectural compromise.
If you are architecting systems that demand flexibility, scale, and speed, Presto belongs in your toolkit. The ecosystem is mature, actively maintained, and designed for real-world workloads.