Mar 22, 2024

All About Devin, the First AI Software Engineer

Let's chat about Devin, the first AI software engineer. We explore its game-changing skills, what it means for coding, and whether it is a threat or a boon.
Ahona Das, Senior Technical Content Writer
Sanket Sahu, Chief Executive Officer (CEO)
Kunal Kumar, Chief Operating Officer
Saurabh Srivastava, Senior Software Engineer - II

Just when we thought we had seen (and speculated about) it all, Devin launched, and it has made quite an entrance. Its makers are touting it as “the first AI software engineer.”

Devin, the first fully autonomous software
Devin (Source: Cognition AI)

Recently announced by the stealth-mode startup Cognition AI, Devin is

“… a tireless, skilled teammate, equally ready to build alongside you or independently complete tasks for you to review. With Devin, engineers can focus on more interesting problems and engineering teams can strive for more ambitious goals.”

— Scott Wu, Founder and CEO of Cognition AI

A lot to unpack. Here are the details →

Different from Current AI Coding Assistants?

So, is Devin the same as GitHub Copilot, the code autocompletion tool built by Microsoft-owned GitHub in collaboration with OpenAI? Cognition AI says no, and that is why we are talking about it.

While tools like Copilot have been around to autocomplete and translate code, Devin takes the game up several notches. The AI assistant can complete an entire software development project from scratch.

To get started, you only need to give it a task using natural language commands. The software first gives you a step-by-step plan to handle the problem and then gets to work using the same tools a human developer would use.

Devin also has its own command line, its own code editor, and even its own browser. If something appears off, you can give the AI a prompt to fix the issue, and Devin will incorporate the feedback as it works, finding and fixing bugs on its own as it tests the code being written. Pretty crazy, right?

Devin in action (Source: Cognition AI)

“Several implications arise with Devin's capacity to handle entire development projects autonomously. Efficiency stands to improve as Devin's rapid task completion could reduce time-to-market for new applications, facilitating quicker iteration and deployment.

From an AGI perspective, Devin's capabilities serve as a stepping stone, highlighting progress in AI research and development and showcasing the potential for AI to augment and enhance human capabilities in complex domains like software engineering,” according to GeekyAnts Founder Sanket Sahu.

An Amazingly Skilled Teammate?

Devin's performance
Source: Cognition AI

While not much is known to outsiders about how the technology works, Wu mentions that his team found unique ways to combine large language models (LLMs), such as OpenAI’s GPT-4, with reinforcement learning techniques.

Several features have made Devin the talk of the town in the tech world.

  • Devin can handle an entire development project end-to-end, executing tasks in a matter of minutes, right from writing code to fixing issues — and keep a calm head while at it.
  • Natural language commands are all it takes to hand Devin a new task, which it will then initiate and carry out.
  • On the SWE-Bench benchmark, which tasks an AI with resolving real-world open-source GitHub issues, Devin correctly resolves 13.86% of the issues without assistance. This performance far surpasses the previous state-of-the-art model, which only managed to resolve 1.96% of issues unassisted and 4.80% with assistance. (See image above)
  • Cognition AI's most significant claim about Devin is a breakthrough in a computer's ability to reason. In AI terms, reasoning means a system can go beyond predicting the next word in a sentence or the next snippet in a line of code, and can more closely resemble thinking and rationalizing to solve problems.
  • Devin has successfully passed practical engineering interviews at leading AI companies and has even completed actual jobs on Upwork.
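
To put the benchmark gap in the list above into numbers, here is a minimal Python sketch that computes how many times higher Devin's reported resolve rate is than each earlier baseline. The percentages are the ones quoted in this article; the function name `improvement_factor` and the dictionary labels are illustrative and are not part of SWE-Bench or any Cognition AI tooling.

```python
# Resolve rates (percent of SWE-Bench issues resolved), as quoted in this article.
DEVIN_UNASSISTED = 13.86
PREVIOUS_BEST = {
    "unassisted": 1.96,
    "assisted": 4.80,
}


def improvement_factor(new_rate: float, old_rate: float) -> float:
    """Return how many times larger new_rate is than old_rate."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return new_rate / old_rate


if __name__ == "__main__":
    for setting, rate in PREVIOUS_BEST.items():
        factor = improvement_factor(DEVIN_UNASSISTED, rate)
        print(f"Devin vs. previous best ({setting}): {factor:.1f}x")
```

By this measure, the reported rate is roughly seven times the earlier unassisted figure and just under three times the assisted one.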

Kunal Kumar, COO, GeekyAnts, predicts, “resource allocation within development teams could see optimization, with developers focusing on higher-level tasks while Devin manages routine coding duties. This could translate into cost savings for businesses due to reduced labour hours.”

Does This Threaten Devs? It’s Still A Grey Area

Powered by innovative AI techniques and funded by industry giants, Devin's capabilities are projected to far exceed those of existing AI coding assistants. To put things into perspective, Cognition AI is funded by Peter Thiel's Founders Fund and tech industry leaders, including former Twitter executive Elad Gil and DoorDash co-founder Tony Xu.

However, does this spell the end for human developers? It's too soon to tell. While Devin has shown impressive capabilities, it remains a tool designed to aid, not replace, human ingenuity and creativity.

Saurabh Srivastava, Senior Software Engineer - II at GeekyAnts, assessed Devin’s reported development capabilities. Here are his findings:

“The reported success rate of Devin, an AI model designed to address GitHub issues, is 13.86%, albeit within a specific context of SWE-Bench-derived data, encompassing 2,294 Issue-Pull Request pairs from 12 popular Python repositories, all of which have unit tests.

However, this data subset represents a niche scenario with well-documented issues and consistent requirements, unlike real-world scenarios where requirements often change rapidly.

Devin's evaluation was also based on a random 25% subset of the dataset, raising questions about the generalizability of its performance. By applying basic mathematics, the actual success rate in these repositories is calculated to be 3.46%.”
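
The 3.46% figure can be reproduced with a short calculation. The sketch below assumes it was obtained by multiplying the reported success rate by the share of the dataset that was evaluated, which treats every issue outside the sample as unresolved. The source does not spell out the method, so this is one reading of it, and the function name is hypothetical.

```python
def resolved_share_of_full_dataset(success_rate: float, sampled_fraction: float) -> float:
    """Share of the full dataset resolved, assuming unsampled issues count as unresolved."""
    if not (0.0 <= success_rate <= 1.0 and 0.0 <= sampled_fraction <= 1.0):
        raise ValueError("both arguments must be fractions between 0 and 1")
    return success_rate * sampled_fraction


if __name__ == "__main__":
    # Figures quoted above: 13.86% success rate on a random 25% subset.
    share = resolved_share_of_full_dataset(0.1386, 0.25)
    print(f"{share * 100:.3f}% of the full dataset")
```

With the figures quoted above (13.86% and 25%), this comes to about 3.465%, in line with the 3.46% cited.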


Critics focus on the specific nature of the demo's prompt-based questions and the lack of insight into how long it takes to solve problems. Given the recent hype surrounding certain tech trends, benchmarks like Devin's are bound to meet with some skepticism.

The technology behind Devin remains largely unknown and its long-term impact is yet to be seen. And we are here for it.
