Don't worry! The AI software engineer is still not here
Devin doesn't meet its marketing hype, but it has an interesting developer workflow that may become more common.
In March, Cognition AI announced that they were developing “the first AI software engineer” named Devin.
With our advances in long-term reasoning and planning, Devin can plan and execute complex engineering tasks requiring thousands of decisions. Devin can recall relevant context at every step, learn over time, and fix mistakes.
We've also equipped Devin with common developer tools including the shell, code editor, and browser within a sandboxed compute environment—everything a human would need to do their work.
Finally, we've given Devin the ability to actively collaborate with the user. Devin reports on its progress in real time, accepts feedback, and works together with you through design choices as needed.
Cognition AI’s marketing materials from March make software engineering sound like a dying profession. Sure, it’s pitched as your “teammate.” But it sounds like regular software engineers don’t stand a chance. Devin can use all of the tools that you can. You have to sleep. Devin is “tireless.” You are just one engineer. You get paid hundreds of thousands of dollars. Devin gets paid… well, they hadn’t worked out the pricing yet, but obviously a lot less than that!
And people really did wonder if they were being replaced! Of course, nobody on Hacker News bought into this[0], but it’s not hard to find Reddit threads where people felt like their jobs didn’t have any future.
Well, Devin just went into general availability. Now you can hire an AI programmer for $500 (or more) a month. After all the marketing hype, of course I needed to know how endangered my job is. I watched their available videos and read through their example sessions. Devin is objectively neat, and I discuss why at the end of this post. First, however, we need to discuss Devin’s failure to live up to its marketing hype.
In this week’s blog post, they explain how to set Devin up for success.
While Devin can be an all-purpose tool, we recommend starting with:
Small frontend bugs and edge cases - tag Devin in Slack threads
Creating first-draft PRs for backlog tasks - assign Devin tasks from your todo list at the start of your day
Making targeted code refactors - use the Devin IDE extension (for VSCode and forks) to point Devin to parts of the code you want edited or upgraded
Devin has helped teams with everything from building integrations to migrating and maintaining documentation. Devin is versatile, but works best when you:
Give Devin tasks that you know how to do yourself
Tell Devin how to test or check its own work
Keep sessions under ~3 hours and break down large tasks
Share detailed requirements upfront
Invest in coaching Devin by providing feedback in chat and accepting suggested Knowledge, or adding your own Knowledge manually
Let’s break it down.
Devin cannot bring its own expertise to the table. You should know how to do the tasks you give Devin.
The best task scope is “tiny” to “small.”
They warn against leaving Devin running for more than 3 hours, which falls far short of “tireless.”
You need to heavily specify the work to be performed, and additionally how you’d like it tested.
You need to be present to react to its questions as it learns more about the task, so you’re not entering your own flow state while Devin is running.
This fails to meet the promise of their initial marketing materials. The promise of Devin is a scalable software engineer. The reality is that Devin needs more hand-holding than an intern[1].
To their credit, their most recent promotional material is more realistic. It even includes extensive caveats. The end of the first video spends a good deal of time explaining that working with Devin requires a more exacting approach than working with humans, because humans can infer details about a task from their past experience but Devin cannot. This is their own marketing material. They could have made it sound as good as they wanted, and they still felt it was important to add asterisks.
Now let’s forget the marketing materials and approach Devin from a world where we already have tools like Cursor, Claude, GitHub Copilot, etc. Let’s talk about what Devin can do.
Looking over their replays of example Devin sessions, Devin is about as capable as other modern AI tools. Beyond that, it’s objectively cool that it can use a terminal and push code to GitHub. It’s helpful that it can interact on Slack. The sample code outputs that they provide are indeed small. Some of the small examples needed at least two attempts from Devin. One of the examples notes that once Devin runs into merge conflicts, you’re better off stopping Devin and finishing the task yourself.
The second video has a pretty compelling example of what a core Devin workflow might look like. The interviewee noticed that an endpoint was missing schema validation and told Devin to add it while they kept working on the problem at hand. This is a really neat new AI workflow that I haven’t seen anywhere else. Sure, there are a ton of AI companies, and maybe some of them have done something like this, but I haven’t seen it yet!
AI is increasingly a part of the modern developer workflow. GitHub Copilot can either do predictive tab completion or flesh out a method body from its signature. Cursor can be told how to modify a codebase. Standalone LLMs can take your instructions and produce entire files. In fact, I would describe the modern developer workflow as…
Set up your context. Move your cursor to the right location and do any other vibes-based setup that might get the LLM to work better.
Prompt the LLM.
Regenerate the LLM output until it is perfect or fixable.
Tweak and move on.
Devin introduces a new AI developer workflow: having a personal assistant for your coding.
Find a specific change that you’d like done but that isn’t your top priority.
Tell Devin to work on it, and answer any questions about the task.
It iterates until its output is either perfect or fixable.
Tweak and move on.
It’s not an AI teammate, but working well with an AI assistant still seems like a skill that you can develop and improve.
I have a confession to make. When I first read Devin’s initial announcement in March, I imagined a future where it would become difficult to compete with AI agents that don’t have to sleep. But I can rest easy for now. Examining the sample sessions, every session gets terminated by the human developer telling Devin to “sleep.”
[0] Skepticism is the best way to gain karma on Hacker News.
[1] Maybe they’re comparable on day 1 — which itself is impressive for Devin. But if you’re good at recruiting and fostering intern talent, your intern will be a powerhouse at the end of the summer and Devin will still be about where it started.