How Devin Signals the Age of AI Agents - Weekly News Roundup - Issue #458
Plus: humanoid robot understands human speech; Nvidia gets sued over AI use of copyrighted works; Mercedes-Benz will trial a humanoid robot; DeepMind SIMA; and more!
Hello and welcome to Weekly News Roundup Issue #458. This was a week full of big news in the world of AI and robotics. We will take a closer look at Devin, the “first AI software engineer”, and what it can tell us about the future of AI assistants and AI agents. In other news, Google DeepMind released SIMA, an AI agent that plays 3D games, MEPs approved the world's first comprehensive AI law, and Nvidia was sued over AI use of copyrighted works. Meanwhile, OpenAI got into some trouble (again), and that needs a separate article to cover it properly. It was also a big week for humanoid robots - Figure has shown the fruits of its partnership with OpenAI and Mercedes-Benz will trial Apptronik's humanoid robot. Enjoy!
On March 12th, 2024, Cognition AI emerged from stealth mode and showed the world Devin, “the first AI software engineer”. The software engineering community had mixed reactions to this news. Some responded with fear, anxiety, or anger, while others were more excited. Let's take a closer look at Devin, what it means for the future of software engineering and what it says about future trends in AI.
According to Cognition, Devin is “the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork.”
Unlike other coding assistants such as GitHub Copilot, which are designed to suggest changes to code or generate blocks of code, Devin aims to build entire apps from just a text description. It takes a text input describing what needs to be built, creates a step-by-step plan for developing the requested app, and then writes the code.
Cognition claims Devin is much better than other large language models such as GPT-4 or Claude 2 at solving coding tasks in SWE-Bench. Cognition describes their methodology in detail in the technical report.
It would be interesting to see how Devin would compare to Claude 3 or Google Gemini.
Devin is not yet available to the public, so independent verification of Cognition's claims is limited. However, the company has published a couple of videos showing different people writing working programs with Devin, showcasing Devin’s ability to use unfamiliar technologies, deploy web apps, autonomously find and fix bugs in codebases and solve real jobs on Upwork.
Software engineers who examined these videos more closely noted some issues. Although Cognition does not explicitly acknowledge it, Devin appears to take a considerable amount of time (from several minutes up to 30 minutes) to produce a response. Under the hood, Devin uses reinforcement learning on top of GPT-4, which means it very likely generates multiple possible answers and evaluates them to find the best one. In theory, this approach can produce AI models with good reasoning capabilities, but it is computationally heavy and therefore expensive.
Although Cognition is a very young company, it has raised $21M to date from Peter Thiel’s Founders Fund, former Twitter executive Elad Gil, and DoorDash co-founder Tony Xu.
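To make the "generate many answers, keep the best one" idea concrete, here is a minimal sketch of best-of-N sampling. It is only an illustration of the general technique, not Cognition's method: Devin's internals are not public. The names `best_of_n`, `propose`, `score`, `CANDIDATE_POOL` and `TESTS` are all made up for this example, and a random pick from a fixed pool stands in for a real language model.

```python
import random
from typing import Callable

# Hypothetical stand-in for model outputs: three candidate implementations
# of the same function, only one of which is correct.
CANDIDATE_POOL = [
    "def add(a, b):\n    return a - b",
    "def add(a, b):\n    return a * b",
    "def add(a, b):\n    return a + b",
]

# Hypothetical test cases used to evaluate each candidate: (arguments, expected).
TESTS = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0), ((2, 2), 4)]


def propose(rng: random.Random) -> str:
    """Stand-in for one model call: return one candidate solution."""
    return rng.choice(CANDIDATE_POOL)


def score(source: str) -> float:
    """Run the candidate against the tests and return the fraction passed."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # fine for a toy; never exec untrusted code
        fn = namespace["add"]
    except Exception:
        return 0.0
    passed = 0
    for args, expected in TESTS:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(TESTS)


def best_of_n(
    generate: Callable[[random.Random], str],
    evaluate: Callable[[str], float],
    n: int = 8,
    seed: int = 0,
) -> str:
    """Generate n candidates and return the one with the highest score."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=evaluate)


if __name__ == "__main__":
    best = best_of_n(propose, score, n=8)
    print(best)
    print("fraction of tests passed:", score(best))
```

The cost is visible in the structure: every answer requires `n` generation calls plus `n` evaluations, which is one plausible reason a system built this way would be slow and expensive to run.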
Devin is the latest step in automating coding. Andrej Karpathy perfectly summarises the recent trends in writing code in this tweet. Software engineers quickly adopt tools that make them more efficient. AI coding assistants like GitHub Copilot quickly became part of the modern software engineering toolkit.
Tools like Devin represent the next step forward in automated coding, where humans act more as supervisors who express their ideas at a high level of abstraction (for example, “write a Tinder for cats”). The AI then writes the app and goes back and forth until it meets the requirements.
Right now, Devin and similar tools won’t replace software engineers. However, the AI will only improve, and it is possible that soon, the AI will code faster and better than any human. Jensen Huang, the CEO of Nvidia, even said that children today shouldn't learn to code because AI will do the coding for them. Others disagree, saying there will still be a need for human software engineers.
In either case, software engineering will change substantially in the next few years. Senior and experienced developers will be fine, but junior and less experienced developers will most likely be the most affected by these new AI tools.
Interestingly, Cognition is looking to hire a software engineer. One might ask why the company bothers to hire someone since they have an AI ready to do the job.
Devin also represents another trend in AI - the emergence of AI agents. Last year, following the release of ChatGPT and GPT-4, some people realized the potential of asking these AI models to outline solutions for complex tasks and then execute these plans step by step. There was a brief period when projects such as AutoGPT saw a surge in popularity, and then the hype faded away. Now, it appears that AI agents are making a comeback. Devin serves as a perfect example of how an AI agent works - just describe the task and the AI will figure out the rest.
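For readers who want to see what "outline a solution, then execute it step by step" looks like in code, below is a minimal sketch of a plan-and-execute agent loop. It is a toy under stated assumptions, not how Devin or AutoGPT is actually implemented: `Step`, `toy_planner`, `make_toy_tools` and `run_agent` are hypothetical names, the planner is hard-coded where a real agent would ask a language model, and the tools only pretend to write code and run tests.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A tool takes one string argument and returns (succeeded, message).
Tool = Callable[[str], Tuple[bool, str]]


@dataclass
class Step:
    tool: str       # name of the tool to call
    argument: str   # what to pass to it


def toy_planner(task: str) -> List[Step]:
    """Stand-in for an LLM planner: turn a task description into steps."""
    return [Step("write_code", task), Step("run_tests", task)]


def make_toy_tools() -> Dict[str, Tool]:
    """Fake tools; the first test run fails so the retry path is exercised."""
    state = {"test_runs": 0}

    def write_code(spec: str) -> Tuple[bool, str]:
        return True, f"wrote code for {spec!r}"

    def run_tests(target: str) -> Tuple[bool, str]:
        state["test_runs"] += 1
        if state["test_runs"] == 1:
            return False, "1 test failed"
        return True, "all tests passed"

    return {"write_code": write_code, "run_tests": run_tests}


def run_agent(
    task: str,
    planner: Callable[[str], List[Step]],
    tools: Dict[str, Tool],
    max_retries: int = 2,
) -> Tuple[bool, List[str]]:
    """Plan once, then execute each step, retrying a failed step a few times."""
    log: List[str] = []
    for step in planner(task):
        for attempt in range(1, max_retries + 2):
            ok, message = tools[step.tool](step.argument)
            log.append(f"{step.tool} attempt {attempt}: {message}")
            if ok:
                break
        else:
            # Every attempt failed, so the agent stops and reports back.
            log.append(f"gave up on {step.tool}")
            return False, log
    return True, log


if __name__ == "__main__":
    success, log = run_agent("a Tinder for cats", toy_planner, make_toy_tools())
    for line in log:
        print(line)
    print("success:", success)
```

The human supplies only the task string; everything else (planning, calling tools, retrying after a failure) happens inside the loop, which is the shift from "assistant" to "agent" described above.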
And it is not just AI enthusiasts looking forward to AI agents. Big players on the AI scene are interested in them, too. During the first OpenAI Dev Day last year, Sam Altman mentioned that the next milestone on the path to Artificial General Intelligence (AGI) involves the development of agents: highly capable bots that can plan and execute complex tasks. Demis Hassabis, the CEO of Google DeepMind, openly speaks about applying AlphaGo-like reinforcement learning features to create models capable of reasoning. Q*, the model that triggered the OpenAI drama in November 2023, is rumoured to also follow this or a similar approach.
Although Devin is not perfect and won't replace software engineers immediately, it offers a glimpse into the future. AI agents have the potential to empower individuals to accomplish tasks that would normally require a team. However, these advancements may also negatively affect the livelihoods of many. Regardless, we are on the cusp of significant changes to how we live and how we work.
If you enjoy this post, please click the ❤️ button or share it.
🦾 More than a human
Brain stimulation tech wins €5M to fight depression at home
Sooma, a Finnish medtech startup, has secured €5M to further develop a portable brain stimulation device for Transcranial Direct Current Stimulation (tDCS). Their device promises to help treat depression by stimulating specific parts of the brain. Patients can use it as a stand-alone intervention, or in combination with other therapies. Companies like Sooma are part of an emerging neurotech industry that promises to one day unlock full human potential by merging our minds with machines. I've written an entire series of articles on this subject for those interested in learning more.
🧠 Artificial Intelligence
MEPs approve world's first comprehensive AI law
The European Parliament has approved the EU AI Act, the world's first comprehensive framework for constraining the risks of artificial intelligence. The AI Act works by classifying products according to risk and adjusting scrutiny accordingly. The law's creators said it would make the tech more "human-centric." "The AI act is not the end of the journey but the starting point for new governance built around technology," MEP Dragos Tudorache added.
Nvidia is sued by authors over AI use of copyrighted works
Three authors - Brian Keene, Abdi Nazemian, and Stewart O'Nan - have sued Nvidia for using their copyrighted works without permission to train its NeMo AI platform. The lawsuit, filed in San Francisco federal court, centres on Nvidia's alleged use of around 196,640 books, including those by the authors, to train NeMo. The authors are seeking unspecified damages for copyright infringement over the past three years.
Claude 3 Haiku: our fastest model yet
Last week, Anthropic released the Claude 3 family of models, which it presents as the new best large language models available. However, only two of the three models were initially released. This week, Anthropic released the final model in the Claude 3 family, named Haiku. According to benchmarks provided by Anthropic, Haiku, the smallest model in the Claude 3 family, outperforms both GPT-3.5 and Gemini 1.0 Pro across nearly all benchmarks while also being faster and cheaper than its competitors.
▶️ Demis Hassabis - Scaling, Superhuman AIs, AlphaZero atop LLMs, Rogue Nations Threat (1:01:33)
In this interview, Demis Hassabis, the CEO of Google DeepMind, shares why he thinks the path to AGI is a combination of large language models and reinforcement learning and why we can expect AGI to happen this decade. The other topics covered were your standard AI safety and alignment questions, how DeepMind plans to balance open source with safety and security, what’s next for DeepMind and why Hassabis is so passionate about robotics.
Can Chinese companies make Sora? This Tsinghua large model team gives hope
Sora, OpenAI’s text-to-video generator, caught everyone’s attention for its ability to generate high-quality and realistic video clips.
Google DeepMind SIMA - a generalist AI agent for 3D virtual environments
Researchers from Google DeepMind continue their tradition of creating AI agents that play video games with Scalable Instructable Multiworld Agent, or SIMA. This new agent can operate in 3D virtual worlds and follow natural-language instructions to carry out tasks within them, as a human might. Apart from creating an AI agent that can play a variety of games, from No Man’s Sky to Goat Simulator 3, researchers hope SIMA will help create more general AI systems and agents that can understand and safely carry out a wide range of tasks in a way that is helpful to people online and in the real world.
Cerebras WSE-3 AI Chip Launched 56x Larger than NVIDIA H100
Cerebras, an AI chip startup, has released its newest chip, WSE-3, which combines 900,000 cores on a single chip the size of a dinner plate. This approach promises to eliminate GPU-to-GPU bottlenecks that cause massive inefficiencies in training large language models. Cerebras says that it is possible to train a 24 trillion parameter model on just one WSE-3 chip instead of thousands of GPUs. The article I’m linking to gives a good overview of the WSE-3 chip. I also recommend checking out this video from Dr. Ian Cutress of TechTechPotato, who provides further insight into the chip's capabilities and Cerebras’ business model.
🤖 Robotics
Got To Go Fast: The Rise Of Super-Fast FPV Drones
After months of work, Luke Maximo Bell has built the world's fastest FPV drone. This drone, which looks more like a mini rocket than a traditional drone, can reach speeds of up to 401 km/h. In a drag race, the drone was faster than a Red Bull F1 car and managed to keep pace with Max Verstappen around Silverstone - something that not everyone can do these days.
▶️ Figure Status Update - OpenAI Speech-to-Speech Reasoning (2:34)
Two weeks ago, Figure announced a partnership with OpenAI. This week, the company has showcased the results of this collaboration. In a recent video, Figure 01 leverages OpenAI's models to engage in comprehensive conversations. The robot is capable of describing its surroundings, understanding commands in plain English, planning future actions, and applying common sense reasoning. Moreover, it can reflect on its memories and articulate its thought process verbally. This impressive demonstration is further proof that the era of commercial humanoid robots may be closer than we think.
Mercedes begins piloting Apptronik humanoid robots
Apptronik's Apollo becomes the third commercial humanoid robot, following Agility Robotics’ Digit and Figure 01, to enter trials in a real-world environment. According to the press release, the robot will begin trials at Mercedes-Benz, where it will be doing “some low skill, physically challenging, manual labour”, presumably moving things from one place to another.
ANYmal robot has a new skill: parkour
ANYmal, a four-legged robot by ETH Zurich researchers, joins an elite club of robots that can do parkour. Although ANYmal sometimes lacks the grace of Boston Dynamics’ Atlas, it stands out for its ability to keep its footing on both unstable and slippery surfaces, proving itself highly capable of navigating challenging terrains like construction sites and disaster zones. This advancement not only pushes the limits of what robots can do but also prepares ANYmal for real-world applications like search and rescue missions.
🧬 Biotechnology
▶️ Can We Use Bacteria to Refine Rare Earths? (14:17)
Modern electronics would not be possible without rare earth elements. However, extracting them from ore is a complex, expensive and toxic process. This video explores the idea of using bacteria to refine rare earth elements and how these new biological methods could potentially resolve the issues associated with current chemical extraction methods.
Scientists move step closer to making IVF eggs from skin cells
Scientists are a step closer to making IVF eggs from patients’ skin cells after adapting the procedure that created Dolly the sheep, the first cloned mammal, more than two decades ago. The work raises the prospect of older women being able to have children who share their DNA and overcome common forms of infertility caused by a woman’s eggs becoming damaged by disease or cancer treatment. The radical procedure, which may take a decade to perfect and approve in humans, would also enable male couples to have genetically related children since the men’s DNA could be combined in the fertilised egg and carried to term by a surrogate mother.
Humanity Redefined sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.
A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who supports my work on Ko-Fi. Thank you for the support!
My DMs are open to all subscribers. Feel free to drop me a message, share feedback, or just say "hi!"