Claude 3, the new best LLM on the block - Weekly News Roundup - Issue #457

Plus: OpenAI reveals Elon's emails; Unitree's humanoid robot is available for purchase; a Microsoft engineer raises concerns about Copilot Designer and responsible AI; and more!

Mar 08, 2024

Welcome to Weekly News Roundup Issue #457. This week, Anthropic released Claude 3, probably the best large language model available today.

In other news, OpenAI responded to Elon Musk's lawsuit by publishing emails between Musk and the OpenAI founders. A Microsoft engineer has raised concerns about Copilot Designer and responsible AI. Unitree's humanoid robot is now available for purchase, and more!

Since its release almost a year ago, in mid-March 2023, GPT-4 has been the top large language model available. Every other model released since then has been compared to GPT-4. At first, there was a clear gap in capabilities between OpenAI’s top model and its competitors. However, as 2023 progressed, other companies began to close that gap, inching closer to the mark set by GPT-4, with some models even surpassing GPT-4 in certain benchmarks.

Recently, the competition has caught up with OpenAI. First, there was Google DeepMind and its Gemini models. This week, Anthropic released its latest model, Claude 3, which the company claims is the most intelligent large language model currently available and sets new industry benchmarks across a wide range of cognitive tasks. Let's investigate these claims and evaluate how intelligent Claude 3 really is.

Like Google Gemini, Claude 3 is not a single model but a family of three models, which are named, in ascending order of capability, Haiku, Sonnet and Opus.

Anthropic says that Claude 3 Opus sets “a new standard for intelligence” and that it shows us “the outer limits of what’s possible with generative AI”. The company backs these claims with benchmark results comparing all Claude 3 models with their main competitors: GPT-4, GPT-3.5, Gemini 1.0 Ultra and Gemini 1.0 Pro.

A comparison of the Claude 3 models against other models. Source: Anthropic

The results are quite impressive. However, the table is missing results for GPT-4 Turbo and Gemini 1.5 Pro. The GPT-4 Turbo benchmarks are missing because they have not been made publicly available. However, someone on Twitter shared unofficial GPT-4 Turbo benchmarks, and they show GPT-4 Turbo scoring higher than Claude 3 Opus. The comparison with Gemini 1.5 Pro is provided in a paper describing the model and benchmark results in more detail, and Claude 3 Opus emerges as the better model there.

These results come from Anthropic, so let’s take them with a pinch of salt: the company wants to present itself and Claude 3 in the best possible light to attract as many customers as possible. Over the coming days and weeks, independent tests will appear and give us a better picture of how Claude 3 compares with its competitors in real-life applications.

One such comparison was made by AI Explained, who tested Claude 3 Opus against GPT-4 Turbo and Gemini 1.5 Pro across multiple queries, including image understanding and reasoning. AI Explained concludes that Claude 3 Opus is “probably the most intelligent model currently available”. In those tests, it was better than the other models at understanding images, at reasoning and at answering even tricky questions. Claude 3 Opus also has a lower rate of false refusals. AI Explained tested that by asking how to make a party “go down like a bomb”. Only Claude 3 understood the true meaning of the question and provided an answer.

Each of the Claude 3 models has a context window of 200K tokens. However, all three models are capable of accepting inputs exceeding 1 million tokens. The larger context window is currently limited to selected customers who need enhanced processing power. Anthropic also showed that Claude 3 has excellent recall accuracy across a 200K token context window, meaning it can retrieve a piece of information no matter where in the input text it appears.
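For readers curious how this kind of recall is usually measured, one common approach is a "needle in a haystack" test: a single out-of-place fact is buried at a chosen depth inside a long filler text, and the model is then asked about it. The sketch below only shows how such a test input could be built and scored. It is a minimal illustration, not Anthropic's actual test harness, and the function names, filler text and needle are all made up for this example.

```python
def build_haystack(filler: str, needle: str, depth: float, total_sentences: int) -> str:
    """Bury `needle` inside repeated filler text.

    `depth` is the relative position of the needle:
    0.0 puts it at the very start, 1.0 at the very end.
    (Hypothetical helper, for illustration only.)
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [filler] * total_sentences
    position = round(depth * total_sentences)
    sentences.insert(position, needle)
    return " ".join(sentences)


def recalled(answer: str, expected: str) -> bool:
    """Score a model's answer: did it mention the expected fact?"""
    return expected.lower() in answer.lower()


# Example: hide one fact halfway through a (very short) haystack.
needle = "The secret code is 4815."
document = build_haystack("The sky is blue.", needle, depth=0.5, total_sentences=10)
question = "What is the secret code?"
# `document` and `question` would then be sent to the model under test,
# and its reply checked with recalled(reply, "4815").
```

Repeating this for many depths and document lengths gives a grid of pass/fail results, which is how recall "across the whole context window" is typically reported.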

While we are discussing the Claude 3 context window, one of Anthropic’s employees shared an interesting story about internal testing on Claude 3 Opus in which the model suspected it was being evaluated. Some people take stories like these as proof that Claude 3 is sentient but the same behaviour can be explained by the way large language models work, how Claude 3 was trained and how it was conditioned to behave, as Yannic Kilcher explains in this video.

Opus and Sonnet are available through Anthropic’s API, and Haiku will be available soon. Sonnet can be accessed for free on claude.ai, while Opus is available for Claude Pro subscribers. Sonnet is also available through Amazon Bedrock and in private preview on Google Cloud's Vertex AI Model Garden—with Opus and Haiku coming soon to both platforms.

Overall, Claude 3 Opus is a very good model, possibly the best large language model available today. It is shaping up to be a good alternative to what OpenAI and Google have to offer and justifies Anthropic’s $18 billion valuation. It will be interesting to see where the AI industry goes from here. Anthropic does not believe Claude 3 is “anywhere near its limits” and promises to release frequent updates in the coming months. Meanwhile, other companies won’t stand still. Google might release Gemini 1.5 Ultra and leapfrog Claude 3. Meta is currently training Llama 3. And if rumours are true, OpenAI is already training what could be GPT-5, projected to be released in the second half of the year. In the meantime, OpenAI could release a partially trained GPT-5 as GPT-4.5 to challenge other models and regain the top spot. The next few months are going to be interesting in the AI space.

If you enjoy this post, please click the ❤️ button or share it.

Do you like my work? Consider becoming a paying subscriber to support it

For those who prefer to make a one-off donation, you can 'buy me a coffee' via Ko-fi. Every coffee bought is a generous support towards the work put into this newsletter.

Your support, in any form, is deeply appreciated and goes a long way in keeping this newsletter alive and thriving.

🦾 More than a human

3D-printed skin closes wounds and contains hair follicle precursors
Researchers have developed a novel 3D printing technique capable of creating a complete, living system of multiple skin layers, including the ability to print hair follicles—a significant advancement over previous methods that could only print thin skin layers. This innovative approach is intraoperative, meaning the skin can be printed during surgery to immediately and seamlessly repair damaged skin. Although currently tested only in rats, this method holds great promise for applications in dermatology, hair transplants, and plastic and reconstructive surgeries.

🧠 Artificial Intelligence

OpenAI and Elon Musk
A week ago, Elon Musk filed a suit against OpenAI, claiming the company had veered from its mission of being a non-profit AI lab dedicated to creating AGI that benefits all of humanity. In response, OpenAI published emails exchanged between its original founders, including Elon Musk, Sam Altman, Greg Brockman, and Ilya Sutskever. These emails show that Musk was attempting to integrate OpenAI into Tesla, and that it had become clear to the company that the non-profit path was not viable due to the significant costs associated with training an AGI. This whole situation keeps bringing more dirt to light, and who knows what else will be uncovered. I think the whole issue of how closed OpenAI has become will require its own article. In the meantime, I recommend this post from Gary Marcus, who adds more context to what has been revealed in those emails.

Microsoft’s engineer raises concerns about Copilot Designer and responsible AI
Microsoft AI engineer Shane Jones has raised alarms about the Copilot Designer AI generating disturbing images, including violence and illicit content. Despite reporting these issues to Microsoft and the FTC, his concerns were overlooked, so he published the letters on LinkedIn. Jones advocated for the AI's removal until safer measures are established and criticized the lack of response and reporting mechanisms for harmful content, underscoring the need for stricter AI regulation.

Inflection-2.5: meet the world's best personal AI
Inflection has released its newest model, Inflection-2.5. According to the benchmarks provided by the company, the new model gets close to GPT-4 while using only 40% of the compute that was required to train GPT-4. Inflection-2.5 can also search the internet for additional information. This new model powers Pi, Inflection's personal chatbot, which you can try using their web app.

Public trust in AI is sinking across the board
According to a report from Edelman, trust in AI and in the companies developing it is dropping, both in the US and around the world. Globally, trust in AI companies has dropped to 53%, down from 61% five years ago. In the US, trust has dropped 15 percentage points (from 50% to 35%) over the same period. Tech has also lost its title as the most trusted sector, going from the most trusted industry in 90% of the countries Edelman studied eight years ago to the most trusted in only half of them.

Researchers jailbreak AI chatbots with ASCII art - ArtPrompt bypasses safety measures to unlock malicious queries
Security researchers have discovered a method to jailbreak AI chatbots using ASCII art. All that is needed is to spell out the sensitive part of a malicious query in large letters drawn as ASCII art. This new technique has been shown to bypass safety measures in ChatGPT, Gemini, Claude, and Llama2.

🤖 Robotics

▶️ Unitree H1 Breaking humanoid robot speed world record (1:15)

After Agility Robotics’ Digit started being tested at Amazon and Figure announced a massive Series B funding round last week, it is now Unitree’s turn to show what it brings to the growing humanoid robotics scene. Meet Unitree H1, a general-purpose humanoid robot from China that can run at speeds of up to 3.3 m/s, jump, walk up and down stairs and, of course, dance. Additionally, the robot is now available for purchase for $150,000 with deliveries starting in Q1 2024.

Watch an autonomous helicopter demo wildfire response skills
Sikorsky, in cooperation with Rain, a developer of aerial wildfire containment technology, has shown a demo of an autonomous Black Hawk that can suppress early-stage wildfires by accurately dropping water on fires. The test, conducted in Stratford, Connecticut, showcased a blend of Rain's wildfire detection and Sikorsky's flight autonomy technologies, enabling rapid, precise wildfire responses. Rain plans to further develop this technology with fire agencies.

Anyware Robotics’ Pixmo Takes Unique Approach to Trailer Unloading

Robotics company Anyware Robotics joins the trailer unloading market with its Pixmo robot. Pixmo uses an off-the-shelf Fanuc robotic arm equipped with suction cups to lift boxes, which are then placed on a unique, built-in conveyor belt. This conveyor belt, which only Pixmo has, transports the boxes out of the trailer, significantly accelerating the unloading process. It can achieve a throughput of up to 1,000 boxes per hour, or roughly one box every 3.6 seconds.

🧬 Biotechnology

Cultivated Biosciences poised to take its plant-based cream to market in 2025
Cultivated Biosciences, a Swiss biotech food startup, has developed a plant-based cream that mimics the taste and texture of traditional cream. Instead of relying on animals, the company uses a specific type of yeast, making it yet another biotech startup trying to reproduce dairy products without using any animals. The company is now talking with food service companies to take its product to market in the US in 2025, once it receives approval from the FDA.

This Swedish startup wants to reduce the cost, and controversy, around stem cells production
Cellcolabs, a Swedish biotech startup, has raised $8.7 million to industrialize a new method of producing stem cells from volunteer-donated bone marrow, potentially reducing the cost of mesenchymal stem cells (MSCs) by up to 90%. Operating one of the world's largest stem cell production facilities in Stockholm, Cellcolabs aims to transform healthcare by making these crucial cells affordable and widely available.

Thanks for reading. If you enjoyed this post, please click the ❤️ button or share it.

Humanity Redefined sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.

A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who supports my work on Ko-Fi. Thank you for the support!

My DMs are open to all subscribers. Feel free to drop me a message, share feedback, or just say "hi!"