Anthropic’s Claude 3 Is More Powerful Than ChatGPT?

By Scott Faulkner | Updated: March 5, 2024 | 6 Mins Read

The AI race in 2024 continues to heat up as Anthropic, a startup backed by Google and Amazon, debuted Claude 3, a suite of artificial intelligence models that it claims set new industry benchmarks across a range of cognitive tasks and, in some cases, approach near-human capability.

Anthropic announces Claude 3!

The new models include: Opus, Sonnet, & Haiku

Opus, their most advanced model, outperforms GPT4 & Gemini Ultra in many categories including reasoning, math, code, and several other benchmarks

Here is Opus in action:
pic.twitter.com/6wgh7WYZ6c
— Allen T (@Mr_AllenT) March 4, 2024

Anthropic says the most capable of its new models, Claude 3 Opus, outperforms OpenAI’s GPT-4 and Google’s Gemini Ultra on industry benchmark tests such as MMLU (undergraduate-level knowledge), GSM8K (grade school maths), HumanEval (coding) and the colourfully named HellaSwag (common knowledge). Claude 3 is also the company’s first model family with multimodal support. CEO Dario Amodei even stated in an interview that “this is the Rolls-Royce of models, at least at this point in time.”

Besides Claude 3 Opus, the suite includes Claude 3 Haiku and Claude 3 Sonnet. Both are less expensive than Opus: Haiku is the fastest and most compact model, while Sonnet’s main selling point is its high endurance in large-scale AI deployments.

Claude 3 Family Models

Claude 3 Opus

As mentioned above, Opus is the most capable of the three. Anthropic says it can navigate open-ended prompts with human-like understanding, and that it leads in cognitive reasoning, expert knowledge, mathematics and language fluency. Researchers at the AI startup were also surprised to discover that Claude 3 Opus seemed to detect that they were testing it.

Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval.

For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of… pic.twitter.com/m7wWhhu6Fg
— Alex (@alexalbert__) March 4, 2024

In a lengthy X post, Anthropic prompt engineer Alex Albert wrote: “Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval. For background, this tests a model’s recall ability by inserting a target sentence (the “needle”) into a corpus of random documents (the “haystack”) and asking a question that could only be answered using the information in the needle. When we ran this test on Opus, we noticed some interesting behaviour – it seemed to suspect that we were running an eval on it.”

Albert added: “Opus not only found the needle but also recognised that the inserted needle was so out of place in the haystack that this had to be an artificial test constructed by us to test its attention abilities.”

While this is a remarkable feat and may look like a breakthrough in meta-cognition and self-awareness in AI, Opus is still a machine learning program governed by word and conceptual associations, not a conscious entity. It could simply have learned the concept of such tests from its training data.
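The needle-in-the-haystack setup Albert describes can be sketched in a few lines of Python. This is only an illustration of the idea, not Anthropic's actual test harness: the filler documents, the needle sentence, the function names and the substring-based scoring are all assumptions made up for this example, and no model is called.

```python
import random


def build_haystack(documents, needle, seed=0):
    """Shuffle filler documents and insert the needle at a random position."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    haystack = list(documents)
    rng.shuffle(haystack)
    position = rng.randint(0, len(haystack))  # inclusive at both ends
    haystack.insert(position, needle)
    return haystack, position


def build_prompt(haystack, question):
    """Join the haystack into one context and append the recall question."""
    context = "\n\n".join(haystack)
    return f"{context}\n\nQuestion: {question}\nAnswer using only the documents above."


def recalled(answer, expected_fact):
    """Crude scoring (an assumption): does the answer contain the expected fact?"""
    return expected_fact.lower() in answer.lower()


# Hypothetical filler documents and needle, invented for illustration.
documents = [
    "Notes on choosing a programming language for a new project.",
    "An essay about how startups find their first customers.",
    "Advice on finding work you enjoy.",
]
needle = "The secret ingredient in the test kitchen's soup is smoked paprika."
question = "What is the secret ingredient in the test kitchen's soup?"

haystack, position = build_haystack(documents, needle, seed=42)
prompt = build_prompt(haystack, question)
```

The resulting `prompt` would be sent to whichever model is being evaluated, and its reply passed to `recalled` together with the expected fact. The three-document corpus here is a stand-in for the much larger corpus of random documents Albert mentions.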

Claude 3 Haiku and Claude 3 Sonnet

While its smaller siblings, Claude 3 Haiku and Claude 3 Sonnet, may not possess near-human capabilities, Anthropic claimed on its news page that they are targeted at more specific uses. For example, Haiku answers simple queries and requests with unmatched speed and near-instant responsiveness.

Image source: YouTuber TheAIGRID

The AI startup therefore highlighted that Haiku is well suited to building seamless AI experiences that mimic human interactions, such as customer support and content moderation. Haiku is also the recommended cost-saving model of the three, with suggested uses including optimising logistics and inventory management and extracting knowledge from unstructured data.

As for Sonnet, the model delivers strong performance at a lower cost than its peers, and its high endurance in large-scale AI deployments makes it a fit for enterprise workloads. For instance, Sonnet can search and retrieve information across vast amounts of knowledge in data processing, and support sales through product recommendations, forecasting and targeted marketing.

Lastly, Sonnet can save time on code generation, quality control and parsing text from images, which Anthropic pitches as the ideal balance between intelligence and speed among the three models.

Claude 3 Opus Stands Ahead of ChatGPT and Gemini Ultra

Today, we're announcing Claude 3, our next generation of AI models.

The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision. pic.twitter.com/TqDuqNWDoM
— Anthropic (@AnthropicAI) March 4, 2024

Anthropic also compared the Claude 3 models with peers such as GPT-4 and Gemini on multiple capability benchmarks. In the comparison shared in the post above, Claude 3 Opus trumps GPT-4 and Gemini Ultra on 10 AI benchmarks, while Sonnet and Haiku fall short on more than half of them.

However, these comparisons did not faze the AI startup, which emphasised the three models’ increased speed and cost-effectiveness compared to Claude 2. For example, it pointed to a significant jump in the pace of analysis, forecasting, content creation, code generation and multilingual conversation.

Lastly, the AI startup said the Claude 3 family features vision capabilities, allowing the models to process visual formats like photos, charts and diagrams, similar to GPT-4V and Google’s Gemini, and that more advanced agentic capabilities would follow in the coming months.

Is Claude 3 Overblown and Overhyped?

As with every new AI release, words like “revolutionary” and “breakthrough” accompany its introduction. However, many hyped products, such as Apple’s Vision Pro, fall short of what they promise. So, is Claude 3 simply being blown out of proportion?

According to AI researcher Simon Willison: “LLM benchmarks should always be treated with a little bit of suspicion as how well a model performs on benchmarks doesn’t tell you much about how the model feels to use.” However, he acknowledged that Claude 3 could be different as “no other model has beaten GPT-4 on a range of widely used benchmarks like this.”

While Simon Willison shared some optimism about the new AI model, HFS Research analyst David Cushman was blunter: “These models don’t appear to be earth-shaking. They are a little better, allegedly, than some of the current models and with the speed with which AI vendors release models, new models are sure to be released soon that will overtake Claude 3.”

Nevertheless, opinions are no substitute for trying the product. To try Opus and Sonnet yourself, head to Anthropic’s website or the Claude API. Note that Opus is available through Anthropic’s web chat interface only to subscribers of “Claude Pro”, which costs $20 a month. As for Haiku, the AI startup said it would be available soon.
