Overseas access: www.kdjingpai.com
Bookmark Us
Current Position:fig. beginning " AI News

Anthropic Releases Claude Sonnet 4.5: Reinventing the "Rules" of Coding and AI Intelligence Development

2025-09-30 2.0 K

Anthropic The company has released its latest flagship model Claude Sonnet 4.5It's the most powerful coding model in the world. This is not just a regular iteration of the model, but a full-scale leap forward in the construction of AI intelligences (agents), computer operations, and complex reasoning capabilities.

Code is the cornerstone of the digital world, driving every app, spreadsheet, and software tool we use on a daily basis. Understanding and navigating these tools, as well as reasoning through complex problems, is at the heart of modern productivity.Claude Sonnet 4.5 was released to take this capability to new heights.

Accompanying the release of the new model is a series of reviews of the Claude A major upgrade in product ecology:

  • Claude Code EnhancementsThe new "checkpoints" feature allows users to save their progress and roll back to a previous state at any time. The terminal interface has also been refreshed with the introduction of native VS Code Expansion.
  • API Capability Extension: in Claude API New contextual editing features and memory tools have been introduced to allow AI intelligences to handle tasks with longer run times and higher complexity.
  • In-application functionality integration: in Claude application, users can now execute code and create files (e.g., spreadsheets, slideshows, and documents) directly in the conversation.
  • Developer Tools Open: Release Claude Agent SDKwill Anthropic Used internally to build Claude Code infrastructure open to all developers.

Claude Sonnet 4.5 has been fully launched today. Developers can access the Claude API invocations claude-sonnet-4-5 to use. Notably, the pricing is comparable to the previous generation of the Claude Sonnet 4 Consistent, for every million inputs/outputs token 3/15 dollars.

Top-notch intelligence and performance

Claude Sonnet 4.5 exist SWE-bench Verified The review achieved the best current score. This benchmark was achieved by capturing and validating GitHub on real software engineering problems to measure the coding and software repair capabilities of AI models in the real world. In real-world testing, theClaude Sonnet 4.5 Ability to stay focused for more than 30 hours on complex multi-step tasks.

Anthropic 发布 Claude Sonnet 4.5:重塑编码与 AI 智能体开发的“规则”-1

In terms of computer skills.Claude Sonnet 4.5 The same great leap forward has been made. In OSWorld In the benchmark test, it scored a massive 61.41 TP3T.OSWorld Designed to assess the ability of AI models to perform real computer tasks such as file management, software installation and system configuration. Just four months ago, theSonnet 4 With a leading score of 42.21 TP3T in this test, the improvement of the new model is obvious.

This capability has been adopted Claude for Chrome The extension was able to be applied. In the demo video below, the Claude How to work directly in the browser, including navigating websites, populating spreadsheets, and completing assigned tasks.

In addition to coding and computer use, the model has shown significant enhancements in broader assessments such as reasoning and math:

Anthropic 发布 Claude Sonnet 4.5:重塑编码与 AI 智能体开发的“规则”-2

In professional fields such as finance, law, medicine and STEM, experts have found that Claude Sonnet 4.5 Compare this with Opus 4.1 The old model within has made a qualitative leap in domain knowledge and reasoning ability.

financial legislation study of medicine STEM
Anthropic 发布 Claude Sonnet 4.5:重塑编码与 AI 智能体开发的“规则”-3 Anthropic 发布 Claude Sonnet 4.5:重塑编码与 AI 智能体开发的“规则”-4 Anthropic 发布 Claude Sonnet 4.5:重塑编码与 AI 智能体开发的“规则”-5 Anthropic 发布 Claude Sonnet 4.5:重塑编码与 AI 智能体开发的“规则”-6

The most "aligned" frontier model to date

In addition to being a powerful performer, theClaude Sonnet 4.5 also Anthropic The most "aligned" frontier model to date. Model Alignment aims to ensure that AI behavior is consistent with human intentions and values. With improved capabilities and extensive safety training, theAnthropic Dramatically improved the model's behavior, reducing undesirable tendencies such as flattery, deception, power-seeking, and encouraging delusions.

for modeled intelligences and computer-use capabilities.Anthropic Significant progress has also been made in defending against "cue word injection attacks". Cue word injection is one of the most serious risks facing AI intelligences today, where a malicious user can hijack an AI's original commands with cleverly constructed inputs, causing it to perform unintended or harmful actions.

Anthropic 发布 Claude Sonnet 4.5:重塑编码与 AI 智能体开发的“规则”-7

Claude Sonnet 4.5 exist Anthropic was released under the AI Safety Level 3 (ASL-3) framework, which ensures that the robustness of the model is matched with appropriate safety and security measures. These include classifiers designed to detect potentially hazardous inputs and outputs, particularly those related to chemical, biological, radiological, and nuclear (CBRN) weapons.

Although these classifiers may sometimes misreport normal content, the Anthropic has reduced the false alarm rate by a factor of ten compared to the original, and provides users with the ability to seamlessly switch to the Sonnet 4 Options for the model.

Claude Agent SDK: A Core Tool for Openly Building Intelligent Bodies

Anthropic It took more than six months to iterate the Claude CodeThe team has accumulated a lot of experience on how to build and design AI intelligences. They have solved many tough problems: how to make intelligences manage memory effectively during long tasks, how to design permission systems that balance autonomy and user control, and how to coordinate multiple sub-intelligences to achieve common goals.

Now.Anthropic Packaging these experiences and tools into Claude Agent SDK Open to all developers. The SDK is not just for coding tasks, it provides a solid foundation for building complex intelligences of all kinds. This move is a clear signal:Anthropic Not only to provide powerful models, but also to empower developers to build the next generation of AI applications, thus creating a thriving ecosystem.

Research Preview: Imagine with Claude

As a limited time study preview, theAnthropic An experimental feature called "Imagine with Claude" has been launched. In this feature, theClaude The ability to generate software on-the-fly, where all features are not pre-defined and there is no pre-written code. The user is presented with Claude The process of dynamically creating and adapting software based on real-time interactions and requests.

This demo vividly shows what creativity can be unleashed when top models are combined with the right infrastructure. "Imagine with Claude" will be available to Max subscribers for the next five days.

How to get started

It is officially recommended that all users upgrade to Claude Sonnet 4.5. Whether you do this through an app, an API, or a Claude Code utilization ClaudeThe new models are a "direct replacement" option with greatly improved performance at the same price.

Evaluation Methodology Description

  • SWE-bench Verified: All Claude The results all use a file containing the bash A simple framework report of two tools, the File Editor and the Document Editor. The reported score of 77.2% was averaged over 10 trials on the full 500-problem dataset, calculated without testing, with a thought budget of 200K tokens.
  • Terminal-Bench: All reported scores use the default smartbody framework (Terminus 2) with an XML parser, with multiple runs on different dates to smooth the evaluation of sensitivity to the inference infrastructure.
  • AIMESonnet 4.5 The scores are reported at a sampling temperature of 1.0. The model uses 64K inference tokens in the Python configuration.
  • OSWorld: All reported scores use the official OSWorld-Verified frame with a maximum step count of 100, averaged over 4 runs.
  • MMMLU: All reported scores are averages of 5 runs on 14 non-English languages using extended thinking (up to 128K).
  • Finance Agent:: All scores are determined by Vals AI Run and publish on their public leaderboards.
  • The scores for the other models are referenced from OpenAI cap (a poem) Google The official release or public ranking of the

Recommended

Can't find AI tools? Try here!

Just type in the keyword Accessibility Bing SearchYou can quickly find all the AI tools on this site.

Top

en_USEnglish