Claude Code 是迄今为止最令人愉悦的 AI Agent 工作流之一。它不仅让代码的定向编辑和即兴工具开发变得不再恼人,其使用体验本身甚至称得上是一种享受。它拥有足够的自主性来完成有趣的任务,同时又不会像其他工...
When building knowledge base applications based on Retrieval Augmented Generation (RAG), document preprocessing and slicing (Chunking) is a critical step to determine the final retrieval results. The open-source RAG engine RAGFlow provides various slicing strategies, but its official documentation lacks clear explanations on method details and specific cases, giving ...
When building Retrieval Augmented Generation (RAG) systems, developers often encounter the following perplexing scenarios: Headers of cross-page tables are left on the previous page, causing data to become unrelated. Models confidently give completely incorrect content in the face of ambiguous scans. The summation symbol "Σ" in a math equation is incorrectly...
Let's start with a simple task: scheduling a meeting. When a user says, "Hey, let's see if we can do a quick sync tomorrow?" An AI that relies only on Prompt Engineering might reply, "Yes, tomorrow is fine. What time would you like to schedule it, please?" This response though...
Abstract The emergence of large-scale language models (LLMs) has opened up a new paradigm of search engines that utilize generative models to gather and summarize information to answer user queries. We unify this emerging technology under the framework of Generative Engines (GEs), which generate accurate and personalized responses that quickly ...
In the early days of the Manus project, the team faced a critical decision: should they train an end-to-end agent model based on open source models, or should they build agents that take advantage of the powerful "context learning" capabilities of cutting-edge models? Go back a decade and developers didn't even have a choice in the field of natural language processing. In ...
When building AI systems such as RAGs or AI agents, the quality of the retrieval is key in determining the upper limit of the system. Developers typically rely on two main retrieval techniques: keyword search and semantic search. Keyword search (e.g. BM25): fast and good at exact matching. But once a user asks a question worded...
The experience of communicating with a friend who always forgets the content of the conversation and has to start from the beginning every time is undoubtedly inefficient and exhausting. However, this is precisely the norm for most current AI systems. They're powerful, but they're generally missing a key ingredient: memory. To build systems that can truly learn, evolve, and collaborate...
From API calls for Large Language Models (LLMs) to autonomous, goal-driven Agentic Workflows, the paradigm for AI applications is undergoing a fundamental shift. The open source community has played a key role in this wave, spawning a plethora of AI focused on specific research tasks ...
Learn all about Reinforcement Learning (RL) and how to train your own DeepSeek-R1 inference model using Unsloth and GRPO. A complete guide from introductory to mastery. 🦥 What you'll learn What is RL? RLVR? PPO? GRPO? RLHF? RFT?...
With the rapid development and wide application of large-scale language modeling technology, its potential security risks have increasingly become the focus of the industry's attention. In order to address these challenges, many of the world's top technology companies, standardization organizations and research institutions have constructed and released their own security frameworks. In this paper, we will analyze nine of them...
In the field of Large Language Modeling (LLM) research, the model's Leap-of-Thought ability, i.e., creativity, is no less important than the logical reasoning ability represented by Chain-of-Thought. However, there is still a relative lack of in-depth discussions and valid assessment methods for LLM creativity, which in ...
Mastering Claude Code: Hands-on Agentic Coding Tips from the Front Lines Claude Code is a command line tool for Agentic Coding. By Agentic Coding, we mean giving AI a certain degree of autonomy to understand tasks, plan steps, and perform actions (such as...
The GPT-4.1 family of models offers significant improvements in coding, instruction adherence, and long context processing capabilities over GPT-4o. Specifically, it performs better on code generation and repair tasks, understands and executes complex instructions more accurately, and can efficiently handle longer input text. This hinted work ...
1. INTRODUCTION In today's information explosion, a large amount of knowledge is stored in the form of tables in web pages, Wikipedia and relational databases. However, traditional question and answer systems often struggle to handle complex queries across multiple tables, which has become a major challenge in the field of artificial intelligence. To address this challenge, researchers ...
As the capabilities of large-scale language models (LLMs) evolve at a rapid pace, traditional benchmark tests, such as MMLU, are gradually showing limitations in distinguishing top models. Relying on knowledge quizzes or standardized tests alone, it has become difficult to comprehensively measure the nuanced capabilities of models that are critical in real-world interactions, such as emotional intelligence, creative...
The development of large language models (LLMs) is rapidly changing, and their reasoning ability has become a key indicator of their intelligence level. In particular, models with long reasoning capabilities, such as OpenAI's o1, DeepSeek-R1, QwQ-32B, and Kimi K1.5, which simulate the human deep thinking process by solving compound...
INTRODUCTION In recent years, Large Language Models (LLMs) have made impressive progress in the field of Artificial Intelligence, and their powerful language comprehension and generation capabilities have led to a wide range of applications in several domains. However, LLMs still face many challenges when dealing with complex tasks that require the invocation of external tools. For example, ...
The Python ecosystem has always had a shortage of package management and environment management tools, from the classic pip and virtualenv to pip-tools and conda to the modern Poetry and PDM. Each tool has its area of specialization, but they often make a developer's toolchain fragmented and complex. Now, ...
Top