DiffMem is a lightweight memory backend designed for AI agents and conversational systems. It uses Git as the core of the memory store, saving AI memories as human-readable Markdown files. Git's commit history tracks the evolution of memories over time, while an in-memory BM25 index provides fast, interpretable retrieval. The project is currently a proof of concept (PoC) exploring how a version control system can serve as an efficient, scalable memory foundation for AI applications. DiffMem treats memory as a versioned knowledge base: the "current state" of knowledge is stored in editable files, while historical changes live in the Git commit graph. This design lets agents query a compact, up-to-date layer of knowledge while still being able to dig into the memory's evolution when needed.
Features
- Git-driven memory storage: Leverage Git's version control capabilities to manage and track the evolution of AI memories, with each memory update corresponding to a Git commit.
- Human-readable format: Memories are stored as simple Markdown files, so developers can read, edit, and manage them directly.
- Current State Focus: By default, only the "current state" of knowledge documents are indexed and searched, reducing the scope of queries and improving retrieval efficiency and token economy in the context of large language models (LLMs).
- Differential evolution tracking: Using `git diff` and related commands, agents can efficiently query how a specific piece of information changed over time without loading the full history.
- Fast text search: A built-in, in-memory BM25 index delivers millisecond-level responses to keyword searches.
- Modular components: The system consists of several core modules: a writer agent that analyzes conversations and commits updates, a context manager that assembles query context, and a search agent that performs searches.
- Lightweight and easy to integrate: The project has few dependencies, requires no separate server, and can be integrated directly into existing projects as a Python module.
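The BM25 scoring behind the fast text search can be illustrated with a minimal pure-Python sketch. DiffMem itself uses the `rank-bm25` library; the scoring function and toy corpus below are ours, shown only to make the ranking behavior concrete.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query using Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency of each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Three toy "memory documents", already tokenized
docs = [
    "mom plans a trip next week".split(),
    "the weather is sunny today".split(),
    "mom likes coffee in the morning".split(),
]
scores = bm25_scores("mom trip".split(), docs)
```

The first document scores highest because it contains both query terms; the second, which contains neither, scores zero. This term-overlap behavior is what makes BM25 results easy to interpret compared with opaque vector similarity.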
Usage Guide
DiffMem is designed to be imported as a simple Python module, with no complex server deployment required. Below is a detailed installation and usage procedure.
Environment Preparation and Installation
- Clone the repository
First, clone the DiffMem source repository from GitHub to your local machine. Open a terminal and run:
```
git clone https://github.com/Growth-Kinetics/DiffMem.git
```
Once the command finishes, the repository is downloaded into a folder named `DiffMem` in the current directory.
- Enter the project directory
Use the `cd` command to move into the project folder:
```
cd DiffMem
```
- Install dependencies
DiffMem depends on several Python libraries, listed in the `requirements.txt` file. Install them with `pip`:
```
pip install -r requirements.txt
```
This command automatically downloads and installs the required libraries, including `gitpython`, `rank-bm25`, and `sentence-transformers`.
- Set the API key
DiffMem works together with a large language model (LLM), for example to analyze conversation content. The project uses OpenRouter to manage LLM calls, so you need to set your API key as an environment variable.
On Linux or macOS, use the `export` command:
```
export OPENROUTER_API_KEY='your-key'
```
On Windows, use the `set` command:
```
set OPENROUTER_API_KEY=your-key
```
Replace `your-key` with your own valid API key.
Core Functionality
DiffMem's main functionality is exposed through the `DiffMemory` class. Initialize this class, then call its methods to read, write, and query memory.
- Initialize the memory store
First, import the `DiffMemory` class and initialize it with a local path. This path serves as the Git repository where memories are stored.
```python
from src.diffmem import DiffMemory

# Initialize the memory store with a repository path, user name, and API key.
# If the path does not exist, a new Git repository is created automatically.
memory = DiffMemory(
    repo_path="/path/to/your/memory_repo",
    user_name="alex",
    api_key="your-OpenRouter-key"
)
```
In the code above, replace `/path/to/your/memory_repo` with the path to the folder where you want to store memories.
- Process and commit memories
You can pass a piece of conversation content to the `process_and_commit_session` method. DiffMem's writer agent automatically analyzes the text, extracts or updates entity information, and saves the changes as a single Git commit.
```python
# Suppose you have some new conversation content
conversation_text = "Had coffee with Mom today; she mentioned she is traveling next week."
session_id = "session-12345"  # assign a unique ID to this session

# Process and commit the memory from this session
memory.process_and_commit_session(conversation_text, session_id)
print("Memory processed and committed.")
```
Upon execution, the relevant knowledge is updated in the Markdown files and a new Git commit is created; the commit message contains the session ID.
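Because the session ID is embedded in the commit message, the commit for a given session can later be located with plain `git log --grep`. A minimal sketch, assuming only that the `git` CLI is on the PATH (the helper function below is ours, not part of DiffMem's API):

```python
import subprocess

def find_session_commits(repo_path, session_id):
    """Return short hashes of commits whose message mentions session_id."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "--grep", session_id, "--format=%h"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.split()
```

For example, `find_session_commits("/path/to/your/memory_repo", "session-12345")` would list the commits produced by the session above, which is useful for auditing exactly what a given conversation wrote into memory.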
- Get context
When interacting with the AI, use the `get_context` method to retrieve background information relevant to the current conversation. The method supports different `depth` values to control the level of detail returned:
- `depth="basic"`: returns the core information blocks.
- `depth="wide"`: performs a semantic search and returns more broadly related information.
- `depth="deep"`: returns the complete contents of the files related to the query.
- `depth="temporal"`: returns temporal information drawn from Git history.
```python
# Suppose the current conversation is about "Mom's travel plans"
current_conversation = "Are Mom's travel plans settled?"

# Retrieve deep context
context = memory.get_context(current_conversation, depth="deep")

# Print the retrieved context
print("Retrieved context:")
print(context)
```
This context can be fed to the LLM to generate more accurate, contextually informed responses.
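Conceptually, the four depth levels amount to a dispatch over retrieval strategies. The sketch below is purely illustrative: the handler bodies and the function name are our assumptions and do not reflect DiffMem's actual internals.

```python
# Hypothetical sketch: each depth maps to a different retrieval strategy.
# None of these handlers implement DiffMem's real logic.
def get_context_sketch(query, depth="basic"):
    handlers = {
        "basic": lambda q: f"[core blocks matching {q!r}]",
        "wide": lambda q: f"[semantic-search matches for {q!r}]",
        "deep": lambda q: f"[full files relevant to {q!r}]",
        "temporal": lambda q: f"[git-history timeline for {q!r}]",
    }
    if depth not in handlers:
        raise ValueError(f"unknown depth: {depth}")
    return handlers[depth](query)
```

Structuring depth as a dispatch table makes the trade-off explicit: shallower levels cost fewer tokens when the context is passed to an LLM, while deeper levels trade token economy for completeness.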
- Run searches directly
You can also call the `search` method directly to retrieve information from the memory store.
```python
query = "information about Mom"
search_results = memory.search(query)

print(f"Search results for '{query}':")
for result in search_results:
    print(f"- {result}")
```
Running the sample code
A complete usage demo, `usage.py`, is available in the project's `examples/` directory. You can run it directly to observe DiffMem's full workflow.
Execute the following command in the terminal:
```
python examples/usage.py
```
This script demonstrates how to initialize the memory store, commit new memories, and retrieve context for new conversation content, showing DiffMem's entire pipeline from message input to output.
Application Scenarios
- Personal AI assistants
DiffMem can give personal AI assistants long-term memory. An assistant can remember user preferences, past conversations, important dates, and events. Because memories evolve over time, the assistant can accurately recall "what we discussed last week" or "how old my daughter is now": it focuses on the most recent state of the information while still retaining the historical trail.
- AI systems that require continuous learning
In areas such as customer service and technical support, AI agents must continually learn new product knowledge and business processes. DiffMem can record how this knowledge evolves: when an operating guide is updated, the system saves the new version and also logs the change in Git history, ensuring that the AI always provides the most accurate information and can trace any knowledge point back through its historical versions.
- Multi-agent collaboration
In a multi-agent system, different agents can share the same DiffMem memory. Through Git's branching and merging mechanisms, agents can collaboratively update shared knowledge and resolve potential "memory conflicts" to form a consistent, versioned team memory.
- Interpretability and debugging
To developers, AI can behave like a "black box". By storing memories as human-readable text with a Git commit history, DiffMem greatly improves their interpretability. Developers can review memories the way they review code, using `git log` and `git diff` to see what the AI has "learned" and how its knowledge has changed, which is very helpful when debugging the AI's behavior and decision-making.
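This `git diff` workflow is easy to script. A minimal sketch, assuming only that the `git` CLI is on the PATH (the helper function and its default revisions are ours, not DiffMem's API):

```python
import subprocess

def memory_diff(repo_path, file_path, rev_a="HEAD~1", rev_b="HEAD"):
    """Return the unified diff of one memory file between two commits."""
    out = subprocess.run(
        ["git", "-C", repo_path, "diff", f"{rev_a}..{rev_b}", "--", file_path],
        capture_output=True, text=True, check=True,
    )
    return out.stdout
```

Printing `memory_diff(repo_path, "alex.md")` shows exactly which lines of that memory file changed in the latest commit, i.e., what the AI "learned" in its most recent update.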
FAQ
- How is DiffMem different from traditional vector databases?
Vector databases are primarily built for similarity search over high-dimensional data: information (e.g., text) is converted into vectors and stored, and similar content is found by computing distances between vectors. DiffMem takes a completely different approach: it does not rely on vector embeddings, but manages memories as versioned text documents. Its core strength is handling information that evolves over time, clearly tracking changes to a fact (e.g., a person's age changing from 9 to 10), whereas vector databases may retain outdated, noisy entries when faced with this kind of "factual update".
- Why choose Git as the backend technology?
Git was chosen because it provides a mature, powerful solution for versioned document management, and its strengths map well onto the needs of AI memory: it natively supports tracking changes (`diff`), recording history (`log`), reverting to any point in time (`checkout`), and branch management (`branch`). In addition, Git is distributed and stores data as plain files, which makes the repository highly portable and durable, with no dependence on proprietary formats.
- Is DiffMem suitable for production environments?
Currently, DiffMem is a proof-of-concept (PoC) project, and its authors state clearly that it has not been hardened for production. It has several limitations: remote Git synchronization (`push`/`pull`) must be performed manually, error handling is relatively basic, and there is no locking mechanism for concurrent multi-user access. Further development and testing are therefore required before using it directly in large-scale commercial applications.
- What are the main software dependencies needed to run DiffMem?
DiffMem is a lightweight project. Its main dependencies are the `GitPython` library (for manipulating Git repositories from Python), the `rank-bm25` library (for efficient text retrieval), and `sentence-transformers` (for semantic search).
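A quick way to confirm these dependencies are installed is to probe for their import names. Note that the import names differ from the package names: `gitpython` is imported as `git` and `rank-bm25` as `rank_bm25`.

```python
import importlib.util

# Map each PyPI package name to the module name it is imported under.
packages = {
    "gitpython": "git",
    "rank-bm25": "rank_bm25",
    "sentence-transformers": "sentence_transformers",
}

# find_spec returns None when the module cannot be located.
status = {pkg: importlib.util.find_spec(mod) is not None
          for pkg, mod in packages.items()}
for pkg, ok in status.items():
    print(f"{pkg}: {'installed' if ok else 'missing'}")
```

Running this after `pip install -r requirements.txt` should report all three as installed.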