Omni-Bot-SDK-OSS is an open-source WeChat automation framework based on visual recognition that supports RPA (Robotic Process Automation) operations on WeChat 4.0. It uses custom YOLO models and OCR to achieve zero runtime intrusion, making it suitable for developers building automation tasks. Plug-ins can be loaded dynamically to connect to platforms such as OpenAI or Dify. The framework parses multiple message types (text, images, files, etc.), sends messages, and can be extended to Mini Program and Moments operations. The project is hosted on GitHub, written in Python, and is best deployed on a standalone device so that it does not interfere with the user's own work.
Function List
- Recognizes windows and parses message content using a YOLO model and OCR.
- Supports dynamically loaded plug-ins, compatible with OpenAI, Dify, and other third-party platforms.
- Parses WeChat messages, including text, image, file, and other types.
- Sends messages, including text, images, and files.
- Can be extended to publish content to Mini Programs and Moments.
- Processes messages in real time by listening to the database.
- Provides a visual management client that requires no coding to operate.
Usage Guide
Installation
To use Omni-Bot-SDK-OSS, follow the steps below to complete the installation locally or on a standalone device. The environment preparation and deployment process is relatively simple and suitable for developers familiar with Python.
- Clone the repository
Open a terminal and run the following commands to clone the project locally:
    git clone https://github.com/weixin-omni/omni-bot-sdk-oss
    cd omni-bot-sdk-oss
- Create a virtual environment
To avoid dependency conflicts, create a Python virtual environment and activate it:
    python -m venv venv
    source venv/bin/activate   # Linux/Mac
    venv\Scripts\activate      # Windows
- Install dependencies
Install the project's dependencies inside the virtual environment:
    pip install -e .
- Create the configuration file
The project requires a configuration file, config.yaml, which sets parameters such as the WeChat window and the database connection. Create and fill in the file according to the official documentation (README or Wiki in the repository); it contains the YOLO model path, OCR settings, and plugin parameters.
- Run the framework
Use the following code to start the framework:
    from omni_bot_sdk.bot import Bot

    def main():
        bot = Bot(config_path="config.yaml")
        bot.start()

    if __name__ == "__main__":
        main()
Once running, the framework listens for messages through the database and performs automated tasks based on the configuration.
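Before starting the framework, it can help to confirm that the loaded configuration contains the sections it needs. The following is a minimal pre-flight sketch that works on an already-loaded configuration dictionary. The key names (wechat_window, database, yolo_model_path, ocr, plugins) are hypothetical placeholders chosen for illustration, not the project's actual schema; the README or Wiki defines the real field names.

```python
# Hypothetical pre-flight check. The key names below are illustrative
# assumptions, not the real config.yaml schema of Omni-Bot-SDK-OSS.
REQUIRED_KEYS = ("wechat_window", "database", "yolo_model_path", "ocr", "plugins")


def missing_config_keys(config: dict) -> list[str]:
    """Return the required top-level keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


if __name__ == "__main__":
    example = {
        "wechat_window": {"title": "WeChat"},
        "database": {"path": "messages.db"},
        "yolo_model_path": "",  # left empty on purpose to trigger the check
        "ocr": {"lang": "ch"},
        "plugins": ["my_plugin"],
    }
    print(missing_config_keys(example))
```

Failing early on an incomplete configuration is usually easier to debug than a recognition error that only appears after the bot has started.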
Main Functions
1. Message parsing and processing
Omni-Bot-SDK-OSS uses a YOLO model and OCR to recognize message content in the WeChat window. After launch, the framework:
- Listens for new messages in the database (user-configurable, such as MySQL or SQLite).
- Parses the message type (text, image, file, etc.) and stores the result in the message queue.
- Distributes messages to the plugin chain through the plugin manager to execute custom logic.
Operational Steps:
- Configure the database connection parameters (set the database address and credentials in config.yaml).
- Ensure that the WeChat client is running on the target device and that its window remains visible.
- After launching the framework, the system automatically scans the WeChat window, recognizes new messages and parses the content.
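The listen-then-queue flow described above can be sketched with the standard library alone. This is not the framework's implementation: the messages table, its columns, and the poll_new_messages helper are assumptions made for illustration, and SQLite stands in for whichever database is configured. The sketch polls for rows newer than the last one seen and puts each on a queue for later dispatch.

```python
import queue
import sqlite3

# Assumed table layout, for illustration only. The real schema is defined by
# the framework and the WeChat client, not by this sketch.
SCHEMA = "CREATE TABLE messages (id INTEGER PRIMARY KEY, msg_type TEXT, content TEXT)"


def poll_new_messages(conn, last_seen_id, out_queue):
    """Queue every row newer than last_seen_id and return the newest id seen."""
    rows = conn.execute(
        "SELECT id, msg_type, content FROM messages WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    for row_id, msg_type, content in rows:
        out_queue.put({"id": row_id, "type": msg_type, "content": content})
        last_seen_id = row_id
    return last_seen_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO messages (msg_type, content) VALUES (?, ?)",
        [("text", "hello"), ("image", "photo.png")],
    )
    pending = queue.Queue()
    last_id = poll_new_messages(conn, 0, pending)
    while not pending.empty():
        print(pending.get())
    print("last seen id:", last_id)
```

Tracking the last seen id means a repeated poll does not enqueue the same message twice, which matters when the poll runs in a loop.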
2. Messaging
The framework supports sending text, image and file messages to simulate human operations. Operation steps:
- Define the send target (contact or group chat name) in the plugin.
- Call the framework's send interface, for example:
bot.send_message(contact="Target Contact", message_type="text", content="Hello")
- Make sure the WeChat window is active; the framework then locates the input box and sends the message automatically.
Note: Because the framework relies on visual recognition, a message may go to the wrong target when several contacts or group chats share the same name. Use unique identifiers (e.g., remark names) to improve accuracy.
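One way to guard against same-name mistakes is to resolve the target before sending and to refuse when the name is ambiguous. The sketch below assumes a hypothetical contact record with "nickname" and "remark" fields; neither the record shape nor the resolve_target helper comes from the framework.

```python
def resolve_target(name, contacts):
    """Return the single contact whose remark or nickname equals name.

    Raises ValueError when nothing matches or when the name is ambiguous,
    so the caller never sends to a guessed target.
    """
    # Prefer remark names: the user sets them and can keep them unique.
    by_remark = [c for c in contacts if c.get("remark") == name]
    if len(by_remark) == 1:
        return by_remark[0]
    # Fall back to nicknames only when no remark matched at all.
    candidates = by_remark or [c for c in contacts if c.get("nickname") == name]
    if not candidates:
        raise ValueError(f"no contact named {name!r}")
    if len(candidates) > 1:
        raise ValueError(f"{len(candidates)} contacts share the name {name!r}")
    return candidates[0]


if __name__ == "__main__":
    contacts = [
        {"nickname": "Alex", "remark": "Alex (sales)"},
        {"nickname": "Alex", "remark": "Alex (support)"},
    ]
    print(resolve_target("Alex (sales)", contacts))
    try:
        resolve_target("Alex", contacts)
    except ValueError as err:
        print("refused:", err)
```

Raising an error on ambiguity trades convenience for safety: a skipped message can be retried, while a message sent to the wrong chat cannot be undone.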
3. Plug-in extensions
Users can extend the functionality by writing plugins to support OpenAI or Dify and other platforms. Plugin development steps:
- In the plugins directory, create a Python file that defines the plugin logic.
- The plugin must inherit from the framework's Plugin class and implement the process_message method.
- Sample plugin code:
    from omni_bot_sdk.plugin import Plugin

    class MyPlugin(Plugin):
        def process_message(self, message):
            # Custom logic
            return {"action": "send", "content": "Message received"}
- Register the plug-in in config.yaml; the framework loads it automatically.
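As a slightly fuller illustration of the plugin idea, here is a keyword-reply plugin together with a simple chain runner. To keep the example runnable without the SDK, Plugin is a local stand-in for the framework's base class, the message is assumed to be a dictionary with "type" and "content" keys, and run_chain is a hypothetical helper. The returned dictionary follows the shape used in the sample plugin above; the real plugin API may differ, so check the official documentation.

```python
class Plugin:
    """Local stand-in for the framework's Plugin base class (an assumption)."""

    def process_message(self, message):
        raise NotImplementedError


class KeywordReplyPlugin(Plugin):
    """Reply with a canned answer when a text message contains a known keyword."""

    def __init__(self, replies):
        self.replies = replies  # maps keyword -> reply text

    def process_message(self, message):
        if message.get("type") != "text":
            return None  # leave non-text messages to later plugins
        for keyword, reply in self.replies.items():
            if keyword in message.get("content", ""):
                return {"action": "send", "content": reply}
        return None


def run_chain(plugins, message):
    """Return the first non-None result produced by the plugins, in order."""
    for plugin in plugins:
        result = plugin.process_message(message)
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    chain = [KeywordReplyPlugin({"order": "Your order is being processed."})]
    print(run_chain(chain, {"type": "text", "content": "Where is my order?"}))
    print(run_chain(chain, {"type": "image", "content": "photo.png"}))
```

Returning None for messages a plugin does not handle lets several plugins share one chain without each needing to know about the others.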
4. Visualization client
For users who are not familiar with coding, the project provides a visual management client. Operation steps:
- Download the client (from the GitHub Release page).
- After installation, open the client and import the config.yaml file.
- Configure message listening, sending rules, and plugins through the interface without writing code.
- The client supports viewing message queues and execution logs for easy debugging.
Notes
- Deployment environment: RPA operations take over the mouse and keyboard, so run the framework on a separate device to avoid interfering with daily use.
- Accuracy limitations: Visual recognition may fail because of overlapping windows or resolution issues. Make sure only one WeChat window is open and that it is clearly visible.
- Plug-in Development: Check the official documentation for detailed plugin API and sample code.
Application Scenarios
- Automated customer service
Enterprises can listen to customer messages through the framework and automatically reply to frequently asked questions or forward messages to human customer service. For example, e-commerce platforms can automatically reply to order status inquiries.
- Group chat management
In WeChat group chats, the framework can automatically send announcements and event notifications, or trigger specific replies based on keywords, which suits community operations and marketing scenarios.
- Data collection
Developers can use the message parsing feature to collect group chat or contact messages, analyze user behavior, or extract key information for market research.
- Content distribution
Media and self-media practitioners can use the framework to automatically publish article links, pictures, or Mini Programs to WeChat groups or Moments to make content distribution more efficient.
FAQ
- Does the framework support all WeChat versions?
Currently only WeChat version 4.0 is supported. Recognition may fail on other versions because of interface changes, so test compatibility before relying on them.
- How can I improve the accuracy of message delivery?
Use unique remark names or group chat IDs to avoid same-name conflicts. Ensure that the WeChat window stays in the foreground and is clearly visible.
- What prior knowledge is required for plugin development?
Familiarity with Python programming and basic YOLO/OCR principles. Refer to the plugin examples in the official documentation to get started.
- Is the visualization client free?
Yes. The client is part of the open-source project and is free to download and use, but you need to configure the environment yourself.