Magentic-UI is an open-source agent tool developed by Microsoft Research that helps users complete complex web tasks through human-agent collaboration. It is built on the AutoGen framework and uses a multi-agent system to provide a transparent, controllable user experience. Magentic-UI automates web browsing and code execution and also manages files, which makes it suitable for tasks that require in-depth web navigation or data manipulation. Users can edit task plans and monitor agent operations in real time to ensure the results meet expectations. The tool is available on GitHub under the MIT license, and developers are welcome to contribute code or make suggestions.

 

Feature List

  • Web automation: Supports complex web tasks such as filling in forms, customizing orders, and filtering flights.
  • Multi-agent collaboration: Includes agents such as WebSurfer (web page interaction), Coder (code execution), and FileSurfer (file management).
  • Task plan editor: Users can create, modify, and delete task steps and take part in task planning.
  • Real-time operational feedback: Displays each step an agent takes, such as clicking a button or entering a query.
  • Code execution support: Executes Python and shell commands securely inside a Docker container.
  • Document processing: Finds documents, converts them to Markdown format, and answers document-related questions.
  • Multi-model support: Compatible with Claude 3.7 Sonnet, Qwen 2.5 VL, and other language models.
  • Plan learning: Saves past task plans to make future task execution more efficient.
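
The plan-editing feature in the list above (creating, modifying, and deleting task steps) can be pictured as operations on an ordered list of steps. The following is a purely illustrative sketch: the TaskPlan class and its methods are hypothetical and are not taken from the Magentic-UI code base.

```python
from dataclasses import dataclass, field


@dataclass
class TaskPlan:
    """Hypothetical in-memory plan: an ordered list of step descriptions."""
    steps: list = field(default_factory=list)

    def add(self, description, position=None):
        # Append by default, or insert at a given position.
        if position is None:
            self.steps.append(description)
        else:
            self.steps.insert(position, description)

    def modify(self, index, description):
        # Replace the text of an existing step.
        self.steps[index] = description

    def delete(self, index):
        # Remove a step from the plan.
        del self.steps[index]


plan = TaskPlan(["Open website", "Fill out form"])
plan.add("Review order")
plan.modify(1, "Fill out delivery form")
plan.delete(0)
print(plan.steps)  # ['Fill out delivery form', 'Review order']
```

Keeping the plan as a plain ordered list makes each user edit a single, easily reviewable operation, which fits the tool's emphasis on transparency.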

User Guide

Installation process

Magentic-UI is installed from its GitHub repository, and Docker is recommended to ensure full functionality. The detailed installation steps are as follows:

  1. Prepare the environment:
    • Ensure that Git and Docker are installed on your system. If you are using Windows, enable WSL2 (Windows Subsystem for Linux).
    • Verify that Docker is installed by checking its version:
      docker --version
      
    • If Docker is not installed, refer to the official documentation (https://docs.docker.com/get-docker/).
  2. Clone the repository:
    • Open a terminal and run the following commands to clone the Magentic-UI repository and enter its directory:
      git clone https://github.com/microsoft/magentic-ui.git
      cd magentic-ui
      
  3. Install the dependencies:
    • The repository contains a requirements.txt file. Run the following command to install the Python dependencies:
      pip install -r requirements.txt
      
    • If you are not using Docker, you can run a limited version (without code execution support) with the command:
      python main.py --no-docker
      
  4. Configure Docker:
    • Pull the required Docker images and start the containers:
      docker-compose up -d
      
    • Ensure that the containers for WebSurfer, Coder and FileSurfer are started properly.
  5. Start Magentic-UI:
    • Run the following command in the project root directory:
      python main.py
      
    • Once launched, the Magentic-UI interface opens in the browser; by default it is served at http://localhost:8000.
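
The environment preparation in step 1 can also be checked from Python before running the remaining steps. The following is a minimal sketch, not part of Magentic-UI: the helper name missing_tools is hypothetical, and it only checks that the git and docker executables are on PATH, not that the Docker daemon is actually running.

```python
import shutil

# Tools the installation steps rely on: git to clone, docker for full functionality.
REQUIRED_TOOLS = ("git", "docker")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("git and docker were found on PATH")
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the default shutil.which is sufficient.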

Usage

The Magentic-UI interface is divided into two parts: the session navigation panel on the left and the browser operation window on the right. The main workflows are as follows:

  • Creating a new task:
    1. Click "New Session" in the left panel and enter a task description such as "Order Pizza" or "Find Flights".
    2. Optionally upload images, such as screenshots of web pages, to supplement the task description.
    3. Magentic-UI generates a preliminary task plan that lists steps such as "Open website" and "Fill out form".
  • Editing the task plan:
    1. Review the generated steps and click the "Edit" button to modify, add, or delete steps.
    2. After confirming the plan, click the "Execute" button to start the agents.
    3. You can pause execution, take over the operation, or adjust the plan at any time.
  • Web automation:
    • The WebSurfer agent handles web page interaction, including clicking buttons, entering text, and uploading files.
    • Operation details are displayed in real time, such as "Click the 'Submit' button" or "Enter the search term 'flights'".
    • Users can inspect the web page through the interface to confirm that the agent is operating correctly.
  • Code execution:
    • The Coder agent writes and executes Python or shell code, which suits data processing tasks.
    • Example: when the user enters "Extract data from a web page and generate charts", Coder generates the code and runs it in a Docker container.
    • The results are displayed in the interface, where the user can view or download them.
  • File management:
    • The FileSurfer agent can find files in a project directory, convert them to Markdown format, and answer questions about their contents.
    • Example: enter "Find the contents of README.md" and FileSurfer returns a summary of the file.
  • Multi-model support:
    • OpenAI models are supported by default; other models (e.g. Claude 3.7 Sonnet) can be configured in config.json.
    • Configuration example:
      {
        "model": "claude-3.7-sonnet",
        "api_key": "your-api-key"
      }
      
  • Real-time monitoring and intervention:
    • The left panel displays the task status: 🔴 (user input required), ✅ (task complete), ↺ (task in progress).
    • The user can pause the agent, modify the steps, or operate the browser manually at any time.
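
Before starting the service, the configuration example under Multi-model support can be sanity-checked with a few lines of Python. This is a sketch that assumes the file contains the two keys shown in that example (model and api_key); the helper name missing_config_keys is hypothetical, and the real project may expect a different schema.

```python
import json

# Assumption: these are the two keys shown in the configuration example.
REQUIRED_KEYS = ("model", "api_key")


def missing_config_keys(text):
    """Parse JSON text and return the required keys that are absent or empty."""
    config = json.loads(text)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


example = '{"model": "claude-3.7-sonnet", "api_key": "your-api-key"}'
print(missing_config_keys(example))  # []
```

To check a real file, read it first and pass its contents to the function; an empty list means both keys are present and non-empty.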

Caveats

  • Docker mode provides full functionality; non-Docker mode does not support code execution and is suited only to simple web tasks.
  • If you run into problems, check the TROUBLESHOOTING.md file or submit an issue on GitHub.
  • Ensure a stable network connection so that agent operations are not interrupted.

Application Scenarios

  1. Web form automation
    Users need to fill out complex online forms, such as visa applications or e-commerce orders. Magentic-UI saves time by navigating the web pages and entering the information automatically.
  2. In-depth web navigation
    Users need to find content that is not indexed by search engines, such as links on individual websites or specific flight information. The WebSurfer agent drills down into the website hierarchy to locate it precisely.
  3. Data processing and visualization
    Users need to extract data from web pages and generate charts. The Coder agent crawls the data and writes code to generate the visualizations.
  4. Document content analysis
    Developers need to find the contents of project files quickly. The FileSurfer agent locates the files and answers questions about them.

FAQ

  1. Does Magentic-UI need Docker?
    Docker is the recommended setup because it supports code execution and file management. Magentic-UI can run without Docker, but with limited functionality.
  2. How do I add a new agent?
    Add the new MCP-Agent code to the agents directory, update config.json, and restart the service.
  3. What language models are supported?
    OpenAI models, Claude 3.7 Sonnet, Qwen 2.5 VL, and others. The API key must be set in the configuration file.
  4. How do I handle a failed task?
    Check TROUBLESHOOTING.md first. If the problem persists, submit an issue on GitHub.