GenAI Processors is an open-source Python library from Google DeepMind for efficient, parallel processing of multimodal content. Built on Python's asyncio framework, it provides a modular, reusable processor interface that simplifies the development of complex AI applications. The library can process text, audio, video, and other data streams, and integrates seamlessly with the Gemini API. It supports both real-time stream processing and turn-based interaction, making it well suited to AI applications that need fast response times. The code is hosted on GitHub, and the community can contribute processor modules to extend its functionality. The project is licensed under Apache 2.0, so developers can use it to rapidly build production-ready AI applications.
Features
- Asynchronous parallel processing: built on Python's asyncio, with efficient handling of I/O-bound and compute-intensive tasks.
- Modular processor design: provides Processor and PartProcessor units that can be chained or run in parallel over complex data streams.
- Gemini API integration: built-in GenaiModel and LiveProcessor support both turn-based and real-time streaming interaction.
- Multimodal stream processing: supports splitting, merging, and processing text, audio, video, and other data streams.
- Real-time interaction: LiveProcessor handles live audio and video streams, ideal for building real-time AI agents.
- Community extensions: users can add custom processors to the contrib/ directory.
- Tool integration: built-in tools such as Google Search give AI agents richer context.
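The chained-processor idea behind Processor and PartProcessor can be illustrated with a small, library-free asyncio sketch. The `Processor` class and `+` overload below are a simplified stand-in for illustration only, not the real genai-processors API:

```python
# Minimal, library-free sketch of chaining async stream processors with `+`.
# This mimics the composition style described above; the class below is an
# illustrative stand-in, NOT the real genai-processors API.
import asyncio
from typing import AsyncIterator, Callable

class Processor:
    """Wraps an async-generator function over a stream; `+` chains stages."""
    def __init__(self, fn: Callable[[AsyncIterator], AsyncIterator]):
        self.fn = fn

    def __call__(self, stream: AsyncIterator) -> AsyncIterator:
        return self.fn(stream)

    def __add__(self, other: "Processor") -> "Processor":
        # The output of this stage becomes the input stream of the next.
        return Processor(lambda stream: other(self(stream)))

async def source(items):
    # Turn a plain list into an async stream.
    for item in items:
        yield item

def uppercase() -> Processor:
    async def fn(stream):
        async for part in stream:
            yield part.upper()
    return Processor(fn)

def exclaim() -> Processor:
    async def fn(stream):
        async for part in stream:
            yield part + "!"
    return Processor(fn)

async def run_pipeline(items):
    pipeline = uppercase() + exclaim()  # two stages chained with `+`
    return [part async for part in pipeline(source(items))]

print(asyncio.run(run_pipeline(["hello", "world"])))  # ['HELLO!', 'WORLD!']
```

The real library composes processors with `+` in the same spirit, as the real-time agent example later in this document shows.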
Usage Guide
Installation
GenAI Processors requires Python 3.10 or higher. Here are the detailed installation steps:
- Set up the environment:
  - Make sure Python 3.10+ is installed on your system.
  - Install Git so you can clone the repository.

```shell
sudo apt update && sudo apt install python3.10 git
```
- Clone the repository:
  - Clone the GenAI Processors repository from GitHub.

```shell
git clone https://github.com/google-gemini/genai-processors
cd genai-processors
```
- Install dependencies:
  - Use pip to install the required packages, including pyaudio, google-genai, and termcolor.

```shell
pip install --upgrade pyaudio genai-processors google-genai termcolor
```
- Configure API keys:
  - Get an API key from Google AI Studio.
  - Set the GOOGLE_API_KEY and GOOGLE_PROJECT_ID environment variables:

```shell
export GOOGLE_API_KEY="your-api-key"
export GOOGLE_PROJECT_ID="your-project-id"
```
Usage
At the heart of GenAI Processors is the Processor, which consumes an input stream and produces an output stream. The main features are walked through below:
1. Creating a simple text processor
- Functionality: processes text input and outputs the results.
- Workflow:
  - Import the modules and create an input stream.
  - Use stream_content to convert the text into a ProcessorPart stream.
  - Apply the processor and iterate over the output.

```python
from genai_processors import content_api, streams

input_parts = ["Hello", content_api.ProcessorPart("World")]
input_stream = streams.stream_content(input_parts)

# `simple_text_processor` is any Processor that handles text parts.
async for part in simple_text_processor(input_stream):
    print(part.text)
```
- Result: processes and prints the input text part by part; suitable for simple text tasks.
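The snippet above assumes a `simple_text_processor` already exists. As a purely illustrative stand-in (not the real library interface), such a text processor can be sketched as a plain async generator:

```python
# Library-free sketch of a "simple text processor": an async generator that
# transforms each text part as it streams through. This illustrates the
# streaming pattern only; the real genai-processors interface differs.
import asyncio
from typing import AsyncIterator

async def stream_content(parts) -> AsyncIterator[str]:
    # Stand-in for streams.stream_content: turn a list into an async stream.
    for part in parts:
        yield part

async def simple_text_processor(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    # Example transformation: prefix each part to show per-part processing.
    async for part in stream:
        yield f"processed: {part}"

async def main():
    return [part async for part in simple_text_processor(stream_content(["Hello", "World"]))]

print(asyncio.run(main()))  # ['processed: Hello', 'processed: World']
```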
2. Building a real-time audio and video agent
- Functionality: processes live audio and video streams through LiveProcessor.
- Workflow:
  - Initialize an audio input device (such as PyAudio).
  - Configure the video input (e.g. a camera or screen stream).
  - Use LiveProcessor to call the Gemini Live API.
  - Chain the input, processing, and output modules.

```python
# `text` provides terminal_input() for the driving stream.
from genai_processors.core import audio_io, live_model, text, video
import pyaudio

pya = pyaudio.PyAudio()
input_processor = video.VideoIn() + audio_io.PyAudioIn(pya, use_pcm_mimetype=True)
live_processor = live_model.LiveProcessor(
    api_key="your-api-key",
    model_name="gemini-2.5-flash-preview-native-audio-dialog",
)
play_output = audio_io.PyAudioOut(pya)
live_agent = input_processor + live_processor + play_output

async for part in live_agent(text.terminal_input()):
    print(part)
```
- Result: captures microphone and camera input and plays back audio processed via the Gemini API; suitable for real-time conversational agents.
3. Research topic generation
- Functionality: generates research topics from user input.
- Workflow:
  - Use the topic_generator.py example and configure GenaiModel.
  - Set model parameters such as the number of topics and the output format.
  - Enter a research query to get a list of topics in JSON format.

```python
from genai_processors.examples.research.processors import topic_generator

processor = topic_generator.TopicGenerator(api_key="your-api-key")
async for part in processor(["Research AI applications in healthcare"]):
    print(part.text)
```
- Result: generates the requested number of research topics and their relationship to the input; suitable for academic research scenarios.
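Since the topics come back as JSON text, downstream code can parse them with the standard library. The schema below (a "topics" array with "title" and "relation" fields) is a hypothetical example for illustration, not the documented output format:

```python
# Sketch of consuming a JSON topic list. The field names used here are an
# assumption for illustration; check the actual topic_generator output.
import json

raw = (
    '{"topics": ['
    '{"title": "AI triage systems", "relation": "direct"}, '
    '{"title": "Medical imaging models", "relation": "related"}]}'
)

data = json.loads(raw)
titles = [t["title"] for t in data["topics"]]
print(titles)  # ['AI triage systems', 'Medical imaging models']
```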
4. Custom processors
- Functionality: create custom processors to handle specific tasks.
- Workflow:
  - Consult the create_your_own_processor.ipynb notebook.
  - Define a processor class that inherits from processor.Processor.
  - Implement the call method to handle the input stream.
  - Add the custom processor to the pipeline.
- Result: users can extend the library as needed, for example to handle specific file formats or integrate other APIs.
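The subclass-and-implement-`call` pattern described above can be sketched without the library. The base class here is a structural stand-in, not the real `processor.Processor` API:

```python
# Library-free structural sketch of a custom processor: subclass a base
# class and implement an async `call` over the input stream. BaseProcessor
# is a stand-in, NOT the real genai_processors processor.Processor.
import asyncio
from typing import AsyncIterator

class BaseProcessor:
    async def call(self, stream: AsyncIterator) -> AsyncIterator:
        raise NotImplementedError

    async def __call__(self, stream):
        # Delegate to the subclass's `call` implementation.
        async for part in self.call(stream):
            yield part

class WordCounter(BaseProcessor):
    """Custom processor: annotate each text part with its word count."""
    async def call(self, stream):
        async for part in stream:
            yield (part, len(part.split()))

async def source(items):
    for item in items:
        yield item

async def run(items):
    counter = WordCounter()
    return [out async for out in counter(source(items))]

print(asyncio.run(run(["hello world", "one two three"])))
# [('hello world', 2), ('one two three', 3)]
```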
Running the examples
- Real-time CLI example:
  - Run realtime_simple_cli.py to create an audio dialog agent.

```shell
python3 examples/realtime_simple_cli.py
```

  - Speak into the microphone; the system converts the speech to text, processes it, and plays back a voice response.
- Trip-planner CLI:
  - Run trip_request_cli.py to generate a travel plan.

```shell
python3 examples/trip_request_cli.py
```

  - Enter your destination and dates to get a detailed plan.
Notes
- Make sure the API key is valid to avoid failed requests.
- Set --debug=True to view the logs.
- Real-time processing requires a stable network and capable hardware.
Application Scenarios
- Real-time dialog agents
  - Description: develop voice- or video-driven AI assistants that process live user input, suitable for customer service or virtual assistants.
- Academic research support
  - Description: generate research topics or analyze literature, helping students and researchers organize their thoughts quickly.
- Multimodal content processing
  - Description: process audio and video streams to generate subtitles or real-time narration, suitable for live streaming or video analysis.
- Automated workflows
  - Description: build automated pipelines to process data in batches, suitable for enterprise data processing.
FAQ
- What are the prerequisites?
  - Python 3.10+, the pyaudio and google-genai libraries, and a Google API key.
- How do I debug the processing flow?
  - Run the script with --debug=True and inspect the log output to check the input and output streams.
- What data types are supported?
  - Text, audio, video, and custom data streams, all handled as ProcessorPart objects.
- How do I contribute code?
  - See CONTRIBUTING.md and submit a custom processor in the contrib/ directory.