One Balance implements fine-grained, model-level rate limiting, a core advantage that differentiates it from conventional API management tools. When a specific model (e.g. Google Gemini Pro) is detected to have reached its quota limit, the system automatically marks that model as cooling down and switches to other available models or keys so that service continues.
The system monitors quotas at two levels:
- Minute-level quotas: track the frequency of API calls over a short period of time
- Day-level quotas: track total usage over a 24-hour cycle
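The two quota levels can be pictured as two counters over different time windows. The following is only a minimal sketch under stated assumptions, not One Balance's actual code: the `QuotaWindow` class, the limits of 2 and 3 calls, and the sliding-window approach are all hypothetical choices made for illustration.

```python
from collections import deque


class QuotaWindow:
    """Counts calls inside a sliding time window (hypothetical helper)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._calls = deque()  # timestamps of recorded calls, oldest first

    def record(self, now: float) -> None:
        """Register one API call made at time `now` (seconds)."""
        self._calls.append(now)

    def exhausted(self, now: float) -> bool:
        """Return True if the window already holds `limit` or more calls."""
        # Drop timestamps that have fallen out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        return len(self._calls) >= self.limit


# Assumed limits, chosen only to keep the example small.
minute_quota = QuotaWindow(limit=2, window_seconds=60)
day_quota = QuotaWindow(limit=3, window_seconds=24 * 60 * 60)
```

A caller would record each request in both windows and treat the model as over quota as soon as either window reports `exhausted`.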
Using a D1 database for state storage, One Balance records the usage of each key. When a quota limit is hit, the system automatically calculates an appropriate cooldown period (e.g. 24 hours after the day-level quota is exhausted), and no manual intervention is needed during that time.
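The cooldown behaviour described above could look roughly like the sketch below. It is an illustration under assumptions rather than the project's implementation: an in-memory dictionary stands in for the D1-backed state, and `CooldownStore`, its method names, and the key and model identifiers are all hypothetical. Only the 24-hour day-level cooldown comes from the text.

```python
from __future__ import annotations

from typing import Optional

# From the text: 24 hours after the day-level quota is exhausted.
DAY_COOLDOWN_SECONDS = 24 * 60 * 60


class CooldownStore:
    """In-memory stand-in for persisted state: (key, model) -> retry timestamp."""

    def __init__(self) -> None:
        self._cool_until: dict[tuple[str, str], float] = {}

    def mark_cooling(self, key: str, model: str, now: float, seconds: float) -> None:
        """Put one model on one key into cooldown for `seconds`."""
        self._cool_until[(key, model)] = now + seconds

    def is_available(self, key: str, model: str, now: float) -> bool:
        """A (key, model) pair is usable once its cooldown has elapsed."""
        return now >= self._cool_until.get((key, model), 0.0)

    def pick_key(self, keys: list[str], model: str, now: float) -> Optional[str]:
        """Return the first key whose `model` is not cooling down, else None."""
        for key in keys:
            if self.is_available(key, model, now):
                return key
        return None


store = CooldownStore()
keys = ["key-a", "key-b"]  # hypothetical key identifiers
store.mark_cooling("key-a", "gemini-pro", now=0.0, seconds=DAY_COOLDOWN_SECONDS)
print(store.pick_key(keys, "gemini-pro", now=1.0))    # key-b
print(store.pick_key(keys, "gemini-flash", now=1.0))  # key-a
```

Because the cooldown is stored per (key, model) pair, a key whose quota for one model is exhausted stays usable for other models, which is the point of limiting at the model level.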
This answer comes from the article "One Balance: a load balancing tool for intelligently managing AI API keys via Cloudflare AI Gateway".