Simple Subtitling is an open source subtitle generation tool that automatically generates subtitles and labels speakers for video or audio files. The project, developed by Jaesung Huh and hosted on GitHub, aims to provide a simple and efficient subtitle generation solution. Through audio processing technology, the tool ...
arXiv Summarizer is an open source Python script, hosted on GitHub, that helps users quickly retrieve and summarize academic papers from the arXiv platform. It uses the free Gemini API for text summarization and is suitable for researchers, students and academic...
Sim Studio is an open source platform for building AI agent workflows. It helps users quickly design, test, and deploy large language model (LLM) workflows through a lightweight, intuitive visual interface. Users can create complex multi-agent applications by drag-and-drop, without deep programming knowledge. It supports this ...
Hula is an AI-powered creative tool that transforms user selfies into viral videos, multi-style images and personalized sticker packs in a single click. Developed by Prequel Inc., the app supports iOS and Android and is aimed at avid social...
AIstudioProxyAPI is an open source project that uses Node.js and Playwright to expose the Gemini chat functionality of the Google AI Studio web interface as a standard API endpoint by emulating the OpenAI API ...
Step1X-Edit is an open source image editing framework developed by the Stepfun AI team and hosted on GitHub. It combines a multimodal large language model (Qwen-VL) and a diffusion transformer (DiT) to let users edit an image with simple natural language commands, such as changing the background, removing an object, or transforming the style ...
Klavis AI is an open source platform focused on simplifying the use and integration of the Model Context Protocol (MCP), an open standard that allows AI applications to dynamically connect with external tools and data sources. Klavis AI offers Slack and Discord clients, hosted MCP servers, and simplicity...
MiMo is an open source large language model project developed by Xiaomi, focusing on mathematical reasoning and code generation. The core product is the MiMo-7B family of models, which consists of a base model (Base), a supervised fine-tuning model (SFT), a reinforcement learning model trained from the base model (RL-Zero), and an SFT model trained from...
Muyan-TTS is an open source text-to-speech (TTS) model designed for podcasting scenarios. It is pre-trained on over 100,000 hours of podcast audio data and supports zero-shot speech synthesis to generate high-quality natural speech. The model is built on Llama-3.2-3B, and combined with the SoVITS decoder, it provides high...
CAD-MCP is an open source project that allows users to control CAD software drawing operations through natural language commands. It combines natural language processing with CAD automation, so users do not need to operate the CAD interface manually; they simply enter text commands to create and modify drawings. The project supports a variety of ...
manga-image-translator (the open source version of Cotrans Translator) translates the text in comics and images. It provides a command-line interface and an online demo, with batch conversion mode, web server mode and other usage options. Multiple target languages and recognition parameters can be set, ...
GraphGen is an open-source framework developed by OpenScienceLab, an AI lab in Shanghai, hosted on GitHub, focused on optimizing supervised fine-tuning of Large Language Models (LLMs) by guiding synthetic data generation through knowledge graphs. It constructs fine-grained knowledge graphs from source text, utilizing the expected calibration error...
ACI.dev is an open source infrastructure platform designed to give AI agents rapid integration with over 600 tools. It ensures that agents have secure access to tools such as Google Calendar, Slack, and Brave Search through multi-tenant authentication and fine-grained permissions management. Developers can...
llm.pdf is an open source project that allows users to run Large Language Models (LLMs) directly in PDF files. This project, developed by EvanZhouDev and hosted on GitHub, demonstrates an innovative approach: compiling llama.cpp via Emscripten as ...
Abogen is an open source tool designed to quickly convert ePub, PDF or plain text files to high quality audio. It uses the Kokoro-82M model to generate natural and smooth speech, and also supports synchronized subtitle generation, making it suitable for audiobooks, video dubbing or learning aids. Users can choose...
Local Deep Research is an open source AI research assistant designed to help users conduct deep research and generate detailed reports on complex problems. It runs locally, allowing users to accomplish research tasks without relying on cloud services. The tool combines local large language models (LLMs)...
DeepWiki is a free tool from Cognition AI focused on generating structured, Wikipedia-like documentation for GitHub repositories. It helps developers quickly understand complex code by analyzing code, README files, and configuration files to automatically create detailed documentation and interactive diagrams...
Trackers is an open source Python library focused on multi-object tracking in video. It integrates several leading tracking algorithms, such as SORT and DeepSORT, and lets users combine them with different object detection models (e.g., YOLO, RT-DETR) for flexible video analysis. Users can easily...
Kimi-Audio is an open source audio foundation model developed by Moonshot AI that focuses on audio understanding, generation and dialog. It supports a variety of audio processing tasks such as speech recognition, audio Q&A, and speech emotion recognition. The model has been pre-trained on over 13 million hours of audio data, combined with innovative...