llm.pdf is an open source project that allows users to run Large Language Models (LLMs) directly in PDF files. This project, developed by EvanZhouDev and hosted on GitHub, demonstrates an innovative approach: compiling llama.cpp via Emscripten as ...
Aana SDK is an open source framework developed by Mobius Labs, named after the Malayalam word ആന (elephant). It helps developers quickly deploy and manage multimodal AI models that process text, images, audio, video, and other data. Aana SDK is based on the Ray Distributed Computing Framework ...
BrowserAI is an open source tool that lets users run AI models locally, directly in the browser. It was developed by the Cloud-Code-AI team and supports models such as Llama, DeepSeek, and Kokoro. Users can generate text in the browser without a server or complex setup...
LitServe is an open source AI model serving engine from Lightning AI, built on FastAPI and focused on rapidly deploying inference services for general-purpose AI models. It covers a wide range of scenarios, from large language models (LLMs), vision models, and audio models to classical machine learning models, providing batch...
Nexa AI is a platform focused on multimodal AI solutions that run locally. It offers a wide range of AI models, including Natural Language Processing (NLP), Computer Vision, Speech Recognition and Generation (ASR and TTS), all of which can be run locally on devices without relying on cloud-based services. This ...
vLLM is a high-throughput, memory-efficient inference and serving engine for large language models (LLMs). Originally developed by the Sky Computing Lab at UC Berkeley, it has become a community project driven by academia and industry. vLLM aims to provide fast, easy...
Transformers.js is a JavaScript library provided by Hugging Face designed to run state-of-the-art machine learning models directly in the browser without server support. The library is compatible with Hugging Face's Python version of transformer...
Harbor is a containerized LLM toolset focused on simplifying the deployment and management of local AI development environments. It lets developers launch and manage all AI service components, including LLM backends, APIs, and front-end interfaces, with a single command through a clean command line interface (CLI) and companion application....
Xorbits Inference (Xinference for short) is a powerful and versatile library focused on providing distributed deployment and serving of language models, speech recognition models, and multimodal models. With Xorbits Inference, users can easily deploy and serve their own models or built-in advanced models,...
AI Dev Gallery is an AI development tools application from Microsoft (currently in public preview) designed for Windows developers. It provides a comprehensive platform to help developers easily integrate AI features into their Windows applications. The most notable feature of the tool is that it provides...
LightLLM is a Python-based Large Language Model (LLM) inference and serving framework known for its lightweight design, ease of extension, and efficient performance. The framework leverages a variety of well-known open source implementations, including FasterTransformer, TGI, vLLM, and FlashAtten...
GLM-Edge is a series of large language models and multimodal understanding models designed for edge devices, from Tsinghua University and Zhipu AI (智谱). These models include GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B and GLM-Edge-V-5...
Exo is an open source project that lets users run their own AI cluster on everyday devices (e.g. iPhone, iPad, Android, Mac, Linux, etc.). Through dynamic model partitioning and automated device discovery, Exo can make multiple devices act as a single powerful GPU, supporting multiple models such as LLaMA, Mistral...
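To make "dynamic model partitioning" concrete, the sketch below splits a model's layers across devices in proportion to each device's memory. It is an illustration only and is not taken from Exo's code: the function name `partition_layers`, the device names, and the memory-proportional rule are all assumptions.

```python
from typing import Dict, Tuple


def partition_layers(
    num_layers: int, device_memory_gb: Dict[str, float]
) -> Dict[str, Tuple[int, int]]:
    """Give each device a contiguous [start, end) range of layers.

    Hypothetical rule (an assumption, not Exo's documented algorithm):
    the size of each range is proportional to the device's memory.
    """
    total = sum(device_memory_gb.values())
    if total <= 0:
        raise ValueError("total device memory must be positive")

    assignments: Dict[str, Tuple[int, int]] = {}
    devices = list(device_memory_gb.items())
    start = 0
    cumulative = 0.0
    for i, (name, mem) in enumerate(devices):
        cumulative += mem
        if i == len(devices) - 1:
            # The last device takes whatever is left, so rounding never drops a layer.
            end = num_layers
        else:
            end = round(num_layers * cumulative / total)
        assignments[name] = (start, end)
        start = end
    return assignments


# Made-up devices: a laptop with 16 GB and two mobile devices with 8 GB each.
shards = partition_layers(32, {"macbook": 16.0, "ipad": 8.0, "phone": 8.0})
```

Each device would then load and run only its own layer range and pass activations to the next device in the chain.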
LocalAI is an open source, local alternative to hosted AI services that aims to provide APIs compatible with OpenAI, Claude, and others. It runs on consumer-grade hardware, does not require a GPU, and can perform a wide range of tasks such as text, audio, video, image generation, and speech cloning. LocalAI was developed by Ettore Di G...
llamafile is a tool from the Mozilla Builders project designed to simplify the deployment and operation of large language models (LLMs). By combining llama.cpp with Cosmopolitan Libc, llamafile takes the complexity of LLM deployment through...
Petals is an open source project developed by the BigScience Workshop to run Large Language Models (LLMs) through a distributed computing approach. Users can run and fine-tune LLMs at home using consumer-grade GPUs or Google Colab, e.g. Llama 3 ...
The Aphrodite Engine is the official backend engine for PygmalionAI, designed to provide an inference endpoint for PygmalionAI sites and to support the rapid deployment of Hugging Face-compatible models. The engine utilizes vLLM's Paged Attention technology to enable efficient K/...
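The core idea behind paged attention can be sketched as a block table that maps each sequence's tokens to fixed-size cache blocks allocated on demand. This is a simplified bookkeeping illustration, not vLLM's or Aphrodite's implementation: the class `PagedKVCache` and its methods are hypothetical, and no tensors are stored.

```python
from typing import Dict, List, Tuple


class PagedKVCache:
    """Toy block allocator: tracks which physical block holds each token slot."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))
        self.block_tables: Dict[str, List[int]] = {}  # sequence id -> physical block ids
        self.seq_lens: Dict[str, int] = {}            # sequence id -> tokens stored

    def append_token(self, seq_id: str) -> Tuple[int, int]:
        """Reserve a slot for one new token; return (physical_block, offset)."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % self.block_size == 0:
            # The current block is full (or none exists yet): allocate a new one.
            if not self.free_blocks:
                raise MemoryError("no free KV cache blocks")
            table.append(self.free_blocks.pop(0))
        self.seq_lens[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
slots_a = [cache.append_token("a") for _ in range(3)]
slot_b = cache.append_token("b")
```

Because blocks are allocated only when a sequence actually grows, memory is not reserved up front for the maximum sequence length, and freed blocks can be reused by other sequences immediately.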
llama.cpp is a library implemented in pure C/C++ designed to simplify the inference process for Large Language Models (LLMs). It supports a wide range of hardware platforms, including Apple Silicon, NVIDIA GPUs, and AMD GPUs, and provides a variety of quantization options to increase inference speed and reduce memory usage. The goal of the project is to ...
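As a rough illustration of why quantization reduces memory usage, the sketch below stores a block of weights as small integers plus one shared scale. It is a generic symmetric scheme written for this listing, not llama.cpp's actual quantization formats; the function names and the 4-bit default are assumptions.

```python
from typing import List, Tuple


def quantize_block(values: List[float], bits: int = 4) -> Tuple[float, List[int]]:
    """Map floats to signed integers in [-qmax, qmax] with one scale per block."""
    qmax = 2 ** (bits - 1) - 1  # 7 when bits == 4
    max_abs = max(abs(v) for v in values)
    # Avoid division by zero for an all-zero block.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, quantized


def dequantize_block(scale: float, quantized: List[int]) -> List[float]:
    """Recover approximate floats from the integers and the shared scale."""
    return [scale * q for q in quantized]


weights = [0.7, -0.3, 0.1, 0.0]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
```

Each weight now needs only 4 bits instead of 16 or 32, at the cost of a reconstruction error of at most half the scale per value.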