{"id":21423,"date":"2024-12-17T15:26:21","date_gmt":"2024-12-17T07:26:21","guid":{"rendered":"https:\/\/www.aisharenet.com\/?p=21423"},"modified":"2025-02-17T15:28:52","modified_gmt":"2025-02-17T07:28:52","slug":"lightllm","status":"publish","type":"post","link":"https:\/\/www.kdjingpai.com\/pt\/lightllm\/","title":{"rendered":"LightLLM: An Efficient, Lightweight Inference and Serving Framework for Large Language Models"},"content":{"rendered":"<p><a href=\"https:\/\/www.kdjingpai.com\/ja\/litellm\/\">LightLLM<\/a> is a Python-based inference and serving framework for large language models (LLMs), known for its lightweight design, easy extensibility, and high performance. It draws on several well-known open-source implementations, including FasterTransformer, TGI, vLLM, and FlashAttention. Through techniques such as asynchronous collaboration, dynamic batching, and tensor parallelism, LightLLM significantly improves GPU utilization and inference speed, and it suits a wide range of models and application scenarios.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-21424\" title=\"LightLLM: An Efficient, Lightweight Inference and Serving Framework for Large Language Models-1\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/02\/58bee253dc22cef.png\" alt=\"LightLLM: An Efficient, Lightweight Inference and Serving Framework for Large Language Models-1\" width=\"935\" height=\"428\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/02\/58bee253dc22cef.png 935w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/02\/58bee253dc22cef-768x352.png 768w\" sizes=\"auto, (max-width: 
935px) 100vw, 935px\" \/><\/p>\n<p>&nbsp;<\/p>\n<h2>Feature List<\/h2>\n<ul>\n<li>Asynchronous collaboration: tokenization, model inference, and detokenization run asynchronously, improving GPU utilization.<\/li>\n<li>Padding-free attention: supports padding-free (nopad) attention for multiple models, so requests of widely differing lengths are handled well.<\/li>\n<li>Dynamic batching: schedules requests with dynamic batching.<\/li>\n<li>FlashAttention: uses FlashAttention to increase speed and reduce GPU memory usage.<\/li>\n<li>Tensor parallelism: speeds up inference by using tensor parallelism across multiple GPUs.<\/li>\n<li><a href=\"https:\/\/www.kdjingpai.com\/ja\/tokenization\/\">Token<\/a> Attention: implements a token-based KV cache memory management mechanism with zero memory waste.<\/li>\n<li>High-performance router: works with Token Attention to optimize system throughput.<\/li>\n<li>Int8KV cache: nearly doubles token capacity.<\/li>\n<li>Broad model support: including BLOOM, LLaMA, StarCoder, ChatGLM2, and others.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>Usage Guide<\/h2>\n<h3>Installation<\/h3>\n<ol>\n<li>Install LightLLM with Docker:<\/li>\n<\/ol>\n<pre><code>docker pull modeltc\/lightllm\r\ndocker run -it --rm modeltc\/lightllm\r\n<\/code><\/pre>\n<ol start=\"2\">\n<li>Install the dependencies:<\/li>\n<\/ol>\n<pre><code>pip install -r requirements.txt\r\n<\/code><\/pre>\n<h3>Usage<\/h3>\n<ol>\n<li>Start the LightLLM 
service:<\/li>\n<\/ol>\n<pre><code># Replace \/path\/to\/model with your local model directory\r\npython -m lightllm.server.api_server --model_dir \/path\/to\/model --host 0.0.0.0 --port 8080 --tp 1\r\n<\/code><\/pre>\n<ol start=\"2\">\n<li>Query the model (command-line example):<\/li>\n<\/ol>\n<pre><code>curl http:\/\/127.0.0.1:8080\/generate -X POST -H 'Content-Type: application\/json' -d '{\"inputs\": \"Hello, world!\", \"parameters\": {\"max_new_tokens\": 64}}'\r\n<\/code><\/pre>\n<ol start=\"3\">\n<li>Query the model (Python example):<\/li>\n<\/ol>\n<pre><code>import requests\r\n\r\n# Assumes the service from step 1 is listening on localhost:8080\r\ndata = {\"inputs\": \"Hello, world!\", \"parameters\": {\"max_new_tokens\": 64}}\r\nresponse = requests.post(\"http:\/\/localhost:8080\/generate\", json=data)\r\nprint(response.json())\r\n<\/code><\/pre>\n<h3>How the Main Features Work<\/h3>\n<ol>\n<li><strong>Asynchronous collaboration<\/strong>: LightLLM runs tokenization, model inference, and detokenization asynchronously, which significantly improves GPU utilization. Users only need to start the service; the system handles these steps automatically.<\/li>\n<li><strong>Padding-free attention<\/strong>: When requests differ widely in length, LightLLM uses padding-free attention to keep processing efficient. No extra configuration is needed; the system optimizes this automatically.<\/li>\n<li><strong>Dynamic batching<\/strong>: LightLLM supports dynamic batch scheduling. Users can set batching parameters in the launch configuration, and the system adjusts its batching strategy dynamically based on incoming requests.<\/li>\n<li><strong>FlashAttention<\/strong>: By integrating FlashAttention, LightLLM increases inference speed and reduces GPU 
memory usage. Users can enable this feature in the launch configuration.<\/li>\n<li><strong>Tensor parallelism<\/strong>: LightLLM supports tensor parallelism across multiple GPUs. Users set the number of GPUs and the parallelism parameters in the launch configuration, and the system distributes the work automatically.<\/li>\n<li><strong>Token Attention<\/strong>: LightLLM implements a token-based KV cache memory management mechanism that ensures zero memory waste. No extra configuration is needed; the system manages memory automatically.<\/li>\n<li><strong>High-performance router<\/strong>: LightLLM's high-performance router works with Token Attention to optimize system throughput. Users can set routing parameters in the launch configuration, and the system optimizes the routing strategy automatically.<\/li>\n<li><strong>Int8KV cache<\/strong>: LightLLM supports an Int8KV cache, which nearly doubles token capacity. Users can enable this feature in the launch configuration, and the system adjusts its caching strategy automatically.<\/li>\n<\/ol>\n<h3>Supported Models<\/h3>\n<p>LightLLM 
supports many models, including but not limited to:<\/p>\n<ul>\n<li>BLOOM<\/li>\n<li>LLaMA<\/li>\n<li>StarCoder<\/li>\n<li>ChatGLM2<\/li>\n<li>InternLM<\/li>\n<li>Qwen-VL<\/li>\n<li>Llava<\/li>\n<li>Stablelm<\/li>\n<li>MiniCPM<\/li>\n<li>Phi-3<\/li>\n<li>CohereForAI<\/li>\n<li>DeepSeek-V2<\/li>\n<\/ul>\n<p>Users can choose the model that fits their needs and configure it accordingly at launch.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>LightLLM is a Python-based inference and serving framework for large language models (LLMs), known for its lightweight design, easy extensibility, and high performance. It draws on several well-known open-source implementations, including FasterTransformer, TGI, vLLM, and FlashAtten&#8230;<\/p>\n","protected":false},"author":1,"featured_media":61442,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[232],"class_list":["post-21423","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tool","tag-bendebushukaiyuanba"],"_links":{"self":[{"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/posts\/21423","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/comments?post=21423"}],"version-history":[{"count":0,"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/posts\/21423\/revisions"}],"
wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/media\/61442"}],"wp:attachment":[{"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/media?parent=21423"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/categories?post=21423"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/tags?post=21423"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}