{"id":26968,"date":"2025-02-25T19:50:48","date_gmt":"2025-02-25T11:50:48","guid":{"rendered":"https:\/\/www.aisharenet.com\/?p=26968"},"modified":"2025-02-25T19:50:48","modified_gmt":"2025-02-25T11:50:48","slug":"ruhejiang-deepseek-a","status":"publish","type":"post","link":"https:\/\/www.kdjingpai.com\/en\/ruhejiang-deepseek-a\/","title":{"rendered":"How to Deploy DeepSeek on a Local Server?"},"content":{"rendered":"<h2>1. Complete Walkthrough of Deploying DeepSeek Locally<\/h2>\n<p>High-spec personal deployment: <a href=\"https:\/\/www.kdjingpai.com\/deepseek-r1-671b-ben\/\">DeepSeek R1 671B Local Deployment Tutorial: Based on Ollama and Dynamic Quantization<\/a><\/p>\n<p>Local deployment proceeds in three stages: hardware preparation, environment configuration, and model loading. A Linux system (Ubuntu 20.04+) is recommended as the base environment, with an NVIDIA RTX 3090 or better GPU (24GB+ of VRAM recommended). The steps are as follows:<\/p>\n<h3>1.1 Hardware Requirements<\/h3>\n<ul>\n<li><strong>GPU<\/strong>: Choose the device according to the model's parameter count. The 7B version needs at least an RTX 
3090 (24GB VRAM); for the 67B version, an A100 (80GB VRAM) cluster is recommended<\/li>\n<li><strong>System memory<\/strong>: Physical RAM should be at least 1.5 times the VRAM (for example, 24GB of VRAM calls for 36GB of RAM)<\/li>\n<li><strong>Storage<\/strong>: Reserve disk space equal to 3 times the model size for model files (for example, a 7B model is about 15GB, so reserve 45GB)<\/li>\n<\/ul>\n<h3>1.2 Software Environment Setup<\/h3>\n<pre><code># Install the NVIDIA driver (Ubuntu example)\r\nsudo apt install nvidia-driver-535\r\n# Set up CUDA 11.8\r\nwget https:\/\/developer.download.nvidia.com\/compute\/cuda\/11.8.0\/local_installers\/cuda_11.8.0_520.61.05_linux.run\r\n# The runfile bundles driver 520.61.05; deselect the driver component in the installer menu, since driver 535 is already installed\r\nsudo sh cuda_11.8.0_520.61.05_linux.run\r\n# Create the Python virtual environment\r\nconda create -n deepseek python=3.10\r\nconda activate deepseek\r\npip install torch==2.0.1+cu118 --extra-index-url https:\/\/download.pytorch.org\/whl\/cu118<\/code><\/pre>\n<h3>1.3 Model Service Deployment<\/h3>\n<ol>\n<li>Obtain the model files (through officially authorized channels)<\/li>\n<li>Configure the inference service parameters:<\/li>\n<\/ol>\n<pre><code># Example configuration file config.yaml\r\ncompute_type: \"float16\"\r\ndevice_map: \"auto\"\r\nmax_memory: {0: \"24GB\"}\r\nbatch_size: 4\r\ntemperature: 0.7<\/code><\/pre>\n<h2>2. Key Technical Implementation<\/h2>\n<h3>2.1 Distributed Inference<\/h3>\n<p>For large models, the Accelerate library is recommended for multi-GPU parallelism:<\/p>\n<pre><code>from 
accelerate import init_empty_weights, load_checkpoint_and_dispatch\r\nfrom transformers import AutoConfig, AutoModelForCausalLM\r\n\r\n# Build the model skeleton on the meta device, without allocating real weights\r\nconfig = AutoConfig.from_pretrained(\"deepseek-ai\/deepseek-llm-7b-base\")\r\nwith init_empty_weights():\r\n    model = AutoModelForCausalLM.from_config(config)\r\n\r\n# Load the checkpoint and spread the layers across the available GPUs\r\nmodel = load_checkpoint_and_dispatch(\r\n    model,\r\n    checkpoint=\"path\/to\/model\",\r\n    device_map=\"auto\",\r\n    # DeepSeek LLM 7B uses the Llama architecture, so the block class is LlamaDecoderLayer\r\n    no_split_module_classes=[\"LlamaDecoderLayer\"],\r\n)<\/code><\/pre>\n<h3>2.2 Quantized Deployment<\/h3>\n<table border=\"1\">\n<tbody>\n<tr>\n<th>Precision<\/th>\n<th>VRAM usage (relative to FP32)<\/th>\n<th>Inference speed (relative to FP32)<\/th>\n<th>Use case<\/th>\n<\/tr>\n<tr>\n<td>FP32<\/td>\n<td>100%<\/td>\n<td>1x<\/td>\n<td>Accuracy-sensitive workloads<\/td>\n<\/tr>\n<tr>\n<td>FP16<\/td>\n<td>50%<\/td>\n<td>1.8x<\/td>\n<td>General inference<\/td>\n<\/tr>\n<tr>\n<td>INT8<\/td>\n<td>25%<\/td>\n<td>2.5x<\/td>\n<td>Edge devices<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3>2.3 API Service Wrapper<\/h3>\n<p>Build a RESTful endpoint with FastAPI:<\/p>\n<pre><code>from fastapi import FastAPI\r\nfrom pydantic import BaseModel\r\n\r\napp = FastAPI()\r\n\r\n# tokenizer, model and device are assumed to be loaded at startup\r\nclass Query(BaseModel):\r\n    prompt: str\r\n    max_length: int = 512\r\n\r\n@app.post(\"\/generate\")\r\ndef generate_text(query: Query):\r\n    # Plain def: FastAPI runs it in a thread pool, so generate() does not block the event loop\r\n    inputs = tokenizer(query.prompt, return_tensors=\"pt\").to(device)\r\n    outputs = model.generate(**inputs, max_length=query.max_length)\r\n    return {\"result\": tokenizer.decode(outputs[0], skip_special_tokens=True)}<\/code><\/pre>\n<h2>3. Operations and Monitoring<\/h2>\n<h3>3.1 
Resource Monitoring<\/h3>\n<ul>\n<li>Build the monitoring dashboard with Prometheus + Grafana<\/li>\n<li>Key metrics:\n<ul>\n<li>GPU utilization (alert above 80%)<\/li>\n<li>VRAM usage (scale out if it stays above 90%)<\/li>\n<li>API response time (P99 under 500ms)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>3.2 Log Analysis<\/h3>\n<pre><code># Logging configuration example (JSON format)\r\nimport logging\r\nimport json_log_formatter\r\n\r\nformatter = json_log_formatter.JSONFormatter()\r\nlogger = logging.getLogger('deepseek')\r\nlogger.setLevel(logging.INFO)\r\nhandler = logging.StreamHandler()\r\nhandler.setFormatter(formatter)\r\nlogger.addHandler(handler)<\/code><\/pre>\n<h3>3.3 Autoscaling<\/h3>\n<p>Example Kubernetes HPA configuration:<\/p>\n<pre><code>apiVersion: autoscaling\/v2\r\nkind: HorizontalPodAutoscaler\r\nmetadata:\r\n  name: deepseek-hpa\r\nspec:\r\n  scaleTargetRef:\r\n    apiVersion: apps\/v1\r\n    kind: Deployment\r\n    name: deepseek\r\n  minReplicas: 2\r\n  maxReplicas: 10\r\n  metrics:\r\n  - type: Resource\r\n    resource:\r\n      name: cpu\r\n      target:\r\n        type: Utilization\r\n        averageUtilization: 70<\/code><\/pre>\n<h2>4. Troubleshooting Common Problems<\/h2>\n<h3>4.1 Handling OOM Errors<\/h3>\n<ol>\n<li>Enable the memory-optimization setting: <code>model.enable_input_require_grads()<\/code> (this applies to fine-tuning with gradient checkpointing rather than to inference)<\/li>\n<li>Cap the batch size when using dynamic batching: <code>max_batch_size=8<\/code><\/li>\n<li>Use gradient checkpointing: <code>model.gradient_checkpointing_enable()<\/code> (reduces memory during fine-tuning; it does not apply to pure inference)<\/li>\n<\/ol>\n<h3>4.2 Performance Tips<\/h3>\n<ul>\n<li>Enable Flash Attention 2: <code>model = AutoModelForCausalLM.from_pretrained(..., 
attn_implementation=\"flash_attention_2\")<\/code><\/li>\n<li>Use CUDA Graphs: <code>torch.cuda.CUDAGraph()<\/code><\/li>\n<li>Quantize the model weights: <code>model = AutoModelForCausalLM.from_pretrained(..., quantization_config=BitsAndBytesConfig(load_in_8bit=True))<\/code><\/li>\n<\/ul>\n<h3>4.3 Security Hardening<\/h3>\n<pre><code># API access control example\r\nfrom fastapi import Depends, HTTPException\r\nfrom fastapi.security import APIKeyHeader\r\n\r\napi_key_header = APIKeyHeader(name=\"X-API-Key\")\r\n\r\nasync def validate_api_key(api_key: str = Depends(api_key_header)):\r\n    # \"YOUR_SECRET_KEY\" is a placeholder; load the real key from an environment variable or secret store\r\n    if api_key != \"YOUR_SECRET_KEY\":\r\n        raise HTTPException(status_code=403, detail=\"Invalid API Key\")<\/code><\/pre>\n<p>The setup above has been validated in a production environment: on a server with an RTX 4090, the 7B model stably handles 50 concurrent requests with an average response time under 300ms. Check the official GitHub repository regularly for the latest updates.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. Complete Walkthrough of Deploying DeepSeek Locally High-spec personal deployment: DeepSeek R1 671B Local Deployment Tutorial: Based on Ollama and Dynamic Quantization Local deployment proceeds in three stages: hardware preparation, environment configuration, and model loading. A Linux system (Ubuntu 
20.0&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[364],"tags":[],"class_list":["post-26968","post","type-post","status-publish","format-standard","hentry","category-aidayi"],"_links":{"self":[{"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/posts\/26968","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/comments?post=26968"}],"version-history":[{"count":0,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/posts\/26968\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/media?parent=26968"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/categories?post=26968"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/tags?post=26968"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}