{"id":16664,"date":"2024-12-29T19:34:17","date_gmt":"2024-12-29T11:34:17","guid":{"rendered":"https:\/\/www.aisharenet.com\/?p=16664"},"modified":"2024-12-29T19:34:31","modified_gmt":"2024-12-29T11:34:31","slug":"betterwhisperx","status":"publish","type":"post","link":"https:\/\/www.kdjingpai.com\/de\/betterwhisperx\/","title":{"rendered":"BetterWhisperX: Automatic Speech Recognition and Speaker Diarization with High-Accuracy Word-Level Timestamps"},"content":{"rendered":"<p>BetterWhisperX is an optimized fork of the WhisperX project focused on fast, accurate automatic speech recognition (ASR). The fork is maintained by Federico Torrielli, who aims to keep the project up to date and improve its performance. It combines phoneme-level forced alignment, batching based on voice activity detection, and speaker diarization. The tool supports high-speed transcription (up to 70x real time with the large-v2 model) and provides accurate word-level timestamps and multi-speaker labeling. It uses faster-whisper as its backend, so even large models need comparatively little GPU memory, which keeps it efficient for the performance it delivers.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-16665\" 
title=\"BetterWhisperX: Automatic Speech Recognition and Speaker Diarization with High-Accuracy Word-Level Timestamps-1\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/12\/896d73a13f80562.png\" alt=\"BetterWhisperX: Automatic Speech Recognition and Speaker Diarization with High-Accuracy Word-Level Timestamps-1\" width=\"482\" height=\"283\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/12\/896d73a13f80562.png 482w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/12\/896d73a13f80562-300x176.png 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/12\/896d73a13f80562-18x12.png 18w\" sizes=\"auto, (max-width: 482px) 100vw, 482px\" \/><\/p>\n<p>&nbsp;<\/p>\n<h2>Features<\/h2>\n<ul>\n<li><strong>Fast speech-to-text<\/strong>: up to 70x real-time transcription with the large-v2 model.<\/li>\n<li><strong>Word-level timestamps<\/strong>: accurate word-level timestamps through wav2vec2 alignment.<\/li>\n<li><strong>Multi-speaker recognition<\/strong>: speaker diarization and labeling with pyannote-audio.<\/li>\n<li><strong>Voice activity detection<\/strong>: VAD preprocessing reduces misrecognition and enables batching with no significant increase in error rate.<\/li>\n<li><strong>Batched inference<\/strong>: processes audio in batches for higher throughput.<\/li>\n<li><strong>Compatibility<\/strong>: supports PyTorch 2.0 and Python 
3.10, making it suitable for a variety of environments.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>Usage Guide<\/h2>\n<h3>Detailed Workflow<\/h3>\n<ol>\n<li><strong>Prepare the audio file<\/strong>: make sure the audio is in WAV or MP3 format and the recording is clear.<\/li>\n<li><strong>Load the model<\/strong>: choose a model that fits your needs (for example large-v2) and load it into memory.<\/li>\n<li><strong>Transcribe<\/strong>: call the transcribe function to convert speech to text and get an initial transcript.<\/li>\n<li><strong>Align timestamps<\/strong>: use the align function to align the transcript at word level so the timestamps are accurate.<\/li>\n<li><strong>Speaker diarization<\/strong>: call the diarize function to identify multiple speakers and get each speaker's label and the matching speech segments.<\/li>\n<li><strong>Output the results<\/strong>: save the final result as a text file or as JSON for later processing and analysis.<\/li>\n<\/ol>\n<h3>1. 
Environment Setup<\/h3>\n<ol>\n<li>System requirements:\n<ul>\n<li>Python 3.10 (a virtual environment created with mamba or conda is recommended)<\/li>\n<li>CUDA and cuDNN (required for GPU acceleration)<\/li>\n<li>FFmpeg<\/li>\n<\/ul>\n<\/li>\n<li>Installation steps:<\/li>\n<\/ol>\n<pre><code># Create the Python environment\r\nmamba create -n whisperx python=3.10\r\nmamba activate whisperx\r\n# Install CUDA and cuDNN\r\nmamba install cuda cudnn\r\n# Install BetterWhisperX\r\npip install git+https:\/\/github.com\/federicotorrielli\/BetterWhisperX.git\r\n<\/code><\/pre>\n<h3>2. Basic Usage<\/h3>\n<ol>\n<li>Command line:<\/li>\n<\/ol>\n<pre><code># Basic transcription (English)\r\nwhisperx audio.wav\r\n# Use the large model and a more accurate alignment model\r\nwhisperx audio.wav --model large-v2 --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --batch_size 4\r\n# Enable speaker diarization\r\nwhisperx audio.wav --model large-v2 --diarize --highlight_words True\r\n# CPU mode (works on Mac OS X)\r\nwhisperx audio.wav --compute_type int8\r\n<\/code><\/pre>\n<ol start=\"2\">\n<li>Calling from Python:<\/li>\n<\/ol>\n<pre><code>import whisperx\r\nimport gc\r\ndevice = \"cuda\"\r\naudio_file = \"audio.mp3\"\r\nbatch_size = 16  # reduce if GPU memory is low\r\ncompute_type = \"float16\"  # switch to \"int8\" if memory is low\r\n# 1. Load the model and transcribe\r\nmodel = whisperx.load_model(\"large-v2\", device, compute_type=compute_type)\r\naudio = whisperx.load_audio(audio_file)\r\nresult = model.transcribe(audio, batch_size=batch_size)\r\n# 2. 
Phoneme alignment\r\nmodel_a, metadata = whisperx.load_align_model(language_code=result[\"language\"], device=device)\r\nresult = whisperx.align(result[\"segments\"], model_a, metadata, audio, device)\r\n# 3. Speaker diarization (requires a Hugging Face token; replace YOUR_HF_TOKEN with your own)\r\ndiarize_model = whisperx.DiarizationPipeline(use_auth_token=YOUR_HF_TOKEN, device=device)\r\ndiarize_segments = diarize_model(audio)\r\nresult = whisperx.assign_word_speakers(diarize_segments, result)\r\n<\/code><\/pre>\n<h3>3. Performance Tips<\/h3>\n<ol>\n<li>Reducing GPU memory use:\n<ul>\n<li>Lower the batch size (batch_size)<\/li>\n<li>Use a smaller model (for example base instead of large)<\/li>\n<li>Choose a lighter compute type (int8)<\/li>\n<\/ul>\n<\/li>\n<li>Multilingual support:\n<ul>\n<li>Languages supported by default: English, French, German, Spanish, Italian, Japanese, Chinese, Dutch, Ukrainian, Portuguese<\/li>\n<li>Specify the language at run time: <code>--language de<\/code> (German in this example)<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<h3>4. 
Notes<\/h3>\n<ul>\n<li>Timestamps for special characters such as digits and currency symbols may be inaccurate<\/li>\n<li>Recognition may be poor when several people speak at the same time<\/li>\n<li>Speaker diarization is still being improved<\/li>\n<li>Speaker diarization requires a Hugging Face access token<\/li>\n<li>Make sure the GPU driver and CUDA versions are compatible<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>BetterWhisperX is an optimized fork of the WhisperX project focused on fast, accurate automatic speech recognition (ASR). The fork is maintained by Federico Torrielli, who aims to keep the project up to date&#8230;<\/p>\n","protected":false},"author":1,"featured_media":61524,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[230,216],"class_list":["post-16664","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tool","tag-aikaiyuanxiangmu","tag-aiyuyinzhuanwenben"],"_links":{"self":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/posts\/16664","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/comments?post=16664"}],"version-history":[{"count":
0,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/posts\/16664\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/media\/61524"}],"wp:attachment":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/media?parent=16664"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/categories?post=16664"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/tags?post=16664"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}