{"id":7839,"date":"2024-10-29T21:16:19","date_gmt":"2024-10-29T13:16:19","guid":{"rendered":"https:\/\/www.aisharenet.com\/?p=7839"},"modified":"2024-11-12T09:00:35","modified_gmt":"2024-11-12T01:00:35","slug":"amphion-maskgct","status":"publish","type":"post","link":"https:\/\/www.kdjingpai.com\/de\/amphion-maskgct\/","title":{"rendered":"Amphion MaskGCT: Zero-Shot Text-to-Speech Voice Cloning Model (Local One-Click Deployment Package)"},"content":{"rendered":"<p>MaskGCT (Masked Generative Codec Transformer) is a fully non-autoregressive text-to-speech (TTS) model jointly released by Quwan Technology (\u8da3\u4e38\u79d1\u6280) and The Chinese University of Hong Kong. The model needs no explicit text-speech alignment information and generates in two stages: it first predicts semantic codes from the text, then generates acoustic codes from those semantic codes. MaskGCT performs well on zero-shot TTS tasks, producing speech with high quality, high speaker similarity, and high intelligibility.<\/p>\n<blockquote><p>Public beta product: \u8da3\u4e38\u5343\u97f3, a voice cloning and multilingual video translation tool<\/p>\n<p>Paper: https:\/\/arxiv.org\/abs\/2409.00750<\/p><\/blockquote>\n<div id=\"attachment_7843\" style=\"width: 1930px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7843\" class=\"wp-image-7843 size-full\" title=\"Amphion MaskGCT\uff1a\u96f6\u6837\u672c\u6587\u672c\u5230\u8bed\u97f3\u514b\u9686\u6a21\u578b\uff08\u672c\u5730\u4e00\u952e\u90e8\u7f72\u5305\uff09-1\" 
src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/7b2f92ef0e1b0f3.png\" alt=\"Amphion MaskGCT\uff1a\u96f6\u6837\u672c\u6587\u672c\u5230\u8bed\u97f3\u514b\u9686\u6a21\u578b\uff08\u672c\u5730\u4e00\u952e\u90e8\u7f72\u5305\uff09-1\" width=\"1920\" height=\"870\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/7b2f92ef0e1b0f3.png 1920w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/7b2f92ef0e1b0f3-300x136.png 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/7b2f92ef0e1b0f3-1024x464.png 1024w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/7b2f92ef0e1b0f3-768x348.png 768w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/7b2f92ef0e1b0f3-1536x696.png 1536w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><p id=\"caption-attachment-7843\" class=\"wp-caption-text\">Online demo: https:\/\/huggingface.co\/spaces\/amphion\/maskgct<\/p><\/div>\n<p>&nbsp;<\/p>\n<h2>Features<\/h2>\n<ul>\n<li><strong>Text-to-speech (TTS)<\/strong>: converts input text into speech output.<\/li>\n<li><strong>Semantic codec<\/strong>: converts speech into semantic codes for downstream processing.<\/li>\n<li><strong>Acoustic codec<\/strong>: converts semantic codes into acoustic codes and reconstructs the audio waveform.<\/li>\n<li><strong>Zero-shot learning<\/strong>: synthesizes high-quality speech without explicit alignment information.<\/li>\n<li><strong>Pretrained models<\/strong>: provides several pretrained models for quick deployment and use.<\/li>\n<\/ul>\n<h2>Usage guide<\/h2>\n<h3>Installation<
\/h3>\n<ol>\n<li><strong>Clone the project<\/strong>:\n<pre><code>git clone https:\/\/github.com\/open-mmlab\/Amphion.git\r\ncd Amphion\r\n<\/code><\/pre>\n<\/li>\n<li><strong>Create the environment and install dependencies<\/strong>:\n<pre><code>bash .\/models\/tts\/maskgct\/env.sh\r\n<\/code><\/pre>\n<\/li>\n<\/ol>\n<h3>Usage<\/h3>\n<ol>\n<li><strong>Download the pretrained models<\/strong>: the required pretrained models can be downloaded from HuggingFace:\n<pre><code>from huggingface_hub import hf_hub_download\r\n# Download the semantic codec checkpoint\r\nsemantic_code_ckpt = hf_hub_download(\"amphion\/MaskGCT\", filename=\"semantic_codec\/model.safetensors\")\r\n# Download the acoustic codec checkpoints (encoder and decoder)\r\ncodec_encoder_ckpt = hf_hub_download(\"amphion\/MaskGCT\", filename=\"acoustic_codec\/model.safetensors\")\r\ncodec_decoder_ckpt = hf_hub_download(\"amphion\/MaskGCT\", filename=\"acoustic_codec\/model_1.safetensors\")\r\n# Download the text-to-semantic (T2S) model checkpoint\r\nt2s_model_ckpt = hf_hub_download(\"amphion\/MaskGCT\", filename=\"t2s_model\/model.safetensors\")\r\n<\/code><\/pre>\n<\/li>\n<li><strong>Generate speech<\/strong>: the snippet below is a simplified sketch of generating speech from text. The <code>MaskGCT<\/code> class and its <code>text_to_speech<\/code> method are illustrative; the complete inference script appears in the local deployment tutorial later in this article:\n<pre><code># Import the required library\r\nfrom amphion.models.tts.maskgct import MaskGCT\r\n# Initialize the model\r\nmodel = MaskGCT()\r\n# Input text\r\ntext = \"\u4f60\u597d\uff0c\u6b22\u8fce\u4f7f\u7528MaskGCT\u6a21\u578b\u3002\"\r\n# Generate speech\r\naudio = model.text_to_speech(text)\r\n# Save the generated speech (assumes the call returns encoded WAV bytes)\r\nwith open(\"output.wav\", \"wb\") as f:\r\n    f.write(audio)\r\n<\/code><\/pre>\n<\/li>\n<li><strong>Model training<\/strong>: 
If you need to train your own model, refer to the training scripts and configuration files in the project for data preparation and model training.<\/li>\n<\/ol>\n<h3>Notes<\/h3>\n<ul>\n<li><strong>Environment setup<\/strong>: make sure all required dependencies are installed and the environment variables are configured correctly.<\/li>\n<li><strong>Data preparation<\/strong>: train on high-quality speech data to get better synthesis results.<\/li>\n<li><strong>Model tuning<\/strong>: adjust the model parameters and training strategy to the specific application scenario for the best performance.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>Local deployment tutorial (with a local one-click installer)<\/h2>\n<p>A few days ago, another non-autoregressive text-to-speech AI model, MaskGCT, was open-sourced. Like the F5-TTS model, which is also non-autoregressive, MaskGCT was trained on the 100,000-hour Emilia dataset and handles cross-lingual synthesis in six languages: Chinese, English, Japanese, Korean, French, and German. Emilia is one of the largest and most diverse high-quality multilingual speech datasets in the world.<\/p>\n<p>This post shows how to deploy the MaskGCT project locally and put your GPU to work again.<\/p>\n<h3>Installing the base dependencies<\
/h3>\n<p>First make sure Python 3.11 is installed locally. The installer can be downloaded from the official Python website:<\/p>\n<pre><code>python.org\r\n<\/code><\/pre>\n<p>Then clone the official project:<\/p>\n<pre><code>git clone https:\/\/github.com\/open-mmlab\/Amphion.git\r\n<\/code><\/pre>\n<p>The project provides a Linux-based installation shell script:<\/p>\n<pre><code>pip install setuptools ruamel.yaml tqdm\r\npip install tensorboard tensorboardX torch==2.0.1\r\npip install transformers===4.41.1\r\npip install -U encodec\r\npip install black==24.1.1\r\npip install oss2\r\nsudo apt-get install espeak-ng\r\npip install phonemizer\r\npip install g2p_en\r\npip install accelerate==0.31.0\r\npip install funasr zhconv zhon modelscope\r\n# pip install git+https:\/\/github.com\/lhotse-speech\/lhotse\r\npip install timm\r\npip install jieba cn2an\r\npip install unidecode\r\npip install -U cos-python-sdk-v5\r\npip install pypinyin\r\npip install jiwer\r\npip install omegaconf\r\npip install pyworld\r\npip install py3langid==0.2.2 LangSegment\r\npip install onnxruntime\r\npip install pyopenjtalk\r\npip install pykakasi\r\npip install -U openai-whisper\r\n<\/code><\/pre>\n<p>Here I have converted it into a requirements.txt file suitable for Windows:<\/p>\n<pre><code>setuptools\r\nruamel.yaml\r\ntqdm\r\ntransformers===4.41.1\r\nencodec\r\nblack==24.1.1\r\noss2\r\nphonemizer\r\ng2p_en\r\naccelerate==0.31.0\r\nfunasr\r\nzhconv\r\nzhon\r\nmodelscope\r\ntimm\r\njieba\r\ncn2an\r\nunidecode\r\ncos-python-sdk-v5\r\npypinyin\r\njiwer\r\nomegaconf\r\npyworld\r\npy3langid==0.2.2\r\nLangSegment\r\nonnxruntime\r\npyopenjtalk\r\npykakasi\r\nopenai-whisper
\r\njson5\r\n<\/code><\/pre>\n<p>Run the command:<\/p>\n<pre><code>pip3 install -r requirements.txt\r\n<\/code><\/pre>\n<p>to install the dependencies.<\/p>\n<p>Install onnxruntime-gpu:<\/p>\n<pre><code>pip3 install onnxruntime-gpu\r\n<\/code><\/pre>\n<p>Install the three torch packages (torch, torchvision, torchaudio):<\/p>\n<pre><code>pip3 install torch torchvision torchaudio --index-url https:\/\/download.pytorch.org\/whl\/cu118\r\n<\/code><\/pre>\n<h3>Configuring espeak-ng on Windows<\/h3>\n<p>The MaskGCT backend depends on the espeak software, so it has to be configured locally. eSpeak is a compact open-source text-to-speech (TTS) synthesizer that supports many languages and accents. It uses formant synthesis, which lets it provide many languages in a small footprint. Its speech is clear and can be used at high speeds, but it is not as natural or smooth as larger synthesizers built on recordings of human speech. MaskGCT does not use eSpeak's audio output; it relies on eSpeak, through the phonemizer package listed in the dependencies, to convert text into phonemes.<\/p>\n<p>First, install espeak with:<\/p>\n<pre><code>winget install espeak\r\n<\/code><\/pre>\n<p>If that fails, you can download the installer and install it manually:<\/p>\n<pre><code>https:\/\/sourceforge.net\/projects\/espeak\/files\/espeak\/espeak-1.48\/setup_espeak-1.48.04.exe\/download\r\n<\/code><\/pre>\n<p>Then download the espeak-ng installer:<\/p>\n<pre><code>https:\/\/github.com\/espeak-ng\/espeak-ng\/releases\r\n<\/code><\/pre>\n<p>Double-click the downloaded file to install it.<\/p>\n<p>Next, copy C:\\Program Files\\eSpeak NG\\libespeak-ng.dll 
to the C:\\Program Files (x86)\\eSpeak\\command_line directory.<\/p>\n<p>Then rename libespeak-ng.dll to espeak-ng.dll.<\/p>\n<p>Finally, add the C:\\Program Files (x86)\\eSpeak\\command_line directory to the PATH environment variable.<\/p>\n<h3>MaskGCT local inference<\/h3>\n<p>Once everything is configured, write the inference script local_test.py:<\/p>\n<pre><code>import os\r\n# Set HF_HOME before huggingface_hub is imported so that downloads go to the hf_download folder\r\nos.environ['HF_HOME'] = os.path.join(os.path.dirname(__file__), 'hf_download')\r\nprint(os.environ['HF_HOME'])\r\nimport argparse\r\nimport safetensors.torch\r\nimport soundfile as sf\r\nimport torch\r\nfrom huggingface_hub import hf_hub_download\r\nfrom models.tts.maskgct.maskgct_utils import *\r\nparser = argparse.ArgumentParser(description=\"MaskGCT local inference\")\r\nparser.add_argument(\"-p\", \"--prompt_text\", type=str, default=\"\u8bf4\u5f97\u597d\u50cf\u60a8\u5e26\u6211\u4ee5\u6765\u6211\u8003\u597d\u8fc7\u51e0\u6b21\u4e00\u6837\")\r\nparser.add_argument(\"-a\", \"--audio\", type=str, default=\".\/\u8bf4\u5f97\u597d\u50cf\u60a8\u5e26\u6211\u4ee5\u6765\u6211\u8003\u597d\u8fc7\u51e0\u6b21\u4e00\u6837.wav\")\r\nparser.add_argument(\"-t\", \"--text\", type=str, default=\"\u4f60\u597d\")\r\nparser.add_argument(\"-l\", \"--language\", type=str, default=\"zh\")\r\nparser.add_argument(\"-lt\", \"--target_language\", type=str, default=\"zh\")\r\nargs = parser.parse_args()\r\nif __name__ == \"__main__\":\r\n    # download semantic codec ckpt\r\n    semantic_code_ckpt = hf_hub_download(\"amphion\/MaskGCT\", filename=\"semantic_codec\/model.safetensors\")\r\n    # download acoustic codec ckpt\r\n    codec_encoder_ckpt = hf_hub_download(\"amphion\/MaskGCT\", filename=\"acoustic_codec\/model.safetensors\")\r\n    codec_decoder_ckpt = hf_hub_download(\"amphion\/MaskGCT\", filename=\"acoustic_codec\/model_1.safetensors\")\r\n    # download t2s model ckpt
\r\n    t2s_model_ckpt = hf_hub_download(\"amphion\/MaskGCT\", filename=\"t2s_model\/model.safetensors\")\r\n    # download s2a model ckpt\r\n    s2a_1layer_ckpt = hf_hub_download(\"amphion\/MaskGCT\", filename=\"s2a_model\/s2a_model_1layer\/model.safetensors\")\r\n    s2a_full_ckpt = hf_hub_download(\"amphion\/MaskGCT\", filename=\"s2a_model\/s2a_model_full\/model.safetensors\")\r\n    # build model\r\n    device = torch.device(\"cuda\")\r\n    cfg_path = \".\/models\/tts\/maskgct\/config\/maskgct.json\"\r\n    cfg = load_config(cfg_path)\r\n    # 1. build semantic model (w2v-bert-2.0)\r\n    semantic_model, semantic_mean, semantic_std = build_semantic_model(device)\r\n    # 2. build semantic codec\r\n    semantic_codec = build_semantic_codec(cfg.model.semantic_codec, device)\r\n    # 3. build acoustic codec\r\n    codec_encoder, codec_decoder = build_acoustic_codec(cfg.model.acoustic_codec, device)\r\n    # 4. build t2s model\r\n    t2s_model = build_t2s_model(cfg.model.t2s_model, device)\r\n    # 5. build s2a model\r\n    s2a_model_1layer = build_s2a_model(cfg.model.s2a_model.s2a_1layer, device)\r\n    s2a_model_full = build_s2a_model(cfg.model.s2a_model.s2a_full, device)\r\n    # load semantic codec\r\n    safetensors.torch.load_model(semantic_codec, semantic_code_ckpt)\r\n    # load acoustic codec\r\n    safetensors.torch.load_model(codec_encoder, codec_encoder_ckpt)\r\n    safetensors.torch.load_model(codec_decoder, codec_decoder_ckpt)\r\n    # load t2s model\r\n    safetensors.torch.load_model(t2s_model, t2s_model_ckpt)\r\n    # load s2a model\r\n    safetensors.torch.load_model(s2a_model_1layer, s2a_1layer_ckpt)\r\n    safetensors.torch.load_model(s2a_model_full, s2a_full_ckpt)\r\n    # inference\r\n    prompt_wav_path = args.audio\r\n    save_path = \"output.wav\"\r\n    prompt_text = args.prompt_text\r\n    target_text = args.text\r\n    # Specify the target duration (in seconds). If target_len = None, we use a simple rule to predict the target duration.
\r\n    target_len = None\r\n    maskgct_inference_pipeline = MaskGCT_Inference_Pipeline(\r\n        semantic_model,\r\n        semantic_codec,\r\n        codec_encoder,\r\n        codec_decoder,\r\n        t2s_model,\r\n        s2a_model_1layer,\r\n        s2a_model_full,\r\n        semantic_mean,\r\n        semantic_std,\r\n        device,\r\n    )\r\n    recovered_audio = maskgct_inference_pipeline.maskgct_inference(\r\n        prompt_wav_path, prompt_text, target_text, args.language, args.target_language, target_len=target_len\r\n    )\r\n    # write the result as a 24 kHz WAV file\r\n    sf.write(save_path, recovered_audio, 24000)\r\n<\/code><\/pre>\n<p>The first inference run downloads about 10 GB of models into the hf_download directory.<\/p>\n<p>Inference uses about 11 GB of VRAM:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-7840\" title=\"Amphion MaskGCT\uff1a\u96f6\u6837\u672c\u6587\u672c\u5230\u8bed\u97f3\u514b\u9686\u6a21\u578b\uff08\u672c\u5730\u4e00\u952e\u90e8\u7f72\u5305\uff09-1\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/31ea896cec697f7.jpg\" alt=\"Amphion MaskGCT\uff1a\u96f6\u6837\u672c\u6587\u672c\u5230\u8bed\u97f3\u514b\u9686\u6a21\u578b\uff08\u672c\u5730\u4e00\u952e\u90e8\u7f72\u5305\uff09-1\" width=\"1477\" height=\"768\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/31ea896cec697f7.jpg 1477w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/31ea896cec697f7-300x156.jpg 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/31ea896cec697f7-1024x532.jpg 1024w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/31ea896cec697f7-768x399.jpg 768w\" sizes=\"auto, (max-width: 1477px) 100vw, 1477px\" \/><\/p>\n<p>If your GPU has less than 11 GB of VRAM, be sure to enable the system memory fallback policy in the NVIDIA Control Panel so that system RAM makes up the shortfall:<\/p>\n<p><img loading=\"lazy\" 
decoding=\"async\" class=\"aligncenter size-full wp-image-7841\" title=\"Amphion MaskGCT\uff1a\u96f6\u6837\u672c\u6587\u672c\u5230\u8bed\u97f3\u514b\u9686\u6a21\u578b\uff08\u672c\u5730\u4e00\u952e\u90e8\u7f72\u5305\uff09-2\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f3810d1879a83a0.png\" alt=\"Amphion MaskGCT\uff1a\u96f6\u6837\u672c\u6587\u672c\u5230\u8bed\u97f3\u514b\u9686\u6a21\u578b\uff08\u672c\u5730\u4e00\u952e\u90e8\u7f72\u5305\uff09-2\" width=\"2832\" height=\"1541\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f3810d1879a83a0.png 2832w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f3810d1879a83a0-300x163.png 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f3810d1879a83a0-1024x557.png 1024w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f3810d1879a83a0-768x418.png 768w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f3810d1879a83a0-1536x836.png 1536w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f3810d1879a83a0-2048x1114.png 2048w\" sizes=\"auto, (max-width: 2832px) 100vw, 2832px\" \/><\/p>\n<p>If you like, you can also write a simple gradio-based web UI, app.py:<\/p>\n<pre><code>import os\r\nimport subprocess\r\nimport gradio as gr\r\nos.environ['HF_HOME'] = os.path.join(os.path.dirname(__file__), 'hf_download')\r\n# Download models through the hf-mirror.com mirror\r\nos.environ[\"HF_ENDPOINT\"] = \"https:\/\/hf-mirror.com\"\r\nPLACEHOLDER = \"Select a reference audio file or upload your own\"\r\ndef list_reference_wavs():\r\n    # The placeholder comes first, followed by every file in the reference audio folder\r\n    return [PLACEHOLDER] + os.listdir(\".\/\u53c2\u8003\u97f3\u9891\/\")\r\nreference_wavs = list_reference_wavs()\r\ndef change_choices():\r\n    # Re-scan the folder and refresh the dropdown\r\n    return {\"choices\": list_reference_wavs(), \"__type__\": \"update\"}
\r\ndef change_wav(audio_path):\r\n    if audio_path == PLACEHOLDER:\r\n        return None, \"\"\r\n    # The file name without its extension doubles as the reference transcript\r\n    text = audio_path.replace(\".wav\",\"\").replace(\".mp3\",\"\").replace(\".WAV\",\"\")\r\n    return f\".\/\u53c2\u8003\u97f3\u9891\/{audio_path}\", text\r\ndef do_cloth(gen_text_input, ref_audio_input, model_choice_text, model_choice_re, ref_text_input):\r\n    # Pass the arguments as a list so that quotes in the text cannot break the command line\r\n    cmd = [r'.\\py311_cu118\\python.exe', 'local_test.py', '-t', gen_text_input, '-p', ref_text_input, '-a', ref_audio_input, '-l', model_choice_re, '-lt', model_choice_text]\r\n    print(cmd)\r\n    subprocess.run(cmd)\r\n    return \"output.wav\"\r\nwith gr.Blocks() as app_demo:\r\n    gr.Markdown(\r\n        \"\"\"\r\nProject: https:\/\/github.com\/open-mmlab\/Amphion\/tree\/main\/models\/tts\/maskgct\r\n\r\nPackage by: \u5218\u60a6\u7684\u6280\u672f\u535a\u5ba2 https:\/\/space.bilibili.com\/3031494\r\n\"\"\"\r\n    )\r\n    gen_text_input = gr.Textbox(label=\"Text to synthesize\", lines=4)\r\n    model_choice_text = gr.Radio(\r\n        choices=[\"zh\", \"en\"], label=\"Language of the text to synthesize\", value=\"zh\", interactive=True)\r\n    wavs_dropdown = gr.Dropdown(label=\"Reference audio list\", choices=reference_wavs, value=PLACEHOLDER, interactive=True)\r\n    refresh_button = gr.Button(\"Refresh reference audio\")\r\n    refresh_button.click(fn=change_choices, inputs=[], outputs=[wavs_dropdown])\r\n    ref_audio_input = gr.Audio(label=\"Reference Audio\", type=\"filepath\")\r\n    ref_text_input = gr.Textbox(\r\n        label=\"Reference Text\",\r\n        info=\"Transcript of the reference audio.\",
\r\n        lines=2,\r\n    )\r\n    model_choice_re = gr.Radio(\r\n        choices=[\"zh\", \"en\"], label=\"Reference audio language\", value=\"zh\", interactive=True\r\n    )\r\n    wavs_dropdown.change(change_wav,[wavs_dropdown],[ref_audio_input,ref_text_input])\r\n    generate_btn = gr.Button(\"Synthesize\", variant=\"primary\")\r\n    audio_output = gr.Audio(label=\"Synthesized Audio\")\r\n    generate_btn.click(do_cloth,[gen_text_input,ref_audio_input,model_choice_text,model_choice_re,ref_text_input],[audio_output])\r\ndef main():\r\n    print(\"Starting app...\")\r\n    app_demo.launch(inbrowser=True)\r\nif __name__ == \"__main__\":\r\n    main()\r\n<\/code><\/pre>\n<p>Of course, don't forget to install the gradio dependency:<\/p>\n<pre><code>pip3 install -U gradio\r\n<\/code><\/pre>\n<p>The result looks like this:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-7842\" title=\"Amphion MaskGCT\uff1a\u96f6\u6837\u672c\u6587\u672c\u5230\u8bed\u97f3\u514b\u9686\u6a21\u578b\uff08\u672c\u5730\u4e00\u952e\u90e8\u7f72\u5305\uff09-3\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f1cbcd3dad13fb5.png\" alt=\"Amphion MaskGCT\uff1a\u96f6\u6837\u672c\u6587\u672c\u5230\u8bed\u97f3\u514b\u9686\u6a21\u578b\uff08\u672c\u5730\u4e00\u952e\u90e8\u7f72\u5305\uff09-3\" width=\"3464\" height=\"1910\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f1cbcd3dad13fb5.png 3464w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f1cbcd3dad13fb5-300x165.png 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f1cbcd3dad13fb5-1024x565.png 1024w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f1cbcd3dad13fb5-768x423.png 768w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f1cbcd3dad13fb5-1536x847.png 1536w, 
https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/10\/f1cbcd3dad13fb5-2048x1129.png 2048w\" sizes=\"auto, (max-width: 3464px) 100vw, 3464px\" \/><\/p>\n<h3>Conclusion<\/h3>\n<p>MaskGCT's strength is its tone and prosody, which are outstanding and rival real speech. Its drawbacks are equally clear: it is expensive to run, and its engineering optimization is lacking. The MaskGCT project homepage already links to a commercial version of the model, so the maintainers will probably not invest heavily in the open-source version. Finally, here is the one-click integration package for everyone to enjoy.<\/p>\n<p>&nbsp;<\/p>\n<h2>MaskGCT one-click deployment package<\/h2>\n","protected":false},"excerpt":{"rendered":"<p>MaskGCT\uff08Masked Generative Codec 
Transformer\uff09\u662f\u7531\u8da3\u4e38\u79d1\u6280\u548c\u9999\u6e2f\u4e2d\u6587\u5927\u5b66\u8054\u5408\u63a8\u51fa\u7684\u4e00\u4e2a\u5b8c\u5168\u975e\u81ea\u56de\u5f52\u7684\u6587\u672c\u5230\u8bed\u97f3\uff08TTS\uff09\u6a21\u578b\u3002\u8be5\u6a21\u578b\u65e0\u9700\u663e\u5f0f\u7684\u6587\u672c\u4e0e\u8bed\u97f3\u5bf9\u9f50\u4fe1\u606f\uff0c\u91c7\u7528\u4e24\u9636\u6bb5\u7684\u751f\u6210\u65b9\u5f0f\uff0c\u9996\u5148\u901a\u8fc7\u6587\u672c\u9884&#8230;<\/p>\n","protected":false},"author":1,"featured_media":61165,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[230,237],"class_list":["post-7839","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tool","tag-aikaiyuanxiangmu","tag-aiyuyinkelong"],"_links":{"self":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/posts\/7839","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/comments?post=7839"}],"version-history":[{"count":0,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/posts\/7839\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/media\/61165"}],"wp:attachment":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/media?parent=7839"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/categories?post=7839"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/tags?post=7839"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}
]}}