{"id":29173,"date":"2025-03-21T15:18:31","date_gmt":"2025-03-21T07:18:31","guid":{"rendered":"https:\/\/www.aisharenet.com\/?p=29173"},"modified":"2025-08-25T00:08:52","modified_gmt":"2025-08-24T16:08:52","slug":"markpdfdown","status":"publish","type":"post","link":"https:\/\/www.kdjingpai.com\/pt\/markpdfdown\/","title":{"rendered":"MarkPDFDown: Converting PDF to Markdown with Multimodal Models"},"content":{"rendered":"<p>MarkPDFDown is an open-source tool that uses multimodal large language models to convert PDF files to Markdown. It is developed by the GitHub user jorben. Its goal is simple: make PDF documents easier to edit and share. It recognizes structure such as headings, lists, and tables, and produces cleanly formatted Markdown files. The project is written in Python and suits anyone who needs to turn PDF files into text. The current version depends on the OpenAI API, so you need to supply your own API key. The source code is open on GitHub, and contributions are welcome.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-29174\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/03\/bfcd24f8044975f.jpg\" alt=\"\" width=\"4300\" height=\"2546\" \/><\/p>\n<p>&nbsp;<\/p>\n<h2>Features<\/h2>\n<ul>\n<li>Converts PDF files to Markdown
format while preserving document structure.<\/li>\n<li>Recognizes headings, paragraphs, lists, tables, and other elements.<\/li>\n<li>Uses a multimodal large model to interpret PDF content, which helps keep the conversion accurate.<\/li>\n<li>Provides a command-line interface that supports batch processing of PDF files.<\/li>\n<li>Free and open source; you can modify the code to suit your needs.<\/li>\n<\/ul>\n<h2>Usage Guide<\/h2>\n<p>MarkPDFDown is a command-line tool, so you need to install it and set up an environment before using it. The installation and usage steps below are detailed enough for beginners to follow.<\/p>\n<h3>Installation<\/h3>\n<ol>\n<li><strong>Prepare the environment<\/strong><br \/>\nYou need a computer with Python 3.9 installed. If you do not have it, download and install Python first.<br \/>\nOpen a terminal and create a virtual environment with the following command:<\/li>\n<\/ol>\n<pre><code>conda create -n markpdfdown python=3.9\n<\/code><\/pre>\n<p>Then activate the environment:<\/p>\n<pre><code>conda activate markpdfdown\n<\/code><\/pre>\n<ol start=\"2\">\n<li><strong>Download the code<\/strong><br \/>\nIn the terminal, clone the MarkPDFDown GitHub repository:<\/li>\n<\/ol>\n<pre><code>git clone https:\/\/github.com\/jorben\/markpdfdown.git\n<\/code><\/pre>\n<p>Enter the project folder:<\/p>\n<pre><code>cd markpdfdown\n<\/code><\/pre>\n<ol start=\"3\">\n<li><strong>Install dependencies<\/strong><br
\/>\nThe project depends on several Python libraries. Install them with:<\/li>\n<\/ol>\n<pre><code>pip install -r requirements.txt\n<\/code><\/pre>\n<ol start=\"4\">\n<li><strong>Configure the API key<\/strong><br \/>\nMarkPDFDown uses OpenAI's multimodal models, which require an API key. Register an account on the OpenAI website and obtain a key.<br \/>\nSet the key in the terminal:<\/li>\n<\/ol>\n<pre><code>export OPENAI_API_KEY=&lt;your-api-key&gt;\n<\/code><\/pre>\n<p>To use a different model or API endpoint, also set:<\/p>\n<pre><code>export OPENAI_DEFAULT_MODEL=&lt;your-model-name&gt;\nexport OPENAI_API_BASE=&lt;your-api-base-url&gt;\n<\/code><\/pre>\n<ol start=\"5\">\n<li><strong>Verify the installation<\/strong><br \/>\nRun <code>python main.py --help<\/code>. If help text appears, the installation succeeded.<\/li>\n<\/ol>\n<h3>How to Use<\/h3>\n<p>Once installed, MarkPDFDown is simple to operate, and everything is done from the command line. The steps are as follows.<\/p>\n<h4>Convert an entire PDF file<\/h4>\n<p>Suppose you have a PDF file such as <code>tests\/input.pdf<\/code> and want to convert it to the Markdown file <code>output.md<\/code>. In the terminal, run:<\/p>\n<pre><code>python main.py &lt; tests\/input.pdf &gt; output.md\n<\/code><\/pre>\n<p>After it runs, <code>output.md<\/code> appears in the current folder, containing the converted Markdown
content.<\/p>\n<h4>Convert specific pages of a PDF<\/h4>\n<p>To convert only certain pages, for example pages 2 through 5, run:<\/p>\n<pre><code>python main.py 2 5 &lt; tests\/input.pdf &gt; output.md\n<\/code><\/pre>\n<p>The first number is the start page and the second is the end page. Page numbers start at 1.<\/p>\n<h4>Run with Docker<\/h4>\n<p>Prefer not to set up a Python environment? Use Docker instead. Make sure Docker is installed, then run:<\/p>\n<pre><code>docker run -i -e OPENAI_API_KEY=&lt;your-api-key&gt; jorben\/markpdfdown &lt; tests\/input.pdf &gt; output.md\n<\/code><\/pre>\n<p>This converts the file directly in a Docker container.<\/p>\n<h3>Features in Detail<\/h3>\n<ul>\n<li><strong>Core feature: PDF to Markdown<\/strong><br \/>\nSupply the PDF through input redirection (<code>&lt;<\/code>), either by dragging the file into the terminal window to fill in its path or by typing the path yourself, and the tool analyzes the content automatically. Headings become <code>#<\/code>, <code>##<\/code>, and so on; lists use <code>-<\/code>; tables are output as Markdown tables.<br \/>\nFor example, a PDF with the heading \u201cIntroduction\u201d and the body text \u201cThis is the content\u201d might convert to:<\/li>\n<\/ul>\n<pre><code># Introduction\nThis is the content\n<\/code><\/pre>\n<ul>\n<li><strong>Batch processing<\/strong><br \/>\nIf you have many PDF files, you can write a script that calls the command in a loop. For example, on Linux
systems:<\/li>\n<\/ul>\n<pre><code>for file in *.pdf; do python main.py &lt; \"$file\" &gt; \"${file%.pdf}.md\"; done\n<\/code><\/pre>\n<ul>\n<li><strong>Debugging and improvement<\/strong><br \/>\nNot happy with the conversion result? Open an issue on GitHub or modify the code yourself. The project is written in Python, and the logic lives in <code>main.py<\/code>.<\/li>\n<\/ul>\n<h3>Notes<\/h3>\n<ul>\n<li>Avoid Chinese characters in file paths; they may cause errors.<\/li>\n<li>Keep your API key confidential and do not share it with others.<\/li>\n<li>Large files can take longer to process, so make sure your network connection is stable.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>Use Cases<\/h2>\n<ol>\n<li><strong>Academic research<\/strong><br \/>\nStudents and researchers often need to convert paper PDFs to Markdown for note-taking or sharing. MarkPDFDown preserves a paper's structure, such as headings and tables, so it can be edited directly as Markdown.<\/li>\n<li><strong>Document organization<\/strong><br \/>\nA company with many PDF manuals or reports may want to archive them as Markdown. This tool can convert them in batches for upload to GitHub or Notion.<\/li>\n<li><strong>Technical writing<\/strong><br \/>\nWhen writing a technical blog post, you may need to quote PDF material. Convert it, then paste it into a Markdown
editor to skip the manual cleanup.<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<h2>FAQ<\/h2>\n<ol>\n<li><strong>Does it need an internet connection?<\/strong><br \/>\nYes. The tool depends on the OpenAI API, so it works only when online.<\/li>\n<li><strong>Does it support Chinese PDFs?<\/strong><br \/>\nYes. As long as the PDF is text-based (not scanned images), Chinese content converts normally.<\/li>\n<li><strong>What if the conversion fails?<\/strong><br \/>\nCheck that the API key is correct and that the PDF file is not corrupted. If it still fails, open an issue on GitHub.<\/li>\n<li><strong>Can it be used offline?<\/strong><br \/>\nNot at present. Local models may be supported in the future, but for now it relies on OpenAI's service.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>MarkPDFDown is an open-source tool that uses multimodal large language models to convert PDF files to Markdown. It is developed by the GitHub user jorben. Its goal is simple: make PDF
documents easier to edit and share. It recognizes structure such as headings, lists, &#8230;<\/p>\n","protected":false},"author":1,"featured_media":32782,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20,499],"tags":[230,252],"class_list":["post-29173","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tool","category-document-extraction","tag-aikaiyuanxiangmu","tag-markdown"],"_links":{"self":[{"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/posts\/29173","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/comments?post=29173"}],"version-history":[{"count":0,"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/posts\/29173\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/media\/32782"}],"wp:attachment":[{"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/media?parent=29173"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/categories?post=29173"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/pt\/wp-json\/wp\/v2\/tags?post=29173"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}