{"id":29307,"date":"2025-03-24T18:09:40","date_gmt":"2025-03-24T10:09:40","guid":{"rendered":"https:\/\/www.aisharenet.com\/?p=29307"},"modified":"2025-08-25T00:07:21","modified_gmt":"2025-08-24T16:07:21","slug":"pdf-craft","status":"publish","type":"post","link":"https:\/\/www.kdjingpai.com\/de\/pdf-craft\/","title":{"rendered":"PDF Craft: An Open-Source Tool for Converting Scanned PDFs to Markdown"},"content":{"rendered":"<p>PDF Craft is an open-source tool built for PDFs of scanned books, which it converts to Markdown. It is developed by oomol-lab, hosted on GitHub, and suited to users who like to organize e-books. It runs on local AI models, so once the models have been downloaded it needs no network connection, which protects privacy and keeps operation simple. It extracts the body text of scanned documents, strips extras such as headers and footers, and produces clean Markdown files, which makes it especially useful for organizing old books or research material.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-29308\" title=\"PDF Craft: An Open-Source Tool for Converting Scanned PDFs to Markdown-1\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/03\/150cc7d1f9395b5.jpg\" alt=\"PDF Craft: An Open-Source Tool for Converting Scanned PDFs to Markdown-1\" width=\"1795\" height=\"1236\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/03\/150cc7d1f9395b5.jpg 1795w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/03\/150cc7d1f9395b5-220x150.jpg 220w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/03\/150cc7d1f9395b5-768x529.jpg 768w, 
https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/03\/150cc7d1f9395b5-1536x1058.jpg 1536w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/03\/150cc7d1f9395b5-18x12.jpg 18w\" sizes=\"auto, (max-width: 1795px) 100vw, 1795px\" \/><\/p>\n<p>&nbsp;<\/p>\n<h2>Features<\/h2>\n<ul>\n<li>Converts scanned-book PDFs to Markdown, with all processing done locally.<\/li>\n<li>Extracts body text and automatically filters out headers, footers, and page numbers.<\/li>\n<li>Handles text that spans pages, keeping sentences intact.<\/li>\n<li>Captures illustrations and tables as screenshots and embeds them in the Markdown file.<\/li>\n<li>Uses AI to analyze page layout and arrange text in reading order.<\/li>\n<li>Can be extended to EPUB output to generate e-book files.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>Usage Guide<\/h2>\n<p>PDF Craft focuses on converting scanned-book PDFs to Markdown. The installation and usage steps below will help you get started quickly.<\/p>\n<h3>Installation<\/h3>\n<ol>\n<li><strong>Prepare the environment<\/strong><br \/>\nYou need a computer with Python 3.8 or later installed. Make sure the disk has enough free space to store the AI models.<\/li>\n<li><strong>Download the code<\/strong><br \/>\nOpen a terminal and clone the project:<\/li>\n<\/ol>\n<pre><code>git clone https:\/\/github.com\/oomol-lab\/pdf-craft.git\r\n<\/code><\/pre>\n<p>Then enter the directory:<\/p>\n<pre><code>cd pdf-craft\r\n<\/code><\/pre>\n<ol start=\"3\">\n<li><strong>Install dependencies<\/strong><br 
\/>\nRun the following command to install the required libraries:<\/li>\n<\/ol>\n<pre><code>pip install -r requirements.txt\r\n<\/code><\/pre>\n<p>If you have a GPU, you can additionally install CUDA support:<\/p>\n<pre><code>pip install torch --extra-index-url https:\/\/download.pytorch.org\/whl\/cu117\r\n<\/code><\/pre>\n<ol start=\"4\">\n<li><strong>Get the models<\/strong><br \/>\nOn first run, the tool automatically downloads its AI models (such as DocLayout-YOLO). Keep the network connection available; the models are stored in <code>&lt;model_dir_path&gt;<\/code> (configurable in code).<\/li>\n<\/ol>\n<h3>Workflow<\/h3>\n<h4>Converting to Markdown<\/h4>\n<ol>\n<li><strong>Prepare the PDF<\/strong><br \/>\nPut the scanned-book PDF in a folder, for example <code>\/path\/to\/pdf\/book.pdf<\/code>.<\/li>\n<li><strong>Run the conversion<\/strong><br \/>\nRun the following Python code, for example from a script or the Python interpreter:<\/li>\n<\/ol>\n<pre><code>from pdf_craft import PDFPageExtractor, MarkDownWriter\r\n\r\n# Create the extractor; device selects CPU or GPU\r\nextractor = PDFPageExtractor(device=\"cpu\", model_dir_path=\"\/path\/to\/model\/dir\/path\")\r\n\r\n# Write each extracted block to the Markdown file\r\nwith MarkDownWriter(markdown_path=\"\/path\/to\/output.md\", image_dir=\"images\", encoding=\"utf-8\") as md:\r\n    for block in extractor.extract(pdf=\"\/path\/to\/pdf\/book.pdf\"):\r\n        md.write(block)\r\n<\/code><\/pre>\n<ul>\n<li><code>device=\"cpu\"<\/code>: run on the CPU. If you have a supported GPU, change it to <code>device=\"cuda:0\"<\/code>.<\/li>\n<li><code>markdown_path<\/code>: path of the output Markdown file.<\/li>\n<li><code>image_dir<\/code>: directory where illustrations are saved.<\/li>\n<\/ul>\n<ol start=\"3\">\n<li><strong>Check the result<\/strong><br 
\/>\nWhen it finishes, open <code>\/path\/to\/output.md<\/code> to check the content. Illustrations are saved automatically to the <code>images<\/code> folder.<\/li>\n<\/ol>\n<h3>Key Features in Practice<\/h3>\n<ul>\n<li><strong>Body text extraction<\/strong><br \/>\nThe tool recognizes scanned pages and strips headers and footers, keeping only the body text. You do not need to clean up the extras by hand.<\/li>\n<li><strong>Cross-page handling<\/strong><br \/>\nIf a sentence is cut off by a page break, PDF Craft joins it automatically so the text reads continuously.<\/li>\n<li><strong>Illustration embedding<\/strong><br \/>\nImages and tables in the scanned book are captured as screenshots and embedded in the Markdown. You can find them in the <code>images<\/code> folder.<\/li>\n<\/ul>\n<h3>Tips<\/h3>\n<ul>\n<li>The PDF scan should be clear; otherwise recognition errors are likely.<\/li>\n<li>The first run downloads the models; after that the tool works offline.<\/li>\n<li>If it is slow, try GPU acceleration or process fewer pages.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>Use Cases<\/h2>\n<ol>\n<li><strong>Organizing old books<\/strong><br \/>\nYou have scanned PDFs of old books and want to edit them as Markdown. PDF Craft removes the clutter and produces clean files.<\/li>\n<li><strong>Converting research material<\/strong><br 
\/>\nResearchers who need to turn scanned papers into Markdown for note-taking can use the tool to keep the body text and illustrations, which makes citing easier.<\/li>\n<li><strong>E-book production<\/strong><br \/>\nYou want to turn a scanned PDF into an editable Markdown document. PDF Craft offers a simple way to do it.<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<h2>QA<\/h2>\n<ol>\n<li><strong>Does it only support scanned PDFs?<\/strong><br \/>\nIt is optimized mainly for scanned-book PDFs. Ordinary text PDFs also work, but the results may not be as good as with scanned files.<\/li>\n<li><strong>What happens to images after conversion?<\/strong><br \/>\nImages are captured as screenshots and saved to the specified folder, and links to them are embedded in the Markdown automatically.<\/li>\n<li><strong>Why is the first run slow?<\/strong><br \/>\nBecause it has to download the AI models. Later runs are faster.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>PDF Craft is an open-source tool built for PDFs of scanned books, which it converts to Markdown. It is developed by oomol-lab, hosted on GitHub, 
and suited to users who like to organize e-books. It runs on local AI models, so once the models have been downloaded it needs no network connection, which protects privacy and keeps operation simple. It&#8230;<\/p>\n","protected":false},"author":1,"featured_media":32782,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20,499],"tags":[230,248,252],"class_list":["post-29307","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tool","category-document-extraction","tag-aikaiyuanxiangmu","tag-ocr","tag-markdown"],"_links":{"self":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/posts\/29307","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/comments?post=29307"}],"version-history":[{"count":0,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/posts\/29307\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/media\/32782"}],"wp:attachment":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/media?parent=29307"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/categories?post=29307"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/tags?post=29307"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}