{"id":26976,"date":"2025-02-25T20:27:50","date_gmt":"2025-02-25T12:27:50","guid":{"rendered":"https:\/\/www.aisharenet.com\/?p=26976"},"modified":"2025-08-25T00:15:31","modified_gmt":"2025-08-24T16:15:31","slug":"par_scrape","status":"publish","type":"post","link":"https:\/\/www.kdjingpai.com\/en\/par_scrape\/","title":{"rendered":"par_scrape: A Web Scraping Tool for Intelligent Data Extraction"},"content":{"rendered":"<p>par_scrape is an open-source, Python-based web scraping tool published on GitHub by developer Paul Robello to help users extract data from web pages intelligently. It integrates two powerful browser automation technologies, Selenium and Playwright, and combines them with AI processing, so it can scrape anything from simple static pages to complex dynamic websites. Whether you need prices, titles, or other structured information, par_scrape extracts the fields you specify and outputs the results as Markdown, JSON, CSV, or similar formats. The project suits developers, data analysts, and anyone who wants to automate the collection of web information. It is simple to install, flexible to use, and has been well received by the open-source community.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-26977\" title=\"par_scrape: A Web Scraping Tool for Intelligent Data Extraction-1\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/02\/ef2341354063567.jpg\"
alt=\"par_scrape: A Web Scraping Tool for Intelligent Data Extraction-1\" width=\"990\" height=\"1225\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/02\/ef2341354063567.jpg 990w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/02\/ef2341354063567-768x950.jpg 768w\" sizes=\"auto, (max-width: 990px) 100vw, 990px\" \/><\/p>\n<p>&nbsp;<\/p>\n<h2>Feature List<\/h2>\n<ul>\n<li><strong>Intelligent data extraction<\/strong>: Uses AI models (such as those from OpenAI or Anthropic) to analyze page content and precisely extract the fields the user specifies.<\/li>\n<li><strong>Two scraper backends<\/strong>: Supports both Selenium and Playwright to suit different site architectures.<\/li>\n<li><strong>Multiple output formats<\/strong>: Results can be output as Markdown, JSON, CSV, or Excel for downstream processing.<\/li>\n<li><strong>Custom field extraction<\/strong>: Users specify the fields to extract, such as title, description, or price.<\/li>\n<li><strong>Parallel scraping<\/strong>: Supports multi-threaded scraping to speed up large-scale data collection.<\/li>\n<li><strong>Wait mechanisms<\/strong>: Offers several page-load wait strategies (such as pausing or waiting for a selector) so that dynamic content is captured reliably.<\/li>\n<li><strong>AI model selection<\/strong>: Supports multiple AI providers (such as
OpenAI, Anthropic, and XAI) to fit different tasks.<\/li>\n<li><strong>Cache optimization<\/strong>: Built-in prompt caching reduces the cost of repeated requests and improves efficiency.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>Usage Guide<\/h2>\n<h3>Installation<\/h3>\n<p>Before using par_scrape, complete the installation steps below so that the environment is ready.<\/p>\n<h4>1. Prepare the environment<\/h4>\n<ul>\n<li><strong>Python version<\/strong>: Make sure Python 3.11 or later is installed; check with <code>python --version<\/code>.<\/li>\n<li><strong>Git<\/strong>: Used to clone the code from GitHub. If it is not installed, install it with <code>sudo apt install git<\/code> (Debian-based Linux) or download it from the official website.<\/li>\n<li><strong>UV<\/strong>: UV is recommended for managing dependencies. Install it with:\n<ul>\n<li>Linux\/Mac: <code>curl -LsSf https:\/\/astral.sh\/uv\/install.sh | sh<\/code><\/li>\n<li>Windows: <code>powershell -ExecutionPolicy ByPass -c \"irm https:\/\/astral.sh\/uv\/install.ps1 | iex\"<\/code><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h4>2. Clone the project<\/h4>\n<p>Run the following commands in a terminal to clone the par_scrape project locally:<\/p>\n<pre><code>git clone https:\/\/github.com\/paulrobello\/par_scrape.git\r\ncd par_scrape\r\n<\/code><\/pre>\n<h4>3.
Install dependencies<\/h4>\n<p>Install the project dependencies with UV:<\/p>\n<pre><code>uv sync\r\n<\/code><\/pre>\n<p>Alternatively, install directly from PyPI:<\/p>\n<pre><code>uv tool install par_scrape\r\n# or use pipx\r\npipx install par_scrape\r\n<\/code><\/pre>\n<h4>4. Install Playwright (optional)<\/h4>\n<p>If you choose Playwright as the scraper, you also need to install it and set up a browser:<\/p>\n<pre><code>uv tool install playwright\r\nplaywright install chromium\r\n<\/code><\/pre>\n<h4>5. Configure API keys<\/h4>\n<p>par_scrape supports multiple AI providers, and the key for each provider you use must be available as an environment variable. Edit the <code>~\/.par_scrape.env<\/code> file and add the entries you need:<\/p>\n<pre><code>OPENAI_API_KEY=your_openai_key\r\nANTHROPIC_API_KEY=your_anthropic_key\r\nXAI_API_KEY=your_xai_key\r\n<\/code><\/pre>\n<p>Alternatively, set the environment variable before running the command:<\/p>\n<pre><code>export OPENAI_API_KEY=your_openai_key\r\n<\/code><\/pre>\n<h3>Usage<\/h3>\n<p>Once installed, par_scrape runs from the command line. The workflow is described below.<\/p>\n<h4>Basic example<\/h4>\n<p>Suppose you want to extract the title, description, and price from the OpenAI pricing page:<\/p>\n<pre><code>par_scrape --url \"https:\/\/openai.com\/api\/pricing\/\" -f \"Title\" -f \"Description\" -f \"Price\" --model gpt-4o-mini --display-output
md\r\n<\/code><\/pre>\n<ul>\n<li><code>--url<\/code>: The target page URL.<\/li>\n<li><code>-f<\/code>: A field to extract; may be repeated.<\/li>\n<li><code>--model<\/code>: The AI model to use (for example, gpt-4o-mini).<\/li>\n<li><code>--display-output<\/code>: The output format (md, json, csv, and so on).<\/li>\n<\/ul>\n<h4>Working with key features<\/h4>\n<ol>\n<li><strong>Switch the scraper<\/strong><br \/>\nPlaywright is used by default. To use Selenium instead, add the following option:<\/p>\n<pre><code>par_scrape --url \"https:\/\/example.com\" -f \"Title\" --scraper selenium\r\n<\/code><\/pre>\n<\/li>\n<li><strong>Parallel scraping<\/strong><br \/>\nSet the maximum number of parallel requests to improve throughput:<\/p>\n<pre><code>par_scrape --url \"https:\/\/example.com\" -f \"Data\" --scrape-max-parallel 5\r\n<\/code><\/pre>\n<\/li>\n<li><strong>Waiting for dynamic pages<\/strong><br \/>\nFor dynamically loaded content, set a wait type and a selector:<\/p>\n<pre><code>par_scrape --url \"https:\/\/example.com\" -f \"Content\" --wait-type selector --wait-selector \".dynamic-content\"\r\n<\/code><\/pre>\n<p>Supported wait types are <code>none<\/code>, <code>pause<\/code>, <code>sleep<\/code>, <code>idle<\/code>, <code>selector<\/code>, and <code>text<\/code>.<\/li>\n<li><strong>Custom output path<\/strong><br \/>\nSave the results to a specified folder:<\/p>\n<pre><code>par_scrape --url \"https:\/\/example.com\" -f \"Title\" --output-folder
.\/my_data\r\n<\/code><\/pre>\n<\/li>\n<\/ol>\n<h4>Step-by-step walkthrough<\/h4>\n<p>Using the pricing page as an example:<\/p>\n<ol>\n<li><strong>Identify the target<\/strong>: Visit https:\/\/openai.com\/api\/pricing\/ and confirm that the fields to extract are \u201cModel\u201d, \u201cPricing Input\u201d, and \u201cPricing Output\u201d.<\/li>\n<li><strong>Run the command<\/strong>:\n<pre><code>par_scrape --url \"https:\/\/openai.com\/api\/pricing\/\" -f \"Model\" -f \"Pricing Input\" -f \"Pricing Output\" --model gpt-4o-mini --display-output json\r\n<\/code><\/pre>\n<\/li>\n<li><strong>Review the results<\/strong>: After the command finishes, the terminal shows the data as JSON, or the data is saved to the default output file.<\/li>\n<li><strong>Tune the parameters<\/strong>: If the data is incomplete, try increasing the retry count with <code>--retries 5<\/code> or adjusting the wait time with <code>--sleep-time 5<\/code>.<\/li>\n<\/ol>\n<h4>Notes<\/h4>\n<ul>\n<li><strong>API keys<\/strong>: Make sure the key is valid; otherwise AI extraction will not work.<\/li>\n<li><strong>Site restrictions<\/strong>: Some sites have anti-scraping measures. Consider using <code>--headless<\/code> (headless mode) or adjusting the scraping frequency.<\/li>\n<li><strong>Caching<\/strong>: If you scrape the same page repeatedly, enable <code>--prompt-cache<\/code> to reduce cost.<\/li>\n<\/ul>\n<p>With the steps above, you can quickly get started with
par_scrape and complete web data extraction tasks with ease.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>par_scrape is an open-source, Python-based web scraping tool published on GitHub by developer Paul Robello to help users extract data from web pages intelligently. It integrates Selenium and Playwright, two powerful browser automation&#8230;<\/p>\n","protected":false},"author":1,"featured_media":32782,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20,499],"tags":[230,252],"class_list":["post-26976","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tool","category-document-extraction","tag-aikaiyuanxiangmu","tag-markdown"],"_links":{"self":[{"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/posts\/26976","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/comments?post=26976"}],"version-history":[{"count":0,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/posts\/26976\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/media\/32782"}],"wp:attachment":[{"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/media?parent=26976"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/categories?post=26976"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www
.kdjingpai.com\/en\/wp-json\/wp\/v2\/tags?post=26976"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}