{"id":14094,"date":"2024-11-26T19:28:32","date_gmt":"2024-11-26T11:28:32","guid":{"rendered":"https:\/\/www.aisharenet.com\/?p=14094"},"modified":"2025-07-14T02:44:04","modified_gmt":"2025-07-13T18:44:04","slug":"shiyong-vespa-shixian","status":"publish","type":"post","link":"https:\/\/www.kdjingpai.com\/de\/shiyong-vespa-shixian\/","title":{"rendered":"Visual RAG over PDFs with Vespa &#8211; A Python-Based Demo Application"},"content":{"rendered":"<h2><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-14103\" title=\"Visual RAG over PDFs with Vespa - A Python-Based Demo Application-1\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/31388a1eedb64d3.jpg\" alt=\"Visual RAG over PDFs with Vespa - A Python-Based Demo Application-1\" width=\"1140\" height=\"948\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/31388a1eedb64d3.jpg 1140w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/31388a1eedb64d3-300x249.jpg 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/31388a1eedb64d3-1024x852.jpg 1024w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/31388a1eedb64d3-768x639.jpg 768w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/31388a1eedb64d3-14x12.jpg 14w\" sizes=\"auto, (max-width: 1140px) 100vw, 1140px\" \/>Introduction<\/h2>\n<p>Thomas joined Vespa as a Senior Software Engineer in April 2024. In his last assignment as an AI consultant before that, he actually built a <a href=\"https:\/\/www.kdjingpai.com\/de\/rag\/\">RAG<\/a> application over a large collection of PDFs, based on Vespa.<\/p>\n<p>PDFs 
are ubiquitous in the enterprise world, and being able to search and retrieve information from them is a common use case. The challenge is that many PDFs fall into one or more of the following categories:<\/p>\n<ul>\n<li>They are scanned documents, so the text cannot be extracted easily and OCR is required, which adds complexity.<\/li>\n<li>They contain many charts, tables, and diagrams, which are not easily retrievable even when the text can be extracted.<\/li>\n<li>They contain many images, which sometimes hold valuable information.<\/li>\n<\/ul>\n<p>Note that the term <em>ColPali<\/em> has two meanings:<\/p>\n<ol>\n<li>A specific <a href=\"https:\/\/huggingface.co\/vidore\/colpali-v1.2\">model<\/a> and an associated <a href=\"https:\/\/arxiv.org\/abs\/2407.01449\">paper<\/a>, which trains a LoRA adapter on top of a VLM (PaliGemma) to produce joint text and image embeddings for \u201clate interaction\u201d (one embedding per image patch), extending the <a href=\"https:\/\/arxiv.org\/abs\/2112.01488\">ColBERT<\/a> approach to vision language models.<\/li>\n<li>It also denotes a <em>direction<\/em> in visual document retrieval that combines the power of VLMs 
with efficient late-interaction mechanisms. This direction is not limited to the specific model in the original paper; it can also be applied to other VLMs, as in our <a href=\"https:\/\/pyvespa.readthedocs.io\/en\/latest\/examples\/pdf-retrieval-with-ColQwen2-vlm_Vespa-cloud.html\">notebook<\/a> on using ColQwen2 with Vespa.<\/li>\n<\/ol>\n<p>In this blog post, we take a deep dive into building a live demo application that showcases visual RAG on Vespa using ColPali embeddings. We describe the architecture of the application, the user experience, and the tech stack used to build it.<\/p>\n<p>Here are some screenshots of the demo application:<\/p>\n<p>The first example is not a typical query, but it illustrates the power of visual retrieval for certain types of queries. It is a good illustration of the \u201cWhat You See Is What You Search (WYSIWYS)\u201d paradigm.<\/p>\n<p>The similarity map highlights the most similar parts, making it easy for users to see which parts of the page are most relevant to the query.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-14105\" title=\"Visual RAG over PDFs with Vespa - A Python-Based Demo Application-1\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/31388a1eedb64d3-1.jpg\" alt=\"Visual RAG over PDFs with Vespa 
- A Python-Based Demo Application-1\" width=\"1140\" height=\"948\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/31388a1eedb64d3-1.jpg 1140w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/31388a1eedb64d3-1-300x249.jpg 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/31388a1eedb64d3-1-1024x852.jpg 1024w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/31388a1eedb64d3-1-768x639.jpg 768w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/31388a1eedb64d3-1-14x12.jpg 14w\" sizes=\"auto, (max-width: 1140px) 100vw, 1140px\" \/>The second example is a more typical user query, showing ColPali's strength in semantic similarity. <img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-14100\" title=\"Visual RAG over PDFs with Vespa - A Python-Based Demo Application-2\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/79089fbaa5e3d8b.png\" alt=\"Visual RAG over PDFs with Vespa - A Python-Based Demo Application-2\" width=\"1140\" height=\"948\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/79089fbaa5e3d8b.png 1140w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/79089fbaa5e3d8b-300x249.png 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/79089fbaa5e3d8b-1024x852.png 1024w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/79089fbaa5e3d8b-768x639.png 768w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/79089fbaa5e3d8b-14x12.png 14w\" sizes=\"auto, (max-width: 1140px) 100vw, 1140px\" \/><\/p>\n<p>Having experienced first-hand how hard it is to make PDFs searchable, Thomas 
was particularly interested in recent developments in the field of Vision Language Models (VLMs).<\/p>\n<p>After reading the earlier <a href=\"https:\/\/blog.vespa.ai\/the-rise-of-vision-driven-document-retrieval-for-rag\/\">Vespa blog post<\/a> on ColPali and having a series of in-depth discussions with <a href=\"https:\/\/x.com\/jobergum\">Jo Bergum<\/a>, he was inspired to propose a project to build a visual RAG application with Vespa.<\/p>\n<p>At Vespa, employees get the chance to propose the work they want to do in each iteration cycle. As long as the proposed work aligns with the company's goals and there are no other urgent priorities, we can get started. For Thomas, coming from the consulting industry, this autonomy is a breath of fresh air.<\/p>\n<h3>TL;DR<\/h3>\n<p>We built a <a href=\"https:\/\/huggingface.co\/spaces\/vespa-engine\/colpali-vespa-visual-retrieval\">live demo application<\/a> that showcases Visual RAG over PDFs using ColPali embeddings in Vespa, written entirely in Python with FastHTML.<\/p>\n<p>We also provide the code to reproduce it:<\/p>\n<ol>\n<li>A runnable <a href=\"https:\/\/pyvespa.readthedocs.io\/en\/latest\/examples\/visual_pdf_rag_with_vespa_colpali_cloud.html\">notebook<\/a> for setting up your own Vespa 
application for Visual RAG.<\/li>\n<li>The code for the <a href=\"https:\/\/github.com\/vespa-engine\/sample-apps\/tree\/master\/visual-retrieval-colpali\">FastHTML app<\/a>, which you can use to set up a web app that interacts with the Vespa application.<\/li>\n<\/ol>\n<h2>Project goals<\/h2>\n<p>The project had two main goals:<\/p>\n<h3>1. Build a live demo<\/h3>\n<p>While developers may be happy with a demo whose UI is JSON output in a terminal, the fact is that most people prefer a web interface.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-14104\" title=\"Visual RAG over PDFs with Vespa - A Python-Based Demo Application-1\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/b63b7a0368fb6ee.jpg\" alt=\"Visual RAG over PDFs with Vespa - A Python-Based Demo Application-1\" width=\"500\" height=\"500\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/b63b7a0368fb6ee.jpg 500w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/b63b7a0368fb6ee-300x300.jpg 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/b63b7a0368fb6ee-12x12.jpg 12w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/>This would let us showcase what Vespa can do with ColPali embeddings for PDF Visual 
RAG. We believe this is relevant to many domains and use cases, such as legal, finance, construction, academia, and healthcare.<\/p>\n<p>We are confident this will be very important going forward, but we had not yet seen any real-world application demonstrating it.<\/p>\n<p>At the same time, it gave us many valuable insights into efficiency, scalability, and user experience. We were also curious (or, admittedly, a bit nervous) to find out whether it would be fast enough to provide a good user experience.<\/p>\n<p>We also wanted to highlight some useful Vespa features, such as:<\/p>\n<ul>\n<li><a href=\"https:\/\/docs.vespa.ai\/en\/phased-ranking.html\">Phased ranking<\/a><\/li>\n<li>Type-ahead suggestions<\/li>\n<li>Multi-vector MaxSim computation<\/li>\n<\/ul>\n<h3>2. 
Create an open-source template<\/h3>\n<p>We wanted to provide a template that others can use to build their own Visual RAG applications.<\/p>\n<p>The template should be <strong>simple<\/strong> enough for others to use, without requiring mastery of many specific programming languages or frameworks.<\/p>\n<h2>Creating the dataset<\/h2>\n<p>For our demo, we wanted a dataset of PDF documents in which much of the important information is presented as images, tables, and charts. We also needed a dataset large enough to show that uploading all images directly to a VLM (skipping the retrieval step) is not feasible.<\/p>\n<p>With <code>gemini-1.5-flash-8b<\/code>, the current maximum number of input images is 3600.<\/p>\n<p>Since no public dataset met our needs, we decided to create our own.<\/p>\n<p>As proud Norwegians, we were pleased to find that the Norwegian Government Pension Fund Global (GPFG, also known as the Oil Fund) has published its annual reports and governance documents on its website since 2000. The website does not mention copyright, and its recent <a 
href=\"https:\/\/www.nbim.no\/en\/the-fund\/news-list\/2024\/the-worlds-most-transparent-fund\/\">statement<\/a> indicates that it is the most transparent fund in the world, so we were confident that we could use this data for the demo.<\/p>\n<p>The dataset consists of 116 different PDF reports from 2000 to 2024, totaling 6992 pages.<\/p>\n<p>The dataset, which includes images, text, URLs, page numbers, generated questions, queries, and ColPali embeddings, is published <a href=\"https:\/\/huggingface.co\/datasets\/vespa-engine\/gpfg-QA\/\">here<\/a>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-14095\" title=\"Visual RAG over PDFs with Vespa - A Python-Based Demo Application-4\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/6715b79b8487da1.png\" alt=\"Visual RAG over PDFs with Vespa - A Python-Based Demo Application-4\" width=\"1592\" height=\"886\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/6715b79b8487da1.png 1592w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/6715b79b8487da1-300x167.png 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/6715b79b8487da1-1024x570.png 1024w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/6715b79b8487da1-768x427.png 768w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/6715b79b8487da1-1536x855.png 1536w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/6715b79b8487da1-18x10.png 18w\" sizes=\"auto, (max-width: 1592px) 100vw, 1592px\" 
\/><\/p>\n<h3>Generating synthetic queries and questions<\/h3>\n<p>We also generated synthetic queries and questions for each page. These serve two purposes:<\/p>\n<ol>\n<li>Providing type-ahead suggestions in the search box as the user types.<\/li>\n<li>Evaluation.<\/li>\n<\/ol>\n<p>The prompt we used to generate the questions and queries comes from <a href=\"https:\/\/danielvanstrien.xyz\/posts\/post-with-code\/colpali\/2024-09-23-generate_colpali_dataset.html#an-update-retrieval-focused-prompt\">this great blog post by Daniel van Strien<\/a>.<\/p>\n<pre><code>You are an investor, stock analyst and financial expert. You will be presented with an image of a page from a report published by the Norwegian Government Pension Fund Global (GPFG). The report may be an annual or quarterly report, or a policy report on topics such as responsible investment, risk, etc.\r\nYour task is to generate retrieval queries and questions that could be used to retrieve this document (or to ask questions based on this document) in a large corpus.\r\nPlease generate three different types of retrieval queries and questions.\r\nA retrieval query is a keyword-based query, made up of 2-5 
words, used to find this document in a search engine.\r\nA question is a natural language question for which the document contains the answer.\r\nThe query types are as follows:\r\n1. A broad topical query: covers the main subject of the document.\r\n2. A specific detail query: covers a specific detail or aspect of the document.\r\n3. A visual element query: covers a visual element of the document, such as a chart, graph, or image.\r\nImportant guidelines:\r\n- Ensure the queries are relevant for retrieval tasks, not just descriptions of the page content.\r\n- Use a fact-based natural language style for the questions.\r\n- Frame the queries as if someone is searching for this document in a large corpus.\r\n- Make the queries diverse and representative of different search strategies.\r\nFormat your response as a JSON object with the following structure:\r\n{\r\n\"broad_topical_question\": \"What was the responsible investment policy in 2019?\",\r\n\"broad_topical_query\": \"responsible investment policy 2019\",\r\n\"specific_detail_question\": \"What is the percentage of investments in renewable energy?\",\r\n\"specific_detail_query\": \"renewable energy investment percentage\",\r\n\"visual_element_question\": \"What is the trend of total holding value over time?\",\r\n\"visual_element_query\": 
\"total holding value trend\"\r\n}\r\nIf there are no relevant visual elements, provide an empty string for the visual element question and query.\r\nHere is the document image to analyze:\r\nGenerate the queries based on this image and provide the response in the specified JSON format.\r\nOnly return JSON. Don't return any extra explanation text.\r\n<\/code><\/pre>\n<p>We used <code>gemini-1.5-flash-8b<\/code> to generate the questions and queries.<\/p>\n<p><strong>Note<\/strong><\/p>\n<p>On the first run, we found that some very long questions were generated, so we added <code>maxOutputTokens=500<\/code> to the <a href=\"https:\/\/ai.google.dev\/api\/generate-content#generationconfig\">generationconfig<\/a>, which helped a lot.<\/p>\n<p>We also noticed some oddities in the generated questions and queries, such as \u201cstring\u201d appearing several times in the questions. We would definitely like to validate the generated questions and queries more thoroughly.<\/p>\n<h2>Python all the way<\/h2>\n<p>Our target audience is the growing data science and AI community. This community is likely one of the main reasons why, in GitHub's <a href=\"https:\/\/github.blog\/news-insights\/octoverse\/octoverse-2024\/\">Octoverse 
report<\/a>, Python was ranked as the most popular (and fastest-growing) programming language.<\/p>\n<p>We need Python in the backend for query embedding inference (using the <a href=\"https:\/\/github.com\/illuin-tech\/colpali\">colpali-engine<\/a> library) until Vespa natively supports a <code>ColpaliEmbedder<\/code> (work in progress, see the <a href=\"https:\/\/github.com\/vespa-engine\/vespa\/issues\/32389\">GitHub issue<\/a>). Using another language (and its framework) for the frontend would add complexity to the project and make it harder for others to reproduce the application.<\/p>\n<p>We therefore decided to build the whole application in Python.<\/p>\n<h3>Choosing a frontend framework<\/h3>\n<h4>Streamlit and Gradio<\/h4>\n<p>We acknowledge that it is very easy to build a simple PoC (proof of concept) with Gradio and Streamlit, and we have used them for that purpose in the past. But two main reasons made us decide against them:<\/p>\n<ol>\n<li>We needed a professional-looking UI that could be used in production.<\/li>\n<li>We needed good performance. Waiting several seconds, or having the UI 
freeze intermittently, is not good enough for the application we wanted to showcase.<\/li>\n<\/ol>\n<p>While we like to work out, we do not like the \u201cRunning\u201d message in the top-right corner of the Streamlit screen.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-14098\" title=\"Visual RAG over PDFs with Vespa - A Python-Based Demo Application-4\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/f663f54deb55bb1.png\" alt=\"Visual RAG over PDFs with Vespa - A Python-Based Demo Application-4\" width=\"220\" height=\"40\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/f663f54deb55bb1.png 220w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/f663f54deb55bb1-18x3.png 18w\" sizes=\"auto, (max-width: 220px) 100vw, 220px\" \/><\/p>\n<h4>FastHTML to the rescue<\/h4>\n<p>We are big fans of <a href=\"https:\/\/www.answer.ai\/\">answer.ai<\/a>, so when they released <a href=\"https:\/\/fastht.ml\/\">FastHTML<\/a> earlier in 2024, we were excited to give it a try.<\/p>\n<p>FastHTML is a framework for building modern web applications in pure Python. According to its <a href=\"https:\/\/about.fastht.ml\/vision\">vision<\/a>:<\/p>\n<p>FastHTML is a general-purpose full-stack web programming system; Django, NextJS, and Ruby on Rails 
fall into the same category. Its vision is to be the easiest way to create quick prototypes, and also the easiest way to create scalable, powerful, rich applications.<\/p>\n<p>Under the hood, FastHTML uses <a href=\"https:\/\/www.starlette.io\/\">starlette<\/a> and <a href=\"https:\/\/www.uvicorn.org\/\">uvicorn<\/a>.<\/p>\n<p>It comes with <a href=\"https:\/\/picocss.com\/\">Pico CSS<\/a> for styling. Since Leandro, an experienced web developer on our team, wanted to try Tailwind CSS, and we had recently discovered <a href=\"https:\/\/shad4fasthtml.com\/\">shad4fast<\/a>, we decided to combine FastHTML with the good-looking UI components from <a href=\"https:\/\/ui.shadcn.com\/\">shadcn\/ui<\/a>.<\/p>\n<h3>Pyvespa<\/h3>\n<p>Our Vespa Python client, <a href=\"https:\/\/pyvespa.readthedocs.io\/en\/latest\/\">pyvespa<\/a>, has traditionally been used mainly for prototyping Vespa applications. Recently, however, we have been working to support more Vespa features through pyvespa. It now supports deploying to production, and we have added support for advanced configuration of the Vespa <code>services.xml<\/code> file through pyvespa. See <a href=\"https:\/\/pyvespa.readthedocs.io\/en\/latest\/advanced-configuration.html\">this<\/a> notebook for examples and details.<\/p>\n<p>As a result, as long as no custom Java components are needed, most Vespa 
applications can be built with pyvespa.<\/p>\n<p><strong>Fun fact:<\/strong><\/p>\n<p>The advanced configuration feature in pyvespa was actually inspired by the way FastHTML wraps <code>ft<\/code> components and converts them to HTML tags. In pyvespa, we do something similar with <code>vt<\/code> components, converting them to Vespa <code>services.xml<\/code> tags. Interested readers can check out <a href=\"https:\/\/github.com\/vespa-engine\/pyvespa\/pull\/915\">this PR<\/a> for details. This approach saved us a lot of work compared to implementing custom classes for every supported tag.<\/p>\n<p>In addition, building a Vespa application with pyvespa let us validate pyvespa itself in practice.<\/p>\n<h3>Hardware<\/h3>\n<p>Because the Vespa-native ColPali embedder is still a <a href=\"https:\/\/github.com\/vespa-engine\/vespa\/issues\/32389\">WIP<\/a>, we knew we needed a GPU for inference. From our experiments in Colab, we concluded that a T4 instance would be sufficient.<\/p>\n<p>To create the embeddings for the PDF pages in the dataset before feeding them to Vespa, we considered using a serverless GPU provider (<a 
href=\"https:\/\/modal.com\/\">Modal<\/a> is one of our favorites). However, since the dataset is \u201conly\u201d 6992 pages, we used a MacBook M2 Pro, which created the embeddings in 5-6 hours.<\/p>\n<h3>Hosting<\/h3>\n<p>There are many options here. We could have gone with a traditional cloud provider such as AWS, GCP, or Azure, but that would have required more effort to set up and manage the infrastructure, and would have made it harder for others to replicate the application.<\/p>\n<p>We learned that <a href=\"https:\/\/huggingface.co\/spaces\">Hugging Face Spaces<\/a> offers a hosting service where GPUs can be added as needed. They also provide a one-click \u201cClone this Space\u201d button that makes it very easy for others to replicate the application.<\/p>\n<p>We found that <a href=\"https:\/\/www.kdjingpai.com\/de\/answerai\/\">answer.ai<\/a> had created a <a href=\"https:\/\/github.com\/AnswerDotAI\/fasthtml-hf\">reusable library<\/a> for deploying FastHTML applications on Hugging Face Spaces. On closer inspection, however, we found that their approach uses the Docker SDK for Spaces, and that there is actually a simpler way.<\/p>\n<p>Namely, by using <a 
href=\"https:\/\/huggingface.co\/docs\/hub\/en\/spaces-sdks-python\">Custom Python Spaces<\/a>.<\/p>\n<p>According to the <a href=\"https:\/\/huggingface.co\/docs\/hub\/en\/spaces-sdks-python\">huggingface-hub documentation<\/a>:<\/p>\n<p>While not an official workflow, you can run your own Python + interface stack in Spaces by selecting Gradio as your SDK and serving a frontend on port 7860.<\/p>\n<p><strong>Fun fact 2:<\/strong> There was a typo in the documentation, which stated that the port to serve on was <code>7680<\/code>. Luckily, it did not take us long to figure out that the correct port is <code>7860<\/code>, and we submitted a <a href=\"https:\/\/github.com\/huggingface\/hub-docs\/pull\/1436\">PR<\/a>, merged by Hugging Face CTO Julien Chaumond, to fix it. Bucket-list item checked!<\/p>\n<h2>Vision language model<\/h2>\n<p>For the \u201cgeneration\u201d part of Visual RAG, we need a vision language model (VLM) to generate a response based on the top-k ranked documents retrieved from Vespa.<\/p>\n<p>Vespa has native support for <a href=\"https:\/\/docs.vespa.ai\/en\/llms-in-vespa.html\">LLMs<\/a> (large language models), both external and inline, but VLMs (vision language models) are not yet natively supported in Vespa.<\/p>\n<p>Over the past year, OpenAI, Anthropic, and Google 
\u90fd\u53d1\u5e03\u4e86\u4f18\u79c0\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\uff0c\u8fd9\u4e00\u9886\u57df\u53d1\u5c55\u8fc5\u901f\u3002\u51fa\u4e8e\u6027\u80fd\u8003\u8651\uff0c\u6211\u4eec\u5e0c\u671b\u9009\u62e9\u4e00\u4e2a\u8f83\u5c0f\u7684\u6a21\u578b\uff0c\u9274\u4e8e Google \u7684 <a href=\"https:\/\/www.kdjingpai.com\/de\/google-ai-studioai\/\">Gemini API<\/a> \u6700\u8fd1\u6539\u8fdb\u4e86\u5f00\u53d1\u8005\u4f53\u9a8c\uff0c\u6211\u4eec\u51b3\u5b9a\u5728\u8fd9\u4e2a\u6f14\u793a\u4e2d\u4f7f\u7528\u00a0<code>gemini-1.5-flash-8b<\/code>\u3002<\/p>\n<p>\u5f53\u7136\uff0c\u5728\u751f\u4ea7\u73af\u5883\u4e2d\u9009\u62e9\u6a21\u578b\u4e4b\u524d\uff0c\u5efa\u8bae\u5bf9\u4e0d\u540c\u6a21\u578b\u8fdb\u884c\u91cf\u5316\u8bc4\u4f30\uff0c\u4f46\u8fd9\u8d85\u51fa\u4e86\u672c\u9879\u76ee\u7684\u8303\u56f4\u3002<\/p>\n<h2>\u67b6\u6784<\/h2>\n<p>\u6709\u4e86\u6280\u672f\u6808\u540e\uff0c\u6211\u4eec\u53ef\u4ee5\u5f00\u59cb\u6784\u5efa\u5e94\u7528\u7a0b\u5e8f\u4e86\u3002\u5e94\u7528\u7a0b\u5e8f\u7684\u9ad8\u5c42\u67b6\u6784\u5982\u4e0b\uff1a<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-14102\" title=\"\u4f7f\u7528 Vespa \u5b9e\u73b0 PDF \u7684\u89c6\u89c9 RAG - \u4e00\u4e2a\u57fa\u4e8e Python \u7684\u6f14\u793a\u5e94\u7528-5\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/945020f7f64d1bb.png\" alt=\"\u4f7f\u7528 Vespa \u5b9e\u73b0 PDF \u7684\u89c6\u89c9 RAG - \u4e00\u4e2a\u57fa\u4e8e Python \u7684\u6f14\u793a\u5e94\u7528-5\" width=\"983\" height=\"640\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/945020f7f64d1bb.png 983w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/945020f7f64d1bb-300x195.png 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/945020f7f64d1bb-768x500.png 768w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/945020f7f64d1bb-18x12.png 18w\" sizes=\"auto, (max-width: 983px) 100vw, 983px\" \/><\/p>\n<h2>Vespa 
application<\/h2>\n<p>The key components of a Vespa application are:<\/p>\n<ul>\n<li>A document <a href=\"https:\/\/docs.vespa.ai\/en\/reference\/schema-reference.html\">schema definition<\/a> with fields and types.<\/li>\n<li><a href=\"https:\/\/docs.vespa.ai\/en\/ranking.html\">Rank profile<\/a> definitions.<\/li>\n<li>A <a href=\"https:\/\/docs.vespa.ai\/en\/application-packages.html#services.xml\"><code>services.xml<\/code><\/a> configuration file.<\/li>\n<\/ul>\n<p>All of these <em>can<\/em> be defined in Python using pyvespa, but we recommend also inspecting the generated configuration files, which you can write out by calling <code>app.package.to_files()<\/code>. See the <a href=\"https:\/\/pyvespa.readthedocs.io\/en\/latest\/reference-api.html#vespa.package.ApplicationPackage.to_files\">pyvespa documentation<\/a> for details.<\/p>\n<h3>Ranking configuration<\/h3>\n<p>One of Vespa's most underrated features is <a href=\"https:\/\/docs.vespa.ai\/en\/phased-ranking.html\">phased ranking<\/a>. It lets you define multiple rank profiles, each with different (or inherited) ranking phases, which can run on the content nodes (first-phase and second-phase) or on the container nodes (<a href=\"https:\/\/docs.vespa.ai\/en\/reference\/schema-reference.html#globalphase-rank\">global-phase<\/a>).<\/p>\n<p>This lets us handle many different use cases separately and find the right balance between latency, cost, and quality for each of them.<\/p>\n<p>See <a href=\"https:\/\/thenewstack.io\/architecture-inversion-scale-by-moving-computation-not-data\/\">this blog post<\/a> by our CEO, Jon Bratseth, on architecture inversion, that is, scaling by moving computation to the data.<\/p>\n<p>For this application, we defined three rank profiles:<\/p>\n<p><strong>Note:<\/strong> the <strong>retrieval<\/strong> phase is specified at query time in <a href=\"https:\/\/docs.vespa.ai\/en\/query-language.html\">YQL<\/a>, whereas the <strong>ranking strategy<\/strong> is specified in the rank profile (which is part of the application package supplied at deployment time).<\/p>\n<h4>1. Pure ColPali<\/h4>\n<p>In our application, the YQL used for this ranking mode is:<\/p>\n<pre><code>select title, text from pdf_page where {targetHits:100}nearestNeighbor(embedding,rq{i}) OR {targetHits:100}nearestNeighbor(embedding,rq{i+1}) .. {targetHits:100}nearestNeighbor(embedding,rq{n}) OR userQuery();\n<\/code><\/pre>\n<p>Here, <code>rq{i}<\/code> is the i-th <a href=\"https:\/\/www.kdjingpai.com\/de\/tokenization\/\">token<\/a> of the query (it must be supplied as a parameter in the HTTP request), and <code>n<\/code> is the maximum number of query tokens used for retrieval (64 in this application).<\/p>\n<p>We also raised the <code>hnsw.exploreAdditionalHits<\/code> parameter to 300 to reduce the risk of missing relevant hits in the retrieval phase. Note that this comes at a performance cost.<\/p>\n<p>This rank profile uses the <code>max_sim_binary<\/code> rank expression, which takes advantage of Vespa's optimized Hamming-distance computation (see <a href=\"https:\/\/blog.vespa.ai\/scaling-colpali-to-billions\/\">Scaling ColPali to billions<\/a> for details). It is used in first-phase ranking, and the top 100 hits are then re-ranked using the full float representation of the ColPali embeddings.<\/p>\n<h4>2. 
Pure text-based ranking (BM25)<\/h4>\n<p>In this case, we retrieve documents using <code>weakAnd<\/code> only.<\/p>\n<pre><code>select title, text from pdf_page where userQuery();\n<\/code><\/pre>\n<p>For ranking, we use <a href=\"https:\/\/www.kdjingpai.com\/de\/bm25\/\">bm25<\/a> in the first phase (there is no second phase).<\/p>\n<p>Note that for the best ranking quality we would most likely want to combine text-based and vision-based ranking features (for example with <a href=\"https:\/\/docs.vespa.ai\/en\/phased-ranking.html#cross-hit-normalization-including-reciprocal-rank-fusion\">reciprocal rank fusion<\/a>), but in this demo we wanted to show the difference between them rather than find the optimal combination.<\/p>\n<h4>3. 
Hybrid BM25 + ColPali<\/h4>\n<p>For retrieval, we use the same YQL as in the pure ColPali rank profile.<\/p>\n<p>We noticed that for some queries, especially short ones, pure ColPali matched many pages with no text (images only), while many of the answers we were looking for were actually on pages with text.<\/p>\n<p>To address this, we added a second-phase rank expression that combines the BM25 score and the ColPali score as a linear combination (<code>max_sim + 2 * (bm25(title) + bm25(text))<\/code>).<\/p>\n<p>These weights are a simple heuristic; running ranking experiments to find the best weights for the different features would be worthwhile.<\/p>\n<h3>Snippet generation in Vespa<\/h3>\n<p>Search frontends commonly show an excerpt of the source text with some words in <strong>bold<\/strong> (highlighted).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-14101\" title=\"\u4f7f\u7528 Vespa \u5b9e\u73b0 PDF \u7684\u89c6\u89c9 RAG - \u4e00\u4e2a\u57fa\u4e8e Python \u7684\u6f14\u793a\u5e94\u7528-6\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/67be68a348da3b8.png\" alt=\"\u4f7f\u7528 Vespa \u5b9e\u73b0 PDF \u7684\u89c6\u89c9 RAG - \u4e00\u4e2a\u57fa\u4e8e Python \u7684\u6f14\u793a\u5e94\u7528-6\" width=\"2576\" height=\"1624\" 
srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/67be68a348da3b8.png 2576w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/67be68a348da3b8-300x189.png 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/67be68a348da3b8-1024x646.png 1024w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/67be68a348da3b8-768x484.png 768w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/67be68a348da3b8-1536x968.png 1536w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/67be68a348da3b8-2048x1291.png 2048w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/67be68a348da3b8-18x12.png 18w\" sizes=\"auto, (max-width: 2576px) 100vw, 2576px\" \/><\/p>\n<p>Snippets that show the matched query terms in context let users quickly judge whether a result is likely to meet their information need.<\/p>\n<p>In Vespa this feature is called \u201cdynamic snippets\u201d, and several parameters can be tuned, such as how much surrounding context to include and which tags to use for highlighting the matched terms.<\/p>\n<p>In this demo we show both the snippet and the full extracted text of the page for comparison.<br \/>\nTo reduce visual noise in the results, we remove stopwords (and, in, the, etc.) from the user query so that they are not highlighted.<\/p>\n<p><a href=\"https:\/\/docs.vespa.ai\/en\/document-summaries.html#dynamic-snippets\">Learn more about dynamic snippets in Vespa.<\/a><\/p>\n<h3>Vespa 
query suggestions<\/h3>\n<p>A common search feature is \u201csearch suggestions\u201d, shown as the user types.<br \/>\nSuggestions are usually precomputed from real user queries, but here we had no user traffic to analyze.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-14099\" title=\"\u4f7f\u7528 Vespa \u5b9e\u73b0 PDF \u7684\u89c6\u89c9 RAG - \u4e00\u4e2a\u57fa\u4e8e Python \u7684\u6f14\u793a\u5e94\u7528-7\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/1313181a9d0cb18.png\" alt=\"\u4f7f\u7528 Vespa \u5b9e\u73b0 PDF \u7684\u89c6\u89c9 RAG - \u4e00\u4e2a\u57fa\u4e8e Python \u7684\u6f14\u793a\u5e94\u7528-7\" width=\"2096\" height=\"844\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/1313181a9d0cb18.png 2096w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/1313181a9d0cb18-300x121.png 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/1313181a9d0cb18-1024x412.png 1024w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/1313181a9d0cb18-768x309.png 768w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/1313181a9d0cb18-1536x619.png 1536w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/1313181a9d0cb18-2048x825.png 2048w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/1313181a9d0cb18-18x7.png 18w\" sizes=\"auto, (max-width: 2096px) 100vw, 2096px\" \/><\/p>\n<p>Here, we use a simple substring search that matches the prefix typed by the user against relevant questions generated from the PDF pages, and offer the matches as suggestions.<\/p>\n<p>The YQL query we use to fetch these suggestions is:<\/p>\n<pre><code>select questions from pdf_page where questions matches (\".*{query}.*\")\n<\/code><\/pre>\n<p>A nice property of this approach is that any question that appears among the suggestions is known to have an answer in the indexed data!<\/p>\n<p>We could have made sure that the page a suggested question was generated from always appears in the top three results (by adding a similarity signal between the user query and the document's generated questions to the rank profile), but that would feel a bit like \u201ccheating\u201d when the point is to showcase what the ColPali model can do.<\/p>\n<h2>User experience<\/h2>\n<p>We were lucky to get excellent UX feedback from our Chief Scientist, <a href=\"https:\/\/x.com\/jobergum\">Jo Bergum<\/a>. He pushed us to make the UX \u201cfast and fluid\u201d. People are used to Google, so there is no doubt that speed matters for the user experience in search (and RAG). This is still somewhat underappreciated in the AI community, where many seem content to wait 5-10 seconds for a response. We wanted response times measured in milliseconds.<\/p>\n<p>Based on his feedback, we set up a phased request flow so that we do not have to wait for the full images and similarity-map tensors to come back from Vespa before showing results.<\/p>\n<p>The solution is to fetch only the most essential data first. For us that means just <code>title<\/code>, <code>url<\/code>, <code>text<\/code>, and <code>page_no<\/code>, plus a downscaled (blurred) version of the image (32&#215;32 pixels) for the initial result display. This lets us show results immediately and keep loading the full images and similarity maps in the background.<\/p>\n<p>The full UX flow is shown below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-14106\" title=\"\u4f7f\u7528 Vespa \u5b9e\u73b0 PDF \u7684\u89c6\u89c9 RAG - \u4e00\u4e2a\u57fa\u4e8e Python \u7684\u6f14\u793a\u5e94\u7528-1\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/d43e7a230634707.jpg\" alt=\"\u4f7f\u7528 Vespa \u5b9e\u73b0 PDF \u7684\u89c6\u89c9 RAG - \u4e00\u4e2a\u57fa\u4e8e Python \u7684\u6f14\u793a\u5e94\u7528-1\" width=\"3980\" height=\"2213\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/d43e7a230634707.jpg 3980w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/d43e7a230634707-300x167.jpg 300w, 
https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/d43e7a230634707-1024x569.jpg 1024w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/d43e7a230634707-768x427.jpg 768w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/d43e7a230634707-1536x854.jpg 1536w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/d43e7a230634707-2048x1139.jpg 2048w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/d43e7a230634707-18x10.jpg 18w\" sizes=\"auto, (max-width: 3980px) 100vw, 3980px\" \/>The main sources of latency are:<\/p>\n<ul>\n<li>Inference time for generating the ColPali embeddings (done on the GPU; depends on the number of tokens in the query)\n<ul>\n<li>We therefore decided to put the <code>@lru_cache<\/code> decorator on this function so that embeddings are not recomputed for repeated queries.<\/li>\n<\/ul>\n<\/li>\n<li>Network latency between Hugging Face Spaces and Vespa (including the TCP handshake)\n<ul>\n<li>Transfer time for the full images is also significant (about 0.5 MB each).<\/li>\n<li>The similarity-map tensors are even larger (<code>n_query_tokens<\/code> x <code>n_images<\/code> x 1030 patches x 128).<\/li>\n<\/ul>\n<\/li>\n<li>Creating the blended similarity-map images is a CPU-intensive task, but it runs as a multithreaded background task using the <code>@threaded<\/code> decorator from <code>fastcore<\/code>, and each image polls its own endpoint to check whether its similarity map is ready.<\/li>\n<\/ul>\n<h2>Stress testing<\/h2>\n<p>We were concerned about how the application would hold up under a traffic spike, so we ran a simple stress-test experiment. We copied the <code>\/fetch_results<\/code> request as a cURL command from the browser developer tools (with caching disabled) and ran it in a loop in 10 parallel terminals. (The <code>@lru_cache<\/code> decorator was disabled for this test.)<\/p>\n<h3>Results<\/h3>\n<p>Although very basic, this first test showed that the bottleneck for search throughput was computing the ColPali embeddings on the GPU in the Hugging Face Space, while the Vespa backend easily handled more than 20 queries per second at low resource utilization. We consider this more than enough for a demo. If we needed to scale, our first step would be a larger GPU instance for the Hugging Face Space.<\/p>\n<p>As the charts below show, the Vespa application performed well under load.<\/p>\n<p><img loading=\"lazy\" 
decoding=\"async\" class=\"aligncenter size-full wp-image-14096\" title=\"\u4f7f\u7528 Vespa \u5b9e\u73b0 PDF \u7684\u89c6\u89c9 RAG - \u4e00\u4e2a\u57fa\u4e8e Python \u7684\u6f14\u793a\u5e94\u7528-10\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/7883a7305bcbd39.png\" alt=\"\u4f7f\u7528 Vespa \u5b9e\u73b0 PDF \u7684\u89c6\u89c9 RAG - \u4e00\u4e2a\u57fa\u4e8e Python \u7684\u6f14\u793a\u5e94\u7528-10\" width=\"416\" height=\"234\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/7883a7305bcbd39.png 416w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/7883a7305bcbd39-300x169.png 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/7883a7305bcbd39-18x10.png 18w\" sizes=\"auto, (max-width: 416px) 100vw, 416px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-14097\" title=\"\u4f7f\u7528 Vespa \u5b9e\u73b0 PDF \u7684\u89c6\u89c9 RAG - \u4e00\u4e2a\u57fa\u4e8e Python \u7684\u6f14\u793a\u5e94\u7528-11\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/05c7e24700502a0.png\" alt=\"\u4f7f\u7528 Vespa \u5b9e\u73b0 PDF \u7684\u89c6\u89c9 RAG - \u4e00\u4e2a\u57fa\u4e8e Python \u7684\u6f14\u793a\u5e94\u7528-11\" width=\"1060\" height=\"160\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/05c7e24700502a0.png 1060w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/05c7e24700502a0-300x45.png 300w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/05c7e24700502a0-1024x155.png 1024w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/05c7e24700502a0-768x116.png 768w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2024\/11\/05c7e24700502a0-18x3.png 18w\" sizes=\"auto, (max-width: 1060px) 100vw, 1060px\" \/><\/p>\n<h2>Reflections on using FastHTML<\/h2>\n<p>Our main takeaway from using FastHTML is that it breaks down the barrier between frontend and backend development. The code is tightly integrated, so all of us could understand and contribute to every part of the application. That should not be underestimated.<\/p>\n<p>We really enjoyed being able to inspect the frontend code in the browser's developer tools and actually understand most of what we saw.<\/p>\n<p>Compared with using a separate frontend framework, development and deployment were significantly simpler.<\/p>\n<p>It let us manage <strong>all<\/strong> dependencies with <a href=\"https:\/\/docs.astral.sh\/uv\/\">uv<\/a>, which has greatly changed how we handle dependencies in Python.<\/p>\n<p><strong>Thomas's take:<\/strong><\/p>\n<p>As a developer with a data science and AI background who prefers Python but has also worked with several JS frameworks, my experience was very positive. I felt I could contribute more to frontend tasks without adding much complexity to the project. I really like being able to understand every part of the application.<\/p>\n<p><strong>Andreas's take:<\/strong><\/p>\n<p>I have worked on Vespa for a long time but have done little Python or frontend development. The first day or two felt a bit overwhelming, but working across the full stack and seeing the effect of my changes almost in real time was really exciting! With help from large language models, getting into an unfamiliar environment is easier than ever. I really like that we could compute the image patch similarities with a tensor expression inside Vespa (where the vectors are already in memory) and return them together with the search results, which let us create the similarity maps with lower latency and resource usage.<\/p>\n<p><strong>Leandro's take:<\/strong><\/p>\n<p>As a developer with a solid background in web development with React, JavaScript, TypeScript, HTML, and CSS, moving to FastHTML was relatively straightforward. The framework's direct mapping to HTML elements matched what I already knew, which flattened the learning curve. The main challenge was adapting to FastHTML's Python-based syntax, since it differs from the standard HTML\/JS structure.<\/p>\n<h2>Is vision all you need?<\/h2>\n<p>We have seen that token-level late-interaction embeddings from a vision language model (VLM) are very powerful for certain types of queries, but we do not see them as a silver bullet; they are more like a very valuable tool in the toolbox.<\/p>\n<p>Beyond ColPali, the past year has brought other innovations in visual retrieval. Two particularly interesting approaches are:<\/p>\n<ul>\n<li><a href=\"https:\/\/arxiv.org\/pdf\/2406.11251\">Document Screenshot Embeddings (DSE)<\/a><a href=\"https:\/\/blog.vespa.ai\/visual-rag-in-practice\/#fn:4\">5<\/a> &#8211; a bi-encoder model that produces dense embeddings of document screenshots and uses them for retrieval.<\/li>\n<li>IBM's <a href=\"https:\/\/research.ibm.com\/blog\/docling-generative-AI?t\">Docling<\/a> &#8211; a library for parsing many document types (PDF, PPT, DOCX, etc.) into Markdown, using computer vision models instead of OCR.<\/li>\n<\/ul>\n<p>Vespa supports combining these approaches and lets developers find the most attractive trade-off between latency, cost, and quality for their specific use case.<\/p>\n<p>We can imagine an application that combines high-quality text extraction from Docling or a similar tool, dense retrieval using document screenshot embeddings, and ranking based on text features together with <code>MaxSim<\/code> scores from a ColPali-like model. If you really want to push ranking quality, you could even combine all of these features in a GBDT model such as <a href=\"https:\/\/docs.vespa.ai\/en\/xgboost.html\">XGBoost<\/a> or <a href=\"https:\/\/docs.vespa.ai\/en\/lightgbm.html\">LightGBM<\/a>.<\/p>\n<p>So, while ColPali is a powerful tool for making information that is hard to extract as text retrievable, it is not a silver bullet and should be combined with other approaches for the best results.<\/p>\n<h2>The missing piece<\/h2>\n<p>Models are temporary, but evals are forever.<\/p>\n<p><em><a href=\"https:\/\/x.com\/charles_irl\/status\/1854911668309880935\">@charles_irl on X<\/a><\/em><\/p>\n<p>Adding automated evaluation was beyond the scope of this demo, but we strongly recommend creating an evaluation dataset for your own use case. You can bootstrap it with LLM-as-a-judge (see this <a href=\"https:\/\/blog.vespa.ai\/improving-retrieval-with-llm-as-a-judge\/\">blog post<\/a> on how we did this for <a href=\"https:\/\/search.vespa.ai\/\">search.vespa.ai<\/a>).<\/p>\n<p>Vespa 
offers many tunable parameters, and quantitative feedback from different experiments lets you find the most attractive trade-offs for your specific use case.<\/p>\n<h2>Conclusion<\/h2>\n<p>We have built a live demo application that shows how to perform visual RAG retrieval over PDFs in Vespa using ColPali embeddings.<\/p>\n<p>If you have read this far, you are probably interested in the code. You can find the code for the application <a href=\"https:\/\/github.com\/vespa-engine\/sample-apps\/tree\/master\/visual-retrieval-colpali\">here<\/a>.<\/p>\n<p>Now, go build your own visual RAG application!<\/p>\n<p>If you want to learn more about visual retrieval, ColPali, or Vespa, feel free to join the <a href=\"https:\/\/vespatalk.slack.com\/\">Vespa Slack community<\/a> to ask questions, get help from the community, or keep up with the latest Vespa developments.<\/p>\n<h2>FAQ<\/h2>\n<p><strong>Does using ColPali require a GPU at inference time?<\/strong><\/p>\n<p>Currently, we need a GPU to run inference on queries in a reasonable amount of time.<\/p>\n<p>Going forward, we expect ColPali-like 
models to improve in quality and efficiency (for example, with smaller embeddings), and we expect more such models to appear, just as we have seen with the ColBERT family. One example is <a href=\"https:\/\/huggingface.co\/answerdotai\/answerai-colbert-small-v1\">answerai-colbert-small-v1<\/a> from answer.ai, which already outperforms the original ColBERT model despite being less than a third of its size.<\/p>\n<p>See the <a href=\"https:\/\/blog.vespa.ai\/introducing-answerai-colbert-small\/\">Vespa blog<\/a> to learn how to use <code>answerai-colbert-small-v1<\/code> in Vespa.<\/p>\n<p><strong>Can ColPali be combined with query filters in Vespa?<\/strong><\/p>\n<p>Yes. In this application we added a <code>published_year<\/code> field to the pages, but we have not yet exposed it as a filter option in the frontend.<\/p>\n<p><strong>When will Vespa support ColPali embeddings natively?<\/strong><\/p>\n<p>See <a href=\"https:\/\/github.com\/vespa-engine\/vespa\/issues\/32389\">this GitHub issue<\/a>.<\/p>\n<p><strong>Does this scale to billions of documents?<\/strong><\/p>\n<p>Yes. Vespa scales horizontally and lets you tune the trade-offs between latency, cost, and quality for your specific use case.<\/p>\n<p><strong>Can this demo be adapted to work with ColQwen2 
as well?<\/strong><\/p>\n<p>Yes, but there are some differences in how the similarity maps are computed.<\/p>\n<p>See <a href=\"https:\/\/github.com\/tonywu71\/colpali-cookbooks\/blob\/main\/examples\/gen_colqwen2_similarity_maps.ipynb\">this notebook<\/a> as a starting point.<\/p>\n<p><strong>Can I run this demo with my own data?<\/strong><\/p>\n<p>Absolutely! By adapting the provided <a href=\"https:\/\/pyvespa.readthedocs.io\/en\/latest\/examples\/visual_pdf_rag_with_vespa_colpali_cloud.html\">notebook<\/a> to point to your data, you can set up your own Vespa application for visual RAG. You can also use the provided web app as a starting point for your own frontend.<\/p>\n<h2>References<\/h2>\n<ol>\n<li><a href=\"https:\/\/arxiv.org\/abs\/2407.01449v2\">ColPali: Efficient Document Retrieval with Vision Language Models<\/a><\/li>\n<li><a href=\"https:\/\/pyvespa.readthedocs.io\/en\/latest\/examples\/pdf-retrieval-with-ColQwen2-vlm_Vespa-cloud.html\">pyvespa notebook on using ColQwen2 with Vespa<\/a><\/li>\n<li><a href=\"https:\/\/blog.vespa.ai\/improving-retrieval-with-llm-as-a-judge\/\">Improving retrieval with LLM as a judge<\/a><\/li>\n<li><a href=\"https:\/\/blog.vespa.ai\/scaling-colpali-to-billions\/\">Scaling ColPali to billions<\/a><\/li>\n<li><a href=\"https:\/\/arxiv.org\/pdf\/2406.11251\">Document Screenshot Embeddings<\/a><\/li>\n<li><a href=\"https:\/\/research.ibm.com\/blog\/docling-generative-AI?t\">Docling<\/a><\/li>\n<li><a href=\"https:\/\/fastht.ml\/\">FastHTML<\/a><\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Thomas joined Vespa 
as a Senior Software Engineer in April 2024. In his last assignment as an AI consultant before that, he built a RAG application over a large collection of PDFs, based on Vespa. PDFs are everywhere in the enterprise world, and searching&#8230;<\/p>\n","protected":false},"author":1,"featured_media":31260,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[425,20,459],"tags":[243],"class_list":["post-14094","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-professional","category-tool","category-rag-project","tag-aizhishikuyukefu"],"_links":{"self":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/posts\/14094","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/comments?post=14094"}],"version-history":[{"count":0,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/posts\/14094\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/media\/31260"}],"wp:attachment":[{"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/media?parent=14094"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/categories?post=14094"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/de\/wp-json\/wp\/v2\/tags?post=14094"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}