{"id":21381,"date":"2025-02-17T10:19:12","date_gmt":"2025-02-17T02:19:12","guid":{"rendered":"https:\/\/www.aisharenet.com\/?p=21381"},"modified":"2025-05-22T15:57:54","modified_gmt":"2025-05-22T07:57:54","slug":"confident-ai","status":"publish","type":"post","link":"https:\/\/www.kdjingpai.com\/en\/confident-ai\/","title":{"rendered":"Confident AI"},"content":{"rendered":"<p>DeepEval is a simple, easy-to-use open-source LLM evaluation framework for evaluating and testing large language model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs with metrics such as G-Eval, hallucination detection, answer relevancy, and RAGAS. Whether your application is built with RAG or fine-tuning, DeepEval can help you determine the best hyperparameters to improve model performance. It can also generate synthetic datasets, integrate seamlessly into any CI\/CD environment, and red-team your application for more than 40 safety vulnerabilities. The framework is also fully integrated with Confident AI, which supports the entire evaluation lifecycle on the platform.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-21382\" title=\"Confident AI: automated LLM evaluation framework for comparing prompt output quality across large models-1\" src=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/02\/48499b0c1452fde.jpg\" alt=\"Confident 
AI: automated LLM evaluation framework for comparing prompt output quality across large models-1\" width=\"1080\" height=\"643\" srcset=\"https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/02\/48499b0c1452fde.jpg 1080w, https:\/\/www.kdjingpai.com\/wp-content\/uploads\/2025\/02\/48499b0c1452fde-768x457.jpg 768w\" sizes=\"auto, (max-width: 1080px) 100vw, 1080px\" \/><\/p>\n<p>&nbsp;<\/p>\n<h2>Feature List<\/h2>\n<ul>\n<li>A variety of LLM evaluation metrics, such as G-Eval, hallucination detection, answer relevancy, and RAGAS<\/li>\n<li>Support for custom evaluation metrics, which are automatically integrated into the DeepEval ecosystem<\/li>\n<li>Synthetic dataset generation for evaluation<\/li>\n<li>Seamless integration into any CI\/CD environment<\/li>\n<li>Red teaming that detects more than 40 safety vulnerabilities<\/li>\n<li>Benchmarking, with support for MMLU, HellaSwag, DROP, and other benchmarks<\/li>\n<li>Full integration with Confident AI, covering the entire evaluation lifecycle from dataset creation to debugging evaluation results<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>Usage Guide<\/h2>\n<h3>Installation<\/h3>\n<p>You can install DeepEval with pip:<\/p>\n<pre><code>pip install -U deepeval\r\n<\/code><\/pre>\n<p>Creating an account is recommended so that you can generate shareable cloud-based test reports:<\/p>\n<pre><code>deepeval login\r\n<\/code><\/pre>\n<h3>Writing a Test Case<\/h3>\n<p>Create a test file:<\/p>\n<pre><code>touch 
test_chatbot.py\r\n<\/code><\/pre>\n<p>Write your first test case in <code>test_chatbot.py<\/code>:<\/p>\n<pre><code>import pytest\r\nfrom deepeval import assert_test\r\nfrom deepeval.metrics import GEval\r\nfrom deepeval.test_case import LLMTestCase, LLMTestCaseParams\r\n\r\ndef test_case():\r\n    # G-Eval metric that scores the actual output against the expected output\r\n    correctness_metric = GEval(\r\n        name=\"Correctness\",\r\n        criteria=\"Determine if the 'actual output' is correct based on the 'expected output'.\",\r\n        evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],\r\n        threshold=0.5\r\n    )\r\n    test_case = LLMTestCase(\r\n        input=\"What if these shoes don't fit?\",\r\n        actual_output=\"We offer a 30-day full refund at no extra costs.\",\r\n        # Illustrative reference answer; the criteria above compare against it\r\n        expected_output=\"You have 30 days to get a full refund at no extra cost.\",\r\n        retrieval_context=[\"All customers are eligible for a 30 day full refund at no extra costs.\"]\r\n    )\r\n    assert_test(test_case, [correctness_metric])\r\n<\/code><\/pre>\n<p>Set your <code>OPENAI_API_KEY<\/code> as an environment variable:<\/p>\n<pre><code>export OPENAI_API_KEY=\"...\"\r\n<\/code><\/pre>\n<p>Run the test file from the CLI:<\/p>\n<pre><code>deepeval test run test_chatbot.py\r\n<\/code><\/pre>\n<h3>Using Standalone Metrics<\/h3>\n<p>DeepEval is highly modular, so anyone can easily use any of its metrics on their own:<\/p>\n<pre><code>from deepeval.metrics import AnswerRelevancyMetric\r\nfrom deepeval.test_case import LLMTestCase\r\n\r\nanswer_relevancy_metric = AnswerRelevancyMetric(threshold=0.7)\r\ntest_case = LLMTestCase(\r\n    input=\"What if these shoes don't fit?\",\r\n    actual_output=\"We offer a 30-day full refund at no extra costs.\",\r\n    retrieval_context=[\"All customers are eligible for a 30 day full refund at no extra 
costs.\"]\r\n)\r\n\r\n# Run the metric directly, then inspect its score and reasoning\r\nanswer_relevancy_metric.measure(test_case)\r\nprint(answer_relevancy_metric.score)\r\nprint(answer_relevancy_metric.reason)\r\n<\/code><\/pre>\n<h3>Evaluating a Dataset in Bulk<\/h3>\n<p>In DeepEval, a dataset is simply a collection of test cases. Here is how to evaluate one in bulk:<\/p>\n<pre><code>import pytest\r\nfrom deepeval import assert_test\r\nfrom deepeval.metrics import HallucinationMetric, AnswerRelevancyMetric\r\nfrom deepeval.test_case import LLMTestCase\r\nfrom deepeval.dataset import EvaluationDataset\r\n\r\nfirst_test_case = LLMTestCase(input=\"...\", actual_output=\"...\", context=[\"...\"])\r\nsecond_test_case = LLMTestCase(input=\"...\", actual_output=\"...\", context=[\"...\"])\r\ndataset = EvaluationDataset(test_cases=[first_test_case, second_test_case])\r\n\r\n# Each test case in the dataset becomes its own pytest case\r\n@pytest.mark.parametrize(\"test_case\", dataset)\r\ndef test_customer_chatbot(test_case: LLMTestCase):\r\n    hallucination_metric = HallucinationMetric(threshold=0.3)\r\n    answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5)\r\n    assert_test(test_case, [hallucination_metric, answer_relevancy_metric])\r\n<\/code><\/pre>\n<p>Run the test file from the CLI:<\/p>\n<pre><code>deepeval test run test_&lt;filename&gt;.py -n 4\r\n<\/code><\/pre>\n<h3>LLM Evaluation with Confident AI<\/h3>\n<p>Log in to the DeepEval platform:<\/p>\n<pre><code>deepeval login\r\n<\/code><\/pre>\n<p>Run the test file:<\/p>\n<pre><code>deepeval test run 
test_chatbot.py\r\n<\/code><\/pre>\n<p>Once the tests finish, the CLI displays a link. Paste it into your browser to view the results.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>DeepEval is a simple, easy-to-use open-source LLM evaluation framework for evaluating and testing large language model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research, using metrics such as G-Eval, hallucination detection, answer relevancy, and RAGAS to evaluate LLM outputs&#8230;<\/p>\n","protected":false},"author":1,"featured_media":61853,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[404,501],"tags":[230,227],"class_list":["post-21381","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-chat","category-prompt-aids","tag-aikaiyuanxiangmu","tag-promptstishizhilinga"],"_links":{"self":[{"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/posts\/21381","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/comments?post=21381"}],"version-history":[{"count":0,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/posts\/21381\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/media\/61853"}],"wp:attachment":[{"href":"https:\/\/www.kdjingp
ai.com\/en\/wp-json\/wp\/v2\/media?parent=21381"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/categories?post=21381"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kdjingpai.com\/en\/wp-json\/wp\/v2\/tags?post=21381"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}