Zerank-1 is a state-of-the-art reranker model developed by ZeroEntropy. It acts as a "second filter" in information-retrieval and semantic-search systems: an initial retrieval stage (e.g., a vector search) quickly identifies a set of potentially relevant content from a large library of documents, and Zerank-1 then analyzes and re-ranks that set to produce results that are more precise and better aligned with the user's query intent. The model's core advantage is that it processes the query and each document together, capturing the deeper and more subtle semantic associations between the two. On publicly available benchmarks, Zerank-1 outperforms comparable models in several areas, significantly improving the accuracy of the search system.
Function List
- Relevance scoring: The model takes a query-document pair as input and outputs a score representing how relevant the document is to the query.
- Result reranking: After a first-stage search (e.g., vector or keyword search), the model re-ranks the candidate documents so that the most relevant ones appear at the top.
- Improved search accuracy: Fine-grained reranking filters out distracting or marginally relevant items from the initial results, improving the quality of the information ultimately delivered to the user or to a large language model (LLM).
- Cross-domain applicability: Strong performance across a wide range of specialized domains, including finance, law, medicine, code, and conversation.
- RAG support: Supplying more precise context during Retrieval-Augmented Generation (RAG) significantly improves the accuracy and relevance of the content generated by large language models.
Using Help
What is Reranking?
Before describing how Zerank-1 can be used, it is important to understand where it fits into the overall search process. A modern intelligent search system is usually divided into two stages:
- Recall/Retrieval: This is the first stage. The system uses a fast, efficient method (e.g., the keyword-based BM25 algorithm or vector-based similarity search) to quickly pull hundreds or thousands of potentially relevant documents from a large database. The goal of this stage is breadth: cast a wide net so that as few relevant documents as possible are missed, which makes speed the primary concern.
- Reranking: This is the second stage. Because the first stage prioritizes speed and breadth, its results may not be precise enough. A reranking model such as Zerank-1 steps in here: it evaluates each recalled document in a fine-grained way, computes a precise relevance score for the document against the query, and re-orders the documents by that score. This approach is more computationally intensive and slower, but far more accurate.
Zerank-1 is a Cross-Encoder model specialized for the second stage.
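The two-stage pipeline described above can be sketched in a few lines. Both scoring functions below are toy token-overlap heuristics standing in for a real first-stage retriever (e.g., BM25) and for a cross-encoder such as Zerank-1; only the control flow, not the scoring, is meant to be representative.

```python
def tokens(text: str) -> set[str]:
    return set(text.lower().split())

def coarse_score(query: str, doc: str) -> float:
    # Stage 1 stand-in: a cheap overlap count, applied to the whole corpus.
    return len(tokens(query) & tokens(doc))

def fine_score(query: str, doc: str) -> float:
    # Stage 2 stand-in: a (still toy) normalized overlap,
    # applied only to the recalled candidates.
    overlap = len(tokens(query) & tokens(doc))
    return overlap / max(len(tokens(doc)), 1)

def search(query: str, corpus: list[str], recall_k: int = 3, final_k: int = 2) -> list[str]:
    # Stage 1: recall the top `recall_k` candidates quickly.
    candidates = sorted(corpus, key=lambda d: coarse_score(query, d), reverse=True)[:recall_k]
    # Stage 2: rerank only those candidates with the expensive scorer.
    return sorted(candidates, key=lambda d: fine_score(query, d), reverse=True)[:final_k]
```

In a real deployment, `coarse_score` would be an index lookup and `fine_score` a call to the reranker; the key point is that the expensive scorer only ever sees the small candidate set.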
How to use Zerank-1 in Python
With the Hugging Face `sentence-transformers` library, it is very easy to use Zerank-1 in your projects.
Step 1: Install the necessary libraries
If you don't already have `sentence-transformers` and `torch` installed in your environment, install them with `pip`:

```shell
pip install -U sentence-transformers torch
```
Step 2: Load the model
Use the `CrossEncoder` class to load the `zeroentropy/zerank-1` model from the Hugging Face Hub. The first time you run the program, the model files are automatically downloaded to the local cache.
```python
from sentence_transformers import CrossEncoder

# Load the model; trust_remote_code=True is required
model = CrossEncoder("zeroentropy/zerank-1", trust_remote_code=True)
```
Step 3: Prepare queries and documents
You need to prepare a query and a set of documents to be ranked. The input format is a list of tuples, each in the form (query, document).
For example, suppose you have a query and several documents retrieved from the first stage:
```python
query = "What are the main causes of global warming?"
documents = [
    "Variations in the solar activity cycle are a natural factor in Earth's climate fluctuations.",
    "According to IPCC reports, human activity, especially greenhouse gas emissions, has been the dominant driver of the global warming observed since the mid-20th century.",
    "Volcanic eruptions release aerosols into the atmosphere, which can lower surface temperatures in the short term.",
    "Burning fossil fuels (such as coal, oil, and natural gas) to power transportation and electricity generation releases large amounts of carbon dioxide."
]
```
Step 4: Make predictions and get scores
Combine the query and documents into the input format required by the model, then call the `model.predict()` method.
```python
# Build input pairs in (query, document) format
query_document_pairs = [(query, doc) for doc in documents]

# Predict relevance scores with the model
scores = model.predict(query_document_pairs)
print("Relevance scores:", scores)
```
The output will be a NumPy array where each value corresponds to the relevance score of an input pair. A higher score means that the document is more semantically relevant to the query.
Step 5: Sort according to scores
Now you can pair each score with its original document and sort from highest to lowest score to obtain the final, precisely ranked list.
```python
# Pair each score with its document
scored_documents = list(zip(scores, documents))

# Sort by score in descending order
sorted_documents = sorted(scored_documents, key=lambda x: x[0], reverse=True)

# Print the reranked results
print("Reranked results:")
for score, doc in sorted_documents:
    print(f"Score: {score:.4f} - Document: {doc}")
```
After running the code above, you will see the documents most relevant to the main causes of global warming at the top of the list, which is exactly the effect a reranking model is meant to deliver.
Application Scenarios
- Enterprise semantic search: In search engines for internal knowledge bases or websites, user queries are often more than simple keywords. Zerank-1 can be deployed after the primary search (e.g., Elasticsearch or vector search) to re-rank the top 100 returned results, ensuring that the top 10 are the ones the user most wants to see and dramatically improving the search experience and efficiency.
- Retrieval-Augmented Generation (RAG): In RAG applications, the quality of an LLM's answer depends directly on the quality of the context it is given. Using Zerank-1 to re-rank retrieved document fragments before feeding them to the LLM filters out irrelevant or noisy information and keeps only the context most relevant to the question, allowing the LLM to generate more accurate, factual answers.
- Intelligent question answering (Q&A): In automated customer service, technical support, and similar Q&A scenarios, the system must find the passage that best answers the user's question among large volumes of FAQ documents, product manuals, and so on. Zerank-1 can precisely compute the match between each candidate passage and the user's question, locating the most accurate answer and improving the resolution rate.
- Document or code de-duplication: Large document collections or codebases often contain semantically repetitive but not identical content. By treating one document as the "query" and other documents as candidates, Zerank-1 can compute relevance scores between them, effectively identifying and clustering documents or code snippets with similar content.
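The RAG scenario above can be sketched as a top-k context trimmer. The scores below are made-up stand-ins for Zerank-1 outputs (in a real pipeline they would come from `model.predict()`), and `build_context` is a hypothetical helper, not part of any library.

```python
def build_context(scored_chunks: list[tuple[float, str]], top_k: int = 2) -> str:
    """Keep the top_k highest-scoring chunks and join them for the LLM prompt."""
    best = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)[:top_k]
    return "\n\n".join(chunk for _, chunk in best)

# Illustrative (score, chunk) pairs; scores are invented for this sketch.
scored = [
    (0.12, "Volcanic aerosols can cool the surface in the short term."),
    (0.91, "Greenhouse gas emissions are the main driver of recent warming."),
    (0.55, "Burning fossil fuels releases large amounts of CO2."),
]
context = build_context(scored, top_k=2)
```

Only the two highest-scoring chunks survive the cut, so the LLM prompt contains the most relevant evidence and the noisy chunk is dropped.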
QA
- What is the difference between Zerank-1 and a regular Embedding Model?
Ordinary embedding models (also called bi-encoders) generate a vector (i.e., an embedding) for the query and the document independently, then determine relevance by computing the cosine similarity of the two vectors. Zerank-1, as a cross-encoder, processes the query and the document together, which lets it capture more complex interactions and semantic relationships between the two; it therefore typically ranks more accurately, at a correspondingly higher computational cost.
- Is Zerank-1 free?
The `zeroentropy/zerank-1` model is available on Hugging Face under a non-commercial license; if you need it for commercial purposes, you must contact ZeroEntropy directly for a license. ZeroEntropy also offers a slightly smaller, fully open-source (Apache 2.0 licensed) version, `zeroentropy/zerank-1-small`, which is free for commercial use.
- What kind of hardware do I need to use Zerank-1?
As a deep learning model, it runs considerably faster on a GPU, especially when ranking large numbers of documents. It can also run on a CPU when the data volume is modest, just more slowly. The exact requirements depend on your application scenario and performance needs.
- What is the range of scores output by the model?
The model outputs a floating-point relevance score with no fixed upper or lower bound; it is a relative value. When ranking a batch of documents, you only need to compare their scores against each other: the higher the score, the more relevant the document.
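The bi-encoder vs. cross-encoder distinction in the first answer can be illustrated with toy stand-ins. The functions below are not real models; the point is the call shape: a bi-encoder embeds each text separately and interacts only via a vector comparison at the end, while a cross-encoder scores the pair in a single joint pass.

```python
import math

def embed(text: str) -> dict[str, int]:
    # Bi-encoder stand-in: each text is encoded independently
    # into a bag-of-words vector.
    vec: dict[str, int] = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def bi_encoder_score(query: str, doc: str) -> float:
    # Two independent "forward passes"; interaction happens
    # only in the final similarity computation.
    return cosine(embed(query), embed(doc))

def cross_encoder_score(query: str, doc: str) -> float:
    # One joint pass over the pair (toy version: query-token matches
    # weighted by where they occur in the document), so the score can
    # depend on interactions a bi-encoder cannot see.
    q = set(query.lower().split())
    d = doc.lower().split()
    return sum(1.0 / (i + 1) for i, tok in enumerate(d) if tok in q)
```

A bi-encoder's embeddings can be precomputed and indexed, which is why it serves the fast first stage; the cross-encoder must run once per query-document pair, which is why it is reserved for reranking a short candidate list.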