dspy.ColBERTv2

Constructor

The constructor initializes the ColBERTv2 class instance and sets up the request parameters for interacting with the ColBERTv2 server.

class ColBERTv2:
    def __init__(
        self,
        url: str = "http://0.0.0.0",
        port: Optional[Union[str, int]] = None,
        post_requests: bool = False,
    ):

Parameters:

  • url (str): URL for ColBERTv2 server. Defaults to "http://0.0.0.0".
  • port (Union[str, int], Optional): Port endpoint for ColBERTv2 server. Defaults to None.
  • post_requests (bool, Optional): Flag for using HTTP POST requests. Defaults to False.
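How url and port combine is not spelled out above. The following is a minimal sketch, assuming the port is simply appended to the URL after a colon when it is given. `build_endpoint` is a hypothetical helper written for illustration and is not part of DSPy.

```python
from typing import Optional, Union


def build_endpoint(url: str = "http://0.0.0.0",
                   port: Optional[Union[str, int]] = None) -> str:
    """Hypothetical helper: join a base URL and an optional port."""
    # Assumption: when no port is given, the URL is used unchanged.
    if port is None:
        return url
    return f"{url}:{port}"


print(build_endpoint("http://0.0.0.0", 8893))
print(build_endpoint("http://20.102.90.50:2017/wiki17_abstracts"))
```

Under this assumption, a URL that already embeds its port and path, as in the Quickstart, is passed with port left as None.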

Methods

__call__(self, query: str, k: int = 10, simplify: bool = False) -> Union[list[str], list[dotdict]]

Queries the ColBERTv2 server for retrieval. The method prepares the request and its payload internally and returns the top-k passages for the provided query.

Parameters:

  • query (str): Query string used for retrieval.
  • k (int, optional): Number of passages to retrieve. Defaults to 10.
  • simplify (bool, optional): Flag for simplifying output to a list of strings. Defaults to False.

Returns:

  • Union[list[str], list[dotdict]]: Depending on simplify flag, either a list of strings representing the passage content (True) or a list of dotdict instances containing passage details (False).
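The effect of simplify on the return shape can be sketched without a running server. The snippet below is an illustration only: `DotDict` is a stand-in for the dotdict type, `shape_results` is a hypothetical helper, and the `text` key is taken from the Quickstart rather than from a documented response schema.

```python
class DotDict(dict):
    """Stand-in for dotdict: a dict whose keys are also readable as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


def shape_results(raw_passages, simplify=False):
    """Hypothetical helper mirroring the two documented return shapes."""
    if simplify:
        # simplify=True: only the passage text, as plain strings.
        return [p["text"] for p in raw_passages]
    # simplify=False: one dict-like object per passage with all its details.
    return [DotDict(p) for p in raw_passages]


# Made-up passages standing in for a server response.
raw = [{"text": "first passage"}, {"text": "second passage"}]
print(shape_results(raw, simplify=True))
print(shape_results(raw)[0].text)
```

With simplify left at False, each result can be read either as `result['text']` or as `result.text`, assuming the real dotdict behaves like this stand-in.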

Quickstart

import dspy

colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

retrieval_response = colbertv2_wiki17_abstracts('When was the first FIFA World Cup held?', k=5)

for result in retrieval_response:
    print("Text:", result['text'], "\n")

dspy.ColBERTv2RetrieverLocal

This is taken from the official documentation of ColBERTv2, following the paper.

You can install ColBERTv2 by following the instructions from here.

Constructor

The constructor initializes ColBERTv2 as a local retriever object. You can initialize a server instance from your local ColBERTv2 instance using the code snippet from here.

class ColBERTv2RetrieverLocal:
    def __init__(
        self,
        passages: List[str],
        colbert_config=None,
        load_only: bool = False,
    ):

Parameters

  • passages (List[str]): List of passages to be indexed.
  • colbert_config (ColBERTConfig, Optional): ColBERT config for building and searching. Defaults to None.
  • load_only (bool): Whether to load an existing index instead of building one and then loading it. Defaults to False.

The colbert_config object is required for ColBERTv2, and it can be imported with from colbert.infra.config import ColBERTConfig. You can find descriptions of the config attributes here.

Methods

forward(self, query: str, k: int = 7, **kwargs) -> Union[list[str], list[dotdict]]

It retrieves relevant passages from the index based on the query. If you already have a local index, then you can pass the load_only flag as True and change the index attribute of ColBERTConfig to the local path. Also, make sure to change the checkpoint attribute of ColBERTConfig to the embedding model that you used to build the index.
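Reusing an existing index, as described above, might look like the sketch below. It assumes that the "index attribute" refers to ColBERTConfig's `index_name` field and that both fields can be passed as keyword arguments. `load_existing_index` is a hypothetical wrapper, and the imports sit inside it so the snippet can be read without dspy or ColBERT installed.

```python
def load_existing_index(passages, index_name, checkpoint):
    """Hypothetical wrapper: open an index that was already built on disk."""
    # Imported lazily; both packages must be installed before calling this.
    import dspy
    from colbert.infra.config import ColBERTConfig

    config = ColBERTConfig(
        index_name=index_name,  # assumption: the "index attribute" from the text
        checkpoint=checkpoint,  # must match the model used to build the index
    )
    # load_only=True skips building and only loads the existing index.
    return dspy.ColBERTv2RetrieverLocal(
        passages=passages,
        colbert_config=config,
        load_only=True,
    )
```

The returned retriever can then be queried through forward with a query string and a value for k.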

Parameters:

  • query (str): Query string used for retrieval.
  • k (int, optional): Number of passages to retrieve. Defaults to 7.

It returns a Prediction object for each query:

Prediction(
    pid=[33, 6, 47, 74, 48],
    passages=['No pain, no gain.', 'The best things in life are free.', 'Out of sight, out of mind.', 'To be or not to be, that is the question.', 'Patience is a virtue.']
)
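The two fields are parallel lists, so each passage id can be paired with its text. The sketch below uses a `SimpleNamespace` as a stand-in for the Prediction shown above and assumes the fields are readable as attributes; a real object would come from forward.

```python
from types import SimpleNamespace

# Stand-in for the Prediction above, filled with the same example values.
prediction = SimpleNamespace(
    pid=[33, 6, 47, 74, 48],
    passages=[
        "No pain, no gain.",
        "The best things in life are free.",
        "Out of sight, out of mind.",
        "To be or not to be, that is the question.",
        "Patience is a virtue.",
    ],
)

# Walk both lists in step: position i of pid belongs to position i of passages.
for pid, passage in zip(prediction.pid, prediction.passages):
    print(f"{pid}: {passage}")

# A lookup table from passage id to text.
by_pid = dict(zip(prediction.pid, prediction.passages))
```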

dspy.ColBERTv2RerankerLocal

You can also use ColBERTv2 as a reranker in DSPy.

Constructor

class ColBERTv2RerankerLocal:
    def __init__(
        self,
        colbert_config=None,
        checkpoint: str = 'bert-base-uncased',
    ):

Parameters

  • colbert_config (ColBERTConfig, Optional): ColBERT config for building and searching. Defaults to None.
  • checkpoint (str): Embedding model used to embed the documents and the query. Defaults to 'bert-base-uncased'.

Methods

forward(self, query: str, passages: List[str])

Given a query and a list of passages, it scores each passage by its similarity to the query, so that the passages can be ordered from most to least similar.

Parameters:

  • query (str): Query string used for reranking.
  • passages (List[str]): List of passages to be reranked.

It returns the array of similarity scores, which you can link back to the passages as follows:

import numpy as np

for idx in np.argsort(scores_arr)[::-1]:
    print(f"Passage = {passages[idx]} --> Score = {scores_arr[idx]}")
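To keep only the best few passages after reranking, the same ordering can be wrapped in a small function. This is a dependency-free sketch, assuming the scores form a flat sequence with one score per passage. `top_passages` is a hypothetical helper, and the scores below are made up for illustration.

```python
def top_passages(passages, scores, n=3):
    """Hypothetical helper: return the n best (passage, score) pairs."""
    # Sort passage indices by score, highest first.
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [(passages[i], float(scores[i])) for i in order[:n]]


# Made-up scores; real ones would come from the reranker's forward method.
passages = ["Patience is a virtue.", "No pain, no gain.", "Out of sight, out of mind."]
scores = [0.2, 0.9, 0.5]

for passage, score in top_passages(passages, scores, n=2):
    print(f"{score:.2f}  {passage}")
```

Because the helper only indexes into scores, it should accept a plain list as well as a one-dimensional NumPy array.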