retrieve.MilvusRM

Constructor

Initialize an instance of the MilvusRM class, with the option to use OpenAI's text-embedding-3-small embeddings or any customized embedding function.

MilvusRM(
    collection_name: str,
    uri: Optional[str] = "http://localhost:19530",
    token: Optional[str] = None,
    db_name: Optional[str] = "default",
    embedding_function: Optional[Callable] = None,
    k: int = 3,
)

Parameters:

collection_name (str): The name of the Milvus collection to query against.
uri (str, optional): The Milvus connection uri. Defaults to "http://localhost:19530".
token (str, optional): The Milvus connection token. Defaults to None.
db_name (str, optional): The Milvus database name. Defaults to "default".
embedding_function (callable, optional): The function to convert a list of text to embeddings. The embedding function should take a list of text strings as input and output a list of embeddings. Defaults to None. By default, it will get OpenAI client by the environment variable OPENAI_API_KEY and use OpenAI's embedding model "text-embedding-3-small" with the default dimension.
k (int, optional): The number of top passages to retrieve. Defaults to 3.

Methods

`forward(self, query_or_queries: Union[str, List[str]], k: Optional[int] = None) -> dspy.Prediction`

Search the Milvus collection for the top k passages matching the given query or queries, using embeddings generated via the default OpenAI embedding or the specified embedding_function.

Parameters:

query_or_queries (Union[str, List[str]]): The query or list of queries to search for.
k (Optional[int], optional): The number of results to retrieve. If not specified, defaults to the value set during initialization.

Returns:

dspy.Prediction: Contains the retrieved passages, each represented as a dotdict with schema [{"id": str, "score": float, "long_text": str, "metadatas": dict }]

Quickstart

To support passage retrieval, it assumes that a Milvus collection has been created and populated with the following field:

text: The text of the passage

MilvusRM uses OpenAI's text-embedding-3-small embedding by default or any customized embedding function. While different options are available, the examples below demonstrate how to utilize the default OpenAI embeddings and a customized embedding function using the BGE model.

Default OpenAI Embeddings

from dspy.retrieve.milvus_rm import MilvusRM
import os

os.envrion["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"

retriever_model = MilvusRM(
    collection_name="<YOUR_COLLECTION_NAME>",
    uri="<YOUR_MILVUS_URI>",
    token="<YOUR_MILVUS_TOKEN>"  # ignore this if no token is required for Milvus connection
    )

results = retriever_model("Explore the significance of quantum computing", k=5)

for result in results:
    print("Document:", result.long_text, "\n")

Customized Embedding Function

from dspy.retrieve.milvus_rm import MilvusRM
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('BAAI/bge-base-en-v1.5')

def bge_embedding_function(texts: List[str]):
    embeddings = model.encode(texts, normalize_embeddings=True)
    return embeddings

retriever_model = MilvusRM(
    collection_name="<YOUR_COLLECTION_NAME>",
    uri="<YOUR_MILVUS_URI>",
    token="<YOUR_MILVUS_TOKEN>",  # ignore this if no token is required for Milvus connection
    embedding_function=bge_embedding_function
    )

results = retriever_model("Explore the significance of quantum computing", k=5)

for result in results:
    print("Document:", result.long_text, "\n")

retrieve.MilvusRM

Constructor​

Methods​

forward(self, query_or_queries: Union[str, List[str]], k: Optional[int] = None) -> dspy.Prediction​

Quickstart​

Default OpenAI Embeddings​

Customized Embedding Function​