Setting Up LancedbRM

LancedbRM can be instantiated with any custom vectorizer and configured to return any payload field. Its constructor takes the following parameters:

  • table_name (str): The name of the table to query against.
  • persist_directory (str): The directory where the database is stored.
  • k (int, optional): The number of top passages to retrieve. Defaults to 3.

Under the Hood

forward(self, query_or_queries: Union[str, List[str]], k: Optional[int] = None) -> dspy.Prediction

Parameters:

  • query_or_queries (Union[str, List[str]]): The query or queries to search for.
  • k (Optional[int]): The number of top passages to retrieve. Defaults to self.k.

Returns:

  • dspy.Prediction: Contains the retrieved passages, each represented as a dotdict with a long_text attribute.
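The forward contract described above can be sketched with a small in-memory stand-in. This is not the LancedbRM implementation: `DotDict` and `ToyRetriever` are hypothetical names, the word-overlap scoring replaces real vector search, and the sketch returns a plain list rather than a dspy.Prediction. It only illustrates how a single query or a list of queries is accepted, how k falls back to self.k, and how each passage exposes a long_text attribute.

```python
from typing import List, Optional, Union


class DotDict(dict):
    """Stand-in for DSPy's dotdict: a dict whose keys can be read as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


class ToyRetriever:
    """Hypothetical in-memory retriever that mirrors the forward() contract."""

    def __init__(self, passages: List[str], k: int = 3):
        self.passages = passages
        self.k = k

    def forward(
        self,
        query_or_queries: Union[str, List[str]],
        k: Optional[int] = None,
    ) -> List[DotDict]:
        # Accept either a single query string or a list of queries.
        if isinstance(query_or_queries, str):
            queries = [query_or_queries]
        else:
            queries = list(query_or_queries)

        # Fall back to the instance-level default when k is not given.
        k = self.k if k is None else k

        def score(passage: str) -> int:
            # Naive relevance: count query words that occur in the passage.
            text = passage.lower()
            return sum(word in text for q in queries for word in q.lower().split())

        ranked = sorted(self.passages, key=score, reverse=True)
        # Wrap each hit so the text is reachable as `.long_text`.
        return [DotDict(long_text=p) for p in ranked[:k]]


rm = ToyRetriever(
    [
        "Integrated circuits are built on silicon.",
        "Paris is the capital of France.",
        "A circuit breaker protects wiring.",
        "Bananas are yellow.",
    ],
    k=2,
)
top = rm.forward("integrated circuits")
print([p.long_text for p in top])
```

With the real retriever, the same attribute access applies to each passage in the returned dspy.Prediction.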

Example Usage

import lancedb
import pandas as pd

import dspy
from dspy.retrieve.lancedb import LancedbRM

from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

from datasets import load_dataset

# Load the DBpedia-14 dataset and convert its training split to a DataFrame
ds = load_dataset("fancyzhx/dbpedia_14")
df = pd.DataFrame(ds['train'])


# Connect to a local LanceDB database and pick an embedding model from the registry
uri = 'tmp/db'
table_name = 'passages'
db = lancedb.connect(uri)
model = get_registry().get("sentence-transformers").create(name="BAAI/bge-small-en-v1.5", device="cpu")

# Table schema: LanceDB embeds `text` with the model above and stores the result in `vector`
class Passages(LanceModel):
    text: str = model.SourceField()
    vector: Vector(model.ndims()) = model.VectorField()

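table = db.create_table(table_name, schema=Passages)
# Populate the table. Assumption: the dataset's `content` column holds the passage text;
# the 1,000-row sample is an arbitrary choice to keep the CPU embedding step short.
table.add(df[['content']].rename(columns={'content': 'text'}).head(1000))

# Arguments follow the constructor parameters listed above
lancedb_retriever = LancedbRM(table_name=table_name, persist_directory=uri, k=3)

# Register the retriever so that dspy.Retrieve uses it
dspy.settings.configure(rm=lancedb_retriever)
retrieve = dspy.Retrieve()

retrieve("Integrated Circuits")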