retrieve.MyScaleRM

Constructor

Initializes an instance of the MyScaleRM class, which retrieves documents from MyScaleDB (a ClickHouse fork optimized for vector similarity and full-text search) based on query embeddings. Embeddings can be generated either with a local model or through OpenAI's API.

Syntax

MyScaleRM(
client: clickhouse_connect.driver.client.Client,
table: str,
database: str = 'default',
metadata_columns: List[str] = ['text'],
vector_column: str = 'vector',
k: int = 3,
openai_api_key: Optional[str] = None,
openai_model: Optional[str] = None,
local_embed_model: Optional[str] = None
)

Parameters for MyScaleRM Constructor

  • client (clickhouse_connect.driver.client.Client): A client connection to the MyScaleDB database, used to execute queries and manage interactions with the database.
  • table (str): Specifies the table within MyScaleDB from which data will be retrieved. This table should be equipped with a vector column for conducting similarity searches.
  • database (str, optional): The name of the database where the table is located, defaulting to "default".
  • metadata_columns (List[str], optional): Columns to include as metadata in the output, defaulting to ["text"].
  • vector_column (str, optional): The column that contains vector data, used for similarity searches, defaulting to "vector".
  • k (int, optional): The number of closest matches to return for each query, defaulting to 3.
  • openai_api_key (str, optional): API key for accessing OpenAI services, necessary if using OpenAI for embedding generation.
  • openai_model (str, optional): The specific OpenAI model to use for embeddings, required if an OpenAI API key is provided.
  • local_embed_model (str, optional): The name of a local model for embedding generation, used when embeddings should be computed locally rather than through OpenAI.
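
The three embedding parameters depend on one another: an OpenAI API key needs a model name alongside it, and a local model is the alternative. The sketch below illustrates one way such a check could look. `pick_embedding_backend` is a hypothetical helper, not part of MyScaleRM, and both the precedence (OpenAI first when both are given) and the error raised when nothing is configured are assumptions.

```python
from typing import Optional


def pick_embedding_backend(
    openai_api_key: Optional[str] = None,
    openai_model: Optional[str] = None,
    local_embed_model: Optional[str] = None,
) -> str:
    """Return "openai" or "local" for a given combination of arguments.

    Hypothetical helper for illustration; MyScaleRM's own validation may differ.
    """
    if openai_api_key is not None:
        # The parameter list says a model name is required alongside the key.
        if openai_model is None:
            raise ValueError("openai_model is required when openai_api_key is set")
        return "openai"
    if local_embed_model is not None:
        return "local"
    # Assumption: with no embedding source configured, fail early.
    raise ValueError("provide openai_api_key and openai_model, or local_embed_model")


print(pick_embedding_backend(local_embed_model="some-local-model"))
```

Checking the combination before constructing the retriever surfaces a missing model name immediately, rather than at the first query.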

Methods

forward

Executes a retrieval operation based on a user's query and returns the top k relevant results using the embeddings generated by the specified method.

Syntax

def forward(self, user_query: str, k: Optional[int] = None) -> dspy.Prediction

Parameters

  • user_query (str): The query for which to retrieve matching passages.
  • k (int, optional): The number of top matches to retrieve. If not provided, the k value set during class initialization is used.

Returns

  • dspy.Prediction: Contains the retrieved passages, formatted as a list of dotdict objects. Each entry includes:
    • long_text (str): The text content of the retrieved passage.

Description

The forward method uses MyScaleDB's vector search to find the top k passages that best match the provided query. The query is embedded with the configured method (either a local model or the OpenAI API), and passages are retrieved by their semantic similarity to that embedding.
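
To make the retrieval step concrete, here is a minimal sketch of how a top-k similarity query could be assembled from the constructor arguments. `resolve_k` and `build_vector_query` are hypothetical helpers, not MyScaleRM methods. The `distance(column, vector)` form follows MyScaleDB's SQL vector search syntax as an assumption, and the query that MyScaleRM actually issues may differ.

```python
from typing import List, Optional, Sequence


def resolve_k(k: Optional[int], default_k: int) -> int:
    """Use the per-call k when given, else the value set at initialization."""
    return default_k if k is None else k


def build_vector_query(
    table: str,
    query_vector: Sequence[float],
    database: str = "default",
    metadata_columns: Optional[List[str]] = None,
    vector_column: str = "vector",
    k: int = 3,
) -> str:
    """Build a top-k similarity query string (illustrative only).

    Identifiers are interpolated directly, so they must come from trusted input.
    """
    columns = ", ".join(metadata_columns or ["text"])
    vector_literal = "[" + ", ".join(str(float(x)) for x in query_vector) + "]"
    # Assumption: distance() ranks rows by closeness to the query vector.
    return (
        f"SELECT {columns}, distance({vector_column}, {vector_literal}) AS dist "
        f"FROM {database}.{table} ORDER BY dist LIMIT {int(k)}"
    )


query = build_vector_query("movies", [0.1, 0.2], k=resolve_k(None, default_k=3))
print(query)
```

The resulting string could then be passed to the client's query method, with the embedding of the user query supplied as `query_vector`.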

Quickstart

This section provides practical examples of how to instantiate and use the MyScaleRM class to retrieve data from MyScaleDB efficiently using text embeddings.

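import clickhouse_connect
from dspy.retrieve.myscaledb_rm import MyScaleRM

# Connect to MyScaleDB; every connection value below is a placeholder.
client = clickhouse_connect.get_client(
    host="your-host", port=443, username="your-username", password="your-password"
)

MyScale_model = MyScaleRM(
    client=client,
    table="table_name",
    openai_api_key="sk-***",
    openai_model="embeddings_model",
    vector_column="vector_column_name",
    metadata_columns=["add_your_columns_here"],
    k=6,
)

# Run the retrieval and keep the returned prediction
results = MyScale_model("Please suggest me some funny movies")

passages = results.passages

# Loop through each passage and print the 'long_text'
for passage in passages:
    print(passage['long_text'], "\n")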