BootstrapFewShot
When compiling a DSPy program, we generally invoke a teleprompter: an optimizer that takes the program, a training set, and a metric, and returns a new, optimized program. Different teleprompters apply different optimization strategies; this family of teleprompters focuses on optimizing the few-shot examples in the prompt. Let's build a sample pipeline and see how we can use a teleprompter to optimize it.
Setting up a Sample Pipeline
We'll be making the same basic answer-generation pipeline over the GSM8K dataset that we saw in the Minimal Example, without changing anything in it! So let's start by configuring the LM, which will be the OpenAI LM client with gpt-3.5-turbo as the model in use.
```python
import dspy

turbo = dspy.LM(model='openai/gpt-3.5-turbo', max_tokens=250)
dspy.configure(lm=turbo)
```
Now that the LM client is set up, it's time to load the train-dev split of GSM8K using the DataLoader class that DSPy provides us:
```python
import random

from dspy.datasets import DataLoader

dl = DataLoader()

gsm8k = dl.from_huggingface(
    "gsm8k",
    "main",
    input_keys=("question",),
)

trainset, devset = gsm8k['train'], random.sample(gsm8k['test'], 50)
```
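As a quick sanity check, you can inspect the split sizes and a single example. Each entry is a dspy.Example whose question field is marked as the input and whose answer field holds the reference solution; the sizes in the comments below are only illustrative.

```python
# Quick sanity check on the loaded data (sizes are illustrative; they depend on the split).
print(len(trainset), len(devset))

example = trainset[0]
print(example.question)  # the grade-school math word problem
print(example.answer)    # the reference solution, ending in "#### <number>"
```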
We'll now define a basic QA inline signature, i.e. question -> answer, and pass it to the ChainOfThought module, which adds the fields needed for CoT-style prompting to the signature.
```python
class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.prog(question=question)
```
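Even before any optimization, you can call this module directly. A rough sketch (the sample question is made up, and the exact name of the reasoning field can vary across DSPy versions, so we only print the answer):

```python
# Run the uncompiled (zero-shot) module on a made-up question.
pred = CoT()(question="A pack has 12 pencils. How many pencils are in 7 packs?")
print(pred.answer)  # the answer field from the "question -> answer" signature
```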
Now we need to evaluate this pipeline too! For that we'll use the Evaluate class that DSPy provides us, and as the metric we'll use the gsm8k_metric that ships with DSPy's GSM8K utilities.
```python
from dspy.evaluate import Evaluate
from dspy.datasets.gsm8k import gsm8k_metric

NUM_THREADS = 8  # number of parallel evaluation threads

evaluate = Evaluate(devset=devset[:], metric=gsm8k_metric, num_threads=NUM_THREADS, display_progress=True, display_table=False)
```
To evaluate the CoT pipeline we'll need to create an instance of it and pass it as an argument to the evaluator call.
```python
cot_baseline = CoT()

evaluate(cot_baseline, devset=devset[:])
```
Now that we have the baseline pipeline ready, let's use the BootstrapFewShot teleprompter to optimize it and make it even better!
Using BootstrapFewShot
Let's start by importing and initializing our teleprompter. For the metric, we'll use the same gsm8k_metric imported and used above:
```python
from dspy.teleprompt import BootstrapFewShot

teleprompter = BootstrapFewShot(
    metric=gsm8k_metric,
    max_bootstrapped_demos=8,
    max_labeled_demos=8,
)
```
metric is an obvious parameter, but what are the max_bootstrapped_demos and max_labeled_demos parameters? Let's see the difference via a table:
| Feature | max_labeled_demos | max_bootstrapped_demos |
|---|---|---|
| Purpose | The maximum number of labeled demonstrations (examples) used directly as few-shot examples for the student module. Labeled demonstrations are pre-existing, manually labeled examples taken from the training set. | The maximum number of demonstrations that will be bootstrapped, i.e. generated from the predictions of a teacher module. These bootstrapped demonstrations are used alongside (or instead of) the manually labeled examples. |
| Training Usage | Used directly; typically more reliable because they are manually labeled. | Augment the training data; potentially less accurate because they are generated examples. |
| Data Source | Pre-existing dataset of manually labeled examples. | Generated during compilation, using the outputs of a teacher module. |
| Influence on Training | Higher quality and reliability, assuming the labels are accurate. | Provides more data but may introduce noise or inaccuracies. |
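As a rough mental model of how the two limits interact (a simplified sketch, not DSPy's actual internals): the demos attached to each predictor are the bootstrapped ones, capped at max_bootstrapped_demos, topped up with raw labeled examples until the total reaches roughly max_labeled_demos.

```python
# Simplified sketch of how the two limits combine (not DSPy's actual code).
def assemble_demos(bootstrapped, labeled, max_bootstrapped_demos=8, max_labeled_demos=8):
    demos = bootstrapped[:max_bootstrapped_demos]        # teacher-generated traces first
    remaining = max(0, max_labeled_demos - len(demos))   # fill the rest with raw labeled examples
    return demos + labeled[:remaining]
```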
This teleprompter also fills in any field it needs even if your data doesn't have it. For example, we have no rationale labels, yet you'll see a rationale for each few-shot example in the prompt this teleprompter curates. How? It generates them via a teacher module, which is an optional parameter; since we didn't pass one, the teleprompter creates the teacher from the module we are training, i.e. the student module.
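If you do want an explicit teacher, you can pass one to compile. A hedged sketch: here the teacher is just another CoT instance, and the idea of backing it with a stronger LM (e.g. via BootstrapFewShot's teacher_settings argument) is an assumption beyond what's shown above.

```python
# Sketch: explicitly passing a teacher program to compile.
teacher = CoT()  # could be configured with a stronger LM than the student

cot_with_teacher = teleprompter.compile(CoT(), teacher=teacher, trainset=trainset)
```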
In the next section, we'll see this process step by step, but for now let's optimize our CoT module by calling the compile method of the teleprompter:
```python
cot_compiled = teleprompter.compile(CoT(), trainset=trainset, valset=devset)
```
Once compilation is done, you'll have an optimized module that you can save, and load again for use anytime:
```python
cot_compiled.save('turbo_gsm8k.json')

# Loading:
# cot = CoT()
# cot.load('turbo_gsm8k.json')
```
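The compiled module is used exactly like the original one, so you can run it on new questions, re-evaluate it to compare against the baseline score, and peek at the few-shot demos the teleprompter attached to each predictor. A quick sketch (the sample question is made up):

```python
# Run the compiled program on a new (made-up) question.
pred = cot_compiled(question="A farmer has 4 fields with 37 pumpkins each. How many pumpkins are there?")
print(pred.answer)

# Re-evaluate to compare against the baseline score.
evaluate(cot_compiled, devset=devset[:])

# Inspect the demos the teleprompter attached to each predictor.
for name, predictor in cot_compiled.named_predictors():
    print(name, len(predictor.demos))
```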
How BootstrapFewShot Works
LabeledFewShot is the most vanilla teleprompter: it takes a training set as input and assigns a subset of the trainset to each of the student's predictors' demos attributes. You can think of it as simply adding few-shot examples to the prompt.
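For reference, using it directly looks roughly like this (a sketch; k is the number of labeled demos to attach per predictor):

```python
# Rough sketch of using LabeledFewShot on its own.
from dspy.teleprompt import LabeledFewShot

labeled_fewshot = LabeledFewShot(k=8)  # attach up to 8 labeled demos per predictor
cot_labeled = labeled_fewshot.compile(CoT(), trainset=trainset)
```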
BootstrapFewShot starts by doing exactly that, and then builds on it. Step by step (a simplified sketch of this loop follows the list):

- It initializes a student program, which is the one we are optimizing, and a teacher program which, unless specified otherwise, is a clone of the student.
- It then adds demos to the teacher using the LabeledFewShot teleprompter.
- Mappings are created between the names of predictors and their corresponding instances in both the student and teacher programs.
- The maximum number of bootstrapped demonstrations (max_bootstraps) is determined. This limits the amount of training data generated.
- The process iterates over each example in the training set. For each example, it checks whether the maximum number of bootstraps has been reached; if so, the process stops.
- For each training example, the teacher program attempts to generate a prediction.
- If the teacher successfully generates a prediction, the trace of this prediction process is captured. This trace includes which predictors were called, the inputs they received, and the outputs they produced.
- If the prediction is successful (and, when a metric is provided, the metric accepts it), a demonstration (demo) is created for each step in the trace. This demo includes the inputs to the predictor and the outputs it generated.
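In pseudocode, that loop looks roughly like this. This is a simplified sketch of the process described above, not DSPy's actual implementation; run_with_trace is a hypothetical helper standing in for DSPy's trace-capturing machinery, and the demo format is simplified.

```python
# Simplified sketch of the bootstrapping loop (not DSPy's actual code).
def bootstrap_demos(teacher, trainset, metric, max_bootstraps):
    demos, bootstrapped = [], 0
    for example in trainset:
        if bootstrapped >= max_bootstraps:
            break  # enough examples successfully bootstrapped
        try:
            # run_with_trace is a hypothetical helper: it returns the prediction
            # plus the list of (predictor, inputs, outputs) steps taken to produce it.
            prediction, trace = run_with_trace(teacher, example)
        except Exception:
            continue  # the teacher failed on this example; skip it
        if metric(example, prediction):
            bootstrapped += 1
            # One demo per step in the trace: the predictor's inputs and outputs.
            demos.extend({'inputs': inputs, 'outputs': outputs} for _, inputs, outputs in trace)
    return demos
```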
That's how it works. Beyond that, BootstrapFewShotWithOptuna, BootstrapFewShotWithRandomSearch, etc. work on the same principle, with slight changes in how candidate demonstrations are discovered and selected.
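For example, swapping in the random-search variant is mostly a drop-in change. A sketch (the num_candidate_programs value is just an illustrative choice):

```python
# Sketch: the random-search variant compiles several candidate programs with
# different bootstrapped demo sets and keeps the best one on the validation set.
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

teleprompter_rs = BootstrapFewShotWithRandomSearch(
    metric=gsm8k_metric,
    max_bootstrapped_demos=8,
    max_labeled_demos=8,
    num_candidate_programs=10,  # illustrative: how many candidate programs to try
)

cot_rs = teleprompter_rs.compile(CoT(), trainset=trainset, valset=devset)
```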