# HFClientVLLM
## Prerequisites - Launching vLLM Server locally
Refer to the vLLM Server API for setting up the vLLM server locally.
```bash
# Example vLLM Server Launch
python -m vllm.entrypoints.api_server --model meta-llama/Llama-2-7b-hf --port 8080
```
This command will start the server and make it accessible at `http://localhost:8080`.
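Before wiring the server into DSPy, you can sanity-check that it is reachable by sending it a raw completion request. Below is a minimal sketch; the `/generate` endpoint and payload fields reflect vLLM's demo API server and are assumptions here, so verify them against your vLLM version:

```python
import requests

# Assumed endpoint/payload of vllm.entrypoints.api_server; adjust for your version.
resp = requests.post(
    "http://localhost:8080/generate",
    json={"prompt": "Hello, my name is", "max_tokens": 16},
)
resp.raise_for_status()
print(resp.json())
```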
## Setting up the vLLM Client
The constructor initializes the `HFModel` base class to support prompting models, configuring the client to communicate with the hosted vLLM server and send generation requests. It requires the following parameters:

- `model` (_str_): ID of the model connected to the vLLM server.
- `port` (_int_): Port for communicating with the vLLM server.
- `url` (_str_): Base URL of the hosted vLLM server. This will often be `"http://localhost"`.
- `**kwargs`: Additional keyword arguments to configure the vLLM client.
Example of the vLLM constructor:
```python
class HFClientVLLM(HFModel):
    def __init__(self, model, port, url="http://localhost", **kwargs):
        ...
```
## Under the Hood
### `_generate(self, prompt, **kwargs) -> dict`
**Parameters:**

- `prompt` (_str_): Prompt to send to the model hosted on the vLLM server.
- `**kwargs`: Additional keyword arguments for the completion request.

**Returns:**

- `dict`: Dictionary with the request `prompt` and a list of response `choices`.
Internally, the method prepares the request prompt and the corresponding payload, then sends them to the server to obtain the response.

After generation, the method parses the JSON response received from the server, retrieves the output through `json_response["choices"]`, and stores it as the `completions` list.

Lastly, the method constructs the response dictionary with two keys: the original request `prompt` and `choices`, a list of dictionaries representing the generated completions, with the key `text` holding the response's generated text.
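Putting these steps together, a minimal sketch of the flow could look like the following. This is not the library's actual implementation; in particular, the endpoint path and payload construction are assumptions, while the parsing mirrors the description above:

```python
import requests

def _generate_sketch(url: str, prompt: str, **kwargs) -> dict:
    # Hypothetical endpoint and payload; the real client assembles these internally.
    payload = {"prompt": prompt, **kwargs}
    json_response = requests.post(f"{url}/v1/completions", json=payload).json()

    # Retrieve the generated outputs from the server's JSON response.
    completions = json_response["choices"]

    # Build the response dictionary: the original prompt plus a list of
    # choices, each holding its generated text under the "text" key.
    return {
        "prompt": prompt,
        "choices": [{"text": choice["text"]} for choice in completions],
    }
```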
## Using the vLLM Client
```python
import dspy

vllm_llama2 = dspy.HFClientVLLM(model="meta-llama/Llama-2-7b-hf", port=8080, url="http://localhost")
```
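Since the constructor forwards `**kwargs` to configure the client, generation settings can also be supplied at construction time. Which settings the server honors depends on your vLLM version, so treat the specific keys below as assumptions:

```python
vllm_llama2 = dspy.HFClientVLLM(
    model="meta-llama/Llama-2-7b-hf",
    port=8080,
    url="http://localhost",
    max_tokens=150,    # assumed generation kwargs, forwarded with each request
    temperature=0.7,
)
```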
## Sending Requests via vLLM Client
- **Recommended:** Configure the default LM using `dspy.configure`.
This allows you to define programs in DSPy and simply call modules on your input fields, with DSPy internally prompting the configured LM.
```python
dspy.configure(lm=vllm_llama2)

# Example DSPy CoT QA program
qa = dspy.ChainOfThought('question -> answer')

response = qa(question="What is the capital of France?")  # Prompted to vllm_llama2
print(response.answer)
```
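After a call goes through, you can confirm what was actually sent to the server. DSPy LM clients expose an `inspect_history` helper for this; if your DSPy version lacks it, printing the response dictionary from the direct client call below serves the same purpose:

```python
# Print the most recent prompt/response pair handled by the client.
vllm_llama2.inspect_history(n=1)
```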
- Generate responses using the client directly.
```python
response = vllm_llama2._generate(prompt='What is the capital of France?')
print(response)  # {'prompt': ..., 'choices': [{'text': ...}, ...]}
```