langcheck.metrics.en.pairwise_text_quality
- langcheck.metrics.en.pairwise_text_quality.pairwise_comparison(generated_outputs_a: List[str] | str, generated_outputs_b: List[str] | str, prompts: List[str] | str, sources_a: List[str] | str | None = None, sources_b: List[str] | str | None = None, reference_outputs: List[str] | str | None = None, enforce_consistency: bool = True, model_type: str = 'openai', openai_client: OpenAI | None = None, openai_args: Dict[str, str] | None = None, *, use_async: bool = False) → MetricValue[float | None]
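Calculates the pairwise comparison metric. This metric takes one of three float values: 0.0 (Response A is better), 0.5 (tie), or 1.0 (Response B is better). The score is None if it could not be computed, for example when enforce_consistency is True and the evaluator's verdict changes after Model A and Model B are swapped.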
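As a rough illustration of this score scale, the sketch below maps judge verdict labels to the documented values and tallies a batch of scores, counting None as "could not be computed". The label strings ("A", "Tie", "B") and the helper names verdict_to_score and summarize are hypothetical; they are not part of LangCheck, and the evaluator's real output format is not described on this page.

```python
from typing import Dict, List, Optional

# Hypothetical verdict labels; the evaluator's real output format is not shown here.
VERDICT_TO_SCORE: Dict[str, float] = {"A": 0.0, "Tie": 0.5, "B": 1.0}


def verdict_to_score(verdict: Optional[str]) -> Optional[float]:
    """Map a verdict label to the documented scale; None if missing or unknown."""
    if verdict is None:
        return None
    return VERDICT_TO_SCORE.get(verdict)


def summarize(scores: List[Optional[float]]) -> Dict[str, int]:
    """Count A wins, ties, B wins, and scores that could not be computed."""
    names = {0.0: "a_better", 0.5: "tie", 1.0: "b_better"}
    counts = {"a_better": 0, "tie": 0, "b_better": 0, "none": 0}
    for score in scores:
        counts["none" if score is None else names[score]] += 1
    return counts
```

Keeping None as its own bucket, rather than silently dropping it, makes it visible how many pairs the evaluator could not score.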
We currently support two model types:
1. The ‘openai’ type, where we use OpenAI’s ‘gpt-3.5-turbo’ model by default. The model is configurable, but make sure to use one that supports function calling (https://platform.openai.com/docs/guides/gpt/function-calling). See this page for examples of setting up the OpenAI API key.
2. The ‘azure_openai’ type. This is essentially the same as the ‘openai’ type, except that it uses the AzureOpenAI client. Note that you must specify the model deployment to use in openai_args, e.g. openai_args={'model': 'YOUR_DEPLOYMENT_NAME'}.
- Ref:
Our prompt is similar to the prompt used in https://arxiv.org/abs/2306.05685
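The two model types differ mainly in how the model is chosen: ‘openai’ falls back to a default model, while ‘azure_openai’ requires a deployment name in openai_args. The sketch below encodes those rules in a small validation helper. It is an illustration under stated assumptions: check_model_config is a hypothetical name, it never creates an OpenAI or AzureOpenAI client, and filling in 'gpt-3.5-turbo' when no model is given mirrors the documented default rather than LangCheck's actual code.

```python
from typing import Dict, Optional

SUPPORTED_MODEL_TYPES = ("openai", "azure_openai")
DEFAULT_OPENAI_MODEL = "gpt-3.5-turbo"  # documented default for the 'openai' type


def check_model_config(
    model_type: str, openai_args: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of openai_args after applying the documented rules.

    Hypothetical helper: it only checks arguments and never calls OpenAI.
    """
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    args = dict(openai_args or {})
    if model_type == "azure_openai":
        # Azure needs the deployment name; there is no default to fall back on.
        if "model" not in args:
            raise ValueError(
                "azure_openai requires openai_args={'model': 'YOUR_DEPLOYMENT_NAME'}"
            )
    else:
        # Assumption: fall back to the default model when none is given.
        args.setdefault("model", DEFAULT_OPENAI_MODEL)
    return args


print(check_model_config("openai"))  # {'model': 'gpt-3.5-turbo'}
print(check_model_config("azure_openai", {"model": "my-deployment"}))
```

Failing early on a missing Azure deployment name is cheaper than discovering the problem on the first API request.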
- Parameters:
generated_outputs_a – Model A’s generated output(s) to evaluate
generated_outputs_b – Model B’s generated output(s) to evaluate
prompts – The prompts used to generate the output(s)
sources_a – The source text(s) for Model A’s generated output(s), default None
sources_b – The source text(s) for Model B’s generated output(s), default None
reference_outputs – The reference output(s), default None
enforce_consistency – When this is True, we only return a score if it is the same when Model A and Model B are swapped. This helps ensure that the evaluator’s position bias does not affect the scores. Default True.
model_type – The type of model to use (‘openai’ or ‘azure_openai’), default ‘openai’
openai_client – OpenAI or AzureOpenAI client, default None. If this is None, we will attempt to create a default client.
openai_args – Dict of additional args to pass in to the client.chat.completions.create function, default None
use_async – Whether to use the asynchronous API of OpenAI, default False
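The parameters accept either a single string or a list of strings, and enforce_consistency asks for agreement across both orderings of the pair. The sketch below shows one way these two ideas could fit together. It is not LangCheck's implementation: consistent_pairwise and as_list are hypothetical names, and judge is a hypothetical callable standing in for the LLM evaluator, returning 0.0, 0.5, 1.0, or None.

```python
from typing import Callable, List, Optional, Union

# Hypothetical evaluator: (prompt, first_response, second_response) -> score or None.
Judge = Callable[[str, str, str], Optional[float]]


def as_list(x: Union[str, List[str]]) -> List[str]:
    """Accept a single string or a list of strings, as the signature does."""
    return [x] if isinstance(x, str) else list(x)


def consistent_pairwise(
    outputs_a: Union[str, List[str]],
    outputs_b: Union[str, List[str]],
    prompts: Union[str, List[str]],
    judge: Judge,
    enforce_consistency: bool = True,
) -> List[Optional[float]]:
    a, b, p = as_list(outputs_a), as_list(outputs_b), as_list(prompts)
    if not (len(a) == len(b) == len(p)):
        raise ValueError("outputs_a, outputs_b and prompts must have the same length")
    scores: List[Optional[float]] = []
    for prompt, out_a, out_b in zip(p, a, b):
        score = judge(prompt, out_a, out_b)
        if enforce_consistency and score is not None:
            swapped = judge(prompt, out_b, out_a)
            # In the swapped run, 1.0 ("second is better") means the original A won,
            # so map it back with 1 - swapped before comparing.
            if swapped is None or 1.0 - swapped != score:
                score = None
        scores.append(score)
    return scores
```

With a judge that always prefers whichever response is shown first, every pair becomes None under enforce_consistency=True, which is exactly the position bias this parameter is meant to filter out. The cost is a second evaluator call per pair.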
- Returns:
A MetricValue object