langcheck.metrics.en.source_based_text_quality
- langcheck.metrics.en.source_based_text_quality.factual_consistency(generated_outputs: List[str] | str, sources: List[str] | str, prompts: List[str] | str | None = None, model_type: str = 'local', openai_args: Dict[str, str] | None = None) → MetricValue[float]
Calculates the factual consistency between the generated outputs and the sources. The factual consistency score for one generated output is the average of the per-sentence consistency scores of that output against its source text. This metric takes float values in the range [0, 1], where 0 means that the output is not at all consistent with the source text and 1 means that the output is fully consistent with it. (NOTE: when using the OpenAI model, the score for each sentence is either 0.0, 0.5, or 1.0, so an output's score is the mean of these discrete values.)
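The per-output aggregation described above can be sketched in plain Python. This is a minimal sketch, not langcheck's implementation: the naive regex sentence splitter and the `score_sentence` callable are hypothetical stand-ins for whatever per-sentence model (UniEval locally, or an OpenAI model) actually produces the sentence-level scores.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Hypothetical naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def factual_consistency_sketch(
    generated_output: str,
    source: str,
    score_sentence: Callable[[str, str], float],
) -> float:
    """Average the per-sentence consistency scores of one output against its source.

    `score_sentence(sentence, source)` is a hypothetical scorer that must return
    a value in [0, 1]; the mean of such values also lies in [0, 1].
    """
    sentences = split_sentences(generated_output)
    if not sentences:
        raise ValueError("generated_output contains no sentences")
    scores = [score_sentence(sentence, source) for sentence in sentences]
    return sum(scores) / len(scores)


def toy_scorer(sentence: str, source: str) -> float:
    """Toy scorer for illustration only: 1.0 if the sentence text appears verbatim in the source."""
    return 1.0 if sentence.rstrip(".!?").lower() in source.lower() else 0.0


source = "Tokyo is the capital of Japan. It is on Honshu."
output = "Tokyo is the capital of Japan. Tokyo is in France."
print(factual_consistency_sketch(output, source, toy_scorer))  # 0.5
```

One supported sentence and one unsupported sentence average to 0.5; a real scorer would return graded values rather than this toy substring check.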
We currently support two model types:
1. The ‘local’ type, where the ‘unieval-fact’ model is downloaded from HuggingFace and run locally. This is the default model type and requires no setup.
2. The ‘openai’ type, where we use OpenAI’s ‘gpt-3.5-turbo’ model by default. The model is configurable, but make sure to use one that supports function calling (https://platform.openai.com/docs/guides/gpt/function-calling). See https://langcheck.readthedocs.io/en/latest/metrics.html#computing-metrics-with-openai-models for examples of setting up the OpenAI API key.
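With the ‘openai’ type, each sentence receives one of three discrete scores (0.0, 0.5, or 1.0), and the output-level score is their mean. The sketch below illustrates that mapping under stated assumptions: the verdict labels and function names are hypothetical, chosen for illustration, and are not the labels or prompt that langcheck actually sends to the model.

```python
from typing import Dict, List

# Hypothetical sentence-level verdict labels; the labels langcheck's
# function-calling prompt actually uses may differ.
VERDICT_TO_SCORE: Dict[str, float] = {
    "consistent": 1.0,
    "partially_consistent": 0.5,
    "inconsistent": 0.0,
}


def score_from_verdict(verdict: str) -> float:
    """Map one sentence-level verdict to its discrete score."""
    try:
        return VERDICT_TO_SCORE[verdict]
    except KeyError:
        raise ValueError(f"unexpected verdict: {verdict!r}") from None


def output_score(verdicts: List[str]) -> float:
    """Average the discrete sentence scores into one output-level score."""
    if not verdicts:
        raise ValueError("no sentence verdicts")
    scores = [score_from_verdict(v) for v in verdicts]
    return sum(scores) / len(scores)


print(output_score(["consistent", "partially_consistent"]))  # 0.75
```

Although each sentence is scored coarsely, the averaged output score can still take intermediate values such as 0.75.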
- Parameters:
generated_outputs – The model-generated output(s) to evaluate
sources – The source text(s), one string per generated output
prompts – The prompts used to generate the output(s). Prompts are optional metadata and are not used to calculate the metric.
model_type – The type of model to use (‘local’ or ‘openai’), default ‘local’
openai_args – Dict of additional arguments to pass to the openai.ChatCompletion.create function, default None
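Because `generated_outputs` and `sources` each accept either a single string or a list, and `sources` must supply one string per generated output, a caller-side check before invoking the metric might look like the following. This is a minimal sketch: `normalize_inputs` is a hypothetical helper, not part of langcheck, and the library's own validation may differ.

```python
from typing import List, Tuple, Union


def _as_list(value: Union[str, List[str]]) -> List[str]:
    """Wrap a single string in a list; copy lists through unchanged."""
    return [value] if isinstance(value, str) else list(value)


def normalize_inputs(
    generated_outputs: Union[str, List[str]],
    sources: Union[str, List[str]],
) -> Tuple[List[str], List[str]]:
    """Return list forms of the inputs, checking that there is one source per output."""
    outputs = _as_list(generated_outputs)
    source_list = _as_list(sources)
    if len(source_list) != len(outputs):
        raise ValueError(
            f"expected one source per output: got {len(source_list)} sources "
            f"for {len(outputs)} outputs"
        )
    return outputs, source_list


# A single string is treated as a batch of one.
print(normalize_inputs("Tokyo is in Japan.", "Tokyo is the capital of Japan."))
```

Treating a lone string as a batch of one keeps the downstream scoring loop uniform, since every path then iterates over paired lists.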
- Returns:
A MetricValue object containing one factual consistency score per generated output