langcheck.metrics.de.source_based_text_quality
- langcheck.metrics.de.source_based_text_quality.context_relevance(sources: list[str] | str, prompts: list[str] | str, eval_model: EvalClient) → MetricValue[float | None]
Calculates the relevance of the sources to the prompts. This metric takes on float values between [0, 1], where 0 means that the source text is not at all relevant to the prompt, and 1 means that the source text is fully relevant to the prompt.
We currently only support evaluation based on an EvalClient (see the usage sketch below).
- Parameters:
sources – The source text(s), one string per prompt
prompts – The prompt(s)
eval_model – The EvalClient instance used for the evaluation
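A minimal usage sketch, not taken from the official examples: it assumes the OpenAIEvalClient from langcheck.metrics.eval_clients and an OPENAI_API_KEY in the environment; any concrete EvalClient can be substituted.

```python
# Hypothetical usage sketch: the choice of EvalClient and its configuration
# are assumptions; substitute whichever concrete EvalClient you use.
from langcheck.metrics.de.source_based_text_quality import context_relevance
from langcheck.metrics.eval_clients import OpenAIEvalClient

eval_client = OpenAIEvalClient()  # assumed to read OPENAI_API_KEY from the environment

sources = ["Tokio ist die Hauptstadt von Japan."]
prompts = ["Was ist die Hauptstadt von Japan?"]

result = context_relevance(sources, prompts, eval_model=eval_client)
print(result.metric_values)  # scores in [0, 1], or None if a score could not be computed
```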
- langcheck.metrics.de.source_based_text_quality.factual_consistency(generated_outputs: list[str] | str, sources: list[str] | str, prompts: list[str] | str | None = None, eval_model: str | EvalClient = 'local') → MetricValue[float | None]
Calculates the factual consistency between the generated outputs and the sources. This metric takes on float values between [0, 1], where 0 means that the output is not at all consistent with the source text, and 1 means that the output is fully consistent with the source text. (NOTE: when using an EvalClient, the factuality scores are either 0.0, 0.5, or 1.0. The score may also be None if it could not be computed.)
We currently support two evaluation model types:
1. The ‘local’ type, where the ‘unieval-fact’ model is downloaded from HuggingFace and run locally. This is the default model type, and no additional setup is needed to run it. This function wraps factual_consistency(), using the translation model Helsinki-NLP/opus-mt-de-en to translate the German texts to English before computing the factual consistency scores, because the UniEval-fact model is trained on English text.
2. The EvalClient type, where you can use an EvalClient typically implemented with an LLM. The implementation details are explained in each of the concrete EvalClient classes.
- Parameters:
generated_outputs – The model generated output(s) to evaluate
sources – The source text(s), one string per generated output
prompts – The prompts used to generate the output(s). Prompts are optional metadata and not used to calculate the metric.
eval_model – The type of evaluation model to use (‘local’ or an EvalClient instance used for the evaluation). Defaults to ‘local’.
- Returns:
A MetricValue object containing the factual consistency scores
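A minimal usage sketch for the default ‘local’ model type follows; the example texts are illustrative and not taken from the official docs. To evaluate with an LLM instead, pass a configured EvalClient as eval_model.

```python
# Hypothetical usage sketch for the default 'local' model type.
from langcheck.metrics.de.source_based_text_quality import factual_consistency

generated_outputs = ["Die Hauptstadt von Japan ist Tokio."]
sources = ["Tokio ist die Hauptstadt und bevoelkerungsreichste Stadt Japans."]

# Downloads 'unieval-fact' and 'Helsinki-NLP/opus-mt-de-en' on first use.
result = factual_consistency(generated_outputs, sources)
print(result.metric_values)  # one score per output, in [0, 1], or None

# With an LLM-based evaluator instead of the local model (my_eval_client is
# any configured EvalClient instance):
# result = factual_consistency(generated_outputs, sources, eval_model=my_eval_client)
```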