langcheck.metrics

langcheck.metrics contains all of LangCheck’s evaluation metrics.

Since LangCheck has multi-lingual support, language-specific metrics are organized into sub-packages such as langcheck.metrics.en or langcheck.metrics.ja.

Tip

As a shortcut, all English and language-agnostic metrics are also directly accessible from langcheck.metrics. For example, you can directly run langcheck.metrics.sentiment() instead of langcheck.metrics.en.reference_free_text_quality.sentiment().

Additionally, langcheck.metrics.MetricValue is a shortcut for langcheck.metrics.metric_value.MetricValue.

There are several different types of metrics:

Type of Metric	Examples	Languages
Reference-Free Text Quality Metrics	`toxicity(generated_outputs)` `sentiment(generated_outputs)` `ai_disclaimer_similarity(generated_outputs)`	EN, JA
Reference-Based Text Quality Metrics	`semantic_similarity(generated_outputs, reference_outputs)` `rouge2(generated_outputs, reference_outputs)`	EN, JA
Source-Based Text Quality Metrics	`factual_consistency(generated_outputs, sources)`	EN, JA
Text Structure Metrics	`is_float(generated_outputs, min=0, max=None)` `is_json_object(generated_outputs)`	All Languages
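
To make the last row of the table concrete, here is a minimal pure-Python sketch of what a text structure check in the style of `is_float` computes: one binary score per generated output. The helper name `is_float_sketch`, its inclusive range check, and its internals are illustrative assumptions, not LangCheck's implementation; only the argument shape (`generated_outputs`, `min`, `max`) is taken from the table.

```python
from typing import List, Optional


def is_float_sketch(generated_outputs: List[str],
                    min: Optional[float] = None,
                    max: Optional[float] = None) -> List[int]:
    """Hypothetical stand-in: 1 if an output parses as a float inside
    the (assumed inclusive) range [min, max], otherwise 0."""
    scores = []
    for output in generated_outputs:
        try:
            value = float(output)
        except ValueError:
            # The text is not a number at all
            scores.append(0)
            continue
        in_range = ((min is None or value >= min) and
                    (max is None or value <= max))
        scores.append(1 if in_range else 0)
    return scores


scores = is_float_sketch(["0.5", "-3", "not a number"], min=0)
print(scores)  # [1, 0, 0]
```

Because a check like this only inspects the shape of the text, it does not depend on the language of the output, which matches the "All Languages" entry in the table.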

class langcheck.metrics.MetricValue(metric_name: str, metric_values: List[NumericType], prompts: List[str] | None, generated_outputs: List[str], reference_outputs: List[str] | None, sources: List[str] | None, language: str | None)

Bases: Generic[NumericType]

A rich object that is the output of any langcheck.metrics function.

all() → bool

Equivalent to all(metric_value.metric_values). This is mostly useful for binary metric functions.

any() → bool

Equivalent to any(metric_value.metric_values). This is mostly useful for binary metric functions.
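
The semantics of all() and any() can be sketched with a small stand-in class. `MetricValueSketch` is a hypothetical name that carries only two of the fields listed in the constructor above; it is not the real `MetricValue`, just an illustration of how the two methods reduce per-output binary scores to a single boolean.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricValueSketch:
    """Hypothetical stand-in with only the fields all()/any() need."""
    metric_name: str
    metric_values: List[int]

    def all(self) -> bool:
        # True only if every per-output score is truthy (non-zero)
        return all(self.metric_values)

    def any(self) -> bool:
        # True if at least one per-output score is truthy
        return any(self.metric_values)


result = MetricValueSketch("is_float", [1, 1, 0])
print(result.all())  # False: the third output failed the check
print(result.any())  # True: at least one output passed
```

For a binary metric, all() answers "did every output pass?" and any() answers "did at least one output pass?", which is why the methods are most useful there.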

generated_outputs: List[str]

histogram(jupyter_mode: str = 'inline')

Shows an interactive histogram of all data points in MetricValue. Intended to be used in a Jupyter notebook.

This is a convenience function that calls langcheck.plot.histogram().

language: str | None

metric_name: str

metric_values: List[NumericType]

prompts: List[str] | None

reference_outputs: List[str] | None

scatter(jupyter_mode: str = 'inline')

Shows an interactive scatter plot of all data points in MetricValue. Intended to be used in a Jupyter notebook.

This is a convenience function that calls langcheck.plot.scatter().

sources: List[str] | None

to_df() → DataFrame

Returns a DataFrame of metric values for each data point.

langcheck.metrics.ai_disclaimer_similarity(generated_outputs: List[str] | str, prompts: List[str] | str | None = None, ai_disclaimer_phrase: str = "I don't have personal opinions, emotions, or consciousness.", embedding_model_type: str = 'local', openai_args: Dict[str, str] | None = None) → MetricValue[float]

Calculates the degree to which the LLM’s output contains a disclaimer that it is an AI. This is calculated by computing the semantic similarity between the generated outputs and a reference AI disclaimer phrase; by default, this phrase is “I don’t have personal opinions, emotions, or consciousness.”, but you can also pass in a custom phrase. Please refer to semantic_similarity() for details on the typical output ranges and the supported embedding model types.
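
The idea of scoring each output by its similarity to a fixed disclaimer phrase can be sketched as follows. This is only an illustration under loud assumptions: `toy_embed` is a hypothetical word-count stand-in for a real embedding model, and `disclaimer_similarity_sketch` is a hypothetical helper, so the numbers it produces say nothing about the ranges LangCheck itself returns.

```python
import math
from collections import Counter
from typing import List


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disclaimer_similarity_sketch(generated_outputs: List[str],
                                 ai_disclaimer_phrase: str) -> List[float]:
    """One similarity score per output, each compared to the same phrase."""
    reference = toy_embed(ai_disclaimer_phrase)
    return [cosine_similarity(toy_embed(output), reference)
            for output in generated_outputs]


phrase = "I don't have personal opinions, emotions, or consciousness."
scores = disclaimer_similarity_sketch(
    ["As an AI, I don't have personal opinions.",
     "The capital of France is Paris."],
    phrase,
)
print(scores[0] > scores[1])  # True: the first output resembles the phrase
```

A real embedding model captures meaning rather than exact word overlap, so paraphrased disclaimers would also score highly; the toy version only rewards shared tokens.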

Parameters:

generated_outputs – A list of model generated outputs to evaluate
prompts – An optional list of prompts used to generate the outputs. Prompts are not evaluated and only used as metadata.
ai_disclaimer_phrase – Reference AI disclaimer phrase, default “I don’t have personal opinions, emotions, or consciousness.”
embedding_model_type – The type of embedding model to use (‘local’ or ‘openai’), default ‘local’
openai_args – Dict of additional args to pass in to the openai.Embedding.create function, default None

Returns:

A MetricValue object holding one float score per generated output
