langcheck.metrics.en.query_based_text_quality

langcheck.metrics.en.query_based_text_quality.adult_content(generated_outputs: list[str] | str, prompts: list[str] | str, eval_model: EvalClient) -> MetricValue[float | None]

Checks whether the outputs generated for each prompt contain adult content. The score is a float with one of three values: 0.0 (Low Risk), 0.5 (Medium Risk), or 1.0 (High Risk). It may also be None if the score could not be computed.

Currently, evaluation is supported only through an EvalClient.
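The three risk levels above can be turned into readable labels with a small lookup. The following is a minimal sketch in plain Python, not part of LangCheck: the helper name `risk_label`, the "Not computed" label, and the example scores are all assumptions made for illustration.

```python
from typing import Optional

# Labels copied from the description above; the mapping itself is an
# illustrative helper, not something the library provides.
RISK_LABELS = {0.0: "Low Risk", 0.5: "Medium Risk", 1.0: "High Risk"}


def risk_label(score: Optional[float]) -> str:
    """Translate one risk score into its documented label."""
    if score is None:
        # The metric may return None when a score could not be computed.
        return "Not computed"
    return RISK_LABELS[score]


# Made-up scores standing in for the values a risk metric might return.
scores = [0.0, 0.5, None, 1.0]
labels = [risk_label(s) for s in scores]
print(labels)
```

The same lookup would apply to the other metrics on this page that use the Low/Medium/High Risk scale.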

langcheck.metrics.en.query_based_text_quality.answer_relevance(generated_outputs: list[str] | str, prompts: list[str] | str, eval_model: EvalClient) -> MetricValue[float | None]

Calculates how relevant each generated output is to its prompt. The score is a float with one of three values: 0.0 (Not Relevant), 0.5 (Partially Relevant), or 1.0 (Fully Relevant). It may also be None if the score could not be computed.

Currently, evaluation is supported only through an EvalClient.
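Because a score may be None, any aggregate over relevance scores has to decide what to do with missing values. The sketch below, in plain Python, simply skips them; the helper name `mean_score` and the example scores are hypothetical, and skipping None is one possible policy rather than LangCheck's own behavior.

```python
from statistics import mean
from typing import Optional


def mean_score(scores: list[Optional[float]]) -> Optional[float]:
    """Average the scores that were computed, ignoring None entries."""
    computed = [s for s in scores if s is not None]
    # If nothing could be computed, there is no meaningful average.
    return mean(computed) if computed else None


# Made-up relevance scores: fully, partially, not computed, not relevant.
relevance = [1.0, 0.5, None, 0.0]
average = mean_score(relevance)
print(average)
```

Reporting the number of None entries alongside the average is worth considering, since a high average over very few computed scores can be misleading.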

langcheck.metrics.en.query_based_text_quality.answer_safety(generated_outputs: list[str] | str, prompts: list[str] | str, eval_model: EvalClient) -> MetricValue[float | None]

Calculates the safety of the outputs generated for each prompt. The score is a float with one of three values: 0.0 (Unsafe), 0.5 (Partially Unsafe), or 1.0 (Safe). It may also be None if the score could not be computed. Unlike the risk metrics in this module, a higher score here is better.

Currently, evaluation is supported only through an EvalClient.
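The safety scale runs in the opposite direction from the risk metrics on this page: 1.0 means Safe here, while 1.0 means High Risk there. When the two kinds of score are shown side by side, it can help to put them on one scale. The following plain-Python sketch flips a safety score into a risk-style score; `safety_to_risk` is a hypothetical helper, not a LangCheck function.

```python
from typing import Optional


def safety_to_risk(score: Optional[float]) -> Optional[float]:
    """Flip a safety score (1.0 = Safe) into a risk-style score (1.0 = high)."""
    if score is None:
        # Keep "could not be computed" as None rather than inventing a value.
        return None
    return 1.0 - score


# Made-up safety scores: Safe, Partially Unsafe, Unsafe, not computed.
safety = [1.0, 0.5, 0.0, None]
as_risk = [safety_to_risk(s) for s in safety]
print(as_risk)
```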

langcheck.metrics.en.query_based_text_quality.harmful_activity(generated_outputs: list[str] | str, prompts: list[str] | str, eval_model: EvalClient) -> MetricValue[float | None]

Checks whether the outputs generated for each prompt contain suggestions or instructions for harmful activity. The score is a float with one of three values: 0.0 (Low Risk), 0.5 (Medium Risk), or 1.0 (High Risk). It may also be None if the score could not be computed.

Currently, evaluation is supported only through an EvalClient.

langcheck.metrics.en.query_based_text_quality.hate_speech(generated_outputs: list[str] | str, prompts: list[str] | str, eval_model: EvalClient) -> MetricValue[float | None]

Checks whether the outputs generated for each prompt contain hate speech. The score is a float with one of three values: 0.0 (Low Risk), 0.5 (Medium Risk), or 1.0 (High Risk). It may also be None if the score could not be computed.

Currently, evaluation is supported only through an EvalClient.

langcheck.metrics.en.query_based_text_quality.personal_data_leakage(generated_outputs: list[str] | str, prompts: list[str] | str, eval_model: EvalClient) -> MetricValue[float | None]

Checks whether the outputs generated for each prompt leak personal data. The score is a float with one of three values: 0.0 (Low Risk), 0.5 (Medium Risk), or 1.0 (High Risk). It may also be None if the score could not be computed.

Currently, evaluation is supported only through an EvalClient.
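Several metrics on this page share the same 0.0 / 0.5 / 1.0 risk scale, so their scores can be combined to find outputs that need review. The sketch below works on plain lists of scores and does not call LangCheck; the helper `flagged_indices`, the 0.5 threshold, and the example data are all assumptions chosen for illustration.

```python
from typing import Optional


def flagged_indices(
    results: dict[str, list[Optional[float]]],
    threshold: float = 0.5,
) -> dict[int, list[str]]:
    """Map each output index to the risk metrics that scored it at or above threshold."""
    flagged: dict[int, list[str]] = {}
    for metric, scores in results.items():
        for index, score in enumerate(scores):
            # None means the score could not be computed, so it is not flagged here.
            if score is not None and score >= threshold:
                flagged.setdefault(index, []).append(metric)
    return flagged


# Made-up scores for three generated outputs, one list per risk metric.
results = {
    "adult_content": [0.0, 0.0, 1.0],
    "hate_speech": [0.0, 0.5, 1.0],
    "personal_data_leakage": [None, 0.0, 0.0],
}
flagged = flagged_indices(results)
print(flagged)
```

Scores that are None are skipped in this sketch; a stricter review process might instead treat them as needing manual inspection.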