langcheck.metrics.zh.reference_free_text_quality

langcheck.metrics.zh.reference_free_text_quality.sentiment(generated_outputs: List[str] | str, prompts: List[str] | str | None = None, model_type: str = 'local', openai_client: OpenAI | None = None, openai_args: Dict[str, str] | None = None, *, use_async: bool = False) -> MetricValue[float | None]

Calculates the sentiment scores of generated outputs. This metric takes float values in [0, 1], where 0 is negative sentiment and 1 is positive sentiment. (NOTE: when using the OpenAI model, the sentiment scores are either 0.0 (negative), 0.5 (neutral), or 1.0 (positive). The score may also be None if it could not be computed.)
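With the OpenAI model, the score is a categorical judgment rather than a probability. The sketch below shows one way such labels could map onto the three documented values, returning None when no score can be computed. The label strings and the helper name sentiment_label_to_score are hypothetical and are not part of LangCheck's API.

```python
from typing import Optional

# Hypothetical label strings; the numeric values match the documented scores.
_SENTIMENT_SCORES = {
    "negative": 0.0,
    "neutral": 0.5,
    "positive": 1.0,
}


def sentiment_label_to_score(label: Optional[str]) -> Optional[float]:
    """Map a categorical sentiment label to a score in [0, 1].

    Returns None when the label is missing or unrecognized, mirroring the
    documented behaviour of returning None for scores that cannot be computed.
    """
    if label is None:
        return None
    # Normalize whitespace and case before the lookup.
    return _SENTIMENT_SCORES.get(label.strip().lower())


print(sentiment_label_to_score("Positive"))  # 1.0
print(sentiment_label_to_score("neutral"))   # 0.5
print(sentiment_label_to_score("unsure"))    # None
```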

We currently support three model types:

1. The ‘local’ type, where the IDEA-CCNL/Erlangshen-Roberta-110M-Sentiment model is downloaded from HuggingFace and run locally. This is the default model type, and no setup is needed to run it.

2. The ‘openai’ type, where we use OpenAI’s ‘gpt-3.5-turbo’ model by default. While the model you use is configurable, please make sure to use one that supports function calling (https://platform.openai.com/docs/guides/gpt/function-calling). See this example for how to set up the OpenAI API key.

3. The ‘azure_openai’ type. This is essentially the same as the ‘openai’ type, except that it uses the AzureOpenAI client. Note that you must specify the model deployment to use in openai_args, e.g. openai_args={'model': 'YOUR_DEPLOYMENT_NAME'}.
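For the ‘local’ type, a sentiment classifier outputs logits over its classes, and a score in [0, 1] can be read off as the probability of the positive class. The sketch below assumes a two-class head ordered [negative, positive]; that ordering, and the idea that the metric uses exactly this probability, are assumptions for illustration rather than a description of LangCheck's internals. The helper positive_probability is hypothetical.

```python
import math
from typing import Sequence


def positive_probability(logits: Sequence[float], positive_index: int = 1) -> float:
    """Softmax over classifier logits; return the probability of one class.

    Assumes (hypothetically) a two-class head ordered [negative, positive],
    so the default positive_index is 1.
    """
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[positive_index] / sum(exps)


print(round(positive_probability([0.0, 0.0]), 3))   # 0.5
print(round(positive_probability([-2.0, 2.0]), 3))  # 0.982
```

Equal logits give a neutral-looking 0.5, while a strongly positive logit pushes the score toward 1.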

Ref:

https://huggingface.co/IDEA-CCNL/Erlangshen-Roberta-110M-Sentiment

Parameters:
  • generated_outputs – The model-generated output(s) to evaluate

  • prompts – The prompts used to generate the output(s). Prompts are optional metadata and not used to calculate the metric.

  • model_type – The type of model to use (‘local’, ‘openai’, or ‘azure_openai’), default ‘local’

  • openai_client – OpenAI or AzureOpenAI client, default None. If this is None but model_type is ‘openai’ or ‘azure_openai’, we will attempt to create a default client.

  • openai_args – Dict of additional arguments to pass to the client.chat.completions.create function, default None

  • use_async – Whether to use the asynchronous API of OpenAI, default False
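The parameters above constrain each other: model_type must be one of three values, and ‘azure_openai’ needs a deployment name in openai_args. The sketch below is a hypothetical pre-flight check you might run before calling the metric; check_model_args is not a LangCheck function, and it does not claim to reproduce the library's own validation.

```python
from typing import Dict, Optional

SUPPORTED_MODEL_TYPES = ("local", "openai", "azure_openai")


def check_model_args(model_type: str = "local",
                     openai_args: Optional[Dict[str, str]] = None) -> None:
    """Hypothetical check: raise ValueError for combinations the docs rule out."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"Unsupported model_type: {model_type!r}")
    # For Azure, the deployment name must be supplied as openai_args["model"].
    if model_type == "azure_openai" and not (openai_args or {}).get("model"):
        raise ValueError(
            "azure_openai requires openai_args={'model': 'YOUR_DEPLOYMENT_NAME'}"
        )


check_model_args("local")                                  # passes
check_model_args("azure_openai", {"model": "my-deployment"})  # passes
```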

Returns:

A MetricValue object

langcheck.metrics.zh.reference_free_text_quality.toxicity(generated_outputs: List[str] | str, prompts: List[str] | str | None = None, model_type: str = 'local', openai_client: OpenAI | None = None, openai_args: Dict[str, str] | None = None, *, use_async: bool = False) -> MetricValue[float | None]

Calculates the toxicity scores of generated outputs. This metric takes float values in [0, 1], where 0 is low toxicity and 1 is high toxicity. (NOTE: when using the OpenAI model, the toxicity scores are in steps of 0.25. The score may also be None if it could not be computed.)
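Steps of 0.25 correspond to five evenly spaced levels in [0, 1]. The sketch below assumes the model returns an integer rating from 1 (least toxic) to 5 (most toxic) and rescales it with (rating - 1) / 4; the rating scale and the helper name rating_to_toxicity are assumptions for illustration, not LangCheck's documented prompt.

```python
from typing import Optional


def rating_to_toxicity(rating: Optional[int]) -> Optional[float]:
    """Map a hypothetical 1-5 toxicity rating onto [0, 1] in steps of 0.25."""
    if rating is None or not 1 <= rating <= 5:
        return None  # treat a missing or out-of-range rating as "could not be computed"
    return (rating - 1) / 4


print([rating_to_toxicity(r) for r in range(1, 6)])  # [0.0, 0.25, 0.5, 0.75, 1.0]
```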

We currently support three model types:

1. The ‘local’ type, where a model file is downloaded from HuggingFace and run locally. This is the default model type, and no setup is needed to run it. The model (alibaba-pai/pai-bert-base-zh-llm-risk-detection) is a risk-detection model for LLM-generated content released by Alibaba Group.

2. The ‘openai’ type, where we use OpenAI’s ‘gpt-3.5-turbo’ model by default, in the same way as the English counterpart. While the model you use is configurable, please make sure to use one that supports function calling (https://platform.openai.com/docs/guides/gpt/function-calling). See this example for how to set up the OpenAI API key.

3. The ‘azure_openai’ type. This is essentially the same as the ‘openai’ type, except that it uses the AzureOpenAI client. Note that you must specify the model deployment to use in openai_args, e.g. openai_args={'model': 'YOUR_DEPLOYMENT_NAME'}.

Ref:

https://huggingface.co/alibaba-pai/pai-bert-base-zh-llm-risk-detection

Parameters:
  • generated_outputs – The model-generated output(s) to evaluate

  • prompts – The prompts used to generate the output(s). Prompts are optional metadata and not used to calculate the metric.

  • model_type – The type of model to use (‘local’, ‘openai’, or ‘azure_openai’), default ‘local’

  • openai_client – OpenAI or AzureOpenAI client, default None. If this is None but model_type is ‘openai’ or ‘azure_openai’, we will attempt to create a default client.

  • openai_args – Dict of additional arguments to pass to the client.chat.completions.create function, default None

  • use_async – Whether to use the asynchronous API of OpenAI, default False

Returns:

A MetricValue object

langcheck.metrics.zh.reference_free_text_quality.xuyaochen_report_readability(generated_outputs: List[str] | str, prompts: List[str] | str | None = None) -> MetricValue[float]

Calculates the readability scores of generated outputs, as introduced in “中文年报可读性” (Chinese annual report readability). This metric computes r1, the average number of words per sentence, and r2, the average number of adverbs and coordinating conjunctions per sentence in the given generated outputs. Then, following the approach of the Fog Index, it combines r1 and r2 by their arithmetic mean to produce the final score. This function uses the HanLP tokenizer and POS tagger together, with POS tags in the CTB style (https://hanlp.hankcs.com/docs/annotations/pos/ctb.html). The lower the score, the better the readability. The score is mainly influenced by r1, the average number of words per sentence.
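The computation described above can be sketched for a single output that has already been split into sentences and POS-tagged. In the documented function, HanLP does the splitting and tagging; here the tags are written by hand, and the helper name report_readability is hypothetical. The sketch counts every token, including punctuation, as a word, which is an assumption; the library may count differently. AD and CC are the CTB tags for adverbs and coordinating conjunctions.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]  # (word, CTB POS tag) pairs


def report_readability(sentences: List[Sentence]) -> float:
    """Arithmetic mean of r1 (words per sentence) and r2 (AD + CC tags per sentence)."""
    if not sentences:
        raise ValueError("need at least one sentence")
    n = len(sentences)
    # r1: average number of tokens per sentence (punctuation included here)
    r1 = sum(len(s) for s in sentences) / n
    # r2: average number of adverbs (AD) plus coordinating conjunctions (CC) per sentence
    r2 = sum(sum(1 for _, pos in s if pos in ("AD", "CC")) for s in sentences) / n
    # Final score: lower means more readable
    return (r1 + r2) / 2


# Hand-tagged example: 7 + 4 tokens, with 2 + 1 AD/CC tags, over 2 sentences.
example = [
    [("我", "PN"), ("很", "AD"), ("喜欢", "VV"), ("苹果", "NN"),
     ("和", "CC"), ("香蕉", "NN"), ("。", "PU")],
    [("它们", "PN"), ("都", "AD"), ("好吃", "VA"), ("。", "PU")],
]
print(report_readability(example))  # 3.5
```

Here r1 = 11 / 2 = 5.5 and r2 = 3 / 2 = 1.5, so the score is (5.5 + 1.5) / 2 = 3.5. As the description notes, r1 dominates because sentences usually contain far more words than adverbs and conjunctions.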

Ref:

Chinese annual report readability: measurement and test. Link: https://www.tandfonline.com/doi/full/10.1080/21697213.2019.1701259

Parameters:
  • generated_outputs – The model-generated output(s) to evaluate

  • prompts – The prompts used to generate the output(s). Prompts are optional metadata and not used to calculate the metric.

Returns:

A MetricValue object containing the list of scores