langcheck.metrics.ja.reference_free_text_quality

Calculates the fluency scores of generated outputs. This metric takes on float values between [0, 1], where 0 is low fluency and 1 is high fluency. (NOTE: when using the OpenAI model, the fluency scores are either 0.0 (poor), 0.5 (fair), or 1.0 (good). The score may also be None if it could not be computed.)

We currently support two model types:

1. The ‘local’ type, where a model file is downloaded from HuggingFace and run locally. This is the default model type and requires no setup. The model (liwii/fluency-score-classification-ja) is fine-tuned from the line-corporation/line-distilbert-base-japanese model.
2. The ‘openai’ type, which uses OpenAI’s ‘gpt-3.5-turbo’ model by default, in the same way as the English counterpart. The model is configurable, but make sure to use one that supports function calling (https://platform.openai.com/docs/guides/gpt/function-calling). See this example for details on setting up the OpenAI API key.

Ref:
https://huggingface.co/line-corporation/line-distilbert-base-japanese
https://huggingface.co/liwii/fluency-score-classification-ja

Parameters:

generated_outputs – The model generated output(s) to evaluate
prompts – The prompts used to generate the output(s). Prompts are optional metadata and not used to calculate the metric.
model_type – The type of model to use (‘local’ or ‘openai’), default ‘local’
openai_args – Dict of additional args to pass in to the openai.ChatCompletion.create function, default None

Returns:

A MetricValue object
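The discrete fluency scores noted above for the OpenAI model can be illustrated with a minimal sketch. The label names come from the note (poor, fair, good); the table `FLUENCY_SCORES` and the helper `fluency_label_to_score` are hypothetical names introduced for illustration and are not part of the langcheck API.

```python
from typing import Optional

# Hypothetical label-to-score table, based on the values listed in the note:
# 0.0 (poor), 0.5 (fair), 1.0 (good).
FLUENCY_SCORES = {"poor": 0.0, "fair": 0.5, "good": 1.0}


def fluency_label_to_score(label: Optional[str]) -> Optional[float]:
    """Map a categorical assessment to a score, or None if it is unusable."""
    if label is None:
        return None
    # Unknown labels also yield None, mirroring "could not be computed".
    return FLUENCY_SCORES.get(label.strip().lower())
```

Callers that aggregate such scores would need to handle `None` explicitly, since it is a possible value alongside the three floats.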

Calculates the sentiment scores of generated outputs. This metric takes on float values between [0, 1], where 0 is negative sentiment and 1 is positive sentiment. (NOTE: when using the OpenAI model, the sentiment scores are either 0.0 (negative), 0.5 (neutral), or 1.0 (positive). The score may also be None if it could not be computed.)

We currently support two model types:

1. The ‘local’ type, where the twitter-xlm-roberta-base-sentiment-multilingual model is downloaded from HuggingFace and run locally. This is the default model type and requires no setup.
2. The ‘openai’ type, which uses OpenAI’s ‘gpt-3.5-turbo’ model by default. The model is configurable, but make sure to use one that supports function calling (https://platform.openai.com/docs/guides/gpt/function-calling). See this example for details on setting up the OpenAI API key.

Ref:
https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual

Parameters:

generated_outputs – The model generated output(s) to evaluate
prompts – The prompts used to generate the output(s). Prompts are optional metadata and not used to calculate the metric.
model_type – The type of model to use (‘local’ or ‘openai’), default ‘local’
openai_args – Dict of additional args to pass in to the openai.ChatCompletion.create function, default None

Returns:

A MetricValue object

langcheck.metrics.ja.reference_free_text_quality.tateishi_ono_yamada_reading_ease(generated_outputs: List[str] | str, prompts: List[str] | str | None = None) → MetricValue[float]

Calculates the readability of generated Japanese outputs using the reading ease score introduced in “日本文の読みやすさの評価式 (A Computer Readability Formula of Japanese Texts for Machine Scoring)”. This metric takes float values in (-∞, ∞); the paper reports that the scores of the 77 texts used in its experiment have a mean of 50 and a standard deviation of 10. Higher scores mean the text is easier to read.

The score is based on the number of “runs”, which are sequences of consecutive characters of the same type (hiragana, katakana, kanji, etc.). See the original paper for details.
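To make the notion of a run concrete, here is a minimal sketch that splits a string into runs of the same character type. It is not the reading ease formula itself, and the Unicode ranges used to classify characters are a simplifying assumption; `char_type` and `split_runs` are hypothetical helper names, not langcheck functions.

```python
from itertools import groupby
from typing import List, Tuple


def char_type(ch: str) -> str:
    """Classify one character by (simplified) Unicode block."""
    if "\u3040" <= ch <= "\u309f":
        return "hiragana"
    if "\u30a0" <= ch <= "\u30ff":
        return "katakana"
    if "\u4e00" <= ch <= "\u9fff":
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    return "other"


def split_runs(text: str) -> List[Tuple[str, str]]:
    """Group consecutive characters of the same type into runs."""
    return [(kind, "".join(chars)) for kind, chars in groupby(text, key=char_type)]


# "東京タワーに行く" splits into five runs:
# kanji 東京, katakana タワー, hiragana に, kanji 行, hiragana く
runs = split_runs("東京タワーに行く")
```

Counts and lengths of such runs per character type are the kind of quantities the score is built from; the exact terms and coefficients are given in the paper.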

Ref:
https://www.jstage.jst.go.jp/article/nihongokyoiku/158/0/158_49/_pdf/-char/ja (Japanese)
https://ipsj.ixsq.nii.ac.jp/ej/?action=pages_view_main&active_action=repository_view_main_item_detail&item_id=37773&item_no=1&page_id=13&block_id=8 (Japanese)
https://aclanthology.org/C88-2135/ (English)

Parameters:

generated_outputs – The model generated output(s) to evaluate
prompts – The prompts used to generate the output(s). Prompts are optional metadata and not used to calculate the metric.

Returns:

A MetricValue object

Calculates the toxicity scores of generated outputs. This metric takes on float values between [0, 1], where 0 is low toxicity and 1 is high toxicity. (NOTE: when using the OpenAI model, the toxicity scores are in steps of 0.25. The score may also be None if it could not be computed.)

We currently support two model types:

1. The ‘local’ type, where a model file is downloaded from HuggingFace and run locally. This is the default model type and requires no setup. The model (Alnusjaponica/toxicity-score-multi-classification) is fine-tuned from the line-corporation/line-distilbert-base-japanese model.
2. The ‘openai’ type, which uses OpenAI’s ‘gpt-3.5-turbo’ model by default, in the same way as the English counterpart. The model is configurable, but make sure to use one that supports function calling (https://platform.openai.com/docs/guides/gpt/function-calling). See this example for details on setting up the OpenAI API key.

Ref:
https://huggingface.co/line-corporation/line-distilbert-base-japanese
https://huggingface.co/Alnusjaponica/toxicity-score-multi-classification

Parameters:

generated_outputs – The model generated output(s) to evaluate
prompts – The prompts used to generate the output(s). Prompts are optional metadata and not used to calculate the metric.
model_type – The type of model to use (‘local’ or ‘openai’), default ‘local’
openai_args – Dict of additional args to pass in to the openai.ChatCompletion.create function, default None

Returns:

A MetricValue object
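The 0.25-step toxicity scores noted above for the OpenAI model can be sketched as follows. The assumption here, not confirmed by this page, is that the steps arise from a five-level rating (1 to 5) rescaled to [0, 1]; `rating_to_toxicity` and `mean_ignoring_none` are hypothetical helpers, not part of the langcheck API.

```python
from typing import List, Optional


def rating_to_toxicity(rating: Optional[int]) -> Optional[float]:
    """Rescale an assumed 1-5 rating to [0, 1] in steps of 0.25."""
    if rating is None or not 1 <= rating <= 5:
        # Out-of-range or missing ratings cannot be scored.
        return None
    return (rating - 1) / 4


def mean_ignoring_none(scores: List[Optional[float]]) -> Optional[float]:
    """Average the scores that were computed, skipping None entries."""
    valid = [s for s in scores if s is not None]
    return sum(valid) / len(valid) if valid else None
```

Skipping `None` entries is one possible policy for aggregation; treating them as failures to be reported separately may be preferable depending on the use case.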
