langcheck.augment.ja

langcheck.augment.ja.conv_hiragana(instances: list[str] | str, convert_to: str = 'kata', *, aug_char_p: float = 1.0, num_perturbations: int = 1, seed: int | None = None) → list[str]

Convert hiragana in the text to katakana, half-width katakana, or the Latin alphabet, as selected by convert_to.

Parameters:
  • instances – A single string or a list of strings to be augmented.

  • convert_to – The target script to convert to. Available values are:
      - ‘kata’ for katakana
      - ‘hkata’ for half-width katakana
      - ‘alpha’ for the Latin alphabet

  • aug_char_p – The proportion of characters to augment, expressed as a value between 0.0 and 1.0. The default, 1.0, augments every eligible character.

  • num_perturbations – The number of perturbed instances to generate for each string in instances.

  • seed – The seed for the random number generator. You can fix the seed to deterministically choose which characters to change.

Returns:

A list of perturbed instances.
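To make the roles of aug_char_p and seed concrete, here is a minimal standalone sketch of hiragana-to-katakana conversion. It is not the library's implementation: the function name to_katakana_sketch is hypothetical, it only covers the ‘kata’ target, and it assumes aug_char_p acts as a per-character probability. It relies on the fact that the Unicode katakana block sits 0x60 code points above the hiragana block.

```python
import random
from typing import Optional

HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096  # ぁ .. ゖ
KATAKANA_OFFSET = 0x60  # katakana code point = hiragana code point + 0x60


def to_katakana_sketch(
    text: str, aug_char_p: float = 1.0, seed: Optional[int] = None
) -> str:
    """Illustrative only: convert each hiragana character with probability aug_char_p."""
    rng = random.Random(seed)  # a fixed seed makes the choice of characters repeatable
    out = []
    for ch in text:
        is_hiragana = HIRAGANA_START <= ord(ch) <= HIRAGANA_END
        if is_hiragana and rng.random() < aug_char_p:
            out.append(chr(ord(ch) + KATAKANA_OFFSET))
        else:
            out.append(ch)  # non-hiragana characters pass through unchanged
    return "".join(out)


print(to_katakana_sketch("ひらがなとカタカナ"))  # → ヒラガナトカタカナ
print(to_katakana_sketch("ひらがな", aug_char_p=0.5, seed=0))  # partial conversion
```

With the default aug_char_p of 1.0 every hiragana character is converted; lower values leave some characters untouched, and repeating a call with the same seed reproduces the same selection.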

langcheck.augment.ja.jailbreak_template(instances: list[str] | str, templates: list[str] | None = None, *, num_perturbations: int = 1, randomize_order: bool = True, seed: int | None = None, custom_templates: list[tuple[str, str]] | None = None) → list[str]

Applies jailbreak templates to each string in instances.

Parameters:
  • instances – A single string or a list of strings to be augmented.

  • templates – A list of templates to apply. If None, some templates are randomly selected and used. Available templates are:
      - basic
      - chatgpt_good_vs_evil
      - john

  • num_perturbations – The number of perturbed instances to generate for each string in instances. Should be equal to or less than the number of templates.

  • randomize_order – If True, the order of the templates is randomized. When turned off, num_perturbations needs to be equal to the number of templates.

  • seed – The seed for the random number generator. You can fix the seed to deterministically select the same templates.

  • custom_templates – A list of tuples of names and paths to custom Jinja2 templates. The template should contain an {{input_query}} placeholder, which will be replaced by the input query.

Returns:

A list of perturbed instances.
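The custom_templates argument expects (name, path) tuples pointing at template files that contain an {{input_query}} placeholder. The sketch below shows one way such a file and tuple might be prepared. The helper names, the template wording, and the .j2 extension are all hypothetical, and the placeholder is filled with a plain string replacement as a stand-in for Jinja2 rendering so that the example runs with the standard library alone.

```python
import tempfile
from pathlib import Path

# Hypothetical template text ("This is a test. Please answer the following question: ...").
# Only the {{input_query}} placeholder is taken from the documentation above.
TEMPLATE_TEXT = "これはテストです。次の質問に答えてください: {{input_query}}"


def write_template(directory: str, name: str, text: str) -> tuple[str, str]:
    """Write a template file and return a (name, path) tuple for custom_templates."""
    path = Path(directory) / f"{name}.j2"  # the extension is an assumption
    path.write_text(text, encoding="utf-8")
    return name, str(path)


def render_sketch(path: str, input_query: str) -> str:
    """Stand-in for Jinja2 rendering: substitute the single placeholder."""
    template = Path(path).read_text(encoding="utf-8")
    return template.replace("{{input_query}}", input_query)


with tempfile.TemporaryDirectory() as tmp:
    custom_templates = [write_template(tmp, "my_template", TEMPLATE_TEXT)]
    name, path = custom_templates[0]
    rendered = render_sketch(path, "フランスの首都はどこですか?")
    print(name, "->", rendered)
```

A list built this way has the list[tuple[str, str]] shape that the signature declares for custom_templates.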

langcheck.augment.ja.payload_splitting(instances: list[str] | str, *, num_perturbations: int = 1, seed: int | None = None) → list[str]

Applies payload splitting augmentation to each string in instances.

Ref: https://arxiv.org/pdf/2302.05733

Parameters:
  • instances – A single string or a list of strings to be augmented.

  • num_perturbations – The number of perturbed instances to generate for each string in instances. Should be equal to or less than the number of templates.

  • seed – The seed for the random number generator. You can fix the seed to deterministically choose the indices to split the instances.

Returns:

A list of perturbed instances.
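As a rough illustration of the idea behind payload splitting, the sketch below cuts a string at seeded random indices and wraps the pieces in a prompt that asks the model to reassemble them. Both function names and the wrapper prompt wording are hypothetical; the library's actual split logic and prompt may differ.

```python
import random
from typing import Optional


def split_payload_sketch(
    text: str, num_parts: int = 3, seed: Optional[int] = None
) -> list[str]:
    """Illustrative only: cut text into num_parts non-empty pieces at random indices."""
    if not 1 <= num_parts <= len(text):
        raise ValueError("num_parts must be between 1 and len(text)")
    rng = random.Random(seed)  # a fixed seed fixes the split indices
    cuts = sorted(rng.sample(range(1, len(text)), num_parts - 1))
    bounds = [0, *cuts, len(text)]
    return [text[i:j] for i, j in zip(bounds, bounds[1:])]


def build_prompt_sketch(parts: list[str]) -> str:
    """Hypothetical wrapper prompt: assign each part to a variable, then concatenate."""
    names = [chr(ord("a") + i) for i in range(len(parts))]
    assignments = "\n".join(f'{n} = "{p}"' for n, p in zip(names, parts))
    concatenation = "z = " + " + ".join(names)
    # Final line: "Please respond to the content of z."
    return assignments + "\n" + concatenation + "\nz の内容に答えてください。"


parts = split_payload_sketch("フランスの首都はどこですか?", num_parts=3, seed=0)
print(build_prompt_sketch(parts))
```

Joining the parts in order always recovers the original string, so only the surface form of the prompt changes.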

langcheck.augment.ja.rephrase_with_system_role_context(instances: list[str] | str, system_role: str, *, num_perturbations: int = 1, eval_client: EvalClient) → list[str | None]

Rephrases each prompt in instances (usually a list of prompts) by adding the specified system role as context to each prompt. This adds context about what role the AI should assume when responding.

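For example, if the prompt is “フランスの首都はどこですか?” (“What is the capital of France?”) and the role is “先生” (“teacher”), the augmented prompt might be “あなたは先生で、学生に地理を教えています。では以下のクエリに応えてください: フランスの首都はどこですか?” (“You are a teacher who teaches geography to students. Now, please respond to the following query: What is the capital of France?”).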

Parameters:
  • instances – A single prompt or a list of prompts to be augmented.

  • system_role – The role of the system in the augmented prompt.

  • num_perturbations – The number of perturbed instances to generate for each string in instances.

  • eval_client – The EvalClient instance used to generate the rephrased prompts.

Returns:

A list of rephrased instances.

langcheck.augment.ja.rephrase_with_user_role_context(instances: list[str] | str, user_role: str, *, num_perturbations: int = 1, eval_client: EvalClient) → list[str | None]

Rephrases each prompt in instances (usually a list of prompts) by adding the specified user role as context to each prompt. This adds context about the role of the user that is making the request.

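For example, if the prompt is “フランスの首都はどこですか?” (“What is the capital of France?”) and the role is “学生” (“student”), the augmented prompt might be “私は学生です。宿題をしています。フランスの首都はどこですか?” (“I am a student. I am doing my homework. What is the capital of France?”).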

Parameters:
  • instances – A single prompt or a list of prompts to be augmented.

  • user_role – The role of the user in the prompt.

  • num_perturbations – The number of perturbed instances to generate for each string in instances.

  • eval_client – The EvalClient instance used to generate the rephrased prompts.

Returns:

A list of rephrased instances.
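Both rephrase functions are annotated as returning list[str | None], so callers may want to separate usable rephrasings from None entries before passing them on. The sketch below shows one way to do that. The helper name split_results and the sample results list are hypothetical, and the reason an entry might be None (for example, a failed model call) is an assumption that this page does not state.

```python
from typing import Optional


def split_results(results: list[Optional[str]]) -> tuple[list[str], list[int]]:
    """Return the non-None rephrasings and the indices of the None entries."""
    usable = [r for r in results if r is not None]
    failed_indices = [i for i, r in enumerate(results) if r is None]
    return usable, failed_indices


# Hypothetical output of a rephrase call; no model is contacted in this sketch.
results = ["私は学生です。宿題をしています。フランスの首都はどこですか?", None]

usable, failed_indices = split_results(results)
print(f"{len(usable)} usable, failed at indices {failed_indices}")
```

Keeping the failed indices makes it possible to retry only those inputs or to fall back to the original prompts.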

langcheck.augment.ja.synonym(instances: list[str] | str, *, num_perturbations: int = 1, seed: int | None = None, **kwargs) → list[str]

Applies a text perturbation to each string in instances (usually a list of prompts) where some words are replaced with synonyms.

Parameters:
  • instances – A single string or a list of strings to be augmented.

  • num_perturbations – The number of perturbed instances to generate for each string in instances.

  • aug_p – The proportion of words that have synonyms to replace, passed as a keyword argument via **kwargs. Defaults to 0.8.

  • seed – The seed for the random number generator. You can fix the seed to deterministically choose which words to change.

Returns:

A list of perturbed instances.
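The interaction between aug_p and seed can be sketched with a toy example. This is not how the library works internally: the real function relies on Sudachi for tokenization and synonym lookup, whereas the sketch below takes pre-tokenized input and uses a small hand-written synonym table. The names synonym_sketch and SYNONYMS are hypothetical, and aug_p is assumed to act as a per-word probability.

```python
import random
from typing import Optional

# Toy synonym table (hypothetical); only words listed here can be replaced.
SYNONYMS = {"首都": ["首府"], "どこ": ["どちら"]}


def synonym_sketch(
    tokens: list[str], aug_p: float = 0.8, seed: Optional[int] = None
) -> str:
    """Illustrative only: replace each word that has a synonym with probability aug_p."""
    rng = random.Random(seed)  # a fixed seed fixes which words are replaced
    out = []
    for token in tokens:
        candidates = SYNONYMS.get(token)
        if candidates and rng.random() < aug_p:
            out.append(rng.choice(candidates))
        else:
            out.append(token)  # words without synonyms are never changed
    return "".join(out)


tokens = ["フランス", "の", "首都", "は", "どこ", "です", "か", "?"]
print(synonym_sketch(tokens, aug_p=1.0))  # → フランスの首府はどちらですか?
print(synonym_sketch(tokens, aug_p=0.8, seed=42))  # some or all candidates replaced
```

Note that aug_p applies only to words that have a synonym available, which matches the parameter description above.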

Note

This function requires sudachidict_core and sudachipy to be installed in your environment. Please refer to the official instructions to install them.