langcheck.augment.ja

langcheck.augment.ja.conv_hiragana(instances: list[str] | str, convert_to: str = 'kata', *, aug_char_p: float = 1.0, num_perturbations: int = 1, seed: int | None = None) → list[str]

Convert hiragana in the text to katakana or vice versa.

Parameters:
  • instances – A single string or a list of strings to be augmented.

  • convert_to – The target script to convert to. Available values are:
      - ‘kata’ for katakana
      - ‘hkata’ for half-width katakana
      - ‘alpha’ for the Latin alphabet

  • aug_char_p – Proportion of all characters that will be augmented, given as a value between 0.0 and 1.0.

  • num_perturbations – The number of perturbed instances to generate for each string in instances.

  • seed – The seed for the random number generator. You can fix the seed to deterministically choose which characters to change.

Returns:

A list of perturbed instances.
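
The interaction of aug_char_p, num_perturbations and seed can be illustrated with a small standalone sketch. It does not use LangCheck and is not the library's implementation: it only handles the hiragana-to-katakana case, and it assumes that each hiragana character is selected independently with probability aug_char_p. The name conv_hiragana_sketch is hypothetical.

```python
import random

# Hiragana ぁ..ゖ occupy U+3041..U+3096; the matching katakana sit 0x60 higher.
HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096
KATAKANA_OFFSET = 0x60


def conv_hiragana_sketch(instances, *, aug_char_p=1.0, num_perturbations=1, seed=None):
    """Illustrative only: convert hiragana to katakana, character by character."""
    rng = random.Random(seed)  # a fixed seed makes the character choice repeatable
    if isinstance(instances, str):
        instances = [instances]

    results = []
    for text in instances:
        for _ in range(num_perturbations):
            chars = []
            for ch in text:
                is_hiragana = HIRAGANA_START <= ord(ch) <= HIRAGANA_END
                # Assumption: each hiragana character is picked independently.
                if is_hiragana and rng.random() < aug_char_p:
                    chars.append(chr(ord(ch) + KATAKANA_OFFSET))
                else:
                    chars.append(ch)
            results.append("".join(chars))
    return results


print(conv_hiragana_sketch("こんにちは"))
print(conv_hiragana_sketch("こんにちは", aug_char_p=0.5, seed=1))
```

In this sketch the output holds num_perturbations entries per input string, grouped by input. With the library installed, the corresponding call would be langcheck.augment.ja.conv_hiragana("こんにちは", convert_to="kata", seed=0).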

langcheck.augment.ja.jailbreak_template(instances: list[str] | str, templates: list[str] | None = None, *, num_perturbations: int = 1, randomize_order: bool = True, seed: int | None = None) → list[str]

Applies jailbreak templates to each string in instances.

Parameters:
  • instances – A single string or a list of strings to be augmented.

  • templates – A list of templates to apply. If None, some templates are randomly selected and used. Available templates are:
      - basic
      - chatgpt_good_vs_evil
      - john

  • num_perturbations – The number of perturbed instances to generate for each string in instances. Should be equal to or less than the number of templates.

  • randomize_order – If True, the order of the templates is randomized. When turned off, num_perturbations needs to be equal to the number of templates.

  • seed – The seed for the random number generator. You can fix the seed to deterministically select the same templates.

Returns:

A list of perturbed instances.
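
The constraints between templates, num_perturbations and randomize_order can be made concrete with a standalone sketch. It is not the library's code: the template bodies below are hypothetical placeholders (the real template texts are not reproduced here), the {input_query} placeholder name is an assumption, and so is the grouping of results by input string.

```python
import random

# Hypothetical placeholder bodies keyed by the documented template names.
PLACEHOLDER_TEMPLATES = {
    "basic": "[basic] {input_query}",
    "chatgpt_good_vs_evil": "[chatgpt_good_vs_evil] {input_query}",
    "john": "[john] {input_query}",
}


def jailbreak_template_sketch(instances, templates=None, *, num_perturbations=1,
                              randomize_order=True, seed=None):
    """Illustrative only: wrap each instance in num_perturbations templates."""
    rng = random.Random(seed)  # a fixed seed selects the same templates each run
    if isinstance(instances, str):
        instances = [instances]
    names = list(PLACEHOLDER_TEMPLATES) if templates is None else list(templates)

    if num_perturbations > len(names):
        raise ValueError("num_perturbations must not exceed the number of templates")
    if not randomize_order and num_perturbations != len(names):
        raise ValueError("with randomize_order=False, num_perturbations must equal "
                         "the number of templates")

    results = []
    for text in instances:
        # Random subset without repeats, or every template in the given order.
        chosen = rng.sample(names, num_perturbations) if randomize_order else names
        results.extend(PLACEHOLDER_TEMPLATES[name].format(input_query=text)
                       for name in chosen)
    return results


print(jailbreak_template_sketch("質問", ["basic", "john"],
                                num_perturbations=2, randomize_order=False))
```

Sampling without replacement is one plausible reason for the "equal to or less than the number of templates" requirement: no template is applied twice to the same instance.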

langcheck.augment.ja.payload_splitting(instances: list[str] | str, *, num_perturbations: int = 1, seed: int | None = None) → list[str]

Applies payload splitting augmentation to each string in instances.

Ref: https://arxiv.org/pdf/2302.05733

Parameters:
  • instances – A single string or a list of strings to be augmented.

  • num_perturbations – The number of perturbed instances to generate for each string in instances. Should be equal to or less than the number of templates.

  • seed – The seed for the random number generator. You can fix the seed to deterministically choose the indices to split the instances.

Returns:

A list of perturbed instances.
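
As a rough illustration of the idea, the sketch below cuts a string at seeded random indices and presents the pieces as variables to be concatenated. It is a standalone approximation, not the library's implementation: the num_parts parameter, the function names and the wording of the final instruction line are all hypothetical.

```python
import random
import string


def split_at_random_indices(text, num_parts, rng):
    """Cut text into num_parts non-empty pieces at random positions."""
    if not 1 <= num_parts <= len(text):
        raise ValueError("num_parts must be between 1 and len(text)")
    # Distinct cut points strictly inside the string, in ascending order.
    cuts = sorted(rng.sample(range(1, len(text)), num_parts - 1))
    bounds = [0, *cuts, len(text)]
    return [text[start:end] for start, end in zip(bounds, bounds[1:])]


def payload_splitting_sketch(instances, *, num_perturbations=1, num_parts=3, seed=None):
    """Illustrative only: num_parts and the prompt wording are assumptions."""
    rng = random.Random(seed)  # a fixed seed reproduces the same split indices
    if isinstance(instances, str):
        instances = [instances]

    results = []
    for text in instances:
        for _ in range(num_perturbations):
            parts = split_at_random_indices(text, num_parts, rng)
            names = string.ascii_lowercase[:len(parts)]
            lines = [f'{name} = "{part}"' for name, part in zip(names, parts)]
            lines.append("Respond to the text given by " + " + ".join(names) + ".")
            results.append("\n".join(lines))
    return results


print(payload_splitting_sketch("今日の天気を教えて", seed=0)[0])
```

Joining the pieces in order always restores the original string, so the perturbation changes only how the input is presented, not its content.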

langcheck.augment.ja.synonym(instances: list[str] | str, *, num_perturbations: int = 1, seed: int | None = None, **kwargs) → list[str]

Applies a text perturbation to each string in instances (usually a list of prompts) where some words are replaced with synonyms.

Parameters:
  • instances – A single string or a list of strings to be augmented.

  • num_perturbations – The number of perturbed instances to generate for each string in instances.

  • aug_p – Proportion of the words that have synonyms which will be augmented. Defaults to 0.8.

  • seed – The seed for the random number generator. You can fix the seed to deterministically choose which words to change.

Returns:

A list of perturbed instances.

Note

This function requires sudachidict_core and sudachipy to be installed in your environment. Please refer to the official instructions to install them.
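
The role of aug_p and seed can be sketched without the Sudachi dependencies. The example below is not the library's implementation: it takes pre-tokenized input instead of running a tokenizer, uses a tiny hypothetical synonym table (TOY_SYNONYMS), and assumes each word with a synonym is replaced independently with probability aug_p.

```python
import random

# Hypothetical toy synonym table; a real run would draw on a synonym dictionary.
TOY_SYNONYMS = {
    "天気": ["天候"],
    "良い": ["いい", "よい"],
}


def synonym_sketch(tokenized_instances, *, num_perturbations=1, aug_p=0.8, seed=None):
    """Illustrative only: replace words that have synonyms with probability aug_p."""
    rng = random.Random(seed)  # a fixed seed reproduces the same word choices
    results = []
    for words in tokenized_instances:
        for _ in range(num_perturbations):
            new_words = []
            for word in words:
                # Only words present in the table are candidates for replacement.
                if word in TOY_SYNONYMS and rng.random() < aug_p:
                    new_words.append(rng.choice(TOY_SYNONYMS[word]))
                else:
                    new_words.append(word)
            results.append("".join(new_words))
    return results


sentence = ["今日", "は", "天気", "が", "良い"]
print(synonym_sketch([sentence], num_perturbations=2, seed=0))
```

Words without an entry in the table are never changed, which is why aug_p is described relative to the words that have synonyms rather than to all words.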