langcheck.augment.en

langcheck.augment.en.keyboard_typo(texts: list[str] | str, **kwargs) → list[str]

Generate texts perturbed with keyboard typos, for use in augmentation.

Parameters:

texts – A single text or a list of texts to be augmented.

Note

Any argument accepted by nlpaug.augmenter.char.keyboard.KeyboardAug can be passed. Some of the more useful ones from the nlpaug documentation are listed below:

  • aug_char_p (float): Percentage of characters (per token) to augment, given as a fraction (for example, 0.1 for 10%).

  • aug_char_min (int): Minimum number of characters to augment.

  • aug_char_max (int): Maximum number of characters to augment.

  • aug_word_p (float): Percentage of words to augment, given as a fraction.

  • aug_word_min (int): Minimum number of words to augment.

  • aug_word_max (int): Maximum number of words to augment.

Note that the default values for these arguments differ from the nlpaug defaults. Specifically, aug_char_p defaults to 0.1, aug_char_max and aug_word_max default to None, and include_special_char and include_numeric default to False. See the documentation for more details.

Returns:

A list of perturbed texts.
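The defaults described in the note above can be pictured as a plain dictionary merge in which caller-supplied values win. This is a minimal sketch, not langcheck's implementation: KEYBOARD_TYPO_DEFAULTS and effective_kwargs are hypothetical names introduced only for illustration, and the live call is skipped when langcheck is not installed.

```python
# Hypothetical illustration (not part of langcheck): the defaults listed in
# the note above, expressed as a dictionary.
KEYBOARD_TYPO_DEFAULTS = {
    "aug_char_p": 0.1,
    "aug_char_max": None,
    "aug_word_max": None,
    "include_special_char": False,
    "include_numeric": False,
}


def effective_kwargs(**overrides):
    """Return the documented defaults, updated with any caller overrides."""
    return {**KEYBOARD_TYPO_DEFAULTS, **overrides}


# Raise the per-token character rate and cap the number of words touched.
options = effective_kwargs(aug_char_p=0.3, aug_word_max=2)

try:
    from langcheck.augment.en import keyboard_typo
except ImportError:
    keyboard_typo = None  # langcheck is not installed; skip the live call

if keyboard_typo is not None:
    texts = ["Where is the nearest train station?"]
    # The perturbation is random, so no expected output is shown.
    print(keyboard_typo(texts, **options))
```

Passing only the arguments you want to change, for example keyboard_typo(texts, aug_char_p=0.3), should be enough in practice, since the remaining defaults are applied by the function itself according to the note.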

langcheck.augment.en.ocr_typo(texts: list[str] | str, **kwargs) → list[str]

Generate texts perturbed with OCR-style typos, for use in augmentation.

Parameters:

texts – A single text or a list of texts to be augmented.

Note

Any argument accepted by nlpaug.augmenter.char.ocr.OcrAug can be passed. Some of the more useful ones from the nlpaug documentation are listed below:

  • aug_char_p (float): Percentage of characters (per token) to augment, given as a fraction (for example, 0.1 for 10%).

  • aug_char_min (int): Minimum number of characters to augment.

  • aug_char_max (int): Maximum number of characters to augment.

  • aug_word_p (float): Percentage of words to augment, given as a fraction.

  • aug_word_min (int): Minimum number of words to augment.

  • aug_word_max (int): Maximum number of words to augment.

Note that the default values for these arguments differ from the nlpaug defaults. Specifically, aug_char_p defaults to 0.1, and aug_char_max and aug_word_max default to None. See the documentation for more details.

Returns:

A list of perturbed texts.
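A common use of the returned list is to line each perturbed text up with its original, for instance to compare metric scores before and after perturbation. The sketch below assumes that ocr_typo returns one perturbed text per input, in input order, which the signature suggests but the text above does not state outright. pair_with_perturbed is a hypothetical helper, not part of langcheck, and the live call is skipped when langcheck is not installed.

```python
def pair_with_perturbed(texts, augment, **kwargs):
    """Hypothetical helper: pair each input with its perturbed counterpart.

    Assumes `augment` returns one perturbed text per input, in order.
    """
    # Accept a single string or a list, mirroring the documented signature.
    inputs = [texts] if isinstance(texts, str) else list(texts)
    outputs = augment(inputs, **kwargs)
    if len(outputs) != len(inputs):
        raise ValueError("expected one perturbed text per input")
    return list(zip(inputs, outputs))


try:
    from langcheck.augment.en import ocr_typo
except ImportError:
    ocr_typo = None  # langcheck is not installed; skip the live call

if ocr_typo is not None:
    pairs = pair_with_perturbed(
        "Invoice total: 1,250 units", ocr_typo, aug_word_max=1
    )
    for original, perturbed in pairs:
        # The perturbation is random, so no expected output is shown.
        print(f"{original!r} -> {perturbed!r}")
```

Because the helper takes the augmentation function as an argument, the same pairing works with any function that follows the same input and output convention.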

langcheck.augment.en.synonym(texts: list[str] | str, **kwargs) → list[str]

Generate texts where some of the input words are replaced with synonyms.

Parameters:
  • texts – A single text or a list of texts to be augmented.

  • aug_p – Percentage of words to augment, given as a fraction. Defaults to 0.1.

  • aug_max – Maximum number of words to augment. Defaults to None.

Note

Any argument accepted by nlpaug.augmenter.word.SynonymAug can be passed. Some of the more useful ones from the nlpaug documentation are listed below:

  • aug_p (float): Percentage of words to augment, given as a fraction.

  • aug_min: Minimum number of words to augment.

  • aug_max: Maximum number of words to augment.

Note that the default values for these arguments differ from the nlpaug defaults. Specifically, aug_p defaults to 0.1 and aug_max defaults to None. See the source code for more details.

Returns:

A list of perturbed texts.
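To build intuition for how aug_p, aug_min, and aug_max interact, the sketch below estimates how many words would be targeted in a text of a given length. It rests on assumptions that the text above does not confirm: the count is taken as the ceiling of aug_p times the word count, then clamped to aug_min and aug_max, and aug_min is assumed to default to 1. expected_aug_count and demo_synonym are hypothetical names, and nlpaug may round or clamp differently.

```python
import math


def expected_aug_count(n_words, aug_p=0.1, aug_min=1, aug_max=None):
    """Illustrative estimate of how many words get replaced.

    Assumed rule (not confirmed by the docs): ceil(aug_p * n_words),
    raised to aug_min, capped at aug_max, and never more than n_words.
    """
    count = math.ceil(aug_p * n_words)
    if aug_min is not None:
        count = max(count, aug_min)
    if aug_max is not None:
        count = min(count, aug_max)
    return min(count, n_words)


def demo_synonym():
    """Live call; requires langcheck (and the WordNet data used by nlpaug)."""
    from langcheck.augment.en import synonym

    texts = ["The quick brown fox jumps over the lazy dog."]
    # Replace up to half of the words, but never more than two.
    return synonym(texts, aug_p=0.5, aug_max=2)
```

Under the assumed rule, leaving aug_max at None means the number of replaced words grows with the length of the text, while setting it to a small integer keeps long inputs close to their originals. Call demo_synonym() in an environment where langcheck is installed to see actual output, which varies from run to run.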