langcheck.augment.en
- langcheck.augment.en.keyboard_typo(texts: list[str] | str, **kwargs) → list[str]
Generate keyboard typo perturbed texts for augmentation.
- Parameters:
texts – Text or list of texts to be augmented.
Note
Any argument that can be passed to nlpaug.augmenter.char.keyboard.KeyboardAug is accepted. Some of the more useful ones from the nlpaug documentation are listed below:
aug_char_p (float): Percentage of characters (per token) that will be augmented.
aug_char_min (int): Minimum number of characters that will be augmented.
aug_char_max (int): Maximum number of characters that will be augmented.
aug_word_p (float): Percentage of words that will be augmented.
aug_word_min (int): Minimum number of words that will be augmented.
aug_word_max (int): Maximum number of words that will be augmented.
Note that the default values for these arguments differ from the nlpaug defaults. Specifically, aug_char_p defaults to 0.1, aug_char_max and aug_word_max default to None, and include_special_char and include_numeric default to False. See the documentation for more details.
- Returns:
A list of perturbed texts.
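As a rough illustration of the note above, the sketch below shows how caller-supplied keyword arguments would sit on top of the documented defaults, followed by a direct call to keyboard_typo. DOCUMENTED_DEFAULTS, effective_keyboard_settings, and demo are hypothetical names, not part of LangCheck, and the override values are arbitrary. The real function may merge its defaults differently.

```python
# Defaults as described in the note above (nlpaug's own defaults differ).
DOCUMENTED_DEFAULTS = {
    "aug_char_p": 0.1,
    "aug_char_max": None,
    "aug_word_max": None,
    "include_special_char": False,
    "include_numeric": False,
}


def effective_keyboard_settings(**overrides):
    """Hypothetical helper: documented defaults with caller overrides applied."""
    return {**DOCUMENTED_DEFAULTS, **overrides}


def demo():
    """Call keyboard_typo with two overrides; requires LangCheck to be installed."""
    # Imported lazily so that defining this sketch does not need LangCheck.
    from langcheck.augment.en import keyboard_typo

    texts = ["The quick brown fox jumps over the lazy dog."]
    # Raise the per-token character rate and cap the number of words touched.
    return keyboard_typo(texts, aug_char_p=0.3, aug_word_max=2)
```

Because the perturbation is random, the output of demo() will generally differ from run to run.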
- langcheck.augment.en.ocr_typo(texts: list[str] | str, **kwargs) → list[str]
Generate OCR typo perturbed texts for augmentation.
- Parameters:
texts – Text or list of texts to be augmented.
Note
Any argument that can be passed to nlpaug.augmenter.char.ocr.OcrAug is accepted. Some of the more useful ones from the nlpaug documentation are listed below:
aug_char_p (float): Percentage of characters (per token) that will be augmented.
aug_char_min (int): Minimum number of characters that will be augmented.
aug_char_max (int): Maximum number of characters that will be augmented.
aug_word_p (float): Percentage of words that will be augmented.
aug_word_min (int): Minimum number of words that will be augmented.
aug_word_max (int): Maximum number of words that will be augmented.
Note that the default values for these arguments differ from the nlpaug defaults. Specifically, aug_char_p defaults to 0.1, and aug_char_max and aug_word_max default to None. See the documentation for more details.
- Returns:
A list of perturbed texts.
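One common use of OCR-style perturbation is to compare each original text with its perturbed counterpart. The sketch below pairs them up. paired_with_perturbation is a hypothetical helper, not part of LangCheck, and it assumes the augmenter returns exactly one perturbed text per input, in input order. The augmenter is passed in as an argument so the helper works with any function that has the signature shown above.

```python
from typing import Callable, Union


def paired_with_perturbation(
    texts: Union[list[str], str],
    augment_fn: Callable[..., list[str]],
    **kwargs,
) -> list[tuple[str, str]]:
    """Return (original, perturbed) pairs produced by augment_fn."""
    # Accept a single string as well as a list, mirroring the signature above.
    originals = [texts] if isinstance(texts, str) else list(texts)
    perturbed = augment_fn(originals, **kwargs)
    # Assumption: one output per input, in the same order.
    if len(perturbed) != len(originals):
        raise ValueError("expected one perturbed text per input")
    return list(zip(originals, perturbed))


# Usage with LangCheck installed (not executed here):
#   from langcheck.augment.en import ocr_typo
#   pairs = paired_with_perturbation("Invoice total: 1050 USD", ocr_typo, aug_char_p=0.2)
```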
- langcheck.augment.en.synonym(texts: list[str] | str, **kwargs) → list[str]
Generate texts where some of the input words are replaced with synonyms.
- Parameters:
texts – Text or list of texts to be augmented.
aug_p – Percentage of words that will be augmented. Defaults to 0.1.
aug_max – Maximum number of words that will be augmented. Defaults to None.
Note
Any argument that can be passed to nlpaug.augmenter.word.SynonymAug is accepted. Some of the more useful ones from the nlpaug documentation are listed below:
aug_p (float): Percentage of words that will be augmented.
aug_min: Minimum number of words that will be augmented.
aug_max: Maximum number of words that will be augmented.
Note that the default values for these arguments differ from the nlpaug defaults. Specifically, aug_p defaults to 0.1 and aug_max defaults to None. See the source code for more details.
- Returns:
A list of perturbed texts.
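To see how aug_p affects the amount of synonym replacement, one option is to run the same texts through several rates. The sketch below does this. sweep_aug_p and its default rates are hypothetical and not part of LangCheck; the augmenter is injected so the helper only relies on the aug_p and aug_max keyword arguments documented above.

```python
def sweep_aug_p(texts, augment_fn, rates=(0.1, 0.3, 0.5), aug_max=None):
    """Map each aug_p value to the augmented texts it produces."""
    # aug_max=None leaves the number of replaced words uncapped, per the note above.
    return {rate: augment_fn(texts, aug_p=rate, aug_max=aug_max) for rate in rates}


# Usage with LangCheck installed (not executed here):
#   from langcheck.augment.en import synonym
#   variants = sweep_aug_p(["The weather is nice today."], synonym)
#   for rate, outputs in variants.items():
#       print(rate, outputs)
```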