byte pair encoding
Meanings
noun
- A lossless data compression algorithm that iteratively replaces the most frequent pair of adjacent bytes in a sequence with a new byte not already present in the data.
- A subword tokenization method that iteratively merges the most frequent pairs of adjacent characters in a corpus to form longer and more meaningful tokens, typically until a predefined vocabulary size is reached.
Word forms
This entry uses open data from Wiktionary (CC BY-SA/GFDL). Word forms are used for search and are not indexed as separate pages.