byte pair encoding

English dictionary entry

Meanings

noun
  1. A lossless data compression algorithm that iteratively replaces the most frequent pair of adjacent bytes in a sequence with a new byte not already present in the data.
  2. A subword tokenization method that iteratively merges the most frequent pairs of adjacent characters in a corpus to form longer and more meaningful tokens, typically until a predefined vocabulary size is reached.

Word forms

byte pair encoding byte pair encodings byte-pair encoding
This entry uses open data from Wiktionary (CC BY-SA/GFDL). Word forms are used for search and are not indexed as separate pages.