How words are turned into numbers (tokenization) matters a great deal for how well machine learning models work. Different tokenization schemes lead to models that ‘think’ in different ways, and they can cause subtle and surprising errors, especially with byte-pair encodings (BPEs).
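To make the BPE point concrete, here is a minimal sketch of a toy byte-pair encoder. It is an illustration only, not the tokenizer of any particular model: the function names (`train_bpe`, `merge_pair`, `encode`) and the tiny corpus are hypothetical, and real tokenizers work on bytes, handle whitespace, and use far larger vocabularies. The idea is that frequent adjacent symbol pairs are merged into single tokens, so the model receives multi-character chunks rather than individual letters.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merges from a list of words (toy version)."""
    # Start from single characters; count how often each word occurs.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize a word by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical toy corpus, chosen only for illustration.
    corpus = ["low", "low", "lower", "lowest", "newest", "newest"]
    merges = train_bpe(corpus, num_merges=5)
    for word in ["lowest", "slowest"]:
        print(word, "->", encode(word, merges))
```

Under this scheme, spelling is hidden inside opaque tokens and similar-looking words can split in quite different ways. This is one plausible route to the kind of subtle errors described above.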