Tuesday, February 18, 2025

[NLSea] Subword tokenization – handling multilingual data and misspellings

OnePiece Work - Seattle, 720 3rd Ave, Suite 1100, Seattle, WA, US

AGENDA
6:30 - Arrive and mingle
7:00 - Talk begins
8:00 - Discussion

TOPIC
Modern natural language processing systems have to deal with widely varying data coming directly from consumers. This invariably means models will need to provide quality inference on text with misspellings, and even text in other languages. Subword tokenization is a modern NLP technique […]