Name: [NLSea] Subword tokenization – handling multilingual data and misspellings
Start: 2019-12-03T18:30:00-08:00
End: 2019-12-03T20:30:00-08:00
Location: OnePiece Work – Seattle

[NLSea] Subword tokenization – handling multilingual data and misspellings

December 3, 2019 @ 6:30 pm - 8:30 pm

AGENDA

6:30 – Arrive and mingle
7:00 – Talk begins
8:00 – Discussion

TOPIC

Modern natural language processing systems have to deal with widely varying data coming directly from consumers. This invariably means models will need to provide quality inference on text with misspellings, and even text in other languages.

Subword tokenization is a modern NLP technique that splits text into units smaller than words before vectorizing it, and it is meant to address these problems while keeping models performant. It is used in modern systems like BERT and XLNet and their derivatives, and is the emerging standard for preparing text for neural nets.

This presentation will explore recent strategies for implementing subword tokenization, and walk through a simple implementation of byte-pair encoding.
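For readers who want a preview of the idea, here is a minimal sketch of byte-pair encoding in Python. It is not the presenter's implementation. The function names (learn_bpe, merge_pair, segment), the "</w>" end-of-word marker, and the toy corpus are all assumptions made for illustration. The sketch repeatedly merges the most frequent adjacent pair of symbols, then replays the learned merges on new words.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into subwords by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = ["low", "low", "lower", "newest", "newest", "widest"]
    merges = learn_bpe(corpus, num_merges=10)
    # An unseen or misspelled word still breaks down into known pieces.
    print(segment("lowest", merges))
    print(segment("newset", merges))
```

Because segmentation falls back to single characters when no merge applies, a misspelled or out-of-vocabulary word is still representable, which is the property the talk description highlights.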

THANK YOU

Thanks again to OnePiece Work (http://www.onepiecework.com/) for hosting us!

ABOUT NLSEA

NLSea is a special interest group of PuPPy focused on the application of natural language processing (NLP). The event is for NLP practitioners as well as those who want to get into the field. We plan to cover modern applications of NLP, including project briefs and important recent research papers.

Details

Date: December 3, 2019
Time: 6:30 pm - 8:30 pm
Website: http://www.meetup.com/PSPPython/events/266325236/

Organizer

Puget Sound Programming Python (PuPPy)

Venue

OnePiece Work – Seattle
720 3rd Ave, Suite 1100
Seattle, WA 98104, US
