
Contextual Embeddings/Transformers

Chaya Liebeskind, Jerusalem College of Technology, Israel
liebchaya@gmail.com

Barbara Lewandowska-Tomaszczyk, University of Applied Sciences in Konin, Poland
barbara.lewandowska-tomaszczyk@konin.edu.pl




Contextual embeddings, as opposed to context-free embeddings (Mikolov et al., 2013), represent a word according to its position within a specific sentence or document. Transformers, a family of deep learning models, have emerged as powerful tools for producing such embeddings. By employing the attention mechanism, a transformer weighs the significance of the other tokens in a sequence when encoding each word, thereby capturing complex contextual information. This ability to model context allows transformers such as BERT (Bidirectional Encoder Representations from Transformers; Devlin et al., 2018) and GPT (Generative Pre-trained Transformer; Radford et al., 2018) to generate richer, more nuanced word representations that reflect the diverse meanings a word takes on in different contexts.

Consider the word ‘king’ in the sentence ‘The king smiled proudly.’ A transformer examines not only the co-occurrence of words but also their order and relationships within the sentence. It can infer that ‘smiled proudly’ describes the king’s action, suggesting a positive, triumphant smile, and therefore places ‘king’ in a different vector location than when the word appears in another context, such as ‘The king ruled with an iron fist.’ Transformers have driven substantial progress in natural language processing (NLP) by enabling models to capture intricate linguistic connections and dependencies. As a result, they have proven highly effective in a wide range of applications, including question answering and text summarisation.
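The contrast can be made concrete with a short sketch. The code below is an illustrative example only, not part of the entry: it assumes the Hugging Face transformers library and PyTorch are installed, and the choice of the bert-base-uncased checkpoint is an assumption made purely for demonstration. It extracts the contextual vector of ‘king’ from the two example sentences and compares them.

```python
# Minimal sketch: contextual embeddings of the same word in two sentences.
# Assumes `transformers` and `torch` are installed; `bert-base-uncased` is an
# illustrative choice of model, not one prescribed by the entry.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embedding_of(word: str, sentence: str) -> torch.Tensor:
    """Return the contextual embedding of `word` within `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)  # last_hidden_state: (1, seq_len, hidden_size)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    idx = tokens.index(word)       # position of the target token in the sequence
    return outputs.last_hidden_state[0, idx]

v1 = embedding_of("king", "The king smiled proudly.")
v2 = embedding_of("king", "The king ruled with an iron fist.")

# The two vectors differ because the surrounding contexts differ;
# a context-free embedding would assign "king" a single fixed vector.
similarity = torch.cosine_similarity(v1, v2, dim=0)
print(f"Cosine similarity between the two 'king' vectors: {similarity.item():.3f}")
```

A cosine similarity below 1.0 between the two ‘king’ vectors illustrates the point made above: the representation of a word shifts with its context, which static embeddings cannot capture.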



Keywords: attention mechanism, word representations, deep learning in NLP

Related Entries: Context, Computational Linguistics, Discourse (1), Discourse (2)

References:
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.