Lexical Embedding
Chaya Liebeskind, Jerusalem College of Technology, Israel
liebchaya@gmail.com
Lexical embedding is the process of representing words or lexical items in a continuous vector space to capture their semantic relationships and contextual associations. This enables semantically similar words to appear near one another in the space and is essential in natural language processing (NLP) tasks such as sentiment analysis, machine translation, and information retrieval. There are two main types of lexical embedding: context-free and contextual.
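As a minimal illustration of proximity in a vector space, the sketch below compares hypothetical low-dimensional vectors with cosine similarity; the vectors and values are invented for demonstration only, since real embeddings are learned from data and typically have hundreds of dimensions.

import numpy as np

# Hypothetical three-dimensional vectors, for illustration only;
# learned embeddings are far higher-dimensional.
king = np.array([0.80, 0.65, 0.10])
queen = np.array([0.75, 0.70, 0.15])
banana = np.array([0.10, 0.05, 0.90])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1 mean similar direction."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(king, queen))   # high: semantically related words lie close together
print(cosine_similarity(king, banana))  # low: unrelated words lie far apart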
Context-free embeddings, also known as static word embeddings, assign a fixed vector to each word based on its general usage across large text corpora. A widely used technique for generating such embeddings is Word2Vec, which learns distributed word representations by analysing patterns of word co-occurrence. For example, in the sentence ‘The king smiled proudly,’ the word ‘king’ frequently appears with terms like ‘smiled’ and ‘proudly,’ leading the model to associate it with royalty and positive sentiment. The resulting vector places ‘king’ near other related words. However, since the vector is fixed, it cannot adapt to different contexts in which the same word appears.
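A minimal sketch of training static embeddings, assuming the gensim library (not referenced in this entry) and a toy tokenised corpus; in practice Word2Vec is trained on large corpora, and the resulting similarities are only meaningful at that scale.

from gensim.models import Word2Vec

# Toy corpus of tokenised sentences; a realistic corpus would contain millions of sentences.
corpus = [
    ['the', 'king', 'smiled', 'proudly'],
    ['the', 'queen', 'smiled', 'proudly'],
    ['the', 'king', 'ruled', 'the', 'kingdom'],
    ['the', 'queen', 'ruled', 'the', 'kingdom'],
]

# Train a small skip-gram model (sg=1); vector_size is kept tiny to match the toy corpus.
model = Word2Vec(corpus, vector_size=16, window=2, min_count=1, sg=1, seed=42)

# Each word receives exactly one fixed vector, regardless of the sentence it appears in.
king_vector = model.wv['king']
print(king_vector.shape)                     # (16,)
print(model.wv.similarity('king', 'queen'))  # similarity driven by shared co-occurrence patterns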
Contextual embeddings, in contrast, generate word vectors that depend on the word’s position and role in a specific sentence. Transformers, deep learning models based on attention mechanisms, produce these dynamic embeddings by modelling how each word relates to others in its context. For instance, in ‘The king smiled proudly,’ the model interprets ‘smiled proudly’ as reflecting a triumphant tone. But in ‘The king ruled with an iron fist,’ the tone becomes harsh or authoritarian, and the embedding for ‘king’ changes accordingly. Models like BERT and GPT excel at capturing such subtleties, enabling more accurate performance in tasks such as question answering, summarisation, and nuanced language understanding.
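A minimal sketch of extracting contextual vectors, assuming the Hugging Face transformers and PyTorch libraries and the pretrained bert-base-uncased model (neither library is referenced in this entry); it shows that the vector for ‘king’ differs between the two example sentences above.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')
model.eval()

def king_vector(sentence):
    """Return the contextual embedding of the token 'king' in the given sentence."""
    inputs = tokenizer(sentence, return_tensors='pt')
    with torch.no_grad():
        outputs = model(**inputs)
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    idx = tokens.index('king')                 # 'king' is a single token in this vocabulary
    return outputs.last_hidden_state[0, idx]   # 768-dimensional contextual vector

v1 = king_vector('The king smiled proudly.')
v2 = king_vector('The king ruled with an iron fist.')

# The same word receives different vectors in different contexts,
# so the cosine similarity is high but not 1.0.
print(torch.nn.functional.cosine_similarity(v1, v2, dim=0).item())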
Keywords: contextual embeddings, word representation, static embeddings
Related Entries: Contextual Embedding/Transformers, Computational Linguistics
References:
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.