Embedding Layer

What is an Embedding Layer? What is an Embedding?

An embedding can be used as an alternative to categorical (one-hot) encoding.

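In NLP (Natural Language Processing), one-hot encoding is usually avoided for two reasons. It is not efficient: with a vocabulary of tens of thousands of words, every vector is huge and almost entirely zeros. It also cannot express similarity: every pair of distinct one-hot vectors is equally far apart, so related words look no closer to each other than unrelated ones.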
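A minimal sketch of both problems with categorical (one-hot) encoding, using plain Python. The toy words and the vocabulary size of 10,000 are assumptions for illustration only. It shows that each one-hot vector has a single non-zero entry, and that the cosine similarity between any two different words is 0, whether the words are related or not.

```python
import math

def one_hot(index: int, vocab_size: int) -> list[float]:
    """Return a one-hot vector: all zeros except a 1.0 at `index`."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction, 0.0 = orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vocabulary (word -> index).
vocab = {"king": 0, "queen": 1, "banana": 2}
vocab_size = 10_000  # assumed vocabulary size, for illustration only

king = one_hot(vocab["king"], vocab_size)
queen = one_hot(vocab["queen"], vocab_size)
banana = one_hot(vocab["banana"], vocab_size)

# Each vector has 10,000 entries but only one of them is non-zero.
nonzero = sum(1 for x in king if x != 0.0)

# Distinct one-hot vectors are always orthogonal: "king" is no closer to "queen" than to "banana".
sim_king_queen = cosine_similarity(king, queen)
sim_king_banana = cosine_similarity(king, banana)

print(len(king), nonzero, sim_king_queen, sim_king_banana)  # 10000 1 0.0 0.0
```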

From the official Keras docs (https://keras.io/api/layers/core_layers/embedding/):

Embedding turns positive integers (categories) into dense vectors of real numbers of fixed size ("dense" meaning that most of the values in the vector are non-zero): from a single number we get a vector of real numbers.
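A minimal sketch, in plain Python rather than Keras, of what this lookup amounts to: the layer holds a matrix with one row per integer, and embedding an integer means returning its row. The sizes below are assumptions; `num_categories` and `embedding_dim` play the role of the `input_dim` and `output_dim` arguments of the Keras `Embedding` layer. The random initialization mimics Keras's default `"uniform"` initializer (values in [-0.05, 0.05]). The second function checks that the lookup gives exactly the same result as multiplying a one-hot vector by the matrix, which is why an embedding can replace one-hot encoding without ever building the sparse vectors.

```python
import random

random.seed(0)

num_categories = 5   # like Keras input_dim: number of distinct integer indices
embedding_dim = 3    # like Keras output_dim: fixed length of each dense vector

# The embedding matrix: one row of real numbers per category, initialized randomly.
embedding_matrix = [
    [random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
    for _ in range(num_categories)
]

def embed(index: int) -> list[float]:
    """Embedding lookup: return the row of the matrix for this integer."""
    return embedding_matrix[index]

def one_hot_times_matrix(index: int) -> list[float]:
    """The same result computed the long way: one-hot vector multiplied by the matrix."""
    one_hot = [1.0 if i == index else 0.0 for i in range(num_categories)]
    return [
        sum(one_hot[i] * embedding_matrix[i][j] for i in range(num_categories))
        for j in range(embedding_dim)
    ]

vector = embed(2)
print(len(vector))                         # 3
print(vector == one_hot_times_matrix(2))   # True
```

The lookup costs one row access, while the one-hot product touches every row, which is the efficiency argument in practice.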



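This layer is normally used as the first layer in a model, because it only accepts integer indices in a fixed range (from 0 up to the number of categories minus one).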



In the example, the category "Spain" is first turned into an index: 10.
The integer 10 is then embedded into [-0.1, 2.2, -1], a dense vector of real numbers with a fixed size of 3. The same happens for every other category.
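The two steps of the "Spain" example can be sketched in plain Python as follows. Only the mapping "Spain" -> 10 and the vector [-0.1, 2.2, -1] come from the example; the other labels and indices are hypothetical, and rows other than 10 are zero placeholders, whereas in a real model every row is learned. Note that the matrix needs at least 11 rows so that index 10 exists.

```python
# Step 1: map each category label to an integer index.
# Only "Spain" -> 10 comes from the example; "France" and "Italy" are hypothetical.
category_to_index = {"France": 3, "Italy": 7, "Spain": 10}

# Step 2: an embedding matrix with one row per index and 3 columns (the fixed vector size).
# It needs max index + 1 = 11 rows. Rows other than 10 are zero placeholders here.
num_rows = 11
embedding_dim = 3
embedding_matrix = [[0.0] * embedding_dim for _ in range(num_rows)]
embedding_matrix[10] = [-0.1, 2.2, -1.0]

def embed_label(label: str) -> list[float]:
    """Label -> index -> dense vector."""
    index = category_to_index[label]   # "Spain" -> 10
    return embedding_matrix[index]     # 10 -> [-0.1, 2.2, -1.0]

print(embed_label("Spain"))  # [-0.1, 2.2, -1.0]
```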

An Embedding Layer takes integers and transforms them into dense vectors of real numbers.

Embedding Categories, Embedding Matrix, Latent Factors

The latent factors (the values stored in the embedding matrix, one row per category) are adjusted during the training phase: categories that have a similar impact on the model end up with similar vectors, so they lie close to each other in the embedding space, and in a plot of it.
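A deliberately simplified sketch of how training can push categories with a similar impact toward similar vectors. Everything here is an assumption for illustration: the "rest of the model" is a fixed weight vector `w`, the prediction is `dot(embedding, w)`, the loss is squared error, updates are plain gradient descent, and the 2-dimensional embeddings start at zero (Keras starts from small random values instead). The category names and targets are hypothetical. Each step only updates the row of the category being seen, as an embedding layer does.

```python
import math

w = [1.0, 0.5]          # fixed "rest of the model" (assumption)
learning_rate = 0.1

# Hypothetical categories and the target value each one should produce.
targets = {"Spain": 1.0, "Italy": 1.1, "Japan": -2.0}
embeddings = {label: [0.0, 0.0] for label in targets}  # zero init (simplification)

def predict(vec: list[float]) -> float:
    return sum(v * wi for v, wi in zip(vec, w))

for _ in range(100):
    for label, y in targets.items():
        vec = embeddings[label]
        error = predict(vec) - y
        # Gradient of (dot(vec, w) - y) ** 2 with respect to vec is 2 * error * w.
        # Only this category's row is updated.
        embeddings[label] = [v - learning_rate * 2 * error * wi for v, wi in zip(vec, w)]

for label, vec in embeddings.items():
    print(label, [round(v, 3) for v in vec])

print(math.dist(embeddings["Spain"], embeddings["Italy"]))  # small
print(math.dist(embeddings["Spain"], embeddings["Japan"]))  # large
```

"Spain" and "Italy", which have almost the same effect on the output, end up close together, while "Japan" ends up far away. With a real model the other layers are trained too and the vectors use all their dimensions, but the mechanism is the same.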

Representation of Embedded Categories

The name "embedding" comes from creating an "embedding space" for a categorical variable, in which the similarities between its labels are mapped to distances between their vectors.
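Once the vectors are learned, "similar labels" can be read off the embedding space, for example by ranking the other labels by cosine similarity. In this sketch the vector for "Spain" is the one from the example above, while the vectors for "Italy" and "Japan" are made-up illustrative values, not learned ones.

```python
import math

# "Spain" uses the example vector; "Italy" and "Japan" are illustrative, made-up values.
embeddings = {
    "Spain": [-0.1, 2.2, -1.0],
    "Italy": [-0.2, 2.0, -0.8],
    "Japan": [1.5, -0.3, 0.9],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(label: str) -> list[tuple[str, float]]:
    """Other labels sorted from most to least similar to `label`."""
    query = embeddings[label]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != label
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

print(most_similar("Spain")[0][0])  # Italy
```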
