Saturday, March 19, 2016

Index of Embedding layer with zero padding in Keras

I am building an RNN model in Keras for sentences, using word embeddings loaded through gensim. I am initializing the embedding layer with GloVe vectors. Since this is a sequential model and the sentences have variable lengths, the word-index sequences are zero-padded, e.g.

[0, 0, 0, 6, 2, 4] 
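For reference, the left-padding shown above can be sketched in plain Python. The helper name `pad_left` and the sample sentences are hypothetical, chosen only for illustration; Keras also provides a `pad_sequences` utility that pads on the left by default.

```python
def pad_left(seq, maxlen, pad_value=0):
    """Left-pad (or left-truncate) a list of word indices to exactly maxlen.

    Hypothetical helper for illustration; assumes maxlen >= 1.
    """
    seq = list(seq)[-maxlen:]  # keep only the last maxlen indices if too long
    return [pad_value] * (maxlen - len(seq)) + seq


# Toy sentences already converted to word indices (made-up values).
sentences = [[6, 2, 4], [3, 1, 5, 2, 7]]
padded = [pad_left(s, maxlen=6) for s in sentences]
print(padded)
```

Because 0 is used as the padding value, no real word may be assigned index 0, which is what leads to the indexing question below.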

Let's say the GloVe matrix has dimensions [NUM_VOCAB, EMBEDDING_SIZE]. The zero index is masked (ignored), so to get the proper indexing of words, do we add an extra row to the GloVe matrix so that its dimensions become [NUM_VOCAB+1, EMBEDDING_SIZE]?

This seems to add an unnecessary vector that the model will estimate, unless there is a more elegant way.
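To make the indexing question concrete, here is a minimal NumPy sketch. The toy sizes, the `pretrained` stand-in array, and the example ids are all assumptions for illustration, not the real GloVe data. It shows that prepending a zero row means every original word index has to be shifted up by one.

```python
import numpy as np

NUM_VOCAB, EMBEDDING_SIZE = 5, 4  # toy sizes, assumed for illustration

# Stand-in for the pretrained vectors; row i holds the vector of word i.
pretrained = np.arange(NUM_VOCAB * EMBEDDING_SIZE, dtype="float32")
pretrained = pretrained.reshape(NUM_VOCAB, EMBEDDING_SIZE)

# Prepend one all-zero row so that index 0 is reserved for padding.
zero_row = np.zeros((1, EMBEDDING_SIZE), dtype="float32")
embedding_matrix = np.vstack([zero_row, pretrained])

# Hypothetical 0-based ids into `pretrained`, shifted by one for the new matrix.
original_ids = [4, 0, 2]
shifted_ids = [i + 1 for i in original_ids]
padded_ids = [0, 0, 0] + shifted_ids

# Plain row lookup, mimicking what an embedding layer does with the indices.
looked_up = embedding_matrix[padded_ids]
print(embedding_matrix.shape, looked_up.shape)
```

Under these assumptions the matrix has NUM_VOCAB + 1 rows, the padded positions map to the zero row, and each shifted id still retrieves the vector of its original word.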

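    import numpy as np
    from gensim.models import Word2Vec
    from keras.models import Sequential
    from keras.layers.core import Activation
    from keras.layers.embeddings import Embedding
    from keras.layers.recurrent import LSTM

    glove = Word2Vec.load_word2vec_format(filename)
    # Prepend a zero row for the padding index 0: shape (NUM_VOCAB + 1, EMBEDDING_SIZE)
    embedding_matrix = np.vstack([np.zeros(EMBEDDING_SIZE), glove.syn0])

    model = Sequential()

    # -- this uses GloVe as inits; the input dimension must match the rows of embedding_matrix
    model.add(Embedding(NUM_VOCAB + 1, EMBEDDING_SIZE, input_length=maxlen, mask_zero=True,
                        weights=[embedding_matrix]))

    # -- sequence layer
    model.add(LSTM(32, return_sequences=False, init='orthogonal'))
    model.add(Activation('tanh'))

    ...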

Thanks

0 Answers
