I am trying to retrieve embeddings for words using the pretrained ELMo model available on TensorFlow Hub. My code is adapted from this tutorial: https://www.geeksforgeeks.org/overview-of-word-embedding-using-embeddings-from-language-models-elmo/

The sentence that I am inputting is
bod =" is coming up in and every project is expected to do a video due on we look forward to discussing this with you at our meeting this this time they have laid out the selection criteria for the video award s go for the top spot this time "

and these are the keywords I want embeddings for:
words=["do", "a", "video"]

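import tensorflow as tf  # TensorFlow 1.x API
import tensorflow_hub as hub

# Assumption: the post does not show how `elmo` was loaded; this URL is a guess.
elmo = hub.Module("https://tfhub.dev/google/elmo/3", trainable=True)
embeddings = elmo([bod],
                  signature="default",
                  as_dict=True)["elmo"]
init = tf.global_variables_initializer()  # replaces the deprecated tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)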

This sentence is 236 characters long, as this screenshot shows: [image: lenbod]

But when I pass this sentence to the ELMo model, the tensor it returns has a length of only 48: [image: tensor dim]

This becomes a problem when I try to extract embeddings for keywords that lie beyond position 48, because the indices I compute for those keywords are larger than 48: [image: kywordlen]

This is the code I used to get the indices of the words in 'bod' (shown above):

num_list=[]
for item in words:
  print(item)
  index = bod.index(item)
  num_list.append(index)
num_list

But I keep running into this error: [image: error]

I looked through the ELMo documentation for an explanation, but I have not found anything about the input being truncated like this.

Any advice is much appreciated!

Thank You

1 Answer

This is not really an AllenNLP question, since you are using the TensorFlow-based implementation of ELMo.

That said, I think the problem is that ELMo embeds tokens, not characters. You are getting 48 embeddings because the string has 48 tokens.
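To make this concrete, here is a minimal sketch that looks up the keywords by token position rather than by character offset. It assumes that the "default" signature tokenizes by splitting on whitespace, so that str.split() reproduces the model's tokens; the variable names tokens, char_indices and token_indices are only illustrative.

```python
# Sketch: map keywords to token positions instead of character offsets.
# Assumption: the "default" signature splits the input on whitespace,
# so str.split() gives the same tokens the model sees.
bod = " is coming up in and every project is expected to do a video due on we look forward to discussing this with you at our meeting this this time they have laid out the selection criteria for the video award s go for the top spot this time "
words = ["do", "a", "video"]

tokens = bod.split()
print(len(tokens))  # 48, matching the length of the returned tensor

# Character offsets, as computed in the question. str.index searches for
# substrings, so "a" also matches inside "and", and the offsets count
# characters rather than tokens.
char_indices = [bod.index(w) for w in words]

# Token positions: list.index compares whole tokens and returns the first match.
token_indices = [tokens.index(w) for w in words]
print(token_indices)  # [10, 11, 12]
```

If the embeddings are evaluated with something like vectors = sess.run(embeddings), each token position should then select one vector along the second axis, for example vectors[0, token_indices[0]] for "do". Note that "video" occurs twice in the sentence and list.index only returns the first occurrence.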

Answered 2021-05-27T04:47:21.873