I am using a deep learning / neural language model to encode text strings into a vector space. The model architecture consists of a stack of convolutional layers that feed into an LSTM encoder/decoder layer. This is not code I wrote, but it can be seen here: https://github.com/soroushv/Tweet2Vec. The code is written in Theano.
I have trained the model (i.e., learned the encoding parameters) and am now running into trouble when trying to encode new data (specifically tweets). The problem: the LSTM encoding layer encodes every tweet as a vector of all 1s.
The problem is easy to see by inspecting the output of theano.scan, which conveniently returns not only the final LSTM hidden-state output, but also the output after each of the 256 steps. See below:
Step 0
[[ 0.25917339 0.25781667 0.25658038 ..., 0.25910226 0.25752231
0.25876063]
[ 0.2606163 0.25934234 0.25892666 ..., 0.25979728 0.25971574
0.26056644]
[ 0.25828436 0.25749022 0.25691608 ..., 0.25829604 0.25789574
0.25868404]
...,
]]
Step 1
[[ 0.73329866 0.73475593 0.73370075 ..., 0.73479998 0.7338261 0.733863 ]
[ 0.73417366 0.73541886 0.73478198 ..., 0.7358954 0.73425269
0.73425108]
[ 0.73278904 0.73414212 0.73380911 ..., 0.73420793 0.73367095
0.73390627]
...,
]]
Step 2
[[ 0.88244921 0.88317329 0.88279039 ..., 0.88326144 0.8826766
0.88286686]
[ 0.88289285 0.8835054 0.88333857 ..., 0.88382012 0.88288385
0.88304704]
[ 0.88218391 0.88285285 0.88285559 ..., 0.88295335 0.88259971
0.88289762]
...,
]]
Step 3
[[ 0.95328414 0.95359486 0.95343065 ..., 0.95363271 0.95338178
0.95346349]
[ 0.95347464 0.95373732 0.95366573 ..., 0.9538722 0.95347077
0.95354074]
[ 0.95317024 0.95345742 0.95345861 ..., 0.95350057 0.95334882
0.95347661]
...,
]]
...and so on through step 15:
Step 15
[[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]
...,
[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]]
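For reference, the per-step dumps above come from compiling a Theano function on the full scan output rather than only on the final step. Here is a minimal, self-contained sketch of that pattern (a toy recurrence with my own variable names, not the repo's code):

import numpy as np
import theano
import theano.tensor as T

# Toy recurrence h_t = sigmoid(x_t + h_{t-1}); the point is only to show how
# scan exposes the hidden state after every step, not to model the LSTM.
x = T.tensor3('x')                      # (seq_len, batch, features)
h0 = T.zeros((x.shape[1], x.shape[2]))  # initial state, one row per batch item

def step(x_t, h_tm1):
    return T.nnet.sigmoid(x_t + h_tm1)

h_seq, _ = theano.scan(fn=step, sequences=x, outputs_info=h0)

# Compiling on h_seq (instead of h_seq[-1]) returns every intermediate step,
# which is how the "Step 0 ... Step 15" arrays above were printed.
f = theano.function([x], h_seq)

dummy = np.random.rand(16, 3, 5).astype(theano.config.floatX)
for t, h_t in enumerate(f(dummy)):
    print('Step %d' % t)
    print(h_t)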
I have verified that the output of the final convolutional layer is sensible. It looks like:
[[[ 0.02103661 0.06566209 0.03910736 ..., 0.02350483 0.05625576
0.03399094]
[ 0. 0. 0. ..., 0. 0. 0. ]
[ 0. 0.04917431 0.00619121 ..., 0. 0.04114747 0. ]
...,
[ 0.02622109 0.01024228 0.04368387 ..., 0.04718351 0. 0. ]
[ 0. 0. 0. ..., 0. 0. 0. ]
[ 0.01217926 0.04057767 0.0250682 ..., 0. 0.03617524
...,...]]
and the shape is correct.
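(For completeness, that check is just a few lines on the NumPy array returned by a function compiled on the last conv layer; the helper below is my own, not something from the repo:)

import numpy as np

def describe_activations(conv_out):
    """Quick sanity checks on the conv-layer output (a NumPy array)."""
    print('shape:', conv_out.shape)
    print('min/max:', conv_out.min(), conv_out.max())
    print('fraction of exact zeros:', np.mean(conv_out == 0))

# e.g. describe_activations(np.random.rand(2, 16, 256).astype('float32'))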
So I would appreciate some insight into what is going on here and how to fix it. Below is the code from the LSTM class; as far as I can tell, the convolutional output above is fed as the input to calc_output. All of this code (and the rest) is in the linked GitHub repo. Thanks.
def calc_output(self, inputs, **kwargs):
    input = inputs[0]
    mask = inputs[self.mask_idx] if self.mask_idx is not None else None
    h_beg = inputs[self.h_beg_idx] if self.h_beg_idx is not None else None
    cell_beg = inputs[self.cell_beg_idx] if self.cell_beg_idx is not None else None
    if input.ndim > 3:
        input = T.flatten(input, 3)
    # Reorder to (seq_len, batch, features) so scan iterates over time steps.
    input = input.dimshuffle(1, 0, 2)
    seq_len, num_batch, _ = input.shape
    # Stack the per-gate parameters so all four gates are computed in one dot.
    W_st = T.concatenate([self.W_i, self.W_f, self.W_c, self.W_o], axis=1)
    U_st = T.concatenate([self.U_i, self.U_f, self.U_c, self.U_o], axis=1)
    b_st = T.concatenate([self.b_i, self.b_f, self.b_c, self.b_o], axis=0)
    if self.precompute:
        input = T.dot(input, W_st) + b_st
    non_seqs = [U_st]
    if self.peepholes:
        non_seqs += [self.W_pci,
                     self.W_pcf,
                     self.W_pco]
    if not self.precompute:
        non_seqs += [W_st, b_st]

    def gate_data(wt_mat, gate_id):
        # Slice out one gate's columns from the stacked matrix.
        return wt_mat[:, gate_id * self.output_units:(gate_id + 1) * self.output_units]

    def step(i_t, c_tm1, h_tm1, *args):
        if not self.precompute:
            i_t = T.dot(i_t, W_st) + b_st
        gates = i_t + T.dot(h_tm1, U_st)
        if self.grad_clip:
            gates = theano.gradient.grad_clip(
                gates, -self.grad_clip, self.grad_clip)
        ingate = gate_data(gates, 0)
        forgetgate = gate_data(gates, 1)
        cell_input = gate_data(gates, 2)
        outgate = gate_data(gates, 3)
        if self.peepholes:
            ingate += c_tm1 * self.W_pci
            forgetgate += c_tm1 * self.W_pcf
        ingate = self.i_gate_act(ingate)
        forgetgate = self.f_gate_act(forgetgate)
        cell_input = self.c_gate_act(cell_input)
        # Standard LSTM cell update.
        cell = forgetgate * c_tm1 + ingate * cell_input
        if self.peepholes:
            outgate += cell * self.W_pco
        outgate = self.o_gate_act(outgate)
        # Hidden state: note the output-gate activation is applied to the cell here.
        hid = outgate * self.o_gate_act(cell)
        return [cell, hid]

    sequences = input
    step_fun = step
    ones = T.ones((num_batch, 1))
    if not isinstance(self.cell_beg, BaseLayer):
        cell_beg = T.dot(ones, self.cell_beg)
    if not isinstance(self.h_beg, BaseLayer):
        h_beg = T.dot(ones, self.h_beg)
    cell_out, hid_out = theano.scan(
        fn=step_fun,
        sequences=sequences,
        outputs_info=[cell_beg, h_beg],
        go_backwards=self.back_rnn,
        truncate_gradient=self.gradient_steps,
        non_sequences=non_seqs,
        strict=True)[0]
    if self.no_return_seq:
        hid_out = hid_out[-1]
    else:
        # Back to (batch, seq_len, units); undo the time reversal if running backwards.
        hid_out = hid_out.dimshuffle(1, 0, 2)
        if self.back_rnn:
            hid_out = hid_out[:, ::-1]
    return hid_out
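Finally, for anyone who wants to poke at the numbers directly, here is a rough NumPy re-implementation of the gate math in step above. This is my own sketch, not code from the repo: I am assuming i_gate_act, f_gate_act and o_gate_act are sigmoids and c_gate_act is tanh, and I leave out the mask, peepholes and gradient clipping. Loading the trained W/U/b stacks and one example's convolutional activations into it should let you watch where the hidden state starts saturating:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b, n_units):
    """x: (seq_len, features) for one example; W, U, b are the stacked
    [i, f, c, o] parameters, in the same layout as W_st / U_st / b_st above."""
    def gate(m, k):
        return m[..., k * n_units:(k + 1) * n_units]

    c = np.zeros(n_units)
    h = np.zeros(n_units)
    for t, x_t in enumerate(x):
        gates = np.dot(x_t, W) + b + np.dot(h, U)
        i = sigmoid(gate(gates, 0))   # i_gate_act (assumed sigmoid)
        f = sigmoid(gate(gates, 1))   # f_gate_act (assumed sigmoid)
        g = np.tanh(gate(gates, 2))   # c_gate_act (assumed tanh)
        o = sigmoid(gate(gates, 3))   # o_gate_act (assumed sigmoid)
        c = f * c + i * g
        # Mirrors the Theano code, which applies o_gate_act (not c_gate_act) to the cell.
        h = o * sigmoid(c)
        print('step', t, 'hid min/max:', h.min(), h.max())
    return h

# Example call with random weights, just to show the expected shapes:
# n_in, n_units = 128, 256
# W = np.random.randn(n_in, 4 * n_units) * 0.1
# U = np.random.randn(n_units, 4 * n_units) * 0.1
# b = np.zeros(4 * n_units)
# lstm_forward(np.random.rand(16, n_in), W, U, b, n_units)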