I am using a deep learning / neural language model to encode text strings into a vector space. The model architecture consists of a stack of convolutional layers that feed into an LSTM encoder/decoder layer. This is not code I wrote, but it can be seen here: https://github.com/soroushv/Tweet2Vec. The code is written in Theano.
I have trained the model (i.e., learned the encoding parameters) and am now running into trouble when trying to encode new data (specifically tweets). The problem: the LSTM encoding layer encodes every tweet as a vector of all 1s.
The problem is easy to see by inspecting the output of theano.scan, which conveniently returns not only the final LSTM hidden-state output, but also the output after each of the 256 steps. See below:
Step 0
[[ 0.25917339 0.25781667 0.25658038 ..., 0.25910226 0.25752231
0.25876063]
[ 0.2606163 0.25934234 0.25892666 ..., 0.25979728 0.25971574
0.26056644]
[ 0.25828436 0.25749022 0.25691608 ..., 0.25829604 0.25789574
0.25868404]
...,
]]
Step 1
[[ 0.73329866 0.73475593 0.73370075 ..., 0.73479998 0.7338261 0.733863 ]
[ 0.73417366 0.73541886 0.73478198 ..., 0.7358954 0.73425269
0.73425108]
[ 0.73278904 0.73414212 0.73380911 ..., 0.73420793 0.73367095
0.73390627]
...,
]]
Step 2
[[ 0.88244921 0.88317329 0.88279039 ..., 0.88326144 0.8826766
0.88286686]
[ 0.88289285 0.8835054 0.88333857 ..., 0.88382012 0.88288385
0.88304704]
[ 0.88218391 0.88285285 0.88285559 ..., 0.88295335 0.88259971
0.88289762]
...,
]]
Step 3
[[ 0.95328414 0.95359486 0.95343065 ..., 0.95363271 0.95338178
0.95346349]
[ 0.95347464 0.95373732 0.95366573 ..., 0.9538722 0.95347077
0.95354074]
[ 0.95317024 0.95345742 0.95345861 ..., 0.95350057 0.95334882
0.95347661]
...,
]]
...and so on through step 15:
Step 15
[[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]
...,
[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]
[ 1. 1. 1. ..., 1. 1. 1.]]
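For reference, the per-step dumps above come from compiling a Theano function on the full scan output rather than only on the final step. Here is a minimal, self-contained sketch of that pattern (a toy recurrence with my own variable names, not the repo's code):

import numpy as np
import theano
import theano.tensor as T

# Toy recurrence h_t = sigmoid(x_t + h_{t-1}); the point is only to show how
# scan exposes the hidden state after every step, not to model the LSTM.
x = T.tensor3('x')                      # (seq_len, batch, features)
h0 = T.zeros((x.shape[1], x.shape[2]))  # initial state, one row per batch item

def step(x_t, h_tm1):
    return T.nnet.sigmoid(x_t + h_tm1)

h_seq, _ = theano.scan(fn=step, sequences=x, outputs_info=h0)

# Compiling on h_seq (instead of h_seq[-1]) returns every intermediate step,
# which is how the "Step 0 ... Step 15" arrays above were printed.
f = theano.function([x], h_seq)

dummy = np.random.rand(16, 3, 5).astype(theano.config.floatX)
for t, h_t in enumerate(f(dummy)):
    print('Step %d' % t)
    print(h_t)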
I have verified that the output of the final convolutional layer is sensible. It looks like:
[[[ 0.02103661 0.06566209 0.03910736 ..., 0.02350483 0.05625576
0.03399094]
[ 0. 0. 0. ..., 0. 0. 0. ]
[ 0. 0.04917431 0.00619121 ..., 0. 0.04114747 0. ]
...,
[ 0.02622109 0.01024228 0.04368387 ..., 0.04718351 0. 0. ]
[ 0. 0. 0. ..., 0. 0. 0. ]
[ 0.01217926 0.04057767 0.0250682 ..., 0. 0.03617524
...,...]]
and the shape is correct.
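(For completeness, that check is just a few lines on the NumPy array returned by a function compiled on the last conv layer; the helper below is my own, not something from the repo:)

import numpy as np

def describe_activations(conv_out):
    """Quick sanity checks on the conv-layer output (a NumPy array)."""
    print('shape:', conv_out.shape)
    print('min/max:', conv_out.min(), conv_out.max())
    print('fraction of exact zeros:', np.mean(conv_out == 0))

# e.g. describe_activations(np.random.rand(2, 16, 256).astype('float32'))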
So I would appreciate some insight into what is going on here and how to fix it. Below is the code from the LSTM class; as far as I can tell, the convolutional output above is fed as the input to calc_output. All of this code (and the rest) is in the linked GitHub repo. Thanks.
def calc_output(self, inputs, **kwargs):
    input = inputs[0]
    mask = inputs[self.mask_idx] if self.mask_idx is not None else None
    h_beg = inputs[self.h_beg_idx] if self.h_beg_idx is not None else None
    cell_beg = inputs[self.cell_beg_idx] if self.cell_beg_idx is not None else None
    if input.ndim > 3:
        input = T.flatten(input, 3)
    # Reorder to (seq_len, batch, features) so scan iterates over time steps.
    input = input.dimshuffle(1, 0, 2)
    seq_len, num_batch, _ = input.shape
    # Stack the per-gate parameters so all four gates are computed in one dot.
    W_st = T.concatenate([self.W_i, self.W_f, self.W_c, self.W_o], axis=1)
    U_st = T.concatenate([self.U_i, self.U_f, self.U_c, self.U_o], axis=1)
    b_st = T.concatenate([self.b_i, self.b_f, self.b_c, self.b_o], axis=0)
    if self.precompute:
        input = T.dot(input, W_st) + b_st
    non_seqs = [U_st]
    if self.peepholes:
        non_seqs += [self.W_pci,
                     self.W_pcf,
                     self.W_pco]
    if not self.precompute:
        non_seqs += [W_st, b_st]

    def gate_data(wt_mat, gate_id):
        # Slice out one gate's columns from the stacked matrix.
        return wt_mat[:, gate_id * self.output_units:(gate_id + 1) * self.output_units]

    def step(i_t, c_tm1, h_tm1, *args):
        if not self.precompute:
            i_t = T.dot(i_t, W_st) + b_st
        gates = i_t + T.dot(h_tm1, U_st)
        if self.grad_clip:
            gates = theano.gradient.grad_clip(
                gates, -self.grad_clip, self.grad_clip)
        ingate = gate_data(gates, 0)
        forgetgate = gate_data(gates, 1)
        cell_input = gate_data(gates, 2)
        outgate = gate_data(gates, 3)
        if self.peepholes:
            ingate += c_tm1 * self.W_pci
            forgetgate += c_tm1 * self.W_pcf
        ingate = self.i_gate_act(ingate)
        forgetgate = self.f_gate_act(forgetgate)
        cell_input = self.c_gate_act(cell_input)
        # Standard LSTM cell update.
        cell = forgetgate * c_tm1 + ingate * cell_input
        if self.peepholes:
            outgate += cell * self.W_pco
        outgate = self.o_gate_act(outgate)
        # Hidden state: note the output-gate activation is applied to the cell here.
        hid = outgate * self.o_gate_act(cell)
        return [cell, hid]

    sequences = input
    step_fun = step
    ones = T.ones((num_batch, 1))
    if not isinstance(self.cell_beg, BaseLayer):
        cell_beg = T.dot(ones, self.cell_beg)
    if not isinstance(self.h_beg, BaseLayer):
        h_beg = T.dot(ones, self.h_beg)
    cell_out, hid_out = theano.scan(
        fn=step_fun,
        sequences=sequences,
        outputs_info=[cell_beg, h_beg],
        go_backwards=self.back_rnn,
        truncate_gradient=self.gradient_steps,
        non_sequences=non_seqs,
        strict=True)[0]
    if self.no_return_seq:
        hid_out = hid_out[-1]
    else:
        # Back to (batch, seq_len, units); undo the time reversal if running backwards.
        hid_out = hid_out.dimshuffle(1, 0, 2)
        if self.back_rnn:
            hid_out = hid_out[:, ::-1]
    return hid_out
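Finally, for anyone who wants to poke at the numbers directly, here is a rough NumPy re-implementation of the gate math in step above. This is my own sketch, not code from the repo: I am assuming i_gate_act, f_gate_act and o_gate_act are sigmoids and c_gate_act is tanh, and I leave out the mask, peepholes and gradient clipping. Loading the trained W/U/b stacks and one example's convolutional activations into it should let you watch where the hidden state starts saturating:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b, n_units):
    """x: (seq_len, features) for one example; W, U, b are the stacked
    [i, f, c, o] parameters, in the same layout as W_st / U_st / b_st above."""
    def gate(m, k):
        return m[..., k * n_units:(k + 1) * n_units]

    c = np.zeros(n_units)
    h = np.zeros(n_units)
    for t, x_t in enumerate(x):
        gates = np.dot(x_t, W) + b + np.dot(h, U)
        i = sigmoid(gate(gates, 0))   # i_gate_act (assumed sigmoid)
        f = sigmoid(gate(gates, 1))   # f_gate_act (assumed sigmoid)
        g = np.tanh(gate(gates, 2))   # c_gate_act (assumed tanh)
        o = sigmoid(gate(gates, 3))   # o_gate_act (assumed sigmoid)
        c = f * c + i * g
        # Mirrors the Theano code, which applies o_gate_act (not c_gate_act) to the cell.
        h = o * sigmoid(c)
        print('step', t, 'hid min/max:', h.min(), h.max())
    return h

# Example call with random weights, just to show the expected shapes:
# n_in, n_units = 128, 256
# W = np.random.randn(n_in, 4 * n_units) * 0.1
# U = np.random.randn(n_units, 4 * n_units) * 0.1
# b = np.zeros(4 * n_units)
# lstm_forward(np.random.rand(16, n_in), W, U, b, n_units)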