menu

RNN LSTM - Notes

“Deep Learning” Reading note

10. Sequence Modeling: Recurrent and Recursive nets

10.1 Unfolding Computational Graphs

  • What is a Computational Graph?

    A computational graph is a way to formalize the structure of a set of computations.

Considering a classical form of a dynamic system:

where $s$ is called the state of the system. And it is a recurrent.

For a finite number of $\tau$ steps, we can unfold the equation. For example, when $\tau = 3$, we have

Another example, considering a dynamic system driven by an external signal $x^{(t)}$,

To indicate the state is the hidden units of the network, we rewrite equation 10.4 using $h$ to represent the state,

We can represent the unfolded recurrence after $t$ steps with a function $g^{(t)}$:

The function $ g^{(t)} $ takes the whole past sequence $\left(x^{(t)}, x^{(t-1)}, x^{(t-2)}, …, x^{(2)}, x^{(1)} \right)$ as input and produces the current state. But the unfolded recurrent structure allows us to factorize the function into repeated application of a function $f$, the unfolding process introduces two major advantages:

    1. The learned model always has the same input size, which specified in terms of transition from one state to another.
    1. It is possible to use the same transition function $f$ with the same parameters at every time step.

10.2 Recurrent Neural Networks (循环神经网络)

Based on the basic idea of parameter-sharing, we can build a variety of neural networks. 3 typical designs are presented in the book:

    1. RNN produce an output at each time step, and have connections between hidden unit. (Page 368, fig. 10.3)
    1. RNN produce an output at each time step and have connections from the output at one time step to the hidden units at the next time step. (Page 370, fig. 10.4)
    1. RNN connections between hidden units, that read an entire sequence and then produce a single output (page. 371, fig. 10.5)

The RNN of figure 10.3 (p.368) and equation 10.8 (p.369) is universal in the sense that (从…意义上来说)anyfunction computable by a Turing machine can be computed by such a recurrent network of a finite size.

In the book, case 1 is chosen as a throughout example. Figure 10.3 (p.369) or say equation 10.8 (p. 370) defines forward propagation in this model.

Parameters: bias vecters $\pmb{b}$, $\pmb{c}$, weight matrices $\pmb{U}$, $\pmb{V}$ and $\pmb{W}$.

  • $\pmb W$ is the hidden-to-hidden recurrent weight matrix
  • $\pmb V$ is the hidden-to-output weight matrix
  • $\pmb U$ is the input-to-hidden weight matrix

  • This is an example of a recurrent network that maps an input sequence $\pmb{x}^{(t)}$ to an output sequence $\pmb{y}^{(t)}$ of the same length $t$ in a range of $(1, \tau)$.

The Loss Function:


KF