
GRU units

Dimitri Fichou

2023-04-21

Feed forward pass

\[ r_t = sigmoid(h_{t-1} * W_r + x_t * U_r) \]

\[ z_t = sigmoid(h_{t-1} * W_z + x_t * U_z) \]

\[ g_t = tanh((h_{t-1} \cdot r_t) * W_g + x_t * U_g) \]

\[ h_t = y_t = h_{t-1} \cdot (1 - z_t) + (z_t \cdot g_t) \]
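The four equations above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the package's implementation: states are treated as row vectors, the reset-gated state multiplies \(W_g\) from the left like the other terms, and the helper names `sigmoid`, `vec_mat` and `gru_step` are hypothetical.

```python
import math

def sigmoid(v):
    # Element-wise logistic function on a list of floats.
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def vec_mat(v, M):
    # Row vector v (length n) times matrix M (n x m): the "*" of the text.
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def gru_step(h_prev, x, W_r, U_r, W_z, U_z, W_g, U_g):
    # One feed forward step; W_* act on the hidden state, U_* on the input.
    r = sigmoid([a + b for a, b in zip(vec_mat(h_prev, W_r), vec_mat(x, U_r))])
    z = sigmoid([a + b for a, b in zip(vec_mat(h_prev, W_z), vec_mat(x, U_z))])
    hr = [a * b for a, b in zip(h_prev, r)]  # point-wise product h_{t-1} . r_t
    g = [math.tanh(a + b) for a, b in zip(vec_mat(hr, W_g), vec_mat(x, U_g))]
    h = [hp * (1.0 - zi) + zi * gi for hp, zi, gi in zip(h_prev, z, g)]
    return r, z, g, h

# Toy call: 2 hidden units, 1 input, all weights zero.
W0 = [[0.0, 0.0], [0.0, 0.0]]
U0 = [[0.0, 0.0]]
r, z, g, h = gru_step([1.0, -2.0], [0.5], W0, U0, W0, U0, W0, U0)
```

With all weights at zero, both gates equal 0.5 and the candidate \(g_t\) is 0, so the new state is simply half of the previous one.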

Back propagation pass

To perform BPTT with a GRU unit, we have the error coming from the top layer (\(\delta 1\)) and the error coming from the future hidden state (\(\delta 2\)). We have also stored the states computed at each step of the feed forward pass. The error from the future hidden state is simply set to zero if it has not been calculated yet. By convention, \(\cdot\) corresponds to point-wise multiplication, while \(*\) corresponds to matrix multiplication.

The rules on how to back propagate come from this post.

\[\delta 3 = \delta 1 + \delta 2 \]

\[\delta 4 = (1 - z_t) \cdot \delta 3 \]

\[\delta 5 = \delta 3 \cdot h_{t-1} \]

\[\delta 6 = - \delta 5 \]

\[\delta 7 = \delta 3 \cdot g_t \]

\[\delta 8 = \delta 3 \cdot z_t \]

\[\delta 9 = \delta 6 + \delta 7 \]

\[\delta 10 = \delta 8 \cdot tanh'(g_t) \]

\[\delta 11 = \delta 9 \cdot sigmoid'(z_t) \]
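The primed functions in the two equations above are not defined in the text. One common reading, assumed here, is that they denote the derivative of the activation with respect to its pre-activation, written in terms of the output stored during the feed forward pass:

\[ tanh'(g_t) = 1 - g_t \cdot g_t \]

\[ sigmoid'(z_t) = z_t \cdot (1 - z_t) \]

Under the same assumption, \(sigmoid'(r_t) = r_t \cdot (1 - r_t)\) for the reset gate.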

\[\delta 12 = \delta 10 * U_g^T \] \[\delta 13 = \delta 10 * W_g^T \] \[\delta 14 = \delta 11 * U_z^T \] \[\delta 15 = \delta 11 * W_z^T \]

\[\delta 16 = \delta 13 \cdot h_{t-1} \] \[\delta 17 = \delta 13 \cdot r_t \]

\[\delta 18 = \delta 16 \cdot sigmoid'(r_t) \]

\[\delta 19 = \delta 17 + \delta 4 \]

\[\delta 20 = \delta 18 * U_r^T \] \[\delta 21 = \delta 18 * W_r^T \]

\[\delta 22 = \delta 21 + \delta 15 \]

\[\delta 23 = \delta 19 + \delta 22 \]

\[\delta 24 = \delta 12 + \delta 14 +\delta 20 \]

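The errors \(\delta 23\) (with respect to \(h_{t-1}\)) and \(\delta 24\) (with respect to \(x_t\)) are passed on to the previous time step and to the layer below, respectively. Once all those errors are available, it is possible to calculate the weight updates.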

\[\delta W_r = \delta W_r + h_{t-1}^T * \delta 18 \] \[\delta U_r = \delta U_r + x_{t}^T * \delta 18 \]

\[\delta W_z = \delta W_z + h_{t-1}^T * \delta 11 \] \[\delta U_z = \delta U_z + x_{t}^T * \delta 11 \]

\[\delta W_g = \delta W_g + (h_{t-1} \cdot r_t)^T * \delta 10 \] \[\delta U_g = \delta U_g + x_{t}^T * \delta 10 \]
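A practical way to check a back propagation derivation like this one is to compare it with finite differences. The sketch below is a minimal illustration in plain Python, not the package's code. It assumes that \(W\) acts on the hidden state and \(U\) on the input, as in the feed forward equations, that the error on \(z_t\) through \(1 - z_t\) carries a minus sign, and that the activation derivatives are written through the stored outputs. All function names and the parameter dictionary `P` are hypothetical, and the toy loss is the sum of \(h_t\), so that \(\delta 1 = 1\) and \(\delta 2 = 0\).

```python
import math
import random

def vec_mat(v, M):   # row vector (n) times matrix (n x m): the "*" of the text
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def vec_matT(v, M):  # row vector (m) times the transpose of M (n x m)
    return [sum(v[j] * M[i][j] for j in range(len(v))) for i in range(len(M))]

def outer(a, b):     # a^T * b for two row vectors
    return [[ai * bj for bj in b] for ai in a]

def add(*vs):        # element-wise sum of several vectors
    return [sum(t) for t in zip(*vs)]

def mul(a, b):       # point-wise product: the "\cdot" of the text
    return [p * q for p, q in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-s)) for s in v]

def forward(h_prev, x, P):
    r = sigmoid(add(vec_mat(h_prev, P["W_r"]), vec_mat(x, P["U_r"])))
    z = sigmoid(add(vec_mat(h_prev, P["W_z"]), vec_mat(x, P["U_z"])))
    g = [math.tanh(s) for s in add(vec_mat(mul(h_prev, r), P["W_g"]), vec_mat(x, P["U_g"]))]
    h = add(mul(h_prev, [1.0 - zi for zi in z]), mul(z, g))
    return r, z, g, h

def backward(d1, d2, h_prev, x, P):
    r, z, g, _ = forward(h_prev, x, P)
    d3 = add(d1, d2)
    d4 = mul([1.0 - zi for zi in z], d3)
    d6 = [-a for a in mul(d3, h_prev)]             # minus sign from d(1 - z)/dz
    d7, d8 = mul(d3, g), mul(d3, z)
    d9 = add(d6, d7)                               # total error on z_t
    d10 = mul(d8, [1.0 - gi * gi for gi in g])     # tanh' via its output
    d11 = mul(d9, [zi * (1.0 - zi) for zi in z])   # sigmoid' via its output
    d13 = vec_matT(d10, P["W_g"])                  # error on h_{t-1} . r_t
    d16, d17 = mul(d13, h_prev), mul(d13, r)
    d18 = mul(d16, [ri * (1.0 - ri) for ri in r])
    # delta 23: total error on h_{t-1}; delta 24: total error on x_t
    d_h = add(d17, d4, vec_matT(d18, P["W_r"]), vec_matT(d11, P["W_z"]))
    d_x = add(vec_matT(d10, P["U_g"]), vec_matT(d11, P["U_z"]), vec_matT(d18, P["U_r"]))
    grads = {"W_r": outer(h_prev, d18), "U_r": outer(x, d18),
             "W_z": outer(h_prev, d11), "U_z": outer(x, d11),
             "W_g": outer(mul(h_prev, r), d10), "U_g": outer(x, d10)}
    return d_h, d_x, grads

def max_gradient_error(eps=1e-6):
    # Largest gap between analytic and central finite-difference weight gradients.
    rng = random.Random(0)
    n_h, n_x = 3, 2
    names = ("W_r", "U_r", "W_z", "U_z", "W_g", "U_g")
    P = {k: [[rng.uniform(-1, 1) for _ in range(n_h)]
             for _ in range(n_h if k[0] == "W" else n_x)] for k in names}
    h_prev = [rng.uniform(-1, 1) for _ in range(n_h)]
    x = [rng.uniform(-1, 1) for _ in range(n_x)]
    _, _, grads = backward([1.0] * n_h, [0.0] * n_h, h_prev, x, P)
    worst = 0.0
    for name, M in P.items():
        for i in range(len(M)):
            for j in range(len(M[0])):
                old = M[i][j]
                M[i][j] = old + eps
                up = sum(forward(h_prev, x, P)[3])
                M[i][j] = old - eps
                down = sum(forward(h_prev, x, P)[3])
                M[i][j] = old
                worst = max(worst, abs((up - down) / (2 * eps) - grads[name][i][j]))
    return worst
```

Calling `max_gradient_error()` returns the largest absolute difference over all six weight matrices. A value close to zero indicates that the analytic rules and the numerical gradients agree for this toy configuration.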
