Update README.md
README.md
CHANGED
@@ -235,8 +235,8 @@ $$\mathbf{s}_0 \sim \mathcal{N}(\mathbf{0}, \sigma^2 I_{n\cdot h})$$
 
 $$\mathbf{s}_i = R(\mathbf{e}, \mathbf{s}_{i-1}) \; \textnormal{for} \; i \in \lbrace 1, \dots, r \rbrace$$
 
-$$\mathbf{p} =
-where \\(\sigma\\) is the standard deviation of the initial random state. Given an init random state \\(\mathbf{s}_0\\), the model repeatedly applies the core
+$$\mathbf{p} = C(\mathbf{s}_r)$$
+where \\(\sigma\\) is the standard deviation of the initial random state. Given an init random state \\(\mathbf{s}_0\\), the model repeatedly applies the core recurrent
 block \\(R\\), which accepts the latent state \\(\mathbf{s}_{i-1}\\) and the embedded input \\(\mathbf{e}\\) and outputs a new latent state \\(\mathbf{s}_i\\).
 After finishing all iterations, the coda block processes the last state and produces the probabilities of the next token.
 
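For reviewers, the recurrence described in the changed paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model's actual code: `recurrent_forward`, `toy_core`, and `make_toy_coda` are hypothetical names, the state is stored as a flat list of n·h floats, the toy core is a simple `tanh` mix, and the toy coda reads only the last position's slice of the state before applying a softmax.

```python
import math
import random


def recurrent_forward(e, core_block, coda, r, sigma=1.0, seed=None):
    """Sample s_0 ~ N(0, sigma^2 I), apply s_i = R(e, s_{i-1}) for i = 1..r, return p = C(s_r)."""
    rng = random.Random(seed)
    # s_0: one independent Gaussian per entry of the flattened n*h state.
    s = [rng.gauss(0.0, sigma) for _ in e]
    for _ in range(r):
        # The core block sees the embedded input e and the previous latent state.
        s = core_block(e, s)
    # The coda maps the final state s_r to next-token probabilities.
    return coda(s)


def toy_core(e, s):
    """Hypothetical stand-in for R: an elementwise tanh of input plus damped state."""
    return [math.tanh(e_k + 0.5 * s_k) for e_k, s_k in zip(e, s)]


def make_toy_coda(h, vocab_size, seed=0):
    """Hypothetical stand-in for C: fixed random projection of the last position, then softmax."""
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 1.0) for _ in range(vocab_size)] for _ in range(h)]

    def coda(s):
        last = s[-h:]  # assumption: the last h entries hold the last position's state
        logits = [sum(last[j] * w[j][v] for j in range(h)) for v in range(vocab_size)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [x / total for x in exps]

    return coda


# Toy sizes: n positions, hidden width h, vocabulary size (all illustrative).
n, h, vocab_size = 4, 8, 16
emb_rng = random.Random(1)
e = [emb_rng.gauss(0.0, 1.0) for _ in range(n * h)]
p = recurrent_forward(e, toy_core, make_toy_coda(h, vocab_size), r=3, sigma=1.0, seed=2)
```

Here `r` plays the role of the iteration count in the equations above, and `p` is a probability vector over the toy vocabulary. Passing the core and coda in as callables keeps the loop itself independent of how those blocks are implemented.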