Update README.md
README.md CHANGED
@@ -35,7 +35,7 @@ Based on growth technology, the Tele-FLM-1T model training is divided into three
 - SwiGLU for activation function
 - Linear bias disabled
 - Embedding and language model head untied
-- Input and output
+- Input and output multiplier

 Consequently, Tele-FLM-1T is largely compatible with Llama architecturally.
 To maximize convenience for the community, we made minimal adjustments to Llama's code to adapt it to Tele-FLM and released it as open source.
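The architectural choices listed in the diff (SwiGLU feed-forward, no linear bias, untied embedding and LM head, input and output multipliers) can be sketched minimally. This is an illustrative NumPy sketch, not Tele-FLM's actual code: all class and parameter names here are assumptions, and the multipliers are shown as simple scalars applied to the embedding output and the logits.

```python
import numpy as np

def silu(x):
    # SiLU (swish) activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: down( SiLU(gate(x)) * up(x) ).
    # Linear bias is disabled, so each projection is a plain matmul.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

class TinyDecoderSketch:
    """Hypothetical one-block decoder illustrating the listed choices."""

    def __init__(self, vocab, d_model, d_ff, input_mult=1.0, output_mult=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Embedding and LM head are separate (untied) weight matrices.
        self.embed = rng.standard_normal((vocab, d_model)) * 0.02
        self.head = rng.standard_normal((d_model, vocab)) * 0.02
        self.w_gate = rng.standard_normal((d_model, d_ff)) * 0.02
        self.w_up = rng.standard_normal((d_model, d_ff)) * 0.02
        self.w_down = rng.standard_normal((d_ff, d_model)) * 0.02
        self.input_mult = input_mult
        self.output_mult = output_mult

    def forward(self, token_ids):
        h = self.embed[token_ids] * self.input_mult   # input multiplier
        h = h + swiglu_ffn(h, self.w_gate, self.w_up, self.w_down)
        return (h @ self.head) * self.output_mult     # output multiplier

model = TinyDecoderSketch(vocab=10, d_model=4, d_ff=8)
logits = model.forward(np.array([1, 2, 3]))
print(logits.shape)  # (3, 10): one logit row per input token
```

Because the projections carry no bias and the head is untied from the embedding, a Llama-style implementation needs only small adjustments (the multipliers) to match this layout, which is the compatibility point the README makes.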