Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -1,13 +1,40 @@
|
|
| 1 |
# Translator
|
| 2 |
-
This is a research project to create a
|
| 3 |
-
At current state, I don't have enough resources to train a model for this task,
|
| 4 |
-
so I'm presenting only the decoder, that can generate some text based on the input.
|
| 5 |
|
| 6 |
-
|
| 7 |
- Clone repository
|
| 8 |
- Run code
|
| 9 |
```python
|
| 10 |
from Translator import Writer
|
| 11 |
-
writer = Writer.from_pretrained(
|
| 12 |
-
print(writer(input_seq="One day I saw a "))
|
| 13 |
-
```
|
| 1 |
# Translator
|
| 2 |
+
This is a research project to create a model that can work with text
|
| 3 |
|
| 4 |
+
### How to launch in docker environment
|
| 5 |
+
|
| 6 |
+
### How to launch in your environment
|
| 7 |
- Clone repository
|
| 8 |
+
- Install dependencies with Poetry:
|
| 9 |
+
```shell
|
| 10 |
+
pip install poetry && poetry install
|
| 11 |
+
```
|
| 12 |
- Run code
|
| 13 |
```python
|
| 14 |
from Translator import Writer
|
| 15 |
+
writer = Writer.from_pretrained() # .to("cuda")
|
| 16 |
+
print(writer(input_seq="One day I saw a ", temperature=2))  # I highly recommend a high temperature
|
| 17 |
+
```
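The `temperature` argument in the call above is not documented further in this README. As a minimal sketch of what such a parameter usually does in text generation (an assumption about the general technique, not a description of how `Writer` implements it), the logits are divided by the temperature before the softmax, so a larger value flattens the distribution and makes sampling more varied. The function name and the example scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, temperature=1.0)
flat = softmax_with_temperature(logits, temperature=2.0)
print(sharp)
print(flat)
```

With `temperature=2.0` the most likely token loses probability mass to the others, which is the usual reason a higher temperature gives less repetitive output.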
|
| 18 |
+
|
| 19 |
+
# Model architecture and training pipeline
|
| 20 |
+
Transformer decoder architecture with the following parameters:
|
| 21 |
+
- decoder blocks = 4
|
| 22 |
+
- vocab size = 8192
|
| 23 |
+
- embedding_size = 512
|
| 24 |
+
- number of heads = 8
|
| 25 |
+
- hidden size in FFN = 1024
|
| 26 |
+
- max_sequence_length = 128
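The parameters listed above can be collected into a small config object to get a rough feel for the model size. This is only a sketch: the class name, the field names, and the layer layout (one self-attention sublayer and one FFN per block, biases everywhere, two layer norms per block, an untied output head, no learned positional embeddings) are assumptions and may not match the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    # Values copied from the list above; the field names are hypothetical.
    num_blocks: int = 4
    vocab_size: int = 8192
    embedding_size: int = 512
    num_heads: int = 8
    ffn_hidden_size: int = 1024
    max_sequence_length: int = 128

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the embedding.
        return self.embedding_size // self.num_heads


def estimate_parameters(cfg: DecoderConfig) -> int:
    """Rough parameter count under the assumptions stated in the lead-in."""
    d, h = cfg.embedding_size, cfg.ffn_hidden_size
    token_embedding = cfg.vocab_size * d
    attention = 4 * (d * d + d)        # Q, K, V and output projections, with biases
    ffn = (d * h + h) + (h * d + d)    # two linear layers, with biases
    layer_norms = 2 * (2 * d)          # scale and shift for two norms per block
    block = attention + ffn + layer_norms
    output_head = d * cfg.vocab_size + cfg.vocab_size  # assumed untied from the embedding
    return token_embedding + cfg.num_blocks * block + output_head


cfg = DecoderConfig()
print(cfg.head_dim)
print(estimate_parameters(cfg))
```

Under these assumptions the estimate comes to about 16.8 million parameters, with 64 dimensions per attention head. A different layout (for example tied embeddings) would change the total.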
|
| 27 |
+
|
| 28 |
+
Trained with the following parameters:
|
| 29 |
+
- loss = CrossEntropyLoss
|
| 30 |
+
- optimizer = Adam
|
| 31 |
+
- batch size = 400
|
| 32 |
+
- accumulation steps = 3
|
| 33 |
+
- epochs = 10
|
| 34 |
+
- number of sequences in dataset = 21M
|
| 35 |
+
|
| 36 |
+
Total training time: 10 hours
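The training settings above imply an effective batch size and a number of optimizer updates, worked out in the short calculation below. It assumes that the dataset size means 21 million sequences, that each epoch is one full pass over the dataset, and that gradients from 3 consecutive batches are accumulated before every optimizer step. The variable names are illustrative only.

```python
import math

batch_size = 400
accumulation_steps = 3
epochs = 10
num_sequences = 21_000_000  # assumption: the dataset size is read as 21 million

# Gradients from several batches are summed before one optimizer step.
effective_batch_size = batch_size * accumulation_steps

# Forward/backward passes needed to see every sequence once.
micro_batches_per_epoch = math.ceil(num_sequences / batch_size)

# Optimizer steps per epoch, one after every `accumulation_steps` batches.
updates_per_epoch = math.ceil(micro_batches_per_epoch / accumulation_steps)
total_updates = updates_per_epoch * epochs

print(effective_batch_size, updates_per_epoch, total_updates)
```

Under these assumptions each optimizer step sees 1200 sequences, which gives 17500 updates per epoch and 175000 updates over the 10 epochs.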
|
| 37 |
+
|
| 38 |
+
# Sources
|
| 39 |
+
- Architecture inspired by [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
|
| 40 |
+
- [Dataset](https://huggingface.co/datasets/roneneldan/TinyStories)
|