Sashavav committed
Commit 7edfd55 · verified · 1 Parent(s): 33fdec5

Upload README.md with huggingface_hub

Files changed (1): README.md (+34 -7)
README.md CHANGED
@@ -1,13 +1,40 @@
  # Translator
- This is a research project to create a translator based on the paper Attention Is All You Need.
- In its current state, I don't have enough resources to train a model for this task,
- so I'm presenting only the decoder, which can generate some text based on the input.

- # How to launch
  - Clone repository
  - Run code
  ```python
  from Translator import Writer
- writer = Writer.from_pretrained("Sashavav/Translator") # .to("cuda")
- print(writer(input_seq="One day I saw a "))
- ```

  # Translator
+ This is a research project to create a language model that can generate text.

+ ### How to launch in docker environment
+
+ ### How to launch in your environment
  - Clone repository
+ - Install dependencies:
+ ```shell
+ pip install poetry && poetry install
+ ```
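
By default Poetry installs the dependencies into its own virtual environment; if so, run the snippet below through it (for example with `poetry run python`).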
  - Run code
  ```python
  from Translator import Writer
+ writer = Writer.from_pretrained()  # append .to("cuda") to run on a GPU
+ print(writer(input_seq="One day I saw a ", temperature=2))  # I highly recommend a high temperature
+ ```
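
On the `temperature` argument: in standard sampling, the logits are divided by the temperature before the softmax, so values above 1 flatten the distribution and make the output more varied. A minimal sketch of that standard technique, assuming `Writer` samples this way (its internals may differ):

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 2.0) -> int:
    """Sample one token id from a (vocab_size,) tensor of next-token logits."""
    probs = torch.softmax(logits / temperature, dim=-1)  # temperature > 1 flattens probs
    return int(torch.multinomial(probs, num_samples=1))
```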
+
+ # Model architecture and training pipeline
+ Transformer decoder architecture with the following parameters (see the sketch after this list):
+ - decoder blocks = 4
+ - vocab size = 8192
+ - embedding_size = 512
+ - number of heads = 8
+ - hidden size in FFN = 1024
+ - max_sequence_length = 128
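
For reference, a minimal PyTorch sketch of a decoder-only transformer with these hyperparameters. All names (`DecoderLM`, `tok_emb`, ...) are illustrative assumptions, not the repository's actual code:

```python
import torch
import torch.nn as nn

NUM_BLOCKS = 4     # decoder blocks
VOCAB_SIZE = 8192  # vocab size
EMBED_SIZE = 512   # embedding_size
NUM_HEADS = 8      # number of heads
FFN_HIDDEN = 1024  # hidden size in FFN
MAX_SEQ_LEN = 128  # max_sequence_length

class DecoderLM(nn.Module):
    """GPT-style decoder-only transformer (a hypothetical stand-in for Writer)."""

    def __init__(self) -> None:
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, EMBED_SIZE)
        self.pos_emb = nn.Embedding(MAX_SEQ_LEN, EMBED_SIZE)
        # An encoder layer under a causal mask is a decoder block without
        # cross-attention, the standard setup for pure text generation.
        layer = nn.TransformerEncoderLayer(
            d_model=EMBED_SIZE, nhead=NUM_HEADS,
            dim_feedforward=FFN_HIDDEN, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=NUM_BLOCKS)
        self.lm_head = nn.Linear(EMBED_SIZE, VOCAB_SIZE)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) token ids -> (batch, seq_len, vocab) logits
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        x = self.blocks(x, mask=mask.to(tokens.device))
        return self.lm_head(x)
```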
+
+ Trained with the following settings (a sketch of the resulting loop follows this list):
+ - loss = CrossEntropyLoss
+ - optimizer = Adam
+ - batch = 400
+ - accumulation steps = 3
+ - epochs = 10
+ - number of sequences in dataset = 21M
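
A sketch of the training step these settings describe: next-token CrossEntropyLoss, Adam, and gradients accumulated over 3 batches of 400 (an effective batch of 1,200 sequences). It reuses `DecoderLM` from the sketch above and is, again, an illustration rather than the repository's actual pipeline:

```python
import torch
from torch.utils.data import DataLoader

model = DecoderLM()
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
ACCUM_STEPS = 3

def train(loader: DataLoader, epochs: int = 10) -> None:
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        for step, tokens in enumerate(loader):   # tokens: (400, MAX_SEQ_LEN)
            logits = model(tokens[:, :-1])       # predict each next token
            loss = loss_fn(logits.reshape(-1, VOCAB_SIZE),
                           tokens[:, 1:].reshape(-1))
            (loss / ACCUM_STEPS).backward()      # accumulate scaled gradients
            if (step + 1) % ACCUM_STEPS == 0:    # optimizer step every 3 batches
                optimizer.step()
                optimizer.zero_grad()
```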
+
+ Total training time: 10 hours
+
+ # Sources
+ - Architecture inspired by [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
+ - [Dataset](https://huggingface.co/datasets/roneneldan/TinyStories)