Sashavav committed
Commit 7edfd55 · verified · 1 Parent(s): 33fdec5

Upload README.md with huggingface_hub

Files changed (1): README.md (+34 -7)
README.md CHANGED
@@ -1,13 +1,40 @@
  # Translator
- This is a research project to create a translator based on the paper Attention Is All You Need.
- In its current state, I don't have enough resources to train a model for this task,
- so I'm presenting only the decoder, which can generate some text based on the input.

- # How to launch
  - Clone repository
  - Run code
  ```python
  from Translator import Writer
- writer = Writer.from_pretrained("Sashavav/Translator") # .to("cuda")
- print(writer(input_seq="One day I saw a "))
- ```

  # Translator
+ This is a research project to create a language model that can generate text.

+ ### How to launch in docker environment
+
+ ### How to launch in your environment
  - Clone repository
+ - Install dependencies:
+ ```shell
+ pip install poetry && poetry install
+ ```
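
By default Poetry installs the dependencies into its own virtual environment; if so, run the snippet below through it (for example with `poetry run python`).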
  - Run code
  ```python
  from Translator import Writer
+ writer = Writer.from_pretrained()  # append .to("cuda") to run on a GPU
+ print(writer(input_seq="One day I saw a ", temperature=2))  # I highly recommend a high temperature
+ ```
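
On the `temperature` argument: in standard sampling, the logits are divided by the temperature before the softmax, so values above 1 flatten the distribution and make the output more varied. A minimal sketch of that standard technique, assuming `Writer` samples this way (its internals may differ):

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 2.0) -> int:
    """Sample one token id from a (vocab_size,) tensor of next-token logits."""
    probs = torch.softmax(logits / temperature, dim=-1)  # temperature > 1 flattens probs
    return int(torch.multinomial(probs, num_samples=1))
```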
+
+ # Model architecture and training pipeline
+ Transformer decoder architecture with the following parameters (see the sketch after this list):
+ - decoder blocks = 4
+ - vocab size = 8192
+ - embedding_size = 512
+ - number of heads = 8
+ - hidden size in FFN = 1024
+ - max_sequence_length = 128
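
For reference, a minimal PyTorch sketch of a decoder-only transformer with these hyperparameters. All names (`DecoderLM`, `tok_emb`, ...) are illustrative assumptions, not the repository's actual code:

```python
import torch
import torch.nn as nn

NUM_BLOCKS = 4     # decoder blocks
VOCAB_SIZE = 8192  # vocab size
EMBED_SIZE = 512   # embedding_size
NUM_HEADS = 8      # number of heads
FFN_HIDDEN = 1024  # hidden size in FFN
MAX_SEQ_LEN = 128  # max_sequence_length

class DecoderLM(nn.Module):
    """GPT-style decoder-only transformer (a hypothetical stand-in for Writer)."""

    def __init__(self) -> None:
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, EMBED_SIZE)
        self.pos_emb = nn.Embedding(MAX_SEQ_LEN, EMBED_SIZE)
        # An encoder layer under a causal mask is a decoder block without
        # cross-attention, the standard setup for pure text generation.
        layer = nn.TransformerEncoderLayer(
            d_model=EMBED_SIZE, nhead=NUM_HEADS,
            dim_feedforward=FFN_HIDDEN, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=NUM_BLOCKS)
        self.lm_head = nn.Linear(EMBED_SIZE, VOCAB_SIZE)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) token ids -> (batch, seq_len, vocab) logits
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        x = self.blocks(x, mask=mask.to(tokens.device))
        return self.lm_head(x)
```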
+
+ Trained with the following settings (a sketch of the resulting loop follows this list):
+ - loss = CrossEntropyLoss
+ - optimizer = Adam
+ - batch = 400
+ - accumulation steps = 3
+ - epochs = 10
+ - number of sequences in dataset = 21M
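
A sketch of the training step these settings describe: next-token CrossEntropyLoss, Adam, and gradients accumulated over 3 batches of 400 (an effective batch of 1,200 sequences). It reuses `DecoderLM` from the sketch above and is, again, an illustration rather than the repository's actual pipeline:

```python
import torch
from torch.utils.data import DataLoader

model = DecoderLM()
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
ACCUM_STEPS = 3

def train(loader: DataLoader, epochs: int = 10) -> None:
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        for step, tokens in enumerate(loader):   # tokens: (400, MAX_SEQ_LEN)
            logits = model(tokens[:, :-1])       # predict each next token
            loss = loss_fn(logits.reshape(-1, VOCAB_SIZE),
                           tokens[:, 1:].reshape(-1))
            (loss / ACCUM_STEPS).backward()      # accumulate scaled gradients
            if (step + 1) % ACCUM_STEPS == 0:    # optimizer step every 3 batches
                optimizer.step()
                optimizer.zero_grad()
```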
+
+ Total training time: 10 hours
+
+ # Sources
+ - Architecture inspired by [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
+ - [Dataset](https://huggingface.co/datasets/roneneldan/TinyStories)