---
license: apache-2.0
---
## Markov vs Transformer: Text Generation Experiment

This project compares a classic statistical model (character-level n-gram Markov chains) with a modern transformer (GPT-2) on the same text corpus (Pushkin’s poetry).
It highlights the jump from short local memory to long-context modeling in text generation.

## Motivation

In 1913, Andrey Markov showed how letter-to-letter dependencies in Pushkin’s verse could be modeled statistically.
Fast forward a century: transformers now handle hundreds of tokens of context with attention.
This repo recreates that evolution, side by side.

## Models Compared

**Markov Chains (n = 1, 3, 5)**

- Generate text from local character windows.
- Capture letter frequencies and small fragments of fluency.
- Fail on long-term dependencies → loops & gibberish.
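
For concreteness, here is a minimal sketch of such a model. It mirrors the `NGramMarkov` interface used in How to Run below, but it is an illustrative reconstruction, not necessarily this repo’s actual implementation:

```python
import random
from collections import defaultdict

class NGramMarkov:
    """Character-level n-gram Markov chain (illustrative sketch)."""

    def __init__(self, n=3):
        self.n = n                          # number of context characters
        self.table = defaultdict(list)      # context -> observed next characters

    def train(self, text):
        # Record which character follows each n-character window.
        for i in range(len(text) - self.n):
            self.table[text[i:i + self.n]].append(text[i + self.n])

    def generate(self, seed, length=200):
        out = seed
        for _ in range(length):
            nexts = self.table.get(out[-self.n:])
            if not nexts:                   # unseen context: stop early
                break
            # Uniform choice over occurrences == frequency-weighted sampling.
            out += random.choice(nexts)
        return out
```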

**GPT-2 (medium)**

- Pretrained transformer with 345M parameters.
- Extends prompts into fluent, poetic lines.
- Still prone to degeneration without sampling controls.

## Results

Markov vs GPT-2 on Pushkin (Prompt: “I loved you”)

| Model | Sample Output (excerpt) |
| --- | --- |
| Markov (n=3) | “I longer trouble you so tenderly, sorrow. I loved in my so sincerely extinguished in my shyness, no loved now by jealousy. I loved you may be loved you may be loved you so tenderly…” |
| Markov (n=5) | “I loved you silently, without hope, Tormented now by jealousy. I loved you: and perhaps this flame Has not entirely extinguished in my soul; But let it no longer trouble you; I do not entirely extinguished in my soul; But let it…” |
| GPT-2 medium | “I love you with all my heart, without reserve. I am in love with you now, and have never been. I am in love with you now, and will never be… I love you with all my heart, without reserve…” |

## Key Observations

- Markov chains: good for local coherence, but collapse quickly.
- Transformers: sustain global structure and produce more creative continuations.
- Both models show failure modes; their repetition loops highlight why sampling strategies matter (see the sketch below).
- Together, the comparison demonstrates the leap from statistical modeling → neural networks → generative AI.
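
To illustrate the last point, GPT-2’s repetition loops can be tamed with the standard sampling knobs that 🤗 Transformers forwards to `generate()`; the parameter values below are illustrative, not tuned:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-medium")

# do_sample turns off greedy decoding; top_p keeps only the most likely tokens
# covering 92% of the probability mass; temperature softens the distribution;
# no_repeat_ngram_size bans any trigram from recurring verbatim.
out = generator(
    "I loved you",
    max_length=80,
    do_sample=True,
    top_p=0.92,
    temperature=0.8,
    no_repeat_ngram_size=3,
)
print(out[0]["generated_text"])
```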

## How to Run

**Markov chains**

```python
# `corpus` is assumed to be the Pushkin text loaded as a single string;
# NGramMarkov is the chain class from this repo (sketched above).
mc = NGramMarkov(n=5)
mc.train(corpus)
print(mc.generate("<Pushkin Poetry Corpus of Choice>", 200))  # seed text, 200 chars
```


**GPT-2 (via 🤗 Transformers)**

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-medium")

# The pipeline returns a list of dicts; index in to get the text itself.
result = generator("<Pushkin Poetry Corpus of Choice>", max_length=80, do_sample=True)
print(result[0]["generated_text"])
```
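
Note that `max_length` counts the prompt’s tokens as well; to request a fixed-length continuation instead, pass `max_new_tokens`.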

## ✨ Author

Developed by Naga Adithya Kaushik (GenAIDevTOProd), with AI assistance (debugging and text-corpus generation only).

For research, debugging, and teaching purposes.