---
license: llama2
base_model: meta-llama/Llama-2-13b-hf
tags:
- llama2
- computer-engineering
- computer-architecture
- algorithms
- systems
- qlora
- lora
- quantized
- merged
language:
- en
library_name: transformers
datasets:
- cais/mmlu
- sahil2801/CodeAlpaca-20k
- Open-Orca/OpenOrca
model_type: llama
---
[![GitHub Repo](https://img.shields.io/badge/GitHub-Repo-181717?style=for-the-badge&logo=github)](https://github.com/IrfanUruchi/Llama-2-13B-Computer-Engineering-)
[![Model Weights](https://img.shields.io/badge/🤗-Model_Weights-FFD21F?style=for-the-badge)](https://huggingface.co/Irfanuruchi/Llama-2-13B-Computer-Engineering)
[![Meta AI Llama 2 License](https://img.shields.io/badge/License-Llama_2-blue.svg?style=for-the-badge)](https://huggingface.co/meta-llama/Llama-2-13b-hf)
---
# Llama-2-13B-Computer-Engineering
## Overview
**Llama-2-13B-Computer-Engineering** is a fine‑tuned variant of **LLaMA‑2‑13B**, adapted for **computer engineering, computer architecture, systems, and algorithms**.
The model was trained using **QLoRA (4‑bit quantization)**, then merged into a single checkpoint.
The merged checkpoint occupies ~6.6 GB on disk and runs in roughly 16 GB of GPU memory, bringing **13B‑scale reasoning** within reach of a single modern GPU.
---
## Training Setup
- **Base model:** [LLaMA‑2‑13B](https://huggingface.co/meta-llama/Llama-2-13b-hf)
- **Fine‑tuning method:** QLoRA (4‑bit NF4) + LoRA adapters (`r=16`, `α=32`)
- **Optimized Layers:** Attention projection modules (`q_proj`, `k_proj`, `v_proj`, `o_proj`)
- **Final merge:** LoRA weights merged into the base model → single merged checkpoint
- **Resulting size:** ~6.6 GB (`safetensors` sharded files) vs. ~24 GB fp16
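
The setup above maps directly onto the `transformers`, `bitsandbytes`, and `peft` APIs. The snippet below is a minimal sketch of that configuration; values not listed above (dropout, compute dtype, training arguments) are assumptions, not the exact settings used.

```python
# Illustrative QLoRA setup matching the card above.
# Anything not stated in the card (dropout, compute dtype) is an assumption.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NF4 quantization, as stated above
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,                      # assumed, not stated in this card
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
# After training, the LoRA adapters were merged back into the base weights
# (e.g. via PeftModel.merge_and_unload()), producing the single checkpoint
# distributed here.
```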
---
## Dataset
The dataset was curated from multiple sources to emphasize **reasoning, explanations, and code writing in computer engineering contexts**.
Included sources:
- **MMLU (Computer Security subset)** → exam‑style questions on systems and security
- **CodeAlpaca‑20k (filtered)** → algorithm, data structures, complexity, trees, sorting/searching, graphs
- **OpenOrca subset** → reasoning tasks mentioning computer systems and architecture
- **Custom technical examples** (hand‑crafted) on:
- CPU pipelining & instruction‑level parallelism
- Cache coherency and MESI protocol
- Compiler optimizations (instruction scheduling, inlining, loop unrolling)
- RISC vs. CISC architectures
- Memory hierarchies (registers, caches, RAM, storage)
- Branch prediction
- Example algorithms (binary search, stacks, etc.)
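
For reference, the CodeAlpaca filtering step can be approximated with a simple keyword filter over the `datasets` library. The keyword list and field names below are assumptions for illustration, not the exact curation script.

```python
from datasets import load_dataset

# Hypothetical keyword filter for the CodeAlpaca-20k subset; the actual
# keyword list used during curation is not published in this card.
KEYWORDS = ["algorithm", "complexity", "tree", "sort", "search", "graph", "stack", "queue"]

# Field names assume the standard Alpaca-style "instruction" column.
codealpaca = load_dataset("sahil2801/CodeAlpaca-20k", split="train")
filtered = codealpaca.filter(
    lambda ex: any(k in ex["instruction"].lower() for k in KEYWORDS)
)
print(f"Kept {len(filtered)} of {len(codealpaca)} examples")
```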
---
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged checkpoint; device_map="auto" places it on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    "Irfanuruchi/Llama-2-13B-Computer-Engineering",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Irfanuruchi/Llama-2-13B-Computer-Engineering")

# Instruction-style prompt, matching the format used throughout this card.
prompt = """### Instruction:
Explain CPU pipelining and its advantages.
### Response:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
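
If roughly 16 GB of GPU memory is not available, the same checkpoint can also be loaded with 4-bit quantization via `bitsandbytes`. The settings below are a suggested configuration, not something prescribed by this card.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Optional: load the merged checkpoint in 4-bit to reduce GPU memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Irfanuruchi/Llama-2-13B-Computer-Engineering",
    quantization_config=bnb_config,
    device_map="auto",
)
```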
---
## Example Responses
**Q:** What is cache coherency in multicore systems?

**A:** Cache coherence ensures that all processor cores observe a consistent view of memory. Protocols such as MESI handle invalidation and updates when one core modifies data, preventing stale values and race conditions.

**Q:** Implement a stack in Python.

**A:** The model produces a `Stack` class with `push`, `pop`, `peek`, `is_empty`, and `size` methods.
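
As a rough illustration of that second answer, the output resembles the sketch below (a representative example, not the model's verbatim response):

```python
class Stack:
    """Simple LIFO stack backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if self.is_empty():
            raise IndexError("peek from empty stack")
        return self._items[-1]

    def is_empty(self):
        return not self._items

    def size(self):
        return len(self._items)
```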
---
## Limitations
- While optimized for computer engineering, performance outside this scope is similar to the base LLaMA‑2‑13B.
---
## License
- Base model: [Meta's LLaMA 2 license](https://huggingface.co/meta-llama/Llama-2-13b-hf).
- Fine‑tuned weights: Distributed under the same license.
- Datasets: Combination of open academic sets (MMLU, CodeAlpaca, OpenOrca) and custom educational material.