

Llama-2-13B-Computer-Engineering

Overview

Llama-2-13B-Computer-Engineering is a fine‑tuned variant of LLaMA‑2‑13B, adapted for computer engineering, computer architecture, systems, and algorithms.
The model was fine‑tuned with QLoRA (4‑bit quantization), and the LoRA adapters were then merged into the base weights to produce a single checkpoint.
This brings 13B‑scale reasoning down to ~6.6 GB of storage and ~16 GB of GPU memory, making it usable on a single modern GPU.


Training Setup

  • Base model: LLaMA‑2‑13B
  • Fine‑tuning method: QLoRA (4‑bit NF4) + LoRA adapters (r=16, α=32)
  • Optimized layers: attention projection modules (q_proj, k_proj, v_proj, o_proj)
  • Final merge: LoRA weights merged back into the base model → a single merged checkpoint (sketched below)
  • Resulting size: ~6.6 GB (sharded safetensors files) vs. ~24 GB in fp16
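
A minimal sketch of how this setup maps onto the Hugging Face transformers/peft APIs is shown below; the base checkpoint name, adapter path, dropout value, and training loop are placeholders, not the exact values used for this model.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, PeftModel

# 4-bit NF4 quantization, as used by QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",   # assumed base checkpoint name
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters (r=16, alpha=32) on the attention projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,             # illustrative value
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# ... fine-tune the adapters here (e.g. with transformers.Trainer or trl.SFTTrainer) ...

# Merge the trained adapter into an fp16 copy of the base model and save a single checkpoint
base_fp16 = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base_fp16, "path/to/lora-adapter").merge_and_unload()
merged.save_pretrained("Llama-2-13B-Computer-Engineering")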

Dataset

The dataset was curated from multiple sources to emphasize reasoning, explanations, and code writing in computer engineering contexts.

Included sources:

  • MMLU (Computer Security subset) → exam‑style questions on systems and security
  • CodeAlpaca‑20k (filtered) → algorithms, data structures, complexity, trees, sorting/searching, graphs
  • OpenOrca subset → reasoning tasks mentioning computer systems and architecture
  • Custom technical examples (hand‑crafted; an illustrative record format is sketched after this list) on:
    • CPU pipelining & instruction‑level parallelism
    • Cache coherency and MESI protocol
    • Compiler optimizations (instruction scheduling, inlining, loop unrolling)
    • RISC vs. CISC architectures
    • Memory hierarchies (registers, caches, RAM, storage)
    • Branch prediction
    • Example algorithms (binary search, stacks, etc.)
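
The hand‑crafted examples follow the same instruction/response template used in the Usage section below. A hypothetical record might look like this (the field names and wording are assumptions for illustration):

# Hypothetical shape of one hand-crafted training example (field names assumed)
example = {
    "instruction": "Explain how the MESI protocol keeps caches coherent in a multicore CPU.",
    "response": (
        "MESI tracks each cache line as Modified, Exclusive, Shared, or Invalid. "
        "When one core writes to a line that other cores hold, their copies are "
        "invalidated, so every core continues to see a consistent view of memory."
    ),
}

# Rendered into the prompt format the model expects at inference time
prompt = (
    f"### Instruction:\n{example['instruction']}\n\n"
    f"### Response:\n{example['response']}"
)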

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged checkpoint; device_map="auto" places layers on the available GPU/CPU
model = AutoModelForCausalLM.from_pretrained(
    "Irfanuruchi/Llama-2-13B-Computer-Engineering",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Irfanuruchi/Llama-2-13B-Computer-Engineering")

prompt = """### Instruction:
Explain CPU pipelining and its advantages.

### Response:"""

# Tokenize the prompt, generate up to 256 new tokens, and print the decoded output
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
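
For repeated queries it can help to wrap the prompt template and generation call in a small helper. The sketch below assumes the model and tokenizer loaded above; the sampling parameters are illustrative, not recommended settings.

def ask(question: str, max_new_tokens: int = 256) -> str:
    # Reuse the instruction/response template the model was fine-tuned on
    prompt = f"### Instruction:\n{question}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,        # assumed sampling settings; tune as needed
        temperature=0.7,
        top_p=0.9,
    )
    # Drop the prompt tokens so only the generated answer is returned
    answer = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return answer.strip()

print(ask("What is the difference between a direct-mapped and a set-associative cache?"))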

Example Responses

Q: What is cache coherency in multicore systems?
A: Cache coherence ensures that all processor cores observe a consistent view of memory. Protocols such as MESI handle invalidation and updates when one core modifies data, preventing stale values and race conditions.

Q: Implement a stack in Python.
A: The model produces a Stack class with push, pop, peek, is_empty, and size methods (a representative sketch follows).
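
For illustration, a minimal stack implementation of the kind the model typically returns for that prompt (a representative sketch, not a captured model output):

class Stack:
    """Simple LIFO stack backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if self.is_empty():
            raise IndexError("peek from empty stack")
        return self._items[-1]

    def is_empty(self):
        return len(self._items) == 0

    def size(self):
        return len(self._items)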


Limitations

  • While the model is optimized for computer engineering, performance outside this domain is comparable to the base LLaMA‑2‑13B.

License

  • Base model: Meta's LLaMA 2 license.
  • Fine‑tuned weights: distributed under the same license.
  • Datasets: Combination of open academic sets (MMLU, CodeAlpaca, OpenOrca) and custom educational material.