Model Card: Graph-R1 Series

This model card covers the Graph-R1 series of models, including the final released versions and the variants used in ablation studies. All information is based on the accompanying research paper.

Model Details

  • Model Developer: HKUST-DSAIL
  • Model Series: Graph-R1
  • Model Variants:
    • Graph-R1-7B: Fine-tuned from Qwen2.5-7B-Instruct-1M.
    • Graph-R1-1.5B: Fine-tuned from Qwen2.5-1.5B.
    • Ablation Models: Multiple variants based on different training configurations (e.g., data volume, training stages, reward functions, curriculum learning strategies).
  • Model Type: Small reasoning language model, specialized in solving complex graph-theoretic problems at the NP-Complete level.
  • Architecture:
    • Base Model: Qwen2.5
    • Training Framework:
      1. Cold-start Supervised Fine-Tuning (SFT): Fine-tuned on long Chain-of-Thought (Long-CoT) data distilled from the QwQ-32B model to inject graph reasoning knowledge.
      2. Reasoning Optimization via Reinforcement Learning (RL): Employs a Group Relative Policy Optimization (GRPO)-based RL framework combined with a curriculum learning strategy (a minimal sketch of the group-relative advantage computation appears after this list).
  • Model Date: 2025/04
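
The RL stage's GRPO objective samples a group of completions per problem and derives each completion's advantage from within-group reward statistics, removing the need for a learned critic. The sketch below is a generic, illustrative rendering of that group-relative advantage computation, not the paper's training code; the rule-based reward and the curriculum note are simplifying assumptions.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-relative advantages, the core of GRPO.

    `rewards` holds one scalar reward per completion sampled for the same
    prompt (the "group"). Each advantage is the reward standardized against
    the group's mean and std, so no value network is needed.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Illustrative only: 4 completions for one graph problem, scored by a
# hypothetical rule-based verifier (1.0 = verified-correct final answer).
group_rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])
print(grpo_advantages(group_rewards))  # tensor([ 0.8660, -0.8660, -0.8660,  0.8660])

# The curriculum learning strategy named above would, in addition, order
# training problems from easy to hard (e.g., by graph size) across RL steps.
```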

Intended Use

  • Primary Use Cases:
    • Solving complex graph-theoretic computational problems at the NP-Complete level, such as the Traveling Salesman Problem (TSP), Graph Edit Distance (GED), and the Maximum Clique Problem (MCP).
    • Serving as a compact, resource-efficient reasoning model for academic research and practical applications (a minimal inference sketch follows this section).
  • Potential Cross-Domain Applications:
    • The model demonstrates transferability to other complex reasoning tasks, including mathematics, programming, STEM, and logical reasoning.
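
The checkpoints load with the standard Hugging Face transformers API. The snippet below is a minimal inference sketch: the repository id and BF16 dtype come from this page, while the chat-template usage and the TSP prompt wording are assumptions rather than the paper's exact template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HKUST-DSAIL/Graph-R1-ablation-7B-SFT-mid-RL-low"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

# A small TSP instance posed in natural language (illustrative prompt).
prompt = (
    "Solve the Traveling Salesman Problem for 4 cities with distance matrix "
    "[[0, 2, 9, 10], [1, 0, 6, 4], [15, 7, 0, 8], [6, 3, 12, 0]]. "
    "Return the minimum-cost tour that starts and ends at city 0."
)
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
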
Technical Specifications

  • Repository: HKUST-DSAIL/Graph-R1-ablation-7B-SFT-mid-RL-low (one of the ablation variants)
  • Model Size: 7.62B parameters
  • Tensor Type: BF16
  • Weights Format: Safetensors
  • Base Model: Qwen/Qwen2.5-7B