|
--- |
|
language: en |
|
license: apache-2.0 |
|
library_name: transformers |
|
tags: |
|
- agentic-rl |
|
- agent |
|
- qwen |
|
- lora |
|
- file-operations |
|
- api-calls |
|
base_model: Qwen/Qwen2.5-0.5B-Instruct |
|
--- |
|
|
|
# Realistic Agentic Qwen Model |
|
|
|
This model is fine-tuned on realistic agent tasks using agentic RL techniques. It learns to take concrete actions like file operations, API calls, and system commands. |
|
|
|
## Model Description |
|
|
|
- **Base Model**: Qwen/Qwen2.5-0.5B-Instruct |
|
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation) |
|
- **Training Data**: 5 successful agent trajectories |
|
- **Actions Learned**: file operations, API calls, bash commands, task completion |
|
- **Reward System**: GRPO-style with trajectory-end rewards |
|
|
|
## Training Results |
|
|
|
- **Loss Improvement**: 4.4078 |
|
- **Final Loss**: 5.9926 |
|
- **Training Samples**: 5 |
|
- **Training Date**: 2025-07-24 |
|
|
|
## Usage |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
model_name = "allthingssecurity/realistic-agentic-qwen" |
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
model = AutoModelForCausalLM.from_pretrained(model_name) |
|
|
|
# Example usage |
|
problem = "Create a configuration file with settings" |
|
inputs = tokenizer(f"Task: {problem}", return_tensors="pt") |
|
outputs = model.generate(**inputs, max_new_tokens=100) |
|
response = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
``` |
|
|
|
## Actions the Model Can Perform |
|
|
|
- `create_file`: Create files with specific content |
|
- `write_to_file`: Write data to existing files |
|
- `api_call`: Make HTTP API requests |
|
- `search_files`: Search for patterns in files |
|
- `bash_command`: Execute safe system commands |
|
- `complete_task`: Mark tasks as completed with validation |
|
|
|
## Training Examples |
|
|
|
The model was trained on tasks like: |
|
- Creating configuration files |
|
- Making API calls and saving responses |
|
- Searching files for specific patterns |
|
- Generating reports and summaries |
|
|
|
## Limitations |
|
|
|
- Only supports safe, predefined actions |
|
- Simulated environment for training |
|
- Best suited for file/API/system interaction tasks |
|
|
|
## Citation |
|
|
|
If you use this model, please cite: |
|
``` |
|
@misc{realistic-agentic-qwen, |
|
title={Realistic Agentic Qwen Model}, |
|
author={Smart RL Trainer}, |
|
year={2024}, |
|
url={https://huggingface.co/allthingssecurity/realistic-agentic-qwen} |
|
} |
|
``` |
|
|