---
license: cc-by-nc-4.0
base_model:
- Qwen/Qwen3-1.7B
- google/siglip2-so400m-patch14-384
library_name: transformers
tags:
- perceptron
- isaac-0.1
---
# [Isaac-0.1 by Perceptron](https://www.perceptron.inc/blog/introducing-isaac-0-1)
*Note: this is the post-trained model.* [Try out the model on our playground](https://www.perceptron.inc/demo).
We're introducing Isaac 0.1, our first perceptive-language model and a major step toward building AI systems that can understand and interact with the physical world. Isaac 0.1 is an open-source, 2B-parameter model built for real-world applications. It sets a new standard for efficiency, delivering capabilities that meet or exceed those of models over 50 times its size.
Founded by the team behind Meta's Chameleon multimodal models, Perceptron is tackling a fundamental challenge: bringing the power of physical AI to the dynamic, multimodal, and real-time environments we live and work in.
Isaac 0.1 is the first in our family of models built to be the intelligence layer for the physical world. It's now available open source for researchers and developers everywhere.
## What’s new in Isaac 0.1
**Visual QA, simply trained**
Strong results on standard understanding benchmarks with a straightforward, reproducible training recipe.
**Grounded spatial intelligence**
Precise pointing and localization with robust spatial reasoning. Ask “what’s broken in this machine?” and get grounded answers with highlighted regions—handling occlusions, relationships, and object interactions.
**In-context learning for perception**
Show a few annotated examples (defects, safety conditions, etc.) in the prompt and the model adapts, with no YOLO-style fine-tuning or custom detector stacks required; a rough prompt layout is sketched below.
**OCR & fine-grained detail**
Reads small text and dense scenes reliably, across resolutions, with dynamic image handling for tiny features and cluttered layouts.
**Conversational Pointing**
A new interaction pattern where language and vision stay in lockstep: every claim is grounded and visually cited, reducing hallucinations and making reasoning auditable.
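As a rough illustration of the in-context pattern above, a few-shot perception prompt interleaves annotated example images with a final query. The layout and field names below are hypothetical, for illustration only, and are not Isaac-0.1's actual prompt schema.
```python
# Hypothetical few-shot perception prompt layout: annotated examples first, query last.
# Structure and field names are illustrative only, not Isaac-0.1's actual prompt schema.
few_shot_inspection_prompt = [
    {"image": "panel_scratch.jpg", "text": "Defect: deep scratch along the left edge."},
    {"image": "panel_clean.jpg",   "text": "Defect: none; the surface passes inspection."},
    {"image": "panel_query.jpg",   "text": "Inspect this panel and point to any defects."},
]
```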
## Benchmarks
![visual_qa](https://framerusercontent.com/images/WFsL5CWqxvsmJrlUuMXA5T8LdVY.png?width=2216&height=1610)
![grounding](https://framerusercontent.com/images/2T1Th5SaXdYhNKyxzd2ge61diA.png?width=1736&height=1260)
## Example using the Perceptron SDK
Install the Perceptron SDK:
```bash
pip install perceptron
```
## Example using transformers
Learn more: [Hugging Face example repo](https://github.com/perceptron-ai-inc/perceptron/tree/main/huggingface)
```bash
# Fetch the helper code (including IsaacProcessor) so it can be imported locally
git clone https://github.com/perceptron-ai-inc/perceptron.git
cp -r perceptron/huggingface ./huggingface
```
```python
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

# IsaacProcessor is provided by the huggingface/ folder copied above
from huggingface.modular_isaac import IsaacProcessor

# trust_remote_code lets transformers load Isaac's custom model code from the Hub
tokenizer = AutoTokenizer.from_pretrained("PerceptronAI/Isaac-0.1", trust_remote_code=True, use_fast=False)
config = AutoConfig.from_pretrained("PerceptronAI/Isaac-0.1", trust_remote_code=True)

# Build the multimodal processor from the tokenizer and model config
processor = IsaacProcessor(tokenizer=tokenizer, config=config)
model = AutoModelForCausalLM.from_pretrained("PerceptronAI/Isaac-0.1", trust_remote_code=True)
```
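Continuing from the snippet above, the sketch below shows one way to run inference. It is illustrative only: it assumes `IsaacProcessor` follows the standard transformers processor interface (callable with `text=` and `images=`, returning `pt` tensors), and the exact prompt format may differ; see the [Hugging Face example repo](https://github.com/perceptron-ai-inc/perceptron/tree/main/huggingface) for the canonical usage.
```python
# Illustrative inference sketch: the processor call signature and prompt format below are
# assumptions, not the documented Isaac-0.1 API; follow the upstream example for real usage.
import torch
from PIL import Image

image = Image.open("example.jpg")             # placeholder path to any local image
question = "What is broken in this machine?"  # plain-text question

# Assumes the standard transformers processor call pattern
inputs = processor(text=question, images=image, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```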