Model info

This is a Llama-3.2 1B Instruct model converted to an Anemll model with Anemll 0.1.1, with the context size fixed at 2048 (it cannot be changed at runtime).

It was converted using the convert_model.sh script and a --context 2048 parameter.

Some things ⬇️

  • Anemll is in alpha, proceed at your own risk.
  • You will need to clone the Anemll repo to run this model (unlike the original Anemll HF models, which include runnable chat.py files).
    • Once you've downloaded both this model and the Anemll repo, you can either follow the chat instructions from the docs or run:

      python full-path-to-anemll-repo/tests/chat.py \
          --meta full-path-to-this-model-repo/meta.yaml
      
  • Anemll models can only be run with the Anemll library and on Apple silicon. DYOR whether this model is for you or not. The Anemll library creators are active on X.

Below is the copy-pasted README from one of the original HF Anemll models. Follow its instructions until the run step; there, instead of running a chat.py file that does not exist in this repo, run the chat.py file from the cloned Anemll repo (see above).

-----------------------------------------------------------------------------------------------------

ANEMLL

ANEMLL (pronounced like "animal") is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE).

The goal is to provide a fully open-source pipeline from model conversion to inference for common LLM architectures running on ANE.

This enables seamless integration and on-device inference for low-power applications on edge devices, ensuring maximum privacy and security.

This is critical for autonomous applications, where models run directly on the device without requiring an internet connection.


License

ANEMLL is licensed under the MIT License.
The model is based on Meta's LLaMA 3.2 and may require a separate license.

This test model is Meta's LLaMA 3.2 1B (512 context) converted to CoreML. It was released before the official launch of the ANEMLL repository and comes with minimal documentation; it is intended only for early adopters who requested an early release.


Requirements

  • macOS Sequoia with Apple Neural Engine and 16GB RAM
  • CoreML Tools and HuggingFace Transformers libraries
  • Python 3.9
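
The requirements above can be checked with a short script (a sketch; the exact Python version pin and the arm64 check are assumptions based on the list above, not part of Anemll itself):

```python
import platform
import sys

def check_environment():
    """Return a list of requirement problems (an empty list means all good)."""
    issues = []
    # The Apple Neural Engine is only reachable on Apple silicon Macs
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        issues.append("Apple silicon macOS is required for the Apple Neural Engine")
    # This README lists Python 3.9
    if sys.version_info[:2] != (3, 9):
        issues.append(f"Python 3.9 expected, found {sys.version.split()[0]}")
    return issues

if __name__ == "__main__":
    for issue in check_environment():
        print("WARNING:", issue)
```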

chat.py is a sample inference script.
chat_full.py is a sample inference script with history and conversation management.

Installation

  1. Download the model from Hugging Face:
# Install required tools
pip install huggingface_hub

# Install Git LFS (Large File Support)
# macOS with Homebrew:
brew install git-lfs
# Or Ubuntu/Debian:
# sudo apt-get install git-lfs

# Initialize Git LFS
git lfs install

# Clone the repository with model files
git clone https://huggingface.co/anemll/anemll-Meta-Llama-3.2-1B-ctx512_0.1.1
  2. Extract model files:
# Navigate to cloned directory
cd anemll-Meta-Llama-3.2-1B-ctx512_0.1.1

# Pull LFS files (model weights)
git lfs pull

# Extract CoreML model files
find . -type f -name "*.zip" -exec unzip {} \;
  3. Install dependencies:
pip install coremltools transformers
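
After step 2, you can confirm the CoreML bundles were unpacked (a sketch; the `.mlmodelc`/`.mlpackage` extensions are an assumption about what the zip files contain):

```python
from pathlib import Path

def list_coreml_artifacts(repo_dir="."):
    """List extracted CoreML bundles (.mlmodelc / .mlpackage) in a directory."""
    root = Path(repo_dir)
    return sorted(p.name for p in root.iterdir()
                  if p.suffix in {".mlmodelc", ".mlpackage"})

if __name__ == "__main__":
    artifacts = list_coreml_artifacts()
    print(f"Found {len(artifacts)} CoreML artifact(s):", artifacts)
```

If the list is empty, re-run the `git lfs pull` and `unzip` steps above.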

Coremltools:

See coremltools installation guide at https://coremltools.readme.io/v4.0/docs/installation

How to Run (----READ TOP OF README AGAIN----)

  1. Basic chat interface:
python chat.py --meta ./meta.yaml
  2. Full conversation mode with history:
python chat_full.py --meta ./meta.yaml

Note: The first time the model loads, macOS will take some time to place it on the device. Subsequent loads will be instantaneous. Use Ctrl-D to exit, Ctrl-C to interrupt inference.
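
The Ctrl-D / Ctrl-C behavior described above corresponds to a loop like the following (a minimal sketch; `generate` is a hypothetical stand-in for the model call, not the actual chat.py code):

```python
def repl(generate):
    """Read prompts until Ctrl-D; Ctrl-C aborts only the current reply."""
    while True:
        try:
            prompt = input("> ")
        except EOFError:            # Ctrl-D: exit the chat loop
            print()
            break
        try:
            print(generate(prompt))
        except KeyboardInterrupt:   # Ctrl-C: interrupt this inference, keep chatting
            print("\n[interrupted]")
```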

More Info

Please check the following for later updates:

[email protected]
