YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

Content Classification LoRA Adapter for Gemma-2B

A LoRA adapter for unsloth/gemma-2b that determines content indexing suitability using chain-of-thought reasoning.

Used in a pipeline.

Technical Specifications

Base Model

  • Model: unsloth/gemma-2b
  • LoRA Rank: 64
  • Target Modules: q_proj, up_proj, down_proj, gate_proj, o_proj, k_proj, v_proj
  • Task: CAUSAL_LM
  • Dropout: 0
  • Alpha: 32

Input/Output Format

Input XML structure:

<instruction>Determine true or false if the following content is suitable and should be indexed.</instruction>
<suitable>
  <content>{input_text}</content>

Output XML structure:

  <thinking>{reasoning_process}</thinking>
  <category>{content_type}</category>
  <should_index>{true|false}</should_index>
</suitable>

The model then expects an indefinite list of <suitable> ... </suitable> that you may not want. But you can use this to do fewshots with incontext learning to correct a mistake or enhance the results.

Your stop token should be </suitable>.

Deployment

VLLM Server Setup

export VLLM_ALLOW_RUNTIME_LORA_UPDATING=1
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1

vllm serve unsloth/gemma-2-2b \
  --gpu-memory-utilization=1 \
  --port 6002 \
  --served-model-name="gemma" \
  --trust-remote-code \
  --max-model-len 8192 \
  --disable-log-requests \
  --enable-lora \
  --lora-modules lora=./dataset/output/unsloth/lora_model \
  --max-lora-rank 64

Processing Pipeline

  1. Install Dependencies:
pip install requests tqdm concurrent.futures
  1. Run Content Processor:
python process.py --input corpus.jsonl --output results.jsonl --threads 24

Client Implementation

import requests

def classify_content(text: str, vllm_url: str = "http://localhost:6002/v1/completions") -> dict:
    xml_content = (
        '<instruction>Determine true or false if the following content is '
        'suitable and should be indexed.</instruction>\n'
        '<suitable>\n'
        f'  <content>{text}</content>'
    )
    
    response = requests.post(
        vllm_url,
        json={
            "prompt": xml_content,
            "max_tokens": 6000,
            "temperature": 1,
            "model": "lora",
            "stop": ["</suitable>"]
        },
        timeout=30000
    )
    
    completion = response.json()["choices"][0]["text"]
    
    # Parse XML tags
    import re
    def extract_tag(tag: str) -> str:
        match = re.search(f'<{tag}>(.*?)</{tag}>', completion, re.DOTALL)
        return match.group(1).strip() if match else ""
        
    return {
        "thinking": extract_tag("thinking"),
        "category": extract_tag("category"),
        "should_index": extract_tag("should_index")
    }

Example Usage

text = """Multiservice Tactics, Techniques, and Procedures
for
Nuclear, Biological, and Chemical Aspects of Consequence
Management

TABLE OF CONTENTS..."""

result = classify_content(text)
print(result)

Example output:

{
    "thinking": "This is a table of contents for a document, not the actual content.",
    "category": "table of contents",
    "should_index": "false"
}

Batch Processing

The included processor supports parallel processing of JSONL files:

from request_processor import RequestProcessor

processor = RequestProcessor(
    input_file="corpus.jsonl",
    output_file="results.jsonl",
    num_threads=24
)
processor.process_file()

Input JSONL format:

{
    "pid": "document_id",
    "docid": "path/to/source",
    "content": "document text",
    "metadata": {
        "key": "value"
    }
}

Output JSONL format:

{
    "pid": "document_id",
    "docid": "path/to/source",
    "content": "document text",
    "metadata": {
        "key": "value"
    },
    "thinking": "reasoning process",
    "category": "content type",
    "should_index": "true/false",
    "processed_at": "2024-10-22 02:52:33"
}

Implementation and Performance Considerations

  • Use thread pooling for parallel processing
  • Implement atomic writes with file locking
  • Progress tracking with tqdm
  • Automatic error handling and logging
  • Configurable thread count for optimization

Error Handling

Errors are captured in the output JSONL:

{
    "error": "error message",
    "processed_at": "timestamp"
}

Monitor errors in real-time:

tail -f results.jsonl | grep error
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no library tag.