Content Classification LoRA Adapter for Gemma-2B

A LoRA adapter for unsloth/gemma-2b that determines content indexing suitability using chain-of-thought reasoning.

Used in a pipeline.

Technical Specifications

Base Model

Model: unsloth/gemma-2b
LoRA Rank: 64
Target Modules: q_proj, up_proj, down_proj, gate_proj, o_proj, k_proj, v_proj
Task: CAUSAL_LM
Dropout: 0
Alpha: 32

Input/Output Format

Input XML structure:

<instruction>Determine true or false if the following content is suitable and should be indexed.</instruction>
<suitable>
  <content>{input_text}</content>

Output XML structure:

  <thinking>{reasoning_process}</thinking>
  <category>{content_type}</category>
  <should_index>{true|false}</should_index>
</suitable>

The model then expects an indefinite list of <suitable> ... </suitable> that you may not want. But you can use this to do fewshots with incontext learning to correct a mistake or enhance the results.

Your stop token should be </suitable>.

Deployment

VLLM Server Setup

export VLLM_ALLOW_RUNTIME_LORA_UPDATING=1
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1

vllm serve unsloth/gemma-2-2b \
  --gpu-memory-utilization=1 \
  --port 6002 \
  --served-model-name="gemma" \
  --trust-remote-code \
  --max-model-len 8192 \
  --disable-log-requests \
  --enable-lora \
  --lora-modules lora=./dataset/output/unsloth/lora_model \
  --max-lora-rank 64

Processing Pipeline

Install Dependencies:

pip install requests tqdm concurrent.futures

Run Content Processor:

python process.py --input corpus.jsonl --output results.jsonl --threads 24

Client Implementation

import requests

def classify_content(text: str, vllm_url: str = "http://localhost:6002/v1/completions") -> dict:
    xml_content = (
        '<instruction>Determine true or false if the following content is '
        'suitable and should be indexed.</instruction>\n'
        '<suitable>\n'
        f'  <content>{text}</content>'
    )
    
    response = requests.post(
        vllm_url,
        json={
            "prompt": xml_content,
            "max_tokens": 6000,
            "temperature": 1,
            "model": "lora",
            "stop": ["</suitable>"]
        },
        timeout=30000
    )
    
    completion = response.json()["choices"][0]["text"]
    
    # Parse XML tags
    import re
    def extract_tag(tag: str) -> str:
        match = re.search(f'<{tag}>(.*?)</{tag}>', completion, re.DOTALL)
        return match.group(1).strip() if match else ""
        
    return {
        "thinking": extract_tag("thinking"),
        "category": extract_tag("category"),
        "should_index": extract_tag("should_index")
    }

Example Usage

text = """Multiservice Tactics, Techniques, and Procedures
for
Nuclear, Biological, and Chemical Aspects of Consequence
Management

TABLE OF CONTENTS..."""

result = classify_content(text)
print(result)

Example output:

{
    "thinking": "This is a table of contents for a document, not the actual content.",
    "category": "table of contents",
    "should_index": "false"
}

Batch Processing

The included processor supports parallel processing of JSONL files:

from request_processor import RequestProcessor

processor = RequestProcessor(
    input_file="corpus.jsonl",
    output_file="results.jsonl",
    num_threads=24
)
processor.process_file()

Input JSONL format:

{
    "pid": "document_id",
    "docid": "path/to/source",
    "content": "document text",
    "metadata": {
        "key": "value"
    }
}

Output JSONL format:

{
    "pid": "document_id",
    "docid": "path/to/source",
    "content": "document text",
    "metadata": {
        "key": "value"
    },
    "thinking": "reasoning process",
    "category": "content type",
    "should_index": "true/false",
    "processed_at": "2024-10-22 02:52:33"
}

Implementation and Performance Considerations

Use thread pooling for parallel processing
Implement atomic writes with file locking
Progress tracking with tqdm
Automatic error handling and logging
Configurable thread count for optimization

Error Handling

Errors are captured in the output JSONL:

{
    "error": "error message",
    "processed_at": "timestamp"
}

Monitor errors in real-time:

tail -f results.jsonl | grep error