DukunLM V1.0 - Indonesian Language Model 🧙‍♂️

🚀 Welcome to the DukunLM V1.0 repository! DukunLM V1.0 is an open-source language model fine-tuned to generate Indonesian text. The name DukunLM is the Indonesian take on "WizardLM" 🌟. This release updates azale-ai/DukunLM-Uncensored-7B and ships the full model weights, not only the adapter as before 👽.

Model Details

Model Name	Parameters	Google Colab	Base Model	Dataset	Prompt Format	Fine-Tuning Method	Sharded Version
DukunLM-7B-V1.0-Uncensored	7B	Link	ehartford/WizardLM-7B-V1.0-Uncensored	MBZUAI/Bactrian-X (Indonesian subset)	Alpaca	QLoRA	Link
DukunLM-13B-V1.0-Uncensored	13B	Link	ehartford/WizardLM-13B-V1.0-Uncensored	MBZUAI/Bactrian-X (Indonesian subset)	Alpaca	QLoRA	Link
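The parameter counts in the table give a rough idea of how much GPU memory the weights alone need. The sketch below is only back-of-the-envelope arithmetic: it assumes the nominal 7B and 13B counts, 2 bytes per parameter for float16 and 0.5 bytes per parameter for 4-bit weights, and it ignores activations, the KV cache, and quantization overhead. The function name `weight_memory_gb` is made up for this illustration.

```python
# Rough weight-only memory estimate. Real usage is higher because of
# activations, the KV cache, and quantization metadata.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "4bit": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the approximate weight memory in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal parameter counts taken from the table (assumed, not measured).
for name, params in [("7B", 7e9), ("13B", 13e9)]:
    for dtype in ("float16", "4bit"):
        print(f"{name} {dtype}: ~{weight_memory_gb(params, dtype):.1f} GB")
```

Treat these figures as lower bounds when deciding between the normal and the quantized loading paths.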

⚠️ Warning: DukunLM is an uncensored model without filters or alignment. Please use it responsibly, as its output may contain errors, cultural biases, and potentially offensive content. ⚠️

Installation

To use DukunLM, make sure PyTorch is installed and that you have an NVIDIA GPU (or use Google Colab). Then install the required dependencies:

pip3 install -U git+https://github.com/huggingface/transformers.git
pip3 install -U git+https://github.com/huggingface/peft.git
pip3 install -U git+https://github.com/huggingface/accelerate.git
pip3 install -U bitsandbytes==0.39.0 einops==0.6.1 sentencepiece
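After installing, a quick check that the packages can be found may save a failed model download later. This is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the list assumes each package's import name matches its pip name.

```python
from importlib.util import find_spec

# Import names of the packages installed above (assumed to match the pip names).
REQUIRED = [
    "torch", "transformers", "peft", "accelerate",
    "bitsandbytes", "einops", "sentencepiece",
]


def missing_packages(names):
    """Return the names that cannot be located on the import path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only confirms that the packages are importable; it does not verify versions or that a GPU is visible to PyTorch.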

How to Use

Normal Model

Stream Output

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model = AutoModelForCausalLM.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored", torch_dtype=torch.float16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored")
streamer = TextStreamer(tokenizer)

instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

if not input_prompt:
  prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt)

else:
  prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
_ = model.generate(
    inputs=inputs.input_ids,
    streamer=streamer,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)
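The Alpaca prompt construction above is repeated in every example, so it can be factored into a small helper. This is a sketch, not part of the original repository: the name `build_prompt` and the two template constants are made up here, while the template text itself is copied from the example above.

```python
# Alpaca-style templates, identical in content to the inline prompts above.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Return the Alpaca prompt, including the Input section only if given."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


print(build_prompt("Jelaskan mengapa air penting bagi kehidupan manusia."))
```

The returned string can be passed to the tokenizer in place of the `prompt` variable used in the examples.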

No Stream Output

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored", torch_dtype=torch.float16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored")

instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

if not input_prompt:
  prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt)

else:
  prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    inputs=inputs.input_ids,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
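Because `generate` returns the prompt tokens followed by the continuation, the decoded text printed above also contains the Alpaca prompt. One way to keep only the answer is to split on the response marker. The helper below is a minimal sketch under that assumption; `extract_response` is a hypothetical name, and slicing the output tensor by the prompt length would be an alternative.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker.

    Falls back to the whole (stripped) text if the marker is absent.
    """
    _, sep, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if sep else decoded.strip()


example = "### Instruction:\nSapa saya.\n\n### Response:\nHalo!"
print(extract_response(example))
```

With the example above this would be called as `extract_response(tokenizer.decode(outputs[0], skip_special_tokens=True))`.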

Quantized Model

Stream Output

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer

model = AutoModelForCausalLM.from_pretrained(
    "azale-ai/DukunLM-7B-V1.0-Uncensored-sharded",
    torch_dtype=torch.float32,
    # 4-bit loading is configured only through quantization_config; recent
    # transformers releases raise an error if load_in_4bit is also passed here.
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    )
)
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored-sharded")
streamer = TextStreamer(tokenizer)

instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

if not input_prompt:
  prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt)

else:
  prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
_ = model.generate(
    inputs=inputs.input_ids,
    streamer=streamer,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)

No Stream Output

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "azale-ai/DukunLM-7B-V1.0-Uncensored-sharded",
    torch_dtype=torch.float32,
    # 4-bit loading is configured only through quantization_config; recent
    # transformers releases raise an error if load_in_4bit is also passed here.
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    )
)
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored-sharded")

instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

if not input_prompt:
  prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt)

else:
  prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
"""
  prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    inputs=inputs.input_ids,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Benchmark

Coming soon, stay tuned 🙂🙂.

Limitations

The base model's language is English; it was only fine-tuned on Indonesian data
The model may reflect cultural and contextual biases

License

DukunLM V1.0 is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

Contributing

We welcome contributions that improve DukunLM V1.0. If you have suggestions or find any issues, please open an issue or submit a pull request. We are also open to sponsorship for compute power.

Contact Us

[email protected]
