# Causal Language Modeling

Causal language modeling predicts the next token in a sequence of tokens, and the model can only attend to tokens on
the left. This means the model cannot see future tokens. GPT-2 is an example of a causal language model.
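
The left-only attention rule can be pictured as a lower-triangular mask. The following is a minimal sketch in plain Python, not the way GPT-2 or BLOOM implement it internally; the helper name `causal_mask` is hypothetical. Each position is allowed to see itself and every earlier position, and nothing to its right.

```python
def causal_mask(seq_len):
    """Return a seq_len x seq_len grid; entry [i][j] is True when
    position i may attend to position j (that is, when j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

tokens = ["I", "really", "wish", "I", "could"]
mask = causal_mask(len(tokens))

# For every position, list the tokens it is allowed to look at.
for token, row in zip(tokens, mask):
    visible = [t for t, allowed in zip(tokens, row) if allowed]
    print(f"{token!r:>9} sees {visible}")
```

The first row contains a single `True` (the first token sees only itself), and the last row is all `True`.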

Before you begin, make sure you have the necessary library installed:

```bash
pip install transformers
```

In [None]:
pip install transformers

In [None]:
import torch

In [None]:
import transformers

BLOOM, in its various sizes, was developed through the BigScience Workshop. BigScience is inspired by other open science initiatives in which researchers pool their time and resources to collectively achieve a higher impact. BLOOM's architecture is essentially similar to GPT-3 (an autoregressive model for next-token prediction), but it was trained on 46 natural languages and 13 programming languages.

In [None]:
from transformers import BloomForCausalLM

In [None]:
from transformers import BloomTokenizerFast

Load the pretrained model and tokenizer

In [None]:
model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")

In [None]:
tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-560m")

Define an input sentence, tokenize it, and predict the next tokens in the sentence

In [None]:
prompt = "I really wish I could"  # The text string that the LLM will continue
result_length = len(prompt.split()) + 15  # Passed as max_length below: a cap on total tokens (prompt included), not words
inputs = tokenizer(prompt, return_tensors="pt")

Three different strategies for generating text with causal language modeling

In [None]:
# Greedy Search
print(tokenizer.decode(model.generate(inputs["input_ids"],
                       max_length=result_length
                      )[0]))

I really wish I could have a better idea of what I am doing. I am not sure what
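
To make the greedy strategy concrete, here is a minimal sketch on a made-up bigram table. `GREEDY_TOY_LM` and `greedy_decode` are hypothetical names and the probabilities are invented for illustration; this is not BLOOM's real distribution. At every step the single most likely next token is appended.

```python
# Hypothetical toy "model": next-token probabilities that depend only on the last token.
GREEDY_TOY_LM = {
    "I": {"wish": 0.6, "could": 0.4},
    "wish": {"I": 0.7, "<eos>": 0.3},
    "could": {"<eos>": 0.9, "I": 0.1},
}

def greedy_decode(prompt, max_length):
    """Append the most likely next token until max_length tokens or <eos>."""
    tokens = list(prompt)
    while len(tokens) < max_length:
        dist = GREEDY_TOY_LM[tokens[-1]]
        next_token = max(dist, key=dist.get)  # argmax over the distribution
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

print(greedy_decode(["I"], max_length=6))
# → ['I', 'wish', 'I', 'wish', 'I', 'wish']
```

The toy output loops between two tokens, which illustrates a common weakness of greedy search: always taking the local maximum can lock generation into repetition.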


In [None]:
# Beam Search
print(tokenizer.decode(model.generate(inputs["input_ids"],
                       max_length=result_length,
                       num_beams=2, # number of beams kept during beam search
                       no_repeat_ngram_size=2, # no 2-gram may appear twice in the output
                       early_stopping=True # stop once num_beams complete candidates have been found
                      )[0]))

I really wish I could do it myself, but I can't. I don't have the time or the
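
The idea behind beam search can be sketched on a toy distribution as well. The table `BEAM_TOY_LM` and the function `beam_search` below are hypothetical, and the sketch is simplified: it omits length normalization and n-gram blocking, and it lets finished hypotheses take up beam slots, so it should not be read as the algorithm `generate` actually runs. It keeps the `num_beams` highest-scoring partial sequences at each step instead of only one.

```python
import math

# Hypothetical toy "model": next-token probabilities that depend only on the last token.
BEAM_TOY_LM = {
    "<bos>": {"the": 0.5, "a": 0.4, "<eos>": 0.1},
    "the": {"dog": 0.4, "cat": 0.35, "<eos>": 0.25},
    "a": {"dog": 0.9, "<eos>": 0.1},
    "dog": {"<eos>": 1.0},
    "cat": {"<eos>": 1.0},
}

def beam_search(start, num_beams, max_steps):
    """Return (log-probability, tokens) of the best sequence found."""
    beams = [(0.0, [start])]  # live hypotheses
    finished = []             # hypotheses that ended with <eos>
    for _ in range(max_steps):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for score, tokens in beams:
            for token, prob in BEAM_TOY_LM[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the num_beams best expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:num_beams]:
            (finished if tokens[-1] == "<eos>" else beams).append((score, tokens))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[0])

for n in (1, 2):
    score, tokens = beam_search("<bos>", num_beams=n, max_steps=4)
    print(n, tokens, round(math.exp(score), 2))
```

With `num_beams=1` the search is greedy and ends on "the dog" with probability 0.2. With `num_beams=2` the initially less likely "a" survives the first step and leads to "a dog" with probability 0.36.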


In [None]:
# Sampling Top-k + Top-p
print(tokenizer.decode(model.generate(inputs["input_ids"],
                       max_length=result_length,
                       do_sample=True,
                       top_k=50, # sample only from the 50 most likely tokens
                       top_p=0.9 # then keep the smallest set of tokens whose cumulative probability reaches 0.9
                      )[0]))

I really wish I could have helped you out. But if you need a reminder or if you have
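
How the two filters combine can be shown on a small hand-written distribution. This is a sketch under assumptions: the function name `top_k_top_p_filter` and the probabilities are hypothetical, and top-k is applied before top-p with renormalization in between. Real implementations operate on logits over the full vocabulary.

```python
import random

def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose renormalized cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total_k = sum(p for _, p in ranked)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / total_k
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}  # renormalize to sum to 1

# Made-up next-token distribution, for illustration only.
probs = {"have": 0.5, "do": 0.2, "go": 0.15, "be": 0.1, "zebra": 0.05}
filtered = top_k_top_p_filter(probs, top_k=4, top_p=0.8)
print(list(filtered))  # → ['have', 'do', 'go']

# Sample the next token from what survived both filters.
rng = random.Random(0)
print(rng.choices(list(filtered), weights=list(filtered.values()), k=1)[0])
```

Here top-k removes "zebra" and top-p then removes "be", so sampling can never pick the unlikely tail while still varying between runs.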


Does our model contain bias? Almost certainly. Let's see what happens when we prompt with "man" vs. "woman" in terms of career text prediction.

In [None]:
prompt = "the man works as a"
result_length = len(prompt.split()) + 3  # Passed as max_length below: a cap on total tokens (prompt included), not words
inputs = tokenizer(prompt, return_tensors="pt")

In [None]:
# Beam Search
print(tokenizer.decode(model.generate(inputs["input_ids"],
                       max_length=result_length,
                       num_beams=2,
                       no_repeat_ngram_size=2,
                       early_stopping=True
                      )[0]))

In [None]:
prompt = "the woman works as a"
result_length = len(prompt.split()) + 3  # Passed as max_length below: a cap on total tokens (prompt included), not words
inputs = tokenizer(prompt, return_tensors="pt")

In [None]:
print(tokenizer.decode(model.generate(inputs["input_ids"],
                       max_length=result_length,
                       num_beams=2,
                       no_repeat_ngram_size=2,
                       early_stopping=True
                      )[0]))

the woman works as a housekeeper


Bias confirmed. This could be a problem when deploying a chatbot to a production environment.
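
To run the same comparison over more than one pair of prompts, a small helper can fill a template with each subject and collect the completions side by side. This is a sketch: `compare_completions` and `echo_complete` are hypothetical names, and the stand-in `echo_complete` only echoes the prompt so the example runs without downloading model weights.

```python
def compare_completions(template, subjects, complete):
    """Fill `template` with each subject and collect what `complete` returns."""
    return {subject: complete(template.format(subject=subject)) for subject in subjects}

def echo_complete(prompt):
    """Stand-in for a real model call; it only echoes the prompt."""
    return prompt + " ..."

results = compare_completions("the {subject} works as a", ["man", "woman"], echo_complete)
for subject, text in results.items():
    print(f"{subject}: {text}")
```

In this notebook, `complete` could instead wrap the `tokenizer(...)`, `model.generate(...)`, and `tokenizer.decode(...)` calls used in the cells above.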