Fine-Tuning LLAMAv2 with QLoRA on Google Colab for Free




Generated using ideogram.ai with the prompt: “A photo of a LLAMA with the banner written “QLora” on it, 3d render, wildlife photography”

 

Until recently, fine-tuning a 7B model on a single GPU for free on Google Colab was only a dream. On 23 May 2023, Tim Dettmers and his team submitted a revolutionary paper[1] on fine-tuning quantized large language models.

A quantized model is a model whose weights are stored in a lower-precision data type than the one it was trained in. For example, if you train a model in 32-bit floating point, you can then convert those weights to a lower-precision data type such as 16/8/4-bit floating point, with minimal to no effect on the model's performance.
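To make the idea concrete, here is a minimal, illustrative sketch of symmetric 8-bit quantization of a weight tensor (just the round-and-rescale idea; the 4-bit NF4 scheme from bitsandbytes that we use later is more sophisticated):

import torch

# Minimal illustrative sketch of symmetric 8-bit quantization (not the NF4
# scheme used by bitsandbytes later in this post).
weights_fp32 = torch.randn(4, 4)                     # weights as trained, in fp32

scale = weights_fp32.abs().max() / 127               # map the largest magnitude to 127
weights_int8 = torch.round(weights_fp32 / scale).to(torch.int8)  # quantize
weights_dequant = weights_int8.float() * scale       # dequantize for compute

print((weights_fp32 - weights_dequant).abs().max())  # small quantization error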

 

Source [2]

 

We are not going to discuss the theory of quantization in much depth here; you can refer to the excellent blog posts by Hugging Face[2][3] and a very good YouTube video[4] by Tim Dettmers himself to understand the underlying theory.

In short, QLoRA means:

Fine-tuning a quantized large language model using Low-Rank Adaptation (LoRA) matrices[5]
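Conceptually, LoRA freezes the (quantized) base weights and learns only a small low-rank update on top of them. Here is a minimal, hypothetical sketch of that idea (the peft library handles all of this for us below):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA adapter: y = W x + (alpha / r) * B(A(x)), with W frozen."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                                # frozen (quantized) base weights
        self.lora_A = nn.Linear(base.in_features, r, bias=False)   # low-rank down-projection
        self.lora_B = nn.Linear(r, base.out_features, bias=False)  # low-rank up-projection
        nn.init.zeros_(self.lora_B.weight)                         # adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))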

Let’s jump straight into the code:

 

 

It is important to understand that large language models are designed to take instructions; this was first introduced in an ACL paper[6]. The idea is simple: we give a language model an instruction, and it follows the instruction and performs that task. So the dataset that we want to fine-tune our model on should be in the instruct format; if it is not, we can convert it.

One of the common formats is the instruct format. We will be using the Alpaca Prompt Template[7], which is:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{response}

 

We will be using the SNLI dataset, which contains pairs of sentences and the relationship between them: contradiction, entailment, or neutral. We will use it to generate contradictions for a given sentence using LLAMAv2. We can load this dataset simply using pandas.

import pandas as pd

df = pd.read_csv('snli_1.0_train_matched.csv')
df['gold_label'].value_counts().plot(kind='barh')

 

Labels Distribution

 

We can see a few random contradiction examples here.

df[df['gold_label'] == 'contradiction'].sample(10)[['sentence1', 'sentence2']]

 

Contradiction Examples from SNLI

 

Now we can create a small function that takes only the contradictory sentences and converts the dataset to the instruct format.

def convert_to_format(row):
    sentence1 = row['sentence1']
    sentence2 = row['sentence2']
    prompt = """Below is an instruction that describes a task paired with input that provides further context. Write a response that appropriately completes the request."""
    instruction = """Given the following sentence, your job is to generate the negation for it in the json format"""
    input = str(sentence1)
    response = f"""```json
{{'orignal_sentence': '{sentence1}', 'generated_negation': '{sentence2}'}}
```
"""
    if len(input.strip()) == 0:  # prompt + 2 new lines + ### Instruction + new line + instruction + new line + ### Response
        text = prompt + "\n\n### Instruction:\n" + instruction + "\n### Response:\n" + response
    else:
        text = prompt + "\n\n### Instruction:\n" + instruction + "\n### Input:\n" + input + "\n" + "\n### Response:\n" + response

    # we need 4 columns for AutoTrain: instruction, input, output, text
    return pd.Series([instruction, input, response, text])

new_df = df[df['gold_label'] == 'contradiction'][['sentence1', 'sentence2']].apply(convert_to_format, axis=1)
new_df.columns = ['instruction', 'input', 'output', 'text']

new_df.to_csv('snli_instruct.csv', index=False)

 

Here is an example of a sample data point:

"Under is an instruction that describes a process paired with enter that gives additional context. Write a response that appropriately completes the request.

### Instruction:
Given the next sentence, your job is to generate the negation for it within the json format
### Enter:
A pair taking part in with a little bit boy on the seaside.

### Response:
```json
'orignal_sentence': 'A pair taking part in with a little bit boy on the seaside.', 'generated_negation': 'A pair watch a little bit woman play by herself on the seaside.'
```

 

Now that we have our dataset in the correct format, let's start fine-tuning. Before starting, let's install the necessary packages. We will be using accelerate and peft (Parameter-Efficient Fine-Tuning), combined with Hugging Face's bitsandbytes and transformers.

!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

 

import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

 

You can upload the formatted dataset to Google Drive and load it in Colab.

from google.colab import drive
import pandas as pd

drive.mount('/content/drive')

df = pd.read_csv('/content/drive/MyDrive/snli_instruct.csv')

 

You can convert it to the Hugging Face dataset format easily using the from_pandas method; this will be helpful in training the model.

from datasets import Dataset

dataset = Dataset.from_pandas(df)

 

We will be using the LLaMA-v2 model provided as abhishek/llama-2-7b-hf-small-shards (re-sharded into smaller files so it loads within Colab's limited RAM), which we will quantize to 4-bit at load time. Let's define some hyperparameters and variables here:

# The model that you want to train from the Hugging Face hub
model_name = "abhishek/llama-2-7b-hf-small-shards"

# Fine-tuned model name (also used later as the directory we zip and reload)
new_model = "llama-contradictor"

################################################################################
# QLoRA parameters
################################################################################

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

################################################################################
# TrainingArguments parameters
################################################################################

# Output directory where the model predictions and checkpoints will be saved
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 4

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 1e-5

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule
lr_scheduler_type = "cosine"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with the same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X update steps
save_steps = 0

# Log every X update steps
logging_steps = 100

################################################################################
# SFT parameters
################################################################################

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on GPU 0
device_map = {"": 0}

 

Most of these are fairly straightforward hyperparameters with sensible default values. You can always refer to the documentation for more details.

We can now simply use the BitsAndBytesConfig class to create the config for 4-bit fine-tuning.

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

 

Now we can load the base model with the 4-bit BitsAndBytesConfig and the tokenizer for fine-tuning.

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1

 

We can now create the LoRA config and set the training parameters.

# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

 

Now we can simply use the SFTTrainer provided by trl from Hugging Face to start the training.

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # this is the text column in the dataset
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

# Train model
trainer.train()

# Save trained model
trainer.model.save_pretrained(new_model)

 

This will start the training for the number of epochs you have set above. Once the model is trained, make sure to save it to Drive so that you can load it again (as you have to restart the session in Colab). You can store the model in Drive via the zip and mv commands.

!zip -r llama-contradictor.zip results llama-contradictor
!mv llama-contradictor.zip /content/drive/MyDrive

 

Now when you restart the Colab session, you can move it back into your session again.

!unzip /content/drive/MyDrive/llama-contradictor.zip -d .

 

You need to load the base model again and merge it with the fine-tuned LoRA matrices. This can be done using the merge_and_unload() function.

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

base_model = AutoModelForCausalLM.from_pretrained(
    "abhishek/llama-2-7b-hf-small-shards",
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)

model = PeftModel.from_pretrained(base_model, '/content/llama-contradictor')
model = model.merge_and_unload()
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

 

 

You can test your model by simply passing in inputs using the same prompt template that we defined above.

prompt_template = """### Instruction:
Given the next sentence, your job is to generate the negation for it within the json format
### Enter:


### Response:
"""

sentence = "The climate forecast predicts a sunny day with a excessive temperature round 30 levels Celsius, good for a day on the seaside with family and friends."

input_sentence = prompt_template.format(sentence.strip())

end result = pipe(input_sentence)
print(end result)

 

 

### Instruction:
Given the following sentence, your job is to generate the negation for it in the json format
### Input:
The weather forecast predicts a sunny day with a high temperature around 30 degrees Celsius, perfect for a day at the beach with friends and family.

### Response:
```json
{
  "sentence": "The weather forecast predicts a sunny day with a high temperature around 30 degrees Celsius, perfect for a day at the beach with friends and family.",
  "negation": "The weather forecast predicts a rainy day with a low temperature around 10 degrees Celsius, not ideal for a day at the beach with friends and family."
}
```

 

 

There will be many cases where the model keeps on generating even after the response is complete, due to the token limit. In this case, you need to add a post-processing function that filters out the JSON part, which is what we need. This can be done using a simple regex.

import re
import json

def format_results(s):
  pattern = r'```json\n(.*?)\n```'

  # Find all occurrences of JSON objects in the string
  json_matches = re.findall(pattern, s, re.DOTALL)
  if not json_matches:
    # try to find the 2nd pattern
    pattern = r'\{.*?"sentence":.*?"negation":.*?\}'
    json_matches = re.findall(pattern, s)

  # Return the first JSON object found, or None if no match is found
  return json.loads(json_matches[0]) if json_matches else None

 

This will give you the required output instead of the model repeating random output tokens.
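For example, it can be applied to the pipeline output above (a small usage sketch; the text-generation pipeline returns a list of dicts with a generated_text field):

result = pipe(input_sentence)
parsed = format_results(result[0]['generated_text'])  # extract and parse the JSON block
print(parsed)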

 

 

In this blog, you learned the basics of QLoRA, fine-tuning a LLaMA v2 model on Colab using QLoRA, instruction tuning, and a sample template from the Alpaca dataset that can be used to instruction-tune a model further.

 

References

 

[1]: QLoRA: Efficient Finetuning of Quantized LLMs, 23 May 2023, Tim Dettmers et al.

[2]: https://huggingface.co/blog/hf-bitsandbytes-integration

[3]: https://huggingface.co/blog/4bit-transformers-bitsandbytes

[4]: https://www.youtube.com/watch?v=y9PHWGOa8HA

[5]: https://arxiv.org/abs/2106.09685

[6]: https://aclanthology.org/2022.acl-long.244/

[7]: https://crfm.stanford.edu/2023/03/13/alpaca.html

[8]: Colab Notebook by @maximelabonne https://colab.research.google.com/drive/1PEQyJO1-f6j0S_XJ8DV50NkpzasXkrzd?usp=sharing

 
 
Ahmad Anis is a passionate Machine Learning Engineer and Researcher currently working at redbuffer.ai. Beyond his day job, Ahmad actively engages with the Machine Learning community. He serves as a regional lead for Cohere for AI, a nonprofit dedicated to open science, and is an AWS Community Builder. Ahmad is an active contributor on Stack Overflow, where he has 2300+ points. He has contributed to many well-known open-source projects, including Shap-E by OpenAI.
 



