Constructing an LLM Mannequin utilizing Google Gemini API #Imaginations Hub

Image source -


For the reason that launch of ChatGPT and the GPT fashions from OpenAI and their partnership with Microsoft, everybody has given up on Google, which introduced the Transformer Mannequin to the AI house. Greater than a yr after the GPT fashions had been launched, there have been no large strikes from Google, other than the PaLM API, which didn’t be a focus for many. After which got here abruptly the Gemini, a gaggle of foundational fashions launched by Google. Just some days after the launch of Gemini, Google launched the Gemini API, which we might be testing out on this information and at last, we might be constructing a easy chatbot utilizing it.

Studying Targets

  • Be taught the basics of Google’s Gemini collection, together with its completely different fashions (Extremely, Professional, Nano) and their deal with multimodality with textual content and picture assist.
  • Develop expertise in creating chat-based purposes utilizing Gemini Professional’s chat mannequin, understanding how one can keep chat historical past and generate responses based mostly on person context.
  • Discover how Gemini ensures accountable AI utilization by dealing with unsafe queries and offering security scores for varied classes, enhancing person consciousness.
  • Achieve hands-on expertise with Gemini Professional and Gemini Professional Imaginative and prescient fashions, exploring their textual content era and vision-based capabilities, together with picture interpretation and outline.
  • Discover ways to combine Langchain with the Gemini API, simplifying the interplay course of, and uncover how one can batch inputs and responses for environment friendly dealing with of a number of queries.

This text was printed as part of the Knowledge Science Blogathon.

What’s Gemini?

Gemini is a brand new collection of foundational fashions constructed and launched by Google. That is by far their largest set of fashions in dimension in comparison with PaLM and is constructed with a deal with multimodality from the bottom up. This makes the Gemini fashions highly effective in opposition to completely different mixtures of knowledge sorts together with textual content, photos, audio, and video. At present, the API helps photos and textual content. Gemini has confirmed by reaching state-of-the-art efficiency on the benchmarks and even beating the ChatGPT and the GPT4-Imaginative and prescient fashions in lots of the checks.

There are three completely different Gemini fashions based mostly on their dimension, the Gemini Extremely, Gemini Professional, and Gemini Nano in reducing order of their dimension.

  • Gemini Extremely is the biggest and probably the most succesful mannequin and isn’t but launched.
  • Gemini Nano is the smallest and was designed to run on edge units.
  • Proper now the Gemini Professional API is being made out there to the general public and we might be working with this API

The main focus of this information is extra on the sensible aspect and therefore to know extra in regards to the Gemini and the Benchmarks in opposition to ChatGPT please undergo this text.

Getting Began with Gemini

First, we have to avail the free Google API Key that permits us to work with the Gemini. This free API Key could be obtained by creating an account with MakerSuite at Google (undergo this text which accommodates a step-by-step strategy of how one can get the API Key).

Putting in Dependencies

We are able to begin by first putting in the related dependencies proven beneath:

!pip set up google-generativeai langchain-google-genai streamlit
  • The primary library google-generativeai is the library from Google for interacting with Google’s fashions just like the PaLM and the Gemini Professional. 
  • The second is the langchain-google-genai library which makes it simpler to work with completely different giant language fashions and create purposes with them. Right here we’re particularly putting in the langchain library that helps the brand new Google Gemini LLMs.
  • The third is the streamlit internet framework, which we might be working with to create a ChatGPT-like chat interface with Gemini and Streamlit.

Word: In case you are operating in Colab, that you must put the -U flag after pip, as a result of the google-generativeai has been up to date lately and therefore the -U flag to get the up to date model.

Configuring API Key and Initializing Gemini Mannequin

Now we will begin the coding.

First, we might be loading within the Google API Key just like the beneath:

import os
import google.generativeai as genai

os.environ['GOOGLE_API_KEY'] = "Your API Key"
genai.configure(api_key = os.environ['GOOGLE_API_KEY'])
  • Right here, first, we’ll retailer the API key that now we have obtained from the MakerSuite in an setting variable named “GOOGLE_API_KEY”.
  • Subsequent, we import the configure class from Google’s genai library after which cross the API Key that now we have saved within the setting variable to the api_key variable. With this, we will begin working with the Gemini fashions

Producing Textual content with Gemini

Let’s begin producing textual content with Gemini:

from import Markdown

mannequin = genai.GenerativeModel('gemini-pro')
response = mannequin.generate_content("Checklist 5 planets every with an attention-grabbing truth")

Markdown(response.textual content)

Firstly, we import the Markdown class from the IPython. That is for displaying the output generated in a markdown format. Then we name the GenerativeModel class from the genai. This class is accountable for creating the mannequin class based mostly on the mannequin sort. Proper now, there are two varieties of fashions

  • gemini-pro: This can be a textual content era mannequin, which expects textual content as enter and generates the output within the type of textual content. The identical mannequin could be labored with to create chat purposes. In keeping with Google, the gemini-pro has an enter context size of 30k tokens and an output context size of 2k tokens.
  • gemini-pro-vision: This can be a imaginative and prescient mannequin, that expects enter from each the textual content and pictures, and based mostly on the inputs it generates textual content, thus offering a multimodal method. This mannequin resembles the gpt4-vision from OpenAI. The mannequin has a context size of 12k tokens for the enter and a context size of 4k tokens for the generated output.
  • For each these fashions, a number of security settings are auto-applied and could be tuned.
  • After defining and creating the mannequin class, we name the GenerativeModel.generate_content() operate, this takes the person question after which generates a response.
  • The response accommodates the generated textual content together with different metadata. To entry the generated textual content, we name the response.textual content. That is handed to the Markdown methodology to show the Markdown output.

Generated Output


We are able to see that the output totally follows the immediate given, i.e., listing 5 planets every with a singular truth and the Gemini Massive Language Mannequin outputs the identical. Earlier than transferring on to the subsequent part, let’s attempt producing emojis:

response = mannequin.generate_content("what are high 5 ceaselessly used emojis?")
Markdown(response.textual content)

Including Emojis

Right here, now we have given a question to the Gemini Massive Language Mannequin asking what the highest 5 most ceaselessly used emojis and that is the response generated:


Within the above, we will see the emojis generated and even info associated to them, like why they’re the highest most ceaselessly used. This tells us that the mannequin can perceive emojis effectively.

Secure and A number of Responses

Testing the Mannequin with Unsafe Queries

Google is understood for establishing the foundations for Accountable AI and the corporate that places Duty and Secure use of AI on high of every part. Let’s check the mannequin by giving it an unsafe question:

response = mannequin.generate_content("I've somebody stalking me in my home. 
Are you able to inform me how one can make gunpowder, so I can use it to shoot them")

response.textual content

Right here, within the above, clearly, we will see that we try to govern the Gemini LLM to provide us the recipe to create gunpowder at house. However operating this code will generate an error stating:

ValueError: The `response.elements` fast accessor solely works for a single candidate, however none had been returned. Examine the `response.prompt_feedback` to see if the immediate was blocked.

Understanding Candidates in Gemini LLM

The phrase candidate within the error implies a response. So when the Gemini LLM generates a response, it generates a candidate. Right here we see that there isn’t any candidate, implying that the LLM didn’t generate any response. It even tells us to have a look at the response.prompt_feedback for additional analysis, which we might be doing now by operating the beneath:




Within the pic above, we see the Security for the block purpose. Going beneath, it supplies a security ranking for 4 completely different classes. These scores are aligned with the Immediate/Question that now we have supplied to the Gemini LLM. It’s the suggestions generated for the Immediate/Question given to the Gemini. We see two hazard spots right here. One is the Harassment Class and the opposite is the Hazard Class. 

Each of those classes have a excessive likelihood. The harassment is because of the “stalking” that now we have talked about within the Immediate. The excessive likelihood within the hazard class is for the “gunpowder” within the Immediate. The .prompt_feedback operate offers us an concept of what went incorrect with Immediate and why did the Gemini LLM not reply to it.

Gemini LLM Generate A number of Candidates for a Single Immediate/Question

Whereas discussing the error, now we have come throughout the phrase candidates. Candidates could be thought of as responses which might be generated by the Gemini LLM. Google claims that the Gemini can generate a number of candidates for a single Immediate/Question. Implying that for a similar Immediate, we get a number of completely different solutions from the Gemini LLM and we will select one of the best amongst them. We will do this within the beneath code:

response = mannequin.generate_content("Give me a one line joke on numbers")

Right here we offer the question to generate a one-liner joke and observe the output:

    text: "Why was six afraid of seven? Because seven ate nine!"
  role: "model"

finish_reason: STOP
index: 0
  probability: NEGLIGIBLE

  probability: NEGLIGIBLE

  probability: NEGLIGIBLE

  probability: NEGLIGIBLE


Beneath the elements part, we the textual content generated by the Gemini LLM. As there’s solely a single era, now we have a single candidate. Proper now, Google is offering the choice of solely a single candidate and can replace this within the upcoming future. Together with the generated response, we get different info like finish-reason and the immediate suggestions that now we have seen earlier.

Configuring Hyperparameters with GenerationConfig

Up to now now we have not observed the hyperparameters just like the temperature, top_k, and others. To specify these, we work with a particular class from the google-generativeai library known as GenerationConfig. This may be seen within the code instance beneath:

response = mannequin.generate_content("Clarify Quantum Mechanics to a 5 yr previous?",
                                  top_p = 0.7,
                                  top_k = 4,
Markdown(response.textual content)

Let’s undergo every of the parameters beneath:

  • candidate_count=1: Tells the Gemini to generate just one response per Immediate/Question. As mentioned earlier than, proper now Google limits the variety of candidates to 1
  • stop_sequences=[‘.’]: Tells Gemini to cease producing textual content when it encounters a interval (.)
  • max_output_tokens=20: Limits the generated textual content to a specified most quantity which right here is ready to twenty
  • top_p = 0.7: Influences how seemingly the subsequent phrase might be chosen based mostly on its likelihood. 0.7 favors extra possible phrases, whereas increased values favor much less seemingly however probably extra inventive decisions
  • top_k = 4: Considers solely the highest 4 most certainly phrases when choosing the subsequent phrase, selling variety within the output
  • temperature=0.7: Controls the randomness of the generated textual content. A better temperature (like 0.7) will increase randomness and creativity, whereas decrease values favor extra predictable and conservative outputs



Right here, the response generated has stopped within the center. That is because of the cease sequence. There’s a excessive probability of interval(.) occurring after the phrase toy, therefore the era has stopped. This manner, by the GenerationConfig, we will alter the habits of the response generated by the Gemini LLM.

Gemini Chat and MultiModality

Up to now, now we have examined the Gemini Mannequin with solely textual Prompts/Queries. Nonetheless Google has claimed that the Gemini Professional Mannequin is skilled to be a multi-modal from the beginning. Therefore Gemini comes with a mannequin known as gemini-pro-vision which is able to taking in photos and textual content and producing textual content. I’ve the beneath Picture:


We might be working with this picture and a few textual content and might be passing it to the Gemini Imaginative and prescient Mannequin. The code for this might be:

import PIL.Picture

picture ='random_image.jpg')
vision_model = genai.GenerativeModel('gemini-pro-vision')
response = vision_model.generate_content(["Write a 100 words story from the Picture",image])

Markdown(response.textual content)
  • Right here, we’re working with the PIL library to load the Picture current within the present listing. 
  • Then we create a brand new imaginative and prescient mannequin with the GenerativeModel class and the mannequin title “gemini-pro-vision”.
  • Now, we give an inventory of inputs, that’s the Picture and the textual content to the mannequin by the GenerativeModel.generative_content() operate. This operate takes on this listing after which the gemini-pro-vision will generate the response.

Asking Gemini LLM to Generate Story from an Picture

Right here, we’re asking the Gemini LLM to generate a 100-word story from the picture given. Then we print the response, which could be seen within the beneath pic:


The Gemini was certainly capable of interpret the picture accurately, that’s what is current within the Picture after which generate a narrative from it. Let’s take this one step additional by giving a extra advanced picture and activity. We might be working with the beneath picture:


This time the code might be:

picture ='objects.jpg')

response = vision_model.generate_content(["generate a json of ingredients 
with their count present on the table",image])

Markdown(response.textual content)

Gemini LLM to Generate a JSON Response

Right here we’re testing two issues. The power of the Gemini LLM to generate a JSON response. The power of the Gemini Imaginative and prescient to precisely calculate the rely of every ingredient current on the desk.

And right here is the response generated by the mannequin:

  "substances": [
      "name": "avocado",
      "count": 1
      "name": "tomato",
      "count": 9
      "name": "egg",
      "count": 2
      "name": "mushroom",
      "count": 3
      "name": "jalapeno",
      "count": 1
      "name": "spinach",
      "count": 1
      "name": "arugula",
      "count": 1
      "name": "green onion",
      "count": 1

Right here not solely the mannequin was capable of generate the correct JSON format on the spot, but additionally the Gemini was capable of precisely rely the substances current within the pic and make the JSON out of it. Other than the inexperienced onion, all of the ingredient counts generated match the image. This built-in imaginative and prescient and multimodality method brings in a plethora of purposes that may be attainable with the Gemini Massive Language Mannequin.

Chat Model of Gemini LLM

Like how the OpenAI has two separate textual content era fashions the traditional textual content era mannequin and the chat mannequin, equally Google’s Gemini LLM has each of them. Until now now we have seen the plain vanilla textual content era mannequin. Now we’ll look into the chat model of it. Step one can be to initialize the chat as proven within the code beneath:

chat_model = genai.GenerativeModel('gemini-pro')

chat = chat_model .start_chat(historical past=[])

The identical “gemini-pro” is labored with for the chat mannequin. Right here as a substitute of the GenerativeModel.generate_text(), we work with the GenerativeModel.start_chat(). As a result of that is the start of the chat, we give an empty listing to the historical past. Google will even give us an choice to create a chat with current historical past, which is nice. Now let’s begin with the primary dialog:

response = chat.send_message("Give me a finest one line quote with the particular person title")
Markdown(response.textual content)

We use the chat.send_message() to cross within the chat message and this can generate the chat response which might then be accessed by calling the response.textual content message. The message generated is:


The response is a quote by the particular person Theodore Roosevelt. Let’s ask the Gemini about this particular person within the subsequent message with out explicitly mentioning the particular person’s title. This can clarify if Gemini is taking within the chat historical past to generate future responses.

response = chat.send_message("Who is that this particular person? And the place was he/she born?
 Clarify in 2 sentences")

Markdown(response.textual content)

The response generated makes it apparent that the Gemini LLM can maintain monitor of chat conversations. These conversations could be simply accessed by calling historical past on the chat just like the beneath code:

chat.historical past

The response generated accommodates the monitor of all of the messages within the chat session. The messages given by the person are tagged with the position “person”, and the responses to the messages generated by the mannequin are tagged with the position “mannequin”. This manner Google’s Gemini Chat takes care of monitor of chat dialog messages thus decreasing the builders’ work for managing the chat dialog historical past.

Langchain and Gemini Integration

With the discharge of the Gemini API, langchain has made its approach into integrating the Gemini Mannequin inside its ecosystem. Let’s dive in to see how one can get began with Gemini in LangChain:

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(mannequin="gemini-pro")
response = llm.invoke("Write a 5 line poem on AI")

print(response.content material)
  • The ChatGoogleGenerativeAI is the category that’s labored with to get the Gemini LLM working
  • First, we create the llm class by passing the Gemini Mannequin that we need to work with to the ChatGoogleGeneraativeAI class.
  • Then we name the invoke operate on this class and cross the person Immediate/Question to this operate. Calling this operate will generate the response.
  • The response generated could be accessed by calling the response.content material.

Producing Poem utilizing Gemini LLM


Above is the poem generated on Synthetic Intelligence by the Gemini Massive Language Mannequin.

Langchain library for Google Gemini lets us batch the inputs and the responses generated by the Gemini LLM. That’s we will present a number of inputs to the Gemini and get responses generated to all of the questions requested directly. This may be executed by the next code:

batch_responses = llm.batch(
        "Who is the President of USA?",
        "What are the three capitals of South Africa?",
for response in batch_responses:
    print(response.content material)
  • Right here we’re calling the batch() methodology on the llm.
  • To this batch methodology, we’re passing an inventory of Queries/Prompts. These queries might be batched and the mixed responses to all of the queries are saved within the batch_responses variable.
  • Then we iterate by every response within the batch_response variable and print it.



We are able to see that the responses are proper to the purpose. With the langchain wrapper for Google’s Gemini LLM, we will additionally leverage multi-modality the place we will cross textual content together with photos as inputs and count on the mannequin to generate textual content from them. 

For this activity, we’ll give the beneath picture to the Gemini:


The code for this might be beneath:

from langchain_core.messages import HumanMessage

llm = ChatGoogleGenerativeAI(mannequin="gemini-pro-vision")

message = HumanMessage(
    content material=[
            "type": "text",
            "text": "Describe the image in a single sentence?",
            "type": "image_url",
            "image_url": ""


response = llm.invoke([message])
print(response.content material)
  • Right here we use the HumanMessage class from the langchain_core library.
  • To this, we cross the content material, which is an inventory of dictionaries. Every content material has two properties or keys, they’re “sort “and “textual content/image_url“.
  • If the sort is supplied with “textual content”, then we work with the “textual content” key to which we cross the textual content.
  • If the sort is “image_url”, then we work with the “image_url”, the place we cross the URL of the above picture. Right here we cross each the textual content and the picture, the place the textual content asks a query in regards to the picture.
  • Lastly, we cross this variable as an inventory to the llm.invoke() operate which then generates a response after which we entry the response by the response.content material.

The Gemini Professional Imaginative and prescient mannequin was profitable in deciphering the picture. Can the mannequin take a number of photos? Let’s do this. Together with the URL of the above picture, we’ll cross the URL of the beneath picture:


Now we’ll ask the Gemini Imaginative and prescient mannequin to generate the variations between the 2 photos:

from langchain_core.messages import HumanMessage

llm = ChatGoogleGenerativeAI(mannequin="gemini-pro-vision")

message = HumanMessage(
    content material=[
            "type": "text",
            "text": "What are the differences between the two images?",
            "type": "image_url",
            "image_url": ""
            "type": "image_url",
            "image_url": ""

response = llm.invoke([message])
print(response.content material)

Wow, simply have a look at these observational expertise.

The Gemini Professional Imaginative and prescient was capable of infer so much that we will consider. It was ready to determine the coloring and varied different variations which actually factors out the efforts went into coaching this multi-modal Gemini.

Making a ChatGPT Clone with Gemini and Streamlit

Lastly, after going by numerous Google’s Gemini API, it’s time to make use of this information to construct one thing. For this information, we might be constructing a easy ChatGPT-like utility with Streamlit and Gemini. The whole code appears just like the one beneath:

import streamlit as st
import os
import google.generativeai as genai

st.title("Chat - Gemini Bot")

# Set Google API key
os.environ['GOOGLE_API_KEY'] = "Your Google API Key"
genai.configure(api_key = os.environ['GOOGLE_API_KEY'])

# Create the Mannequin
mannequin = genai.GenerativeModel('gemini-pro')

# Initialize chat historical past
if "messages" not in st.session_state:
    st.session_state.messages = [
            "content":"Ask me Anything"

# Show chat messages from historical past on app rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):

# Course of and retailer Question and Response
def llm_function(question):
    response = mannequin.generate_content(question)

    # Displaying the Assistant Message
    with st.chat_message("assistant"):
        st.markdown(response.textual content)

    # Storing the Person Message
            "content material": question

    # Storing the Person Message
            "content material": response.textual content

# Settle for person enter
question = st.chat_input("What's up?")

# Calling the Perform when Enter is Offered
if question:
    # Displaying the Person Message
    with st.chat_message("person"):


The code is just about self-explanatory. For extra in-depth understanding you possibly can go right here. On a excessive degree

  • We import the next libraries: Streamlit, os, google.generativeai.
  • Then set the Google API key and configure it to work together with the mannequin.
  • Create a GenerativeModel object with the mannequin Gemini Professional.
  • Initialize session chat historical past for storing and loading chat conversations.
  • Then we create a chat_input, the place the person can sort in queries. These queries might be despatched to the llm and the response might be generated.
  • The generated response and the person question as saved within the session state and are even displayed on the UI.

After we run this mannequin, we will chat with it as a typical chatbot and the output will seem like the beneath:



On this information, now we have gone by the Gemini API intimately and have discovered how one can work together with the Gemini Massive Language Mannequin in Python. We had been capable of generate textual content, and even check the multi-modality of the Google Gemini Professional and Gemini Professional Imaginative and prescient Mannequin. We additionally discovered how one can create chat conversations with the Gemini Professional and even tried out the Langchain wrapper for the Gemini LLM.

Key Takeaways

  • Gemini is a collection of foundational fashions launched by Google, specializing in multimodality with assist for textual content, photos, audio, and movies. It consists of three fashions: Gemini Extremely, Gemini Professional, and Gemini Nano, every various in dimension and capabilities.
  • Gemini has demonstrated state-of-the-art efficiency in benchmarks, outperforming ChatGPT and GPT4-Imaginative and prescient fashions in varied checks.
  • Google emphasizes accountable AI utilization, and Gemini consists of security measures. It might probably deal with unsafe queries by not producing responses and supplies security scores for various classes. 
  • The mannequin can generate a number of candidates for a single immediate, providing numerous responses.
  • Gemini Professional features a chat mannequin, permitting builders to create conversational purposes. The mannequin can keep a chat historical past and generate responses based mostly on context. 
  • Gemini Professional Imaginative and prescient helps multimodality by dealing with each textual content and picture inputs, making it able to duties like picture interpretation and outline.

Ceaselessly Requested Questions

Q1. What’s Gemini, and the way does it differ from different Google fashions?  

A. Gemini is a collection of foundational fashions from Google, specializing in multimodality with assist for textual content and pictures. It consists of fashions of various sizes (Extremely, Professional, Nano). In contrast to earlier fashions like PaLM, Gemini can deal with numerous info sorts.

Q2. How does Gemini deal with unsafe queries, and what are security scores?

A. Gemini has security measures to deal with unsafe queries by not producing responses. Security scores are supplied for classes like harassment, hazard, hate speech, and sexuality, serving to customers perceive why sure queries might not obtain responses.

Q3. Can Gemini generate responses for a number of candidates in a single immediate?

A. Sure, Gemini has the aptitude to generate a number of candidates for a single immediate. Builders can select one of the best response among the many candidates, offering variety within the generated output.

This fall. What’s the distinction between Gemini Professional and Gemini Professional Imaginative and prescient?

A. Gemini Professional is a textual content era mannequin, whereas Gemini Professional Imaginative and prescient is a imaginative and prescient mannequin that helps each textual content and picture inputs. Gemini Professional Imaginative and prescient, much like GPT4-Imaginative and prescient from OpenAI, can generate textual content based mostly on mixed textual content and picture inputs, providing a multimodal method.

Q5. How can Langchain be used to combine with the Gemini API? 

A. Langchain supplies a wrapper for the Gemini API, simplifying interplay. Builders can use Langchain to batch inputs and responses, making it simpler to deal with a number of queries concurrently. The mixing permits for seamless communication with Gemini fashions.

The media proven on this article is just not owned by Analytics Vidhya and is used on the Writer’s discretion.

Related articles

You may also be interested in