How-To

You Can Explore the New Gemini Large Language Model Even if You're Not a Data Scientist

It's hot, it's huge and it's here. Here's how to try it out yourself.

Gemini is a new large language model (LLM) from Google that is designed to compete with the GPT-4 model from OpenAI. You can explore Gemini today, even if you're not a data scientist or programmer.

Gemini was released on Dec. 6, 2023, and is the successor to Google's LaMDA and PaLM 2 models. There are three versions of Gemini: Gemini-Ultra (reportedly about 1.56 trillion parameters), Gemini-Pro (reportedly about 600 billion) and Gemini-Nano (1.8 billion); Google has not officially published the Ultra and Pro parameter counts. Compared to GPT-4 (estimated at 1.76 trillion parameters), Gemini-Ultra is slightly larger, is trained on a wider variety of data sources and is multi-modal, meaning that it can work with language, images, computer code, audio and video.

Within hours of the release of Gemini, hundreds of blog posts and news stories were published, most of them derived from the same few source posts. But according to the Pure AI technical experts, a better way to get a grasp of what Gemini is and how it works is to exercise it yourself rather than read about it. Gemini can be accessed using an online system called Google Colab and a web tool called AI Studio. AI Studio is the successor to the MakerSuite web tool.

A Simple Gemini Example
The only two prerequisites you need to try out Gemini are a machine with a web browser (such as Google Chrome) and a Google account (typically a Gmail account). Start by opening your browser and navigating to makersuite.google.com. If you are not logged in to your Google account, you will be immediately redirected to a Google login page. After logging in, you will be redirected to the main AI Studio page.

Click on the "Get API key" button in the upper left. If you have existing keys for Gemini, they will be listed in the center of the page. Click on the "Create API key in new project" button. A key will be created for you. It will look something like "AIzaPyBgAGPct00VVqvCewoK0LNLOow5hl6RTo." Click on the "Copy" button and then close the "API key generated" window. Open a text editor, for example Notepad if you're using a Windows machine, and press Ctrl+V (or use the Edit | Paste menu item) to put the key in the editor. See Figure 1.

Figure 1: Getting a Gemini API Key From AI Studio

At this point, you could use AI Studio to explore Gemini, but another approach is to write a small program. Even if you have zero coding experience, you can programmatically exercise Gemini. Open a second browser tab and navigate to colab.research.google.com. Colab, short for Colaboratory, is a web tool that allows you to execute programs in the Google cloud.

You will see an "Open notebook" window that lists any Colab projects you've already created. Click on the "New notebook" button in the lower left. You will be redirected to a new project. The project will be named something like Untitled1.ipynb (the .ipynb extension is short for "interactive Python notebook"). You can rename the project to something like SimpleExample.ipynb if you wish. See Figure 2.

Figure 2: Creating a Colab Project

Copy the following Python language code into the cell labeled "Start coding" or something similar.

# simple_example.ipynb
# if the import fails, run: !pip install google-generativeai

print("\nBegin simple Gemini example ")
import google.generativeai as genai
genai.configure(api_key="your_api_key_here")
model = genai.GenerativeModel('gemini-pro')
question = "What is the capital of Washington?"
print("\n" + question)
response = model.generate_content(question)
print(response.text)
print("\nDone ")

In the code, replace "your_api_key_here" with the key that you generated using the AI Studio tool. Click on the Run triangle button to the left of the code cell, and Gemini will respond to the question, "What is the capital of Washington?" See Figure 3.

Figure 3: Gemini in Action

You can experiment by replacing the text of the question variable and seeing how Gemini responds. This simple example uses all of the default values for how Gemini responds to a prompt.
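One caution as you experiment: hard-coding the key in a notebook is fine for a quick test, but if you plan to share the notebook, a safer pattern is to read the key from an environment variable. A minimal sketch (the variable name GEMINI_API_KEY is just a convention chosen here, not something the library requires):

```python
import os

# Read the key from an environment variable; fall back to a placeholder
# so the cell still runs if the variable has not been set.
api_key = os.environ.get("GEMINI_API_KEY", "your_api_key_here")

# then configure exactly as before:
# genai.configure(api_key=api_key)
```

Recent versions of Colab also provide a Secrets panel (the key icon in the left sidebar) that serves the same purpose.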

A Second Example
Here's a second example that illustrates some features of interacting with Gemini. You can replace the code in the simple example, or you can click on File | New notebook to create a new project.

# second_example.ipynb
# more sophisticated than first example
# create at colab.research.google.com

print("\nBegin Gemini example ")

import google.generativeai as genai

# get key from makersuite.google.com
genai.configure(api_key="your_api_key_here")

print("\nAvailable Gemini models: ")
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)
print("")

model = genai.GenerativeModel('gemini-pro')
generation_config = genai.GenerationConfig(
  stop_sequences=None,
  temperature=0.9,        # higher values give more varied responses
  top_p=1.0,
  top_k=32,
  candidate_count=1,
  max_output_tokens=32,   # small limit, so long answers will be cut off
)

question = "Why is the sky red?"

response = model.generate_content(
  contents=question,
  generation_config=generation_config,
  stream=False,  # stream=True returns the response in chunks as it's generated
)
print(response.text)

print("\nDone ")

Copy this code into the notebook project cell and click on the Run triangle button. The GenerationConfig parameters control how Gemini responds. The two most interesting are the temperature and max_output_tokens. The temperature parameter controls how creative Gemini is. The temperature must be between 0.0 and 1.0. A value closer to 1.0 will generate responses that are more creative, and a value closer to 0.0 will usually result in more standard responses.
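To build intuition for what temperature does, here is a small plain-Python sketch (not part of the Gemini API) showing how dividing a model's raw scores by the temperature before converting them to probabilities sharpens or flattens the distribution:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide raw scores by the temperature, then normalize with softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numeric stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
cool = softmax_with_temperature(logits, 0.1)
warm = softmax_with_temperature(logits, 0.9)
print(cool)  # the top-scoring token dominates
print(warm)  # probabilities are more evenly spread
```

At a low temperature the top-scoring token dominates and output is nearly deterministic; at a higher temperature the probabilities even out and sampling becomes more varied.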

The max_output_tokens parameter controls how long the response can be. A token is roughly equal to a word, but some tokens are just partial words or even single letters. The maximum number of tokens varies depending on the specific Gemini model. For the Gemini-Pro model, currently the largest number of tokens allowed is 2,048.
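There is no exact rule for converting words to tokens, but a common rough heuristic for English text is about four characters per token. A quick sketch of that heuristic (this is an approximation, not the real Gemini tokenizer):

```python
def rough_token_estimate(text):
    # Crude heuristic: roughly 4 characters per token for English text.
    # The real tokenizer will give different counts.
    return max(1, len(text) // 4)

print(rough_token_estimate("Why is the sky red?"))
```

For exact counts, the google.generativeai library provides a count_tokens method on the model object.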

In most scenarios, the stop_sequences, top_p, top_k and candidate_count parameters are a bit less important than the temperature and max_output_tokens parameters. You can find information about Gemini model configuration parameters at ai.google.dev/api/python/google/generativeai/GenerationConfig.
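For readers who want intuition anyway: top_k keeps only the k most likely next tokens, while top_p keeps the smallest set of tokens whose probabilities sum to at least p; sampling then happens from the surviving tokens. A plain-Python sketch of both filters (illustrative only, not the Gemini implementation):

```python
def top_k_filter(probs, k):
    # Keep only the k most likely tokens, then renormalize.
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

def top_p_filter(probs, p):
    # Keep the smallest set of tokens whose cumulative probability reaches p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    filtered = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(filtered)
    return [q / total for q in filtered]

probs = [0.5, 0.3, 0.15, 0.05]  # made-up probabilities for four tokens
print(top_k_filter(probs, 2))   # only the two most likely tokens remain
print(top_p_filter(probs, 0.8)) # least likely tokens are dropped
```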

Wrapping Up
In addition to the gemini-pro model demonstrated in this article, there is a gemini-pro-vision model that can handle input images as well as text prompts. The Gemini documentation gives an example:

import PIL.Image  # requires the Pillow package and a local scones.jpeg image file

model = genai.GenerativeModel('gemini-pro-vision')
result = model.generate_content([
  "Give me a recipe for these:",
  PIL.Image.open('scones.jpeg')])

The model will correctly identify the input image as blueberry scones and reply with a recipe.

The Pure AI editors spoke with Dr. James McCaffrey from Microsoft Research, who has significant experience with transformer architecture, the basis of Gemini. He commented: "There's no doubt that Gemini is an important technical achievement compared to the earlier PaLM 2 large language model. But looking at the Gemini results on benchmark tests, it's not entirely clear how Gemini compares with the GPT-4 model."

McCaffrey added, "For example, on the Big-Bench Hard problems that require multi-step reasoning, Gemini-Ultra scored 83.6 percent accuracy and GPT-4 scored 83.1 percent accuracy. In order to fully understand these scores it'd be necessary to examine exactly which problems were answered correctly and incorrectly by each model."
