15-494/694 Cognitive Robotics Lab 9:
Large Language Models

I. Software Update and Initial Setup

At the beginning of every lab you should update your copy of the cozmo-tools package. Do this:
$ cd ~/cozmo-tools
$ git pull

II. Play Semantris

Play the Semantris game in "Blocks" mode. Note: it's more fun with sound enabled.

How does Semantris know which words are related? It uses word embeddings and computes dot products between them to measure similarity.
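The dot-product matching can be sketched in a few lines of Python. The vectors below are made-up toy embeddings for illustration; real models use hundreds of dimensions. The similarity is normalized by vector length (cosine similarity), a common variant of the plain dot product:

```python
import math

# Toy 4-dimensional embeddings (made-up values for illustration;
# real embedding models use hundreds of dimensions).
embeddings = {
    "cat":   [0.9, 0.1, 0.3, 0.0],
    "dog":   [0.8, 0.2, 0.4, 0.1],
    "piano": [0.1, 0.9, 0.0, 0.5],
}

def cosine_similarity(u, v):
    """Dot product of u and v, normalized by their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# "cat" should score closer to "dog" than to "piano".
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))
print(cosine_similarity(embeddings["cat"], embeddings["piano"]))
```

A prompt word in Semantris "hits" whichever on-screen word gives the largest similarity score.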

Start a new game and take a screenshot of the initial state. You will use this screenshot in the next step.

III. Experiment With Word Embeddings

  1. Run the WordEmbeddingDemo.

  2. Try hovering over a word in the 3D plot to see the closest words.

  3. You can add new words to the 3D plot by typing them in the text box below. Try that.

  4. Press the "Clear all" button to erase all the words from the display.

  5. Examine these slides to see how we can use the demo to explore the kind of matching that Semantris does.

  6. Pick six words from your Semantris screenshot. Type in one word at a time to add it to the display. After adding the word, its dot is red. Click on one of the six slots on the right side of the screen to load the word into that slot. Continue until all six words have been loaded.

  7. Pick one of the six words as your target word. Think of a one word prompt you could use to reach that target. Add the prompt word to the display by typing it in the text box.

  8. Click on the newly added prompt word to turn it from red to black. Then click on it again to turn it back to red and display the similarity measures to the six words in the slots. Did you hit your target?

  9. Take a screenshot showing the similarity lines.

IV. Question Answering With BERT

The BERT model has been publicly released by Google, and is distributed in a convenient form by Hugging Face. In this part of the lab you will run BERT on your workstation (using the GPU) to perform extractive question answering.
  1. Make a lab9 directory.

  2. Download Lab9a.py.

  3. Read the source code.

  4. Run Lab9a.py and try the following queries. Which ones work, and which ones don't? Make a record of the responses (you can just paste them into a file).
    • Is cube1 visible?
    • Is cube2 visible?
    • What is cube1's orientation?
    • What is sideways?
    • What cube is sideways?
    • What cubes are visible?
    • What isn't visible?
    • Is cube1 delicious?
    • How many cubes are there?
    • What is cube1?
    • What is cube2?
    • What is the distance to cube3?
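To see what "extractive" means here: a BERT-style QA model scores every context token as a possible answer start and answer end, then returns the span that maximizes start score plus end score. The sketch below uses made-up logits rather than a real model (which produces them from the question and context together), but the span-selection step is the same idea:

```python
# Simplified sketch of extractive question answering: score every
# (start, end) token pair in the context and return the best span.
# The logits below are made up for illustration; a real model like
# BERT computes them from the question and context together.

context = "cube1 is visible and cube2 is lying sideways".split()
start_logits = [0.1, 0.0, 2.5, 0.2, 4.0, 0.1, 0.3, 0.2]
end_logits   = [0.0, 0.1, 2.0, 0.1, 0.2, 0.1, 0.1, 4.5]

def best_span(start_logits, end_logits, max_len=5):
    """Return (start, end, score) maximizing start + end logits,
    with end >= start and span length capped at max_len."""
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best[2]:
                best = (i, j, score)
    return best

i, j, _ = best_span(start_logits, end_logits)
print(" ".join(context[i:j + 1]))  # the extracted answer span
```

Because the answer must be a span copied out of the context, BERT cannot synthesize answers such as counts that never appear verbatim, which is worth keeping in mind as you record the query results above.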

V. GPT-3

In this section you will experiment with GPT-3, which is much more powerful than BERT. Instead of downloading the model you will use the OpenAI GPT-3 API.

  1. Download Lab9b.py and Lab9c.py.

  2. Read the source code.

  3. Enter the following shell command:
    export OPENAI_API_KEY=key_you_received_in_email

  4. Run Lab9b.py and examine the result.

  5. Run Lab9c.py and try the same queries you used with BERT. Make a record of the results.

  6. Compare the answers you got from BERT with the answers you got from GPT-3. What do you conclude?

  7. Are GPT-3's distance calculations accurate?
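For reference, a Lab9c-style completion call looks roughly like the sketch below. The prompt format and the RUN_GPT3_DEMO guard flag are assumptions for illustration, and the openai package interface shown (openai.Completion.create) is the one current in the GPT-3 era; check OpenAI's documentation if the library has since changed:

```python
import os

def make_prompt(context, question):
    """Combine world knowledge and a question into a single string,
    the way simple (non-chat) completion expects."""
    return f"{context}\n\nQ: {question}\nA:"

prompt = make_prompt("cube1 is visible. cube2 is sideways.",
                     "Is cube1 visible?")

# Guarded so the sketch can run without a key; set RUN_GPT3_DEMO
# (a hypothetical flag, not part of the lab) to actually call the API.
if os.environ.get("RUN_GPT3_DEMO"):
    import openai
    openai.api_key = os.environ["OPENAI_API_KEY"]
    response = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=50)
    print(response.choices[0].text.strip())
```

Note that the entire context, question and all, is packed into one string; contrast this with the chat format in the homework section.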

VI. Homework (Solo): GPT-3.5 and Cozmo

Do this part by yourself, not in a team.

OpenAI has released GPT-3.5, which is set up to do chat completion rather than simple completion. The main difference is that in chat completion the input is structured as a sequence of messages instead of one giant string. Read the documentation here for details. This doesn't make much difference for the simple question answering task we've been exploring, but the quality of the results may be better.
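The message structure can be sketched as below. The role names ("system", "user", "assistant") are part of OpenAI's chat API; the helper function and its wording are illustrative assumptions, not code from the lab:

```python
# Chat completion takes a list of role-tagged messages rather than
# one prompt string. A minimal two-message structure:

def make_messages(context, question):
    return [
        {"role": "system",
         "content": "You answer questions about Cozmo's world."},
        {"role": "user",
         "content": f"{context}\n\n{question}"},
    ]

messages = make_messages("cube1 is visible. cube2 is sideways.",
                         "Is cube1 visible?")
# This list would be passed to the chat completion endpoint, e.g.
# with model="gpt-3.5-turbo"; see OpenAI's documentation for the
# exact call in your library version.
```

The system message gives you a natural place to steer the model's behavior, which the prompt engineering reading below discusses in more depth.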

  1. Write a program TestChat.py as a modified version of Lab9c that uses GPT-3.5.

  2. Compare the quality of the answers from TestChat.py to what you got from Lab9a and Lab9c.

  3. Read this page from OpenAI on prompt engineering.

  4. Write a program CozmoChat.fsm that accepts queries using the "tm" command in simple_cli, and uses GPT-3.5 to answer the query. To form the prompt for the query your program should examine Cozmo's world map to determine what it knows about cubes, walls, doorways, and faces, and put these together into a string. Note that to get cube orientation you must use wcube1 instead of cube1, since this is a feature of cozmo-tools, not the SDK.

  5. Develop a set of questions you can ask Cozmo to demonstrate the strengths and weaknesses of GPT-3.5. For example, since it knows the cube locations, can you ask which cube is closest to cube1, or what is the distance between cube1 and cube2?

Hand In

Hand in the following:

  • Your Semantris and WordEmbeddingDemo screenshots.

  • Your observations comparing results from Lab9a, Lab9c, and TestChat.

  • Your source code for TestChat.py and CozmoChat.fsm.

  • Your own questions for and results from CozmoChat.fsm.


Dave Touretzky