15-494/694 Cognitive Robotics Lab 9:
Large Language Models

I. Software Update and Initial Setup

At the beginning of every lab you should update your copy of the cozmo-tools package. Do this:
$ cd ~/cozmo-tools
$ git pull

II. Play Semantris

Play the Semantris game in "Blocks" mode. Note: it's more fun with sound enabled.

How does Semantris know which words are related? It uses word embeddings and computes dot products to measure similarity.
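
For intuition, here is a rough sketch of that computation in Python. The four-dimensional vectors below are made up purely for illustration; they are nothing like the embeddings Semantris actually uses.

    import numpy as np

    # Toy 4-dimensional "embeddings", made up purely for illustration.
    # Real embeddings have hundreds of dimensions and are learned from text.
    embeddings = {
        "dog":    np.array([0.9, 0.1, 0.0, 0.2]),
        "puppy":  np.array([0.8, 0.2, 0.1, 0.3]),
        "banana": np.array([0.1, 0.9, 0.3, 0.0]),
    }

    def similarity(w1, w2):
        # Dot product of unit-normalized vectors; higher means more related.
        a, b = embeddings[w1], embeddings[w2]
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    print(similarity("dog", "puppy"))    # relatively high
    print(similarity("dog", "banana"))   # relatively low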

Start a new game and take a screenshot of the initial state. You will use this screenshot in the next step.

III. Experiment With Word Embeddings

  1. Run the WordEmbeddingDemo.

  2. Try hovering over a word in the 3D plot to see the closest words.

  3. You can add new words to the 3D plot by typing them in the text box below. Try that.

  4. Press the "Clear all" button to erase all the words from the display.

  5. Examine these slides to see how we can use the demo to explore the kind of matching that Semantris does.

  6. Pick six words from your Semantris screenshot. Type in one word at a time to add it to the display. After adding the word, its dot is red. Click on one of the six slots on the right side of the screen to load the word into that slot. Continue until all six words have been loaded.

  7. Pick one of the six words as your target word. Think of a one-word prompt you could use to reach that target. Add the prompt word to the display by typing it in the text box.

  8. Click on the newly added prompt word to turn it from red to black. Then click on it again to turn it back to red and display the similarity measures to the six words in the slots. Did you hit your target? (A rough code sketch of this kind of similarity computation appears after this list.)

  9. Take a screenshot showing the similarity lines.
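
If you want to reproduce this kind of prompt-to-target matching outside the demo, here is a rough sketch using the sentence-transformers package as a stand-in embedding model. The demo and Semantris use their own embeddings, so the scores will not match what you see on screen, and the words below are placeholders for the ones from your screenshot.

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Placeholder slot words; substitute the six words from your screenshot.
    slot_words = ["river", "mountain", "bread", "guitar", "planet", "doctor"]
    prompt = "hospital"    # one-word prompt aimed at "doctor"

    # With normalize_embeddings=True, dot products are cosine similarities.
    slot_vecs = model.encode(slot_words, normalize_embeddings=True)
    prompt_vec = model.encode([prompt], normalize_embeddings=True)[0]

    for word, score in sorted(zip(slot_words, slot_vecs @ prompt_vec),
                              key=lambda pair: -pair[1]):
        print(f"{word:10s} {score:.3f}")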

IV. Question Answering With BERT

The BERT model has been publicly released by Google, and is distributed in a convenient form by Hugging Face. In this part of the lab you will run BERT on your workstation (using the GPU) to perform extractive question answering.
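
To give a sense of what the script is doing, here is a minimal sketch of extractive question answering with the Hugging Face transformers library. Lab9a.py's actual model, context string, and structure may differ; the cube descriptions below are made up for illustration.

    from transformers import pipeline

    # A BERT checkpoint fine-tuned for extractive QA on SQuAD.  Lab9a.py
    # may use a different checkpoint; this one is a large download.
    qa = pipeline("question-answering",
                  model="bert-large-uncased-whole-word-masking-finetuned-squad",
                  device=0)    # device=0 uses the GPU; device=-1 forces CPU

    # Made-up description of Cozmo's world, standing in for Lab9a's context.
    context = ("cube1 is visible and is upright. "
               "cube2 is sideways and is not visible. "
               "cube3 has not been observed.")

    result = qa(question="Is cube1 visible?", context=context)
    print(result["answer"], result["score"])
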
  1. Make a lab9 directory.

  2. Download Lab9a.py.

  3. Read the source code.

  4. Run Lab9a.py. The first time you run it, it will have to download a large weight file, so be patient. Try the following queries. Which ones work, and which ones don't? Make a record of the responses (you can just paste them into a file).
    1. Is cube1 visible?
    2. Is cube2 visible?
    3. What is cube1's orientation?
    4. What is sideways?
    5. What cube is sideways?
    6. What cubes are visible?
    7. What isn't visible?
    8. Is cube1 delicious?
    9. How many cubes are there?
    10. What is cube1?
    11. What is cube2?
    12. What is the distance to cube3?

V. GPT-4

In this section you will experiment with GPT-4, which is much more powerful than BERT. Instead of downloading the model you will use the OpenAI GPT-4 API.
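
For reference, a bare-bones GPT-4 API call looks roughly like the sketch below, which assumes the current openai Python package; the client picks up OPENAI_API_KEY from the environment (see step 3). Lab9b.py and Lab9c.py may structure their calls differently, and the context string here is made up.

    from openai import OpenAI

    client = OpenAI()    # reads the OPENAI_API_KEY environment variable

    # Made-up world description standing in for the context Lab9c.py builds.
    context = ("cube1 is visible and is 250 mm away. "
               "cube2 is sideways and is not visible.")

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer questions about Cozmo's world. " + context},
            {"role": "user", "content": "Is cube2 visible?"},
        ])
    print(response.choices[0].message.content)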

  1. Download Lab9b.py and Lab9c.py.

  2. Read the source code.

  3. Enter the following shell command:
    export OPENAI_API_KEY=key_you_received_in_email

  4. Run Lab9b.py and examine the result.

  5. Run Lab9c.py and try the same queries you used with BERT. Make a record of the results.

  6. Compare the answers you got from BERT with the answers you got from GPT-4. What do you conclude?

  7. Are GPT-4's distance calculations accurate?

VI. Homework (Solo): GPT-4 and Cozmo

Do this part by yourself, not in a team.

Read this page from OpenAI on prompt engineering.

Write a program CozmoChat.fsm that accepts queries using the "tm" command in simple_cli and uses GPT-4 to answer each query. To form the context for a query, your program should examine Cozmo's world map to determine what it knows about cubes, walls, doorways, and faces, and assemble that information into a string. Note that to get cube orientation you must use wcube1 instead of cube1, since this is a feature of cozmo-tools, not the SDK.

Develop a set of questions you can ask Cozmo to demonstrate the strengths and weaknesses of GPT-4. For example:

  • Is cube2 sideways?
  • How many cubes are there?
  • What is the distance between cube1 and cube2?
  • Which cube is closest to cube1?

The code in Lab9c.py treats each query as a new conversation. You can make CozmoChat behave more like ChatGPT by growing the context cumulatively: append each new query and response to the messages list passed in the API call. (The user's queries are marked with role "user", while GPT-4's responses should be marked with role "assistant"; a rough sketch of this loop appears after the example below.) This will allow you to have interactions like:
  1. Please remember that all cubes are 45 mm on a side.
  2. How big is cube1?
  3. What is the volume of cube3?
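
Here is a rough sketch of that message-accumulation loop, stripped of the .fsm and world-map parts of CozmoChat; the system message and context below are placeholders, and your program should build them from the world map instead.

    from openai import OpenAI

    client = OpenAI()

    # Placeholder system message; CozmoChat should build its context string
    # from the world map rather than hard-coding it like this.
    messages = [{"role": "system",
                 "content": "You answer questions about Cozmo's world. "
                            "cube1 and cube3 are currently visible."}]

    def ask(query):
        # Append the query, call the API with the whole history, then
        # append the reply so later questions can refer back to it.
        messages.append({"role": "user", "content": query})
        reply = client.chat.completions.create(model="gpt-4",
                                               messages=messages)
        answer = reply.choices[0].message.content
        messages.append({"role": "assistant", "content": answer})
        return answer

    print(ask("Please remember that all cubes are 45 mm on a side."))
    print(ask("How big is cube1?"))
    print(ask("What is the volume of cube3?"))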

Hand In

Hand in the following:

  • Your Semantris and WordEmbeddingDemo screenshots.

  • Your observations comparing results from Lab9a vs. Lab9c.

  • Your source code for CozmoChat.fsm.

  • Your own questions for and results from CozmoChat.fsm.


Dave Touretzky