Lab 2: Private information retrieval

1. Introduction

The goal of this lab is to design a privacy-preserving key-value store using a technique that we learned in class: private information retrieval (PIR). PIR is a cryptographic technique for a client to retrieve an item from a public database hosted on a server, without revealing the location of the item to the server. In this lab, we will use another technique that we learned in class -- BFV homomorphic encryption -- to construct our protocol.

2. Background

In this lab, you will build a small distributed system that consists of a client and a server. The code will make use of two libraries:

2.1 BFV

BFV is a recent state-of-the-art FHE scheme that leverages the ring learning with errors (RLWE) assumption. The main advantage of a scheme like BFV is that a single ciphertext can encrypt plaintext vectors where the vector values are small integers. Because of this, BFV also supports single instruction multiple data (SIMD) operations, such as SIMD additions and SIMD multiplications. Therefore, a single ciphertext addition (multiplication) actually produces the element-wise sum (product) of the underlying plaintext vectors.

BFV also supports limited data movement operations. Specifically, a homomorphic rotation operation can rotate the underlying plaintext vector. One important thing to note is that BFV actually supports 2D rotations instead of 1D rotations. A ciphertext that encrypts a vector of N = 1024 plaintext values [1, 2, ..., 1024] can be rotated along two dimensions of size N/2x2. This means that rotating a ciphertext by 1 along dimension 0 gives [512, 1, 2, ..., 511, 1024, 513, 514, ..., 1023]. You can also rotate along dimension 1, which will give you [1024, 513, 514, ..., 1023, 512, 1, 2, ..., 511]. See this thread for some more information.

2.2 Private information retrieval

PIR is a technique for privately reading information from a publicly hosted database. There are two parties in the protocol: the client and the server. The client is trusted, while the server is a semihonest adversary who attempts to learn what data the client wants to retrieve from the server. In our setting, we assume that the database has M rows, and the API we want to implement is ReadRow(i). Client can call ReadRow(i) to retrieve the content of row i. Our goal is to implement a cryptographic protocol to hide the value of i.

In an insecure design, the client gives i to the server in plaintext. The server immediately learns the value of i.

In PIR, the client instead gives an encrypted query to the server. Intuitively, i is encrypted using homomorphic encryption so that the server does not learn what the value of i is, but the server can still operate on its database using this encrypted query thanks to the properties of homomorphic encryption.

2.3 Using Docker

As part of the lab, we are providing you with a Dockerfile so that you can easily do your development in this environment. You can also choose to test your code outside of this environment.

First, download the latest version of Docker.

3. Overview

3.1 System setup

The server hosts a database of size 1,048,576 rows. Each row is initialized with a small integer value such that row #i contains value i. The client wants to retrieve a row i without revealing which row was retrieved from the server.

We have provided you some starter client and server code, as well as the appropriate CMake file and proto buffer definitions. The respository is located here. The code is organized as follows:

To build the initial template code, create a build directory in the root directory by calling mkdir build Then call cd build; cmake ../; make to build the binaries, which will be located under build/src.

You can restructure the initial code, as long as you have implementations for a PIR client and a PIR server. The client should also have some kind of user-facing API to run a query, such as uint32_t ReadRow(uint32_t i). You are responsible for writing your own tests, and these tests should be turned in along with your PIR implementation.

3.2 Deliverables

1) Code implementation, as well as a comprehensive test suite. 2) PDF writeup with answers to the questions.

4. In-class exercise

Before class: Download the repository and set up your Docker environment. The Dockerfile might take a while to build!

The OpenFHE documentation is located here. This example code might be useful in getting you started.

5. Lab

Part 1: Naive design for PIR with linear-sized queries (50%)

In this initial (warm-up) design, you will implement a naive PIR design that uses linear-sized queries on a database of size 1,048,576. The idea is very simple. First, the server should arrange database into a table with of 1,048,576 rows. Then, the client should generate a linear-sized query that is a one-hot encoding of the desired row i. The query vector q should be 0 everywhere except at location i: q[j] = 0 if j != i, otherwise q[j] = 1. Therefore, the server side encrypted computation is a dot product between the database and the encrypted query.

You can simplify key management in your implementation by assuming that the client and the server are able to access the appropriate keys by loading them from keys/. The PIR server should not see the private key!!

Part 2: Communication-efficient PIR with sublinear-sized queries (50%)

Linear-sized queries are extremely inefficient. It is possible to significantly improve upon the communication overhead by restructuring the database. For example, a large database in the previous part can be re-structured as a 2D matrix that is 1024x1024. Instead of sending a linear-sized query, one can send an encrypted one-hot encoding query vector that is of length 1024, and retrieve an entire column of the database via matrix-vector multiplication. The client can then select the correct value out of the 1024 values.

For example, assume that the database is organized row-wise in the matrix: [[0, 1, 2, ..., 1023], [1024, 1025, 1026, ...], ... ]. A query for index 1026 should be mapped to a query (1, 2). Therefore, the query should retrieve column 2 via a matrix-vector multiplication. The client receives [2, 1026, 2050, ...], and returns the second value which is 1026.

Note that the 2D matrix does not necessarily have to be square. In your implementation, think about what matrix sizes make sense given your FHE key parameters.