15-821/18-843: Mobile and Pervasive Computing (IoT)

Fall 2024

Project Descriptions (Updated 2024-08-25 13:25)

1. Bringing Multi-Modal Models to Mobile Platforms
Mentor: Mihir Bala
Students: 1

Multi-modal models are deep neural networks trained to cross between different modalities, e.g., text to image, text to video, or image to text. They enable zero-shot image captioning, metric depth estimation from monocular cameras, and more, and have transformed applications in object recognition, image segmentation, and image captioning. In this project, you will use edge computing to bring this technology to a mobile device and provide near real-time inference. You will have flexibility in choosing which model to use from a large selection of options.

Background needed: Python programming
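
To make the offload path concrete, here is a minimal, purely illustrative client-side sketch in Python; the cloudlet URL, endpoint, and response schema are assumptions, not part of any existing system:

```python
# Illustrative only: a minimal client-side offload loop, assuming a hypothetical
# cloudlet HTTP endpoint that wraps a multi-modal captioning model.
import time
import requests

CLOUDLET_URL = "http://cloudlet.local:8080/caption"  # hypothetical endpoint

def caption_frame(jpeg_bytes: bytes) -> str:
    """Send one JPEG frame to the cloudlet and return the generated caption."""
    t0 = time.time()
    resp = requests.post(CLOUDLET_URL, data=jpeg_bytes,
                         headers={"Content-Type": "image/jpeg"}, timeout=2.0)
    resp.raise_for_status()
    print(f"round-trip latency: {(time.time() - t0) * 1000:.1f} ms")
    return resp.json()["caption"]  # response schema is an assumption

if __name__ == "__main__":
    with open("frame.jpg", "rb") as f:
        print(caption_frame(f.read()))
```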

2. Exploring Motion-to-Photon Latency of Edge-Native Applications
Mentor: Jim Blakley
Students: 1

Motion-to-photon latency (MTPL) is the time it takes for an action in the real world to result in (a) a change in a visual display or (b) actuation of an automated system. In interactive applications, this often means the time between a mouse click and an update of the user’s screen based on the click. In computer vision applications, it means the time between an event occurring and the vision system taking action. This project will build on prior Living Edge Lab (LEL) work to understand the drivers of MTPL in edge-native applications over mobile wireless networks (LTE and 5G). Using LEL applications like OpenRTiST and SteelEagle, you will develop methods to measure application-specific MTPL, collect measurements from a variety of different environments (compute, network, software, etc.), understand the factors impacting MTPL, and investigate ways to reduce the application’s MTPL.

Students will learn how edge infrastructure impacts the performance of applications that depend on the edge – and what really matters to a given application. You will implement a data collection framework and perform analysis on your collected data. You will also gain experience in diagnosing performance issues in complex systems. Experience programming in Python and Java is a prerequisite. Experience with Docker and containerization is a benefit. Some knowledge of wireless networking and cloud computing infrastructure is desirable.
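
As a starting point, the sketch below shows one way application-level MTPL could be measured by tagging each captured frame and timing when its result is rendered; the hooks and names are illustrative, not OpenRTiST or SteelEagle APIs:

```python
# A minimal sketch of application-level MTPL measurement, assuming the client
# can tag each frame it sends with an ID and observe when the corresponding
# result is rendered. Hook names are illustrative only.
import time

class MTPLMeter:
    def __init__(self):
        self.sent = {}      # frame_id -> capture timestamp
        self.samples = []   # measured motion-to-photon latencies (seconds)

    def on_capture(self, frame_id: int):
        self.sent[frame_id] = time.monotonic()

    def on_render(self, frame_id: int):
        t0 = self.sent.pop(frame_id, None)
        if t0 is not None:
            self.samples.append(time.monotonic() - t0)

    def summary(self) -> str:
        if not self.samples:
            return "no samples"
        s = sorted(self.samples)
        p50 = s[len(s) // 2]
        p99 = s[int(len(s) * 0.99)]
        return f"n={len(s)} median={p50 * 1000:.1f} ms p99={p99 * 1000:.1f} ms"
```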

3. Live Learning via Drone
Mentor: Eric Sturzinger
Students: 1

This project builds on existing work on continuously improving a computer vision model trained to detect rare objects. The overall objectives are to process a stream of images received from a drone (real or simulated) in real time, prioritize them according to the likelihood that they contain an instance of a given rare object class, and label the high-priority samples through an interactive browser in order to generate enough true positives to improve the existing model.

A successful final demo would consist of receiving a continuous stream of images from a simulated or recorded drone flight on one or more cloudlets. Demonstrate that frames containing the given rare object are scored highly and are labeled through the browser. Over a short period of time, a couple of model training iterations should take place, demonstrating that new models have been installed on the cloudlet. Demonstrate that this approach works for both single-drone and multi-drone missions.

The student will learn how to integrate new features into an existing codebase, model training and inference with PyTorch, and other basic computer vision concepts. They will also learn how to properly manage distributed ML systems through cloud-edge communications, video streaming, and multi-threaded/multi-process programming. The student will gain familiarity with the Parrot Anafi drone, the Olympe SDK, the Sphinx simulation environment, and the PyTorch ML framework. Ideally, the student should know basic Python programming.
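
As an illustration of the prioritization step, the following sketch keeps a bounded priority queue of the frames most likely to contain the rare object; the scoring itself would come from the current PyTorch model and is not shown:

```python
# Illustrative sketch: prioritize incoming frames by the model's confidence for
# the rare class, keeping only the top-K for human labeling in the browser.
import heapq
import itertools

class LabelingQueue:
    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self.heap = []                 # min-heap of (score, seq, frame)
        self.seq = itertools.count()   # tie-breaker so frames are never compared

    def offer(self, frame, score: float):
        item = (score, next(self.seq), frame)
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, item)
        elif score > self.heap[0][0]:
            heapq.heapreplace(self.heap, item)  # evict the lowest-priority frame

    def top_frames(self):
        """Frames ordered from most to least likely to contain the rare object."""
        return [f for _, _, f in sorted(self.heap, reverse=True)]
```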

4. Edge-native App for Indoor Navigation
Mentor: Qifei Dong
Students: 1 or 2

Indoor localization and navigation play a key role in many location-based services. In recent years, visual SLAM (Simultaneous Localization and Mapping) has been widely adopted for autonomous robot navigation in GPS-denied areas. Due to its high computation demand, running visual SLAM in real time on mobile devices is non-trivial. Research efforts have addressed this by offloading the computation with the aid of edge computing.

This project requires you to build an Android app for navigation inside the GHC building. The app streams images to a cloudlet running ORB-SLAM3. You can use our ORB-SLAM3 server out of the box. You will need to correlate SLAM-constructed maps to real-world locations and implement algorithms for route planning. If good progress is made, you can make one or more of the following improvements: voice interactions powered by ChatGPT; obstacle avoidance using a deep learning model (e.g., MiDaS); adding simple AR (augmented reality) effects for better visualization. From this project, you will gain familiarity with ORB-SLAM3 and Gabriel, our offloading framework, and will gain experience in developing a full-stack edge-native application. You should be familiar with Docker, Python, and Android, and have some basic knowledge of computer vision.
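
For the route-planning piece, a minimal sketch might run Dijkstra's algorithm over a hand-built graph of named waypoints that has been correlated with the SLAM map; the waypoints and distances below are invented for illustration:

```python
# Dijkstra over a small waypoint graph; nodes and distances are made up.
import heapq

def shortest_path(graph, start, goal):
    """graph[node] = [(neighbor, meters), ...]; returns the waypoint list."""
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(pq, (nd, nbr))
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    return [start] + path[::-1]

graph = {"elevator": [("atrium", 30.0)],
         "atrium": [("elevator", 30.0), ("cafe", 25.0)],
         "cafe": [("atrium", 25.0)]}
print(shortest_path(graph, "elevator", "cafe"))  # ['elevator', 'atrium', 'cafe']
```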

5. Multi-tenant Multi-object Tracking
Mentor: Tom Eiszler
Students: 1

In the SteelEagle project, we use edge computing to infuse lightweight drones with the capabilities needed to perform various vision tasks autonomously. One such task is tracking, where we run DNN-based object detection and use the results to actuate the drone toward a moving object. At the moment, detection across frames is wholly independent: there is no correlation between the current frame and previous ones. Multi-object tracking (MOT) systems create tracks from detected objects and determine whether objects detected in subsequent frames are still part of an earlier track. Such systems can fairly accurately assign and maintain identifiers for each object in a scene; however, they only consider input from a single source/viewpoint.

In SteelEagle, we may have two or three drones looking at the same scene from different points of view. It would be useful to refer to objects found by one drone in the mission parameters of a second drone (e.g., go to the place where we last saw the red car and track it).

The goal of this project is to implement a system that can track multiple objects from multiple tenants who are providing input in real time. The student will learn how our existing SteelEagle/Gabriel system works, how to integrate existing MOT technologies (e.g., DeepSORT) into that system, and how to correlate data from multiple distributed clients to present a unified view of the same scene. Knowledge of Python will be valuable. Some familiarity with Docker and tracking algorithms will be helpful.
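
To illustrate the core idea of track association, the sketch below greedily matches detections to existing tracks by IoU; a real MOT system such as DeepSORT adds appearance features and motion models on top of this:

```python
# Simplified frame-to-frame track association by IoU. Boxes are (x1, y1, x2, y2).
import itertools

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

class Tracker:
    def __init__(self, iou_threshold=0.3):
        self.tracks = {}                  # track_id -> last seen box
        self.next_id = itertools.count(1)
        self.iou_threshold = iou_threshold

    def update(self, detections):
        """Greedily match detections to tracks; return [(track_id, box), ...]."""
        assigned, results = set(), []
        for box in detections:
            best_id, best_iou = None, self.iou_threshold
            for tid, prev in self.tracks.items():
                if tid in assigned:
                    continue
                score = iou(box, prev)
                if score > best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:
                best_id = next(self.next_id)   # start a new track
            assigned.add(best_id)
            self.tracks[best_id] = box
            results.append((best_id, box))
        return results
```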

6. Scaling out Gabriel
Mentor: Tom Eiszler
Students: 1 or 2

We have built many applications on top of the Gabriel platform (WCA, OpenRTiST, OpenScout, SteelEagle). All of these applications send sensor data from a mobile client to an edge server, perform some computation on those sensor streams, and may then send results back to the client (e.g., for WCA this would include instructions for the next step in the process). Gabriel was designed to fan out the input streams to multiple different cognitive engines, each performing a particular type of computation or task (e.g., object detection, face recognition).

When the number of engines and/or mobile clients is not large, having a single cognitive engine of each type is sufficient. But what if we have 5 clients that need a particular service? How can we scale out Gabriel to meet those demands? This project will explore how to scale out a Gabriel application depending on the current demand. For instance, if we have multiple clients that require object detection, can we dynamically spin up multiple object detection engines? What schemes do we use for distributing work to them? Round-robin? A 1-1 mapping between client and cognitive engine? Currently, sensor streams from every client are simply interleaved into a single cognitive engine. And if cognitive engines are spread across multiple servers, how do we manage them?

Students will become familiar with Gabriel, and with Docker and a container management system such as Sinfonia. The students will explore methods for work distribution and resource management within this framework. The project may include measurements and characterization of a particular application (e.g., how many YOLO engines can be run on a single GPU of some type).
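
As one possible distribution scheme, the sketch below round-robins client frames across a pool of identical engines; the engine interface is a stand-in and does not reflect Gabriel's actual APIs:

```python
# Round-robin dispatch of frames across a pool of identical cognitive engines.
# Engine objects are hypothetical stand-ins with a process(frame) method.
import itertools
import queue
import threading

class EnginePool:
    def __init__(self, engines):
        self.queues = [queue.Queue(maxsize=1) for _ in engines]
        self.rr = itertools.cycle(range(len(engines)))
        for q, engine in zip(self.queues, engines):
            threading.Thread(target=self._worker, args=(q, engine),
                             daemon=True).start()

    def submit(self, frame, reply_fn):
        """Assign the next frame to the next engine in round-robin order."""
        q = self.queues[next(self.rr)]
        try:
            q.put_nowait((frame, reply_fn))
        except queue.Full:
            pass  # shed load: this engine is still busy with an older frame

    @staticmethod
    def _worker(q, engine):
        while True:
            frame, reply_fn = q.get()
            reply_fn(engine.process(frame))  # engine.process() is hypothetical
```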

7. Log-based Cache Validation in Coda
Mentor: Jan Harkes
Students: 1 or 2

The Coda Distributed File System persistently caches file data on the client and uses callbacks for cache invalidation when files are modified on the server.  The persistent local cache allows a Coda client to work disconnected from the Coda file servers. During disconnected operation, the client is unable to receive callback notifications that indicate when cached data should be considered stale. As a result, the entire cache is considered suspect and has to be revalidated upon reconnection. In settings with low bandwidth or where connectivity is intermittent, this can be expensive.

This project is about keeping a log on the servers to track which file callbacks were broken over time. This way, a client can send a ‘revalidate’ call that returns the list of file callbacks it missed since a given point in time. The client can then mark those specific files as stale and assume all remaining cached data is still valid. The overall goal is to greatly improve the speed of cache validation after a disconnection.
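
Conceptually, the server-side log and revalidation call might look like the following sketch (written in Python for brevity, although Coda itself is implemented in C/C++):

```python
# Conceptual model of a server-side log of broken callbacks and the
# corresponding revalidation call; data layout and names are illustrative.
import time

class CallbackLog:
    def __init__(self):
        self.entries = []                      # append-only (timestamp, file_id)

    def record_break(self, file_id):
        self.entries.append((time.time(), file_id))

    def revalidate(self, since: float):
        """Return the files whose callbacks broke after 'since'.

        The client marks only these as stale and trusts the rest of its cache.
        """
        return {fid for ts, fid in self.entries if ts > since}

log = CallbackLog()
log.record_break("volume1/fid.42")             # a file changed while disconnected
stale = log.revalidate(since=0.0)              # client reconnects and asks for misses
```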

8. Zeroconf/mDNS Discovery for Sinfonia Cloudlets
Mentor: Jan Harkes
Students: 1 or 2

Sinfonia is a project to discover nearby compute resources and deploy the appropriate infrastructure for edge-native applications. It currently relies on an instance in the cloud for matchmaking between clients requesting resources and cloudlets (edge compute) providing them. Cloudlets periodically inform the cloud instance of their available and used resources, such as CPU, memory, network, and GPUs; the set of reported resources can be (and has been) expanded to include additional factors such as estimated carbon emissions. When a client wants to deploy an edge-native application backend, it sends a request to the cloud, which then tries to find a match based on resource availability, estimated network proximity, and other factors. The cloud then negotiates deployment of the backend as well as a secured VPN tunnel endpoint which the client can use to connect to the cloudlet.

The task here is to extend Sinfonia to use the zeroconf/mDNS mechanism to discover resources on the local network. The advantages are that we don't have to rely on network proximity estimates and we can use private compute resources that are not exposed to or shared with other users.

Required skill: Python programming.

Useful but not required skills: Knowledge of Kubernetes, Docker, WireGuard VPN.
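
A minimal discovery sketch using the python-zeroconf package is shown below; the service type name is an assumption chosen for illustration, since the actual name a Sinfonia cloudlet would advertise is something the project would decide:

```python
# Browse the local network for cloudlets advertised over mDNS using python-zeroconf.
from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

SERVICE_TYPE = "_sinfonia._tcp.local."  # hypothetical service type name

class CloudletListener(ServiceListener):
    def add_service(self, zc, type_, name):
        info = zc.get_service_info(type_, name)
        if info:
            print(f"found cloudlet {name} at {info.parsed_addresses()}:{info.port}")

    def remove_service(self, zc, type_, name):
        print(f"cloudlet {name} left the network")

    def update_service(self, zc, type_, name):
        pass

zc = Zeroconf()
browser = ServiceBrowser(zc, SERVICE_TYPE, CloudletListener())
try:
    input("browsing for cloudlets, press Enter to stop...\n")
finally:
    zc.close()
```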

9. Casual Time-Lapse Photography
Mentor: Babu Pillai
Students: 1 or 2

Time-lapse photography is an invaluable aid in the study of relatively slow phenomena, such as the movement of ice or the growth of plants. However, setting up the equipment to capture a time-lapse video can be quite challenging, as this involves long-term installation of weather-, tamper-, and theft-resistant equipment. In this project, we seek to build a system for capturing time-lapse video in a more casual way, without installing specialized equipment. Instead, we wish to leverage images captured by mobile devices when they happen to view the scene of interest. These images may be crowd-sourced or captured by autonomous vehicles or drones that pass the scene frequently. The key deliverable of this project is a cloud-based service to collect the images and merge them into a time-lapse video. There are two key challenges. First, the images will not be from precisely the same camera position or with the same camera parameters (lighting, angle of view, etc.); the cloud service will need to register the images against each other and may need to transform or warp them to match as closely as possible using image matching and computer vision techniques. Second, the images will not be spaced evenly in time; this may require discarding some images while interpolating between others to produce a temporally accurate and smooth final video. A potential final demo is a time lapse of a sprouting plant, constructed from images captured by multiple devices/people in an informal, crowd-sourced way.
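
For the registration challenge, a typical OpenCV pipeline matches ORB features between a new image and a reference view and warps the new image with an estimated homography, roughly as sketched below:

```python
# Register one crowd-sourced image to a reference view: ORB features,
# ratio-test matching, RANSAC homography, and a perspective warp.
import cv2
import numpy as np

def register(reference_gray, new_gray):
    """Warp new_gray into the reference frame; returns None if matching fails."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(reference_gray, None)
    kp2, des2 = orb.detectAndCompute(new_gray, None)
    if des1 is None or des2 is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    good = []
    for pair in matcher.knnMatch(des2, des1, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])                    # Lowe's ratio test
    if len(good) < 10:
        return None
    src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None
    h, w = reference_gray.shape
    return cv2.warpPerspective(new_gray, H, (w, h))
```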

10. Fast and Low-Footprint Heart Murmur Detection on a Raspberry Pi
Mentors: Alex Gaudio, Asim Smailagic
Students: 2

Digital stethoscopes are non-invasive medical devices for listening to heart, lung, and other body sounds; they are especially useful in screening settings, in low-resource settings, and for automated clinical diagnosis. Existing prediction models for the diagnosis of heart sounds typically rely on a cloud backend server that trains and evaluates machine learning models: data is passed from the stethoscope to a mobile device, to the cloud, and then back to the mobile device. However, in many settings internet access is unreliable or non-existent. It would be preferable if the automated diagnosis were performed at the edge, on a mobile phone or perhaps directly on the stethoscope device.

This project will address the task of detecting heart murmurs from recorded heart sound signals. The task is to create a multi-label predictive model that detects different kinds of heart murmurs using the PhysioNet CirCor dataset. The predictive model should have demonstrably small resource usage (in both RAM and CPU) and give results in an online (streaming) fashion in as few seconds as possible. Possible model architectures include local linear models, reinforcement learning, extreme learning machines, convolutional networks, or any approach you can come up with. The outcome of the project will be a predictive model implemented on a mobile device or Raspberry Pi, compared against the performance of other baseline predictive models. Academic papers as a result of this work are strongly encouraged.

Prerequisites: Interest in biomedical data analysis, computational efficiency, and machine learning.
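
As a very rough illustration of online, low-footprint scoring, the sketch below computes cheap FFT band-energy features over one-second windows and applies per-label linear models; the label set is illustrative and the weights are untrained placeholders that would be learned from the CirCor data:

```python
# Streaming murmur scoring on short windows: FFT band energies + linear models.
import numpy as np

SR = 4000                 # analysis sample rate (Hz); adjust to match the dataset
WIN = SR                  # 1-second analysis window
LABELS = ["systolic_murmur", "diastolic_murmur"]   # illustrative label set

def band_energies(window, n_bands=16):
    """Log energy in n_bands equal-width frequency bands of one window."""
    spec = np.abs(np.fft.rfft(window * np.hanning(len(window))))
    bands = np.array_split(spec, n_bands)
    return np.log1p(np.array([b.sum() for b in bands]))

# Placeholder per-label weights and biases (would be trained offline on CirCor).
rng = np.random.default_rng(0)
W = rng.normal(size=(len(LABELS), 16)) * 0.01
b = np.zeros(len(LABELS))

def score_stream(samples):
    """Yield per-label probabilities after every full window of audio."""
    for start in range(0, len(samples) - WIN + 1, WIN):
        feats = band_energies(samples[start:start + WIN])
        logits = W @ feats + b
        yield dict(zip(LABELS, 1.0 / (1.0 + np.exp(-logits))))

for probs in score_stream(rng.normal(size=3 * SR)):   # fake 3 s of audio
    print(probs)
```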