Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models

NeurIPS 2023 Oral

Carnegie Mellon University
*Co-corresponding Authors

BrainDiVE synthesizes images predicted to activate different brain areas.

Note: The images shown here are the DDIM-predicted outputs with residual noise retained, i.e., the same images that are used as input to the encoder.

Abstract

A long-standing goal in neuroscience has been to elucidate the functional organization of the brain. Within higher visual cortex, functional accounts have remained relatively coarse, focusing on regions of interest (ROIs) and taking the form of selectivity for broad categories such as faces, places, bodies, food, or words. Because the identification of such ROIs has typically relied on manually assembled stimulus sets consisting of isolated objects in non-ecological contexts, exploring functional organization without robust a priori hypotheses has been challenging.

To overcome these limitations, we introduce a data-driven approach in which we synthesize images predicted to activate a given brain region using paired natural images and fMRI recordings, bypassing the need for category-specific stimuli. Our approach -- Brain Diffusion for Visual Exploration ("BrainDiVE") -- builds on recent generative methods by combining large-scale diffusion models with brain-guided image synthesis. Validating our method, we demonstrate the ability to synthesize preferred images with appropriate semantic specificity for well-characterized category-selective ROIs. We then show that BrainDiVE can characterize differences between ROIs selective for the same high-level category. Finally, we identify novel functional subdivisions within these ROIs, validated with behavioral data.

These results advance our understanding of the fine-grained functional organization of human visual cortex, and provide well-specified constraints for further examination of cortical organization using hypothesis-driven methods.

Method

BrainDiVE requires only the training of a differentiable voxel-wise fMRI encoder that maps RGB images to predicted voxel-wise brain activations (fMRI betas). The encoder is combined with a latent diffusion model (LDM) to generate naturalistic images that are predicted to activate a chosen set of voxels.
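As a concrete illustration, below is a minimal PyTorch sketch of such an encoder, using the OpenCLIP ViT-B/16 backbone and unit-norm scaling detailed in the next paragraph. The class name, the specific pretrained weight tag, and all hyperparameters are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn
import open_clip

class VoxelwiseEncoder(nn.Module):
    """Sketch of a voxel-wise fMRI encoder: a frozen CLIP image backbone
    followed by a linear probe with per-voxel weights and biases."""

    def __init__(self, num_voxels: int, embed_dim: int = 512):
        super().__init__()
        # Frozen OpenCLIP ViT-B/16 backbone; the pretrained tag is an assumption.
        self.backbone, _, _ = open_clip.create_model_and_transforms(
            "ViT-B-16", pretrained="laion2b_s34b_b88k")
        for p in self.backbone.parameters():
            p.requires_grad_(False)
        # Linear probe: one weight vector and one bias per voxel.
        self.probe = nn.Linear(embed_dim, num_voxels, bias=True)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (B, 3, 224, 224), normalized to CLIP statistics.
        feats = self.backbone.encode_image(images)
        # Scale the last-layer output to unit norm before the probe.
        feats = feats / feats.norm(dim=-1, keepdim=True)
        return self.probe(feats)  # (B, num_voxels) predicted betas
```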

In our framework, we leverage OpenCLIP ViT-B/16 as the encoder backbone. The output of its last layer is scaled to unit norm, then passed through a linear probe with voxel-wise bias to predict the fMRI activations. The diffusion model is Stable Diffusion v2-1-base, which outputs 512×512 images using ε-prediction (noise prediction). Following an approach proposed by crowsonkb, we use the first-order DDIM (Euler) predicted output with residual noise at each time step, resized to 224×224, as input to the encoder. The activations of the target voxels are averaged, and the gradient of this objective is backpropagated into the diffusion output.
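A minimal sketch of one guidance step is shown below, written against diffusers-style `unet`/`vae` calls. The noise-schedule bookkeeping is simplified to a Karras-style σ parameterization for clarity, and the function name and signature are assumptions rather than the released code.

```python
import torch
import torch.nn.functional as F

def brain_guidance_grad(unet, vae, encoder, latents, t, sigma_t, sigma_next,
                        text_emb, voxel_idx):
    """One brain-guidance step (sketch): form the first-order DDIM/Euler
    prediction with residual noise, decode, resize, score the target voxels
    with the fMRI encoder, and backprop into the current latents."""
    latents = latents.detach().requires_grad_(True)
    # ε-prediction from the Stable Diffusion UNet (diffusers-style call).
    eps = unet(latents, t, encoder_hidden_states=text_emb).sample
    # First-order DDIM (Euler) step: predicted x0 plus residual noise
    # rescaled for the next timestep (crowsonkb-style).
    denoised = latents - sigma_t * eps
    pred = denoised + sigma_next * eps
    # Decode latents to a 512x512 image, then resize to 224x224 for the
    # encoder (CLIP input normalization omitted for brevity).
    image = vae.decode(pred / vae.config.scaling_factor).sample
    image = F.interpolate(image, size=224, mode="bilinear", antialias=True)
    # Average the predicted activation over the target voxel set; take the
    # gradient of the negated objective w.r.t. the current latents.
    score = encoder(image)[:, voxel_idx].mean()
    return torch.autograd.grad(-score, latents)[0]
```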

As in other image synthesis works such as DALL-E, the generated images can be reranked using the encoder itself. Following NeuroGen, we keep the top 20% of images.
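A sketch of this reranking step, assuming the voxel-wise encoder from above; the function name and signature are illustrative.

```python
import torch

@torch.no_grad()
def rerank_images(encoder, images: torch.Tensor, voxel_idx, keep_frac: float = 0.2):
    """Rerank generated images by the encoder's own predicted activation of
    the target voxels, keeping the top fraction (top-20%, as in NeuroGen)."""
    scores = encoder(images)[:, voxel_idx].mean(dim=1)  # (N,) per-image score
    k = max(1, int(keep_frac * len(images)))
    top = scores.topk(k).indices
    return images[top], scores[top]
```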

Results

In the paper we apply the method at three hierarchical levels. The first experiment is evaluated using CLIP n-way classification; the second and third are evaluated via human studies on Prolific.
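For the first experiment, here is a hedged sketch of what CLIP n-way classification can look like: an image counts as correct if its highest-similarity prompt matches the target ROI's category. The prompt wording and the CLIP variant used for evaluation are assumptions here.

```python
import torch
import open_clip

# Hypothetical prompts for the five ROI categories; the exact wording used
# in the paper is an assumption.
PROMPTS = ["a photo of a face", "a photo of a place", "a photo of a body",
           "a photo of words", "a photo of food"]

@torch.no_grad()
def clip_nway_accuracy(images: torch.Tensor, target_idx: int) -> float:
    """Zero-shot n-way classification of generated images with CLIP.
    images: (B, 3, 224, 224), already CLIP-normalized."""
    model, _, _ = open_clip.create_model_and_transforms(
        "ViT-B-16", pretrained="laion2b_s34b_b88k")
    tokenizer = open_clip.get_tokenizer("ViT-B-16")
    img_feat = model.encode_image(images)
    txt_feat = model.encode_text(tokenizer(PROMPTS))
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    preds = (img_feat @ txt_feat.T).argmax(dim=-1)
    return (preds == target_idx).float().mean().item()
```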



First, we apply it to broad category-selective regions in the brain, where each region is identified via a functional localizer as selective for a semantic category (faces, places, bodies, words, food). In this experiment, we find that BrainDiVE generates images with high semantic specificity that match the ground-truth selectivity.



Second, we apply it to OFA and FFA, two individual ROIs believed to code for face features at lower (face parts) and higher (whole faces) levels of the visual hierarchy. We find that BrainDiVE generates images for OFA that are more abstractly face-like (animal and non-human faces), while for FFA it generates realistic human faces.



Third, we apply it to sub-clusters of OPA and sub-clusters of the food ROI. For the former, we identify a cluster selective for outdoor scenes and a cluster selective for indoor scenes; for the latter, a cluster selective for colorful foods and another selective for non-colorful foods. For clusters in both ROIs, we perform human studies to verify the trends in both the original top images and the BrainDiVE-generated images, and find that our method can highlight pre-existing trends in selectivity.

Please check out the paper for additional visualizations.

Related Works

There is a large body of literature on performing similar tasks in mice, macaque monkeys, and humans.

Inception loops discover what excites neurons most using deep predictive models (2019) and Neural population control via deep image synthesis (2019) tackled similar tasks in mice and macaque monkeys, respectively, using electrophysiology. Both approaches use gradients of a learned image-computable encoder, but do not constrain the images with generative models.

Evolving Images for Visual Neurons Using a Deep Generative Network Reveals Coding Principles and Neuronal Preferences (2019) uses genetic algorithms in a gradient-free setting to synthesize naturalistic images, with electrophysiology in macaques.

Recent work has moved on to fMRI recordings in humans. Computational models of category-selective brain regions enable high-throughput tests of selectivity (2021) and NeuroGen: Activation optimized image synthesis for discovery neuroscience (2022) both leverage BigGAN, and NeuroGen in particular uses the NSD dataset, as we do. The two works differ in their image synthesis strategy: the 2021 paper uses gradients alone and treats the class-conditioning vector in BigGAN as a convex combination of the 1000 original ImageNet class vectors, while NeuroGen first searches for the most activating classes, then performs gradient optimization with fixed class vectors.

Concurrent work, Energy Guided Diffusion for Generating Neurally Exciting Images (2023), also uses a diffusion model, but focuses on modeling macaque V4 in early visual cortex, which tends to be activated by textures rather than the complex scenes studied in our work.

BibTeX

@article{luo2023brain,
  title={Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models},
  author={Luo, Andrew F and Henderson, Margaret M and Wehbe, Leila and Tarr, Michael J},
  journal={arXiv preprint arXiv:2306.03089},
  year={2023}
}