A Full Dive
This is the complete but unedited article on brain-computer interfaces that sprouted from the teaser I sent out an embarrassingly long time ago. The polished version will be published in Knowing Neurons soon!
On an overcast afternoon in November 2022, a teenage boy pages through a glossy video game manual in his bedroom in the Saitama prefecture of Japan. As the clock blinks towards 1:00 PM, he pulls on a dark blue helmet flashing with green lights, climbs into bed, and closes his eyes.
“Link, start!”
A rush of color and sound lifts him to his feet. His virtual feet, that is. The dull hum of a wall-mounted air conditioning unit crescendos into the roar of a large crowd. The swirling darkness inside his eyelids brightens into a bustling cityscape, illuminated by bright sunlight gleaming off of impeccably rendered marble tiles stretching into the distance.
A reality away, in his cramped, dark room, he grins.
The helmet, called the NerveGear, promises to immerse the user in a completely virtual sensory experience delivered directly to their brain. Built-in transceivers placed at strategic locations around the device send artificial sensory signals to the brain that are indistinguishable from the real thing. At the same time, the transceivers read out the user’s intended movements and faithfully reproduce them in a virtual body, allowing them to interact with a simulated environment. This fictional technology, dwarfing the cardboard headsets and comically large goggles at our current disposal, forms the basis for Sword Art Online, or SAO, a Japanese light novel series and subsequent anime television series that explores the fantastic possibilities unleashed by removing the constraints of reality. The debatable merits of an escapist “metaverse” notwithstanding, one can’t help but wonder: what would it take to take a full dive into another reality?
The key lies in the aforementioned transceivers responsible for sending messages to specific sensory regions within the brain and receiving messages from regions associated with the planning and execution of movements. However, building up this transcranial postal service is no easy task. The main obstacle is quite simple: living brains are usually situated inside skulls, and the owners of these brains generally prefer to keep them that way. Even putting this minor issue aside, there are plenty of additional problems to address. One of them is that neurons are really, really small. Another is that there are lots of them. Current estimates tell us that the two fistfuls’ worth of gray matter generating all the press contain approximately 86 billion neurons, along with a nearly equal-sized supporting cast of other cells. Not only do we need to dig through this dense mass of tissue to find and listen in on the signals that matter to us, but we also want to be able to talk to specific neurons without accidentally lighting up the entire surrounding neighborhood.
While interfacing with the brain with this level of precision is a formidable engineering challenge, figuring out what to say isn’t easy either. Instead of scribbled crayon or printed type, our brains communicate with the world through a complex and redundant language of neuronal activity, or “spikes.” As you read this paragraph, arrays of photons that fall on your retina are encoded into a pattern of spikes distributed across millions of neurons, and interpreted by your brain as a nearly 180-degree visual field filled with a screen containing lines of text. As you scroll down the page, your intended motions are encoded into more spiking patterns that drive muscle contraction and relaxation in your hand.
These two problems, building the hardware to send and receive neural messages and writing the software to translate messages into and out of spiking patterns, go hand in hand. The more neurons we can record from and stimulate, the better our decoding and encoding algorithms will get. While we have made significant progress in technologies for recording from and stimulating the brain, we are still far from the level of sophistication necessary to reproduce something like the NerveGear.
Over the past few decades, however, the fields of neuroscience, electrical engineering, and computer science have advanced enough to allow us to catch a glimpse of the ebb and flow of electrochemical signals within the brain on the scale of milliseconds and millimeters. In sterile rooms and white coats, neuroscientists fix miniature microscopes onto circular holes drilled into mice’s skulls. As the animals stir and wake, powerful lenses watch intently for transient flashes of neural activity within the gray matter, streaming the information up through a ribbon-like array of wires to recording devices above. Then, high-powered computational analyses reduce the terabytes of measurements into hypotheses of How the Brain Works.
Suppose that we were able to record every single neuron in the human brain, and capture every action potential and subthreshold twitch through a live feed. With every millimeter of the brain laid bare before us, we would surely be able to discover and interpret patterns of neural activity associated with the movement of an arm, the colors of a painting, the sound of a violin. But that’s easy to say when we’re not the ones doing the heavy lifting. Even the most powerful computing clusters in the world would sweat at the idea of analyzing the massive amount of data coming from 86 billion neurons, every millisecond, in every subject. Luckily, we might not need all that information for our purposes. Because each neuron sends and receives signals from many other neurons, the activity of one neuron can tell us a lot about what’s going on around it.
Let’s reframe brain activity as a movie, and the decoding problem as our attempt to understand the plot line of that movie. When you sit down on a couch with a mixing bowl full of microwaved popcorn, you don’t need 8K Ultra HD resolution to understand what’s going on. For most casual viewers on a Saturday night, 1080p would be just fine. You could probably even get all the main plot points from a laggy 240p stream, albeit at the risk of high blood pressure and insanity. Why is this the case? When we take a frame from a movie, we know that many pixels in the image are related to each other. If the main character has a predilection for dark suits, a large number of pixels on the screen will often be black and move as a single group through consecutive frames. In other words, the motion of many pixels on the screen is highly correlated, and we can infer information about all of them, like position and movement, even if we can only see a few. Similarly, many neurons are locally interconnected by synaptic junctions, through which the activation of one neuron can quickly spread to other, nearby neurons. Even if we can only get a low-quality stream with a few neurons here and there, we can use this small sample to infer what’s going on with the rest of the neurons and construct a good summary of what the brain is doing at a given moment.
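If you’d like to see this intuition in action, here is a toy sketch in Python. Every number, neuron, and model below is invented for illustration, not borrowed from any real recording: a few hundred fake neurons driven by a handful of shared signals, a “probe” that only reaches twenty of them, and a simple linear model that fills in the rest.

```python
# Toy sketch: correlated population activity lets a few recorded neurons
# predict the many unrecorded ones. All parameters are made up.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_neurons, n_latents, n_time = 500, 5, 2000

# Many neurons, few shared "plot lines": activity = loadings @ latents + private noise
latents = rng.standard_normal((n_latents, n_time))
loadings = rng.standard_normal((n_neurons, n_latents))
activity = loadings @ latents + 0.3 * rng.standard_normal((n_neurons, n_time))

# Pretend our probe only reaches 20 of the 500 neurons
observed = rng.choice(n_neurons, size=20, replace=False)
hidden = np.setdiff1d(np.arange(n_neurons), observed)

# Fit a linear map from the recorded neurons to the unrecorded ones,
# then test it on time points the model has never seen
train, test = slice(0, 1500), slice(1500, None)
model = LinearRegression().fit(activity[observed, train].T, activity[hidden, train].T)
r2 = model.score(activity[observed, test].T, activity[hidden, test].T)
print(f"Variance of the 480 unrecorded neurons explained by 20 recorded ones: {r2:.2f}")
```

In this made-up world, twenty neurons are enough to predict the other 480 almost perfectly, precisely because their activity is driven by so few underlying plot lines.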
Now, there are a few important caveats to this idea. While some movies (looking at you, Nomadland) could be boiled down to low-resolution blobs stuttering across a screen without losing too much content, there are many movies that rely on rich attention to detail and subtle plot cues to present a coherent, nuanced work. If we only sample the activity of a few neurons and call it a day, we could very well be tossing out the few pixels containing vital plot devices: Cinderella’s glass slipper, Snow White’s poisoned apple, the One Ring to rule them all. Then we might think that Prince Charming has a foot fetish, Snow White severe narcolepsy, and the Fellowship of the Ring a weird propensity for dangerous road trips. Ideally, we would make sure to sample our neural activity at a high enough resolution so that we can avoid embarrassing misinterpretations like these. Somewhat less ideally, we have little idea of what that resolution might be.
In 2016, a team of theoretical neuroscientists at Carnegie Mellon University and Columbia University led by Byron Yu and his PhD student, Ryan Williamson, took a stab at this question. They simulated the activity of thousands of neurons, far more than is currently possible to record at once, and compared the data to real neural recordings from macaque monkeys, containing the firing of just tens of neurons. After choosing a model that closely matched the real neural activity, they sampled small groups of neurons and found that just tens of neurons and hundreds of trials of an experiment were enough to capture the majority of the storyline. But this particular study was conducted in just one brain area of macaque monkeys doing the simplest of tasks (staring at a blank gray screen), and the study’s authors dutifully caution us that these results cannot be verified without using real recordings from many thousands of neurons instead of synthetic models. Later that same year, the band got back together, this time led by PhD student Benjamin Cowley. They trained their macaques to perform tasks with stimuli that were a tad more complex than a blank gray screen, and found that neuronal mileage may vary with the complexity of the task. In essence, while 80-odd neurons might be enough to capture an episode of Caillou in all its glory, they will likely miss a great deal when attempting to play back the multitude of plot twists smeared across a movie like Inception.
Although we can extrapolate the activity of a few neurons to many more, when it’s time to start translating this neural activity movie into something we can understand, we often need to move in the other direction. This idea of boiling down information from many sources into a summary of the relevant main points is known as dimensionality reduction. Dimensionality reduction techniques span what seems like the entire space of two to five-letter acronyms, from FA to PCA, VAE to SLDS, GPFA to LFADS, and describing their various assumptions and modeling focuses would leave me with a dwindling supply of metaphors and the ramping fury of my editor. The unifying objective of these methods, however, is to find a way to summarize our data while preserving the important features we care about.
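To make that objective a little more tangible, here is what PCA, the most venerable member of the alphabet soup, does to a batch of simulated neural activity: it compresses 120 fake neurons down to a handful of components that carry nearly all of the shared variance. Everything below, from the number of latent dimensions to the noise level, is a placeholder chosen for illustration.

```python
# Minimal dimensionality reduction demo with PCA on simulated activity.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_samples, n_neurons, n_latents = 3000, 120, 3

# 120 fake neurons whose shared, moment-to-moment structure lives in just 3 dimensions
latents = rng.standard_normal((n_samples, n_latents))
loadings = rng.standard_normal((n_latents, n_neurons))
activity = latents @ loadings + 0.5 * rng.standard_normal((n_samples, n_neurons))

# Summarize 120 neurons with a handful of components
pca = PCA(n_components=10).fit(activity)
print(np.cumsum(pca.explained_variance_ratio_).round(2))
# The first 3 components soak up nearly all of the variance in this toy example;
# the rest mostly describe the private, neuron-by-neuron noise we are happy to discard.
```

The other methods in the list make different assumptions about noise, nonlinearity, and dynamics, but they all play the same game of keeping the few dimensions that matter and discarding the rest.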
While it seems counterintuitive to throw away the precious brain data that was so difficult to gather, too much information can often be a bad thing. If you’ve ever bought a used car from a suspicious lot, you might have encountered a stereotypical sleazy salesman who will talk your ear off about the heated leather seats, the flawless new paint job, and the shiny new rims, all while under the hood the engine is tottering along on its last legs, a trip to the grocery store away from finally croaking. While a glossy finish is certainly nice to have, we are in the market for a specific reason: a well-maintained, dependable car. All the other information, while potentially useful to other customers with different goals, is just noise to us. Not only does the noise take time to sift through, but it can also be distracting and misleading when the time comes to decide what the neural activity actually means.
And that brings us, finally, to the main goal of a neural decoder, which is to connect what’s going on inside our heads to what’s going on in the world outside. An intuitive approach to building one might be to google (scholar) “brain part responsible for X,” stick a probe in there, have the newly probed person or animal perform an experimental task, claim that certain aspects of said task can be decoded from the probe readout, and call it a day. And that intuition would be almost exactly on the nose, with a little bit of extra work baked in. If someone had stuck a neural probe in my brain during middle school soccer practice in order to learn what neural activity is associated with kicking a ball, they would have picked up on signals related to pass force and direction, the lyrics to Candy Shop, and a persistent butt itch. On another trial, they might find one or two really crushing comebacks for a playground argument four days ago mixed in with the carefully calibrated sequence of motor movements. Thus, it is important to conduct many trials of an experiment with rigorously defined controls. Many trials give us the statistical power to tease apart these random bits of noise from the activity that consistently shows up and reliably indicates some aspect of the outside world.
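For concreteness, here is a bare-bones, entirely hypothetical version of that recipe in Python: simulated spike counts from an imaginary soccer-practice experiment, a plain logistic-regression decoder, and cross-validation across many trials to check that the decoding holds up on data the model has never seen. None of the numbers come from a real experiment.

```python
# Hypothetical decoding sketch: read "kick left" vs "kick right" out of fake spike counts.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_neurons = 400, 60
kick_direction = rng.integers(0, 2, size=n_trials)          # 0 = left, 1 = right

# Ten neurons genuinely care about kick direction; the other fifty carry
# Candy Shop lyrics, playground comebacks, and other trial-to-trial noise.
tuning = np.zeros(n_neurons)
tuning[:10] = 1.5
counts = rng.poisson(5 + np.outer(kick_direction, tuning))   # trials x neurons

# Cross-validated decoding accuracy: many trials let the signal rise above the noise
decoder = LogisticRegression(max_iter=1000)
accuracy = cross_val_score(decoder, counts, kick_direction, cv=5).mean()
print(f"Held-out decoding accuracy: {accuracy:.2f} (chance is 0.50)")
```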
So far, we’ve depended on the highly correlated nature of neural activity to shore up our puny recording abilities in the decoding case. However, this quality can be a double-edged sword when it comes to the encoding problem, plaguing neuroscientists trying (and often failing) to separate causation from correlation. An example to illustrate the difference: while a rooster’s crow is highly correlated with the rising of the sun, we certainly know that the rooster’s crow does not cause the sun to rise. Judea Pearl, a pioneer in the study of causality and recipient of the Turing Award, has complained that “statistics always tells us that correlation is not causation, but never tells us what causation actually is!” When we send messages to the brain, we need to make sure we’re targeting the neurons that actually cause a particular behavior or percept, not the neurons that are just along for the ride.
Efforts to find the former employ the discipline of causal inference, which provides us with the tools to answer questions on different levels of the “causal hierarchy,” such as “What would happen to Y if I did X?” or “Was it X that caused Y to happen?” To make this more concrete, suppose that you are on a summer work exchange on a farm in Tuscany during a 100-degree summer with a high pollen count and an even higher population of sadistic mosquitoes, and suppose you were to also ignore the oddly specific nature of this example. Instead of finding the spiking pattern responsible for a particular sensory experience, let our goal be the equally important ordeal of getting a full night’s sleep ahead of a 6 AM wake-up call. We know that our waking time is correlated with the crowing of the rooster, but to confirm a causal relationship between the two variables, we can conduct what’s known as an interventional test. By performing an intervention, e.g. strangling that son-of-a-bitch, we can answer the counterfactual question, “Would I have woken up at that ungodly hour if the rooster hadn’t crowed?” With some subsequent statistical tests to quantify the effect of this intervention on our time of waking, we can establish a causal relationship telling us that the rooster is a sufficient cause of our sleep deprivation. Note that a sufficient cause is one that guarantees a particular outcome whenever it occurs; it does not imply that the rooster uniquely needs its comeuppance in order for us to sleep well at night. After all, wearing earplugs could also solve our problem. Strangulation is just one way to achieve the desired effect, and that is really all we need when we consider the particulars of stimulating sets of neurons to emulate sensory experiences.
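For the statistically inclined, here is a toy simulation of that interventional test, with every probability invented on the spot. In this pretend Tuscany, the dawn light drives both the rooster and our bleary eyes, so simply conditioning on the crow looks very different from actually silencing the bird and counting how often we still wake at sunrise.

```python
# Toy interventional test: observing the crow vs. do(silence the rooster).
# All probabilities are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
n_mornings = 100_000

def simulate(silence_rooster):
    """One summer of mornings under made-up probabilities."""
    dawn_light = rng.random(n_mornings) < 0.9                 # bright Tuscan sunrises
    crow = dawn_light & (rng.random(n_mornings) < 0.95) & (not silence_rooster)
    # You wake early if the rooster goes off, or (less reliably) from the light alone
    wake_early = crow | (dawn_light & (rng.random(n_mornings) < 0.3))
    return crow, wake_early

# Observational world: just condition on mornings when the crow happened
crow, wake = simulate(silence_rooster=False)
print("P(wake early | crow observed)      :", round(wake[crow].mean(), 2))

# Interventional world: do(silence the rooster), by earplugs or darker means
_, wake_do = simulate(silence_rooster=True)
print("P(wake early | do(silence rooster)):", round(wake_do.mean(), 2))
# In this toy world the crow guarantees waking (a sufficient cause), but silencing
# it doesn't buy a full night's sleep every time: the light alone still wakes us.
```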
When it comes to the brain, there are markedly more complex logistics behind conducting these kinds of experiments. The high density of recurrent and feedback connections between neurons, a common feature of neural circuits, makes it profoundly difficult to draw simple causal arrows from one cell to another. Carrying out interventions within one part of the brain may be thwarted by compensatory mechanisms from another region. Like any other modeling question, it will take careful development of new experiments, theories, and techniques to help us tease out the patterns of activity that are sufficient to project an artificial experience into someone’s mind. One day, when we have fully described the neural combinations encoding different mental experiences, we might have “CD” players that direct excitatory probes to precisely stimulate sequences of neurons according to a preprogrammed musical arrangement, playing a sensory sonata on the keys of our brains.
The past few years have seen a complementary explosion driving our understanding of the brain: as better neural recording technologies pile up huge stores of experimental data, powerful computing resources and sophisticated models allow us to develop better theories of the mind, which in turn inform future data-generating experiments. The experiment-analysis-theory cycle, as my former professor and neural data scientist Liam Paninski calls it, is only accelerating as constant improvements are made in every aspect of this feedback loop.
Biking along Riverside Park back to my apartment, I am struck by the sight of the setting sun glimmering across the gentle swells of the Hudson River, lighting the mirrored windows of the Manhattan skyline with a fiery red glow. It seems impossible to replicate the beauty of the natural world with some artificial “metaverse,” no matter how advanced the technology behind it. Then, a seagull drops a judicious shit right in the middle of my reality-constrained head.
