Large Language Models (LLMs) are enhancing our daily lives but also pose risks such as spreading misinformation and violating privacy, which highlights the importance of understanding how they process and store information. This blog post offers a fresh, neuroscience-inspired perspective on LLM memory functions, based on the concept of the engram: the physical substrate of memory in living organisms. We discuss a synergy between AI research and neuroscience, as both fields grapple with the complexities of intelligent systems.
Large Language Models (LLMs) are increasingly prevalent, assisting with information retrieval, answering questions, and generating content. However, they’re a “double-edged sword”: while capable of remarkable feats, they also risk spreading misinformation, revealing sensitive data, or generating harmful content. Unlike traditional computer systems, where data storage is transparent and traceable, LLMs function more like “black boxes,” making it nearly impossible to pinpoint where specific information is stored and therefore challenging to edit. To edit models effectively and efficiently, we need a clearer understanding of how memory is stored within them, along with methodologies that allow for targeted editing.
In neuroscience, a field that studies highly complex systems like the biological brain, researchers have made significant progress in understanding memory formation and storage. Memory is stored in physical traces in the brain known as engrams: networks of neurons and synapses that “record” information after learning or an event. Through synaptic plasticity, the connections between neurons that fire together are strengthened, creating a lasting representation of that memory or skill.
In this blog post, we will explore three key points to bridge the gap between neuroscience’s experimental findings and established theories about memory on the one hand, and the field of AI on the other. In particular, we will look into the neuroscientific concept of the engram, how engram-like structures can be identified within LLMs, and how such structures can be leveraged to edit memories in these models.
Over a century ago, Richard Semon introduced the concept of the engram: the physical trace that a memory leaves in the nervous system.
To understand engrams, we must first define memory. Memories follow a dynamic life cycle with four key stages: formation, retrieval, modification, and forgetting (Fig. 1).
The search for engrams has long captivated neuroscience, dating back to Karl Lashley’s groundbreaking work. Lashley’s 1950 lecture, In Search of the Engram, summarized decades of lesion experiments that failed to localize memory to any single cortical region, leading him to conclude that memories are distributed across the brain.
Recent advances in neuroscience have made it possible to identify and manipulate engrams. Using optogenetics and advanced imaging techniques, researchers have successfully mapped the neural circuits associated with specific memories in animal models.
The concept of engrams has shed light on how memories are formed, retrieved, modified, and forgotten (Fig. 1) and emphasizes the physical reality of memory as a process deeply embedded in the brain’s structure.
This engram-based understanding could also serve as a model for artificial intelligence systems. If we could identify specific “engrams” within a neural network or large language model (LLM), we might be able to selectively edit or delete certain memories in AI models, similar to how neuroscientists can activate or deactivate specific memories in the brain experimentally. This cross-disciplinary exploration could open up new possibilities for making AI models more transparent, safe, and adaptable.
Having introduced the concept of engrams, we now examine LLM research through this lens. We have identified two research efforts that explore engram-like concepts in LLMs. The first is Grafting, a method that identifies and integrates specific parameter regions crucial for newly learned tasks. The second is ROME (Rank-One Model Editing), which introduces targeted updates in LLMs without retraining the entire model. Although the original studies do not explicitly reference engrams, we are the first to use this concept as a bridge to reinterpret and analyze their findings.
Panigrahi et al. (2023) proposed model grafting, a method that identifies the small subset of a fine-tuned model’s parameters responsible for a newly learned skill and transplants only that subset into the pre-trained model.
Connecting this work to neuroscience, we can consider this subset of weight parameters to be a synaptic engram for fine-tuned skills (Fig. 2). An intriguing parallel can be drawn to the work of Hayashi-Takagi et al. (2015), who optically labeled the dendritic spines potentiated during motor learning and then selectively shrank them, erasing the newly acquired skill while sparing other memories.
However, we also recognize differences between the two domains in the methods used to identify and label the “synaptic engram.” In the model grafting process within the AI domain, the researchers computed the engram directly through optimization, whereas in neuroscience the engram must be tagged experimentally, for example through activity-dependent labeling of recently potentiated synapses.
Finding synaptic engrams in LLMs
To localize the weight parameters, or “engram,” that contribute most to performance on the fine-tuned task (e.g., sentiment analysis), Panigrahi et al. (2023) search for a sparse binary mask \(\gamma\) over the parameters: entries where \(\gamma = 1\) are taken from the fine-tuned model \(\theta_{\textrm{ft}}\), while the rest are kept from the pre-trained model \(\theta_{\textrm{pre}}\).
The grafted model is defined as:
\[\overline{\theta_{\textrm{ft}}}(\gamma) = \gamma \odot \theta_{\textrm{ft}} + (1-\gamma) \odot \theta_{\textrm{pre}}\]where \(\odot\) denotes element-wise multiplication. Figure 3 illustrates this equation.
So if the grafted model \(\overline{\theta_{\textrm{ft}}}(\gamma)\) performs similarly to the fine-tuned model \(\theta_{\textrm{ft}}\), we can infer that the region of \(\theta_{\textrm{ft}}\) selected by \(\gamma\) is the engram responsible for the fine-tuned skill.
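For concreteness, here is a minimal sketch of the grafting operation in PyTorch, using flattened parameter vectors; the names `theta_pre`, `theta_ft`, and `gamma` are our own placeholders, not code from the paper.

```python
import torch

def graft(theta_pre: torch.Tensor, theta_ft: torch.Tensor, gamma: torch.Tensor) -> torch.Tensor:
    """Element-wise grafting: take the masked (gamma = 1) parameters from the
    fine-tuned model and keep the remaining parameters from the pre-trained model."""
    return gamma * theta_ft + (1 - gamma) * theta_pre

# Toy example with flattened parameter vectors.
theta_pre = torch.randn(10)                      # pre-trained parameters
theta_ft = theta_pre + 0.1 * torch.randn(10)     # fine-tuned parameters
gamma = torch.zeros(10)
gamma[[2, 7]] = 1.0                              # sparse mask marking the putative engram

theta_grafted = graft(theta_pre, theta_ft, gamma)
```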
To find this engram, they directly optimized the mask \(\gamma\) to minimize the task loss for the grafted model. Formally, this can be expressed as:
\[\underset{\gamma \in \{0, 1\}^{|\theta_{\textrm{ft}}|} : \|\gamma\|_{0} \le s}{\operatorname{argmin}} \mathcal{L}_{\mathcal{T}}(\gamma \odot \theta_{\textrm{ft}} + (1 - \gamma) \odot \theta_{\textrm{pre}})\]where \(\mathcal{L}_{\mathcal{T}}\) is the loss on task \(\mathcal{T}\) (e.g., classification error), and \(s\) is a sparsity constraint on the mask \(\gamma\).
To enable optimization through gradient descent, they reparameterized the mask \(\gamma\) as a differentiable variable using a sigmoid function applied to a real-valued vector \(S\), i.e., \(\gamma = \sigma(S)\).
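A rough sketch of this optimization is given below, under the simplifying assumptions that all parameters are flattened into a single vector and that a placeholder `task_loss` stands in for evaluating the grafted model on task data. The sparsity constraint is handled here with an L1 penalty and a final top-\(s\) projection purely for brevity; this is our own simplification, not the authors’ exact procedure.

```python
import torch

def task_loss(theta: torch.Tensor) -> torch.Tensor:
    # Placeholder for L_T: in practice, run the grafted model on task data
    # and return its loss (e.g., cross-entropy for sentiment classification).
    return ((theta - 1.0) ** 2).mean()

theta_pre = torch.randn(1000)
theta_ft = theta_pre + 0.1 * torch.randn(1000)

# Real-valued scores S; gamma = sigmoid(S) is a soft, differentiable mask.
S = torch.zeros(1000, requires_grad=True)
optimizer = torch.optim.Adam([S], lr=0.1)

for step in range(200):
    gamma = torch.sigmoid(S)
    theta_grafted = gamma * theta_ft + (1 - gamma) * theta_pre
    loss = task_loss(theta_grafted) + 1e-3 * gamma.sum()   # L1 term encourages sparsity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Binarize: keep only the top-s scores as the final (hard) engram mask.
s = 10
gamma_hard = torch.zeros_like(S)
gamma_hard[torch.topk(S.detach(), s).indices] = 1.0
```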
Using this optimization objective directly, they successfully identified task-specific skill engrams. By grafting just 0.01% of \(\theta_{\textrm{ft}}\) (the identified engram) onto \(\theta_{\textrm{pre}}\), \(\overline{\theta_{\textrm{ft}}}(\gamma)\) achieved over 95% of the original fine-tuned model’s performance. This confirms that the identified engram is indeed responsible for the task-specific skill.
Leveraging Engrams in LLMs
In the AI domain, researchers have demonstrated how memory locations (i.e., engrams) can be identified through direct optimization. Once these task-specific engrams are identified, they can be applied to practical scenarios such as continual learning, where Panigrahi et al. (2023) showed that grafting only the identified task-specific regions helps a model acquire new skills while forgetting less of what it already knows.
Identifying engrams in LLMs opens up a wide range of possibilities, including targeted memory editing. In the following section, we will delve into research that has explored memory editing techniques in LLMs.
Meng et al. (2022) took this idea further with ROME (Rank-One Model Editing), a method that locates and rewrites individual factual associations, such as “The Eiffel Tower is located in Paris,” stored in the MLP layers of a GPT-style model.
Finding engram cells
To edit a specific fact in the model, the authors first identified where that fact, or “engram,” is stored. Much like neuroscientists reactivating tagged engram cells to probe stored memories, they used causal tracing: corrupting the subject tokens of a prompt and then restoring individual hidden states to see which ones recover the correct answer. This analysis pointed to the MLP modules in the middle layers as the site where factual associations are stored.
Injecting a False Memory
Suppose memory A encodes the factual association “The Eiffel Tower is located in Paris,” and memory B encodes “The Colosseum is located in Rome” (see Fig. 4, top). Now we want to inject a false memory into the model, such as “The Eiffel Tower is located in Rome.” To do this, we need to identify the specific engram and its corresponding firing pattern. The authors hypothesize that the MLP module can be modeled as a linear associative memory: in the output linear layer of the MLP, a vector \(k\) serves as the input, and the value vector is \(v = Wk\), where \(W\) acts as the linear associative memory.
To determine the firing pattern \(k_A\) for engram A, the researchers use the prompt “The Eiffel Tower is located in” combined with various prefixed contexts, such as “The first thing. {prompt}” and “A few days ago. {prompt}.” Averaging the activity patterns at the middle MLP layer, where the factual-association engram is located, over these prefixed inputs yields a reliable estimate of the \(k_A\) vector that represents the engram’s firing pattern in that layer. The corresponding value vector for this pattern is \(v_A = Wk_A\), which leads the model to output “Paris.”
Next, to implant the false memory, the authors optimize a new value vector \(v_B\) that leads the model to the false output “Rome.” Using gradient descent, the value vector is iteratively adjusted until \(v_B\) reliably produces “Rome.”
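The sketch below illustrates both steps on a toy stand-in model; it is our own construction, not the ROME codebase. Here `get_mlp_key` plays the role of a forward hook that reads the key entering the MLP output projection, `logits_with_value` plays the role of re-running the model with that MLP output replaced by a candidate vector, and `target_token_id` is a pretend id for the token “ Rome.”

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_mlp, d_model, vocab_size = 32, 16, 100        # toy sizes; real models are far larger

# Toy stand-ins for pieces of a transformer (assumptions, not the real model):
W = torch.randn(d_model, d_mlp) / d_mlp**0.5    # MLP output projection at the middle layer
U = torch.randn(vocab_size, d_model)            # unembedding ("readout") matrix
target_token_id = 42                            # pretend this is the token " Rome"

def get_mlp_key(text: str) -> torch.Tensor:
    """Stand-in for a forward hook: return the activation entering W for the
    subject's last token. Here we just hash the text into a deterministic vector."""
    g = torch.Generator().manual_seed(abs(hash(text)) % (2**31))
    return torch.randn(d_mlp, generator=g)

def logits_with_value(v: torch.Tensor) -> torch.Tensor:
    """Stand-in for re-running the model with the MLP output replaced by v."""
    return U @ v

prompt = "The Eiffel Tower is located in"
prefixes = ["", "The first thing. ", "A few days ago. "]

# k_A: average the key vector over several prefixed variants of the prompt.
k_A = torch.stack([get_mlp_key(p + prompt) for p in prefixes]).mean(dim=0)
v_A = W @ k_A                                   # current value (leads to "Paris" in the real model)

# v_B: optimize a value vector until the (toy) model predicts the new target token.
v_B = v_A.clone().requires_grad_(True)
optimizer = torch.optim.Adam([v_B], lr=0.5)
for step in range(100):
    loss = F.cross_entropy(logits_with_value(v_B).unsqueeze(0),
                           torch.tensor([target_token_id]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```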
After finding the firing patterns \(k_A\) and \(v_B\), the authors proposed a rank-one model editing (ROME) method (See Fig. 4 bottom). This method produces a weight change matrix, \(\Delta W\), which is then added to the original weight matrix, resulting in an updated matrix: \(\hat{W} = W + \Delta W\), where
\[\Delta W = \eta (v_B k_A^{T} - W k_A k_A^{T})C^{-1}\]Here, \(C\) approximates the covariance of existing key vectors; multiplying by its inverse \(C^{-1}\) directs the weight update so that changes to unrelated facts are minimized, and \(\eta\) is a scaling constant.
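Numerically, the edit is a single rank-one outer product. The sketch below uses toy dimensions and a random stand-in sample of keys to estimate \(C\) (in the original work, \(C\) is estimated from a large text corpus, which we do not reproduce here); all variable names are our own.

```python
import torch

torch.manual_seed(0)
d_mlp, d_model = 64, 32                       # toy sizes; real MLP layers are far larger

W = torch.randn(d_model, d_mlp) / d_mlp**0.5  # MLP output projection (the associative memory)
K = torch.randn(5000, d_mlp)                  # stand-in sample of key vectors seen by this layer
C = K.T @ K / K.shape[0]                      # (uncentered) covariance of the keys

k_A = torch.randn(d_mlp)                      # firing pattern of engram A
v_B = torch.randn(d_model)                    # value vector encoding the new (false) fact
eta = 1.0                                     # scaling constant

# Delta W = eta * (v_B k_A^T - W k_A k_A^T) C^{-1}
#         = eta * (v_B - W k_A) (C^{-1} k_A)^T, i.e. a rank-one matrix.
residual = v_B - W @ k_A                      # what the edit must add on top of the old value
direction = torch.linalg.solve(C, k_A)        # C^{-1} k_A (C is symmetric positive definite)
delta_W = eta * torch.outer(residual, direction)

W_hat = W + delta_W                           # edited weight matrix
```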
In neuroscience, Hebbian learning is summarized by “neurons that fire together, wire together.” Mathematically, this is often modeled as:
\[\Delta w_{ij} = \eta x_i y_j\]where \(w_{ij}\) is the weight of the connection between neuron \(i\) and neuron \(j\), \(x_i\) and \(y_j\) are the activations of neurons \(i\) and \(j\), and \(\eta\) is a learning rate.
This rule is similar to the update rule in ROME.
\[\Delta w_{ij} \propto [v_B k^{T}_A - W k_A k^{T}_A]_{ij} = (v_B)_i (k_A)_j - (v_A)_i (k_A)_j\]In addition to injecting a new association (\(v_B k_A^{T}\)), ROME also effectively “removes” the old association by subtracting \(v_A k_A^{T}\). This process resembles anti-Hebbian unlearning, as it eliminates the previous association to ensure it is fully replaced by the new one. This update rule aligns ROME with the principle of “engram cells that fire together, wire together.”
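This decomposition is easy to check numerically. The toy snippet below (our own illustration, ignoring the \(C^{-1}\) whitening term and the constant \(\eta\)) verifies that the update splits into a Hebbian term that writes the new association and an anti-Hebbian term that erases the old one.

```python
import torch

d_mlp, d_model = 8, 4
W = torch.randn(d_model, d_mlp)
k_A = torch.randn(d_mlp)
v_A = W @ k_A                           # old value currently stored for k_A
v_B = torch.randn(d_model)              # new value to associate with k_A

hebbian = torch.outer(v_B, k_A)         # "fire together, wire together": write the new pair
anti_hebbian = torch.outer(v_A, k_A)    # unlearn the old association for the same key

lhs = torch.outer(v_B, k_A) - W @ torch.outer(k_A, k_A)   # [v_B k_A^T - W k_A k_A^T]
rhs = hebbian - anti_hebbian                              # (v_B)_i (k_A)_j - (v_A)_i (k_A)_j
assert torch.allclose(lhs, rhs, atol=1e-5)
```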
The theoretical foundations of the ROME method align with biological mechanisms demonstrated in neuroscience experiments. For instance, Ohkawa et al. (2015) artificially linked two separately stored memories in mice: by synchronously reactivating the engram cells encoding a neutral context together with those encoding an aversive experience, they created a false fear memory of a context in which the animals had never been shocked.
This mechanism resembles the ROME method’s approach of pairing a specific \(k_A\) (representing the “neutral context” of the Eiffel Tower in Paris) with a new \(v_B\) (representing the fear-associated memory, or in this case, Rome). Both rely on synchronizing distinct ensembles—whether neurons in the brain or patterns in an artificial model—to generate a qualitatively new association.
The findings from Ohkawa et al. suggest that a qualitatively new memory can be created simply by co-activating existing neural ensembles, without any new external experience, which mirrors how ROME writes a new key–value association directly into the model’s weights.
This exploration of the underlying mathematical framework reveals that ROME is not merely an engineering technique; it represents a sophisticated application of associative memory principles in AI.
This blog post offered a new interpretation of two LLM studies through the lens of neuroscience. Interestingly, although neuroscience and AI research are largely conducted independently, LLM research can be productively viewed through the framework of neuroscience.
LLMs are highly complex systems, and it is challenging to understand how they process and handle information. The brain is an equally complex system. Researchers in both domains share a common goal: to understand these intricate “black box” systems. Achieving this understanding could enable us to edit or control LLMs to prevent harmful behaviors; in neuroscience, such insights could contribute to treatments for Alzheimer’s disease and other brain-related conditions.
We are fascinated by the connection between the two AI studies introduced here and the concept of engrams. Moving forward, by integrating neuroscience insights with cutting-edge AI research, we may uncover new ways to make LLMs more transparent, adaptable, and safe for everyday use. This effort might bring us closer to a future where AI models are as understandable and trustworthy as we hope our own minds to be.