AI Associative Memories
Human memory and computer memory are fundamentally different. A key distinction is the way that memory is accessed. Computer memory, which consists of long lists of zeroes and ones (known as bits), is accessed by an address system. The address, which is itself stored in memory, defines the precise position of an object in the computer memory. This model of storage is referred to as random access memory (RAM), because information is represented as a list and we can access any position in that list at will using its address.
Despite its simple architecture, computer memory is able to store complicated data structures. For example, it could be that we want to store a list of images. Computer images are stored in various formats that ultimately represent the image as a series of bits. Therefore, a list of images can be stored in computer memory as a long string of bits. To access an image in the list, we just need to know where the image starts in the bit string and its length. These operations, which convert strings of bits into images we can view on a screen, and vice versa, are performed by the software running on the computer.
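As an illustration, this addressing scheme can be sketched in a few lines of Python. The byte strings standing in for images are placeholders, and the function names are our own, not part of any real image library.

```python
# A sketch of address-based storage: one long run of bytes plus an
# index of (offset, length) pairs recording where each image starts
# and how long it is. The byte strings here are placeholders, not
# real image data.
memory = b""
index = []   # index[i] = (offset, length) of the i-th stored image

def store(blob: bytes) -> int:
    """Append a blob to memory and record its address and length."""
    global memory
    index.append((len(memory), len(blob)))
    memory += blob
    return len(index) - 1

def fetch(i: int) -> bytes:
    """Retrieve image i by slicing memory at its recorded address."""
    offset, length = index[i]
    return memory[offset:offset + length]

first = store(b"\x89PNG-placeholder")
second = store(b"JFIF-placeholder")
fetch(second)   # returns b"JFIF-placeholder"
```

Note that `fetch` needs an exact address: there is no way to ask this structure for "the image most like this one".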
While storage of this form underpins the digital revolution, it is not the way that humans store memories. Human memory does not represent objects as a list that persists with high fidelity and is accessed using an address. Instead, it is associative. For example, if we consider an idea like 'Oxford', our memory associates it with other ideas, such as 'university', 'England', or 'Tolkien'. These concepts are connected in our brain in ways that are not fully understood. The connections appear to be built by repeatedly encountering the ideas together or by reasoning.
Human memory tolerates noise, allowing us to make connections when, for example, we hear a word pronounced badly or read a sentence printed with some letters missing. In both human memory and the computer-based data structures we will examine later, the ability to find closely related objects or ideas from a partial or imperfect starting point can arise as an intrinsic property of the recall process itself, rather than requiring a separate noise-elimination step beforehand.
In the case of computer memory, we can create data structures that connect pairs of objects, known as key-value pairs. For example, we could assign the values ‘green’ and ‘red’ to the keys ‘lime’ and ‘tomato’ respectively. A mis-spelt key, such as ‘tamoto’, cannot produce a meaningful value because it does not exist in the data structure. To associate the mis-spelt key ‘tamoto’ with the value ‘red’ would typically require a preliminary noise-correction step applied to the key before we look up the value. But there are computer data structures that perform the two steps at once.
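A minimal sketch of such a two-in-one lookup in Python, falling back to the nearest stored key by Hamming distance when the exact key is missing. The colour data and the distance cut-off are illustrative choices, not a standard library feature.

```python
# Key-value lookup that tolerates noisy keys: exact match first, then
# the nearest stored key of equal length by Hamming distance, subject
# to a cut-off so wildly different keys still fail.
colours = {"lime": "green", "tomato": "red"}

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def fuzzy_get(key: str, max_dist: int = 2):
    if key in colours:                       # exact match first
        return colours[key]
    candidates = [k for k in colours if len(k) == len(key)]
    if not candidates:
        return None
    best = min(candidates, key=lambda k: hamming(k, key))
    return colours[best] if hamming(best, key) <= max_dist else None

fuzzy_get("tamoto")   # resolves to the value stored under 'tomato'
```

Restricting candidates to equal-length keys keeps the Hamming distance well defined; a more general scheme would use an edit distance that allows insertions and deletions.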
Data structures that are accessed using another object, rather than an address, are known as associative memories or content-addressable memories.
We can design data structures which, given one data point, will identify the stored data point it most closely resembles. This allows the data structure to find information based on partial or distorted input.
We could achieve this by exhaustively iterating through all data points in memory, calculating a distance between each of them and the input. But this becomes very slow for large amounts of data, and it is probably not how our brains achieve associative memory. We would like a data structure that provides faster access.
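For concreteness, here is the exhaustive approach just described, sketched in Python over a toy set of bit strings. It compares the input against every stored item, which is exactly the linear cost we would like to avoid.

```python
# Exhaustive associative recall: compare the query against every
# stored bit string and return the closest match. The cost grows
# linearly with the number of stored items.
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

stored = ["10110", "00101", "11100"]   # toy data

def nearest(query: str) -> str:
    return min(stored, key=lambda s: hamming(s, query))

nearest("00111")   # returns "00101", the stored string one bit away
```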
We can see how this might work by looking at how bit strings of a given length can be represented in a way that allows fast inexact matching. We first need to define a distance between two data points, for example the Hamming distance, which counts the number of non-matching bits. This is a natural definition of distance for bit strings, but for more complex objects there may be other, more relevant ways of quantifying distance.
Definition of Hamming Distance
The Hamming distance between two strings of equal length is the number of positions at which the corresponding elements (characters or bits) are different. It measures the minimum number of substitutions required to change one string into the other.
- Error Detection and Correction: Used in coding theory to detect and correct errors in transmitted data.
- Genetics: Measures differences in DNA or protein sequences.
- Machine Learning: Used in clustering and distance-based similarity measures.
The Hamming distance is particularly useful in comparing fixed-length binary data or sequences with categorical attributes.
Hopfield Networks
We now examine an influential concept, attributed to J. J. Hopfield, that simulates associative memory in a rather intuitive way, mimicking how our brains might actually store information.
Consider how a normal computer works. Its state, including all its variables, the code which is running on it, and its inputs, can be described by one long binary string. In each step of the program, it changes its own bits until it reaches a state it considers to be terminal, which is when the computer program ends.
Every state is either a terminal state or not; if not, the computer continues to progress towards a terminal state, unless it enters an infinite loop. We will use a similar state-based picture to imagine how a real network of neurons might perform a computation. We consider each neuron to be in one of two states, and from the states of the neurons we calculate an overall property of the network that we refer to as its energy. Based on a predefined rule, we update the network and stop when no further update can take place, at which point we return the binary string describing the states of the neurons. For the Hopfield network, we take the neuron states to be −1 and 1, rather than 0 and 1; this simplifies the mathematics below.
Hopfield networks in their basic form are relatively simple constructs. Imagine a complete graph, meaning that every pair of vertices is connected by an edge. Each vertex represents a neuron and has a state of −1 or 1. Edges have weights describing how strongly the vertices (neurons) are connected.
We model the neurons as binary threshold neurons, meaning that the state each neuron takes is determined by a function, calculated from the states of the rest of the network, and a predetermined threshold. The state of the neuron indicates whether the function exceeds the threshold (1) or not (−1).
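Putting the pieces together, here is a minimal Hopfield network in Python: weights set by the standard Hebbian rule, zero thresholds, asynchronous binary-threshold updates, and the conventional energy E = −½ Σᵢⱼ wᵢⱼ sᵢ sⱼ. The two stored patterns and the network size are illustrative choices.

```python
# A minimal Hopfield network with states in {-1, +1}. Weights follow
# the Hebbian rule w_ij = sum over stored patterns of p_i * p_j, with
# w_ii = 0 (no self-connections). Thresholds are taken as zero.
import random

patterns = [[1, -1, 1, -1, 1, -1],
            [1, 1, 1, -1, -1, -1]]
n = len(patterns[0])

W = [[0 if i == j else sum(p[i] * p[j] for p in patterns)
      for j in range(n)] for i in range(n)]

def energy(s):
    """Conventional Hopfield energy E = -1/2 * sum_ij w_ij s_i s_j."""
    return -0.5 * sum(W[i][j] * s[i] * s[j]
                      for i in range(n) for j in range(n))

def recall(s, steps=200, seed=0):
    """Asynchronously update one random neuron at a time."""
    rng = random.Random(seed)
    s = list(s)
    for _ in range(steps):
        i = rng.randrange(n)
        field = sum(W[i][j] * s[j] for j in range(n))
        s[i] = 1 if field >= 0 else -1   # binary threshold, theta = 0
    return s

noisy = [1, -1, 1, -1, 1, 1]   # first pattern with its last bit flipped
recall(noisy)                  # settles back to patterns[0]
```

Each update can only lower (or keep) the energy, so the dynamics slide downhill until the state reaches a stable attractor, which is ideally the stored pattern closest to the noisy input.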
Applications
- Associative Memory: Store and retrieve patterns like images or data sequences.
- Optimization Problems: Solve problems by finding energy minima in a system.
- Error Correction: Recall correct patterns from noisy or partial inputs.
Hopfield networks illustrate how simple neural dynamics can lead to powerful pattern storage and recall capabilities.
Summary of Associative Memories
Associative memories are computational systems that store patterns or information and can retrieve them based on partial or noisy inputs. They mimic the way the human brain recalls information by association rather than direct indexing. Key examples include Hopfield Networks and Content-Addressable Memories (CAMs).
Key Characteristics
1. Pattern Storage: Associative memories store a collection of patterns or data points.
2. Pattern Recall: They retrieve the correct stored pattern when presented with a noisy or incomplete version.
3. Parallel Updates: All units (neurons or memory cells) work together to recall a stored pattern.
4. Robustness: Effective at recalling patterns even with errors or noise in the input.
Types of Associative Memories
1. Auto-Associative Memories:
* Recall a complete pattern from a partial or noisy input.
* Example: Hopfield Networks, where patterns converge to stable states (attractors).
2. Hetero-Associative Memories:
* Map an input pattern to a completely different output pattern.
* Example: Translating input text into corresponding binary representations.
Key Concepts
1. Energy Minimization: Associative memories, like Hopfield Networks, use an energy function to find stable patterns. Lower energy corresponds to stored patterns.
2. Storage Capacity: The number of patterns a system can store effectively before errors occur.
3. Generalization: Ability to interpolate between stored patterns based on noisy input.
Applications
- Pattern Recognition: Recognizing images, text, or audio with missing or distorted elements.
- Error Correction: Recovering corrupted data in communication systems.
- Optimization Problems: Solving combinatorial problems by finding optimal states.
Conclusion
Associative memories are powerful tools for recalling information in a way that mirrors human memory. Their ability to handle noise and incomplete inputs makes them invaluable in areas like machine learning, signal processing, and optimization.
