Hopfield network

A Hopfield network (or associative memory) is a form of recurrent neural network, or a spin glass system, that can serve as a content-addressable memory. The Hopfield network, named for John Hopfield, consists of a single layer of neurons, where each neuron is connected to every other neuron except itself. These connections are bidirectional and symmetric, meaning the weight of the connection from neuron i to neuron j is the same as the weight from neuron j to neuron i. Patterns are associatively recalled by fixing certain inputs and letting the network evolve dynamically to minimize an energy function, moving towards local energy minimum states that correspond to stored patterns. Patterns are associatively learned (or "stored") by a Hebbian learning algorithm.

One of the key features of Hopfield networks is their ability to recover complete patterns from partial or noisy inputs, making them robust in the face of incomplete or corrupted data. Their connection to statistical mechanics, recurrent networks, and human cognitive psychology has led to their application in various fields, including physics, psychology, neuroscience, and machine learning theory and practice.

History

One origin of the associative memory model is human cognitive psychology, specifically the study of associative memory. Frank Rosenblatt studied "close-loop cross-coupled perceptrons", which are 3-layered perceptron networks whose middle layer contains recurrent connections that change by a Hebbian learning rule.^[1]: 73–75 ^[2]: Chapter 19, 21

Another model of associative memory is where the output does not loop back to the input. W. K. Taylor proposed such a model trained by Hebbian learning in 1956.^[3] Karl Steinbuch, who wanted to understand learning, and inspired by watching his children learn,^[4] published the Lernmatrix in 1961.^[5]^[6] It was translated to English in 1963.^[7] Similar research was done with the correlogram of D. J. Willshaw et al. in 1969.^[8] Teuvo Kohonen trained an associative memory by gradient descent in 1974.^[9]

Another origin of associative memory was statistical mechanics. The Ising model was published in the 1920s as a model of magnetism; however, it described only thermal equilibrium, which does not change with time. Roy J. Glauber in 1963 studied the Ising model evolving in time, as a process towards thermal equilibrium (Glauber dynamics), adding in the component of time.^[10]

The second component to be added was adaptation to stimulus. Kaoru Nakano in 1971^[11]^[12] and Shun'ichi Amari in 1972^[13] independently proposed modifying the weights of an Ising model by a Hebbian learning rule as a model of associative memory. The same idea was published by William A. Little in 1974,^[14] who was acknowledged by Hopfield in his 1982 paper.

See Carpenter (1989)^[15] and Cowan (1990)^[16] for a technical description of some of these early works in associative memory.

The Sherrington–Kirkpatrick model of spin glass, published in 1975,^[17] is the Hopfield network with random initialization. Sherrington and Kirkpatrick found that it is highly likely for the energy function of the SK model to have many local minima. In the 1982 paper, Hopfield applied this recently developed theory to study the Hopfield network with binary activation functions.^[18] In a 1984 paper he extended this to continuous activation functions.^[19] It became a standard model for the study of neural networks through statistical mechanics.^[20]^[21]

A major advance in memory storage capacity was developed by Dimitry Krotov and Hopfield in 2016^[22] through a change in network dynamics and energy function. This idea was further extended by Demircigil and collaborators in 2017.^[23] The continuous dynamics of large memory capacity models was developed in a series of papers between 2016 and 2020.^[22]^[24]^[25] Large memory storage capacity Hopfield Networks are now called Dense Associative Memories or modern Hopfield networks.

In 2024, John J. Hopfield and Geoffrey E. Hinton were awarded the Nobel Prize in Physics for their foundational contributions to machine learning, such as the Hopfield network.

Structure

The units in Hopfield nets are binary threshold units, i.e. the units only take on two different values for their states, and the value is determined by whether or not the unit's input exceeds its threshold $U_{i}$ . Discrete Hopfield nets describe relationships between binary (firing or not-firing) neurons $1,2,\ldots ,i,j,\ldots ,N$ .^[18] At a certain time, the state of the neural net is described by a vector $V$ , which records which neurons are firing in a binary word of $N$ bits.

The interactions $w_{ij}$ link units whose states usually take on values of 1 or −1, and this convention will be used throughout this article. However, other literature might use units that take values of 0 and 1. These interactions are "learned" via Hebb's law of association, such that, for a certain state $V^{s}$ and distinct nodes $i,j$

$w_{ij}=V_{i}^{s}V_{j}^{s}$

but $w_{ii}=0$ .

(Note that the Hebbian learning rule takes the form $w_{ij}=(2V_{i}^{s}-1)(2V_{j}^{s}-1)$ when the units assume values in $\{0,1\}$ .)

Once the network is trained, $w_{ij}$ no longer evolve. If a new state of neurons $V^{s'}$ is introduced to the neural network, the net acts on neurons such that

$V_{i}^{s'}\rightarrow 1$ if $\sum _{j}w_{ij}V_{j}^{s'}>U_{i}$
$V_{i}^{s'}\rightarrow -1$ if $\sum _{j}w_{ij}V_{j}^{s'}<U_{i}$

where $U_{i}$ is the threshold value of the i'th neuron (often taken to be 0).^[26] In this way, Hopfield networks have the ability to "remember" states stored in the interaction matrix, because if a new state $V^{s'}$ is subjected to the interaction matrix, each neuron will change until it matches the original state $V^{s}$ (see the Updates section below).

The connections in a Hopfield net typically have the following restrictions:

$w_{ii}=0,\forall i$ (no unit has a connection with itself)
$w_{ij}=w_{ji},\forall i,j$ (connections are symmetric)

The constraint that weights are symmetric guarantees that the energy function decreases monotonically while following the activation rules.^[27] A network with asymmetric weights may exhibit some periodic or chaotic behaviour; however, Hopfield found that this behavior is confined to relatively small parts of the phase space and does not impair the network's ability to act as a content-addressable associative memory system.

Hopfield also modeled neural nets for continuous values, in which the electric output of each neuron is not binary but some value between 0 and 1.^[19] He found that this type of network was also able to store and reproduce memorized states.

Notice that every pair of units i and j in a Hopfield network has a connection that is described by the connectivity weight $w_{ij}$ . In this sense, the Hopfield network can be formally described as a complete undirected graph $G=\langle V,f\rangle$ , where $V$ is a set of McCulloch–Pitts neurons and $f:V^{2}\rightarrow \mathbb {R}$ is a function that links pairs of units to a real value, the connectivity weight.

Updating

Updating one unit (node in the graph simulating the artificial neuron) in the Hopfield network is performed using the following rule:

$s_{i}\leftarrow \left\{{\begin{array}{ll}+1&{\text{if }}\sum _{j}{w_{ij}s_{j}}\geq \theta _{i},\\-1&{\text{otherwise.}}\end{array}}\right.$

where:

$w_{ij}$ is the strength of the connection weight from unit j to unit i (the weight of the connection).
$s_{i}$ is the state of unit i.
$\theta _{i}$ is the threshold of unit i.

Updates in the Hopfield network can be performed in two different ways:

Asynchronous: Only one unit is updated at a time. This unit can be picked at random, or a pre-defined order can be imposed from the very beginning.
Synchronous: All units are updated at the same time. This requires a central clock to the system in order to maintain synchronization. This method is viewed by some as less realistic, based on an absence of observed global clock influencing analogous biological or physical systems of interest.
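The two update modes can be compared with a minimal sketch. The function names, the list-of-lists weight layout, and the two-unit example are assumptions made for illustration; only the threshold rule itself comes from the text above.

```python
import random

def update_unit(weights, state, thresholds, i):
    """Threshold rule: +1 if sum_j w_ij * s_j >= theta_i, otherwise -1."""
    field = sum(w * s for w, s in zip(weights[i], state))
    return 1 if field >= thresholds[i] else -1

def asynchronous_sweep(weights, state, thresholds, rng=None):
    """Update one unit at a time; each update sees the earlier ones.

    Units are visited in index order, or in random order if rng is given.
    """
    state = list(state)
    order = list(range(len(state)))
    if rng is not None:
        rng.shuffle(order)
    for i in order:
        state[i] = update_unit(weights, state, thresholds, i)
    return state

def synchronous_step(weights, state, thresholds):
    """Update all units at once from the same previous state."""
    return [update_unit(weights, state, thresholds, i)
            for i in range(len(state))]

# Hypothetical two-unit network with one negative (repelling) weight.
w = [[0, -1],
     [-1, 0]]
theta = [0, 0]

print(synchronous_step(w, [1, 1], theta))    # [-1, -1]
print(synchronous_step(w, [-1, -1], theta))  # [1, 1]
print(asynchronous_sweep(w, [1, 1], theta))  # [-1, 1]
print(asynchronous_sweep(w, [1, 1], theta, rng=random.Random(0)))
```

In this small example the synchronous updates alternate between two states, while the asynchronous sweep settles into a state that further updates leave unchanged.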

Neurons "attract or repel each other" in state space

The weight between two units has a powerful impact upon the values of the neurons. Consider the connection weight $w_{ij}$ between two neurons i and j. If $w_{ij}>0$ , the updating rule implies that:

when $s_{j}=1$ , the contribution of j in the weighted sum is positive. Thus, $s_{i}$ is pulled by j towards its value $s_{i}=1$
when $s_{j}=-1$ , the contribution of j in the weighted sum is negative. Then again, $s_{i}$ is pushed by j towards its value $s_{i}=-1$

Thus, the values of neurons i and j will converge if the weight between them is positive. Similarly, they will diverge if the weight is negative.

Convergence properties of discrete and continuous Hopfield networks

Bruck in his paper in 1990^[28] studied discrete Hopfield networks and proved a generalized convergence theorem that is based on the connection between the network's dynamics and cuts in the associated graph. This generalization covered both asynchronous as well as synchronous dynamics and presented elementary proofs based on greedy algorithms for max-cut in graphs. A subsequent paper^[29] further investigated the behavior of any neuron in both discrete-time and continuous-time Hopfield networks when the corresponding energy function is minimized during an optimization process. Bruck showed^[28] that neuron j changes its state if and only if it further decreases the following biased pseudo-cut. The discrete Hopfield network minimizes the following biased pseudo-cut^[29] for the synaptic weight matrix of the Hopfield net.

$J_{pseudo-cut}(k)=\sum _{i\in C_{1}(k)}\sum _{j\in C_{2}(k)}w_{ij}+\sum _{j\in C_{1}(k)}{\theta _{j}}$

where $C_{1}(k)$ and $C_{2}(k)$ represent the sets of neurons which are −1 and +1, respectively, at time $k$ . For further details, see the recent paper.^[29]

The discrete-time Hopfield Network always minimizes exactly the following pseudo-cut^[28]^[29]

$U(k)=\sum _{i=1}^{N}\sum _{j=1}^{N}w_{ij}(s_{i}(k)-s_{j}(k))^{2}+2\sum _{j=1}^{N}\theta _{j}s_{j}(k)$

The continuous-time Hopfield network always minimizes an upper bound to the following weighted cut^[29]

$V(t)=\sum _{i=1}^{N}\sum _{j=1}^{N}w_{ij}(f(s_{i}(t))-f(s_{j}(t)))^{2}+2\sum _{j=1}^{N}\theta _{j}f(s_{j}(t))$

where $f(\cdot )$ is a zero-centered sigmoid function.

The complex Hopfield network, on the other hand, generally tends to minimize the so-called shadow-cut of the complex weight matrix of the net.^[30]

Energy

Hopfield nets have a scalar value associated with each state of the network, referred to as the "energy", E, of the network, where:

$E=-{\frac {1}{2}}\sum _{i,j}w_{ij}s_{i}s_{j}+\sum _{i}\theta _{i}s_{i}$

This quantity is called "energy" because it either decreases or stays the same upon network units being updated. Furthermore, under repeated updating the network will eventually converge to a state which is a local minimum in the energy function (which is considered to be a Lyapunov function).^[18] Thus, if a state is a local minimum in the energy function it is a stable state for the network. Note that this energy function belongs to a general class of models in physics under the name of Ising models; these in turn are a special case of Markov networks, since the associated probability measure, the Gibbs measure, has the Markov property.
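The claim that the energy never increases under updating can be checked numerically. The sketch below assumes all thresholds are 0 (so the threshold term of the energy vanishes), symmetric weights with zero diagonal, and asynchronous updates in index order; the weight matrix and the function names are made up for illustration.

```python
def energy(weights, state):
    """E = -1/2 * sum_{i,j} w_ij * s_i * s_j, with all thresholds assumed 0."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))

def energy_trace(weights, state, sweeps=3):
    """Run asynchronous sweeps and record the energy after every unit update."""
    state = list(state)
    trace = [energy(weights, state)]
    for _ in range(sweeps):
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            state[i] = 1 if field >= 0 else -1
            trace.append(energy(weights, state))
    return state, trace

# Hypothetical symmetric weights with zero diagonal.
w = [[0, 1, -2],
     [1, 0, 3],
     [-2, 3, 0]]

final_state, trace = energy_trace(w, [1, -1, 1])
print(final_state)  # [-1, 1, 1]
print(trace[:4])    # [6.0, 0.0, -4.0, -4.0]
```

The recorded energies form a non-increasing sequence, and the final state is one that no single-unit update changes.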

Hopfield network in optimization

Hopfield and Tank presented the Hopfield network application in solving the classical traveling-salesman problem in 1985.^[31] Since then, the Hopfield network has been widely used for optimization. The idea of using the Hopfield network in optimization problems is straightforward: if a constrained or unconstrained cost function can be written in the form of the Hopfield energy function E, then there exists a Hopfield network whose equilibrium points represent solutions to the optimization problem. Minimizing the Hopfield energy function both minimizes the objective function and satisfies the constraints, since the constraints are "embedded" into the synaptic weights of the network. Although including the optimization constraints into the synaptic weights in the best possible way is a challenging task, many difficult constrained optimization problems in different disciplines have been converted to the Hopfield energy function, including associative memory systems, analog-to-digital conversion, the job-shop scheduling problem, quadratic assignment and other related NP-complete problems, the channel allocation problem in wireless networks, the mobile ad-hoc network routing problem, image restoration, system identification, and combinatorial optimization. However, while it is possible to convert hard optimization problems to Hopfield energy functions, doing so does not guarantee convergence to a solution (even in exponential time).^[32]

Initialization and running

Initialization of the Hopfield network is done by setting the values of the units to the desired start pattern. Repeated updates are then performed until the network converges to an attractor pattern. Convergence is generally assured, as Hopfield proved that the attractors of this nonlinear dynamical system are stable, not periodic or chaotic as in some other systems. Therefore, in the context of Hopfield networks, an attractor pattern is a final stable state, a pattern in which no value can change under updating.

Training

Training a Hopfield net involves lowering the energy of states that the net should "remember". This allows the net to serve as a content-addressable memory system; that is to say, the network will converge to a "remembered" state if it is given only part of the state. The net can be used to recover from a distorted input to the trained state that is most similar to that input. This is called associative memory because it recovers memories on the basis of similarity. For example, if we train a Hopfield net with five units so that the state (1, −1, 1, −1, 1) is an energy minimum, and we give the network the state (1, −1, −1, −1, 1), it will converge to (1, −1, 1, −1, 1). Thus, the network is properly trained when the states which the network should remember are local minima of the energy. Note that, in contrast to Perceptron training, the thresholds of the neurons are never updated.
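The five-unit example can be reproduced with a short sketch. It assumes the single-pattern Hebbian weights $w_{ij}=V_{i}^{s}V_{j}^{s}$ with $w_{ii}=0$ from the Structure section, zero thresholds, and asynchronous updates in index order; the function names are hypothetical.

```python
def train_single_pattern(pattern):
    """Hebbian weights for one pattern: w_ij = p_i * p_j, with w_ii = 0."""
    n = len(pattern)
    return [[0 if i == j else pattern[i] * pattern[j] for j in range(n)]
            for i in range(n)]

def recall(weights, state, max_sweeps=10):
    """Asynchronous updates (thresholds assumed 0) until no unit changes."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

stored = [1, -1, 1, -1, 1]
weights = train_single_pattern(stored)
print(recall(weights, [1, -1, -1, -1, 1]))  # [1, -1, 1, -1, 1]
```

Only the third unit disagrees with the stored pattern, and its weighted input from the other four units is positive, so a single update restores it.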

Learning rules

There are various different learning rules that can be used to store information in the memory of the Hopfield network. It is desirable for a learning rule to have both of the following two properties:

Local: A learning rule is local if each weight is updated using information available to neurons on either side of the connection that is associated with that particular weight.
Incremental: New patterns can be learned without using information from the old patterns that have also been used for training. That is, when a new pattern is used for training, the new values for the weights only depend on the old values and on the new pattern.^[33]

These properties are desirable, since a learning rule satisfying them is more biologically plausible. For example, since the human brain is always learning new concepts, one can reason that human learning is incremental. A learning system that was not incremental would generally be trained only once, with a huge batch of training data.

Hebbian learning rule for Hopfield networks

Hebbian theory was introduced by Donald Hebb in 1949 in order to explain "associative learning", in which simultaneous activation of neuron cells leads to pronounced increases in synaptic strength between those cells.^[34] It is often summarized as "Neurons that fire together wire together. Neurons that fire out of sync fail to link".

The Hebbian rule is both local and incremental. For the Hopfield networks, it is implemented in the following manner when learning $n$ binary patterns:

$w_{ij}={\frac {1}{n}}\sum _{\mu =1}^{n}\epsilon _{i}^{\mu }\epsilon _{j}^{\mu }$

where $\epsilon _{i}^{\mu }$ represents bit i from pattern $\mu$ .

If the bits corresponding to neurons i and j are equal in pattern $\mu$ , then the product $\epsilon _{i}^{\mu }\epsilon _{j}^{\mu }$ will be positive. This would, in turn, have a positive effect on the weight $w_{ij}$ and the values of i and j will tend to become equal. The opposite happens if the bits corresponding to neurons i and j are different.
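As a sketch of the rule above, the following computes the Hebbian weights for a small set of ±1 patterns. Setting the diagonal to zero follows the restriction $w_{ii}=0$ stated in the Structure section rather than the formula itself; the function name and the example patterns are illustrative assumptions.

```python
def hebbian_weights(patterns):
    """w_ij = (1/n) * sum over patterns of e_i * e_j, with w_ii set to 0."""
    n_patterns = len(patterns)
    n_units = len(patterns[0])
    weights = [[0.0] * n_units for _ in range(n_units)]
    for i in range(n_units):
        for j in range(n_units):
            if i != j:
                weights[i][j] = sum(p[i] * p[j] for p in patterns) / n_patterns
    return weights

patterns = [[1, 1, -1, -1],
            [1, -1, 1, -1]]
w = hebbian_weights(patterns)

# Units 0 and 3 differ in both patterns, so their weight is negative.
print(w[0][3])  # -1.0
# Units 0 and 1 agree in one pattern and differ in the other.
print(w[0][1])  # 0.0
```

Because each pattern contributes an independent term to the sum, a new pattern can be added without revisiting the old ones, which is the incremental property described earlier.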

Storkey learning rule

This rule was introduced by Amos Storkey in 1997 and is both local and incremental. Storkey also showed that a Hopfield network trained using this rule has a greater capacity than a corresponding network trained using the Hebbian rule.^[35] The weight matrix of an attractor neural network is said to follow the Storkey learning rule if it obeys:

$w_{ij}^{\nu }=w_{ij}^{\nu -1}+{\frac {1}{n}}\epsilon _{i}^{\nu }\epsilon _{j}^{\nu }-{\frac {1}{n}}\epsilon _{i}^{\nu }h_{ji}^{\nu }-{\frac {1}{n}}\epsilon _{j}^{\nu }h_{ij}^{\nu }$

where $h_{ij}^{\nu }=\sum _{k=1~:~i\neq k\neq j}^{n}w_{ik}^{\nu -1}\epsilon _{k}^{\nu }$ is a form of local field^[33] at neuron i.

This learning rule is local, since the synapses take into account only neurons at their sides. The rule makes use of more information from the patterns and weights than the generalized Hebbian rule, due to the effect of the local field.
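One incremental Storkey step can be sketched as follows. This reading assumes that $n$ is the number of neurons (as the sum over $k$ up to $n$ suggests) and keeps the diagonal at zero, which is an extra assumption taken from the restriction $w_{ii}=0$; the function name is hypothetical.

```python
def storkey_update(weights, pattern):
    """Return new weights after storing one +/-1 pattern with the Storkey rule."""
    n = len(pattern)

    def local_field(i, j):
        # h_ij: input to unit i from all units k other than i and j.
        return sum(weights[i][k] * pattern[k]
                   for k in range(n) if k != i and k != j)

    new_weights = [row[:] for row in weights]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: self-connections stay at zero
            delta = (pattern[i] * pattern[j]
                     - pattern[i] * local_field(j, i)
                     - pattern[j] * local_field(i, j)) / n
            new_weights[i][j] = weights[i][j] + delta
    return new_weights

w0 = [[0.0] * 3 for _ in range(3)]
w1 = storkey_update(w0, [1, -1, 1])   # first pattern: local fields are all 0
w2 = storkey_update(w1, [1, 1, -1])   # second pattern uses the old weights
```

Starting from zero weights, the local fields vanish and the first step coincides with the Hebbian term; later steps subtract the local-field terms computed from the previous weights only, so the rule stays incremental.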

Spurious patterns

Patterns that the network uses for training (called retrieval states) become attractors of the system. Repeated updates would eventually lead to convergence to one of the retrieval states. However, sometimes the network will converge to spurious patterns (different from the training patterns).^[36] In fact, the number of spurious patterns can be exponential in the number of stored patterns, even if the stored patterns are orthogonal.^[37] The energy in these spurious patterns is also a local minimum. For each stored pattern x, the negation -x is also a spurious pattern.

A spurious state can also be a linear combination of an odd number of retrieval states. For example, when using 3 patterns $\mu _{1},\mu _{2},\mu _{3}$ , one can get the following spurious state:

$\epsilon _{i}^{\rm {mix}}=\pm \operatorname {sgn}(\pm \epsilon _{i}^{\mu _{1}}\pm \epsilon _{i}^{\mu _{2}}\pm \epsilon _{i}^{\mu _{3}})$

Spurious patterns that mix an even number of retrieval states cannot exist, since the terms might sum up to zero.^[36]
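Both kinds of spurious state mentioned above can be illustrated with a small sketch. It assumes single-pattern Hebbian weights with zero diagonal and zero thresholds for the negation check, and it only computes a mixture state from the formula without claiming that this particular mixture is stable; all names and example patterns are hypothetical.

```python
def single_pattern_weights(pattern):
    """w_ij = p_i * p_j with zero diagonal."""
    n = len(pattern)
    return [[0 if i == j else pattern[i] * pattern[j] for j in range(n)]
            for i in range(n)]

def is_stable(weights, state):
    """True if no unit changes under the threshold rule (thresholds 0)."""
    for i in range(len(state)):
        field = sum(w * s for w, s in zip(weights[i], state))
        if (1 if field >= 0 else -1) != state[i]:
            return False
    return True

def mixture_state(patterns, signs):
    """Sign of a signed sum of an odd number of patterns (never zero)."""
    mixed = []
    for i in range(len(patterns[0])):
        total = sum(sign * p[i] for sign, p in zip(signs, patterns))
        mixed.append(1 if total > 0 else -1)
    return mixed

stored = [1, -1, 1, -1, 1]
w = single_pattern_weights(stored)
negated = [-s for s in stored]
print(is_stable(w, stored), is_stable(w, negated))  # True True

three = [[1, 1, 1, -1], [1, -1, 1, 1], [-1, 1, 1, 1]]
print(mixture_state(three, [1, 1, 1]))   # [1, 1, 1, 1]
print(mixture_state(three, [1, 1, -1]))  # [1, -1, 1, -1]
```

With an odd number of ±1 terms the sum at each position is odd and therefore non-zero, which is why the sign in the mixture formula is always defined.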

Capacity

The storage capacity of the Hopfield network model is determined by the number of neurons and connections within a given network, so the number of memories that can be stored depends on both. It was shown that the ratio of recallable vectors to nodes is about 0.138 (approximately 138 vectors can be recalled from storage for every 1000 nodes) (Hertz et al., 1991). Many recall errors will therefore occur if one tries to store more vectors than this. When the Hopfield model does not recall the right pattern, it is possible that an intrusion has taken place: semantically related items tend to be confused, and the wrong pattern is recalled. In this way the Hopfield network model can confuse one stored item with another upon retrieval. Perfect recall and a higher capacity, above 0.14, can be achieved with the Storkey learning method and with ETAM;^[38]^[39] ETAM experiments are also reported in.^[40] Later models inspired by the Hopfield network were devised to raise the storage limit and reduce the retrieval error rate, with some being capable of one-shot learning.^[41]

The storage capacity can be given as $C\cong {\frac {n}{2\log _{2}n}}$ where $n$ is the number of neurons in the net.
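For a sense of scale, the two capacity figures quoted in this section can be evaluated for a given network size. This is plain arithmetic on the stated formulas; the function names and the choice of 1024 neurons are illustrative assumptions.

```python
import math

def capacity_estimate(n_neurons):
    """C ~ n / (2 * log2(n)), the storage capacity formula given above."""
    return n_neurons / (2 * math.log2(n_neurons))

def patterns_at_ratio(n_neurons, ratio=0.138):
    """Number of vectors at the quoted ratio of about 0.138 per node."""
    return ratio * n_neurons

print(capacity_estimate(1024))   # 51.2
print(patterns_at_ratio(1000))   # 138.0
```

The two figures measure different things: the logarithmic formula is considerably more conservative than the 0.138 ratio for networks of this size.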

Human memory

The Hopfield network is a model for human associative learning and recall.^[42]^[43] It accounts for associative memory through the incorporation of memory vectors. A memory vector can be presented only partially, and this cue sparks the retrieval of the most similar vector in the network. However, this process also allows intrusions to occur. In associative memory for the Hopfield network, there are two types of operations: auto-association and hetero-association. The first is when a vector is associated with itself, and the latter is when two different vectors are associated in storage. Both types of operations can be stored within a single memory matrix, but only if that matrix represents a combination (auto-associative and hetero-associative) of the two rather than one or the other alone.

Hopfield's network model utilizes the same learning rule as Hebb's (1949) learning rule, which characterised learning as being a result of the strengthening of the weights in cases of neuronal activity.

Rizzuto and Kahana (2001) were able to show that the neural network model can account for the effect of repetition on recall accuracy by incorporating a probabilistic-learning algorithm. During the retrieval process, no learning occurs. As a result, the weights of the network remain fixed, showing that the model is able to switch from a learning stage to a recall stage. By adding contextual drift they were able to show the rapid forgetting that occurs in a Hopfield model during a cued-recall task. The entire network contributes to the change in the activation of any single node.

McCulloch and Pitts' (1943) dynamical rule describes how the activations of multiple neurons map onto the firing rate of a new neuron, and how the weights strengthen the synaptic connections between the newly activated neuron and those that activated it. Hopfield used the McCulloch–Pitts dynamical rule to show how retrieval is possible in the Hopfield network, applying it repeatedly and with a nonlinear rather than a linear activation function. The result is the Hopfield dynamical rule, with which Hopfield was able to show that, with the nonlinear activation function, the dynamical rule will always modify the values of the state vector in the direction of one of the stored patterns.

Dense associative memory or modern Hopfield network

Hopfield networks^[18]^[19] are recurrent neural networks with dynamical trajectories converging to fixed point attractor states and described by an energy function. The state of each model neuron ${\textstyle i}$ is defined by a time-dependent variable $V_{i}$ , which can be chosen to be either discrete or continuous. A complete model describes the mathematics of how the future state of activity of each neuron depends on the known present or previous activity of all the neurons.

In the original Hopfield model of associative memory,^[18] the variables were binary, and the dynamics were described by a one-at-a-time update of the state of the neurons. An energy function quadratic in the $V_{i}$ was defined, and the dynamics consisted of changing the activity of each single neuron $i$ only if doing so would lower the total energy of the system. This same idea was extended to the case of $V_{i}$ being a continuous variable representing the output of neuron $i$ , and $V_{i}$ being a monotonic function of an input current. The dynamics became expressed as a set of first-order differential equations for which the "energy" of the system always decreased.^[19] The energy in the continuous case has one term which is quadratic in the $V_{i}$ (as in the binary model), and a second term which depends on the gain function (neuron's activation function). While having many desirable properties of associative memory, both of these classical systems suffer from a small memory storage capacity, which scales linearly with the number of input features.^[18] In contrast, by increasing the number of parameters in the model so that there are not just pair-wise but also higher-order interactions between the neurons, one can increase the memory storage capacity.^[44]^[45]

Dense Associative Memories^[22] (also known as the modern Hopfield networks^[24]) are generalizations of the classical Hopfield Networks that break the linear scaling relationship between the number of input features and the number of stored memories. This is achieved by introducing stronger non-linearities (either in the energy function or neurons' activation functions) leading to super-linear^[22] (even an exponential^[23]) memory storage capacity as a function of the number of feature neurons, in effect increasing the order of interactions between the neurons.^[44]^[45] The network still requires a sufficient number of hidden neurons.^[25]

The key theoretical idea behind dense associative memory networks is to use an energy function and an update rule that is more sharply peaked around the stored memories in the space of neuron's configurations compared to the classical model,^[22] as demonstrated when the higher-order interactions and subsequent energy landscapes are explicitly modelled.^[45]

Discrete variables

A simple example^[22] of the modern Hopfield network can be written in terms of binary variables $V_{i}$ that represent the active $V_{i}=+1$ and inactive $V_{i}=-1$ state of the model neuron $i$ . Its energy function is

$E=-\sum \limits _{\mu =1}^{N_{\text{mem}}}F{\Big (}\sum \limits _{i=1}^{N_{f}}\xi _{\mu i}V_{i}{\Big )}$

In this formula the weights ${\textstyle \xi _{\mu i}}$ represent the matrix of memory vectors (index $\mu =1...N_{\text{mem}}$ enumerates different memories, and index $i=1...N_{f}$ enumerates the content of each memory corresponding to the $i$ -th feature neuron), and the function $F(x)$ is a rapidly growing non-linear function. The update rule for individual neurons (in the asynchronous case) can be written in the following form

$V_{i}^{(t+1)}=Sign{\bigg [}\sum \limits _{\mu =1}^{N_{\text{mem}}}{\bigg (}F{\Big (}\xi _{\mu i}+\sum \limits _{j\neq i}\xi _{\mu j}V_{j}^{(t)}{\Big )}-F{\Big (}-\xi _{\mu i}+\sum \limits _{j\neq i}\xi _{\mu j}V_{j}^{(t)}{\Big )}{\bigg )}{\bigg ]}$

which states that in order to calculate the updated state of the ${\textstyle i}$ -th neuron the network compares two energies: the energy of the network with the $i$ -th neuron in the ON state and the energy of the network with the $i$ -th neuron in the OFF state, given the states of the remaining neurons. The updated state of the $i$ -th neuron selects the state that has the lowest of the two energies.^[22]

In the limiting case when the non-linear energy function is quadratic $F(x)=x^{2}$ these equations reduce to the familiar energy function and the update rule for the classical binary Hopfield Network.^[18]
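The asynchronous update rule for discrete variables can be sketched directly from its formula. The choice $F(x)=x^{3}$, the two example memories, the convention that a zero argument of the sign function maps to +1, and the function names are all assumptions made for illustration.

```python
def dam_update(memories, state, i, F):
    """Updated value of neuron i: compare the energies with V_i = +1 and V_i = -1."""
    total = 0.0
    for xi in memories:
        rest = sum(xi[j] * state[j] for j in range(len(state)) if j != i)
        total += F(xi[i] + rest) - F(-xi[i] + rest)
    return 1 if total >= 0 else -1  # assumption: a tie is resolved as +1

def dam_sweep(memories, state, F):
    """One asynchronous pass over all neurons in index order."""
    state = list(state)
    for i in range(len(state)):
        state[i] = dam_update(memories, state, i, F)
    return state

def cube(x):
    return x ** 3

memories = [[1, -1, 1, -1, 1, -1],
            [1, 1, 1, -1, -1, -1]]
probe = [1, -1, 1, -1, 1, 1]  # first memory with its last bit flipped

print(dam_sweep(memories, probe, cube))  # [1, -1, 1, -1, 1, -1]
```

For $F(x)=x^{2}$ the difference inside the sum equals $4\xi _{\mu i}\sum _{j\neq i}\xi _{\mu j}V_{j}$ , so the same function reproduces the sign of the classical weighted input with Hebbian weights.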

The memory storage capacity of these networks can be calculated for random binary patterns. For the power energy function $F(x)=x^{n}$ the maximal number of memories that can be stored and retrieved from this network without errors is given by^[22]

$N_{\text{mem}}^{max}\approx {\frac {1}{2(2n-3)!!}}{\frac {N_{f}^{n-1}}{\ln(N_{f})}}$

For an exponential energy function ${\textstyle F(x)=e^{x}}$ the memory storage capacity is exponential in the number of feature neurons^[23]

$N_{\text{mem}}^{max}\approx 2^{N_{f}/2}$
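To see how quickly these capacities grow, the two expressions can be evaluated numerically. The sketch below is plain arithmetic on the stated formulas for $n\geq 2$ ; the function names and the choice of 100 feature neurons are illustrative assumptions.

```python
import math

def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def capacity_power(n_features, n):
    """N_mem^max for F(x) = x^n, following the power-law formula above."""
    return (n_features ** (n - 1)
            / (2 * double_factorial(2 * n - 3) * math.log(n_features)))

def capacity_exponential(n_features):
    """N_mem^max for F(x) = e^x."""
    return 2 ** (n_features / 2)

for n in (2, 3, 4):
    print(n, capacity_power(100, n))
print(capacity_exponential(100))
```

For $n=2$ the double factorial equals 1 and the expression reduces to $N_{f}/(2\ln N_{f})$ , a capacity that grows slightly slower than linearly in the number of feature neurons.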

Continuous variables

Modern Hopfield networks or dense associative memories can be best understood in continuous variables and continuous time.^[24]^[25] Consider the network architecture, shown in Fig.1, and the equations for the evolution of the neurons' states^[25]

${\begin{cases}\tau _{f}{\frac {dx_{i}}{dt}}=\sum \limits _{\mu =1}^{N_{h}}\xi _{i\mu }f_{\mu }-x_{i}+I_{i}\\\tau _{h}{\frac {dh_{\mu }}{dt}}=\sum \limits _{i=1}^{N_{f}}\xi _{\mu i}g_{i}-h_{\mu }\end{cases}}$

(1)

where the currents of the feature neurons are denoted by ${\textstyle x_{i}}$ , and the currents of the memory neurons are denoted by $h_{\mu }$ ( $h$ stands for hidden neurons). There are no synaptic connections among the feature neurons or the memory neurons. A matrix $\xi _{\mu i}$ denotes the strength of synapses from a feature neuron $i$ to the memory neuron $\mu$ . The synapses are assumed to be symmetric, so that the same value characterizes a different physical synapse from the memory neuron $\mu$ to the feature neuron $i$ . The outputs of the memory neurons and the feature neurons are denoted by $f_{\mu }$ and $g_{i}$ , which are non-linear functions of the corresponding currents. In general these outputs can depend on the currents of all the neurons in that layer so that $f_{\mu }=f(\{h_{\mu }\})$ and ${\textstyle g_{i}=g(\{x_{i}\})}$ . It is convenient to define these activation functions as derivatives of the Lagrangian functions for the two groups of neurons

$f_{\mu }={\frac {\partial L_{h}}{\partial h_{\mu }}},\ \ \ \ {\text{and}}\ \ \ \ g_{i}={\frac {\partial L_{x}}{\partial x_{i}}}$

(2)

This way the specific form of the equations for the neurons' states is completely defined once the Lagrangian functions are specified. Finally, the time constants for the two groups of neurons are denoted by $\tau _{f}$ and $\tau _{h}$ , and $I_{i}$ is the input current to the network, which can be driven by the presented data.

General systems of non-linear differential equations can have many complicated behaviors that can depend on the choice of the non-linearities and the initial conditions. For Hopfield Networks, however, this is not the case: the dynamical trajectories always converge to a fixed point attractor state. This property is achieved because these equations are specifically engineered so that they have an underlying energy function^[25]

$E(t)={\Big [}\sum \limits _{i=1}^{N_{f}}(x_{i}-I_{i})g_{i}-L_{x}{\Big ]}+{\Big [}\sum \limits _{\mu =1}^{N_{h}}h_{\mu }f_{\mu }-L_{h}{\Big ]}-\sum \limits _{\mu ,i}f_{\mu }\xi _{\mu i}g_{i}$

(3)

The terms grouped into square brackets represent a Legendre transform of the Lagrangian function with respect to the states of the neurons. If the Hessian matrices of the Lagrangian functions are positive semi-definite, the energy function is guaranteed to decrease on the dynamical trajectory^[25]

${\frac {dE(t)}{dt}}=-\tau _{f}\sum \limits _{i,j=1}^{N_{f}}{\frac {dx_{i}}{dt}}{\frac {\partial ^{2}L_{x}}{\partial x_{i}\partial x_{j}}}{\frac {dx_{j}}{dt}}-\tau _{h}\sum \limits _{\mu ,\nu =1}^{N_{h}}{\frac {dh_{\mu }}{dt}}{\frac {\partial ^{2}L_{h}}{\partial h_{\mu }\partial h_{\nu }}}{\frac {dh_{\nu }}{dt}}\leq 0$

(4)

This property makes it possible to prove that the system of dynamical equations describing temporal evolution of neurons' activities will eventually reach a fixed point attractor state.

In certain situations one can assume that the dynamics of hidden neurons equilibrates at a much faster time scale compared to the feature neurons, ${\textstyle \tau _{h}\ll \tau _{f}}$ . In this case the steady state solution of the second equation in the system (1) can be used to express the currents of the hidden units through the outputs of the feature neurons. This makes it possible to reduce the general theory (1) to an effective theory for feature neurons only. The resulting effective update rules and the energies for various common choices of the Lagrangian functions are shown in Fig.2. In the case of log-sum-exponential Lagrangian function the update rule (if applied once) for the states of the feature neurons is the attention mechanism^[24] commonly used in many modern AI systems (see Ref.^[25] for the derivation of this result from the continuous time formulation).
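The attention-like update mentioned above can be sketched in a few lines. The sketch assumes the commonly quoted form in which the new feature state is a softmax-weighted average of the memory vectors, with weights given by the scaled overlaps between each memory and the current state; the inverse-temperature parameter `beta`, the function names, and the example memories are hypothetical and not taken from this article.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_update(memories, x, beta=1.0):
    """One update step: x_new = sum_mu softmax(beta * <xi_mu, x>)_mu * xi_mu."""
    scores = [beta * sum(m_i * x_i for m_i, x_i in zip(memory, x))
              for memory in memories]
    weights = softmax(scores)
    return [sum(w * memory[i] for w, memory in zip(weights, memories))
            for i in range(len(x))]

memories = [[1.0, 0.0],
            [0.0, 1.0]]
query = [2.0, 0.0]

# With a large beta the output is close to the best-matching memory.
print(attention_update(memories, query, beta=10.0))
```

Here the memories play the role of both keys and values and the current state plays the role of the query, which is the sense in which a single step resembles attention.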

Relationship to classical Hopfield network with continuous variables

Classical formulation of continuous Hopfield Networks^[19] can be understood^[25] as a special limiting case of the modern Hopfield networks with one hidden layer. Continuous Hopfield Networks for neurons with graded response are typically described^[19] by the dynamical equations

$\tau _{f}{\frac {dx_{i}}{dt}}=\sum \limits _{j=1}^{N_{f}}T_{ij}V_{j}-x_{i}+I_{i}$

(5)

and the energy function

E=-{\frac {1}{2}}\sum \limits _{i,j=1}^{N_{f}}T_{ij}V_{i}V_{j}-\sum \limits _{i=1}^{N_{f}}V_{i}I_{i}+\sum \limits _{i=1}^{N_{f}}\int \limits ^{V_{i}}g^{-1}(z)\,dz

(6)

where $V_{i}=g(x_{i})$, and $g^{-1}(z)$ is the inverse of the activation function $g(x)$. This model is a special limit of the class of models called models A,^[25] with the following choice of the Lagrangian functions

L_{x}=\sum \limits _{i=1}^{N_{f}}\int \limits ^{x_{i}}g(x)dx,\ \ \ \ \ {\text{and}}\ \ \ \ \ L_{h}={\frac {1}{2}}\sum \limits _{\mu =1}^{N_{h}}h_{\mu }^{2}

(7)

that, according to the definition (2), leads to the activation functions

V_{i}=g(x_{i}),\ \ \ \ \ {\text{and}}\ \ \ \ \ f_{\mu }=h_{\mu }

(8)

If we integrate out the hidden neurons, the system of equations (1) reduces to the equations on the feature neurons (5) with $T_{ij}=\sum \limits _{\mu =1}^{N_{h}}\xi _{\mu i}\xi _{\mu j}$, and the general expression for the energy (3) reduces to the effective energy

E=-{\frac {1}{2}}\sum \limits _{i,j=1}^{N_{f}}T_{ij}V_{i}V_{j}-\sum \limits _{i=1}^{N_{f}}V_{i}I_{i}+\sum \limits _{i=1}^{N_{f}}{\Big (}x_{i}V_{i}-\int \limits ^{x_{i}}g(x)dx{\Big )}

(9)

While the first two terms in equation (6) are the same as those in equation (9), the third terms look superficially different. In equation (9) the third term is a Legendre transform of the Lagrangian for the feature neurons, while in (6) it is an integral of the inverse activation function. Nevertheless, these two expressions are in fact equivalent, since the derivatives of a function and its Legendre transform are inverse functions of each other. The easiest way to see this explicitly is to differentiate each one with respect to $x_{i}$. Differentiating the third term of (9) gives $V_{i}+x_{i}g'(x_{i})-g(x_{i})=x_{i}g'(x_{i})$, and differentiating the third term of (6) gives $g^{-1}(V_{i})\,{\frac {dV_{i}}{dx_{i}}}=x_{i}g'(x_{i})$. Thus, the two expressions are equal up to an additive constant. This completes the proof^[25] that the classical Hopfield network with continuous states^[19] is a special limiting case of the modern Hopfield network (1) with energy (3).
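The equivalence of the third terms in (6) and (9), and the decrease of the energy along solutions of (5), can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the activation is $g=\tanh $ (so that $\int g=\log \cosh $ and $g^{-1}=\operatorname {arctanh} $), both integrals start at zero, the memories are random signs scaled by $1/{\sqrt {N_{f}}}$ to keep the currents away from saturation, and (5) is integrated with a plain Euler step. None of these choices come from the cited references.

```python
import numpy as np

rng = np.random.default_rng(0)
N_f, N_h = 16, 2                 # feature neurons, hidden neurons (memories)
tau_f, dt, steps = 1.0, 0.01, 2000

# Memory matrix xi (N_h x N_f); the 1/sqrt(N_f) scaling is an assumption.
xi = rng.choice([-1.0, 1.0], size=(N_h, N_f)) / np.sqrt(N_f)
T = xi.T @ xi                    # T_ij = sum_mu xi_mu_i xi_mu_j
I = np.zeros(N_f)                # no external input
g = np.tanh                      # activation function

def legendre_term(x):
    # Third term of (9): sum_i (x_i V_i - int_0^{x_i} g(x) dx)
    return np.sum(x * g(x) - np.log(np.cosh(x)))

def inverse_integral_term(x):
    # Third term of (6): sum_i int_0^{V_i} arctanh(z) dz, in closed form
    V = g(x)
    return np.sum(V * np.arctanh(V) + 0.5 * np.log1p(-V**2))

def energy(x):
    V = g(x)
    return -0.5 * V @ T @ V - V @ I + legendre_term(x)

x = rng.normal(scale=0.5, size=N_f)
energies = [energy(x)]
term_gap = [abs(legendre_term(x) - inverse_integral_term(x))]
for _ in range(steps):
    x = x + (dt / tau_f) * (T @ g(x) - x + I)     # Euler step of (5)
    energies.append(energy(x))
    term_gap.append(abs(legendre_term(x) - inverse_integral_term(x)))

energies = np.array(energies)
max_increase = np.max(np.diff(energies))          # expected to be <= 0 up to rounding
max_term_gap = max(term_gap)                      # expected to be at rounding level
print("largest energy increase per step:", max_increase)
print("largest gap between the two third terms:", max_term_gap)
```

With the lower integration limits both set to zero, the additive constant mentioned above vanishes for this choice of $g$, so the two third terms agree to rounding error along the whole trajectory.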

General formulation of the modern Hopfield network

Biological neural networks have a large degree of heterogeneity in terms of different cell types. This section describes a mathematical model of a fully connected modern Hopfield network assuming the extreme degree of heterogeneity: every single neuron is different.^[46] Specifically, an energy function and the corresponding dynamical equations are described assuming that each neuron has its own activation function and kinetic time scale. The network is assumed to be fully connected, so that every neuron is connected to every other neuron using a symmetric matrix of weights $W_{IJ}$, where the indices $I$ and $J$ enumerate different neurons in the network (see Fig. 3). The easiest way to mathematically formulate this problem is to define the architecture through a Lagrangian function $L(\{x_{I}\})$ that depends on the activities of all the neurons in the network. The activation function for each neuron is defined as a partial derivative of the Lagrangian with respect to that neuron's activity

g_{I}={\frac {\partial L}{\partial x_{I}}}

(10)

From the biological perspective one can think about $g_{I}$ as an axonal output of the neuron $I$. In the simplest case, when the Lagrangian is additive for different neurons, this definition results in an activation that is a non-linear function of that neuron's activity only. For non-additive Lagrangians this activation function can depend on the activities of a group of neurons. For instance, it can contain contrastive (softmax) or divisive normalization. The dynamical equations describing the temporal evolution of a given neuron are given by^[46]

\tau _{I}{\frac {dx_{I}}{dt}}=\sum \limits _{J=1}^{N}W_{IJ}g_{J}-x_{I}

(11)

This equation belongs to the class of models called firing rate models in neuroscience. Each neuron $I$ collects the axonal outputs $g_{J}$ from all the neurons, weights them with the synaptic coefficients $W_{IJ}$ and produces its own time-dependent activity $x_{I}$. The temporal evolution has a time constant $\tau _{I}$, which in general can be different for every neuron. This network has a global energy function^[46]

E=\sum \limits _{I=1}^{N}x_{I}g_{I}-L-{\frac {1}{2}}\sum \limits _{I,J=1}^{N}g_{I}W_{IJ}g_{J}

(12)

where the first two terms represent the Legendre transform of the Lagrangian function with respect to the neurons' currents $x_{I}$. The temporal derivative of this energy function can be computed on the dynamical trajectories, leading to (see Ref.^[46] for details)

{\frac {dE}{dt}}=-\sum \limits _{I,K=1}^{N}{\frac {dx_{I}}{dt}}M_{IK}{\frac {dx_{K}}{dt}}\leq 0,\ \ \ \ {\text{where}}\ \ \ \ M_{IK}=\tau _{I}{\frac {\partial ^{2}L}{\partial x_{I}\partial x_{K}}}

(13)

The last inequality holds provided that the matrix $M_{IK}$ (or its symmetric part) is positive semi-definite. If, in addition, the energy function is bounded from below, the non-linear dynamical equations are guaranteed to converge to a fixed point attractor state. The advantage of formulating this network in terms of the Lagrangian functions is that it makes it easy to experiment with different choices of the activation functions and different architectural arrangements of neurons. For all those choices the conditions of convergence are determined by the properties of the matrix $M_{IK}$ and the existence of a lower bound on the energy function.
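As a minimal sketch of equations (10) to (13), the following NumPy code simulates a small fully connected network with a different time constant for every neuron. The additive Lagrangian $L=\sum _{I}\log \cosh (x_{I})$ (which gives $g_{I}=\tanh (x_{I})$), the random symmetric weights with zero diagonal, the network size and the Euler step are all illustrative assumptions. For this choice $M_{IK}$ is diagonal with non-negative entries, so the energy (12) should not increase.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 12
dt, steps = 0.01, 3000

# Symmetric weights with zero diagonal; the 1/sqrt(N) scale is an assumption.
A = rng.normal(size=(N, N)) / np.sqrt(N)
W = 0.5 * (A + A.T)
np.fill_diagonal(W, 0.0)
tau = rng.uniform(0.5, 2.0, size=N)      # one time constant per neuron

def lagrangian(x):
    # Additive Lagrangian L = sum_I log cosh(x_I) (illustrative choice)
    return np.sum(np.log(np.cosh(x)))

def activation(x):
    # g_I = dL/dx_I = tanh(x_I), equation (10)
    return np.tanh(x)

def energy(x):
    g = activation(x)
    return x @ g - lagrangian(x) - 0.5 * g @ W @ g    # equation (12)

x = rng.normal(size=N)

# Central finite differences confirm that g is the gradient of L.
eps = 1e-6
num_grad = np.array([
    (lagrangian(x + eps * e) - lagrangian(x - eps * e)) / (2 * eps)
    for e in np.eye(N)
])
grad_error = np.max(np.abs(num_grad - activation(x)))

energies = [energy(x)]
for _ in range(steps):
    x = x + dt * (W @ activation(x) - x) / tau        # Euler step of (11)
    energies.append(energy(x))
max_increase = np.max(np.diff(energies))
print("gradient check error:", grad_error)
print("largest energy increase per step:", max_increase)
```

Swapping in a different Lagrangian only requires changing `lagrangian` and `activation` together; the energy and the update rule stay the same, which is the flexibility the Lagrangian formulation is meant to provide.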

Hierarchical associative memory network

The neurons can be organized in layers so that every neuron in a given layer has the same activation function and the same dynamic time scale. If we assume that there are no horizontal (lateral) connections between the neurons within a layer and no skip-layer connections, the general fully connected network (11), (12) reduces to the architecture shown in Fig. 4. It has $N_{\text{layer}}$ layers of recurrently connected neurons with the states described by continuous variables $x_{i}^{A}$ and the activation functions $g_{i}^{A}$, where the index $A$ enumerates the layers of the network and the index $i$ enumerates individual neurons in that layer. The activation functions can depend on the activities of all the neurons in the layer. Every layer can have a different number of neurons $N_{A}$. These neurons are recurrently connected with the neurons in the preceding and the subsequent layers. The matrices of weights that connect neurons in layers $A$ and $B$ are denoted by $\xi _{ij}^{(A,B)}$ (the order of the upper indices for weights is the same as the order of the lower indices; in the example above this means that the index $i$ enumerates neurons in the layer $A$, and the index $j$ enumerates neurons in the layer $B$). The feedforward weights and the feedback weights are equal. The dynamical equations for the neurons' states can be written as^[46]

\tau _{A}{\frac {dx_{i}^{A}}{dt}}=\sum \limits _{j=1}^{N_{A-1}}\xi _{ij}^{(A,A-1)}g_{j}^{A-1}+\sum \limits _{j=1}^{N_{A+1}}\xi _{ij}^{(A,A+1)}g_{j}^{A+1}-x_{i}^{A}

(14)

with boundary conditions

g_{i}^{0}=0,\ \ \ \ \ {\text{and}}\ \ \ \ \ g_{i}^{N_{\text{layer}}+1}=0

(15)

The main difference between these equations and those of conventional feedforward networks is the presence of the second term, which is responsible for the feedback from higher layers. These top-down signals help neurons in lower layers to decide on their response to the presented stimuli. Following the general recipe it is convenient to introduce a Lagrangian function $L^{A}(\{x_{i}^{A}\})$ for the $A$-th hidden layer, which depends on the activities of all the neurons in that layer.^[46] The activation functions in that layer can be defined as partial derivatives of the Lagrangian

g_{i}^{A}={\frac {\partial L^{A}}{\partial x_{i}^{A}}}

(16)

With these definitions the energy (Lyapunov) function is given by^[46]

E=\sum \limits _{A=1}^{N_{\text{layer}}}{\Big [}\sum \limits _{i=1}^{N_{A}}x_{i}^{A}g_{i}^{A}-L^{A}{\Big ]}-\sum \limits _{A=1}^{N_{\text{layer}}-1}\sum \limits _{i=1}^{N_{A+1}}\sum \limits _{j=1}^{N_{A}}g_{i}^{A+1}\xi _{ij}^{(A+1,A)}g_{j}^{A}

(17)

If the Lagrangian functions, or equivalently the activation functions, are chosen in such a way that the Hessians for each layer are positive semi-definite and the overall energy is bounded from below, this system is guaranteed to converge to a fixed point attractor state. The temporal derivative of this energy function is given by^[46]

{\frac {dE}{dt}}=-\sum \limits _{A=1}^{N_{\text{layer}}}\tau _{A}\sum \limits _{i,j=1}^{N_{A}}{\frac {dx_{j}^{A}}{dt}}{\frac {\partial ^{2}L^{A}}{\partial x_{j}^{A}\partial x_{i}^{A}}}{\frac {dx_{i}^{A}}{dt}}\leq 0

(18)

Thus, the hierarchical layered network is indeed an attractor network with a global energy function. This network is described by a hierarchical set of synaptic weights that can be learned for each specific problem.
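The layered dynamics (14) with boundary conditions (15) and the energy (17) can be sketched as follows. This is an illustrative example only: the three layer sizes, the per-layer time constants, the random weights, the Euler step, and the choice of $\log \cosh $ Lagrangians (tanh activations) in the two lower layers and a log-sum-exponential Lagrangian (softmax activation) in the top layer are all assumptions. Layers are indexed from zero in the code, and `xi[a]` plays the role of $\xi ^{(A+1,A)}$, with its transpose used for the feedback direction.

```python
import numpy as np

rng = np.random.default_rng(2)
sizes = [8, 5, 3]                  # neurons per layer, bottom to top
taus = [1.0, 0.5, 0.2]             # one time constant per layer
dt, steps = 0.005, 4000

# xi[a] connects layer a (lower) to layer a + 1 (upper).
xi = [rng.normal(size=(sizes[a + 1], sizes[a])) / np.sqrt(sizes[a])
      for a in range(len(sizes) - 1)]

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def logsumexp(z):
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m)))

def log_cosh(x):
    return np.sum(np.log(np.cosh(x)))

# Each activation is the gradient of the Lagrangian of its layer, equation (16).
lagrangians = [log_cosh, log_cosh, logsumexp]
activations = [np.tanh, np.tanh, softmax]

def outputs(xs):
    return [act(x) for act, x in zip(activations, xs)]

def energy(xs):
    gs = outputs(xs)
    local = sum(x @ g - L(x) for x, g, L in zip(xs, gs, lagrangians))
    coupling = sum(gs[a + 1] @ xi[a] @ gs[a] for a in range(len(xi)))
    return local - coupling                           # equation (17)

def step(xs):
    gs = outputs(xs)
    new = []
    for a, x in enumerate(xs):
        drive = np.zeros_like(x)
        if a > 0:                                     # bottom-up input
            drive += xi[a - 1] @ gs[a - 1]
        if a < len(xs) - 1:                           # top-down feedback
            drive += xi[a].T @ gs[a + 1]
        new.append(x + dt * (drive - x) / taus[a])    # Euler step of (14)
    return new

xs = [rng.normal(size=n) for n in sizes]
energies = [energy(xs)]
for _ in range(steps):
    xs = step(xs)
    energies.append(energy(xs))
max_increase = np.max(np.diff(energies))
print("largest energy increase per step:", max_increase)
```

Skipping the missing neighbour of the bottom and top layers in `step` is how the boundary conditions (15) appear in code. Because the softmax Hessian is positive semi-definite and each layer has a single time constant, the condition stated before equation (18) is satisfied for this choice of layers.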

References

[1] F. Rosenblatt, "Perceptual Generalization over Transformation Groups", pp. 63–100 in Self-organizing Systems: Proceedings of an Inter-disciplinary Conference, 5 and 6 May 1959. Edited by Marshall C. Yovitz and Scott Cameron. London, New York, [etc.], Pergamon Press, 1960. ix, 322 p.
[2] Rosenblatt, Frank (1961-03-15). DTIC AD0256582: Principles of Neurodynamics. Perceptrons and the Theory of Brain Mechanisms. Defense Technical Information Center.
[3] W. K. Taylor, 1956. Electrical simulation of some nervous system functional activities. Information Theory 3, E. C. Cherry (ed.), pp. 314–328. London: Butterworths.
[4] Eulogy: 1917 Karl Steinbuch 2005, by Bernard Widrow, Reiner Hartenstein, Robert Hecht-Nielsen, IEEE Computational Intelligence Society. page 5. August 2005.
[5] Steinbuch, K. (1961-01-01). "Die Lernmatrix". Kybernetik (in German). 1 (1): 36–45. doi:10.1007/BF00293853. ISSN 1432-0770.
[6] Steinbuch, Karl (1961). Automat und Mensch: über menschliche und maschinelle Intelligenz. Berlin: Springer. ISBN 978-3-642-53168-2. OL 27019478M.
[7] Steinbuch, K.; Piske, U. A. W. (December 1963). "Learning matrices and their applications". IEEE Transactions on Electronic Computers. EC-12 (6): 846–862. doi:10.1109/PGEC.1963.263588. ISSN 0367-7508.
[8] Willshaw, D. J.; Buneman, O. P.; Longuet-Higgins, H. C. (June 1969). "Non-Holographic Associative Memory". Nature. 222 (5197): 960–962. Bibcode:1969Natur.222..960W. doi:10.1038/222960a0. ISSN 0028-0836. PMID 5789326.
[9] Kohonen, T. (April 1974). "An Adaptive Associative Memory Principle". IEEE Transactions on Computers. C-23 (4): 444–445. doi:10.1109/T-C.1974.223960. ISSN 0018-9340.
[10] Glauber, Roy J. (February 1963). "Time-Dependent Statistics of the Ising Model". Journal of Mathematical Physics. 4 (2): 294–307. doi:10.1063/1.1703954. Retrieved 2021-03-21.
[11] Nakano, Kaoru (1971). "Learning Process in a Model of Associative Memory". Pattern Recognition and Machine Learning. pp. 172–186. doi:10.1007/978-1-4615-7566-5_15. ISBN 978-1-4615-7568-9.
[12] Nakano, Kaoru (1972). "Associatron-A Model of Associative Memory". IEEE Transactions on Systems, Man, and Cybernetics. SMC-2 (3): 380–388. doi:10.1109/TSMC.1972.4309133.
[13] Amari, Shun-Ichi (1972). "Learning patterns and pattern sequences by self-organizing nets of threshold elements". IEEE Transactions. C (21): 1197–1206.
[14] Little, W. A. (1974). "The Existence of Persistent States in the Brain". Mathematical Biosciences. 19 (1–2): 101–120. doi:10.1016/0025-5564(74)90031-5.
[15] Carpenter, Gail A (1989-01-01). "Neural network models for pattern recognition and associative memory". Neural Networks. 2 (4): 243–257. doi:10.1016/0893-6080(89)90035-X. ISSN 0893-6080.
[16] Cowan, Jack D. (January 1990). "Discussion: McCulloch-Pitts and related neural nets from 1943 to 1989". Bulletin of Mathematical Biology. 52 (1–2): 73–97. doi:10.1007/BF02459569. ISSN 0092-8240.
[17] Sherrington, David; Kirkpatrick, Scott (1975-12-29). "Solvable Model of a Spin-Glass". Physical Review Letters. 35 (26): 1792–1796. Bibcode:1975PhRvL..35.1792S. doi:10.1103/PhysRevLett.35.1792. ISSN 0031-9007.
[18] Hopfield, J. J. (1982). "Neural networks and physical systems with emergent collective computational abilities". Proceedings of the National Academy of Sciences. 79 (8): 2554–2558. Bibcode:1982PNAS...79.2554H. doi:10.1073/pnas.79.8.2554. PMC 346238. PMID 6953413.
[19] Hopfield, J. J. (1984). "Neurons with graded response have collective computational properties like those of two-state neurons". Proceedings of the National Academy of Sciences. 81 (10): 3088–3092. Bibcode:1984PNAS...81.3088H. doi:10.1073/pnas.81.10.3088. PMC 345226. PMID 6587342.
[20] Engel, A.; Broeck, C. van den (2001). Statistical mechanics of learning. Cambridge, UK; New York, NY: Cambridge University Press. ISBN 978-0-521-77307-2.
[21] Seung, H. S.; Sompolinsky, H.; Tishby, N. (1992-04-01). "Statistical mechanics of learning from examples". Physical Review A. 45 (8): 6056–6091. Bibcode:1992PhRvA..45.6056S. doi:10.1103/PhysRevA.45.6056. PMID 9907706.
[22] Krotov, Dmitry; Hopfield, John (2016). "Dense Associative Memory for Pattern Recognition". Neural Information Processing Systems. 29: 1172–1180. arXiv:1606.01164.
[23] Demircigil, Mete; et al. (2017). "On a model of associative memory with huge storage capacity". Journal of Statistical Physics. 168 (2): 288–299. arXiv:1702.01929. Bibcode:2017JSP...168..288D. doi:10.1007/s10955-017-1806-y. S2CID 119317128.
[24] Ramsauer, Hubert; et al. (2021). "Hopfield Networks is All You Need". International Conference on Learning Representations. arXiv:2008.02217.
[25] Krotov, Dmitry; Hopfield, John (2021). "Large associative memory problem in neurobiology and machine learning". International Conference on Learning Representations. arXiv:2008.06996.
[26] Hopfield, J. J. (1982). "Neural networks and physical systems with emergent collective computational abilities". Proceedings of the National Academy of Sciences. 79 (8): 2554–2558. Bibcode:1982PNAS...79.2554H. doi:10.1073/pnas.79.8.2554. PMC 346238. PMID 6953413.
[27] MacKay, David J. C. (2003). "42. Hopfield Networks". Information Theory, Inference and Learning Algorithms. Cambridge University Press. p. 508. ISBN 978-0521642989. "This convergence proof depends crucially on the fact that the Hopfield network's connections are symmetric. It also depends on the updates being made asynchronously."
[28] Bruck, J. (October 1990). "On the convergence properties of the Hopfield model". Proc. IEEE. 78 (10): 1579–85. doi:10.1109/5.58341.
[29] Uykan, Z. (September 2020). "On the Working Principle of the Hopfield Neural Networks and its Equivalence to the GADIA in Optimization". IEEE Transactions on Neural Networks and Learning Systems. 31 (9): 3294–3304. doi:10.1109/TNNLS.2019.2940920. PMID 31603804. S2CID 204331533.
[30] Uykan, Z. (March 2021). "Shadow-Cuts Minimization/Maximization and Complex Hopfield Neural Networks". IEEE Transactions on Neural Networks and Learning Systems. 32 (3): 1096–1109. doi:10.1109/TNNLS.2020.2980237. PMID 32310787. S2CID 216047831.
[31] Hopfield, J.J.; Tank, D.W. (1985). "Neural computation of decisions in optimization problems". Biological Cybernetics. 52 (3): 141–6. doi:10.1007/BF00339943. PMID 4027280. S2CID 36483354.
[32] Bruck, Jehoshua; Goodman, Joseph W (1990-06-01). "On the power of neural networks for solving hard problems". Journal of Complexity. 6 (2): 129–135. doi:10.1016/0885-064X(90)90001-T. ISSN 0885-064X.
[33] Storkey, A.J.; Valabregue, R. (1999). "The basins of attraction of a new Hopfield learning rule". Neural Networks. 12 (6): 869–876. CiteSeerX 10.1.1.19.4681. doi:10.1016/S0893-6080(99)00038-6. PMID 12662662.
[34] Hebb 1949
[35] Storkey, Amos (1997). "Increasing the capacity of a Hopfield network without sacrificing functionality". Artificial Neural Networks – ICANN'97. Lecture Notes in Computer Science. Vol. 1327. Springer. pp. 451–6. CiteSeerX 10.1.1.33.103. doi:10.1007/BFb0020196. ISBN 978-3-540-69620-9.
[36] Hertz 1991
[37] Bruck, J.; Roychowdhury, V.P. (1990). "On the number of spurious memories in the Hopfield model (neural network)". IEEE Transactions on Information Theory. 36 (2): 393–397. doi:10.1109/18.52486.
[38] Liou, C.-Y.; Lin, S.-L. (2006). "Finite memory loading in hairy neurons" (PDF). Natural Computing. 5 (1): 15–42. doi:10.1007/s11047-004-5490-x. S2CID 35025761.
[39] Liou, C.-Y.; Yuan, S.-K. (1999). "Error Tolerant Associative Memory". Biological Cybernetics. 81 (4): 331–342. doi:10.1007/s004220050566. PMID 10541936. S2CID 6168346.
[40] Yuan, S.-K. (June 1997). Expanding basins of attraction of the associative memory (Master thesis). National Taiwan University. 991010725609704786.
[41] Aboudib, Ala; Gripon, Vincent; Jiang, Xiaoran (2014). "A study of retrieval algorithms of sparse messages in networks of neural cliques". COGNITIVE 2014: The 6th International Conference on Advanced Cognitive Technologies and Applications. pp. 140–6. arXiv:1308.4506. Bibcode:2013arXiv1308.4506A.
[42] Amit, D.J. (1992). Modeling Brain Function: The World of Attractor Neural Networks. Cambridge University Press. ISBN 978-0-521-42124-9.
[43] Rolls, Edmund T. (2016). Cerebral Cortex: Principles of Operation. Oxford University Press. ISBN 978-0-19-878485-2.
[44] Horn, D; Usher, M (1988). "Capacities of multiconnected memory models". J. Phys. France. 49 (3): 389–395. doi:10.1051/jphys:01988004903038900.
[45] Burns, Thomas; Fukai, Tomoki (2023). "Simplicial Hopfield networks". International Conference on Learning Representations. 11. arXiv:2305.05179.
[46] Krotov, Dmitry (2021). "Hierarchical Associative Memory". arXiv:2107.06446 [cs.NE].

Hebb, D.O. (2005) [1949]. The Organization of Behavior: A Neuropsychological Theory. Psychology Press. ISBN 978-1-135-63190-1.
Hertz, John A. (2018) [1991]. Introduction To The Theory Of Neural Computation. CRC Press. ISBN 978-0-429-96821-1.
McCulloch, W.S.; Pitts, W.H. (1943). "A logical calculus of the ideas immanent in nervous activity". Bulletin of Mathematical Biophysics. 5 (4): 115–133. doi:10.1007/BF02478259.
Polyn, S.M.; Kahana, M.J. (2008). "Memory search and the neural representation of context". Trends in Cognitive Sciences. 12 (1): 24–30. doi:10.1016/j.tics.2007.10.010. PMC 2839453. PMID 18069046.
Rizzuto, D.S.; Kahana, M.J. (2001). "An autoassociative neural network model of paired-associate learning". Neural Computation. 13 (9): 2075–2092. CiteSeerX 10.1.1.45.7929. doi:10.1162/089976601750399317. PMID 11516358. S2CID 7675117.
Kruse, Rudolf; Borgelt, Christian; Klawonn, Frank; Moewes, Christian; Steinbrecher, Matthias; Held, Pascal (2013). Computational Intelligence: A Methodological Introduction. Springer. ISBN 978-1-4471-5013-8.

External links

Rojas, Raul (12 July 1996). "13. The Hopfield model" (PDF). Neural Networks – A Systematic Introduction. Springer. ISBN 978-3-540-60505-8.
Hopfield Network Javascript
The Travelling Salesman Problem Archived 2015-05-30 at the Wayback Machine – Hopfield Neural Network JAVA Applet
Hopfield, John (2007). "Hopfield network". Scholarpedia. 2 (5): 1977. Bibcode:2007SchpJ...2.1977H. doi:10.4249/scholarpedia.1977.
"Don't Forget About Associative Memories". The Gradient. November 7, 2020. Retrieved September 27, 2024.
Fletcher, Tristan. "Hopfield Network Learning Using Deterministic Latent Variables" (PDF) (Tutorial). Archived from the original (PDF) on 2011-10-05.
