I did not appreciate just how many kinds of imaginary neurons you can have, and how different their principles of operation are.
When people say “neural networks” nowadays, they usually mean simplified neurons from the 1950s: units which just output a weighted sum of their inputs. Experts in the field even discourage thinking of these things in terms of neurons; what you are really doing is matrix operations. The operations have to be differentiable, so that we can change the parameters in small steps and eventually arrive at a local minimum of the cost function. Artificial neural nets of this kind are pure mathematics, in the sense that we have very little interest in how they could possibly be realized by living organisms.
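To make this concrete, here is a minimal sketch of one such neuron: a weighted sum of inputs squashed by a differentiable function, with the parameters nudged by gradient descent toward a minimum of a squared-error cost. All names and constants here are illustrative, not taken from any particular library.

```python
import numpy as np

def sigmoid(x):
    # A differentiable squashing function; its derivative is y * (1 - y).
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
w = rng.normal(size=3)          # weights: the parameters we tune
b = 0.0                         # bias

x = np.array([0.5, -1.0, 2.0])  # one input vector
target = 1.0                    # desired output for this input

lr = 0.1                        # learning rate: size of the "small steps"
for _ in range(100):
    y = sigmoid(w @ x + b)              # forward pass: weighted sum + nonlinearity
    grad_y = 2.0 * (y - target)         # d(cost)/dy for squared error
    grad_z = grad_y * y * (1.0 - y)     # chain rule through the sigmoid
    w -= lr * grad_z * x                # gradient-descent step on the weights
    b -= lr * grad_z                    # ...and on the bias

print(float(sigmoid(w @ x + b)))        # activation has crept toward the target
```

Nothing here knows anything about biology; it is just calculus applied to a parameterized function.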
In psychology and neuroscience, there is a spectrum of more realistic properties that scientists are willing to put into their neurons. I’ve recently played creatively with the dual-route model of reading. It stands out because it uses a hand-crafted and hand-tuned network to integrate two kinds of information: recognition of whole words and of sequences of single phonemes, which are then used to decide how the system will “pronounce” the text (reproducing, by the way, various quirks of the human reading process). These two streams of evidence are processed separately, but ultimately they can be joined thanks to the common “language” of neuronal activations; which are really, of course, numbers crunched by a computer. But these neurons perform, for example, lateral inhibition (i.e. they compete and try to suppress their peers in the same layer) and send feedback to previous layers.
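As a toy illustration of lateral inhibition (the update rule below is made up for the sketch, not the dual-route model’s actual equations): each unit’s activation is reduced in proportion to the summed activation of its peers, so the strongest candidate gradually silences the others.

```python
import numpy as np

def lateral_inhibition_step(a, inhibition=0.2):
    # Each unit is suppressed by the total activity of all *other* units;
    # activations are kept in the model's [0, 1] range.
    peers = a.sum() - a
    return np.clip(a - inhibition * peers, 0.0, 1.0)

# Initial activations for, say, three competing word candidates.
a = np.array([0.9, 0.5, 0.4])
for _ in range(10):
    a = lateral_inhibition_step(a)
print(a)  # only the initially strongest candidate keeps a nonzero activation
```

After a few iterations the weaker units are pushed to zero and stop inhibiting the winner, which is the point of the mechanism: competition resolves ambiguity.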
(Machine learning neurons in recurrent networks do provide feedback, but only to their own layer, and it is not taken into account until the next input arrives. I’m also aware of the max pooling operation in convolutional networks, where the winner takes all and every neuron but one is “silenced” for the next layer. It looks like an artifact of the time when deep learning researchers cared about cognitive science.)
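A minimal sketch of max pooling as “winner takes all”, using non-overlapping 1-D windows (real convolutional layers usually pool 2-D feature maps, but the principle is the same):

```python
import numpy as np

def max_pool_1d(activations, window=2):
    # Split the layer into non-overlapping windows and keep only each
    # window's winner; everything else is "silenced" for the next layer.
    a = np.asarray(activations)
    return a.reshape(-1, window).max(axis=1)

layer = [0.1, 0.7, 0.3, 0.9]
print(max_pool_1d(layer))  # → [0.7 0.9]
```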
The dual-route model belongs to a middle ground where neuronal models make some biological sense, but don’t go all the way. For instance, the activations here are forced to stay between 0.0 and 1.0. In machine learning we don’t really care: in practice some types of layers force the [-1.0, 1.0] range, or [0.0, ∞), or almost anything we may desire (as long as there is a derivative function). Biological neurons have a resting membrane potential of around -70 mV and can get excited or inhibited by their peers, but the outside world can only see spikes: events when the neuron becomes so charged that it discharges current through its synapses and plunges back to the resting potential. In this sense neurons are binary: they either spike or don’t spike in a given millisecond (or whatever more minuscule amount of time interests us). Maybe someday I will talk about how neurons can encode information under these constraints. For now, I’m leaving a link to a neat interactive presentation of various models of neurons.
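The activation ranges mentioned above come straight from the standard squashing functions of machine learning; a quick sketch of the usual suspects:

```python
import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))   # range (0, 1)
def tanh(x):    return np.tanh(x)                  # range (-1, 1)
def relu(x):    return np.maximum(0.0, x)          # range [0, ∞)

x = np.linspace(-5.0, 5.0, 1001)
# Each function confines activations to its range, and each has a derivative
# (ReLU's is piecewise constant), which is all that gradient descent needs.
print(sigmoid(x).min() > 0.0, sigmoid(x).max() < 1.0)
print(relu(-3.0))  # negative inputs are simply cut off at zero
```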
One unpleasant consequence of having real-time, gradually charging and resetting neurons is that you can’t remember anything. A neuron does not care that it has already fired; its activation has to be built up all over again, although this can be fast if there are strong, sustained incoming currents. But these have to come from somewhere. The only obvious mechanism of memory is in the synaptic connections between neurons, and the strengths (weights) of these connections. This week, I plan to attempt implementing a simple case of an inference mechanism that is thought by some to be universal in our brains’ neocortex. Some thought went into separating the long-term (synaptic) memory from the short-term (electrodynamic) one. I’m curious how it will pan out in a proper “biological” neuronal simulator (NEST).
Real neurons are scary. My boundary for “real” neurons is that they operate in real (or, to be fair, almost continuous) time, to a degree which renders reasoning in neat “epochs” useless. But even these neurons are not real, i.e. they don’t reflect the full complexity of biological computation. The famous Hodgkin-Huxley model (which I like to think of as the “giant squid model”, although that is sadly not strictly accurate) deals with ion channels in the cell membrane and the dynamics of electrical currents and potentials. It ignores other chemical events happening in the cell body, synapses and dendrites, but is already complex enough to be computationally heavy at scale. Thus in modelling most people use “integrate-and-fire” neurons, which just get excited more and more, discharge, and then are artificially reset to their resting potential. We don’t bother with a mechanism that would make neurons really saturate, close their ion channels and return to the resting potential through their internal dynamics.
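A leaky integrate-and-fire neuron fits in a few lines. The constants below are illustrative (though -70 mV is a typical resting potential), and the reset is exactly the artificial shortcut described above; simulators like NEST implement the same idea with proper units and synaptic dynamics.

```python
V_REST   = -70.0   # resting membrane potential, mV
V_THRESH = -55.0   # spike threshold, mV
TAU      = 10.0    # membrane time constant, ms (how fast charge leaks away)
DT       = 1.0     # simulation step, ms

def simulate(input_current, steps):
    """Return the time steps at which the neuron spikes."""
    v = V_REST
    spikes = []
    for t in range(steps):
        # Integrate: the potential leaks back toward rest while input charges it.
        v += DT * ((V_REST - v) / TAU + input_current)
        if v >= V_THRESH:
            spikes.append(t)
            v = V_REST   # artificial reset: no internal dynamics, just a jump
    return spikes

print(simulate(input_current=2.0, steps=50))  # regular spiking under constant drive
print(simulate(input_current=0.0, steps=50))  # no input, no spikes, no memory
```

Note the last point: with the input switched off, the neuron sits at rest and all trace of its past activity is gone, which is exactly the forgetting problem from the previous paragraph.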
It is interesting that we probably do not realize which properties of biological neurons are important: that is, where the shortcut lies that Nature took from dumb binary switches to conscious minds. We probably could build an artificial mind from most of the models discussed (including, of course, the artificial “neurons” used in machine learning), provided that systems built from them have sufficient expressive power. But that may be possible mostly in principle, just as you could make a brain out of people, or approximate any function with two matrix dot products treated with some non-linearity. It is just an interesting thought to me that somewhere among the obscure dynamics of neurons (though probably not quantum events, contrary to what Roger Penrose believes) there may be hidden some property that seriously facilitates cognitive computation. In the meantime, little facilitation can be seen and people prefer to just do freeform math.