Most of this is a self-teaching partial summary of Silver 2010, *Neuronal arithmetic* (link). The article concerns itself mostly with modulatory inputs (coming from somewhere in the network) and how they can change the behavior of a neuron. The behavior here is really the pattern of responses to different, stronger driving inputs. As Spratling 2014 (*A single functional model of drivers and modulators in cortex*) helpfully explains, *a distinction is commonly made between synaptic connections capable of evoking a response (“drivers”) and those that can alter ongoing activity but not initiate it (“modulators”)*.

The most obvious operation is addition. A neuron receives input currents from its neighbors, these currents increase the membrane potential — the effects add up — we get spikes, and when the electrical input is sustained, these spikes will repeat at some rate. So we can say that inputs are added, and the outcome is reflected in firing rate.

Put another way: any modulating excitatory current makes it easier for the neuron to reach its threshold by increasing its potential. When enough driving inputs also coincide, the action potential can actually be reached. We can thus see why a neuron is a **coincidence detector** for its presynaptic (inputting) neighbors (although strictly speaking it does not perform arithmetic; it merely works because of addition).
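The coincidence-detection idea can be shown with a toy sketch that abstracts away membrane dynamics entirely; the window length and required spike count below are made-up numbers, not physiology:

```python
def fires(input_times, window=2.0, needed=3):
    """Toy coincidence detector: the cell "fires" only if at least
    `needed` presynaptic spikes land within one `window` of time,
    so their depolarizing effects can add up before they decay."""
    times = sorted(input_times)
    for i in range(len(times) - needed + 1):
        if times[i + needed - 1] - times[i] <= window:
            return True
    return False

print(fires([1.0, 1.5, 2.2, 9.0]))   # True: three inputs arrive close together
print(fires([1.0, 5.0, 9.0, 13.0]))  # False: the same number of inputs, spread out
```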

Subtraction only makes sense, I think, when we look at **modulating inhibitory** inputs. We cannot, by definition, get a spike by inhibiting/depressing the neuron: on the contrary, we are making reaching the action potential harder. We are muting the cell.

Let’s say that we do not have particularly well-crafted connections for the neuron, and its inputs are more like stochastic noise, doing some excitation and some inhibition. This seems to be more realistic. Then by skewing the noise up (making it more probable to be excitatory in any given tiny interval of time) we modulate the cell’s potential additively, and by skewing the noise down — subtractively.

Most interestingly, by decreasing the noise overall we multiply the neuron’s response. Why is that? The output firing rate of the neuron goes up in proportion to its inputs, because it is less probable that at any given time inhibitory noise happens to overwhelm the excitatory driving input. Thus we do not really excite the cell more; we’re just making it easier for any current to drive postsynaptic firing. We multiply the response. (Compare fig. 2c, bottom (output spike probability) with 1e (output rate) in Silver 2010.)
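Here is a minimal sketch of that effect under the simplest possible assumption (not Silver’s actual model): the neuron spikes whenever driving input plus Gaussian background noise crosses a fixed threshold. Shifting the noise mean moves the response curve (additive/subtractive modulation), while shrinking its spread steepens the curve around threshold, which is exactly a change of gain:

```python
import math

def spike_prob(drive, noise_mean, noise_sd, threshold=1.0):
    """P(drive + noise > threshold), with Gaussian background noise."""
    z = (drive + noise_mean - threshold) / (noise_sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# Skewing the noise up shifts the response curve: the same drive now
# fires more easily (additive modulation; skewing down is subtractive).
print(spike_prob(1.0, 0.0, 0.5))  # exactly 0.5 right at threshold
print(spike_prob(1.0, 0.2, 0.5))  # > 0.5 for the same drive

# Shrinking the noise leaves the midpoint alone but steepens the curve,
# i.e. changes how strongly the output grows with the input: the gain.
for sd in (0.5, 0.25):
    slope = spike_prob(1.1, 0.0, sd) - spike_prob(0.9, 0.0, sd)
    print(sd, round(slope, 3))  # smaller sd, steeper slope around threshold
```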

The article mentions that when multiplying we increase the neuron’s gain (a notion related to amplification in electronics). This connection between neural gain and multiplication has further significance.

And finally, how to divide (decrease proportionally without subtracting) the response of a neuron? We just add a different, parallel path for the driving current to pass through — a shunt. According to Ohm’s law, some of the current that would otherwise be “available” to the neuron goes the other way. Thus we get a proportional decrease.
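A back-of-the-envelope current divider shows the divisive effect; the conductance values here are arbitrary illustrations, not measurements:

```python
def current_into_cell(i_drive, g_cell, g_shunt):
    """Current divider: of the driving current, the fraction
    g_cell / (g_cell + g_shunt) takes the membrane path; the rest
    leaks out through the parallel shunt conductance."""
    return i_drive * g_cell / (g_cell + g_shunt)

print(current_into_cell(10.0, 1.0, 0.0))  # 10.0: no shunt, nothing is lost
print(current_into_cell(10.0, 1.0, 1.0))  # 5.0: an equal shunt halves it
print(current_into_cell(10.0, 1.0, 3.0))  # 2.5: division, not subtraction
```

Note that opening the shunt wider scales the remaining current down by a constant factor, whatever the drive is — which is what makes the operation divisive rather than subtractive.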

Edit distance is a problem I frequently encounter. It is just a number measuring how many rudimentary operations on characters (insertions, replacements and deletions) you need to transform one string of text into another. In this form it’s called the Levenshtein distance, or Damerau-Levenshtein if you add transpositions (making *apple => aplpe* count as one operation). You may need that for error correction, for example, to know which possible replacements are most similar to the incorrect word.

The Damerau-Levenshtein distance is used in Peter Norvig’s popular post on spelling correction. His implementation is nice in being very explicit: you can literally see Python code producing all possible variations, so we can then check whether some other word belongs to that set. The problem is that, obviously, this space quickly becomes enormous. If we have just a 24-letter alphabet and a 5-character word, there are 5*24 possible replacements, 6*24 possible insertions, 4 transpositions and 5 deletions. That’s 273 possibilities for the Levenshtein distance *d = 1* (not a big problem for a computer), and roughly speaking this number is taken to the power of *d*. For *d = 3*, there exist around 20 million possibilities (less because of duplicates, more because at this point we are already expanding words that are *5+(d-1)=7* characters long).
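The variant-generating step can be sketched roughly the way Norvig does it (here with the 24-letter alphabet from the example above; his actual code uses all 26 letters):

```python
import string

ALPHABET = string.ascii_lowercase[:24]  # the 24-letter alphabet from the example

def edits1(word):
    """All strings one edit away (including transpositions), with duplicates."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in ALPHABET]
    inserts = [L + c + R for L, R in splits for c in ALPHABET]
    return deletes + transposes + replaces + inserts

# 5 deletes + 4 transposes + 5*24 replaces + 6*24 inserts = 273 raw variants
print(len(edits1('apple')))  # 273
```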

There are slightly smarter ways to do this, but they cannot really circumvent the number of transformations that *could* be applied. (Reasons why humans can judge similarity between two words without descending into madness are left out for today.) With NLP problems I often have to check the distance between many, many pairs of strings, and it would be nice if computers at least knew when to give up. If the texts are farther apart than, say, three, we could stop drowning in the swamp of exponentially bigger numbers. Put a cutoff on the distance, we could say.

There exists a concept of Levenshtein automata, which let us check whether any two words are within a given distance of each other. This would be it, but apparently, as the internet says, the algorithm is very arcane. I’d rather not dive into algorithmic studies tangential to my work. But it is surprisingly hard to get a ready-made solution with this functionality in Python, which is after all a fairly popular language.

At least I found a blog post by Jules Jacobs claiming that “Levenshtein automata can be simple and fast”. What he does is implement the dynamic programming algorithm for calculating the distance, which gives us a sort of automaton for recognizing strings that are within some prescribed distance of the original string. This is almost what I wanted, if only it worked in Python 3.0+ and also gave me a way to get the actual distance and not only the binary information “match/no match”. So I went ahead and introduced some minor modifications to make it work. The code is here.

Here’s how to use it.

```python
from levenshtein import LevenshteinAutomaton

max_distance = 3
base_str = 'base_string'
other_strs = ['base_trsing', 'apple']

aut = LevenshteinAutomaton(base_str, max_distance)
for s2 in other_strs:  # reuse the automaton for all possible matches
    state = aut.start()  # state of the automaton
    for c in s2:
        state = aut.step(state, c)
        if not aut.can_match(state):  # break if the distance is bigger than we want
            break
    if aut.is_match(state):
        print('{} is a match with distance {}!'.format(s2, aut.distance(state)))
```

Or, just use the function (the overhead of creating a new automaton under the hood each time is rather trivial):

```python
from levenshtein import distance_within

distance_within('base_string', 'base_trsing', 2)  # 2
distance_within('base_string', 'apple', 2)        # False
```

The only problem is that it **does not support transpositions**. Currently I lack ideas for adding them in an elegant way.

Some additional explanations on how it works, meant as a commentary to Jacobs’ work and Wikipedia. (Very handwavy if you haven’t been introduced to basic algorithms/dynamic programming.) The original algorithm, described on Wikipedia, finds the distance between strings by recursively finding minimal distances between their shorter and shorter substrings, all the way down to the base case of individual characters. This can be done by removing (ie. putting temporarily on a stack) characters either from the ends or from the beginnings of the strings. The C implementation does the former, because it makes handling the `char` arrays easier; we just decrease the length considered by the function (`len_s` and `len_t`).

More efficient algorithms put these partial solutions in a matrix to avoid repeating computation. If the algorithm works from the bottom up and finds early that there is no way the distance can be less than *d*, it can stop. We can also throw away earlier rows after we have used them, and even go sparse, as is explained in the blog post that I linked. The important part is that we finally get our answer in the bottom right corner of the matrix. This is what my implementation uses to provide the actual computed distance.
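A sketch of what such a matrix computation with a cutoff might look like (an illustration of the idea, not the code from the linked repository; it keeps only one previous row and stops once a whole row exceeds the limit):

```python
def distance_within(s, t, max_d):
    """Levenshtein distance between s and t if it is <= max_d, else False."""
    prev = list(range(len(t) + 1))  # first matrix row: building t from ""
    for i, cs in enumerate(s, start=1):
        curr = [i]  # first column: deleting i characters of s
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # replacement (or match)
        if min(curr) > max_d:  # cutoff: no suffix can bring the distance down
            return False
        prev = curr            # earlier rows can be thrown away
    return prev[-1] if prev[-1] <= max_d else False

print(distance_within('base_string', 'base_trsing', 2))  # 2
print(distance_within('base_string', 'apple', 2))        # False
```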

When people say “neural networks” nowadays, they usually mean simplified neurons from the 1950s: ones which just output a weighted sum of inputs. Experts in the field even discourage thinking of these things in terms of neurons. What you really do is operations on matrices. The operations have to be differentiable, so we can change the parameters in small steps and eventually arrive at a local minimum of the cost function. Artificial neural nets of this kind are pure mathematics, in the sense that we have very little interest in how they could possibly be reproduced by living organisms.
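For illustration, the whole modern “neuron” fits in a few lines: a weighted sum, a differentiable squared-error cost, and small steps down its gradient (a toy single-sample example, not a realistic training loop):

```python
def neuron(w, b, x):
    """The 1950s-style unit: just a weighted sum of its inputs."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_step(w, b, x, target, lr=0.1):
    """One small step down the gradient of the squared-error cost."""
    err = neuron(w, b, x) - target
    w = [wi - lr * 2.0 * err * xi for wi, xi in zip(w, x)]
    b = b - lr * 2.0 * err
    return w, b

w, b = [0.0, 0.0], 0.0
for _ in range(50):
    w, b = train_step(w, b, [1.0, 2.0], target=3.0)
print(round(neuron(w, b, [1.0, 2.0]), 3))  # 3.0: a minimum of the cost was reached
```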

In psychology and neuroscience, there is a spectrum of more realistic properties that scientists are willing to put in their neurons. I’ve recently played creatively with the dual-route model of reading. It stands out because it uses a hand-crafted and hand-tuned network to integrate two kinds of information: recognizing whole words and sequences of single phonemes, which are then used to decide how the system will “pronounce” the text (reproducing, by the way, various quirks of the human reading process). Both these streams of evidence are processed separately, but ultimately they can be joined thanks to the common “language” of neuronal activations, which are really, of course, numbers crunched by a computer. But these neurons perform, for example, lateral inhibition (ie. they compete and try to suppress their peers in the same layer) and feed back to previous neural layers.
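Lateral inhibition can be sketched very roughly as follows (a toy update rule with made-up constants, not the actual dual-route model equations, which also include excitatory input, resting levels and decay):

```python
def inhibit_step(acts, strength=0.2, rate=0.5):
    """One lateral-inhibition update: each unit is pushed down in
    proportion to the total activity of its peers in the layer."""
    total = sum(acts)
    new_acts = []
    for a in acts:
        peers = total - a                       # everyone else's activity
        a = a - rate * strength * peers * a     # peers suppress this unit
        new_acts.append(min(1.0, max(0.0, a)))  # clamp to [0.0, 1.0]
    return new_acts

acts = [0.9, 0.5, 0.2]
for _ in range(10):
    acts = inhibit_step(acts)
print(acts)  # the strongest unit keeps the most; the weak fall further behind
```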

(Machine learning neurons in recurrent networks do feed back, but to their own layer, and this is not taken into account before the next input arrives. I’m also aware of the max-pooling operation in convolutional networks, where the winner takes all and all neurons but one are “silenced” for the next layer. It looks like an artifact of the time when deep learning researchers cared about cognitive science.)

The dual-route model belongs to a middle ground where neuronal models make some sense biologically, but don’t go all the way. For instance, the activations here are forced to stay between 0.0 and 1.0. In machine learning we don’t really care: some types of layers force in practice the [-1.0, 1.0] range, or [0.0, ∞], or almost anything we may desire (as long as there is a derivative function). Biological neurons have a resting membrane potential of around -70 mV and can get excited or inhibited by their peers, but the outside world can only see spikes: events when the neuron becomes so charged that it discharges current through its synapses and plunges back to the resting potential. In this sense neurons are binary. They can either spike or not spike in a given millisecond (or whatever more minuscule amount of time interests us). Maybe someday I will talk about how neurons can encode information with these constraints. For now, I’m leaving a link to a neat interactive presentation of various models of neurons.

One unpleasant consequence of having real-time, gradually charging and resetting neurons is that you can’t remember anything. A neuron does not care that it once fired; its activation has to be built up all over again, although this can be fast if there are strong, sustained incoming currents. But these have to come from somewhere. The only obvious mechanism of memory is in the synaptic connections between neurons, and the strengths (weights) of these connections. This week, I plan to attempt implementing a simple case of an inference mechanism that is thought by some to be universal in our brains’ neocortex. Some thought went into separating the long-term (synaptic) memory from the short-term (electrodynamic) one. I’m curious how it will pan out in a proper “biological” neuronal simulator (NEST).

Real neurons are scary. The boundary that I have for “real” neurons is operating in real (or, to be fair, almost continuous) time, to a degree which renders reasoning in neat “epochs” useless. But even these neurons are not real, ie. they don’t reflect the complexity of biological computation. The famous Hodgkin-Huxley model (which I like to think of as the “giant squid model”, which is sadly not strictly accurate) deals with ion channels in the cell membrane and the dynamics of electrical currents and potentials. This ignores other chemical events happening in the cell body, synapses and dendrites, but is already complex enough to be computationally heavy at scale. Thus in modelling most people use “integrate-and-fire” neurons, which just get excited more and more, discharge, and then are artificially reset to their resting potential. We don’t bother with a mechanism that would make neurons really saturate, close their ion channels and return to the resting potential through their internal dynamics.
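A minimal leaky integrate-and-fire loop, with toy constants rather than a tuned model, looks like this:

```python
def integrate_and_fire(current, steps=100, dt=1.0,
                       v_rest=-70.0, v_thresh=-55.0, tau=10.0):
    """Leaky integrate-and-fire: the potential leaks toward rest, the
    input pushes it up, and on crossing the threshold we record a
    spike and reset artificially (no real saturation dynamics)."""
    v = v_rest
    spikes = []
    for t in range(steps):
        dv = (-(v - v_rest) + current) / tau  # leak term + drive
        v += dv * dt
        if v >= v_thresh:
            spikes.append(t)
            v = v_rest  # the artificial reset
    return spikes

# A weak sustained current never reaches threshold at all; past that
# point, a stronger current charges the cell faster, so the rate goes up.
print(len(integrate_and_fire(10.0)))  # 0
print(len(integrate_and_fire(20.0)))
print(len(integrate_and_fire(30.0)))
```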

It is interesting that we probably do not realize which properties of biological neurons are important: that is, where lies the shortcut taken by Nature from dumb binary switches to conscious minds. We probably could build an artificial mind from most of the models discussed (including, of course, the artificial “neurons” used in machine learning), provided that systems built from them have sufficient expressive power. But it could be something possible mostly in principle, just as you could make a brain out of people, or could approximate any function with two matrix dot products treated with some non-linearity. It is just an interesting thought to me that somewhere among the obscure dynamics of neurons (probably not quantum events, though, contrary to what Roger Penrose believes) there may be hidden some property that seriously facilitates cognitive computation. But in the meantime, little facilitation can be seen, and people prefer to just do freeform math.
