In this post I reflect on how to write readable list-processing code in Python. I am to a large extent influenced by my experience with Lisp, but the goal is not to write Lispy code in Python. This would be perverse. Rather, I am concerned with using the tools of the Python language to write code that explains itself.

You see, to me good code reads somewhat like natural language. The less cryptic the naming and the fewer random ASCII mazes, the better. This should not be done by stacking layers of very specific syntactic sugar, but is best accomplished by constructs which are simple, versatile and consistent.

That’s why the infamous squint test seems so wrong to me. Patterns like loops and checking conditions have nothing to do with the task that the code performs. They are just trees in the forest, if not just branches or leaves of those trees. You have to read the variable names to see what’s going on and how it fits into the big picture. (Although I do think simple imperative patterns are good for languages that really stick to them, like C or Go, and don’t complicate matters with Byzantine object-oriented syntax. Also excuse me for using the word “imperative” in a loose sense in this post, meaning “not object-oriented or declarative or functional”).

I like list comprehensions in Python as a concept. They allow you to construct lists as things, not operations (the latter even sounds off, and this is what I mean). You write [float(x) for x in list_of_integers] instead of digressing into declaring a new list, appending to it in a loop etc. All this of course happens silently, but without disrupting your flow of writing. The reason is not fewer lines or fewer characters, but greater clarity in the program’s structure. The program is probably not about converting a bunch of integers to floats, so this operation should not become a separate “paragraph”, it should be just a “remark”.
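To make the contrast concrete, here is a minimal sketch of both versions side by side. The contents of list_of_integers are sample data assumed for illustration.

```python
list_of_integers = [1, 2, 3]  # sample data, assumed for illustration

# The "paragraph": declare a list, loop, append.
floats_loop = []
for x in list_of_integers:
    floats_loop.append(float(x))

# The "remark": the new list described as a thing.
floats_comp = [float(x) for x in list_of_integers]

print(floats_loop == floats_comp)  # True
```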

But list comprehensions get less convenient to write as they get a little more complex.

When looping through a nested list or a dictionary, the loop order is counter-intuitive, at least to me. You have to write the loops starting from the outermost one, and then move “inward”.

seas = { 'earth': [ 'baltic', 'red', 'yellow' ], 'moon': [ 'mare serenitatis', 'mare imbrium', 'mare nubium' ] }

[sea
   for sea in place
   for place in seas]
# (error, no way)

[sea
   for place in seas
   for sea in place]
# ['e', 'a', 'r', 't', 'h', 'm', 'o', 'o', 'n']
# (yeah)

[sea
   for place in seas
   for sea in seas[place]]
# ['baltic', 'red', 'yellow', 'mare serenitatis', 'mare imbrium', 'mare nubium']

[sea
   for (place, local_seas) in seas.items()
   for sea in local_seas]
# ['baltic', 'red', 'yellow', 'mare serenitatis', 'mare imbrium', 'mare nubium']

In fact it’s most practical not to think of list comprehensions as concept-expressing constructs. Instead you should “see” the underlying imperative code. The variables after the list element specification are treated as declared in the order you write them in your code. Hence you have to declare the variables for the outer loop first. You can even do lexical shadowing, and it irks me somewhat that the following executes at all:

[sea
   for (place, seas) in seas.items()
   for sea in seas]
# ['baltic', 'red', 'yellow', 'mare serenitatis', 'mare imbrium', 'mare nubium']

You see, something like this happens under the hood:

lst = []
for (place, seas) in seas.items():
   for sea in seas: # inside the comprehension this seas is local and shadows the outer one; a plain loop like this one rebinds it
      lst.append(sea)
lst

Which is just a mindless, slight reformatting of our recent list comprehension.
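One caveat to the "slight reformatting" view is scope. In Python 3 the comprehension keeps its loop variables to itself, while a plain loop has no scope of its own and overwrites the outer name. The sketch below checks both; the helper names still_a_dict and now_a_list are made up for illustration.

```python
seas = {
    'earth': ['baltic', 'red', 'yellow'],
    'moon': ['mare serenitatis', 'mare imbrium', 'mare nubium'],
}

# In the comprehension, the loop target `seas` is local to the comprehension.
# Only the first iterable, seas.items(), is looked up in the enclosing scope.
flat = [sea
        for (place, seas) in seas.items()
        for sea in seas]
still_a_dict = isinstance(seas, dict)  # the outer name is untouched

# The plain loop rebinds the outer name on every iteration.
lst = []
for (place, seas) in seas.items():
    for sea in seas:
        lst.append(sea)
now_a_list = isinstance(seas, list)  # `seas` is now the last inner list

print(still_a_dict, now_a_list, lst == flat)  # True True True
```

So the two forms produce the same list, but only the loop leaves the dictionary clobbered afterwards.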

Similar rules of declaration order apply to if clauses.

def is_sunny(sea):
   return sea in ['red', 'mare serenitatis']

[sea if is_sunny(sea)
   for (place, local_seas) in seas.items()
   for sea in local_seas]
# (syntax error)

[sea
   for (place, local_seas) in seas.items()
   for sea in local_seas
   if is_sunny(sea)]
# ['red', 'mare serenitatis']

It by now comes as no surprise that these two are equivalent:

[sea
   for (place, local_seas) in seas.items()
   if not place == 'earth'
   for sea in local_seas]
# ['mare serenitatis', 'mare imbrium', 'mare nubium']

[sea
   for (place, local_seas) in seas.items()
   for sea in local_seas
   if not place == 'earth' ]
# ['mare serenitatis', 'mare imbrium', 'mare nubium']

# And why not...
[sea
   for (place, local_seas) in seas.items()
   for sea in local_seas
   if not is_sunny(sea)
   if not place == 'earth' ]
# ['mare imbrium', 'mare nubium']
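The equivalence is easier to see in the expanded loops. This is a sketch of the imperative reading used throughout this post, not of what the interpreter literally generates; the names early and late are made up for illustration.

```python
seas = {
    'earth': ['baltic', 'red', 'yellow'],
    'moon': ['mare serenitatis', 'mare imbrium', 'mare nubium'],
}

# `if` written between the two `for`s: checked once per place,
# so the inner loop is skipped entirely for 'earth'.
early = []
for (place, local_seas) in seas.items():
    if not place == 'earth':
        for sea in local_seas:
            early.append(sea)

# `if` written after both `for`s: checked once per sea.
late = []
for (place, local_seas) in seas.items():
    for sea in local_seas:
        if not place == 'earth':
            late.append(sea)

print(early == late)  # True
```

The results match because the condition only mentions place; the only difference is how often it gets evaluated.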

And also the ternary operator

You may vaguely remember that we sometimes also use else in those list comprehensions. But if it appears, it is actually a part of the x if z else y ternary operator. This construct doesn’t get “reformatted” in the way we saw earlier; it just gets evaluated to x or y. Same keywords, different concepts, always a great thing. So this is possible, as long as we include both if and else so the operator is complete:

[sea if is_sunny(sea) else '{bad weather no-go}'
   for (place, local_seas) in seas.items()
   for sea in local_seas]
# ['{bad weather no-go}', 'red', '{bad weather no-go}', 'mare serenitatis', '{bad weather no-go}', '{bad weather no-go}']

Note that we need to place if ... else ... next to the “main” expression, not after for.

But we cannot do this at all in loop specifications:

[sea if is_sunny(sea) else '{bad weather no-go}'
   for ((place, local_seas) if place != 'earth' else ('earth', [ '{too much oxygen}' ]))
      in seas.items()
   for sea in local_seas]
# SyntaxError: can't assign to conditional expression

Again, this makes sense if you imagine a regular for x in y: loop, where we assign to x and need it to be a declaration, not a conditional expression.
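If the substitution is really wanted, one possible workaround is to move the conditional expression into the thing being iterated over, so that the loop target stays a plain tuple of names. This is only a sketch; the inner names p and s are made up for illustration.

```python
seas = {
    'earth': ['baltic', 'red', 'yellow'],
    'moon': ['mare serenitatis', 'mare imbrium', 'mare nubium'],
}

def is_sunny(sea):
    return sea in ['red', 'mare serenitatis']

# The inner generator yields (place, list) pairs, already substituted,
# and the outer comprehension unpacks them as usual.
result = [sea if is_sunny(sea) else '{bad weather no-go}'
          for (place, local_seas) in (
              (p, s if p != 'earth' else ['{too much oxygen}'])
              for (p, s) in seas.items())
          for sea in local_seas]

print(result)
# ['{bad weather no-go}', 'mare serenitatis', '{bad weather no-go}', '{bad weather no-go}']
```

It runs, but it is a good example of the jumbled mess that is better written as a few separate statements.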

Conclusion

Let us return, for a moment, to this case. We have a dictionary seas which sorts seas (and lunar “seas”) by their home astronomical body. However, we wish on occasion to access a list of all seas, regardless of location. It is nice that we can avoid this:

lst = []
for (place, local_seas) in seas.items():
   for sea in local_seas:
      lst.append(sea)
lst

Why is it bad? Loops suggest that we are doing something (like sending the names somewhere, or maybe even mutating the original list–who knows), when in fact we’re just collecting stuff for some other purpose.

A list comprehension serves as immediate reassurance that we are just collecting stuff. It is possible to use functions with side effects, like in [print(x) for x in y], but I hope reasonable people use it very sparingly. No one expects this in Python.

It would also be good to be able to make list comprehensions as close to the flow of human thought (= readable and self-documenting) as possible, but we are constrained by syntax. We also need to be careful not to succumb to writing some jumbled mess which is accepted by Python if it happens to reflect its underlying imperative structure.

I tend to think that the best way is to group all the for clauses together, and then list all the if clauses, in the order in which their variables appear in the for declarations (that is, moving “inwards”). That way the notation is at least somewhat consistent and you know where to look for what.

[sea
   for (place, local_seas) in seas.items()
   for sea in local_seas
   if not place == 'earth' # `place` appears earlier than `sea`
   if not is_sunny(sea) ]

Appendix: Inline list operations

This “reference” presents list comprehension solutions first, and then mentions built-in Python functions for the given task. Personally I think functions may be better for cases in the form of function(thing1, thing2), where both things are atomic/black-boxy, i.e. in the context we can abstract from their particular internal structure.

I do not cover operations on lists here, like sorting, reversing etc., because information on those is trivial to get and straightforward. Rather, I concern myself with using lists to do things elegantly.

Filter

This one is easy, we have already done that:

[n for n in range(10) if n % 2 == 0]
# [0, 2, 4, 6, 8]

You can add else with the ternary operator (but this changes syntax), see above.
There is also a built-in function:

def is_even(n):
   return n % 2 == 0

list(filter(is_even, range(10)))
# [0, 2, 4, 6, 8]

The function returns an iterator; we need to explicitly convert the return value to a list to get the contents.
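One consequence worth keeping in mind is that such an iterator can be consumed only once. A small sketch, with first_pass and second_pass as made-up names:

```python
def is_even(n):
    return n % 2 == 0

evens = filter(is_even, range(10))

first_pass = list(evens)   # [0, 2, 4, 6, 8]
second_pass = list(evens)  # [] because the iterator is already exhausted

print(first_pass, second_pass)
```

If the filtered values are needed more than once, it is safer to store the list, or to use the list comprehension form in the first place.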

Reduce

Sadly, someone decided to relegate reduce in Python 3 to the functools module. This is even stranger considering that this is an operation that cannot be (as far as I know) replicated by list comprehensions. The release notes encourage programmers to use for loops instead. Sad.

The function, however, works like this:

from functools import reduce
probabilities = [0.5, 0.04, 0.33]
reduce(lambda x, y: x*y, probabilities)
# 0.006600000000000001
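For comparison, here is a sketch of the same product written a few other ways: with operator.mul instead of a lambda, with the explicit for loop, and with math.prod, which only exists in Python 3.8 and later. The variable names are made up for illustration.

```python
from functools import reduce
import math
import operator

probabilities = [0.5, 0.04, 0.33]

with_lambda = reduce(lambda x, y: x * y, probabilities)

# operator.mul is the function form of `*`, so no lambda is needed.
with_operator = reduce(operator.mul, probabilities)

# The explicit loop: three lines and a mutable accumulator.
product = 1.0
for p in probabilities:
    product *= p

# Python 3.8+ only: a built-in product, analogous to sum.
with_prod = math.prod(probabilities)

print(with_lambda == with_operator == product)  # True
```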

But it turns out that a frequent use case for reducing is covered by the built-in sum (see the link for more specific suggestions):

sum([42, 23, 3], 0)
# 68

The first argument can be any iterable, including a list. The second argument defaults to zero and is the starting value to which the items are added.
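A short sketch of how the two arguments behave; the numbers and names are made up for illustration.

```python
numbers = [42, 23, 3]

total = sum(numbers)                # start defaults to 0
total_from_100 = sum(numbers, 100)  # start is added in as well

# Any iterable works, including a generator expression,
# so no intermediate list has to be built.
sum_of_squares = sum(n * n for n in numbers)

print(total, total_from_100, sum_of_squares)  # 68 168 2302
```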

Chaining/flattening

Merging a list of lists into one list, something like (reduce #'append ...) (or (reduce #'nconc ...) for the more adventurous) in Lisp. I find this solution easier to remember:

day_reports = [['normal', 'met a whale', 'played bridge'], ['fled from a storm'], ['strange transmissions', 'engine failure']]
sum(day_reports, [])
# ['normal', 'met a whale', 'played bridge', 'fled from a storm', 'strange transmissions', 'engine failure']

The problem is, it works only when the items being merged are themselves lists. Many things in Python 3, like filter, like to give you an iterator for efficiency reasons. So it is often wiser to use the itertools module:

from itertools import chain
list(chain.from_iterable(day_reports))
# ['normal', 'met a whale', 'played bridge', 'fled from a storm', 'strange transmissions', 'engine failure']

Note that chain.from_iterable itself returns an iterator, which needs to be converted to a list for some purposes like printing.
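Here is a sketch of the case where the sum trick stops working: the inner items are filter iterators rather than lists. The filtering condition (dropping 'normal' entries) is an assumption made up for the example.

```python
from itertools import chain

day_reports = [['normal', 'met a whale', 'played bridge'],
               ['fled from a storm'],
               ['strange transmissions', 'engine failure']]

# A lazy sequence of lazy filter objects. sum(eventful, []) would raise
# a TypeError here, because a list cannot be concatenated with a filter object.
eventful = (filter(lambda entry: entry != 'normal', day) for day in day_reports)
flat = list(chain.from_iterable(eventful))

# The same result as a nested list comprehension.
flat_comp = [entry
             for day in day_reports
             for entry in day
             if entry != 'normal']

print(flat == flat_comp)  # True
```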

Map

['specimen {}'.format(n) for n in range(1, 5)]
# ['specimen 1', 'specimen 2', 'specimen 3', 'specimen 4']

When supplied with many lists, a list comprehension matches elements from them in all-to-all fashion.

['specimen {}{}'.format(n, a)
   for n in range(1, 5)
   for a in ['a', 'b']]
# ['specimen 1a', 'specimen 1b', 'specimen 2a', 'specimen 2b', 'specimen 3a', 'specimen 3b', 'specimen 4a', 'specimen 4b']

To get “true” map behavior (match nth elements to nth ones), it’s probably best to zip the lists.

['specimen {}{}'.format(n, a)
   for (n, a) in zip(range(1,5), ['a', 'b'])]
# ['specimen 1a', 'specimen 2b']

Or, there is a built-in function again:

def name_specimen(n, a):
   return 'specimen {}{}'.format(n, a)
letters = ['a', 'b']
list(map(name_specimen, range(1,5), letters))
# ['specimen 1a', 'specimen 2b']

See below for mapping by applying a function to pre-made tuples of arguments.
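Both zip and map with several iterables stop silently at the shortest input, which is why only two specimens came out above. If the longer input should win instead, itertools.zip_longest pads the shorter one. A sketch, where the '?' fill value is an arbitrary choice for illustration:

```python
from itertools import zip_longest

numbers = range(1, 5)
letters = ['a', 'b']

# zip stops at the shorter input.
short = ['specimen {}{}'.format(n, a)
         for (n, a) in zip(numbers, letters)]
# ['specimen 1a', 'specimen 2b']

# zip_longest keeps going and fills the gaps with fillvalue.
padded = ['specimen {}{}'.format(n, a)
          for (n, a) in zip_longest(numbers, letters, fillvalue='?')]
# ['specimen 1a', 'specimen 2b', 'specimen 3?', 'specimen 4?']

print(short)
print(padded)
```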

Apply

A feature of Python I rarely use is passing lists with asterisks to functions. The list is unpacked and its items are assigned to the function’s arguments. It can be unintuitive if you are used to asterisks having something to do with pointers.
But applying lists (or any iterables for that matter) to functions can be useful if you pull all arguments for a function from somewhere, like this:

def is_planet_safe(prob_habitable, prob_life, prob_tech):
   if prob_habitable * prob_life * prob_tech > 0.01:
      return False
   return True

def probe_planet(name):
   return [0.5, 0.04, 0.33] # the probe cheats

is_planet_safe(*probe_planet('earth'))
# True

Note that this way you can tinker with the return value of probe_planet and the argument list of is_planet_safe without updating function calls.
Also, with ** (two asterisks) you can pass a dictionary, which serves as a source of keyword arguments for the function.
Relatedly, the itertools module has a fantastically named function starmap. This behaves like map, but receives an iterable of argument tuples (or lists):

from itertools import starmap
list(starmap(is_planet_safe, [probe_planet('earth'), probe_planet('venus'), probe_planet('nibiru')]))
# [True, True, True]
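Two of the remarks above, sketched in code: passing a dictionary with **, and starmap next to the equivalent list comprehension. The dictionary reading and the second set of probabilities are made up for illustration, and is_planet_safe is condensed to a single return with the same logic.

```python
from itertools import starmap

def is_planet_safe(prob_habitable, prob_life, prob_tech):
    # Same logic as above: safe unless the combined probability exceeds 0.01.
    return not prob_habitable * prob_life * prob_tech > 0.01

# Hypothetical probe result keyed by parameter name.
reading = {'prob_habitable': 0.5, 'prob_life': 0.04, 'prob_tech': 0.33}
safe_by_keywords = is_planet_safe(**reading)

# starmap unpacks each item with *, just like the comprehension below it.
readings = [[0.5, 0.04, 0.33], [0.9, 0.5, 0.5]]
via_starmap = list(starmap(is_planet_safe, readings))
via_comprehension = [is_planet_safe(*r) for r in readings]

print(safe_by_keywords, via_starmap, via_comprehension)
# True [True, False] [True, False]
```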

The whole package is worth a look.
