Writing effective loops in Python is a skill, plain and simple. If you aren’t there yet, no worries: it is really easy once you know the common pitfalls. Ready? Let’s jump in!
Coming from a C/C++ background, this is how my loops always used to look (and just a heads-up: there’s nothing wrong with the loop below, at least not syntactically):
i = 0
mylist = [10, 20, 30]
while i < len(mylist):
    print("mylist[", i, "] is", mylist[i])
    i += 1
Result:
mylist[ 0 ] is 10
mylist[ 1 ] is 20
mylist[ 2 ] is 30
So you say, so what? It gets the job done and it is technically not inefficient, so what is wrong with it? The answer is that nothing is wrong with it; it is just not the best Python code. There are also problems that could crop up once this code goes into maintenance. Let’s find out:
The last line, where we increment ‘i’, could be missed or skipped in the future. Say somebody adds more code to this loop and the new logic decides to “continue” but forgets to add the increment: the loop then never advances. Unit testing should catch this, but the fix is to repeat the increment of ‘i’ on every path that cuts the current iteration short.
The code maintains two pieces of logic, one that tracks and increments the index ‘i’ and one that actually walks through ‘mylist’. Any time the two fall out of step (which becomes more likely as the loop starts doing more with the elements of mylist), the loop reads the wrong element or skips one.
It is simply too much code. If you don’t find that convincing, read on.
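To make the first pitfall concrete, here is a small sketch of the while loop after someone adds a “continue” to it. The `skip_value` name and the rule for skipping are made up for illustration; the point is only that the increment now has to appear twice.

```python
mylist = [10, 20, 30]
skip_value = 20  # hypothetical value to skip, chosen for illustration

kept = []
i = 0
while i < len(mylist):
    if mylist[i] == skip_value:
        i += 1      # duplicated increment; without it the loop never ends
        continue
    kept.append(mylist[i])
    i += 1          # the "normal" increment

print(kept)  # [10, 30]
```

Every new early exit from the loop body needs its own copy of `i += 1`, which is exactly the kind of detail that gets forgotten during maintenance.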
Okay, so what is the more natural, Pythonic way to write the code above? The answer is simple: use the built-in iteration support wherever you can. Python's built-in container types (lists, tuples, dicts, sets, strings and so on) are all iterable, and the for loop takes advantage of that. Any iterable object can hand out an iterator, which greatly simplifies loops. So the code above can be written like this:
mylist = [10, 20, 30]
for list_item in mylist:
    print("list_item=", list_item)
Result:
list_item= 10
list_item= 20
list_item= 30
Sweet, isn't it? We just reduced a 5-line block of code to 3 lines, and the code also became easier to read and understand. That is the beauty of Python!
Okay, I hear you say that we changed the print format in order to simplify the code. How do you get the index in such cases without bringing back the overhead of maintaining its value across iterations? Python comes to the rescue again: the built-in ‘enumerate’ function yields an (index, value) tuple for each element of an iterable, so the code above becomes:
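If you are curious what the for loop is doing with that iterator, the sketch below spells it out by hand using the built-in `iter()` and `next()` functions. It is only a rough equivalent for illustration (the `seen` list is an extra added here to record the values), not something you would write in real code.

```python
mylist = [10, 20, 30]

# Roughly what `for list_item in mylist:` does behind the scenes.
iterator = iter(mylist)              # ask the list for an iterator
seen = []
while True:
    try:
        list_item = next(iterator)   # fetch the next value
    except StopIteration:            # raised once the values run out
        break
    seen.append(list_item)
    print("list_item=", list_item)
```

The for loop hides all of this bookkeeping, which is why there is no index to forget.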
mylist = [10, 20, 30]
for i, list_item in enumerate(mylist):
    print("mylist[", i, "] is", list_item)
Result:
mylist[ 0 ] is 10
mylist[ 1 ] is 20
mylist[ 2 ] is 30
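As a small aside, ‘enumerate’ also accepts an optional `start` argument, which is handy when you want human-friendly numbering. The variable names below are just for illustration.

```python
mylist = [10, 20, 30]

# `start` only changes the counter; the list is still read from its first element.
for position, list_item in enumerate(mylist, start=1):
    print("item", position, "is", list_item)

# The same pairs, collected into a list so they can be inspected.
pairs = list(enumerate(mylist, start=1))
print(pairs)  # [(1, 10), (2, 20), (3, 30)]
```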
Cool, you say, but what about user-defined iterable objects that are not built in? Sure, Python has you covered! We will take a name array object as an example. Let's assume it stores a list of names and has an interface to iterate over them.
# This is just for demonstration, as built in types like list would
# suit here for such use cases otherwise.
class MyNameArray:
    def __init__(self):
        self._names = []

    def add_name(self, name):
        self._names.append(name)

    def get_names(self):
        return self._names
Now, we could simply call ‘get_names’ and use the result in our optimized loop. But assume this class changes over time to hold more data than just names (it could contain a set of fields per entry). Then we could end up with multiple get methods. Also, even in the current form with just name entries, using it this way wouldn't do it justice, as we really want it to act like a name array. How do we do this? The answer is simple: Python's iterator protocol.
Any class can follow the iterator protocol by defining two special methods, __iter__ and __next__, and then for loops in the x-in-datastructure form just work for its objects too. Let's add the iterator protocol to our array class:
class MyNameArray:
    def __init__(self):
        self._names = []

    def add_name(self, name):
        self._names.append(name)

    def __iter__(self):
        self._index = 0
        return self

    def __next__(self):
        if self._index < len(self._names):
            name = self._names[self._index]
            self._index += 1
            return name
        else:
            raise StopIteration
Did you notice that we removed the get_names method? It was not helping encapsulation anyway: returning the internal list exposes the class’s data members, so it is not recommended. Also notice that we added two dunder methods, __iter__ and __next__. The __iter__ method is supposed to return an iterator object. That can be any standalone object, but in our case the class itself acts as the iterator, so we return self.
The other dunder method, __next__, is called on the iterator object, which here is our own instance since we returned self from __iter__. It returns the next name, or raises StopIteration when there are none left.
Okay, enough with the mumbo-jumbo. Let's put it to use in a loop and see how it does:
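For comparison, here is a sketch of the "standalone object" option: __iter__ hands back a fresh iterator from the underlying list instead of returning self. This is an alternative design, not the one used in this post, and the `NameCollection` class name is hypothetical.

```python
class NameCollection:
    """Hypothetical variant of the name array, used only for this sketch."""

    def __init__(self):
        self._names = []

    def add_name(self, name):
        self._names.append(name)

    def __iter__(self):
        # Return a fresh, standalone iterator on every call instead of `self`.
        return iter(self._names)


names = NameCollection()
names.add_name("John")
names.add_name("Joseph")

# Each loop gets its own iterator, so nesting works as expected.
pairs = [(a, b) for a in names for b in names]
print(pairs)
```

The trade-off is worth knowing: when the class returns self and keeps a single `_index`, two loops over the same object share that position, so a nested loop resets and exhausts it and the outer loop ends early. A fresh iterator per call avoids that, and no __next__ method is needed on the class at all.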
myNameArray = MyNameArray()
myNameArray.add_name("John")
myNameArray.add_name("Joseph")
myNameArray.add_name("Mery")
for name in myNameArray:
    print("Hi", name)
Result:
Hi John
Hi Joseph
Hi Mery
Sweet! This not only gives us less code than we would otherwise have with a complex array class, it is also much simpler to read and understand. Most importantly, our array class now also behaves like an array. You will appreciate the iterator protocol and the beauty of doing things the Pythonic way as your classes (like MyNameArray above) grow bigger and more complex over time.
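A nice side effect is that anything in Python that consumes an iterable should accept our object too, not just the for loop. The sketch below repeats the class so that it runs on its own; the variable names are only for illustration.

```python
class MyNameArray:
    """Same design as the class above, repeated so this block runs on its own."""

    def __init__(self):
        self._names = []

    def add_name(self, name):
        self._names.append(name)

    def __iter__(self):
        self._index = 0
        return self

    def __next__(self):
        if self._index < len(self._names):
            name = self._names[self._index]
            self._index += 1
            return name
        raise StopIteration


my_name_array = MyNameArray()
for name in ("John", "Joseph", "Mery"):
    my_name_array.add_name(name)

as_list = list(my_name_array)                    # ['John', 'Joseph', 'Mery']
in_order = sorted(my_name_array, reverse=True)   # ['Mery', 'Joseph', 'John']
has_mery = "Mery" in my_name_array               # True
lengths = [len(name) for name in my_name_array]  # [4, 6, 4]
print(as_list, in_order, has_mery, lengths)
```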
Summary
Doing things the Pythonic way has its benefits, and loop constructs are no exception. The Pythonic way makes code smaller, easier to read and easier to maintain. Ultimately, it results in happier code reviewers and maintainers. Maintainers down the line in particular will appreciate efforts like this. Happy Python looping!