2008
08.30

After my last post on generators in Python I realized I had skipped one thing that I wanted to mention in the first part, namely how the return keyword interacts with yield and generators. Take this example function and its usage:

def count_to_3or4():
	counter = 0

	while counter < 3:
		counter += 1
		yield counter

	return counter+1

c = count_to_3or4()
print c.next() # 1
print c.next() # 2
print c.next() # 3
print c.next() # 4, from return - or ?

If you've read the previous post, or have a basic understanding of generators, you would probably guess that 1, 2 and 3 will be printed by the first three .next() calls and 4 by the last one - but you would be wrong. If you try to run the above code you will get this thrown back in your face:

  File "generators.py", line 15
    return counter+1
SyntaxError: 'return' with argument inside generator

So you can't use return in generators (functions with yield)? Well, yes you can - you just can't use return with a value attached to it. If return is executed within a generator function the generator exits, and that call to .next() and any further ones will throw a StopIteration exception. Take this example code:

def count_to_2or3():
	counter = 0

	while counter < 3:
		counter += 1
		yield counter

		if counter == 2:
			return

c = count_to_2or3()
print c.next() # 1
print c.next() # 2
print c.next() # 3, or ?

It will print 1 and 2. When you call .next() a third time the generator resumes after the yield, hits return (since counter == 2, the if-clause evaluates to true) and throws a StopIteration exception. Basically "return" inside a generator function ends the generator, much like "break" ends a loop.
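As a minimal sketch of how calling code can deal with that, the snippet below catches the StopIteration instead of letting it crash the script. It is written in Python 3 syntax (the built-in next(c) instead of c.next()), which is an assumption and not what the 2008-era examples in this post use; the names results and stopped are only there for illustration.

```python
def count_to_2or3():
    counter = 0

    while counter < 3:
        counter += 1
        yield counter

        # A bare return ends the generator early.
        if counter == 2:
            return


c = count_to_2or3()
results = [next(c), next(c)]  # the two values that do get yielded

try:
    next(c)  # resumes after the yield, hits return
    stopped = False
except StopIteration:
    stopped = True

print(results, stopped)
```

A for loop does this catching for you, which is why early returns in generators rarely need explicit handling.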

Performance

When you're using generators as a type of iterator together with for (or manually, for that matter) and working with large datasets, you will see a substantial performance increase over list-generating functions. These two functions will generate the exact same output, but one will be significantly faster and use less memory:

def count_to_list(stop):
	_list = []
	counter = 0

	while counter < stop:
		counter += 1
		_list.append(counter)

	return _list

def count_to_generator(stop):
	counter = 0

	while counter < stop:
		counter += 1
		yield counter

The first function will build a complete list of numbers (which takes quite some time and memory) and then return it, while the second function, our generator, will produce one number each time .next() is called on it and only hold one integer in memory at a time, while also being a fair bit faster. Running this on my VPS yields (again, no pun intended) these results:

fredrik@holmstrom:~/python/generators$ time python list.py

real    0m0.611s
user    0m0.540s
sys     0m0.040s
fredrik@holmstrom:~/python/generators$ time python generator.py

real    0m0.385s
user    0m0.380s
sys     0m0.000s

I didn't measure memory usage here, but trust me - generator.py will consume a lot less memory. This technique is also called "lazy evaluation" in proper CS terms. There's a lot more information on this topic alone, but this will do for now.
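For a rough way to check both claims yourself, here is a sketch using the standard library's sys.getsizeof and timeit. It is written in Python 3 syntax, and the value of STOP and the repeat count are arbitrary assumptions. Note that sys.getsizeof only reports the size of the list or generator object itself, not of the integers it refers to, so it understates the list's real footprint. Timings depend on interpreter and machine, so no expected numbers are shown.

```python
import sys
import timeit


def count_to_list(stop):
    numbers = []
    counter = 0
    while counter < stop:
        counter += 1
        numbers.append(counter)
    return numbers


def count_to_generator(stop):
    counter = 0
    while counter < stop:
        counter += 1
        yield counter


STOP = 100000  # arbitrary size picked for this sketch

# Both produce the same numbers, so the sums must match.
list_total = sum(count_to_list(STOP))
generator_total = sum(count_to_generator(STOP))

# Size of the container object only, not of the integers inside it.
list_size = sys.getsizeof(count_to_list(STOP))
generator_size = sys.getsizeof(count_to_generator(STOP))

list_seconds = timeit.timeit(lambda: sum(count_to_list(STOP)), number=10)
generator_seconds = timeit.timeit(lambda: sum(count_to_generator(STOP)), number=10)

print("list:      %d bytes, %.3f s" % (list_size, list_seconds))
print("generator: %d bytes, %.3f s" % (generator_size, generator_seconds))
```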

Advanced usage

As I mentioned in the history introduction in my previous post about generators, Python 2.5 gave generators a substantial usability boost, allowing us to pass information back into the function through the yield expression using the .send() and .throw() methods on the generator-object. send() works exactly like next() except that you can pass a value back into the function as its first argument, and that value becomes the result of the paused yield expression. There are a few caveats you should look out for, though - take this snippet of code:

def echo():
	while True:
		print yield

Will give you the following SyntaxError:

  File "explained.py", line 3
    print yield
              ^

SyntaxError: invalid syntax

Changing the print yield to this:

def echo():
	while True:
		val = yield
		print val

Will make the code execute properly, however it seems pretty non-pythonic to have to store the result in a temporary variable - so instead we can do this:

def echo():
	while True:
		print (yield)

Wrapping the yield in parentheses will allow you to use the result of it directly instead of storing it in a temporary variable, so let's put our echo() generator to use:

def echo():
	while True:
		print (yield)

e = echo()
e.send("Hello")
e.send("World!")

But running this code will show you the second caveat of trying to pass values back into the generator function. This TypeError will be thrown in your face if you run this code:

Traceback (most recent call last):
  File "generator.py", line 6, in <module>
    e.send("Hello")
TypeError: can't send non-None value to a just-started generator

Remember how I said that yield pauses the execution of the generator function, and that when you call the generator function (in this case e = echo()) no code is executed until you call .next() on your generator-object? So if .send() is used to pass data back into a yield expression while the generator is paused, we can't call .send() when no code has been executed and no yield has paused the generator, right?

What this means in practice is that you have to call either .next() or .send(None) the first time you use a generator-object. When the generator reaches its first yield it will pause execution, waiting for another call to .send() (or .next() if you don't want to pass any data back) that will pass data back into it at that yield. Confusing? Changing the code above to this:

def echo():
	while True:
		print (yield)

e = echo()
e.send(None) # or e.next()
e.send("Hello")
e.send("World!")

Will make it run, printing:

Hello
World!

To illustrate exactly what's happening here, I'll take another example - slightly more advanced but still achieving the same result as above:

def echo():
	counter = 0

	while True:
		counter += 1
		print (yield counter)

e = echo()
print "Yield nr %s" % e.send(None) 	# Sending nothing in (since we haven't paused
					# anything with yield yet) and yielding nr 1
					# back to the print statement

print "Yield nr %s" % e.send("Hello")	# the pause from nr 1 gets resumed, passing "Hello"
					# back in and printing it, then doing another loop
					# and yielding nr 2 back and pausing execution

print "Yield nr %s" % e.send("World!")	# the pause from nr 2 gets resumed, passing "World!"
					# back in and printing it, then doing another loop
					# and yielding nr 3 back and pausing execution

					# If we would call e.send("Blah"), etc. again
					# here we could go on forever since the yield
					# statement is stuck in a "while True"-loop

Make sure to read the comments in the above code, since I figured it would be a lot easier to explain if the comments were attached to the correct lines. Running the above code will yield (again, no pun intended ;p) the following results:

Yield nr 1
Hello
Yield nr 2
World!
Yield nr 3
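To show the same two-way traffic doing something slightly more useful, here is a small sketch of a running-average coroutine. The function running_average() and its variable names are made up for this illustration, and the code uses Python 3 syntax (next(avg) rather than avg.next()), which is an assumption about what you are running. Each .send() passes a number in and gets the average so far back out.

```python
def running_average():
    total = 0.0
    count = 0
    average = None  # nothing to report before the first value arrives

    while True:
        # Hand the current average out, wait for the next number to come in.
        value = yield average
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)            # prime the generator: run it up to the first yield

a1 = avg.send(10)    # 10.0
a2 = avg.send(20)    # 15.0
a3 = avg.send(30)    # 20.0
print(a1, a2, a3)
```

The priming call is the same caveat as before: the first yield has to be reached before there is anything to send a value into.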

Quite simple, and yet so powerful. There is one last thing I want to demonstrate in this second part of the tutorial - the method .throw(). Those of you familiar with languages other than Python might recognize the word throw and figure it has something to do with exceptions, and you'd be correct - it does.

As I've demonstrated, .send() sends data in to the paused yield statement, and .throw() does something similar: it sends in an exception that gets raised at the line of the paused yield statement. Let's demonstrate:

def exceptional():
	while True:
		yield

e = exceptional()
e.next()
e.throw(Exception)

Will give you this output:

Traceback (most recent call last):
  File "generator.py", line 7, in <module>
    e.throw(Exception)
  File "generator.py", line 3, in exceptional
    yield
Exception

Which is correct, because you sent an Exception in. It is possible to call .throw() as the first method on a new generator object, before any call to .next() or .send(), however that will raise the exception before any code in the generator function is executed and the generator will not have a chance to handle it.

In the stack trace above you also see that the exception is actually thrown at the "yield" line when it's resumed after being paused by .next() the first time.
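The difference between throwing into a fresh generator and a paused one can be sketched like this. The guarded() function and the variable names are hypothetical, and the code uses Python 3 syntax (next(started) instead of started.next()), which is an assumption. The DemoException class mirrors the one used in the next example.

```python
class DemoException(Exception):
    pass


def guarded():
    try:
        yield 1
    except DemoException:
        yield "handled"


# Case 1: throw before the generator has started. The try block has not
# been entered yet, so the exception comes straight back to the caller.
fresh = guarded()
try:
    fresh.throw(DemoException("too early"))
    fresh_outcome = "handled inside the generator"
except DemoException:
    fresh_outcome = "propagated to the caller"

# Case 2: throw while paused at the yield inside the try block. The
# generator catches it and yields a new value, which .throw() returns.
started = guarded()
next(started)
started_outcome = started.throw(DemoException("in time"))

print(fresh_outcome)
print(started_outcome)
```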

Let's do a more advanced example, with a custom exception class:

def exceptional():
	counter = 0
	while True:
		try:
			counter += 1
			yield counter
		except DemoException, exc:
			print "Caught exception with message: %s" % exc

class DemoException(Exception):
	pass

e = exceptional()
print e.next()
print e.throw(DemoException("Hello World"))

The above code will print this:

1
Caught exception with message: Hello World
2

And here's the magic - if you handle the exception that gets thrown in at the line where the generator was paused at yield (by wrapping it in a try/except/finally-block), the code will continue executing like it should and .throw() will return the value of the next yield that gets reached. All in all, .send() and .throw() work the same way, except that .throw() raises whatever you feed it as an exception instead of handing it back as the result of yield.

The ability to pass errors (exceptions) *into* generators allows you to do some really neat error handling that doesn't require your wrapping code to have any information about the generator, resulting in very clean and loosely coupled code.

In the next, and last, part I will go through a real world example using asynchronous I/O and network calls utilizing all the techniques explained in these two posts.
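A small sketch of what that loose coupling can look like: the driving code below only knows that something went wrong and throws it in, while the generator alone decides how to react. FetchError, consumer() and drive() are all hypothetical names invented for this illustration, and the code uses Python 3 syntax (next(gen), except ... as exc), which is an assumption.

```python
class FetchError(Exception):
    pass


def consumer(log):
    while True:
        try:
            item = yield
            log.append("got %s" % item)
        except FetchError as exc:
            # The generator chooses the recovery strategy: note it, carry on.
            log.append("skipped: %s" % exc)


def drive(gen, events):
    next(gen)  # prime the generator up to its first yield
    for ok, payload in events:
        if ok:
            gen.send(payload)
        else:
            # The driver does not know or care how the error is handled.
            gen.throw(FetchError(payload))


log = []
drive(consumer(log), [(True, "a"), (False, "timeout"), (True, "b")])
print(log)
```

Swapping in a different generator with a different recovery strategy would not require any change to drive().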

2008
08.29

First, some history…

Generators are a concept that was introduced in Python in version 2.2. Back then they were unidirectional, only allowing information to be passed out of the generator and not back into it, which limited their use to simple iterators and not much else. This was changed / enhanced in Python 2.5, where both data and exceptions can be passed back into the generator. The changes made in 2.5 allowed generators to be used as coroutines, enabling them to function in complex event-driven programming such as asynchronous I/O, games, etc.

So how does one define a generator in Python? It’s actually very simple, you just define a normal subroutine (or function, if you will) that has the keyword yield somewhere inside of its body, here’s a quick example:

def foo():
    yield

What does yield do to a subroutine then? When a subroutine encounters the yield expression it suspends execution so that it can be resumed at a later time, as chosen by the programmer. You basically tell the routine “I don’t want to continue executing you now, but at a later stage I might want to, and you should resume from the point where the yield statement was and not start over”. It’s also important to note that when yield is reached the subroutine’s state (variable values, etc.) is saved, so when you continue executing it everything will be the way you left it.

When you call a generator-function (a subroutine/function with the yield-keyword in its body) you don’t get a result back, instead you get a generator-object back that is used to control the execution of the subroutine, take the foo()-routine we defined above, if we do this:

gen = foo()
print gen

This is what Python will print about the “result” of foo(): <generator object at 0x2b8cbb061098>. So when we call a generator function we get a generator object back, not the result of the function call. Note that none of the code inside foo() has been executed yet. As I’ve said, the generator’s execution is controlled through the generator-object, primarily by its next()-method, which will start/resume execution until a yield statement is reached and then return. So if we do this instead:

gen = foo()
print gen.next()

We get back this: None, which is not very useful at all. If we try calling gen.next() again we will get something like this:

Traceback (most recent call last):
  File "generators.py", line 7, in <module>
    print gen.next()
StopIteration

Because the yield statement only gets executed once in our foo-generator and the generator then reaches its end, we can only “resume” execution with next() once. So what if we add two yields to foo() instead, making the code look like this:

def foo():
	yield
	yield

gen = foo()
print gen.next()
print gen.next()

This works, giving us back:

None
None

Calling gen.next() a third time will, again, raise a StopIteration exception. We’re still only getting back a lot of nothing (None) from our generators, so how about passing something back out from our yield statements? Modifying foo again, making it look like this:

def foo():
	yield "Hello"
	yield "World!"

gen = foo()
print gen.next()
print gen.next()

Will yield (no pun intended) this result:

Hello
World!

Kind of what you were expecting, huh? So let’s do something a bit more interesting, or well – something that shows what generators are useful for:

def counter(count_to):
	counter = 0

	while counter < count_to:
		counter = counter+1
		yield counter

As you see, this generator named counter takes one argument, an integer which decides how far we should count. Remember that when we call counter(3) to count to three, no code inside the generator gets executed until we call the generator-object's next() method. It then executes normally until it hits a yield statement and suspends, returning (through next()) whatever we fed to yield. Let's see it in action:

c = counter(3)
print c.next()
print c.next()
print c.next()

This will, maybe not to our surprise now, print:

1
2
3

Once the three has been "yielded" to us, we cannot call next() again without raising a StopIteration exception, because the while-loop's condition would evaluate to false, skipping the yield statement within it, and counter() would end without yielding anything back to us through next().

What happens if we call counter() several times? We will get several generator-objects, each representing one invocation of counter() with its own internal state. You can almost think of it like creating two instances of a class:
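To make the suspend and resume behaviour visible, here is a small sketch that records what the generator body is doing in a list. The steps() function and the trace list are invented for this illustration, and the code uses Python 3 syntax (next(gen) instead of gen.next()), which is an assumption.

```python
trace = []


def steps():
    trace.append("started")
    x = 1
    yield x
    # x still has the value it had when the generator was suspended.
    trace.append("resumed with x=%d" % x)
    x += 1
    yield x
    trace.append("finished")


gen = steps()
trace_after_call = list(trace)    # still empty: no code has run yet

first = next(gen)
trace_after_first = list(trace)   # only the part before the first yield ran

second = next(gen)

try:
    next(gen)                     # runs the tail, then the generator ends
    ended = False
except StopIteration:
    ended = True

print(first, second, ended)
print(trace)
```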

c1 = counter(3)
c2 = counter(4)
print c1.next()
print c2.next()
print c1.next()
print c2.next()
print c1.next()
print c2.next()

print c2.next() # We can run c2 once more than c1 since it's counting to four and c1 to three

The above code will print:

1
1
2
2
3
3
4

This demonstrates that each invocation of a generator function creates its own generator-object and scope. Generators are used everywhere in Python. In most cases they are used as iterators together with the for statement, but they have other uses too. Using a generator together with a for statement is very straightforward. Take the above counter()-function, we can use it the same way range() is used in Python:

for i in counter(5):
	print i

The for construct in Python has a built-in way of handling generators: when it gets fed a generator-object (the result of calling counter(5) in this case) it will call .next() on it, putting the value returned in the iteration variable, i in this case. When for gets a StopIteration exception thrown from the generator for calling .next() one too many times, it will silently swallow the exception and stop the loop. Neat, huh?

Let's write a, again useless, generator-function that iterates through every letter in a word and call it with for:
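Roughly speaking, the for loop above behaves like the hand-written loop in this sketch. It is written in Python 3 syntax (next(gen) instead of gen.next()), which is an assumption. The real for statement also calls iter() on whatever it is given first, which for a generator-object simply returns the generator itself. The local variable is named n here only to avoid reusing the function's name.

```python
def counter(count_to):
    n = 0
    while n < count_to:
        n += 1
        yield n


collected = []
gen = counter(5)

while True:
    try:
        i = next(gen)        # what for does on every iteration
    except StopIteration:
        break                # what for does when the generator is exhausted
    collected.append(i)      # the loop body

print(collected)
```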

def letters(word):
	for letter in word:
		yield letter

for letter in letters("Hello World"):
	print letter

I think you can guess what this will print, yeah. While the above function is practically useless in Python, it's a good example of how generators and the for statement work together. I hope this run-through gave you a quick look into what generators are. If you're interested in learning more about them, make sure to check out part two of this article series.