Generators are Python functions that produce values with a yield statement rather than returning a single result with return. More precisely, calling a function that contains a yield statement returns a generator, an iterator object that runs the underlying function to produce values on demand. Each time you call next() on the generator, the function resumes and yields the next value in the sequence, or raises StopIteration to indicate that it's out of values. Python's list comprehensions have a lazy counterpart, the generator expression. For example, you can create a list of square numbers with an expression like [x**2 for x in range(5)]. If you replace the square brackets with parentheses, you get a generator instead:
>>> g = (x**2 for x in range(5))
>>> g
<generator object <genexpr> at 0x2aaaae169f38>
>>> next(g)
0
>>> next(g)
1
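To make the mechanics concrete, here is a small generator function written with an explicit yield; the name squares is purely illustrative. It shows that calling the function only creates the generator, that each next() resumes the body until the next yield, and that running off the end surfaces as StopIteration.

```python
def squares(n):
    """Generator function: yields 0**2, 1**2, ..., (n-1)**2, one at a time."""
    for x in range(n):
        yield x**2          # execution pauses here until the next next() call

g = squares(3)              # calling the function runs none of its body yet
values = [next(g), next(g), next(g)]

exhausted = False
try:
    next(g)                 # the loop is finished, so the function returns...
except StopIteration:
    exhausted = True        # ...which the caller sees as StopIteration

print(values, exhausted)    # [0, 1, 4] True

# The list comprehension builds the whole list at once; the generator
# expression and the generator function produce the same values lazily.
assert [x**2 for x in range(5)] == list(x**2 for x in range(5)) == list(squares(5))
```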
Python 2.5 added features that let generators be used as coroutines (PEP 342). For instance, generators now have a close() method that raises GeneratorExit inside the generating function at the point where it is paused, which acts as a signal telling the generator to clean up after itself. It's also possible to raise an arbitrary exception inside the generator with throw(), or to pass values back into it with send().
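A small sketch of all three methods on one coroutine may help. running_average and log are hypothetical names chosen for this illustration: send() delivers a number and gets back the running mean, throw(ValueError) is used here (by assumption, not by any convention) to reset the statistics, and close() triggers the GeneratorExit cleanup path.

```python
def running_average(log):
    """Coroutine: receives numbers through send() and yields the running mean.

    `log` is a list the coroutine appends to when it is closed, so the
    cleanup step is observable. Both names are illustrative only.
    """
    total, count, average = 0.0, 0, None
    try:
        while True:
            try:
                value = yield average       # pause until send() supplies a value
            except ValueError:
                # throw(ValueError) lands here: reset and yield None again
                total, count, average = 0.0, 0, None
                continue
            total += value
            count += 1
            average = total / count
    except GeneratorExit:
        # close() raises GeneratorExit at the paused yield
        log.append("closed, count=%d" % count)
        raise                               # let close() finish normally

log = []
avg = running_average(log)
next(avg)               # prime: run up to the first yield
a1 = avg.send(10)       # 10.0
a2 = avg.send(20)       # 15.0
avg.throw(ValueError)   # reset the statistics
a3 = avg.send(4)        # 4.0
avg.close()
print(a1, a2, a3, log)  # 10.0 15.0 4.0 ['closed, count=1']
```

Note the priming next(): a freshly created generator has not reached its first yield yet, so it cannot accept a value from send() until it has been advanced once.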
The advantage of coroutines is that they let you modularize your code without reaching for heavyweight object-oriented machinery. For example, I often write scripts that plot a lot of small figures. I usually want to spread them across multiple pages, so I generate axes until the figure is full, then write the figure to disk and start a new one. There's some bookkeeping involved: I have to create the initial figure, then check after every plot whether the figure is full and take the appropriate action if it is. All of that is extraneous to what the script is actually doing, so I want to factor it out. There are many ways of solving this, but a generator is remarkably simple:
from matplotlib.pyplot import figure

def axgriditer(grid=(1,1), figfun=None, **figparams):
    """
    Generates axes for multiple gridded plots. The initial call
    to the generator specifies the plot grid (default 1x1). Yields axes
    on the grid; when the grid is full, opens a new figure and starts
    filling that.

    Arguments:
    grid - the grid layout. Either a (rows, columns) tuple or a function
           that returns an iterator of axes [signature: grid(fig)]
    figfun - called when a figure is full or the generator is
             closed. Can be used for final figure cleanup or to save
             the figure. [signature: figfun(fig)]
    additional keyword arguments are passed to the figure() function
    """
    # test callable() first: len() would fail on a function
    if not callable(grid):
        if len(grid) != 2:
            raise ValueError("Grid argument must be length 2 or a function")
        nx, ny = grid
        # nx rows by ny columns, filled in add_subplot's index order
        grid = lambda fig: (fig.add_subplot(nx, ny, i) for i in range(1, nx*ny + 1))
    fig = figure(**figparams)
    axg = grid(fig)
    try:
        while True:
            for ax in axg:
                yield ax
            # grid exhausted: finish this figure and start a fresh one
            if callable(figfun): figfun(fig)
            fig = figure(**figparams)
            axg = grid(fig)
    except BaseException:
        # close() raises GeneratorExit, which derives from BaseException
        # (not Exception) since Python 2.6; clean up the last figure,
        # then re-raise with the original traceback
        if callable(figfun): figfun(fig)
        raise
Using this generator is fairly simple. The following code plots 4 figures with 5 panels each, with only a single line of extra code to initialize the coroutine.
from numpy.random import randn

axes_grid = axgriditer((5,1))
for i in range(20):
    next(axes_grid).plot(randn(25))
Several things to note about this function:
- The grid argument is a function that returns a generator of Axes objects. If the user just specifies the number of rows and columns, I create one that uses matplotlib's add_subplot() method. But I can also pass in other functions that create irregularly spaced grids, so this is highly modular.
- Remember that because of Python's closure rules, the generator built by grid() uses the values of nx and ny defined in the enclosing function. Thus, even though the signature is just grid(fig), it can still access these values. Likewise, if you define the gridding function in a different scope, it looks up the variables in that scope each time it is called. In other words, you could change the grid dynamically by rebinding those variables in the scope where the function was defined, although this would probably be an undesirable side effect, since the generator doesn't tell you when it has started on a new figure.
- The grid generator stops yielding axes when the figure is full, at which point the post-processing function is called and a new figure is generated.
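- The try/except block catches the exception that the generator's close() or throw() methods raise inside it (GeneratorExit for close(), or whatever exception is passed to throw()), makes sure figfun() gets run on the final figure, and then re-raises. Since Python 2.6, GeneratorExit derives from BaseException rather than Exception, so a handler written as "except Exception" will not see close().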
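The points in this list can be exercised without matplotlib. The sketch below is a stripped-down copy of the same control flow, assuming a hypothetical FakeFigure class whose add_panel() method stands in for add_subplot(); pager, grid and n are likewise illustrative names. It shows a grid function that returns a generator, a grid size that is looked up at call time from the scope where grid was defined, and figfun() running on the last figure when close() is called.

```python
class FakeFigure:
    """Hypothetical stand-in for a matplotlib Figure that records its panels."""
    created = 0

    def __init__(self):
        FakeFigure.created += 1
        self.number = FakeFigure.created
        self.panels = []

    def add_panel(self, i):
        # plays the role of fig.add_subplot(); returns a fake "axes" tuple
        self.panels.append(i)
        return (self.number, i)


def pager(grid, figfun):
    """Same control flow as axgriditer(), minus matplotlib."""
    fig = FakeFigure()
    axg = grid(fig)
    try:
        while True:
            for ax in axg:
                yield ax
            figfun(fig)                  # figure is full
            fig = FakeFigure()
            axg = grid(fig)
    except BaseException:                # GeneratorExit is not an Exception subclass
        figfun(fig)                      # finish the last, partly filled figure
        raise


n = 2                                    # panels per figure, read when grid() runs

def grid(fig):
    # returns a generator; range(n) uses whatever n is bound to at this call
    return (fig.add_panel(i) for i in range(n))

finished = []
pages = pager(grid, lambda fig: finished.append((fig.number, fig.panels)))
first = [next(pages) for _ in range(3)]  # fills figure 1 and starts figure 2
n = 3                                    # figures created from now on get 3 panels
more = [next(pages) for _ in range(4)]   # finishes figure 2, fills figure 3
pages.close()                            # figfun() runs on figure 3
print(finished)  # [(1, [0, 1]), (2, [0, 1]), (3, [0, 1, 2])]
```

Figure 2 still gets two panels because its generator was created, and range(n) evaluated, before n was rebound; this is the "doesn't tell you when it has started on a new figure" hazard in practice.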
Let's say you're making a nice class and you want to expose some constant property of the class so that code that uses your class will know, for example, what kind of data type the class deals in. In Java or C++ you create a constant static member and you're set. In Python you do the same thing, except these variables are called class variables. But how do you do this when you're defining a Python class in C or C++?
The answer is probably totally obvious to anyone who uses Python a lot, but it took me more than 15 minutes to find, so here it is: you put it in the type's dictionary. With normal Python objects, both the instance and the class have a dictionary of attributes. For ordinary attributes the instance dictionary takes precedence, so if you define a class variable and then assign a value to that attribute on an instance, the value is stored in the instance dictionary, and that's what you get whenever you access the attribute on that instance afterwards. Types defined in C extension modules generally don't have instance dictionaries, so anything you put in the type dictionary winds up being read-only from instances.
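The lookup rules above are easy to watch from pure Python. In this sketch Pcmfile is a hypothetical stand-in for the extension type, and the built-in float plays the part of a C-implemented type without instance dictionaries: reading a value from its type dictionary through an instance works, but assigning to it fails.

```python
class Pcmfile:
    """Ordinary Python class; the name is a hypothetical stand-in."""
    _dtype = "int16"                        # class variable, lives in Pcmfile.__dict__

a, b = Pcmfile(), Pcmfile()
b._dtype = "float32"                        # stored in b.__dict__, shadows the class value
print(a._dtype, b._dtype)                   # int16 float32
print(vars(a), vars(b))                     # {} {'_dtype': 'float32'}

# float is implemented in C and its instances have no __dict__.
x = 1.5
doc_from_type = x.__doc__ == float.__doc__  # True: the lookup falls through to the type
try:
    x.__doc__ = "my float"                  # nowhere to store a per-instance value
except AttributeError:
    instance_write_failed = True
else:
    instance_write_failed = False
print(doc_from_type, instance_write_failed) # True True
```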
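Here's the line I used (in the initialization function) to create the class variable _dtype and assign it the NumPy dtype associated with the data the class processes. The assignment has to happen before PyType_Ready() is called: the Python documentation allows tp_dict to start out as a dictionary of initial attributes, which PyType_Ready() then extends with the type's methods, whereas replacing tp_dict afterwards would throw those entries away.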
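/* "N" hands the new reference returned by PyArray_DescrFromType() to the dict instead of taking a second one */
PcmfileType.tp_dict = Py_BuildValue("{s:N}", "_dtype", (PyObject *)PyArray_DescrFromType(NPY_SHORT));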