NumPy supports structured arrays, which are the nearest thing to R's data.frame class. Data are organized into fields and records: each field (column) has a name and a data type, and each record (row) has a value for every field. Columns are indexed by name, and rows are indexed by integers. Record arrays (recarrays), a structured-array subclass that also lets fields be read as attributes, can be created from nested Python iterables using numpy.rec.fromrecords:


>>> D = [('fair',6.0,1), ('good',12,2)]
>>> D = numpy.rec.fromrecords(D, names='quality,price,size')
>>> D

rec.array([('fair', 6.0, 1), ('good', 12.0, 2)],
      dtype=[('quality', '|S4'), ('price', '<f8'), ('size', '<i4')])

>>> D['quality']

rec.array(['fair', 'good'],
      dtype='|S4')

>>> D[0]

('fair', 6.0, 1)
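As a sketch of the indexing described above (columns by name, rows by integer position), the snippet below builds the same records with current NumPy under Python 3. It assumes a modern NumPy, where string fields default to a Unicode dtype ('<U4') instead of '|S4' and the default integer width depends on the platform. The boolean-mask line is an added illustration of R-style row subsetting, not part of the original session.

```python
import numpy as np

# Same records as above; fromrecords infers one dtype per field.
D = np.rec.fromrecords([('fair', 6.0, 1), ('good', 12, 2)],
                       names='quality,price,size')

# Columns are selected by name, with brackets or (on a recarray) as attributes.
qualities = D['quality']      # array(['fair', 'good'], dtype='<U4')
prices = D.price              # array([ 6., 12.])

# Rows are selected by integer position; a single row is a record.
first = D[0]
print(first['quality'], first['price'], first['size'])   # fair 6.0 1

# A boolean mask selects rows, much like subsetting a data.frame in R.
expensive = D[D.price > 10]
print(expensive.quality)      # ['good']
```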
 


Note that the 'price' field has a float data type: one of the records holds a float value, so the whole field is promoted to the most general type that can represent every value. For more precise control over field data types, fromrecords() takes a formats argument, a comma-separated string (or a list) of format codes, one per field. For instance, to force 'price' to be an integer, pass the original list of tuples again: D = numpy.rec.fromrecords([('fair',6.0,1), ('good',12,2)], names='quality,price,size', formats='S4,i4,i4')
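A minimal sketch of the promotion rule and the formats override follows. It assumes Python 3, where an 'S4' field stores bytes; the Unicode code 'U4' is used here instead so the stored strings compare equal to ordinary str values.

```python
import numpy as np

records = [('fair', 6.0, 1), ('good', 12, 2)]

# Without formats: 6.0 and 12 share one column, so 'price' is promoted to float64.
inferred = np.rec.fromrecords(records, names='quality,price,size')
print(inferred.dtype['price'])   # float64

# With formats: one code per field, in field order.
# 'U4' = Unicode string of up to 4 characters, 'i4' = 32-bit signed integer.
typed = np.rec.fromrecords(records, names='quality,price,size',
                           formats='U4,i4,i4')
print(typed.dtype['price'])      # int32
print(typed.price)               # [ 6 12]
```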

For reading and writing recarrays, older versions of matplotlib provide matplotlib.mlab.rec2csv() and matplotlib.mlab.csv2rec(); both have since been deprecated and removed from recent matplotlib releases. The format of each field can be specified with a dictionary (the formatd argument). Both functions take a number of further arguments, most of them documented, that control how the data are read in (e.g. the delimiter, or whether the first row holds the field names). rec2csv() always writes the field names as a header row. To avoid this behavior, or to avoid depending on matplotlib at all, use numpy.savetxt() and numpy.loadtxt().


>>> from matplotlib import mlab

>>> formatd = {'quality' : mlab.FormatString(), 'price' : mlab.FormatFloat(2),}
>>> mlab.rec2csv(D, 'test.csv', formatd=formatd)
>>> mlab.csv2rec('test.csv')

rec.array([('fair', 6.0, 1), ('good', 12.0, 2)],
      dtype=[('quality', '|S4'), ('price', '<f8'), ('size', '<i4')])

>>> numpy.savetxt('test.csv', D, delimiter=',', fmt=('%s','%3.2f','%d'))
>>> numpy.loadtxt('test.csv', delimiter=',', dtype={'names': ('quality','price','size'), 'formats' : ('S4', 'f8', 'i4')})

array([('fair', 6.0, 1), ('good', 12.0, 2)],
      dtype=[('quality', '|S4'), ('price', '<f8'), ('size', '<i4')])
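Since the matplotlib.mlab CSV helpers are no longer shipped with current matplotlib, a NumPy-only round trip that keeps a header row may be more practical. The sketch below assumes NumPy 1.14 or later (for the encoding argument of numpy.genfromtxt). It writes the field names with numpy.savetxt's header parameter and reads them back with numpy.genfromtxt(names=True), letting dtype=None infer each field's type. The temporary-directory file location is only an illustration choice.

```python
import os
import tempfile

import numpy as np

D = np.rec.fromrecords([('fair', 6.0, 1), ('good', 12, 2)],
                       names='quality,price,size', formats='U4,f8,i4')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.csv')

    # One line per record with a per-field format, plus the field names as a
    # header row; comments='' stops savetxt from prefixing the header with '# '.
    np.savetxt(path, D, delimiter=',', fmt=['%s', '%.2f', '%d'],
               header=','.join(D.dtype.names), comments='')

    with open(path) as f:
        text = f.read()

    # names=True takes field names from the first line; dtype=None asks
    # genfromtxt to infer each column's type from its contents.
    loaded = np.genfromtxt(path, delimiter=',', names=True, dtype=None,
                           encoding='utf-8')

print(text, end='')          # the CSV as written, header first
print(loaded.dtype.names)    # ('quality', 'price', 'size')
```

Unlike numpy.loadtxt with an explicit dtype, this variant does not need the field names or formats to be repeated when reading.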