Commit d95ddc5

ENH: itertuples() returns namedtuples
1 parent 2dd4335 commit d95ddc5

4 files changed, +74 -20 lines changed

doc/source/basics.rst

+18 -10
@@ -1211,9 +1211,10 @@ To iterate over the rows of a DataFrame, you can use the following methods:
 * :meth:`~DataFrame.iterrows`: Iterate over the rows of a DataFrame as (index, Series) pairs.
   This converts the rows to Series objects, which can change the dtypes and has some
   performance implications.
-* :meth:`~DataFrame.itertuples`: Iterate over the rows of a DataFrame as tuples of the values.
-  This is a lot faster as :meth:`~DataFrame.iterrows`, and is in most cases preferable to
-  use to iterate over the values of a DataFrame.
+* :meth:`~DataFrame.itertuples`: Iterate over the rows of a DataFrame
+  as namedtuples of the values. This is a lot faster as
+  :meth:`~DataFrame.iterrows`, and is in most cases preferable to use
+  to iterate over the values of a DataFrame.
 
 .. warning::
 
@@ -1307,7 +1308,7 @@ index value along with a Series containing the data in each row:
    df_orig['int'].dtype
 
 To preserve dtypes while iterating over the rows, it is better
-to use :meth:`~DataFrame.itertuples` which returns tuples of the values
+to use :meth:`~DataFrame.itertuples` which returns namedtuples of the values
 and which is generally much faster as ``iterrows``.
 
 For instance, a contrived way to transpose the DataFrame would be:
@@ -1325,9 +1326,9 @@ itertuples
 ~~~~~~~~~~
 
 The :meth:`~DataFrame.itertuples` method will return an iterator
-yielding a tuple for each row in the DataFrame. The first element
-of the tuple will be the row's corresponding index value,
-while the remaining values are the row values.
+yielding a namedtuple for each row in the DataFrame. The first element
+of the tuple will be the row's corresponding index value, while the
+remaining values are the row values.
 
 For instance,
 
@@ -1336,9 +1337,16 @@ For instance,
    for row in df.itertuples():
        print(row)
 
-This method does not convert the row to a Series object but just returns the
-values inside a tuple. Therefore, :meth:`~DataFrame.itertuples` preserves the
-data type of the values and is generally faster as :meth:`~DataFrame.iterrows`.
+This method does not convert the row to a Series object but just
+returns the values inside a namedtuple. Therefore,
+:meth:`~DataFrame.itertuples` preserves the data type of the values
+and is generally faster as :meth:`~DataFrame.iterrows`.
+
+.. note::
+
+   The columns names will be renamed to positional names if they are
+   invalid Python identifiers, repeated, or start with an underscore.
+   With a large number of columns (>255), regular tuples are returned.
 
 .. _basics.dt_accessors:
 
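The renaming rule described in the new note comes from the `rename=True` option of `collections.namedtuple`. The following is a minimal standard-library sketch of that behaviour, not code from this commit; the labels in `labels` are hypothetical and were chosen only to trigger each rule.

```python
import collections

# Hypothetical column labels (not from the commit), one per renaming rule:
# keywords, a repeated name, a leading underscore, and a non-identifier.
labels = ["Index", "def", "return", "a", "a", "_x", "my col"]

# rename=True replaces each invalid field name with "_<position>"
# instead of raising ValueError.
Row = collections.namedtuple("Pandas", labels, rename=True)

print(Row._fields)
# ('Index', '_1', '_2', 'a', '_4', '_5', '_6')
```

Only the first `a` keeps its name; the repeated one is replaced by its position, which is why duplicate column labels show up as positional fields.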
doc/source/whatsnew/v0.17.1.txt

+2 -1
@@ -38,6 +38,7 @@ API changes
   Legacy Python syntax (``set([x, y])``) (:issue:`11215`)
 - Indexing with a null key will raise a ``TypeError``, instead of a ``ValueError`` (:issue:`11356`)
 - ``Series.sort_index()`` now correctly handles the ``inplace`` option (:issue:`11402`)
+- ``DataFrame.itertuples()`` now returns ``namedtuple`` objects, when possible. (:issue:`11269`)
 
 .. _whatsnew_0171.deprecations:
 
@@ -71,7 +72,7 @@ Bug Fixes
 - Bug in ``HDFStore.append`` with strings whose encoded length exceded the max unencoded length (:issue:`11234`)
 - Bug in merging ``datetime64[ns, tz]`` dtypes (:issue:`11405`)
 - Bug in ``HDFStore.select`` when comparing with a numpy scalar in a where clause (:issue:`11283`)
-- Bug in using ``DataFrame.ix`` with a multi-index indexer(:issue:`11372`)
+- Bug in using ``DataFrame.ix`` with a multi-index indexer(:issue:`11372`)
 
 
 - Bug in tz-conversions with an ambiguous time and ``.dt`` accessors (:issue:`11295`)

pandas/core/frame.py

+33 -9
@@ -584,7 +584,7 @@ def iteritems(self):
         See also
         --------
         iterrows : Iterate over the rows of a DataFrame as (index, Series) pairs.
-        itertuples : Iterate over the rows of a DataFrame as tuples of the values.
+        itertuples : Iterate over the rows of a DataFrame as namedtuples of the values.
 
         """
         if self.columns.is_unique and hasattr(self, '_item_cache'):
@@ -617,7 +617,7 @@ def iterrows(self):
            int64
 
            To preserve dtypes while iterating over the rows, it is better
-           to use :meth:`itertuples` which returns tuples of the values
+           to use :meth:`itertuples` which returns namedtuples of the values
            and which is generally faster as ``iterrows``.
 
         2. You should **never modify** something you are iterating over.
@@ -632,7 +632,7 @@ def iterrows(self):
 
         See also
         --------
-        itertuples : Iterate over the rows of a DataFrame as tuples of the values.
+        itertuples : Iterate over the rows of a DataFrame as namedtuples of the values.
         iteritems : Iterate over (column name, Series) pairs.
 
         """
@@ -641,15 +641,23 @@ def iterrows(self):
             s = Series(v, index=columns, name=k)
             yield k, s
 
-    def itertuples(self, index=True):
+    def itertuples(self, index=True, name="Pandas"):
         """
-        Iterate over the rows of DataFrame as tuples, with index value
+        Iterate over the rows of DataFrame as namedtuples, with index value
         as first element of the tuple.
 
         Parameters
         ----------
         index : boolean, default True
             If True, return the index as the first element of the tuple.
+        name : string, default "Pandas"
+            The name of the returned namedtuple.
+
+        Notes
+        -----
+        The columns names will be renamed to positional names if they are
+        invalid Python identifiers, repeated, or start with an underscore.
+        With a large number of columns (>255), regular tuples are returned.
 
         See also
         --------
@@ -666,16 +674,32 @@ def itertuples(self, index=True):
         b 2 0.2
         >>> for row in df.itertuples():
         ...     print(row)
-        ('a', 1, 0.10000000000000001)
-        ('b', 2, 0.20000000000000001)
+        ...
+        Pandas(Index='a', col1=1, col2=0.10000000000000001)
+        Pandas(Index='b', col1=2, col2=0.20000000000000001)
 
         """
         arrays = []
+        fields = []
         if index:
             arrays.append(self.index)
+            fields.append("Index")
 
         # use integer indexing because of possible duplicate column names
         arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))
+
+        # Python 3 supports at most 255 arguments to constructor, and
+        # things get slow with this many fields in Python 2
+        if len(self.columns) + index < 256:
+            # `rename` is unsupported in Python 2.6
+            try:
+                itertuple = collections.namedtuple(
+                    name, fields+list(self.columns), rename=True)
+                return (itertuple(*row) for row in zip(*arrays))
+            except:
+                pass
+
+        # fallback to regular tuples
         return zip(*arrays)
 
     if compat.PY3: # pragma: no cover
@@ -1213,7 +1237,7 @@ def to_panel(self):
 
     def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
                columns=None, header=True, index=True, index_label=None,
-               mode='w', encoding=None, compression=None, quoting=None,
+               mode='w', encoding=None, compression=None, quoting=None,
                quotechar='"', line_terminator='\n', chunksize=None,
                tupleize_cols=False, date_format=None, doublequote=True,
                escapechar=None, decimal='.', **kwds):
@@ -1251,7 +1275,7 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
             A string representing the encoding to use in the output file,
             defaults to 'ascii' on Python 2 and 'utf-8' on Python 3.
         compression : string, optional
-            a string representing the compression to use in the output file,
+            a string representing the compression to use in the output file,
             allowed values are 'gzip', 'bz2',
             only used when the first argument is a filename
         line_terminator : string, default '\\n'
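
The control flow added to `itertuples` can be read without pandas. The sketch below mirrors it on plain lists under stated assumptions: `rows_as_namedtuples` and its `index`/`columns`/`data` arguments are hypothetical stand-ins for the DataFrame internals, and the Python 2.6 `try`/`except` guard is left out. Only the 256-field threshold and the `rename=True` call are taken from the diff.

```python
import collections


def rows_as_namedtuples(index, columns, data, name="Pandas"):
    """Return an iterator of rows; `data` holds one list of values per column.

    Hypothetical helper: plain lists stand in for the DataFrame here.
    """
    arrays = [index] + list(data)
    fields = ["Index"] + list(columns)

    # Same threshold as in the commit: with 256 or more fields,
    # skip the namedtuple and fall through to plain tuples.
    if len(fields) < 256:
        row_type = collections.namedtuple(name, fields, rename=True)
        return (row_type(*row) for row in zip(*arrays))

    # Fallback to regular tuples.
    return zip(*arrays)


rows = list(rows_as_namedtuples(["a", "b"], ["col1", "col2"],
                                [[1, 2], [0.1, 0.2]]))
print(rows[0])  # Pandas(Index='a', col1=1, col2=0.1)

# A wide frame (1024 columns, as in the commit's test) yields plain tuples.
wide = next(rows_as_namedtuples([0], ["f%d" % i for i in range(1024)],
                                [[i] for i in range(1024)]))
print(hasattr(wide, "_fields"))  # False
```

The namedtuple class is built once per call and reused for every row, so the per-row cost is a single constructor call.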

pandas/tests/test_frame.py

+21
@@ -5545,6 +5545,27 @@ def test_itertuples(self):
         dfaa = df[['a', 'a']]
         self.assertEqual(list(dfaa.itertuples()), [(0, 1, 1), (1, 2, 2), (2, 3, 3)])
 
+        tup = next(df.itertuples(name='TestName'))
+
+        # no support for field renaming in Python 2.6, regular tuples are returned
+        if sys.version >= LooseVersion('2.7'):
+            self.assertEqual(tup._fields, ('Index', 'a', 'b'))
+            self.assertEqual((tup.Index, tup.a, tup.b), tup)
+            self.assertEqual(type(tup).__name__, 'TestName')
+
+        df.columns = ['def', 'return']
+        tup2 = next(df.itertuples(name='TestName'))
+        self.assertEqual(tup2, (0, 1, 4))
+
+        if sys.version >= LooseVersion('2.7'):
+            self.assertEqual(tup2._fields, ('Index', '_1', '_2'))
+
+        df3 = DataFrame(dict(('f'+str(i), [i]) for i in range(1024)))
+        # will raise SyntaxError if trying to create namedtuple
+        tup3 = next(df3.itertuples())
+        self.assertFalse(hasattr(tup3, '_fields'))
+        self.assertIsInstance(tup3, tuple)
+
     def test_len(self):
         self.assertEqual(len(self.frame), len(self.frame.index))
 
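The existing assertions that compare rows to plain tuples keep passing because a namedtuple is a tuple subclass. Here is a small standard-library sketch of that compatibility; the `Row` class and its values are illustrative and only echo the names used in the test above.

```python
import collections

# Illustrative stand-in for a row yielded by itertuples(name='TestName').
Row = collections.namedtuple("TestName", ["Index", "a", "b"])
tup = Row(0, 1, 4)

# Behaviour that code written for plain tuples relies on is unchanged:
# equality, positional access and unpacking.
idx, a, b = tup
assert tup == (0, 1, 4)
assert tup[1] == a == 1
assert isinstance(tup, tuple)

# What the namedtuple adds: attribute access and field introspection.
assert tup.b == 4
as_dict = dict(tup._asdict())
print(as_dict)  # {'Index': 0, 'a': 1, 'b': 4}
```

Code that tests `type(row) is tuple` rather than `isinstance(row, tuple)` is the one case where the switch to namedtuples would be visible.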