ENH: Allow index names to be included in itertuples() result #27407

Open
Dr-Irv opened this issue Jul 15, 2019 · 2 comments
@Dr-Irv
Contributor

Dr-Irv commented Jul 15, 2019

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10, 20, 30, 40]},
   ...:                   index=pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']],
   ...:                                                    names=['ab', 'cd']))
   ...: df
Out[2]:
       x   y
ab cd
a  c   1  10
   d   2  20
b  c   3  30
   d   4  40

In [3]: for it in df.itertuples():
   ...:     print(it)

Pandas(Index=('a', 'c'), x=1, y=10)
Pandas(Index=('a', 'd'), x=2, y=20)
Pandas(Index=('b', 'c'), x=3, y=30)
Pandas(Index=('b', 'd'), x=4, y=40)

Problem description

When iterating through a DataFrame, the names of the Index are lost.

It would be really convenient if, when a MultiIndex is used, the names of the MultiIndex levels were included in the result of itertuples().

I propose adding a keyword argument to itertuples() called nameIndex. Its default value of False would retain the current behavior, and nameIndex=True would produce the output shown below.

Expected Output

Pandas(Index=Index(ab='a', cd='c'), x=1, y=10)
Pandas(Index=Index(ab='a', cd='d'), x=2, y=20)
Pandas(Index=Index(ab='b', cd='c'), x=3, y=30)
Pandas(Index=Index(ab='b', cd='d'), x=4, y=40)
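
For anyone who needs this shape today, here is a rough sketch of a workaround. The helper name itertuples_named_index is hypothetical and not part of pandas; it only approximates what nameIndex=True might do by wrapping each index value in a second namedtuple built from df.index.names.

```python
from collections import namedtuple

import pandas as pd


def itertuples_named_index(df, name="Pandas"):
    """Hypothetical helper (not a pandas API): like df.itertuples(),
    but the Index field is itself a namedtuple keyed by the index level names."""
    # rename=True swaps unusable field names (e.g. an unnamed level) for positional ones like _0.
    index_cls = namedtuple("Index", list(df.index.names), rename=True)
    row_cls = namedtuple(name, ["Index", *df.columns], rename=True)
    # name=None makes itertuples() yield plain tuples: (index_value, col1, col2, ...)
    for idx, *values in df.itertuples(index=True, name=None):
        if not isinstance(idx, tuple):  # single-level index: wrap the scalar
            idx = (idx,)
        yield row_cls(index_cls(*idx), *values)


df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10, 20, 30, 40]},
                  index=pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']],
                                                   names=['ab', 'cd']))

rows = list(itertuples_named_index(df))
for it in rows:
    print(it)
```

Each row then supports access such as it.Index.ab and it.x. This sketch assumes the level names are valid Python identifiers; names that are not get replaced by positional placeholders.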

Output of pd.show_versions()

INSTALLED VERSIONS

commit : b57d523
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.0rc0+62.gb57d523b3
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.11
pytest : 5.0.0
hypothesis : 4.23.6
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.6.1
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : 0.3.0
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : 0.11.1
pytables : None
s3fs : 0.2.1
scipy : 1.2.1
sqlalchemy : 1.3.5
tables : 3.5.2
xarray : 0.12.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8

@akukuq

akukuq commented Jan 27, 2020

I would like to add that even when using a normal index (i.e. not a MultiIndex) it is possible to assign that index a name (df.index.name = "some_name"). In such cases it would make sense to use the assigned name in the namedtuples yielded by itertuples.

Expected Output

>>> df = pd.DataFrame({"foo": [0,1,2]})
>>> df.index.name = "bar"
>>> next(df.itertuples())
Pandas(bar=0, foo=0) #Currently returns Pandas(Index=0, foo=0)
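
A possible workaround with the existing API, offered as a sketch rather than a replacement for the requested feature: move the named index into a regular column with reset_index() and then iterate without the index.

```python
import pandas as pd

df = pd.DataFrame({"foo": [0, 1, 2]})
df.index.name = "bar"

# reset_index() moves the named index into an ordinary column called "bar";
# index=False then drops the new default RangeIndex from each tuple.
row = next(df.reset_index().itertuples(index=False))
print(row.bar, row.foo)
```

The trade-off is that reset_index() builds a new DataFrame, and an unnamed index comes back as a column named index rather than under a name of your choosing.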

@konstantinmiller

I would expect the same to work with iterrows():

import pandas as pd

df = pd.DataFrame(
    index=pd.MultiIndex(
        names=['ind1'],
        levels=[['a']],
        codes=[[0]]
    ),
    data={'C': [42]})

ind, row = next(df.iterrows())
row.C
ind.ind1

Since ind is a plain tuple rather than a named tuple, ind.ind1 raises AttributeError: 'tuple' object has no attribute 'ind1'.
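
In the meantime, the index tuple can be wrapped by hand. The sketch below assumes the level names are valid Python identifiers; IndexTuple is a hypothetical name chosen for illustration, not something pandas provides.

```python
from collections import namedtuple

import pandas as pd

df = pd.DataFrame(
    index=pd.MultiIndex(names=['ind1'], levels=[['a']], codes=[[0]]),
    data={'C': [42]})

# Hypothetical wrapper type built from the index level names.
IndexTuple = namedtuple("IndexTuple", list(df.index.names))

for ind, row in df.iterrows():
    ind = IndexTuple(*ind)  # iterrows() yields a plain tuple for a MultiIndex
    print(ind.ind1, row.C)
```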
