BUG: Inconsistent conversion of missing column names #44818

Closed
3 tasks done
johnzangwill opened this issue Dec 8, 2021 · 5 comments · Fixed by #44878
Assignees
Labels
Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Milestone

Comments

@johnzangwill
Contributor

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
f = pd.DataFrame([1], index=pd.MultiIndex.from_arrays([[2],[3],[4]], names=[None, "a", None]))

>>> f.reset_index()
   level_0  a  level_2  0
0        2  3        4  1

>>> pd.DataFrame(f.to_records())
   level_0  a  level_1  0
0        2  3        4  1

>>> f.index.to_frame(index=False)
   0  a  2
0  2  3  4

Issue Description

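DataFrame.to_records() handles unnamed index levels by counting the unnamed levels seen so far and prefixing that count with "level_", so the third level in the example becomes "level_1".
Index.to_frame() and MultiIndex.to_frame() use the level position with no prefix (0 and 2 in the example).
The rest of pandas uses the level position prefixed with "level_" ("level_0" and "level_2" in the example).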
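
The three labelling schemes in this report can be modelled in plain Python. This is only a sketch for comparison, not pandas' implementation, and the helper names (label_by_position, label_by_occurrence, label_unprefixed) are hypothetical:

```python
# Hypothetical helpers: a plain-Python model of the three schemes, not pandas code.

def label_by_position(names):
    """Prefix the level position, as reset_index() does in the example."""
    return [f"level_{i}" if name is None else name for i, name in enumerate(names)]


def label_by_occurrence(names):
    """Prefix a running count of unnamed levels, as to_records() does in the example."""
    labels = []
    unnamed_seen = 0
    for name in names:
        if name is None:
            labels.append(f"level_{unnamed_seen}")
            unnamed_seen += 1
        else:
            labels.append(name)
    return labels


def label_unprefixed(names):
    """Use the bare level position, as Index.to_frame() does in the example."""
    return [i if name is None else name for i, name in enumerate(names)]


names = [None, "a", None]
print(label_by_position(names))    # ['level_0', 'a', 'level_2']
print(label_by_occurrence(names))  # ['level_0', 'a', 'level_1']
print(label_unprefixed(names))     # [0, 'a', 2]
```

The schemes only diverge once a named level sits between two unnamed ones, which is why the example uses names=[None, "a", None].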

Expected Behavior

>>> pd.DataFrame(f.to_records())
   level_0  a  level_2  0
0        2  3        4  1

>>> f.index.to_frame(index=False)
   level_0  a  level_2
0        2  3        4

Installed Versions

INSTALLED VERSIONS

commit : 04b538a
python : 3.8.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252

pandas : 1.4.0.dev0+1332.g04b538a553
numpy : 1.21.4
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 59.2.0
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.25.0
sphinx : 4.3.0
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.6.4
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.29.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.11.0
fastparquet : 0.7.1
gcsfs : 2021.11.0
matplotlib : 3.5.0
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.1
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.7.2
sqlalchemy : 1.4.27
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1

johnzangwill added the Bug and Needs Triage labels on Dec 8, 2021
@phofl
Member

phofl commented Dec 10, 2021

Yep, to_records is off.

Not 100% sure what I would expect in the DataFrame case, but your example looks reasonable.

@mattclin

@johnzangwill I'm looking to start contributing. Do you mind if I work on this?

@johnzangwill
Contributor Author

@mattclin Actually, I would prefer that you find something else, if possible. I have already done most of it and would like to try to get it into 1.4, which is due at the end of December. There could be some problems, since we are changing the behavior of a much-used method...

@johnzangwill
Contributor Author

johnzangwill commented Dec 13, 2021

@phofl Thanks, yes, to_records() is clearly wrong.
index.to_frame() is problematic. There are at least three ways of dealing with unspecified columns:

  1. Single index -> "index"
  2. Multiple index -> "level_0", "level_1", ...
  3. Multiple index -> 0, 1, ...

Here is an example that shows all of them:

>>> pd.DataFrame([[0]]).reset_index().reset_index()
   level_0  index  0
0        0      0  0

DataFrame(tuples) uses option 3, so I suspect that index.to_frame() should be left alone.
But it is certainly worthy of discussion...
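
For reference, here is a rough plain-Python model of how the double reset_index() example ends up with "level_0", "index" and 0 side by side. It assumes that a single unnamed index is labelled "index" unless that label is already a column, in which case it falls back to "level_0". The function names are hypothetical, and this is a simplification rather than the pandas source:

```python
# Hypothetical, simplified model of reset_index() labelling; not pandas code.

def default_index_labels(index_names, existing_columns):
    """Labels given to the index levels when they are moved into columns."""
    if len(index_names) == 1:
        name = index_names[0]
        if name is None:
            # Assumption: fall back to "level_0" when "index" is already taken.
            name = "level_0" if "index" in existing_columns else "index"
        return [name]
    # Several levels: unnamed ones get "level_<position>".
    return [f"level_{i}" if n is None else n for i, n in enumerate(index_names)]


def reset_index_columns(index_names, columns):
    """Column labels after the index levels are prepended to the columns."""
    return default_index_labels(index_names, columns) + list(columns)


first = reset_index_columns([None], [0])
second = reset_index_columns([None], first)
print(first)   # ['index', 0]
print(second)  # ['level_0', 'index', 0]
```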

@johnzangwill
Contributor Author

take

jreback added this to the 1.4 milestone on Dec 15, 2021
jreback added the Reshaping label and removed the Needs Triage label on Dec 15, 2021
4 participants