BUG: Inconsistent conversion of missing column names #44818

johnzangwill · 2021-12-08T13:27:16Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
f = pd.DataFrame([1], index=pd.MultiIndex.from_arrays([[2],[3],[4]], names=[None, "a", None]))

>>> f.reset_index()
   level_0  a  level_2  0
0        2  3        4  1

>>> pd.DataFrame(f.to_records())
   level_0  a  level_1  0
0        2  3        4  1

>>> f.index.to_frame(index=False)
   0  a  2
0  2  3  4

Issue Description

DataFrame.to_records() handles missing index level names by counting the occurrences of missing names and prefixing that count with "level_".
Index.to_frame() and MultiIndex.to_frame() use the un-prefixed level position.
The rest of pandas uses the level position prefixed with "level_".
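
To make the difference concrete, here is a minimal pure-Python sketch of the three labelling schemes. The helper names are hypothetical and this is not the pandas implementation; the functions only reproduce the column labels seen in the three outputs of the reproducible example.

```python
# Hypothetical helpers, not pandas internals: they only reproduce the
# three labellings observed in the reproducible example.

def fill_by_position(names):
    """Missing name -> 'level_<position of the level>' (as reset_index shows)."""
    return [f"level_{i}" if n is None else n for i, n in enumerate(names)]


def fill_by_count(names):
    """Missing name -> 'level_<running count of missing names>' (as to_records shows)."""
    out = []
    missing = 0
    for n in names:
        if n is None:
            out.append(f"level_{missing}")
            missing += 1
        else:
            out.append(n)
    return out


def fill_unprefixed(names):
    """Missing name -> bare integer position (as Index.to_frame shows)."""
    return [i if n is None else n for i, n in enumerate(names)]


names = [None, "a", None]
print(fill_by_position(names))  # ['level_0', 'a', 'level_2']
print(fill_by_count(names))     # ['level_0', 'a', 'level_1']
print(fill_unprefixed(names))   # [0, 'a', 2]
```

The first two schemes only diverge when a named level sits between two unnamed ones, which is why the example uses names=[None, "a", None].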

Expected Behavior

>>> pd.DataFrame(f.to_records())
   level_0  a  level_2  0
0        2  3        4  1

>>> f.index.to_frame(index=False)
   level_0  a  level_2
0        2  3        4

Installed Versions

INSTALLED VERSIONS

commit : 04b538a
python : 3.8.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252

pandas : 1.4.0.dev0+1332.g04b538a553
numpy : 1.21.4
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 59.2.0
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.25.0
sphinx : 4.3.0
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.6.4
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.29.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.11.0
fastparquet : 0.7.1
gcsfs : 2021.11.0
matplotlib : 3.5.0
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.1
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.7.2
sqlalchemy : 1.4.27
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1

phofl · 2021-12-10T10:23:18Z

Yep, to_records is off.

Not 100% sure what I would expect in the DataFrame case, but your example looks reasonable

mattclin · 2021-12-13T04:09:27Z

@johnzangwill I'm looking to start contributing. Do you mind if I work on this?

johnzangwill · 2021-12-13T15:06:45Z

@mattclin actually, I would prefer that you found something else, if possible. I have already done most of it and would like to try to get it into 1.4, due end December. There could be some problems, since we are changing the behavior of a much-used method...

johnzangwill · 2021-12-13T19:49:06Z

@phofl Thanks, yes, to_records() is clearly wrong.
index.to_frame() is problematic. There are at least three ways of dealing with unspecified columns:

1. Single index -> "index"
2. Multiple index -> "level_0", "level_1", ...
3. Multiple index -> 0, 1, ...

Here is an example that shows all of them:

>>> pd.DataFrame([[0]]).reset_index().reset_index()
   level_0  index  0
0        0      0  0

DataFrame(tuples) uses option 3, so I suspect that index.to_frame() should be left alone.
But it is certainly worthy of discussion...
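
For readers puzzling over why the double reset_index() above yields both "index" and "level_0", here is a rough sketch of the fallback rule that the output is consistent with. The function name is hypothetical and the rule is an assumption inferred from the example, not a copy of the pandas source: a single unnamed index becomes "index" unless that label is already taken, in which case it becomes "level_0"; unnamed levels of a multi-level index become "level_<position>".

```python
# Hypothetical sketch of the default labels used when an index is moved
# into the columns. Assumption: inferred from the output above, not taken
# from the pandas source.

def default_index_labels(index_names, existing_columns):
    if len(index_names) == 1:
        name = index_names[0]
        if name is None:
            # Fall back to "level_0" when "index" is already a column.
            name = "index" if "index" not in existing_columns else "level_0"
        return [name]
    # Multi-level index: label unnamed levels by their position.
    return [f"level_{i}" if n is None else n for i, n in enumerate(index_names)]


columns = [0]                                             # DataFrame([[0]])
columns = default_index_labels([None], columns) + columns
print(columns)  # ['index', 0]
columns = default_index_labels([None], columns) + columns
print(columns)  # ['level_0', 'index', 0]
```

The bare integer 0 at the end is the original column label, which is option 3 applied by the DataFrame constructor.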

johnzangwill · 2021-12-14T11:59:19Z

take

johnzangwill added the Bug and Needs Triage labels Dec 8, 2021

github-actions bot assigned johnzangwill Dec 14, 2021

johnzangwill mentioned this issue Dec 14, 2021

BUG: Inconsistent conversion of missing column names #44878

Merged

4 tasks

jreback added this to the 1.4 milestone Dec 15, 2021

jreback added the Reshaping label and removed the Needs Triage label Dec 15, 2021

jreback closed this as completed in #44878 Dec 16, 2021
