DataFrameGroupBy.ffill with Duplicate Column Labels ValueError #25610

datatravelgit · 2019-03-08T21:37:20Z

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np


df1 = pd.DataFrame({'field1': [1, 2, 3, 4],
                   'field2': [1, 2, 3, 4],
                   'field3': [1, 2, 3, 4]
                   })
df2 = pd.DataFrame({'field1': [1, 2, np.nan, 4],
                   })

same_col = pd.concat([df1, df2], axis=1)

print(same_col)
#    field1  field2  field3  field1
# 0       1       1       1     1.0
# 1       2       2       2     2.0
# 2       3       3       3     NaN
# 3       4       4       4     4.0


print(same_col.ffill())
#    field1  field2  field3  field1
# 0       1       1       1     1.0
# 1       2       2       2     2.0
# 2       3       3       3     2.0
# 3       4       4       4     4.0

for k, v in same_col.groupby(by=['field2']):
    print(v.ffill())
    #    field1  field2  field3  field1
    # 0       1       1       1     1.0
    #    field1  field2  field3  field1
    # 1       2       2       2     2.0
    #    field1  field2  field3  field1
    # 2       3       3       3     NaN
    #    field1  field2  field3  field1
    # 3       4       4       4     4.0

same_col.groupby(by=['field2']).ffill()
# ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Problem description

Calling DataFrameGroupBy.ffill on a DataFrame with two or more columns that share the same name raises an error:
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
Pandas 0.22.0 did not have this bug, so it seems to have been introduced recently. If this is instead the expected behaviour, it should be consistent with the behaviour of DataFrame.ffill on a DataFrame with two or more columns of the same name, and it should raise a meaningful error.

Expected Output

#    field1  field2  field3  field1
# 0       1       1       1     1.0
# 1       2       2       2     2.0
# 2       3       3       3     NaN
# 3       4       4       4     4.0
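
Until a fix lands, one possible way to get this output is to forward-fill each column separately, selecting columns by position so that duplicate labels never come into play. This is only a sketch: `groupby_ffill_by_position` and its `key_pos` parameter are hypothetical names, not part of pandas, and it has not been checked against 0.24.1 specifically.

```python
import numpy as np
import pandas as pd


def groupby_ffill_by_position(df, key_pos):
    """Forward-fill every column within groups, addressing columns by position.

    Hypothetical helper for illustration only; not a pandas API.
    """
    # Grouping keys taken by position, so a duplicated label cannot be ambiguous.
    keys = df.iloc[:, key_pos]
    # Each df.iloc[:, i] is a 1-D Series, even when its label is duplicated.
    filled = [df.iloc[:, i].groupby(keys).ffill() for i in range(df.shape[1])]
    out = pd.concat(filled, axis=1)
    # Restore the original (possibly duplicated) column labels.
    out.columns = df.columns
    return out


df1 = pd.DataFrame({'field1': [1, 2, 3, 4],
                    'field2': [1, 2, 3, 4],
                    'field3': [1, 2, 3, 4]})
df2 = pd.DataFrame({'field1': [1, 2, np.nan, 4]})
same_col = pd.concat([df1, df2], axis=1)

result = groupby_ffill_by_position(same_col, key_pos=1)
print(result)
```

Because every group in this example holds a single row, there is nothing to fill from and the NaN in the second field1 column stays in place.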

Output of `pd.show_versions()`

pd.show_versions()
INSTALLED VERSIONS

commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_GB.UTF-8
pandas: 0.24.1
pytest: None
pip: 10.0.1
setuptools: 39.1.0
Cython: None
numpy: 1.16.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


WillAyd · 2019-03-09T19:03:23Z

OK thanks. This looks like it is a result of having duplicate column labels.

Investigation and PRs would certainly be welcome

Itay4 · 2019-03-10T20:52:58Z

@WillAyd How would you expect this to be handled? By not allowing duplicate column labels?

WillAyd · 2019-03-10T20:55:27Z

@Itay4 I think it is fine to allow them. There is probably something in the code base that can be switched to accessing columns by position instead of by label.
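
To illustrate the difference between the two access styles (this shows general DataFrame behaviour and makes no claim about where the groupby code actually goes wrong): selecting a duplicated label returns a 2-D DataFrame, while selecting by position always returns a 1-D Series. The small frame below is made up for the example.

```python
import numpy as np
import pandas as pd

# Two columns sharing the label 'field1' (illustrative data only).
df = pd.DataFrame([[1, 1.0], [2, np.nan]], columns=['field1', 'field1'])

by_label = df['field1']      # both matching columns come back together
by_position = df.iloc[:, 1]  # exactly one column

print(type(by_label).__name__, by_label.ndim)
print(type(by_position).__name__, by_position.ndim)
```

Code that expects a 1-D buffer for a single column would receive 2-D data from the label lookup, which fits the shape of the error message reported above.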

WillAyd added the Bug, Groupby, and Regression (functionality that used to work in a prior pandas version) labels Mar 9, 2019

WillAyd added this to the Contributions Welcome milestone Mar 9, 2019

WillAyd changed the title ~~DataFrameGroupBy.ffill - ValueError: Buffer has wrong number of dimensions (expected 1, got 2)~~ DataFrameGroupBy.ffill with Duplicate Column Labels ValueError Mar 9, 2019

WillAyd mentioned this issue Oct 7, 2019

Remove blocks from GroupBy Code #28782

Closed

phofl mentioned this issue Sep 13, 2020

[TST]: Groupy raised ValueError for ffill with duplicate column names #36326

Merged


phofl added the Needs Tests (unit test(s) needed to prevent regressions) label and removed the Bug and Regression (functionality that used to work in a prior pandas version) labels Sep 13, 2020

jreback added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) label Sep 13, 2020

jreback modified the milestones: Contributions Welcome, 1.2 Sep 13, 2020

jreback closed this as completed in #36326 Sep 13, 2020
