DataFrame.reindex with specified fill method fails for MultiIndex #23693
Labels: Indexing, MultiIndex, Needs Tests
Comments
Interesting: when the target index has an additional level and level 0 is unchanged, things work:

```python
import pandas as pd

i = pd.MultiIndex.from_tuples([('a',), ('d',)])
df = pd.DataFrame([[0, 7], [3, 4]], index=i, columns=['x', 'y'])
i2 = pd.MultiIndex.from_tuples([('a', 'b'), ('a', 't'), ('d', 'e'), ('d', 'f')])
df.reindex(i2, axis=0, method='ffill')
```

When level 0 is changed again, wrong NA rows are introduced:

```python
i = pd.MultiIndex.from_tuples([('a', 'b'), ('d', 'e')])
df = pd.DataFrame([[0, 7], [3, 4]], index=i, columns=['x', 'y'])
i2 = pd.MultiIndex.from_tuples([('a', 'b'), ('a', 't'), ('d', 'e'), ('d', 'f')])
df.reindex(i2, axis=0, method='ffill')
```
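To make the expected result concrete, here is a minimal pure-Python sketch of what forward filling could mean for a MultiIndex, assuming `ffill` should pick, for each target key, the last source key that is less than or equal to it in lexicographic tuple order. The helper `lexicographic_ffill` is hypothetical and not part of pandas; it only models that assumed semantics on the second example above.

```python
from bisect import bisect_right


def lexicographic_ffill(source_keys, source_values, target_keys):
    """Hypothetical helper: for each target key, return the value of the
    last source key <= target key (Python's native tuple ordering).

    Target keys that sort before every source key get None (i.e. NA).
    """
    # Sort the source keys once so bisect can binary-search them.
    order = sorted(range(len(source_keys)), key=lambda k: source_keys[k])
    sorted_keys = [source_keys[k] for k in order]
    sorted_values = [source_values[k] for k in order]

    result = []
    for key in target_keys:
        # Index of the rightmost source key that is <= key (-1 if none).
        pos = bisect_right(sorted_keys, key) - 1
        result.append(sorted_values[pos] if pos >= 0 else None)
    return result


# Same data as the second example above.
source = [('a', 'b'), ('d', 'e')]
values = [[0, 7], [3, 4]]
target = [('a', 'b'), ('a', 't'), ('d', 'e'), ('d', 'f')]
print(lexicographic_ffill(source, values, target))
```

Under this assumption, `('a', 't')` takes the row of `('a', 'b')` and `('d', 'f')` takes the row of `('d', 'e')`, so no NA rows would appear at all.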
cc @toobaz
Here is another example of an almost-identical issue I stumbled onto:

```python
from io import StringIO

import pandas as pd

r_dfshort = StringIO('''
a,timestamp,foo
A,2018-12-12,12
A,2019-01-02,2.1
A,2019-01-04,4.1
B,2019-01-02,2.2
B,2019-01-04,4.2
''')
r_dflong = StringIO('''
a,timestamp,bar
A,2019-01-01,1
A,2019-01-02,2
A,2019-01-03,3
A,2019-01-04,4
B,2019-01-01,1
B,2019-01-02,2
B,2019-01-03,3
B,2019-01-04,4
''')
d_dfshort = pd.read_csv(r_dfshort).set_index(['a', 'timestamp'])
d_dflong = pd.read_csv(r_dflong).set_index(['a', 'timestamp'])
a_dfshort = d_dfshort.reindex(index=d_dflong.index, method='ffill')
print(a_dfshort)
```

Outputs:

```
               foo
a timestamp
A 2019-01-01   nan   # right (although I'd prefer it to be 12)
  2019-01-02  2.10
  2019-01-03   nan   # wrong
  2019-01-04  4.10
B 2019-01-01  4.10   # right
  2019-01-02  2.20
  2019-01-03  4.10   # wrong
  2019-01-04  4.20
```

Note that a workaround, `align`, seems to work well:

```python
a_dfshort, _ = d_dfshort.align(d_dflong, axis='rows', method='ffill', join='right')
print(a_dfshort)
```
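Another possible workaround avoids `reindex(..., method='ffill')` on the MultiIndex altogether. It is sketched here under the assumption that filling across level-0 groups (as in `B 2019-01-01` above) is acceptable. The idea is to reindex onto the sorted union of both indexes, fill positionally with `ffill()`, then select the target rows by exact match. With this approach `A 2019-01-01` picks up the `12` from `2018-12-12`. This is not a recipe documented by pandas, and the names `combined` and `filled` are only illustrative.

```python
from io import StringIO

import pandas as pd

short_csv = """a,timestamp,foo
A,2018-12-12,12
A,2019-01-02,2.1
A,2019-01-04,4.1
B,2019-01-02,2.2
B,2019-01-04,4.2
"""
long_csv = """a,timestamp,bar
A,2019-01-01,1
A,2019-01-02,2
A,2019-01-03,3
A,2019-01-04,4
B,2019-01-01,1
B,2019-01-02,2
B,2019-01-03,3
B,2019-01-04,4
"""
d_dfshort = pd.read_csv(StringIO(short_csv)).set_index(['a', 'timestamp'])
d_dflong = pd.read_csv(StringIO(long_csv)).set_index(['a', 'timestamp'])

# Put source and target keys on one lexicographically sorted index, so a
# plain positional forward fill follows (a, timestamp) order.
combined = d_dfshort.index.union(d_dflong.index)
filled = d_dfshort.reindex(combined).sort_index().ffill()

# Exact-match lookup of the target keys; no fill method is needed here.
a_dfshort = filled.reindex(d_dflong.index)
print(a_dfshort)
```

If the fill should instead stay within each level-0 group (so `B 2019-01-01` remains NA), replacing `.ffill()` with `.groupby(level='a').ffill()` is one option.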
Code entry-point (and interesting commit to bisect): #9019
Code Sample, a copy-pastable example if possible
Problem description
The reindexing operation above introduces a row to the `MultiIndex`. When no fill method is specified, the new row is added and filled with NA, as expected. When `ffill` is specified, I cannot explain the behavior: the index is updated as expected, but an NA row is added in the middle of the existing data.

Expected Output
Not sure if `ffill` for MultiIndexes is designed like this, but I was hoping for

Output of `pd.show_versions()`
INSTALLED VERSIONS
commit: None
pandas: 0.23.4
pytest: 3.7.1
pip: 18.1
setuptools: 39.0.1
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: 0.11.1
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: 1.6.1
bottleneck: 1.2.1
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.5
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.10
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None