Replace with bfill, ffill or pad on DataFrame #19632

Closed
reidy-p opened this issue Feb 10, 2018 · 1 comment · Fixed by #19894
Labels: Enhancement, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

reidy-p commented Feb 10, 2018

Code Sample, a copy-pastable example if possible

With a Series, replace with method set to 'bfill', 'ffill' or 'pad' and value=None works as expected:

In [1]: s = pd.Series([0, 1, 2, 3, 4])
In [2]: s.replace([1, 2], method='bfill')
Out[2]: 
0    0
1    3
2    3
3    3
4    4
dtype: int64
In [3]: s.replace(1, method='bfill')
Out[3]: 
0    0
1    2
2    2
3    3
4    4
dtype: int64

But for a DataFrame, passing either a list or a scalar as to_replace raises a TypeError:

In [4]: df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [0, 1, 2, 3, 4],
                   'C': ['a', 'b', 'c', 'd', 'e']})
In [5]: df.replace(to_replace=[1, 2], value=None, method='bfill')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-b03353e82971> in <module>()
----> 1 df.replace(2, method='bfill')

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in replace(self, to_replace, value, inplace, limit, regex, method, axis)
   4492             if isinstance(to_replace, (tuple, list)):
   4493                 return _single_replace(self, to_replace, method, inplace,
-> 4494                                        limit)
   4495 
   4496             if not is_dict_like(to_replace):

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in _single_replace(self, to_replace, method, inplace, limit)
     74     if self.ndim != 1:
     75         raise TypeError('cannot replace {0} with method {1} on a {2}'
---> 76                         .format(to_replace, method, type(self).__name__))
     77 
     78     orig_dtype = self.dtype
TypeError: cannot replace [[1, 2]] with method bfill on a DataFrame
In [6]: df.replace(to_replace=2, value=None, method='bfill')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-b03353e82971> in <module>()
----> 1 df.replace(2, method='bfill')

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in replace(self, to_replace, value, inplace, limit, regex, method, axis)
   4492             if isinstance(to_replace, (tuple, list)):
   4493                 return _single_replace(self, to_replace, method, inplace,
-> 4494                                        limit)
   4495 
   4496             if not is_dict_like(to_replace):

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in _single_replace(self, to_replace, method, inplace, limit)
     74     if self.ndim != 1:
     75         raise TypeError('cannot replace {0} with method {1} on a {2}'
---> 76                         .format(to_replace, method, type(self).__name__))
     77 
     78     orig_dtype = self.dtype

TypeError: cannot replace [2] with method bfill on a DataFrame

Should the DataFrame case work the same way as the Series case, or is this simply user error on my part?

Expected Output

DataFrame.replace(method=) should work in a similar way to Series.replace(method=).

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-32-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_IE.UTF-8
LOCALE: en_IE.UTF-8

pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.14.0
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

jreback commented Feb 10, 2018

This is essentially not implemented for DataFrame. It needs to be a per-column operation.
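
As a rough illustration of what "per-column" could mean here, the following is a minimal sketch of a user-side workaround, not the fix that was eventually merged. The helper name `replace_with_fill` and its nested `fill_column` are hypothetical. It uses only `Series.isin`, `Series.mask`, `Series.bfill`/`ffill` and `Series.where` rather than the `method=` keyword, and it assumes unique column names. One assumed difference from the built-in Series behaviour: a matched value is filled from the nearest value that is neither matched nor already missing.

```python
import pandas as pd


def replace_with_fill(df, to_replace, method="bfill"):
    """Hypothetical helper: replace matching values column by column,
    filling each match from its neighbour in the given direction."""
    if method not in ("bfill", "ffill", "pad"):
        raise ValueError("method must be 'bfill', 'ffill' or 'pad'")
    # Accept a scalar or a list/tuple, mirroring the calls in the report.
    values = list(to_replace) if isinstance(to_replace, (list, tuple)) else [to_replace]

    def fill_column(col):
        hits = col.isin(values)
        if not hits.any():
            return col  # nothing to replace, leave the column untouched
        masked = col.mask(hits)  # matched positions become missing
        filled = masked.bfill() if method == "bfill" else masked.ffill()
        out = col.where(~hits, filled)  # only matched positions change
        # Restore the original dtype when no missing values remain.
        if not out.isna().any():
            out = out.astype(col.dtype)
        return out

    # Assumption: column names are unique.
    return pd.DataFrame({name: fill_column(df[name]) for name in df.columns},
                        index=df.index)


df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [0, 1, 2, 3, 4],
                   'C': ['a', 'b', 'c', 'd', 'e']})
result = replace_with_fill(df, [1, 2], method='bfill')
print(result)
```

With the frame from the report, columns A and B come out as 0, 3, 3, 3, 4 (matching the Series result above) and column C is unchanged.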

jreback added the Enhancement, Missing-data, and Difficulty Intermediate labels Feb 10, 2018
jreback added this to the Next Major Release milestone Feb 10, 2018
jreback modified the milestones: Next Major Release, 0.23.0 Mar 1, 2018