Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
It is pretty common to have leading and trailing NaN values in a table or DataFrame. This is particularly true after joins and in time series data.
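import numpy as np
import pandas as pd

# Column 'b' has a leading NaN, an interior NaN and a trailing NaN.
# np.nan is used because the np.NaN alias was removed in NumPy 2.0.
df1 = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': [np.nan, 2, np.nan, 4, 5, np.nan],
})
df1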
Out[0]:
   a    b
0  1  NaN
1  2  2.0
2  3  NaN
3  4  4.0
4  5  5.0
5  6  NaN
See this Stack Overflow question for more examples (and common workarounds).
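For illustration, here is one workaround that works with today's pandas. It is a sketch and not necessarily the approach from the linked question. It slices between `first_valid_index()` and `last_valid_index()` of a single column, so it only handles one column at a time and assumes the column has at least one non-NaN value.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [np.nan, 2, np.nan, 4, 5, np.nan],
})

# Labels of the first and last non-NaN entries in column "b".
first = df1["b"].first_valid_index()
last = df1["b"].last_valid_index()

# .loc slicing is label-based and includes both endpoints,
# so interior NaN rows between first and last are kept.
trimmed = df1.loc[first:last]
print(trimmed)
```

If the column is entirely NaN, both calls return None and the slice keeps every row, which is one reason a dedicated method would be nicer.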
Feature Description
Potential solution:
df1.stripna()
Out[2]:
   a    b
1  2  2.0
2  3  NaN
3  4  4.0
4  5  5.0
Potential kwargs to pass could be:
- how : {‘any’, ‘all’}, default ‘any’
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
- subset : column label or sequence of labels, optional
- limit_direction : {‘forward’, ‘backward’, ‘both’}, optional
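To make the proposed semantics concrete, here is a rough sketch of how `stripna` could behave as a standalone function. The function is hypothetical: pandas has no such method, and the meaning of `how`, `axis` and `subset` below is assumed to mirror `dropna`. `limit_direction` is left out because its exact meaning is not pinned down above.

```python
import numpy as np
import pandas as pd


def stripna(df, how="any", axis=0, subset=None):
    """Hypothetical helper: strip leading/trailing NaN rows (or columns)."""
    if how not in ("any", "all"):
        raise ValueError(f"invalid how: {how!r}")
    if axis in (0, "index"):
        axis_num = 0
    elif axis in (1, "columns"):
        axis_num = 1
    else:
        raise ValueError(f"invalid axis: {axis!r}")

    na = df.isna()
    if subset is not None:
        labels = [subset] if pd.api.types.is_scalar(subset) else list(subset)
        # Assumption: as in dropna, subset names labels on the *other* axis.
        na = na.loc[:, labels] if axis_num == 0 else na.loc[labels, :]

    # Reduce across the other axis: one flag per row (axis=0) or column (axis=1).
    reduce_axis = 1 - axis_num
    flagged = na.any(axis=reduce_axis) if how == "any" else na.all(axis=reduce_axis)

    # Positions that are not flagged; keep everything between first and last.
    keep_pos = np.flatnonzero(~flagged.to_numpy())
    if keep_pos.size == 0:
        start, stop = 0, 0  # everything is flagged, so nothing survives
    else:
        start, stop = keep_pos[0], keep_pos[-1] + 1
    return df.iloc[start:stop] if axis_num == 0 else df.iloc[:, start:stop]


df1 = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [np.nan, 2, np.nan, 4, 5, np.nan],
})
result = stripna(df1)
print(result)
```

With the defaults this keeps rows 1 to 4 of `df1`, including the interior NaN in row 2, which matches the output proposed above. The sketch works on positions rather than labels, so it does not depend on the index being sorted or unique.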
Alternative Solutions
Another solution would be to add limit_area (as in ffill, bfill and interpolate) to dropna. From the point of view of extending the API, this is probably more intuitive for those with wider pandas knowledge of ffill, bfill and interpolate. However, I would imagine the source code behind dropna is written in an element-wise manner, so there might be a lot of work to extend it. That is just an uneducated guess, though.
For those coming from a more pure Python world, stripna is pretty intuitive as more people are aware of str.strip.
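To show what the "inside" and "outside" regions mean in the existing API, the snippet below uses `interpolate(limit_area="inside")`, which is a real parameter of `Series.interpolate`. Using the NaNs it leaves behind as a mask for stripping is only an illustrative trick, not a suggestion for how `dropna` would be implemented, and it assumes a numeric column.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2.0, np.nan, 4.0, 5.0, np.nan])

# limit_area="inside" only fills NaNs surrounded by valid values,
# so the leading and trailing NaNs are left untouched.
inside = s.interpolate(limit_area="inside")

# Whatever is still NaN lies in the "outside" region (leading/trailing).
outside_mask = inside.isna()

# Dropping only the outside positions from the original series
# keeps the interior NaN at position 2.
stripped = s[~outside_mask]
print(stripped)
```

A `limit_area="outside"` option on `dropna` would presumably drop exactly the positions flagged by `outside_mask` here.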
Additional Context
No response