ENH: "stripna" for dropping leading or trailing NaNs #60162

Open
@joshdunnlime

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Leading and trailing NaN values are a common occurrence in a table or DataFrame, particularly after joins and in time series data.

import numpy as np
import pandas as pd

# np.nan is the portable spelling; the np.NaN alias was removed in NumPy 2.0
df1 = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': [np.nan, 2, np.nan, 4, 5, np.nan],
})
df1

Out[0]:
    a   b
0   1   NaN
1   2   2.0
2   3   NaN
3   4   4.0
4   5   5.0
5   6   NaN

See this Stack Overflow question for more examples (and common workarounds).
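For illustration, one workaround of this kind can be sketched with existing pandas methods. This is a minimal sketch, not necessarily the approach from the linked question: it trims on a single column using `first_valid_index` and `last_valid_index`, and the variable names are only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [np.nan, 2, np.nan, 4, 5, np.nan],
})

# Labels of the first and last non-NaN entries in column "b".
first = df["b"].first_valid_index()
last = df["b"].last_valid_index()

# .loc slicing is label-based and includes both endpoints,
# so the interior NaN at label 2 is kept.
trimmed = df.loc[first:last]
print(trimmed)
```

Both methods return None when the column has no valid values at all, so a general-purpose version would need to handle that case explicitly.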

Feature Description

Potential solution:

df1.stripna()

Out[2]:
    a   b
1   2   2.0
2   3   NaN
3   4   4.0
4   5   5.0

Potential kwargs to pass could be:

  • how : {‘any’, ‘all’}, default ‘any’
  • axis : {0 or ‘index’, 1 or ‘columns’}, default 0
  • subset : column label or sequence of labels, optional
  • limit_direction : {‘forward’, ‘backward’, ‘both’}, optional
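To make the proposed behaviour concrete, here is a rough sketch of how it could be emulated with current pandas and NumPy. The function `stripna` below is hypothetical (it is not a pandas API), it only covers `axis=0`, and its reading of `how` and `subset` is an assumption modelled on `dropna`; `limit_direction` is left out.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def stripna(df, how="any", subset=None):
    """Hypothetical helper: drop leading and trailing rows that contain NaN."""
    if subset is None:
        cols = df
    else:
        labels = list(subset) if is_list_like(subset) else [subset]
        cols = df[labels]

    if how == "any":
        # A row stops the stripping only if it has no NaN at all.
        valid = cols.notna().all(axis=1).to_numpy()
    elif how == "all":
        # A row stops the stripping if it has at least one non-NaN value.
        valid = cols.notna().any(axis=1).to_numpy()
    else:
        raise ValueError(f"invalid how option: {how}")

    # True from the first valid row onwards, and up to the last valid row.
    from_first = np.maximum.accumulate(valid)
    to_last = np.maximum.accumulate(valid[::-1])[::-1]
    return df.loc[from_first & to_last]


df1 = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [np.nan, 2, np.nan, 4, 5, np.nan],
})
stripped = stripna(df1)
print(stripped)
```

With the default `how="any"`, this keeps the rows labelled 1 to 4, including the interior NaN at label 2. With `how="all"` nothing is dropped here, because column `a` has no missing values.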

Alternative Solutions

Another solution would be to add limit_area (as in ffill, bfill and interpolate) to dropna. From the point of view of extending the API, this is probably more intuitive for those already familiar with ffill, bfill and interpolate. However, I would imagine the source code behind dropna is written in an element-wise manner, so extending it might be a lot of work. Just an uneducated guess, though.
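For readers less familiar with that parameter, a small example of how `limit_area` already behaves in `Series.interpolate` may help. This only shows the existing fill behaviour; any equivalent option on dropna is hypothetical.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2, np.nan, 4, 5, np.nan])

# limit_area="inside" restricts filling to NaNs surrounded by valid values,
# so the leading and trailing NaNs are left untouched.
filled = s.interpolate(limit_area="inside")
print(filled)
```

A dropna counterpart would presumably act on the opposite region (`"outside"`), dropping only the NaNs that are not enclosed by valid values.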

For those coming from a more pure-Python background, stripna is pretty intuitive, since many are already familiar with str.strip.

Additional Context

No response

Metadata

Labels

Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)
