
Idea: Frame/series should support grep-like context specification #3269


Closed
ghost opened this issue Apr 7, 2013 · 4 comments
Labels
Ideas Long-Term Enhancement Discussions

Comments

@ghost

ghost commented Apr 7, 2013

In shell grep you can do stuff like:

λ seq 1000 | grep -C 3 500
497
498
499
**500**
501
502
503
λ seq 1000 | grep -B 1 -A 3 500
499
**500**
501
502
503

I'd be glad to have something like

s.grep_index(pat_or_pred, A=1, B=3)  # -> list of series/df, each of len A+B+1 (or 2*C+1)
s.grep_index(pat_or_pred, C=5)
# and also
s.grep_index(label_list, C=5)

in core.

An example use case: looking for change points in a timeseries, where you want to
grab a window around each point of change:

s = some_timeseries
# pick out the index labels where a big change occurred
cps = s[s.diff().abs() > threshold].index
# cps is a list of labels for the "interest points"
list_of_s = s.grep_index(cps, C=5)

list_of_s[0]
< 5 rows from s, with the middle one being the first "interest point" in cps >
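
A minimal sketch of how such a helper could be written today on top of positional indexing (the name grep_index_windows and its exact signature are hypothetical, mirroring the proposal above rather than any existing pandas API; windows at the ends of the series are simply clipped rather than padded):

import numpy as np
import pandas as pd

def grep_index_windows(s, matches, A=0, B=0, C=None):
    # ``matches`` is either a boolean mask aligned with ``s`` or a list of
    # index labels; ``B`` rows before and ``A`` rows after each match are
    # included (grep's -B/-A), and ``C`` overrides both (grep's -C).
    if C is not None:
        A = B = C
    if isinstance(matches, (pd.Series, np.ndarray)):
        positions = np.flatnonzero(np.asarray(matches))      # boolean mask
    else:
        positions = s.index.get_indexer(pd.Index(matches))   # list of labels
    windows = []
    for pos in positions:
        lo = max(pos - B, 0)
        hi = min(pos + A + 1, len(s))
        windows.append(s.iloc[lo:hi])
    return windows

# Roughly the use case above, with made-up data:
s = pd.Series(np.r_[np.zeros(50), np.ones(50)])   # step change at position 50
cps = s[s.diff().abs() > 0.5].index               # labels of the "interest points"
list_of_s = grep_index_windows(s, cps, C=2)
list_of_s[0]                                      # 5 rows centred on the change point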

the "return list of series" is really a sort of variant on groupby,
and could concievably be implemented on top of something like
#3101, after de-warting.

Would also love to have this working with #2460 (if it ever makes it in)
for the same (but slow) on plain,unindexed data columns.

@ghost ghost mentioned this issue Apr 8, 2013
@petehuang
Contributor

Hi,

This issue's last interaction was in 2013. I noticed you had paused work on the PR; are you aware of any other efforts to pick this up? Does this continue to represent a useful enhancement?

@jreback
Contributor

jreback commented Dec 29, 2016

things like this are contemplated in pandas 2.0

@jreback jreback closed this as completed Dec 29, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: No action, Someday Dec 31, 2016
@MattFaus

MattFaus commented Mar 2, 2017

I came up with this for now:

def find_with_context(base_idx, before=0, after=0, context=0):
    # Dilate a boolean match mask so that `before` rows above and `after`
    # rows below each match are also selected (grep's -B/-A); `context`
    # sets both at once (grep's -C).
    if context != 0:
        before = context
        after = context

    ret_idx = base_idx
    # OR in the mask shifted backwards to pick up the rows before each match.
    for i in range(1, before + 1):
        ret_idx |= base_idx.shift(-i, fill_value=False)

    # OR in the mask shifted forwards to pick up the rows after each match.
    for i in range(1, after + 1):
        ret_idx |= base_idx.shift(i, fill_value=False)

    return ret_idx

df[find_with_context(df.my_column != 1.0, before=1)]
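
For reference, the shifted-OR loop just dilates the boolean match mask, so find_with_context(df.my_column != 1.0, context=2) would mirror grep's -C 2: two rows on either side of every match are kept along with the match itself.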

@robrechtdr

robrechtdr commented Dec 31, 2019

Just want to give a big thumbs up for grep-like functionality.

For me the most useful part is a DataFrame search across all columns (on stringified values). This is hugely time-saving functionality. For now I'm using the following:

df[np.logical_or.reduce([df[col].astype(str).str.contains("somethingtosearchfor") for col in df.columns])]

It's a big time saver when debugging to not have to first inspect the columns and/or values before filtering on a specific column name, as this need arises very often. It saves having to find the right column and having to type the column name: as far as I'm aware you can't always use auto-complete with the dot syntax, e.g. when a column name contains spaces (which happens often), and an auto-complete-enabled debugging tool is not always immediately accessible.
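
A slightly more reusable version of that one-liner could look like the sketch below (the name grep_df is hypothetical, and case-insensitive, literal-string matching is an assumption; drop case=False and regex=False to keep the exact behaviour of the one-liner above):

import numpy as np

def grep_df(df, needle, case=False):
    # Keep the rows whose stringified value in any column contains `needle`.
    mask = np.logical_or.reduce(
        [df[col].astype(str).str.contains(needle, case=case, regex=False)
         for col in df.columns]
    )
    return df[mask]

grep_df(df, "somethingtosearchfor")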
