
Idea: Frame/series should support grep-like context specification #3269


Closed
ghost opened this issue Apr 7, 2013 · 4 comments
Labels
Ideas Long-Term Enhancement Discussions

Comments

@ghost

ghost commented Apr 7, 2013

In shell grep you can do stuff like:

λ seq 1000 | grep -C 3 500
497
498
499
**500**
501
502
503
λ seq 1000 | grep -B 1 -A 3 500
499
**500**
501
502
503

I'd be glad to have something like

s.grep_index(pat_or_pred, A=1, B=3)  # -> list of series/df, each of len A+B+1 (or 2*C+1)
s.grep_index(pat_or_pred, C=5)
# and also
s.grep_index(label_list, C=5)

in core.

An example use case: looking for change points in a timeseries, where you want to
grab a window around each point of change:

s = some_timeseries
# pick out the index labels where a big change occurred
cps = s[s.diff().abs() > threshold].index
# cps is a list of labels for the "interest points"
list_of_s = s.grep_index(cps, C=5)

list_of_s[0]
< 5 rows from s, with the middle one being the first "interest point" in cps >
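
A minimal sketch of how such a helper could be written today on top of positional indexing (the name grep_index_windows and its exact signature are hypothetical, mirroring the proposal above rather than any existing pandas API; windows at the ends of the series are simply clipped rather than padded):

import numpy as np
import pandas as pd

def grep_index_windows(s, matches, A=0, B=0, C=None):
    # ``matches`` is either a boolean mask aligned with ``s`` or a list of
    # index labels; ``B`` rows before and ``A`` rows after each match are
    # included (grep's -B/-A), and ``C`` overrides both (grep's -C).
    if C is not None:
        A = B = C
    if isinstance(matches, (pd.Series, np.ndarray)):
        positions = np.flatnonzero(np.asarray(matches))      # boolean mask
    else:
        positions = s.index.get_indexer(pd.Index(matches))   # list of labels
    windows = []
    for pos in positions:
        lo = max(pos - B, 0)
        hi = min(pos + A + 1, len(s))
        windows.append(s.iloc[lo:hi])
    return windows

# Roughly the use case above, with made-up data:
s = pd.Series(np.r_[np.zeros(50), np.ones(50)])   # step change at position 50
cps = s[s.diff().abs() > 0.5].index               # labels of the "interest points"
list_of_s = grep_index_windows(s, cps, C=2)
list_of_s[0]                                      # 5 rows centred on the change point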

the "return list of series" is really a sort of variant on groupby,
and could concievably be implemented on top of something like
#3101, after de-warting.

Would also love to have this working with #2460 (if it ever makes it in)
for the same (but slow) on plain,unindexed data columns.

@ghost ghost mentioned this issue Apr 8, 2013
@petehuang
Contributor

Hi,

This issue's last interaction was in 2013. I noticed you had paused work on the PR; are you aware of any other efforts to pick this up? Does this continue to represent a useful enhancement?

@jreback
Contributor

jreback commented Dec 29, 2016

things like this are contemplated in pandas 2.0

@jreback jreback closed this as completed Dec 29, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: No action, Someday Dec 31, 2016
@MattFaus

MattFaus commented Mar 2, 2017

I came up with this for now:

def find_with_context(base_idx, before=0, after=0, context=0):
    # Dilate a boolean match mask so that `before` rows above and `after`
    # rows below each match are also selected (grep's -B/-A); `context`
    # sets both at once (grep's -C).
    if context != 0:
        before = context
        after = context

    ret_idx = base_idx
    # OR in the mask shifted backwards to pick up the rows before each match.
    for i in range(1, before + 1):
        ret_idx |= base_idx.shift(-i, fill_value=False)

    # OR in the mask shifted forwards to pick up the rows after each match.
    for i in range(1, after + 1):
        ret_idx |= base_idx.shift(i, fill_value=False)

    return ret_idx

df[find_with_context(df.my_column != 1.0, before=1)]
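
For reference, the shifted-OR loop just dilates the boolean match mask, so find_with_context(df.my_column != 1.0, context=2) would mirror grep's -C 2: two rows on either side of every match are kept along with the match itself.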

@robrechtdr

robrechtdr commented Dec 31, 2019

Just want to give a big thumbs up for grep-like functionality.

For me the most useful part is a DataFrame search across all columns (on stringified values). This is hugely time-saving functionality. For now I'm using the following:

df[np.logical_or.reduce([df[col].astype(str).str.contains("somethingtosearchfor") for col in df.columns])]

It's a big time saver when debugging to not have to first inspect the columns and/or values before filtering on a specific column name, as this need arises very often. It saves having to find the right column and having to type the column name: as far as I'm aware you can't always use auto-complete with the dot syntax, e.g. when a column name contains spaces (which happens often), and an auto-complete-enabled debugging tool is not always immediately accessible.
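
A slightly more reusable version of that one-liner could look like the sketch below (the name grep_df is hypothetical, and case-insensitive, literal-string matching is an assumption; drop case=False and regex=False to keep the exact behaviour of the one-liner above):

import numpy as np

def grep_df(df, needle, case=False):
    # Keep the rows whose stringified value in any column contains `needle`.
    mask = np.logical_or.reduce(
        [df[col].astype(str).str.contains(needle, case=case, regex=False)
         for col in df.columns]
    )
    return df[mask]

grep_df(df, "somethingtosearchfor")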
