Improve docs about filtering #746

wesm · 2012-02-04T21:22:14Z

from mailing list "[pystatsmodels] read_csv + text file with one column"

Yes. Suppose that function f returns a boolean value given a column
value. To filter you would do:

df[df[column_to_filter_by].map(f)]

Oftentimes filtering is done with vectorized NumPy operations like:

df[df[column] == val]

but sometimes you have an element-wise Python function you want to apply.
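
A minimal sketch of the two styles side by side. The DataFrame, the column names `name` and `score`, and the helper `is_long` are made up for illustration and do not come from the thread:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"name": ["Alice", "Bob", "Anna", "Carl"],
                   "score": [3, 1, 3, 2]})

# Vectorized comparison: the boolean mask is built in one step
by_value = df[df["score"] == 3]

# Element-wise Python function, applied to each value with map
def is_long(x):
    return len(x) > 3

by_func = df[df["name"].map(is_long)]
```

The vectorized form is usually preferable when the condition can be written as a comparison; `map` is the fallback when the test only exists as a plain Python function.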

You said you are a new Python programmer so I can understand lambdas
and regular expressions looking weird =) Lambda is just an alternative
to doing something like:

def condition(x):
    return x.startswith('A')

df[df[column].map(condition)]

you could also do:

df[[condition(x) for x in df[column]]]

the map method is just a faster way of doing the list comprehension
(and returns an ndarray with the right data type)
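
To illustrate that the two spellings select the same rows, here is a small sketch on made-up data (the frame and its column names are assumptions). Note that in current pandas releases `Series.map` returns a boolean Series rather than a bare ndarray; either form works as a mask:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"name": ["Alice", "Bob", "Anna"], "n": [1, 2, 3]})

def condition(x):
    return x.startswith("A")

# Boolean mask built with map (aligned with df's index)
mask_map = df["name"].map(condition)

# The same mask as a plain Python list of bools
mask_list = [condition(x) for x in df["name"]]

# Both masks pick out identical rows
same = df[mask_map].equals(df[mask_list])
```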

The select method applies a function to the *axis labels*, directly
from the docstring:

In [1]: DataFrame.select?
Type:       instancemethod
Base Class: <type 'instancemethod'>
String Form:<unbound method DataFrame.select>
Namespace:  Interactive
File:       /home/wesm/code/pandas/pandas/core/generic.py
Definition: DataFrame.select(self, crit, axis=0)
Docstring:
Return data corresponding to axis labels matching criteria

Parameters
----------
crit : function
   To be called on each index (label). Should return True or False
axis : int

Returns
-------
selection : type of caller
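
In more recent pandas releases `select` is no longer available, so the sketch below reproduces the same label-based selection with a boolean mask built from the index. The frame and the `crit` function are made up for illustration:

```python
import pandas as pd

# Hypothetical example data with string row labels
df = pd.DataFrame({"x": [1, 2, 3, 4]},
                  index=["apple", "banana", "avocado", "cherry"])

def crit(label):
    # Called on each index label, not on the column values
    return label.startswith("a")

# Build a boolean mask from the row labels and keep the matching rows
mask = [crit(label) for label in df.index]
selected = df.loc[mask]
```

The point is the same as in the docstring: the function sees the axis labels, whereas `df[df[column].map(f)]` applies `f` to the values of a column.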

hope this helps,
Wes


gdraps · 2012-02-07T05:38:01Z

Hi Wes,

First off, thanks for pandas and your recent talk at the NYC Python meetup. On the topic of filtering, NumPy vector filters are awesome for numeric data, but I have found myself reaching for the following idioms when dealing with alpha-numeric columns:

df[df.method.contains('abc')]
df[df.method.startswith('ghi')]
df[df.method.endswith('xyz')]

Would you consider the addition of these methods to the Series class, not only to complement the existing isin() method, but to bridge the gap with SQL libraries, such as SQLAlchemy (http://docs.sqlalchemy.org/en/latest/core/expression_api.html#sqlalchemy.sql.operators.ColumnOperators), and improve conciseness of string queries in pandas?

Update: in the thread referenced, I see you've already thought about similar methods (match()) and that handling exceptions due to NA values, among other details, is the tricky bit. On the surface, throwing a type exception when an NA is encountered seems acceptable because it feels consistent with other Python idioms (for better or worse), e.g., ', '.join(x) raises a TypeError when x contains a non-string element.
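
For reference, later pandas releases expose vectorized string methods through the `.str` accessor on Series, which covers the three idioms above. A minimal sketch, assuming a made-up `method` column that contains a missing value; passing `na=False` treats missing entries as non-matches instead of propagating them into the mask:

```python
import pandas as pd

# Hypothetical example data, including one missing value
df = pd.DataFrame({"method": ["abcdef", "ghijkl", None, "uvwxyz"]})

# na=False makes missing values count as "no match" in the boolean mask
contains_abc = df[df["method"].str.contains("abc", na=False)]
starts_ghi = df[df["method"].str.startswith("ghi", na=False)]
ends_xyz = df[df["method"].str.endswith("xyz", na=False)]
```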

wesm · 2012-02-07T16:20:33Z

hi @gdraps there is actually an open issue about this, #620

wesm mentioned this issue Feb 7, 2012

Implement vectorized, NA-friendly friendly string utils, a la R's stringr #620

Closed

changhiskhan pushed a commit that referenced this issue May 6, 2012

DOC: more filtering examples. #746

7853f3f

wesm closed this as completed May 7, 2012
