BUG: Rowwise subset of a DataFrame based on index using .loc #7690

Gitman-code · 2014-07-08T08:14:11Z

https://stackoverflow.com/questions/24536734/rowwise-subset-of-a-dataframe-based-on-index

This should act like an inner join, but it adds NULL rows to the table being operated on. At the very least, it should raise an error when an index label is not found.

jreback · 2014-07-08T11:43:03Z

Duplicate of #2033.

This was an original API decision: .loc/.ix act like a reindex when presented with a slice or index-like, e.g.

df.loc['a':'f'] only requires the inclusion of the end-points so
df.loc[list(....)] only requires at least 1 point that is included.

You can simply use df.loc[list(...)].dropna() if you are doing an isin type of operation (or you can do some sort of join if you want).

Further, raising here would completely break a very common operation (I suppose it could work with special casing, maybe):

In [3]: df = pd.DataFrame(np.arange(6), index=pd.MultiIndex.from_product([['a', 'b'], range(3)]))

In [4]: df
Out[4]: 
     0
a 0  0
  1  1
  2  2
b 0  3
  1  4
  2  5

In [6]: df.loc[['a']]
Out[6]: 
     0
a 0  0
  1  1
  2  2

In [8]: df.index.values
Out[8]: array([('a', 0), ('a', 1), ('a', 2), ('b', 0), ('b', 1), ('b', 2)], dtype=object)

if you would like to enhance the docs that would be fine.

closing as a dupe

jreback · 2014-07-08T15:03:42Z

As a side note, it's possible that something like:

df.loc(strict=True)[.....] could be done; that could set a 'strict' mode (and possibly have an option for this).
Not sure how useful this is, or whether it is worth the trouble, though.
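For illustration only, the strict behaviour discussed here can be approximated today with a small wrapper. `loc_strict` is a hypothetical helper name, not a pandas API, and this sketch assumes a flat (non-MultiIndex) index, so it does not address the partial-selection case shown earlier in the thread.

```python
import pandas as pd


def loc_strict(df, labels):
    """Hypothetical helper: select rows by label, raising if any label is absent."""
    labels = pd.Index(labels)
    # Labels requested but not present in the frame's index.
    missing = labels.difference(df.index)
    if len(missing) > 0:
        raise KeyError(f"labels not found in index: {list(missing)}")
    return df.loc[labels]


df = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])
subset = loc_strict(df, ["a", "c"])  # every label exists, so this succeeds
```

Calling `loc_strict(df, ["a", "z"])` raises a `KeyError` instead of returning a row of NULLs for `"z"`.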

Gitman-code · 2014-07-08T15:28:16Z

I would be strongly in favor of such an implementation; however, I do not know how much effort it would take, so whether it is worth it is unknown. I would think that this is basic functionality, and I can't think of when the current functionality would be useful. Perhaps an option with set theory ('Union', 'Intersection') or SQL ('Inner', 'Outer') semantics would be clearer in the implementation than 'strict'.

The current workaround, Smalldf = Smalldf[Smalldf.index.isin(Largedf.index)], as given in the Stack Overflow answer, is more cumbersome and possibly slower than Smalldf = Smalldf.loc[Largedf.index]

Also, Smalldf = Smalldf.loc[Largedf.index].dropna() does not work because I often have meaningful NULL values.
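A minimal sketch of why `dropna()` is not a safe substitute here, using made-up data (`small_df`, `large_df` and their values are assumptions for illustration). It uses `reindex` to reproduce the NULL-row behaviour described in this thread, then compares it with the `isin` mask and with `Index.intersection`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: small_df holds a meaningful NaN at label "b".
small_df = pd.DataFrame({"value": [1.0, np.nan, 3.0]}, index=["a", "b", "c"])
large_df = pd.DataFrame({"other": [10, 20, 30]}, index=["b", "c", "d"])

# Reindexing by the other frame's index adds an all-NaN row for "d" ...
reindexed = small_df.reindex(large_df.index)

# ... and dropna() then discards the genuine NaN at "b" along with it.
dropped = reindexed.dropna()

# Boolean mask: keeps only labels present in both indexes, NaN values intact.
by_mask = small_df[small_df.index.isin(large_df.index)]

# The same selection expressed through the intersection of the two indexes.
common = small_df.index.intersection(large_df.index)
by_intersection = small_df.loc[common]
```

Both `by_mask` and `by_intersection` keep rows `"b"` and `"c"`, including the NaN at `"b"`, whereas `dropped` keeps only `"c"`.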

jreback · 2014-07-08T15:38:37Z

This is an API issue and, as I stated, has long been the case.

This is the use case for isin. The point is that you may be selecting values, but you don't need a KeyError just because, say, a small number of values are missing. Pandas always wants to align to the index.

You have many options on what to do.

You are effectively doing a join, so you should explore that as well.
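The join route could look roughly like the following. The frames and column names are invented for illustration, and the sketch assumes the two frames share no column names (otherwise `join` needs suffixes).

```python
import numpy as np
import pandas as pd

# Hypothetical data, with no overlapping column names between the frames.
small_df = pd.DataFrame({"value": [1.0, np.nan, 3.0]}, index=["a", "b", "c"])
large_df = pd.DataFrame({"other": [10, 20, 30]}, index=["b", "c", "d"])

# Inner join on the index keeps only labels present in both frames;
# selecting small_df's columns afterwards drops what came from large_df.
inner = small_df.join(large_df, how="inner")[small_df.columns]
```

The result contains only the rows labelled `"b"` and `"c"`, and the NaN already stored at `"b"` is left as it was.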

jreback closed this as completed Jul 8, 2014

jreback added Indexing labels Jul 8, 2014

jreback modified the milestone: 0.15.0 Jul 8, 2014
