PERF: high memory in MI #15245


Closed · jreback wants to merge 1 commit into pandas-dev:master from jreback:mi

Conversation

@jreback (Contributor) commented Jan 27, 2017

closes #13904

Creates an efficient MultiIndexHashTable in cython. This allows us to efficiently store a multi-index for fast indexing (`.get_loc()` and `.get_indexer()`), replacing the current tuple-based (and GIL-holding) use of the PyObject hash table.

This uses the pandas.tools.hashing routines to hash each of the 'values' of a MI to a single uint64.

So this makes a MultiIndex more memory friendly and much more efficient. You get these speedups because the creation of the hashtable is now much more efficient.

    before     after       ratio
  [48fc9d61] [f47131ce]
-  871.05ms   423.93ms      0.49  reindex.Reindexing.time_reindex_multiindex
-  230.31ms    72.51ms      0.31  indexing.MultiIndexing.time_is_monotonic
-  747.11ms   199.49ms      0.27  indexing.MultiIndexing.time_multiindex_get_indexer

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

This doesn't quite close the above issue yet. Still working on getting rid of all/most of the actual instantiation of the array-of-tuples, but that's the next stage.
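
For readers following along, a minimal sketch of the idea (not the PR's cython internals): each MultiIndex entry is hashed to a single uint64, and lookups go through those hashes instead of materialized tuples. The helper is exposed today as pandas.util.hash_pandas_object (the module was pandas.tools.hashing at the time of this PR).

# illustration_hashing.py -- a hedged sketch, not the PR's implementation
import numpy as np
import pandas as pd
from pandas.util import hash_pandas_object

mi = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)],
                                names=['one', 'two'])

# one uint64 per MI entry; this is the kind of value the cython
# MultiIndexHashTable stores instead of materialized tuples
hashes = hash_pandas_object(mi)
print(hashes.dtype, len(hashes))          # uint64 1000000

# the fast paths this PR targets
print(mi.get_loc((5, 7)))                 # scalar lookup -> 5007
print(mi.get_indexer([(0, 1), (2, 3)]))   # vectorized lookup -> [   1 2003]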

@jreback jreback added MultiIndex Performance Memory or execution speed performance labels Jan 27, 2017
# fast-path
if not self.levels[0].is_monotonic:
    return False

Contributor Author (jreback):

If someone comes up with a 'nice' way to do this (from a list of arrays), would be great.

@ssanderson (Contributor), Jan 31, 2017:

@jreback are you trying to do this just from self.labels and self.levels, or are you okay with materializing the level values arrays? If the latter, I think you can do this with np.lexsort:

# lexsort_test.py

import numpy as np
import pandas as pd

sorted_index = pd.MultiIndex.from_product([[0, 1, 2], ['a', 'b', 'c']])
unsorted_first = pd.MultiIndex.from_product([[0, 2, 1], ['a', 'b', 'c']])
unsorted_second = pd.MultiIndex.from_product([[0, 1, 2], ['a', 'c', 'b']])
duplicates = pd.MultiIndex.from_product([[0, 1, 2], ['b', 'b', 'b']])


def level_values(mi, num):
    return np.take(mi.levels[num], mi.labels[num])


def int64_is_monotonic(a):
    # This is needlessly expensive in the non-monotonic case. I assume this is
    # already implemented efficiently somewhere else in pandas.
    return (np.diff(a) >= 0).all()


def multi_index_is_monotonic(mi):
    # reversed() because lexsort() wants the most significant key last.
    values = [level_values(mi, i) for i in reversed(range(len(mi.levels)))]
    sort_order = np.lexsort(values)
    return int64_is_monotonic(sort_order)


print("Sorted: ", multi_index_is_monotonic(sorted_index))
print("Sorted With Duplicates:", multi_index_is_monotonic(duplicates))
print("Unsorted First:", multi_index_is_monotonic(unsorted_first))
print("Unsorted Second:", multi_index_is_monotonic(unsorted_second))
$ python lexsort_test.py
Sorted:  True
Sorted With Duplicates: True
Unsorted First: False
Unsorted Second: False

One caveat here is that I'm not 100% sure that lexsort is guaranteed to be stable, which I think you'd need to avoid false negatives in the case where there are duplicates. The docs for lexsort itself don't seem to mention stability, but the See Also from argsort says: lexsort : Indirect stable sort with multiple keys.
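
A quick check of the stability question: numpy documents lexsort as an indirect stable sort (per the argsort "See Also" entry quoted above), so ties on all keys keep their original relative order, which is what the duplicates case relies on.

import numpy as np

primary = np.array([1, 1, 1, 0])
secondary = np.array([5, 5, 5, 5])        # every row ties on the secondary key
order = np.lexsort((secondary, primary))  # most significant key passed last
print(order)  # [3 0 1 2] -- the tied rows 0, 1, 2 stay in input order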

@jreback (Contributor, Author), Jan 31, 2017:

before

In [1]: i = MultiIndex.from_product([np.arange(1000), np.arange(1000)], names=['one','two'])

In [2]: %time i.is_monotonic
CPU times: user 620 ms, sys: 42.1 ms, total: 662 ms
Wall time: 662 ms

after

In [2]: %time i.is_monotonic
CPU times: user 67.1 ms, sys: 14.1 ms, total: 81.2 ms
Wall time: 80.8 ms

@ssanderson thanks! this seems to pass, pushed up a commit with this. for sure an improvement.

@jreback jreback changed the title WIP: high memory in MI PERF: high memory in MI Jan 27, 2017
@jreback jreback force-pushed the mi branch 2 times, most recently from e070cf0 to 0411455 Compare January 27, 2017 20:41
@codecov-io commented Jan 27, 2017

Codecov Report

Merging #15245 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15245      +/-   ##
==========================================
+ Coverage   90.34%   90.36%   +0.01%     
==========================================
  Files         135      135              
  Lines       49378    49425      +47     
==========================================
+ Hits        44613    44661      +48     
+ Misses       4765     4764       -1
Impacted Files Coverage Δ
pandas/types/cast.py 85.42% <ø> (ø)
pandas/io/pytables.py 93.9% <100%> (ø)
pandas/core/frame.py 97.87% <100%> (ø)
pandas/indexes/multi.py 96.72% <100%> (+0.14%)
pandas/core/algorithms.py 94.47% <100%> (-0.01%)
pandas/tools/hashing.py 99.01% <100%> (ø)
pandas/indexes/base.py 96.23% <100%> (ø)
pandas/tseries/index.py 95.42% <ø> (+0.09%)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bbb583c...7df6c34. Read the comment docs.

@jreback jreback force-pushed the mi branch 2 times, most recently from 7d430da to e13b5e0 Compare January 30, 2017 14:25
@jreback jreback added this to the 0.20.0 milestone Jan 30, 2017
@jreback (Contributor, Author) commented Jan 30, 2017

from pandas.tools.hashing import hash_tuples
key = self.mi._as_valid_indexing_key(key)
value = hash_tuples(key)
k = kh_get_uint64(self.table, key)
Member:

value is not used here?

Contributor Author (jreback):

oops, typo

value = hash_tuples(key)

# should check nlevels == len tuple
k = kh_get_uint64(self.table, value)
Member:

how does this handle hash collisions?

Contributor Author (jreback):

not sure. haven't really looked into how khash resolves collisions.

Member:

Well, the issue is that the keys in the table are now integers, so you can resolve collisions but only collisions on the hashed value. So when you have a collision you don't have the original tuple available to compare against for equality. So you need a way to say "give me the tuple for the value that's currently occupying this slot" to see if it's equal to the tuple you just hashed.

Contributor Author (jreback):

I think there is a 1-1 mapping of tuple -> hashed value. IOW, these have a unique hashing guarantee. This is quite similar to the original impl where the PyObject pointer was the hashing value (which also is unique). So I am not sure that it is possible to actually have collisions (sure, you can have collisions inside the hashtable itself, but we don't directly manage that; we simply ask for the value at the hash and get it).

Or am I not understanding your point?

Member:

Having not looked at the details of pandas.tools.hashing too carefully (so take this with a grain of salt), it seems very difficult to establish a tuple to uint64_t mapping that cannot produce collisions (I'm not sure how you could prove it). The probability of collisions may be extremely small, but they are still possible.
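
For a rough sense of scale, the standard birthday approximation puts the probability of any collision among n distinct keys hashed uniformly into 64 bits at roughly n^2 / 2^65. This is a back-of-the-envelope estimate only; it assumes the uint64 hashes behave as if uniformly random, which the pandas hashing does not formally guarantee.

def collision_probability(n, bits=64):
    # birthday approximation: p ~= n * (n - 1) / 2 ** (bits + 1)
    return n * (n - 1) / 2.0 ** (bits + 1)

for n in (10**6, 10**8, 10**9):
    print(n, collision_probability(n))
# ~2.7e-08 at a million rows, ~2.7e-04 at 100 million, ~2.7e-02 at a billion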

@jreback (Contributor, Author), Jan 31, 2017:

here are some references on the hashing schemes:

python using siphash
siphash
blog by ameuer about hashing

So the short answer is, we don't deal with collisions at all. This is different from hash(), which by definition must deal with collisions, as it has some idiosyncratic behavior built in (e.g. -1 is not a possible hash value).

Note that PyObject hashing doesn't deal with collisions either, though I suppose since it's memory-address based it is by definition 1-1 with the tuple.

For a unique MI, I don't think in practice this is actually a problem, and if it shows up it is a bug. A tuple should hash 1-1 to its hashed value. I suppose there is a theoretical possibility that this is violated.

So I suppose I could check, and I think it would be efficient (e.g. only constructing the wanted tuples as needed), and raise if this is the case. Let me see about that.

OTOH, a duplicated MI by definition will not work with this scheme (in fact we don't use this indexer directly for a duplicated MI).

@mikegraham any light?
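
As a small aside on the hash() quirk mentioned above: CPython reserves -1 as an error code for object hashing, so any object whose hash would be -1 is silently remapped to -2.

# CPython detail only; unrelated to the uint64 hashing used in this PR
print(hash(-1))              # -2
print(hash(-2))              # -2
print(hash(-1) == hash(-2))  # True -- a built-in, harmless "collision"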

Contributor Author (jreback):

0dae56a should provide a run-time assertion for this.

Contributor Author (jreback):

it will slow down worst case checking a bit (still about 2x faster than master)

In [1]: i = MultiIndex.from_product([np.arange(1000), np.arange(1000)], names=['one','two'])

In [2]: v = i.values

In [3]: %time i.get_indexer(v)
CPU times: user 356 ms, sys: 84.4 ms, total: 441 ms
Wall time: 441 ms
Out[3]: array([     0,      1,      2, ..., 999997, 999998, 999999])

but this type of hit would only occur if you are actually reindexing from the tuples themselves (and not an actual MI).

@jreback jreback force-pushed the mi branch 2 times, most recently from 9a2cb69 to 0dae56a Compare January 31, 2017 16:28

for i in [0, 1, len(index)-2, len(index)-1]:
    result = index.get_loc(index[i])
    self.assertEqual(result, i)
Member:

collisions could come from other tuples. rare, but possible

Contributor Author (jreback):

yep, that's why this is a non-smoke test :>

if you have any better ideas on how to test, all ears.

Member:

The only fool-proof way to prevent mistaken lookups is to compare the tuples for equality -- keeping around big arrays of tuples is how we got into this mess in the first place. So you can

  • compute the hashed key value
  • look up the index location
  • construct the tuple for that index location
  • compare the tuple in the index with your tuple key
  • if they are unequal, raise a KeyError

With reindex operations, it gets more expensive. But at least we aren't keeping around a huge internal array of tuples like we were before.
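
A minimal pure-Python sketch of the lookup-and-verify scheme described above (the PR does this in cython; build_table and checked_get_loc are hypothetical names used only for illustration, and the plain dict stands in for the khash uint64 table):

import pandas as pd
from pandas.util import hash_pandas_object


def build_table(mi):
    # hypothetical stand-in for the cython uint64 hash table:
    # map each row's uint64 hash to its integer location
    return {h: i for i, h in enumerate(hash_pandas_object(mi))}


def checked_get_loc(mi, table, key):
    # 1. compute the hashed key value
    h = hash_pandas_object(pd.MultiIndex.from_tuples([key])).iloc[0]
    # 2. look up the index location
    try:
        loc = table[h]
    except KeyError:
        raise KeyError(key)
    # 3. construct the tuple for that location and 4. compare it with the key
    if mi[loc] != key:
        # 5. unequal means a hash collision (or a genuinely missing key)
        raise KeyError(key)
    return loc


mi = pd.MultiIndex.from_product([range(3), ['a', 'b']])
table = build_table(mi)
print(checked_get_loc(mi, table, (1, 'b')))  # 3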

Contributor Author (jreback):

I already pushed code to do exactly this. 99f7cc1

I do a run-time check if there is a collision by comparison. It actually turned up a bug (separate)

@jreback (Contributor, Author) commented Jan 31, 2017

ok, updated. Had an insidious condition in MultiIndex.equals with NaT-only tuples. This was a latent bug.

@jreback jreback force-pushed the mi branch 2 times, most recently from 594f653 to 1583099 Compare January 31, 2017 21:35
# reversed() because lexsort() wants the most significant key last.
values = [level_values(i) for i in reversed(range(len(self.levels)))]
sort_order = np.lexsort(values)
return Index(sort_order).is_monotonic
Contributor Author (jreback):

@ssanderson this avoids the worst case scenario as this is a short-circuit operator (in cython)

unique = self.levels[level]
labels = self.labels[level]
return algos.take_1d(unique.values, labels,
                     fill_value=unique._na_value)
Contributor:

Can you have null values in the uniques here? If the level values contain NaNs, then I imagine the behavior of lexsort is undefined, which means you could get false positives.

Contributor Author (jreback):

for most dtypes this is not a problem, as we fill with iNaT (for example for i8); only for floats could this be an issue, I suppose

do you have an example that fails?

Contributor Author (jreback):

and I just added the try/except, which will catch TypeErrors and fall back

Contributor:

I think the case where this could potentially fail currently is something like:

pd.MultiIndex.from_product([[1.0, np.nan, 2.0], ['a', 'b', 'c']])

You've made some interesting life choices if you've got NaNs in your index (or really if you're intentionally using a float index at all IMO), but I'd expect is_monotonic to be reliably False in this case, which I don't think my proposed implementation guarantees, since any sorting algorithm is going to return garbage if the inputs contain NaN.

Contributor Author (jreback):

yeah, added as a test. It works, but is subject to numpy's sort order for nan.
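
For reference, the numpy convention being relied on here: NaN sorts after every other float value, so sorting a float level that contains NaN is deterministic (with NaN treated as the largest key) rather than garbage.

import numpy as np

level = np.array([1.0, np.nan, 2.0])
print(np.sort(level))        # [ 1.  2. nan] -- NaN is placed last
print(np.lexsort((level,)))  # [0 2 1] -- same convention for the indirect sort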

@jreback jreback force-pushed the mi branch 4 times, most recently from f47131c to c3fd7e8 Compare February 2, 2017 13:11
@jreback (Contributor, Author) commented Feb 2, 2017

ok this is all set. any more comments?

@wesm @jorisvandenbossche

@jorisvandenbossche (Member) left a comment:

small comment, but I won't have time to look at this in more detail in the coming days

@@ -1429,6 +1429,10 @@ def inferred_type(self):
        """ return a string of the type inferred from the values """
        return lib.infer_dtype(self)

    def is_memory_usage_qualified(self):
Member:

Can we make this a private method?

Contributor Author (jreback):

yep

@jreback jreback force-pushed the mi branch 2 times, most recently from 6b2001f to 780f1dc Compare February 6, 2017 15:26
@jorisvandenbossche (Member) left a comment:

Nice performance improvements! Did a review and added some comments (apart from the hashing-related code, that's not my cup of tea)

There is part of __delitem__ that is not covered anymore (https://codecov.io/gh/pandas-dev/pandas/pulls/15245/src/pandas/core/generic.py). This is due to the changed behaviour of __contains__ in the MultiIndexEngine:

In [22]: midx = pd.MultiIndex.from_product([['A', 'B'], [1,2]])

In [23]: 'A' in midx._engine
...
TypeError: must be convertible to a list-of-tuples

In [24]: ('A',) in midx._engine
...
KeyError: 

Before this PR, both of the above returned False, taking the shortcut in that code.
The exact behaviour of __contains__ is maybe not that important/public, but this also results in some slight changes in the behaviour of del:

In [32]: df = pd.DataFrame(np.random.randn(4,4), columns=midx)

In [33]: del df['A']

In [34]: df
Out[34]: 
          B          
          1         2
0 -1.170604 -0.905548
1 -1.198860 -1.163344
2  2.352208  0.214405
3 -0.520114 -0.702562

In [35]: del df['A']
...
IndexError: index 0 is out of bounds for axis 0 with size 0

## -> the above should be KeyError

In [37]: df = pd.DataFrame(np.random.randn(4,4), columns=midx)

In [38]: del df[('A',)]
...
KeyError: 

## -> the above worked previously

I have to say that I never use del, but still, the above changes can be fixed I think.

@@ -656,9 +665,74 @@ def _has_complex_internals(self):
        return True

    @cache_readonly
    def is_monotonic(self):
Member:

Docstring for public function

Member:

Something else, if you are updating is_monotonic, should is_monotonic_increasing/decreasing be updated accordingly?

Contributor Author (jreback):

done

# e.g. with embedded tuples, this might fail
# TODO: should prob not allow construction of a MI
# like this in the first place
return Index(self.values).get_loc(key)
Member:

This doesn't seem covered by a test

Contributor Author (jreback):

turns out this is some old code (added for another reason). deleted.

    return False
if not isinstance(other, tuple):
    return False
other = Index([other])
Member:

Is there a reason for this change?
I remember we had some discussion about this before, about whether or not Index.equals should coerce its argument to an Index. But it seems a bit strange to do this solely for a single tuple.

Contributor Author (jreback):

this is to allow equivalence testing of a tuple and a MI, which is basically a list-of-tuples (though the impl is different).

Member:

I don't really understand your comment here.
You mean you want pd.MultiIndex.from_tuples([('a', 1)]).equals(('a', 1)) to be True? But why? Also, for a single-level Index, a 1-length Index does not evaluate equal to a scalar with the same value.

@@ -1425,8 +1448,6 @@ def test_equals_multi(self):

self.assertFalse(self.index.equals(self.index[:-1]))

self.assertTrue(self.index.equals(self.index._tuple_index))
Member:

Still test this using pd.Index(self.index.values) instead of self.index._tuple_index? (unless this is covered somewhere else to test equality of MI against object Index of tuples).

Contributor Author (jreback):

_tuple_index is gone, that's part of the reason for this PR

Member:

Yes, I understand that, but we still want to test the fact that Index of tuples evaluates as equal.

Contributor Author (jreback):

added a test. And they don't evaluate as equal (nor did they in master). They are equivalent, but not equal (you have to wrap it in a MI to make it equal, NOT an Index).

Contributor Author (jreback):

In [1]: major_axis = Index(['foo', 'bar', 'baz', 'qux'])
   ...: minor_axis = Index(['one', 'two'])
   ...:
   ...: major_labels = np.array([0, 0, 1, 2, 3, 3])
   ...: minor_labels = np.array([0, 1, 0, 1, 0, 1])
   ...: index = MultiIndex(levels=[major_axis, minor_axis],
   ...:                    labels=[major_labels, minor_labels],
   ...:                    names=['foo', 'bar'],
   ...:                    verify_integrity=False)
   ...:

In [2]: index
Out[2]: 
MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two']],
           labels=[[0, 0, 1, 2, 3, 3], [0, 1, 0, 1, 0, 1]],
           names=['foo', 'bar'])

In [3]: index.equals(index.values)
Out[3]: False

In [4]: index.equals(MultiIndex.from_tuples(index.values))
Out[4]: True

In [6]: from pandas.core.index import _ensure_index

In [7]: index.equals(_ensure_index(index.values))
Out[7]: True

In [8]: index.equals(Index(index.values))
Out[8]: True

so I could make them compare equal (let me try); it's more consistent this way

Member:

Sorry, that was not what I meant. I would just leave it as it was (we had a discussion about this before, and explicitly changed arrays to no longer evaluate equal to Index objects). The behaviour was already good; my only question was about keeping the test (but with updated syntax, of course).

Member:

See the discussion here: #13986 (comment). We changed Index.equals behaviour to return False for everything that is not an Index. I was not in favor of that change, but since we only changed that in 0.19.0, we should keep to that decision I think

Contributor Author (jreback):

hmm, I remember that. Though [25] should not be True then. (though I guess it's an Index, and NOT a list-like). But it's NOT an MI either....

Contributor Author (jreback):

ok fixed back to original (with my small mod)

Member:

Though [25] should not be True then. (though I guess its an index, and NOT a list-like). But its NOT an MI either....

Yeah, you are right. I would not really have a problem with changing this to False (the comparison of MI with object index of tuples). That seems to follow our previous changes of making equals more strict.

assert not ind.is_monotonic
self.assertFalse(ind.is_monotonic)

def test_is_monotonic(self):
Member:

also include some tests for is_monotonic_increasing and is_monotonic_decreasing?

@jreback (Contributor, Author) commented Feb 11, 2017

@jorisvandenbossche fixed up. __contains__ wasn't tested much, so the behavior wasn't nailed down.

assert len(df.columns) == 2

with pytest.raises(KeyError):
    del df[('A',)]
Member:

Is it intentional that this also raises if the 'A' key is still in df? If so, I would put this before del df['A']. Otherwise, we also need to test that this works on the initial df.

@jorisvandenbossche (Member) commented Feb 11, 2017 via email

@jreback (Contributor, Author) commented Feb 11, 2017

We have this in 0.19.2 / master
seems odd

In [1]:         midx = MultiIndex.from_product([['A', 'B'], [1, 2]])
   ...:         df = DataFrame(np.random.randn(4, 4), columns=midx)
   ...: 

In [2]: 'A' in df
Out[2]: True

In [3]: ('A', ) in df
Out[3]: True

In [4]: del df['A']

In [5]: 'A' in df
Out[5]: True

In [6]: ('A', ) in df
Out[6]: False

@jreback jreback force-pushed the mi branch 2 times, most recently from b07d50a to 97e1080 Compare February 12, 2017 17:39
@jorisvandenbossche (Member):
In [5] seems certainly like a bug. Possibly because it's looking in the full level and not the actually used one?

with pytest.raises(KeyError):
    del df[('A',)]

assert 'A' in df.columns
Member:

As in my other comment, shouldn't we regard this as a bug?
Not needed to fix in this PR, but I would at least add a FIXME comment here so it is clear this test may change

Contributor Author (jreback):

#2770

This is an old / long-discussed issue. I'll annotate the tests though.

# make sure take is not using -1
i = pd.MultiIndex.from_tuples([(0, pd.NaT),
                               (0, pd.Timestamp('20130101'))])
result = i[0:1].equals(i[0])
Member:

I assume this is the test related to my other comment. So what I also said there, is that this equivalence is not true for single-level indices:

In [15]: i = pd.Index([1,2,3])

In [16]: i[0:1]
Out[16]: Int64Index([1], dtype='int64')

In [17]: i[0:1].equals(i[0])
Out[17]: False

So why make this different for MultiIndex?

Contributor Author (jreback):

was to make things a bit easier on the collision check, but you are right, not necessary (or correct); reverted.

@wesm (Member) left a comment:

LGTM modulo other pending comments

except TypeError:

    # we have mixed types and np.lexsort is not happy
    return Index(self.values).is_monotonic
Member:

I presume the memory use / perf here is no worse than it was before -- where might this get called unexpectedly (possibly causing ballooning memory use)?

@jreback (Contributor, Author), Feb 15, 2017:

yes, the only thing that hits it is under py3 with something like this (e.g. with an np.nan in a level)

In [1]: mi = MultiIndex(levels=[[1, 2, 3, 4], ['gb00b03mlx29', 'lu0197800237', 'nl0000289783', 'nl0000289965', 'nl0000301109']],
   ...:            labels=[[0, 1, 1, 2, 2, 2, 3], [4, 2, 0, 0, 1, 3, -1]],
   ...:            names=['household_id', 'asset_id'])

In [2]: mi.is_monotonic
Out[2]: False

In [3]:  values = [mi._get_level_values(i) for i in reversed(range(len(mi.levels)))]

In [4]: values
Out[4]: 
[array(['nl0000301109', 'nl0000289783', 'gb00b03mlx29', 'gb00b03mlx29',
        'lu0197800237', 'nl0000289965', nan], dtype=object),
 array([1, 2, 2, 3, 3, 3, 4])]

In [5]: np.lexsort(values)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-b1af67214d96> in <module>()
----> 1 np.lexsort(values)

TypeError: unorderable types: float() > str()

self._check_for_collisions(locs, key)
return True

return False
Member:

Here, perhaps

try:
    self.getitem(key)
    return True
except (KeyError, ValueError, TypeError):
    return False

Contributor Author (jreback):

done

@jreback (Contributor, Author) commented Feb 15, 2017

all changes pushed. Will merge on pass sometime tomorrow unless there are any more comments.

closes pandas-dev#13904

BUG: a qualifier (+) would always display with a MultiIndex, regardless of whether it needed deep introspection for memory usage
PERF: rework MultiIndex.is_monotonic as per @ssanderson idea
@jreback (Contributor, Author) commented Mar 2, 2017

neat article about hash collisions: https://medium.com/@robertgrosse/generating-64-bit-hash-collisions-to-dos-python-5b21404a5306#.wi6ivpqtv (more about cryptographic hashing, but still neat)

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
closes pandas-dev#13904

Creates an efficient MultiIndexHashTable in cython.
This allows us to efficiently store a multi-index for fast indexing
(.get_loc() and .get_indexer()), with the current tuple-based
(and gil holding) use of the PyObject Hash Table.

This uses the pandas.tools.hashing routines to hash each of the 'values' of a MI
to a single uint64. So this makes MI more memory friendly and much
more efficient. You get these speedups, because the creation of the
hashtable is now much more efficient.

Author: Jeff Reback <[email protected]>

Closes pandas-dev#15245 from jreback/mi and squashes the following commits:

7df6c34 [Jeff Reback] PERF: high memory in MI
Labels: MultiIndex, Performance (memory or execution speed)

Linked issue: Example of High Memory Usage of MultiIndex (#13904)