API: map() on Index returns an Index, not array #14506

nateyoder · 2016-10-26T21:18:10Z

closes API: Index.map should return Index rather than array #12766
closes API: map() on Index returns an Index, not array #12798
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

This is a follow on to #12798.

nateyoder · 2016-10-26T21:21:23Z

Note that these changes introduce some additional API changes that may need to be addressed, so I would appreciate suggestions:

MultiIndex maps to Index: because mapping a function can reduce or increase the dimensionality of the index, I am returning an Index from these operations rather than inspecting the data to determine whether the output should be a MultiIndex.
Categorical and CategoricalIndex: when a categorical cannot be returned, an Index of the appropriate type is returned rather than an np.ndarray.
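A minimal sketch of the two behaviours listed above, written against a recent pandas release rather than this PR branch (an assumption; exact dtypes may differ between versions): a MultiIndex mapped down to scalars becomes a flat Index, and CategoricalIndex.map stays categorical only when the mapping is one-to-one.

```python
import pandas as pd

# Collapsing each tuple of a MultiIndex to a scalar gives a flat Index.
mi = pd.MultiIndex.from_tuples([(1, 2), (2, 4)])
flat = mi.map(lambda x: x[0])

# A one-to-one mapping over a CategoricalIndex keeps the categorical type.
ci = pd.CategoricalIndex(["a", "b", "c"])
upper = ci.map(lambda x: x.upper())

# Mapping two categories onto the same value is not one-to-one,
# so a plain (non-categorical) Index is returned instead.
merged = ci.map({"a": "first", "b": "second", "c": "first"})

print(type(flat).__name__, list(flat))
print(type(upper).__name__, list(upper))
print(type(merged).__name__, list(merged))
```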

jreback · 2016-10-26T22:09:05Z

@nateyoder use ensure_index to sniff for Index/MultiIndex. You can simply return a CategoricalIndex (rather than a plain Categorical).

This will need to be in 0.20.0

jreback · 2016-10-26T22:09:17Z

doc/source/whatsnew/v0.19.1.txt


+API changes


move to 0.20.0

jreback · 2016-10-26T22:09:39Z

pandas/core/categorical.py

@@ -943,7 +943,7 @@ def map(self, mapper):

        Returns
        -------
-        applied : Categorical or np.ndarray.
+        applied : Categorical or Index.


actually this is ok

jreback · 2016-10-26T22:10:26Z

pandas/tests/indexes/test_base.py

@@ -766,6 +766,26 @@ def test_sub(self):
        self.assertRaises(TypeError, lambda: idx - idx.tolist())
        self.assertRaises(TypeError, lambda: idx.tolist() - idx)

+    def test_map_identity_mapping(self):
+        for name, cur_index in self.indices.items():


Can you add a comment with the appropriate GitHub issue numbers? (Don't go crazy, but where appropriate.)

jreback · 2016-10-26T22:11:13Z

pandas/tests/indexes/test_base.py

+
+    def test_map_that_returns_tuples_creates_index_not_multi_index(self):
+        boolean_index = tm.makeIntIndex(3).map(lambda x: (x, x == 1))
+        expected = Index([(0, False), (1, True), (2, False)],


use ensure_index and this will be a MI

nateyoder · 2016-10-26T22:36:02Z

@jreback I may have misunderstood what you were saying, but I gave _ensure_index a shot with the code below and it didn't seem to do anything. I believe this may be because the map returns an np.ndarray rather than a list, which would then be converted to a MultiIndex. Should I go ahead and put a tolist in there despite the performance drawbacks, or am I totally barking up the wrong tree?

_ensure_index(Index(self._arrmap(self.values, mapper), **attributes))

jreback · 2016-10-26T22:41:15Z

hmm, actually it IS probably worth introspecting this: if it is ndim == 1, then it's no problem; if it is an array-like of tuples, then you can pass it to MultiIndex.from_tuples directly.
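The introspection suggested here can be sketched as a small wrapper. `map_index` is a hypothetical helper, not the pandas implementation; it checks that every mapped value is a tuple (the PR itself only "checked for the presence of a tuple", so this stricter check is a design assumption) and otherwise lets `Index` infer the dtype.

```python
import pandas as pd


def map_index(index, mapper):
    """Hypothetical helper: apply `mapper` element-wise and wrap the result,
    creating a MultiIndex when every mapped value is a tuple."""
    mapped = [mapper(x) for x in index]
    if mapped and all(isinstance(v, tuple) for v in mapped):
        # An array-like of tuples can go straight to MultiIndex.from_tuples.
        return pd.MultiIndex.from_tuples(mapped)
    # One scalar per element (a 1-d result): let Index infer the dtype.
    return pd.Index(mapped, name=index.name)


idx = pd.Index([0, 1, 2], name="n")
print(map_index(idx, lambda x: (x, x == 1)))
print(map_index(idx, lambda x: x * 10))
```

Building a Python list first sidesteps the problem raised above, where tuples packed into a 1-d object ndarray were not recognised as MultiIndex input.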

nateyoder · 2016-10-26T23:28:23Z

Because the outputs of _arrmap were always contained in a 1-d np.ndarray, I checked for the presence of tuples instead. Let me know if you'd prefer something else.

codecov-io · 2016-10-27T04:41:13Z

Current coverage is 85.31% (diff: 100%)

Merging #14506 into master will increase coverage by <.01%

@@             master     #14506   diff @@
==========================================
  Files           144        144          
  Lines         51016      51022     +6   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43522      43528     +6   
  Misses         7494       7494          
  Partials          0          0

Powered by Codecov. Last update 3ba2cff...95e4440

jorisvandenbossche

For DatetimeIndex (and probably TimedeltaIndex and PeriodIndex as well), the return value is still an array if the result is no longer a datetime (e.g. with .map(lambda x: 1) or .map(lambda x: x.hour)).
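A quick check of the two cases mentioned here, assuming a pandas version that includes this change (results were arrays before it): both mappings produce integers, and the expectation is an Index rather than an ndarray or a DatetimeIndex.

```python
import pandas as pd

dtidx = pd.DatetimeIndex(["2012-01-01 00:00", "2012-01-01 01:00",
                          "2012-01-01 02:00"])

# Both mappings return plain integers, which are no longer datetimes.
hours = dtidx.map(lambda x: x.hour)
consts = dtidx.map(lambda x: 1)

print(type(hours).__name__, list(hours))
print(type(consts).__name__, list(consts))
```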

jorisvandenbossche · 2016-10-27T08:19:14Z

pandas/indexes/base.py

-        applied : array
+        applied : Index
+            The output of the mapping function applied to the index.
+            If the function returns a tuple a


missing some content

jorisvandenbossche · 2016-10-27T08:40:21Z

pandas/tests/indexes/test_base.py

+        expected = Index([(0, ), (1, ), (2, )])
+        self.assert_index_equal(boolean_index, expected)
+
+    def test_map_that_reduces_multi_index_to_single_index_returns_index(self):


Can you shorten the test names a bit?
(or else just include them in one single test, then you can add the long name/explanation as a comment)

jorisvandenbossche · 2016-10-27T08:50:31Z

doc/source/whatsnew/v0.20.0.txt

@@ -38,6 +38,25 @@ Other enhancements
 Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-  ``map`` on an ``Index`` now returns an ``Index``, not an array (:issue:`12766`)
+.. ipython:: python


Some sphinx syntax comments (sphinx is rather picky ..):

add a blank line above .. ipython ...

Can you also indent this line (and all lines below) with two spaces? (to be at the level of the content of the bullet point) -> so everything will be part of this bullet point

nateyoder · 2016-10-27T17:39:55Z

Great catch on the DatetimeIndex! Sorry I missed that. I'll try to update the PR today.

nateyoder · 2016-10-27T19:20:29Z

Unfortunately when making this change I ended up causing two other tests (series.test_apply.TestSeriesApply.test_apply_datetimetz and series.test_apply.TestSeriesMap.test_map_datetimetz) to fail because they became np.int64 instead of np.int32. Is this a significant failure that I should try to work around or should I change the dtype in the tests?

nateyoder · 2016-11-12T01:08:57Z

Just wanted to check whether there are any additional actions you would like taken on this PR. It appeared to me that the AppVeyor failure was not due to my changes, but I am not familiar with AppVeyor.

jreback · 2016-11-12T16:15:10Z

doc/source/whatsnew/v0.20.0.txt

@@ -38,8 +38,27 @@ Other enhancements
 Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. _whatsnew_0200.api:
+-  ``map`` on an ``Index`` now returns an ``Index``, not an array (:issue:`12766`)


make a new sub-section

jreback · 2016-11-12T16:17:23Z

pandas/tests/indexes/test_base.py

+    def test_map_identity_mapping(self):
+        # GH 12766
+        for name, cur_index in self.indices.items():
+            self.assert_index_equal(cur_index, cur_index.map(lambda x: x))


I think we should just use tm.assert_index_equal (not self.assert_index_equal): same routine, but more consistent.

Sounds good. I changed the ones related to my commit but left the other ones in test_base.py alone. Let me know if you'd like them changed as well.
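For reference, the identity-mapping test written with the module-level assertion looks roughly like this. It uses the public `pandas.testing` module of later releases (the tests in this thread import `pandas.util.testing as tm`), and the small `indices` dict is a hypothetical stand-in for the fixture `self.indices`.

```python
import pandas as pd
from pandas.testing import assert_index_equal


def test_map_identity_mapping():
    # GH 12766: mapping the identity function should round-trip each index
    indices = {
        "int": pd.Index([1, 2, 3]),
        "float": pd.Index([1.5, 2.5, 3.5]),
        "multi": pd.MultiIndex.from_tuples([(1, "a"), (2, "b")]),
    }
    for name, cur_index in indices.items():
        assert_index_equal(cur_index, cur_index.map(lambda x: x))


test_map_identity_mapping()
```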

jreback · 2016-11-12T16:18:42Z

pandas/tests/indexes/test_category.py

-        exp = pd.Categorical(list('ababc'), categories=list('cba'),
-                             ordered=True)
-        tm.assert_categorical_equal(result, exp)
+        exp = pd.CategoricalIndex(list('ababc'), categories=list('cba'),


show an example like this in the whatsnew as well (e.g. CategoricalIndex.map now returns a CategoricalIndex rather than a Categorical)

Hi @jreback. Just wanted to touch base here since I added this example and then you asked me to remove it in another comment below. Did I add it in the wrong place or did you just decide it was a little overkill? Thanks.

it's too long for the what's new; so need to pare it down

💯 sounds good. I shortened it up like you suggested. Let me know if you'd like any other changes.

jreback · 2016-11-12T16:20:15Z

pandas/tests/series/test_apply.py

@@ -124,7 +124,7 @@ def test_apply_datetimetz(self):

        # change dtype
        result = s.apply(lambda x: x.hour)
-        exp = pd.Series(list(range(24)) + [0], name='XX', dtype=np.int32)
+        exp = pd.Series(list(range(24)) + [0], name='XX', dtype=np.int64)


hmm, I don't think this should have changed, these are normally int32s

any idea? (also this might not be the same on windows)

It's true it previously gave int32, but is there any reason for that? We almost always use int64 as the default integer size, and also currently Timestamp.hour gives you back int64 when you replace the map with an explicit loop:

In [20]: dtidx = pd.date_range(start='2012-01-01', periods=4)

In [29]: dtidx.map(lambda x: x.hour)
Out[29]: array([0, 0, 0, 0], dtype=int32)

In [30]: np.array([x.hour for x in dtidx])
Out[30]: array([0, 0, 0, 0])

In [31]: np.array([x.hour for x in dtidx]).dtype
Out[31]: dtype('int64')

It's probably the result of using asobject below

In [37]: dtidx.map(lambda x: x.hour)
Out[37]: array([0, 0, 0, 0], dtype=int32)

In [36]: dtidx.asobject.map(lambda x: x.hour)
Out[36]: array([0, 0, 0, 0], dtype=int64)

this is actually an implementation detail (as they are stored as int32)

we can change it, but that should be a separate PR

ideally just like to preserve here

jreback · 2016-11-12T16:21:22Z

pandas/tseries/base.py

                raise TypeError
            return result
        except Exception:
-            return _algos.arrmap_object(self.asobject.values, f)
+            return self.asobject.map(f)


this is better, using .values can change dtypes (esp timezones)

this might actually close another issue...
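The timezone pitfall can be shown directly. This sketch uses `astype(object)`, the later spelling of what `.asobject` returned; it assumes a tz-aware index in Europe/Brussels, which is UTC+1 in January. `.values` is a naive datetime64 array in UTC, so element-wise code run on it sees UTC wall times.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2012-01-01 00:00", "2012-01-01 01:00"],
                       tz="Europe/Brussels")

# .values drops the timezone: a naive datetime64 array holding UTC instants,
# so 00:00 local becomes 23:00 on the previous day.
via_values = [pd.Timestamp(v).hour for v in idx.values]

# astype(object) keeps tz-aware Timestamps, so local hours are preserved.
via_objects = [ts.hour for ts in idx.astype(object)]

print(idx.values.dtype)
print(via_values, via_objects)
```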

jreback · 2016-11-12T16:21:54Z

pandas/tseries/tests/test_converter.py

@@ -104,7 +104,7 @@ def test_dateindex_conversion(self):
        for freq in ('B', 'L', 'S'):
            dateindex = tm.makeDateIndex(k=10, freq=freq)
            rs = self.dtc.convert(dateindex, None, None)
-            xp = converter.dates.date2num(dateindex._mpl_repr())
+            xp = Index(converter.dates.date2num(dateindex._mpl_repr()))
            tm.assert_almost_equal(rs, xp, decimals)


compare as index

jreback · 2016-11-12T16:22:37Z

pandas/tseries/tests/test_timedeltas.py

@@ -1513,8 +1513,8 @@ def test_map(self):

        f = lambda x: x.days
        result = rng.map(f)
-        exp = np.array([f(x) for x in rng], dtype=np.int64)


I might show some of these examples in the whatsnew as well (e.g. period/timedelta)
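A possible whatsnew-style illustration for the timedelta and period cases, again assuming a pandas version that includes this change: element-wise attribute access yields integers wrapped in a plain Index.

```python
import pandas as pd

tdi = pd.timedelta_range("1 day", periods=3)
days = tdi.map(lambda x: x.days)

pi = pd.period_range("2016-01", periods=3, freq="M")
months = pi.map(lambda p: p.month)

print(type(days).__name__, list(days))
print(type(months).__name__, list(months))
```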

jreback · 2016-11-12T16:25:57Z

@nateyoder sorry for the delay. this looks pretty good. couple of comments. Pls have a look at the dtype changes w.r.t. .apply(lambda x: x.hour); these should be int32.

@sinhrks thoughts

nateyoder · 2016-11-27T05:20:19Z

Hi @jreback. Sorry for my confusing message. I understand that it is the current behavior of indices to ALWAYS return int64 but if you would like to be able to return int32 rather than int64 types from map operations on Indices as you had said two weeks ago I believe that would have to change. Would you like me to try to make those changes or are you alright with map operations always returning int64 if an integer type is returned? Apologies again for the confusion.

jreback · 2016-11-27T17:02:19Z

not sure what you are referring to

jorisvandenbossche · 2016-11-28T08:37:40Z

@jreback I think the confusion is this: you ask that the return dtype should be int32 (as before, when the return type was an int32 array), but for an Index this can never be the case, as even an int32 array will result in an int64 index.

So for Index.map, the resulting dtype has to change in this PR.

The question remains of course if it is possible to keep the dtype of Series.map/apply. But, this is already a bit inconsistent at the moment:

In [45]: dtidx = pd.date_range("2012-01-01", freq='H', periods=5)

In [46]: dtidx.map(lambda x: x.hour)
Out[46]: array([0, 1, 2, 3, 4], dtype=int32)

In [47]: dtidx2 = dtidx.tz_localize("Europe/Brussels")

In [48]: dtidx2.map(lambda x: x.hour)
Out[48]: array([0, 1, 2, 3, 4], dtype=int32)

In [49]: pd.Series(dtidx).map(lambda x: x.hour)
Out[49]: 
0    0
1    1
2    2
3    3
4    4
dtype: int64

In [50]: pd.Series(dtidx2).map(lambda x: x.hour)
Out[50]: 
0    0
1    1
2    2
3    3
4    4
dtype: int32

So for a Series it gives int64 for datetimes but int32 for timezone-aware datetimes. Personally, I wouldn't mind if all those cases became int64 (for Index.map they will in any case, leaving only tz-aware Series returning int32).
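A small check of the Index versus Series dtype point on a given installation. Hedged: with the Int64Index of this era, an Index built from an int32 array comes out as int64; pandas 2.0 and later can hold int32 in an Index, so the printed Index dtype depends on the installed version. A Series keeps int32 either way.

```python
import numpy as np
import pandas as pd

arr = np.array([0, 1, 2, 3], dtype=np.int32)
idx = pd.Index(arr)    # int64 in pandas of this era; may stay int32 in 2.0+
ser = pd.Series(arr)   # keeps the int32 dtype

print("Index:", idx.dtype, "Series:", ser.dtype)
```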

jreback · 2016-11-28T10:57:46Z

@nateyoder @jorisvandenbossche response is good (I don't think it is a big deal that we are actually returning int32 from the datetime computations; we can evaluate that in another issue)

I think my original comment was about why we needed to change a test from int32 to int64; since we are now always returning an Index, which by definition has restricted dtypes, the test needs to change

so we ought to document this with an example (in the whatsnew)

nateyoder · 2016-12-06T04:09:14Z

Sounds good. I updated the whatsnew and believe I have now addressed all of the comments.

jreback · 2016-12-06T11:15:13Z

doc/source/whatsnew/v0.20.0.txt

@@ -12,7 +12,7 @@ Highlights include:

 Check the :ref:`API Changes <whatsnew_0200.api_breaking>` and :ref:`deprecations <whatsnew_0200.deprecations>` before updating.

-.. contents:: What's new in v0.19.0


this is already fixed in master, so you may need to rebase

jreback

lgtm. Just shorten up the whatsnew examples a bit. If you want, refresh the docs a bit (with some examples): you can search for where we use Index.map (I don't know if it's even shown) in indexing.rst or advanced.rst. Can do this in a follow-up PR (or here if you want).

jreback · 2016-12-06T11:15:44Z

doc/source/whatsnew/v0.20.0.txt

+Map on Index types now return other Index types
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+-  ``map`` on an ``Index`` now returns an ``Index``, not an array (:issue:`12766`)


array -> np.array

or 'numpy array'

jreback · 2016-12-06T11:16:22Z

doc/source/whatsnew/v0.20.0.txt

+
+  .. ipython:: python
+
+     mi = MultiIndex.from_tuples([(1, 2), (2, 4)])


put this where you define the Index as well. You can then show the previous/new behavior for MultiIndex together with Index

jreback · 2016-12-06T11:16:45Z

doc/source/whatsnew/v0.20.0.txt

+
+
+-  ``map`` on an ``CategoricalIndex`` now returns a ``CategoricalIndex``, not a Categorical
+


you can skip this example (the CategoricalIndex) and below

jreback · 2016-12-06T11:17:43Z

pandas/indexes/base.py

-        applied : array
+        applied : Index
+            The output of the mapping function applied to the index.
+            If the function returns a tuple with more than one element


say the output Index type will be inferred

jreback · 2016-12-06T11:20:58Z

pandas/tseries/base.py

+            if isinstance(result, np.ndarray):
+                self._shallow_copy(result)
+
+            if not isinstance(result, Index):
                raise TypeError


I think this is caught in an outer scope and so is not seen. but can you add an informative message to this TypeError
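The try-then-fallback pattern under review can be sketched as a standalone function. `map_datetimelike` is hypothetical, not the pandas source: it first calls `mapper` on the whole index, wraps an ndarray result in an Index (assigning it, unlike the quoted `self._shallow_copy(result)` line, whose return value is discarded), raises an informative TypeError otherwise, and falls back to element-wise mapping over object values.

```python
import numpy as np
import pandas as pd


def map_datetimelike(index, mapper):
    """Hypothetical sketch: vectorised attempt first, element-wise fallback."""
    try:
        result = mapper(index)
        if isinstance(result, np.ndarray):
            # Assign the wrapped result so the conversion is not lost.
            result = pd.Index(result)
        if not isinstance(result, pd.Index):
            raise TypeError(
                "mapper applied to the whole index returned {!r}; expected "
                "an ndarray or Index".format(type(result).__name__))
        return result
    except Exception:
        # As noted in review, the TypeError is swallowed here; its message
        # mainly helps when debugging. Fall back to boxed scalars.
        return index.astype(object).map(mapper)


dtidx = pd.DatetimeIndex(["2012-01-01 00:00", "2012-01-01 01:00"])
print(map_datetimelike(dtidx, lambda x: x.hour))
print(map_datetimelike(dtidx, lambda x: 1))
```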

jorisvandenbossche · 2016-12-14T13:26:17Z

@nateyoder Could you update for the latest comments of @jreback? Would be nice to get this in!


nateyoder · 2016-12-16T03:28:38Z

@jorisvandenbossche Sorry for the delay. Let me know if you notice anything else that needs to be updated.

Thanks, I've enjoyed my first contribution!

jreback · 2016-12-16T11:23:50Z

doc/source/whatsnew/v0.20.0.txt

@@ -91,8 +91,75 @@ Other enhancements
 Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. _whatsnew_0200.api:



this needs a ref tag as well
.. _whatsnew.api_breaking.index_map:

jreback · 2016-12-16T11:24:20Z

doc/source/whatsnew/v0.20.0.txt

+      mi.map(lambda x: x[0])
+
+
+-  ``map`` on a Series withe datetime64 values may return int64 dtypes rather than int32


withe -> with

jreback · 2016-12-16T11:26:42Z

lgtm. just a small doc change.

jreback · 2016-12-16T23:26:05Z

thanks!

great effort!

jorisvandenbossche · 2016-12-18T00:09:10Z

@nateyoder Thanks a lot!

closes pandas-dev#12766
closes pandas-dev#12798

This is a follow on to pandas-dev#12798.

Author: Nate Yoder <[email protected]>

Closes pandas-dev#14506 from nateyoder/index_map_index and squashes the following commits:

95e4440 [Nate Yoder] fix typo and add ref tag in whatsnew
b36e83c [Nate Yoder] update whatsnew, fix documentation
4635e6a [Nate Yoder] compare as index
a17ddab [Nate Yoder] Fix unused import and docstrings per pep8radius docformatter; change other uses of assert_index_equal to testing instead os self
ab168e7 [Nate Yoder] Update whatsnew and add git PR to tests to denote changes
504c2a2 [Nate Yoder] Fix tests that weren't run by PyCharm
23c133d [Nate Yoder] Update tests to match dtype int64
07b772a [Nate Yoder] use the numpy results if we can to avoid repeating the computation just to create the object
a110be9 [Nate Yoder] make map on time tseries indices return index if dtype of output is not a tseries; sphinx changes; fix docstring
a596744 [Nate Yoder] introspect results from map so that if the output array has tuples we create a multiindex instead of an index
5fc66c3 [Nate Yoder] make map return an index if it operates on an index, multi index, or categorical index; map on a categorical will either return a categorical or an index (rather than a numpy array)

jreback reviewed Oct 26, 2016


jreback reviewed Oct 26, 2016


jreback added Indexing Related to indexing on series/frames, not to indexes themselves API Design Dtype Conversions Unexpected or buggy dtype conversions labels Oct 26, 2016

jorisvandenbossche reviewed Oct 27, 2016


jorisvandenbossche added this to the 0.20.0 milestone Oct 27, 2016

nateyoder force-pushed the index_map_index branch from 892a7a6 to aa0120f Compare October 27, 2016 19:23

nateyoder force-pushed the index_map_index branch from aa0120f to 77ca253 Compare November 6, 2016 01:23

jrings mentioned this pull request Nov 7, 2016

API: map() on Index returns an Index, not array #12798

Closed

4 tasks

jreback reviewed Nov 12, 2016


nateyoder force-pushed the index_map_index branch from d1fb027 to 9a28a29 Compare November 22, 2016 19:38

jreback reviewed Dec 6, 2016


jreback requested changes Dec 6, 2016


nateyoder added 10 commits December 15, 2016 18:40


nateyoder force-pushed the index_map_index branch from 3e5c421 to b36e83c Compare December 16, 2016 03:27

jreback reviewed Dec 16, 2016


jreback approved these changes Dec 16, 2016


fix typo and add ref tag in whatsnew

95e4440

jreback closed this in 6f4e36a Dec 16, 2016

jorisvandenbossche mentioned this pull request Dec 31, 2016

API: let DatetimeIndex date/time components return a new Index instead of array #15022

Closed
