Commit c7bfb4e

Merge pull request #7891 from jreback/index
CLN/INT: remove Index as a sub-class of NDArray
2 parents 83ed483 + 8d3cb3f commit c7bfb4e


51 files changed: +1391 −758 lines

doc/source/api.rst

+2
@@ -1104,6 +1104,8 @@ Modifying and Computations
    Index.order
    Index.reindex
    Index.repeat
+   Index.take
+   Index.putmask
    Index.set_names
    Index.unique
    Index.nunique
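The hunk above adds ``Index.take`` and ``Index.putmask`` to the API listing. A minimal sketch of what the two methods do, assuming a reasonably recent pandas (values here are illustrative only):

```python
import pandas as pd

idx = pd.Index([10, 20, 30, 40])

# take: positional selection, mirroring np.ndarray.take
assert list(idx.take([0, 2])) == [10, 30]

# putmask: return a new Index with values replaced where the mask is True
masked = idx.putmask(idx > 20, 0)
assert list(masked) == [10, 20, 0, 0]
```

Note that ``putmask`` on an ``Index`` returns a new object rather than mutating in place, since ``Index`` is immutable.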

doc/source/indexing.rst

+7-1
@@ -52,6 +52,12 @@ indexing.
 should be avoided. See :ref:`Returning a View versus Copy
 <indexing.view_versus_copy>`

+.. warning::
+
+   In 0.15.0 ``Index`` has internally been refactored to no longer sub-class ``ndarray``
+   but instead subclass ``PandasObject``, similarly to the rest of the pandas objects. This should be
+   a transparent change with only very limited API implications (see the :ref:`Internal Refactoring <whatsnew_0150.refactoring>`)
+
 See the :ref:`cookbook<cookbook.selection>` for some advanced strategies

 Different Choices for Indexing (``loc``, ``iloc``, and ``ix``)
@@ -2175,7 +2181,7 @@ you can specify ``inplace=True`` to have the data change in place.

 .. versionadded:: 0.15.0

-``set_names``, ``set_levels``, and ``set_labels`` also take an optional
+``set_names``, ``set_levels``, and ``set_labels`` also take an optional
 ``level`` argument

 .. ipython:: python
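The second hunk documents the optional ``level`` argument on ``set_names``/``set_levels``/``set_labels``. A short sketch of the ``level`` argument on ``set_names`` (the ``MultiIndex`` values here are made up for illustration):

```python
import pandas as pd

mi = pd.MultiIndex.from_product([['a', 'b'], [1, 2]],
                                names=['letter', 'number'])

# rename only level 0; the other level's name is untouched
mi2 = mi.set_names('char', level=0)
assert list(mi2.names) == ['char', 'number']

# the original index is unchanged (a new object is returned)
assert list(mi.names) == ['letter', 'number']
```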

doc/source/v0.15.0.txt

+21-2
@@ -10,6 +10,7 @@ users upgrade to this version.
 - Highlights include:

   - The ``Categorical`` type was integrated as a first-class pandas type, see :ref:`here <whatsnew_0150.cat>`
+  - Internal refactoring of the ``Index`` class to no longer sub-class ``ndarray``, see :ref:`Internal Refactoring <whatsnew_0150.refactoring>`

 - :ref:`Other Enhancements <whatsnew_0150.enhancements>`

@@ -25,6 +26,12 @@ users upgrade to this version.

 - :ref:`Bug Fixes <whatsnew_0150.bug_fixes>`

+.. warning::
+
+   In 0.15.0 ``Index`` has internally been refactored to no longer sub-class ``ndarray``
+   but instead subclass ``PandasObject``, similarly to the rest of the pandas objects. This change allows very easy sub-classing and creation of new index types. This should be
+   a transparent change with only very limited API implications (see the :ref:`Internal Refactoring <whatsnew_0150.refactoring>`)
+
 .. _whatsnew_0150.api:

 API changes
@@ -155,6 +162,18 @@ previously results in ``Exception`` or ``TypeError`` (:issue:`7812`)
   didx
   didx.tz_localize(None)

+.. _whatsnew_0150.refactoring:
+
+Internal Refactoring
+~~~~~~~~~~~~~~~~~~~~
+
+In 0.15.0 ``Index`` has internally been refactored to no longer sub-class ``ndarray``
+but instead subclass ``PandasObject``, similarly to the rest of the pandas objects. This change allows very easy sub-classing and creation of new index types. This should be
+a transparent change with only very limited API implications (:issue:`5080`, :issue:`7439`, :issue:`7796`)
+
+- you may need to unpickle pandas version < 0.15.0 pickles using ``pd.read_pickle`` rather than ``pickle.load``. See :ref:`pickle docs <io.pickle>`
+- when plotting with a ``PeriodIndex``, the ``matplotlib`` internal axes will now be arrays of ``Period`` rather than a ``PeriodIndex`` (this is similar to how a ``DatetimeIndex`` passes arrays of ``datetimes`` now)
+
 .. _whatsnew_0150.cat:

 Categoricals in Series/DataFrame
@@ -278,7 +297,7 @@ Performance
 ~~~~~~~~~~~

 - Performance improvements in ``DatetimeIndex.__iter__`` to allow faster iteration (:issue:`7683`)
-
+- Performance improvements in ``Period`` creation (and ``PeriodIndex`` setitem) (:issue:`5155`)


@@ -386,7 +405,7 @@ Bug Fixes
 - Bug in ``GroupBy.filter()`` where fast path vs. slow path made the filter
   return a non scalar value that appeared valid but wasn't (:issue:`7870`).
 - Bug in ``date_range()``/``DatetimeIndex()`` when the timezone was inferred from input dates yet incorrect
-  times were returned when crossing DST boundaries (:issue:`7835`, :issue:`7901`).
+  times were returned when crossing DST boundaries (:issue:`7835`, :issue:`7901`).

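The whatsnew entry above notes that pickles written before 0.15.0 need ``pd.read_pickle`` rather than ``pickle.load``. A small sketch showing the post-refactor behavior in a current pandas: ``Index`` is no longer an ``ndarray`` subclass, yet same-version pickles still round-trip transparently (the cross-version case can't be demonstrated without an old pickle file, so it is not shown here):

```python
import pickle

import numpy as np
import pandas as pd

idx = pd.Index([1, 2, 3], name='x')

# the refactoring: Index is a PandasObject, not an ndarray subclass
assert not isinstance(idx, np.ndarray)

# same-version pickling remains transparent
restored = pickle.loads(pickle.dumps(idx))
assert restored.equals(idx)
assert restored.name == 'x'
```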

pandas/compat/pickle_compat.py

+45-40
@@ -5,29 +5,32 @@
 import pandas
 import copy
 import pickle as pkl
-from pandas import compat
+from pandas import compat, Index
 from pandas.compat import u, string_types
-from pandas.core.series import Series, TimeSeries
-from pandas.sparse.series import SparseSeries, SparseTimeSeries
-

 def load_reduce(self):
     stack = self.stack
     args = stack.pop()
     func = stack[-1]
+
     if type(args[0]) is type:
         n = args[0].__name__
-        if n == u('DeprecatedSeries') or n == u('DeprecatedTimeSeries'):
-            stack[-1] = object.__new__(Series)
-            return
-        elif (n == u('DeprecatedSparseSeries') or
-              n == u('DeprecatedSparseTimeSeries')):
-            stack[-1] = object.__new__(SparseSeries)
-            return

     try:
-        value = func(*args)
-    except:
+        stack[-1] = func(*args)
+        return
+    except Exception as e:
+
+        # if we have a deprecated function
+        # try to replace and try again
+
+        if '_reconstruct: First argument must be a sub-type of ndarray' in str(e):
+            try:
+                cls = args[0]
+                stack[-1] = object.__new__(cls)
+                return
+            except:
+                pass

     # try to reencode the arguments
     if getattr(self,'encoding',None) is not None:
@@ -57,6 +60,35 @@ class Unpickler(pkl.Unpickler):
 Unpickler.dispatch = copy.copy(Unpickler.dispatch)
 Unpickler.dispatch[pkl.REDUCE[0]] = load_reduce

+def load_newobj(self):
+    args = self.stack.pop()
+    cls = self.stack[-1]
+
+    # compat
+    if issubclass(cls, Index):
+        obj = object.__new__(cls)
+    else:
+        obj = cls.__new__(cls, *args)
+
+    self.stack[-1] = obj
+Unpickler.dispatch[pkl.NEWOBJ[0]] = load_newobj
+
+# py3 compat
+def load_newobj_ex(self):
+    kwargs = self.stack.pop()
+    args = self.stack.pop()
+    cls = self.stack.pop()
+
+    # compat
+    if issubclass(cls, Index):
+        obj = object.__new__(cls)
+    else:
+        obj = cls.__new__(cls, *args, **kwargs)
+    self.append(obj)
+try:
+    Unpickler.dispatch[pkl.NEWOBJ_EX[0]] = load_newobj_ex
+except:
+    pass

 def load(fh, encoding=None, compat=False, is_verbose=False):
     """load a pickle, with a provided encoding
@@ -74,11 +106,6 @@ def load(fh, encoding=None, compat=False, is_verbose=False):
     """

     try:
-        if compat:
-            pandas.core.series.Series = DeprecatedSeries
-            pandas.core.series.TimeSeries = DeprecatedTimeSeries
-            pandas.sparse.series.SparseSeries = DeprecatedSparseSeries
-            pandas.sparse.series.SparseTimeSeries = DeprecatedSparseTimeSeries
         fh.seek(0)
         if encoding is not None:
             up = Unpickler(fh, encoding=encoding)
@@ -89,25 +116,3 @@ def load(fh, encoding=None, compat=False, is_verbose=False):
         return up.load()
     except:
         raise
-    finally:
-        if compat:
-            pandas.core.series.Series = Series
-            pandas.core.series.Series = TimeSeries
-            pandas.sparse.series.SparseSeries = SparseSeries
-            pandas.sparse.series.SparseTimeSeries = SparseTimeSeries
-
-
-class DeprecatedSeries(np.ndarray, Series):
-    pass
-
-
-class DeprecatedTimeSeries(DeprecatedSeries):
-    pass
-
-
-class DeprecatedSparseSeries(DeprecatedSeries):
-    pass
-
-
-class DeprecatedSparseTimeSeries(DeprecatedSparseSeries):
-    pass
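The new ``load_newobj``/``load_newobj_ex`` handlers above sidestep ``cls.__new__`` for ``Index`` subclasses (whose base classes changed) and allocate the instance directly with ``object.__new__``, letting the rest of the unpickling machinery restore state. The same trick in miniature, using a hypothetical stand-in class:

```python
class Legacy:
    """Hypothetical stand-in for a class whose base classes changed."""
    def __init__(self, data):
        self.data = data

# allocate the instance without calling __new__/__init__, just as
# load_newobj does for Index subclasses during unpickling
obj = object.__new__(Legacy)

# state is then restored separately, as pickle's BUILD step would do
obj.data = [1, 2, 3]

assert isinstance(obj, Legacy)
assert obj.data == [1, 2, 3]
```

This is why the shim is transparent to callers: the reconstructed object is a fully functional instance of the new class, regardless of what the old class hierarchy looked like when the pickle was written.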

pandas/core/base.py

+131-4
@@ -8,7 +8,7 @@
 from pandas.core import common as com
 import pandas.core.nanops as nanops
 import pandas.tslib as tslib
-from pandas.util.decorators import cache_readonly
+from pandas.util.decorators import Appender, cache_readonly

 class StringMixin(object):

@@ -205,6 +205,19 @@ def __unicode__(self):
                           quote_strings=True)
         return "%s(%s, dtype='%s')" % (type(self).__name__, prepr, self.dtype)

+def _unbox(func):
+    @Appender(func.__doc__)
+    def f(self, *args, **kwargs):
+        result = func(self.values, *args, **kwargs)
+        from pandas.core.index import Index
+        if isinstance(result, (np.ndarray, com.ABCSeries, Index)) and result.ndim == 0:
+            # return NumPy type
+            return result.dtype.type(result.item())
+        else:  # pragma: no cover
+            return result
+    f.__name__ = func.__name__
+    return f
+
 class IndexOpsMixin(object):
     """ common ops mixin to support a unified inteface / docs for Series / Index """

@@ -238,6 +251,64 @@ def _wrap_access_object(self, obj):

         return obj

+    # ndarray compatibility
+    __array_priority__ = 1000
+
+    def transpose(self):
+        """ return the transpose, which is by definition self """
+        return self
+
+    T = property(transpose, doc="return the transpose, which is by definition self")
+
+    @property
+    def shape(self):
+        """ return a tuple of the shape of the underlying data """
+        return self._data.shape
+
+    @property
+    def ndim(self):
+        """ return the number of dimensions of the underlying data, by definition 1 """
+        return 1
+
+    def item(self):
+        """ return the first element of the underlying data as a python scalar """
+        return self.values.item()
+
+    @property
+    def data(self):
+        """ return the data pointer of the underlying data """
+        return self.values.data
+
+    @property
+    def itemsize(self):
+        """ return the size of the dtype of the item of the underlying data """
+        return self.values.itemsize
+
+    @property
+    def nbytes(self):
+        """ return the number of bytes in the underlying data """
+        return self.values.nbytes
+
+    @property
+    def strides(self):
+        """ return the strides of the underlying data """
+        return self.values.strides
+
+    @property
+    def size(self):
+        """ return the number of elements in the underlying data """
+        return self.values.size
+
+    @property
+    def flags(self):
+        """ return the ndarray.flags for the underlying data """
+        return self.values.flags
+
+    @property
+    def base(self):
+        """ return the base object if the memory of the underlying data is shared """
+        return self.values.base
+
     def max(self):
         """ The maximum value of the object """
         return nanops.nanmax(self.values)
@@ -340,6 +411,20 @@ def factorize(self, sort=False, na_sentinel=-1):
         from pandas.core.algorithms import factorize
         return factorize(self, sort=sort, na_sentinel=na_sentinel)

+    def searchsorted(self, key, side='left'):
+        """ np.ndarray searchsorted compat """
+
+        ### FIXME in GH7447
+        #### needs coercion on the key (DatetimeIndex does already)
+        #### needs tests/doc-string
+        return self.values.searchsorted(key, side=side)
+
+    #----------------------------------------------------------------------
+    # unbox reductions
+
+    all = _unbox(np.ndarray.all)
+    any = _unbox(np.ndarray.any)
+
 # facilitate the properties on the wrapped ops
 def _field_accessor(name, docstring=None):
     op_accessor = '_{0}'.format(name)
@@ -431,13 +516,17 @@ def asobject(self):

     def tolist(self):
         """
-        See ndarray.tolist
+        return a list of the underlying data
         """
         return list(self.asobject)

     def min(self, axis=None):
         """
-        Overridden ndarray.min to return an object
+        return the minimum value of the Index
+
+        See also
+        --------
+        numpy.ndarray.min
         """
         try:
             i8 = self.asi8
@@ -456,9 +545,30 @@ def min(self, axis=None):
         except ValueError:
             return self._na_value

+    def argmin(self, axis=None):
+        """
+        return a ndarray of the minimum argument indexer
+
+        See also
+        --------
+        numpy.ndarray.argmin
+        """
+
+        ##### FIXME: need some tests (what to do if all NaT?)
+        i8 = self.asi8
+        if self.hasnans:
+            mask = i8 == tslib.iNaT
+            i8 = i8.copy()
+            i8[mask] = np.iinfo('int64').max
+        return i8.argmin()
+
     def max(self, axis=None):
         """
-        Overridden ndarray.max to return an object
+        return the maximum value of the Index
+
+        See also
+        --------
+        numpy.ndarray.max
         """
         try:
             i8 = self.asi8
@@ -477,6 +587,23 @@ def max(self, axis=None):
         except ValueError:
             return self._na_value

+    def argmax(self, axis=None):
+        """
+        return a ndarray of the maximum argument indexer
+
+        See also
+        --------
+        numpy.ndarray.argmax
+        """
+
+        #### FIXME: need some tests (what to do if all NaT?)
+        i8 = self.asi8
+        if self.hasnans:
+            mask = i8 == tslib.iNaT
+            i8 = i8.copy()
+            i8[mask] = 0
+        return i8.argmax()
+
     @property
     def _formatter_func(self):
         """
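The ``argmin``/``argmax`` additions above mask out ``NaT`` before reducing: pandas stores ``NaT`` as the smallest ``int64``, so an unmasked ``argmin`` would always land on a missing value. The same trick in miniature with plain NumPy (sample values are illustrative only):

```python
import numpy as np

iNaT = np.iinfo('int64').min          # pandas' NaT sentinel (tslib.iNaT)
i8 = np.array([30, iNaT, 10, 20], dtype='int64')
mask = i8 == iNaT

# for argmin: push NaT to int64 max so it can never win
masked_min = i8.copy()
masked_min[mask] = np.iinfo('int64').max
assert masked_min.argmin() == 2       # position of 10, NaT ignored

# for argmax: push NaT to 0 (as argmax above does)
masked_max = i8.copy()
masked_max[mask] = 0
assert masked_max.argmax() == 0       # position of 30
```

Note the ``i8.copy()`` in both the patch and the sketch: the masking must not mutate the index's underlying ``int64`` buffer.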
