
Fix irregular Timestamp arithmetic types #6543 #6544

Closed
wants to merge 9 commits
17 changes: 16 additions & 1 deletion doc/source/release.rst
@@ -107,6 +107,10 @@ API Changes
or numbering columns as needed (:issue:`2385`)
- Slicing and advanced/boolean indexing operations on ``Index`` classes will no
longer change type of the resulting index (:issue:`6440`).
- ``set_index`` no longer converts MultiIndexes to an Index of tuples (:issue:`6459`).
- Slicing with negative start, stop & step values handles corner cases better (:issue:`6531`):
- ``df.iloc[:-len(df)]`` is now empty
- ``df.iloc[len(df)::-1]`` now enumerates all elements in reverse
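The two corner cases above can be sketched against a small frame (assuming current pandas behavior):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['a', 'b'])

# a stop of exactly -len(df) now degenerates cleanly to an empty frame
empty = df.iloc[:-len(df)]

# a start beyond the last position is clamped, so a negative step
# walks the whole frame in reverse
rev = df.iloc[len(df)::-1]
```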

Experimental Features
~~~~~~~~~~~~~~~~~~~~~
@@ -139,6 +143,7 @@ Improvements to existing features
Bug Fixes
~~~~~~~~~

- Bug in ``Series`` construction raising ``ValueError`` when the index doesn't match the data (:issue:`6532`)
- Bug in ``pd.DataFrame.sort_index`` where mergesort wasn't stable when ``ascending=False`` (:issue:`6399`)
- Bug in ``pd.tseries.frequencies.to_offset`` when argument has leading zeroes (:issue:`6391`)
- Bug in version string generation for dev versions with shallow clones / install from tarball (:issue:`6127`)
@@ -180,7 +185,7 @@ Bug Fixes
- Bug in :meth:`DataFrame.replace` where nested dicts were erroneously
depending on the order of dictionary keys and values (:issue:`5338`).
- Perf issue in concatting with empty objects (:issue:`3259`)
- Clarify sorting of ``sym_diff`` on ``Index``es with ``NaN``s (:isssue:`6444`)
- Clarify sorting of ``sym_diff`` on ``Index``es with ``NaN``s (:issue:`6444`)
- Regression in ``MultiIndex.from_product`` with a ``DatetimeIndex`` as input (:issue:`6439`)
- Bug in ``str.extract`` when passed a non-default index (:issue:`6348`)
- Bug in ``str.split`` when passed ``pat=None`` and ``n=1`` (:issue:`6466`)
@@ -194,6 +199,16 @@ Bug Fixes
- Bug in ``read_html`` tests where redirected invalid URLs would make one test
fail (:issue:`6445`).
- Bug in multi-axis indexing using ``.loc`` on non-unique indices (:issue:`6504`)
- Bug that caused ``_ref_locs`` corruption when slice indexing across the columns axis of a DataFrame (:issue:`6525`)
- Regression from 0.13 in the treatment of numpy ``datetime64`` non-ns dtypes in Series creation (:issue:`6529`)
- ``.names`` attribute of MultiIndexes passed to ``set_index`` are now preserved (:issue:`6459`).
- Bug in setitem with a duplicate index and an alignable rhs (:issue:`6541`)
- Bug in setitem with loc on mixed integer Indexes (:issue:`6546`)
- Bug in ``pd.read_stata`` which would use the wrong data types and missing values (:issue:`6327`)
- Bug in ``DataFrame.to_stata`` that led to data loss in certain cases, and could export data using the
  wrong data types and missing values (:issue:`6335`)
- Inconsistent types in Timestamp addition/subtraction (:issue:`6543`)
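The Timestamp fix (the subject of this PR) is about arithmetic handing back consistent pandas types. A minimal check, assuming current pandas semantics:

```python
import datetime
import pandas as pd

ts = pd.Timestamp('2014-03-05')

# adding/subtracting an offset should hand back a Timestamp,
# not a bare datetime.datetime
assert isinstance(ts + datetime.timedelta(days=1), pd.Timestamp)
assert isinstance(ts - datetime.timedelta(days=1), pd.Timestamp)
```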


pandas 0.13.1
-------------
46 changes: 46 additions & 0 deletions doc/source/v0.14.0.txt
@@ -92,6 +92,49 @@ These are out-of-bounds selections
.. ipython:: python

i[[0,1,2]].astype(np.int_)
- ``set_index`` no longer converts MultiIndexes to an Index of tuples. For example,
the old behavior returned an Index in this case (:issue:`6459`):

.. ipython:: python
:suppress:

from itertools import product
tuples = list(product(('a', 'b'), ('c', 'd')))
mi = MultiIndex.from_tuples(tuples)
df_multi = DataFrame(np.random.randn(4, 2), index=mi)
tuple_ind = pd.Index(tuples)

.. ipython:: python

df_multi.index

@suppress
df_multi.index = tuple_ind

# Old behavior: cast MultiIndex to an Index
df_multi.set_index(df_multi.index)

@suppress
df_multi.index = mi

# New behavior
df_multi.set_index(df_multi.index)

This also applies when passing multiple indices to ``set_index``:

.. ipython:: python

@suppress
df_multi.index = tuple_ind

# Old output, 2-level MultiIndex of tuples
df_multi.set_index([df_multi.index, df_multi.index])

@suppress
df_multi.index = mi

# New output, 4-level MultiIndex
df_multi.set_index([df_multi.index, df_multi.index])
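The new behavior above can also be verified programmatically; a minimal sketch against a current pandas install:

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_tuples([('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')])
df = pd.DataFrame(np.random.randn(4, 2), index=mi)

# set_index with a MultiIndex now preserves every level instead of
# collapsing to a flat Index of tuples
result = df.set_index(df.index)
assert result.index.nlevels == 2

# passing the MultiIndex twice yields a 4-level MultiIndex
stacked = df.set_index([df.index, df.index])
assert stacked.index.nlevels == 4
```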


MultiIndexing Using Slicers
@@ -248,6 +291,9 @@ Enhancements
using ``DataFrame.to_csv`` (:issue:`5414`, :issue:`4528`)
- Added a ``to_julian_date`` function to ``Timestamp`` and ``DatetimeIndex``
to convert to the Julian Date used primarily in astronomy. (:issue:`4041`)
- ``DataFrame.to_stata`` will now check data for compatibility with Stata data types
and will upcast when needed. When it isn't possible to losslessly upcast, a warning
is raised (:issue:`6327`)
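The upcast rule can be sketched roughly as follows; `stata_safe_dtype` is a hypothetical helper for illustration, not the actual implementation. The Stata ``.dta`` format has no 64-bit integer type, so ``int64`` columns must either fit in ``int32`` or be upcast to ``float64`` (potentially lossy above 2**53, hence the warning):

```python
import numpy as np

def stata_safe_dtype(values):
    # hypothetical sketch of the compatibility check in to_stata
    if values.dtype == np.int64:
        # downcast if the data fits in Stata's widest integer type
        if (values.min() >= np.iinfo(np.int32).min
                and values.max() <= np.iinfo(np.int32).max):
            return np.int32
        # otherwise upcast to float64, which may lose precision
        return np.float64
    return values.dtype
```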

Performance
~~~~~~~~~~~
9 changes: 3 additions & 6 deletions pandas/core/common.py
@@ -124,7 +124,7 @@ def isnull(obj):

See also
--------
pandas.notnull: boolean inverse of pandas.isnull
pandas.notnull: boolean inverse of pandas.isnull
"""
return _isnull(obj)

@@ -272,7 +272,7 @@ def notnull(obj):
isnulled : array-like of bool or bool
Array or bool indicating whether an object is *not* null or if an array
is given which of the element is *not* null.

See also
--------
pandas.isnull : boolean inverse of pandas.notnull
@@ -1727,10 +1727,7 @@ def _possibly_cast_to_datetime(value, dtype, coerce=False):
dtype = value.dtype

if dtype.kind == 'M' and dtype != _NS_DTYPE:
try:
value = tslib.array_to_datetime(value)
except:
raise
value = value.astype(_NS_DTYPE)

elif dtype.kind == 'm' and dtype != _TD_DTYPE:
from pandas.tseries.timedeltas import \
15 changes: 9 additions & 6 deletions pandas/core/frame.py
@@ -1867,11 +1867,6 @@ def eval(self, expr, **kwargs):
kwargs['resolvers'] = kwargs.get('resolvers', ()) + resolvers
return _eval(expr, **kwargs)

def _slice(self, slobj, axis=0, raise_on_error=False, typ=None):
axis = self._get_block_manager_axis(axis)
new_data = self._data.get_slice(
slobj, axis=axis, raise_on_error=raise_on_error)
return self._constructor(new_data)

def _box_item_values(self, key, values):
items = self.columns[self.columns.get_loc(key)]
@@ -2240,7 +2235,15 @@ def set_index(self, keys, drop=True, append=False, inplace=False,

to_remove = []
for col in keys:
if isinstance(col, Series):
if isinstance(col, MultiIndex):
# append all but the last column so we don't have to modify
# the end of this loop
for n in range(col.nlevels - 1):
arrays.append(col.get_level_values(n))

level = col.get_level_values(col.nlevels - 1)
names.extend(col.names)
elif isinstance(col, (Series, Index)):
level = col.values
names.append(col.name)
elif isinstance(col, (list, np.ndarray)):
10 changes: 10 additions & 0 deletions pandas/core/generic.py
@@ -1079,6 +1079,16 @@ def _clear_item_cache(self, i=None):
else:
self._item_cache.clear()

def _slice(self, slobj, axis=0, typ=None):
"""
Construct a slice of this container.

typ parameter is maintained for compatibility with Series slicing.

"""
axis = self._get_block_manager_axis(axis)
return self._constructor(self._data.get_slice(slobj, axis=axis))

def _set_item(self, key, value):
self._data.set(key, value)
self._clear_item_cache()
32 changes: 30 additions & 2 deletions pandas/core/index.py
@@ -555,6 +555,29 @@ def _convert_list_indexer(self, key, typ=None):
""" convert a list indexer. these should be locations """
return key

def _convert_list_indexer_for_mixed(self, keyarr, typ=None):
""" passed a key that is tuplesafe that is integer based
and we have a mixed index (e.g. number/labels). figure out
the indexer. return None if we can't help
"""
if com.is_integer_dtype(keyarr) and not self.is_floating():
if self.inferred_type != 'integer':
keyarr = np.where(keyarr < 0,
len(self) + keyarr, keyarr)

if self.inferred_type == 'mixed-integer':
indexer = self.get_indexer(keyarr)
if (indexer >= 0).all():
return indexer

from pandas.core.indexing import _maybe_convert_indices
return _maybe_convert_indices(indexer, len(self))

elif not self.inferred_type == 'integer':
return keyarr

return None
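The negative-position wrap-around in the helper above boils down to a single `np.where`; for illustration:

```python
import numpy as np

keyarr = np.array([0, -1, 2])
n = 5  # length of the index

# map negative positions to their positive equivalents,
# exactly as the helper does before building the indexer
wrapped = np.where(keyarr < 0, n + keyarr, keyarr)
# wrapped is [0, 4, 2]
```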

def _convert_indexer_error(self, key, msg=None):
if msg is None:
msg = 'label'
@@ -987,8 +1010,13 @@ def intersection(self, other):
except TypeError:
pass

indexer = self.get_indexer(other.values)
indexer = indexer.take((indexer != -1).nonzero()[0])
try:
indexer = self.get_indexer(other.values)
indexer = indexer.take((indexer != -1).nonzero()[0])
except:
# duplicates
indexer = self.get_indexer_non_unique(other.values)[0].unique()

return self.take(indexer)

def diff(self, other):
73 changes: 13 additions & 60 deletions pandas/core/indexing.py
Original file line number Diff line number Diff line change
@@ -91,32 +91,8 @@ def _get_label(self, label, axis=0):
def _get_loc(self, key, axis=0):
return self.obj._ixs(key, axis=axis)

def _slice(self, obj, axis=0, raise_on_error=False, typ=None):

# make out-of-bounds into bounds of the object
if typ == 'iloc':
ax = self.obj._get_axis(axis)
l = len(ax)
start = obj.start
stop = obj.stop
step = obj.step
if start is not None:
# degenerate to return nothing
if start >= l:
return self._getitem_axis(tuple(),axis=axis)

# equiv to a null slice
elif start <= -l:
start = None
if stop is not None:
if stop > l:
stop = None
elif stop <= -l:
stop = None
obj = slice(start,stop,step)

return self.obj._slice(obj, axis=axis, raise_on_error=raise_on_error,
typ=typ)
def _slice(self, obj, axis=0, typ=None):
return self.obj._slice(obj, axis=axis, typ=typ)

def __setitem__(self, key, value):

@@ -441,7 +417,9 @@ def can_do_equal_len():
# align to
if item in value:
v = value[item]
v = v.reindex(self.obj[item].index & v.index)
i = self.obj[item].index
v = v.reindex(i & v.index)

setter(item, v.values)
else:
setter(item, np.nan)
@@ -909,20 +887,10 @@ def _reindex(keys, level=None):
# asarray can be unsafe, NumPy strings are weird
keyarr = _asarray_tuplesafe(key)

if is_integer_dtype(keyarr) and not labels.is_floating():
if labels.inferred_type != 'integer':
keyarr = np.where(keyarr < 0,
len(labels) + keyarr, keyarr)

if labels.inferred_type == 'mixed-integer':
indexer = labels.get_indexer(keyarr)
if (indexer >= 0).all():
self.obj.take(indexer, axis=axis, convert=True)
else:
return self.obj.take(keyarr, axis=axis)
elif not labels.inferred_type == 'integer':

return self.obj.take(keyarr, axis=axis)
# handle a mixed integer scenario
indexer = labels._convert_list_indexer_for_mixed(keyarr, typ=self.name)
if indexer is not None:
return self.obj.take(indexer, axis=axis)

# this is not the most robust, but...
if (isinstance(labels, MultiIndex) and
@@ -1062,11 +1030,9 @@ def _convert_to_indexer(self, obj, axis=0, is_setter=False):
objarr = _asarray_tuplesafe(obj)

# If have integer labels, defer to label-based indexing
if is_integer_dtype(objarr) and not is_int_index:
if labels.inferred_type != 'integer':
objarr = np.where(objarr < 0,
len(labels) + objarr, objarr)
return objarr
indexer = labels._convert_list_indexer_for_mixed(objarr, typ=self.name)
if indexer is not None:
return indexer

# this is not the most robust, but...
if (isinstance(labels, MultiIndex) and
@@ -1353,8 +1319,7 @@ def _get_slice_axis(self, slice_obj, axis=0):
return obj

if isinstance(slice_obj, slice):
return self._slice(slice_obj, axis=axis, raise_on_error=True,
typ='iloc')
return self._slice(slice_obj, axis=axis, typ='iloc')
else:
return self.obj.take(slice_obj, axis=axis, convert=False)

@@ -1657,18 +1622,6 @@ def _need_slice(obj):
(obj.step is not None and obj.step != 1))


def _check_slice_bounds(slobj, values):
l = len(values)
start = slobj.start
if start is not None:
if start < -l or start > l - 1:
raise IndexError("out-of-bounds on slice (start)")
stop = slobj.stop
if stop is not None:
if stop < -l - 1 or stop > l:
raise IndexError("out-of-bounds on slice (end)")


def _maybe_droplevels(index, key):
# drop levels
original_index = index