
BUG: concat/append misc fixes #13660


Merged
merged 1 commit into from
Sep 3, 2016
7 changes: 7 additions & 0 deletions doc/source/whatsnew/v0.19.0.txt
@@ -1356,6 +1356,13 @@ Bug Fixes
- Bug in ``DatetimeIndex``, which did not honour the ``copy=True`` (:issue:`13205`)
- Bug in ``DatetimeIndex.is_normalized`` returns incorrectly for normalized date_range in case of local timezones (:issue:`13459`)

- Bug in ``pd.concat`` and ``.append`` which may coerce ``datetime64`` and ``timedelta`` to ``object`` dtype containing python built-in ``datetime`` or ``timedelta`` rather than ``Timestamp`` or ``Timedelta`` (:issue:`13626`)
- Bug in ``PeriodIndex.append`` which may raise ``AttributeError`` when the result is of ``object`` dtype (:issue:`13221`)
- Bug in ``CategoricalIndex.append`` which incorrectly accepted a normal ``list`` (:issue:`13626`)
- Bug in ``pd.concat`` and ``.append`` where values with the same timezone were reset to UTC (:issue:`7795`)
- Bug in ``Series`` and ``DataFrame`` ``.append`` raising ``AmbiguousTimeError`` if the data contains a datetime near a DST boundary (:issue:`13626`)


- Bug in ``DataFrame.to_csv()`` in which float values were being quoted even though quotations were specified for non-numeric values only (:issue:`12922`, :issue:`13259`)
- Bug in ``DataFrame.describe()`` raising ``ValueError`` with only boolean columns (:issue:`13898`)
- Bug in ``MultiIndex`` slicing where extra elements were returned when level is non-unique (:issue:`12896`)
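A minimal sketch of the dtype-preservation fixes listed above (post-0.19 behavior; run against a current pandas to verify):

```python
import pandas as pd

# concat of datetime64 Series no longer coerces to object dtype
a1 = pd.Series(pd.to_datetime(["2011-01-01", "2011-01-02"]))
b1 = pd.Series(pd.to_datetime(["2011-01-03"]))
out1 = pd.concat([a1, b1], ignore_index=True)
print(out1.dtype)  # datetime64[ns]

# concat of Series sharing a timezone is no longer reset to UTC
a2 = pd.Series(pd.date_range("2011-01-01", periods=2, tz="US/Eastern"))
b2 = pd.Series(pd.date_range("2011-01-03", periods=1, tz="US/Eastern"))
out2 = pd.concat([a2, b2], ignore_index=True)
print(out2.dtype)  # datetime64[ns, US/Eastern]
```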
12 changes: 9 additions & 3 deletions pandas/core/frame.py
@@ -4384,14 +4384,20 @@ def append(self, other, ignore_index=False, verify_integrity=False):
raise TypeError('Can only append a Series if ignore_index=True'
' or if the Series has a name')

index = None if other.name is None else [other.name]
if other.name is None:
index = None
else:
# other must have the same index name as self, otherwise
# index name will be reset
index = Index([other.name], name=self.index.name)

combined_columns = self.columns.tolist() + self.columns.union(
other.index).difference(self.columns).tolist()
other = other.reindex(combined_columns, copy=False)
other = DataFrame(other.values.reshape((1, len(other))),
index=index, columns=combined_columns)
index=index,
columns=combined_columns)
other = other._convert(datetime=True, timedelta=True)

if not self.columns.equals(combined_columns):
self = self.reindex(columns=combined_columns)
elif isinstance(other, list) and not isinstance(other[0], DataFrame):
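The intent of the hunk above is that appending a named Series keeps the frame's index name instead of resetting it. A hedged sketch of the equivalent logic, written with `pd.concat` so it runs on current pandas (where `DataFrame.append` has since been removed):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["x", "y"], name="idx"))
s = pd.Series({"a": 3}, name="z")

# mirror the patch: give the one-row frame an index carrying the
# caller's index name, so the combined index name is not reset to None
row = pd.DataFrame([s.values], columns=s.index,
                   index=pd.Index([s.name], name=df.index.name))
out = pd.concat([df, row])
print(out.index.name)  # idx
```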
6 changes: 4 additions & 2 deletions pandas/core/series.py
@@ -289,16 +289,18 @@ def _set_axis(self, axis, labels, fastpath=False):

is_all_dates = labels.is_all_dates
if is_all_dates:

if not isinstance(labels,
(DatetimeIndex, PeriodIndex, TimedeltaIndex)):
try:
labels = DatetimeIndex(labels)
# need to set here because we changed the index
if fastpath:
self._data.set_axis(axis, labels)
except tslib.OutOfBoundsDatetime:
except (tslib.OutOfBoundsDatetime, ValueError):
# labels may exceed datetime bounds,
# or not be a DatetimeIndex
pass

self._set_subtyp(is_all_dates)

object.__setattr__(self, '_index', labels)
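The broadened except clause above matters when labels all look date-like but cannot form a `DatetimeIndex`; a small sketch of the observable effect (assuming current pandas semantics):

```python
import pandas as pd

# Timestamp and Period labels are both "dates", but DatetimeIndex(labels)
# cannot hold them together; with the broadened except the index
# simply stays object dtype instead of erroring out
s = pd.Series([1, 2],
              index=[pd.Timestamp("2011-01-01"),
                     pd.Period("2011-02", freq="M")])
print(s.index.dtype)  # object
```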
75 changes: 26 additions & 49 deletions pandas/indexes/base.py
@@ -1392,15 +1392,19 @@ def __getitem__(self, key):
else:
return result

def _ensure_compat_append(self, other):
def append(self, other):
"""
prepare the append
Append a collection of Index options together

Parameters
----------
other : Index or list/tuple of indices

Returns
-------
list of to_concat, name of result Index
appended : Index
"""
name = self.name

to_concat = [self]

if isinstance(other, (list, tuple)):
@@ -1409,46 +1413,29 @@ def _ensure_compat_append(self, other):
to_concat.append(other)

for obj in to_concat:
if (isinstance(obj, Index) and obj.name != name and
obj.name is not None):
name = None
break
if not isinstance(obj, Index):
raise TypeError('all inputs must be Index')
Member:

(a bit similar with the equals case)

Currently, we accept a list or tuple to be appended (when it is a list of lists; in the flat case you get an error):

In [56]: idx = pd.Index([1,2])

In [57]: idx.append([[1,2]])
Out[57]: Int64Index([1, 2, 1, 2], dtype='int64')

The docstring of append also says for other: "Index or list/tuple of indices"

Contributor:

_ensure_index should handle this (it might think it's a MultiIndex actually), so we might need to disambiguate

Member:

Yes, _ensure_index would convert a list to an Index. @jreback are you saying Index.append should accept lists?

On second thought, maybe I misread the Index.append docstring. The "Index or list/tuple of indices" can be interpreted either as "list/tuple of Index objects" (so no guarantee to accept a list as an Index object) or as "list of index labels".
But it is probably the first interpretation. In that case, I think this PR is OK, but @sinhrks it would maybe be good to list this in the whatsnew as a change as well? (or do we regard it as a bug fix?)

Member Author (@sinhrks, Sep 3, 2016):

Yes, I agree the docstring intends the first one. I'm adding a validation to check that all elements are Index if the input is list-like.

I think it is regarded as a bug fix, because it is similar to the fix in which CategoricalIndex.append accepted a (flat) list of labels (included in this PR).

Member:

OK, merging then!


to_concat = self._ensure_compat_concat(to_concat)
to_concat = [x._values if isinstance(x, Index) else x
for x in to_concat]
return to_concat, name
names = set([obj.name for obj in to_concat])
name = None if len(names) > 1 else self.name

def append(self, other):
"""
Append a collection of Index options together
typs = _concat.get_dtype_kinds(to_concat)

Parameters
----------
other : Index or list/tuple of indices
if 'category' in typs:
# if any of the to_concat is category
from pandas.indexes.category import CategoricalIndex
return CategoricalIndex._append_same_dtype(self, to_concat, name)

Returns
-------
appended : Index
"""
to_concat, name = self._ensure_compat_append(other)
attribs = self._get_attributes_dict()
attribs['name'] = name
return self._shallow_copy_with_infer(
np.concatenate(to_concat), **attribs)

@staticmethod
def _ensure_compat_concat(indexes):
from pandas.tseries.api import (DatetimeIndex, PeriodIndex,
TimedeltaIndex)
klasses = DatetimeIndex, PeriodIndex, TimedeltaIndex

is_ts = [isinstance(idx, klasses) for idx in indexes]
if len(typs) == 1:
return self._append_same_dtype(to_concat, name=name)
return _concat._concat_index_asobject(to_concat, name=name)

if any(is_ts) and not all(is_ts):
return [_maybe_box(idx) for idx in indexes]

return indexes
def _append_same_dtype(self, to_concat, name):
"""
Concatenate to_concat which has the same class
"""
# must be overridden in specific classes
return _concat._concat_index_asobject(to_concat, name)
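The refactored append dispatches on the dtype kinds of the inputs: a single shared kind goes through `_append_same_dtype`, anything mixed falls back to object. A quick sketch of the observable behavior on current pandas:

```python
import pandas as pd

# one dtype kind: the dtype is preserved
same = pd.Index([1, 2]).append(pd.Index([3]))
print(same.dtype)  # int64

# mixed kinds: falls back to object dtype
mixed = pd.Index([1, 2]).append(pd.Index(["a"]))
print(mixed.dtype)  # object

# more than one distinct name: the result name is dropped
named = pd.Index([1], name="x").append(pd.Index([2], name="y"))
print(named.name)  # None
```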

_index_shared_docs['take'] = """
return a new %(klass)s of the values selected by the indices
@@ -3634,16 +3621,6 @@ def _ensure_has_len(seq):
return seq


def _maybe_box(idx):
from pandas.tseries.api import DatetimeIndex, PeriodIndex, TimedeltaIndex
klasses = DatetimeIndex, PeriodIndex, TimedeltaIndex

if isinstance(idx, klasses):
return idx.asobject

return idx


def _trim_front(strings):
"""
Trims zeros and decimal points
21 changes: 6 additions & 15 deletions pandas/indexes/category.py
@@ -569,26 +569,17 @@ def insert(self, loc, item):
codes = np.concatenate((codes[:loc], code, codes[loc:]))
return self._create_from_codes(codes)

def append(self, other):
def _append_same_dtype(self, to_concat, name):
"""
Append a collection of CategoricalIndex options together

Parameters
----------
other : Index or list/tuple of indices

Returns
-------
appended : Index

Raises
------
Concatenate to_concat which has the same class
ValueError if other is not in the categories
"""
to_concat, name = self._ensure_compat_append(other)
to_concat = [self._is_dtype_compat(c) for c in to_concat]
codes = np.concatenate([c.codes for c in to_concat])
return self._create_from_codes(codes, name=name)
result = self._create_from_codes(codes, name=name)
# if name is None, _create_from_codes sets self.name
result.name = name
return result
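As a sketch of the behavior this preserves: appending another CategoricalIndex with the same categories stays categorical, and a shared name survives the name handling above (current-pandas semantics assumed):

```python
import pandas as pd

ci = pd.CategoricalIndex(list("aabb"), categories=list("ab"), name="c")
other = pd.CategoricalIndex(["b", "a"], categories=list("ab"), name="c")
res = ci.append(other)
print(type(res).__name__)  # CategoricalIndex
print(list(res))           # ['a', 'a', 'b', 'b', 'b', 'a']
print(res.name)            # c
```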

@classmethod
def _add_comparison_methods(cls):
4 changes: 2 additions & 2 deletions pandas/tests/indexes/test_category.py
@@ -271,12 +271,12 @@ def test_append(self):
lambda: ci.append(ci.values.reorder_categories(list('abc'))))

# with objects
result = ci.append(['c', 'a'])
result = ci.append(Index(['c', 'a']))
expected = CategoricalIndex(list('aabbcaca'), categories=categories)
tm.assert_index_equal(result, expected, exact=True)

# invalid objects
self.assertRaises(TypeError, lambda: ci.append(['a', 'd']))
self.assertRaises(TypeError, lambda: ci.append(Index(['a', 'd'])))

def test_insert(self):

38 changes: 36 additions & 2 deletions pandas/tests/indexes/test_multi.py
@@ -6,8 +6,8 @@
import re
import warnings

from pandas import (DataFrame, date_range, MultiIndex, Index, CategoricalIndex,
compat)
from pandas import (DataFrame, date_range, period_range, MultiIndex, Index,
CategoricalIndex, compat)
from pandas.core.common import PerformanceWarning
from pandas.indexes.base import InvalidIndexError
from pandas.compat import range, lrange, u, PY3, long, lzip
@@ -769,6 +769,40 @@ def test_append(self):
result = self.index.append([])
self.assertTrue(result.equals(self.index))

def test_append_mixed_dtypes(self):
# GH 13660
dti = date_range('2011-01-01', freq='M', periods=3)
dti_tz = date_range('2011-01-01', freq='M', periods=3, tz='US/Eastern')
pi = period_range('2011-01', freq='M', periods=3)

mi = MultiIndex.from_arrays([[1, 2, 3],
[1.1, np.nan, 3.3],
['a', 'b', 'c'],
dti, dti_tz, pi])
self.assertEqual(mi.nlevels, 6)

res = mi.append(mi)
exp = MultiIndex.from_arrays([[1, 2, 3, 1, 2, 3],
[1.1, np.nan, 3.3, 1.1, np.nan, 3.3],
['a', 'b', 'c', 'a', 'b', 'c'],
dti.append(dti),
dti_tz.append(dti_tz),
pi.append(pi)])
tm.assert_index_equal(res, exp)

other = MultiIndex.from_arrays([['x', 'y', 'z'], ['x', 'y', 'z'],
['x', 'y', 'z'], ['x', 'y', 'z'],
['x', 'y', 'z'], ['x', 'y', 'z']])

res = mi.append(other)
exp = MultiIndex.from_arrays([[1, 2, 3, 'x', 'y', 'z'],
[1.1, np.nan, 3.3, 'x', 'y', 'z'],
['a', 'b', 'c', 'x', 'y', 'z'],
dti.append(pd.Index(['x', 'y', 'z'])),
dti_tz.append(pd.Index(['x', 'y', 'z'])),
pi.append(pd.Index(['x', 'y', 'z']))])
tm.assert_index_equal(res, exp)

def test_get_level_values(self):
result = self.index.get_level_values(0)
expected = Index(['foo', 'foo', 'bar', 'baz', 'qux', 'qux'],
86 changes: 86 additions & 0 deletions pandas/tests/types/test_concat.py
@@ -0,0 +1,86 @@
# -*- coding: utf-8 -*-

import nose
import pandas as pd
import pandas.types.concat as _concat
import pandas.util.testing as tm


class TestConcatCompat(tm.TestCase):

_multiprocess_can_split_ = True

def check_concat(self, to_concat, exp):
for klass in [pd.Index, pd.Series]:
to_concat_klass = [klass(c) for c in to_concat]
res = _concat.get_dtype_kinds(to_concat_klass)
self.assertEqual(res, set(exp))

def test_get_dtype_kinds(self):
to_concat = [['a'], [1, 2]]
self.check_concat(to_concat, ['i', 'object'])

to_concat = [[3, 4], [1, 2]]
self.check_concat(to_concat, ['i'])

to_concat = [[3, 4], [1, 2.1]]
self.check_concat(to_concat, ['i', 'f'])

def test_get_dtype_kinds_datetimelike(self):
to_concat = [pd.DatetimeIndex(['2011-01-01']),
pd.DatetimeIndex(['2011-01-02'])]
self.check_concat(to_concat, ['datetime'])

to_concat = [pd.TimedeltaIndex(['1 days']),
pd.TimedeltaIndex(['2 days'])]
self.check_concat(to_concat, ['timedelta'])

def test_get_dtype_kinds_datetimelike_object(self):
to_concat = [pd.DatetimeIndex(['2011-01-01']),
pd.DatetimeIndex(['2011-01-02'], tz='US/Eastern')]
self.check_concat(to_concat,
['datetime', 'datetime64[ns, US/Eastern]'])

to_concat = [pd.DatetimeIndex(['2011-01-01'], tz='Asia/Tokyo'),
pd.DatetimeIndex(['2011-01-02'], tz='US/Eastern')]
self.check_concat(to_concat,
['datetime64[ns, Asia/Tokyo]',
'datetime64[ns, US/Eastern]'])

# timedelta has single type
to_concat = [pd.TimedeltaIndex(['1 days']),
pd.TimedeltaIndex(['2 hours'])]
self.check_concat(to_concat, ['timedelta'])

to_concat = [pd.DatetimeIndex(['2011-01-01'], tz='Asia/Tokyo'),
pd.TimedeltaIndex(['1 days'])]
self.check_concat(to_concat,
['datetime64[ns, Asia/Tokyo]', 'timedelta'])

def test_get_dtype_kinds_period(self):
# because we don't have Period dtype (yet),
# Series results in object dtype
to_concat = [pd.PeriodIndex(['2011-01'], freq='M'),
pd.PeriodIndex(['2011-01'], freq='M')]
res = _concat.get_dtype_kinds(to_concat)
self.assertEqual(res, set(['period[M]']))

to_concat = [pd.Series([pd.Period('2011-01', freq='M')]),
pd.Series([pd.Period('2011-02', freq='M')])]
res = _concat.get_dtype_kinds(to_concat)
self.assertEqual(res, set(['object']))

to_concat = [pd.PeriodIndex(['2011-01'], freq='M'),
pd.PeriodIndex(['2011-01'], freq='D')]
res = _concat.get_dtype_kinds(to_concat)
self.assertEqual(res, set(['period[M]', 'period[D]']))

to_concat = [pd.Series([pd.Period('2011-01', freq='M')]),
pd.Series([pd.Period('2011-02', freq='D')])]
res = _concat.get_dtype_kinds(to_concat)
self.assertEqual(res, set(['object']))


if __name__ == '__main__':
nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
exit=False)
6 changes: 6 additions & 0 deletions pandas/tools/pivot.py
@@ -523,6 +523,9 @@ def _normalize(table, normalize, margins):
column_margin = table.loc[:, 'All'].drop('All')
index_margin = table.loc['All', :].drop('All')
table = table.drop('All', axis=1).drop('All')
# to keep index and columns names
table_index_names = table.index.names
table_columns_names = table.columns.names

# Normalize core
table = _normalize(table, normalize=normalize, margins=False)
@@ -550,6 +553,9 @@ def _normalize(table, normalize, margins):
else:
raise ValueError("Not a valid normalize argument")

table.index.names = table_index_names
table.columns.names = table_columns_names

else:
raise ValueError("Not a valid margins argument")
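The two name-restoring hunks above fix crosstab/pivot_table when both `margins` and `normalize` are used, which previously dropped the axis names. A hedged sketch of the fixed behavior:

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "x", "y"], "b": ["p", "q", "p"]})
# normalize with margins used to reset index/columns names to None
tab = pd.crosstab(df["a"], df["b"], margins=True, normalize="all")
print(tab.index.name, tab.columns.name)  # a b
```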
