BUG: pd.concat does not drop DatetimeIndex.freq when result is monotonic with even spacing #25796

Merged
merged 3 commits into from
Mar 26, 2019
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.25.0.rst
@@ -300,7 +300,7 @@ Reshaping
 - Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`)
 - :func:`to_records` now accepts dtypes to its `column_dtypes` parameter (:issue:`24895`)
 - Bug in :func:`concat` where order of ``OrderedDict`` (and ``dict`` in Python 3.6+) is not respected, when passed in as ``objs`` argument (:issue:`21510`)
-
+- Bug in :func:`concat` where the resulting ``freq`` of two :class:`DatetimeIndex` with the same ``freq`` would be dropped (:issue:`3232`).

 Sparse
 ^^^^^^
8 changes: 6 additions & 2 deletions pandas/core/indexes/datetimelike.py
@@ -587,11 +587,15 @@ def _concat_same_dtype(self, to_concat, name):
         if len({str(x.dtype) for x in to_concat}) != 1:
             raise ValueError('to_concat must have the same tz')

-        if not is_period_dtype(self):
+        new_data = type(self._values)._concat_same_type(to_concat).asi8
+
+        # GH 3232: If the concat result is evenly spaced, we can retain the
+        # original frequency
+        is_diff_evenly_spaced = len(np.unique(np.diff(new_data))) == 1
Contributor

I think we have something that does this in tseries.frequencies, can you see / use it?

Member Author

Checked in tseries.frequencies but didn't find anything. We have is_monotonic functions, but they don't check for even spacing.

Contributor

you can use unique_1d here (should be faster), though not a big deal. worth moving this actually to tseries.frequencies as a named function?

Member

this looks a bit like unique_deltas

Member Author

Thanks @jbrockmendel. Exactly what I needed.

+        if not is_period_dtype(self) and not is_diff_evenly_spaced:
             # reset freq
             attribs['freq'] = None

-        new_data = type(self._values)._concat_same_type(to_concat).asi8
         return self._simple_new(new_data, **attribs)

     @Appender(_index_shared_docs['astype'])
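
For readers following the review thread, the even-spacing check in this hunk can be sketched in plain Python. This is an illustrative sketch only: the helper names `unique_deltas` and `is_evenly_spaced` are defined here and are not the pandas internals, and the timestamps are hypothetical integer milliseconds standing in for the `asi8` values.

```python
def unique_deltas(values):
    """Return the sorted distinct differences between consecutive values."""
    return sorted({b - a for a, b in zip(values, values[1:])})


def is_evenly_spaced(values):
    """True when every consecutive pair of values differs by the same step."""
    return len(unique_deltas(values)) == 1


# Hypothetical integer timestamps at a 50 ms step.
stamps = list(range(0, 500, 50))
first, second = stamps[:5], stamps[5:]

in_order = first + second  # same order as the original range
swapped = second + first   # halves reversed, so one delta is negative

print(unique_deltas(in_order))     # [50]
print(is_evenly_spaced(in_order))  # True
print(unique_deltas(swapped))      # [-450, 50]
print(is_evenly_spaced(swapped))   # False
```

A check of this shape only establishes that the step is constant; it does not compare that step with the original `freq`, and a single-element result has no deltas at all, so it counts as not evenly spaced.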
17 changes: 17 additions & 0 deletions pandas/tests/reshape/test_concat.py
@@ -2539,3 +2539,20 @@ def test_concat_categorical_tz():
         'a', 'b'
     ])
     tm.assert_series_equal(result, expected)
+
+
+def test_concat_datetimeindex_freq():
+    # GH 3232
+    # Monotonic index result
+    dr = pd.date_range('01-Jan-2013', periods=100, freq='50L', tz='UTC')
+    data = list(range(100))
+    expected = pd.DataFrame(data, index=dr)
+    result = pd.concat([expected[:50], expected[50:]])
+    tm.assert_frame_equal(result, expected)
+
+    # Non-monotonic index result
+    result = pd.concat([expected[50:], expected[:50]])
+    expected = pd.DataFrame(data[50:] + data[:50],
+                            index=dr[50:].append(dr[:50]))
+    expected.index.freq = None
+    tm.assert_frame_equal(result, expected)
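
The behaviour this test pins down can also be checked from user code. The snippet below is a sketch that assumes a pandas version containing this fix; the column name `value` is made up for illustration, and the millisecond alias is written as `ms` rather than the `L` used in the test.

```python
import pandas as pd

# 100 timestamps, 50 ms apart, in UTC.
dr = pd.date_range("2013-01-01", periods=100, freq="50ms", tz="UTC")
df = pd.DataFrame({"value": range(100)}, index=dr)

# Halves concatenated in their original order: the index matches dr again.
in_order = pd.concat([df.iloc[:50], df.iloc[50:]])

# Halves concatenated in reverse order: the index is no longer monotonic.
swapped = pd.concat([df.iloc[50:], df.iloc[:50]])

print(in_order.index.freq == dr.freq)  # freq retained
print(swapped.index.freq is None)      # freq reset
```

Both lines are expected to print `True` when the fix is present: the in-order result keeps the 50 ms frequency, and the swapped result has it reset to `None`.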