Skip to content

BUG: GH14323 Union of differences from DatetimeIndex incorrect #14346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

Liam3851
Copy link
Contributor

@Liam3851 Liam3851 commented Oct 4, 2016

Sets freq to None when doing a difference operation on a DatetimeIndex
or TimedeltaIndex, rather than retaining the frequency (which can cause
problems with downstream operations). Frequency of PeriodIndex is
retained.


v0.19.1 (????, 2016)
--------------------

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the whatsnew things
they will be added independently

i = self.create_index()
a = i[1:4]
tm.assert_index_equal(i[:1].union(i[4:]), i.difference(a))
b = i[3:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put expected on a separate line

# so a differencing operation should not retain the freq field of the
# original index.
i = self.create_index()
self.assertIsNone(i.difference(i[1:3]).freq)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use assert_index_equal against a constructed expected index (same for ones below)

for dti and tdi do this in the superclass override for pi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointers, again I'm new to this.

One thing: if I want to test against an explicitly constructed expected index, I don't think I can do it in the superclass, can I? Wouldn't it have to be a different constructed index for each class (e.g. DatetimeIndex(["20130101", "20130105"]), vs. TimedeltaIndex(["1 days", "5 days"])? I can't use the self.indices field because the point is I need an index result that doesn't have a regular freq.

Also, assert_index_equal looks like it only compares freq for PeriodIndex, not DatetimeIndex:

if isinstance(left, pd.PeriodIndex) or isinstance(right, pd.PeriodIndex):
    assert_attr_equal('freq', left, right, obj=obj)

which would ignore what I'm trying to test. Perhaps I could create a constructed index, run an assert_index_equal, and then run an explicit assert_attr_equal?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_index_equal looks like it only compares freq for PeriodIndex

Indeed, that seems the case. So it is ok to still explicitly check the freq attribute to be None of the result

I can't use the self.indices field because the point is I need an index result that doesn't have a regular freq.

But the starting index should have a freq, no? So you can check the freq is reset to None after doing a difference?

Copy link
Contributor Author

@Liam3851 Liam3851 Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the wait-- took me a bit to get the git to work (my own confusion). I've reverted the whatsnew changes and changed the tests to use explicit comparison indexes, at the expense of subclassing.

@jorisvandenbossche Happy to refactor the tests again to use subclassing if it's helpful, I'm just not sure I can do that and make the desired result explicit. From what I can tell, the way self.indices is constructed all of them correctly have freq Day, so I can't use them as a desired result. Conversely if want to explicitly create a comparison index, I need to know what class I'm working with, since TimedeltaIndex and DatetimeIndex don't take the same set of string inputs.

@codecov-io
Copy link

codecov-io commented Oct 5, 2016

Current coverage is 85.26% (diff: 100%)

Merging #14346 into master will increase coverage by <.01%

@@             master     #14346   diff @@
==========================================
  Files           140        140          
  Lines         50632      50632          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43171      43172     +1   
+ Misses         7461       7460     -1   
  Partials          0          0          

Powered by Codecov. Last update 58542e8...1dbf582

@jreback jreback added Bug Datetime Datetime data dtype Regression Functionality that used to work in a prior pandas version labels Oct 5, 2016
@jorisvandenbossche jorisvandenbossche added this to the 0.19.1 milestone Oct 6, 2016
Sets freq to None when doing a difference operation on a DatetimeIndex
or TimedeltaIndex, rather than retaining the frequency (which can cause
problems with downstream operations). Frequency of PeriodIndex is
retained.
@Liam3851
Copy link
Contributor Author

Just bumping since my other comment was downthread-- I've reverted the whatsnew changes and made the tests use explicit comparisons, and then squashed the resulting commits. Glad to make a further refactor to the tests if you'd like.

@jorisvandenbossche
Copy link
Member

@Liam3851 Sorry for the delay in comment.
For me the explicit separate tests are OK. But, can you simplify the test cases a little bit? As the issue is not that you take a union of differences (that was just the specific case where you encountered the issue). The issue is rather that a) difference now returned a freq, and b) that union does not handle indexes with freq correctly. So I would just use a simple example for both cases.

@jreback jreback closed this in bee90a7 Oct 24, 2016
@jreback
Copy link
Contributor

jreback commented Oct 24, 2016

thanks!

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Nov 2, 2016
… incorrect

closes pandas-dev#14323

Sets freq to None when doing a difference operation on a DatetimeIndex
or TimedeltaIndex, rather than retaining the frequency (which can
cause  problems with downstream operations). Frequency of PeriodIndex
is retained.

Author: David Krych <[email protected]>

Closes pandas-dev#14346 from Liam3851/dtind_diff_14323 and squashes the following commits:

1dbf582 [David Krych] BUG: GH14323 Union of differences from DatetimeIndex incorrect

(cherry picked from commit bee90a7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DatetimeIndex union fails in 0.19rc1 when constructed from differences in DatetimeIndexes
4 participants