Skip to content

ENH: (GH3863) Timestamp.min and Timestamp.max return a valid Timestamp #3902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 2, 2013
Merged

ENH: (GH3863) Timestamp.min and Timestamp.max return a valid Timestamp #3902

merged 1 commit into from
Jul 2, 2013

Conversation

SleepingPills
Copy link
Contributor

closes #3863 by adding valid min and max fields to the Timestamp class.

@jreback
Copy link
Contributor

jreback commented Jun 14, 2013

would you add a mention in release notes under API?

@jreback
Copy link
Contributor

jreback commented Jun 14, 2013

also I think you might mention min/max here: http://pandas.pydata.org/pandas-docs/dev/gotchas.html#minimum-and-maximum-timestamps

@hayd
Copy link
Contributor

hayd commented Jun 14, 2013

Could you also add a testcase :)

@jreback
Copy link
Contributor

jreback commented Jun 14, 2013

maybe instead of the actual values, should define them as np.iinfo(np.int64).max/min ?

with min + 1 (to avoid NaT conflict)

@SleepingPills
Copy link
Contributor Author

@jreback: Good idea about using np.iinfo - as a matter of fact, that's what I did initially but then I found that the constants were already defined but commented out. I'll add the change to the release notes and gotchas as well.

@hayd: With regards to tests - I'm not entirely sure what to test there? It is basically 2 fields. I can add some tests that assert the fields value...

@SleepingPills
Copy link
Contributor Author

Interestingly enough, on osx np.iinfo(np.int64).min returns a number that causes Timestamp to wrap around and return a value near the upper boundary:

In [1]: pd.Timestamp(np.iinfo(np.int64).max)
Out[1]: Timestamp('2262-04-11 23:47:16.854775807', tz=None)

In [2]: pd.Timestamp(np.iinfo(np.int64).min)
Out[2]: NaT

In [3]: pd.Timestamp(np.iinfo(np.int64).min + 1)
Out[3]: Timestamp('2262-04-10 00:12:43.145224193', tz=None)

This could be the reason why the magic number is used - and also mentioned in the gotchas.

@hayd
Copy link
Contributor

hayd commented Jun 14, 2013

Are you just asserting that these don't raise? (i.e. call them in a test is enough)

pd.Timestamp(pd.Timestamp.min)
pd.Timestamp(pd.Timestamp.max)

Actually check they equals whatever they're supposed to equal?

@jreback
Copy link
Contributor

jreback commented Jun 18, 2013

@SleepingPills can you hook up travis, then recommit this?

@@ -388,6 +388,13 @@ cpdef object get_value_box(ndarray arr, object loc):
return util.get_value_1d(arr, i)


# Add the min and max fields at the class level
cdef int64_t _NS_LOWER_BOUND = -9223372036854775807LL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these constants different? (they are below)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both are negative below which seems wrong. shouldn't these be each other's 2's complement? in which case _NS_LOWER_BOUND = -9223372036854775808

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the actual values are commented out....and somewhat platform dependent.....so not sure should be resetting these....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah i missed that. why not use LLONG_MAX/MIN? or do what @jreback suggested and use np.iinfo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this on Win7 as well and using np.iinfo(np.int64).min + 1 is not good because of the wraparound. The magic numbers supplied worked fine on both OSX and Windows that I've tested.

@jreback
Copy link
Contributor

jreback commented Jun 20, 2013

@SleepingPills maybe just define both of these as np.iinfo(np.int64).max and min (and don't add 1!),
rather than hardcode

@jreback
Copy link
Contributor

jreback commented Jun 23, 2013

@SleepingPills can you define these as np.info(np.int64).max and min?

@jreback
Copy link
Contributor

jreback commented Jun 24, 2013

moving to 0.12

@SleepingPills
Copy link
Contributor Author

Sorry for the delay, I've been waylaid by some urgent stuff.

I definitely wouldn't use np.iinfo(np.int64).min + 1 because of the wraparound. I'm not sure I understand the reasoning behind defining Timestamp.min as np.iinfo(np.int64).min. Wouldn't this simply mean that Timestamp.min == NaT?

@jreback
Copy link
Contributor

jreback commented Jun 27, 2013

I mean defining the min num as np.iinfo(np.int64).min+1; wraparound shouldn't affect this

also don't put the tests in a separate file; put in tseries/tests/test_timeseries.py

@SleepingPills
Copy link
Contributor Author

Even counting the fact that many other things, e.g. Timestamp.to_pydatetime() will behave really badly as a result?

@jreback
Copy link
Contributor

jreback commented Jun 27, 2013

what do you mean Timestamp.to_pydatetime() will behave bady? can you give an example?

@SleepingPills
Copy link
Contributor Author

Well:

In [1]: ts = pd.Timestamp(np.iinfo(np.int64).min+1)

In [2]: ts
Out[3]: Timestamp('2262-04-10 00:12:43.145224193', tz=None)

In [3]: ts == "20120101"
Out[3]: False

In [4]: ts == ts.to_pydatetime()
Warning: discarding nonzero nanoseconds
Out[4]: False

In [5]: ts.to_pydatetime()
Warning: discarding nonzero nanoseconds
Out[5]: datetime.datetime(2262, 4, 10, 0, 12, 43, 145224)

We now get a completely wrong datetime object. Not counting the fact that we lose precision by the conversion, the way it would behave with the np.iinfo() solution is worse (IMO).

@jreback
Copy link
Contributor

jreback commented Jun 27, 2013

sorry...yes you are right...the min datetime is a fixed value (and < np.iinfo(np.int64).min).....ok....fixed values are fine (I think wes avoided this whole wraparound issue on diff platforms by avoiding values that are too close to the min value)....

pls move the tests to tseries/test/time_series..and can merge this

@SleepingPills
Copy link
Contributor Author

Ok, I moved the tests and added an extra case for the datetime conversion roundtrip.

@jreback
Copy link
Contributor

jreback commented Jun 27, 2013

perfect, could you add a release notes mention? (doc/sources/release.rst)

and then squash down to a single commit?

thanks

@SleepingPills
Copy link
Contributor Author

Done. I thought I had pushed the changes to the docs already but turns out I had them on a different machine.

How do I squash already pushed commits?

@jreback
Copy link
Contributor

jreback commented Jun 28, 2013

git checkout master
git pull
git branch mybranch --set-upstream master
git checkout mybranch
git pull --rebase

fix any merge issues

when done rebasing

git rebase -i <commit just before this PR>

use s and/or f

when done

git push myfork mybranch -f

will replace all existing commits with the new set

generally on something like this is pretty short, can squash down to 1-2

@jreback
Copy link
Contributor

jreback commented Jun 28, 2013

related is #4065, if you would like to tackle would be appreciated (separate PR)

@SleepingPills
Copy link
Contributor Author

Squashed

@jreback
Copy link
Contributor

jreback commented Jul 2, 2013

thanks for the PR! merged

@SleepingPills
Copy link
Contributor Author

Thanks for the assistance - I'll try to make it smoother going forward.

@SleepingPills SleepingPills deleted the timestamp_min_max branch July 2, 2013 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Timestamp.min and .max return a Python datetime object that is not a valid Timestamp
4 participants