-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Timestamp() is wrong if np.datetime64 < 1678-01-01 #4065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I'll try to tackle this since I already mucked around with Timestamp. |
great! |
The heart of the bug was not doing a _check_dts_bounds() in _get_datetime64_nanos when converting from a datetime64 that wasn't in [ns] units. But fixing this lead to another issue: the way _check_dts_bounds() worked previously, it did a broad check on the validity of a pandas_datetimestruct.year, and then assuming that triggered, it did a more fine grained check using the integer value of the given datetime64. However, if you have a datetime64 in another unit, the integer value isn't useful for comparison (because by definition it will map into a valid datetime64 somewhere in the [ns] range), so you don't really know if it is valid or not. So if you have a datetime64 that might be any unit, the only way to verify if it is in bounds is by just looking at the pandas_datetimestruct and just seeing if the values are in bounds or not. So I changed _check_dts_bounds() to do bounds checking this way. |
can you add some tests with e.g.
assume several cases, test_values are valid (with various types of dtypes), and another case of an invalid value thxs.. this is great stuff getting all of these bugs out of the way! |
Yeah this is a good point, I need to do a little more work here. When you have a situation like:
You give it strings, it gives back strings because it can't convert them to valid datetime64[ns]-es because they are out of bounds. Currently, if you give it datetime64s it can't deal with, it just maps them randomly into the datetime64[ns] range:
This is bad, but at least it has 2 nice properties: it doesn't throw errors and it always returns datetime64[ns]s. So I think there are a few potential solutions. First of all, in all cases if coerce=True, we just convert the out-of-bounds dt64s into NaTs, so we'll ignore that case: a) b) c) I like solution (c). Even though it is kind of surprising, it generally does the right thing and will make sure developers don't have to worry about non datetime64[ns]-es cropped up all over the place in these edge cases. Then in the future if pandas ever supports datetime64s of various units, it will be easy delete the code which converts the dt64s to strings and just return them. Though I'm amenable to any of the above solutions or a different solution, no hugely strong feelings. Also if this is confusing or I am missing something obvious, please let me know. I'm pretty new to this section of the code. |
c) is the current/right way, but with one additional check
There is a bug lurking that allows a This should raise be better to let the user know that this is not an acceptable value side issue is whether their are raises if |
@danbirken another related issue: #3944 |
related is #4066 as well |
Alright, I'm a little confused as to the exact interface you are looking for and while digging through the code and all of these related issues i've found a few more potential fixes, so here are some clarification statements. Let me know if any of these are wrong:
|
can you explain 4 again? when is 1) not true? all others look good |
For 1)
IE pd.to_datetime() will not fix the datetime64[] range type if it isn't wrapped in an array. It is a simple fix, I'm just double checking the behavior. For 4) It is basically this case:
This is the current behavior:
as far as I see, this could: a) throw an error b) don't change it, let the current behavior continue c) convert the datetime64s to strings (or maybe datetime.datetimes?), so you get this output:
d) something else |
ok....I understand 1); thought it was fixed, but good catch for 4) I think if you detect this (an 'invalid' if coerce is True, then obviously just otherwise, return convert to an object array of datetimes (these won't be further converted anywhere) np.datetime64('1000-01-01').item() is the datetime
I am not 100% in love with this, becuase its 'probably' a mistake but ok for now |
@danbirken how's this coming? |
I'm 90% done but I'm currently in Munich for Octoberfest which is slowing -Dan On Thu, Sep 26, 2013 at 1:35 PM, jreback [email protected] wrote:
|
@danbirken enjoy the beer! |
Alright, well it seems my change has triggered some incompatibilities on other platforms which I need to investigate (I just setup python2.6 build env and reproduced them locally, so that is good). Unfortunately I'm still on vacation and having less time for work than expected so I can't get to this now. So basically, you can ignore this change for the time being. |
np...lmk.... |
To fix the bug, this change adds bounds checking to _get_datetime64_nanos() for numpy datetimes that aren't already in [ns] units. Additionally, it updates _check_dts_bounds() to do the bound check just based off the pandas_datetimestruct, by comparing to the minimum and maximum valid pandas_datetimestructs for datetime64[ns]. It is simpler and more accurate than the previous system. Also includes a number of small refactors/fixes to deal with new error cases that didn't exist when invalid datetime64s were just silently coerced into the valid datetime64[ns] range.
Everything builds fine on windows. This change is an improvement, but it also does add a bit of complexity. As I note in the commit message, the fact that bounds checking didn't exist for datetime64s was hiding other issues (like pickling/unpickling dt64s in numpy 1.6, putting datetime64s into DatetimeIndexes now can throw errors) which I need to also fix here. I think I probably would like to also refactor Also all of this datetime64 stuff is all much easier to deal with in numpy 1.7. numpy 1.6 has a bunch of annoying issues. Is pandas always going to support numpy 1.6 or is there some timetable to only support 1.7? |
believe me @danbirken we feel your pain about numpy 1.6 ... we've jumped through crazy hoops to make sure it works. not sure about the timeline |
@danbirken hows this coming? what are the remaining 1.6 issues? |
Oh I don't think there are any remaining issues. This code should be good -Dan On Mon, Oct 7, 2013 at 10:21 PM, jreback [email protected] wrote:
|
@danbirken concur ! |
@danbirken thanks...another bug squashed! feel free to take on issues as you can! |
The following should work, but since it doesn't should raise an error...
In [116]: pd.Timestamp(np.datetime64('1677-01-01'),'D')
Out[116]: Timestamp('2261-07-22 23:34:33.709551616', tz=None)
In [117]: pd.Timestamp(np.datetime64('1678-01-01'),'D')
Out[117]: Timestamp('1678-01-01 00:00:00', tz=None)
In [121]: pd.version
Out[121]: '0.11.1.dev-06cd915'
In [125]: sys.version
Out[125]: '3.3.2 (default, May 21 2013, 15:40:45) \n[GCC 4.8.0 20130502 (prerelease)]'
The text was updated successfully, but these errors were encountered: