-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Move NaT to self-contained module #18014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/_libs/tslib.pyx
Outdated
f.__doc__ = getattr(cls, func_name).__doc__ | ||
return f | ||
# We patch these NaTType methods with the Timestamp docstrings. | ||
NaTType.astimezone = _make_error_func('astimezone', Timestamp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for these 2 you can simply use the datetime
doc-strings no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timestamp.astimezone is an alias for Timestamp.tz_convert, which has a distinct docstring.
pandas/_libs/tslib.pyx
Outdated
|
||
NaTType.tz_convert = _make_nat_func('tz_convert', Timestamp) | ||
NaTType.tz_localize = _make_nat_func('tz_localize', Timestamp) | ||
NaTType.replace = _make_nat_func('replace', Timestamp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for .replace, now, today
pandas/_libs/tslib.pyx
Outdated
return f | ||
NaTType.now = _make_nat_func('now', Timestamp) | ||
NaTType.today = _make_nat_func('today', Timestamp) | ||
NaTType.round = _make_nat_func('round', Timestamp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very messy. instead define a function in tslib.nattype
which does this (IOW calls the _make_nat_func) and pass in the Timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or even better since there are not that many I would just copy-paste the doc-strings and avoid this whole thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or even better since there are not that many I would just copy-paste the doc-strings and avoid this whole thing
I like it!
pandas/_libs/tslibs/timedeltas.pyx
Outdated
@@ -9,12 +9,11 @@ from numpy cimport int64_t | |||
|
|||
cimport util | |||
|
|||
from nattype import _nat_strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I would de-prevatize this in nattype
circleci error:
Update never mind, I think I see where this came from. Fixing shortly. |
pandas/_libs/tslibs/nattype.pyx
Outdated
# Timestamp has empty docstring for some methods. | ||
utcfromtimestamp = _make_error_func('utcfromtimestamp', None) | ||
fromtimestamp = _make_error_func('fromtimestamp', None) | ||
combine = _make_error_func('combine', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could add these doc strings as well (separately should fix for Timestamp)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's in the TODO list, will be sure to follow up.
|
||
Returns | ||
------- | ||
a new Timestamp rounded to the given resolution of `freq` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ought to make these doc strings simpler
eg this doesn’t round but returns a NaT
the parameters are kind of irrelevant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add to the todo list.
# ---------------------------------------------------------------------- | ||
|
||
cdef inline bint _checknull_with_nat(object val): | ||
""" utility to check if a value is a nat or not """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deprivatize at some point
Codecov Report
@@ Coverage Diff @@
## master #18014 +/- ##
==========================================
- Coverage 91.23% 91.22% -0.02%
==========================================
Files 163 163
Lines 50091 50092 +1
==========================================
- Hits 45703 45696 -7
- Misses 4388 4396 +8
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18014 +/- ##
==========================================
- Coverage 91.24% 91.23% -0.02%
==========================================
Files 163 163
Lines 50114 50115 +1
==========================================
- Hits 45729 45721 -8
- Misses 4385 4394 +9
Continue to review full report at Codecov.
|
|
||
from tslibs.parsing import parse_time_string, NAT_SENTINEL | ||
from tslibs.frequencies cimport get_freq_code | ||
from tslibs.nattype import nat_strings | ||
from tslibs.nattype cimport _nat_scalar_rules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in followup I could de-privatize this _nat_scalar_rules
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works. I'm also ok with getting rid of it, since op != Py_NE
is pretty easy.
pandas/_libs/tslib.pyx
Outdated
@@ -96,6 +96,10 @@ from tslibs.fields import ( | |||
get_date_name_field, get_start_end_field, get_date_field, | |||
build_field_sarray) | |||
|
|||
from tslibs.nattype import NaT, nat_strings, __nat_unpickle | |||
# Note: __nat_unpickle needs to be in the namespace for backward compat pickle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? why is that, you already do the pickle compat dance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have to go back and check the relevant Travis logs, but there was an error a couple days ago. Might have been an xarray thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs a rebase anyhow. remove the import and let's see.
pandas/_libs/tslib.pyx
Outdated
@@ -96,6 +96,10 @@ from tslibs.fields import ( | |||
get_date_name_field, get_start_end_field, get_date_field, | |||
build_field_sarray) | |||
|
|||
from tslibs.nattype import NaT, nat_strings, __nat_unpickle | |||
# Note: __nat_unpickle needs to be in the namespace for backward compat pickle | |||
from tslibs.nattype cimport _checknull_with_nat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in followup, can de-privatize checknull_with_nat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking forward to it.
|
@@ -74,10 +74,12 @@ def load_reduce(self): | |||
('pandas._libs.sparse', 'BlockIndex'), | |||
('pandas.tslib', 'Timestamp'): | |||
('pandas._libs.tslib', 'Timestamp'), | |||
('pandas.tslib', '__nat_unpickle'): | |||
('pandas._libs.tslib', '__nat_unpickle'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need the transitive rule as well (and put them together)
('pandas._tslib.tslib', '__nat_unpickle'):
('pandas._libs.tslibs.nattype', '__nat_unpickle')
pandas/compat/pickle_compat.py
Outdated
@@ -74,9 +74,13 @@ def load_reduce(self): | |||
('pandas._libs.sparse', 'BlockIndex'), | |||
('pandas.tslib', 'Timestamp'): | |||
('pandas._libs.tslib', 'Timestamp'), | |||
('pandas._period', 'Period'): ('pandas._libs.period', 'Period'), | |||
|
|||
# 18014 moved __nat_unpickle from _libs.tslib-->_libs.tslibs.nattype | |||
('pandas.tslib', '__nat_unpickle'): | |||
('pandas._libs.tslib', '__nat_unpickle'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to change the to location -> ('pandas._libs.tslibs.nattype', '__nat_unpickle')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So should there be one entry or two? I took the "transitive" comment above to mean two. Are you saying it should just be:
('pandas.tslib', '__nat_unpickle'): ('pandas._libs.tslibs.nattype', '__nat_unpickle')
or
('pandas.tslib', '__nat_unpickle'):
('pandas._libs.tslibs.nattype', '__nat_unpickle'),
('pandas._libs.tslib', '__nat_unpickle'):
('pandas._libs.tslibs.nattype', '__nat_unpickle'),
or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2nd one
should be an easy test
one works one doesn’t
needs a rebase |
thanks @jbrockmendel this was the big one! |
This completes the messing-with-NaT sequence of PRs.
Highlights:
_nat_strings
across a couple of modules.tslibs.nattype
relies onutil
but is otherwise free of pandas dependencies.lib
currently depends ontslib
; this is a big step towards changing this (note that in setup.py tslib is declared as depending on lib. I think that declaration is wrong, but thats a separate issue. Regardless, DAG for the win)Lowlights:
NaT
method docstrings after importing intotslib
. I'm not wild about that.Optimizations to consider:
If we cdef the constant NaN at the module-level and return it instead of
np.nan
, the property lookups are about 15% faster. The downside of this is thatNaT.day is np.nan
is no longer True.@cython.final
may offer an optimization for some methods incdef class
es. Because_NaT
no longer subclasses_Timestamp
, we can apply that optimization to_Timestamp
methods, too. No idea what level of speedup we get.closes #xxxx
tests added / passed
passes
git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry