Move NaT to self-contained module #18014

jbrockmendel · 2017-10-28T18:25:09Z

This completes the messing-with-NaT sequence of PRs.

Highlights:

Removes duplicate definitions of _nat_strings across a couple of modules.
tslibs.nattype relies on util but is otherwise free of pandas dependencies.
lib currently depends on tslib; this is a big step towards changing this (note that in setup.py tslib is declared as depending on lib. I think that declaration is wrong, but thats a separate issue. Regardless, DAG for the win)

Lowlights:

We have to patch some of the NaT method docstrings after importing into tslib. I'm not wild about that.

Optimizations to consider:

If we cdef the constant NaN at the module-level and return it instead of np.nan, the property lookups are about 15% faster. The downside of this is that NaT.day is np.nan is no longer True.
@cython.final may offer an optimization for some methods in cdef classes. Because _NaT no longer subclasses _Timestamp, we can apply that optimization to _Timestamp methods, too. No idea what level of speedup we get.
closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

jreback · 2017-10-28T18:40:19Z

pandas/_libs/tslib.pyx

-    f.__doc__ = getattr(cls, func_name).__doc__
-    return f
+# We patch these NaTType methods with the Timestamp docstrings.
+NaTType.astimezone = _make_error_func('astimezone', Timestamp)


for these 2 you can simply use the datetime doc-strings no?

Timestamp.astimezone is an alias for Timestamp.tz_convert, which has a distinct docstring.

jreback · 2017-10-28T18:40:46Z

pandas/_libs/tslib.pyx


+NaTType.tz_convert = _make_nat_func('tz_convert', Timestamp)
+NaTType.tz_localize = _make_nat_func('tz_localize', Timestamp)
+NaTType.replace = _make_nat_func('replace', Timestamp)


same for .replace, now, today

jreback · 2017-10-28T18:43:12Z

pandas/_libs/tslib.pyx

-    return f
+NaTType.now = _make_nat_func('now', Timestamp)
+NaTType.today = _make_nat_func('today', Timestamp)
+NaTType.round = _make_nat_func('round', Timestamp)


This is very messy. instead define a function in tslib.nattype which does this (IOW calls the _make_nat_func) and pass in the Timestamp.

or even better since there are not that many I would just copy-paste the doc-strings and avoid this whole thing.

or even better since there are not that many I would just copy-paste the doc-strings and avoid this whole thing

I like it!

jreback · 2017-10-28T18:47:34Z

pandas/_libs/tslibs/timedeltas.pyx

@@ -9,12 +9,11 @@ from numpy cimport int64_t

 cimport util

+from nattype import _nat_strings


so I would de-prevatize this in nattype

…libs-nattype6

jbrockmendel · 2017-10-28T21:18:01Z

circleci error:

ImportError: numpy.core.multiarray failed to import
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ubuntu/pandas/pandas/__init__.py", line 61, in <module>
    from pandas.io.api import *
  File "/home/ubuntu/pandas/pandas/io/api.py", line 9, in <module>
    from pandas.io.excel import ExcelFile, ExcelWriter, read_excel
  File "/home/ubuntu/pandas/pandas/io/excel.py", line 25, in <module>
    import pandas._libs.json as json
SystemError: initialization of json raised unreported exception

Update never mind, I think I see where this came from. Fixing shortly.

jreback · 2017-10-28T21:48:17Z

pandas/_libs/tslibs/nattype.pyx

+    # Timestamp has empty docstring for some methods.
+    utcfromtimestamp = _make_error_func('utcfromtimestamp', None) 
+    fromtimestamp = _make_error_func('fromtimestamp', None)
+    combine = _make_error_func('combine', None)


could add these doc strings as well (separately should fix for Timestamp)

That's in the TODO list, will be sure to follow up.

jreback · 2017-10-28T21:49:46Z

pandas/_libs/tslibs/nattype.pyx

+
+        Returns
+        -------
+        a new Timestamp rounded to the given resolution of `freq`


ought to make these doc strings simpler
eg this doesn’t round but returns a NaT
the parameters are kind of irrelevant

Will add to the todo list.

jreback · 2017-10-28T21:50:28Z

pandas/_libs/tslibs/nattype.pyx

+# ----------------------------------------------------------------------
+
+cdef inline bint _checknull_with_nat(object val):
+    """ utility to check if a value is a nat or not """


deprivatize at some point

codecov · 2017-10-29T17:37:29Z

Codecov Report

Merging #18014 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18014      +/-   ##
==========================================
- Coverage   91.23%   91.22%   -0.02%     
==========================================
  Files         163      163              
  Lines       50091    50092       +1     
==========================================
- Hits        45703    45696       -7     
- Misses       4388     4396       +8

Flag	Coverage Δ
#multiple	`89.03% <100%> (ø)`	⬆️
#single	`40.24% <0%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/compat/pickle_compat.py	`75.6% <ø> (ø)`	⬆️
pandas/core/tools/datetimes.py	`82.97% <100%> (ø)`	⬆️
pandas/tslib.py	`100% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.75% <0%> (-0.1%)`	⬇️
pandas/io/pytables.py	`92.84% <0%> (+0.04%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2d0d1b...05e5981. Read the comment docs.

codecov · 2017-10-29T17:37:31Z

Codecov Report

Merging #18014 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18014      +/-   ##
==========================================
- Coverage   91.24%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       50114    50115       +1     
==========================================
- Hits        45729    45721       -8     
- Misses       4385     4394       +9

Flag	Coverage Δ
#multiple	`89.04% <100%> (ø)`	⬆️
#single	`40.24% <0%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/compat/pickle_compat.py	`75.6% <ø> (ø)`	⬆️
pandas/core/tools/datetimes.py	`82.97% <100%> (ø)`	⬆️
pandas/tslib.py	`100% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.75% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4578a03...4b7593e. Read the comment docs.

…libs-nattype6

jreback · 2017-10-31T00:35:54Z

pandas/_libs/period.pyx


 from tslibs.parsing import parse_time_string, NAT_SENTINEL
 from tslibs.frequencies cimport get_freq_code
+from tslibs.nattype import nat_strings
+from tslibs.nattype cimport _nat_scalar_rules


in followup I could de-privatize this _nat_scalar_rules)

That works. I'm also ok with getting rid of it, since op != Py_NE is pretty easy.

jreback · 2017-10-31T00:36:51Z

pandas/_libs/tslib.pyx

@@ -96,6 +96,10 @@ from tslibs.fields import (
    get_date_name_field, get_start_end_field, get_date_field,
    build_field_sarray)

+from tslibs.nattype import NaT, nat_strings, __nat_unpickle
+# Note: __nat_unpickle needs to be in the namespace for backward compat pickle


? why is that, you already do the pickle compat dance.

I'd have to go back and check the relevant Travis logs, but there was an error a couple days ago. Might have been an xarray thing?

this needs a rebase anyhow. remove the import and let's see.

jreback · 2017-10-31T00:37:11Z

pandas/_libs/tslib.pyx

@@ -96,6 +96,10 @@ from tslibs.fields import (
    get_date_name_field, get_start_end_field, get_date_field,
    build_field_sarray)

+from tslibs.nattype import NaT, nat_strings, __nat_unpickle
+# Note: __nat_unpickle needs to be in the namespace for backward compat pickle
+from tslibs.nattype cimport _checknull_with_nat


in followup, can de-privatize checknull_with_nat

Looking forward to it.

…libs-nattype6

jbrockmendel · 2017-10-31T01:49:32Z

___________ test_pickles[0.20.3-0.20.3_x86_64_darwin_2.7.14.pickle] ____________
[gw0] linux2 -- Python 2.7.14 /home/travis/miniconda3/envs/pandas/bin/python
current_pickle_data = {'cat': {'int16': [0, 1, 2, 3, 4, ..., 995, 996, 997, 998, 999]
Length: 1000
Categories (1000, int64): [0, 1, 2, 3, .....x'], [u'one', u'two']],
      ...2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=[u'first', u'second'])}, ...}
version = '0.20.3', f = '0.20.3_x86_64_darwin_2.7.14.pickle'
    @pytest.mark.parametrize('version, f', legacy_pickle_versions())
    def test_pickles(current_pickle_data, version, f):
        if not is_platform_little_endian():
            pytest.skip("known failure on non-little endian")
    
        vf = tm.get_data_path('legacy_pickle/{}/{}'.format(version, f))
        with catch_warnings(record=True):
>           compare(current_pickle_data, vf, version)
pandas/tests/io/test_pickle.py:207: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/tests/io/test_pickle.py:66: in compare
    data = pandas.read_pickle(vf)
pandas/io/pickle.py:110: in read_pickle
    return try_read(path)
pandas/io/pickle.py:108: in try_read
    lambda f: pc.load(f, encoding=encoding, compat=True))
pandas/io/pickle.py:84: in read_wrapper
    return func(f)
pandas/io/pickle.py:108: in <lambda>
    lambda f: pc.load(f, encoding=encoding, compat=True))
pandas/compat/pickle_compat.py:196: in load
    return up.load()
../../../miniconda3/envs/pandas/lib/python2.7/pickle.py:864: in load
    dispatch[key](self)
../../../miniconda3/envs/pandas/lib/python2.7/pickle.py:1096: in load_global
    klass = self.find_class(module, name)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <pandas.compat.pickle_compat.Unpickler instance at 0x7f8e9438e320>
module = 'pandas._libs.tslib', name = '__nat_unpickle'
    def find_class(self, module, name):
        # override superclass
        key = (module, name)
        module, name = _class_locations_map.get(key, key)
        __import__(module)
        mod = sys.modules[module]
>       klass = getattr(mod, name)
E       AttributeError: 'module' object has no attribute '__nat_unpickle'

jreback · 2017-10-31T02:04:10Z

pandas/compat/pickle_compat.py

@@ -74,10 +74,12 @@ def load_reduce(self):
        ('pandas._libs.sparse', 'BlockIndex'),
    ('pandas.tslib', 'Timestamp'):
        ('pandas._libs.tslib', 'Timestamp'),
-    ('pandas.tslib', '__nat_unpickle'):
-        ('pandas._libs.tslib', '__nat_unpickle'),


you need the transitive rule as well (and put them together)

('pandas._tslib.tslib', '__nat_unpickle'): ('pandas._libs.tslibs.nattype', '__nat_unpickle')

jreback · 2017-10-31T02:16:55Z

pandas/compat/pickle_compat.py

@@ -74,9 +74,13 @@ def load_reduce(self):
        ('pandas._libs.sparse', 'BlockIndex'),
    ('pandas.tslib', 'Timestamp'):
        ('pandas._libs.tslib', 'Timestamp'),
+    ('pandas._period', 'Period'): ('pandas._libs.period', 'Period'),
+
+    # 18014 moved __nat_unpickle from _libs.tslib-->_libs.tslibs.nattype
    ('pandas.tslib', '__nat_unpickle'):
        ('pandas._libs.tslib', '__nat_unpickle'),


you need to change the to location -> ('pandas._libs.tslibs.nattype', '__nat_unpickle')

So should there be one entry or two? I took the "transitive" comment above to mean two. Are you saying it should just be:

('pandas.tslib', '__nat_unpickle'): ('pandas._libs.tslibs.nattype', '__nat_unpickle')

or

('pandas.tslib', '__nat_unpickle'): ('pandas._libs.tslibs.nattype', '__nat_unpickle'), ('pandas._libs.tslib', '__nat_unpickle'): ('pandas._libs.tslibs.nattype', '__nat_unpickle'),

or something else?

2nd one
should be an easy test
one works one doesn’t

jreback · 2017-10-31T11:26:44Z

needs a rebase

…libs-nattype6

jreback · 2017-11-01T00:09:36Z

thanks @jbrockmendel this was the big one!

move NaT to self-contained module

3926da5

jreback requested changes Oct 28, 2017

View reviewed changes

jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Datetime Datetime data dtype labels Oct 28, 2017

jbrockmendel added 3 commits October 28, 2017 13:22

de-privatize _nat_strings, copy/paste docstrings instead of patching

c232845

Merge branch 'master' of https://github.com/pandas-dev/pandas into ts…

dd99721

…libs-nattype6

Update pickle_compat

776d581

fix import of NaTType

3a44c2c

jreback requested changes Oct 28, 2017

View reviewed changes

jbrockmendel mentioned this pull request Oct 28, 2017

tslibs TODO list #17652

Closed

59 tasks

jbrockmendel added 3 commits October 28, 2017 22:04

update import; fixes downstream test

009c907

import nat_unpickle to fix backcompat pickle

449d5de

flake8 fixup

05e5981

Merge branch 'master' of https://github.com/pandas-dev/pandas into ts…

8af334a

…libs-nattype6

jbrockmendel mentioned this pull request Oct 30, 2017

Port Timedelta implementation to tslibs.timedeltas #17937

Merged

4 tasks

jreback requested changes Oct 31, 2017

View reviewed changes

jbrockmendel added 2 commits October 30, 2017 17:48

Merge branch 'master' of https://github.com/pandas-dev/pandas into ts…

a2c3c7e

…libs-nattype6

remove import to try to recreate test failure

63ff688

jreback reviewed Oct 31, 2017

View reviewed changes

nat_pickle fixup

5f3b512

jreback reviewed Oct 31, 2017

View reviewed changes

another try at nat_pickle

11e699c

Merge branch 'master' of https://github.com/pandas-dev/pandas into ts…

4b7593e

…libs-nattype6

jreback added this to the 0.22.0 milestone Nov 1, 2017

jreback approved these changes Nov 1, 2017

View reviewed changes

jreback merged commit e922016 into pandas-dev:master Nov 1, 2017

jbrockmendel deleted the tslibs-nattype6 branch November 1, 2017 03:10

1kastner pushed a commit to 1kastner/pandas that referenced this pull request Nov 5, 2017

Move NaT to self-contained module (pandas-dev#18014)

c354271

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

Move NaT to self-contained module (pandas-dev#18014)

2786fac

		@@ -9,12 +9,11 @@ from numpy cimport int64_t

		cimport util

		from nattype import _nat_strings

Uh oh!

Move NaT to self-contained module #18014

Move NaT to self-contained module #18014

Uh oh!

Conversation

jbrockmendel commented Oct 28, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jbrockmendel commented Oct 28, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Oct 29, 2017

Codecov Report

Uh oh!

codecov bot commented Oct 29, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jbrockmendel commented Oct 31, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jreback Oct 31, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jreback commented Oct 31, 2017

Uh oh!

jreback commented Nov 1, 2017

Uh oh!

Uh oh!

jbrockmendel commented Oct 28, 2017 •

edited

Loading

codecov bot commented Oct 29, 2017 •

edited

Loading

jreback Oct 31, 2017 •

edited

Loading