Skip to content

Move NaT to self-contained module #18014

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Nov 1, 2017

Conversation

jbrockmendel
Copy link
Member

This completes the messing-with-NaT sequence of PRs.

Highlights:

  • Removes duplicate definitions of _nat_strings across a couple of modules.
  • tslibs.nattype relies on util but is otherwise free of pandas dependencies.
  • lib currently depends on tslib; this is a big step towards changing this (note that in setup.py tslib is declared as depending on lib. I think that declaration is wrong, but thats a separate issue. Regardless, DAG for the win)

Lowlights:

  • We have to patch some of the NaT method docstrings after importing into tslib. I'm not wild about that.

Optimizations to consider:

  • If we cdef the constant NaN at the module-level and return it instead of np.nan, the property lookups are about 15% faster. The downside of this is that NaT.day is np.nan is no longer True.

  • @cython.final may offer an optimization for some methods in cdef classes. Because _NaT no longer subclasses _Timestamp, we can apply that optimization to _Timestamp methods, too. No idea what level of speedup we get.

  • closes #xxxx

  • tests added / passed

  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

  • whatsnew entry

f.__doc__ = getattr(cls, func_name).__doc__
return f
# We patch these NaTType methods with the Timestamp docstrings.
NaTType.astimezone = _make_error_func('astimezone', Timestamp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for these 2 you can simply use the datetime doc-strings no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timestamp.astimezone is an alias for Timestamp.tz_convert, which has a distinct docstring.


NaTType.tz_convert = _make_nat_func('tz_convert', Timestamp)
NaTType.tz_localize = _make_nat_func('tz_localize', Timestamp)
NaTType.replace = _make_nat_func('replace', Timestamp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for .replace, now, today

return f
NaTType.now = _make_nat_func('now', Timestamp)
NaTType.today = _make_nat_func('today', Timestamp)
NaTType.round = _make_nat_func('round', Timestamp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very messy. instead define a function in tslib.nattype which does this (IOW calls the _make_nat_func) and pass in the Timestamp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or even better since there are not that many I would just copy-paste the doc-strings and avoid this whole thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or even better since there are not that many I would just copy-paste the doc-strings and avoid this whole thing

I like it!

@@ -9,12 +9,11 @@ from numpy cimport int64_t

cimport util

from nattype import _nat_strings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I would de-prevatize this in nattype

@jreback jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Datetime Datetime data dtype labels Oct 28, 2017
@jbrockmendel
Copy link
Member Author

jbrockmendel commented Oct 28, 2017

circleci error:

ImportError: numpy.core.multiarray failed to import
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ubuntu/pandas/pandas/__init__.py", line 61, in <module>
    from pandas.io.api import *
  File "/home/ubuntu/pandas/pandas/io/api.py", line 9, in <module>
    from pandas.io.excel import ExcelFile, ExcelWriter, read_excel
  File "/home/ubuntu/pandas/pandas/io/excel.py", line 25, in <module>
    import pandas._libs.json as json
SystemError: initialization of json raised unreported exception

Update never mind, I think I see where this came from. Fixing shortly.

# Timestamp has empty docstring for some methods.
utcfromtimestamp = _make_error_func('utcfromtimestamp', None)
fromtimestamp = _make_error_func('fromtimestamp', None)
combine = _make_error_func('combine', None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could add these doc strings as well (separately should fix for Timestamp)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's in the TODO list, will be sure to follow up.


Returns
-------
a new Timestamp rounded to the given resolution of `freq`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ought to make these doc strings simpler
eg this doesn’t round but returns a NaT
the parameters are kind of irrelevant

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add to the todo list.

# ----------------------------------------------------------------------

cdef inline bint _checknull_with_nat(object val):
""" utility to check if a value is a nat or not """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deprivatize at some point

@jbrockmendel jbrockmendel mentioned this pull request Oct 28, 2017
59 tasks
@codecov
Copy link

codecov bot commented Oct 29, 2017

Codecov Report

Merging #18014 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18014      +/-   ##
==========================================
- Coverage   91.23%   91.22%   -0.02%     
==========================================
  Files         163      163              
  Lines       50091    50092       +1     
==========================================
- Hits        45703    45696       -7     
- Misses       4388     4396       +8
Flag Coverage Δ
#multiple 89.03% <100%> (ø) ⬆️
#single 40.24% <0%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/compat/pickle_compat.py 75.6% <ø> (ø) ⬆️
pandas/core/tools/datetimes.py 82.97% <100%> (ø) ⬆️
pandas/tslib.py 100% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️
pandas/io/pytables.py 92.84% <0%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2d0d1b...05e5981. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 29, 2017

Codecov Report

Merging #18014 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18014      +/-   ##
==========================================
- Coverage   91.24%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       50114    50115       +1     
==========================================
- Hits        45729    45721       -8     
- Misses       4385     4394       +9
Flag Coverage Δ
#multiple 89.04% <100%> (ø) ⬆️
#single 40.24% <0%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/compat/pickle_compat.py 75.6% <ø> (ø) ⬆️
pandas/core/tools/datetimes.py 82.97% <100%> (ø) ⬆️
pandas/tslib.py 100% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4578a03...4b7593e. Read the comment docs.


from tslibs.parsing import parse_time_string, NAT_SENTINEL
from tslibs.frequencies cimport get_freq_code
from tslibs.nattype import nat_strings
from tslibs.nattype cimport _nat_scalar_rules
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in followup I could de-privatize this _nat_scalar_rules)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works. I'm also ok with getting rid of it, since op != Py_NE is pretty easy.

@@ -96,6 +96,10 @@ from tslibs.fields import (
get_date_name_field, get_start_end_field, get_date_field,
build_field_sarray)

from tslibs.nattype import NaT, nat_strings, __nat_unpickle
# Note: __nat_unpickle needs to be in the namespace for backward compat pickle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? why is that, you already do the pickle compat dance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to go back and check the relevant Travis logs, but there was an error a couple days ago. Might have been an xarray thing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a rebase anyhow. remove the import and let's see.

@@ -96,6 +96,10 @@ from tslibs.fields import (
get_date_name_field, get_start_end_field, get_date_field,
build_field_sarray)

from tslibs.nattype import NaT, nat_strings, __nat_unpickle
# Note: __nat_unpickle needs to be in the namespace for backward compat pickle
from tslibs.nattype cimport _checknull_with_nat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in followup, can de-privatize checknull_with_nat

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking forward to it.

@jbrockmendel
Copy link
Member Author

___________ test_pickles[0.20.3-0.20.3_x86_64_darwin_2.7.14.pickle] ____________
[gw0] linux2 -- Python 2.7.14 /home/travis/miniconda3/envs/pandas/bin/python
current_pickle_data = {'cat': {'int16': [0, 1, 2, 3, 4, ..., 995, 996, 997, 998, 999]
Length: 1000
Categories (1000, int64): [0, 1, 2, 3, .....x'], [u'one', u'two']],
      ...2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=[u'first', u'second'])}, ...}
version = '0.20.3', f = '0.20.3_x86_64_darwin_2.7.14.pickle'
    @pytest.mark.parametrize('version, f', legacy_pickle_versions())
    def test_pickles(current_pickle_data, version, f):
        if not is_platform_little_endian():
            pytest.skip("known failure on non-little endian")
    
        vf = tm.get_data_path('legacy_pickle/{}/{}'.format(version, f))
        with catch_warnings(record=True):
>           compare(current_pickle_data, vf, version)
pandas/tests/io/test_pickle.py:207: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/tests/io/test_pickle.py:66: in compare
    data = pandas.read_pickle(vf)
pandas/io/pickle.py:110: in read_pickle
    return try_read(path)
pandas/io/pickle.py:108: in try_read
    lambda f: pc.load(f, encoding=encoding, compat=True))
pandas/io/pickle.py:84: in read_wrapper
    return func(f)
pandas/io/pickle.py:108: in <lambda>
    lambda f: pc.load(f, encoding=encoding, compat=True))
pandas/compat/pickle_compat.py:196: in load
    return up.load()
../../../miniconda3/envs/pandas/lib/python2.7/pickle.py:864: in load
    dispatch[key](self)
../../../miniconda3/envs/pandas/lib/python2.7/pickle.py:1096: in load_global
    klass = self.find_class(module, name)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <pandas.compat.pickle_compat.Unpickler instance at 0x7f8e9438e320>
module = 'pandas._libs.tslib', name = '__nat_unpickle'
    def find_class(self, module, name):
        # override superclass
        key = (module, name)
        module, name = _class_locations_map.get(key, key)
        __import__(module)
        mod = sys.modules[module]
>       klass = getattr(mod, name)
E       AttributeError: 'module' object has no attribute '__nat_unpickle'

@@ -74,10 +74,12 @@ def load_reduce(self):
('pandas._libs.sparse', 'BlockIndex'),
('pandas.tslib', 'Timestamp'):
('pandas._libs.tslib', 'Timestamp'),
('pandas.tslib', '__nat_unpickle'):
('pandas._libs.tslib', '__nat_unpickle'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need the transitive rule as well (and put them together)

('pandas._tslib.tslib', '__nat_unpickle'):
         ('pandas._libs.tslibs.nattype', '__nat_unpickle')

@@ -74,9 +74,13 @@ def load_reduce(self):
('pandas._libs.sparse', 'BlockIndex'),
('pandas.tslib', 'Timestamp'):
('pandas._libs.tslib', 'Timestamp'),
('pandas._period', 'Period'): ('pandas._libs.period', 'Period'),

# 18014 moved __nat_unpickle from _libs.tslib-->_libs.tslibs.nattype
('pandas.tslib', '__nat_unpickle'):
('pandas._libs.tslib', '__nat_unpickle'),
Copy link
Contributor

@jreback jreback Oct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to change the to location -> ('pandas._libs.tslibs.nattype', '__nat_unpickle')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should there be one entry or two? I took the "transitive" comment above to mean two. Are you saying it should just be:

('pandas.tslib', '__nat_unpickle'): ('pandas._libs.tslibs.nattype', '__nat_unpickle')

or

    ('pandas.tslib', '__nat_unpickle'):
        ('pandas._libs.tslibs.nattype', '__nat_unpickle'),
    ('pandas._libs.tslib', '__nat_unpickle'):
        ('pandas._libs.tslibs.nattype', '__nat_unpickle'),

or something else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2nd one
should be an easy test
one works one doesn’t

@jreback
Copy link
Contributor

jreback commented Oct 31, 2017

needs a rebase

@jreback jreback added this to the 0.22.0 milestone Nov 1, 2017
@jreback jreback merged commit e922016 into pandas-dev:master Nov 1, 2017
@jreback
Copy link
Contributor

jreback commented Nov 1, 2017

thanks @jbrockmendel this was the big one!

@jbrockmendel jbrockmendel deleted the tslibs-nattype6 branch November 1, 2017 03:10
1kastner pushed a commit to 1kastner/pandas that referenced this pull request Nov 5, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants