Skip to content

Implement npy_dtime.pyx #17805

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Oct 29, 2017
Merged

Conversation

jbrockmendel
Copy link
Member

npy_dtime is a util-like module for tslibs. Actually, a better analogy would be src/datetime.pxd (which it ultimately is intended to replace).

Upcoming steps require _check_dts_bounds and OutOfBoundsDatetime be in a module upstream from tslib. The exception OutOfBoundsDatetime cannot be defined in a pxd file, so this can't just be shunted into src/datetime.pxd.

Using a .pyx file instead of a .pxd file is that dependency specification in setup.py gets less fragile. After updating e.g. tslibs.fields to cimport from npy_dtime, it will no longer need tseries_depends and pandas/_libs/src/datetime/np_datetime.c etc in its ext_data entry.

This also implements two small convenience functions dtstruct_to_dt64 and dt64_to_dtstruct that pass through to pandas_datetimestruct_to_datetime and pandas_datetime_to_datetimestruct respectively in the overwhelmingly-most-common case where unit=PANDAS_FR_ns. To keep the footprint small this PR only demonstrates their usage in one location in strptime. It's a lot less verbose, ultimately gets rid of a lot of unsightly wrapped lines.

@pep8speaks
Copy link

pep8speaks commented Oct 6, 2017

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on October 29, 2017 at 20:27 Hours UTC

@sinhrks sinhrks added the Clean label Oct 6, 2017
@codecov
Copy link

codecov bot commented Oct 7, 2017

Codecov Report

Merging #17805 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17805      +/-   ##
==========================================
- Coverage   91.26%   91.24%   -0.02%     
==========================================
  Files         163      163              
  Lines       49978    49978              
==========================================
- Hits        45611    45602       -9     
- Misses       4367     4376       +9
Flag Coverage Δ
#multiple 89.04% <ø> (ø) ⬆️
#single 40.24% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.74% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e63c935...ce7197d. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 7, 2017

Codecov Report

Merging #17805 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17805      +/-   ##
==========================================
- Coverage   91.24%   91.22%   -0.02%     
==========================================
  Files         163      163              
  Lines       50091    50099       +8     
==========================================
- Hits        45704    45703       -1     
- Misses       4387     4396       +9
Flag Coverage Δ
#multiple 89.03% <ø> (ø) ⬆️
#single 40.24% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️
pandas/core/categorical.py 95.79% <0%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5959ee3...e017a52. Read the comment docs.

@jbrockmendel
Copy link
Member Author

The flake8 warning is pretty unavoidable here.

@jbrockmendel
Copy link
Member Author

This is a blocker for upcoming work.

@jreback
Copy link
Contributor

jreback commented Oct 15, 2017

can we name this something else, maybe just spell it out

numpy_datetime or numpy_compat

@jreback
Copy link
Contributor

jreback commented Oct 15, 2017

np_datetime would be consistent

@jbrockmendel
Copy link
Member Author

Changing to np_datetime, though with some misgivings about the exact name match.

@jbrockmendel
Copy link
Member Author

suggestions for moving this down the field?

@jreback
Copy link
Contributor

jreback commented Oct 21, 2017

suggestions for moving this down the field?

what do you mean?

@jbrockmendel
Copy link
Member Author

what do you mean?

The circleci errors are in test_grouped_box_multiple_axes and test_boxplot_legacy, which I think are unrelated, and the travis error is a flake8 complaint that is pretty unavoidable. Not sure what needs to be done to get this approved+merged.

@jreback
Copy link
Contributor

jreback commented Oct 21, 2017

The circleci errors are in test_grouped_box_multiple_axes and test_boxplot_legacy, which I think are unrelated, and the travis error is a flake8 complaint that is pretty unavoidable. Not sure what needs to be done to get this approved+merged.

did you rebase on master? these are working ok there.

@jbrockmendel
Copy link
Member Author

did you rebase on master? these are working ok there.

Looks like that fixed the CircleCI error, good call. Looks like a build stalled on Travis; just took another swing at fixing the flake8 error and re-pushed. With a little luck we'll get to all-green.

@jreback
Copy link
Contributor

jreback commented Oct 21, 2017

fyi i always keep master up to date and anytime i push i rebase
makes it easier

@@ -58,6 +57,9 @@ from datetime cimport (
from datetime import timedelta, datetime
from datetime import time as datetime_time

from tslibs.np_datetime cimport check_dts_bounds as _check_dts_bounds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason for the rename?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... because you told me to change it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the de-privatize is what I am referring. I don't recall asking for that and its not necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. De-privatized because it isn't actually private when is this module. But I'll revert it anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

de-privatize here (you did elsewhere)

# -*- coding: utf-8 -*-
# cython: profile=False

from numpy cimport int64_t, int32_t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could actually just call this module util i think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid name overlap with the existing libs/src/utilmodule. Also because in the dev branch I've ported src/util to a pure-cython (no C deps--> much simpler setup.py) tslibs.util. Don't want to get those mixed up.

My first choice is still the original npy_dtime, since np_datetime overlaps with the existing libs/src/datetime/np_datetime files.

pass


cdef inline _check_dts_bounds(pandas_datetimestruct *dts):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this should be de-privatized

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do it since that's how it was before: #17805 (comment)

@jorisvandenbossche jorisvandenbossche added this to the 0.22.0 milestone Oct 24, 2017
@jbrockmendel jbrockmendel mentioned this pull request Oct 27, 2017
59 tasks
@@ -58,6 +57,9 @@ from datetime cimport (
from datetime import timedelta, datetime
from datetime import time as datetime_time

from tslibs.np_datetime cimport check_dts_bounds as _check_dts_bounds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

de-privatize here (you did elsewhere)

int32_t month, day, hour, min, sec, us, ps, as


cdef check_dts_bounds(pandas_datetimestruct *dts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need a void here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, saw your note, pls add to the list to add this as void (if possible / needed), not sure.


cdef check_dts_bounds(pandas_datetimestruct *dts)

cdef int64_t dtstruct_to_dt64(pandas_datetimestruct* dts) nogil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where these nogil before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The analogous src/datetime functions are, yes.

pass


cdef inline check_dts_bounds(pandas_datetimestruct *dts):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void?

iresult[i] = dtstruct_to_dt64(&dts)
try:
check_dts_bounds(&dts)
except ValueError:
if is_coerce:
iresult[i] = NPY_NAT
continue
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove the else (as we have a continue)

setup.py Outdated
@@ -493,6 +494,9 @@ def pxd(name):
'pxdfiles': ['_libs/src/util', '_libs/lib'],
'depends': tseries_depends,
'sources': npdt_srces},
'_libs.tslibs.np_datetime': {'pyxfile': '_libs/tslibs/np_datetime',
'depends': tseries_depends[:2],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather be explict here rather than depend on the ordering (you can also make a ctseries_depends or whatever and use that too).

setup.py Outdated
@@ -493,6 +494,9 @@ def pxd(name):
'pxdfiles': ['_libs/src/util', '_libs/lib'],
'depends': tseries_depends,
'sources': npdt_srces},
'_libs.tslibs.np_datetime': {'pyxfile': '_libs/tslibs/np_datetime',
'depends': tseries_depends[:2],
'sources': npdt_srces},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just spell things out, np_datetime_sources is much more readable (and change all references)

@jreback jreback added the Datetime Datetime data dtype label Oct 28, 2017


cdef inline void check_dts_bounds(pandas_datetimestruct *dts):
"""Returns True if an error needs to be raised"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string, and change this (which is wrong), it will just raise if there is an oob date

dts.day, dts.hour,
dts.min, dts.sec)
raise OutOfBoundsDatetime(
'Out of bounds nanosecond timestamp: %s' % fmt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use {} in formatting. I would also pass dts to the OutOfBounds Constructor and format it there (a bit more idiomatic I think).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I'll implement that half of this that I know how to do; will work on my .format-fu and follow-up with the rest.

@jbrockmendel
Copy link
Member Author

Adding the "void" to check_dts_bounds caused a bunch of errors because exceptions were ignored (i.e. no except -1). The current implementation in tslib doesn't have the void, so I just reverted that.

@jreback
Copy link
Contributor

jreback commented Oct 29, 2017

some linting errors.

@jreback jreback merged commit a355ed2 into pandas-dev:master Oct 29, 2017
@jreback
Copy link
Contributor

jreback commented Oct 29, 2017

thanks!

peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request Oct 31, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
@jbrockmendel jbrockmendel deleted the tslibs-npy_dtime2 branch December 8, 2017 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants