Skip to content

standalone implementation of ccalendar #18540

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Dec 7, 2017

Conversation

jbrockmendel
Copy link
Member

For the moment only updated cimports in fields.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@codecov
Copy link

codecov bot commented Nov 28, 2017

Codecov Report

Merging #18540 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18540      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.03%     
==========================================
  Files         164      164              
  Lines       49801    49801              
==========================================
- Hits        45494    45484      -10     
- Misses       4307     4317      +10
Flag Coverage Δ
#multiple 89.13% <ø> (-0.01%) ⬇️
#single 40.81% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.61% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 88ab693...d91ec05. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 28, 2017

Codecov Report

Merging #18540 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18540      +/-   ##
==========================================
- Coverage   91.59%   91.57%   -0.02%     
==========================================
  Files         153      153              
  Lines       51212    51212              
==========================================
- Hits        46908    46898      -10     
- Misses       4304     4314      +10
Flag Coverage Δ
#multiple 89.43% <ø> (-0.01%) ⬇️
#single 40.67% <ø> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.59% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 695e893...6b245cf. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Nov 28, 2017

what PR was this originally implemented in?

@jreback jreback added Clean Datetime Datetime data dtype labels Nov 28, 2017

@cython.wraparound(False)
@cython.boundscheck(False)
cpdef monthrange(int64_t year, Py_ssize_t month):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this as a function that returns a tuple. its it simply better to split into 2, e.g. monthday and dayofweek? its 2 functions, but then these become all nogil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those two separate functions also exist here. There's not much reason to retain monthrange, you're right.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you changing this in this PR?


y -= m < 3
day = (y + y / 4 - y / 100 + y / 400 + sakamoto_arr[m - 1] + d) % 7
# convert to python day
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a link to python dayofweek docs (e.g. its sunday 0, -6)

# convert to python day
return (day + 6) % 7


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on get_week_of_year you have boundscheck/wraparound but not here, do we need in either?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment about is_leapyear? If so, is_leapyear doesn't do any array lookups, so those decorators are irrelevant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea

@@ -0,0 +1,14 @@
# -*- coding: utf-8 -*-
# cython: profile=False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this only used in fields? no-where else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only implemented in fields. Haven't updated others yet. A couple places in offsets, a few more fastpaths in Timestamp.

@jbrockmendel
Copy link
Member Author

what PR was this originally implemented in?

#18489, but that was reverted as out-of-scope

@jbrockmendel
Copy link
Member Author

Just pushed a commit that uses ccalendar functions to implement fastpaths for some of the Timestamp properties (ones not covered by #18539)

master

In [2]: ts = pd.Timestamp.now()
In [3]: %timeit ts.is_month_end
The slowest run took 34.08 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 26.8 µs per loop

PR

In [2]: ts = pd.Timestamp.now()
In [3]: %timeit ts.is_month_end
The slowest run took 27.70 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.19 µs per loop


@cython.wraparound(False)
@cython.boundscheck(False)
cpdef monthrange(int64_t year, Py_ssize_t month):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you changing this in this PR?



@cython.wraparound(False)
@cython.boundscheck(False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you changing the boundscheck/wraparound in this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hadn't planned on it. Are they harmful?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob not harmful but extra code and confusing. pls change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well in get_week_of_year there are array lookups, so disabling these checks should be perf-relevant, shouldn't they?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are the array lookups?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 174 lookup in month_offset

@jbrockmendel
Copy link
Member Author

are you changing [monthrange] in this PR?

Hadn't planned on it. Even if it shouldn't be used, ATM it still is used in a few places. Besides I wrote a really nice docstring.

@jreback
Copy link
Contributor

jreback commented Nov 29, 2017

Hadn't planned on it. Even if it shouldn't be used, ATM it still is used in a few places. Besides I wrote a really nice docstring.

what changed? we had this discussion about why returning a tuple is a bad idea

@cython.wraparound(False)
@cython.boundscheck(False)
@cython.cdivision
cdef int dayofweek(int y, int m, int d) nogil:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you are duplicating things from np_datetime.c ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you are duplicating things from np_datetime.c ?

@jorisvandenbossche think porting less than duplicating. The np_datetime.c version is never actually used in src/datetime (will remove old version in follow-up). As long as perf doesn't take a hit, the default should by python/cython right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't understand your explanation. Previously it was imported it from np_datetime.c, and you reimplemented it here and changed that import. That seems like duplication rather than porting?

dayofweek is maybe not used in np_datetime.c itself, but is_leapyear is, so that one cannot just be removed there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like duplication rather than porting?

You're right. The intention is to remove the np_datetime.c version in a follow-up PR.

is_leapyear will be duplicated because it is used in np_datetime.c, as you noted. The idea is that np_datetime.c* is a heavy dependency (in practice tends to come with tslibs.np_datetime and np_datetime_strings.c) which can be replaced with a much lighter (and more pythonic) ccalendar dependency in some cases.

* The name overlap between np_datetime.c and tslibs.np_datetime is not my favorite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather include extra deps than duplicate code. mark this for a followup.

@jbrockmendel
Copy link
Member Author

what changed? we had this discussion about why returning a tuple is a bad idea

Will remove.



@cython.wraparound(False)
@cython.boundscheck(False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are the array lookups?

@jreback
Copy link
Contributor

jreback commented Dec 7, 2017

rebase and will look again

@cython.wraparound(False)
@cython.boundscheck(False)
@cython.cdivision
cdef int dayofweek(int y, int m, int d) nogil:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather include extra deps than duplicate code. mark this for a followup.

@jreback jreback added this to the 0.22.0 milestone Dec 7, 2017
@jreback jreback merged commit 3e506a3 into pandas-dev:master Dec 7, 2017
@jbrockmendel jbrockmendel deleted the tslibs-ccalendar branch December 7, 2017 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants