-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: to_datetime support iso week year (16607) #25541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Just some things from the previous PR. Up to date with master. Moved to 0.25 with up to date description. Added docstrings to the function calc_julian. Things needed to be addressed
|
Codecov Report
@@ Coverage Diff @@
## master #25541 +/- ##
==========================================
- Coverage 91.75% 91.75% -0.01%
==========================================
Files 173 173
Lines 52960 52960
==========================================
- Hits 48596 48595 -1
- Misses 4364 4365 +1
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #25541 +/- ##
==========================================
- Coverage 91.25% 91.25% -0.01%
==========================================
Files 172 172
Lines 52977 52977
==========================================
- Hits 48343 48342 -1
- Misses 4634 4635 +1
Continue to review full report at Codecov.
|
@RjLi13 : With the branch rebased, are you now by any chance able to reproduce any of the test failures locally using the override methods that we discussed in the old PR? |
@gfyoung Yes I get the same error as what the CI checks, using the |
Okay, that's good. At least you can debug locally this time around. |
The code was not passing the CI checks for the locale builds because the regex could not match the correct string with the pattern generated from the passed format, and it threw a different error ending the code before it got to the correct error. See this code here: pandas/pandas/_libs/tslibs/strptime.pyx Lines 152 to 158 in 947c346
My tests for failures were not expecting to have that error thrown (see test_tools.py changes) As for why the regex could not match the pattern, my guess is that the locale could not detect English, given the string to search for was '1999 Monday' and the regex pattern for 'zh_CN.utf8' was |
As for the fix, I'm not quite sure what to do. The code works correctly, independent of the locale, because the format wanted for iso_time is %G-%V-%u, which given this snippet pandas/pandas/_libs/tslibs/strptime.pyx Lines 546 to 574 in 947c346
shows the regex is static. The iso code references the strptime function written for cpython here quite liberally. My solution was essentially to check for the locales currently being overridden, and not run the ValueError tests for them. Alternative solutions include getting rid of checking the message matches for the ValueError, since both throw ValueErrors, or deleting the tests that use a format dependent on locale. I'm not sure how to proceed further. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loooks pretty good. ping when updated and green.
pandas/_libs/tslibs/strptime.pyx
Outdated
ISO weeks start on Mondays, with week 01 being the week containing 4 Jan. | ||
ISO week days range from 1 (Monday) to 7 (Sunday). | ||
|
||
:param iso_year: the year taken from format %G |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
def test_ValueError_iso_week_year(self, msg, s, _format): | ||
# See GH#16607 | ||
if locale.getlocale() != ('zh_CN', 'UTF-8') and \ | ||
locale.getlocale() != ('it_IT', 'UTF-8'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont use \
on the break, use parens instead.
name this test: test_error_iso_week_year
@jreback I addressed comments and rebased my branch to latest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RjLi13 lgtm. just some small comments. I think we need to put a link to the stftime docs in pd.to_datetime doc-string (like we do in doc/source/user_guide/timeseries.rst). can do here or in a followup.
# However, as discussed on PR#25541, overriding the locale | ||
# causes a different error to be thrown due to the format being | ||
# locale specific, but the test data is in english. | ||
# Thus, a hack I did was to only run this test if locale was not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove from here down (the comments) as it is enough by here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RjLi13 : Did you address this comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the sentence starting with: "Thus, a hack I did was..." necessary? Comments regarding first-person events are generally omitted, or should be reworded to not use first-person.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gfyoung I can refactor to not use first person. If you have something in mind already let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you could refactor to not use first person, that probably would be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
RR (Review Ready) |
thanks @RjLi13 really nice patch! keep em coming! |
* origin/master: DOC: clean bug fix section in whatsnew (pandas-dev#25792) DOC: Fixed PeriodArray api ref (pandas-dev#25526) Move locale code out of tm, into _config (pandas-dev#25757) Unpin pycodestyle (pandas-dev#25789) Add test for rdivmod on EA array (GH23287) (pandas-dev#24047) ENH: Support datetime.timezone objects (pandas-dev#25065) Cython language level 3 (pandas-dev#24538) API: concat on sparse values (pandas-dev#25719) TST: assert_produces_warning works with filterwarnings (pandas-dev#25721) make core.config self-contained (pandas-dev#25613) CLN: replace %s syntax with .format in pandas.io.parsers (pandas-dev#24721) TST: Check pytables<3.5.1 when skipping (pandas-dev#25773) DOC: Fix typo in docstring of DataFrame.memory_usage (pandas-dev#25770) Replace dicts with OrderedDicts in groupby aggregation functions (pandas-dev#25693) TST: Fixturize tests/frame/test_missing.py (pandas-dev#25640) DOC: Improve the docsting of Series.iteritems (pandas-dev#24879) DOC: Fix function name. (pandas-dev#25751) Implementing iso_week_year support for to_datetime (pandas-dev#25541) DOC: clarify corr behaviour when using a callable (pandas-dev#25732) remove unnecessary check_output (pandas-dev#25755) # Conflicts: # doc/source/whatsnew/v0.25.0.rst
git diff upstream/master -u -- "*.py" | flake8 --diff
I messed up the rebase of the old PR #24844 and started this again on a clean state. Reviewers please review on this PR!
From old PR
I found the issue stagnant, but since the fix was already in place, I manually took the code @rosygupta made and applied it onto latest master. Rebase wasn't an option that I found would work since the file in question has been split into multiple. Let me know what else needs to be updated.