Skip to content

ENH: to_datetime support iso week year (16607) Updated #24844

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 220 commits into from

Conversation

RjLi13
Copy link
Contributor

@RjLi13 RjLi13 commented Jan 20, 2019

I found the issue stagnant, but since the fix was already in place, I manually took the code @rosygupta made and applied it onto latest master. Rebase wasn't an option that I found would work since the file in question has been split into multiple. Let me know what else needs to be updated.

Question: Given the code was created from a reference of cpython https://github.com/python/cpython/blob/6f0eb93183519024cb360162bdd81b9faec97ba6/Lib/_strptime.py#L321
I'm wondering why we can't use cpython's strptime directly and have to make our own implementation?

@codecov
Copy link

codecov bot commented Jan 20, 2019

Codecov Report

Merging #24844 into master will decrease coverage by 49.48%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #24844       +/-   ##
===========================================
- Coverage   92.39%    42.9%   -49.49%     
===========================================
  Files         166      166               
  Lines       52378    52378               
===========================================
- Hits        48393    22472    -25921     
- Misses       3985    29906    +25921
Flag Coverage Δ
#multiple ?
#single 42.9% <ø> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.35%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.17%) ⬇️
pandas/io/sas/sas_xport.py 0% <0%> (-90.15%) ⬇️
pandas/core/tools/numeric.py 10.44% <0%> (-89.56%) ⬇️
... and 123 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4458c1...f3183f0. Read the comment docs.

@codecov
Copy link

codecov bot commented Jan 20, 2019

Codecov Report

Merging #24844 into master will decrease coverage by 0.62%.
The diff coverage is 92.59%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24844      +/-   ##
==========================================
- Coverage   92.38%   91.75%   -0.63%     
==========================================
  Files         166      173       +7     
  Lines       52398    52960     +562     
==========================================
+ Hits        48406    48595     +189     
- Misses       3992     4365     +373
Flag Coverage Δ
#multiple 90.33% <92.5%> (-0.48%) ⬇️
#single 41.71% <29.35%> (-1.18%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/reshape.py 99.35% <ø> (ø) ⬆️
pandas/core/dtypes/base.py 100% <ø> (ø) ⬆️
pandas/core/accessor.py 98.79% <ø> (ø) ⬆️
pandas/core/groupby/ops.py 94.15% <ø> (-2.58%) ⬇️
pandas/core/reshape/tile.py 97.67% <ø> (+2.73%) ⬆️
pandas/io/html.py 92.66% <ø> (ø) ⬆️
pandas/core/reshape/melt.py 97.5% <ø> (ø) ⬆️
pandas/core/config.py 87.04% <ø> (ø) ⬆️
pandas/io/parquet.py 84.61% <ø> (ø) ⬆️
pandas/core/groupby/__init__.py 100% <ø> (ø) ⬆️
... and 108 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2b16e2e...f73952d. Read the comment docs.

@sthagen
Copy link

sthagen commented Jan 20, 2019

Failures in travisci build for python 3.5?
https://travis-ci.org/pandas-dev/pandas/jobs/481950579 maybe this reduces in the project codecpv down to zero? Hth

@gfyoung gfyoung added Compat pandas objects compatability with Numpy or Python functions Datetime Datetime data dtype labels Jan 20, 2019
@gfyoung gfyoung requested a review from jreback January 20, 2019 20:01
@RjLi13
Copy link
Contributor Author

RjLi13 commented Jan 21, 2019

Hey so I am currently stuck on making the checks pass. From what I can understand from the CI checks, it seems on Linux builds the test_ValueError_iso_week_year is returning a different error message than expected. Somehow the error message is being caught in

raise ValueError("time data %r does not match "
instead of
if iso_year != -1:
and I'm unclear why this happens when a year ago the changes did pass that time's tests.

@gfyoung
Copy link
Member

gfyoung commented Jan 21, 2019

a year ago the changes did pass that time's tests.

@RjLi13 : Our codebase changes relatively quickly, so a year is essentially an eternity in our world. A lot could have changed since then. 🙂 Here's what I would do:

  • Take the code for your test setup (and the particular pytest case that it's failing) and run the code for that test in your Python console.

  • Look at the stack-trace to see how the logic is being traced. It could be that the logic isn't even getting to your error condition (because this strptime error happens earlier), or it is, but erroneously passing over it, which could indicate a bug in your implementation.

If you're having issues, feel free to share the stacktrace, and we can have a look.

@RjLi13
Copy link
Contributor Author

RjLi13 commented Jan 22, 2019

@gfyoung So I have a Mac running python3.7 and the Azure pipelines show that only Linux builds seems to fail. I don't know what OS travisCI runs on but I don't have this specific environment currently on my machine

Python: 3.5
JOB="3.6, locale" ENV_FILE="ci/deps/travis-36-locale.yaml" PATTERN="((not slow and not network) or (single and db))" LOCALE_OVERRIDE="zh_CN.UTF-8"

Because of this I can't reproduce the test failures.

Could you give me some steps on how to recreate such an env or reproduce the failures?

@gfyoung
Copy link
Member

gfyoung commented Jan 22, 2019

Could you give me some steps on how to recreate such an env or reproduce the failures?

Generally, such failures can be reproduced locally without having to follow the exact setup for the test. Thus, try running the code in your development environment.

If you can't reproduce, ping us again.

@RjLi13
Copy link
Contributor Author

RjLi13 commented Jan 22, 2019

Here's my console output on my dev environment. I followed the instructions to run my tests from the doc (they contain 'iso_week_year')

(pandas-dev) Ruijings-MacBook-Pro:pandas ruijing$ pytest pandas/tests/indexes/datetimes/test_tools.py -v -k iso_week_year
======================================= test session starts ========================================
platform darwin -- Python 3.7.2, pytest-4.1.1, py-1.7.0, pluggy-0.8.0 -- /Users/ruijing/anaconda3/envs/pandas-dev/bin/python
cachedir: .pytest_cache
hypothesis profile 'ci' -> deadline=500, timeout=unlimited, suppress_health_check=[HealthCheck.too_slow], database=DirectoryBasedExampleDatabase('/Users/ruijing/pandas/.hypothesis/examples')
rootdir: /Users/ruijing/pandas, inifile: setup.cfg
plugins: xdist-1.26.0, forked-0.2, cov-2.6.1, hypothesis-4.0.1
collected 379 items / 363 deselected                                                               

pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_to_datetime_iso_week_year_format[2015-1-1-%G-%V-%u-dt0] PASSED [  6%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_to_datetime_iso_week_year_format[2015-1-4-%G-%V-%u-dt1] PASSED [ 12%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_to_datetime_iso_week_year_format[2015-1-7-%G-%V-%u-dt2] PASSED [ 18%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_ValueError_iso_week_year[ISO week directive '%V' must be used with the ISO year directive '%G' and a weekday directive '%A', '%a', '%w', or '%u'.-1999 50-%Y %V] PASSED [ 25%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_ValueError_iso_week_year[ISO year directive '%G' must be used with the ISO week directive '%V' and a weekday directive '%A', '%a', '%w', or '%u'.-1999 51-%G %V] PASSED [ 31%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_ValueError_iso_week_year[ISO year directive '%G' must be used with the ISO week directive '%V' and a weekday directive '%A', '%a', '%w', or '%u'.-1999 Monday-%G %A] PASSED [ 37%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_ValueError_iso_week_year[ISO year directive '%G' must be used with the ISO week directive '%V' and a weekday directive '%A', '%a', '%w', or '%u'.-1999 Mon-%G %a] PASSED [ 43%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_ValueError_iso_week_year[ISO year directive '%G' must be used with the ISO week directive '%V' and a weekday directive '%A', '%a', '%w', or '%u'.-1999 6-%G %w] PASSED [ 50%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_ValueError_iso_week_year[ISO year directive '%G' must be used with the ISO week directive '%V' and a weekday directive '%A', '%a', '%w', or '%u'.-1999 6-%G %u] PASSED [ 56%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_ValueError_iso_week_year[ISO year directive '%G' must be used with the ISO week directive '%V' and a weekday directive '%A', '%a', '%w', or '%u'.-2051-%G] PASSED [ 62%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_ValueError_iso_week_year[Day of the year directive '%j' is not compatible with ISO year directive '%G'. Use '%Y' instead.-1999 51 6 256-%G %V %u %j] PASSED [ 68%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_ValueError_iso_week_year[ISO week directive '%V' is incompatible with the year directive '%Y'. Use the ISO year '%G' instead.-1999 51 Sunday-%Y %V %A] PASSED [ 75%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_ValueError_iso_week_year[ISO week directive '%V' is incompatible with the year directive '%Y'. Use the ISO year '%G' instead.-1999 51 Sun-%Y %V %a] PASSED [ 81%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_ValueError_iso_week_year[ISO week directive '%V' is incompatible with the year directive '%Y'. Use the ISO year '%G' instead.-1999 51 1-%Y %V %w] PASSED [ 87%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_ValueError_iso_week_year[ISO week directive '%V' is incompatible with the year directive '%Y'. Use the ISO year '%G' instead.-1999 51 1-%Y %V %u] PASSED [ 93%]
pandas/tests/indexes/datetimes/test_tools.py::TestToDatetime::test_ValueError_iso_week_year[ISO week directive '%V' must be used with the ISO year directive '%G' and a weekday directive '%A', '%a', '%w', or '%u'.-20-%V] PASSED [100%]

============================ 16 passed, 363 deselected in 0.17 seconds =============================
(pandas-dev) Ruijings-MacBook-Pro:pandas ruijing$ 

In addition, I ran pytest pandas and pytest pandas/tests/indexes/datetimes/test_tools.py -v with no failures reported. Please advise next steps.

@gfyoung
Copy link
Member

gfyoung commented Jan 22, 2019

@RjLi13 : Cool! Thanks for doing that. Alright, so what I'm observing is that all of the relevant failures occurred on Python 3.x environments with a locale override, which are:

  • zh_CN.UTF-8
  • it_IT.UTF-8

Do you mind overriding your locale first, double check that the override works, and then run those same test commands again?

week_starts_Mon)
if julian == -1 and weekday != -1:
if week_of_year != -1:
week_starts_Mon = True if week_of_year_start == 0 else False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just week_starts_Mon = week_of_year_start == 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

@RjLi13 RjLi13 closed this Jan 27, 2019
@RjLi13 RjLi13 reopened this Jan 27, 2019
@RjLi13
Copy link
Contributor Author

RjLi13 commented Jan 27, 2019

@gfyoung can you explain how to override the locale? I'm not quite understanding what you linked.

Edit: From my understanding, I just run the 'ci/run_test.sh` script with -n and one of the locale options you provided?

@gfyoung
Copy link
Member

gfyoung commented Jan 27, 2019

@RjLi13 : Before you run pytest, you should run those three commands (lines 16 - 18 of the highlight) in your terminal in order to:

  1. Change locale in your session
  2. Confirm that locale was changed

Just do that and then run your original pytest command.

@RjLi13
Copy link
Contributor Author

RjLi13 commented Jan 27, 2019

@gfyoung I'm having trouble having pandas detect my system locale ovveride. Running lines 16-17 did change my system locale, I can confirm that with the locale command. But running pandas.get_option("display.encoding") still gives me 'utf-8'

@gfyoung
Copy link
Member

gfyoung commented Jan 27, 2019

But running pandas.get_option("display.encoding") still gives me 'utf-8'

Hmmm...I see...since it is successfully changed after you check with the locale command, can you try running tests anyway and see what happens?

@RjLi13
Copy link
Contributor Author

RjLi13 commented Mar 5, 2019

Rebase gone wrong, not sure how to fix. I'm closing the PR and remaking a new one which I will reference here.

@RjLi13 RjLi13 closed this Mar 5, 2019
@RjLi13 RjLi13 deleted the pandas-rosy-new branch March 5, 2019 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

to_datetime should support ISO week year