
Not tested: Period.strftime and PeriodIndex.strftime with non-ascii char present in the formatting string #46468



Closed
smarie opened this issue Mar 22, 2022 · 2 comments · Fixed by #46405
Labels
Bug, Period (Period data type), Unicode (Unicode strings)

Comments

@smarie
Contributor

smarie commented Mar 22, 2022

EDIT: As I found out, this was not a bug in the main branch: I introduced it accidentally in #46405.
I have therefore renamed the ticket to "Not tested", meaning that the current test suite does not cover this case.

Pandas version checks

  • I have checked that this issue has not already been reported.

Reproducible Example

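import locale

import pandas as pd

# Switch to a French locale; it must be installed on the system,
# otherwise locale.setlocale raises locale.Error
locale.setlocale(locale.LC_ALL, "fr_FR")
per = pd.Period("2018-03-11 13:00", freq="H")
# The literal non-ASCII character must come back unchanged
assert per.strftime("%Y é") == "2018 é"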

Issue Description

When the formatting string passed to Period.strftime (or PeriodIndex.strftime) contains a non-ASCII character, the result may be corrupted. This is most probably related to an encoding error.
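The suspected encoding error can be illustrated without pandas. This is a hypothetical sketch, not pandas' code: it assumes that a C-level strftime copies the literal bytes of the format string unchanged, and that the active locale encodes text as ISO-8859-1 (Latin-1). The helper names `simulate_strftime` and `decode_with` are made up for this example. Encoding the format with one codec and decoding the result with another breaks `é`, while using the locale's encoding on both sides round-trips it.

```python
def simulate_strftime(fmt: str, year: int, encoding: str) -> bytes:
    """Hypothetical stand-in for a C strftime call: the format string is
    encoded to bytes and %Y is replaced; all other bytes are copied as-is."""
    raw_fmt = fmt.encode(encoding)
    return raw_fmt.replace(b"%Y", str(year).encode("ascii"))


def decode_with(raw: bytes, encoding: str) -> str:
    """Decode the C-level result back into a Python str."""
    return raw.decode(encoding)


locale_encoding = "latin-1"  # assumed encoding of the active locale

raw = simulate_strftime("%Y é", 2018, locale_encoding)

# Decoding with the locale's own encoding round-trips the literal "é".
good = decode_with(raw, locale_encoding)

# Decoding the Latin-1 bytes as UTF-8 instead fails outright...
try:
    decode_with(raw, "utf-8")
except UnicodeDecodeError:
    utf8_failed = True
else:
    utf8_failed = False

# ...and the opposite mismatch (UTF-8 in, Latin-1 out) silently corrupts it.
corrupted = decode_with(simulate_strftime("%Y é", 2018, "utf-8"), locale_encoding)

print(good)       # 2018 é
print(corrupted)  # 2018 Ã©
```

In real code the encoding would come from the current locale rather than a constant; in Python, `locale.getpreferredencoding(False)` reports it. Decoding with the current locale is the role the fix commit attributes to `char_to_string_locale`, although its actual implementation may differ from this sketch.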

Expected Behavior

The assertion should pass without error.

@smarie smarie added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Mar 22, 2022
@smarie
Contributor Author

smarie commented Mar 22, 2022

This issue was extracted from #46319 for clarity.

@smarie
Contributor Author

smarie commented Mar 22, 2022

I just realized that this bug was actually introduced in my PR #46405.
There was no test for this case, so adding one is the only new thing that can be done.

  • Renamed accordingly
  • Edited original post so that it describes the test to create

@smarie smarie changed the title BUG: Period.strftime and PeriodIndex.strftime output incorrect results when non-ascii char is present in the formatting string Not tested: Period.strftime and PeriodIndex.strftime with non-ascii char present in the formatting string Mar 22, 2022
@jreback jreback added this to the 1.5 milestone Mar 22, 2022
@jreback jreback added the Unicode (Unicode strings) and Period (Period data type) labels and removed the Needs Triage (issue that has not been reviewed by a pandas team member) label Mar 22, 2022
@mroeschke mroeschke removed this from the 1.5 milestone Aug 15, 2022
mroeschke pushed a commit that referenced this issue Sep 8, 2022
…specific directive is used (#46405)

* Added test representative of #46319. Should fail on CI

* Added a gha worker with non-UTF-8 zh_CN encoding

* Attempt to fix the encoding so that locale works

* Added the fix, but not using it for now, until CI is able to reproduce the issue.

* Crazy idea: maybe simply removing the .utf8 modifier will use the right encoding !

* Hopefully fixing the locale not available error

* Now simply generating the locale, not updating the ubuntu one

* Trying to install the locale without enabling it

* Stupid mistake

* Testing the optional locale generator condition

* Put back all runners

* Added whatsnew

* Now using the fix

* As per code review: moved locale-switching fixture `overridden_locale` to conftest

* Flake8

* Added comments on the runner

* Added a non-utf8 locale in the `it_IT` runner. Added the zh_CN.utf8 locale in the tests

* Improved readability of fixture `overridden_locale` as per code review

* Added two comments on default encoding

* Fixed #46319 by adding a new `char_to_string_locale` function in the `tslibs.util` module, able to decode char* using the current locale.

* As per code review: modified the test to contain non-utf8 chars. Fixed the resulting issue.

* Split the test in two for clarity

* Fixed test and flake8 error.

* Updated whatsnew to ref #46468 . Updated test name

* Removing wrong whatsnew bullet

* Nitpick on whatsnew as per code review

* Fixed build error rst directive

* Names incorrectly reverted in last merge commit

* Fixed test_localization so that #46595 can be demonstrated on windows targets (even if today these do not run on windows targets, see #46597)

* Fixed `tm.set_locale` context manager, it could error and leak when category LC_ALL was used. Fixed #46595

* Removed the fixture as per code review, and added corresponding parametrization in tests.

* Dummy mod to trigger CI again

* reverted dummy mod

* Attempt to fix the remaining error on the numpy worker

* Fixed issue in `_from_ordinal`

* Added asserts to try to understand

* Reverted debugging asserts and applied fix for numpy repeat from #47670.

* Fixed the last issue on numpy dev: a TypeError message had changed

* Code review: Removed `EXTRA_LOC`

* Code review: removed commented line

* Code review: reverted out of scope change

* Code review: reverted out of scope change

* Fixed unused import

* Fixed revert mistake

* Moved whatsnew to 1.6.0

* Update pandas/tests/io/parser/test_quoting.py

Co-authored-by: Sylvain MARIE <[email protected]>
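The `tm.set_locale` fix listed in the commit above (the context manager "could error and leak when category LC_ALL was used") is easier to follow with a stdlib-only sketch. `switch_locale` is a hypothetical helper, not pandas' implementation. One relevant fact: `locale.getlocale()` does not accept `LC_ALL`, so the sketch saves the previous setting by calling `locale.setlocale(category)` with no new value (a pure query), and restores it in a `finally` block so the locale cannot leak when the body raises.

```python
import locale
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def switch_locale(new_locale: str, category: int = locale.LC_ALL) -> Iterator[str]:
    """Hypothetical helper: temporarily switch `category` to `new_locale`.

    Calling locale.setlocale(category) without a locale only queries the
    current setting and, unlike locale.getlocale(), also works for LC_ALL.
    """
    previous = locale.setlocale(category)
    # If the locale is not installed, this raises locale.Error before
    # anything has changed, so there is nothing to restore.
    applied = locale.setlocale(category, new_locale)
    try:
        yield applied
    finally:
        # Always restore, even if the body of the with-statement raised.
        locale.setlocale(category, previous)


if __name__ == "__main__":
    before = locale.setlocale(locale.LC_ALL)
    try:
        with switch_locale("C") as current:
            print("inside:", current)
            raise RuntimeError("simulated test failure")
    except RuntimeError:
        pass
    print("restored:", locale.setlocale(locale.LC_ALL) == before)
```

The "C" locale is used in the demo because it is always available; a test for this issue would instead switch to locales such as `fr_FR` or a non-UTF-8 `zh_CN` and skip when they are not installed.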
noatamir pushed a commit to noatamir/pandas that referenced this issue Nov 9, 2022
…specific directive is used (pandas-dev#46405)

* Added test representative of pandas-dev#46319. Should fail on CI

* Added a gha worker with non utf 8 zh_CN encoding

* Attempt to fix the encoding so that locale works

* Added the fix, but not using it for now, until CI is able to reproduce the issue.

* Crazy idea: maybe simply removing the .utf8 modifier will use the right encoding !

* Hopefully fixing the locale not available error

* Now simply generating the locale, not updating the ubuntu one

* Trying to install the locale without enabling it

* Stupid mistake

* Testing the optional locale generator condition

* Put back all runners

* Added whatsnew

* Now using the fix

* As per code review: moved locale-switching fixture `overridden_locale` to conftest

* Flake8

* Added comments on the runner

* Added a non-utf8 locale in the `it_IT` runner. Added the zh_CN.utf8 locale in the tests

* Improved readability of fixture `overridden_locale` as per code review

* Added two comments on default encoding

* Fixed pandas-dev#46319 by adding a new `char_to_string_locale` function in the `tslibs.util` module, able to decode char* using the current locale.

* As per code review: modified the test to contain non-utf8 chars. Fixed the resulting issue.

* Split the test in two for clarity

* Fixed test and flake8 error.

* Updated whatsnew to ref pandas-dev#46468 . Updated test name

* Removing wrong whatsnew bullet

* Nitpick on whatsnew as per code review

* Fixed build error rst directive

* Names incorrectly reverted in last merge commit

* Fixed test_localization so that pandas-dev#46595 can be demonstrated on windows targets (even if today these do not run on windows targets, see pandas-dev#46597)

* Fixed `tm.set_locale` context manager, it could error and leak when category LC_ALL was used. Fixed pandas-dev#46595

* Removed the fixture as per code review, and added corresponding parametrization in tests.

* Dummy mod to trigger CI again

* reverted dummy mod

* Attempt to fix the remaining error on the numpy worker

* Fixed issue in `_from_ordinal`

* Added asserts to try to understand

* Reverted debugging asserts and applied fix for numpy repeat from pandas-dev#47670.

* Fixed the last issue on numpy dev: a TypeError message had changed

* Code review: Removed `EXTRA_LOC`

* Code review: removed commented line

* Code review: reverted out of scope change

* Code review: reverted out of scope change

* Fixed unused import

* Fixed revert mistake

* Moved whatsnew to 1.6.0

* Update pandas/tests/io/parser/test_quoting.py

Co-authored-by: Sylvain MARIE <[email protected]>
4 participants