-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fixed Unicode decoding error in Period.strftime
when a locale-specific directive is used
#46405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
mroeschke
merged 60 commits into
pandas-dev:main
from
smarie:feature/46319_locale_strftime_period
Sep 8, 2022
Merged
BUG: Fixed Unicode decoding error in Period.strftime
when a locale-specific directive is used
#46405
Changes from 28 commits
Commits
Show all changes
60 commits
Select commit
Hold shift + click to select a range
d177852
Added test representative of #46319. Should fail on CI
c09b76d
Added a gha worker with non utf 8 zh_CN encoding
9dff153
Attempt to fix the encoding so that locale works
2fa0736
Added the fix, but not using it for now, until CI is able to reproduc…
28a0921
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
6b85fb2
Crazy idea: maybe simply removing the .utf8 modifier will use the rig…
409d196
Hopefully fixing the locale not available error
4d09db9
Now simply generating the locale, not updating the ubuntu one
3c1b80c
Trying to install the locale without enabling it
4c687a8
Stupid mistake
e01e051
Testing the optional locale generator condition
a535249
Put back all runners
b4ad6dd
Added whatsnew
0eafd80
Now using the fix
b721238
As per code review: moved locale-switching fixture `overridden_locale…
cebed78
Flake8
a25f3a9
Added comments on the runner
d175c36
Added a non-utf8 locale in the `it_IT` runner. Added the zh_CN.utf8 l…
184e480
Improved readability of fixture `overridden_locale` as per code review
e5f3062
Added two comments on default encoding
22b0ae4
Fixed #46319 by adding a new `char_to_string_locale` function in the …
bbaa0c1
As per code review: modified the test to contain non-utf8 chars. Fixe…
f6857d6
Split the test in two for clarity
60c9e69
Fixed test and flake8 error.
db0e3cc
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
06a9567
Updated whatsnew to ref #46468 . Updated test name
411ccb7
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
e5ab44e
Removing wrong whatsnew bullet
smarie fe35aac
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
0790e72
Nitpick on whatsnew as per code review
d6914f4
Fixed build error rst directive
22c11e7
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
236a2a0
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
befa0a0
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
0105ac9
Names incorrectly reverted in last merge commit
a1f3773
Fixed test_localization so that #46595 can be demonstrated on windows…
0fcea6c
Fixed `tm.set_locale` context manager, it could error and leak when c…
7e18375
Removed the fixture as per code review, and added corresponding param…
a1c6d83
Dummy mod to trigger CI again
78ac13d
reverted dummy mod
3fc0e25
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
9ad0ee1
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
406d304
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
bc1b222
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
fcd2ce2
Attempt to fix the remaining error on the numpy worker
b69b074
Fixed issue in `_from_ordinal`
65c3a1d
Added asserts to try to understand
e5bca9f
Reverted debugging asserts and applied fix for numpy repeat from #47670.
fbfefec
Fixed the last issue on numpy dev: a TypeError message had changed
b64548b
Merge branch 'main' into feature/46319_locale_strftime_period
smarie 4d026cf
Code review: Removed `EXTRA_LOC`
cf3be80
Code review: removed commented line
8f65bd9
Code review: reverted out of scope change
7b414a9
Code review: reverted out of scope change
cb43fd5
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
59b4fe3
Fixed unused import
4950e06
Fixed revert mistake
14f854b
Moved whatsnew to 1.6.0
90c31cb
Update pandas/tests/io/parser/test_quoting.py
smarie 5d467e2
Merge branch 'main' into feature/46319_locale_strftime_period
smarie File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean the
Locale: it_IT.utf8
isnt working either?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends what you mean by "working". Do you refer to the bug #46319 or to something more general ?
UnicodeError
when usingPeriod.strftime
with non-utf8 locale #46319 happens only when the current locale uses a non-unicode encoding (no utf-8). So it does not happen in any of the workers before this PRzh_CN
locale, that has a non-utf-8 encoding.locale.setlocale
was raising an error (locale not available)sudo locale-gen <locale>
Does that answer your question ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was assumed setting the environment variables
would ensure the locale for the test run would not be the default (en_US.UTF-8) for the other locale builds.
So IIUC, this is necessary if setting a non-utf8 locale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, this is what I thought too.
Unfortunately when I tried setting
LANG
andLC_ALL
tozh_CN
(non-utf8), then there was a strange segmentation fault appearing (see #46405 (comment))So this is why I now do not set it through these env variables but I only install it.
Maybe the problem is that when the env variables are set, many side effects happen in the build chain, that I do not quite understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're setting the locale this way, can we remove setting the
LANG
andLC_ALL
environment variables here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I have no idea :) At least for the tests in this PR, yes these env variables are not used.
However these env vars are probably related to other locale-related tests, including being able to compile pandas on alternate locale, etc. It seems that in the CI those env variables are used to even add a specific line at the beginning of the init file of pandas ! (see
ci/setup_env.sh
)This seems a bit beyond the scope of this PR. However it is maybe worth keeping track of this, in a dedicated ticket.