BUG: UnicodeError when using Period.strftime with non-utf8 locale #46319
Comments
I'll try to fix it in #46116
It actually seems that Maybe we should rather use
…e string returned by `c_strftime`.
Note that this issue may also happen when plotting time series with matplotlib. Indeed, the x-axis label formatter seems to be using `Period.strftime`:
WARNING: C:\(...)\doc\examples\1_demo.py failed to execute correctly: Traceback (most recent call last):
File "C:\(...)\.nox\doc\lib\site-packages\sphinx_gallery\scrapers.py", line 378, in save_figures
rst = scraper(block, block_vars, gallery_conf)
File "C:\(...)\.nox\doc\lib\site-packages\sphinx_gallery\scrapers.py", line 171, in matplotlib_scraper
fig.savefig(image_path, **these_kwargs)
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\figure.py", line 3019, in savefig
self.canvas.print_figure(fname, **kwargs)
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\backend_bases.py", line 2319, in print_figure
result = print_method(
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\backend_bases.py", line 1648, in wrapper
return func(*args, **kwargs)
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\_api\deprecation.py", line 412, in wrapper
return func(*inner_args, **inner_kwargs)
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\backends\backend_agg.py", line 540, in print_png
FigureCanvasAgg.draw(self)
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\backends\backend_agg.py", line 436, in draw
self.figure.draw(self.renderer)
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\artist.py", line 73, in draw_wrapper
result = draw(artist, renderer, *args, **kwargs)
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\artist.py", line 50, in draw_wrapper
return draw(artist, renderer)
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\figure.py", line 2810, in draw
mimage._draw_list_compositing_images(
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\image.py", line 132, in _draw_list_compositing_images
a.draw(renderer)
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\artist.py", line 50, in draw_wrapper
return draw(artist, renderer)
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\axes\_base.py", line 3082, in draw
mimage._draw_list_compositing_images(
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\image.py", line 132, in _draw_list_compositing_images
a.draw(renderer)
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\artist.py", line 50, in draw_wrapper
return draw(artist, renderer)
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\axis.py", line 1158, in draw
ticks_to_draw = self._update_ticks()
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\axis.py", line 1046, in _update_ticks
major_labels = self.major.formatter.format_ticks(major_locs)
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\ticker.py", line 224, in format_ticks
return [self(value, i) for i, value in enumerate(values)]
File "C:\(...)\.nox\doc\lib\site-packages\matplotlib\ticker.py", line 224, in <listcomp>
return [self(value, i) for i, value in enumerate(values)]
File "C:\(...)\.nox\doc\lib\site-packages\pandas\plotting\_matplotlib\converter.py", line 1074, in __call__
return Period(ordinal=int(x), freq=self.freq).strftime(fmt)
File "pandas\_libs\tslibs\period.pyx", line 2458, in pandas._libs.tslibs.period._Period.strftime
File "pandas\_libs\tslibs\period.pyx", line 1225, in pandas._libs.tslibs.period.period_format
File "pandas\_libs\tslibs\period.pyx", line 1258, in pandas._libs.tslibs.period._period_strftime
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte
…on in the `tslibs.util` module, able to decode char* using the current locale.
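The error in the traceback above can be illustrated with the standard library alone. This is a minimal sketch, not the Cython code in pandas: the sample bytes and the helper name `decode_with_locale` are assumptions made for illustration. It shows that bytes produced under a non-UTF-8 encoding such as cp1252 fail to decode as UTF-8, while decoding with the matching encoding succeeds.

```python
import locale

# Bytes as a C-level strftime might return them under a cp1252 locale.
# The sample value is illustrative and not taken from the issue.
raw = "févr.".encode("cp1252")  # b'f\xe9vr.'


def decode_with_locale(data, encoding=None):
    """Decode bytes with the given encoding (hypothetical helper).

    Defaults to the encoding the current locale uses for text.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return data.decode(encoding)


try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
    print(exc)

print(decode_with_locale(raw, "cp1252"))  # févr.
```

The explicit `encoding` argument keeps the sketch independent of whichever locale happens to be active on the machine running it.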
Finally, note that a related issue occurs when the format string itself contains a non-ASCII character:
import pandas as pd
import locale
locale.setlocale(locale.LC_ALL, "fr_FR")
per = pd.Period("2018-03-11 13:00", freq="H")
assert per.strftime("é") == "é"  # AssertionError
In that case no error is raised, but the output string does not match the expected one. Thanks @jreback for suggesting that this might fail!
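One way a silent mismatch of this kind can arise is when text is encoded with one encoding and decoded with another. The sketch below is only an illustration of that general effect, under the assumption that the two encodings are UTF-8 and cp1252. It is not a confirmed diagnosis of the pandas code path.

```python
text = "é"

# Encode as UTF-8, then (wrongly) decode as cp1252: no exception is raised,
# but the result is two different characters.
utf8_bytes = text.encode("utf-8")      # b'\xc3\xa9'
misread = utf8_bytes.decode("cp1252")  # 'Ã©'
print(misread == text)                 # False

# Using the same encoding on both sides preserves the text.
round_trip = text.encode("cp1252").decode("cp1252")
print(round_trip == text)              # True
```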
I moved the above into a dedicated issue for clarity, as the issues have similar causes but are not related.
…specific directive is used (pandas-dev#46405)
* Added test representative of pandas-dev#46319. Should fail on CI
* Added a gha worker with non utf 8 zh_CN encoding
* Attempt to fix the encoding so that locale works
* Added the fix, but not using it for now, until CI is able to reproduce the issue.
* Crazy idea: maybe simply removing the .utf8 modifier will use the right encoding !
* Hopefully fixing the locale not available error
* Now simply generating the locale, not updating the ubuntu one
* Trying to install the locale without enabling it
* Stupid mistake
* Testing the optional locale generator condition
* Put back all runners
* Added whatsnew
* Now using the fix
* As per code review: moved locale-switching fixture `overridden_locale` to conftest
* Flake8
* Added comments on the runner
* Added a non-utf8 locale in the `it_IT` runner. Added the zh_CN.utf8 locale in the tests
* Improved readability of fixture `overridden_locale` as per code review
* Added two comments on default encoding
* Fixed pandas-dev#46319 by adding a new `char_to_string_locale` function in the `tslibs.util` module, able to decode char* using the current locale.
* As per code review: modified the test to contain non-utf8 chars. Fixed the resulting issue.
* Split the test in two for clarity
* Fixed test and flake8 error.
* Updated whatsnew to ref pandas-dev#46468. Updated test name
* Removing wrong whatsnew bullet
* Nitpick on whatsnew as per code review
* Fixed build error rst directive
* Names incorrectly reverted in last merge commit
* Fixed test_localization so that pandas-dev#46595 can be demonstrated on windows targets (even if today these do not run on windows targets, see pandas-dev#46597)
* Fixed `tm.set_locale` context manager, it could error and leak when category LC_ALL was used. Fixed pandas-dev#46595
* Removed the fixture as per code review, and added corresponding parametrization in tests.
* Dummy mod to trigger CI again
* reverted dummy mod
* Attempt to fix the remaining error on the numpy worker
* Fixed issue in `_from_ordinal`
* Added asserts to try to understand
* Reverted debugging asserts and applied fix for numpy repeat from pandas-dev#47670.
* Fixed the last issue on numpy dev: a TypeError message had changed
* Code review: Removed `EXTRA_LOC`
* Code review: removed commented line
* Code review: reverted out of scope change
* Code review: reverted out of scope change
* Fixed unused import
* Fixed revert mistake
* Moved whatsnew to 1.6.0
* Update pandas/tests/io/parser/test_quoting.py
Co-authored-by: Sylvain MARIE
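The commit log above mentions that a locale-switching context manager could error and leak when `LC_ALL` was used. The following is a minimal sketch of a save-and-restore pattern using only the standard library. It is not the pandas `tm.set_locale` implementation, and the name `temporary_locale` is hypothetical. It queries the current setting with `locale.setlocale(category)` rather than `locale.getlocale(category)`, since the latter can raise `TypeError` for `LC_ALL` when the individual categories differ.

```python
import locale
from contextlib import contextmanager


@contextmanager
def temporary_locale(name, category=locale.LC_ALL):
    """Temporarily switch locale, always restoring the previous setting."""
    # Calling setlocale with no second argument only queries the current
    # setting; the returned string can be passed back to restore it.
    saved = locale.setlocale(category)
    try:
        locale.setlocale(category, name)
        yield
    finally:
        # Runs on normal exit, on errors in the body, and when the
        # requested locale is unavailable, so the setting does not leak.
        locale.setlocale(category, saved)


with temporary_locale("C"):
    print(locale.setlocale(locale.LC_CTYPE))  # C
```

If the requested locale is not installed, `locale.Error` propagates to the caller after the previous setting has been restored.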
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
`Period.strftime("%p")` prints the locale-specific version of AM or PM. When the locale uses a non-utf8 compliant encoding, it crashes. This bug does not happen with others, for example `Timestamp.strftime`.
Expected Behavior
No error, printing the actual string representing AM or PM
Installed Versions
This is on the `main` branch head.