Skip to content

BUG: tm.set_locale does not correctly set back the initial locale when locale.LC_ALL is used. tm.can_set_locale is impacted by the leak #46595

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 tasks done
smarie opened this issue Apr 1, 2022 · 6 comments · Fixed by #47570
Labels
Bug Localization Internationalization of data
Milestone

Comments

@smarie
Copy link
Contributor

smarie commented Apr 1, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import locale
import pandas._testing as tm

current_locale = locale.setlocale(locale.LC_ALL)

print(current_locale)
# On my machine (FR):
# >>> 'LC_COLLATE=C;LC_CTYPE=French_France.1252;LC_MONETARY=C;LC_NUMERIC=C;LC_TIME=C'

print(tm.can_set_locale("it_IT.utf8", locale.LC_ALL))  # False

new_current_locale = locale.setlocale(locale.LC_ALL)
assert new_current_locale == current_locale  # AssertionError: the contextmanager leaked
print(new_current_locale)  # 'it_IT.UTF-8' !!!!

Issue Description

The call to tm.can_set_locale actually contains a silent error when the inner tm.set_locale tries to set back the original locale. Because this error happens, the original locale is not set back !

The issue is actually due to a mistake in the code for set_locale: indeed the way it captures the initial locale (to set it back at the end) is wrong.

current_locale = locale.getlocale() is used, however this returns only the locale for the default category LC_CTYPE (see python doc)

What should be used instead is current_locale = locale.setlocale(lc_var). That way it is also working if lc_var is LC_ALL

Note that the example in the cpython documentation is misleading, I'll open an issue there as well.

Expected Behavior

No assertion error at the end of the example.

Installed Versions

AssertionError: C:\Miniconda3\envs\pandas-dev\lib\distutils\core.py
@smarie smarie added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 1, 2022
@smarie
Copy link
Contributor Author

smarie commented Apr 1, 2022

I'll fix this in #46405

@smarie
Copy link
Contributor Author

smarie commented Apr 1, 2022

Cpython doc issue opened: https://bugs.python.org/issue47187

smarie pushed a commit to smarie/pandas that referenced this issue Apr 1, 2022
…on windows targets (even if today these do not run on windows targets, see pandas-dev#46597)
smarie pushed a commit to smarie/pandas that referenced this issue Apr 1, 2022
@andjc
Copy link

andjc commented Apr 14, 2022

Although,

if you use

import locale
import pandas._testing as tm

current_locale = locale.setlocale(locale.LC_ALL, '')

print(current_locale)
print(tm.can_set_locale("it_IT.utf8", locale.LC_ALL)) 
new_current_locale = locale.setlocale(locale.LC_ALL, '')
assert new_current_locale == current_locale  
print(new_current_locale)  

Then it works as expected. The python documentation does suggest using the form locale.setlocale(locale.LC_ALL, '')

locale.setlocale(locale.LC_ALL) would be the same as locale.setlocale(locale.LC_ALL, None), i.e. set the locale to the current value. Which is different to saying locale.setlocale(locale.LC_ALL, ''), i.e. set locale to the default locale.

@smarie
Copy link
Contributor Author

smarie commented Apr 14, 2022

Thanks for the feedback @andjc !
Indeed you are right, this usage works.

Still, the reported bug exists so I'm convinced that it should be fixed to avoid anyone falling into that nasty trap.

@andjc
Copy link

andjc commented Apr 14, 2022

@smarie

The problem is, your code was saying:

  1. set it to the current locale (which at the beginning is your default locale
  2. set it to Italian
  3. set it to the current locale (which should be Italian)

@smarie
Copy link
Contributor Author

smarie commented May 2, 2022

@andjc , I think that you got my initial example wrong.

I use locale.setlocale(locale.LC_ALL) to GET the current locale (not to set it). According to the python docs,

If locale is omitted or None, the current setting for category is returned.

so this is the way to get the current locale for LC_ALL. Indeed getLocale can not be used to get the current locale for LC_ALL (this is explicitly stated in https://docs.python.org/3/library/locale.html#locale.getlocale)

category may be one of the LC_* values except LC_ALL.

Anyway, this example is more straightforward if you prefer:

import locale
import pandas._testing as tm

print(locale.getlocale())
print(tm.can_set_locale("it_IT.utf8", locale.LC_ALL))  # Should be True but is False
print(locale.getlocale())

yields

('fr_FR', 'cp1252')
False
('it_IT', 'UTF-8')

So tm.can_set_locale returns False (first issue) AND changes the locale (second issue). It should just return True without changing anything.

The root of this issue can be found by looking at the code from set_locale:
https://github.com/pandas-dev/pandas/blob/main/pandas/_config/localization.py#L42-L52

The "finally" block does not work and raises locale.Error: unsupported locale setting in the example above. The reason is that current_locale is obtained with current_locale = locale.getlocale(), while in the finally block there is an attempt to set it using locale.setlocale, which does not work when lc_var is LC_ALL with the value returned by locale.getLocale. The fix can be seen here: https://github.com/pandas-dev/pandas/pull/46405/files#diff-52be4ac24f9b9731e939169af395829a61cd2cb2f254b14c5c0e83bdd2296ddcR42

Feel free to provide an example that would show better the issue, if you like !

smarie pushed a commit to smarie/pandas that referenced this issue Jul 1, 2022
smarie pushed a commit to smarie/pandas that referenced this issue Jul 6, 2022
… avoid misuses of `getlocale` (see pandas-dev#46595). Added tests.
@mroeschke mroeschke added Localization Internationalization of data and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 6, 2022
@jreback jreback added this to the 1.5 milestone Jul 8, 2022
mroeschke pushed a commit that referenced this issue Sep 8, 2022
…specific directive is used (#46405)

* Added test representative of #46319. Should fail on CI

* Added a gha worker with non utf 8 zh_CN encoding

* Attempt to fix the encoding so that locale works

* Added the fix, but not using it for now, until CI is able to reproduce the issue.

* Crazy idea: maybe simply removing the .utf8 modifier will use the right encoding !

* Hopefully fixing the locale not available error

* Now simply generating the locale, not updating the ubuntu one

* Trying to install the locale without enabling it

* Stupid mistake

* Testing the optional locale generator condition

* Put back all runners

* Added whatsnew

* Now using the fix

* As per code review: moved locale-switching fixture `overridden_locale` to conftest

* Flake8

* Added comments on the runner

* Added a non-utf8 locale in the `it_IT` runner. Added the zh_CN.utf8 locale in the tests

* Improved readability of fixture `overridden_locale` as per code review

* Added two comments on default encoding

* Fixed #46319 by adding a new `char_to_string_locale` function in the `tslibs.util` module, able to decode char* using the current locale.

* As per code review: modified the test to contain non-utf8 chars. Fixed the resulting issue.

* Split the test in two for clarity

* Fixed test and flake8 error.

* Updated whatsnew to ref #46468 . Updated test name

* Removing wrong whatsnew bullet

* Nitpick on whatsnew as per code review

* Fixed build error rst directive

* Names incorrectly reverted in last merge commit

* Fixed test_localization so that #46595 can be demonstrated on windows targets (even if today these do not run on windows targets, see #46597)

* Fixed `tm.set_locale` context manager, it could error and leak when category LC_ALL was used. Fixed #46595

* Removed the fixture as per code review, and added corresponding parametrization in tests.

* Dummy mod to trigger CI again

* reverted dummy mod

* Attempt to fix the remaining error on the numpy worker

* Fixed issue in `_from_ordinal`

* Added asserts to try to understand

* Reverted debugging asserts and applied fix for numpy repeat from #47670.

* Fixed the last issue on numpy dev: a TypeError message had changed

* Code review: Removed `EXTRA_LOC`

* Code review: removed commented line

* Code review: reverted out of scope change

* Code review: reverted out of scope change

* Fixed unused import

* Fixed revert mistake

* Moved whatsnew to 1.6.0

* Update pandas/tests/io/parser/test_quoting.py

Co-authored-by: Sylvain MARIE <[email protected]>
noatamir pushed a commit to noatamir/pandas that referenced this issue Nov 9, 2022
…specific directive is used (pandas-dev#46405)

* Added test representative of pandas-dev#46319. Should fail on CI

* Added a gha worker with non utf 8 zh_CN encoding

* Attempt to fix the encoding so that locale works

* Added the fix, but not using it for now, until CI is able to reproduce the issue.

* Crazy idea: maybe simply removing the .utf8 modifier will use the right encoding !

* Hopefully fixing the locale not available error

* Now simply generating the locale, not updating the ubuntu one

* Trying to install the locale without enabling it

* Stupid mistake

* Testing the optional locale generator condition

* Put back all runners

* Added whatsnew

* Now using the fix

* As per code review: moved locale-switching fixture `overridden_locale` to conftest

* Flake8

* Added comments on the runner

* Added a non-utf8 locale in the `it_IT` runner. Added the zh_CN.utf8 locale in the tests

* Improved readability of fixture `overridden_locale` as per code review

* Added two comments on default encoding

* Fixed pandas-dev#46319 by adding a new `char_to_string_locale` function in the `tslibs.util` module, able to decode char* using the current locale.

* As per code review: modified the test to contain non-utf8 chars. Fixed the resulting issue.

* Split the test in two for clarity

* Fixed test and flake8 error.

* Updated whatsnew to ref pandas-dev#46468 . Updated test name

* Removing wrong whatsnew bullet

* Nitpick on whatsnew as per code review

* Fixed build error rst directive

* Names incorrectly reverted in last merge commit

* Fixed test_localization so that pandas-dev#46595 can be demonstrated on windows targets (even if today these do not run on windows targets, see pandas-dev#46597)

* Fixed `tm.set_locale` context manager, it could error and leak when category LC_ALL was used. Fixed pandas-dev#46595

* Removed the fixture as per code review, and added corresponding parametrization in tests.

* Dummy mod to trigger CI again

* reverted dummy mod

* Attempt to fix the remaining error on the numpy worker

* Fixed issue in `_from_ordinal`

* Added asserts to try to understand

* Reverted debugging asserts and applied fix for numpy repeat from pandas-dev#47670.

* Fixed the last issue on numpy dev: a TypeError message had changed

* Code review: Removed `EXTRA_LOC`

* Code review: removed commented line

* Code review: reverted out of scope change

* Code review: reverted out of scope change

* Fixed unused import

* Fixed revert mistake

* Moved whatsnew to 1.6.0

* Update pandas/tests/io/parser/test_quoting.py

Co-authored-by: Sylvain MARIE <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Localization Internationalization of data
Projects
None yet
4 participants