Python2 test_datetime_name_accessors failure with some locales #22129

Closed
ginggs opened this issue Jul 30, 2018 · 11 comments · Fixed by #22213
@ginggs
Contributor

ginggs commented Jul 30, 2018

Problem description

Python2 TestDatetime64.test_datetime_name_accessors[en_IL] fails with ValueError: unknown locale: en_IL. The same test passes in Python3.

By default, tm.get_locales() should only return locales that can be set without throwing an exception.

Compare the output in Python2:

Python 2.7.15 (default, May  1 2018, 05:55:50) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas.util.testing as tm; tm.get_locales()
['af_ZA.UTF-8', 'C', 'en_US.UTF-8', 'de_AT.UTF-8', 'de_BE.UTF-8', 'de_CH.UTF-8', 'de_DE.UTF-8', 'de_IT.utf8', 'de_LI.UTF-8', 'de_LU.UTF-8', 'en_AG.UTF-8', 'en_AG.UTF-8', 'en_AU.UTF-8', 'en_BW.UTF-8', 'en_CA.UTF-8', 'en_DK.UTF-8', 'en_GB.UTF-8', 'en_HK.UTF-8', 'en_IE.UTF-8', 'en_IL', 'en_IL.utf8', 'en_IN.UTF-8', 'en_NG.UTF-8', 'en_NG.UTF-8', 'en_NZ.UTF-8', 'en_PH.UTF-8', 'en_SG.UTF-8', 'en_US.UTF-8', 'en_ZA.UTF-8', 'en_ZM.UTF-8', 'en_ZM.UTF-8', 'fr_CH.UTF-8', 'fr_FR.ISO8859-1', 'fr_FR.ISO8859-1', 'fr_FR.ISO8859-1', 'fr_FR.UTF-8', 'C', '']

with that in Python3:

Python 3.6.6 (default, Jun 27 2018, 14:44:17) 
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas.util.testing as tm; tm.get_locales()
['af_ZA.UTF-8', 'C', 'en_US.UTF-8', 'de_AT.UTF-8', 'de_BE.UTF-8', 'de_CH.UTF-8', 'de_DE.UTF-8', 'de_IT.UTF-8', 'de_LI.UTF-8', 'de_LU.UTF-8', 'en_AG.UTF-8', 'en_AG.UTF-8', 'en_AU.UTF-8', 'en_BW.UTF-8', 'en_CA.UTF-8', 'en_DK.UTF-8', 'en_GB.UTF-8', 'en_HK.UTF-8', 'en_IE.UTF-8', 'en_IL.UTF-8', 'en_IL.UTF-8', 'en_IN.UTF-8', 'en_NG.UTF-8', 'en_NG.UTF-8', 'en_NZ.UTF-8', 'en_PH.UTF-8', 'en_SG.UTF-8', 'en_US.UTF-8', 'en_ZA.UTF-8', 'en_ZM.UTF-8', 'en_ZM.UTF-8', 'fr_CH.UTF-8', 'fr_FR.ISO8859-1', 'fr_FR.ISO8859-1', 'fr_FR.ISO8859-1', 'fr_FR.UTF-8', 'C', '']

The Python2 output contains 'en_IL', 'en_IL.utf8' while the Python3 output contains 'en_IL.UTF-8', 'en_IL.UTF-8'.

Expected Output

The same locales should be returned in Python2 and Python3.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-29-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.23.3
pytest: 3.3.2
pip: 9.0.1
setuptools: 39.2.0
Cython: 0.28.4
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: 1.7.5
patsy: None
dateutil: 2.6.1
pytz: 2018.5
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.2.1
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.2.8
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@WillAyd
Member

WillAyd commented Aug 1, 2018

This may be a very platform-specific issue (I cannot reproduce it). Which of the two outputs matches what locale -a prints on your system?

WillAyd added the "2/3 Compat" and "Needs Info" (clarification about behavior needed to assess issue) labels Aug 1, 2018
@ginggs
Contributor Author

ginggs commented Aug 1, 2018

Well, neither is an exact match. The output of locale -a contains some lines with no suffix, and some with .utf8.

I am able to reproduce in Ubuntu with the locales package installed, and in Debian with the locales-all package installed.

I can post the output of locale -a from the same machine as above in a little while.

@ginggs
Contributor Author

ginggs commented Aug 1, 2018

Output of locale -a:

af_ZA.utf8
C
C.UTF-8
de_AT.utf8
de_BE.utf8
de_CH.utf8
de_DE.utf8
de_IT.utf8
de_LI.utf8
de_LU.utf8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IL
en_IL.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
fr_CH.utf8
french
fr_FR
fr_FR.iso88591
fr_FR.utf8
POSIX

@WillAyd
Member

WillAyd commented Aug 1, 2018

The code in that module defers to `locale -a` by default:

raw_locales = check_output(['locale -a'], shell=True)

So I'm not sure what is changing the name on Python3. If you want to debug that module and find something to fix, PRs are certainly welcome.
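
For anyone following along, the lookup step can be sketched roughly as below. This is a minimal illustration and not the pandas implementation: the function names `parse_locale_listing` and `list_system_locales` are hypothetical, and it assumes a POSIX system where the `locale` command is on the PATH.

```python
import subprocess


def parse_locale_listing(raw):
    """Split the raw bytes printed by `locale -a` into a list of names."""
    # Some systems list names containing non-UTF-8 bytes, so decode leniently.
    text = raw.decode("utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


def list_system_locales():
    """Run `locale -a` and return the names it prints (hypothetical helper)."""
    # Passing the arguments as a list avoids needing shell=True.
    raw = subprocess.check_output(["locale", "-a"])
    return parse_locale_listing(raw)
```

The names come back exactly as the system spells them (for example `en_IL.utf8`), so any rewriting to `en_IL.UTF-8` has to happen in a later step.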

WillAyd removed the "Needs Info" (clarification about behavior needed to assess issue) label Aug 1, 2018
@ginggs
Contributor Author

ginggs commented Aug 1, 2018

Ah, thanks!

So in Python2:

>>> import locale; locale.normalize('en_IL.utf8')
'en_IL.utf8'

...and in Python3:

>>> import locale; locale.normalize('en_IL.utf8')
'en_IL.UTF-8'

From the locale module's documentation:
locale.normalize(localename)

Returns a normalized locale code for the given locale name. The returned locale code is formatted for use with setlocale(). If normalization fails, the original name is returned unchanged.

So this looks like a bug in Python2, but perhaps we should skip the locale if locale.normalize() returns the original locale name.
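
A rough sketch of that idea, for discussion only. The helper name `drop_unnormalized` is made up here, and the `normalize` parameter exists only so the behaviour can be demonstrated without depending on a particular Python version's alias table. One caveat: a name such as 'C' may normalize to itself and would then be dropped too, so a real filter would probably need an allow-list.

```python
import locale


def drop_unnormalized(names, normalize=locale.normalize):
    """Keep only names that normalize() rewrites to something different."""
    # normalize() returns its input unchanged when normalization fails,
    # so an unchanged name is treated here as "unknown to Python".
    return [name for name in names if normalize(name) != name]
```

With a stand-in normalizer that only knows `en_IL.utf8`, calling `drop_unnormalized(['en_IL', 'en_IL.utf8'], normalize=...)` would keep just `en_IL.utf8`.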

@WillAyd
Member

WillAyd commented Aug 1, 2018

Can you post the actual traceback of the test failure?

@ginggs
Contributor Author

ginggs commented Aug 1, 2018

=================================== FAILURES ===================================
______________ TestDatetime64.test_datetime_name_accessors[en_IL] ______________

self = <pandas.tests.indexes.datetimes.test_misc.TestDatetime64 object at 0xe3039150>
time_locale = 'en_IL'

    @pytest.mark.parametrize('time_locale', [
        None] if tm.get_locales() is None else [None] + tm.get_locales())
    def test_datetime_name_accessors(self, time_locale):
        # Test Monday -> Sunday and January -> December, in that sequence
        if time_locale is None:
            # If the time_locale is None, day-name and month_name should
            # return the english attributes
            expected_days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                             'Friday', 'Saturday', 'Sunday']
            expected_months = ['January', 'February', 'March', 'April', 'May',
                               'June', 'July', 'August', 'September',
                               'October', 'November', 'December']
        else:
            with tm.set_locale(time_locale, locale.LC_TIME):
                expected_days = calendar.day_name[:]
                expected_months = calendar.month_name[1:]
    
        # GH 11128
        dti = DatetimeIndex(freq='D', start=datetime(1998, 1, 1),
                            periods=365)
        english_days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                        'Friday', 'Saturday', 'Sunday']
        for day, name, eng_name in zip(range(4, 11),
                                       expected_days,
                                       english_days):
            name = name.capitalize()
            assert dti.weekday_name[day] == eng_name
>           assert dti.day_name(locale=time_locale)[day] == name

/usr/lib/python2.7/dist-packages/pandas/tests/indexes/datetimes/test_misc.py:272: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = DatetimeIndex(['1998-01-01', '1998-01-02', '1998-01-03', '1998-01-04',
       ...30', '1998-12-31'],
              dtype='datetime64[ns]', length=365, freq='D')
locale = 'en_IL'

    def day_name(self, locale=None):
        """
            Return the day names of the DateTimeIndex with specified locale.
    
            Parameters
            ----------
            locale : string, default None (English locale)
                locale determining the language in which to return the day name
    
            Returns
            -------
            month_names : Index
                Index of day names
    
            .. versionadded:: 0.23.0
            """
        values = self.asi8
        if self.tz is not None:
            if self.tz is not utc:
                values = self._local_timestamps()
    
        result = fields.get_date_name_field(values, 'day_name',
>                                           locale=locale)

/usr/lib/python2.7/dist-packages/pandas/core/indexes/datetimes.py:2553: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/fields.pyx:107: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/ccalendar.pyx:213: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/ccalendar.pyx:229: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/ccalendar.pyx:230: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/strptime.pyx:396: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/strptime.pyx:354: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

category = 2

    def getlocale(category=LC_CTYPE):
    
        """ Returns the current setting for the given locale category as
            tuple (language code, encoding).
    
            category may be one of the LC_* value except LC_ALL. It
            defaults to LC_CTYPE.
    
            Except for the code 'C', the language code corresponds to RFC
            1766.  code and encoding can be None in case the values cannot
            be determined.
    
        """
        localename = _setlocale(category)
        if category == LC_ALL and ';' in localename:
            raise TypeError, 'category LC_ALL is not supported'
>       return _parse_localename(localename)

/usr/lib/python2.7/locale.py:564: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

localename = 'en_IL'

    def _parse_localename(localename):
    
        """ Parses the locale code for localename and returns the
            result as tuple (language code, encoding).
    
            The localename is normalized and passed through the locale
            alias engine. A ValueError is raised in case the locale name
            cannot be parsed.
    
            The language code corresponds to RFC 1766.  code and encoding
            can be None in case the values cannot be determined or are
            unknown to this implementation.
    
        """
        code = normalize(localename)
        if '@' in code:
            # Deal with locale modifiers
            code, modifier = code.split('@', 1)
            if modifier == 'euro' and '.' not in code:
                # Assume Latin-9 for @euro locales. This is bogus,
                # since some systems may use other encodings for these
                # locales. Also, we ignore other modifiers.
                return code, 'iso-8859-15'
    
        if '.' in code:
            return tuple(code.split('.')[:2])
        elif code == 'C':
            return None, None
>       raise ValueError, 'unknown locale: %s' % localename
E       ValueError: unknown locale: en_IL

/usr/lib/python2.7/locale.py:477: ValueError

@ginggs
Contributor Author

ginggs commented Aug 1, 2018

_______________ TestTimestampProperties.test_names[en_IL-data1] ________________

self = <pandas.tests.scalar.timestamp.test_timestamp.TestTimestampProperties object at 0xda6b99d0>
data = Timestamp('2017-08-28 23:00:00-0500', tz='EST'), time_locale = 'en_IL'

    @pytest.mark.parametrize('data',
                             [Timestamp('2017-08-28 23:00:00'),
                              Timestamp('2017-08-28 23:00:00', tz='EST')])
    @pytest.mark.parametrize('time_locale', [
        None] if tm.get_locales() is None else [None] + tm.get_locales())
    def test_names(self, data, time_locale):
        # GH 17354
        # Test .weekday_name, .day_name(), .month_name
        with tm.assert_produces_warning(FutureWarning,
                                        check_stacklevel=False):
            assert data.weekday_name == 'Monday'
        if time_locale is None:
            expected_day = 'Monday'
            expected_month = 'August'
        else:
            with tm.set_locale(time_locale, locale.LC_TIME):
                expected_day = calendar.day_name[0].capitalize()
                expected_month = calendar.month_name[8].capitalize()
    
>       assert data.day_name(time_locale) == expected_day

/usr/lib/python2.7/dist-packages/pandas/tests/scalar/timestamp/test_timestamp.py:119: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/timestamps.pyx:760: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/timestamps.pyx:364: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/timestamps.pyx:370: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/fields.pyx:107: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/ccalendar.pyx:213: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/ccalendar.pyx:229: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/ccalendar.pyx:230: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/strptime.pyx:396: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/strptime.pyx:354: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

category = 2

    def getlocale(category=LC_CTYPE):
    
        """ Returns the current setting for the given locale category as
            tuple (language code, encoding).
    
            category may be one of the LC_* value except LC_ALL. It
            defaults to LC_CTYPE.
    
            Except for the code 'C', the language code corresponds to RFC
            1766.  code and encoding can be None in case the values cannot
            be determined.
    
        """
        localename = _setlocale(category)
        if category == LC_ALL and ';' in localename:
            raise TypeError, 'category LC_ALL is not supported'
>       return _parse_localename(localename)

/usr/lib/python2.7/locale.py:564: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

localename = 'en_IL'

    def _parse_localename(localename):
    
        """ Parses the locale code for localename and returns the
            result as tuple (language code, encoding).
    
            The localename is normalized and passed through the locale
            alias engine. A ValueError is raised in case the locale name
            cannot be parsed.
    
            The language code corresponds to RFC 1766.  code and encoding
            can be None in case the values cannot be determined or are
            unknown to this implementation.
    
        """
        code = normalize(localename)
        if '@' in code:
            # Deal with locale modifiers
            code, modifier = code.split('@', 1)
            if modifier == 'euro' and '.' not in code:
                # Assume Latin-9 for @euro locales. This is bogus,
                # since some systems may use other encodings for these
                # locales. Also, we ignore other modifiers.
                return code, 'iso-8859-15'
    
        if '.' in code:
            return tuple(code.split('.')[:2])
        elif code == 'C':
            return None, None
>       raise ValueError, 'unknown locale: %s' % localename
E       ValueError: unknown locale: en_IL

/usr/lib/python2.7/locale.py:477: ValueError

@ginggs
Contributor Author

ginggs commented Aug 1, 2018

___ TestSeriesDatetimeValues.test_dt_accessor_datetime_name_accessors[en_IL] ___

self = <pandas.tests.series.test_datetime_values.TestSeriesDatetimeValues object at 0xda334b70>
time_locale = 'en_IL'

    @pytest.mark.parametrize('time_locale', [
        None] if tm.get_locales() is None else [None] + tm.get_locales())
    def test_dt_accessor_datetime_name_accessors(self, time_locale):
        # Test Monday -> Sunday and January -> December, in that sequence
        if time_locale is None:
            # If the time_locale is None, day-name and month_name should
            # return the english attributes
            expected_days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                             'Friday', 'Saturday', 'Sunday']
            expected_months = ['January', 'February', 'March', 'April', 'May',
                               'June', 'July', 'August', 'September',
                               'October', 'November', 'December']
        else:
            with tm.set_locale(time_locale, locale.LC_TIME):
                expected_days = calendar.day_name[:]
                expected_months = calendar.month_name[1:]
    
        s = Series(DatetimeIndex(freq='D', start=datetime(1998, 1, 1),
                                 periods=365))
        english_days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                        'Friday', 'Saturday', 'Sunday']
        for day, name, eng_name in zip(range(4, 11),
                                       expected_days,
                                       english_days):
            name = name.capitalize()
            assert s.dt.weekday_name[day] == eng_name
>           assert s.dt.day_name(locale=time_locale)[day] == name

/usr/lib/python2.7/dist-packages/pandas/tests/series/test_datetime_values.py:305: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pandas.core.indexes.accessors.DatetimeProperties object at 0xda158490>
args = (), kwargs = {'locale': 'en_IL'}

    def f(self, *args, **kwargs):
>       return self._delegate_method(name, *args, **kwargs)

/usr/lib/python2.7/dist-packages/pandas/core/accessor.py:89: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pandas.core.indexes.accessors.DatetimeProperties object at 0xda158490>
name = 'day_name', args = (), kwargs = {'locale': 'en_IL'}
Series = <class 'pandas.core.series.Series'>
values = DatetimeIndex(['1998-01-01', '1998-01-02', '1998-01-03', '1998-01-04',
       ...0', '1998-12-31'],
              dtype='datetime64[ns]', length=365, freq=None)
method = <bound method DatetimeIndex.day_name of DatetimeIndex(['1998-01-01', '1998-01-...', '1998-12-31'],
              dtype='datetime64[ns]', length=365, freq=None)>

    def _delegate_method(self, name, *args, **kwargs):
        from pandas import Series
        values = self._get_values()
    
        method = getattr(values, name)
>       result = method(*args, **kwargs)

/usr/lib/python2.7/dist-packages/pandas/core/indexes/accessors.py:99: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = DatetimeIndex(['1998-01-01', '1998-01-02', '1998-01-03', '1998-01-04',
       ...0', '1998-12-31'],
              dtype='datetime64[ns]', length=365, freq=None)
locale = 'en_IL'

    def day_name(self, locale=None):
        """
            Return the day names of the DateTimeIndex with specified locale.
    
            Parameters
            ----------
            locale : string, default None (English locale)
                locale determining the language in which to return the day name
    
            Returns
            -------
            month_names : Index
                Index of day names
    
            .. versionadded:: 0.23.0
            """
        values = self.asi8
        if self.tz is not None:
            if self.tz is not utc:
                values = self._local_timestamps()
    
        result = fields.get_date_name_field(values, 'day_name',
>                                           locale=locale)

/usr/lib/python2.7/dist-packages/pandas/core/indexes/datetimes.py:2553: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/fields.pyx:107: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/ccalendar.pyx:213: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/ccalendar.pyx:229: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/ccalendar.pyx:230: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/strptime.pyx:396: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

pandas/_libs/tslibs/strptime.pyx:354: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

category = 2

    def getlocale(category=LC_CTYPE):
    
        """ Returns the current setting for the given locale category as
            tuple (language code, encoding).
    
            category may be one of the LC_* value except LC_ALL. It
            defaults to LC_CTYPE.
    
            Except for the code 'C', the language code corresponds to RFC
            1766.  code and encoding can be None in case the values cannot
            be determined.
    
        """
        localename = _setlocale(category)
        if category == LC_ALL and ';' in localename:
            raise TypeError, 'category LC_ALL is not supported'
>       return _parse_localename(localename)

/usr/lib/python2.7/locale.py:564: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

localename = 'en_IL'

    def _parse_localename(localename):
    
        """ Parses the locale code for localename and returns the
            result as tuple (language code, encoding).
    
            The localename is normalized and passed through the locale
            alias engine. A ValueError is raised in case the locale name
            cannot be parsed.
    
            The language code corresponds to RFC 1766.  code and encoding
            can be None in case the values cannot be determined or are
            unknown to this implementation.
    
        """
        code = normalize(localename)
        if '@' in code:
            # Deal with locale modifiers
            code, modifier = code.split('@', 1)
            if modifier == 'euro' and '.' not in code:
                # Assume Latin-9 for @euro locales. This is bogus,
                # since some systems may use other encodings for these
                # locales. Also, we ignore other modifiers.
                return code, 'iso-8859-15'
    
        if '.' in code:
            return tuple(code.split('.')[:2])
        elif code == 'C':
            return None, None
>       raise ValueError, 'unknown locale: %s' % localename
E       ValueError: unknown locale: en_IL

/usr/lib/python2.7/locale.py:477: ValueError

@ginggs
Contributor Author

ginggs commented Aug 1, 2018

There are actually three different tests that fail.

Full test logs are here:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-cosmic/cosmic/armhf/p/pandas/20180730_074249_58050@/log.gz
The Python2 tests are run first, then Python3.

@ginggs
Contributor Author

ginggs commented Aug 2, 2018

It seems en_IL is only available in recent versions of Ubuntu. I can reproduce this behaviour with ak_GH, which is available after installing the locales-all package.

Python2:

Python 2.7.15 (default, May  1 2018, 05:55:50)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas.util.testing as tm
>>> tm._can_set_locale('ak_GH')
True
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'ak_GH')
'ak_GH'
>>> locale.getlocale()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/locale.py", line 564, in getlocale
    return _parse_localename(localename)
  File "/usr/lib/python2.7/locale.py", line 477, in _parse_localename
    raise ValueError, 'unknown locale: %s' % localename
ValueError: unknown locale: ak_GH

Python3:

Python 3.6.6 (default, Jun 27 2018, 14:44:17)
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas.util.testing as tm
>>> tm._can_set_locale('ak_GH')
True
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'ak_GH')
'ak_GH'
>>> locale.getlocale()
('ak_GH', 'UTF-8')

I was also able to reproduce this in Ubuntu 16.04 LTS, with pandas 0.17.1 and Python2 2.7.12.
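
The Python2 failure appears to come from the parsing step visible in the tracebacks earlier in this thread: a normalized code that has no '.' and is not 'C' is rejected. Below is a simplified port of that logic to Python 3 syntax, applied to an already-normalized code. The function name `parse_normalized_code` is hypothetical, and the sketch leaves out the call to normalize() that the real function makes first.

```python
def parse_normalized_code(code):
    """Simplified port of the Python 2 `_parse_localename` logic shown in
    the tracebacks, applied to an already-normalized locale code."""
    if "@" in code:
        # Strip locale modifiers such as "@euro".
        code, modifier = code.split("@", 1)
        if modifier == "euro" and "." not in code:
            return code, "iso-8859-15"
    if "." in code:
        # "en_IL.UTF-8" -> ("en_IL", "UTF-8")
        return tuple(code.split(".")[:2])
    if code == "C":
        return None, None
    # A bare name with no encoding part, such as "ak_GH", ends up here.
    raise ValueError("unknown locale: %s" % code)
```

So a name that setlocale() accepts can still fail in getlocale() whenever normalization does not attach an encoding to it.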

jreback added this to the 0.24.0 milestone Aug 9, 2018
TomAugspurger pushed a commit that referenced this issue Aug 13, 2018
* Fix Python2 test failures in certain locales

Check that we can also get the locale, after setting it, without raising an Exception.
Closes: #22129
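
A minimal sketch of the check described in that commit message. It is not the code merged in #22213; the function name `can_set_and_get_locale` and the exact exception handling are assumptions made for illustration.

```python
import locale


def can_set_and_get_locale(name, category=locale.LC_ALL):
    """Return True if `name` can be set and then read back without error."""
    # Calling setlocale() with no value queries the current setting.
    saved = locale.setlocale(category)
    try:
        locale.setlocale(category, name)
        # Setting can succeed while reading back fails (ValueError in the
        # Python2 sessions above), so exercise getlocale() as well.
        locale.getlocale()
    except (locale.Error, ValueError):
        return False
    finally:
        # Always restore whatever was active before the probe.
        locale.setlocale(category, saved)
    return True
```

A locale list filtered with a check like this would drop names such as `en_IL` on interpreters where getlocale() cannot parse them, while keeping them where it can.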
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this issue Oct 1, 2018
* Fix Python2 test failures in certain locales

Check that we can also get the locale, after setting it, without raising an Exception.
Closes: pandas-dev#22129