Skip to content

Pandas Test Scripts always fail tried 0.24.0rc1, 0.23.4, 0.23.3 and 0.23.2 #24760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Garet-Jax opened this issue Jan 14, 2019 · 15 comments · Fixed by #27368
Closed

Pandas Test Scripts always fail tried 0.24.0rc1, 0.23.4, 0.23.3 and 0.23.2 #24760

Garet-Jax opened this issue Jan 14, 2019 · 15 comments · Fixed by #27368

Comments

@Garet-Jax
Copy link

Garet-Jax commented Jan 14, 2019

I am trying to install python and pandas on an AWS Linux EC2 instance. I have tried many versions of pandas and many ways to install it, but they all seem to have errors when I try to run the test() function.

I create a brand new EC2 instance and run:

  1. sudo yum update
  2. sudo yum install python36-devel python36-pip gcc gcc-c++
  3. virtualenv ~/env -p python36 && source ~/env/bin/activate

From there, I have tried to install pandas using:
a) python3 -m pip install pandas==0.24.0rc1
b) python3 -m pip install --upgrade pandas==0.24.0rc1
c) pip install pandas==0.24.0rc1

I also install pytest.
If I use 0.24.0rc1, then I also install hypothesis.

The errors I get are:

> ==================================== ERRORS ====================================
>  **ERROR** collecting env/lib64/python3.6/site-packages/pandas/tests/indexes/datetimes/test_misc.py
> env/local/lib/python3.6/site-packages/py/_path/local.py:668: in pyimport
>     __import__(modname)
> <frozen importlib._bootstrap>:971: in _find_and_load
>     ???
> <frozen importlib._bootstrap>:955: in _find_and_load_unlocked
>     ???
> <frozen importlib._bootstrap>:656: in _load_unlocked
>     ???
> <frozen importlib._bootstrap>:626: in _load_backward_compatible
>     ???
> env/local/lib/python3.6/site-packages/_pytest/assertion/rewrite.py:308: in load_module
>     six.exec_(co, mod.__dict__)
> env/local/lib64/python3.6/site-packages/pandas/tests/indexes/datetimes/test_misc.py:91: in <module>
>     class TestDatetime64(object):
> env/local/lib64/python3.6/site-packages/pandas/tests/indexes/datetimes/test_misc.py:246: in TestDatetime64
>     None] if tm.get_locales() is None else [None] + tm.get_locales())
> env/local/lib64/python3.6/site-packages/pandas/util/testing.py:516: in get_locales
>     x, encoding=pd.options.display.encoding))
> E   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 4: invalid continuation byte
>  **ERROR** collecting env/lib64/python3.6/site-packages/pandas/tests/scalar/timestamp/test_timestamp.py
> env/local/lib/python3.6/site-packages/py/_path/local.py:668: in pyimport
>     __import__(modname)
> <frozen importlib._bootstrap>:971: in _find_and_load
>     ???
> <frozen importlib._bootstrap>:955: in _find_and_load_unlocked
>     ???
> <frozen importlib._bootstrap>:656: in _load_unlocked
>     ???
> <frozen importlib._bootstrap>:626: in _load_backward_compatible
>     ???
> env/local/lib/python3.6/site-packages/_pytest/assertion/rewrite.py:308: in load_module
>     six.exec_(co, mod.__dict__)
> env/local/lib64/python3.6/site-packages/pandas/tests/scalar/timestamp/test_timestamp.py:28: in <module>
>     class TestTimestampProperties(object):
> env/local/lib64/python3.6/site-packages/pandas/tests/scalar/timestamp/test_timestamp.py:104: in TestTimestampProperties
>     None] if tm.get_locales() is None else [None] + tm.get_locales())
> env/local/lib64/python3.6/site-packages/pandas/util/testing.py:516: in get_locales
>     x, encoding=pd.options.display.encoding))
> E   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 4: invalid continuation byte
>  **ERROR** collecting env/lib64/python3.6/site-packages/pandas/tests/series/test_datetime_values.py
> env/local/lib/python3.6/site-packages/py/_path/local.py:668: in pyimport
>     __import__(modname)
> <frozen importlib._bootstrap>:971: in _find_and_load
>     ???
> <frozen importlib._bootstrap>:955: in _find_and_load_unlocked
>     ???
> <frozen importlib._bootstrap>:656: in _load_unlocked
>     ???
> <frozen importlib._bootstrap>:626: in _load_backward_compatible
>     ???
> env/local/lib/python3.6/site-packages/_pytest/assertion/rewrite.py:308: in load_module
>     six.exec_(co, mod.__dict__)
> env/local/lib64/python3.6/site-packages/pandas/tests/series/test_datetime_values.py:27: in <module>
>     class TestSeriesDatetimeValues():
> env/local/lib64/python3.6/site-packages/pandas/tests/series/test_datetime_values.py:322: in TestSeriesDatetimeValues
>     None] if tm.get_locales() is None else [None] + tm.get_locales())
> env/local/lib64/python3.6/site-packages/pandas/util/testing.py:516: in get_locales
>     x, encoding=pd.options.display.encoding))
> E   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 4: invalid continuation byte
>  **ERROR** collecting env/lib64/python3.6/site-packages/pandas/tests/util/test_locale.py
> env/local/lib/python3.6/site-packages/py/_path/local.py:668: in pyimport
>     __import__(modname)
> <frozen importlib._bootstrap>:971: in _find_and_load
>     ???
> <frozen importlib._bootstrap>:955: in _find_and_load_unlocked
>     ???
> <frozen importlib._bootstrap>:656: in _load_unlocked
>     ???
> <frozen importlib._bootstrap>:626: in _load_backward_compatible
>     ???
> env/local/lib/python3.6/site-packages/_pytest/assertion/rewrite.py:308: in load_module
>     six.exec_(co, mod.__dict__)
> env/local/lib64/python3.6/site-packages/pandas/tests/util/test_locale.py:13: in <module>
>     _all_locales = tm.get_locales() or []
> env/local/lib64/python3.6/site-packages/pandas/util/testing.py:516: in get_locales
>     x, encoding=pd.options.display.encoding))
> E   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 4: invalid continuation byte
> =============================== warnings summary ===============================
> env/local/lib/python3.6/site-packages/_pytest/config/__init__.py:730
>   /home/ec2-user/env/local/lib/python3.6/site-packages/_pytest/config/__init__.py:730: PytestWarning: Module already imported so cannot be rewritten: hypothesis
>     self._mark_plugins_for_rewrite(hook)
> 
> -- Docs: https://docs.pytest.org/en/latest/warnings.html
> !!!!!!!!!!!!!!!!!!! Interrupted: 4 errors during collection !!!!!!!!!!!!!!!!!!!!
> =============== 7 skipped, 1 warnings, 4 error in 46.76 seconds ================
> 

The output of show_versions() is:

INSTALLED VERSIONS ------------------ commit: None python: 3.6.7.final.0 python-bits: 64 OS: Linux OS-release: 4.14.77-70.59.amzn1.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.24.0rc1
pytest: 4.1.1
pip: 18.1
setuptools: 40.6.3
Cython: None
numpy: 1.15.4
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@gfyoung
Copy link
Member

gfyoung commented Jan 14, 2019

@Garet-Jax : Thanks for reporting this! A little unusual to see pytest go down in 🔥 like this, but before we go into that, let's start with this:

Can you execute import pandas in a Python console on your instance?

@Garet-Jax
Copy link
Author

Garet-Jax commented Jan 14, 2019

Can you execute import pandas in a Python console on your instance?

Yes. Inside of the virtual environment, I start python using one of:

python
python3
python36

From there I can run both without issue:

import pandas
import pandas as pd

Thanks.

@TomAugspurger
Copy link
Contributor

Can you run from pandas.tests.scalar.timestamp.test_timestamp import *. That seems to be the file pytest is choking on.

@TomAugspurger
Copy link
Contributor

And can you also try

import pandas.util.testing as tm
tm.get_locales()

@Garet-Jax
Copy link
Author

Can you run from pandas.tests.scalar.timestamp.test_timestamp import *. That seems to be the file pytest is choking on.

(env) [ec2-user@ip-[IP_ADDRESS]~]$ python
Python 3.6.7 (default, Dec 21 2018, 20:31:01)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.

from pandas.tests.scalar.timestamp.test_timestamp import *
Traceback (most recent call last):
File "", line 1, in
File "/home/ec2-user/env/local/lib64/python3.6/site-packages/pandas/tests/scalar/timestamp/test_timestamp.py", line 28, in
class TestTimestampProperties(object):
File "/home/ec2-user/env/local/lib64/python3.6/site-packages/pandas/tests/scalar/timestamp/test_timestamp.py", line 104, in TestTimestampProperties
None] if tm.get_locales() is None else [None] + tm.get_locales())
File "/home/ec2-user/env/local/lib64/python3.6/site-packages/pandas/util/testing.py", line 516, in get_locales
x, encoding=pd.options.display.encoding))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 4: invalid continuation byte

@Garet-Jax
Copy link
Author

Garet-Jax commented Jan 14, 2019

And can you also try

import pandas.util.testing as tm
tm.get_locales()

(env) [ec2-user@[IP_ADDRESS]~]$ python
Python 3.6.7 (default, Dec 21 2018, 20:31:01)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.

import pandas.util.testing as tm
tm.get_locales()
Traceback (most recent call last):
File "", line 1, in
File "/home/ec2-user/env/local/lib64/python3.6/site-packages/pandas/util/testing.py", line 516, in get_locales
x, encoding=pd.options.display.encoding))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 4: invalid continuation byte

Thanks a bunch for your help by the way.

@Garet-Jax
Copy link
Author

Garet-Jax commented Jan 14, 2019

I have been trying many different combinations and have found one that works.

I have tried a few different AWS AMIs:

Amazon Linux 2 AMI (HVM), SSD Volume Type - ami-01e3b8c3a51e88954 (64-bit x86)
Amazon Linux AMI 2018.03.0 (HVM), SSD Volume Type - ami-0080e4c5bc078760e
amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2 - ami-0eee720fcca8e4499
Ubuntu Server 18.04 LTS (HVM), SSD Volume Type - ami-0ac019f4fcb7cb7e6 (64-bit x86)

The first 2 are the standard quick start AMI for launching an EC2 instance. One is Linux 2 and the other is Linux 1.

The 3rd one is really the one I want to use because it is the closest to the Linux used by the AWS Lambda infrastructure. I am trying to parlay this work into a Lambda function and want as much synergy as possible so there are no errors on the back end.

The 4th one is the standard quick start for Ubuntu version 18.04

I have also tried a few different AWS instance sizings:

t2.micro
t2.small
t2.medium

I used amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2 - ami-0eee720fcca8e4499 with a t2.micro instance and there were no errors on the test. However the test hung the instance and so never completed. I tried the same AMI with larger instances, but they got the errors.

I tried an Ubuntu server this morning with a t2.medium instance. There were a couple of minor differences, but the pandas.test() worked. This seems to be a systemic problem brought on by Linux version and hardware differences.

I am really confused.

@TomAugspurger
Copy link
Contributor

I'm not sure either. IIRC we had another user reporting issues with tm.get_locale(), but I don't recall the specifics.

@WillAyd
Copy link
Member

WillAyd commented Jan 15, 2019

Interestingly enough there is a comment above the offending line about some locales not being decodable, though there isn't a catch for those either:

# raw_locales is "\n" separated list of locales

What is the output of locale -a from the shell on the AMI? From what I can see documented in the FAQ the default encoding is UTF-8 so very surprised that is the codec that is failing here

@Garet-Jax
Copy link
Author

Garet-Jax commented Jan 15, 2019

What is the output of locale -a from the shell on the AMI?

We may have found the issue. On the Ubuntu machine (this one works):

(env) ubuntu@[IP_ADDRESS]:~$ locale -a
C
C.UTF-8
POSIX
en_US.utf8

On the AMI that doesn't work (basically every locale that exists):

(env) [ec2-user@[IP_ADDRESS] ~]$ locale -a aa_DJ aa_DJ.iso88591 aa_DJ.utf8 aa_ER aa_ER@saaho aa_ER.utf8 aa_ER.utf8@saaho aa_ET aa_ET.utf8 af_ZA af_ZA.iso88591 af_ZA.utf8 am_ET am_ET.utf8 an_ES an_ES.iso885915 an_ES.utf8 ar_AE ar_AE.iso88596 ar_AE.utf8 ar_BH ar_BH.iso88596 ar_BH.utf8 ar_DZ ar_DZ.iso88596 ar_DZ.utf8 ar_EG ar_EG.iso88596 ar_EG.utf8 ar_IN ar_IN.utf8 ar_IQ ar_IQ.iso88596 ar_IQ.utf8 ar_JO ar_JO.iso88596 ar_JO.utf8 ar_KW ar_KW.iso88596 ar_KW.utf8 ar_LB ar_LB.iso88596 ar_LB.utf8 ar_LY ar_LY.iso88596 ar_LY.utf8 ar_MA ar_MA.iso88596 ar_MA.utf8 ar_OM ar_OM.iso88596 ar_OM.utf8 ar_QA ar_QA.iso88596 ar_QA.utf8 ar_SA ar_SA.iso88596 ar_SA.utf8 ar_SD ar_SD.iso88596 ar_SD.utf8 ar_SY ar_SY.iso88596 ar_SY.utf8 ar_TN ar_TN.iso88596 ar_TN.utf8 ar_YE ar_YE.iso88596 ar_YE.utf8 as_IN as_IN.utf8 ast_ES ast_ES.iso885915 ast_ES.utf8 ayc_PE ayc_PE.utf8 az_AZ az_AZ.utf8 be_BY be_BY.cp1251 be_BY@latin be_BY.utf8 be_BY.utf8@latin bem_ZM bem_ZM.utf8 ber_DZ ber_DZ.utf8 ber_MA ber_MA.utf8 bg_BG bg_BG.cp1251 bg_BG.utf8 bho_IN bho_IN.utf8 bn_BD bn_BD.utf8 bn_IN bn_IN.utf8 bo_CN bo_CN.utf8 bo_IN bo_IN.utf8 bokmal bokm▒l br_FR br_FR@euro br_FR.iso88591 br_FR.iso885915@euro br_FR.utf8 brx_IN brx_IN.utf8 bs_BA bs_BA.iso88592 bs_BA.utf8 byn_ER byn_ER.utf8 C ca_AD ca_AD.iso885915 ca_AD.utf8 ca_ES ca_ES@euro ca_ES.iso88591 ca_ES.iso885915@euro ca_ES.utf8 ca_FR ca_FR.iso885915 ca_FR.utf8 ca_IT ca_IT.iso885915 ca_IT.utf8 catalan crh_UA crh_UA.utf8 croatian csb_PL csb_PL.utf8 cs_CZ cs_CZ.iso88592 cs_CZ.utf8 cv_RU cv_RU.utf8 cy_GB cy_GB.iso885914 cy_GB.utf8 czech da_DK da_DK.iso88591 da_DK.iso885915 da_DK.utf8 danish dansk de_AT de_AT@euro de_AT.iso88591 de_AT.iso885915@euro de_AT.utf8 de_BE de_BE@euro de_BE.iso88591 de_BE.iso885915@euro de_BE.utf8 de_CH de_CH.iso88591 de_CH.utf8 de_DE de_DE@euro de_DE.iso88591 de_DE.iso885915@euro de_DE.utf8 de_LU de_LU@euro de_LU.iso88591 de_LU.iso885915@euro de_LU.utf8 deutsch doi_IN doi_IN.utf8 dutch dv_MV dv_MV.utf8 dz_BT dz_BT.utf8 eesti el_CY el_CY.iso88597 el_CY.utf8 el_GR el_GR@euro el_GR.iso88597 el_GR.iso88597@euro el_GR.utf8 en_AG en_AG.utf8 en_AU en_AU.iso88591 en_AU.utf8 en_BW en_BW.iso88591 en_BW.utf8 en_CA en_CA.iso88591 en_CA.utf8 en_DK en_DK.iso88591 en_DK.utf8 en_GB en_GB.iso88591 en_GB.iso885915 en_GB.utf8 en_HK en_HK.iso88591 en_HK.utf8 en_IE en_IE@euro en_IE.iso88591 en_IE.iso885915@euro en_IE.utf8 en_IN en_IN.utf8 en_NG en_NG.utf8 en_NZ en_NZ.iso88591 en_NZ.utf8 en_PH en_PH.iso88591 en_PH.utf8 en_SG en_SG.iso88591 en_SG.utf8 en_US en_US.iso88591 en_US.iso885915 en_US.utf8 en_ZA en_ZA.iso88591 en_ZA.utf8 en_ZM en_ZM.utf8 en_ZW en_ZW.iso88591 en_ZW.utf8 es_AR es_AR.iso88591 es_AR.utf8 es_BO es_BO.iso88591 es_BO.utf8 es_CL es_CL.iso88591 es_CL.utf8 es_CO es_CO.iso88591 es_CO.utf8 es_CR es_CR.iso88591 es_CR.utf8 es_CU es_CU.utf8 es_DO es_DO.iso88591 es_DO.utf8 es_EC es_EC.iso88591 es_EC.utf8 es_ES es_ES@euro es_ES.iso88591 es_ES.iso885915@euro es_ES.utf8 es_GT es_GT.iso88591 es_GT.utf8 es_HN es_HN.iso88591 es_HN.utf8 es_MX es_MX.iso88591 es_MX.utf8 es_NI es_NI.iso88591 es_NI.utf8 es_PA es_PA.iso88591 es_PA.utf8 es_PE es_PE.iso88591 es_PE.utf8 es_PR es_PR.iso88591 es_PR.utf8 es_PY es_PY.iso88591 es_PY.utf8 es_SV es_SV.iso88591 es_SV.utf8 estonian es_US es_US.iso88591 es_US.utf8 es_UY es_UY.iso88591 es_UY.utf8 es_VE es_VE.iso88591 es_VE.utf8 et_EE et_EE.iso88591 et_EE.iso885915 et_EE.utf8 eu_ES eu_ES@euro eu_ES.iso88591 eu_ES.iso885915@euro eu_ES.utf8 fa_IR fa_IR.utf8 ff_SN ff_SN.utf8 fi_FI fi_FI@euro fi_FI.iso88591 fi_FI.iso885915@euro fi_FI.utf8 fil_PH fil_PH.utf8 finnish fo_FO fo_FO.iso88591 fo_FO.utf8 fran▒ais fr_BE fr_BE@euro fr_BE.iso88591 fr_BE.iso885915@euro fr_BE.utf8 fr_CA fr_CA.iso88591 fr_CA.utf8 fr_CH fr_CH.iso88591 fr_CH.utf8 french fr_FR fr_FR@euro fr_FR.iso88591 fr_FR.iso885915@euro fr_FR.utf8 fr_LU fr_LU@euro fr_LU.iso88591 fr_LU.iso885915@euro fr_LU.utf8 fur_IT fur_IT.utf8 fy_DE fy_DE.utf8 fy_NL fy_NL.utf8 ga_IE ga_IE@euro ga_IE.iso88591 ga_IE.iso885915@euro ga_IE.utf8 galego galician gd_GB gd_GB.iso885915 gd_GB.utf8 german gez_ER gez_ER@abegede gez_ER.utf8 gez_ER.utf8@abegede gez_ET gez_ET@abegede gez_ET.utf8 gez_ET.utf8@abegede gl_ES gl_ES@euro gl_ES.iso88591 gl_ES.iso885915@euro gl_ES.utf8 greek gu_IN gu_IN.utf8 gv_GB gv_GB.iso88591 gv_GB.utf8 ha_NG ha_NG.utf8 hebrew he_IL he_IL.iso88598 he_IL.utf8 hi_IN hi_IN.utf8 hne_IN hne_IN.utf8 hr_HR hr_HR.iso88592 hr_HR.utf8 hrvatski hsb_DE hsb_DE.iso88592 hsb_DE.utf8 ht_HT ht_HT.utf8 hu_HU hu_HU.iso88592 hu_HU.utf8 hungarian hy_AM hy_AM.armscii8 hy_AM.utf8 ia_FR ia_FR.utf8 icelandic id_ID id_ID.iso88591 id_ID.utf8 ig_NG ig_NG.utf8 ik_CA ik_CA.utf8 is_IS is_IS.iso88591 is_IS.utf8 italian it_CH it_CH.iso88591 it_CH.utf8 it_IT it_IT@euro it_IT.iso88591 it_IT.iso885915@euro it_IT.utf8 iu_CA iu_CA.utf8 iw_IL iw_IL.iso88598 iw_IL.utf8 ja_JP ja_JP.eucjp ja_JP.ujis ja_JP.utf8 japanese japanese.euc ka_GE ka_GE.georgianps ka_GE.utf8 kk_KZ kk_KZ.pt154 kk_KZ.utf8 kl_GL kl_GL.iso88591 kl_GL.utf8 km_KH km_KH.utf8 kn_IN kn_IN.utf8 kok_IN kok_IN.utf8 ko_KR ko_KR.euckr ko_KR.utf8 korean korean.euc ks_IN ks_IN@devanagari ks_IN.utf8 ks_IN.utf8@devanagari ku_TR ku_TR.iso88599 ku_TR.utf8 kw_GB kw_GB.iso88591 kw_GB.utf8 ky_KG ky_KG.utf8 lb_LU lb_LU.utf8 lg_UG lg_UG.iso885910 lg_UG.utf8 li_BE li_BE.utf8 lij_IT lij_IT.utf8 li_NL li_NL.utf8 lithuanian lo_LA lo_LA.utf8 lt_LT lt_LT.iso885913 lt_LT.utf8 lv_LV lv_LV.iso885913 lv_LV.utf8 mag_IN mag_IN.utf8 mai_IN mai_IN.utf8 mg_MG mg_MG.iso885915 mg_MG.utf8 mhr_RU mhr_RU.utf8 mi_NZ mi_NZ.iso885913 mi_NZ.utf8 mk_MK mk_MK.iso88595 mk_MK.utf8 ml_IN ml_IN.utf8 mni_IN mni_IN.utf8 mn_MN mn_MN.utf8 mr_IN mr_IN.utf8 ms_MY ms_MY.iso88591 ms_MY.utf8 mt_MT mt_MT.iso88593 mt_MT.utf8 my_MM my_MM.utf8 nan_TW@latin nan_TW.utf8@latin nb_NO nb_NO.iso88591 nb_NO.utf8 nds_DE nds_DE.utf8 nds_NL nds_NL.utf8 ne_NP ne_NP.utf8 nhn_MX nhn_MX.utf8 niu_NU niu_NU.utf8 niu_NZ niu_NZ.utf8 nl_AW nl_AW.utf8 nl_BE nl_BE@euro nl_BE.iso88591 nl_BE.iso885915@euro nl_BE.utf8 nl_NL nl_NL@euro nl_NL.iso88591 nl_NL.iso885915@euro nl_NL.utf8 nn_NO nn_NO.iso88591 nn_NO.utf8 no_NO no_NO.ISO-8859-1 norwegian nr_ZA nr_ZA.utf8 nso_ZA nso_ZA.utf8 nynorsk oc_FR oc_FR.iso88591 oc_FR.utf8 om_ET om_ET.utf8 om_KE om_KE.iso88591 om_KE.utf8 or_IN or_IN.utf8 os_RU os_RU.utf8 pa_IN pa_IN.utf8 pap_AN pap_AN.utf8 pa_PK pa_PK.utf8 pl_PL pl_PL.iso88592 pl_PL.utf8 polish portuguese POSIX ps_AF ps_AF.utf8 pt_BR pt_BR.iso88591 pt_BR.utf8 pt_PT pt_PT@euro pt_PT.iso88591 pt_PT.iso885915@euro pt_PT.utf8 romanian ro_RO ro_RO.iso88592 ro_RO.utf8 ru_RU ru_RU.iso88595 ru_RU.koi8r ru_RU.utf8 russian ru_UA ru_UA.koi8u ru_UA.utf8 rw_RW rw_RW.utf8 sa_IN sa_IN.utf8 sat_IN sat_IN.utf8 sc_IT sc_IT.utf8 sd_IN sd_IN@devanagari sd_IN.utf8 sd_IN.utf8@devanagari se_NO se_NO.utf8 shs_CA shs_CA.utf8 sid_ET sid_ET.utf8 si_LK si_LK.utf8 sk_SK sk_SK.iso88592 sk_SK.utf8 slovak slovene slovenian sl_SI sl_SI.iso88592 sl_SI.utf8 so_DJ so_DJ.iso88591 so_DJ.utf8 so_ET so_ET.utf8 so_KE so_KE.iso88591 so_KE.utf8 so_SO so_SO.iso88591 so_SO.utf8 spanish sq_AL sq_AL.iso88591 sq_AL.utf8 sq_MK sq_MK.utf8 sr_ME sr_ME.utf8 sr_RS sr_RS@latin sr_RS.utf8 sr_RS.utf8@latin ss_ZA ss_ZA.utf8 st_ZA st_ZA.iso88591 st_ZA.utf8 sv_FI sv_FI@euro sv_FI.iso88591 sv_FI.iso885915@euro sv_FI.utf8 sv_SE sv_SE.iso88591 sv_SE.iso885915 sv_SE.utf8 swedish sw_KE sw_KE.utf8 sw_TZ sw_TZ.utf8 szl_PL szl_PL.utf8 ta_IN ta_IN.utf8 ta_LK ta_LK.utf8 te_IN te_IN.utf8 tg_TJ tg_TJ.koi8t tg_TJ.utf8 thai th_TH th_TH.tis620 th_TH.utf8 ti_ER ti_ER.utf8 ti_ET ti_ET.utf8 tig_ER tig_ER.utf8 tk_TM tk_TM.utf8 tl_PH tl_PH.iso88591 tl_PH.utf8 tn_ZA tn_ZA.utf8 tr_CY tr_CY.iso88599 tr_CY.utf8 tr_TR tr_TR.iso88599 tr_TR.utf8 ts_ZA ts_ZA.utf8 tt_RU tt_RU@iqtelif tt_RU.utf8 tt_RU.utf8@iqtelif turkish ug_CN ug_CN.utf8 uk_UA uk_UA.koi8u uk_UA.utf8 unm_US unm_US.utf8 ur_IN ur_IN.utf8 ur_PK ur_PK.utf8 uz_UZ uz_UZ@cyrillic uz_UZ.iso88591 uz_UZ.utf8@cyrillic ve_ZA ve_ZA.utf8 vi_VN vi_VN.utf8 wa_BE wa_BE@euro wa_BE.iso88591 wa_BE.iso885915@euro wa_BE.utf8 wae_CH wae_CH.utf8 wal_ET wal_ET.utf8 wo_SN wo_SN.utf8 xh_ZA xh_ZA.iso88591 xh_ZA.utf8 yi_US yi_US.cp1255 yi_US.utf8 yo_NG yo_NG.utf8 yue_HK yue_HK.utf8 zh_CN zh_CN.gb18030 zh_CN.gb2312 zh_CN.gbk zh_CN.utf8 zh_HK zh_HK.big5hkscs zh_HK.utf8 zh_SG zh_SG.gb2312 zh_SG.gbk zh_SG.utf8 zh_TW zh_TW.big5 zh_TW.euctw zh_TW.utf8 zu_ZA zu_ZA.iso88591 zu_ZA.utf8

@WillAyd
Copy link
Member

WillAyd commented Jan 15, 2019

From what you've pasted I get the impression that the bokmål encoding is the one to blame here. Interesting as I doubt that even renders in the AMI shell correctly given the bytes shown in the traceback map to that accented character in cp1252 (maybe some others) and not unicode:

>>> b'bokm\xe5l'.decode('utf8')  # This simulates the error you get
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 4: invalid continuation byte

>>> b'bokm\xe5l'.decode('cp1252')
'bokmål'

@Garet-Jax
Copy link
Author

OK - here is what I did. I followed the following post to effectively remove all locales except for en_us.UTF8.

https://talk.maemo.org/showthread.php?t=94625

My exact commands were:

sudo localedef --list-archive | grep -E -v "en_US.utf8" | xargs sudo localedef --delete-from-archive
sudo mv /usr/lib/locale/locale-archive /usr/lib/locale/locale-archive.tmpl
sudo build-locale-archive

I then exited my SSH session and started a new one. I re-ran locale -a and got:

(env) [ec2-user@[IP-ADDRESS]~]$ locale -a
C
en_US.utf8
POSIX

I then started my virtual environment, ran a python session, imported pandas and ran pandas.test().

My test ran.

Yay!

Thanks a ton for everyone who took the time to provide a second set of eyes and suggestions. Couldn't have done this without you.

@WillAyd
Copy link
Member

WillAyd commented Jan 15, 2019

@Garet-Jax glad it worked. Something seems strange with the locales there but most likely an issue outside of pandas. Closing as I don't think there's anything to do here on our end

@WillAyd WillAyd closed this as completed Jan 15, 2019
@yojineko
Copy link

Thank you! I had same problem with CentOS7.6.1810 + anaconda3-2018.12.
I tried
sudo localedef --list-archive | grep -E -v "en_US.utf8" | xargs sudo localedef --delete-from-archive
sudo mv /usr/lib/locale/locale-archive /usr/lib/locale/locale-archive.tmpl
sudo build-locale-archive
after that it resolved.
Thanks a lot!

@Garet-Jax
Copy link
Author

Garet-Jax commented Mar 13, 2019 via email

timcera added a commit to timcera/pandas that referenced this issue Jul 12, 2019
pandas-dev#24760
pandas-dev#23638
There are some special characters encoded with 'window-1252' in lists
created by 'locale -a'.  The two know locales with this problem are
Norwegian 'bokmål', and 'français'.
timcera added a commit to timcera/pandas that referenced this issue Jul 16, 2019
pandas-dev#24760
pandas-dev#23638
There are some special characters encoded with 'window-1252' in lists
created by 'locale -a'.  The two know locales with this problem are
Norwegian 'bokmål', and 'français'.
timcera added a commit to timcera/pandas that referenced this issue Sep 19, 2019
pandas-dev#24760
pandas-dev#23638
There are some special characters encoded with 'window-1252' in lists
created by 'locale -a'.  The two know locales with this problem are
Norwegian 'bokmål', and 'français'.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants