Test for Python 3.5 with C locale #14114

Closed
wants to merge 5 commits into from

nbonnotte
Contributor

As @jreback suggested, I'm adding an alternate py3 build to change LOCALE, based on the 3.4 slow build, to reveal some encoding bugs (see #12337)

I'm new to configuring Travis; I just hope this works as I expect and that the tests fail.

@codecov-io

codecov-io commented Aug 28, 2016

Current coverage is 85.24% (diff: 100%)

Merging #14114 into master will not change coverage

@@             master     #14114   diff @@
==========================================
  Files           140        140          
  Lines         50556      50556          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          43095      43095          
  Misses         7461       7461          
  Partials          0          0          

Powered by Codecov. Last update 289cd6d...dadf73c

@nbonnotte
Contributor Author

This is disappointing. I added a Travis job with LC_ALL=C, and the tests still succeed: https://travis-ci.org/pydata/pandas/jobs/155766390

The new locale seems to be taken into account, since Travis says

$ ci/script.sh
inside ci/script.sh
Setting LC_ALL to C
/home/travis/build/pydata/pandas/pandas/computation/__init__.py:19: UserWarning: The installed version of numexpr 2.4.4 is not supported in pandas and will be not be used
  UserWarning)
pandas detected console encoding: ANSI_X3.4-1968

But on my computer,

$ LC_ALL=C nosetests pandas.tests.formats.test_format                                                     
..................S..............................................SSS.........E........................................................................................................
======================================================================
ERROR: test_to_latex_filename (pandas.tests.formats.test_format.TestDataFrameFormatting)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/nicolas/Documents/Projects/pandas/pandas/tests/formats/test_format.py", line 2808, in test_to_latex_filename
    df.to_latex(path)
  File "/home/nicolas/Documents/Projects/pandas/pandas/core/frame.py", line 1655, in to_latex
    encoding=encoding)
  File "/home/nicolas/Documents/Projects/pandas/pandas/formats/format.py", line 661, in to_latex
    latex_renderer.write_result(f)
  File "/home/nicolas/Documents/Projects/pandas/pandas/formats/format.py", line 897, in write_result
    buf.write(' & '.join(crow))
UnicodeEncodeError: 'ascii' codec can't encode character '\xdf' in position 7: ordinal not in range(128)

----------------------------------------------------------------------
Ran 182 tests in 3.860s

FAILED (SKIP=4, errors=1)

I'm missing something, obviously. I'm going to try with Python 3.5, which is what I'm using on my computer, but I don't see how that could change things.
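
The traceback above can be reproduced without pandas. The following is a minimal sketch, assuming that under LC_ALL=C Python 3.5 opens text files with ASCII as the default encoding. The encoding is passed explicitly here to mimic that, and the row contents and file name are made up for illustration.

```python
import os
import tempfile

# Hypothetical row containing '\xdf', the character named in the traceback.
row = ["a", "Stra\xdfe"]
path = os.path.join(tempfile.mkdtemp(), "out.tex")

# Assumption: encoding="ascii" stands in for what open() would pick up
# from the C locale when no encoding is given.
try:
    with open(path, "w", encoding="ascii") as buf:
        buf.write(" & ".join(row))
    failed = False
    message = ""
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

print(failed, message)

# Naming the encoding explicitly makes the write independent of the locale.
with open(path, "w", encoding="utf-8") as buf:
    buf.write(" & ".join(row))

with open(path, encoding="utf-8") as buf:
    round_trip = buf.read()
```

If this matches what happens inside to_latex, the difference between the Travis run and the local run would come from which default encoding each interpreter picks up, not from the test itself.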

@jreback added Bug Unicode Unicode strings IO LaTeX to_latex labels Aug 31, 2016
apt:
packages:
- xsel
# In allow_failures
Contributor

remove this line (as it's duplicated)

Contributor Author

What line should I remove, exactly?

@jreback
Contributor

jreback commented Aug 31, 2016

So the JOB_NAME determines which files to pick up in the ci\requirements- dir. You need to add a copy of the files for the job which you copied. For something like this we would only need minimal dependencies (so it's faster). You can zonk most of them: I assume you copied requirements-3.5.run (and .pip and .build), so edit the .run and remove everything but pytz, dateutil, numpy.

@nbonnotte changed the title from "Test for Python 3.4 with C locale" to "Test for Python 3.5 with C locale" Sep 3, 2016
@nbonnotte
Contributor Author

There is no requirements-3.5.pip. And I guess you meant that JOB_TAG determines which files to pick up in the ci/ folder?

I don't understand: the new test does not show up in Travis. It did with the previous commit, even though the new test was not in the "allowed failures". But it disappeared with my last commit...

@jorisvandenbossche
Member

@nbonnotte You have to add it in both places: in the base matrix, and again in the allowed failures section (as you had it in the first commit, I think).

@nbonnotte
Contributor Author

It works! I mean, it fails: https://travis-ci.org/pydata/pandas/jobs/157552565

@jreback
Contributor

jreback commented Sep 5, 2016

great!

I think something is up with the locale string itself - it doesn't seem to fully parse (the error above the latex one); the final error is tagged in another issue

@jreback
Contributor

jreback commented Sep 5, 2016

======================================================================
ERROR: test_set_locale (pandas.tools.tests.test_util.TestLocaleUtils)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tools/tests/test_util.py", line 72, in test_set_locale
    lang, enc = LOCALE_OVERRIDE.split('.')
ValueError: not enough values to unpack (expected 2, got 1)

this looks wrong

@jreback added this to the 0.19.0 milestone Sep 5, 2016
@nbonnotte
Contributor Author

nbonnotte commented Sep 5, 2016

@jreback I've set the locale to "C", which unlike "en_US.UTF-8" or "fr_FR.UTF-8" doesn't have any dot in it. Is it wrong? How should I set the system encoding to ascii?

Edit: I'll try with ANSI_X3.4-1968
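
One way to check which encoding an interpreter ends up with under a given LC_ALL is to ask a child process. This is only a sketch: `preferred_encoding_under` is a hypothetical helper, not part of pandas or of the CI scripts, and the answer depends on the Python version (the Travis log above shows ANSI_X3.4-1968 for this setup; newer interpreters may report something else).

```python
import os
import subprocess
import sys


def preferred_encoding_under(lc_all):
    """Return locale.getpreferredencoding() as seen by a child Python
    started with LC_ALL set to `lc_all` (hypothetical helper)."""
    env = dict(os.environ, LC_ALL=lc_all)
    out = subprocess.check_output(
        [sys.executable, "-c",
         "import locale; print(locale.getpreferredencoding())"],
        env=env,
    )
    return out.decode("ascii", errors="replace").strip()


# The printed value is platform- and version-dependent.
print(preferred_encoding_under("C"))
```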

@jreback
Contributor

jreback commented Sep 5, 2016

@nbonnotte I have no idea what to put here

- LOCALE_OVERRIDE="zh_CN.UTF-8

looks like you need country.encoding, maybe? look through the Python docs

@jreback
Contributor

jreback commented Sep 5, 2016

@nbonnotte
Contributor Author

ANSI_X3.4-1968 doesn't work:

  File "/home/travis/build/pydata/pandas/pandas/__init__.py", line 4, in <module>
    locale.setlocale(locale.LC_ALL, 'ANSI_X3.4-1968')
  File "/home/travis/miniconda/envs/pandas/lib/python3.5/locale.py", line 594, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

What locale should we use?
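
A possible reading of this error: ANSI_X3.4-1968 is a name Python knows as an alias of its ASCII codec, but it is not necessarily a locale name the C library accepts. The sketch below separates the two questions. `can_set_locale` is a hypothetical helper, and it uses LC_CTYPE rather than LC_ALL so that the previous setting is easy to restore. Which names are accepted depends on the locales installed on the machine.

```python
import codecs
import locale


def can_set_locale(name):
    """Try to switch LC_CTYPE to `name`, then restore the previous
    setting (hypothetical helper). Returns True if the switch worked."""
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


# As a codec name, ANSI_X3.4-1968 resolves to ASCII.
print(codecs.lookup("ANSI_X3.4-1968").name)  # ascii

# As locale names: "C" is always available; the other result is system-dependent.
print(can_set_locale("C"))
print(can_set_locale("ANSI_X3.4-1968"))
```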

@nbonnotte
Contributor Author

@jreback I have searched! The thing is, all xx_XX.encoding locales use encodings that can tolerate some accents: 8859-1, UTF-8... Still, I'm going to try with 8859-1, just in case.

I doubt zh_CN.UTF-8 would work, as I think it would set the encoding to UTF-8, which is exactly what we need to avoid.

BTW, isn't it wrong to assume that all locale settings match the pattern xx_XX.encoding? A user could set their locale to C. Maybe we can edit

  File "/home/travis/build/pydata/pandas/pandas/tools/tests/test_util.py", line 72, in test_set_locale
    lang, enc = LOCALE_OVERRIDE.split('.')
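
The edit suggested here could look roughly like the sketch below, which tolerates locale names without a dot. `split_locale` is a hypothetical name used for illustration; it is not the code that ended up in pandas.

```python
def split_locale(name):
    """Split a locale string into (language, encoding).

    Names without a dot, such as "C", yield None for the encoding
    instead of raising ValueError on unpacking.
    """
    lang, _, enc = name.partition(".")
    return lang, (enc or None)


print(split_locale("en_US.UTF-8"))  # ('en_US', 'UTF-8')
print(split_locale("C"))            # ('C', None)
```

Using `str.partition` instead of `str.split` always returns three parts, so the unpacking cannot fail whatever the locale string looks like.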

@jreback
Contributor

jreback commented Sep 5, 2016

I was giving an example of what other tests use

the locale utility is only for testing
you can change it to work with C I think

@nbonnotte
Contributor Author

$ ci/script.sh
inside ci/script.sh
ci/script.sh: line 13: warning: setlocale: LC_ALL: cannot change locale (en_US.iso88591): No such file or directory

Which locales are available, then?
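
On a POSIX machine, the installed locales can usually be listed with the `locale -a` command. A small sketch that wraps it, assuming the command exists on the build image; `available_locales` is a hypothetical helper and returns an empty list when the command is missing or fails.

```python
import subprocess


def available_locales():
    """Return the locale names reported by `locale -a`, or an empty
    list if the command is unavailable (hypothetical helper)."""
    try:
        raw = subprocess.check_output(["locale", "-a"])
    except (OSError, subprocess.CalledProcessError):
        return []
    # Locale names are not guaranteed to be valid UTF-8, so decode leniently.
    return [line.decode("utf-8", errors="replace")
            for line in raw.splitlines() if line]


names = available_locales()
print(len(names), names[:5])
```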

@jreback
Contributor

jreback commented Sep 5, 2016

@nbonnotte
Contributor Author

@jreback is this an acceptable solution?

- CLIPBOARD=xsel
- CACHE_NAME="35_ascii"
- USE_CACHE=true
addons:
Contributor

you can take the CLIPBOARD and FULL_DEPS vars off both sections

@jreback
Contributor

jreback commented Sep 8, 2016

can you update & rebase

@nbonnotte
Contributor Author

@jreback all done!

As soon as the tests are confirmed, I'll correct the bug #12337

@jreback closed this in 1e61aed Sep 10, 2016
@jreback
Contributor

jreback commented Sep 10, 2016

thanks!

ok, if you'd submit the fix for the issue, that would be great!

@jreback mentioned this pull request Sep 21, 2016
Labels
Bug IO LaTeX to_latex Unicode Unicode strings
Development

Successfully merging this pull request may close these issues.

BUG: UnicodeEncodeError in test_to_latex_filename (pandas.tests.test_format.TestDataFrameFormatting)
4 participants