sphinxext py3 compatibility #5530

metakermit · 2013-11-16T20:43:28Z

Temporary sphinxext py3 compatibility fix as discussed in #5479 before #5221 is resolved. So far test_docscrape.py passes on both Python 2.7 and 3.3 (which should be the typical developer interpreters).

jorisvandenbossche · 2013-11-19T09:35:18Z

I can't test this at the moment, but if it solves the python 3 issues with the docs for the moment, good for me!

hayd · 2013-11-19T23:51:56Z

Are these tests skipped by travis?

jtratner · 2013-11-20T00:03:40Z

we don't test doc builds with Travis. It's difficult to do and it fails for silly reasons (like latex not being installed)

hayd · 2013-11-20T00:14:07Z

But we could test this specific file (it just requires sphinx) ?

metakermit · 2013-11-20T01:03:49Z

Is this file skipped? I don't see it anywhere in the Travis log (which lists the skipped modules). If it is skipped I could try adding it, but I don't see how - the script.sh file that gets called by travis looks pretty inclusive:

 nosetests --exe -w /tmp -A "$NOSE_ARGS" pandas --with-xunit --xunit-file=/tmp/nosetests.xml

With the $NOSE_ARGS getting just the 'not slow' parameter (and docscrape tests don't have any slow decorators). I don't know what does this xunit-file do.

Anyway, this commit solves some of the issues (enough to pass the test_docscrape.py on the latest versions of Python 2 and 3 on my machine), but not all, so @phaebz (a colleague of mine, I added him as a contributor to the branch) and I will probably contribute more fixes to this branch.

hayd · 2013-11-20T01:34:25Z

I guess it must be skipped as other branches here pass on python 3!

It could be the pandas in the middle (does this only run tests within pandas directory? and not any in doc/sphinxext) ...Obviously passing that test won't confirm that the docs build on python 3, but this could be a start/prevent regressions :)

That you guys are working on this is great!

jtratner · 2013-11-20T02:02:21Z

it's only running the pandas directory. that test is under docs. But it needs to only run if sphinx is installed, no?

hayd · 2013-11-20T02:31:48Z

So the question is should we run this (currently) small test to imply the docs are buildable.

Perhaps this is out of scope of what travis should be doing usually...

phaebz · 2013-11-22T15:53:40Z

I had a look at the pandas.compat as indicated by @cpcloud over here #5479 (comment). I gathered that this compatibility layer is for exposing method redefinitions for e.g. .iteritems() vs. .items(), unicode handling (?) via str_to_bytes and the like.

Now I want to just use the import logic, namely the try/except part of pandas.compat.__init__.py, in other files (namely for cStringIO in ipython_directive.py) to expose the library naming changes to those files. Is this a feasible thing to do? If so, any ideas on how to achieve that?

jtratner · 2013-11-22T16:25:28Z

Not clear what the issue is, if this is in pandas repo you can assume pandas is installed. Therefore, you can use compat.

phaebz · 2013-11-22T16:30:22Z

Nevermind, I think I figured it out - see upcoming commit(s?). My question stemmed from the fact that I am new to compat issues between python 2 and python 3 in terms of code base organisation as big as pandas.

phaebz · 2013-11-22T16:40:06Z

... or not figured it out. Concrete example: I want to make doc/sphinxext/ipython_directive.py py3 compatible. One problem in there is the cStringIO.StringIO() instantiation. I naively tried to add a from pandas.compat import cStringIO in hope that it would take care to get a sane cStringIO module usable by py2 and py3. Instead, I get the error:

AttributeError: type object '_io.StringIO' has no attribute 'StringIO'

which looks like a "private" by the underscore convention (?).

jtratner · 2013-11-22T17:22:10Z

It's not the module, you actually get the object to instantiate, eg:

from pandas.compat import cStringIO buffer = cStringIO()

phaebz · 2013-11-22T18:57:26Z

cStringIO is later used in this file as cStringIO.StringIO. I will experiment a bit with replacing that with just StringIO. Anyways, I will work on the commits, so we can talk about actual code.

phaebz · 2013-11-22T23:40:29Z

I tried to use as much pandas.compat as possible, please review. The sphinxext seems ok. Sphinx starts to build under Python 2 and produces the following error

Exception occurred while building, starting debugger:                                        
Traceback (most recent call last):
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/cmdline.py", line 189, in main
    app.build(force_all, filenames)
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/application.py", line 204, in build
    self.builder.build_update()
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 196, in build_update
    'out of date' % len(to_build))
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 216, in build
    purple, length):
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 120, in status_iterator
    for item in iterable:
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/environment.py", line 613, in update_generator
    self.read_doc(docname, app=app)
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/environment.py", line 764, in read_doc
    raise SphinxError(str(err))
SphinxError: 'ascii' codec can't decode byte 0xc3 in position 25: ordinal not in range(128)
> /Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/environment.py(764)read_doc()
-> raise SphinxError(str(err))
(Pdb)

Under Python 3 sphinx runs through and says docs built successfully, but I get a ton of autodoc/docstring errors related to StringIO, xrange and pandas submodule imports

Traceback (most recent call last):
  File "/Users/user/.virtualenvs/pandas-dev/lib/python3.3/site-packages/sphinx/ext/autodoc.py", line 321, in import_object
    __import__(self.modname)
ImportError: No module named 'pandas.Series'

jorisvandenbossche · 2013-11-23T09:45:01Z

Are you working on Windows? In that case, the first error (on python 2 causing the debugger to start) could possibly attributed to io.rst (there is a unicode example that does not build on windows/python 2, see #5142).

phaebz · 2013-11-23T12:01:35Z

I am working on OS X 10.8.5. I had a look around in the debugger and it turns out that indeed the docname in question is 'io'.

>>> import locale
>>> print(locale.getlocale())
('en_US', 'UTF-8')
>>> print(locale.getdefaultlocale())
('en_US', 'UTF-8')

jorisvandenbossche · 2013-11-23T12:17:50Z

@phaebz Something else, I tried to build the docs with the IPython sphinxextensions, and it worked with some warnings (on Python 2, but they should be python 3 compatible). This could also be a way to, at the moment, build the docs on python 3.
The only two lines I changed is in conf.py:

Replace

              'ipython_directive',
              'ipython_console_highlighting',

with

              'IPython.sphinxext.ipython_directive',
              'IPython.sphinxext.ipython_console_highlighting',

You need IPython 1.0 (or larger) for this.
This worked for me to build the docs, you only get some errors about the okexcept parameter which is unknown (this parameter for the ipython directive is not in upstream), and some errors in the "Enhancing performance" section of the docs with the cython code (adaptations to run this is also not upstreamed). But you can build the docs (only on those affected pages the html pages will not be perfect).

phaebz · 2013-11-23T13:38:54Z

Thanks for letting me know, this serves ok for now. I will use this locally from now on when working on docs. As this PR was intended to be a temporary fix before #5221 made it through / while it is discussed I will check back with @Kermit666 how/if we will proceed.

jreback · 2014-01-03T22:16:21Z

@Kermit666 how's this coming along?

metakermit · 2014-01-09T23:05:44Z

@jreback currently lost in some "maximum recursion depth" errors I didn't expect to encounter in a Python 2 to 3 conversion:

RuntimeError: maximum recursion depth exceeded in comparison
> /Users/kermit/.virtualenvs/pandas3/lib/python3.3/site-packages/jinja2/utils.py(389)__getitem__()
-> if self._queue[-1] != key:

I'll try to work on it some more to figure it out.

jorisvandenbossche · 2014-01-09T23:11:00Z

That is the same error as @cancan101 got, and I also saw it once (I was thinking it maybe something to do with sphinx 1.2 vs 1.1.3, but I still have to test that)

jorisvandenbossche · 2014-01-10T10:59:05Z

@Kermit666 See #5899, do you use Sphinx 1.2? And can you try with 1.1.3?

metakermit · 2014-01-10T14:22:19Z

Thanks @jorisvandenbossche , that seems to be the root. I tried with Sphinx 1.1.3, but that one has some other errors (possibly py3-related) and now I'm working on the development version of Sphinx (1.3a).

I am also getting weird errors with pandas itself.

>>> import pandas
dlopen(./pandas/hashtable.so, 2): Symbol not found: _PyCObject_Type

I even tried rebasing to the latest master, but that didn't help. Is master unstable at the moment (in which case - should I be working off of the latest tag instead?) or am I maybe missing something locally?

jorisvandenbossche · 2014-01-12T18:44:49Z

Normally master should be stable. Did you build pandas again after rebasing? (eg python setup.py build_ext --inplace)

And with the development version of Sphinx (1.3a), is the 'maximum recursion error' solved?

metakermit · 2014-01-12T23:36:20Z

OK, you are right @jorisvandenbossche . I thought that pip install -e . would trigger the extensions rebuild (because it builds them initially), but it appears it doesn't. After calling build_ext explicitly I got pandas to import again.

Yes, the dev version of Sphinx resolves the 'maximum recursion error'.

As far as the documentation itself is concerned - on the current branch I am able to generate it, but I get a ton of warnings and import errors along the way (import errors for missing scipy, xlrd,..., missing file 'foo.xlsx', toctree warnings). This whole build process seems very complex... I expected something more in the line of mark-up -> html :)

jorisvandenbossche · 2014-01-13T08:53:23Z

@Kermit666 The reason building the docs seems complex (lot of errors and warnings etc) is because all code examples in the docs are run and the output saved during the docs build (using the ipython directive, so it's not simply rst -> html, the code examples in rst are first generated). This also means that, if you don't have all optional dependencies for pandas (hdf5-related, html-related, excel-related, etc) some of the code examples will generate errors.
This is not really a problem for working locally on the docs, as the docs will still just build (only for the topics you don't have the dependencies, the code examples will show the error instead of the useful output). Also, when you build the docs a second time, only those pages you changed are rebuild (which should also decrease the amount of errors/warnings you see).
With a clean build, the only warning you should see is "class.rst:: WARNING: document isn't included in any toctree".

metakermit · 2014-01-13T10:01:39Z

OK, thanks. I'll play with it tonight some more with all the optional dependencies then.

phaebz · 2014-01-28T13:19:57Z

@jorisvandenbossche I adapted the test for buflist. I get the Windows error in #5530 (comment) with 474e5fd applied. Have you tested under Windows already?

jorisvandenbossche · 2014-01-28T16:27:40Z

Tested it with Windows/python 2, and I also get a UnicodeError while parsing io.rst (as before @y-p's unicode fix 0543a21). This was with just using readlines instead of buflist (metakermit@5165a53).

Now, I don't really get my head around what this buflist actually is/does. It is in every case not the same as readlines (just try simply s = StringIO('test'); s.buflist / s.readlines()). The reason it worked on Linux is that there the unicode-fix with buflist is not needed, so it didn't matter that it didn't work. So for now, we could get it to work for the combo linux/python3 by just skipping that unicode fix with buflist for python 3.

To also getting it to work on python3 / windows, we would need a python 3 equivalent of @y-p unicode fix 0543a21.

ghost · 2014-01-28T17:18:05Z

This PR has been troubling me and I've been giving some thought as to why.

Here's my take:

No problem having the docs build on py3, go right ahead.
If making the code examples py3 compat requires intrusive changes, or
requires introducing unfamiliar (to readers) stuff from pandas.compat such as u().
That's bad.
We want to make doc contributions easier, not more difficult.

The last point is crucial. Building the docs currently takes a long time. Cross-checking
code examples on both py2/py3 takes some time. Both are a disincentive to doc
contributions. Making the process more difficult for the majority of contributors is
moving in the wrong direction.

So if you can make the docs build on py3 without intrusive changes, that's great.
But no one who submits a good contribution to the docs should be penalized by
requiring them to invest extra effort in making them py3 compat. afaic, that's not
an issue.

as an aside, two suggestions on how to make a significant contribution
to pandas' documentaion:

Write some more.
Find ways to make the docs build faster. 60 seconds or less would get you a
standing ovation from me.

Possible ways to go about achieving the latter are:

Contributing to upstream sphinx, making it faster for everyone.
~~Adapt the build script to make it possible to build just a specified section of the docs. i.e make.py source/basics.rst.~~ BLD/DOC: Faster doc building via jinja2 and conf.py logic #6179
Work out some hack to make sphinx invalidate less of it's cache when moving
between commits, or keeping it around for longer. I'd happily trade disk for speed.

ghost · 2014-01-28T17:23:59Z

re the buflist fix, I should have added some comments. The build was failing
on cout.seek() and the problem was that some of the writes ip_d made to "stdout"
were unicode while others were of non-ascii bytestrings. buflist contains the individual
writes made and the patch modifies those in place to decode them all into unicode
where required, forcfully if needed.

It's an ugly hack, which is also why I'm not going to submit it to upstream ipython.

jorisvandenbossche · 2014-01-28T21:15:12Z

@y-p I completely agree with the fact that contributing to the docs should be as easy as possible, but some thoughts:

The docs are already made for a large part python3 compatible (see eg CLN: change print to print() in docs #4972 for print to print(), DOC: lrange, lzip --> list(range and list(zip #4450 for range to list(range())). But indeed, for new contributions it makes it a little bit more difficult (although, a print statement is not that common in the docs).
And things from pandas.compat should not be needed I think?
At some point, the move to python3 for the docs will have to be made. You can of course discuss if this is the right time.

So I think that the changes are not that intrusive (and are already there in the docs). And we can say that we don't expect from new contributions that they have checked on python3. If we see an obvious one (like a print statement), we can comment, and for the rest it is up to the contributors using python 3 to build the docs once in a while and push needed changes?

What I propose:

the numpydoc list comprehension fix (metakermit@247f7f2) is OK and can be merged I think (and upstreamed to numpydoc)
for the buflist issue, we could put a if not PY3 around that, so the docs at least build on linux. @y-p or is the conversion to unicode also needed on linux?

I think with those two fixes, the docs build with python 3 on linux (correct @phaebz @Kermit666 ?). Thoughts?

And further leave the buflist / windows decode error as is for the moment (unless someone wants to dive into that).

ghost · 2014-01-28T21:31:16Z

I concur with every single item.

No problem bypassing the buflist fix on py3 if you wish. Having py3 for utf8 systems beats having
py3 nowhere. The issue is not linux/windows, but the default console encoding which happens to be
utf8 by default roughly everywhere but windows.

All the rest is completely reasonable as well. bombs away.

jorisvandenbossche · 2014-01-28T21:35:56Z

BTW, (I am not a python 3 expert) but is this list(range()) actually needed in python 3?

Eg pd.DataFrame(np.random.randn(5,5), index=range(5)) just works as well, why is it needed to have pd.DataFrame(np.random.randn(5,5), index=list(range(5))) in the docs?

jorisvandenbossche · 2014-01-28T21:41:37Z

@Kermit666 @phaebz If someone of you can adapt the commits here to the two points in my previous comment, we can move forward and merge this :-)

(There is still the issue of rebuilding, but leave that for another PR, it's getting long enough here)

ghost · 2014-01-28T21:46:11Z

It's not required on py3 nor on py2.

pd.compat.range however is xrange on py2 and range on py3 both of which return an iterator.
The pandas codebase uses compat.range everywhere, so on py2 list(compat.range()) is required
to get back the behavior of what used to be simply range().

It's a bit ugly, yeah. But saves the 2to3 pass.

jorisvandenbossche · 2014-01-28T22:13:44Z

I understand that it is used like that internally, but is needed in the docs? As I can, as a user, just type index=range(5) and it works in both python 2 and 3.

ghost · 2014-01-28T22:33:24Z

range means different things in py2 and py3, so in general: it depends.

phaebz · 2014-01-29T09:22:42Z

@jorisvandenbossche RE: the buflist issue, I did not opt for the if not PY3, because AFAICS there is no StringIO.buflist in Python 2.7.5 - please correct me if I am wrong, I wrote this in the commuter train...

If this is ok with you guys I will talk to @Kermit666 and we do the squashing.

jorisvandenbossche · 2014-01-29T09:29:00Z

No, I think it's only removed in python 3 (I have python 2.7.6 and there StringIO has buflist and the fix works). The if not PY3 check would just be to prevent an error like StirngIO object has no attribute buflist on python 3

phaebz · 2014-01-29T11:11:12Z

you are right, I did not instantiate correctly when I tried it. Amended.

…tributeError on Pandas.str

metakermit · 2014-01-29T16:56:47Z

OK, I squashed the branch into 2 commits regarding the relevant changes and rebased to master so that you can easily merge it.

@y-p I agree with the stuff you said. The last thing we want is to fend off potential contributors for being too zealous. Also, we'll try to consider the things you suggested for future improvements when picking the next thing to work on ;)

ghost · 2014-01-29T17:19:08Z

Thank you, I appreciate your cooperation.

jorisvandenbossche · 2014-01-29T21:35:10Z

@Kermit666 @phaebz Super. Just a last small comment: at the if not PY3 can you put in a small comment for the reason of that?

And are you going to try to push the numpydoc fix upstream?

phaebz · 2014-01-30T10:40:42Z

I just dropped b3297dd since @y-p moved the relevant section in 438d614 anyways

jorisvandenbossche · 2014-01-30T10:49:15Z

Yes indeed, merging this then. Can you also try out @y-p new try on the unicode issue on python 3 (I tested it on windows, but not yet python 3).

@Kermit666 @phaebz Thanks again for your effort on this!

sphinxext py3 compatibility

jorisvandenbossche · 2014-01-30T11:09:55Z

@phaebz I was just thinking, the try ... except AttributeError, was this just for the pd.series.str that was causing an error, and which is now solved? Because then, this should maybe be removed, as it is interesting to know that there is a method nog recognized?

metakermit · 2014-01-30T11:57:43Z

@jorisvandenbossche yes, the except AttributeError shouldn't be necessary any more, but I think it does no harm in case an error such as this one pops again in the future. I sent a PR upstream just now (numpy/numpydoc#11).

metakermit mentioned this pull request Nov 16, 2013

Building docs SyntaxError #5479

Closed

jorisvandenbossche mentioned this pull request Jan 10, 2014

DOC: docs don't build with new Sphinx version 1.2 #5899

Closed

listcomp -> for-loop (listcopms create a new scope in py3), bypass At…

026c9be

…tributeError on Pandas.str

ghost mentioned this pull request Jan 30, 2014

BLD: ipython_directive, handle non-ascii execution results #6185

Merged

jorisvandenbossche added a commit that referenced this pull request Jan 30, 2014

Merge pull request #5530 from kermit666/sphinxext-py3

f0c4953

sphinxext py3 compatibility

jorisvandenbossche merged commit f0c4953 into pandas-dev:master Jan 30, 2014

metakermit mentioned this pull request Jan 30, 2014

listcomp -> for-loop (listcopms create a new scope in py3), bypass Attri... numpy/numpydoc#11

Merged

phaebz mentioned this pull request Jan 31, 2014

DOC/PY3: DataFrame.groupby fails in docs examples #6212

Closed

This was referenced May 25, 2014

Docs do not compile #7231

Closed

DOC: GH3850, add .isin to 10min.rst #7239

Merged

seth-p mentioned this pull request Jul 22, 2014

BUG: rolling_* functions should not shrink window #7766

Closed

sphinxext py3 compatibility #5530

sphinxext py3 compatibility #5530

Conversation

metakermit commented Nov 16, 2013

jorisvandenbossche commented Nov 19, 2013

hayd commented Nov 19, 2013

jtratner commented Nov 20, 2013

hayd commented Nov 20, 2013

metakermit commented Nov 20, 2013

hayd commented Nov 20, 2013

jtratner commented Nov 20, 2013

hayd commented Nov 20, 2013

phaebz commented Nov 22, 2013

jtratner commented Nov 22, 2013

phaebz commented Nov 22, 2013

phaebz commented Nov 22, 2013

jtratner commented Nov 22, 2013

phaebz commented Nov 22, 2013

phaebz commented Nov 22, 2013

jorisvandenbossche commented Nov 23, 2013

phaebz commented Nov 23, 2013

jorisvandenbossche commented Nov 23, 2013

phaebz commented Nov 23, 2013

jreback commented Jan 3, 2014

metakermit commented Jan 9, 2014

jorisvandenbossche commented Jan 9, 2014

jorisvandenbossche commented Jan 10, 2014

metakermit commented Jan 10, 2014

jorisvandenbossche commented Jan 12, 2014

metakermit commented Jan 12, 2014

jorisvandenbossche commented Jan 13, 2014

metakermit commented Jan 13, 2014

phaebz commented Jan 28, 2014

jorisvandenbossche commented Jan 28, 2014

ghost commented Jan 28, 2014

ghost commented Jan 28, 2014

jorisvandenbossche commented Jan 28, 2014

ghost commented Jan 28, 2014

jorisvandenbossche commented Jan 28, 2014

jorisvandenbossche commented Jan 28, 2014

ghost commented Jan 28, 2014

jorisvandenbossche commented Jan 28, 2014

ghost commented Jan 28, 2014

phaebz commented Jan 29, 2014

jorisvandenbossche commented Jan 29, 2014

phaebz commented Jan 29, 2014

metakermit commented Jan 29, 2014

ghost commented Jan 29, 2014

jorisvandenbossche commented Jan 29, 2014

phaebz commented Jan 30, 2014

jorisvandenbossche commented Jan 30, 2014

jorisvandenbossche commented Jan 30, 2014

metakermit commented Jan 30, 2014