Skip to content

sphinxext py3 compatibility #5530

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 30, 2014

Conversation

metakermit
Copy link
Contributor

Temporary sphinxext py3 compatibility fix as discussed in #5479 before #5221 is resolved. So far test_docscrape.py passes on both Python 2.7 and 3.3 (which should be the typical developer interpreters).

@jorisvandenbossche
Copy link
Member

I can't test this at the moment, but if it solves the python 3 issues with the docs for the moment, good for me!

@hayd
Copy link
Contributor

hayd commented Nov 19, 2013

Are these tests skipped by travis?

@jtratner
Copy link
Contributor

we don't test doc builds with Travis. It's difficult to do and it fails for silly reasons (like latex not being installed)

@hayd
Copy link
Contributor

hayd commented Nov 20, 2013

But we could test this specific file (it just requires sphinx) ?

@metakermit
Copy link
Contributor Author

Is this file skipped? I don't see it anywhere in the Travis log (which lists the skipped modules). If it is skipped I could try adding it, but I don't see how - the script.sh file that gets called by travis looks pretty inclusive:

 nosetests --exe -w /tmp -A "$NOSE_ARGS" pandas --with-xunit --xunit-file=/tmp/nosetests.xml

With the $NOSE_ARGS getting just the 'not slow' parameter (and docscrape tests don't have any slow decorators). I don't know what does this xunit-file do.

Anyway, this commit solves some of the issues (enough to pass the test_docscrape.py on the latest versions of Python 2 and 3 on my machine), but not all, so @phaebz (a colleague of mine, I added him as a contributor to the branch) and I will probably contribute more fixes to this branch.

@hayd
Copy link
Contributor

hayd commented Nov 20, 2013

I guess it must be skipped as other branches here pass on python 3!

It could be the pandas in the middle (does this only run tests within pandas directory? and not any in doc/sphinxext) ...Obviously passing that test won't confirm that the docs build on python 3, but this could be a start/prevent regressions :)

That you guys are working on this is great!

@jtratner
Copy link
Contributor

it's only running the pandas directory. that test is under docs. But it needs to only run if sphinx is installed, no?

@hayd
Copy link
Contributor

hayd commented Nov 20, 2013

So the question is should we run this (currently) small test to imply the docs are buildable.

Perhaps this is out of scope of what travis should be doing usually...

@phaebz
Copy link
Contributor

phaebz commented Nov 22, 2013

I had a look at the pandas.compat as indicated by @cpcloud over here #5479 (comment). I gathered that this compatibility layer is for exposing method redefinitions for e.g. .iteritems() vs. .items(), unicode handling (?) via str_to_bytes and the like.

Now I want to just use the import logic, namely the try/except part of pandas.compat.__init__.py, in other files (namely for cStringIO in ipython_directive.py) to expose the library naming changes to those files. Is this a feasible thing to do? If so, any ideas on how to achieve that?

@jtratner
Copy link
Contributor

Not clear what the issue is, if this is in pandas repo you can assume pandas is installed. Therefore, you can use compat.

@phaebz
Copy link
Contributor

phaebz commented Nov 22, 2013

Nevermind, I think I figured it out - see upcoming commit(s?). My question stemmed from the fact that I am new to compat issues between python 2 and python 3 in terms of code base organisation as big as pandas.

@phaebz
Copy link
Contributor

phaebz commented Nov 22, 2013

... or not figured it out. Concrete example: I want to make doc/sphinxext/ipython_directive.py py3 compatible. One problem in there is the cStringIO.StringIO() instantiation. I naively tried to add a from pandas.compat import cStringIO in hope that it would take care to get a sane cStringIO module usable by py2 and py3. Instead, I get the error:

AttributeError: type object '_io.StringIO' has no attribute 'StringIO'

which looks like a "private" by the underscore convention (?).

@jtratner
Copy link
Contributor

It's not the module, you actually get the object to instantiate, eg:

from pandas.compat import cStringIO buffer = cStringIO()

@phaebz
Copy link
Contributor

phaebz commented Nov 22, 2013

cStringIO is later used in this file as cStringIO.StringIO. I will experiment a bit with replacing that with just StringIO. Anyways, I will work on the commits, so we can talk about actual code.

@phaebz
Copy link
Contributor

phaebz commented Nov 22, 2013

I tried to use as much pandas.compat as possible, please review. The sphinxext seems ok. Sphinx starts to build under Python 2 and produces the following error

Exception occurred while building, starting debugger:                                        
Traceback (most recent call last):
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/cmdline.py", line 189, in main
    app.build(force_all, filenames)
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/application.py", line 204, in build
    self.builder.build_update()
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 196, in build_update
    'out of date' % len(to_build))
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 216, in build
    purple, length):
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 120, in status_iterator
    for item in iterable:
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/environment.py", line 613, in update_generator
    self.read_doc(docname, app=app)
  File "/Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/environment.py", line 764, in read_doc
    raise SphinxError(str(err))
SphinxError: 'ascii' codec can't decode byte 0xc3 in position 25: ordinal not in range(128)
> /Users/user/.virtualenvs/pandas-dev-py2/lib/python2.7/site-packages/sphinx/environment.py(764)read_doc()
-> raise SphinxError(str(err))
(Pdb)

Under Python 3 sphinx runs through and says docs built successfully, but I get a ton of autodoc/docstring errors related to StringIO, xrange and pandas submodule imports

Traceback (most recent call last):
  File "/Users/user/.virtualenvs/pandas-dev/lib/python3.3/site-packages/sphinx/ext/autodoc.py", line 321, in import_object
    __import__(self.modname)
ImportError: No module named 'pandas.Series'

@jorisvandenbossche
Copy link
Member

Are you working on Windows? In that case, the first error (on python 2 causing the debugger to start) could possibly attributed to io.rst (there is a unicode example that does not build on windows/python 2, see #5142).

@phaebz
Copy link
Contributor

phaebz commented Nov 23, 2013

I am working on OS X 10.8.5. I had a look around in the debugger and it turns out that indeed the docname in question is 'io'.

>>> import locale
>>> print(locale.getlocale())
('en_US', 'UTF-8')
>>> print(locale.getdefaultlocale())
('en_US', 'UTF-8')

@jorisvandenbossche
Copy link
Member

@phaebz Something else, I tried to build the docs with the IPython sphinxextensions, and it worked with some warnings (on Python 2, but they should be python 3 compatible). This could also be a way to, at the moment, build the docs on python 3.
The only two lines I changed is in conf.py:

Replace

              'ipython_directive',
              'ipython_console_highlighting',

with

              'IPython.sphinxext.ipython_directive',
              'IPython.sphinxext.ipython_console_highlighting',

You need IPython 1.0 (or larger) for this.
This worked for me to build the docs, you only get some errors about the okexcept parameter which is unknown (this parameter for the ipython directive is not in upstream), and some errors in the "Enhancing performance" section of the docs with the cython code (adaptations to run this is also not upstreamed). But you can build the docs (only on those affected pages the html pages will not be perfect).

@phaebz
Copy link
Contributor

phaebz commented Nov 23, 2013

Thanks for letting me know, this serves ok for now. I will use this locally from now on when working on docs. As this PR was intended to be a temporary fix before #5221 made it through / while it is discussed I will check back with @Kermit666 how/if we will proceed.

@jreback
Copy link
Contributor

jreback commented Jan 3, 2014

@Kermit666 how's this coming along?

@metakermit
Copy link
Contributor Author

@jreback currently lost in some "maximum recursion depth" errors I didn't expect to encounter in a Python 2 to 3 conversion:

RuntimeError: maximum recursion depth exceeded in comparison
> /Users/kermit/.virtualenvs/pandas3/lib/python3.3/site-packages/jinja2/utils.py(389)__getitem__()
-> if self._queue[-1] != key:

I'll try to work on it some more to figure it out.

@jorisvandenbossche
Copy link
Member

That is the same error as @cancan101 got, and I also saw it once (I was thinking it maybe something to do with sphinx 1.2 vs 1.1.3, but I still have to test that)

@jorisvandenbossche
Copy link
Member

@Kermit666 See #5899, do you use Sphinx 1.2? And can you try with 1.1.3?

@metakermit
Copy link
Contributor Author

Thanks @jorisvandenbossche , that seems to be the root. I tried with Sphinx 1.1.3, but that one has some other errors (possibly py3-related) and now I'm working on the development version of Sphinx (1.3a).

I am also getting weird errors with pandas itself.

>>> import pandas
dlopen(./pandas/hashtable.so, 2): Symbol not found: _PyCObject_Type

I even tried rebasing to the latest master, but that didn't help. Is master unstable at the moment (in which case - should I be working off of the latest tag instead?) or am I maybe missing something locally?

@jorisvandenbossche
Copy link
Member

Normally master should be stable. Did you build pandas again after rebasing? (eg python setup.py build_ext --inplace)

And with the development version of Sphinx (1.3a), is the 'maximum recursion error' solved?

@metakermit
Copy link
Contributor Author

OK, you are right @jorisvandenbossche . I thought that pip install -e . would trigger the extensions rebuild (because it builds them initially), but it appears it doesn't. After calling build_ext explicitly I got pandas to import again.

Yes, the dev version of Sphinx resolves the 'maximum recursion error'.

As far as the documentation itself is concerned - on the current branch I am able to generate it, but I get a ton of warnings and import errors along the way (import errors for missing scipy, xlrd,..., missing file 'foo.xlsx', toctree warnings). This whole build process seems very complex... I expected something more in the line of mark-up -> html :)

@jorisvandenbossche
Copy link
Member

@Kermit666 The reason building the docs seems complex (lot of errors and warnings etc) is because all code examples in the docs are run and the output saved during the docs build (using the ipython directive, so it's not simply rst -> html, the code examples in rst are first generated). This also means that, if you don't have all optional dependencies for pandas (hdf5-related, html-related, excel-related, etc) some of the code examples will generate errors.
This is not really a problem for working locally on the docs, as the docs will still just build (only for the topics you don't have the dependencies, the code examples will show the error instead of the useful output). Also, when you build the docs a second time, only those pages you changed are rebuild (which should also decrease the amount of errors/warnings you see).
With a clean build, the only warning you should see is "class.rst:: WARNING: document isn't included in any toctree".

@metakermit
Copy link
Contributor Author

OK, thanks. I'll play with it tonight some more with all the optional dependencies then.

@phaebz
Copy link
Contributor

phaebz commented Jan 28, 2014

@jorisvandenbossche I adapted the test for buflist. I get the Windows error in #5530 (comment) with 474e5fd applied. Have you tested under Windows already?

@jorisvandenbossche
Copy link
Member

Tested it with Windows/python 2, and I also get a UnicodeError while parsing io.rst (as before @y-p's unicode fix 0543a21). This was with just using readlines instead of buflist (metakermit@5165a53).

Now, I don't really get my head around what this buflist actually is/does. It is in every case not the same as readlines (just try simply s = StringIO('test'); s.buflist / s.readlines()). The reason it worked on Linux is that there the unicode-fix with buflist is not needed, so it didn't matter that it didn't work. So for now, we could get it to work for the combo linux/python3 by just skipping that unicode fix with buflist for python 3.

To also getting it to work on python3 / windows, we would need a python 3 equivalent of @y-p unicode fix 0543a21.

@ghost
Copy link

ghost commented Jan 28, 2014

This PR has been troubling me and I've been giving some thought as to why.

Here's my take:

  • No problem having the docs build on py3, go right ahead.
  • If making the code examples py3 compat requires intrusive changes, or
    requires introducing unfamiliar (to readers) stuff from pandas.compat such as u().
    That's bad.
  • We want to make doc contributions easier, not more difficult.

The last point is crucial. Building the docs currently takes a long time. Cross-checking
code examples on both py2/py3 takes some time. Both are a disincentive to doc
contributions. Making the process more difficult for the majority of contributors is
moving in the wrong direction.

So if you can make the docs build on py3 without intrusive changes, that's great.
But no one who submits a good contribution to the docs should be penalized by
requiring them to invest extra effort in making them py3 compat. afaic, that's not
an issue.

as an aside, two suggestions on how to make a significant contribution
to pandas' documentaion:

  • Write some more.
  • Find ways to make the docs build faster. 60 seconds or less would get you a
    standing ovation from me.

Possible ways to go about achieving the latter are:

  • Contributing to upstream sphinx, making it faster for everyone.
  • Adapt the build script to make it possible to build just a specified section of the docs. i.e make.py source/basics.rst. BLD/DOC: Faster doc building via jinja2 and conf.py logic #6179
  • Work out some hack to make sphinx invalidate less of it's cache when moving
    between commits, or keeping it around for longer. I'd happily trade disk for speed.

@ghost
Copy link

ghost commented Jan 28, 2014

re the buflist fix, I should have added some comments. The build was failing
on cout.seek() and the problem was that some of the writes ip_d made to "stdout"
were unicode while others were of non-ascii bytestrings. buflist contains the individual
writes made and the patch modifies those in place to decode them all into unicode
where required, forcfully if needed.

It's an ugly hack, which is also why I'm not going to submit it to upstream ipython.

@jorisvandenbossche
Copy link
Member

@y-p I completely agree with the fact that contributing to the docs should be as easy as possible, but some thoughts:

  • The docs are already made for a large part python3 compatible (see eg CLN: change print to print() in docs #4972 for print to print(), DOC: lrange, lzip --> list(range and list(zip #4450 for range to list(range())). But indeed, for new contributions it makes it a little bit more difficult (although, a print statement is not that common in the docs).
    And things from pandas.compat should not be needed I think?
  • At some point, the move to python3 for the docs will have to be made. You can of course discuss if this is the right time.

So I think that the changes are not that intrusive (and are already there in the docs). And we can say that we don't expect from new contributions that they have checked on python3. If we see an obvious one (like a print statement), we can comment, and for the rest it is up to the contributors using python 3 to build the docs once in a while and push needed changes?

What I propose:

  • the numpydoc list comprehension fix (metakermit@247f7f2) is OK and can be merged I think (and upstreamed to numpydoc)
  • for the buflist issue, we could put a if not PY3 around that, so the docs at least build on linux. @y-p or is the conversion to unicode also needed on linux?

I think with those two fixes, the docs build with python 3 on linux (correct @phaebz @Kermit666 ?). Thoughts?

And further leave the buflist / windows decode error as is for the moment (unless someone wants to dive into that).

@ghost
Copy link

ghost commented Jan 28, 2014

I concur with every single item.

No problem bypassing the buflist fix on py3 if you wish. Having py3 for utf8 systems beats having
py3 nowhere. The issue is not linux/windows, but the default console encoding which happens to be
utf8 by default roughly everywhere but windows.

All the rest is completely reasonable as well. bombs away.

@jorisvandenbossche
Copy link
Member

BTW, (I am not a python 3 expert) but is this list(range()) actually needed in python 3?

Eg pd.DataFrame(np.random.randn(5,5), index=range(5)) just works as well, why is it needed to have pd.DataFrame(np.random.randn(5,5), index=list(range(5))) in the docs?

@jorisvandenbossche
Copy link
Member

@Kermit666 @phaebz If someone of you can adapt the commits here to the two points in my previous comment, we can move forward and merge this :-)

(There is still the issue of rebuilding, but leave that for another PR, it's getting long enough here)

@ghost
Copy link

ghost commented Jan 28, 2014

It's not required on py3 nor on py2.

pd.compat.range however is xrange on py2 and range on py3 both of which return an iterator.
The pandas codebase uses compat.range everywhere, so on py2 list(compat.range()) is required
to get back the behavior of what used to be simply range().

It's a bit ugly, yeah. But saves the 2to3 pass.

@jorisvandenbossche
Copy link
Member

I understand that it is used like that internally, but is needed in the docs? As I can, as a user, just type index=range(5) and it works in both python 2 and 3.

@ghost
Copy link

ghost commented Jan 28, 2014

range means different things in py2 and py3, so in general: it depends.

@phaebz
Copy link
Contributor

phaebz commented Jan 29, 2014

@jorisvandenbossche RE: the buflist issue, I did not opt for the if not PY3, because AFAICS there is no StringIO.buflist in Python 2.7.5 - please correct me if I am wrong, I wrote this in the commuter train...

If this is ok with you guys I will talk to @Kermit666 and we do the squashing.

@jorisvandenbossche
Copy link
Member

No, I think it's only removed in python 3 (I have python 2.7.6 and there StringIO has buflist and the fix works). The if not PY3 check would just be to prevent an error like StirngIO object has no attribute buflist on python 3

@phaebz
Copy link
Contributor

phaebz commented Jan 29, 2014

you are right, I did not instantiate correctly when I tried it. Amended.

@metakermit
Copy link
Contributor Author

OK, I squashed the branch into 2 commits regarding the relevant changes and rebased to master so that you can easily merge it.

@y-p I agree with the stuff you said. The last thing we want is to fend off potential contributors for being too zealous. Also, we'll try to consider the things you suggested for future improvements when picking the next thing to work on ;)

@ghost
Copy link

ghost commented Jan 29, 2014

Thank you, I appreciate your cooperation.

@jorisvandenbossche
Copy link
Member

@Kermit666 @phaebz Super. Just a last small comment: at the if not PY3 can you put in a small comment for the reason of that?

And are you going to try to push the numpydoc fix upstream?

@phaebz
Copy link
Contributor

phaebz commented Jan 30, 2014

I just dropped b3297dd since @y-p moved the relevant section in 438d614 anyways

@jorisvandenbossche
Copy link
Member

Yes indeed, merging this then. Can you also try out @y-p new try on the unicode issue on python 3 (I tested it on windows, but not yet python 3).

@Kermit666 @phaebz Thanks again for your effort on this!

jorisvandenbossche added a commit that referenced this pull request Jan 30, 2014
@jorisvandenbossche jorisvandenbossche merged commit f0c4953 into pandas-dev:master Jan 30, 2014
@jorisvandenbossche
Copy link
Member

@phaebz I was just thinking, the try ... except AttributeError, was this just for the pd.series.str that was causing an error, and which is now solved? Because then, this should maybe be removed, as it is interesting to know that there is a method nog recognized?

@metakermit
Copy link
Contributor Author

@jorisvandenbossche yes, the except AttributeError shouldn't be necessary any more, but I think it does no harm in case an error such as this one pops again in the future. I sent a PR upstream just now (numpy/numpydoc#11).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants