pandas appears to crash doing pivot table from 10 minute examples #3922
Comments
can u put up the exact steps u used to create the virtualenv and build pandas? |
To build pandas, I used the command "pip install pandas". I installed virtualenv so long ago that I cannot remember the steps; I just followed the simple instructions they provide. To activate the virtualenv I run the command "source /users/rmschne/.virtualenv/rmspython/bin/activate". Is there a "right" way for pandas to work? I did not notice anything special in the docs. Update: I looked in my records, and I used http://docs.python-guide.org/en/latest/starting/install/osx.html for how to install virtualenv, pip, etc. |
Did you install Cython? I don't think pandas installs it by default. |
I did not install Cython. per http://pandas.pydata.org/pandas-docs/stable/install.html#dependencies it is "Only necessary to build development version. Version 0.17.1 or higher". However, I just installed it with "pip install cython" and it appeared to be successful. then, running:
The session hangs. No error message. ctrl-c returns: File "", line 1, in |
you're right you don't need it my bad |
here's what you should do, if you're up for it, since i can't reproduce this on the latest master or on pandas from PyPI:
|
hang is here: ipdb> n
ipdb> n
ipdb> n
ipdb> n
ipdb> n
ipdb> HANG FROM HERE |
can u try just a simple
|
thanks for staying with me. got same:
|
ok so now try |
few more things
so far git master builds and runs on windows (i think 7 is what ppl are building with @jreback?), arch linux, ubuntu 12.04 and 12.10, fedora 17, centos and a couple of mac versions, not sure which ones though. personally i've tested all the linux versions i just mentioned except centos and they're fine. can u post the output of |
10.8.5? Are you running a development version of OSX? |
Re michaelaye's question. No. Macbook Pro OSX Ver 10.8.4. |
df.unstack(0) works as expected. |
@michaelaye ... following are all the versions in place in the virtual env. While OSX is not development, is the comment from yolk -l "active development" for Python 2.7.2 relevant to your question? I never noticed word "development". I've not touched the built-in Python to my knowledge. New machine in Nov 2012. Cython - 0.19.1 - active |
re "can u post the output of agged and of to_unstack (at the point right before it hangs)?" see below. code used: ipdb> s
ipdb> to_unstack HANGS HERE NOW |
Re "build the latest git master and see if the problem is alleviated" ... don't know how to do this. I rely on "pip install pandas". Can you send me a link to instructions, or give the command? |
@buckeye1 download the source, go into the directory and run |
Thanks. I know the "python setup.py install" step. As I can't find the tar file, can you please give me the URL to where I can find how to make a local git repository and then download it? I presume that the process. Or something simpler? |
when you go here: https://github.com/pydata/pandas you see in the lower right side the download link and an icon to copy the url into your clipboard. Now go to your terminal, change directory to where you want to have these kind of data stored and type : git clone
|
installed pandas-0.11.1.dev_36c1263. on Mac OSX (latest) in virtual env. first run failed as missing "nose". so used pip to install that. import pandas as pd (note: to save time on debugging pressing 'n' and 's' changed from 6 to 3 periods) After 45 minutes of pressing 's', I conclude it's in a loop. That conclusion may be wrong, but after 45 minutes which is on a reduced set of periods (see above) ipdb> s
ipdb> s
ipdb> s
ipdb> s
ipdb> s
ipdb> s
ipdb> s
|
I think this issue exists on #3902 on mac as well something fishy in the int max....can you post these 2 values
|
ipdb> np.iinfo(np.int64).max interesting. |
so this is the cause of the loop. it appears (only on mac) that the max int64 value is 2**63-1 (rather than 2**63), but more insidious, it appears to wrap around (which i think is really weird) on amd64
so this is a bug, but a very odd one....marking it as such |
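For anyone following along, here is a minimal sketch of what "wrap around" means for a signed 64-bit integer, as opposed to a Python integer that simply keeps growing. It uses `ctypes.c_int64` from the standard library as a stand-in for a fixed-width int64; that is an assumption for illustration only, not what pandas or numpy do internally, and `wrap_int64` is a hypothetical helper name.

```python
import ctypes

INT64_MAX = 2 ** 63 - 1  # largest value a signed 64-bit integer can hold


def wrap_int64(value):
    """Reduce an arbitrary Python int to what a signed 64-bit slot would store.

    ctypes does no overflow checking, so out-of-range values wrap silently.
    """
    return ctypes.c_int64(value).value


# A Python int just keeps growing past the int64 range...
unbounded = INT64_MAX + 1

# ...while a fixed-width int64 wraps around to the most negative value.
wrapped = wrap_int64(INT64_MAX + 1)

print("python int :", unbounded)
print("int64 slot :", wrapped)
```

If a comparison against `2 ** 63` is ever done with the wrapped value instead of the unbounded one, the comparison changes meaning entirely, which fits the symptom discussed here.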
post details on your os? do you have an older (or newer) version? wondering if this is a function of all mac os |
not sure i understand here's what i get
|
oh i see the wrap around but the max value is the same.. |
The wrap around does not happen on my Mac?
In [2]: iinfo(np.int64).max
Out[2]: 9223372036854775807
In [3]: iinfo(np.int64).max+1
Out[3]: 9223372036854775808L
In [4]: 2**63
Out[4]: 9223372036854775808L
In [5]: import sys;sys.maxint
Out[5]: 9223372036854775807
In [6]: !uname -a
Darwin blastoid.ess.ucla.edu 12.4.0 Darwin Kernel Version 12.4.0: Wed May 1 17:57:12 PDT 2013; root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64
np 1.7.1 though? and Python 2.7.3 |
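The session above is Python 2 (`sys.maxint` and the trailing `L` on longs do not exist in Python 3). A rough Python 3 equivalent of the same sanity check, using only the standard library, might look like the sketch below. `int_sanity_report` is a hypothetical helper name, and `sys.maxsize` is used as the closest stand-in for `sys.maxint`.

```python
import platform
import sys


def int_sanity_report():
    """Collect a few facts about how this interpreter handles large ints."""
    two_pow_63 = 2 ** 63
    return {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "maxsize": sys.maxsize,
        "two_pow_63": two_pow_63,
        # Adding 1 to maxsize should give a larger int, not a wrapped one.
        "promotes_past_maxsize": sys.maxsize + 1 > sys.maxsize,
        # A healthy interpreter never reports a non-positive power of two.
        "two_pow_63_is_positive": two_pow_63 > 0,
    }


report = int_sanity_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

On a healthy interpreter the last two entries are both `True`; if either is `False`, the interpreter itself is suspect rather than pandas.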
@michaelaye what is ur osx version? |
10.8.4 all patches. |
@buckeye1 has 10.8.5, could that be the difference? |
no he confirmed after my question that he has 10.8.4 (as there is no 10.8.5 official yet) |
ok....can you tell if his install is different than yours (prob not enough info)..? are there different flavors (like in linux) on mac? |
@michaelaye what is ur python version? @buckeye1 is using 2.7.2 |
i can sort of repro this, but it's to be expected because numpy doesn't do
but why are the python ints behaving this way?! they should cast to long.... |
As I edited above cough I'm using 2.7.3. And this seems to become a Robert Kern question ... ;) |
@buckeye1 I noticed that your Python is stored in a Cellar, so you installed it with homebrew? That makes me wonder why it is such an old Python, as homebrew is usually up-to-date? I have no clue if it is maybe 'frozen' when using the Python from homebrew in your virtual env as I luckily still never saw the need for a virtualenv, but maybe it's worth to quickly install the free Enthought Python distribution (I am not affiliated but a long time user of it) and see if problems continue to exist there? |
System details (and yes, I had a thick finger problem on my first reply on Mac OSX Version) MacOSX 10.8.4, 2.7 ghz Intel Core i7 16 gb |
there's the |
I don't recall installing python from source, and I don't remember how I set up the virtualenv, exactly. I know enough about the systems to be dangerous but my real interest is getting things done with python and django (and hopefully pandas). I want a virtual env so that I can install stuff outside of Apple's standard install. I got it going, it worked, and that was in Nov when the machine was new. What version of python should I install into this virtual env? |
to check that, @buckeye1 should do the following:
In [8]: sys.long_info
Out[8]: sys.long_info(bits_per_digit=30, sizeof_digit=4) |
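Note for readers on Python 3: `sys.long_info` was renamed to `sys.int_info` there. A minimal sketch of the same check, assuming a Python 3 interpreter:

```python
import sys

# sys.int_info describes how the interpreter stores arbitrary-precision ints.
digit_info = sys.int_info

print("bits per digit :", digit_info.bits_per_digit)
print("bytes per digit:", digit_info.sizeof_digit)
```

The values depend on how the interpreter was compiled, which is why they are a quick way to compare two builds without recompiling anything.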
nope that doesn't do anything |
@michaelaye thanks that's much faster than recompiling |
@buckeye1 You do NOT need a virtual environment if your only goal is to leave Apple's Python alone! Just installing the latest 2.7.x release from python.org and activating it in your .bash_profile will do exactly that, I am doing that for years on my Macs. Or save yourself some installation time and go with the free EPD, then you already have the most important packages installed and still can install whatever you need on top, with pip as you did before. Your Python is located in the typical position for a homebrew installation, but as you don't recall, you never ever called |
Python 2.7.2 (default, Nov 3 2012, 16:25:27) |
can u do |
@michaelaye. yes, I'm dangerous!!! ill-informed amateur. |
|
launch your system python like so: |
Now we are getting somewhere. (rmspython)pluto:~ rmschne$ /System/Library/Frameworks/Python.framework/Versions/2.7/bin/python
|
Right, so you gotta replace your Python interpreter. I gave you some options above, which way you go, is up to you. But I would recommend to redo a clean setup and not force it on top of your current one where you don't remember how it was created. |
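Since much of the confusion in this thread is about which interpreter is actually running, a small sketch like the following can be pasted into any Python prompt to find out. It only uses the standard library; `describe_interpreter` is a hypothetical helper name, and the virtualenv detection is a common heuristic rather than an official API guarantee.

```python
import sys


def describe_interpreter():
    """Report where the running Python lives and whether it looks like a virtualenv."""
    # Python 3 venvs expose base_prefix; old virtualenv releases set real_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    in_virtualenv = hasattr(sys, "real_prefix") or sys.prefix != base_prefix
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv,
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running it once inside the virtualenv and once with the system Python launched by full path makes it easy to compare the two setups side by side.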
I think all we established is that different flavours of Python compute 2**63 differently and that pandas doesn't accommodate that. Those two assertions are to be pondered. That being said, I still don't have pandas working. I'll work to find a version of Python that allows it to work, along with all the other stuff I must have to work, and use my test.py script as the acceptance criteria. Thanks to all who helped me with this. |
pandas 0.11.0 installed with Python 2.7.2 using "pip install pandas" on fully patched mac osx 10.8.5, running in a virtual environment. running any pivot table against any DataFrame created so far -- nothing happens. pressing ctrl-c I see the latest lines run are:
--> 424 while com._long_prod(sizes[:i]) >= 2 ** 63:
425 i -= 1
The problem is reproducible using http://pandas.pydata.org/pandas-docs/stable/10min.html and running:
from pandas import DataFrame, Series
import pandas as pd
import numpy as np
then running lines [97], [98], and [99] in iPython and Python native.
I can't see any way to debug or go further.
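To see why the loop quoted above can spin forever, here is a minimal sketch of it with a safety cap added. It assumes (the thread does not show the actual value) that on the affected interpreter `2 ** 63` evaluates to a wrapped, negative number. `long_prod` is a stand-in for `com._long_prod`, and `run_loop`, `threshold`, and `max_steps` are hypothetical names used only for illustration.

```python
import operator
from functools import reduce


def long_prod(values):
    """Multiply values using Python's arbitrary-precision ints."""
    return reduce(operator.mul, (int(v) for v in values), 1)


def run_loop(sizes, threshold, max_steps=1000):
    """Mimic `while prod(sizes[:i]) >= threshold: i -= 1` with a safety cap.

    Returns (steps_taken, finished), where finished is False if the cap was hit.
    """
    i = len(sizes)
    steps = 0
    while long_prod(sizes[:i]) >= threshold:
        i -= 1
        steps += 1
        if steps >= max_steps:
            return steps, False  # would have kept looping
    return steps, True


sizes = [3, 2, 2]

# Correct threshold: the product (12) is far below 2**63, so the loop never runs.
healthy = run_loop(sizes, 2 ** 63)

# Assumed broken threshold: every product is >= a negative number.
wrapped = run_loop(sizes, -(2 ** 63))

print("healthy:", healthy)
print("wrapped:", wrapped)
```

With a correct threshold the loop body never runs for small `sizes`. With a negative threshold the product is always at least the threshold, so `i` just keeps decreasing and the loop only stops because of the cap.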