to_csv issue #8621
I have an issue using to_csv on a DataFrame object. It has a large number of columns: d.shape = (3, 454731).
Not sure what's going on here - I've written a nosetest here (any tips for improvements in my test?)
rmorgans@f3d0a9e
Comments
Better just to create a test programmatically, e.g. df = DataFrame(np.random....) with the above shape (if this doesn't fail then you have something in your dtypes that causes this to fail) |
pd.DataFrame(np.random.randn(3, 100000)).to_csv('test.csv') works.
pd.DataFrame(np.random.randn(3, 100001)).to_csv('test.csv') doesn't work... (well, it does seem to write a usable csv out).
So it looks like it's a limit of 100000... possibly something to do with this:

% ag 100000 /usr/lib/python3.4/site-packages/pandas/core/format.py
1157: chunksize = (100000 / (len(self.cols) or 1)) or 1 |
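For reference, a minimal self-contained reproduction of that boundary (the file names are just illustrative):

import numpy as np
import pandas as pd

# 3 rows x 100000 columns: exactly at the heuristic's limit - works
pd.DataFrame(np.random.randn(3, 100000)).to_csv('ok.csv')

# 3 rows x 100001 columns: one past the limit - broken output on the
# affected setup (Python 3, pandas of this era)
pd.DataFrame(np.random.randn(3, 100001)).to_csv('broken.csv')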
@rmorgans perfect. Want to do a pull request with that test and a fix? Follow along the same format in |
I'll see if I can work out what's going on (also my first ever attempt at a pull request). Is something like this OK for the test inside?

def test_to_csv_wide_frame_formatting(self):
    with ensure_clean() as path:
        pd.DataFrame(np.random.randn(3, 100001)).to_csv(path) |
Close - create the frame |
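A sketch of what that fuller test might look like - creating the frame first, then round-tripping it for comparison. The imports and the assert_frame_equal helper follow the pandas test-suite conventions of the time; the details here are assumptions, not taken verbatim from the eventual pull request:

import numpy as np
import pandas as pd
from pandas.util.testing import ensure_clean, assert_frame_equal

def test_to_csv_wide_frame_formatting(self):
    # frame wider than the 100000-element chunk heuristic (GH 8621)
    df = pd.DataFrame(np.random.randn(3, 100001))
    with ensure_clean() as path:
        # header/index off so the file round-trips cleanly
        df.to_csv(path, header=False, index=False)
        rs = pd.read_csv(path, header=None)
        assert_frame_equal(rs, df)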
Hi folks, I've worked out a suitable test. Here's the offending part:

if chunksize is None:
    chunksize = (100000 / (len(self.cols) or 1)) or 1
self.chunksize = int(chunksize)

What's happening is that the chunksize, if None, is being guessed - but why is it being guessed as a function of the number of columns? From the docs, chunksize should surely relate to the number of rows? What was the original intent here, and what behaviour is desired? |
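To spell out why that guess goes wrong for very wide frames, here is the arithmetic at the reported boundary (a sketch to run in the interpreter, not pandas source):

ncols = 100001

# Python 2: 100000 / 100001 is integer division -> 0, which is falsy,
# so the trailing "or 1" rescues it and chunksize becomes 1.
# Python 3: 100000 / 100001 is true division -> 0.99999..., which is
# truthy, so the "or 1" fallback never fires:
chunksize = (100000 / (ncols or 1)) or 1   # 0.99999... on Python 3
chunksize = int(chunksize)                 # int(0.99999...) == 0

# a chunksize of 0 rows then breaks the chunked writing downstream,
# which matches the broken output reported above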
Great - you can do a pull request to submit the test/fix for this issue. The intent was to write a fixed total number of elements at a time, to keep a constant memory usage. If you would like to provide a better guessing function, that would be welcome. Small rows, small columns: in my limited tests I didn't find much reason to have a very large chunksize. |
A very similar chunking mechanism is in place for HDFStore and to_sql. |
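Schematically, the idea shared by those writers is just slicing the row axis into fixed-size pieces (an illustrative sketch, not the actual pandas internals):

def write_in_chunks(df, chunksize, write_chunk):
    # write_chunk stands in for the backend-specific writer
    # (csv formatter, HDFStore appender, SQL inserter)
    for start in range(0, len(df), chunksize):
        write_chunk(df.iloc[start:start + chunksize])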
will give this a try |
python3 issue
@fvia mentioned changing |
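Whatever the exact change suggested there (the reference is elided above), the straightforward repair on Python 3 is to turn the quotient into an integer before the "or 1" fallback is applied - a sketch, not necessarily the patch that was merged:

# floor division keeps the result an int, so a sub-1 quotient becomes
# 0 and the "or 1" fallback can actually fire
if chunksize is None:
    chunksize = (100000 // (len(self.cols) or 1)) or 1
self.chunksize = int(chunksize)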
@jreback, I feel that the chunksize in HDFStore and to_sql, while having the same name, has different logic. |
If you wanted to refactor it out to a function in |
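Such a shared helper might look like the following (the name and signature are hypothetical, purely illustrative):

def _guess_chunksize(ncols, target_elements=100000):
    """Rows per chunk, so that rows * ncols stays near target_elements."""
    # integer maths throughout, and never fewer than 1 row per chunk
    return max(1, target_elements // max(1, ncols))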