API/DOC: status of low_memory kwarg of read_csv/table #5888

Closed
cancan101 opened this issue Jan 9, 2014 · 12 comments
Labels
Bug Docs IO CSV read_csv, to_csv

Comments

@cancan101
Contributor

I am getting the following warning:

/usr/local/lib/python2.7/dist-packages/pandas-0.13.0rc1_78_g142ca62-py2.7-linux-x86_64.egg/pandas/io/parsers.py:1050: 
DtypeWarning: Columns (6) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)

but I can find no documentation for low_memory
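
For reference, the two remedies named in the warning can be sketched as follows. This is a minimal illustration, not taken from the thread: the CSV text and the column names `a` and `b` are made up, and the file is far too small to trigger the warning itself. It only shows the two call forms and that both yield one consistent type for the mixed column.

```python
import io

import pandas as pd

# Hypothetical data: column "b" is numeric in most rows but holds a string in one.
csv_text = "a,b\n1,10\n2,x\n3,30\n"

# Remedy 1: state the dtype up front so the parser does not have to infer it.
df_dtype = pd.read_csv(io.StringIO(csv_text), dtype={"b": str})

# Remedy 2: pass low_memory=False, as the warning suggests, so the column type
# is inferred from the file as a whole rather than from internal chunks.
df_full = pd.read_csv(io.StringIO(csv_text), low_memory=False)

print(df_dtype["b"].tolist())
print(df_full["b"].tolist())
```

Specifying `dtype` is usually the more explicit choice, since it documents the intended column type at the call site instead of relying on inference.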

@jreback
Contributor

jreback commented Jan 9, 2014

It's a kind of deprecated option (but it still works).

@cancan101
Contributor Author

If the low_memory parameter is deprecated, it should be marked as such. The warning message should also be removed or reworded.

@jreback
Contributor

jreback commented Jan 9, 2014

I said "kind of deprecated" in that I don't think it's necessary anymore.

@randyzwitch

As another data point, I got this warning about mixed types. Setting low_memory=False as suggested actually crashed Python (Win7 64-bit, through IPython Notebook). I'm nowhere near my memory limits, so removing the argument and living with the warning was no big deal for me.

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Mar 29, 2014
@jorisvandenbossche
Member

There is also an example in the book "Python for Data Analysis" that leads to this warning (p278):

In [13]: fec = pd.read_csv('ch09/P00000001-ALL.csv')

C:\Anaconda\lib\site-packages\pandas\io\parsers.py:1130: DtypeWarning: Columns (6) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)

So I agree with the above:

  • either the option is deprecated, and then the mention in the warning should be removed
  • or the option is not deprecated and should be documented (and should not crash your Python)

From the code, it does not seem to be deprecated (it is still used: https://github.com/pydata/pandas/blob/master/pandas/parser.pyx#L727), but it seems that it is given a default value of True in read_csv regardless of what you specify (https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py#L354). There are also still tests specifically for high memory: https://github.com/pydata/pandas/blob/master/pandas/io/tests/test_parsers.py#L2897

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.15.0, 0.15.1 Jul 7, 2014
@jreback jreback modified the milestones: 0.15.0, 0.15.1 Sep 8, 2014
@jorisvandenbossche changed the title from "low_memory on read_table and read_csv is undocumented" to "API/DOC: status of low_memory kwarg of read_csv/table" on Feb 27, 2015
@jorisvandenbossche
Member

@mdmueller @selasley As CSV parser experts, would either of you be interested in looking into this? What does low_memory do exactly? Do we still need it (should it be deprecated or not)? Depending on that, we should either document it or really deprecate it (and remove it as a suggestion from one of the warnings).

This came up again at SO: http://stackoverflow.com/questions/28697501/how-to-know-line-and-col-when-the-read-csv-method-of-pandas-thows-exception/28702078#28702078

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jondo

jondo commented Nov 9, 2016

Why is this not visible in the online documentation yet?

Should this documentation also be added for pandas.read_table (which has the same behavior)?

@jreback
Contributor

jreback commented Nov 9, 2016

It's in the docs for 0.19.0 and later:
https://github.com/pandas-dev/pandas/pull/13293/files

@jondo

jondo commented Nov 11, 2016

Will this change also become visible in the pd.read_table documentation?

@jorisvandenbossche
Member

Ah, apparently there is something wrong with the read_csv page. It is still from 0.18.1, although the main docs under 'stable' are for 0.19.1. So, @jreback, something seems to have gone wrong when I uploaded the docs, and the generated pages were not updated.

@jorisvandenbossche
Member

@jondo The docs should be fixed now (be sure to refresh your browser). Thanks for noticing!

@jondo

jondo commented Nov 11, 2016

It's me who has to thank you!
Yes, now the pages of read_csv and read_table both contain the added text.
