DOC: Add TextFileReader to docs #46308
TextFileReader
TextFileReader.get_chunk
We obviously don't want to add these. But when I remove them, sphinx complains that it can not find them in any toctree. Anyone any ideas how to solve this? For ExcelWriter below this works, so I am probably missing something.
I think in the past we used a section with `:hidden:`, like in https://github.com/pandas-dev/pandas/blame/main/doc/source/getting_started/index.rst#L641
But I don't see it being used for the API anymore. I guess we should just make them private by using `_get_chunk`... if we don't want them public and in the documentation.
Or maybe just `_TextFileReader` and make the whole class private if we don't want it being part of our public API.
I'm personally fine with any of them.
I think we could deprecate it and remove it later, that sounds good to me.
We can't make TextFileReader private, since it is returned by read_csv if you are reading the file in chunks.
Ah, true. Sounds good to me then. Also fine to simply document the methods and publish them in the docs. Maybe we can just add a note for now that we recommend using the magic methods instead.
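To make the point above concrete, here is a minimal sketch of both access styles. It assumes a pandas version in which the reader supports the `with` statement; the in-memory CSV and the variable names are made up for illustration and are not part of the PR.

```python
import io

import pandas as pd

# Hypothetical three-row CSV held in memory so the sketch needs no file.
csv_text = "a,b\n1,2\n3,4\n5,6\n"

# Passing chunksize makes read_csv return a reader object, not a DataFrame.
with pd.read_csv(io.StringIO(csv_text), chunksize=2) as reader:
    reader_type = type(reader).__name__
    # Iterating (the __next__ protocol) yields one DataFrame per chunk.
    chunk_lengths = [len(chunk) for chunk in reader]

# The explicit-method style discussed in the thread: get_chunk plus close.
explicit_reader = pd.read_csv(io.StringIO(csv_text), iterator=True)
first = explicit_reader.get_chunk(2)
explicit_reader.close()

print(reader_type, chunk_lengths, len(first))
```

The `with`/`for` form closes the underlying handle automatically, which is why the thread leans towards recommending it over calling `get_chunk` and `close` by hand.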
TextFileReader.get_chunk
TextFileReader.close
TextFileReader.read
`close` and `read` can probably be public, but more important are the magic methods `__enter__`, `__exit__`, and `__next__`. Ideally, people interact with `TextFileReader` in this manner:

    import pandas as pd

    with pd.read_csv("test.csv", iterator=True) as reader:
        for chunk in reader:
            print(chunk)
Would need to add a docstring to `read` (and maybe also to `close`).
If it is a public class, it should also be possible to import it (for example, for typing purposes). Need to declare that Need to import io.parsers here and add it to this list: Line 12 in 471319b
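As an illustration of the typing use case mentioned above, the sketch below imports the class from `pandas.io.parsers`, which is where it lives at the time of this discussion. Treat that import path as an assumption rather than a documented public location; `count_rows` is a hypothetical helper written only for this example.

```python
import io

import pandas as pd
from pandas.io.parsers import TextFileReader  # assumed import path, see note above


def count_rows(reader: TextFileReader) -> int:
    """Sum the number of rows across all remaining chunks of an open reader."""
    return sum(len(chunk) for chunk in reader)


# Hypothetical in-memory CSV with three data rows.
with pd.read_csv(io.StringIO("x\n1\n2\n3\n"), chunksize=2) as reader:
    total = count_rows(reader)

print(total)
```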
pandas/io/parsers/readers.py
Outdated
Passed dialect overrides any of the related parser options.
Passed dialect overrides any of the related parser options

Only __enter__, __exit__ and __next__ are public. All other
attributes are considered private and can change.
This is probably quite cryptic if the idea here is to make this documentation public. I would have something like:
Iterator class to process text files in chunks.
An instance of this class is returned by `read_csv` when it is processed in
chunks, instead of returning a single `DataFrame`.
When iterating over `TextFileReader`, every item returned will be a `DataFrame`.
Examples
--------
>>> with pandas.read_csv(..., iterator=True) as text_file_reader:
...     for df in text_file_reader:
...         ...
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
@phofl do you still want to work on this?
pandas/io/parsers/readers.py
Outdated
Examples
---------
>>> with pandas.read_csv(..., iterator=True) as text_file_reader:
You need to import pandas or use `pd`, which is imported by default, to make the CI happy: https://github.com/pandas-dev/pandas/runs/7043600368?check_suite_focus=true#step:8:132
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Still interested in this @phofl?
I should :) Would like to keep open for a bit longer
@phofl went ahead and fixed the typo, let's see if CI is green, and I think we can get this merged.
Seems like the doctests still don't pass because of the ellipsis, will have a look later.
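One possible way around the ellipsis problem, sketched here as an assumption since the thread never settles on a fix, is to mark the example with the standard `# doctest: +SKIP` directive so it is rendered in the docs but not executed. The snippet below uses only the standard-library `doctest` module; `documented` is a hypothetical stand-in for the real class docstring.

```python
import doctest


def documented():
    """
    Examples
    --------
    >>> with pd.read_csv(..., iterator=True) as reader:  # doctest: +SKIP
    ...     for df in reader:
    ...         ...
    """


# Parse the docstring into a DocTest and run it. The skipped example is
# never executed, so the placeholder `...` argument cannot raise.
test = doctest.DocTestParser().get_doctest(
    documented.__doc__, globs={}, name="documented", filename=None, lineno=0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed)
```

The alternative would be a fully concrete example (a real or in-memory file instead of `...`), which keeps the doctest executable at the cost of a longer docstring.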
@phofl is this still active?
Yeah we should probably fix this
Looks like this has gone stale, feel free to reopen if/when you have time to circle back
@twoertwein Anything else you would want to add?