DOC: Add TextFileReader to docs #46308
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,6 +25,19 @@ Flat file | |
DataFrame.to_csv | ||
read_fwf | ||
|
||
.. currentmodule:: pandas.io.parsers | ||
|
||
.. autosummary:: | ||
:toctree: api/ | ||
|
||
TextFileReader | ||
|
||
TextFileReader.get_chunk | ||
TextFileReader.close | ||
TextFileReader.read | ||
with pd.read_csv("test.csv", iterator=True) as reader:
    for chunk in reader:
        print(chunk)
Would need to add a doc-string to
||
|
||
.. currentmodule:: pandas | ||
|
||
Clipboard | ||
~~~~~~~~~ | ||
.. autosummary:: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1367,9 +1367,21 @@ def read_fwf( | |
|
||
class TextFileReader(abc.Iterator): | ||
""" | ||
Passed dialect overrides any of the related parser options. | ||
|
||
Passed dialect overrides any of the related parser options | ||
Iterator class used to process text files read via read_csv in | |
chunks. | ||
|
||
An instance of this class is returned by `read_csv` when it is processed in | ||
chunks, instead of returning a single `DataFrame`. | ||
|
||
When iterating over `TextFileReader`, every item returned will be a DataFrame. | ||
|
||
Examples | ||
--------- | ||
>>> with pandas.read_csv(..., iterator=True) as text_file_reader: | ||
You need to import pandas or use
datapythonista marked this conversation as resolved.
|
||
... for df in text_file_reader: | ||
... ... | ||
""" | ||
|
||
def __init__( | ||
|
@@ -1422,6 +1434,7 @@ def __init__( | |
self._engine = self._make_engine(f, self.engine) | ||
|
||
def close(self) -> None: | ||
"""Closes the file handle.""" | ||
if self.handles is not None: | ||
self.handles.close() | ||
self._engine.close() | ||
|
@@ -1730,6 +1743,18 @@ def _failover_to_python(self) -> None: | |
raise AbstractMethodError(self) | ||
|
||
def read(self, nrows: int | None = None) -> DataFrame: | ||
""" | ||
Reads the text file and stores the result in a DataFrame. | ||
|
||
Parameters | ||
---------- | ||
nrows: int, optional, default None | ||
The number of rows to read in one go. | ||
|
||
Returns | ||
------- | ||
DataFrame | ||
""" | ||
if self.engine == "pyarrow": | ||
try: | ||
# error: "ParserBase" has no attribute "read" | ||
|
We obviously don't want to add these. But when I remove them, Sphinx complains that it cannot find them in any toctree. Does anyone have any ideas how to solve this? For ExcelWriter below this works, so I am probably missing something.
I think in the past we used a section with `:hidden:`, like in https://github.com/pandas-dev/pandas/blame/main/doc/source/getting_started/index.rst#L641
But I don't see it being used for the API anymore. I guess we should just make them private by using `_get_chunk` ... if we don't want them public and in the documentation.
Or maybe just `_TextFileReader` and make the whole class private if we don't want it being part of our public API.
I'm personally fine with any of them.
I think we could deprecate it and remove it later, that sounds good to me.
We can't make TextFileReader private, since it is returned by read_csv if you are reading the file in chunks.
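To illustrate the point about the return type, here is a minimal sketch (not part of the PR) showing that `read_csv` hands back a `TextFileReader` when asked to read in chunks. The in-memory CSV and the variable names are made up for illustration, and it assumes a pandas version in which `TextFileReader` works as a context manager (1.2 or later, as far as I know).

```python
import io

import pandas as pd
from pandas.io.parsers import TextFileReader

# Hypothetical in-memory CSV standing in for a file on disk (3 data rows).
csv_text = "a,b\n1,2\n3,4\n5,6\n"

# Passing chunksize (or iterator=True) makes read_csv return a
# TextFileReader instead of a single DataFrame.
with pd.read_csv(io.StringIO(csv_text), chunksize=2) as reader:
    reader_type = type(reader)
    # Each item produced by iterating the reader is a DataFrame.
    chunks = [chunk for chunk in reader]

chunk_shapes = [chunk.shape for chunk in chunks]
print(reader_type.__name__, chunk_shapes)
```

Because user code receives this object directly, renaming the class to something private would change what callers see in their own code.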
Ah, true. Sounds good to me then. Also fine to simply document the methods and publish them in the docs. Maybe we can just add a note for now that we recommend using the magic methods way.
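For readers comparing the two styles discussed in this thread, the following is a rough sketch, not taken from the PR. It contrasts calling `get_chunk`, `read` and `close` explicitly with the iterator/context-manager ("magic methods") style. The CSV content and variable names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk (4 data rows).
csv_text = "a,b\n1,2\n3,4\n5,6\n7,8\n"

# Explicit-method style: the caller pulls rows and must close the reader.
reader = pd.read_csv(io.StringIO(csv_text), iterator=True)
try:
    first = reader.get_chunk(1)  # the next 1 row as a DataFrame
    rest = reader.read()         # all remaining rows as a DataFrame
finally:
    reader.close()               # release the underlying handle

# Magic-methods style: `with` closes the reader, `for` yields the chunks.
with pd.read_csv(io.StringIO(csv_text), chunksize=2) as chunked_reader:
    total_rows = sum(len(chunk) for chunk in chunked_reader)

print(len(first), len(rest), total_rows)
```

The second form needs no explicit cleanup, which is presumably why it is the one suggested as the recommended usage.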