DOC: Add TextFileReader to docs #46308

phofl · 2022-03-10T15:52:08Z

closes BUG: AttributeError: 'TextFileReader' object has no attribute 'f' #46187
All code checks passed.

@twoertwein Anything else you would want to add?

phofl · 2022-03-10T15:52:56Z

doc/source/reference/io.rst

+
+   TextFileReader
+
+   TextFileReader.get_chunk


We obviously don't want to add these. But when I remove them, Sphinx complains that it cannot find them in any toctree. Does anyone have an idea how to solve this? For ExcelWriter below this works, so I am probably missing something.

I think in the past we used a section with :hidden:, like in https://github.com/pandas-dev/pandas/blame/main/doc/source/getting_started/index.rst#L641

But I don't see it being used for the API anymore. I guess we should just make them private (e.g. _get_chunk) if we don't want them public and in the documentation.

Or maybe just _TextFileReader and make the whole class private if we don't want it being part of our public API.

I'm personally fine with any of them.

I think we could deprecate it and remove it later, that sounds good to me.

We can't make TextFileReader private, since it is returned by read_csv when you read the file in chunks.
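To illustrate why the class cannot simply be hidden: whether read_csv hands back a DataFrame or a TextFileReader depends only on whether chunked reading is requested. A minimal sketch; the CSV contents are made up, and an in-memory io.StringIO buffer stands in for a real file:

```python
import io

import pandas as pd

csv_text = "a,b\n1,2\n3,4\n"

# Without chunksize/iterator the whole file is parsed into one DataFrame.
df = pd.read_csv(io.StringIO(csv_text))
# Requesting chunked reading returns a TextFileReader instead.
chunked = pd.read_csv(io.StringIO(csv_text), chunksize=1)
iterated = pd.read_csv(io.StringIO(csv_text), iterator=True)

print(type(df).__name__)        # DataFrame
print(type(chunked).__name__)   # TextFileReader
print(type(iterated).__name__)  # TextFileReader

# Release the parser resources held by the readers.
chunked.close()
iterated.close()
```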

Ah, true. Sounds good to me then. It is also fine to simply document the methods and publish them in the docs. Maybe we can just add a note for now that we recommend using the magic methods.

twoertwein · 2022-03-10T16:37:51Z

doc/source/reference/io.rst

+
+   TextFileReader.get_chunk
+   TextFileReader.close
+   TextFileReader.read


close and read can probably be public, but more important are the magic methods __enter__, __exit__, and __next__. Ideally, people interact with TextFileReader in this manner:

    import pandas as pd

    with pd.read_csv("test.csv", iterator=True) as reader:
        for chunk in reader:
            print(chunk)
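A self-contained variant of this pattern, as a sketch: it replaces the hypothetical test.csv with an in-memory io.StringIO buffer and passes chunksize=2 (an illustrative value) so that several chunks are produced. The with statement exercises __enter__/__exit__, and the for loop drives __next__:

```python
import io

import pandas as pd

# In-memory CSV standing in for "test.csv": 5 data rows.
csv_data = io.StringIO("a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n")

chunk_lengths = []
# chunksize=2 makes read_csv return a TextFileReader yielding 2-row DataFrames.
with pd.read_csv(csv_data, chunksize=2) as reader:  # __enter__
    for chunk in reader:  # each step calls __next__ and yields a DataFrame
        chunk_lengths.append(len(chunk))
# Leaving the block called __exit__, which closes the reader.

print(chunk_lengths)  # [2, 2, 1]
```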

We would need to add a docstring to read (and maybe also to close).
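For comparison, a sketch of the explicit (non-dunder) methods whose docstrings are being discussed, using a made-up in-memory CSV. get_chunk(n) returns the next n rows, read() with no argument returns the remaining rows, and close() releases the reader's resources; the try/finally is roughly what the with form handles automatically:

```python
import io

import pandas as pd

data = io.StringIO("a\n1\n2\n3\n4\n5\n")

reader = pd.read_csv(data, iterator=True)
try:
    first = reader.get_chunk(2)  # next 2 rows as a DataFrame
    rest = reader.read()         # all remaining rows
finally:
    reader.close()               # release parser resources even on error

print(first["a"].tolist())  # [1, 2]
print(rest["a"].tolist())   # [3, 4, 5]
```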

twoertwein · 2022-03-10T19:39:19Z

If it is a public class, it should also be possible to import it (for example, for typing purposes). We need to declare that parsers is public in io/__init__.py (io/parsers/__init__.py already limits the scope so that not everything in it becomes public).

Need to import io.parsers here and add it to this list:

pandas/pandas/io/__init__.py

Line 12 in 471319b

__all__ = ["formats", "json", "stata"]
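To make the typing use case concrete, a sketch of annotating a helper with TextFileReader. It imports the class from pandas.io.parsers, which, as far as I know, re-exports it in recent pandas versions; whether that path becomes the officially public import location is exactly what this comment is about, so treat it as an assumption. count_rows is a hypothetical helper, not a pandas API:

```python
import io

import pandas as pd
# Assumption: pandas.io.parsers re-exports TextFileReader (true in recent versions).
from pandas.io.parsers import TextFileReader


def count_rows(reader: TextFileReader) -> int:
    """Hypothetical helper: consume a chunked reader and return its total row count."""
    total = 0
    with reader:  # closes the reader when the block exits
        for chunk in reader:
            total += len(chunk)
    return total


reader = pd.read_csv(io.StringIO("x\n1\n2\n3\n"), chunksize=2)
assert isinstance(reader, TextFileReader)
print(count_rows(reader))  # 3
```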

datapythonista · 2022-04-04T23:49:07Z

pandas/io/parsers/readers.py

+    Passed dialect overrides any of the related parser options.

-    Passed dialect overrides any of the related parser options
-
+    Only __enter__, __exit__ and __next__ are public. All other
+    attributes are considered private and can change.


This is probably quite cryptic if the idea here is to make this documentation public. I would write something like:

    Iterator class to process text files in chunks.

    An instance of this class is returned by `read_csv` when the file is
    read in chunks, instead of a single `DataFrame`. When iterating over
    `TextFileReader`, every item returned is a `DataFrame`.

    Examples
    ---------
    >>> with pandas.read_csv(..., iterator=True) as text_file_reader:
    ...     for df in text_file_reader:
    ...         ...

github-actions · 2022-05-05T00:05:18Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

datapythonista · 2022-06-16T05:14:14Z

@phofl do you still want to work on this?

datapythonista · 2022-07-21T05:04:36Z

pandas/io/parsers/readers.py

+
+    Examples
+    ---------
+    >>> with pandas.read_csv(..., iterator=True) as text_file_reader:


You need to import pandas, or use pd (which is imported by default), to make the CI happy: https://github.com/pandas-dev/pandas/runs/7043600368?check_suite_focus=true#step:8:132

github-actions · 2022-08-21T00:07:52Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

mroeschke · 2022-10-24T21:24:51Z

Still interested in this @phofl?

phofl · 2022-10-24T21:39:15Z

I should :) Would like to keep open for a bit longer

pandas/io/parsers/readers.py

datapythonista · 2023-01-05T05:05:04Z

@phofl went ahead and fixed the typo, let's see if CI is green, and I think we can get this merged.

datapythonista · 2023-01-05T07:51:48Z

Seems like the doctests still don't pass because of the ellipsis, will have a look later.
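For context on the failure: in a doctest, `pandas.read_csv(..., iterator=True)` is executed literally, so the Ellipsis object is passed as the file path and the call raises. A sketch of one way around it, assuming an in-memory io.StringIO buffer is acceptable in the docstring (the data is made up), checked here with the standard library's doctest machinery rather than pandas' own CI script:

```python
import doctest

# Docstring-style example with runnable input instead of a literal `...`.
# "\\n" keeps a literal backslash-n in the docstring text, as a real
# docstring would contain.
EXAMPLE = """
>>> import io
>>> import pandas as pd
>>> data = io.StringIO("a,b\\n1,2\\n3,4\\n5,6\\n")
>>> with pd.read_csv(data, chunksize=2) as reader:
...     for df in reader:
...         print(df.shape)
(2, 2)
(1, 2)
"""

parser = doctest.DocTestParser()
test = parser.get_doctest(EXAMPLE, {}, "TextFileReader_example", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)  # prints a report only for failing examples
print(results.failed)  # 0
```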

simonjayhawkins · 2023-02-22T15:07:30Z

@phofl is this still active?

phofl · 2023-02-23T14:31:46Z

Yeah we should probably fix this

mroeschke · 2023-07-07T17:34:42Z

Looks like this has gone stale, feel free to reopen if/when you have time to circle back

DOC: Add TextFileReader to docs

1e91695

phofl commented Mar 10, 2022


phofl added Docs IO CSV read_csv, to_csv labels Mar 10, 2022

jonashaag mentioned this pull request Mar 10, 2022

WIP Add ccache to Azure builds #46309

Closed

4 tasks

Fix docstring

d10d829

twoertwein reviewed Mar 10, 2022


jreback added this to the 1.5 milestone Mar 11, 2022

datapythonista reviewed Apr 4, 2022


github-actions bot added the Stale label May 5, 2022

phofl added 2 commits June 24, 2022 16:32

Merge remote-tracking branch 'upstream/main' into 46187

59ecfdf

Add docstrings

1e487ad

datapythonista removed the Stale label Jun 27, 2022

datapythonista reviewed Jul 21, 2022


twoertwein mentioned this pull request Jul 22, 2022

Modernize IO using only the API pandas-dev/pandas-stubs#164

Closed

52 tasks

github-actions bot added the Stale label Aug 21, 2022

mroeschke removed this from the 1.5 milestone Aug 22, 2022

datapythonista reviewed Jan 5, 2023


pandas/io/parsers/readers.py (outdated)

datapythonista added 2 commits January 5, 2023 12:03

Update pandas/io/parsers/readers.py

0192267

Merge branch 'main' into 46187

6db27ea

simonjayhawkins removed the Stale label Feb 22, 2023

phofl and others added 2 commits February 23, 2023 14:31

Merge branch 'main' into 46187

c367d3d

Merge branch 'main' into 46187

db74d99

mroeschke closed this Jul 7, 2023
