DOC: Add TextFileReader to docs #46308

Closed
wants to merge 8 commits into from

Conversation

phofl
Member

@phofl phofl commented Mar 10, 2022

@twoertwein Anything else you would want to add?


TextFileReader

TextFileReader.get_chunk
Member Author

We obviously don't want to add these. But when I remove them, Sphinx complains that it cannot find them in any toctree. Does anyone have an idea how to solve this? For ExcelWriter below this works, so I am probably missing something.

Member

I think in the past we used a section with :hidden:, like in https://github.com/pandas-dev/pandas/blame/main/doc/source/getting_started/index.rst#L641

But I don't see it being used for the API anymore. I guess we should just make them private by using _get_chunk... if we don't want them public and in the documentation.

Or maybe just _TextFileReader and make the whole class private if we don't want it being part of our public API.

I'm personally fine with any of them.

Member Author

I think we could deprecate it and remove it later, that sounds good to me.

We can't make TextFileReader private, since it is returned by read_csv if you are reading the file in chunks.
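
To illustrate the point, here is a minimal sketch showing the return type change once chunked reading is requested. It assumes an in-memory CSV (the hypothetical `CSV_TEXT`) in place of a real file, and a pandas version where `TextFileReader` can be imported from `pandas.io.parsers`:

```python
import io

import pandas as pd
from pandas.io.parsers import TextFileReader

CSV_TEXT = "a,b\n1,2\n3,4\n5,6\n"  # hypothetical sample data, not from the PR

# Without chunking, read_csv returns a DataFrame.
frame = pd.read_csv(io.StringIO(CSV_TEXT))

# With chunksize (or iterator=True), it returns a TextFileReader instead.
reader = pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2)
is_reader = isinstance(reader, TextFileReader)
reader.close()
```

Because user code receives this object directly, renaming the class would be visible to anyone who reads files in chunks.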

Member

Ah, true. Sounds good to me then. Also fine to simply document the methods and publish them in the docs. Maybe we can just add a note for now that we recommend using the magic methods.
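
As a rough comparison of the two styles discussed here, the sketch below fetches the first chunk once through the explicit `get_chunk` method and once through the magic methods (`with` plus `next`). The sample data is hypothetical, and the context-manager form assumes a pandas version in which `read_csv` readers support `with` (1.2 or later, as far as I know):

```python
import io

import pandas as pd

CSV_TEXT = "a,b\n1,2\n3,4\n5,6\n"  # hypothetical sample data

# Explicit method calls: the caller is responsible for closing the reader.
reader = pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2)
first_explicit = reader.get_chunk()
reader.close()

# Magic methods: __enter__/__exit__ handle cleanup, __next__ yields a chunk.
with pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2) as reader:
    first_magic = next(reader)
```

Both variants produce the same two-row DataFrame; the second one does not rely on remembering to call `close`.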

@phofl phofl added Docs IO CSV read_csv, to_csv labels Mar 10, 2022
@jonashaag jonashaag mentioned this pull request Mar 10, 2022
4 tasks

TextFileReader.get_chunk
TextFileReader.close
TextFileReader.read
Member

close and read can probably be public but more important are the magic methods __enter__, __exit__, and __next__. Ideally, people interact with TextFileReader in this manner:

with pd.read_csv("test.csv", iterator=True) as reader:
    for chunk in reader:
        print(chunk)

Member

Would need to add a doc-string to read (and maybe also to close)

@twoertwein
Member

If it is a public class, it should also be possible to import it (for example, for typing purposes). Need to declare that parsers is public in io/__init__.py (io/parsers/__init__.py already limits the scope to not make everything in it public).

Need to import io.parsers here and add it to this list:

__all__ = ["formats", "json", "stata"]
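
For the typing use case mentioned above, a sketch of what user code might look like. The helper `count_rows` and the sample data are hypothetical; the import path `pandas.io.parsers` works in current pandas, but whether it counts as public is exactly what this comment is about:

```python
from __future__ import annotations

import io

import pandas as pd
from pandas.io.parsers import TextFileReader


def count_rows(reader: TextFileReader) -> int:
    """Sum the number of rows over all chunks yielded by the reader."""
    return sum(len(chunk) for chunk in reader)


# Three data rows read in chunks of two: one chunk of 2 rows, one of 1 row.
with pd.read_csv(io.StringIO("a,b\n1,2\n3,4\n5,6\n"), chunksize=2) as reader:
    total_rows = count_rows(reader)
```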

@jreback jreback added this to the 1.5 milestone Mar 11, 2022
Comment on lines 1362 to 1365
Passed dialect overrides any of the related parser options.

Passed dialect overrides any of the related parser options

Only __enter__, __exit__ and __next__ are public. All other
attributes are considered private and can change.
Member

This is probably quite cryptic if the idea here is to make this documentation public. I would have something like:

Iterator class to process text files in chunks.

An instance of this class is returned by `read_csv` when it is processed in
chunks, instead of returning a single `DataFrame`.

When iterating over `TextFileReader`, every item returned will be a dataframe.

Examples
--------
>>> with pandas.read_csv(..., iterator=True) as text_file_reader:
...     for df in text_file_reader:
...         ...

@github-actions
Contributor

github-actions bot commented May 5, 2022

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label May 5, 2022
@datapythonista
Member

@phofl do you still want to work on this?


Examples
--------
>>> with pandas.read_csv(..., iterator=True) as text_file_reader:
Member

You need to import pandas or use pd which is imported by default to make the CI happy: https://github.com/pandas-dev/pandas/runs/7043600368?check_suite_focus=true#step:8:132

@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Aug 21, 2022
@mroeschke mroeschke removed this from the 1.5 milestone Aug 22, 2022
@mroeschke
Member

Still interested in this @phofl?

@phofl
Member Author

phofl commented Oct 24, 2022

I should :) Would like to keep open for a bit longer

@datapythonista
Member

@phofl went ahead and fixed the typo, let's see if CI is green, and I think we can get this merged.

@datapythonista
Member

Seems like the doctests still don't pass because of the ellipsis, will have a look later.
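
One plausible way around the ellipsis problem, sketched under assumptions: replace the `...` placeholders in the docstring example with a concrete in-memory buffer and a body that prints something checkable. The example text and the name `text_file_reader_example` below are hypothetical, not what the PR contains, and the snippet uses the standard-library `doctest` module directly rather than the pandas CI tooling:

```python
import doctest

# Hypothetical docstring example: a concrete buffer replaces the `...`
# argument, and the loop body prints each chunk's shape instead of `...`.
EXAMPLE = r'''
>>> import io
>>> import pandas as pd
>>> buffer = io.StringIO("a,b\n1,2\n3,4\n")
>>> with pd.read_csv(buffer, chunksize=1) as text_file_reader:
...     for df in text_file_reader:
...         print(df.shape)
(1, 2)
(1, 2)
'''

# Parse the text into a DocTest and run it; failures are printed to stdout.
test = doctest.DocTestParser().get_doctest(
    EXAMPLE, {}, "text_file_reader_example", None, 0
)
results = doctest.DocTestRunner(verbose=False).run(test)
```

`results` is a `TestResults(failed, attempted)` tuple, so `results.failed == 0` indicates the example passes as a doctest.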

@simonjayhawkins
Member

@phofl is this still active?

@phofl
Member Author

phofl commented Feb 23, 2023

Yeah we should probably fix this

@mroeschke
Member

Looks like this has gone stale, feel free to reopen if/when you have time to circle back

@mroeschke mroeschke closed this Jul 7, 2023
Labels
Docs IO CSV read_csv, to_csv
Development

Successfully merging this pull request may close these issues.

BUG: AttributeError: 'TextFileReader' object has no attribute 'f'
6 participants