DOC: Improve to_csv mode documentation #51839

Closed
1 task done
MrPowers opened this issue Mar 8, 2023 · 7 comments · Fixed by #51881

Comments

@MrPowers

MrPowers commented Mar 8, 2023

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html

Documentation problem

The documentation for the mode parameter of to_csv does not explicitly list the options for users. It links to the Python open() function, which includes some options that aren't relevant.

Suggested fix for documentation

It'd be nice to list the mode options right in the to_csv docs. This should be significantly more user-friendly. Suggested text:

mode: str, default ‘w’

The file write mode which can be `w`, `x`, or `a`.  `w` will write the file and overwrite another file if it already exists.  `x` will write the file, but error out if a file with the same name already exists.  `a` will append to the existing file.

The available write modes are the same as [open()](https://docs.python.org/3/library/functions.html#open).
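
To make the three modes concrete, here is a minimal sketch that uses Python's built-in open() and the csv module rather than pandas itself, on the assumption stated above that to_csv accepts the same write modes. The helper names `write_rows` and `read_lines` and the file name are made up for illustration.

```python
import csv
import os
import tempfile


def write_rows(path, rows, mode):
    """Write rows to a CSV file with the given open() mode (hypothetical helper)."""
    # newline="" is what the csv module recommends for files it writes to.
    with open(path, mode, newline="") as handle:
        csv.writer(handle).writerows(rows)


def read_lines(path):
    """Return the file content as a list of lines without line endings."""
    with open(path, newline="") as handle:
        return handle.read().splitlines()


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "example.csv")

# 'w' creates the file, or truncates it if it already exists.
write_rows(path, [["a", "b"], [1, 2]], "w")
write_rows(path, [["a", "b"], [3, 4]], "w")
after_w = read_lines(path)  # only the second write survives

# 'a' keeps the existing content and adds the new rows at the end.
write_rows(path, [[5, 6]], "a")
after_a = read_lines(path)

# 'x' refuses to touch a file that already exists.
try:
    write_rows(path, [["a", "b"]], "x")
    x_failed = False
except FileExistsError:
    x_failed = True
```

Note that appending with `a` does not check whether the existing file already has a header row, so the caller decides whether to write one.
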
MrPowers added the Docs and Needs Triage labels Mar 8, 2023
@MrPowers
Author

MrPowers commented Mar 8, 2023

@datapythonista - Thanks for the help on this one!

datapythonista added the good first issue label and removed the Needs Triage label Mar 8, 2023
@datapythonista
Member

Thanks for reporting.

When only a subset of the values is expected, instead of using the type str, we can directly use mode : {'w', 'x', 'a'}, default 'w'.

To explain the options, we can also use bullet points. We do that in some docs, and it may make things easier to read.
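
As a rough illustration of that layout, the sketch below puts a set-style type and a bullet list into a numpydoc-style docstring. `write_text` is a hypothetical function, not part of pandas, and the bullet wording is paraphrased from Python's open() documentation rather than taken from any merged pandas text.

```python
def write_text(path, text, mode="w"):
    """
    Write text to a file (hypothetical function, not part of pandas).

    Parameters
    ----------
    path : str
        Location of the output file.
    text : str
        Content to write.
    mode : {'w', 'x', 'a'}, default 'w'
        How to open the output file:

        - 'w': truncate the file first.
        - 'x': exclusive creation, failing if the file already exists.
        - 'a': append to the end of the file if it exists.
    """
    # Rejecting other values is a choice made for this sketch only.
    if mode not in {"w", "x", "a"}:
        raise ValueError(f"mode must be 'w', 'x', or 'a', got {mode!r}")
    with open(path, mode) as handle:
        handle.write(text)
```

The strict check in the function body is only there to match the docstring. A real writer that also accepts values such as "wb" would need a looser check and a docstring that says so.
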

It'd be good to see which other to_* functions have mode as a parameter, and update them all. Probably better to start by only one, get a code review, and when we're happy with it, apply it to the rest.

@twoertwein
Member

One important value is also "wb" for binary file handles that do not have the .mode attribute themselves (this is hinted in the documentation for path_or_buf).

@HamidrezaSK
Contributor

take

@MrPowers
Author

MrPowers commented Mar 8, 2023

> It'd be good to see which other to_* functions have mode as a parameter, and update them all.

@datapythonista - strongly agree with this, especially because the different to_* APIs have different write modes that make sense. I'm not sure if to_parquet supports mode because I don't see it in the docs, but appending obviously won't work because Parquet files are immutable.

I don't mean to overcomplicate this too much, but for to_parquet we should also think about how mode and partition_cols interact. "appending" in that case probably means something more like how Spark uses "append" (i.e. adding files to an existing folder).
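
A rough sketch of what directory-level appending could look like, using only the standard library. Everything here is hypothetical: `write_dataset`, the "append" and "overwrite" mode strings, and the `part-*.csv` naming are assumptions for illustration, not an existing pandas or Spark API.

```python
import csv
import shutil
import tempfile
import uuid
from pathlib import Path


def write_dataset(directory, rows, mode="append"):
    """Write rows as one new CSV file inside a dataset directory.

    Hypothetical helper: neither the name nor the mode strings come from pandas.
    """
    if mode not in {"append", "overwrite"}:
        raise ValueError(f"unsupported mode: {mode!r}")
    directory = Path(directory)
    if mode == "overwrite" and directory.exists():
        # Replace the whole dataset, not just a single file.
        shutil.rmtree(directory)
    directory.mkdir(parents=True, exist_ok=True)

    # A unique file name means existing files are never modified.
    target = directory / f"part-{uuid.uuid4().hex}.csv"
    with open(target, "x", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return target


dataset = Path(tempfile.mkdtemp()) / "dataset"
write_dataset(dataset, [["id", "value"], [1, "a"]])
write_dataset(dataset, [["id", "value"], [2, "b"]])
files_after_append = sorted(dataset.glob("*.csv"))

write_dataset(dataset, [["id", "value"], [3, "c"]], mode="overwrite")
files_after_overwrite = sorted(dataset.glob("*.csv"))
```

In this reading, "append" never rewrites an existing file, which is why it remains meaningful even when the individual files are immutable.
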

I mention this because the to_csv writer should probably support partition_cols as well. Just documenting the existing options is a good start, but we should probably do some longer-term planning too.

@qudus4l

qudus4l commented Mar 8, 2023

I agree that it would be helpful to have the available modes for to_csv() explicitly listed in the documentation. Thanks for suggesting this improvement!
Just a small note: the suggested text says that mode defaults to 'w', but in fact the default value is 'w' only if path_or_buf is a file path (otherwise, it defaults to None). Maybe it would be good to clarify this in the text.
Apart from that, the suggested text looks great to me! It would make it much easier for users to understand what the mode parameter does and what the available options are.

@datapythonista
Copy link
Member

From a quick look I can only see to_json having a mode parameter. The description of its docstring will have to be extended from a generic one, since mode only makes sense with lines=True, and we need to say that.
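
One way to see why appending fits line-delimited output but not a single JSON document is the sketch below. It uses the standard json module instead of to_json, so it illustrates the file formats only and makes no claim about how pandas implements the parameter. The file names are made up.

```python
import json
import os
import tempfile

workdir = tempfile.mkdtemp()

# Line-delimited JSON: one record per line, so appending just adds lines.
lines_path = os.path.join(workdir, "records.jsonl")
for mode, records in [("w", [{"id": 1}]), ("a", [{"id": 2}])]:
    with open(lines_path, mode) as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")

with open(lines_path) as handle:
    parsed_lines = [json.loads(line) for line in handle]

# A single JSON array: appending a second array leaves two documents
# back to back in one file, which is not valid JSON.
array_path = os.path.join(workdir, "records.json")
for mode, records in [("w", [{"id": 1}]), ("a", [{"id": 2}])]:
    with open(array_path, mode) as handle:
        json.dump(records, handle)

with open(array_path) as handle:
    content = handle.read()

try:
    json.loads(content)
    array_is_valid = True
except json.JSONDecodeError:
    array_is_valid = False
```

The line-delimited file still parses record by record after the append, while the array file no longer parses as a whole.
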

HamidrezaSK added a commit to HamidrezaSK/pandas that referenced this issue Mar 10, 2023
HamidrezaSK added a commit to HamidrezaSK/pandas that referenced this issue Mar 10, 2023
HamidrezaSK added a commit to HamidrezaSK/pandas that referenced this issue Mar 10, 2023
Modify write mode descriptions and add an explanation for b and t mode.
HamidrezaSK added a commit to HamidrezaSK/pandas that referenced this issue Mar 13, 2023
HamidrezaSK added a commit to HamidrezaSK/pandas that referenced this issue Mar 15, 2023
Remove the 'b' and 't' modes from the description.
HamidrezaSK added a commit to HamidrezaSK/pandas that referenced this issue Mar 17, 2023
Modify 'w', 'a', and 'x' write mode's description.
mroeschke pushed a commit that referenced this issue Mar 17, 2023
* DOC update DataFrame.to_csv write modes (#51839)

* DOC: style fix (#51839)

* DOC update DataFrame.to_csv write modes (#51839)

Modify write mode descriptions and add an explanation for b and t mode.

* DOC: update DataFrame.to_csv write modes (#51839)

Put path_or_buf in backticks.

* DOC: update DataFrame.to_csv write modes (#51839)

Remove the 'b' and 't' modes from the description.

* DOC: update DataFrame.to_csv write modes (#51839)

Modify 'w', 'a', and 'x' write mode's description.