Skip to content

Conform Series.to_csv to DataFrame.to_csv #19745

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

dahlbaek
Copy link
Contributor

@dahlbaek dahlbaek commented Feb 18, 2018

(EDIT: Most of this comment is related to an old attempt. See this comment and below for the content of the current PR)

I tried my best to keep everything backwards compatible. A simpler solution
would have been simply

def to_csv(self, path_or_buf=None, *args, **kwargs):
    """
    Write Series to a comma-separated values (csv) file

    Parameters
    ----------
    path_or_buf : string or file handle, default None
        File path or object, if None is provided the result is returned as
        a string.
    args
        Positional arguments passed to :func:`pandas.DataFrame.to_csv`
    kwargs
        Keyword arguments passed to :func:`pandas.DataFrame.to_csv`

    Returns
    -------
    None or string
        If `path_or_buf` is None, returns the result as a string, otherwise
        returns None.

    Notes
    -----
    Constructs a DataFrame using :func:`~pandas.Series.to_frame`, then calls
    the method :func:`pandas.DataFrame.to_csv` on that DataFrame.
    """

    df = self.to_frame()
    # result is a string if path_or_buf is None, otherwise None
    result = df.to_csv(path_or_buf, *args, **kwargs)
    return result

Unfortunately, such a solution breaks backwards compatibility for the following
reasons:

  • If given as keyword arguments, path_or_buf of DataFrame.to_csv
    corresponds to the differently named argument path of Series.to_csv.
  • The ordering of positional arguments for DataFrame.to_csv does not agree
    fully with the ordering of positional arguments for Series.to_csv
  • The default value of header is True for DataFrame.to_csv but False for
    Series.to_csv

Maybe this is something that could be implemented in the long run?

@codecov
Copy link

codecov bot commented Feb 18, 2018

Codecov Report

Merging #19745 into master will increase coverage by <.01%.
The diff coverage is 96.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19745      +/-   ##
==========================================
+ Coverage   91.95%   91.95%   +<.01%     
==========================================
  Files         166      166              
  Lines       50274    50290      +16     
==========================================
+ Hits        46231    46246      +15     
- Misses       4043     4044       +1
Flag Coverage Δ
#multiple 90.35% <96.55%> (ø) ⬆️
#single 42.17% <55.17%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/frame.py 97.18% <ø> (-0.02%) ⬇️
pandas/core/generic.py 96.47% <100%> (+0.01%) ⬆️
pandas/core/series.py 94.09% <94.44%> (-0.02%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 610a19a...22a3dca. Read the comment docs.

@dahlbaek
Copy link
Contributor Author

I realised that there's a decorator for deprecating keyword arguments and did my best to use that instead. I also tried to add versionadded and deprecated directives to the docstring by mimicking the way they are used in pandas.read_excel.

float_format=None, header=False, index_label=None,
mode='w', encoding=None, compression=None, date_format=None,
decimal='.'):
decimal='.', **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't pass kwargs. rather list all of the args.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not passing kwargs ? Otherwise each time we add something to to_csv, we will forget to add it here. We can just clearly say in the docstring here: "if you want to see all params, check DataFrame.to_csv"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a good pattern
much better to have a consistent signature
defined

very unlikely to add new keywords anyhow
shared docs are the way to do this
and follow our pattern

kwargs are not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @dahlbaek said, and it might be a pity, but the kwarg order is not equal, so we can't just share. I agree with you in general, but I don't think it is worth to put much effort in this case to make it work (deprecate positional keywords etc)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let’s just fix the args and conform to DataFrame.to_csv
otherwise this will go on forever

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just changing the order is a backwards incompatible change, and needs to be deprecated first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course but it’s better to have a 1 time break
than let this go on forever

Copy link
Contributor Author

@dahlbaek dahlbaek Feb 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Is there a standard way to deprecate ordering of positional arguments?
  • Is there a standard way to deprecate default value of an argument?
  • Would it be ok to go for a one argument per line approach, as is done in pandas.read_excel? I find this style much more readable, and changes are easier to track too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a similar example currently in the codebase. But I think doing a *args, **kwargs as you proposed in the top post, is the easiest way to go. You can then introspect the args, assign them manually to the correct keywords, and raise a DeprecationWarning if needed.

@@ -2878,16 +2878,22 @@ def from_csv(cls, path, sep=',', parse_dates=True, header=None,

return result

def to_csv(self, path=None, index=True, sep=",", na_rep='',
@deprecate_kwarg(old_arg_name='path', new_arg_name='path_or_buf')
def to_csv(self, path_or_buf=None, index=True, sep=",", na_rep='',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally we should have a shared doc-string defined in generic.py (parmeterized by klass) to avoid repeating all of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a best-practices example for this that I could use for guidance?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are lots of examples virtually every function in generic, e.g. .reindex

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be easier to wait with moving to generic.py until deprecation of positional args is done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deprecating positional parameters is pretty silly here and will be way too messy. just conform the signature, write a nice message in the whatsnew and let's call it a day.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is perfectly possible to do this (deprecating positional arguments), so why not do that? IMO there is no reason in possibly breaking user's code if we can easily prevent it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go for it then. this is going to be a big mess.

@jreback jreback added Output-Formatting __repr__ of pandas objects, to_string API Design IO CSV read_csv, to_csv labels Feb 18, 2018
@jreback
Copy link
Contributor

jreback commented Feb 18, 2018

xref #18958

@jreback
Copy link
Contributor

jreback commented Feb 18, 2018

ideally we would consolidate the tests with DataFrame.to_csv to ensure coverage of all of the kwargs. could be in this PR or followon

@jreback
Copy link
Contributor

jreback commented Jul 8, 2018

@dahlbaek can you or @pandas-dev/pandas-core have a look here

@gfyoung
Copy link
Member

gfyoung commented Jul 8, 2018

@jreback @jorisvandenbossche : Maybe a silly question, but couldn't we just implement to_csv in NDFrame, deprecate Series.to_csv entirely and then delete the method later on? This would ensure conformity for DataFrame and Series (in the future) and avoid this whole mess of keyword deprecations.

@jreback
Copy link
Contributor

jreback commented Jul 8, 2018

@gfyoung

yes

@gfyoung
Copy link
Member

gfyoung commented Jul 8, 2018

Alrighty then. I'll create a new PR to supersede this one.

@toobaz
Copy link
Member

toobaz commented Jul 9, 2018

deprecate Series.to_csv entirely and then delete the method later on?

... but wouldn't the deprecation message look like "Series.to_csv is being deprecated, please use Series.to_csv"?! I mean: from the user point of view, wouldn't it just be a deprecation of arguments?

(and in that case, wouldn't we better immediately apply the solution above, with deprecation warnings for the three problematic keywords?)

@dahlbaek
Copy link
Contributor Author

dahlbaek commented Jul 9, 2018

I've been meaning to give this a second go for a while, but haven't had time. Clearly, I'm not as experienced as others in this thread regarding the pandas codebase, but maybe someone will find some of my thoughts interesting anyway:

  • If the keywords path and path_or_buf are going to be aligned, perhaps they should both be changed to filepath_or_buffer in order to also align with the read_csv method.
  • I feel like the cleanest solution would be to
    1. Move the DataFrame method to NDFrame as-is. This would be backwards compatible, so you do not need any deprecation warnings (or, if path_or_buf is changed to filepath_or_buffer, the current keyword deprecation decorator could be applied to the new method on NDFrame).
    2. Add general purpose decorator(s) to enable deprecation of the order of positional arguments and default value of a keyword argument. The deprecation warning for ordering would trigger if you are using "enough" positional arguments for the reordering to have any effect, and the default value warning would trigger if you are not explicitly assigning a value to the affected keyword argument. The decorators might prove useful in other situations.
    3. Implement required changes on Series in order to align with the NDFrame method, using the new deprecation decorators.
    4. Once the deprecation period is over (so that warnings are no longer displayed, but the Series method is fully aligned with the NDFrame method), move the Series method to NDFrame, which would be an internal change that users would not need to know about.
  • Alternatively, I feel the quickest solution would be moving everything to NDFrame in a single breaking change with a nice whatsnew message and get it over with. However, saving stuff to csv seems like a pretty core functionality — keeping in mind that quite a few users of pandas probably never read the whatsnew or even keep track of which version of pandas they are using, I feel like some kind of deprecation warning is in order.

If a decision is made as to how to proceed, I would be happy to lend a hand. Unfortunately, I do not currently have time to go through all the steps for a "clean" solution on my own.

EDIT: Reading through my "clean" solution, I have to admit it does not look too clean. Here is an alternative: Move everything to NDFrame, but for a limited time, use the path and path_or_buf keywords to trigger deprecation warnings. Basically, to align with read_csv those arguments should be changed to filepath_or_buffer, so any user who is supplying path, path_or_buf or relying on the position of that argument would be suspected of relying on the old method signatures. This way, there will be false positives, but I think that is better than having code break silently.

@gfyoung
Copy link
Member

gfyoung commented Jul 10, 2018

@dahlbaek : Your point regarding path_or_buf is well-taken. I would do that however after migrating to NDFrame (in the interest of modularity and fewer moving parts).

gfyoung added a commit to forking-repos/pandas that referenced this pull request Jul 12, 2018
Warns about aligning Series.to_csv's signature
with that of DataFrame.to_csv's.

In anticipation, we have moved DataFrame.to_csv
to generic.py so that we can later delete the
Series.to_csv implementation, and allow it
to automatically adopt DataFrame's to_csv
due to inheritance.

Closes pandas-devgh-19745.
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jul 12, 2018
Warns about aligning Series.to_csv's signature
with that of DataFrame.to_csv's.

In anticipation, we have moved DataFrame.to_csv
to generic.py so that we can later delete the
Series.to_csv implementation, and allow it
to automatically adopt DataFrame's to_csv
due to inheritance.

Closes pandas-devgh-19745.
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jul 12, 2018
Warns about aligning Series.to_csv's signature
with that of DataFrame.to_csv's.

In anticipation, we have moved DataFrame.to_csv
to generic.py so that we can later delete the
Series.to_csv implementation, and allow it
to automatically adopt DataFrame's to_csv
due to inheritance.

Closes pandas-devgh-19745.
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jul 12, 2018
Warns about aligning Series.to_csv's signature
with that of DataFrame.to_csv's.

In anticipation, we have moved DataFrame.to_csv
to generic.py so that we can later delete the
Series.to_csv implementation, and allow it
to automatically adopt DataFrame's to_csv
due to inheritance.

Closes pandas-devgh-19745.
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jul 12, 2018
Warns about aligning Series.to_csv's signature
with that of DataFrame.to_csv's.

In anticipation, we have moved DataFrame.to_csv
to generic.py so that we can later delete the
Series.to_csv implementation, and allow it
to automatically adopt DataFrame's to_csv
due to inheritance.

Closes pandas-devgh-19745.
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jul 13, 2018
Warns about aligning Series.to_csv's signature
with that of DataFrame.to_csv's.

In anticipation, we have moved DataFrame.to_csv
to generic.py so that we can later delete the
Series.to_csv implementation, and allow it
to automatically adopt DataFrame's to_csv
due to inheritance.

Closes pandas-devgh-19745.
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jul 13, 2018
Warns about aligning Series.to_csv's signature
with that of DataFrame.to_csv's.

In anticipation, we have moved DataFrame.to_csv
to generic.py so that we can later delete the
Series.to_csv implementation, and allow it
to automatically adopt DataFrame's to_csv
due to inheritance.

Closes pandas-devgh-19745.
@dahlbaek dahlbaek force-pushed the feat/series/to_csv branch from e1fb1a8 to 59ccebd Compare July 13, 2018 20:10
@pep8speaks
Copy link

pep8speaks commented Jul 13, 2018

Hello @dahlbaek! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 13, 2018 at 22:24 Hours UTC

@dahlbaek dahlbaek force-pushed the feat/series/to_csv branch from 59ccebd to 9bac25c Compare July 13, 2018 20:19
@dahlbaek
Copy link
Contributor Author

dahlbaek commented Jul 13, 2018

So, I figured I would do a proof of concept with a decorator-based approach. If anyone finds this appropriate, I will fill in tests based on the work by @gfyoung in #21868.

Pros:

  • Passing all arguments through a decorator seems like the natural thing to do — this way full information is available for inspection, making the deprecation warnings easy to write.
  • The signature of Series.to_csv stays intact and honest.
  • Deprecation is all contained in a single separate decorator (arguably not important, since Series.to_csv would live on only for the deprecation period).

Cons:

  • The decorator adds an extra level of indirection.
  • Users that insist on passing multiple positional arguments (instead of using keyword arguments) will have to live with the deprecation warning throughout the deprecation period.
  • Users that use positional arguments (and who are OK with the warning) cannot switch to the new ordering until the end of the deprecation period — and then they must do so immediately when upgrading.

(EDIT: Changed pros and cons according to comments by @toobaz)

@toobaz
Copy link
Member

toobaz commented Jul 13, 2018

Deprecation is all contained in a single separate decorator.

As correctly pointed out by @gfyoung , this is already true for Series.to_csv itself.

@toobaz
Copy link
Member

toobaz commented Jul 13, 2018

Users that insist on passing multiple positional arguments (in favor of using keyword arguments) will have to live with the deprecation warning throughout the deprecation period.

This is a consequence of your specific implementation, not of the fact that it's contained in a decorator.

@dahlbaek
Copy link
Contributor Author

dahlbaek commented Jul 13, 2018

As correctly pointed out by @gfyoung , this is already true for Series.to_csv itself.

No, that comment has to do with what should happen at the end of the deprecation period. The way I implemented things here you would not be able to simply remove the decorator, because I want to keep the signature of to_csv intact and honest. The point I want to make is different: This way, all of the code related to deprecation is contained in a shell surrounding to_csv instead of being cooked into to_csv. It's a matter of code separation.

This is a consequence of your specific implementation, not of the fact that it's contained in a decorator.

That is correct.

@dahlbaek dahlbaek force-pushed the feat/series/to_csv branch from 9bac25c to 26f7eea Compare July 13, 2018 22:23
- Deprecate signature of Series.to_csv.
- Move DataFrame.to_csv to generic.py
@dahlbaek dahlbaek force-pushed the feat/series/to_csv branch from 26f7eea to 22a3dca Compare July 13, 2018 22:24
@toobaz
Copy link
Member

toobaz commented Jul 13, 2018

It's a matter of code separation.

Series.to_csv makes sense only in light of the deprecation process, there is no "real" code to separate. Anyway, we are used to deprecations managed through decorators, so the second of the "Pros" still holds.

Among the "Cons" we should add "users who use positional arguments [and who are OK with the warning] cannot switch to the new ordering until the end of the deprecation period - and then they must immediately when upgrading", right? I consider this more relevant than the warning. (Compare with #21896 )

@dahlbaek
Copy link
Contributor Author

dahlbaek commented Jul 13, 2018

Agreed and added. This solution is not friendly towards users that want to pass positional arguments (other than path_or_buf).

One possibility would be to combine the decorator approach with the approach of #21896. That way, one could have both the intact signature and a solution which works for users that like to use positional arguments.

@gfyoung gfyoung added this to the No action milestone Jul 25, 2018
@gfyoung
Copy link
Member

gfyoung commented Jul 25, 2018

We're pushing to merge #21896 over this one.

@gfyoung gfyoung closed this Jul 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design IO CSV read_csv, to_csv Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conform Series.to_csv to DataFrame.to_csv
6 participants