Conform Series.to_csv to DataFrame.to_csv #19745

dahlbaek · 2018-02-18T01:06:49Z

(EDIT: Most of this comment is related to an old attempt. See this comment and below for the content of the current PR)

I tried my best to keep everything backwards compatible. A simpler solution
would have been simply

def to_csv(self, path_or_buf=None, *args, **kwargs):
    """
    Write Series to a comma-separated values (csv) file

    Parameters
    ----------
    path_or_buf : string or file handle, default None
        File path or object, if None is provided the result is returned as
        a string.
    args
        Positional arguments passed to :func:`pandas.DataFrame.to_csv`
    kwargs
        Keyword arguments passed to :func:`pandas.DataFrame.to_csv`

    Returns
    -------
    None or string
        If `path_or_buf` is None, returns the result as a string, otherwise
        returns None.

    Notes
    -----
    Constructs a DataFrame using :func:`~pandas.Series.to_frame`, then calls
    the method :func:`pandas.DataFrame.to_csv` on that DataFrame.
    """

    df = self.to_frame()
    # result is a string if path_or_buf is None, otherwise None
    result = df.to_csv(path_or_buf, *args, **kwargs)
    return result

Unfortunately, such a solution breaks backwards compatibility for the following
reasons:

If given as keyword arguments, path_or_buf of DataFrame.to_csv
corresponds to the differently named argument path of Series.to_csv.
The ordering of positional arguments for DataFrame.to_csv does not agree
fully with the ordering of positional arguments for Series.to_csv
The default value of header is True for DataFrame.to_csv but False for
Series.to_csv

Maybe this is something that could be implemented in the long run?

closes Conform Series.to_csv to DataFrame.to_csv #19715
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

codecov · 2018-02-18T01:44:48Z

Codecov Report

Merging #19745 into master will increase coverage by <.01%.
The diff coverage is 96.55%.

@@            Coverage Diff             @@
##           master   #19745      +/-   ##
==========================================
+ Coverage   91.95%   91.95%   +<.01%     
==========================================
  Files         166      166              
  Lines       50274    50290      +16     
==========================================
+ Hits        46231    46246      +15     
- Misses       4043     4044       +1

Flag	Coverage Δ
#multiple	`90.35% <96.55%> (ø)`	⬆️
#single	`42.17% <55.17%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.18% <ø> (-0.02%)`	⬇️
pandas/core/generic.py	`96.47% <100%> (+0.01%)`	⬆️
pandas/core/series.py	`94.09% <94.44%> (-0.02%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 610a19a...22a3dca. Read the comment docs.

dahlbaek · 2018-02-18T15:29:18Z

I realised that there's a decorator for deprecating keyword arguments and did my best to use that instead. I also tried to add versionadded and deprecated directives to the docstring by mimicking the way they are used in pandas.read_excel.

jreback · 2018-02-18T17:52:22Z

pandas/core/series.py

               float_format=None, header=False, index_label=None,
               mode='w', encoding=None, compression=None, date_format=None,
-               decimal='.'):
+               decimal='.', **kwargs):


don't pass kwargs. rather list all of the args.

Why not passing kwargs ? Otherwise each time we add something to to_csv, we will forget to add it here. We can just clearly say in the docstring here: "if you want to see all params, check DataFrame.to_csv"

this is not a good pattern
much better to have a consistent signature
defined

very unlikely to add new keywords anyhow
shared docs are the way to do this
and follow our pattern

kwargs are not

As @dahlbaek said, and it might be a pity, but the kwarg order is not equal, so we can't just share. I agree with you in general, but I don't think it is worth to put much effort in this case to make it work (deprecate positional keywords etc)

let’s just fix the args and conform to DataFrame.to_csv
otherwise this will go on forever

Just changing the order is a backwards incompatible change, and needs to be deprecated first.

of course but it’s better to have a 1 time break
than let this go on forever

Is there a standard way to deprecate ordering of positional arguments?

Is there a standard way to deprecate default value of an argument?

Would it be ok to go for a one argument per line approach, as is done in pandas.read_excel? I find this style much more readable, and changes are easier to track too.

I don't think there is a similar example currently in the codebase. But I think doing a *args, **kwargs as you proposed in the top post, is the easiest way to go. You can then introspect the args, assign them manually to the correct keywords, and raise a DeprecationWarning if needed.

jreback · 2018-02-18T17:52:52Z

pandas/core/series.py

@@ -2878,16 +2878,22 @@ def from_csv(cls, path, sep=',', parse_dates=True, header=None,

        return result

-    def to_csv(self, path=None, index=True, sep=",", na_rep='',
+    @deprecate_kwarg(old_arg_name='path', new_arg_name='path_or_buf')
+    def to_csv(self, path_or_buf=None, index=True, sep=",", na_rep='',


ideally we should have a shared doc-string defined in generic.py (parmeterized by klass) to avoid repeating all of it.

Is there a best-practices example for this that I could use for guidance?

there are lots of examples virtually every function in generic, e.g. .reindex

I think it will be easier to wait with moving to generic.py until deprecation of positional args is done

deprecating positional parameters is pretty silly here and will be way too messy. just conform the signature, write a nice message in the whatsnew and let's call it a day.

It is perfectly possible to do this (deprecating positional arguments), so why not do that? IMO there is no reason in possibly breaking user's code if we can easily prevent it.

go for it then. this is going to be a big mess.

jreback · 2018-02-18T17:54:47Z

xref #18958

jreback · 2018-02-18T17:56:00Z

ideally we would consolidate the tests with DataFrame.to_csv to ensure coverage of all of the kwargs. could be in this PR or followon

jreback · 2018-07-08T15:51:54Z

@dahlbaek can you or @pandas-dev/pandas-core have a look here

gfyoung · 2018-07-08T20:55:49Z

@jreback @jorisvandenbossche : Maybe a silly question, but couldn't we just implement to_csv in NDFrame, deprecate Series.to_csv entirely and then delete the method later on? This would ensure conformity for DataFrame and Series (in the future) and avoid this whole mess of keyword deprecations.

jreback · 2018-07-08T21:02:25Z

@gfyoung

yes

gfyoung · 2018-07-08T21:07:03Z

Alrighty then. I'll create a new PR to supersede this one.

toobaz · 2018-07-09T06:51:36Z

deprecate Series.to_csv entirely and then delete the method later on?

... but wouldn't the deprecation message look like "Series.to_csv is being deprecated, please use Series.to_csv"?! I mean: from the user point of view, wouldn't it just be a deprecation of arguments?

(and in that case, wouldn't we better immediately apply the solution above, with deprecation warnings for the three problematic keywords?)

dahlbaek · 2018-07-09T11:32:13Z

I've been meaning to give this a second go for a while, but haven't had time. Clearly, I'm not as experienced as others in this thread regarding the pandas codebase, but maybe someone will find some of my thoughts interesting anyway:

If the keywords path and path_or_buf are going to be aligned, perhaps they should both be changed to filepath_or_buffer in order to also align with the read_csv method.
I feel like the cleanest solution would be to
1. Move the DataFrame method to NDFrame as-is. This would be backwards compatible, so you do not need any deprecation warnings (or, if path_or_buf is changed to filepath_or_buffer, the current keyword deprecation decorator could be applied to the new method on NDFrame).
2. Add general purpose decorator(s) to enable deprecation of the order of positional arguments and default value of a keyword argument. The deprecation warning for ordering would trigger if you are using "enough" positional arguments for the reordering to have any effect, and the default value warning would trigger if you are not explicitly assigning a value to the affected keyword argument. The decorators might prove useful in other situations.
3. Implement required changes on Series in order to align with the NDFrame method, using the new deprecation decorators.
4. Once the deprecation period is over (so that warnings are no longer displayed, but the Series method is fully aligned with the NDFrame method), move the Series method to NDFrame, which would be an internal change that users would not need to know about.
Alternatively, I feel the quickest solution would be moving everything to NDFrame in a single breaking change with a nice whatsnew message and get it over with. However, saving stuff to csv seems like a pretty core functionality — keeping in mind that quite a few users of pandas probably never read the whatsnew or even keep track of which version of pandas they are using, I feel like some kind of deprecation warning is in order.

If a decision is made as to how to proceed, I would be happy to lend a hand. Unfortunately, I do not currently have time to go through all the steps for a "clean" solution on my own.

EDIT: Reading through my "clean" solution, I have to admit it does not look too clean. Here is an alternative: Move everything to NDFrame, but for a limited time, use the path and path_or_buf keywords to trigger deprecation warnings. Basically, to align with read_csv those arguments should be changed to filepath_or_buffer, so any user who is supplying path, path_or_buf or relying on the position of that argument would be suspected of relying on the old method signatures. This way, there will be false positives, but I think that is better than having code break silently.

gfyoung · 2018-07-10T00:46:35Z

@dahlbaek : Your point regarding path_or_buf is well-taken. I would do that however after migrating to NDFrame (in the interest of modularity and fewer moving parts).

Warns about aligning Series.to_csv's signature with that of DataFrame.to_csv's. In anticipation, we have moved DataFrame.to_csv to generic.py so that we can later delete the Series.to_csv implementation, and allow it to automatically adopt DataFrame's to_csv due to inheritance. Closes pandas-devgh-19745.

pep8speaks · 2018-07-13T20:10:59Z

Hello @dahlbaek! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 13, 2018 at 22:24 Hours UTC

dahlbaek · 2018-07-13T20:28:37Z

So, I figured I would do a proof of concept with a decorator-based approach. If anyone finds this appropriate, I will fill in tests based on the work by @gfyoung in #21868.

Pros:

Passing all arguments through a decorator seems like the natural thing to do — this way full information is available for inspection, making the deprecation warnings easy to write.
The signature of Series.to_csv stays intact and honest.
Deprecation is all contained in a single separate decorator (arguably not important, since Series.to_csv would live on only for the deprecation period).

Cons:

The decorator adds an extra level of indirection.
Users that insist on passing multiple positional arguments (instead of using keyword arguments) will have to live with the deprecation warning throughout the deprecation period.
Users that use positional arguments (and who are OK with the warning) cannot switch to the new ordering until the end of the deprecation period — and then they must do so immediately when upgrading.

(EDIT: Changed pros and cons according to comments by @toobaz)

toobaz · 2018-07-13T20:37:05Z

Deprecation is all contained in a single separate decorator.

As correctly pointed out by @gfyoung , this is already true for Series.to_csv itself.

toobaz · 2018-07-13T20:39:10Z

Users that insist on passing multiple positional arguments (in favor of using keyword arguments) will have to live with the deprecation warning throughout the deprecation period.

This is a consequence of your specific implementation, not of the fact that it's contained in a decorator.

dahlbaek · 2018-07-13T21:57:45Z

As correctly pointed out by @gfyoung , this is already true for Series.to_csv itself.

No, that comment has to do with what should happen at the end of the deprecation period. The way I implemented things here you would not be able to simply remove the decorator, because I want to keep the signature of to_csv intact and honest. The point I want to make is different: This way, all of the code related to deprecation is contained in a shell surrounding to_csv instead of being cooked into to_csv. It's a matter of code separation.

This is a consequence of your specific implementation, not of the fact that it's contained in a decorator.

That is correct.

- Deprecate signature of Series.to_csv. - Move DataFrame.to_csv to generic.py

toobaz · 2018-07-13T22:37:08Z

It's a matter of code separation.

Series.to_csv makes sense only in light of the deprecation process, there is no "real" code to separate. Anyway, we are used to deprecations managed through decorators, so the second of the "Pros" still holds.

Among the "Cons" we should add "users who use positional arguments [and who are OK with the warning] cannot switch to the new ordering until the end of the deprecation period - and then they must immediately when upgrading", right? I consider this more relevant than the warning. (Compare with #21896 )

dahlbaek · 2018-07-13T22:53:42Z

Agreed and added. This solution is not friendly towards users that want to pass positional arguments (other than path_or_buf).

One possibility would be to combine the decorator approach with the approach of #21896. That way, one could have both the intact signature and a solution which works for users that like to use positional arguments.

gfyoung · 2018-07-25T16:49:08Z

We're pushing to merge #21896 over this one.

jreback requested changes Feb 18, 2018

View reviewed changes

jreback added Output-Formatting __repr__ of pandas objects, to_string API Design IO CSV read_csv, to_csv labels Feb 18, 2018

gfyoung mentioned this pull request Jul 12, 2018

DEPR: Warn about Series.to_csv signature alignment #21868

Closed

dahlbaek force-pushed the feat/series/to_csv branch from e1fb1a8 to 59ccebd Compare July 13, 2018 20:10

dahlbaek force-pushed the feat/series/to_csv branch from 59ccebd to 9bac25c Compare July 13, 2018 20:19

gfyoung mentioned this pull request Jul 13, 2018

Conform Series.to_csv to DataFrame.to_csv #19715

Closed

dahlbaek force-pushed the feat/series/to_csv branch from 9bac25c to 26f7eea Compare July 13, 2018 22:23

Align Series.to_csv with DataFrame.to_csv

22a3dca

- Deprecate signature of Series.to_csv. - Move DataFrame.to_csv to generic.py

dahlbaek force-pushed the feat/series/to_csv branch from 26f7eea to 22a3dca Compare July 13, 2018 22:24

gfyoung mentioned this pull request Jul 14, 2018

DEPR: Deprecate Series.to_csv signature #21896

Merged

4 tasks

gfyoung added this to the No action milestone Jul 25, 2018

gfyoung closed this Jul 25, 2018

Conform Series.to_csv to DataFrame.to_csv #19745

Conform Series.to_csv to DataFrame.to_csv #19745

Conversation

dahlbaek commented Feb 18, 2018 • edited Loading

codecov bot commented Feb 18, 2018 • edited Loading

Codecov Report

dahlbaek commented Feb 18, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dahlbaek Feb 18, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Feb 18, 2018

jreback commented Feb 18, 2018

jreback commented Jul 8, 2018

gfyoung commented Jul 8, 2018 • edited Loading

jreback commented Jul 8, 2018

gfyoung commented Jul 8, 2018

toobaz commented Jul 9, 2018

dahlbaek commented Jul 9, 2018 • edited Loading

gfyoung commented Jul 10, 2018

pep8speaks commented Jul 13, 2018 • edited Loading

Comment last updated on July 13, 2018 at 22:24 Hours UTC

dahlbaek commented Jul 13, 2018 • edited Loading

toobaz commented Jul 13, 2018

toobaz commented Jul 13, 2018

dahlbaek commented Jul 13, 2018 • edited Loading

toobaz commented Jul 13, 2018

dahlbaek commented Jul 13, 2018 • edited Loading

gfyoung commented Jul 25, 2018

dahlbaek commented Feb 18, 2018 •

edited

Loading

codecov bot commented Feb 18, 2018 •

edited

Loading

dahlbaek Feb 18, 2018 •

edited

Loading

gfyoung commented Jul 8, 2018 •

edited

Loading

dahlbaek commented Jul 9, 2018 •

edited

Loading

pep8speaks commented Jul 13, 2018 •

edited

Loading

dahlbaek commented Jul 13, 2018 •

edited

Loading

dahlbaek commented Jul 13, 2018 •

edited

Loading

dahlbaek commented Jul 13, 2018 •

edited

Loading