-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API: read_csv, to_csv line_terminator keyword inconsistency #35399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @arw2019 for the PR. some reservations regarding the api.
pandas/core/generic.py
Outdated
@@ -3004,6 +3004,7 @@ def to_csv( | |||
quoting: Optional[int] = None, | |||
quotechar: str = '"', | |||
line_terminator: Optional[str] = None, | |||
lineterminator: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, this signature is displayed in the documentation so doing it like this could be confusing. (also this function is not kwargs only, so although bonkers, passing arguments as positional arguments would break)
maybe add a **kwargs at the end and document as 'for compatbility with csv module'
then in code something like kwargs.setdefault('lineterminator', line_terminator)
and pass on kwargs instead of line_terminator to CSVFormatter which just passes them onto csvlib.writer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I implemented this idea and added optional kwargs to the docstrings.
In to_csv
it's maybe a little awkward
kwargs.setdefault("lineterminator", line_terminator)
line_terminator = line_terminator or kwargs["lineterminator"]
because we have to feed line_terminator
to CSVFormatter
but also we want to keep line_terminator
explicit for the docs. It works but I'm happy to write this another way if these lines look odd
@@ -579,6 +579,7 @@ def read_csv( | |||
compression="infer", | |||
thousands=None, | |||
decimal: str = ".", | |||
line_terminator=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, in the signature, I think we should only have the one parameter and the compatibility keyword accepted though **kwargs
I am -1 on this change as I don't think adding kwargs to these functions is a good idea; we struggle enough already with how large these signatures are I don't know if any change here is really worth the churn |
@WillAyd @simonjayhawkins #27544 might be relevant |
That change is about keyword only calls, i.e. you can't call items positionally. That is separate from allowing kwargs, which I think we generally have moved away from because it actually can cause more confusion than good for end users |
Ok, right. xref #34976 added tests to check for incorrect keyword arguments to The options seem to be:
I'm agnostic on which way to go - but happy to implement the changes if we're doing 2/3/4. |
agree with this point in general, however in this case we want to avoid the inconsistency AND allow both 'line_terminator' and 'lineterminator', see #9568 (comment) and #9568 (comment) I'm not keen on having both parameters in the signature, so I feel the **kwargs is the lesser of two evils. |
If we change something I would agree with the comment to just add both; it would be less code than kwargs and doesn't have any side effects. We can just document it as |
Checking in on this. It seems like people do agree on having both
Looping in @jreback @jorisvandenbossche since you guys discussed this back in #9568 |
pandas/core/generic.py
Outdated
@@ -3141,6 +3146,9 @@ def to_csv( | |||
|
|||
from pandas.io.formats.csvs import CSVFormatter | |||
|
|||
kwargs.setdefault("lineterminator", line_terminator) | |||
line_terminator = line_terminator or kwargs["lineterminator"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to watch out for corner case where user passes both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! Do we throw an error or would we use one (maybe line_terminator
) and a throw warning?
What about something like
|
I'll compile the docs and check! |
@jbrockmendel this renders fine in the docs. To move forward, should I add this decorator to This is very cool - thanks for suggesting it!! |
@@ -3065,6 +3065,7 @@ def to_csv( | |||
decimal: Optional[str] = ".", | |||
errors: str = "strict", | |||
storage_options: StorageOptions = None, | |||
**kwargs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that @simonjayhawkins and I have a different point of view here but I am strongly against adding kwargs to read_csv. The signature is already massive and adding this only makes things worse
I think we either stick to adding just one keyword arg or just leave this for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other reason i'm not keen on adding the alias to the signature is that we also have doublequote, escapechar, quotechar and skipinitialspace as parameters to read_csv. So if we have both line_terminator and lineterminator, some bright spark will want snake case equivalents of the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you guys be happy with @jbrockmendel solution? I think it doesn't add kwargs and lineterminator
won't appear in the signature
def mywrapper(func):
@wraps(func)
def new_func(*args, **kwargs):
if "lineterminator" in kwargs:
kwargs["line_terminator"] = kwargs.pop("lineterminator")
return func(*args, **kwargs)
return new_func
@mywrapper
def read_csv(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A decorator would be a good solution, however some of our decorators 'lose' the signature in the docs e.g. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_stata.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the goal is to keep both ad infinitum, it may be possible to mostly use deprecate_kwarg
and just eat the warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins The "solution" to that issue is probably for the wrapper to rewrite the docstring with an explicit signature that can be created using Signature
within the docstring. For example, after post processing the docstring for
def foo(x, y):
"""
Does foo
Parameters
---------
x : DataFrame
y : DataFrame
"""
would become
"""
foo(x, y)
Does foo
Parameters
---------
x : DataFrame
y : DataFrame
"""
Of course, this would be for a different PR. The string variant is the method used to document the signature of Cython functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A decorator would be a good solution, however some of our decorators 'lose' the signature in the docs e.g. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_stata.html
@simonjayhawkins I tried adding this one to read_csv
and compiling. Seems ok but I can't say how robust that is
Is this just a case for That only |
see #9568 (comment) and #9568 (comment) so from the issue, the current 'requirement' is to accept both. |
Deprecating doesn't mean it has to be removed -- it just tells users not to use it. Technically I think |
@bashtage @simonjayhawkins I'm down to do either (implement the decorator in a separate PR then come back here or deprecate) |
To be honest I don't think this is that big of an issue that we need to add another layer of wrapping to the csv function, so slight -1 on that design |
How about in |
Closing this PR as I think still needs some discussion. If a point of interest please comment back on the original issue to get alignment on design approach |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
I followed discussion in #9568. The aim of this PR is to allow both
line_terminator
andlineterminator
keyword args toread_csv
andto_csv
but only documentline_terminator
. As per @jorisvandenbossche's suggestion we preservelineterminator
to stay compatible withcsv
dialects.