-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API: Standardize usage of underscores in multi-word kwargs #22639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
as i said in the issue pls show a matrix of all the proposed changes |
Here's a first pass.
|
can u add another column about disposal |
I'd be happy to make that work part of this ticket, but I'm not sure I have firm enough opinions about which kwargs ought to be deprecated. I'd nominate |
right that’s the whole point here - to decide these things |
It's done
…On Tue, Sep 11, 2018, 4:46 AM Jeff Reback ***@***.***> wrote:
right that’s the whole point here - to decide these things
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#22639 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAnCUAMVdt57-zip-3GNKWr7oX5cX-Uks5uZ6KIgaJpZM4WgBCv>
.
|
@palewire looks like he name column is hidden? |
I've posted that fix. Thank you for your patience. |
+1 on the whole thing, While we're at it, I would also prefer a better name for |
@h-vetinari : The |
@h-vetinari : Suggestions? I find it pretty descriptive IMO. |
Can't think of something shorter off the top of my head. My thoughts are that "mangle" (and "dupe") is not an easy word for non-native speakers, and it sounds much more informal than the rest of the kwargs. Edit: maybe something like
to be unclear - first it says "rather than ‘X’…’X’", but then it says duplicates are dropped? Maybe it would be clearer to have something like |
@h-vetinari : Hmm...not sure. That has the downside of not being clear as to what happens when |
Small note: I presume the reason why So if |
Note we discussed this in #13386 (in the PR, the issue was already linked above), and basically we then decided this was not worth it (which of course does not mean it cannot be reconsidered, but it is worth looking at that discussion) |
@jorisvandenbossche, I'd encourage you to consider the response from users in #22587. There is a desire out there for "tidying" of this sort. IMHO, it is worth the work. |
@jorisvandenbossche : That is a fair point. One consideration is that with |
@palewire Yes, I read that thread. But there is also the desire of users to have code that keeps working (who are maybe less vocal here) There are many warts in pandas I would like to see cleaned up, but many of them we nevertheless keep for historical reasons, and fixing them would have a too big impact. Now, we can also think about ways that have less impact. In the previous discussion we mentioned keeping the old ones as aliases instead of deprecating ("deprecate by documentation"). But of course, seeing them different ones used can also create confusion. |
@jorisvandenbossche, I am also sensitive to the need for backwards compatibility. That's why my exploratory patch in the pull request included an effort to continue to support the old keyword namespace and interject a warning of potential future deprecation for the user. Taking that step and standardizing the names in My goal here is to simply extend the logic of patches like that across a wider stretch of methods. If we proceed along those lines, without deprecating any keywords entirely, would you object? |
I agree that underscore spacing for multi-word arguments is most pythonic and readable. Hopefully, this issue will help ensure new keyword arguments use underscore spacing. Upgrading existing arguments will be painful, both in terms of developer time and the deprecation / backwards incompatibility issues that will arise. On the other hand, if these changes are going to be made, then sooner is better. I lean on the side of continually improving the Pandas design, since data science is a quickly-developing field. It seems like disruptive changes like this would be most natural for a pandas overhaul like Pandas 2.0. However, it's not clear whether Pandas 2 is an active proposal? If not, perhaps it makes sense to bite the bullet now on standardizing existing argument names. |
@palewire would you have time to update your list to indicate whether a kwarg is mimicking a keyword from another library (e.g. dayfirst python-dateutil, quotechar char from the stdlib) and whether the keyword is maybe used in other places in pandas (e.g. chunksize) |
@TomAugspurger, are you of the opinion that certain kwargs should not be given an underscore if they are compound words in other libraries or elsewhere in pandas? |
Yes. I think the value of matching the stdlib in cases like Things like |
I'm happy to do the research, though I'm not sure I fully agree. Some kind of scope should be set on that rule, though, shouldn't it? Keywords used in the standard Python library? |
I don't think there's a clear dividing line here. It's just a values
judgement.
…On Tue, Oct 2, 2018 at 11:16 AM Ben Welsh ***@***.***> wrote:
I'm happy to do the research, though I'm not sure I fully agree. Some kind
of scope should be set on that rule, though, shouldn't it? Keywords used in
the standard Python library?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#22639 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIjqSte7BvR5grHfeI2Naq1nNiF-Yks5ug5FhgaJpZM4WgBCv>
.
|
Hi added PR #23158 in regards to deprecating |
This ticket is an outgrowth of a discussion in pull request #22587
By my rough count, the
read_csv
method has nearly 50 keyword arguments.Of those, 32 arguments are made up or two or more words. Twenty of those multi-word arguments use an underscore to mark the space between words, like
skip_blank_lines
andparse_dates
. Twelve do not, likechunksize
andlineterminator
.It is my opinion this is a small flaw in pandas' API, and that the library would benefit by standardizing how spaces are handled. It would make pandas more legible and consistent, and therefore easier for users of all experience levels.
I have taught pandas to dozens of newbies across the country and I can testify from experience that small variations in the naming style of commonly used methods introduces unnecessary frustration, and
can even reduce user confidence in the quality of the overall product.
As a frequent user of pandas, I can also attest that the inconsistencies require me, someone who uses the library daily, to routinely consult the documentation to ensure I use the proper kwarg naming style.
I am sympathetic to the desire to maintain backwards compatibility, which I believe could be managed with deprecation warnings that, if included, could be temporary, and ultimately removed in a future version, much in the way
sort_values
was introduced.Since the underscore method of handling word breaks is more common and more legible, I propose it be adopted. All existing multi-word arguments without an underscore would need to be modified. You can find an experimental patch of the
skiprows
kwargs, and considerable support from other users for pursuing this type of change, in #22587.If that pull request is ultimately merged, and the maintainers agree with the larger goal I've tried to articulate here, I would be pleased to lead an effort to expand whatever design pattern is agreed upon to other keyword arguments across the library.
The text was updated successfully, but these errors were encountered: