Skip to content

GH6848 silently changed series.sort from stable to unstable sort #7750

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
armaganthis3 opened this issue Jul 14, 2014 · 20 comments
Closed

GH6848 silently changed series.sort from stable to unstable sort #7750

armaganthis3 opened this issue Jul 14, 2014 · 20 comments

Comments

@armaganthis3
Copy link

#6848 is no where mentioned in the release notes. There's a single item on .order

changing to quicksort, but it doesn't explicitly warn that the sort was previously stable
and now isn't. There's also no warning about series.sort changing as well.
That API has been around forever and afaict nothing was gained by breaking it.

This was a fairly reckless change in my opinion, I do wish the current maintainers
cared more about backwards-compatibility then they seem to in the latest releases.

@jreback
Copy link
Contributor

jreback commented Jul 14, 2014

http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#v0-14-0-may-31-2014

about 1/2 down in API changes. not sure where you are looking.

backwards compat is cared about WAY too much IMHO actually.

@jreback
Copy link
Contributor

jreback commented Jul 14, 2014

@armaganthis3 you are free of course to comment on any and all issues. This change was to conform a completely broken API that was different for inplace (sort) and a copy (order). IMHO that has been a huge source of confusion for quite a while.

Their are always a number of API changes in major versions; you are encouraged to read them. Not every change can warrant a big warning (and IMHO the warnings are ignored just as much as the whatsnew).

@jreback jreback closed this as completed Jul 14, 2014
@armaganthis3
Copy link
Author

Pandas has more breaking changes per version, those releases are much more frequent
compared to other pydata packages and unlike others pandas does not maintain a stable branch.

Considering that, I think you should care more about backwards-compatibility not less.
It's terrible that you don't think this stuff matters. That you actually think APIs are not
being broken fast enough is... well.

Just one example:
#5027 (comment)

@jreback
Copy link
Contributor

jreback commented Jul 14, 2014

pandas breaks on big version number. and those are all very carefully though out. and very well documented. If you didn't read about this change, then whom are you blaming?

I am not sure what 'does not maintain a stable branch means'. you can install any version you want. it is your choice to upgrade or not. you do not have to upgrade at all if you don't want.

I am not sure where you get the statement about backwards compatibility.

I personally care about back compat quite a lot. Have you seen the tremendous lengths pandas has gone to maintain backwards compat with numpy 1.6 and pickle compat < 0.12!!!!!

This takes quite a bit of time, and pandas does not have the resources as other pydata projects, yet still seems to produce very high quality s/w with many many contributors.

I think you are completely misreading my comments. pls read them again.

@armaganthis3
Copy link
Author

Pandas breaks only on major version numbers, ok, but it also releases a new major version
every 3 months. twice you've claimed I haven't read the release notes - you might
read my original comment again, you might have misread that.

"does not maintain a stable branch" means what it says: there's no long term stable branch.
bug fixes are not back-ported, perf regressions are not back-ported. perf regressions are common
and bugs are common (e.g 0.13), to get them fixed, upgrading is the only option pandas offers.

Your view of backwards compat is based on your words:

backwards compat is cared about WAY too much IMHO actually.

It's funny you should mention those two example for maintaing back-compat. First because
back-compat in this case has nothing to do with what version of numpy you support, the OP is about pandas code remaining compatible with pandas, and secondly because you're about
to throw away numpy 1.6 in another issue (#7711)
even though it's still part of some LTS distros, for no good reason, and pickle compat was
broken by 0.14 see #7748, where you described it as:

not a bug per-se, more of a slightly change in the compat of frequencies in terms of
backward-compat for pickle

Which is perhaps the worst double-speak I've come across outside fox news.

As for having no resources (but at the same time, having "many many contributors")
sure ok, I'd expect that to lead to slower development, not accelerated breakage,
but nevermind point taken. I'm not here to assign grades anyway - not my role, not my right.

Very simply, as a user who maintains a prod system that relies on pandas I care
muchly about back-compat. I would like pandas code I write today to work a year from now.
It doesn't seem like that much to expect nor does it seem to be much of an issue with other
projects I rely on.

@jreback
Copy link
Contributor

jreback commented Jul 14, 2014

@armaganthis3 talk about double-speak!

I read your original comment. The change that you brought up WAS in the whatsnew. I guess you STILL did not read it. Yes its a one line change, for a VERY MINOR change. WHICH was documented. and provided consistency between a very confusing function. I guess you are annoyed that you simply didn't read the whatsnew are now trying to 'blame' someone else. go figure.

you want back ported fixes/perf regressions. great! I would like to see some contributions then.

pandas development has always been done this way (I think prob because of limited resources; maintaing a 'stable' branch is VERY time consuming).

and to be frank, pandas has introduced many new features over time. things break. things get fixed. pandas is an active project.

@jreback
Copy link
Contributor

jreback commented Jul 14, 2014

@armaganthis3 you point out #7711. do you have any idea what back-compatilbility even means? what a PITA maintaining 1.6 IS????? great. LTS still has an old numpy. fantastic, but so WHAT? the system installs are pretty much useless for scientific python, if you would actually read my comments in that issue. And it has been brought up since 0.11 IIRC. This has been delayed quite a long time. Talk about back compat!

That is what I am talking about in terms of time spent. If we didn't have to maintiain this 1.6 compat then I suspect a fair number of new features could be introduced.

Yes pickle compat breaks occasionally. We went to VERY great lengths to preserve this. but again if would actually read #7748, that actually its a very recent bug fix in 0.14.1 to provide COMPAT across frequencies. Did happen to have a test for this, so it broke. it will get fixed. So instead of 6000 odd tests, we will have 6001.

How many projects have this level of care about back-compat????

@jreback
Copy link
Contributor

jreback commented Jul 14, 2014

@armaganthis3 my last point. Would you rather have a project release every 3-4 months or debate everything endlessly and have it release once a year. More frequent releases allow for more community uptake, so are you saying that is a BAD thing?

@jorisvandenbossche
Copy link
Member

@armaganthis3 @jreback

@armaganthis3 You raise good points, and it always interesting to get the experience from another viewpoint. Pandas has maybe the 'bad luck' of already being very popular but still rather young and rapidly changing, and a good balance between both is not always easy.
But @jreback also has a point that we already care a lot about this. Hard work that is maybe not always directly visible. But can it better? Probably, yes. We are also human, and things can always be improved. Getting constructive feedback from users like you is essential for that.

Backwards compatibility is an important issue. As the heating of this discussion also shows! It shows that it is something both of you care about. But a heated discussion is most of the time not very constructive. So maybe we should let this cool down for a moment. Please do that, both of you.
And maybe we can continue it later, in a constructive way, to look at concrete points that could be improved. As that is what we all want, to improve pandas!

@armaganthis3
Copy link
Author

Here we are 2 days later, and hopefully calm. Excessive punctuation neatly tucked away I hope.
Let's pick this up, It's important.

@jreback made a lot of comments, I'll try to cover most of his points, skipping over the
nasty provoking bits and try to focus back on the issue I actually care about.

Personally, I don't need numpy 1.6 support. Others might or might not. The essential issues
I'm raising is not numpy support but the stability of pandas APIs, that is, keeping pandas code working across versions. I found it ironic that that the example jreback chose to give is what's
being eliminated in the next version, but that's a digression.

Keeping pandas code working across pandas versions is what back-compatible means,
it's what I care about, It's what I'm worried about.

jreback says that bugs and regressions are inevitable and in that a least he is absolutely right.
They are also by definition unintentional so It's an entirely different thing when
maintainers break an API, knowingly and deliberately. It's not an "unavoidable mistake"
it's a concious choice.

You may claim this breakages are "well-thought out", from what I've seen
the reason is not an unavoidable breakage due to evolving code, but misguided
aesthetics (toted as "consistency" or "cleanliness" or "nice") being masquaraded
as an "overpowering" reason to break established API and the code that uses them.

"So yur saying frequet release is badd ?????" jreback asks. Not at all.
You could have daily releases - no one is forcing you to break APIs while doing it.
The problem is not the release scheduele: it's the combination of having frequent
major releases, having no maintenance branch and having a liberal attitude about
breaking APIs. It's The combination of all these.

"WELL, are YOU going to do the VERY time consuming work involved in keeping a
maintenance branch????" jreback might say (duly ignoring the combination point I made).
You're right, I'm not. Though pandas never had a maintenance afaik and how time-consuming
it is not known, but let's suppose it is...

Well, how about simply not break APIs so liberally? Use stricter criteria for allowing
breakage, make legacy code support higher priority then it is. Do not break existing
code deliberately. That would solve the problem without requiring more resources or
changes to the release schedule.

Would it help if users said please-oh-please don't break APIs?

Finally, about the point that pandas is young, I'll note that pandas is 3 years old, has (10s of) thousands of users in commercial and science application, Is used in production. Is on it's
2nd or 3rd generation of developers. There are lots of SO questions and existing tutorials.
Lots of existing code. It has great documentation. In fact It's mature by all criteria I can
think off. all except API stability that is.

@armaganthis3
Copy link
Author

I'll add another example of how back-compat is often dismissed lightly, this is from today,
It's quite easy to find them even if you're not looking: #7329

Another issue with legacy pickles, which @jreback so forcefully claimed he cares about.

@Kermit666 said:

I created it in Pandas 0.13.1 and I can read it normally with that version. (but not 0.14.1)

@jreback's response:

this issue is not fixable in that older pickles that are prior to this fix are not unpickle able

pandas 0.13.1 can pickle and unpickle a data file, and 0.14.1 can't. according
to jreback, that's not a problem. it's "minor" perhaps that users can't open their data
after upgrading pandas, and it's not "fixable".

By the way, introduced in 0.14.1, not a major release. care indeed.

@cpcloud
Copy link
Member

cpcloud commented Jul 16, 2014

@armaganthis3 The pickle breakage seems like a regression to me. While I lean towards the position of moving fast and upgrading only if you've you read the release notes thoroughly, breaking IO across a single (major) version number is moving too fast. Pickles suck, but people use them and we support them so we should try really hard not to break IO related code IMHO.

In the case of sort vs. order I disagree. The release notes are there, and you don't have to upgrade if you don't want to. IMO it is more consistent if you have them either both stable or both unstable, but not different.

@armaganthis3 Would like you to make a list of the API changes you find especially egregious? That would help us evaluate whether there are unintended regressions or intentional API breakage.

I agree that this in an important issue, but I don't think it's as simple as "preserve back-compatibility at all costs".

@jreback
Copy link
Contributor

jreback commented Jul 16, 2014

@armaganthis3 you are excellent at taking things out of context / not reading the entire issue.

Would it help if users said please-oh-please don't break APIs?

This feature addition for 0.15.0, #7217 (comment), does NOT BREAK ANY API's at all.

So why are you bringing up an point that is completely irrelevant?

and you point at #7329, which was warned right here: http://pandas.pydata.org/pandas-docs/version/0.14.0/whatsnew.html, and turns out is a regression, that should have gotten caught in 0.14.0, but unfortunately our 6000 tests didn't help. Well a fix going forward, as well as a work-around was put in place. I'd like to catch these things on the first go around, but pandas is a big project and not everything can be caught.

@armaganthis3
Copy link
Author

Thanks @cpcloud, I also believe @jreback shound't have dismissed a user reporting a serious regression, but acknowledged it and fixed it.

I genuinely do not know if it would have been fixed had I not made an example of it
in this thread (maybe @immerr would have anyway, I dunno). jreback tries to pass off these examples as unavoidable bugs or regressions , but I've already shown that intentional API breakages
can't be "accidents" and #7329 showed @jreback blowing off the user that reported the regression:

this issue is not fixable in that older pickles that are prior to this fix are not unpickle able

after I made all this noise, @immerr stepped in to fix it the "unfixable". @jreback is trying
isn't taking reposnibility for his actions, that's cowardly. and dishonest, when claims:

Well a fix going forward, as well as a work-around was put in place.

it wasn't @jreback who fixed it, but @immerr, after my loud complaints. maybe @immerr
stepped in independently, but it sure wasn't @jreback who took steps to correct the issue
,if anything - he ignored it.

What bothers me is not that fact that @jreback, like me and like any human being, makes mistakes. It's his deliberate disregard and recklessness that angers me, the fobbing-off of the users who complain with nonsense excuses, and the dishonest way in which avoids taking responsibility
for his behavior.

More example are plentiful, seemingly in every comment he makes. jreback just wrote:

and you point at #7329, which was warned right here: http://pandas.pydata.org/pandas-docs/version/0.14.0/whatsnew.html

Here are all the items in the release notes that mentioned the term "pickle":

  • "Improved performance of compatible pickles (GH6899)"
  • "Pickle compatibility is preserved for pickles created prior to 0.13. These must be unpickled with pd.read_pickle, see Pickling."
  • "Fixed running of tox under python3 where the pickle import was getting rewritten in an incompatible way (GH4062, GH4063)"
  • "fixed issue with missing attributes after loading a pickled dataframe (GH2431)"

Which of these is a warning that some 0.13.0 pickles can't be read in 0.14.0 anymore? Can you see it? I can't.

Let's drive the point home ever further by reading that again:

and you point at #7329, which was warned right here: http://pandas.pydata.org/pandas-docs/version/0.14.0/whatsnew.html, and turns out is a regression,

@jreback is saying that this is both a documented change warned against and
at the same time an accidental regression. If you're unhappy with my criticism, aristotle
himself would tell you that this is impossible.

This issue was meant to be about my concerns about the trend in pandas treatment of back-compat, but @jreback's shameful attempts to muddy the waters is even more reprehensible.

@armaganthis3
Copy link
Author

@cpcloud, briefly, you agree that breaking pickle support within a major version is unacceptable,
this suggests you might think it's reasonable to do across major versions (if not, I apologize).

Keeping cross-versions pickle supported has seen lots of work, by wesm initially and
by jreback more recently. If the pandas team now thinks that breaking pickles across major versions is acceptable, you are breaking with a pandas policy that's been around since probably 0.7.3 (e.g. forever).

That itself suggests that your care for back-compat is slipping, I wish it weren't so.

@armaganthis3
Copy link
Author

Sorry, @jreback didn't mean to ignoe your question:

This feature addition for 0.15.0, #7217 (comment), does NOT BREAK ANY API's at all.
So why are you bringing up an point that is completely irrelevant?

because I believe @jseabold there was just as worried as me about your recklessness,
and I have doubts whether you would have been responsible or careful enough about
back-compat to ensure no code would be broken if he hadn't warned against it, as I did in #7329.

Note his tone there, It's not accidental IMO. admitedly, he's much more soft-spoken about your
choices then I am.

@cpcloud
Copy link
Member

cpcloud commented Jul 19, 2014

@armaganthis3

@jreback is trying isn't taking reposnibility for his actions, that's cowardly. and dishonest

This is just not true. @jreback is the first to own up to any bugs and is constantly fixing them (use pandas to create a bar plot of commit counts by author, you'll see he outpaces the next active contributor by a factor of about 3.5). Also, look up "duplicate indexing", "datetime support", and "series refactor", and then tell me @jreback doesn't take responsibility for his actions.

I don't think these kinds of libelous remarks are helpful in any way. I think criticism is great, but doing so in a way that leads to improvements is generally a more effective way to approach it.

it wasn't @jreback who fixed it, but @immerr, after my loud complaints.

In light of the amount of work @jreback does for pandas for free and his helpfulness to people who actively criticize his work (#6011, the negative remarks are gone now but his apology is still there), it doesn't matter who addressed the issue. Pointing fingers is best left to git blame.

Let me be clear: backwards compatibility is an important issue. But so is being able to fix things that are broken, even if that means breaking back-compat occasionally. How often occasionally is is determined (in no particular order), by the core devs, the feature being broken, the severity of the bug or inconsistency. That said, breaking pickles is unacceptable. However, if something isn't tested, there's no guarantee that it will work (I hope you'll agree with this). If you want to submit a test for things like this in the future so that we have a contract that it works in the future, please do so and we'll gladly review and probably accept your patch. We like tests.

@jreback is being a bit shy about the number of tests: we're above 7000 right now. And lest you accuse me of "playing the numbers game", I'll add a disclaimer to take the number of tests comment with a grain of salt.

That itself suggests that your care for back-compat is slipping, I wish it weren't so.

On the contrary, pandas is where I learned all about it. I don't have as strong a feeling about pickles being back compat, but if it's causing this much pain for someone I think we should try to mitigate that pain.

No one likes name-calling, and accusing @jreback of doing cowardly things is likely to elicit no response or active avoidance of your bug reports in the future. That's probably how I would handle it. But, knowing @jreback, he'll probably fix them anyway because that's the kind of guy that he is.

Alright, I'm not commenting any further on this issue.

@armaganthis3 If you actually have something to contribute other than commentary, like offering to help maintain a "stable" branch, that would be most helpful.

@hayd
Copy link
Contributor

hayd commented Jul 20, 2014

@armaganthis3

The way to ensure you don't get bit by an API change:

  1. test your code (run test suite before upgrading production).
  2. test RC and .0 releases with your code base TELL US if there are issues, that way they can be addressed before the .1 release. IMO this is an appropriate time to "raise hell" (politely).
  3. push tests "upstream" into pandas codebase

The numpy issue (see the fervour on the mailing list to the proposal to drop 1.6 support) along with the other "evidence" you post is indeed damning... it backs up @jreback entirely:

I personally care about back compat quite a lot. Have you seen the tremendous lengths pandas has gone to maintain backwards compat with numpy 1.6 and pickle compat < 0.12!!!!!

If you're not impressed with @jreback's patience/care/heroism/et al... you're misreading the situation.

Anyway. It's worth mentioning you're not the first to bring up the idea of a "stable" branch, and it does sounds appealing in principle. However no-one (thus far) has come forward as willing to maintain. Here's some thoughts (perhaps these belong as a sep issue):

  1. Which version is "stable" (do you bug-fix all versions?? Since when? When do you release?)
  2. Which bug-fixes do you pull? No doubt some may be mangled with features. Merge conflicts and potentially weirder bugs will appear - IMO would need very deep understanding of all versions to fix.
  3. Even though difficult, this would actually be a very boring thing to maintain.
  4. Most users just upgrade on pip/conda, don't use "advanced" versioning, or indeed environments.
  5. Anyone who's anyone will be using latest stable.
  6. User fragmentation, weirder bugs and version nuances (often on SO, solution is basically "upgrade").

Saying that, perhaps this situation can be reassessed when/if pandas reaches "1.0". Of course, as @cpcloud points out: it's MIT so you're free to create a stable branch 🍰

@armaganthis3
Copy link
Author

I appreciate you guys stepping up to defend jreback, and I agree it's time to wrap this up,
being about as tired of it as you. But most if not all of your arguments don't hold up when I
read them, so here's a last volley afaic.

@cpcloud
The amount of work jreback puts in is very impressive, but not to the point since I didn't
claim otherwise. I did criticize the mishandling of back-compat (as I see it) and his dismissal
of users who report these issues (evident in the pickle case). git blame has nothing to do
with jreback's dismissing serious issues as irrelevant, you know this. His attempts at
misdirection in this discussion made this thread much worse then it needed to be. it's
unpleasant for everyone including myself.

free development has nothing to do with it: either my criticism is correct or it's not.
Consistently bad decisions aren't made right because jreback isn't payed to make them.

@cpcloud, I agree with your description of how decisions are made (though I don't see
why you included it) on breaking back-compat. as I've repeatedly said: back-compat
is being broken much too lightly, for frivolous reasons.

You seem very proud of the number of tests pandas has. None of them help when
back-compat is broken deliberately with little cause, or when users reporting regressions
are blown off without reason. If you had 100,000 tests, such behavior would still be a problem.

That itself suggests that your care for back-compat is slipping, I wish it weren't so.

On the contrary, pandas is where I learned all about it.

I meant the "pandas dev team", though I'm glad pandas has been education for you.
The point your contradicting doesn't seem debatable to me: if pickle back-compat has
been zealously maintained over 6 major versions and over two years, and now the team thinks
breaking it is no big deal, your standards are slipping. It's just true.

as for you jolly-good-jreback spiel, you're probably right, in any case whether you fix any
bug reports, mine or others, is something I have no control over. do what you will.

@hayd

  • Having a survery with no responses proves nothing either way . You must have
    has some stats/research training. Be serious.
  • numpy 1.6 is not an issue I raised, nor did i suggest jreback didn't care about it. what are
    you contradicting? as I've said, it seemed ironic that jreback chose that of all things an example
    just when it is being dropped. It's a digression from the issues I raise.
  • Testing and reporting back-compat breakage as you suggest would not (did not) help,
    if the devs delibarately choose to break stuff, and disclaim regression as jreback has done.
  • I did not, at any point, suggest you need to have a stable branch. I did make the point over
    and over (and you too ignored this) that in the absence of a stable branch and freuqent
    releases with back-compat breaks, users are (forgive me) in a pickle.
  • Some of your "stable branch" points make sense to me and some make no sense at all.
    "Anyone who's anyone" is quite funny, though, as if release engineering has something to do
    with a pythonic parallel to beyonce. Like I said, I didn't ask for a stable branch, my suggestion
    was actually that far less frequent breakage would be the best choice.

@cpcloud points out: it's MIT so you're free to create a stable branch 🍰

It's actually 3-clause BSD, but you meant to say it's free software which means you're
right that I'm free to fork the project, maintain my own branch and in any case I'm
free to roll-my-own from scratch.

Sadly, that's a default answer to any criticism anyone could ever raise to to you, justified
or not and it leaves little else to discuss. a sort of panic button really. ok then.

@hayd
Copy link
Contributor

hayd commented Jul 20, 2014

ok, great!

@pandas-dev pandas-dev locked and limited conversation to collaborators Jul 20, 2014
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants