-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Un-deprecate DataFrame.from_items() #21850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@jreback I would appreciate your review of this topic. |
I disgree, this is a good deprecation. Keeping around old code is not helpful. So -1 on adding back. |
@jreback It is clear that you think this is a good deprecation, otherwise you would not have done it. However, the initial justification ("only 6 hits on Stack Overflow") completely ignored the larger picture, which is that this function is widely used in a huge number of applications and libraries. It might be the case that the original implementation was unwieldy or costly to maintain. But what could be the reason not to maintain backward compatibility by a 3-line function that delegates the work to OrderedDict and the main DataFrame constructor? In other words, maybe the old code was not helpful. Would you accept new code to implement this widely-used function in a single place, rather than having thousands of users sprinkle the same into their local codebases? |
there is a burden to maintain apis and code |
This removes the deprecation warnings introduced in pandas-dev#18262, by reimplementing DataFrame.from_items() in the recommended way using DataFrame.from_dict() and collections.OrderedDict. This eliminates the maintenance burden of separate code for from_items(), while allowing existing uses to keep working. A small cleanup can be done once pandas-dev#8425 is fixed.
This removes the deprecation warnings introduced in pandas-dev#18262, by reimplementing DataFrame.from_items() in the recommended way using DataFrame.from_dict() and collections.OrderedDict. This eliminates the maintenance burden of separate code for from_items(), while allowing existing uses to keep working. A small cleanup can be done once pandas-dev#8425 is fixed.
This removes the deprecation warnings introduced in pandas-dev#18262, by reimplementing DataFrame.from_items() in the recommended way using DataFrame.from_dict() and collections.OrderedDict. This eliminates the maintenance burden of separate code for from_items(), while allowing existing uses to keep working. A small cleanup can be done once pandas-dev#8425 is fixed.
I agree with @jzwinck that "old code" and "burden to maintain apis and code" is not a good reason for deprecating If that was the original reason to deprecate it, I think the PR of @jzwinck is a perfectly reasonable solution. However, often when we deprecate one of the public methods, the main reasons are related to the "User API" design: inconsistent apis, duplicate functionality, hardly useful functionality that clobbers the API, etc are all reasons to consider deprecating a functionality. So also for And to be clear, such a trade-off is difficult and subjective. That said, I personally never had a use case myself where If this would have been a new method proposal, I would probably be -1 to add it. But given that this is about removing an already existing method, I personally don't know. |
Thanks for the insight @jorisvandenbossche. If this were a brand new API, I would not promote it either. But as you point out, it is an API Pandas has had for a long time. To me the largest concerns are
On 24 August 2017, @bashtage asked:
This was a prescient question: it is exactly the case which triggers #8425 if you can't use I could see making the docs say |
I haven't seen any mention that the import pandas
from collections import OrderedDict
def test_from_dict_replacing_from_items():
rows = [(1, (2,)), (1, (2,))]
df1 = pandas.DataFrame.from_items(rows, columns=('a', ), orient='index')
df2 = pandas.DataFrame.from_dict(OrderedDict(rows), columns=('a', ), orient='index')
pandas.testing.assert_frame_equal(df1, df2) |
@PatrickDRusk Good point. There are several cases where the replacement for |
@jzwinck : I should point out that @jorisvandenbossche was willing to reopen the issue, but you promptly closed it. From the looks of it, your comments are in fact somewhat hypocritical. I would caution your tone before you make such broad statements about us like that. We don't delete code on a whim, and with so many users of this codebase it is difficult to please everyone when it comes to making a library like this work. If you want to have a discussion again about this topic, then please re-open the issue. |
@gfyoung OK, I've reopened this issue. The current state of play is that there are a number of users of Pandas who can't easily upgrade to 0.23 or later because What should we do next? |
Okay, let's take stock of what we have here:
Hmmm...duplicate indices generally have not been a fan-favorite for us as maintainers because such support can be very hard to maintain, not just for one case, but as a "feature" across the entire interface of @PatrickDRusk : So admittedly I'm somewhat torn: the statement is indeed inaccurate with regards to duplicate indices, but committing to supporting such behavior could send the wrong message that we intend to support duplicates full-throttle, when that is far from the case in our codebase. @pandas-dev/pandas-core : Thoughts on this?
@jzwinck : This statement unfortunately is too vague for us to work with. Especially since we already made the decision to deprecate, we will need more than words and GitHub search results to make a convincing argument. Meet us halfway and present us with more examples to illustrate that the suggested change is incompatible. Caveat: if they fall into same category of duplicates, I personally wouldn't count it unless it is somehow different (I'll leave it to your discretion to judge appropriately) from what @PatrickDRusk has given. |
I'm +0.5 on keeping the method. In general, I don't see "easy to replace with user code" as an argument to deprecate - maybe the opposite, if the code is so simple and so widely used... Anyway, if we really want to discourage users from using duplicated labels, the the correct way to do it is to introduce a general warning for it, inside the |
By the way: deprecating |
@gfyoung I don't personally need support for duplicate values in the index, and I wrote #22094 without considering that. For people who like reduced implementation code and don't want duplicate values in the index, #22094 provides both. You said:
Please look at the logic in #22094. That's what I had to write to make the preexisting unit tests pass. In other words, the unit tests I restored in that PR are the examples you are looking for. Any code using |
@toobaz : I beg to differ to the extent in which duplicates of labels, whether it be in columns or indices is supported as well as non-duplicates, but let's agree to disagree on this one for the moment. My point is not about "sending a message" by not supporting it. It's actually about "not sending the wrong message" that's more important IMO. |
@jzwinck : I read it, and I'm really confused here...what pre-existing unit test was failing before your change? You seemed to have removed several tests + removed deprecation warnings that we had in place that we were checking for... |
@gfyoung I should have said "to make the tests pass as they did before the deprecation warnings were added." In other words, I put the unit tests back to the way they were before the deprecation, and then implemented the logic in |
I would also lean towards keeping |
If
I set up a GitHub issue to let them know the change breaks their code. |
Is there any workaround that allows us to have a fast version of |
Under #18262 a
FutureWarning
was added suggesting that existing code like this:Should be changed to this:
The fact that
from_items()
appeared only 6 times (now 8 times) in a Stack Overflow search was used as partial justification for removing it. But if you search on GitHub,pd.DataFrame.from_items()
appears more than 15,000 times in Python--almost half as many asfrom_records()
!We should celebrate the fact that this function doesn't cause enough confusion to appear often on Stack Overflow. But it does occur (a lot!) in real code, and deprecating it is a mistake.
If constructing a temporary
OrderedDict
around items is the best way to construct a DataFrame, Pandas should implement that as a short function calledDataFrame.from_items()
, rather than asking thousands of people to busy themselves to accommodate this unnecessary API change.I recommend removing the FutureWarning, and retaining this widely-used, longstanding function.
For reference, the
FutureWarning
starts in 0.23 and looks like this:The text was updated successfully, but these errors were encountered: