Confusing behaviour of df.empty #12393
Comments
Can you show an example? The following works. I agree we certainly could update the documentation (with some examples and such).
That is exactly the behavior I was questioning; I think out[9] should be True. It seems to me that a dataframe containing nothing but NA cells is "empty" according to most definitions. Phil
I certainly expected that df.empty would be True if a dataframe contained nothing but NA cells.
No, that is not a normal definition of empty, which is 0-len. Nulls are real values that act as placeholders. The key here is that you actually have a valid index. Changing this would make the definition depend on the data itself, which is not a good thing. You're welcome to submit a doc update with some examples, though.
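To make the point about a valid index concrete, here is a minimal sketch. The frame and its column names are made up for illustration and are not taken from the thread. It builds a frame whose cells are all NaN and compares df.empty before and after dropna():

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two rows, two columns, every cell is NaN.
df = pd.DataFrame({"a": [np.nan, np.nan], "b": [np.nan, np.nan]})

print(df.shape)           # both axes have length 2
print(df.empty)           # False: the NaN cells still sit on a valid index
print(df.dropna().empty)  # True: dropping the NaN rows leaves a zero-length index
```

The NaN values count as placeholders here, so only the axis lengths decide the result.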
Well, it's certainly not the case that df.empty is the same as len(df.index) == 0, e.g.
Also
So not only do I dispute that len == 0 is the semantic definition of empty, that doesn't appear to be the implementation anyway.
Also stuff like:
which just seems plain wrong. If I have a column that I have never added any data to, it should not return False when asked if it's empty. I guess under the hood, when you specify an index and columns, it autofills the dataframe somehow, and that results in this behaviour, but the definition of empty should really play nicely with the default DataFrame constructor in these examples, IMO. Otherwise it's just confusing.
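The original snippets for this comment did not survive, so the following is only a sketch of the constructor behaviour being described, with hypothetical labels. It assumes that passing both an index and columns without data produces a frame filled with NaN:

```python
import pandas as pd

# Hypothetical labels, chosen only for illustration.
only_index = pd.DataFrame(index=[0, 1])
only_columns = pd.DataFrame(columns=["a", "b"])
both = pd.DataFrame(index=[0, 1], columns=["a", "b"])

print(only_index.empty)    # True: there are no columns
print(only_columns.empty)  # True: there are no rows
print(both.empty)          # False: a 2x2 grid of cells exists

# The cells of `both` were never assigned, so they hold NaN placeholders.
print(both.isna().all().all())  # True
```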
It's very simple: it's empty only if all axes are len 0.
It sounds like you're looking for some other collection of
@phil20686 You can, e.g., use:
@jrebeck my example shows that that is not the implemented behavior:
I really find it super weird that pre-allocation should result in empty=False. E.g., if you concatenate an empty series with a non-empty dataframe, it gets preallocated, and extracting it again means empty has changed from True to False. This seems very strange to me: in some abstract sense series C is the same object, but merely moving it around has changed its properties.
Anyway, my main point is that this behavior should be documented, because it's quite counter-intuitive, not to argue about definitions of empty.
@phil20686 Every one of those results is correct; what exactly is counter-intuitive here? There isn't any 'pre-allocation' at all. You have indices. If the indices are of length 0 in any way (could be 1-dim or not) then you are empty, otherwise you are not. What exactly are you using
Um. I had a series that had .empty = True, I concat it with a dataframe, then I extract the series, and then magically .empty = False? Even though at no time has the user added data to it? Similarly, you can create a dataframe with either an index or columns but no values and it's "empty", but if it has both and no values it's "non-empty". You don't think that is counter-intuitive behavior? Anyway, I think most people would assume that empty == contains no data. That clearly isn't the case, as it looks like it's the same as
Anyway, my main point was that the documentation of DataFrame.empty should note these behaviors.
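The pandas documentation describes empty as true when any of the axes has length 0. A small sketch of that rule follows; has_zero_length_axis is a hypothetical helper written for this illustration, not the library's own code, and the sample objects are made up:

```python
import numpy as np
import pandas as pd


def has_zero_length_axis(obj):
    """Hypothetical helper: True if any axis of a Series/DataFrame has length 0."""
    return any(n == 0 for n in obj.shape)


# Made-up sample objects covering zero-length and NaN-only cases.
cases = {
    "no rows, no columns": pd.DataFrame(),
    "one column, no rows": pd.DataFrame({"a": []}),
    "one NaN cell": pd.DataFrame({"a": [np.nan]}),
    "empty series": pd.Series(dtype="float64"),
    "series with one NaN": pd.Series([np.nan]),
}

for name, obj in cases.items():
    # Print .empty next to the shape-based check so the two can be compared.
    print(name, obj.empty, has_zero_length_axis(obj))
```

If the two printed columns agree for every case, then "contains no data" in the sense of NaN-only cells plays no part in the result.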
@phil20686 One of the highlights of pandas is that it aligns data. When you put in a series it was empty; however, the concat realigned the Series to the other values in the DataFrame
then [21] is clearly NOT empty; yes, it is all null. Which is a MUCH more common operation.
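The session referred to as [21] is not preserved, so here is a hedged reconstruction of the kind of operation being discussed, with made-up names. It assumes the concat was done along the columns (axis=1), which is what makes the empty series align to the frame's row index:

```python
import pandas as pd

# Hypothetical data: a small non-empty frame and an empty, named series.
df = pd.DataFrame({"a": [1.0, 2.0]})
c = pd.Series(dtype="float64", name="c")
print(c.empty)  # True: the series has a zero-length index

# Concatenating along the columns aligns `c` to the row index of `df`.
combined = pd.concat([df, c], axis=1)
extracted = combined["c"]

print(len(extracted))          # 2: one slot per row of `df`
print(extracted.empty)         # False: the index is no longer zero-length
print(extracted.isna().all())  # True: every slot is a NaN placeholder
```

The extracted column is a new object aligned to a different index, which is why its empty value differs from that of the original series.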
@phil20686 In trying to clear some things up, I think we have to make a distinction between two points:
I just want to point out that I think the confusion you get has another root cause than the
** I don't say it is 'clear' in the docs, I mean in the implementation.
But indeed, the docs of empty can certainly point that out. Do you want to do a PR to specify that this is not about NaNs?
I agree that the docs of empty could point this out (I actually had a student just ask me about this). Since it's been a few days, I can do a quick PR.
This is as much a documentation issue as anything else. Basically, it seems confusing that df.empty != df.dropna().empty, i.e. that a dataframe consisting entirely of NA is not treated as empty. Obviously this is a bit of an edge case, but it caused a bunch of failures for me when used with the pd.read_sql methods, as database tables will often have columns that are not available for particular entities, and so can return an entire series of NA.
It seems to me that in all cases df.empty should be the same as df.dropna().empty. I understand that opinions might differ on this point, but at least the behaviour should be clearly documented.
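For readers who hit the same situation, one possible workaround is an explicit check for "no cells, or nothing but NA". The sketch below is only an illustration: is_all_na_or_empty is a hypothetical helper, and the frames stand in for query results rather than coming from pd.read_sql. Note that dropna() by default drops any row containing at least one NA, so df.dropna().empty is a stricter test than "nothing but NA"; dropna(how="all") is closer to that meaning.

```python
import numpy as np
import pandas as pd


def is_all_na_or_empty(df):
    """Hypothetical helper: True if df has no cells or every cell is NA."""
    return bool(df.isna().to_numpy().all())


# Stand-in for a query result whose only column came back entirely NULL.
all_null = pd.DataFrame({"id_code": [np.nan, np.nan, np.nan]})

# Stand-in for a result where every row has some data and some NA.
partial = pd.DataFrame({"x": [1.0, np.nan], "y": [np.nan, 2.0]})

print(all_null.empty)                       # False
print(is_all_na_or_empty(all_null))         # True

print(partial.dropna().empty)               # True: each row contains an NA
print(partial.dropna(how="all").empty)      # False: no row is entirely NA
print(is_all_na_or_empty(partial))          # False
```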