ENH: add empty() methods for DataFrame and Series #12291
Added empty() methods to the Series and DataFrame classes analogous to the empty() function in the numpy library that can also accept scipy duck-type dtypes in addition to numpy dtypes.
Besides the |
.empty is already a property of NDFrames. Series(index=range(4)) does this already, for example |
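The distinction drawn in this comment can be sketched as follows. This is a minimal illustration, not code from the PR: the variable names are hypothetical, and an explicit `dtype` is passed as an assumption so that the result does not depend on the default dtype pandas picks for a data-less Series.

```python
import pandas as pd

# `.empty` is a boolean property: True when the object holds no elements.
no_rows = pd.Series(dtype="float64")

# Passing only an index allocates one slot per label, filled with NaN.
preallocated = pd.Series(index=range(4), dtype="float64")

print(no_rows.empty)              # no elements at all
print(preallocated.empty)         # four labels, so not empty
print(preallocated.isna().all())  # every slot holds a missing value
```

So an `empty()` method would share a name with the existing property, while the allocation it proposes is already reachable through the constructor.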
On second look, your suggestion doesn't quite entirely match what I was proposing:
I would think that |
and so now numpy support missing values with that is the exception to the rule atm. |
I'm not sure I understand your question. |
You can't store the value
In [1]: np.array([np.nan], dtype='int64')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-12005059c4f1> in <module>()
----> 1 np.array([np.nan], dtype='int64')
ValueError: cannot convert float NaN to integer
See here for more. This could change at some point, but it's currently how things are. Your change of filling with random bits of memory via |
@TomAugspurger : Agreed. Nevertheless, being able to create dummy |
@gfyoung how is pandas coerces from a user perspective so you can be giving a specification which is not supported but will just work. |
And if you really need the empty data, then |
@jreback @TomAugspurger : Maybe I have been stuck in the |
@gfyoung empty is not uninitialized as it is in numpy; it's full of dtype-compatible missing values. The other issue is related to the internals and how to handle these types of numpy bugs/issues. We have to work around them. A Series constructor should just work, coercing dtypes if needed. As a user you don't have to be concerned about it. As a code contributor, however, you have to be aware of (and compensate for) these issues. |
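A rough sketch of the contrast described here, assuming a reasonably recent pandas and numpy. The names `raw`, `floats`, `stamps`, and `frame` are illustrative only and do not come from the PR.

```python
import numpy as np
import pandas as pd

# numpy.empty only allocates memory; the contents are arbitrary bits.
raw = np.empty(3, dtype="float64")

# A data-less pandas constructor fills each slot with the missing value
# that matches the requested dtype instead of leaving it uninitialized.
floats = pd.Series(index=range(3), dtype="float64")         # NaN
stamps = pd.Series(index=range(3), dtype="datetime64[ns]")  # NaT
frame = pd.DataFrame(index=range(3), columns=["a", "b"], dtype="float64")

print(floats.isna().all(), stamps.isna().all(), frame.isna().all().all())
```

The values in `raw` should never be read before being overwritten, whereas the pandas objects can be inspected immediately and simply report missing data.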
@gfyoung I appreciate what numpy does, but that is wrong IMHO. It only makes sense if it's object dtype. pandas has in effect a much more detailed and richer missing value support system, so we really try hard to have appropriate values. Nothing is ever uninitialized, it's just missing. The exception is really int, which forces a cast to float because of the storage medium (numpy). |
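The integer exception mentioned here can be seen with a small example. This is only a sketch with hypothetical variable names, using `reindex` as one convenient way to introduce a missing value into an `int64` Series.

```python
import pandas as pd

ints = pd.Series([1, 2, 3], dtype="int64")

# Reindexing adds label 3, which has no value. NaN cannot be stored in a
# numpy int64 array, so pandas upcasts the result to float64.
widened = ints.reindex(range(4))

print(ints.dtype)     # int64
print(widened.dtype)  # float64
```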
@jreback : Fair enough. I do think though it would be good to be able to create "dummy" |
@gfyoung absolutely, and let me say I certainly appreciate your numpy background and viewpoints. For all dtypes, passing a |
@jreback : Also |
that is also a representation in how pandas deals with strings. These are by definition |
@jreback : So even for Python strings (variable length), that's how it is treated? Just curious, why is that the case? Also, in light of your point about the exception, would it be worth re-opening this PR so that we can then create "dummy" |
@wesm of course would have the original motivation, but I suspect here are some reasons why fixed length strings are not a great idea in pandas:
Not sure why you would want to expose 'dummy' for any purpose; it's purely internal to |
@jreback : Well it was working on the PR for Another reason (though this might be moot - I am not entirely), but if you know for example what sort of |
How can you create dummies with integer dtypes? It is not efficient at all to create 'dummies' and then populate them. In the world of a single dtype, sure you can, but when you have multiple dtypes (and especially lots of inference on the indexers), this is not a good pattern. |
I'll concede that in the context of the |
When I say create a dummy with integer dtypes, it's essentially initializing an |
Which is what @TomAugspurger said above: |
EDIT: @jorisvandenbossche : Sorry, misread your comment the first time. Yes, that is what I am looking for. But why not abstract into a method, which is what my PR does? |
If the user wants to create an empty |
if you want to do
will work, but ONLY for a |
@jreback : Fair enough. I thought there might be a use-case for it, but if that isn't something people do too often or at all, then we can lay this PR to rest then. :) |
I don't understand this. Why would you have to think about that? You just specify the dtype you want in the empty function?
I don't have a strong opinion on this, but in any case the approach in this PR is not possible given that |
Notice how the The name was not really the issue besides the fact I had forgotten about In any case, this discussion is moot, since @jreback pointed out that such a use case is not as common compared to |