BUG?: .sample() sometimes returning a view not a copy. #10736
both […] to guarantee that it will always be a copy. Of course, the reason to do this is that people will simply try to mutate the result and assume it should work.
OK, patch coming!
@jreback This behavior is very surprising to me. I assumed that pandas follows the NumPy rules for copies/views with […]. If we have places where we are trying to convert array indexers into slices so we can return a view in […]
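To make the slice-conversion idea concrete, here is a minimal sketch using plain NumPy arrays. The helper `as_slice_if_possible` is hypothetical and is not pandas' internal code. It turns an evenly spaced, increasing integer indexer into an equivalent `slice`. Indexing with that slice is basic slicing, which yields a view. Indexing with the original integer array is advanced indexing, which always yields a copy.

```python
import numpy as np

def as_slice_if_possible(indexer):
    """Hypothetical helper: return a slice equivalent to `indexer` when it is a
    non-empty, non-negative, strictly increasing arithmetic progression;
    otherwise return the indexer (as an intp array) unchanged."""
    idx = np.asarray(indexer, dtype=np.intp)
    if idx.ndim != 1 or idx.size == 0 or (idx < 0).any():
        return idx
    if idx.size == 1:
        return slice(int(idx[0]), int(idx[0]) + 1)
    steps = np.diff(idx)
    step = int(steps[0])
    if step > 0 and (steps == step).all():
        # e.g. [2, 4, 6] -> slice(2, 7, 2): basic slicing, so the result is a view
        return slice(int(idx[0]), int(idx[-1]) + 1, step)
    # Irregular or decreasing positions cannot be expressed as a simple slice
    return idx

arr = np.arange(10)
by_array = arr[np.array([2, 4, 6])]              # advanced indexing -> copy
by_slice = arr[as_slice_if_possible([2, 4, 6])]  # basic slicing -> view
print(np.shares_memory(arr, by_array))  # False
print(np.shares_memory(arr, by_slice))  # True
```

Both results hold the same values. Only the memory relationship to `arr` differs, and that difference decides whether a later write reaches the original.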
This may be a naive question, but I've been struggling a little with copy/view issues lately, so it seems worth asking: is that kind of data-structure-dependent variation in behavior worth having? For example, it doesn't seem like the following commands should behave differently, but one (the latter) gives the setting-with-copy error:
(After discovering this behavior, I've literally spent the last two days going back through a major project I'm working on, setting […])
@shoyer Not sure what you are talking about. The semantics are the same as NumPy's. If you have a single dtype, then take will give you a view if the indexer is a slice (or sometimes when selecting by integer). The same goes for iloc. Multi-dtype frames will always give you a copy. This has always been true. @nickeubank This is the entire point of the […]
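The NumPy rule referred to here can be checked directly with `np.shares_memory`. Basic slicing returns a view onto the same buffer. Integer-array ("fancy") indexing and `np.take` always allocate a new array. The array below plays the role of a single-dtype block. This is a small illustration of the NumPy semantics, not of pandas internals.

```python
import numpy as np

# A 2-D float array, standing in for a single-dtype block of data
a = np.arange(12, dtype=np.float64).reshape(4, 3)

sliced = a[1:3]                     # basic slicing: a view onto a's buffer
fancy = a[[1, 2]]                   # integer-array indexing: always a copy
taken = np.take(a, [1, 2], axis=0)  # np.take: also a copy

print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, fancy))   # False
print(np.shares_memory(a, taken))   # False
```

All three selections hold the same rows. Only the slice shares memory with `a`.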
@shoyer I see what you are talking about now. So […]
@jreback I guess my question is more: is this really desirable as a property from a user-design perspective? The […] But from a general design perspective, is it ideal to expect users to always be mindful of the dtypes of columns they aren't directly manipulating in their […] (I'm thinking about my colleagues in the social sciences; this just seems like the kind of subtle and potentially unnecessarily complicated behavior that will really throw them for a loop.)
@nickeubank This is the fundamental design of NumPy. It's less than ideal, but generally not a problem. The warning never used to exist, and we would get questions like "why does this not set?", e.g. […]
It would work if the intermediate result was a view, but the second someone added another dtype it would no longer work. The thing is, you HAVE/WANT to always use views, as they are extremely cheap. Otherwise you end up copying everywhere, which is bad.
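The "why does this not set?" question comes from chained indexing. The first indexing step returns either a view or a copy. With a view, a later assignment reaches the original. With a copy, the assignment is silently lost. Here is a minimal NumPy sketch of both cases, plus the single-step assignment that always works:

```python
import numpy as np

a = np.zeros(5)
# First step is a slice -> view, so assigning into it writes through to `a`.
a[1:3][:] = 1.0

b = np.zeros(5)
# First step is an integer-array index -> copy, so the assignment only
# modifies a temporary array and `b` is unchanged.
b[[1, 2]][:] = 1.0

c = np.zeros(5)
# Single-step assignment always targets the original array.
c[[1, 2]] = 1.0

print(a)  # [0. 1. 1. 0. 0.]
print(b)  # [0. 0. 0. 0. 0.]
print(c)  # [0. 1. 1. 0. 0.]
```

In pandas, the single-step equivalent is `df.loc[row_indexer, column] = value`. It does not depend on whether an intermediate object happens to be a view.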
I've realized the .sample() function is sometimes returning a view rather than a copy. This will cause the "setting on a view/copy" error: […]

I believe the behavior is the result of it using the .take() function, which is what the .sample() command uses to pull the sample from the passed frame.

I'd like to suggest either replacing .take() with .iloc[], or changing .take() to .take().copy() if that's faster (@shoyer suggested .take() would be faster; is .take().copy() still faster?).

Thoughts?
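Here is a minimal sketch of the second proposal, written as a standalone helper rather than a patch to pandas itself. The function `sample_as_copy` and its parameters are hypothetical and are not pandas API. Following the description above of .sample() pulling rows via .take(), it draws row positions with NumPy and returns `take(...).copy()`. The result therefore never shares memory with the input.

```python
import numpy as np
import pandas as pd

def sample_as_copy(obj, n, random_state=None):
    """Hypothetical helper: sample n rows without replacement and return an
    object that owns its data (never a view of `obj`)."""
    rng = np.random.RandomState(random_state)
    # Random row positions, drawn without replacement
    positions = rng.choice(len(obj), size=n, replace=False)
    # take() selects rows by position; copy() guarantees an independent result
    return obj.take(positions).copy()

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [10.0, 20.0, 30.0, 40.0]})
sample = sample_as_copy(df, n=2, random_state=0)
sample["x"] = -1.0       # mutate the sample...
print(df["x"].tolist())  # [1.0, 2.0, 3.0, 4.0]  ...the original is untouched
```

In user code, the same guarantee is available today by writing `df.sample(n).copy()` explicitly. The extra copy costs memory proportional to the sample size.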