Skip to content

BUG make hashtable.unique support readonly arrays #18825

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 27, 2017

Conversation

hexgnu
Copy link
Contributor

@hexgnu hexgnu commented Dec 18, 2017

This problem was brought up in
#18773 and effectively comes
down to how Cython deals with readonly arrays. While it would be ideal
for Cython to fix the underlying problem in the meantime we can rely on
this.

@gfyoung gfyoung added Compat pandas objects compatability with Numpy or Python functions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Dec 18, 2017
@@ -512,6 +512,16 @@ def f():
tm.assert_numpy_array_equal(
mask, np.array([False, True, True, True, True]))

def test_cut_read_only(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference issue number below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks @gfyoung

@hexgnu hexgnu force-pushed the fixes_unique_readonly branch from 21c65ae to 668d057 Compare December 18, 2017 22:01
@codecov
Copy link

codecov bot commented Dec 18, 2017

Codecov Report

Merging #18825 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18825      +/-   ##
==========================================
+ Coverage   91.59%   91.59%   +<.01%     
==========================================
  Files         150      150              
  Lines       48959    48959              
==========================================
+ Hits        44843    44845       +2     
+ Misses       4116     4114       -2
Flag Coverage Δ
#multiple 89.96% <ø> (ø) ⬆️
#single 41.13% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/util/testing.py 84.9% <0%> (+0.21%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cdebcf3...ddb5247. Read the comment docs.


mutable = np.arange(0, 100, 10)

one_to_hundred = np.arange(100)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite...that's really zero to 99 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh duh lol... my bad I'll update it

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments. Ideally we would find a better / more general way of doing this.

mutable = np.arange(0, 100, 10)

hundred_elements = np.arange(100)
tm.assert_categorical_equal(cut(hundred_elements, readonly),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can paramaterize on writeable here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is what you're asking. I added a parameterized test that tests the three combinations (Readonly, Mutable) (Mutable, Mutable) (Readonly, Readonly). Let me know if that's not what you were thinking. Thanks @jreback

@@ -255,10 +255,56 @@ dtypes = [('Float64', 'float64', 'val != val', True),
('UInt64', 'uint64', 'False', False),
('Int64', 'int64', 'val == iNaT', False)]

def get_dispatch(dtypes):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment similiar to in algos_take for using the memory view.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I also noticed that the indenting was a tad off so I re-indented it to be 4 spaces.

@pep8speaks
Copy link

pep8speaks commented Dec 19, 2017

Hello @hexgnu! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 23, 2017 at 20:56 Hours UTC

@hexgnu hexgnu force-pushed the fixes_unique_readonly branch from c888981 to 4a9d32e Compare December 19, 2017 17:33
tm.assert_categorical_equal(cut(hundred_elements, array_1),
cut(hundred_elements, array_2))

for (w_1, w_2) in [[True, True], [True, False], [False, False]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parametrize this test instead of writing a sub test function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry yea just pushed a commit for that. should be parametrized now 😄

@jreback jreback added this to the 0.22.0 milestone Dec 20, 2017
@hexgnu hexgnu force-pushed the fixes_unique_readonly branch from 4a9d32e to a5d5b82 Compare December 20, 2017 16:43
@@ -326,7 +326,7 @@ Reshaping
- Bug in :func:`DataFrame.stack` which fails trying to sort mixed type levels under Python 3 (:issue:`18310`)
- Fixed construction of a :class:`Series` from a ``dict`` containing ``NaN`` as key (:issue:`18480`)
- Bug in :func:`Series.rank` where ``Series`` containing ``NaT`` modifies the ``Series`` inplace (:issue:`18521`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase on master. can you move to 0.23 (docs were renamed), prob easiest to just check this file from master and past in new one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I also squashed some of my commits to clean up the commit history for you.

This problem was brought up in
pandas-dev#18773 and effectively comes
down to how Cython deals with readonly arrays. While it would be ideal
for Cython to fix the underlying problem in the meantime we can rely on
this.

fix: updates one_to_hundred for hundred_elements

This is because arange(100) isn't actually 1 to 100... it's 0 to 99

docs: adds comment to fix using ndarray and fixes indenting

test: parametrize test for test_readonly_cut

doc: add new whatsnew entry for v0.23.0

fix: checkout existing upstream v0.22.0
@hexgnu hexgnu force-pushed the fixes_unique_readonly branch from a5d5b82 to 4f1e2aa Compare December 22, 2017 02:00
@jreback
Copy link
Contributor

jreback commented Dec 23, 2017

I pushed a fix. ping on green.

@hexgnu
Copy link
Contributor Author

hexgnu commented Dec 26, 2017

Awesome thanks @jreback looks like it all passed.

@jreback jreback merged commit 80a5399 into pandas-dev:master Dec 27, 2017
@jreback
Copy link
Contributor

jreback commented Dec 27, 2017

thanks @hexgnu
keep em coming!

hexgnu added a commit to hexgnu/pandas that referenced this pull request Dec 28, 2017
xhochy added a commit to xhochy/pandas that referenced this pull request Jul 7, 2018
xhochy added a commit to xhochy/pandas that referenced this pull request Jul 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pd.cut fails on readonly arrays
4 participants