BUG: ensuring that np.asarray() simple handles data as objects and doesn't… #22161

realead · 2018-08-01T15:36:00Z

… try to do smart things (GH22160)

- closes #22160 (unexpected behavior of `pd.core.algorithms._ensure_data()`)
- tests added / passed
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry

codecov · 2018-08-01T20:35:30Z

Codecov Report

Merging #22161 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #22161   +/-   ##
=======================================
  Coverage   92.08%   92.08%           
=======================================
  Files         169      169           
  Lines       50704    50704           
=======================================
  Hits        46691    46691           
  Misses       4013     4013

| Flag | Coverage Δ |
|---|---|
| #multiple | `90.49% <100%> (ø)` ⬆️ |
| #single | `42.33% <0%> (ø)` ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/algorithms.py | `94.69% <100%> (ø)` ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 475e391...ab06c38. Read the comment docs.

jreback

does this change anything from a user perspective?

jreback · 2018-08-01T21:47:35Z

pandas/core/algorithms.py

@@ -134,7 +134,7 @@ def _ensure_data(values, dtype=None):
        return values, dtype, 'int64'

    # we have failed, return object
-    values = np.asarray(values)
+    values = np.asarray(values, dtype=np.object)


So we should probably use pandas.core.dtypes.cast.construct_1d_ndarray_preserving_na, which is even better here. Also, please run the performance suite for things like factorize, value_counts, and isin; this is a very performance-sensitive section.

@jreback Actually, pandas.core.dtypes.cast.construct_1d_ndarray_preserving_na would not work, for two reasons:

For [42, 's'] it returns array(['42', 's'], dtype='<U11') and not the wanted array([42, 's'], dtype=object). I am not sure this is the intended behavior of the function, though.

For [np.nan] it returns array([nan], dtype=float64), so result[0] is np.nan evaluates to False, but we would like to keep the identity of the object.
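
The two effects discussed in this thread can be reproduced with plain NumPy. The following is a minimal sketch, not the pandas code path itself: it only contrasts `np.asarray` with and without an explicit object dtype. It uses the builtin `object` rather than the `np.object` alias from the diff, because that alias was removed in later NumPy releases; the variable names are illustrative.

```python
import numpy as np

mixed = [42, "s"]

# Without an explicit dtype, NumPy searches for a common type and
# falls back to a string dtype, so 42 becomes the string "42".
inferred = np.asarray(mixed)

# With dtype=object, the original Python objects are stored unchanged.
as_object = np.asarray(mixed, dtype=object)

print(inferred.dtype.kind)    # 'U' (a unicode string dtype)
print(as_object.dtype)        # object
print(type(as_object[0]))     # <class 'int'>

# Identity of NaN: indexing a float64 array creates a new NumPy scalar,
# while indexing an object array returns the very object that was stored.
nan_float = np.asarray([np.nan])
nan_object = np.asarray([np.nan], dtype=object)

print(nan_float[0] is np.nan)     # False
print(nan_object[0] is np.nan)    # True
```

The exact width of the inferred string dtype depends on the NumPy version and platform, which is why the sketch checks only the dtype kind.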

pep8speaks · 2018-08-03T04:42:37Z

Hello @realead! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 10, 2018 at 09:50 Hours UTC

realead · 2018-08-03T06:36:02Z

For `asv continuous -f 1.1 upstream/master HEAD -b ^series_methods -b ^algorithms` the result was:

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

jreback · 2018-08-09T11:14:09Z

Can you rebase and run perf tests on algos? I find even small changes can sometimes really impact perf here.

jreback · 2018-08-09T11:14:37Z

doc/source/whatsnew/v0.24.0.txt

@@ -573,6 +573,5 @@ Other
 - :meth: `~pandas.io.formats.style.Styler.background_gradient` now takes a ``text_color_threshold`` parameter to automatically lighten the text color based on the luminance of the background color. This improves readability with dark background colors without the need to limit the background colormap range. (:issue:`21258`)
 - Require at least 0.28.2 version of ``cython`` to support read-only memoryviews (:issue:`21688`)
 - :meth: `~pandas.io.formats.style.Styler.background_gradient` now also supports tablewise application (in addition to rowwise and columnwise) with ``axis=None`` (:issue:`15204`)
-
-
+- :meth:`pandas.core.algorithms.isin` avoids spurious casting for lists (:issue:`22160`)


is this user visible?

@jreback Only when the user calls pandas.core.algorithms.isin directly; in that case the wrong behavior from #22160 is fixed. There is, however, no difference if isin is used via Series or Index: the values are already in an np.array, and thus the bug isn't triggered.

OK, this is an internal routine, so OK to remove this whatsnew note.
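
Why the fix only matters for list input can be sketched as follows. `naive_isin` is a hypothetical stand-in written for this illustration, not pandas' implementation, and the `force_object` flag is an assumption used to toggle between the old and new conversion. The inputs reuse the `[42, 's']` example from the review thread; `['42']` as the lookup values is chosen here for illustration.

```python
import numpy as np


def naive_isin(comps, values, force_object):
    """Hypothetical membership check; a stand-in, not pandas' implementation."""
    dtype = object if force_object else None
    comps_arr = np.asarray(comps, dtype=dtype)
    values_arr = np.asarray(values, dtype=dtype)
    lookup = set(values_arr.tolist())
    return [item in lookup for item in comps_arr.tolist()]


comps = [42, "s"]
values = ["42"]

# List input, dtype inferred: 42 is coerced to "42" and matches spuriously.
print(naive_isin(comps, values, force_object=False))      # [True, False]

# List input, dtype=object: 42 stays an int and does not match "42".
print(naive_isin(comps, values, force_object=True))       # [False, False]

# Input that is already an object array is not re-inferred either way,
# which mirrors the Series/Index case described above.
comps_arr = np.array(comps, dtype=object)
print(naive_isin(comps_arr, values, force_object=False))  # [False, False]
```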

realead · 2018-08-09T21:01:22Z

There were no performance changes:

asv continuous -f 1.01 upstream/master HEAD -b ^series_methods -b ^algorithms -b ^categorial

· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
·· Installing into conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 60 total benchmarks (2 commits * 1 environments * 30 benchmarks)
[  0.00%] · For pandas commit hash f11b14a6:
[  0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[  0.00%] ··· Running benchmarks.........................
[  0.00%] ··· Setting up algorithms.py:95
[  0.00%] ··· Running benchmarks.........
[  1.67%] ··· algorithms.Duplicated.time_duplicated_float           16.1±1ms;...
[  3.33%] ··· algorithms.Duplicated.time_duplicated_int           9.85±0.3ms;...
[  5.00%] ··· algorithms.Duplicated.time_duplicated_string        10.6±0.2μs;...
[  6.67%] ··· ...icatedUniqueIndex.time_duplicated_unique_int         31.1±0.1μs
[  8.33%] ··· algorithms.Factorize.time_factorize_float             60.3±3ms;...
[ 10.00%] ··· algorithms.Factorize.time_factorize_int               20.0±3ms;...
[ 11.67%] ··· algorithms.Factorize.time_factorize_string           143±0.5ms;...
[ 13.33%] ··· algorithms.Match.time_match_string                         368±4μs
[ 15.00%] ··· series_methods.Clip.time_clip                            121±0.5μs
[ 16.67%] ··· series_methods.Dir.time_dir_strings                    1.97±0.03ms
[ 18.33%] ··· series_methods.Dropna.time_dropna                  2.72±0.04ms;...
[ 20.00%] ··· series_methods.IsIn.time_isin                      1.56±0.01ms;...
[ 21.67%] ··· ...ForObjects.time_isin_long_series_long_values         3.30±0.1ms
[ 23.33%] ··· ...cts.time_isin_long_series_long_values_floats         5.94±0.1ms
[ 25.00%] ··· ...orObjects.time_isin_long_series_short_values        2.03±0.02ms
[ 26.67%] ··· series_methods.IsInForObjects.time_isin_nans               636±6μs
[ 28.33%] ··· ...orObjects.time_isin_short_series_long_values        1.10±0.03ms
[ 30.00%] ··· series_methods.Map.time_map                           979±20μs;...
[ 31.67%] ··· series_methods.NSort.time_nlargest                 2.78±0.05ms;...
[ 33.33%] ··· series_methods.NSort.time_nsmallest                2.31±0.05ms;...
[ 35.00%] ··· ...s_methods.SeriesConstructor.time_constructor        157±1μs;...
[ 36.67%] ··· ...SeriesGetattr.time_series_datetimeindex_repr        3.11±0.03μs
[ 38.33%] ··· series_methods.ValueCounts.time_value_counts        2.20±0.3ms;...
[ 40.00%] ··· algorithms.Hashing.time_frame                           21.2±0.3ms
[ 41.67%] ··· algorithms.Hashing.time_series_categorical              5.42±0.1ms
[ 43.33%] ··· algorithms.Hashing.time_series_dates                   3.23±0.06ms
[ 45.00%] ··· algorithms.Hashing.time_series_float                   3.29±0.03ms
[ 46.67%] ··· algorithms.Hashing.time_series_int                     3.34±0.04ms
[ 48.33%] ··· algorithms.Hashing.time_series_string                   13.8±0.5ms
[ 50.00%] ··· algorithms.Hashing.time_series_timedeltas              3.29±0.03ms
[ 50.00%] · For pandas commit hash 475e391e:
[ 50.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 50.00%] ··· Running benchmarks.........................
[ 50.00%] ··· Setting up algorithms.py:95
[ 50.00%] ··· Running benchmarks.........
[ 51.67%] ··· algorithms.Duplicated.time_duplicated_float         14.8±0.6ms;...
[ 53.33%] ··· algorithms.Duplicated.time_duplicated_int           10.7±0.8ms;...
[ 55.00%] ··· algorithms.Duplicated.time_duplicated_string        10.5±0.2μs;...
[ 56.67%] ··· ...icatedUniqueIndex.time_duplicated_unique_int         30.8±0.4μs
[ 58.33%] ··· algorithms.Factorize.time_factorize_float             58.8±3ms;...
[ 60.00%] ··· algorithms.Factorize.time_factorize_int               21.5±2ms;...
[ 61.67%] ··· algorithms.Factorize.time_factorize_string             148±5ms;...
[ 63.33%] ··· algorithms.Match.time_match_string                         363±2μs
[ 65.00%] ··· series_methods.Clip.time_clip                              118±1μs
[ 66.67%] ··· series_methods.Dir.time_dir_strings                    1.94±0.02ms
[ 68.33%] ··· series_methods.Dropna.time_dropna                  2.77±0.06ms;...
[ 70.00%] ··· series_methods.IsIn.time_isin                      1.56±0.02ms;...
[ 71.67%] ··· ...ForObjects.time_isin_long_series_long_values        3.29±0.08ms
[ 73.33%] ··· ...cts.time_isin_long_series_long_values_floats        5.89±0.09ms
[ 75.00%] ··· ...orObjects.time_isin_long_series_short_values        2.07±0.03ms
[ 76.67%] ··· series_methods.IsInForObjects.time_isin_nans             668±200μs
[ 78.33%] ··· ...orObjects.time_isin_short_series_long_values         1.36±0.8ms
[ 80.00%] ··· series_methods.Map.time_map                        1.02±0.02ms;...
[ 81.67%] ··· series_methods.NSort.time_nlargest                 2.84±0.06ms;...
[ 83.33%] ··· series_methods.NSort.time_nsmallest                2.40±0.09ms;...
[ 85.00%] ··· ...s_methods.SeriesConstructor.time_constructor        157±2μs;...
[ 86.67%] ··· ...SeriesGetattr.time_series_datetimeindex_repr         3.17±0.2μs
[ 88.33%] ··· series_methods.ValueCounts.time_value_counts       2.19±0.04ms;...
[ 90.00%] ··· algorithms.Hashing.time_frame                           21.1±0.6ms
[ 91.67%] ··· algorithms.Hashing.time_series_categorical             5.02±0.02ms
[ 93.33%] ··· algorithms.Hashing.time_series_dates                   3.18±0.01ms
[ 95.00%] ··· algorithms.Hashing.time_series_float                   3.23±0.06ms
[ 96.67%] ··· algorithms.Hashing.time_series_int                     3.38±0.03ms
[ 98.33%] ··· algorithms.Hashing.time_series_string                   13.9±0.4ms
[100.00%] ··· algorithms.Hashing.time_series_timedeltas              3.25±0.07ms

**BENCHMARKS NOT SIGNIFICANTLY CHANGED.**

jreback · 2018-08-09T22:30:37Z

@realead ok with this, can you remove the whatsnew note. ping on green.

realead · 2018-08-10T09:50:34Z

Close/Open to trigger CI-run

jreback · 2018-08-10T10:38:19Z

thanks @realead

jreback requested changes Aug 1, 2018

jreback added the Compat (pandas objects compatibility with NumPy or Python functions) label Aug 1, 2018

realead force-pushed the fix_GH22160 branch from 959d1b4 to b6f0512 Compare August 3, 2018 04:42

realead changed the title ~~ensuring that np.asarray() simple handles data as objects and doesn't…~~ BUG: ensuring that np.asarray() simple handles data as objects and doesn't… Aug 3, 2018

jreback requested changes Aug 9, 2018

realead added 5 commits August 9, 2018 21:36

BUG: ensuring that np.asarray() simple handles data as objects and do…

aa672b9

…esn't try to do smart things (GH22160)

reworking test cases

eb2da20

add entry in whatsnew

0aa9b53

adding yet another testcase and fixing pep8-problems

5c0ecc1

pandas-dev#22207 changed the behavior with different nan-objecst, thu…

f11b14a

…s adjusting the test cases

realead force-pushed the fix_GH22160 branch from dd94526 to f11b14a Compare August 9, 2018 20:53

jreback added this to the 0.24.0 milestone Aug 9, 2018

jreback added the Bug and Indexing (related to indexing on series/frames, not to indexes themselves) labels Aug 9, 2018

jreback approved these changes Aug 9, 2018

no whatsnew needed

ab06c38

realead closed this Aug 10, 2018

realead reopened this Aug 10, 2018

jreback merged commit cc3ab4a into pandas-dev:master Aug 10, 2018

realead deleted the fix_GH22160 branch August 11, 2018 19:45

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

BUG: ensuring that np.asarray() simple handles data as objects and do…

2587c67

…esn't… (pandas-dev#22161)
