
Match can return non-boolean results yet warning suggests results are boolean #22316

Closed
fuglede opened this issue Aug 13, 2018 · 5 comments · Fixed by #22626
Labels
Clean Deprecate Functionality to remove in pandas

Comments

@fuglede

fuglede commented Aug 13, 2018

Code Sample, a copy-pastable example if possible

In [26]: pd.Series(['a', 'b', 4]).str.match('a')
Out[26]:
0     True
1    False
2      NaN
dtype: object

In [27]: pd.Series(['a', 'b', 4]).str.match('a', as_indexer=True)
# C:\Users\username\AppData\Local\Continuum\Anaconda3\Scripts\ipython:1: FutureWarning: 'as_indexer' keyword was specified but is ignored (match now returns a boolean indexer by default), and will be removed in a future version.
Out[27]:
0     True
1    False
2      NaN
dtype: object
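The FutureWarning above comes from a keyword that is still accepted but no longer has any effect. The following is a minimal plain-Python model of that pattern. It uses a hypothetical helper, `match_values`, which is not the pandas implementation, and its warning text only loosely follows the message above. The model also shows why the result is not a pure boolean mask: elements that are not strings receive the `na` fill value, which is NaN by default.

```python
import math
import re
import warnings


def match_values(values, pat, na=math.nan, as_indexer=None):
    """Hypothetical model of Series.str.match semantics (not pandas code)."""
    if as_indexer is not None:
        # The keyword is accepted only so that old call sites keep working.
        # It does not change the result, hence the FutureWarning.
        warnings.warn(
            "'as_indexer' keyword was specified but is ignored",
            FutureWarning,
            stacklevel=2,
        )
    regex = re.compile(pat)
    result = []
    for value in values:
        if isinstance(value, str):
            # Strings are matched at the start, as re.match does.
            result.append(regex.match(value) is not None)
        else:
            # Non-string elements get the `na` fill value (NaN by default),
            # so the output mixes booleans with NaN.
            result.append(na)
    return result
```

With the default `na`, the third element comes back as NaN whether or not `as_indexer` is passed. Passing `na=False` yields a list that contains only booleans.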

Problem description

The warning states that match (and similarly for search) returns a boolean indexer by default. That is not the case for non-string input, such as the integer 4 in the example. That element yields NaN, so the result has dtype object rather than bool.

This caught me off guard. It caused an indexing error in code that assumed the result of match could be used as a mask regardless of the input types.
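The indexing error can be reproduced with a short snippet. This sketch assumes a pandas version in which boolean indexing with a mask that contains NaN raises ValueError. The exact error message may differ between versions.

```python
import pandas as pd

s = pd.Series(["a", "b", 4])

# The integer 4 is not a string, so its entry in the result is NaN.
# The result is therefore not a plain bool Series.
mask = s.str.match("a")

try:
    # Boolean indexing refuses masks that contain missing values.
    s[mask]
except ValueError as exc:
    error = exc
else:
    error = None
```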

Expected Output

The output currently obtainable by passing na=False (shown here with contains, which accepts the same na argument as match):

In [33]: pd.Series(['a', 'b', 4]).str.contains('a', na=False)
Out[33]:
0     True
1    False
2    False
dtype: bool
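The same workaround applies to match itself. This is a sketch that assumes only the `na` argument of `Series.str.match`. Passing `na=False` treats non-string elements as non-matches, which gives a bool mask that can be used for filtering.

```python
import pandas as pd

s = pd.Series(["a", "b", 4])

# na=False fills non-string elements with False instead of NaN,
# so the result has dtype bool and can be used directly as a mask.
mask = s.str.match("a", na=False)
filtered = s[mask]
```

This choice silently counts non-string rows as non-matches. If those rows need separate handling, inspect them before filtering.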

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 8.1
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, Genuin
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.3.2
pip: 18.0
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.8
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

That keyword is slated to be removed in 0.24 #6581

Could you make a PR just removing it, rather than changing the error message?
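From a caller's point of view, removing the keyword means that old call sites stop emitting a warning and start failing outright. Below is a minimal plain-Python model of that effect. It uses a hypothetical post-removal helper, `match_values`, and does not reflect any actual pandas signature.

```python
import math
import re


def match_values(values, pat, na=math.nan):
    """Hypothetical post-removal signature: there is no `as_indexer` parameter."""
    regex = re.compile(pat)
    # Strings are matched at the start; anything else gets the `na` fill value.
    return [regex.match(v) is not None if isinstance(v, str) else na for v in values]


try:
    match_values(["a", "b", 4], "a", as_indexer=True)
except TypeError as exc:
    # Python rejects keyword arguments that are not in the signature.
    removed_error = exc
else:
    removed_error = None
```

Call sites that still pass `as_indexer` therefore need to drop it. Any call site that requires a strict boolean mask should pass `na=False` instead.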

@HyunTruth
Contributor

HyunTruth commented Aug 15, 2018

@TomAugspurger I created a PR that removes the keyword and changes the tests to accommodate the change: #22356

@gfyoung gfyoung added Deprecate Functionality to remove in pandas Clean labels Aug 15, 2018
@fuglede
Author

fuglede commented Aug 15, 2018

Heh, one has to be fast around here.

@jreback
Contributor

jreback commented Aug 16, 2018

Maybe my search is not working properly (as this is not listed there at all), but this was marked versionchanged in 0.21.0, hence we would remove it in a release after 0.24, @TomAugspurger?

@TomAugspurger
Contributor

#6581 has it listed as under (0.24 / 1.0). Not sure what that means.

5 participants