Series of integers mod behavior #24396

Closed
andrewgsavage opened this issue Dec 22, 2018 · 3 comments · Fixed by #29180
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Numeric Operations Arithmetic, Comparison, and Logical operations

@andrewgsavage

Code Sample

import pandas as pd

s = pd.Series(range(1, 10))  # integers 1..9 on the default index 0..8
s1 = pd.Series("foo")        # single-element object Series, index [0]

s1 % s


0                                              foo
1    [nan, nan, nan, nan, nan, nan, nan, nan, nan]
2    [nan, nan, nan, nan, nan, nan, nan, nan, nan]
3    [nan, nan, nan, nan, nan, nan, nan, nan, nan]
4    [nan, nan, nan, nan, nan, nan, nan, nan, nan]
5    [nan, nan, nan, nan, nan, nan, nan, nan, nan]
6    [nan, nan, nan, nan, nan, nan, nan, nan, nan]
7    [nan, nan, nan, nan, nan, nan, nan, nan, nan]
8    [nan, nan, nan, nan, nan, nan, nan, nan, nan]
dtype: object




import pandas as pd

s = pd.Series(range(1, 10))           # integers 1..9 on the default index 0..8
s2 = pd.Series("foo", index=s.index)  # "foo" repeated on the same index as s

s2 % s

0    foo
1    foo
2    foo
3    foo
4    foo
5    foo
6    foo
7    foo
8    foo
dtype: object

Problem description

Was expecting a TypeError.

I think this is breaking a pint pandas interface test, which checks that s2 arithmetic_op Series(PintArray... raises some exception. (That test is failing on rmod.)

import pandas as pd
import numpy as np
import pint
from pint.pandas_interface import PintArray

ureg = pint.UnitRegistry()
Q_ = ureg.Quantity
torque = PintArray(Q_([1, 2, 2, 3, 4, 5], "lbf ft"))
angular_velocity = PintArray(Q_([1000, 2000, 2000, 3000, 3000, 3000], "rpm"))
df = pd.DataFrame({"torque": torque, "angular_velocity": angular_velocity})


pd.Series("foo", index=df.index) % df.torque

0    foo
1    foo
2    foo
3    foo
4    foo
5    foo
dtype: object

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-53-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.24.0.dev0+1340.g8c58817bd
pytest: 3.7.1
pip: 10.0.1
setuptools: 40.0.0
Cython: 0.28.5
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.0.2
lxml.etree: None
bs4: 4.6.0
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@mroeschke mroeschke added Bug Numeric Operations Arithmetic, Comparison, and Logical operations labels Dec 23, 2018
@mroeschke
Member

Thanks for the report. Investigation and PRs welcome!

@cbertinato
Contributor

I looked into this and it may not be considered a bug.

In this case, the two operands are a string (or object, as far as pandas is concerned) and an integer. Because the former is a string, the mod operation is interpreted as string formatting. The first attempt at evaluation sets up an operation between two ndarrays, but probably ends up looping through each element and doing something like mod('foo', 1), and so raises a TypeError because the string does not contain any string formatting characters (TypeError: not all arguments converted during string formatting).

A general Exception is caught, and pandas tries instead to do a masked_arith_op, which ends up doing something like mod('foo', np.array(range(1, 10))). This does not raise a TypeError, but rather gives the result that we see at the end: 'foo'.
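To make the two evaluation paths concrete, here is a minimal sketch in plain Python. It assumes CPython's string formatting rules, and it uses a list as a stand-in for the NumPy array (an assumption for illustration: a list is also treated as mapping-like by `%`, so the "not all arguments converted" check is skipped). The helper `try_mod` is hypothetical and not part of pandas.

```python
def try_mod(left, right):
    """Return left % right, or the TypeError message if formatting fails."""
    try:
        return left % right
    except TypeError as exc:
        return f"TypeError: {exc}"


# Element-wise path: a string without format characters and a scalar int.
scalar_result = try_mod("foo", 1)

# Fallback path: the same string against a whole container at once.
# A list stands in for np.array(range(1, 10)) here (assumption).
container_result = try_mod("foo", [1, 2, 3])

print(scalar_result)     # TypeError: not all arguments converted during string formatting
print(container_result)  # foo
```

The scalar case fails loudly, while the container case silently returns the unformatted string, which matches the 'foo' values in the output reported above.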

So is this really a bug?

It is perhaps worth noting here that when the string contains a format character, e.g., pd.Series('foo %d'), the result is as expected, and that is a valid use case. In that case, however, the first attempt at evaluating the operation does not raise a TypeError, so we can still distinguish a valid mod operation between a string and an int from one such as the case we are now discussing.
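As a rough illustration of the valid case, the sketch below formats element by element in plain Python. The lists `templates` and `numbers` are hypothetical stand-ins for the object-dtype Series and the integer Series; this is not how pandas implements the operation internally.

```python
# Hypothetical stand-ins for pd.Series('foo %d') repeated and an integer Series.
templates = ["foo %d"] * 3
numbers = [1, 2, 3]

# Each pair formats cleanly because the template consumes its one argument.
formatted = [template % number for template, number in zip(templates, numbers)]

print(formatted)  # ['foo 1', 'foo 2', 'foo 3']
```

Because every element formats without error here, no TypeError is raised on the first attempt, so the fallback path is never reached.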

We could simply check for a string type once the TypeError is generated, but I wonder whether there is not a reason that I am not seeing to not allow the operation to go through.

@jbrockmendel
Member

This is fixed in master, just needs a test.

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug labels Jul 13, 2019
gfyoung added a commit to forking-repos/pandas that referenced this issue Oct 23, 2019
gfyoung added a commit to forking-repos/pandas that referenced this issue Oct 23, 2019
@jreback jreback added this to the 1.0 milestone Oct 23, 2019