import numpy as np
import pandas as pd

num_tries = 1000
length = 100000

for i in range(num_tries):
    df = pd.DataFrame({"A" : 100000 * (200 + np.random.random(length)),
                       "B" : 100000 * (200 + np.random.random(length))})
    df['A*B'] = df.A * df.B
    if (df['A*B'] < 100).any():
        print "found the bug in iteration", i
Expected Output
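Empty. Every value in A and B is at least 2e7, so each product is at least 4e14 and can never be below 100.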
Actual Output Example:
found the bug in iteration 7
found the bug in iteration 36
found the bug in iteration 103
found the bug in iteration 113
found the bug in iteration 120
....
Commentary
The output is different on each run.
The problem does not occur if we set length = 10000. Thus, the problem only occurs for long vectors.
The problem also does not occur if we cast the pandas columns to numpy arrays before multiplying. That is, if we replace the multiplication line with df['A*B'] = np.array(df.A) * np.array(df.B), the problem does not occur.
I suspect this is a CPU-related problem, and that some users will not be able to reproduce it. I'm running on a 64-bit Intel i7-3610QM. It is entirely possible that my CPU is malfunctioning, but it's still odd that the problem is so prevalent, and that it gets solved when using numpy multiplication instead of pandas multiplication.
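To narrow this down, it may help to compare the two multiplication paths on the same data and print only the rows where they disagree. The following is a minimal sketch written for Python 3 and a recent pandas, not the exact script from this report. The function name `find_mismatches` and its parameters are hypothetical. The `use_numexpr` switch rests on an assumption that this report does not confirm: that pandas may hand large elementwise operations to numexpr when it is installed, which would be one difference between the pandas path and the plain numpy path.

```python
import numpy as np
import pandas as pd


def find_mismatches(length=100000, seed=0, use_numexpr=True):
    """Return the rows where pandas and numpy multiplication disagree.

    Hypothetical helper for diagnosis; an empty result means both paths agree.
    """
    rng = np.random.RandomState(seed)  # fixed seed so a bad run can be repeated
    df = pd.DataFrame({
        "A": 100000 * (200 + rng.random_sample(length)),
        "B": 100000 * (200 + rng.random_sample(length)),
    })

    # Assumption: "compute.use_numexpr" controls whether pandas may use numexpr.
    with pd.option_context("compute.use_numexpr", use_numexpr):
        via_pandas = (df.A * df.B).to_numpy()

    # Reference result computed on the raw numpy arrays.
    via_numpy = df.A.to_numpy() * df.B.to_numpy()

    # Positions where the two results differ.
    bad = np.flatnonzero(via_pandas != via_numpy)
    return df.iloc[bad].assign(
        pandas_product=via_pandas[bad],
        numpy_product=via_numpy[bad],
    )


if __name__ == "__main__":
    for seed in range(20):
        mismatches = find_mismatches(seed=seed)
        if len(mismatches):
            print("seed", seed)
            print(mismatches.head())
```

If mismatches show up with `use_numexpr=True` but not with `use_numexpr=False`, that would point toward the numexpr path rather than pandas itself. If they show up in both cases, or never, the cause is more likely elsewhere.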
More info
Here is an example of the head of the DataFrame from a buggy iteration:
Output of pd.show_versions()