
Pandas groupby extremely slow in python3 for certain sets of single precision floating point data #13335

Closed
RogerThomas opened this issue May 31, 2016 · 4 comments
Labels: Dtype Conversions (Unexpected or buggy dtype conversions), Numeric Operations (Arithmetic, Comparison, and Logical operations), Performance (Memory or execution speed performance)
Milestone: 0.18.2

RogerThomas commented May 31, 2016

In Python 3, with certain sets of single-precision floating-point data, pandas groupby is up to ~150x slower than on the same data in Python 2.

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
from numpy.random import random
from time import time


def do_groupby(df):
    df.groupby(['a'])['b'].sum()


def main():
    # Two halves of data: one half roughly 100x smaller than the other,
    # both cast down to single precision.
    tmp1 = (random(10000) * 0.1).astype(np.float32)
    tmp2 = (random(10000) * 10.0).astype(np.float32)
    tmp = np.concatenate((tmp1, tmp2))
    # Repeat each value 100 times: 2,000,000 rows over ~20,000 distinct keys.
    arr = np.repeat(tmp, 100)
    df = pd.DataFrame(dict(a=arr, b=arr))
    t1 = time()
    do_groupby(df)
    print("Took: %s" % (time() - t1,))

main()

On my machine:

python2 this_file.py: the groupby takes around 0.1s
python3 this_file.py: the groupby takes around 11s

With some investigation, the run-time discrepancy between Python versions varies hugely with the actual data, but it appears to be largest when half the data is roughly 100 times smaller than the other half.
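
A rough sketch of one way to check this, varying the ratio between the two halves (the scale values here are illustrative, not the exact experiments from the report):

import numpy as np
import pandas as pd
from numpy.random import random
from time import time

# Time the same groupby while varying the ratio between the two halves of
# the data; scale=1.0 makes both halves comparable in magnitude, larger
# scales make one half proportionally smaller.
for scale in (1.0, 10.0, 100.0, 1000.0):
    tmp1 = (random(10000) * 10.0 / scale).astype(np.float32)
    tmp2 = (random(10000) * 10.0).astype(np.float32)
    arr = np.repeat(np.concatenate((tmp1, tmp2)), 100)
    df = pd.DataFrame(dict(a=arr, b=arr))
    t1 = time()
    df.groupby(['a'])['b'].sum()
    print("scale=%g took %.3fs" % (scale, time() - t1))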

Having profiled this, almost all of the 11s in the Python 3 version is spent in this method:
https://github.com/pydata/pandas/blob/master/pandas/hashtable.pyx#L538
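
For reference, a minimal profiling sketch (assuming df and do_groupby from the snippet above are defined at module level):

import cProfile
import pstats

# Profile the groupby from the reproduction script; on the affected Python 3
# setup the cumulative time is dominated by the hashtable/factorize code.
cProfile.run("do_groupby(df)", "groupby.prof")
pstats.Stats("groupby.prof").sort_stats("cumulative").print_stats(15)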

However I have no idea what is causing the run time discrepancy.

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-22-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_IE.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 21.2.2
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.16.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: 3.2.0
numexpr: 2.5.2
matplotlib: 1.5.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.0.9
pymysql: 0.7.4.None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None

jreback commented May 31, 2016

This is the same issue, I think, as here: #13166

from @pitrou

PyHash_double is doing this so that hash(float(x)) == hash(x) for every integer x that's exactly representable as a double
Actually, as the comment suggests, it also ensures that the invariant holds if x is a Fraction
>>> x = Fraction(5, 4)
>>> hash(x)
576460752303423489
>>> hash(float(x))
576460752303423489

So I think we could simply change the hashing as suggested in #13166 and see. We are probably seeing LOTS of hash collisions.
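
As a rough sanity check of the collision theory (an illustration only: the builtin hash() stands in here for the Python-3-compatible float hash, and khash-style tables index buckets with the low bits of a C-level hash rather than Python's):

import numpy as np
from numpy.random import random

# The ~20,000 distinct float32-derived keys from the report.
tmp1 = (random(10000) * 0.1).astype(np.float32)
tmp2 = (random(10000) * 10.0).astype(np.float32)
keys = np.concatenate((tmp1, tmp2)).astype(np.float64)

# Bucket the keys by the low 20 bits of their hash, as a table with 2**20
# slots effectively would.
mask = (1 << 20) - 1
buckets = {hash(float(k)) & mask for k in keys}
print("distinct keys:    %d" % len(set(keys.tolist())))
print("distinct buckets: %d" % len(buckets))
# Python 3 hashes a float written as M/2**E (in lowest terms) as
# M * 2**(61 - E) reduced mod 2**61 - 1, so for these float32-derived keys
# the low bits carry almost no information and the keys collapse into a
# handful of buckets; Python 2's float hash mixes the mantissa into the low
# bits, so the same keys spread out far more evenly.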

jreback commented May 31, 2016

I suspect the upcasting from float32 -> float64 might contribute to this as well.
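
One rough way to check that (a sketch, not from the issue) is to time the same groupby with keys generated as float64 from the start versus keys round-tripped through float32:

import numpy as np
import pandas as pd
from numpy.random import random
from time import time

# Same shape of data as the report, comparing native float64 keys against
# keys round-tripped through float32 (which the groupby upcasts to float64).
tmp = np.concatenate((random(10000) * 0.1, random(10000) * 10.0))
arr64 = np.repeat(tmp, 100)
arr32 = arr64.astype(np.float32)

for name, arr in (("float64", arr64), ("float32", arr32)):
    df = pd.DataFrame(dict(a=arr, b=arr))
    t1 = time()
    df.groupby(['a'])['b'].sum()
    print("%s keys took %.3fs" % (name, time() - t1))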

@jreback added the Dtype Conversions, Numeric Operations, and Performance labels on May 31, 2016
@jreback added this to the 0.18.2 milestone on May 31, 2016
RogerThomas (Author) commented

@jreback I've been looking into this, but I'm not getting very far. Any advice on why we're seeing this issue in Python 3 and not Python 2? Since both are compiled into C code, I don't see how the Python version can have an effect.

jreback commented Jun 1, 2016

See the issue I referenced. I think you can fix it by changing the hashing code as indicated there.

I think that Python 3 is taking great care with the hashing, somewhat to the detriment of performance; that hashing is ultimately what the Float64HashTable does via the kh_* functions. The adjustment makes it much faster (and, for all intents and purposes, works the same).
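
For intuition only (this is not the actual patch; it just illustrates one direction the hashing change could take): mixing the raw 64-bit pattern of each double, rather than using the Python-compatible hash value, spreads the same keys back out over the buckets. A rough numpy sketch, using a khash-style 64-bit integer mix as an assumption:

import numpy as np
from numpy.random import random

# Rebuild the ~20,000 distinct float32-derived keys from the report.
keys = np.concatenate(((random(10000) * 0.1).astype(np.float32),
                       (random(10000) * 10.0).astype(np.float32))).astype(np.float64)

# View each key as its raw 64-bit pattern, apply a khash-style integer mix,
# and bucket by the low 20 bits.
bits = keys.view(np.uint64)
mixed = (bits >> np.uint64(33)) ^ bits ^ (bits << np.uint64(11))
buckets = np.unique(mixed & np.uint64((1 << 20) - 1))
print("distinct keys: %d, distinct buckets: %d" % (len(np.unique(keys)), len(buckets)))
# Unlike the Python-3-style hash, the bit-pattern hash varies in its low
# bits from key to key, so the keys spread over many buckets and lookups
# stay O(1) on average.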
