transform in groupby throws TypeError when run with python -O option #2057

bluefir · 2012-10-11T17:20:52Z

I have a script that works fine when run without any options but generates the following traceback when run with 'python -O':

File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 1745, in transform
return self._transform_item_by_item(obj, wrapper)
File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 1777, in _transform_item_by_item
raise TypeError('Transform function invalid for data types')
TypeError: Transform function invalid for data types

I have Python 2.7.3 and pandas 0.9.0:

Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> import pandas
>>> pandas.__version__
'0.9.0'

wesm · 2012-10-11T23:33:30Z

That's not cool. Do you have a self-contained reproduction you could post here?

bluefir · 2012-10-12T05:45:26Z

Well, sort of. I tried to reproduce and discovered another strange behavior. Here is the code for the strange behavior:

import numpy as np
from pandas import DataFrame, MultiIndex

def quantiles(df, q=0.5):
    print('Entered quantiles() with shape ' + str(df.shape))

    print('Calculating quantiles ' + str(q) + ' for each column')
    qtls = df.quantile(q)

    print('Building output data frame')
    df_zeros = DataFrame(np.zeros(df.shape), index=df.index, columns=df.columns)
    df_out = df_zeros.add(qtls, axis='columns')

    print('Output shape ' + str(df_out.shape))
    return df_out


midx = MultiIndex(levels=[[1, 2], ['a', 'b', 'c', 'd', 'e']],
                  labels=[[0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
                          [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]],
                  names=['date', 'id'])
df = DataFrame(np.random.randn(10, 2), index=midx, columns=['col1', 'col2'])
print('\nInput data frame: ')
print(df.to_string())
print('\nCalculating medians for each column:')
qtls = df.groupby(level='date').transform(quantiles)
print('\nData frame with medians:')
print('Shape ' + str(qtls.shape))
print(qtls.to_string())

'python TestGroupbyTransformO.py' produces the expected outcome:

---begin console output-------------------------------------

Input data frame:
             col1      col2
date id
1    a   0.025334  0.468002
     b   1.307855  1.094578
     c   0.454256  0.711495
     d  -1.450975 -0.858718
     e   0.851123  0.878828
2    a   1.726560 -0.936486
     b   0.911542 -0.177365
     c  -1.078583  1.797866
     d   0.595278  1.683337
     e  -1.718456 -1.041106

Calculating medians for each column:
Entered quantiles() with shape (5L,)
Calculating quantiles 0.5 for each column
Building output data frame
Entered quantiles() with shape (5L,)
Calculating quantiles 0.5 for each column
Building output data frame
Entered quantiles() with shape (5, 2)
Calculating quantiles 0.5 for each column
Building output data frame
Output shape (5, 2)
Entered quantiles() with shape (5L,)
Calculating quantiles 0.5 for each column
Building output data frame
Entered quantiles() with shape (5L,)
Calculating quantiles 0.5 for each column
Building output data frame
Entered quantiles() with shape (5, 2)
Calculating quantiles 0.5 for each column
Building output data frame
Output shape (5, 2)

Data frame with medians:
Shape (10, 2)
             col1      col2
date id
1    a   0.454256  0.711495
     b   0.454256  0.711495
     c   0.454256  0.711495
     d   0.454256  0.711495
     e   0.454256  0.711495
2    a   0.595278 -0.177365
     b   0.595278 -0.177365
     c   0.595278 -0.177365
     d   0.595278 -0.177365
     e   0.595278 -0.177365

---end console output-------------------------------------------

'python -O TestGroupbyTransformO.py' produces this:

---begin console output------------------------------------------

Input data frame:
             col1      col2
date id
1    a   0.824534  0.258803
     b  -0.807477 -0.046351
     c  -0.243443 -0.887152
     d  -1.430488  1.675248
     e  -0.430917  1.466759
2    a   1.101497  0.738619
     b  -2.010792 -0.152976
     c  -1.757038  1.234569
     d  -0.081311 -1.690532
     e   0.696795  0.442808

Calculating medians for each column:
Entered quantiles() with shape (5L,)
Calculating quantiles 0.5 for each column
Building output data frame
Entered quantiles() with shape (5L,)
Calculating quantiles 0.5 for each column
Building output data frame
Entered quantiles() with shape (5, 2)
Calculating quantiles 0.5 for each column
Building output data frame
Output shape (5, 2)
Entered quantiles() with shape (5L,)
Calculating quantiles 0.5 for each column
Building output data frame
Entered quantiles() with shape (5L,)
Calculating quantiles 0.5 for each column
Building output data frame
Entered quantiles() with shape (5, 2)
Calculating quantiles 0.5 for each column
Building output data frame
Output shape (5, 2)

Data frame with medians:
Shape (10, 2)
Traceback (most recent call last):
File "TestGroupbyTransformO.py", line 29, in <module>
print(qtls.to_string())
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1267, in to_string
formatter.to_string(force_unicode=force_unicode)
File "C:\Python27\lib\site-packages\pandas\core\format.py", line 279, in to_string
strcols = self._to_str_columns(force_unicode)
File "C:\Python27\lib\site-packages\pandas\core\format.py", line 214, in _to_str_columns
str_columns = self._get_formatted_column_labels()
File "C:\Python27\lib\site-packages\pandas\core\format.py", line 355, in _get_formatted_column_labels
dtypes = self.frame.dtypes
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1386, in dtypes
return self.apply(lambda x: x.dtype)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 3763, in apply
return self._apply_standard(f, axis)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 3831, in _apply_standard
k = res_index[i]
UnboundLocalError: local variable 'i' referenced before assignment

---end console output------------------------------------------------------------------------------------
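The UnboundLocalError at `k = res_index[i]` suggests that `i` is a loop variable that was read before the loop ever assigned it. Below is a minimal sketch of one way that can happen, assuming (the thread does not confirm this) that the line sits in an error handler wrapped around the loop. The names `apply_each` and `broken_source` are hypothetical and are not pandas code.

```python
def apply_each(items, func):
    """Apply func to each item; on failure, try to report the failing index."""
    results = {}
    try:
        for i, item in enumerate(items):
            results[i] = func(item)
    except Exception as exc:
        # Bug being illustrated: if the iterable raises before yielding its
        # first item, `i` was never bound, so this line itself fails with
        # UnboundLocalError and hides the original exception.
        exc.args = exc.args + ("occurred at index %s" % i,)
        raise
    return results


def broken_source():
    """Hypothetical generator that fails before producing any item."""
    raise ValueError("cannot produce first item")
    yield  # unreachable; its presence makes this function a generator


try:
    apply_each(broken_source(), str)
except UnboundLocalError as err:
    caught = type(err).__name__

print(caught)
```

If something like this is what happens, the real problem is whatever failed before the first iteration, and the UnboundLocalError is only a secondary symptom.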

This is not what I observe in my more complex problem, though. It has a bigger frame and a more complicated function, but it does calculate several quantiles per column during the first steps (I tried to isolate those steps and discovered the above behavior). A normal run produces something like this:

---begin console output-----------------------------------
...

Entered with (1996L,)
Entered with (1996L,)
Entered with (1996, 21)
Assertions done
Calculating quantiles
Quantiles calculated
Bounds calculated
Outliers detected
Output shape (1996, 21)
Entered with (1996L,)
Entered with (1996L,)
Entered with (1996, 21)
Assertions done
Calculating quantiles
Quantiles calculated
Bounds calculated
Outliers detected
Output shape (1996, 21)
Entered with (1996L,)
Entered with (1996L,)
Entered with (1996, 21)
Assertions done
Calculating quantiles
Quantiles calculated
Bounds calculated
Outliers detected
Output shape (1996, 21)
Entered with (1996L,)
Entered with (1996L,)
Entered with (1996, 21)
Assertions done
Calculating quantiles
Quantiles calculated
Bounds calculated
Outliers detected
Output shape (1996, 21)
...
---end console output------------------------------

The -O run produces this:

---begin console output-----------------------------
...
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Entered with (1996L,)
Assertions done
Calculating quantiles
Traceback (most recent call last):
File "InefficiencyScores.py", line 390, in <module>
returns_daily_no_outliers = returns_daily.groupby(level=field_date).transform(f_shrink_outliers)
File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 1745, in transform
return self._transform_item_by_item(obj, wrapper)
File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 1777, in _transform_item_by_item
raise TypeError('Transform function invalid for data types')
TypeError: Transform function invalid for data types

---end console output-----------------------------

As you can see, in my program the -O run never enters with the full shape (1996, 21) and doesn't seem to even get beyond quantile calculations:

print('Calculating quantiles')

# Calculate sample quantiles
q25 = df_out.quantile(q=0.25, axis=axis)
q75 = df_out.quantile(q=0.75, axis=axis)
if symmetric:
    midpoint = (q75 + q25) / 2.
else:
    midpoint = df_out.quantile(q=0.5, axis=axis)

print('Quantiles calculated')

I realize it's convoluted but I hope it helps. The original program is more complex and so far I haven't been able to simply reproduce the behavior I observe. But I did find another puzzling behavior! :-)

wesm · 2012-11-19T02:00:40Z

Wow, this is annoying. Apparently python -O removes assert statements from the code.
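To make the effect concrete, here is a small sketch of how stripping asserts can change which code path runs. It is not the pandas code: `fast_path` and `dispatch` are hypothetical functions, and the assumption is only that some caller relies on an assert firing in order to pick a fallback.

```python
import os
import subprocess
import sys

# Hypothetical program: dispatch() relies on the assert inside fast_path()
# to reject unsuitable input and switch to a fallback.
SNIPPET = '''
def fast_path(values):
    assert all(isinstance(v, float) for v in values), "floats only"
    return "fast"

def dispatch(values):
    try:
        return fast_path(values)
    except AssertionError:
        return "fallback"

print(dispatch([1.0, "a"]))
'''


def run(flags):
    """Run SNIPPET in a fresh interpreter with the given command-line flags."""
    env = dict(os.environ)
    env.pop("PYTHONOPTIMIZE", None)  # keep the baseline run unoptimized
    out = subprocess.check_output([sys.executable] + flags + ["-c", SNIPPET], env=env)
    return out.decode().strip()


normal = run([])
optimized = run(["-O"])
print("without -O:", normal)
print("with -O:   ", optimized)
```

Without -O the assert fires and the fallback is taken. With -O the assert is compiled away, so the "fast" branch runs on input it was never meant to handle.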

bluefir · 2012-11-19T02:53:39Z

Yep, among other things. But that's the point! Faster code. -OO also removes docstrings. I am not sure how much it helps though. If it's hard to fix, fughetaboutit :-)
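A quick way to check what each flag does is to probe `__debug__` and a docstring in a fresh interpreter. This is only a sketch; the `probe` helper and the function `f` are made up for the illustration.

```python
import os
import subprocess
import sys

# Prints two booleans: the value of __debug__, and whether f kept its docstring.
PROBE = '''
def f():
    """docstring"""

print(__debug__, f.__doc__ is not None)
'''


def probe(flags):
    """Run PROBE in a fresh interpreter started with the given flags."""
    env = dict(os.environ)
    env.pop("PYTHONOPTIMIZE", None)  # make sure only the flags decide the level
    out = subprocess.check_output([sys.executable] + flags + ["-c", PROBE], env=env)
    return out.decode().strip()


results = {
    "default": probe([]),
    "-O": probe(["-O"]),
    "-OO": probe(["-OO"]),
}
for label in ("default", "-O", "-OO"):
    print(label, "->", results[label])
```

The default run reports `True True`, -O reports `False True` (asserts off, docstrings kept), and -OO reports `False False`.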

wesm mentioned this issue Nov 19, 2012

Replace all usages of assert keyword/function with AssertionErrors #2288

Closed

wesm closed this as completed in 19dc284 Nov 19, 2012

This was referenced Mar 12, 2013

Continuing #2057, s/assert/raise AssertionError/g #3023

Merged

TODO: Decorate asserts with helpful messages #3024

Closed

ghost pushed a commit that referenced this issue Apr 23, 2013

Merge pull request #3023 from y-p/no_asserts

464dd3d

Continuing #2057, s/assert/raise AssertionError/g
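The s/assert/raise AssertionError/g change amounts to turning each assert statement into an ordinary if/raise, which the interpreter does not remove under -O. A minimal sketch of the pattern follows; `check_lengths` is a hypothetical example and is not taken from the pandas source.

```python
def check_lengths(left, right):
    """Validate that two sequences have equal length, even under python -O."""
    # Before: `assert len(left) == len(right)` (dropped when run with -O).
    # After: an explicit check and raise, which is ordinary code and always runs.
    if len(left) != len(right):
        raise AssertionError(
            "length mismatch: %d != %d" % (len(left), len(right))
        )
    return True


try:
    check_lengths([1, 2], [1])
except AssertionError as exc:
    message = str(exc)

print(message)
```

Callers that catch AssertionError keep working unchanged, and the explicit message also goes in the direction of the "helpful messages" TODO in #3024.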
