Skip to content

TST: Stateful/Wrong Test test_agg_timezone_round_trip #26864

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbrockmendel opened this issue Jun 15, 2019 · 7 comments
Closed

TST: Stateful/Wrong Test test_agg_timezone_round_trip #26864

jbrockmendel opened this issue Jun 15, 2019 · 7 comments
Labels
Bug Groupby Testing pandas testing functions or related to the test suite

Comments

@jbrockmendel
Copy link
Member

from pandas/tests/groupby/aggregate/test_other.py test_agg_timezone_round_trip:

dates = [pd.Timestamp("2016-01-0%d 12:00:00" % i, tz='US/Pacific')
                 for i in range(1, 5)]
df = pd.DataFrame({'A': ['a', 'b'] * 2, 'B': dates})
grouped = df.groupby('A')
    
ts = df['B'].iloc[0]
assert ts == grouped.nth(0)['B'].iloc[0]
assert ts == grouped.head(1)['B'].iloc[0]
assert ts == grouped.first()['B'].iloc[0]
assert ts == grouped.apply(lambda x: x.iloc[0])[0]

The last assertion here is failing in a WIP branch, so I'm troubleshooting it on master and finding that it fails if I skip the first three assertions:

dates = [pd.Timestamp("2016-01-0%d 12:00:00" % i, tz='US/Pacific')
                 for i in range(1, 5)]
df = pd.DataFrame({'A': ['a', 'b'] * 2, 'B': dates})
grouped = df.groupby('A')
assert ts == grouped.apply(lambda x: x.iloc[0])[0]  # <-- KeyError: 0

Current thought is that the assertion is wrong and the test along with it. Thoughts @jreback @jorisvandenbossche ? (I don't do much in this part of the code; anyone else I should be pinging for it?)

@jbrockmendel
Copy link
Member Author

Another one from tests.indexing.test_datetime test_indexing_with_datetime_tz

df = DataFrame({'a': date_range('2014-01-01', periods=10, tz='UTC')})
result = df.iloc[5]
expected = Timestamp('2014-01-06 00:00:00+0000', tz='UTC', freq='D')
assert result == expected

This works in master, but I don't think it should. df.iloc[5] should be a single-entry Series like it would be for any other dtype.

@jreback
Copy link
Contributor

jreback commented Jun 15, 2019

no there is some state modification in groupby atm (there is an issue)
one of the _set_selection things iirc

@jbrockmendel
Copy link
Member Author

Are you saying these tests are correct? Because if so I'm extra confused

@jorisvandenbossche
Copy link
Member

I should look more in detail to whether the tested output is correct or not (what are the rules about the key column when doing apply?). But for the actual issue, as you notice, you can get different answers from the same call, depending on what was called before (the state modification Jeff mentions).

Running the example:

In [12]: dates = [pd.Timestamp("2016-01-0%d 12:00:00" % i, tz='US/Pacific') 
    ...:                  for i in range(1, 5)] 
    ...: df = pd.DataFrame({'A': ['a', 'b'] * 2, 'B': dates}) 
    ...: grouped = df.groupby('A')

In [13]: grouped.apply(lambda x: x.iloc[0])
Out[13]: 
   A                         B
A                             
a  a 2016-01-01 12:00:00-08:00
b  b 2016-01-02 12:00:00-08:00

In [14]: grouped.first()['B'] 
Out[14]: 
A
a   2016-01-01 12:00:00-08:00
b   2016-01-02 12:00:00-08:00
Name: B, dtype: datetime64[ns, US/Pacific]

In [15]: grouped.apply(lambda x: x.iloc[0]) 
Out[15]: 
A
a   2016-01-01 12:00:00-08:00
b   2016-01-02 12:00:00-08:00
dtype: datetime64[ns, US/Pacific]

which is of course obviously a bug.

But I am not fully sure whether it's the first or the second output of that call that is the desired behaviour (I think the first, meaning that the test is "incorrect")

@simonjayhawkins simonjayhawkins added Groupby Reshaping Concat, Merge/Join, Stack/Unstack, Explode Testing pandas testing functions or related to the test suite and removed Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 18, 2019
@jbrockmendel
Copy link
Member Author

@mroeschke am i remembering right that you've been working on groupby-related stuff recently? any idea how this could become stateful?

(the test mentioned in the OP has been fixed, but that fix did not address the potential for statefulness)

@mroeschke
Copy link
Member

Sorry I've been in the weeds of rolling instead of groupby. I haven't really dug into the ops of groupby ever; just the high level methods at times.

@rhshadrach
Copy link
Member

This was due to the column selection context manager which has since been removed. Also groupby acting on the grouping columns is now deprecated. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Groupby Testing pandas testing functions or related to the test suite
Projects
None yet
Development

No branches or pull requests

6 participants