I am working through logs of web requests, and when I want to find the most common, say, user agent string for a (disguised) user, I run something like the following:
from pandas import Series, DataFrame, Timestamp

tdf = DataFrame({'day': {0: Timestamp('2015-02-24 00:00:00'), 1: Timestamp('2015-02-24 00:00:00'),
                         2: Timestamp('2015-02-24 00:00:00'), 3: Timestamp('2015-02-24 00:00:00'),
                         4: Timestamp('2015-02-24 00:00:00')},
                 'userAgent': {0: 'some UA string', 1: 'some UA string', 2: 'some UA string',
                               3: 'another UA string', 4: 'some UA string'},
                 'userId': {0: '17661101', 1: '17661101', 2: '17661101', 3: '17661101', 4: '17661101'}})

def most_common_values(df):
    return Series({c: s.value_counts().index[0] for c, s in df.iteritems()})

tdf.groupby('day').apply(most_common_values)
Note that in this (admittedly unusual) example, all of the lines are identical; I'm not sure whether that is necessary to recreate the issue. I'm obscuring the exact purpose of this code, but it reproduces the bug: the 'userId' comes back as a Timestamp, not a string. The conversion happens after the function most_common_values returns, since the function itself returns the userId as a string, not as a Timestamp. If we change the value of the userId to an int:
tdf['userId'] = tdf.userId.astype(int)
or if the value of the associated integer is small enough:
tdf['userId'] = '15320104'
then the results are what we'd expect: the most common value is returned as its original type.
I imagine that for some reason something like a dateutil parser is being called on strings by default, but that probably shouldn't be happening...
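To back up the claim that the conversion happens only after most_common_values returns, here is a minimal sketch that calls the function directly on the frame, bypassing groupby, and inspects the Python type of each returned value. Two things are my own assumptions rather than part of the original report: the frame is built from lists instead of nested dicts for brevity, and `.items()` replaces `.iteritems()`, which newer pandas releases have removed.

```python
from pandas import DataFrame, Series, Timestamp

# Same data as the report, built from lists for brevity (an assumption of this sketch).
day = Timestamp('2015-02-24')
tdf = DataFrame({
    'day': [day] * 5,
    'userAgent': ['some UA string'] * 3 + ['another UA string', 'some UA string'],
    'userId': ['17661101'] * 5,
})

def most_common_values(df):
    # .items() is used here because .iteritems() was removed in newer pandas.
    return Series({c: s.value_counts().index[0] for c, s in df.items()})

# Call the function directly, without groupby, and record each value's type.
direct = most_common_values(tdf)
value_types = {c: type(v).__name__ for c, v in direct.items()}
print(value_types)
```

If 'userId' shows up as a str here but as a Timestamp in the groupby result, the coercion is happening when groupby/apply combines the per-group results, not inside the function.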
Good suggestion! Still, note how that returns a MultiIndex? If you change it to x.mode().iloc[0] so that you have a straight Timestamp index, you get back to the same weird result.
I'll dig as time allows. If you can remember the other open issue, please post. I was struggling to describe this, so it was tricky to search for open issues/SO posts/etc...