BUG: index sorting with strings & timestamps #11244

Closed
dniku opened this issue Oct 5, 2015 · 2 comments · Fixed by #30674
Labels
good first issue, Needs Tests (unit test(s) needed to prevent regressions)

@dniku

dniku commented Oct 5, 2015

Pandas 0.16.2, Python 3.4.

import pandas as pd

# Column labels mix Timestamp objects and plain strings.
test = pd.DataFrame({
    pd.Timestamp('2012-01-01 00:00:00'): ['a', 'b'],
    pd.Timestamp('2012-01-02 00:00:00'): ['c', 'd'],
    'name': ['e', 'e'],
    'aaaa': ['f', 'g'],
})
print(test)
print(test.groupby('name').first())

Fails with:

  File "/usr/lib64/python3.4/site-packages/pandas/core/groupby.py", line 106, in f
    self._set_selection_from_grouper()
  File "/usr/lib64/python3.4/site-packages/pandas/core/groupby.py", line 489, in _set_selection_from_grouper
    self._group_selection = ax.difference(Index(groupers)).tolist()
  File "/usr/lib64/python3.4/site-packages/pandas/core/index.py", line 1506, in difference
    theDiff = sorted(set(self) - set(other))
  File "pandas/tslib.pyx", line 836, in pandas.tslib._Timestamp.__richcmp__ (pandas/tslib.c:15612)
TypeError: Cannot compare type 'Timestamp' with type 'str'
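
The last frames of the traceback point at a plain `sorted()` call over labels of different types. A minimal sketch of that failure outside of `groupby`, assuming only that pandas is installed (the variable names are illustrative, not from pandas internals):

```python
import pandas as pd

# Two labels of different types, as in the column index of the report.
labels = [pd.Timestamp("2012-01-01"), "aaaa"]

try:
    sorted(labels)
    error = None
except TypeError as exc:
    # Python 3 defines no ordering between a str and a Timestamp.
    error = exc

print(type(error).__name__, error)
```

The exact error message varies between pandas versions, so the sketch only relies on the exception type.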
@jreback
Contributor

jreback commented Oct 5, 2015

So the issue is shown below; it's an Index set operation.

In [17]: Index([u'aaaa', Timestamp('2012-01-01 00:00:00'), Timestamp('2012-01-02 00:00:00'), u'name'], dtype='object').sort_values()
TypeError: Cannot compare type 'Timestamp' with type 'unicode'

You realize that this is a completely useless index. You should really not do this, mixing objects and strings.

I'll mark it as a bug, but not a high priority.
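
One way to follow the advice above and avoid a mixed-type column index is to convert every label to a string before grouping. This is a sketch of a possible workaround, not something proposed in the thread; the name `renamed` is hypothetical:

```python
import pandas as pd

test = pd.DataFrame({
    pd.Timestamp("2012-01-01 00:00:00"): ["a", "b"],
    pd.Timestamp("2012-01-02 00:00:00"): ["c", "d"],
    "name": ["e", "e"],
    "aaaa": ["f", "g"],
})

# Work on a copy so the original frame keeps its Timestamp labels.
renamed = test.copy()
# Index.map applies str() to each label, giving a homogeneous index.
renamed.columns = renamed.columns.map(str)

# With only strings in the index, sorting is well defined.
sorted_labels = renamed.columns.sort_values().tolist()
print(sorted_labels)

result = renamed.groupby("name").first()
print(result)
```

The trade-off is that the date columns are no longer addressable by `Timestamp`, only by their string form.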

@jreback changed the title from "TypeError while doing groupby() on DataFrame with Timestamp and str columns" to "BUG: index sorting with strings & timestamps" on Oct 5, 2015
@jreback added the Indexing, Dtype Conversions, Compat, and Prio-low labels on Oct 5, 2015
@jreback added this to the Next Major Release milestone on Oct 5, 2015
@jreback added the Bug label on Oct 5, 2015
@mroeschke
Member

The original example looks to work on master. It could use a regression test.

In [31]: test.groupby('name').first()
Out[31]:
     2012-01-01 00:00:00 2012-01-02 00:00:00 aaaa
name
e                      a                   c    f

In [32]: pd.__version__
Out[32]: '0.26.0.dev0+652.g30362ed82'
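
A regression test for this could look roughly like the sketch below. The function name is hypothetical and this is not necessarily the test that was eventually merged; it only encodes the output shown above, assuming a pandas version where the example no longer raises:

```python
import pandas as pd
import pandas.testing as tm


def test_groupby_first_with_mixed_column_labels():
    # Hypothetical test name; the data mirrors the original report.
    ts1 = pd.Timestamp("2012-01-01 00:00:00")
    ts2 = pd.Timestamp("2012-01-02 00:00:00")
    df = pd.DataFrame({
        ts1: ["a", "b"],
        ts2: ["c", "d"],
        "name": ["e", "e"],
        "aaaa": ["f", "g"],
    })

    result = df.groupby("name").first()

    # One group ("e"), keeping the first row of each remaining column.
    expected = pd.DataFrame(
        [["a", "c", "f"]],
        columns=[ts1, ts2, "aaaa"],
        index=pd.Index(["e"], name="name"),
    )
    tm.assert_frame_equal(result, expected)
```

Under pytest the function would be collected automatically; it can also be called directly.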

@mroeschke added the good first issue and Needs Tests labels and removed the Bug, Compat, Dtype Conversions, and Indexing labels on Oct 23, 2019
@jreback modified the milestones: Contributions Welcome, 1.0 on Jan 4, 2020