iloc erroneously truncates underlying data for timestamps with timezone #30263

Closed
jakevdp opened this issue Dec 13, 2019 · 4 comments · Fixed by #41712
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. good first issue Indexing Related to indexing on series/frames, not to indexes themselves Needs Tests Unit test(s) needed to prevent regressions
Comments

@jakevdp
Contributor

jakevdp commented Dec 13, 2019

This is a strange one... Here's a short repro of the error, which I can reproduce in pandas 0.21-0.25 and numpy 1.15-1.17 (I didn't try further back than that):

>>> import pandas as pd
>>> df = pd.DataFrame({'x': pd.date_range('2019', periods=10, tz='UTC')})
>>> df = df.iloc[:, :5]
>>> df._repr_html_()
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-f6e53eba0a8f> in <module>()
      2 df = pd.DataFrame({'x': pd.date_range('2019', periods=10, tz='UTC')})
      3 df = df.iloc[:, :5]
----> 4 df._repr_html_()

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in _repr_html_(self)
    667             buf = StringIO("")
    668             self.info(buf=buf)
--> 669             # need to escape the <class>, should be the first line.
    670             val = buf.getvalue().replace("<", r"&lt;", 1)
    671             val = val.replace(">", r"&gt;", 1)

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in to_html(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, bold_rows, classes, escape, max_rows, max_cols, show_dimensions, notebook, decimal, border)
   1732 
   1733         >>> df.to_records(column_dtypes={"A": "int32"})
-> 1734         rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
   1735                   dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])
   1736 

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in to_html(self, classes, notebook, border)
    732                 )
    733             )
--> 734 
    735     def _join_multiline(self, *strcols):
    736         lwidth = self.line_width

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in write_result(self, buf)
   1206         else:
   1207             threshold = None
-> 1208 
   1209         # if we have a fixed_width, we'll need to try different float_format
   1210         def format_values_with(float_format):

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in _write_body(self, indent)
   1369 
   1370     Examples
-> 1371     --------
   1372     Keeps all entries different after rounding:
   1373 

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in _write_regular_rows(self, fmt_values, indent)
   1403     to_begin = unique_pcts[0] if unique_pcts[0] > 0 else None
   1404     to_end = 100 - unique_pcts[-1] if unique_pcts[-1] < 100 else None
-> 1405 
   1406     # Least precision that keeps percentiles unique after rounding
   1407     prec = -np.floor(

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in <genexpr>(.0)
   1403     to_begin = unique_pcts[0] if unique_pcts[0] > 0 else None
   1404     to_end = 100 - unique_pcts[-1] if unique_pcts[-1] < 100 else None
-> 1405 
   1406     # Least precision that keeps percentiles unique after rounding
   1407     prec = -np.floor(

IndexError: list index out of range

It seems that somehow, the second argument of iloc is truncating the underlying data along the wrong axis:

>>> df.shape
(10, 1)

>>> df.values.shape
(5, 1)

>>> print(df)
                          x
0 2019-01-01 00:00:00+00:00
1 2019-01-02 00:00:00+00:00
2 2019-01-03 00:00:00+00:00
3 2019-01-04 00:00:00+00:00
4 2019-01-05 00:00:00+00:00
5                          
6                          
7                          
8                          
9       

This only appears to happen if the data contains timestamps with a timezone specified. If I remove tz='UTC' above, everything works properly.
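
To make the tz-aware versus tz-naive contrast easy to check on any installed pandas, here is a minimal sketch. The helper name `shapes_after_column_slice` is made up for illustration and is not part of pandas. It builds the same frame as the repro, applies `iloc[:, :5]`, and returns the frame shape next to the shape of `.values`. Per the report above, the two disagree on the affected versions when `tz='UTC'` is set; on a fixed version they should match.

```python
import pandas as pd


def shapes_after_column_slice(tz=None):
    """Return (frame shape, .values shape) after slicing columns with iloc[:, :5].

    Hypothetical helper for illustration only; it is not a pandas API.
    """
    df = pd.DataFrame({"x": pd.date_range("2019", periods=10, tz=tz)})
    sliced = df.iloc[:, :5]  # only one column exists, so nothing should be dropped
    return sliced.shape, sliced.values.shape


naive = shapes_after_column_slice()          # tz-naive timestamps
aware = shapes_after_column_slice(tz="UTC")  # tz-aware timestamps (the reported case)

print("tz-naive:", naive)
print("tz-aware:", aware)
```

If the second element of the tz-aware pair has fewer rows than the first, the installed version is showing the behavior described in this report.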

@TomAugspurger
Contributor

Hmm, thanks for the report. Reproduced on master.

@TomAugspurger TomAugspurger added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Dec 13, 2019
@TomAugspurger
Contributor

This probably happens for all extension arrays, and only when there's a single column:

In [6]: df2 = pd.DataFrame({"A": pd.array([1] * 10, dtype="Int64")})
   ...:
   ...:

In [7]: df2.iloc[:, :5]
> /Users/taugspurger/sandbox/pandas/pandas/core/internals/blocks.py(1820)_slice()
-> if isinstance(slicer, tuple) and len(slicer) == 2:
(Pdb) c
Out[7]:
   A
0  1
1  1
2  1
3  1
4  1
5
6
7
8
9
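
One way to probe the "only when there's a single column" part of that guess is to run the same slice on a one-column and a two-column `Int64` frame and compare row counts. This is only a sketch: `row_counts` and the column names `c0`, `c1` are invented for the example, and the comment above is a hypothesis rather than a confirmed diagnosis.

```python
import pandas as pd


def row_counts(n_columns, n_rows=10):
    """Slice an Int64 frame with iloc[:, :5] and report two row counts.

    Hypothetical helper: returns (length of the index,
    length of the first column's backing array).
    """
    data = {
        f"c{i}": pd.array([1] * n_rows, dtype="Int64") for i in range(n_columns)
    }
    sliced = pd.DataFrame(data).iloc[:, :5]
    return len(sliced.index), len(sliced.iloc[:, 0].array)


single = row_counts(1)  # the case suspected to be affected
double = row_counts(2)  # the case suspected to be unaffected

print("one column: ", single)
print("two columns:", double)
```

A mismatch between the two numbers in a pair means the column data was truncated while the index was not.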

@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Dec 15, 2019
@jbrockmendel
Member

This works on master. Could use a test and/or a bisect to find the commit that fixed it.

@jbrockmendel jbrockmendel added the Needs Tests Unit test(s) needed to prevent regressions label Sep 4, 2020
@luijkr

luijkr commented Dec 28, 2020

Hi all, first-time contributor here, and I'd like to pick this issue up.
I've read the contributing docs, but just to be sure: does this issue now only need a test (or tests) to guard against the above behavior, i.e. iloc truncating along the wrong axis for a timestamp column with a timezone?
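
For reference, a regression check along those lines might look roughly like the sketch below. It is an assumption about what such a test could cover, not necessarily the test that was eventually merged. The function name `check_iloc_column_slice_keeps_all_rows` is hypothetical, and a real pandas test would more likely use `pytest.mark.parametrize` and the internal `pandas._testing` module instead of a plain loop.

```python
import pandas as pd
import pandas.testing as tm


def check_iloc_column_slice_keeps_all_rows(values):
    """Slicing columns of a single-column frame must not change its rows."""
    df = pd.DataFrame({"x": values})
    expected = df.copy()

    result = df.iloc[:, :5]  # column slice wider than the frame itself

    # The frame must be unchanged, including the underlying data.
    tm.assert_frame_equal(result, expected)
    assert result.values.shape == (len(values), 1)


# The two cases raised in this thread: tz-aware timestamps and Int64.
cases = [
    pd.date_range("2019", periods=10, tz="UTC"),
    pd.array([1] * 10, dtype="Int64"),
]

for values in cases:
    check_iloc_column_slice_keeps_all_rows(values)
```

Checking both the frame equality and the shape of `.values` mirrors the original report, where `df.shape` looked correct while `df.values.shape` did not.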

7 participants