iloc erroneously truncates underlying data for timestamps with timezone #30263

jakevdp · 2019-12-13T20:22:05Z

This is a strange one... Here's a short repro of the error, which I can reproduce in pandas 0.21-0.25 and numpy 1.15-1.17 (I didn't try further back than that):

>>> import pandas as pd
>>> df = pd.DataFrame({'x': pd.date_range('2019', periods=10, tz='UTC')})
>>> df = df.iloc[:, :5]
>>> df._repr_html_()
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-f6e53eba0a8f> in <module>()
      2 df = pd.DataFrame({'x': pd.date_range('2019', periods=10, tz='UTC')})
      3 df = df.iloc[:, :5]
----> 4 df._repr_html_()

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in _repr_html_(self)
    667             buf = StringIO("")
    668             self.info(buf=buf)
--> 669             # need to escape the <class>, should be the first line.
    670             val = buf.getvalue().replace("<", r"&lt;", 1)
    671             val = val.replace(">", r"&gt;", 1)

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in to_html(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, bold_rows, classes, escape, max_rows, max_cols, show_dimensions, notebook, decimal, border)
   1732 
   1733         >>> df.to_records(column_dtypes={"A": "int32"})
-> 1734         rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
   1735                   dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])
   1736 

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in to_html(self, classes, notebook, border)
    732                 )
    733             )
--> 734 
    735     def _join_multiline(self, *strcols):
    736         lwidth = self.line_width

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in write_result(self, buf)
   1206         else:
   1207             threshold = None
-> 1208 
   1209         # if we have a fixed_width, we'll need to try different float_format
   1210         def format_values_with(float_format):

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in _write_body(self, indent)
   1369 
   1370     Examples
-> 1371     --------
   1372     Keeps all entries different after rounding:
   1373 

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in _write_regular_rows(self, fmt_values, indent)
   1403     to_begin = unique_pcts[0] if unique_pcts[0] > 0 else None
   1404     to_end = 100 - unique_pcts[-1] if unique_pcts[-1] < 100 else None
-> 1405 
   1406     # Least precision that keeps percentiles unique after rounding
   1407     prec = -np.floor(

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in <genexpr>(.0)
   1403     to_begin = unique_pcts[0] if unique_pcts[0] > 0 else None
   1404     to_end = 100 - unique_pcts[-1] if unique_pcts[-1] < 100 else None
-> 1405 
   1406     # Least precision that keeps percentiles unique after rounding
   1407     prec = -np.floor(

IndexError: list index out of range

It seems that somehow, the second argument of iloc is truncating the underlying data along the wrong axis:

>>> df.shape
(10, 1)

>>> df.values.shape
(5, 1)

>>> print(df)
                          x
0 2019-01-01 00:00:00+00:00
1 2019-01-02 00:00:00+00:00
2 2019-01-03 00:00:00+00:00
3 2019-01-04 00:00:00+00:00
4 2019-01-05 00:00:00+00:00
5                          
6                          
7                          
8                          
9

This only appears to happen if the data contains timestamps with a timezone specified. If I remove tz='UTC' above, everything works properly.
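
The mismatch described above can be checked directly by comparing a frame's reported shape with the shape of its materialized values. The snippet below is a minimal sketch, not part of the original report: `shapes_agree` is a hypothetical helper name, and the loop simply repeats the repro for a timezone-naive and a timezone-aware column. On an affected pandas version the tz-aware case would be expected to report a disagreement; on a fixed version both should agree.

```python
import pandas as pd


def shapes_agree(df):
    """Hypothetical helper: True if df.shape matches the shape of df.values."""
    return df.shape == df.values.shape


# Same data as the repro, once without and once with a timezone.
naive = pd.DataFrame({"x": pd.date_range("2019", periods=10)})
aware = pd.DataFrame({"x": pd.date_range("2019", periods=10, tz="UTC")})

for name, frame in [("naive", naive), ("tz-aware", aware)]:
    # Slice all rows and (at most) the first five columns; there is only one column.
    sliced = frame.iloc[:, :5]
    print(name, sliced.shape, sliced.values.shape, shapes_agree(sliced))
```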


TomAugspurger · 2019-12-13T21:07:42Z

Hmm, thanks for the report. Reproduced on master.

TomAugspurger · 2019-12-13T21:15:07Z

This probably happens for all extension arrays, and only when there's a single column:

In [6]: df2 = pd.DataFrame({"A": pd.array([1] * 10, dtype="Int64")})

In [7]: df2.iloc[:, :5]
> /Users/taugspurger/sandbox/pandas/pandas/core/internals/blocks.py(1820)_slice()
-> if isinstance(slicer, tuple) and len(slicer) == 2:
(Pdb) c
Out[7]:
   A
0  1
1  1
2  1
3  1
4  1
5
6
7
8
9

jbrockmendel · 2020-09-04T23:40:01Z

This works on master. Could use a test and/or bisect.

luijkr · 2020-12-28T08:40:06Z

Hi all, first time contributor here, and I'd like to pick this issue up.
I've read the contributing docs, but just to be sure: does the issue now concern writing one or more tests to guard against the behavior above, i.e. iloc truncating along the wrong axis for a timestamp column with a timezone?
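
A regression test along those lines might look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `check_iloc_column_slice_keeps_rows` is hypothetical, the two cases are taken from the repros earlier in this thread (tz-aware timestamps and an `Int64` extension array), and a plain loop stands in for whatever parametrization the pandas test suite would actually use. It is not the test that was eventually merged.

```python
import pandas as pd
import pandas.testing as tm


def check_iloc_column_slice_keeps_rows(values):
    """Hypothetical check: slicing columns with iloc must not drop rows."""
    df = pd.DataFrame({"x": values})
    result = df.iloc[:, :5]  # only one column exists, so this selects all of it

    # Both the reported shape and the underlying values should keep all 10 rows.
    assert result.shape == (10, 1)
    assert result.values.shape == (10, 1)
    # The slice selects every row and column, so it should equal the original.
    tm.assert_frame_equal(result, df)


# Cases taken from the repros in this thread.
cases = [
    pd.date_range("2019", periods=10, tz="UTC"),
    pd.array([1] * 10, dtype="Int64"),
]
for values in cases:
    check_iloc_column_slice_keeps_rows(values)
```

If the loop finishes without an `AssertionError`, the installed pandas version does not show the truncation for these two dtypes.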

TomAugspurger added the Bug and Indexing (Related to indexing on series/frames, not to indexes themselves) labels Dec 13, 2019

jorisvandenbossche added the ExtensionArray (Extending pandas with custom dtypes or arrays) label Dec 15, 2019

jbrockmendel added the Needs Tests (Unit test(s) needed to prevent regressions) label Sep 4, 2020

phofl added the good first issue label and removed the Bug label Nov 23, 2020

luijkr mentioned this issue Dec 28, 2020

Iloc truncates single-column dataframe with extension arrays #38750

Closed

4 tasks

jreback added this to the 1.3 milestone Dec 28, 2020

mroeschke mentioned this issue May 29, 2021

TST: More old issues #41712

Merged

15 tasks

jreback closed this as completed in #41712 May 31, 2021
