DataFrame.__setitem__ converts Index to RangeIndex for length-zero value #22060
Comments
There are likely two different cases. The first:
In [51]: b = a.set_index(['col0'])
In [52]: b.index
Out[52]: DatetimeIndex([], dtype='datetime64[ns]', name='col0', freq=None)
In [53]: b['col3'] = []
In [54]: b.index
Out[54]: RangeIndex(start=0, stop=0, step=1)
Not sure about the second issue.
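A minimal workaround sketch, assuming you only need the original index back after the assignment: save the index, assign the empty column, then restore it. This is not a fix inside pandas, and the frame below is a stand-in built to match b above (empty, with a DatetimeIndex named col0).

```python
import pandas as pd

# Stand-in for b above: zero rows, an empty DatetimeIndex named "col0".
b = pd.DataFrame(columns=["col1", "col2"], index=pd.DatetimeIndex([], name="col0"))

# Workaround: remember the index, assign the empty column, put the index back.
# Both indexes have length 0, so re-assigning the saved one is always valid.
saved_index = b.index
b["col3"] = []
b.index = saved_index

print(type(b.index).__name__, b.index.name)  # prints: DatetimeIndex col0
```

Restoring the index explicitly keeps the result the same whether or not a given pandas version resets the index on this assignment.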
In the 1st case (not MultiIndex), the type of the index is lost in the method call below:
In [6]: b.shape
Out[6]: (0, 2)
In [7]: b.index
Out[7]: DatetimeIndex([], dtype='datetime64[ns]', name='col0', freq=None)
In [8]: b.insert(2, 'col3', [])
In [9]: b.index
Out[9]: RangeIndex(start=0, stop=0, step=1)
This happens in:
# pandas/core/frame.py
class DataFrame(NDFrame):
...
def _ensure_valid_index(self, value):
"""
ensure that if we don't have an index, that we can create one from the
passed value
"""
# GH5632, make sure that we are a Series convertible
if not len(self.index) and is_list_like(value):
        ...
I'm not sure that it's appropriate that …
Furthermore, assigning a non-zero-length list as a new column of a zero-row DataFrame also behaves strangely:
import pandas as pd
from datetime import datetime
a = pd.DataFrame([[datetime.now(), 1234, 3.1415]], columns=['col0', 'col1', 'col2']).iloc[[]]
b = a.set_index(['col0'])  # b has 0 rows, 2 columns and a non-multi index
b['col3'] = [1,2,3] # <- NO ERROR!
print(b)
# col1 col2 col3
# 0 NaN NaN 1
# 1 NaN NaN 2
# 2 NaN NaN 3
c = a.set_index(['col0', 'col1'])  # c has 0 rows, 1 column and a MultiIndex
c['col3'] = [1,2,3] # <- ERROR!
# ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'
print(c)
If the DataFrame has one or more rows, assigning a list of mismatched length as a new column raises "ValueError: Length of values does not match length of index" in both the non-multi index and the MultiIndex cases.
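A small sketch of the non-zero-row behavior described above, using made-up two-row frames (the values are illustrative only). It checks only that a ValueError mentioning the length mismatch is raised; the exact message wording differs between pandas versions.

```python
import pandas as pd

# Two-row frames: one with a single-level index, one with a MultiIndex.
frames = {
    "single index": pd.DataFrame({"col0": [10, 20], "col1": [1, 2]}).set_index(["col0"]),
    "MultiIndex": pd.DataFrame(
        {"col0": [10, 20], "col1": [1, 2], "col2": [0.5, 1.5]}
    ).set_index(["col0", "col1"]),
}

messages = {}
for label, frame in frames.items():
    try:
        frame["col3"] = [1, 2, 3]  # 3 values for 2 rows: cannot be aligned
    except ValueError as exc:
        messages[label] = str(exc)

for label, msg in messages.items():
    print(f"{label}: {msg}")
```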
I don't think it is appropriate.
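The _ensure_valid_index guard quoted earlier in this thread can be modeled roughly as follows. This is a hypothetical, simplified sketch, not the pandas source: it assumes that the truncated remainder of the guard rebuilds the frame's index from Series(value).index, and it only handles a Series or a plain list. Under that assumption, a zero-row frame adopts a fresh default index, which would explain both the RangeIndex in the sessions above and the 0, 1, 2 rows that appear after assigning [1, 2, 3].

```python
import pandas as pd
from pandas.api.types import is_list_like


def rebuilt_index_sketch(index, value):
    """Hypothetical model of the zero-row guard: if the frame has no rows
    and the value is list-like, return the index the frame would adopt.
    Only a pandas Series or a plain list is handled here."""
    if not len(index) and is_list_like(value):
        if isinstance(value, pd.Series):
            # A Series carries its own index, which the frame would adopt.
            return value.index
        # A plain list gets a default 0..n-1 index; the original index
        # type and name are not carried over.
        return pd.RangeIndex(len(value))
    # Frames that already have rows keep their index.
    return index


empty_dt = pd.DatetimeIndex([], name="col0")
print(rebuilt_index_sketch(empty_dt, []))         # length-0 default index, name lost
print(rebuilt_index_sketch(empty_dt, [1, 2, 3]))  # default index 0, 1, 2
```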
This seems to work now. The code is returning:
Code Sample
Problem description
When some operations are executed on DataFrames with zero rows, various information about their indexes is lost. Furthermore, the kinds of information lost and the operations that trigger the loss are inconsistent between a MultiIndex and a normal Index.
In the case of a DataFrame with a non-MultiIndex, both the dtype and x.index.name are lost when a new column is appended by assigning an empty list. In the case of a MultiIndex, the dtypes are lost just by calling x.set_index([x, y,...]). However, x.index.names are preserved when a new column is appended.
Expected Output
In my opinion, there would be little downside to preserving all dtype(s) and name(s) in each of these example cases, and doing so would be consistent with the same operations on DataFrames with non-zero rows.
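One way to check the expected behavior is to compare the index class, name(s), and per-level dtypes before and after an operation. The helper index_signature below is hypothetical (not a pandas API), and the two-row frame is made-up data shaped like the example; on frames with rows, appending a column of matching length leaves the signature unchanged.

```python
import pandas as pd


def index_signature(index):
    """Hypothetical helper: summarize the index class, its name(s),
    and the dtype of every level (works for Index and MultiIndex)."""
    dtypes = tuple(str(index.get_level_values(i).dtype) for i in range(index.nlevels))
    return type(index).__name__, tuple(index.names), dtypes


df = pd.DataFrame({
    "col0": pd.to_datetime(["2018-07-01", "2018-07-02"]),
    "col1": [1234, 5678],
    "col2": [3.1415, 2.7183],
})

results = {}
for keys in (["col0"], ["col0", "col1"]):
    frame = df.set_index(keys)
    before = index_signature(frame.index)
    frame["col3"] = [1, 2]  # length matches the 2 rows
    after = index_signature(frame.index)
    results[tuple(keys)] = (before, after)
    print(keys, before == after, after)
```

Running the same loop on df.iloc[[]] (with an empty list assigned to col3) is a way to see whether a particular pandas version still drops any of these attributes for zero-row frames.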
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.3.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None