BUG: Empty list passed to Series returns object dtype, but via DataFrame returns float64 #56679
Comments
The same behavior shows up in the setitem flow too:
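A minimal sketch of the setitem case, assuming it refers to assigning an empty list as a new column. The helper name `setitem_empty_dtype` is illustrative and not from the report, and the resulting dtype depends on the pandas version, so no output is shown:

```python
import pandas as pd


def setitem_empty_dtype():
    """Assign an empty list as a new column and report the column's dtype.

    Hypothetical helper for illustration only.
    """
    df = pd.DataFrame()
    df["a"] = []  # setitem path, as opposed to the constructor path
    return df["a"].dtype


# Compare the setitem result with the plain Series constructor.
print("setitem column dtype:", setitem_empty_dtype())
print("Series([]) dtype:    ", pd.Series([]).dtype)
```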
|
Thanks for the report. I would also expect this to be object.
In [4]: pd.DataFrame(columns=["a"]).dtypes
Out[4]:
a object
dtype: object
In [5]: pd.DataFrame([], columns=["a"]).dtypes
Out[5]:
a object
dtype: object
In [6]: pd.DataFrame({}, columns=["a"]).dtypes
Out[6]:
a object
dtype: object
In [7]: pd.DataFrame({"a": []}, columns=["a"]).dtypes
Out[7]:
a float64
dtype: object
In [8]: pd.DataFrame({"a": pd.Series()}).dtypes
Out[8]:
a object
dtype: object
It looks like this goes through
if len(data) == 0 and dtype is None:
    # We default to float64, matching numpy
    subarr = np.array([], dtype=np.float64)
I'm not sure if there is a reason internally why we need to treat this as float64, but I would expect, at least via this constructor route, that object is still returned. |
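To make the quoted branch concrete, here is a small NumPy-only sketch. The two functions are hypothetical stand-ins written for this illustration; they are not pandas internals and only mirror the condition quoted above. The first keeps the float64 default, the second shows what returning object for empty input could look like:

```python
import numpy as np


def sanitize_empty_float(data, dtype=None):
    """Hypothetical stand-in mirroring the quoted branch (not pandas' real code)."""
    if len(data) == 0 and dtype is None:
        # Default to float64, matching numpy
        return np.array([], dtype=np.float64)
    return np.asarray(data, dtype=dtype)


def sanitize_empty_object(data, dtype=None):
    """Hypothetical variant: empty input with no dtype falls back to object."""
    if len(data) == 0 and dtype is None:
        return np.array([], dtype=object)
    return np.asarray(data, dtype=dtype)


# NumPy itself picks float64 for an empty list when no dtype is given.
print(np.array([]).dtype)
print(sanitize_empty_float([]).dtype)
print(sanitize_empty_object([]).dtype)
```

An explicit dtype or non-empty data bypasses the special case in both variants, so only the empty, dtype-less path differs.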
I would like to take a look at this issue |
Related: on an empty DataFrame,
due to this line: pandas/pandas/core/internals/managers.py Line 1802 in 1bf86a3
I ran into this because |
With both changes (handling the OP and the one I mentioned above), I'm seeing 35 tests fail in the expected way (i.e. there isn't some functionality we definitely don't want to change that breaks). It seems clear to me these changes would make dtypes on empty objects more consistent. The only question on my mind is if this is a bug fix or needs deprecation. |
Especially with 3.0 as the next release I would be OK treating this as a "bug fix" |
In starting to work on this, one of the things I noticed is that
I'm thinking this should also be a NumPy object array? |
cc @jbrockmendel thoughts? |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
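A minimal sketch of the comparison this report describes, assuming the two constructor calls below are representative. They are an illustration, not necessarily the reporter's exact code, and `empty_dtypes` is a hypothetical helper. The dtypes printed depend on the installed pandas version, so no output is shown:

```python
import pandas as pd


def empty_dtypes():
    """Build an empty Series and an empty DataFrame column and collect their dtypes."""
    ser = pd.Series([])
    df = pd.DataFrame({"a": []})
    return {
        "Series([])": ser.dtype,
        'DataFrame({"a": []})["a"]': df["a"].dtype,
    }


for label, dtype in empty_dtypes().items():
    print(f"{label}: {dtype}")
```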
Issue Description
There seems to be an inconsistency when creating a Series from an empty list via the Series and DataFrame constructors. The former yields object dtype, the latter returns float64 dtype.
Expected Behavior
Return object in the DataFrame constructor.
Installed Versions
/nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/_distutils_hack/init.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
commit : a671b5a
python : 3.10.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-88-generic
Version : #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.1.4
numpy : 1.24.4
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : 3.0.6
pytest : 7.4.3
hypothesis : 6.91.0
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.1
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : 2023.12.1
gcsfs : None
matplotlib : None
numba : 0.57.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 14.0.1
pyreadstat : None
pyxlsb : None
s3fs : 2023.12.1
scipy : 1.11.4
sqlalchemy : 2.0.23
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None