Series / DataFrame constructors inconsistent with data=None and dtype #24385

TomAugspurger · 2018-12-21T17:22:35Z

The Series constructed below should (I think) be a special case of the DataFrame example, but the two results differ.

```python
In [4]: pd.DataFrame(None, index=[1, 2, 3], columns=['a'], dtype=int)
Out[4]:
    a
1 NaN
2 NaN
3 NaN

In [5]: pd.Series(None, index=[1, 2, 3], dtype=int)
Out[5]:
1    0
2    0
3    0
dtype: int64
```

I don't know which makes more sense.
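One way to read the two outputs is as two different precedence rules for an all-missing column. The following is a pure-Python model, not pandas internals: `fill_missing`, its `prefer` argument, and the string dtype names are hypothetical. Under "data wins" (matching Out[4]), the values stay missing, so a dtype that cannot represent NaN, such as int64, is not kept. The model falls back to float64, which is an assumption because Out[4] does not show the resulting dtype. Under "dtype wins" (matching Out[5]), int64 is kept and 0 stands in for the missing values.

```python
import math

# Hypothetical model of the two precedence rules behind Out[4] and Out[5].
# Dtypes are plain strings here, not real pandas/NumPy dtype objects.
NA_CAPABLE = {"float64", "object"}  # dtypes assumed able to store NaN


def fill_missing(n, dtype, prefer):
    """Return (dtype, values) for an all-missing column of length n."""
    if prefer == "data":
        # "data wins": data=None means "all missing", so keep NaN and
        # drop a requested dtype that cannot hold it (float64 is this model's choice).
        result_dtype = dtype if dtype in NA_CAPABLE else "float64"
        return result_dtype, [math.nan] * n
    if prefer == "dtype":
        # "dtype wins": keep the requested dtype and use a fill value
        # it can represent (0 for integers, NaN otherwise).
        fill = 0 if dtype == "int64" else math.nan
        return dtype, [fill] * n
    raise ValueError(f"unknown prefer={prefer!r}")


print(fill_missing(3, "int64", prefer="data"))    # DataFrame-like: NaNs, int64 not kept
print(fill_missing(3, "int64", prefer="dtype"))   # Series-like: int64 zeros
print(fill_missing(2, "object", prefer="data"))   # object is kept: it can hold NaN
```

Under the "data wins" rule, a requested object dtype survives because it can hold NaN, which matches the `dtype=object` observation made further down in the thread.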

WillAyd · 2018-12-21T17:27:18Z

Kind of nuanced, but I'd side with the DF constructor here.

TomAugspurger · 2018-12-21T17:29:47Z

Interesting, I was going to go with the Series one, since dtype=int is an explicit request from the user, whereas data is implicit. Though neither is especially intuitive.

Perhaps raising is the best action here :) We could still allow the implicit reindexing if we force the user to write Series(0, index=[1, 2, 3], dtype=int).
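A minimal sketch of the "raise" option, assuming a hypothetical `make_series_values(data, index, dtype)` helper rather than the real `Series.__init__`, with dtypes again modeled as strings. An all-missing request with an integer dtype is rejected, while an explicit scalar such as 0 is still broadcast to the index, mirroring `Series(0, index=[1, 2, 3], dtype=int)`.

```python
import math

INT_DTYPES = {"int8", "int16", "int32", "int64"}  # hypothetical dtype names


def make_series_values(data, index, dtype=None):
    """Hypothetical constructor core implementing the 'raise' proposal."""
    if data is None:
        if dtype in INT_DTYPES:
            # Ambiguous: integers were requested but no values were given,
            # and an integer array cannot hold NaN, so refuse to guess.
            raise ValueError(
                f"cannot create an all-missing {dtype} Series; "
                "pass an explicit fill value, e.g. Series(0, index=..., dtype=int)"
            )
        data = math.nan  # all-missing is fine for NaN-capable dtypes
    if not isinstance(data, (list, tuple)):
        # Scalar data is broadcast to the length of the index (implicit reindexing).
        return [data] * len(index)
    if len(data) != len(index):
        raise ValueError("length of data does not match length of index")
    return list(data)


print(make_series_values(0, [1, 2, 3], dtype="int64"))  # explicit fill value is accepted
try:
    make_series_values(None, [1, 2, 3], dtype="int64")
except ValueError as exc:
    print("raised:", exc)
```

The design choice is that the user, not the constructor, decides what an "empty" integer value means.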

WillAyd · 2018-12-21T17:38:06Z

Yeah, raising is a good option if there's a general way of doing it. Either way is ambiguous.

TomAugspurger · 2018-12-21T17:51:35Z

Also, apparently DataFrame(None, index=[1, 2], columns=['a'], dtype=object) does produce an object column, so dtype= is only sometimes ignored.


When passing a dict and `columns=` to DataFrame, we previously passed the dict of {column: array} to the Series constructor. This eventually hit `construct_1d_object_array_from_listlike`[1]. For extension arrays, this ends up calling `ExtensionArray.__iter__`, iterating over the elements of the ExtensionArray, which is prohibitively slow.

---

```python
import pandas as pd
import numpy as np

a = pd.Series(np.arange(1000))
d = {i: a for i in range(30)}

%timeit df = pd.DataFrame(d, columns=list(range(len(d))))
```

before

```
4.06 ms ± 53.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

after

```
4.06 ms ± 53.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

With sparse-valued Series instead, the problem is exacerbated (note the smaller and fewer series).

```python
a = pd.Series(np.arange(1000), dtype="Sparse[int]")
d = {i: a for i in range(50)}

%timeit df = pd.DataFrame(d, columns=list(range(len(d))))
```

before

```
213 ms ± 7.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

after

```
4.41 ms ± 134 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

---

We try to properly handle all the edge cases that we were papering over earlier by just passing the `data` to Series. We fix a bug or two along the way, but don't change any *tested* behavior, even if it looks fishy (e.g. pandas-dev#24385).

[1]: pandas-dev#24368 (comment)

Closes pandas-dev#24368
Closes pandas-dev#24386
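A simplified, pure-Python sketch of the idea behind the change; `select_columns` and its arguments are hypothetical and do not reflect the actual pandas implementation. Columns present in the dict are passed through as the same array objects, so no element-by-element iteration (the slow `ExtensionArray.__iter__` path) is needed. Only columns missing from the dict are materialized, here as NaN placeholders.

```python
import math


def select_columns(data, columns, length):
    """Pick `columns` out of a {name: array} dict without touching elements.

    Hypothetical sketch: present columns are reused as-is (no per-element work);
    absent columns become all-missing lists of the given length.
    """
    arrays = []
    for name in columns:
        if name in data:
            arrays.append(data[name])  # reuse the array object itself, no copy
        else:
            arrays.append([math.nan] * length)  # placeholder for a missing column
    return arrays


col = list(range(1000))
d = {i: col for i in range(30)}
out = select_columns(d, list(range(31)), length=1000)  # column 30 is not in d
print(len(out), out[0] is col)
```

How the missing columns should be typed, especially with `dtype=int`, is exactly the open question in this issue, so the NaN placeholder is only one possible choice.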

TomAugspurger added the Dtype Conversions label (Unexpected or buggy dtype conversions) Dec 21, 2018

TomAugspurger mentioned this issue Dec 21, 2018

DataFrame constructor ignores integer dtype when dict-data and non-overlapping columns #24386

Closed

TomAugspurger mentioned this issue Dec 21, 2018

PERF: DataFrame dict constructor with columns #24387

Closed

jbrockmendel added the Constructors label (Series/DataFrame/Index/pd.array Constructors) Oct 17, 2019

mroeschke added the Bug label Jun 28, 2020

mroeschke added the Enhancement and Error Reporting (Incorrect or improved errors from pandas) labels and removed the Bug label Jun 25, 2021
