DataFrame should allow to hand in dtypes for every column #26305

dwt · 2019-05-07T09:08:02Z

Currently it is quite hard to create empty data frames that have specific columns and dtypes. I run into this regularly, which is why I am filing this feature request.

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
from datetime import datetime

# state of the art / workaround: build an empty structured NumPy array first
empty_result = pd.DataFrame(np.empty((0,),
    dtype=[
        ('time', datetime),
        ('ability', float),
        ('error', float),
        ('index', int),
        ('index_error', float)
    ]))

# how it should be (proposed API, not supported by the constructor today)
empty_result = pd.DataFrame(
    dtype=[
        ('time', datetime),
        ('ability', float),
        ('error', float),
        ('index', int),
        ('index_error', float)
    ])

Problem description

It is very unintuitive to fully specify a data frame when it is empty. I need this regularly when writing APIs that deal with empty incoming data: the empty input would make other pandas operations fail, so a correctly typed empty data frame has to be constructed and returned instead.

Expected Output

An empty data frame should be constructed.
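To make the expected result concrete, here is a minimal sketch of one way to get such a frame with the existing API, by building it from empty, typed `pd.Series` objects. The column names come from the code sample above; the `schema` dict and the dtype strings (for example `datetime64[ns]` for `time`) are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical schema: column names from the issue, dtype strings assumed.
schema = {
    "time": "datetime64[ns]",
    "ability": "float64",
    "error": "float64",
    "index": "int64",
    "index_error": "float64",
}

# One empty Series per column, each created with its target dtype.
empty_result = pd.DataFrame(
    {name: pd.Series(dtype=dtype) for name, dtype in schema.items()}
)

print(len(empty_result))    # number of rows
print(empty_result.dtypes)  # one dtype per column
```

The frame has zero rows but keeps the column order and per-column dtypes, which is what downstream pandas operations need.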

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 18.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.utf-8
LOCALE: None.None

pandas: 0.24.2
pytest: 3.4.2
pip: 19.1
setuptools: 40.6.3
Cython: None
numpy: 1.14.6
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 5.8.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.9
feather: None
matplotlib: 2.2.4
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: None
xlsxwriter: 1.1.5
lxml.etree: 4.3.2
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.2.18
pymysql: 0.9.3
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


jreback · 2019-05-07T11:05:43Z

duplicate of #9133 and #4464

We already accept this in `.astype()`, so we would take it for the constructor as well. pandas is updated by pull requests from the community; you are welcome to submit one.
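For reference, the `.astype()` route mentioned here could look roughly like the sketch below: create the empty frame with the desired columns, then cast with a column-to-dtype mapping. The `schema` name and the specific dtype strings are assumptions chosen to mirror the code sample in the report, not something the issue prescribes.

```python
import pandas as pd

# Hypothetical column -> dtype mapping; dtype strings are assumed.
schema = {
    "time": "datetime64[ns]",
    "ability": "float64",
    "error": "float64",
    "index": "int64",
    "index_error": "float64",
}

# Empty frame with the right columns (all object dtype at first),
# then cast every column via DataFrame.astype with a dict.
empty_result = pd.DataFrame(columns=list(schema)).astype(schema)

print(empty_result.dtypes)
```

Keeping the mapping in one dict means the same schema can be reused both for the empty case and for casting non-empty results.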

jreback closed this as completed May 7, 2019

jreback added the Dtype Conversions and Duplicate Report labels May 7, 2019

jreback added this to the No action milestone May 7, 2019
