Transposing dataframe loses dtype and ExtensionArray #22120

Closed
andrewgsavage opened this issue Jul 29, 2018 · 2 comments
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.

@andrewgsavage

Code Sample, a copy-pastable example if possible

import pandas as pd
from cyberpandas import IPArray

# Two columns that share the same extension dtype ("ip")
df = pd.DataFrame({
    "address": IPArray(['192.168.1.1', '192.168.1.10']),
    "address1": IPArray(['192.168.1.1', '192.168.1.10']),
})
print(df.info())
print(df.T.T.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
address     2 non-null ip
address1    2 non-null ip
dtypes: ip(2)
memory usage: 144.0 bytes
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
address     2 non-null object
address1    2 non-null object
dtypes: object(2)
memory usage: 112.0+ bytes
None


print(type(df.address.values))
print(type(df.T.T.address.values))
<class 'cyberpandas.ip_array.IPArray'>
<class 'numpy.ndarray'>

# Second example: the built-in category dtype
df = pd.DataFrame({"A": ["a", "b", "c", "a"]})

df["B"] = df["A"].astype('category')
print(df.info())
print(df.T.T.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
A    4 non-null object
B    4 non-null category
dtypes: category(1), object(1)
memory usage: 220.0+ bytes
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
A    4 non-null object
B    4 non-null object
dtypes: object(2)
memory usage: 144.0+ bytes
None

Problem description

When transposing a DataFrame that contains a non-standard dtype, the dtype is lost and the ExtensionArray becomes a NumPy ndarray. I believe this occurs because the ExtensionArray is converted to a NumPy array while the DataFrame is transposed, and that conversion does not keep the dtype or the ExtensionArray.

Expected Output

print(df.info()) to be the same as
print(df.T.T.info())
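
The same expectation can be checked programmatically instead of comparing the info() output by eye. The following is a minimal sketch; dtypes_survive_roundtrip is a hypothetical helper name, not part of pandas, and the two frames are illustrative stand-ins that use only built-in dtypes.

```python
import pandas as pd


def dtypes_survive_roundtrip(df):
    """Hypothetical helper: True if df.T.T has the same column dtypes as df."""
    return list(df.dtypes) == list(df.T.T.dtypes)


# All columns share one NumPy dtype, so the double transpose keeps it.
floats = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
floats_ok = dtypes_survive_roundtrip(floats)

# Mixed dtypes are upcast to object by the first transpose.
mixed = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})
mixed_ok = dtypes_survive_roundtrip(mixed)

print(floats_ok, mixed_ok)
```

Calling the helper on a frame backed by an ExtensionArray would show whether the installed pandas version keeps the extension dtype through the round trip.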

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0.dev0+369.gbb451e89f
pytest: 3.6.3
pip: 10.0.1
setuptools: 39.2.0
Cython: 0.28.4
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.6
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.4
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.5
lxml: 4.2.3
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.10
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@andrewgsavage changed the title from "Transposing dataframe loses" to "Transposing dataframe loses dtype and ExtensionArray" on Jul 29, 2018
@TomAugspurger
Contributor

In general, transpose will result in object dtype columns, since most dataframes have a mixture of dtypes. We could have a special case for when all the columns are the same dtype, and essentially do a concat on the rows.
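
The special case described above could be sketched roughly as follows. This is only an illustration of the idea, not the pandas implementation: transpose_same_dtype is a hypothetical helper, it assumes the index labels are unique, and it uses the built-in nullable Int64 dtype as a stand-in for any extension dtype.

```python
import pandas as pd
from pandas.api.types import is_extension_array_dtype


def transpose_same_dtype(df):
    """Hypothetical helper: transpose df, keeping a shared extension dtype.

    Falls back to the ordinary df.T unless every column has the same
    extension dtype. Assumes the index labels of df are unique.
    """
    if df.shape[1] == 0:
        return df.T
    first = df.dtypes.iloc[0]
    if not is_extension_array_dtype(first):
        return df.T
    if not all(first == dt for dt in df.dtypes):
        return df.T

    # Each original row becomes one column, rebuilt with the shared dtype.
    new_columns = {}
    for pos, label in enumerate(df.index):
        row_values = [df.iloc[pos, col] for col in range(df.shape[1])]
        new_columns[label] = pd.array(row_values, dtype=first)
    return pd.DataFrame(new_columns, index=df.columns)


ints = pd.DataFrame({
    "a": pd.array([1, None, 3], dtype="Int64"),
    "b": pd.array([4, 5, None], dtype="Int64"),
})
transposed = transpose_same_dtype(ints)
print(transposed.dtypes)
```

Rebuilding each column element by element is slow for large frames; the sketch favours clarity over speed. A frame with mixed dtypes takes the fallback path and still ends up with object columns.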

@TomAugspurger TomAugspurger added the ExtensionArray Extending pandas with custom dtypes or arrays. label Jul 30, 2018
@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Jul 30, 2018
@TomAugspurger
Contributor

Closed by #30091.
