Here's another example that shows it's actually the order of the merge that triggers the error (apparently because you can cast from interval to categorical but not vice versa at the moment):
import numpy as np
import pandas as pd

intvl_arr = pd.Series([pd.Interval(0, 1), pd.Interval(1, 2)], dtype="interval")
cat_arr = intvl_arr.astype("category")
df1 = pd.DataFrame({"a": intvl_arr})
df2 = pd.DataFrame({"a": cat_arr})

pd.merge(df1, df2, how="inner", on="a")
# No error

pd.merge(df2, df1, how="inner", on="a")
# TypeError: data type not understood
This issue is a bit larger than Categorical[Interval] and actually applies to all Categorical[ExtensionDtype]; e.g., the same error will occur for periods.
The offending line is the return statement of Categorical.astype where we use the numpy constructor, which doesn't understand pandas extension dtypes:
The workaround would be to add a check for extension dtypes and return a pd.array instead, which should be able to generically handle any pandas extension dtype. Something like:
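A minimal sketch of that idea, written as a standalone function rather than as a patch to pandas itself. The helper name `cast_values` is hypothetical and is not part of the pandas API; it only illustrates branching on whether the target dtype is an extension dtype before choosing between `pd.array` and the numpy constructor.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype


def cast_values(values, dtype):
    """Hypothetical helper (not pandas internals): build an array of `dtype`."""
    if is_extension_array_dtype(dtype):
        # pd.array dispatches to the matching ExtensionArray constructor,
        # so it can handle interval, period, and other extension dtypes.
        return pd.array(values, dtype=dtype)
    # Plain numpy dtypes can still go through the numpy constructor.
    return np.array(values, dtype=dtype)


intvl_arr = pd.Series([pd.Interval(0, 1), pd.Interval(1, 2)], dtype="interval")
cat_arr = intvl_arr.astype("category")

# Cast the categorical's values back to the interval dtype.
result = cast_values(list(cat_arr), intvl_arr.dtype)
print(type(result).__name__)
```

Whether the real fix belongs inside Categorical.astype or elsewhere is a design decision for the maintainers; the sketch only shows that `pd.array` accepts the extension dtype that the numpy constructor rejects.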
Failure on merging on Categorical columns which include intervals.
For instance, the following raises
TypeError: data type not understood
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.1
numpy : 1.16.5
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : 0.29.13
pytest : 5.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.3.3
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.3
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : 3.5.2
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None