Plotting Int64 columns with nulled integers (NAType) fails #32073
Comments
Can you post the full traceback?
Gladly:
We need to convert to floats with NaNs before passing the data to matplotlib, I suppose.
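A minimal sketch of the conversion suggested above, assuming `Series.to_numpy` with its `na_value` argument is an acceptable way to do it. This is not the fix that pandas adopted, only an illustration; the series `s` and its values are made up for the example.

```python
import numpy as np
import pandas as pd

# Illustrative nullable-integer series with one missing value (pd.NA).
s = pd.Series([1, 2, None, 4], dtype="Int64")

# Convert to a plain float64 NumPy array, mapping pd.NA to np.nan,
# which is the missing-value marker matplotlib already understands.
values = s.to_numpy(dtype="float64", na_value=np.nan)

print(values.dtype)
print(values)
```

The resulting array can be handed to a plotting call in place of the original series.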
Hi! Can I make an attempt on this one? I'm looking for issues that are useful to the community and that help me better understand how pandas works in depth :)
@jeandersonbc yes, that's welcome!
take
@AnnaDaglis someone else (@jeandersonbc) already commented that he would like to look at it, so I would first give him some time before taking this issue
Thanks @jorisvandenbossche, I haven't managed to find time to work on the issue since I last posted, which is why I didn't reply earlier.
Yes, I think so. But the check might need to be done lower in the stack, as when calling
take
So, I approached the problem by checking the numerical data in _compute_plot_data in the matplotlib backend. All tests pass, but I wonder if more tests should be added (e.g., in "pandas/tests/plotting/test_frame.py"). Any thoughts? Thanks in advance!
Hi @AnnaDaglis - looks like the linked PR went stale (unfortunately), are you still interested in working on this?
take
I'm no longer working on this, as my minimal fix wasn't accepted and this might need a more structural solution for 3rd party lib operations on nullable types.
take
It's been 2 months since anyone worked on this. I am interested in working on this issue.
take
Awesome! Let us know if you want/need help
Hi, I have a comment on this issue. It happened to me in v1.2.4, and it seems to have been fixed by now (v1.4.3), but I have not found where or when the fix happened. I believe it's connected and could save time for someone who encounters this error while plotting with matplotlib: ValueError: values must be a 1D array
If one runs the following code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
'x' : [1,2,3,4,5],
'y' : [1.,2.,3.,4.1,5]
})
print(df.dtypes)
# x int64
# y float64
# dtype: object
# plot
plt.plot(df["x"],df["y"])
# convert types
df = df.convert_dtypes()
print(df.dtypes)
# x Int64
# y Float64
# dtype: object
# plot
plt.plot(df["x"],df["y"])
i.e. if one converts dtypes to pandas dtypes, plotting with matplotlib suddenly fails. The code above draws the first plot, but once the types are converted, the second call fails. Notice that it doesn't matter if the variable is
The full output is below:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_values(self, indexer)
936 try:
--> 937 return self._constructor(self._mgr.get_slice(indexer)).__finalize__(self)
938 except ValueError:
~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py in get_slice(self, slobj, axis)
~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/blocks.py in _slice(self, slicer)
~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __getitem__(self, item)
~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/integer.py in __init__(self, values, mask, copy)
~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __init__(self, values, mask, copy)
ValueError: values must be a 1D array
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
~/anaconda3/lib/python3.8/site-packages/matplotlib/pyplot.py in plot(scalex, scaley, data, *args, **kwargs)
~/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_axes.py in plot(self, scalex, scaley, data, *args, **kwargs)
~/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_base.py in __call__(self, data, *args, **kwargs)
~/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs, return_kwargs)
~/anaconda3/lib/python3.8/site-packages/matplotlib/cbook/__init__.py in _check_1d(x)
~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_with(self, key)
~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_values_tuple(self, key)
~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_values(self, indexer)
~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __getitem__(self, item)
~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/integer.py in __init__(self, values, mask, copy)
~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __init__(self, values, mask, copy)
ValueError: values must be a 1D array
In case it is relevant, this was my installation at that time:
INSTALLED VERSIONS
commit : 2cb9652
pandas : 1.2.4
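For anyone stuck on an affected version, one possible workaround is to turn the nullable columns back into plain float64 before handing them to matplotlib. The sketch below assumes that losing the nullable dtypes is acceptable for plotting purposes. The helper name `to_plottable` is hypothetical and is not part of pandas.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype, is_numeric_dtype


def to_plottable(df):
    """Hypothetical helper: return a copy of df in which numeric
    extension-dtype columns (e.g. Int64, Float64) are float64 with NaN."""
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        if is_extension_array_dtype(dtype) and is_numeric_dtype(dtype):
            # pd.NA becomes np.nan, which matplotlib can handle.
            out[col] = out[col].to_numpy(dtype="float64", na_value=np.nan)
    return out


# Same data as in the comment above, after convert_dtypes().
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [1.0, 2.0, 3.0, 4.1, 5.0],
}).convert_dtypes()

plain = to_plottable(df)
print(plain.dtypes)
```

With this in place, `plt.plot(plain["x"], plain["y"])` receives ordinary NumPy-backed columns, as in the first (working) plot of the example.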
Code Sample, a copy-pastable example if possible
Problem description
The first plotting command works, the second throws the error message
Expected Output
NAType should be treated the same way as numpy nan in plotting. Maybe transformed on the fly?
(I'm unsure if this is a pandas, a numpy, or a matplotlib issue, I'm starting here)
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200209
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : 2.4.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fastparquet : 0.3.3
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.2.7
numba : 0.48.0