-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fixes plotting with nullable integers (#32073) #38014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is the first time I am contributing to pandas open source project. Please let me know if I can improve anything on this commit. |
doc/source/whatsnew/v1.2.0.rst
Outdated
@@ -681,6 +681,7 @@ Plotting | |||
indexed by a :class:`TimedeltaIndex` with a fixed frequency when x-axis lower limit was greater than upper limit (:issue:`37454`) | |||
- Bug in :meth:`DataFrameGroupBy.boxplot` when ``subplots=False``, a KeyError would raise (:issue:`16748`) | |||
- Bug in :meth:`DataFrame.plot` and :meth:`Series.plot` was overwriting matplotlib's shared y axes behaviour when no sharey parameter was passed (:issue:`37942`) | |||
- Bug in :meth:`DataFrame.plot` to handle nullable integers (:issue:`32073`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to be more specific:
Bug in :meth:`DataFrame.plot` was raising a ``TypeError`` with ``ExtensionDtype`` columns (:issue:`32073`)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @arw2019 for your comments. I will add these changes to the next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arw2019 I added the changes to the latest commit. Can you please review it again ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rkc007 for the PR!
Some comments. Also take a look at the CI (we standardize the use of pd.DataFrame
/DataFrame
within a file so having both errors)
Let us know if you have any questions!
pandas/plotting/_matplotlib/core.py
Outdated
numeric_data = numeric_data.copy() | ||
for col in numeric_data: | ||
numeric_data[col] = np.asarray(numeric_data[col]) | ||
def convert_to_ndarray(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could use pd.to_numeric
here I think. Illustrative example:
In [8]: s = pd.Series([1, 2, 3, pd.NA], dtype="Int64")
In [9]: pd.to_numeric(s.to_numpy(), downcast="integer")
Out[9]: array([ 1., 2., 3., nan])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @arw2019 for the example. I think I didn't get your point here. Can you please explain it a bit more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no don't use pd.to_numeric
we already know the type and its an internal function
) | ||
|
||
_check_plot_works(df.plot, x="A", y="B") | ||
_check_plot_works(df[["A", "B"]].astype("Int64").plot, x="A", y="B") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd rather to specify the dtype in the dataframe construction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Thank you for the suggestion. I will rectify this in the next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arw2019 I added the changes to the latest commit. Can you please review it again ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rkc007, please see my comments below.
pandas/plotting/_matplotlib/core.py
Outdated
numeric_data = numeric_data.copy() | ||
for col in numeric_data: | ||
numeric_data[col] = np.asarray(numeric_data[col]) | ||
def convert_to_ndarray(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving convert_to_ndarray
out of the scope of the function _compute_plot_data
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes pls do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I added the changes to the latest commit. Can you please review it again ?
"A": [1, 2, 3, 4, 5], | ||
"B": [7, 5, np.nan, 3, 2], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about having a nullable value in column "A" (x-axis)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivanovmg I added the changes to the latest commit. Can you please review it again ?
pandas/plotting/_matplotlib/core.py
Outdated
numeric_data = numeric_data.copy() | ||
for col in numeric_data: | ||
numeric_data[col] = np.asarray(numeric_data[col]) | ||
def convert_to_ndarray(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no don't use pd.to_numeric
we already know the type and its an internal function
pandas/plotting/_matplotlib/core.py
Outdated
numeric_data = numeric_data.copy() | ||
for col in numeric_data: | ||
numeric_data[col] = np.asarray(numeric_data[col]) | ||
def convert_to_ndarray(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes pls do this
pandas/plotting/_matplotlib/core.py
Outdated
numeric_data[col] = np.asarray(numeric_data[col]) | ||
def convert_to_ndarray(data): | ||
# GH32073: cast to float if values contain nulled integers | ||
if is_integer_dtype(data.dtype) and is_extension_array_dtype(data.dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually we want to also catch / test nullable floats as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I fixed this in the latest commit.
pandas/plotting/_matplotlib/core.py
Outdated
@@ -383,6 +386,21 @@ def result(self): | |||
else: | |||
return self.axes[0] | |||
|
|||
def _convert_to_ndarray(self, data): | |||
# data = self.data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please delete this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out @ivanovmg.
Deleted the comment in the latest commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nitpick about the unnecessary comment.
Otherwise looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. can you merge master and ping on green.
@jreback the branch has the latest code and all the checks passed. |
thanks @rkc007 |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff