Skip to content

BUG: Fixes plotting with nullable integers (#32073) #38014

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 7, 2020

Conversation

rkc007
Copy link
Contributor

@rkc007 rkc007 commented Nov 23, 2020

@rkc007
Copy link
Contributor Author

rkc007 commented Nov 23, 2020

This is the first time I am contributing to pandas open source project. Please let me know if I can improve anything on this commit.

@@ -681,6 +681,7 @@ Plotting
indexed by a :class:`TimedeltaIndex` with a fixed frequency when x-axis lower limit was greater than upper limit (:issue:`37454`)
- Bug in :meth:`DataFrameGroupBy.boxplot` when ``subplots=False``, a KeyError would raise (:issue:`16748`)
- Bug in :meth:`DataFrame.plot` and :meth:`Series.plot` was overwriting matplotlib's shared y axes behaviour when no sharey parameter was passed (:issue:`37942`)
- Bug in :meth:`DataFrame.plot` to handle nullable integers (:issue:`32073`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to be more specific:

Bug in :meth:`DataFrame.plot` was raising a ``TypeError`` with ``ExtensionDtype`` columns (:issue:`32073`)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @arw2019 for your comments. I will add these changes to the next commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arw2019 I added the changes to the latest commit. Can you please review it again ?

Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rkc007 for the PR!

Some comments. Also take a look at the CI (we standardize the use of pd.DataFrame/DataFrame within a file so having both errors)

Let us know if you have any questions!

numeric_data = numeric_data.copy()
for col in numeric_data:
numeric_data[col] = np.asarray(numeric_data[col])
def convert_to_ndarray(data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could use pd.to_numeric here I think. Illustrative example:

In [8]: s = pd.Series([1, 2, 3, pd.NA], dtype="Int64")

In [9]: pd.to_numeric(s.to_numpy(), downcast="integer")
Out[9]: array([ 1.,  2.,  3., nan])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @arw2019 for the example. I think I didn't get your point here. Can you please explain it a bit more?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no don't use pd.to_numeric we already know the type and its an internal function

)

_check_plot_works(df.plot, x="A", y="B")
_check_plot_works(df[["A", "B"]].astype("Int64").plot, x="A", y="B")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd rather to specify the dtype in the dataframe construction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Thank you for the suggestion. I will rectify this in the next commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arw2019 I added the changes to the latest commit. Can you please review it again ?

Copy link
Member

@ivanovmg ivanovmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rkc007, please see my comments below.

numeric_data = numeric_data.copy()
for col in numeric_data:
numeric_data[col] = np.asarray(numeric_data[col])
def convert_to_ndarray(data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving convert_to_ndarray out of the scope of the function _compute_plot_data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes pls do this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I added the changes to the latest commit. Can you please review it again ?

Comment on lines 156 to 157
"A": [1, 2, 3, 4, 5],
"B": [7, 5, np.nan, 3, 2],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about having a nullable value in column "A" (x-axis)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivanovmg I added the changes to the latest commit. Can you please review it again ?

@jreback jreback added NA - MaskedArrays Related to pd.NA and nullable extension arrays Visualization plotting labels Nov 26, 2020
numeric_data = numeric_data.copy()
for col in numeric_data:
numeric_data[col] = np.asarray(numeric_data[col])
def convert_to_ndarray(data):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no don't use pd.to_numeric we already know the type and its an internal function

numeric_data = numeric_data.copy()
for col in numeric_data:
numeric_data[col] = np.asarray(numeric_data[col])
def convert_to_ndarray(data):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes pls do this

numeric_data[col] = np.asarray(numeric_data[col])
def convert_to_ndarray(data):
# GH32073: cast to float if values contain nulled integers
if is_integer_dtype(data.dtype) and is_extension_array_dtype(data.dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually we want to also catch / test nullable floats as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I fixed this in the latest commit.

@@ -383,6 +386,21 @@ def result(self):
else:
return self.axes[0]

def _convert_to_ndarray(self, data):
# data = self.data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please delete this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out @ivanovmg.
Deleted the comment in the latest commit.

Copy link
Member

@ivanovmg ivanovmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nitpick about the unnecessary comment.
Otherwise looks good to me.

@jreback jreback added this to the 1.2 milestone Dec 4, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. can you merge master and ping on green.

@rkc007
Copy link
Contributor Author

rkc007 commented Dec 7, 2020

@jreback the branch has the latest code and all the checks passed.

@jreback jreback merged commit 0aa598a into pandas-dev:master Dec 7, 2020
@jreback
Copy link
Contributor

jreback commented Dec 7, 2020

thanks @rkc007

@rkc007 rkc007 deleted the GH32073 branch December 8, 2020 02:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NA - MaskedArrays Related to pd.NA and nullable extension arrays Visualization plotting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Plotting Int64 columns with nulled integers (NAType) fails
4 participants