Skip to content

BUG: repair 'style' kwd handling in DataFrame.plot (#21003) #33821

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Sep 5, 2020
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v1.1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -703,11 +703,12 @@ I/O
Plotting
^^^^^^^^

- :func:`.plot` for line/bar now accepts color by dictonary (:issue:`8193`).
- :func:`.plot` for line/bar now accepts color by dictionary (:issue:`8193`).
- Bug in :meth:`DataFrame.plot.hist` where weights are not working for multiple columns (:issue:`33173`)
- Bug in :meth:`DataFrame.boxplot` and :meth:`DataFrame.plot.boxplot` lost color attributes of ``medianprops``, ``whiskerprops``, ``capprops`` and ``medianprops`` (:issue:`30346`)
- Bug in :meth:`DataFrame.hist` where the order of ``column`` argument was ignored (:issue:`29235`)
- Bug in :meth:`DataFrame.plot.scatter` that when adding multiple plots with different ``cmap``, colorbars alway use the first ``cmap`` (:issue:`33389`)
- Bug in :meth:`DataFrame.plot` where a marker letter in the ``style`` keyword sometimes causes a ``ValueError`` (:issue:`21003`)


Groupby/resample/rolling
Expand Down
26 changes: 16 additions & 10 deletions pandas/plotting/_matplotlib/core.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
import re
from typing import Optional
import warnings

Expand Down Expand Up @@ -40,6 +39,15 @@
)


def _color_in_style(style: str) -> bool:
"""
Check if there is a color letter in the style string.
"""
from matplotlib.colors import BASE_COLORS

return not set(BASE_COLORS).isdisjoint(style)


class MPLPlot:
"""
Base class for assembling a pandas plot using matplotlib
Expand Down Expand Up @@ -193,7 +201,6 @@ def __init__(
self._validate_color_args()

def _validate_color_args(self):
import matplotlib.colors

if (
"color" in self.kwds
Expand Down Expand Up @@ -226,13 +233,12 @@ def _validate_color_args(self):
styles = [self.style]
# need only a single match
for s in styles:
for char in s:
if char in matplotlib.colors.BASE_COLORS:
raise ValueError(
"Cannot pass 'style' string with a color symbol and "
"'color' keyword argument. Please use one or the other or "
"pass 'style' without a color symbol"
)
if _color_in_style(s):
raise ValueError(
"Cannot pass 'style' string with a color symbol and "
"'color' keyword argument. Please use one or the "
"other or pass 'style' without a color symbol"
)

def _iter_data(self, data=None, keep_index=False, fillna=None):
if data is None:
Expand Down Expand Up @@ -723,7 +729,7 @@ def _apply_style_colors(self, colors, kwds, col_num, label):
style = self.style

has_color = "color" in kwds or self.colormap is not None
nocolor_style = style is None or re.match("[a-z]+", style) is None
nocolor_style = style is None or not _color_in_style(style)
if (has_color or self.subplots) and nocolor_style:
if isinstance(colors, dict):
kwds["color"] = colors[label]
Expand Down
17 changes: 17 additions & 0 deletions pandas/tests/plotting/test_frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -204,6 +204,23 @@ def test_color_and_style_arguments(self):
with pytest.raises(ValueError):
df.plot(color=["red", "black"], style=["k-", "r--"])

def test_color_and_marker(self):
# GH21003 - code sample from 2018-05-10 is equivalent to this test
df = DataFrame(np.random.random((7, 4)))
# combining a color string and a marker letter should be allowed
df.plot(color="green", style="d") # d for diamond
Copy link
Member

@charlesdong1991 charlesdong1991 May 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we also check color and style here?
something like:

ax = df.plot(color="green", style="d")
assert ax.lines[0].get_color() == 'green'
assert ax.lines[0].get_marker() == 'd'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my reply below.


def test_color_list_and_marker(self):
# GH21003 - code sample from 2020-04-23 is equivalent to this test
df = DataFrame(np.random.random((7, 4)))
color_list = ["yellow", "red", "green", "blue"]
ax = df.plot(color=color_list, style="d")
# Before this patch was introduced, the previous line of code resulted
# in a plot where each individual line was assigned a list of colors:
# ax.lines[i].get_color() == ['yellow', 'red', 'green', 'blue']
# which resulted in a ValueError when plt.draw() was called.
assert [line.get_color() for line in ax.lines] == color_list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, also use ax.lines.get_marker() to verify that marker is set correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! This will introduce lots of duplicate code between the tests test_color_and_style_arguments, test_color_and_marker and test_color_list_and_marker. Namely df.plot and the list comprehensions. I'll see how I can refactor this.

Copy link
Contributor Author

@joooeey joooeey May 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more readable to test these four combinations of color and style arguments (test_color_and_style_arguments has two) in four separate test methods or all in one? I'd go for the former but I've noticed lots of huge test methods in frame.py so I'm not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this will work in one test?

assert [line.get_color() for line in ax.lines] == color_list
assert all(line.get_marker() == 'd' for line in ax.lines)

Copy link
Member

@charlesdong1991 charlesdong1991 May 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, i would prefer to combine both of your tests, something like this

@pytest.mark.parametrize("color, output_color_list", 
   [('green', ['green'] * 4), 
    (['yellow', 'red', 'green', 'blue'], ['yellow', 'red', 'green', 'blue'])
])
def test_color_and_marker(self, color, output_color_list):
     df = DataFrame(np.random.random((7, 4)))
     ax = df.plot(color=color, style="d")
     assert [line.get_color() for line in ax.lines] == output_color_list
     assert all(line.get_marker() == 'd' for line in ax.lines)

And regarding test_color_and_style_arguments, it's kinda out of the scope of your PR, and you can just leave it aside.
@joooeey does this sound good to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said, import pandas results in this error:

import pandas
Traceback (most recent call last):

  File "<ipython-input-1-38d4b0363d82>", line 1, in <module>
    import pandas

  File "C:\Users\Joey\pandas-joooeey\pandas\__init__.py", line 52, in <module>
    from pandas.core.api import (

  File "C:\Users\Joey\pandas-joooeey\pandas\core\api.py", line 15, in <module>
    from pandas.core.arrays import Categorical

  File "C:\Users\Joey\pandas-joooeey\pandas\core\arrays\__init__.py", line 8, in <module>
    from pandas.core.arrays.datetimes import DatetimeArray

  File "C:\Users\Joey\pandas-joooeey\pandas\core\arrays\datetimes.py", line 46, in <module>
    from pandas.core.arrays import datetimelike as dtl

  File "C:\Users\Joey\pandas-joooeey\pandas\core\arrays\datetimelike.py", line 11, in <module>
    from pandas._libs.tslibs.timestamps import (

ImportError: cannot import name 'integer_op_not_supported' from 'pandas._libs.tslibs.timestamps' (C:\Users\Joey\pandas-joooeey\pandas\_libs\tslibs\timestamps.cp37-win_amd64.pyd)

Copy link
Member

@charlesdong1991 charlesdong1991 May 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it's jupyter notebook, have you tried to shut down the notebook and restart? and then import?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is the IPython Console in Spyder. Restarting Spyder didn't help.

Copy link
Contributor Author

@joooeey joooeey May 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pytest pandas fails too with Windows fatal exception: access violation:

Anaconda prompt stack trace

[...]
pandas\tests\arrays\test_numpy.py ...................................................                                                       [ 15%]
pandas\tests\arrays\test_period.py ..................................................Windows fatal exception: access violation

Thread 0x00001f80 (most recent call first):
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\multiprocessing\pool.py", line 576 in _handle_results
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 870 in run
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 932 in _bootstrap_inner
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 890 in _bootstrap

Thread 0x00004e4c (most recent call first):
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\multiprocessing\pool.py", line 528 in _handle_tasks
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 870 in run
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 932 in _bootstrap_inner
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 890 in _bootstrap

Thread 0x00001fe8 (most recent call first):
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\multiprocessing\connection.py", line 810 in _exhaustive_wait
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\multiprocessing\connection.py", line 878 in wait
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\multiprocessing\pool.py", line 499 in _wait_for_updates
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\multiprocessing\pool.py", line 519 in _handle_workers
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 870 in run
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 932 in _bootstrap_inner
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 890 in _bootstrap

Thread 0x000028a8 (most recent call first):
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\multiprocessing\pool.py", line 114 in worker
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 870 in run
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 932 in _bootstrap_inner
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 890 in _bootstrap

Thread 0x000047c0 (most recent call first):
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\multiprocessing\pool.py", line 114 in worker
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 870 in run
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 932 in _bootstrap_inner
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 890 in _bootstrap

Thread 0x00004cb8 (most recent call first):
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\multiprocessing\pool.py", line 114 in worker
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 870 in run
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 932 in _bootstrap_inner
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 890 in _bootstrap

Thread 0x0000559c (most recent call first):
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\multiprocessing\pool.py", line 114 in worker
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 870 in run
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 932 in _bootstrap_inner
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\threading.py", line 890 in _bootstrap

Current thread 0x000014a0 (most recent call first):
  File "C:\Users\Joey\pandas-joooeey\pandas\tests\arrays\test_period.py", line 413 in test_arrow_table_roundtrip
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\python.py", line 184 in pytest_pyfunc_call
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\callers.py", line 187 in _multicall
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\manager.py", line 84 in <lambda>
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\manager.py", line 93 in _hookexec
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\hooks.py", line 286 in __call__
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\python.py", line 1479 in runtest
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\runner.py", line 135 in pytest_runtest_call
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\callers.py", line 187 in _multicall
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\manager.py", line 84 in <lambda>
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\manager.py", line 93 in _hookexec
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\hooks.py", line 286 in __call__
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\runner.py", line 217 in <lambda>
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\runner.py", line 244 in from_call
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\runner.py", line 216 in call_runtest_hook
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\runner.py", line 186 in call_and_report
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\runner.py", line 100 in runtestprotocol
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\runner.py", line 85 in pytest_runtest_protocol
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\callers.py", line 187 in _multicall
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\manager.py", line 84 in <lambda>
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\manager.py", line 93 in _hookexec
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\hooks.py", line 286 in __call__
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\main.py", line 272 in pytest_runtestloop
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\callers.py", line 187 in _multicall
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\manager.py", line 84 in <lambda>
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\manager.py", line 93 in _hookexec
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\hooks.py", line 286 in __call__
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\main.py", line 247 in _main
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\main.py", line 191 in wrap_session
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\main.py", line 240 in pytest_cmdline_main
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\callers.py", line 187 in _multicall
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\manager.py", line 84 in <lambda>
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\manager.py", line 93 in _hookexec
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\pluggy\hooks.py", line 286 in __call__
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\lib\site-packages\_pytest\config\__init__.py", line 124 in main
  File "C:\ProgramData\Anaconda3\envs\pandas-dev\Scripts\pytest-script.py", line 10 in <module>

(pandas-dev) c:\Users\Joey\pandas-joooeey>

I've seen this error before, mentioned on April 27 in this thread but I don't remember how I resolved it.

Copy link
Member

@charlesdong1991 charlesdong1991 May 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emm, have no idea, i really think

python setup.py build_ext --inplace -j 4
python -m pip install -e . --no-build-isolation --no-use-pep517

would solve the issue if you already merge the master

@WillAyd could you help with the environment issue?


def test_nonnumeric_exclude(self):
df = DataFrame({"A": ["x", "y", "z"], "B": [1, 2, 3]})
ax = df.plot()
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/plotting/test_series.py
Original file line number Diff line number Diff line change
Expand Up @@ -933,4 +933,4 @@ def test_plot_no_numeric_data(self):
def test_style_single_ok(self):
s = pd.Series([1, 2])
ax = s.plot(style="s", color="C3")
assert ax.lines[0].get_color() == ["C3"]
assert ax.lines[0].get_color() == "C3"