bug: plot two lines, unordered xlabels with type str #18687

boeddeker opened this issue Dec 8, 2017 · 8 comments
@boeddeker

Code Sample, a copy-pastable example if possible

%matplotlib inline

import pandas as pd
df1 = pd.DataFrame([{'x': '1', 'y': 1}, {'x': '2', 'y': 2}])
df2 = pd.DataFrame([{'x': '2', 'y': 3}, {'x': '1', 'y': 4}])

print(df1)
#    x  y
# 0  1  1
# 1  2  2
print(df2)
#    x  y
# 0  2  3
# 1  1  4

# Wrong plot
ax = None
ax = df1.plot('x', 'y', ax=ax)
ax = df2.plot('x', 'y', ax=ax)

# Correct plot
ax = None
ax = df1.sort_values(by=['x']).plot('x', 'y', ax=ax)
ax = df2.sort_values(by=['x']).plot('x', 'y', ax=ax)

Problem description

I want to plot multiple dataframes in one graph. The x values are strings. The x value order in both dataframes is different.

The first plot draws the line

  • x = ['1', '2'] and y = [1, 2]

The second plot draws the line

  • x = ['2', '1'] and y = [3, 4]

Since the second plot overwrites the xticks, the first line is now shown as x = ['2', '1'] and y = [1, 2].

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-38-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8

pandas: 0.21.0
pytest: 3.3.0
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

TomAugspurger added this to the Next Major Release milestone on Dec 8, 2017
@TomAugspurger
Contributor

Strange, I'm not sure what's going on. You're welcome to take a look in pandas/plotting/_core.py if you're interested :)

@boeddeker
Author

Thanks for pointing me to the file.
I already took a look with PyCharm, but I could not locate the bug.

@Licht-T
Contributor

Licht-T commented Dec 9, 2017

I am working on this.
The current implementation ignores the order of a string Index:
https://github.com/pandas-dev/pandas/blob/master/pandas/plotting/_core.py#L578

For example, this produces the same result:

%matplotlib inline
import pandas as pd

df1 = pd.DataFrame([{'x': 'a', 'y': 1}, {'x': 'b', 'y': 2}])
df2 = pd.DataFrame([{'x': 'b', 'y': 3}, {'x': 'a', 'y': 4}])

ax = None
ax = df1.plot('x', 'y', ax=ax)
ax = df2.plot('x', 'y', ax=ax)
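
The effect described above can be modelled without any plotting library. The sketch below is only an illustration, not pandas code: it assumes that each frame's string x values are drawn at positions 0..n-1 and that the tick labels are taken from the most recently plotted frame. `positional_plot` is a hypothetical name.

```python
def positional_plot(frames):
    """Model a plotter that ignores the order of string x values (assumption)."""
    lines = []
    tick_labels = []
    for xs, ys in frames:
        positions = list(range(len(xs)))  # position depends on row order only
        lines.append(list(zip(positions, ys)))
        tick_labels = list(xs)            # later frames overwrite the labels
    return lines, tick_labels


frame1 = (['a', 'b'], [1, 2])
frame2 = (['b', 'a'], [3, 4])
lines, tick_labels = positional_plot([frame1, frame2])

# Read every point back through the final tick labels.
labelled = [[(tick_labels[pos], y) for pos, y in line] for line in lines]
print(labelled[0])  # [('b', 1), ('a', 2)]  first frame now reads b=1, a=2
print(labelled[1])  # [('b', 3), ('a', 4)]  second frame is labelled correctly
```

Under these assumptions the first frame's points end up under the wrong labels, which matches the wrong plot in the report.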

@Licht-T
Contributor

Licht-T commented Dec 9, 2017

There is only one solution: converting a string Index into a numeric one where possible.
I don't know whether pandas should support such a conversion in its internals.
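
As a user-side approximation of that conversion, a sketch might look like the following. `with_numeric_x` is a hypothetical helper, not part of pandas; it relies only on `pd.to_numeric`, and it assumes that a numeric x column is plotted at its actual values so that row order no longer matters.

```python
import pandas as pd


def with_numeric_x(df, x):
    """Hypothetical helper: return a copy whose column `x` is numeric if every value parses."""
    out = df.copy()
    try:
        out[x] = pd.to_numeric(out[x])
    except (ValueError, TypeError):
        pass  # values such as 'a', 'b' cannot be parsed; keep the strings
    return out


df2 = pd.DataFrame([{'x': '2', 'y': 3}, {'x': '1', 'y': 4}])
df_str = pd.DataFrame([{'x': 'a', 'y': 1}, {'x': 'b', 'y': 2}])

numeric = with_numeric_x(df2, 'x')       # x becomes the integers 2, 1
unchanged = with_numeric_x(df_str, 'x')  # x stays 'a', 'b'
```

This only helps when every label parses as a number; frames such as `df_str` are returned unchanged and would still hit the bug.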

@boeddeker
Author

In the case where I hit the problem, I had strings that are not convertible to floats.
Your example highlights the error better.

Converting strings to floats would reduce the occurrence of this bug.
Maybe handling strings needs another solution.

An idea: the strings (labels) can be stored in xticklabels.
If the labels are strings, _get_xticks reads the existing xticklabels, appends any missing labels, and calculates the xticks from them.
This would require access to the ax object in _get_xticks.

@boeddeker
Author

I now have example code that demonstrates my idea.

%matplotlib inline
import pandas as pd

df1 = pd.DataFrame([{'x': 'a', 'y': 1}, {'x': 'b', 'y': 2}])
df2 = pd.DataFrame([{'x': 'b', 'y': 3}, {'x': 'a', 'y': 4}])

def df_xstr_plot(df, x=None, y=None, ax=None):
    df = df.copy()

    # Start from the labels already on the axes so their positions stay stable
    if ax is not None:
        tick_labels = [label.get_text() for label in ax.get_xticklabels()]
    else:
        tick_labels = []

    # Append labels that this frame introduces
    for new_tick_label in df[x]:
        if new_tick_label not in tick_labels:
            tick_labels.append(new_tick_label)

    # Map each string label to its integer position
    mapping = {tick_label: i for i, tick_label in enumerate(tick_labels)}
    df[x] = df[x].map(mapping)

    ax = df.plot(x, y, ax=ax)

    # Assign the correct xticklabels
    ax.set_xticks(list(range(len(tick_labels))))
    ax.set_xticklabels(tick_labels)

    return ax

ax = None
ax = df_xstr_plot(df1, 'x', 'y', ax=ax)
ax = df_xstr_plot(df2, 'x', 'y', ax=ax)  # correct

ax = None
ax = df1.plot('x', 'y', ax=ax)
ax = df2.plot('x', 'y', ax=ax)  # wrong

@TomAugspurger
Contributor

This seems a bit complex. I don't think pandas should be doing anything special here; we should rely on matplotlib to handle all the string <-> position logic.

@boeddeker
Author

You are right, I forgot to test whether matplotlib can handle strings.
So the solution would be to add a further branch for strings to _get_xticks that does not convert the strings to int.

%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame([{'x': 'a', 'y': 1}, {'x': 'b', 'y': 2}])
df2 = pd.DataFrame([{'x': 'b', 'y': 3}, {'x': 'a', 'y': 4}])

def df_xstr_plot(df, x=None, y=None, ax=None):
    if ax is None:
        figure, ax = plt.subplots(1, 1)
    
    ax.plot(df[x], df[y])
    return ax

ax = None
ax = df_xstr_plot(df1, 'x', 'y', ax=ax)
ax = df_xstr_plot(df2, 'x', 'y', ax=ax)  # correct

ax = None
ax = df1.plot('x', 'y', ax=ax)
ax = df2.plot('x', 'y', ax=ax)  # wrong
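
One way to check that matplotlib keeps a single label-to-position mapping per axis is to ask the axis where it places each label after both lines are drawn. This is a minimal sketch assuming a matplotlib version with categorical axis support (2.1 or later); it uses plain lists rather than DataFrames and the headless Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(['a', 'b'], [1, 2])
ax.plot(['b', 'a'], [3, 4])  # same labels, different order

# Ask the x axis which positions it assigns to the second line's labels.
positions = [float(p) for p in ax.xaxis.convert_units(['b', 'a'])]
print(positions)
plt.close(fig)
```

If the mapping is shared as expected, 'b' is reported at position 1 and 'a' at position 0, so the second line is drawn from right to left instead of relabelling the axis.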

jreback modified the milestones: Next Major Release, 0.23.0 on Feb 21, 2018
jreback modified the milestones: 0.23.0, Next Major Release on Apr 14, 2018
mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022