QST: DataFrame with freq of 2.5ms fails to plot #41064

Open
phollox-arch opened this issue Apr 20, 2021 · 4 comments
Labels
Bug, Datetime (Datetime data dtype), Visualization (plotting)

Comments

@phollox-arch

Question about pandas

I have a DataFrame with a time series as index.

import pandas as pd
from numpy.random import rand

df = pd.DataFrame(rand(100000), index=None, columns=['a'])
df['time'] = pd.date_range('2020-01-01 12:30:15',
                           periods=len(df['a']), freq='ms')
df.set_index('time', inplace=True)
df.plot()

When I put a multiplier on the frequency, plotting df2 becomes extremely slow, even though it has fewer elements than df. It actually crashed my Python kernel, and my laptop almost ran out of RAM.

df2 = pd.DataFrame(rand(50000), index=None, columns=['a'])
df2['time'] = pd.date_range('2020-01-01 12:30:15',
                            periods=len(df2['a']), freq='2.5ms')
df2.set_index('time', inplace=True)
df2.plot()

Using freq='2500U' instead does not make any difference.

I was wondering whether this behavior is expected. I haven't run this under a code profiler, or tried any other approaches to see whether they have any effect.

Thanks

Python 3.9.3-1,
Pandas 1.2.3-1,
Numpy 1.20.1-1,
Matplotlib 3.4.1-2
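
Since no profiler run is reported above, here is a minimal sketch of how one could be done safely. It profiles `DataFrame.plot()` with the standard-library `cProfile` on a deliberately small frame (200 rows) so that it should not exhaust memory; `profile_plot` is a hypothetical helper name, and the Agg backend is an assumption made so the snippet runs without a display.

```python
import cProfile
import io
import pstats

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def profile_plot(periods: int, freq: str) -> str:
    """Hypothetical helper: profile DataFrame.plot() and return a text report."""
    index = pd.date_range("2020-01-01 12:30:15", periods=periods, freq=freq)
    frame = pd.DataFrame({"a": np.random.rand(periods)}, index=index)

    profiler = cProfile.Profile()
    profiler.enable()
    frame.plot()
    profiler.disable()
    plt.close("all")  # release the figure created by plot()

    # Collect the ten most expensive calls by cumulative time.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


# Keep periods small: the large case is the one that crashes.
report = profile_plot(periods=200, freq="2.5ms")
print(report)
```

Comparing the report for `freq="ms"` and `freq="2.5ms"` at the same small size may show where the extra time goes before scaling up.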

phollox-arch added the Needs Triage (issue that has not been reviewed by a pandas team member) and Usage Question labels on Apr 20, 2021
@phollox-arch
Author

phollox-arch commented Apr 21, 2021

It works when plotting directly with matplotlib:

import matplotlib.pyplot as plt
plt.plot(df2);
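
For completeness, a self-contained sketch of the same idea, rebuilding `df2` and passing the index and column to matplotlib explicitly so that pandas' own plotting code is bypassed. The Agg backend is an assumption made so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the problematic frame: 50000 points spaced 2.5 ms apart.
index = pd.date_range("2020-01-01 12:30:15", periods=50_000, freq="2.5ms")
df2 = pd.DataFrame({"a": np.random.rand(len(index))}, index=index)

# Plot through matplotlib directly instead of df2.plot().
fig, ax = plt.subplots()
ax.plot(df2.index, df2["a"], label="a")
ax.set_xlabel("time")
```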

@pablodz

pablodz commented May 10, 2021

As you said, the code below runs as expected:

import pandas as pd
from numpy.random import rand

df = pd.DataFrame(rand(100000), index=None, columns=['a'])
df['time'] = pd.date_range('2020-01-01 12:30:15', periods=len(df['a']), freq='ms')
df.set_index('time', inplace=True)

However,

df.plot()

fails by running out of RAM. The problem looks like it is in the pandas plotting backend.

PS: Outside Jupyter it works as expected.

The out-of-memory condition occurs here:

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-10-ab6419c28fa5> in <module>
      6                             periods=len(df2['a']), freq='2.5ms')
      7 df2.set_index('time', inplace=True)
----> 8 df2.plot()

~/.local/lib/python3.8/site-packages/pandas/plotting/_core.py in __call__(self, *args, **kwargs)
    947                     data.columns = label_name
    948 
--> 949         return plot_backend.plot(data, kind=kind, **kwargs)
    950 
    951     __call__.__doc__ = __doc__

~/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/__init__.py in plot(data, kind, **kwargs)
     59             kwargs["ax"] = getattr(ax, "left_ax", ax)
     60     plot_obj = PLOT_CLASSES[kind](data, **kwargs)
---> 61     plot_obj.generate()
     62     plot_obj.draw()
     63     return plot_obj.result

~/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py in generate(self)
    275 
    276         for ax in self.axes:
--> 277             self._post_plot_logic_common(ax, self.data)
    278             self._post_plot_logic(ax, self.data)
    279 

~/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py in _post_plot_logic_common(self, ax, data)
    442         """Common post process for each axes"""
    443         if self.orientation == "vertical" or self.orientation is None:
--> 444             self._apply_axis_properties(ax.xaxis, rot=self.rot, fontsize=self.fontsize)
    445             self._apply_axis_properties(ax.yaxis, fontsize=self.fontsize)
    446 

~/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py in _apply_axis_properties(self, axis, rot, fontsize)
    530         if rot is not None or fontsize is not None:
    531             # rot=0 is a valid setting, hence the explicit None check
--> 532             labels = axis.get_majorticklabels() + axis.get_minorticklabels()
    533             for label in labels:
    534                 if rot is not None:

~/.local/lib/python3.8/site-packages/matplotlib/axis.py in get_majorticklabels(self)
   1171     def get_majorticklabels(self):
   1172         """Return this Axis' major tick labels, as a list of `~.text.Text`."""
-> 1173         ticks = self.get_major_ticks()
   1174         labels1 = [tick.label1 for tick in ticks if tick.label1.get_visible()]
   1175         labels2 = [tick.label2 for tick in ticks if tick.label2.get_visible()]

~/.local/lib/python3.8/site-packages/matplotlib/axis.py in get_major_ticks(self, numticks)
   1342         r"""Return the list of major `.Tick`\s."""
   1343         if numticks is None:
-> 1344             numticks = len(self.get_majorticklocs())
   1345 
   1346         while len(self.majorTicks) < numticks:

~/.local/lib/python3.8/site-packages/matplotlib/axis.py in get_majorticklocs(self)
   1247     def get_majorticklocs(self):
   1248         """Return this Axis' major tick locations in data coordinates."""
-> 1249         return self.major.locator()
   1250 
   1251     def get_minorticklocs(self):

~/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/converter.py in __call__(self)
    947             vmin, vmax = vmax, vmin
    948         if self.isdynamic:
--> 949             locs = self._get_default_locs(vmin, vmax)
    950         else:  # pragma: no cover
    951             base = self.base

~/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/converter.py in _get_default_locs(self, vmin, vmax)
    927         """Returns the default locations of ticks."""
    928         if self.plot_obj.date_axis_info is None:
--> 929             self.plot_obj.date_axis_info = self.finder(vmin, vmax, self.freq)
    930 
    931         locator = self.plot_obj.date_axis_info

~/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/converter.py in _daily_finder(vmin, vmax, freq)
    553         span, dtype=[("val", np.int64), ("maj", bool), ("min", bool), ("fmt", "|S20")]
    554     )
--> 555     info["val"][:] = dates_.asi8
    556     info["fmt"][:] = ""
    557     info["maj"][[0, -1]] = True

KeyboardInterrupt: 
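
The `info` array in the last frame suggests a way to estimate why memory runs out. The sketch below is a back-of-the-envelope calculation, not pandas code. It copies the record dtype from the traceback and assumes (this is not confirmed in the thread) that `_daily_finder` allocates one record for every base-unit tick between the first and last timestamp, with the base unit being microseconds for `freq='2.5ms'` and milliseconds for `freq='ms'`. `estimated_info_bytes` is a hypothetical helper name.

```python
import numpy as np

# Record layout copied from the traceback (_daily_finder in converter.py).
INFO_DTYPE = np.dtype(
    [("val", np.int64), ("maj", bool), ("min", bool), ("fmt", "|S20")]
)


def estimated_info_bytes(periods: int, step_in_base_units: int) -> int:
    """Hypothetical helper: size of `info` if it holds one record per base unit.

    Assumption: the span covers every base-unit tick from the first to the
    last timestamp, not just the ticks where data exists.
    """
    span = (periods - 1) * step_in_base_units + 1
    return span * INFO_DTYPE.itemsize


# freq='ms': 100000 points, one millisecond apart, base unit = millisecond.
print(estimated_info_bytes(100_000, 1))

# freq='2.5ms': 50000 points, 2500 microseconds apart, base unit = microsecond.
print(estimated_info_bytes(50_000, 2_500))
```

Under these assumptions the millisecond frame needs about 3 MB for this array, while the 2.5 ms frame needs about 3.7 GB, which would be consistent with the reported RAM exhaustion.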

@mzeitlin11
Member

Thanks for reporting this, @phollox-arch. An investigation into why this happens would be very welcome!

mzeitlin11 added the Bug, Datetime (Datetime data dtype), and Visualization (plotting) labels and removed the Needs Triage (issue that has not been reviewed by a pandas team member) and Usage Question labels on Jul 1, 2021
mzeitlin11 added this to the Contributions Welcome milestone on Jul 1, 2021
@agramfort

I can replicate the problem. Here is some more information to help look into it:

import pandas as pd
df = pd.read_csv('p85.csv', index_col="date", parse_dates=True)
df.plot()  # super slow ???

# now I cast to seconds via strftime
df.index = [pd.to_datetime(df.index[i].strftime("%Y-%m-%d %H:%M:%S")) for i in range(len(df.index))]
# now the plot is blazing fast !
df.plot()

Here is the .csv to replicate:

p85.csv

On my laptop it takes 20 s to plot these 5 values, and if I try to plot the full data it crashes my machine due to excessive RAM usage.

However, setting the following option works around the problem:

pd.plotting.plot_params["x_compat"] = True

Hope this helps someone to look into it 🙏
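
The `x_compat` workaround above can also be applied without changing the global setting. The following is a minimal sketch, assuming the Agg backend so it runs without a display, that rebuilds the 2.5 ms frame from the original report and shows two scoped alternatives: the `x_compat=True` keyword on a single `plot()` call and the `plot_params.use` context manager.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the frame from the original report.
index = pd.date_range("2020-01-01 12:30:15", periods=50_000, freq="2.5ms")
df2 = pd.DataFrame({"a": np.random.rand(len(index))}, index=index)

# Option 1: disable pandas' time-series tick handling for one call only.
ax = df2.plot(x_compat=True)

# Option 2: disable it for every plot created inside the block.
with pd.plotting.plot_params.use("x_compat", True):
    ax_scoped = df2.plot()

plt.close("all")
```

Either form leaves tick placement to matplotlib's default date handling, so it avoids the pandas locator seen in the traceback. The trade-off is that pandas' frequency-aware tick formatting is not used.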

mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022

5 participants