
API/CLN: timeseries plotting #15071

Open
jorisvandenbossche opened this issue Jan 5, 2017 · 2 comments
Labels
API Design, Clean, Needs Discussion, Visualization

Comments

jorisvandenbossche (Member) commented Jan 5, 2017

Inspired by the timedelta plotting issue, I took another look at our timeseries plotting machinery. We know it is quite complex, and as a result several bugs, inconsistencies and unexpected behaviours exist (e.g. different results depending on the order in which several series are plotted, or wrong results when combining different types of time series; among others #9053, #6608, #14322, ...).
There has been some discussion related to this on the tsplot refactor PR #7670 by @sinhrks (not merged).

One reason for the complexity is the distinction between 'irregular' and 'regular' time series (see e.g. #7670 (comment)):

  • 'regular' time series plotting is based on Periods, and is used for time series with a freq or inferred_freq (and also for Periods)
  • 'irregular' time series plotting is based on matplotlib's default handling of dates, i.e. converting them to numerical values (floats representing time in days since 0001-01-01, http://matplotlib.org/api/dates_api.html). You can also get this behaviour for regular time series by passing x_compat=True.
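To make the regular/irregular split concrete, here is a minimal stdlib sketch that assumes a deliberately simplified notion of "regular": an index counts as regular when every gap between consecutive timestamps is identical. This is only a stand-in for pandas' `freq`/`inferred_freq` logic, which also recognizes calendar frequencies such as month-end or business days; the helper name `constant_step` is hypothetical.

```python
from datetime import datetime, timedelta


def constant_step(stamps):
    """Return the common gap if all consecutive gaps are equal, else None.

    Hypothetical, simplified stand-in for ``inferred_freq``: it only detects
    fixed-length steps, not calendar frequencies such as month-end.
    """
    if len(stamps) < 3:
        # Too few points to judge regularity
        return None
    gaps = {b - a for a, b in zip(stamps, stamps[1:])}
    return gaps.pop() if len(gaps) == 1 else None


regular = [datetime(2017, 1, 1) + timedelta(days=i) for i in range(5)]
irregular = [datetime(2017, 1, 1), datetime(2017, 1, 2), datetime(2017, 1, 5)]

# A fixed step would route to the Period-based path; None to matplotlib floats
print(constant_step(regular))    # 1 day, 0:00:00
print(constant_step(irregular))  # None
```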

So part of the problems and confusion comes from the differences between the two (e.g. different label formatting) and from combining them. This leads to the question:

Do we need both types of timeseries plotting?

The question is why we convert a DatetimeIndex to Periods for plotting at all. The reasons I can think of:

  • Performance. Currently, regular plotting is faster (for a regular series, ts.plot() is faster than ts.plot(x_compat=True)). However, I think this could be solved, as most of the time is spent converting the datetimes to floats (which should be vectorizable).
  • Nicer tick label locations and formatting. This is a clear plus: our (convoluted) tick locators and formatters give much nicer results than matplotlib's defaults (IMO).
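On the performance point, a sketch of why the conversion should be vectorizable. Assuming the old (pre-3.3) matplotlib convention, where a date is stored as days since 0001-01-01 plus one (i.e. the proleptic Gregorian ordinal plus the fraction of the day), mapping int64 nanoseconds since the Unix epoch to that float is a single affine transform. The helper `ns_to_mpl_days` is hypothetical; the point is that no per-element datetime objects are needed, so with NumPy the same expression could be applied to a whole `datetime64[ns]` array at once.

```python
from datetime import date

# Proleptic Gregorian ordinal of 1970-01-01 (0001-01-01 has ordinal 1)
UNIX_EPOCH_ORDINAL = date(1970, 1, 1).toordinal()  # 719163
NS_PER_DAY = 86_400 * 10**9


def ns_to_mpl_days(ns_values):
    """Map int64 ns since 1970-01-01 to pre-3.3 matplotlib float days.

    Hypothetical helper: one scale plus one shift, so the list
    comprehension could become a single vectorized array expression.
    """
    return [UNIX_EPOCH_ORDINAL + ns / NS_PER_DAY for ns in ns_values]


# 1970-01-01 00:00 and 1970-01-01 12:00
print(ns_to_mpl_days([0, NS_PER_DAY // 2]))  # [719163.0, 719163.5]
```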

Other reasons that I am missing?

But there are also clear drawbacks. Apart from the things mentioned above, you sometimes get clearly wrong behaviour: see e.g. the plot in #7670 (comment). In that case, dates falling somewhere within a month are snapped to the month edges when a regular series with monthly frequency is plotted first.
Another example of 'wrong' plotting is a yearly series (but with freq 'A-DEC', i.e. end of year) plotted at the beginning of the year. See http://nbviewer.jupyter.org/gist/jorisvandenbossche/c0c68dce2fa02f1dfc4a8c343ec88cb6. Of course, in many cases this behaviour can also be the desired one.
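The month-snapping effect can be illustrated without plotting. Assuming the Period path maps each timestamp to an integer month ordinal (pandas counts monthly Period ordinals as months since 1970-01), every date inside a month collapses to the same x position. The helper `month_ordinal` is a hypothetical stand-in, not pandas code.

```python
from datetime import date


def month_ordinal(d):
    """Months since 1970-01 (hypothetical stand-in for a monthly Period)."""
    return (d.year - 1970) * 12 + (d.month - 1)


# An irregular series whose dates fall inside January and February 2017
dates = [date(2017, 1, 3), date(2017, 1, 17), date(2017, 1, 31), date(2017, 2, 14)]

# Once the axis is monthly, all January dates land on one x position
print([month_ordinal(d) for d in dates])  # [564, 564, 564, 565]
```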

So do we need both? Would we want, if possible, to unify them into one approach?

Can we unify both approaches?

Can we just use the matplotlib floats for timeseries plotting? Or always use the period-based machinery?

  • Using matplotlib's float-based plotting
    • Do we want this? It will give slightly different behaviour for certain 'regular' cases.
    • This assumes we can implement a tick locator/formatter comparable to the period-based one. But maybe that is impossible, and the reason we have the current situation?
    • We could still keep the PeriodConverter for plotting actual Periods.
    • Problem: a float64 representing days only gives a precision of ~5µs, not down to 1ns (note: the period-based plotting cannot handle ns either, but can handle 1µs precision).
  • Using period-based plotting for all timeseries
    • Do we want this? (It deviates more from matplotlib, so there would be a larger difference between plotting dates with and without importing pandas.)
    • What prevents us from converting an irregular timeseries to Periods? I would think we can find a common freq in almost all cases (just a high-precision one if needed).
  • Or create a new converter based on datetime64[ns] (so int64)?
    • Instead of using matplotlib's floats, and instead of Periods with varying freq (at least for DatetimeIndex)
    • Again, assuming we can have nice tick label locator/formatting for this
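The ~5µs figure for float days can be checked with a little arithmetic. The sketch below uses `math.ulp` (Python 3.9+) to measure the gap between adjacent float64 values at a 2017 date, once for the days-since-0001 representation and once for the 1970-01-01 epoch that matplotlib 3.3 later made its default. It then shows that two timestamps 1 ns apart stay distinct as int64 nanoseconds but collapse to the same float day value; the helper `to_days` is hypothetical.

```python
import math
from datetime import date

US_PER_DAY = 86_400 * 10**6
NS_PER_DAY = 86_400 * 10**9
EPOCH = date(1970, 1, 1)

day = date(2017, 1, 5)
days_since_0001 = float(day.toordinal())       # roughly 736 000 days
days_since_1970 = float((day - EPOCH).days)    # roughly 17 000 days

# Gap between adjacent float64 values, in microseconds; the worst-case
# rounding error is half of that gap.
for label, value in [("0001 epoch", days_since_0001), ("1970 epoch", days_since_1970)]:
    spacing_us = math.ulp(value) * US_PER_DAY
    print(f"{label}: spacing {spacing_us:.3f} us, max error {spacing_us / 2:.3f} us")


def to_days(ns):
    """Hypothetical helper: int ns since 1970 -> float days since 0001 (+1)."""
    return EPOCH.toordinal() + ns / NS_PER_DAY


ns_a = (day - EPOCH).days * NS_PER_DAY
ns_b = ns_a + 1                         # 1 ns later
print(ns_b - ns_a)                      # 1: integer ns keep the difference
print(to_days(ns_b) == to_days(ns_a))   # True: float days cannot resolve it
```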
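On whether an irregular index can always be given a common freq: one simple approach, assuming timestamps are available as int64 nanoseconds, is to take the greatest common divisor of all offsets from the first timestamp. Every point then lies exactly on a regular grid with that step; for arbitrary timestamps the step can shrink to 1 ns, finer than the 1µs the period-based plotting can handle. The helper `common_step_ns` is hypothetical, not an existing pandas function.

```python
import math
from functools import reduce

NS_PER_SECOND = 10**9


def common_step_ns(stamps_ns):
    """Largest step (in ns) such that every timestamp lies on a regular grid.

    Hypothetical helper: GCD of all offsets from the first timestamp.
    Returns 0 when there is only one distinct timestamp; for arbitrary
    timestamps the result can degenerate to a 1 ns step.
    """
    origin = stamps_ns[0]
    offsets = [t - origin for t in stamps_ns[1:]]
    return reduce(math.gcd, offsets, 0)


# Irregular gaps of 15 s, 45 s and 30 s -> common grid step of 15 s
stamps = [0, 15 * NS_PER_SECOND, 60 * NS_PER_SECOND, 90 * NS_PER_SECOND]
print(common_step_ns(stamps) // NS_PER_SECOND)  # 15
```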

cc @pandas-dev/pandas-core (especially @TomAugspurger and @sinhrks, as I think you have been most involved in the plotting code recently, and @wesm for a historical viewpoint)
I know it's a long issue, but if you could give it a read and share your thoughts, that would be very welcome!

jorisvandenbossche added the API Design, Needs Discussion and Visualization labels Jan 5, 2017
jreback (Contributor) commented Jan 5, 2017

cc @tcaswell
cc @mdboom

wesm (Member) commented Jan 16, 2017

As I recall, the time series plotting with periods originated in scikits.timeseries. I am not especially attached to it -- if you can unify / have a single code path for plotting without significantly changing functionality, sounds good to me.

jgehrcke added a commit to jgehrcke/ci-analysis that referenced this issue Dec 7, 2020
There's a lot of magic going on between how the datetime64 values
actually encode datetime in plots. Sharing an axis across (sub)plots is
brittle w.r.t. these differences. Work around this, here: make it so that
individual timestamps have a non-zero value for seconds, by simply adding
one second, shifting the whole data set by one second to the left. That
prevents, I guess, an optimization to hit in which would see that
individual timestamps hit the full hour or integer multiples of 30 or 15
minutes. Also see

 pandas-dev/pandas#15874
 pandas-dev/pandas#15071
 pandas-dev/pandas#31074
 pandas-dev/pandas#29705
 pandas-dev/pandas#29719
 pandas-dev/pandas#18571
 pandas-dev/pandas#11574
 pandas-dev/pandas#22586
mroeschke added the Clean label May 8, 2021