Locators off by one day #24784

Open
matmcnic opened this issue Jan 15, 2019 · 3 comments

@matmcnic

Code Sample, a copy-pastable example if possible

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
#%matplotlib inline

#comment out any day to show the example with non-contiguous dates
d = {
    '2017-10-01': 20,
    '2017-10-02': 20,
    '2017-10-03': 15,
    '2017-10-04': 1,
    '2017-10-05': 118,
    '2017-10-06': 16,
}

df = pd.DataFrame(list(d.items()), columns=['DATE', 'TMIN'])
df['DATE'] = df['DATE'].astype('datetime64[ns]')

fig, ax = plt.subplots(figsize=(8, 4))

df.plot(x='DATE', y='TMIN', kind='line', ax=ax)

ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("\n%b%d"))

# Blank out the minor tick labels
ax.xaxis.set_minor_formatter(mdates.DateFormatter(""))

ax.grid(True, which='both')

print(ax.get_xticks())

Problem description

Please note: I also reported this to Matplotlib. Please see the comments there (including a workaround that worked for me).

With any of the Locators, the dates shown in the x-axis labels are sometimes off by one day. This happens when the x data is a list of contiguous dates. If the dates are not contiguous, the problem does not occur.

-Using data from a list of contiguous dates:
--problem occurs
--ax.get_xticks() shows that Epoch from Jan 01, 1970 is used

-Using data from a list with non-contiguous dates:
--problem does not occur
--ax.get_xticks() shows that Epoch from Jan 01, 0000 is used

This seems to be a problem in how a date is calculated from an epoch day count when the Jan 01, 1970 epoch is used. Perhaps some systems treat Jan 01, 1970 as day 0 and others treat it as day 1?

Also notice the print(ax.get_xticks()) -- you can see that for contiguous data it uses Epoch 1970 and for non-contiguous it uses Epoch 0000.

Examples from print(ax.get_xticks()) (notice different Epochs used)
Contiguous data: [17440. 17441. 17442. 17443. 17444. 17445.]
Non-contiguous data: [736603. 736604. 736605. 736606. 736607. 736608.]
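
For reference, the two sets of tick values above can be related with plain date arithmetic. The sketch below assumes the smaller numbers are whole days since 1970-01-01 and the larger numbers are proleptic Gregorian ordinals (day 1 is 0001-01-01, as in Python's `date.toordinal()`). Both are assumptions read off the printed values, not confirmed from the pandas or Matplotlib source, and the helper names are hypothetical.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)

def from_unix_days(n):
    """Interpret n as whole days since 1970-01-01 (assumed convention)."""
    return UNIX_EPOCH + timedelta(days=int(n))

def from_ordinal(n):
    """Interpret n as a proleptic Gregorian ordinal (day 1 is 0001-01-01)."""
    return date.fromordinal(int(n))

# First tick of each run reported above
print(from_unix_days(17440))  # contiguous data, read as days since 1970
print(from_ordinal(736603))   # non-contiguous data, read as an ordinal

# The contiguous tick value read with the other convention
print(from_ordinal(17440))
```

Under these assumptions the first two calls both give 2017-10-01, while reading 17440 as an ordinal lands on Sep 30 of year 48. That matches the Sep 30 label seen on the first tick, although it does not by itself show what the formatter does internally.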

Expected Output

The plot should have Oct 01 instead of Sep 30 for the first tick
[Image attachment: plot output]

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Windows
OS-release: 8.1
machine: AMD64
processor: Intel64 Family 6 Model 15 Stepping 11, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 4.0.2
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.12
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

The source for these is in https://github.com/pandas-dev/pandas/blob/master/pandas/plotting/_timeseries.py if you want to look further.

@matmcnic
Author

matmcnic commented Jan 26, 2019

I looked at the source but I was unable to determine the cause of the error.

Below is some more info that may be helpful.

It looks like the epoch is correct and the locator is correct, but the formatter's conversion to string is not correct.

In the attached output image and sample I use my formatter to show the epoch for the minor ticks and the internal formatter to show the date string for the major ticks. When I use my formatter it puts out the correct 1970 epoch numbers for each data point (2018-01-01 is 17532, etc.) but when I use the internal formatter (mdates.DateFormatter("\n%b%d")) the date string is off by one day. In my output image you can see the first tick is 17532, Dec 31 (but 17532 should be Jan 01).

Also, the following prints the correct epoch numbers, so the problem is not an incorrect epoch value for each tick but rather something in the epoch-to-date-string calculation (I can't see where that happens in the source).

print(ax.get_xticks())
Result: [17532. 17533. 17534. 17535. 17536. 17537.]
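
If the tick values really are days since 1970-01-01, one possible way to label them without going through `mdates.DateFormatter` is a small callable passed to `ticker.FuncFormatter`. This is only a sketch under that assumption. `format_unix_days` is a hypothetical helper, and it is not necessarily the workaround mentioned in the Matplotlib report.

```python
from datetime import date, timedelta

def format_unix_days(x, pos=None):
    """Label a tick value, assumed to be days since 1970-01-01, as e.g. 'Jan01'."""
    day = date(1970, 1, 1) + timedelta(days=int(round(x)))
    return day.strftime("%b%d")

print(format_unix_days(17532.0))
```

It could be installed with `ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_unix_days))`, in the same way as `format_x` in the sample code.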

[Image attachment: pandasdebug]

My updated sample code:

import pandas as pd 
import datetime
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import matplotlib.dates as mdates

# Show the raw tick value so the epoch number is visible on the axis
def format_x(x, pos=None):
    return x

d = {
    '2018-01-01': 20,
    '2018-01-02': 20,
    '2018-01-03': 15,
    '2018-01-04': 1,
    '2018-01-05': 118,
    '2018-01-06': 16,
}

df = pd.DataFrame(list(d.items()), columns=['DATE', 'TMIN'])
df['DATE'] = df['DATE'].astype('datetime64[ns]')

fig, ax = plt.subplots(figsize=(8, 4))

df.plot(x='DATE', y='TMIN', kind='line', ax=ax)

ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("\n%b%d"))
ax.xaxis.set_minor_formatter(ticker.FuncFormatter(format_x))

ax.grid(True, which='both')

plt.show()

print(ax.get_xticks())

Perhaps the days-since-epoch values are being interpreted using local time instead of UTC? That would explain getting Dec 31 instead of Jan 01. (I looked through the pandas source but I can't see where the calculations are.)

import datetime
# 17532 days since 1970-01-01, expressed in seconds
seconds_test = 17532 * 86400
local_time = datetime.datetime.fromtimestamp(seconds_test).strftime('%Y-%m-%d %H:%M:%S')
utc_time = datetime.datetime.fromtimestamp(seconds_test, tz=datetime.timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
print(f'{local_time}\n{utc_time}')

Results:
2017-12-31 19:00:00
2018-01-01 00:00:00
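
The local-time line above depends on the timezone of the machine running it. A minimal sketch that reproduces the same two lines on any machine is shown below, assuming a fixed UTC-5 offset (inferred from the 19:00 result, not stated anywhere in the report).

```python
from datetime import datetime, timedelta, timezone

seconds = 17532 * 86400  # 17532 days since 1970-01-01, in seconds

utc = timezone.utc
utc_minus_5 = timezone(timedelta(hours=-5))  # assumed offset, chosen to match the result above

as_local = datetime.fromtimestamp(seconds, tz=utc_minus_5)
as_utc = datetime.fromtimestamp(seconds, tz=utc)

print(as_local.strftime('%Y-%m-%d %H:%M:%S'))
print(as_utc.strftime('%Y-%m-%d %H:%M:%S'))
```

Both values refer to the same instant. Only the calendar date used for the label changes, which is the kind of shift the local-time hypothesis would predict.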

Also...
The problem also goes away in pandas if I use x_compat=True. This causes pandas to use the 0000 epoch instead of the 1970 epoch. If you add x_compat=True to my test code above you'll see the epoch days printed on the x-axis (and see they are epoch 0000).

df.plot(x='DATE', y='TMIN', kind='line', ax=ax, x_compat=True)

I ran a similar test in just matplotlib. The problem does not occur and it looks like they use the 0000 epoch instead of 1970. Pandas also will use the 0000 epoch if you have a break in the dates (just comment out one of my days above to see it) and in that case pandas no longer has the problem (so it's only when 1970 epoch is used in pandas that the bug shows up).

My matplotlib test and result:

import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker

# Show the raw tick value so the epoch number is visible on the axis
def format_x(x, pos=None):
    return x

dates = ['2018-01-01','2018-01-02','2018-01-03','2018-01-04','2018-01-05','2018-01-06']
x = [dt.datetime.strptime(d,'%Y-%m-%d').date() for d in dates]
y = [20, 20, 15, 1, 118, 16,]

plt.gca().xaxis.set_major_locator(mdates.DayLocator())
plt.gca().xaxis.set_minor_locator(mdates.DayLocator())
plt.gca().xaxis.set_minor_formatter(mdates.DateFormatter("\n%b%d"))
plt.gca().xaxis.set_major_formatter(ticker.FuncFormatter(format_x))

plt.plot(x,y)
plt.gca().grid(True, which='both')

print(plt.gca().get_xticks())

[Image attachment: matplotlibtest]
[736695. 736696. 736697. 736698. 736699. 736700.]

@mroeschke mroeschke added the Visualization plotting label May 27, 2019
@itssujeeth

As part of #pandashack2019, planning to work on this issue.
