Pandas 0.15.1 df.interpolate().plot() wrong result with NaT in datelike index #8914

Closed
sorenwacker opened this issue Nov 27, 2014 · 8 comments
@sorenwacker

I am using a DataFrame df with dates as the index and some data with missing values.
All dates are from the same year, 1999.
One entry in the date index is NaT.

When I use df.interpolate().plot(), a line is plotted between 1996 and 1716.
When the NaT is omitted, the correct plot appears.

@jreback
Contributor

jreback commented Nov 27, 2014

You need to provide copy-pastable code and the output of pd.show_versions().

@sorenwacker
Author

import pandas as pd
import numpy as np
pd.options.display.mpl_style = 'default'
import matplotlib.pyplot as plt
%pylab inline

# Five random values; the last one is missing
data = np.random.rand(5)
data[4] = np.nan
# Four dates in 1999 plus one NaT
times = ['19990103', '19990106', '19990109', '19990112', 'NaT']
dftimes = pd.to_datetime(times, format='%Y%m%d')
for i in dftimes:
    print(i)
df = pd.DataFrame(data, dftimes)
df.interpolate().plot()

INSTALLED VERSIONS

commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.17.2-200.fc20.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.1
nose: 1.3.4
Cython: 0.21
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 1.5
pytz: 2014.9
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.2
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: 0.7.7
apiclient: None
rpy2: 2.5.2
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None

@jreback
Contributor

jreback commented Nov 28, 2014

This is correct. Not sure what you think is incorrect here. Not sure that matplotlib handles NaT in plotting. You will generally need to drop these indexes. As I pointed out before it is a little odd to do this (have NaT in the index).

In [29]: df.interpolate()
Out[29]: 
                   0
1999-01-03  0.791905
1999-01-06  0.406823
1999-01-09  0.300865
1999-01-12  0.690322
NaT         0.690322

In [32]: df.interpolate(method='index')
NotImplementedError: Interpolation with NaNs in the index has not been implemented. Try filling those NaNs before interpolating.

@sorenwacker
Author

Look at the plot:

df.interpolate().plot()

There you can see that the line extends back to the 18th century.

@jreback
Contributor

jreback commented Nov 28, 2014

I am using matplotlib 1.4.0 and current master of pandas, and it raises
ValueError: No sensible date limit could be found in the AutoDateLocator.

seems ok to me.

Not really sure matplotlib actually supports NaT on an index (as I indicated above).
And in general, having a missing value in an index is just not a good idea; it doesn't make much sense.

It is easy enough to simply drop them first.
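
A minimal sketch of dropping the NaT rows before interpolating, assuming a reasonably recent pandas. The column name `value` and the sample numbers are made up for illustration and are not from the report:

```python
import numpy as np
import pandas as pd

# Four valid dates plus one missing date (None is parsed as NaT).
index = pd.to_datetime(
    ["1999-01-03", "1999-01-06", "1999-01-09", "1999-01-12", None]
)
# Hypothetical sample data with one gap in the middle and one on the NaT row.
df = pd.DataFrame({"value": [0.8, 0.4, np.nan, 0.7, np.nan]}, index=index)

# Keep only the rows whose index entry is a real timestamp.
valid = pd.notnull(df.index)
clean = df[valid]

# With no NaT left in the index, time-based interpolation is possible.
result = clean.interpolate(method="time")
print(result)

# result.plot()  # plotting now only sees real dates on the x-axis
```

Filtering on the index rather than calling dropna() on the frame keeps rows that have a valid date but a missing value, so those can still be interpolated.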

@TomAugspurger

@TomAugspurger
Contributor

Agreed that plotting (or anything) with a NaT in the index isn't good practice.

Although strangely, these two aren't the same:

In [52]: df.interpolate().plot()
Out[52]: <matplotlib.axes._subplots.AxesSubplot at 0x10daf7a90>

In [53]: fig, ax = plt.subplots()

In [54]: ax.plot(df.index, df.interpolate().values)

The first one has the x-axis decreasing (most recent to oldest)

The second one has the x-axis increasing.

@jondo

jondo commented Nov 14, 2016

Could the currently processed #12405 be a duplicate of this?

@jreback
Contributor

jreback commented Nov 14, 2016

@jondo could be; I'll put an xref on that issue.
