DataFrame.asof() fails when some columns are NaN #20652

Closed
nickos556 opened this issue Apr 11, 2018 · 5 comments · Fixed by #21034
Labels
Bug, Datetime

Comments

@nickos556

nickos556 commented Apr 11, 2018

Code Sample, a copy-pastable example if possible

    import pandas as pd
    from io import StringIO

    data = """dt,a,b
    2018-02-27 09:01:00,-0.00034052907999999996,
    2018-02-27 09:02:00,-0.00019981724999999998,
    2018-02-27 09:03:00,-0.0009561605,
    2018-02-27 09:04:00,-0.0005727226999999999,
    2018-02-27 09:05:00,-0.0006973449,4.2"""

    buff = StringIO(data)
    df = pd.read_csv(buff, parse_dates=['dt'])
    df.set_index('dt', drop=True, inplace=True)

    print(df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30', '2018-02-27 09:04:30'])))
    # gives
    #                       a   b
    # 2018-02-27 09:03:30 NaN NaN
    # 2018-02-27 09:04:30 NaN NaN

    # but
    print(df['a'].asof(pd.DatetimeIndex(['2018-02-27 09:03:30', '2018-02-27 09:04:30'])))
    # gives
    # 2018-02-27 09:03:30   -0.000956
    # 2018-02-27 09:04:30   -0.000573
    # Name: a, dtype: float64

Problem description

NaNs in column 'b' should not affect asof() on column 'a'.

Expected Output

                                a   b
    2018-02-27 09:03:30 -0.000956 NaN
    2018-02-27 09:04:30 -0.000573 NaN

Output of pd.show_versions()

    pandas: 0.22.0
    numpy: 1.14.2

@jreback
Contributor

jreback commented Apr 11, 2018

Yeah, this is correct. Applying asof column by column gives the expected output:

    In [7]: df.apply(lambda x: x.asof(pd.DatetimeIndex(['2018-02-27 09:03:30', '2018-02-27 09:04:30'])))
    Out[7]:
                                    a   b
    2018-02-27 09:03:30 -0.000956 NaN
    2018-02-27 09:04:30 -0.000573 NaN

You're welcome to have a look, and a PR would be even better!

@jreback jreback added the Bug, Datetime, and Effort Low labels Apr 11, 2018
@jreback jreback added this to the Next Major Release milestone Apr 11, 2018
@nickos556
Author

Ok @jreback, I will try to take a look over the weekend. Also, thanks for everything you do for pandas and open source. It's very much appreciated :)

@Licht-T
Contributor

Licht-T commented May 8, 2018

@jreback I also think this behavior is weird, but is it really a bug? The docs say:

The last row without any NaN is taken

http://pandas.pydata.org/pandas-docs/version/0.19.0/generated/pandas.DataFrame.asof.html
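To make the documented rule concrete, here is a minimal sketch, assuming a recent pandas and the five-row frame from the original report (built directly instead of via read_csv). Dropping every row that has a NaN in any column and forward-filling onto the requested timestamps reproduces the all-NaN result, because the only complete row (09:05) comes after both lookups. This only emulates the documented semantics; it is not how asof is implemented internally.

```python
import numpy as np
import pandas as pd

# Data from the original report, built directly instead of via read_csv.
idx = pd.to_datetime([
    "2018-02-27 09:01:00",
    "2018-02-27 09:02:00",
    "2018-02-27 09:03:00",
    "2018-02-27 09:04:00",
    "2018-02-27 09:05:00",
])
df = pd.DataFrame(
    {
        "a": [-0.00034052907999999996, -0.00019981724999999998,
              -0.0009561605, -0.0005727226999999999, -0.0006973449],
        "b": [np.nan, np.nan, np.nan, np.nan, 4.2],
    },
    index=idx,
)
where = pd.DatetimeIndex(["2018-02-27 09:03:30", "2018-02-27 09:04:30"])

# Documented rule: only rows with no NaN in any column are candidates.
complete_rows = df.dropna()  # only the 09:05 row survives

# Take the last candidate row at or before each requested timestamp.
emulated = complete_rows.reindex(where, method="ffill")

actual = df.asof(where)
print(emulated)  # all NaN: no complete row exists before 09:05
print(actual)    # same all-NaN frame as in the report
```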

@dragosthealex
Contributor

It doesn't seem like a bug. The docs state that unless subset is specified, all the columns are considered:

(or the last row without NaN considering only the subset of columns in the case of a DataFrame)

However, the phrasing might not be obvious; I'll try to improve the docs.
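A minimal sketch of the subset route, assuming a recent pandas and the report's data built directly: restricting the NaN check to column a makes rows 09:03 and 09:04 eligible, which reproduces the expected output from the report. Note that b comes back NaN because it is taken from the same row as a, not because b was looked up on its own.

```python
import numpy as np
import pandas as pd

# Data from the original report, built directly instead of via read_csv.
idx = pd.to_datetime([
    "2018-02-27 09:01:00",
    "2018-02-27 09:02:00",
    "2018-02-27 09:03:00",
    "2018-02-27 09:04:00",
    "2018-02-27 09:05:00",
])
df = pd.DataFrame(
    {
        "a": [-0.00034052907999999996, -0.00019981724999999998,
              -0.0009561605, -0.0005727226999999999, -0.0006973449],
        "b": [np.nan, np.nan, np.nan, np.nan, 4.2],
    },
    index=idx,
)
where = pd.DatetimeIndex(["2018-02-27 09:03:30", "2018-02-27 09:04:30"])

# Only NaNs in column "a" disqualify a row; "b" is carried along from that row.
result = df.asof(where, subset=["a"])
print(result)
```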

@msmarchena
Contributor

Now I'm confused about the expected result of the asof function. For instance, if I put 7 in the last entry (column b) of the first row:

    import pandas as pd
    from io import StringIO

    data = """dt,a,b
    2018-02-27 09:01:00,-0.00034052907999999996,7
    2018-02-27 09:02:00,-0.00019981724999999998,
    2018-02-27 09:03:00,-0.0009561605,
    2018-02-27 09:04:00,-0.0005727226999999999,
    2018-02-27 09:05:00,-0.0006973449,4.2"""

    buff = StringIO(data)
    df = pd.read_csv(buff, parse_dates=['dt'])
    df.set_index('dt', drop=True, inplace=True)
    print(df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30', '2018-02-27 09:04:30'])))

the asof function gives:

    #                             a    b
    # 2018-02-27 09:03:30 -0.000341  7.0
    # 2018-02-27 09:04:30 -0.000341  7.0

but

    as_of_index = pd.DatetimeIndex(['2018-02-27 09:03:30', '2018-02-27 09:04:30'])
    new_result = df.apply(lambda x: x.asof(as_of_index))
    print(new_result)

gives

    #                             a    b
    # 2018-02-27 09:03:30 -0.000956  7.0
    # 2018-02-27 09:04:30 -0.000573  7.0

Aren't these results supposed to be the same?
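Going by the documented rule quoted earlier in this thread, the two results are not expected to match: DataFrame.asof picks one whole row per timestamp (the last row with no NaN, here 09:01 for both lookups), while applying Series.asof per column finds each column's last valid value independently. A minimal sketch, assuming a recent pandas and the data from this comment built directly instead of via read_csv:

```python
import numpy as np
import pandas as pd

# Data from this comment (b = 7 in the first row), built directly.
idx = pd.to_datetime([
    "2018-02-27 09:01:00",
    "2018-02-27 09:02:00",
    "2018-02-27 09:03:00",
    "2018-02-27 09:04:00",
    "2018-02-27 09:05:00",
])
df = pd.DataFrame(
    {
        "a": [-0.00034052907999999996, -0.00019981724999999998,
              -0.0009561605, -0.0005727226999999999, -0.0006973449],
        "b": [7.0, np.nan, np.nan, np.nan, 4.2],
    },
    index=idx,
)
where = pd.DatetimeIndex(["2018-02-27 09:03:30", "2018-02-27 09:04:30"])

# Row-wise: one complete row (no NaN in any column) is chosen per timestamp.
row_wise = df.asof(where)

# Column-wise: each column searches for its own last non-NaN value.
col_wise = df.apply(lambda s: s.asof(where))

print(row_wise)
print(col_wise)
```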

@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Nov 4, 2018