merge_asof tolerance is not working in both directions #14680


Closed
VelizarVESSELINOV opened this issue Nov 17, 2016 · 4 comments
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@VelizarVESSELINOV

A small, complete example of the issue

from io import StringIO

from pandas import Timedelta, merge_asof, read_csv, to_datetime

file1 = '''DATE_TIME,Label
2015-03-02 08:36:51,0
2015-03-02 08:36:54,1'''

df1 = read_csv(StringIO(file1))
df1.DATE_TIME = to_datetime(df1.DATE_TIME)
print(df1)

file2 = '''DATE_TIME,OtherData
2015-03-02 08:36:50,0
2015-03-02 08:36:51,1
2015-03-02 08:36:52,2
2015-03-02 08:36:53,3
2015-03-02 08:36:54,4
2015-03-02 08:36:55,5
2015-03-02 08:36:56,6
2015-03-02 08:36:57,7
2015-03-02 08:36:58,8
2015-03-02 08:36:59,9
2015-03-02 08:37:00,10'''

df2 = read_csv(StringIO(file2))
df2.DATE_TIME = to_datetime(df2.DATE_TIME)
print(df2)

print(merge_asof(df2, df1, on='DATE_TIME'))
print(merge_asof(df2, df1, on='DATE_TIME', tolerance=Timedelta(seconds=1)))
print(merge_asof(df2, df1, on='DATE_TIME', tolerance=Timedelta(seconds=5)))

Current Output

           DATE_TIME  Label
0 2015-03-02 08:36:51      0
1 2015-03-02 08:36:54      1
             DATE_TIME  OtherData
0  2015-03-02 08:36:50          0
1  2015-03-02 08:36:51          1
2  2015-03-02 08:36:52          2
3  2015-03-02 08:36:53          3
4  2015-03-02 08:36:54          4
5  2015-03-02 08:36:55          5
6  2015-03-02 08:36:56          6
7  2015-03-02 08:36:57          7
8  2015-03-02 08:36:58          8
9  2015-03-02 08:36:59          9
10 2015-03-02 08:37:00         10
             DATE_TIME  OtherData  Label
0  2015-03-02 08:36:50          0    NaN
1  2015-03-02 08:36:51          1    0.0
2  2015-03-02 08:36:52          2    0.0
3  2015-03-02 08:36:53          3    0.0
4  2015-03-02 08:36:54          4    1.0
5  2015-03-02 08:36:55          5    1.0
6  2015-03-02 08:36:56          6    1.0
7  2015-03-02 08:36:57          7    1.0
8  2015-03-02 08:36:58          8    1.0
9  2015-03-02 08:36:59          9    1.0
10 2015-03-02 08:37:00         10    1.0
             DATE_TIME  OtherData  Label
0  2015-03-02 08:36:50          0    NaN
1  2015-03-02 08:36:51          1    0.0
2  2015-03-02 08:36:52          2    0.0
3  2015-03-02 08:36:53          3    NaN
4  2015-03-02 08:36:54          4    1.0
5  2015-03-02 08:36:55          5    1.0
6  2015-03-02 08:36:56          6    NaN
7  2015-03-02 08:36:57          7    NaN
8  2015-03-02 08:36:58          8    NaN
9  2015-03-02 08:36:59          9    NaN
10 2015-03-02 08:37:00         10    NaN
             DATE_TIME  OtherData  Label
0  2015-03-02 08:36:50          0    NaN
1  2015-03-02 08:36:51          1    0.0
2  2015-03-02 08:36:52          2    0.0
3  2015-03-02 08:36:53          3    0.0
4  2015-03-02 08:36:54          4    1.0
5  2015-03-02 08:36:55          5    1.0
6  2015-03-02 08:36:56          6    1.0
7  2015-03-02 08:36:57          7    1.0
8  2015-03-02 08:36:58          8    1.0
9  2015-03-02 08:36:59          9    1.0
10 2015-03-02 08:37:00         10    NaN

Expected Output

           DATE_TIME  Label
0 2015-03-02 08:36:51      0
1 2015-03-02 08:36:54      1
             DATE_TIME  OtherData
0  2015-03-02 08:36:50          0
1  2015-03-02 08:36:51          1
2  2015-03-02 08:36:52          2
3  2015-03-02 08:36:53          3
4  2015-03-02 08:36:54          4
5  2015-03-02 08:36:55          5
6  2015-03-02 08:36:56          6
7  2015-03-02 08:36:57          7
8  2015-03-02 08:36:58          8
9  2015-03-02 08:36:59          9
10 2015-03-02 08:37:00         10
             DATE_TIME  OtherData  Label
0  2015-03-02 08:36:50          0    NaN
1  2015-03-02 08:36:51          1    0.0
2  2015-03-02 08:36:52          2    0.0
3  2015-03-02 08:36:53          3    0.0
4  2015-03-02 08:36:54          4    1.0
5  2015-03-02 08:36:55          5    1.0
6  2015-03-02 08:36:56          6    1.0
7  2015-03-02 08:36:57          7    1.0
8  2015-03-02 08:36:58          8    1.0
9  2015-03-02 08:36:59          9    1.0
10 2015-03-02 08:37:00         10    1.0
             DATE_TIME  OtherData  Label
0  2015-03-02 08:36:50          0    **0.0**
1  2015-03-02 08:36:51          1    0.0
2  2015-03-02 08:36:52          2    0.0
3  2015-03-02 08:36:53          3    NaN
4  2015-03-02 08:36:54          4    1.0
5  2015-03-02 08:36:55          5    1.0
6  2015-03-02 08:36:56          6    NaN
7  2015-03-02 08:36:57          7    NaN
8  2015-03-02 08:36:58          8    NaN
9  2015-03-02 08:36:59          9    NaN
10 2015-03-02 08:37:00         10    NaN
             DATE_TIME  OtherData  Label
0  2015-03-02 08:36:50          0    **0.0**
1  2015-03-02 08:36:51          1    0.0
2  2015-03-02 08:36:52          2    0.0
3  2015-03-02 08:36:53          3    0.0
4  2015-03-02 08:36:54          4    1.0
5  2015-03-02 08:36:55          5    1.0
6  2015-03-02 08:36:56          6    1.0
7  2015-03-02 08:36:57          7    1.0
8  2015-03-02 08:36:58          8    1.0
9  2015-03-02 08:36:59          9    1.0
10 2015-03-02 08:37:00         10    NaN

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.1.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.1
nose: None
pip: 9.0.0
setuptools: 28.0.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 4.0.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.0
openpyxl: 2.3.5
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: 0.9.2
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
None

@jreback
Contributor

jreback commented Nov 17, 2016

This is by definition only backward-looking:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.merge_asof.html?highlight=merge_asof#pandas.merge_asof

If you need both directions you can do a
.reindex with tolerance.

Is that not clear from the docs? This is an ordered time operation.
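As a sketch of that workaround (my example, not from the thread): on a DatetimeIndex, DataFrame.reindex accepts method='nearest' together with tolerance, which matches in either direction but drops anything farther away than the tolerance.

```python
from io import StringIO

from pandas import Timedelta, read_csv, to_datetime

file1 = '''DATE_TIME,Label
2015-03-02 08:36:51,0
2015-03-02 08:36:54,1'''

# Index df1 by time so reindex can do the tolerance-limited lookup.
df1 = read_csv(StringIO(file1), parse_dates=['DATE_TIME']).set_index('DATE_TIME')

# Times to look up; 08:36:56 is 2 s from the closest label, so it misses.
times = to_datetime(['2015-03-02 08:36:50',
                     '2015-03-02 08:36:53',
                     '2015-03-02 08:36:56'])

# method='nearest' searches both directions; tolerance caps the distance.
nearest = df1.reindex(times, method='nearest', tolerance=Timedelta(seconds=1))
print(nearest)
```

Note this is not identical to the expected output in the report: method='nearest' would also match 08:36:53 forward to 08:36:54, whereas a backward-only asof merge would not.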

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 17, 2016
@jreback
Contributor

jreback commented Nov 17, 2016

see docs for .reindex here

@jreback jreback closed this as completed Nov 17, 2016
@jreback jreback added this to the No action milestone Nov 17, 2016
@chrisaycock
Contributor

@VelizarVESSELINOV I just saw this old issue. A "direction" parameter has just been added. The code is in the master branch and will be released as part of pandas 0.20.0.
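For reference, on pandas 0.20+ the new parameter makes this a one-liner: direction='nearest' matches the closest time in either direction, still capped by tolerance. A minimal sketch reusing a subset of the data from the report:

```python
from io import StringIO

from pandas import Timedelta, merge_asof, read_csv

left = read_csv(StringIO('''DATE_TIME,OtherData
2015-03-02 08:36:50,0
2015-03-02 08:36:53,1
2015-03-02 08:36:56,2'''), parse_dates=['DATE_TIME'])

right = read_csv(StringIO('''DATE_TIME,Label
2015-03-02 08:36:51,0
2015-03-02 08:36:54,1'''), parse_dates=['DATE_TIME'])

# direction='nearest' looks both backward and forward;
# tolerance still limits how far away a match may be.
out = merge_asof(left, right, on='DATE_TIME',
                 tolerance=Timedelta(seconds=1),
                 direction='nearest')
print(out)
```

Here 08:36:50 picks up Label 0 from 08:36:51 (a forward match, impossible with the default backward direction), while 08:36:56 stays NaN because its nearest label is 2 s away.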

@VelizarVESSELINOV
Author

@chrisaycock Thank you 👍
