Skip to content

BUG: merge_asof only merges by first column of 'by=' list #15676

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
seibs opened this issue Mar 13, 2017 · 1 comment
Closed

BUG: merge_asof only merges by first column of 'by=' list #15676

seibs opened this issue Mar 13, 2017 · 1 comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@seibs
Copy link

seibs commented Mar 13, 2017

Code Sample, a copy-pastable example if possible

left = pd.DataFrame([
    [pd.to_datetime('20160602'), 1, pd.to_datetime('20150430')],
    [pd.to_datetime('20160602'), 2, pd.to_datetime('20150430')],
    [pd.to_datetime('20160603'), 1, pd.to_datetime('20150501')],
    [pd.to_datetime('20160603'), 2, pd.to_datetime('20150501')],
], columns=['time', 'key_1', 'key_2']).set_index('time')

right = pd.DataFrame([
    [pd.to_datetime('20160502'), 1, pd.to_datetime('20150430'), 1.0],
    [pd.to_datetime('20160502'), 2, pd.to_datetime('20150430'), 2.0],
    [pd.to_datetime('20160503'), 1, pd.to_datetime('20150501'), 3.0],
    [pd.to_datetime('20160503'), 2, pd.to_datetime('20150501'), 4.0],
], columns=['time', 'key_1', 'key_2', 'value']).set_index('time')

expected = pd.DataFrame([
    [pd.to_datetime('20160602'), 1, pd.to_datetime('20150430'), 1.0],
    [pd.to_datetime('20160602'), 2, pd.to_datetime('20150430'), 2.0],
    [pd.to_datetime('20160603'), 1, pd.to_datetime('20150501'), 3.0],
    [pd.to_datetime('20160603'), 2, pd.to_datetime('20150501'), 4.0],
], columns=['time', 'key_1', 'key_2', 'value']).set_index('time')

result = pd.merge_asof(left,
                       right,
                       left_index=True,
                       right_index=True,
                       by=['key_1', 'key_2'],
                       tolerance=pd.Timedelta('31d'))

assert_frame_equal(expected, result)

Problem description

merge_asof does not seem to be working with multiple "by" keys. The output looks like it only takes into account the first key in the list - with key_1 first in the list, the output is the same as by=['key_1'] and with key_2 first in the list, the output is the same as by=['key_2'].

Expected Output

            key_1      key_2  value
time                               
2016-06-02      1 2015-04-30    1.0
2016-06-02      2 2015-04-30    2.0
2016-06-03      1 2015-05-01    3.0
2016-06-03      2 2015-05-01    4.0

Actual Output

            key_1      key_2  value
time                               
2016-06-02      1 2015-04-30    3.0
2016-06-02      2 2015-04-30    4.0
2016-06-03      1 2015-05-01    3.0
2016-06-03      2 2015-05-01    4.0

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.2.final.0 python-bits: 64 OS: Linux OS-release: 4.1.35-pv-ts2 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 pandas: 0.19.2 nose: None pip: 9.0.1 setuptools: 27.2.0 Cython: None numpy: 1.11.3 scipy: None statsmodels: None xarray: None IPython: 5.1.0 sphinx: None patsy: None dateutil: 2.6.0 pytz: 2016.10 blosc: None bottleneck: None tables: None numexpr: None matplotlib: None openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: 4.5.3 html5lib: None httplib2: None apiclient: None sqlalchemy: None pymysql: None psycopg2: None jinja2: 2.9.4 boto: None pandas_datareader: None
@chrisaycock
Copy link
Contributor

Interestingly enough, I get the right answer if I .reset_index():

In [21]: pd.merge_asof(left, right, on='time', by=['key_1', 'key_2'])
Out[21]:
        time  key_1      key_2  value
0 2016-06-02      1 2015-04-30    1.0
1 2016-06-02      2 2015-04-30    2.0
2 2016-06-03      1 2015-05-01    3.0
3 2016-06-03      2 2015-05-01    4.0

I will have to dig into this further.

@jreback jreback added Bug Difficulty Intermediate Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 13, 2017
@jreback jreback added this to the Next Major Release milestone Mar 13, 2017
@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 14, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
…re indexed (pandas-dev#15676)

closes pandas-dev#15676

Author: Christopher C. Aycock <[email protected]>

Closes pandas-dev#15679 from chrisaycock/GH15676 and squashes the following commits:

965caf2 [Christopher C. Aycock] Verify that 'by' parameters are the same length
4a2cc09 [Christopher C. Aycock] BUG: Allow multiple 'by' parameters in merge_asof() when DataFrames are indexed (pandas-dev#15676)
mattip pushed a commit to mattip/pandas that referenced this issue Apr 3, 2017
…re indexed (pandas-dev#15676)

closes pandas-dev#15676

Author: Christopher C. Aycock <[email protected]>

Closes pandas-dev#15679 from chrisaycock/GH15676 and squashes the following commits:

965caf2 [Christopher C. Aycock] Verify that 'by' parameters are the same length
4a2cc09 [Christopher C. Aycock] BUG: Allow multiple 'by' parameters in merge_asof() when DataFrames are indexed (pandas-dev#15676)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants