Merge() without a common column (LARGE DATASET, so no fake_key=1) #21742

lucasmengual92 · 2018-07-05T08:33:58Z

Code Sample, a copy-pastable example if possible

#RIPE FEATURES
    start_decimal_ip  end_decimal_ip  isSL  ...    isSUBNET  isINTERNET  isISP
0          621457152       621457663     1  ...           0           1      0
1         3273214112      3273214143     0  ...           0           0      0
2         1280513280      1280513535     1  ...           1           0      0
3         3581584960      3581584975     0  ...           0           0      0
4         3564859176      3564859179     1  ...           0           0      0
5         1049781248      1049782271     1  ...           0           0      0
6         3262479184      3262479194     0  ...           0           0      0

#IPS DATA
  target              ip  decimal_ip
0      1    62.184.112.1  1052274689
1      1     194.30.89.1  3256768769
2      1  194.117.123.83  3262479187
3      1  195.235.249.73  3287021897

Problem description

I want to marge two dataframes, one table (ips_data) with the IP, IP in decimal format, target (if isp or not). The other table (ripe_features) has the start and end point of an IP range (in decimal), and several values. My goal is to obtain the values of ripe_features when the IP of ips_data is between the range. I did a first attampt creating a fake_key =1, but my ripe_features database is large (over 4 million rows), so this method seems unhealthy.

I want to know if there´s a better way to dod this kind of merge. I was thinking of doing a new list/column 'ip' in the ripe_features with for() loop for each IP in start/end range to return the IP, else return null, and then do a merge with on='ip'.

Expected Output

target,ip,isSL,isSA,isADSL,isDSL,isWIFI,isHOTSPOT,isBLOCK,isSUBNET,isINTERNET,isISP
1,194.117.123.83,0,1,0,0,0,0,0,0,0,0
1,195.55.122.48,1,0,0,0,0,0,0,0,0,0
1,37.15.99.177,0,0,1,0,0,0,0,0,0,1

The text was updated successfully, but these errors were encountered:

gfyoung · 2018-07-05T21:43:19Z

cc @jreback

jreback · 2018-07-05T23:11:00Z

better to ask this question on SO; there are some examples of doing this type of interval based merging

cc @jschendel

lucasmengual92 · 2018-07-06T05:36:36Z

Yeah, I've already posted in SO too. thanks!

chris-b1 · 2018-08-11T16:19:18Z

folding into #21901

gfyoung added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Performance Memory or execution speed performance Usage Question labels Jul 5, 2018

chris-b1 closed this as completed Aug 11, 2018

chris-b1 added the Duplicate Report Duplicate issue or pull request label Aug 11, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge() without a common column (LARGE DATASET, so no fake_key=1) #21742

Merge() without a common column (LARGE DATASET, so no fake_key=1) #21742

lucasmengual92 commented Jul 5, 2018

gfyoung commented Jul 5, 2018

jreback commented Jul 5, 2018

lucasmengual92 commented Jul 6, 2018

chris-b1 commented Aug 11, 2018

Merge() without a common column (LARGE DATASET, so no fake_key=1) #21742

Merge() without a common column (LARGE DATASET, so no fake_key=1) #21742

Comments

lucasmengual92 commented Jul 5, 2018

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

gfyoung commented Jul 5, 2018

jreback commented Jul 5, 2018

lucasmengual92 commented Jul 6, 2018

chris-b1 commented Aug 11, 2018