
How to merge two data.table under two conditions

pushpa asked 5 months ago

I would like to merge two tables, dt_program and dt_sale, to find the START and END for each sale, using the common keys CH and ITEM_ID, under either of the following conditions:

  1. ORDER_TIME falls between START and END

or

  2. ORDER_TIME falls after END (in which case the END nearest to ORDER_TIME is used)

The data are provided below.

The time schedule table lists the programs on each channel:

dt_program <- structure(list(CH = c("CH1", "CH1", "CH1", "CH1", "CH1", "CH2", 
        "CH2", "CH2", "CH3", "CH3", "CH3", "CH3"), ITEM_ID = c(110, 111, 
        110, 111, 110, 110, 111, 112, 114, 113, 110, 112), START = structure(c(1514791800, 
        1514799000, 1514806200, 1514813400, 1514820600, 1518602400, 1518609600, 
        1518616800.005, 1517560200, 1517565600, 1517570999.995, 1517576399.995
        ), class = c("POSIXct", "POSIXt"), tzone = "UTC"), END = structure(c(1514795400, 
        1514802600, 1514809800.005, 1514817000.01, 1514824200.015, 1518604200, 
        1518611400, 1518618600, 1517563800, 1517569200, 1517574600, 1517580000
        ), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, 
        -12L), class = c("data.table", "data.frame"))

which prints:

     CH ITEM_ID               START                 END
 1: CH1     110 2018-01-01 07:30:00 2018-01-01 08:30:00
 2: CH1     111 2018-01-01 09:30:00 2018-01-01 10:30:00
 3: CH1     110 2018-01-01 11:30:00 2018-01-01 12:30:00
 4: CH1     111 2018-01-01 13:30:00 2018-01-01 14:30:00
 5: CH1     110 2018-01-01 15:30:00 2018-01-01 16:30:00
 6: CH2     110 2018-02-14 10:00:00 2018-02-14 10:30:00
 7: CH2     111 2018-02-14 12:00:00 2018-02-14 12:30:00
 8: CH2     112 2018-02-14 14:00:00 2018-02-14 14:30:00
 9: CH3     114 2018-02-02 08:30:00 2018-02-02 09:30:00
10: CH3     113 2018-02-02 10:00:00 2018-02-02 11:00:00
11: CH3     110 2018-02-02 11:29:59 2018-02-02 12:30:00
12: CH3     112 2018-02-02 12:59:59 2018-02-02 14:00:00

I also have a sale transaction table, which records each purchase a customer makes:

dt_sale <- structure(list(CUST_ID = c("A001", "A001", "A001", "A002", "A002", 
"A003"), CH = c("CH1", "CH3", "CH2", "CH2", "CH3", "CH1"), ORDER_TIME = structure(c(1514793600, 
1514813400, 1518619200, 1514816100, 1517565600, 1514803200), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), ITEM_ID = c(110, 110, 112, 112, 114, 
111)), row.names = c(NA, -6L), class = c("data.table", "data.frame"
))

which prints:

   CUST_ID  CH          ORDER_TIME ITEM_ID
1:    A001 CH1 2018-01-01 08:00:00     110
2:    A001 CH3 2018-01-01 13:30:00     110
3:    A001 CH2 2018-02-14 14:40:00     112
4:    A002 CH2 2018-01-01 14:15:00     112
5:    A002 CH3 2018-02-02 10:00:00     114
6:    A003 CH1 2018-01-01 10:40:00     111

The output that I expect:

   CUST_ID  CH          ORDER_TIME ITEM_ID               START                 END
1:    A001 CH1 2018-01-01 08:00:00     110 2018-01-01 07:30:00 2018-01-01 08:30:00
2:    A001 CH3 2018-01-01 13:30:00     110                <NA>                <NA>
3:    A001 CH2 2018-02-14 14:40:00     112 2018-02-14 14:00:00 2018-02-14 14:30:00
4:    A002 CH2 2018-01-01 14:15:00     112                <NA>                <NA>
5:    A002 CH3 2018-02-02 10:00:00     114 2018-02-02 08:30:00 2018-02-02 09:30:00
6:    A003 CH1 2018-01-01 10:40:00     111 2018-01-01 09:30:00 2018-01-01 10:30:00

Could you please give suggestions?
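
Read against the expected output above, the two conditions seem to reduce to one rule: each sale takes the latest program with the same CH and ITEM_ID whose START is at or before ORDER_TIME, and gets NA when no such program exists. A minimal sketch of that reading, using a rolling join in data.table, might look like the following. The table names `prog` and `sale` and the helper `utc` are illustrative and only hold a toy subset of the data; note that this approach never checks END, so it assumes any order placed after a program started should be attributed to it.

```r
library(data.table)

# Hypothetical helper: parse timestamps as UTC.
utc <- function(x) as.POSIXct(x, tz = "UTC")

# Toy subset of the program schedule (illustrative name `prog`).
prog <- data.table(
  CH      = c("CH1", "CH1", "CH2"),
  ITEM_ID = c(111, 111, 112),
  START   = utc(c("2018-01-01 09:30:00", "2018-01-01 13:30:00", "2018-02-14 14:00:00")),
  END     = utc(c("2018-01-01 10:30:00", "2018-01-01 14:30:00", "2018-02-14 14:30:00"))
)

# Toy subset of the sales (illustrative name `sale`).
sale <- data.table(
  CUST_ID    = c("A003", "A001", "A002"),
  CH         = c("CH1", "CH2", "CH2"),
  ORDER_TIME = utc(c("2018-01-01 10:40:00", "2018-02-14 14:40:00", "2018-01-01 14:15:00")),
  ITEM_ID    = c(111, 112, 112)
)

# Rolling join: CH and ITEM_ID must match exactly; on the last join column,
# roll = TRUE picks the program with the largest START <= ORDER_TIME.
# x.START / x.END are the program's own values; sales without a match get NA.
res <- prog[sale,
            on = .(CH, ITEM_ID, START = ORDER_TIME),
            roll = TRUE,
            .(CUST_ID = i.CUST_ID, CH = i.CH, ORDER_TIME = i.ORDER_TIME,
              ITEM_ID = i.ITEM_ID, START = x.START, END = x.END)]
print(res)
```

In this toy data, A003 and A001 both ordered after a matching program had ended and pick up that program's START and END, while A002 ordered before any CH2 program for item 112 had started and gets NA in both columns. If orders long after END should not match, `roll` also accepts a finite number that caps how far back the join looks, measured from START.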

1 Answer
Best Answer
naveen answered 5 months ago