
python - How to retrieve the rows matching the closest numbers between two columns

Good morning,

I am stuck on a short project. I have two DataFrames that look like this:

df1:

date        city    region  customers  sellers
2020-05-15  London  A             125       25
2020-05-14  Paris   B            1233       50
2020-05-01  London  A            1260       58
2020-05-02  Paris   B             250       41

df2:

date        city    region  customers
2020-05-20  London  A            1250
2020-05-21  Paris   B             123

None of the dates in df2 appear in df1 (forecast vs. actuals).

So I merged them on city and region like this:

new_df = pd.merge(df1, df2, how='left', left_on=['city','region'], right_on = ['city','region'])
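A runnable version of this step, with the sample frames rebuilt from the tables above (storing the dates as plain strings is an assumption). Because the key columns have the same names in both frames, `on=['city', 'region']` is the simpler spelling. With it, pandas keeps a single `city` and `region` column and adds the `_x`/`_y` suffixes only to the overlapping non-key columns (`date`, `customers`), so the printed columns will not match the listing below exactly:

```python
import pandas as pd

# Sample data copied from the question; dates kept as plain strings (assumption).
df1 = pd.DataFrame({
    'date': ['2020-05-15', '2020-05-14', '2020-05-01', '2020-05-02'],
    'city': ['London', 'Paris', 'London', 'Paris'],
    'region': ['A', 'B', 'A', 'B'],
    'customers': [125, 1233, 1260, 250],
    'sellers': [25, 50, 58, 41],
})
df2 = pd.DataFrame({
    'date': ['2020-05-20', '2020-05-21'],
    'city': ['London', 'Paris'],
    'region': ['A', 'B'],
    'customers': [1250, 123],
})

# Every df1 row is paired with the df2 row for the same city and region.
# Shared key names -> one copy of each key; only date/customers get suffixes.
new_df = pd.merge(df1, df2, how='left', on=['city', 'region'])
print(new_df)
```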

which results in

new_df:

date_x      city_x  region_x  customers_x  sellers_x  date_y  city_y  region_y  customers_y
2020-05-15  London  A                 125         25  NaN     London  A                1250
2020-05-14  Paris   B                1233         50  NaN     Paris   B                 123
2020-05-01  London  A                1260         58  NaN     London  A                1250
2020-05-02  Paris   B                 250         41  NaN     Paris   B                 123

What I want is, for each customers_y value, the row whose customers_x is closest to it.

In this example, that would give final_df:

date_x      city_x  region_x  customers_x  sellers_x  date_y  city_y  region_y  customers_y
2020-05-01  London  A                1260         58  NaN     London  A                1250
2020-05-02  Paris   B                 250         41  NaN     Paris   B                 123

I guess I need to compute the difference between customers_x and customers_y and then keep only the row with the smallest difference, but I don't know how to do it. Any help is welcome. Thank you!
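The idea described above (take the absolute difference, then keep the smallest one per forecast) maps onto `groupby` plus `idxmin`. A minimal sketch on the question's sample data, assuming the dates are plain strings and that each df2 forecast is identified by its city, region and date; the column name `delta` is a hypothetical helper:

```python
import pandas as pd

# Sample data from the question; dates kept as plain strings (assumption).
df1 = pd.DataFrame({
    'date': ['2020-05-15', '2020-05-14', '2020-05-01', '2020-05-02'],
    'city': ['London', 'Paris', 'London', 'Paris'],
    'region': ['A', 'B', 'A', 'B'],
    'customers': [125, 1233, 1260, 250],
    'sellers': [25, 50, 58, 41],
})
df2 = pd.DataFrame({
    'date': ['2020-05-20', '2020-05-21'],
    'city': ['London', 'Paris'],
    'region': ['A', 'B'],
    'customers': [1250, 123],
})

new_df = pd.merge(df1, df2, how='left', on=['city', 'region'])

# Hypothetical helper column: absolute gap between actual and forecast.
new_df['delta'] = (new_df['customers_x'] - new_df['customers_y']).abs()

# One group per forecast row (assumes city/region/date_y is unique in df2);
# idxmin returns the index label of the smallest gap in each group.
best = new_df.groupby(['city', 'region', 'date_y'])['delta'].idxmin()
final_df = new_df.loc[best].drop(columns='delta')
print(final_df)
```

Grouping on `date_y` as well as city and region keeps the sketch correct if df2 ever holds several forecast dates for the same city and region.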

Please try:

df = pd.merge(df2, df1, how='left', on=['date', 'city','region','customers'])
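One thing worth checking with this approach: `pd.merge` pairs rows only when every column listed in `on` matches exactly. On the question's sample data (rebuilt below with the dates as plain strings, an assumption), no df2 row shares a date or a customers count with any df1 row, so the left merge keeps the two df2 rows and leaves `sellers` as NaN:

```python
import pandas as pd

# Sample data from the question; dates kept as plain strings (assumption).
df1 = pd.DataFrame({
    'date': ['2020-05-15', '2020-05-14', '2020-05-01', '2020-05-02'],
    'city': ['London', 'Paris', 'London', 'Paris'],
    'region': ['A', 'B', 'A', 'B'],
    'customers': [125, 1233, 1260, 250],
    'sellers': [25, 50, 58, 41],
})
df2 = pd.DataFrame({
    'date': ['2020-05-20', '2020-05-21'],
    'city': ['London', 'Paris'],
    'region': ['A', 'B'],
    'customers': [1250, 123],
})

# Exact-match join: date, city, region AND customers must all be equal.
df = pd.merge(df2, df1, how='left', on=['date', 'city', 'region', 'customers'])
print(df)
```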

You can use merge_asof:

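import pandas as pd

# merge_asof requires both frames to be sorted by the 'on' key
df2 = df2.sort_values('customers')
df1 = df1.sort_values('customers')

final_df = (pd.merge_asof(df2, df1.reset_index(),   # reset_index keeps df1's row labels in an 'index' column
                          by=['city', 'region'],    # only match rows from the same city and region
                          on='customers',
                          suffixes=('', '_1'),
                          direction='nearest')      # closest value, whether above or below
              # the result keeps only df2's 'customers', so look up df1's value by its original label
              .assign(customer_1=lambda x: x['index'].map(df1['customers']))
              .drop('index', axis=1)
           )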

Output:

         date    city region  customers      date_1  sellers  customer_1
0  2020-05-21   Paris      B        123  2020-05-02       41         250
1  2020-05-20  London      A       1250  2020-05-01       58        1260
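If the `reset_index`/`map` round trip feels indirect, one variation (not part of the answer above) is to copy df1's `customers` into a second column before merging, since `merge_asof` keeps only the left frame's `on` column. A sketch on the question's sample data, with dates as plain strings (an assumption) and a hypothetical column name `customers_1`:

```python
import pandas as pd

# Sample data from the question; dates kept as plain strings (assumption).
df1 = pd.DataFrame({
    'date': ['2020-05-15', '2020-05-14', '2020-05-01', '2020-05-02'],
    'city': ['London', 'Paris', 'London', 'Paris'],
    'region': ['A', 'B', 'A', 'B'],
    'customers': [125, 1233, 1260, 250],
    'sellers': [25, 50, 58, 41],
})
df2 = pd.DataFrame({
    'date': ['2020-05-20', '2020-05-21'],
    'city': ['London', 'Paris'],
    'region': ['A', 'B'],
    'customers': [1250, 123],
})

# Both sides must be sorted by the 'on' key.
left = df2.sort_values('customers')
# Hypothetical copy of the key so df1's own value survives the merge.
right = df1.sort_values('customers').assign(customers_1=lambda d: d['customers'])

final_df = pd.merge_asof(
    left, right,
    on='customers', by=['city', 'region'],
    suffixes=('', '_1'), direction='nearest',
)
print(final_df)
```

The match itself is the same as in the answer; only the way df1's matched `customers` value is carried over differs.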