LaVOZs

The World’s Largest Online Community for Developers

'; Is there a way to merge pandas dataframes on row and column index? - LavOzs.Com

I want to merge two pandas data frames that share the same index as well as some columns. pd.merge creates duplicate columns, but I would like to merge on both axes at the same time.

tried pd.merge and pd.concat but did not get the right result.

my try: df3=pd.merge(df1, df2, left_index=True, right_index=True, how='left')

df1

     Var#1  Var#2  Var#3  Var#4  Var#5  Var#6  Var#7
ID                                                  
323      7      6      8    7.0    2.0    2.0   10.0
324      2      1      5    3.0    4.0    2.0    1.0
675      9      8      1    NaN    NaN    NaN    NaN
676      3      7      2    NaN    NaN    NaN    NaN

df2

     Var#6  Var#7  Var#8  Var#9
ID                             
675      1      9      2      8
676      3      2      0      7

ideally I would get:

df3

     Var#1  Var#2  Var#3  Var#4  Var#5  Var#6  Var#7 Var#8   Var#9
ID                                                  
323      7      6      8    7.0    2.0    2.0   10.0  NaN    NaN
324      2      1      5    3.0    4.0    2.0    1.0  NaN    NaN
675      9      8      1    NaN    NaN    1      9    2      8
676      3      7      2    NaN    NaN    3      2    0      7

IIUC, use df.combine_first():

df3=df1.combine_first(df2)
print(df3)

      Var#1  Var#2  Var#3  Var#4  Var#5  Var#6  Var#7  Var#8  Var#9
ID                                                                
323      7      6      8    7.0    2.0    2.0   10.0    NaN    NaN
324      2      1      5    3.0    4.0    2.0    1.0    NaN    NaN
675      9      8      1    NaN    NaN    1.0    9.0    2.0    8.0
676      3      7      2    NaN    NaN    3.0    2.0    0.0    7.0

You can concat and group the data

pd.concat([df1, df2], 1).groupby(level = 0, axis = 1).first()

    Var#1   Var#2   Var#3   Var#4   Var#5   Var#6   Var#7   Var#8   Var#9
ID                                  
323 7.0     6.0     8.0     7.0     2.0     2.0     10.0    NaN     NaN
324 2.0     1.0     5.0     3.0     4.0     2.0     1.0     NaN     NaN
675 9.0     8.0     1.0     NaN     NaN     1.0     9.0     2.0     8.0
676 3.0     7.0     2.0     NaN     NaN     3.0     2.0     0.0     7.0
Related
How to sort a dataframe by multiple column(s)
Add one row to pandas DataFrame
Selecting multiple columns in a pandas dataframe
Renaming columns in pandas
Adding new column to existing DataFrame in Python pandas
Delete column from pandas DataFrame
How do I get the row count of a pandas DataFrame?
How to iterate over rows in a DataFrame in Pandas?
How to select rows from a DataFrame based on column values?
Get list from pandas DataFrame column headers