+3 votes
in Programming Languages by (20.2k points)
I have two dataframes with a common column. I want to merge them on the common column to generate a new dataframe. How can I merge them?

1 Answer

0 votes
by (35.8k points)

The merge() function of pandas can be used to merge two dataframes. You can check the list of arguments of this function in the link given under merge().

The parameter "on" is used to provide the name of the common column. Since this function works like database-style join, you can specify one of these values {'left', 'right', 'outer', 'inner', 'cross'} for the parameter "how".

Here is an example with different values of the parameter "how".

import pandas as pd
df1 = pd.DataFrame({"name": ['AA', 'BB', 'CC', 'DD', 'EE', 'HH', 'II'], "age": [34, 12, 56, 43, 23, 41, 52]})
df2 = pd.DataFrame({"name": ['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG'], "income": [3434, 1122, 2156, 4334, 54523, 4321, 6541]})
df = df1.merge(df2, how='inner', on='name')
print(df)

The output of the above code with how='inner', i.e., inner-join. It uses the intersection of keys from both dataframes.

  name  age  income
0   AA   34    3434
1   BB   12    1122
2   CC   56    2156
3   DD   43    4334
4   EE   23   54523

df = df1.merge(df2, how='outer', on='name')

The output of the above code with how='outer', i.e., outer-join. It uses the union of keys from both dataframes.

 name   age   income
0   AA  34.0   3434.0
1   BB  12.0   1122.0
2   CC  56.0   2156.0
3   DD  43.0   4334.0
4   EE  23.0  54523.0
5   HH  41.0      NaN
6   II  52.0      NaN
7   FF   NaN   4321.0
8   GG   NaN   6541.0

df = df1.merge(df2, how='left', on='name')

The output of the above code with how='left', i.e., left-join. It uses only keys from the left dataframe.

 name  age   income
0   AA   34   3434.0
1   BB   12   1122.0
2   CC   56   2156.0
3   DD   43   4334.0
4   EE   23  54523.0
5   HH   41      NaN
6   II   52      NaN

df = df1.merge(df2, how='right', on='name')

The output of the above code with how='right', i.e., right-join. It uses only keys from the right dataframe.

  name   age  income
0   AA  34.0    3434
1   BB  12.0    1122
2   CC  56.0    2156
3   DD  43.0    4334
4   EE  23.0   54523
5   FF   NaN    4321
6   GG   NaN    6541

...