Python: how to remove duplicate column(s) from a pandas dataframe

1 Answer

answered Nov 18, 2021 by pkumar81 (349k points)
selected Oct 21, 2022 by pkumar81

Best answer

There are several ways to remove duplicate columns from a dataframe. One of the simplest is to flag the repeated column labels with df.columns.duplicated() and then keep only the columns that are not flagged.

Here is an example.

import pandas as pd
df1 = pd.DataFrame({"name": ['AA', 'BB', 'CC', 'DD', 'EE', 'HH', 'II'], "age": [34, 12, 56, 43, 23, 41, 52]})
df2 = pd.DataFrame({"name": ['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG'], "income": [3434, 1122, 2156, 4334, 54523, 4321, 6541]})
# Side-by-side concat yields the columns: name, age, name, income
df = pd.concat([df1, df2], axis=1)
# duplicated() marks every repeat of a column label; ~ inverts the mask to keep the rest
df = df.loc[:, ~df.columns.duplicated()]
print(df)

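The above code will print the following output. Although both df1 and df2 have a 'name' column, the final dataframe contains it only once. Note that duplicated() compares column labels, not values, and keeps the first occurrence by default, so rows 5 and 6 show 'HH' and 'II' from df1 rather than 'FF' and 'GG' from df2.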

  name  age  income
0   AA   34    3434
1   BB   12    1122
2   CC   56    2156
3   DD   43    4334
4   EE   23   54523
5   HH   41    4321
6   II   52    6541
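
If you want to keep the other copy, or the duplicate columns have different names but the same values, the following sketch may help. It is only an illustration under assumptions: the shortened data and the names keep_first, keep_last, df_vals and deduped_vals are made up for this example. It uses the keep parameter of duplicated() to choose which copy survives, and transposes the dataframe so that columns can be compared by content instead of by label.

```python
import pandas as pd

# Shortened, made-up version of the data from the answer above.
df1 = pd.DataFrame({"name": ["AA", "BB", "HH"], "age": [34, 12, 41]})
df2 = pd.DataFrame({"name": ["AA", "BB", "FF"], "income": [3434, 1122, 4321]})
df = pd.concat([df1, df2], axis=1)  # columns: name, age, name, income

# Boolean mask over the column labels: True marks a repeated label.
mask = df.columns.duplicated()  # keep="first" is the default

# keep="first" retains df1's 'name'; keep="last" retains df2's 'name'.
keep_first = df.loc[:, ~df.columns.duplicated(keep="first")]
keep_last = df.loc[:, ~df.columns.duplicated(keep="last")]

# Columns with different labels but identical values are not caught by the
# label check. Transposing turns columns into rows, so DataFrame.duplicated()
# can compare their contents instead.
df_vals = pd.DataFrame({"x": [1, 2, 3], "y": [1, 2, 3], "z": [7, 8, 9]})
same_values = df_vals.T.duplicated().to_numpy()
deduped_vals = df_vals.loc[:, ~same_values]

print(list(keep_first.columns))
print(list(keep_last.columns))
print(list(deduped_vals.columns))
```

Transposing copies the whole dataframe, so the content-based check may be slow on large data; the label-based check only looks at the column index.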
