How to compute the intersection, union, difference, and symmetric difference of pandas DataFrames
Date: 2022-04-23 08:45:12 | Category: Python code
1. Introduction
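Python's set data type: a set is an unordered collection of distinct elements. Every element must be hashable (in practice, an immutable type), which is the same requirement that applies to dictionary keys.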
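As a quick illustration of the hashability requirement, the sketch below builds a set from immutable values and shows that a mutable list is rejected. The variable names (`langs`, `pairs`, `lookup`, `rejected`) are illustrative only and do not come from the original article.

```python
# Elements of a set must be hashable; duplicate values collapse to one element.
langs = {"Python", "Go", "Go", "C++"}
print(len(langs))  # 3

# A tuple of hashable values is itself hashable, so it can be a set element
# or a dictionary key.
pairs = {("1", "Python"), ("2", "Go")}
lookup = {("1", "Python"): "row 0"}

# A list is mutable and therefore unhashable: adding it raises TypeError.
try:
    langs.add(["Java"])
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```

Representing a table row as a tuple, as in `pairs`, is what makes it possible to compare DataFrame rows with ordinary set operators.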
The DataFrame in pandas: a DataFrame is a tabular data structure that can be thought of as a two-dimensional array with labeled rows and columns.
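To make the "labeled two-dimensional array" description concrete, here is a minimal sketch that builds a small frame and reads one cell by its row and column labels. The data mirrors the style used in the rest of the article; the variable name `df` is an assumption for illustration.

```python
import pandas as pd

# A small DataFrame: both the rows and the columns carry labels.
df = pd.DataFrame(
    [["1", "Python"], ["2", "Go"]],
    columns=["id", "name"],
)

print(df.shape)           # (2, 2): two rows, two columns
print(list(df.columns))   # ['id', 'name']
print(df.loc[1, "name"])  # Go  (row label 1, column label "name")
```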
The common set operations covered here are intersection, union, difference, and symmetric difference.
2. Intersection
- pandas' merge performs an inner join by default, which yields the intersection.
- A set can be intersected directly with &.
import pandas as pd

print("CSDN叶庭云:https://yetingyun.blog.csdn.net/")

# Intersection of two sets
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
print(set1 & set2)  # the elements Go and C++ (set order is not guaranteed)

df1 = pd.DataFrame([
    ['1', 'Python'],
    ['2', 'Go'],
    ['3', 'C++'],
    ['4', 'Java'],
], columns=['id', 'name'])
df2 = pd.DataFrame([
    ['2', 'Go'],
    ['3', 'C++'],
    ['5', 'JavaScript'],
    ['6', 'C'],
], columns=['id', 'name'])

# The default inner join on both columns keeps only rows present in both frames
print(pd.merge(df1, df2, on=['id', 'name']))
The result keeps only the two rows common to both frames: id 2 (Go) and id 3 (C++).
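One way to sanity-check the merge-based intersection is to compare it with a plain set intersection of row tuples. The sketch below does that; it also relies on the documented behavior that `pd.merge` joins on all shared column names when `on` is omitted. The names `common`, `rows1`, `rows2`, and `matches` are illustrative, not from the original article.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
    columns=["id", "name"],
)
df2 = pd.DataFrame(
    [["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
    columns=["id", "name"],
)

# With no `on` argument, merge joins on every column name the frames share,
# so here it behaves like on=["id", "name"].
common = pd.merge(df1, df2)

# Cross-check: convert each frame to a set of plain row tuples and intersect.
rows1 = set(df1.itertuples(index=False, name=None))
rows2 = set(df2.itertuples(index=False, name=None))
matches = set(common.itertuples(index=False, name=None)) == (rows1 & rows2)

print(common)
print(matches)  # True
```

Note that this equivalence assumes neither frame contains duplicate rows; with duplicates, an inner join repeats matching rows, whereas a set never does.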
3. Union
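- The how parameter of pandas' merge accepts values including "left", "right", "inner", and "outer"; the default is "inner". An outer join yields the union. Alternatively, concatenate the two frames and drop duplicates, keeping the first occurrence. Older code writes this as df1.append(df2), but DataFrame.append was removed in pandas 2.0, so pd.concat([df1, df2]) is used instead.
- A set can be unioned directly with |.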
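import pandas as pd

# Union of two sets
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
print(set1 | set2)

df1 = pd.DataFrame([
    ['1', 'Python'],
    ['2', 'Go'],
    ['3', 'C++'],
    ['4', 'Java'],
], columns=['id', 'name'])
df2 = pd.DataFrame([
    ['2', 'Go'],
    ['3', 'C++'],
    ['5', 'JavaScript'],
    ['6', 'C'],
], columns=['id', 'name'])

# Method 1: an outer join keeps every row found in either frame
print(pd.merge(df1, df2, on=['id', 'name'], how='outer'))

# Method 2: stack the frames, then drop repeated rows, keeping the first copy.
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
# Duplicates are judged on all columns so that rows sharing only an id are kept.
df3 = pd.concat([df1, df2], ignore_index=True)
print(df3.drop_duplicates(subset=['id', 'name'], keep="first"))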
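The choice of columns passed to `drop_duplicates` matters for the concatenate-and-deduplicate approach. The sketch below uses hypothetical data, not taken from the article, in which the same id appears with two different names, to show how de-duplicating on `id` alone differs from de-duplicating on whole rows.

```python
import pandas as pd

# Hypothetical data: id "2" is reused with a different name in the second frame.
a = pd.DataFrame([["1", "Python"], ["2", "Go"]], columns=["id", "name"])
b = pd.DataFrame([["2", "Golang"], ["3", "C++"]], columns=["id", "name"])

stacked = pd.concat([a, b], ignore_index=True)

# De-duplicating on "id" alone silently drops the ("2", "Golang") row.
by_id = stacked.drop_duplicates(subset=["id"], keep="first")

# De-duplicating on all columns keeps every distinct row,
# which agrees with the outer merge on both columns.
by_row = stacked.drop_duplicates(keep="first")
outer = pd.merge(a, b, on=["id", "name"], how="outer")

print(len(by_id), len(by_row), len(outer))  # 3 4 4
```

Which behavior is wanted depends on whether `id` is meant to be a unique key; if it is, de-duplicating on `id` is a deliberate choice rather than a bug.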
4. Difference
import pandas as pd

# Set difference in both directions
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
print(set1 - set2)  # elements only in set1: Python and Java
print(set2 - set1)  # elements only in set2: JavaScript and C

df1 = pd.DataFrame([
    ['1', 'Python'],
    ['2', 'Go'],
    ['3', 'C++'],
    ['4', 'Java'],
], columns=['id', 'name'])
df2 = pd.DataFrame([
    ['2', 'Go'],
    ['3', 'C++'],
    ['5', 'JavaScript'],
    ['6', 'C'],
], columns=['id', 'name'])

# df1 - df2: stack df1 with two copies of df2, then drop every row that
# occurs more than once. Each row of df2 occurs at least twice, so only the
# rows unique to df1 survive. This assumes df1 has no duplicate rows itself.
# Older code built the stack with df1.append(df2) twice; DataFrame.append
# was removed in pandas 2.0, so pd.concat is used here.
print(pd.concat([df1, df2, df2]).drop_duplicates(keep=False))

# df2 - df1: the same trick with the roles swapped
print(pd.concat([df2, df1, df1]).drop_duplicates(keep=False))
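An alternative to the concatenate-twice trick, not used in the original article, is a left join with `indicator=True`, which adds a `_merge` column recording where each row was found. The sketch below is a minimal version under the assumption that both frames share the columns `id` and `name`; the names `marked` and `df1_minus_df2` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
    columns=["id", "name"],
)
df2 = pd.DataFrame(
    [["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
    columns=["id", "name"],
)

# Left join on all shared columns. The extra "_merge" column says whether
# each row of df1 was matched in df2 ("both") or not ("left_only").
marked = df1.merge(df2, on=["id", "name"], how="left", indicator=True)

# Keep the unmatched rows and drop the helper column: this is df1 - df2.
df1_minus_df2 = marked.loc[marked["_merge"] == "left_only", ["id", "name"]]

print(df1_minus_df2["name"].tolist())  # ['Python', 'Java']
```

Unlike `drop_duplicates(keep=False)`, this version does not discard rows that happen to be duplicated inside df1 itself, which may or may not be the desired behavior.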
5. Symmetric difference
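# Symmetric difference of two sets
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
print(set1 ^ set2)

# df1 and df2 are the frames defined in the previous sections.
# Stack them and drop every row that appears more than once; the rows that
# remain belong to exactly one frame, which is the symmetric difference.
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
df3 = pd.concat([df1, df2], ignore_index=True)
print(df3.drop_duplicates(subset=['id', 'name'], keep=False))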
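The symmetric difference can also be sketched with an outer join and `indicator=True`, keeping the rows that are not marked "both". This is an alternative technique rather than the article's own method, and the names `marked` and `sym_diff` are illustrative. The final check compares the result with the set operator `^` applied to row tuples.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
    columns=["id", "name"],
)
df2 = pd.DataFrame(
    [["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
    columns=["id", "name"],
)

# Outer join on all shared columns; "_merge" is "left_only", "right_only",
# or "both" for each row of the result.
marked = pd.merge(df1, df2, on=["id", "name"], how="outer", indicator=True)

# Rows found in exactly one frame form the symmetric difference.
sym_diff = marked.loc[marked["_merge"] != "both", ["id", "name"]]
print(sorted(sym_diff["name"]))  # ['C', 'Java', 'JavaScript', 'Python']

# Cross-check against the ^ operator on sets of row tuples.
rows1 = set(df1.itertuples(index=False, name=None))
rows2 = set(df2.itertuples(index=False, name=None))
same = set(sym_diff.itertuples(index=False, name=None)) == (rows1 ^ rows2)
print(same)  # True
```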
Article URL: http://www.codeinn.net/misctech/199930.html