I am new to pandas and am facing the following problem.
I have 2 data frames:
df1:

    x  y
    1  3 4
    2  NaN
    3  6
    4  NaN
    5  9 2
    6  1 4 9

df2:

    x  y
    1  2 3 6 1 5
    2  4 1 8 7 5
    3  6 3 1 4 5
    4  2 1 3 5 4
    5  9 2 3 8 7
    6  1 4 5 3 7
The two data frames are the same size.
I want to merge the 2 dataframes such that the resulting dataframe is the following:
result:

    x  y
    1  3 4 6 1 5
    2  4 1 8 7 5
    3  6 3 1 4 5
    4  2 1 3 5 4
    5  9 2 3 8 7
    6  1 4 5 6 7
So in the result, priority is given to df2. If there is a value in df2, it is put first, and the remaining values are put in from df1 (they should have the same position as in df1). There should be no repeated values in the result (i.e. if a value is in position 1 in df1 and in position 3 in df2, that value should come in position 1 in the result and not be repeated).
Any kind of help is appreciated. Thanks!
IIUC

Setup

    import pandas as pd

    df1 = pd.DataFrame(dict(x=range(1, 7),
                            y=[[3, 4], None, [6], None, [9, 2], [1, 4, 9]]))
    df2 = pd.DataFrame(dict(x=range(1, 7),
                            y=[[2, 3, 6, 1, 5], [4, 1, 8, 7, 5], [6, 3, 1, 4, 5],
                               [2, 1, 3, 5, 4], [9, 2, 3, 8, 7], [1, 4, 5, 3, 7]]))

    print(df1)
    print()
    print(df2)

       x          y
    0  1     [3, 4]
    1  2       None
    2  3        [6]
    3  4       None
    4  5     [9, 2]
    5  6  [1, 4, 9]

       x                y
    0  1  [2, 3, 6, 1, 5]
    1  2  [4, 1, 8, 7, 5]
    2  3  [6, 3, 1, 4, 5]
    3  4  [2, 1, 3, 5, 4]
    4  5  [9, 2, 3, 8, 7]
    5  6  [1, 4, 5, 3, 7]
Convert to something more usable:

    # Index by x, then spread each list in y across one column per position
    df1_ = df1.set_index('x').y.apply(pd.Series)
    df2_ = df2.set_index('x').y.apply(pd.Series)

    print(df1_)
    print()
    print(df2_)

         0    1    2
    x
    1  3.0  4.0  NaN
    2  NaN  NaN  NaN
    3  6.0  NaN  NaN
    4  NaN  NaN  NaN
    5  9.0  2.0  NaN
    6  1.0  4.0  9.0

       0  1  2  3  4
    x
    1  2  3  6  1  5
    2  4  1  8  7  5
    3  6  3  1  4  5
    4  2  1  3  5  4
    5  9  2  3  8  7
    6  1  4  5  3  7
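
As a possible alternative to `apply(pd.Series)`, the wide frame can probably be built straight from the ragged lists. This is only a sketch: it assumes every non-missing `y` is a plain Python list, and `to_wide` is a hypothetical helper name, not something pandas provides.

```python
import pandas as pd

df1 = pd.DataFrame(dict(x=range(1, 7),
                        y=[[3, 4], None, [6], None, [9, 2], [1, 4, 9]]))


def to_wide(df):
    """Hypothetical helper: one column per list position, NaN where a list is short."""
    # Assumption: a missing entry (None/NaN) is treated as an empty list.
    rows = [v if isinstance(v, list) else [] for v in df["y"]]
    # The DataFrame constructor pads ragged rows with missing values.
    return pd.DataFrame(rows, index=pd.Index(df["x"], name="x"))


wide = to_wide(df1)
print(wide)
```

The column labels are the list positions (0, 1, 2, ...), so the frame lines up with a second frame built the same way.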
Combine with priority given to df1 (I think you meant df1, as that is consistent with my interpretation of the question and the expected output you provided), then reduce to eliminate duplicates:
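
    # Take df1_ where it has a value, fall back to df2_ elsewhere,
    # then keep only the unique values in each row
    print(df1_.combine_first(df2_).apply(lambda x: x.unique(), axis=1))

       0  1  2  3  4
    x
    1  3  4  6  1  5
    2  4  1  8  7  5
    3  6  3  1  4  5
    4  2  1  3  5  4
    5  9  2  3  8  7
    6  1  4  9  3  7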
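
For comparison, here is a minimal plain-Python sketch of the same idea that avoids `combine_first` and works on the list column directly. It assumes the two frames are aligned row by row (same `x` order) and that the df1 values occupy the leading positions of each row. `merge_row` is a hypothetical helper name. Depending on the pandas version, the row-wise `apply` above may return a Series of arrays rather than a DataFrame, which this variant sidesteps by producing lists.

```python
import pandas as pd

df1 = pd.DataFrame({"x": range(1, 7),
                    "y": [[3, 4], None, [6], None, [9, 2], [1, 4, 9]]})
df2 = pd.DataFrame({"x": range(1, 7),
                    "y": [[2, 3, 6, 1, 5], [4, 1, 8, 7, 5], [6, 3, 1, 4, 5],
                          [2, 1, 3, 5, 4], [9, 2, 3, 8, 7], [1, 4, 5, 3, 7]]})


def merge_row(primary, fallback):
    """Hypothetical helper: overlay primary on fallback by position, drop repeats."""
    # Assumption: a missing entry (None/NaN) contributes no values.
    primary = primary if isinstance(primary, list) else []
    # Positions covered by primary win; the rest come from fallback.
    merged = primary + fallback[len(primary):]
    # dict.fromkeys keeps the first occurrence of each value, in order.
    return list(dict.fromkeys(merged))


result = pd.DataFrame({
    "x": df1["x"],
    "y": [merge_row(a, b) for a, b in zip(df1["y"], df2["y"])],
})
print(result)
```

On the sample data this gives the same rows as the `combine_first` approach, with `[1, 4, 9, 3, 7]` for `x = 6`.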