in sql query select id, avg(x), avg(y), max(x), max(y) table group id
can transformed single operation on pandas data frame? can each function result like
mydf['center_x'] = mydf.groupby('id')['x'].mean() mydf['center_y'] = mydf.groupby('id')['y'].mean()
etc
but it's excessive call of same operation, plus excessive storage demand (mydf huge). besides it's kind of troublesome unique set @ end.
any ideas?
yes, supported, see documentation here.
to cite there:
in [56]: grouped = df.groupby('a') in [57]: grouped['c'].agg([np.sum, np.mean, np.std]) out[57]: sum mean std bar 0.443469 0.147823 0.301765 foo 2.529056 0.505811 0.966450
maybe more pertinent questions, can different functions different columns so:
in [60]: grouped.agg({'c' : np.sum, ....: 'd' : lambda x: np.std(x, ddof=1)}) ....: out[60]: c d bar 0.443469 1.490982 foo 2.529056 0.645875
or, practical example closer original questions:
df = pd.dataframe({"group":list("aaaaabbbbb"), "a":np.random.rand(10), "b":np.random.rand(10)}) df.groupby("group").agg({"a":[np.sum, np.mean], "b":[np.sum, np.mean]}) b sum mean sum mean group 1.450488 0.290098 3.526962 0.705392 b 1.570386 0.314077 2.734465 0.546893