python - applying an operation to multiple columns having multiple columns in the result -

in sql query select id, avg(x), avg(y), max(x), max(y) table group id can transformed single operation on pandas data frame? can each function result like

mydf['center_x'] =  mydf.groupby('id')['x'].mean() mydf['center_y'] =  mydf.groupby('id')['y'].mean()

etc

but it's excessive call of same operation, plus excessive storage demand (mydf huge). besides it's kind of troublesome unique set @ end.

any ideas?

yes, supported, see documentation here.

to cite there:

in [56]: grouped = df.groupby('a')  in [57]: grouped['c'].agg([np.sum, np.mean, np.std]) out[57]:            sum      mean       std                                 bar  0.443469  0.147823  0.301765 foo  2.529056  0.505811  0.966450

maybe more pertinent questions, can different functions different columns so:

in [60]: grouped.agg({'c' : np.sum,    ....:              'd' : lambda x: np.std(x, ddof=1)})    ....:  out[60]:              c         d                       bar  0.443469  1.490982 foo  2.529056  0.645875

or, practical example closer original questions:

df = pd.dataframe({"group":list("aaaaabbbbb"), "a":np.random.rand(10), "b":np.random.rand(10)}) df.groupby("group").agg({"a":[np.sum, np.mean], "b":[np.sum, np.mean]})                     b     sum     mean    sum     mean group                  1.450488    0.290098    3.526962    0.705392 b   1.570386    0.314077    2.734465    0.546893

Story

Search This Blog

python - applying an operation to multiple columns having multiple columns in the result -