For object data I can map two columns into a third, (object) column of tuples:

>>> df = pd.DataFrame([["a", "b"], ["a", "a"], ["b", "b"]])
>>> df.apply(lambda row: (row[0], row[1]), axis=1)
0    (a, b)
1    (a, a)
2    (b, b)
dtype: object
(See pandas: how to use apply function to multiple columns.)

However, when I try the same thing with numerical columns:

>>> df2 = pd.DataFrame([[10, 2], [10, 1], [20, 2]])
>>> df2.apply(lambda row: (row[0], row[1]), axis=1)
    0  1
0  10  2
1  10  1
2  20  2
so instead of a Series of pairs, i.e. [(10, 2), (10, 1), (20, 2)], I get a DataFrame back.

How can I force pandas to return a Series of pairs? (Preferably in a nicer way than converting to a string and parsing it back.)
I don't recommend this, but you can force it:

In [11]: df2.apply(lambda row: pd.Series([(row[0], row[1])]), axis=1)
Out[11]:
         0
0  (10, 2)
1  (10, 1)
2  (20, 2)
Please don't do this.
Two columns will give you better performance, flexibility, and ease of later analysis.
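If you really do need a Series of tuples, a cheaper construction (my own suggestion, not part of the original answer) is to zip the columns rather than going through apply, since zip only builds plain tuples and avoids creating a pandas object per row:

import pandas as pd

df2 = pd.DataFrame([[10, 2], [10, 1], [20, 2]])

# build the tuples with zip instead of apply; only plain tuples are created
pairs = pd.Series(list(zip(df2[0], df2[1])), index=df2.index)
# pairs is an object-dtype Series: (10, 2), (10, 1), (20, 2)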
Just to update with the OP's experience:

What they wanted was to count the occurrences of each [0, 1] pair.

On a Series they could use the value_counts method (with the column from the result above). However, the same result can be achieved with groupby, which turned out to be 300 times faster (for the OP):

df2.groupby([0, 1]).size()
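As a sketch of the two routes (assuming the df2 from the question), both count the same (0, 1) pairs; only the index differs, tuples in one case and a MultiIndex in the other:

import pandas as pd

df2 = pd.DataFrame([[10, 2], [10, 1], [20, 2]])

# route 1: force a Series of tuples, then count with value_counts
pairs = df2.apply(lambda row: pd.Series([(row[0], row[1])]), axis=1)[0]
tuple_counts = pairs.value_counts()        # indexed by tuples like (10, 2)

# route 2: count the pairs directly
group_counts = df2.groupby([0, 1]).size()  # indexed by a (0, 1) MultiIndex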
It's worth emphasising (again) that the approach in [11] has to create a Series object and a tuple instance for every row, which is huge overhead compared to that of groupby.
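A rough way to see that overhead for yourself (a sketch; the exact numbers will vary, and the 300x figure above is the OP's own measurement) is to time both approaches on a larger frame:

import numpy as np
import pandas as pd
from timeit import timeit

df2 = pd.DataFrame(np.random.randint(0, 100, size=(10_000, 2)))

# per-row apply: one pd.Series and one tuple constructed for every row
via_apply = lambda: df2.apply(lambda row: pd.Series([(row[0], row[1])]),
                              axis=1)[0].value_counts()

# groupby: operates on the two columns directly
via_groupby = lambda: df2.groupby([0, 1]).size()

print(timeit(via_apply, number=1))
print(timeit(via_groupby, number=1))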