For object data I can map two columns into a third, (object) column of tuples:

>>> df = pd.DataFrame([["a", "b"], ["a", "a"], ["b", "b"]])
>>> df.apply(lambda row: (row[0], row[1]), axis=1)
0    (a, b)
1    (a, a)
2    (b, b)
dtype: object
(See pandas: how to use apply function to multiple columns.)

However, when I try the same thing with numerical columns:

>>> df2 = pd.DataFrame([[10, 2], [10, 1], [20, 2]])
>>> df2.apply(lambda row: (row[0], row[1]), axis=1)
    0  1
0  10  2
1  10  1
2  20  2
so instead of a Series of pairs, i.e. [(10, 2), (10, 1), (20, 2)], I get a DataFrame back.

How can I force pandas to return a Series of pairs? (Preferably in a nicer way than converting to a string and parsing it back.)
I don't recommend this, but you can force it:

In [11]: df2.apply(lambda row: pd.Series([(row[0], row[1])]), axis=1)
Out[11]:
         0
0  (10, 2)
1  (10, 1)
2  (20, 2)
Please don't do this.
Two columns will give you better performance, flexibility, and ease of later analysis.
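If you really do need a Series of tuples, a cheaper construction (my own suggestion, not part of the original answer) is to zip the columns rather than going through apply, since zip only builds plain tuples and avoids creating a pandas object per row:

import pandas as pd

df2 = pd.DataFrame([[10, 2], [10, 1], [20, 2]])

# build the tuples with zip instead of apply; only plain tuples are created
pairs = pd.Series(list(zip(df2[0], df2[1])), index=df2.index)
# pairs is an object-dtype Series: (10, 2), (10, 1), (20, 2)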
Just to update with the OP's experience:

What they wanted was to count the occurrences of each [0, 1] pair.

On a Series they could use the value_counts method (with the column from the result above). However, the same result can be achieved with groupby, which turned out to be 300 times faster (for the OP):

df2.groupby([0, 1]).size()
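As a sketch of the two routes (assuming the df2 from the question), both count the same (0, 1) pairs; only the index differs, tuples in one case and a MultiIndex in the other:

import pandas as pd

df2 = pd.DataFrame([[10, 2], [10, 1], [20, 2]])

# route 1: force a Series of tuples, then count with value_counts
pairs = df2.apply(lambda row: pd.Series([(row[0], row[1])]), axis=1)[0]
tuple_counts = pairs.value_counts()        # indexed by tuples like (10, 2)

# route 2: count the pairs directly
group_counts = df2.groupby([0, 1]).size()  # indexed by a (0, 1) MultiIndex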
It's worth emphasising (again) that the approach in [11] has to create a Series object and a tuple instance for every row, which is huge overhead compared to that of groupby.
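A rough way to see that overhead for yourself (a sketch; the exact numbers will vary, and the 300x figure above is the OP's own measurement) is to time both approaches on a larger frame:

import numpy as np
import pandas as pd
from timeit import timeit

df2 = pd.DataFrame(np.random.randint(0, 100, size=(10_000, 2)))

# per-row apply: one pd.Series and one tuple constructed for every row
via_apply = lambda: df2.apply(lambda row: pd.Series([(row[0], row[1])]),
                              axis=1)[0].value_counts()

# groupby: operates on the two columns directly
via_groupby = lambda: df2.groupby([0, 1]).size()

print(timeit(via_apply, number=1))
print(timeit(via_groupby, number=1))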