Python Pandas compare two dataframes to assign country to phone number -

i have 2 dataframes read in via csv. dataframe 1 consists of phone number , additional data. second dataframe contains country codes , country names.

i want take phone number first dataset , compare country codes of second. country codes can between 1 4 digits long. go longest country code shortest. if there match, want assign country name phone number.

input longlist:

phonenumber, add_info     34123425209, info1 92654321762, info2 12018883637, info3 6323450001, info4 496789521134, info5

input country_list:

country;country_code;order_info spain;34;1 pakistan;92;4 usa;1;2 philippines;63;3 germany;49;4 poland;48;1 norway;47;2

output should be:

phonenumber, add_info, country, order_info     34123425209, info1, spain, 1 92654321762, info2, pakistan, 4 12018883637, info3, usa, 2 6323450001, info4, philippines, 3 496789521134, info5, germany, 4

i have solved once this:

#! /usr/bin/python import csv import pandas open ('longlist.csv','r') lookuplist: open ('country_list.csv','r') inputlist:     open('outputfile.csv', 'w') outputlist:         reader = csv.reader(lookuplist, delimiter=',')         reader2 = csv.reader(inputlist, delimiter=';')         writer = csv.writer(outputlist, dialect='excel')          in reader2:             xl in reader:                 if xl[0].startswith(i[1]):                     zeile = [xl[0], xl[1], i[0], i[1], i[2]]                     writer.writerow(zeile)             lookuplist.seek(0)

but solve problem, using pandas. got work: - read in csv files - remove duplicates "longlist" - sort list of countries / country code

this is, have working already:

import pandas pd, numpy np longlist = pd.read_csv('path/to/longlist.csv',                                   usecols=[2,3], names=['phonenumber','add_info']) country_list = pd.read_csv('path/to/country_list.csv',                             sep=';', names=['country','country_code','order_info'], skiprows=[0])  # remove duplicates , make phone number index longlist = longlist.drop_duplicates('phonenumber') longlist = longlist.set_index('phonenumber')  # sort country list, high low value , make country code index country_list=country_list.sort_values(by='country_code', ascending=0) country_list=country_list.set_index('country_code')  (...)  longlist.to_csv('path/to/output.csv')

but way trying same datasets not work. cannot apply startswith (cannot iterate through objects , cannot apply on objects). appreciate help.

i way:

cl = pd.read_csv('country_list.csv', sep=';', dtype={'country_code':str}) ll = pd.read_csv('phones.csv', skipinitialspace=true, dtype={'phonenumber':str})  lookup = cl['country_code'] lookup.index = cl['country_code']  ll['country_code'] = (     ll['phonenumber']     .apply(lambda x: pd.series([lookup.get(x[:4]), lookup.get(x[:3]),                                 lookup.get(x[:2]), lookup.get(x[:1])]))     .apply(lambda x: x.get(x.first_valid_index()), axis=1) )  # remove `how='left'` parameter if don't need "unmatched" phone-numbers     result = ll.merge(cl, on='country_code', how='left')

output:

in [195]: result out[195]:     phonenumber add_info country_code      country  order_info 0   34123425209    info1           34        spain         1.0 1   92654321762    info2           92     pakistan         4.0 2   12018883637    info3            1          usa         2.0 3   12428883637   info31         1242      bahamas         3.0 4    6323450001    info4           63  philippines         3.0 5  496789521134    info5           49      germany         4.0 6   00000000000      bad         none          nan         nan

explanation:

in [216]: (ll['phonenumber']    .....:   .apply(lambda x: pd.series([lookup.get(x[:4]), lookup.get(x[:3]),    .....:                               lookup.get(x[:2]), lookup.get(x[:1])]))    .....: ) out[216]:       0     1     2     3 0  none  none    34  none 1  none  none    92  none 2  none  none  none     1 3  1242  none  none     1 4  none  none    63  none 5  none  none    49  none 6  none  none  none  none

phones.csv: - i've intentionally added 1 bahamas number (1242...) , 1 invalid number (00000000000)

phonenumber, add_info 34123425209, info1 92654321762, info2 12018883637, info3 12428883637, info31 6323450001, info4 496789521134, info5 00000000000, bad

Story

Search This Blog

Python Pandas compare two dataframes to assign country to phone number -