I have the following code, running as normal Python code:
def remove_missing_rows(app_list):
    print("########### missing row removal ###########")
    missing_rows = []
    ''' Remove any row that has missing data in the name, id, or description column '''
    for row in app_list:
        if not row[1]:
            missing_rows.append(row)
            continue  # Continue loop to next row; no need to check more columns
        if not row[5]:
            missing_rows.append(row)
            continue  # Continue loop to next row; no need to check more columns
        if not row[4]:
            missing_rows.append(row)

    print("Number of missing entries: " + str(len(missing_rows)))  # 967 with current method

    # Remove missing_rows from the original data
    app_list = [row for row in app_list if row not in missing_rows]
    return app_list
Now, after writing this on a smaller sample, I wish to run it on a large data set, and I thought it would be useful to utilise multiple cores of my computer.

I'm struggling to implement this using the multiprocessing module, though. E.g. the idea would be to have core 1 work through the first half of the data set while core 2 works through the last half, and so on, in parallel. Is this possible?
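Something like the sketch below is what I have in mind (untested, and filter_chunk / remove_missing_rows_parallel are just names I made up for illustration):

import multiprocessing

def filter_chunk(chunk):
    # Per-chunk worker: keep rows where name (1), description (4) and id (5) are all present
    return [row for row in chunk if row[1] and row[4] and row[5]]

def remove_missing_rows_parallel(app_list, workers=2):
    # Split the data into one chunk per worker (e.g. first half / last half for 2 cores)
    chunk_size = max(1, (len(app_list) + workers - 1) // workers)
    chunks = [app_list[i:i + chunk_size] for i in range(0, len(app_list), chunk_size)]
    # Note: on platforms that spawn processes (e.g. Windows), call this from
    # under an if __name__ == "__main__" guard
    with multiprocessing.Pool(workers) as pool:
        filtered = pool.map(filter_chunk, chunks)  # each chunk is filtered in its own process
    # Stitch the filtered chunks back together in their original order
    return [row for chunk in filtered for row in chunk]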
This is not CPU bound, so try the code below instead. I've used a set for fast (hash-based) contains checks (which is what you use when you invoke if row not in missing_rows, and which is slow on a long list).
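For a quick sense of the difference, compare membership tests on a list versus a set (illustrative only; absolute numbers depend on your machine):

import timeit

haystack_list = list(range(100000))
haystack_set = set(haystack_list)

# Each lookup scans the whole list: O(n)
print(timeit.timeit(lambda: 99999 in haystack_list, number=1000))
# Each lookup is a hash probe: O(1) on average
print(timeit.timeit(lambda: 99999 in haystack_set, number=1000))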
If you're using the csv module you're holding tuples, which are hashable, so not many changes are needed:
def remove_missing_rows(app_list):
    print("########### missing row removal ###########")
    filterfunc = lambda row: not all([row[1], row[4], row[5]])
    missing_rows = set(filter(filterfunc, app_list))

    print("Number of missing entries: " + str(len(missing_rows)))  # 967 with current method

    # Remove missing_rows from the original data
    # Note: this should be a lot faster with a set
    app_list = [row for row in app_list if row not in missing_rows]
    return app_list
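For example, assuming rows are tuples laid out as in your question (index 1 = name, 4 = description, 5 = id; the sample data here is made up), usage would look like this:

rows = [
    ("0", "App A", "GAME", "4.5", "Fun game", "id-1"),
    ("1", "", "GAME", "4.0", "No name here", "id-2"),  # missing name, removed
    ("2", "App C", "TOOLS", "3.5", "", "id-3"),        # missing description, removed
]
cleaned = remove_missing_rows(rows)
print(len(cleaned))  # 1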