I understand there is overhead when using the multiprocessing module, but this seems like an unusually high amount, and the level of IPC should be fairly low as far as I can gather.
Say I generate a large-ish list of random numbers between 1 and 1000 and want to obtain a list of the primes among them. This code is meant to test multiprocessing on CPU-intensive tasks, so ignore the overall inefficiency of the primality test.
The bulk of the code looks like this:
```python
from random import SystemRandom
from math import sqrt
from timeit import default_timer
from time import time
from multiprocessing import Pool, Process, Manager, cpu_count

rdev = SystemRandom()
num_cnt = 0x5000
nums = [rdev.randint(0, 1000) for _ in range(num_cnt)]
primes = []

def chunk(l, n):
    # Split l into n roughly equal parts; the last part absorbs the remainder.
    i = int(len(l) / float(n))
    for j in range(0, n - 1):
        yield l[j*i:j*i+i]
    yield l[n*i-i:]

def is_prime(n):
    if n <= 2:
        return True
    if not n % 2:
        return False
    for i in range(3, int(sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True
```
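As a quick illustration (a small check of my own, assuming the definitions above), chunk() yields n roughly equal slices, with the last slice absorbing any remainder:

```python
# Hypothetical demo of chunk(): a 10-element list split 3 ways.
data = list(range(10))
print([len(part) for part in chunk(data, 3)])  # -> [3, 3, 4]
```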
It seems to me that I should be able to split this work among multiple processes. I have 8 logical cores, so I should be able to use cpu_count() as the number of processes.
The serial version:
```python
def serial():
    global primes
    primes = []
    for num in nums:
        if is_prime(num):
            primes.append(num)  # primes will contain the values
```
The following sizes of num_cnt correspond to the following speeds:
- 0x500 = 0.00100 sec.
- 0x5000 = 0.01723 sec.
- 0x50000 = 0.27573 sec.
- 0x500000 = 4.31746 sec.
This is the way I chose to do the multiprocessing. It uses the chunk() function to split nums into cpu_count() (roughly equal) parts. It passes each chunk into a new process, which iterates through it and then assigns its result to one entry of a shared dict variable. The IPC should really only occur when I assign the value to the shared variable. Why would it occur otherwise?
```python
def loop(ret, id, numbers):
    # Find the primes in this chunk locally, then write the result
    # to the shared dict in a single assignment.
    l_primes = []
    for num in numbers:
        if is_prime(num):
            l_primes.append(num)
    ret[id] = l_primes

def parallel():
    man = Manager()
    ret = man.dict()
    num_procs = cpu_count()
    procs = []
    for i, l in enumerate(chunk(nums, num_procs)):
        p = Process(target=loop, args=(ret, i, l))
        p.daemon = True
        p.start()
        procs.append(p)
    [proc.join() for proc in procs]
    return sum(ret.values(), [])
```
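For comparison, here is a minimal sketch of a lower-overhead variant using multiprocessing.Pool, assuming the Manager proxy is a contributor to the cost; each worker's result comes back once through the pool's own pickling channel rather than through a proxied dict. loop_local() and parallel_pool() are names of my own, not from the code above:

```python
from multiprocessing import Pool, cpu_count

def loop_local(numbers):
    # Build the per-chunk result locally; it is pickled back to the
    # parent once, when the worker function returns.
    return [num for num in numbers if is_prime(num)]

def parallel_pool():
    # chunk(), nums, and is_prime() are assumed defined as above.
    with Pool(cpu_count()) as pool:
        results = pool.map(loop_local, chunk(nums, cpu_count()))
    return sum(results, [])
```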
Again, I expect some overhead, but the time seems to be increasing exponentially faster than in the serial version.
- 0x500 = 0.37199 sec.
- 0x5000 = 0.91906 sec.
- 0x50000 = 8.38845 sec.
- 0x500000 = 119.37617 sec.
What could be causing this? Is it IPC? The initial setup made me expect some overhead, but this is an insane amount.
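One way to separate the fixed process-startup cost from the per-item cost is to time empty worker processes; a minimal sketch (noop() is a placeholder of my own, not from the question):

```python
from multiprocessing import Process
from time import time

def noop():
    pass

if __name__ == '__main__':
    t1 = time()
    procs = [Process(target=noop) for _ in range(8)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Roughly the price of spawning and joining 8 processes alone.
    print("Took {:.05f} sec.".format(time() - t1))
```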
Edit:

Here's how I'm timing the execution of the functions:
```python
if __name__ == '__main__':
    print(hex(num_cnt))
    for func in (serial, parallel):
        t1 = time()
        vals = func()
        t2 = time()
        if vals is None:  # serial has no return value
            print(len(primes))
        else:  # parallel
            print(len(vals))
        print("Took {:.05f} sec.".format(t2 - t1))
```
The same list of numbers is used each time.

Example output:
```
0x5000
3442
Took 0.01828 sec.
3442
Took 0.93016 sec.
```
Hmm. How do you measure the time? On my computer, the parallel version is faster than the serial one.
I'm measuring it using time.time(), this way: if we assume tt is an alias for time.time():

```python
t1 = int(round(tt() * 1000))
serial()
t2 = int(round(tt() * 1000))
print(t2 - t1)
parallel()
t3 = int(round(tt() * 1000))
print(t3 - t2)
```
For the 0x500000 input, I get:

- 5519 ms for the serial version
- 3351 ms for the parallel version
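As an aside, timeit.default_timer (already imported in the question's code) is generally a safer interval timer than time.time(), since it uses the highest-resolution clock available; a minimal sketch:

```python
from timeit import default_timer

t1 = default_timer()
serial()  # assumed defined as in the question
t2 = default_timer()
print("{:.0f} ms".format((t2 - t1) * 1000))
```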
I believe your mistake is caused by the inclusion of the number generation process inside the parallel timing, but not inside the serial one. On my computer, the generation of the random numbers alone takes 4-5 seconds (it's a slow process). So that can explain the difference between the two values; I don't think your computer uses a different architecture.
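To verify that on your own machine, you could time the generation step in isolation; a minimal sketch of my own (the PRNG comparison is an addition, and exact numbers will vary):

```python
from random import SystemRandom, Random
from time import time

num_cnt = 0x500000

rdev = SystemRandom()  # reads the OS entropy source: comparatively slow
t1 = time()
nums = [rdev.randint(0, 1000) for _ in range(num_cnt)]
print("SystemRandom: {:.02f} sec.".format(time() - t1))

prng = Random()  # Mersenne Twister: much faster for bulk generation
t1 = time()
nums = [prng.randint(0, 1000) for _ in range(num_cnt)]
print("Random:       {:.02f} sec.".format(time() - t1))
```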