Python: concurrent file seek


I am looking for a way to allow concurrent seeking on a file object.

As a test case of file seeking going awry:

#!/usr/bin/env python2
import time, random, os

s = 'the quick brown fox jumps on lazy dog'

# create the file, for testing
f = open('file.txt', 'w')
f.write(s)
f.close()

# the actual code...
f = open('file.txt', 'rb')

def fn():
    out = ''
    for _ in xrange(10):
        k = random.randint(0, len(s) - 1)
        f.seek(k)
        time.sleep(random.randint(1, 4) / 10.)
        out += s[k] + ' ' + f.read(1) + '\n'
    return out

import multiprocessing
p = multiprocessing.Pool()
n = 3
res = [p.apply_async(fn) for _ in xrange(n)]
for r in res:
    print r.get()
f.close()

I have worker processes that seek to a random position within the file, sleep, and then read. I compare each byte read against the character at that position in the actual string. I don't print right away, to avoid concurrency issues with printing.

You can see that with n=1 all goes well, but it goes astray for n>1 due to concurrent access to the shared file descriptor.

I have tried duplicating the file descriptor within fn():

def fn():
    fd = os.dup(f.fileno())
    f2 = os.fdopen(fd)

and then used f2 instead. It did not seem to help.
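That is expected from POSIX dup() semantics: a duplicated descriptor shares the same open file description, including the file offset, with the original. A small Python 3 demonstration (the file name dup_demo.txt is just illustrative):

```python
import os

# Duplicated descriptors share a single file offset, so os.dup()
# cannot give an independent seek position.
with open('dup_demo.txt', 'w') as tmp:
    tmp.write('abcdef')

fd1 = os.open('dup_demo.txt', os.O_RDONLY)
fd2 = os.dup(fd1)

os.lseek(fd1, 3, os.SEEK_SET)   # seek via the original descriptor
byte_seen = os.read(fd2, 1)     # the duplicate reads from the moved offset

os.close(fd1)
os.close(fd2)
os.remove('dup_demo.txt')

print(byte_seen)                # b'd', not b'a'
```

Seeking on either descriptor moves the one shared offset, which is exactly the race in the test case above.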

How can I seek concurrently, i.e. from multiple processes at once? (In this case it would suffice to open the file within fn(), but this is an MWE; in my actual case that is harder to do.)

You cannot - Python's I/O builds on C's I/O, and there is only one "current file position" per open file in C. That's inherently shared.

What you can do is perform each seek+read under the protection of an interprocess lock.

Like, define:

def process_init(lock):
    global seek_lock
    seek_lock = lock

and in the main process add this to the Pool constructor:

initializer=process_init, initargs=(multiprocessing.Lock(),)

Then, whenever you want to seek and read, do it under the protection of that lock:

with seek_lock:
    f.seek(k)
    char = f.read(1)

As with any lock, you want to do as little as logically necessary while holding it. It won't allow concurrent seeking, but it will prevent seeks in one process from interfering with seeks in other processes.

It would, of course, be better to open the file in each process, so that each process has its own notion of the file position - but you said you can't. Rethink that ;-)