I am looking for a way to allow concurrent file object seeking.

As a test case of file seeking going awry:
    #!/usr/bin/env python2
    import time, random, os

    s = 'the quick brown fox jumps over the lazy dog'

    # create the file, for testing
    f = open('file.txt', 'w')
    f.write(s)
    f.close()

    # the actual code...
    f = open('file.txt', 'rb')

    def fn():
        out = ''
        for _ in xrange(10):
            k = random.randint(0, len(s) - 1)
            f.seek(k)
            time.sleep(random.randint(1, 4) / 10.)
            out += s[k] + ' ' + f.read(1) + '\n'
        return out

    import multiprocessing
    p = multiprocessing.Pool()
    n = 3
    res = [p.apply_async(fn) for _ in xrange(n)]
    for r in res:
        print r.get()
    f.close()
I have worker processes that seek to a random position within the file, sleep, and then read one byte. Each byte read is compared against the corresponding character of the actual string; the results are collected and returned rather than printed right away, to avoid concurrency issues with printing.

You can see that when n=1 everything goes well, but it goes astray when n>1 due to concurrent use of the file descriptor.
I have tried duplicating the file descriptor within fn():

    def fn():
        fd = os.dup(f.fileno())
        f2 = os.fdopen(fd)
        ...

and using f2 instead. It does not seem to help.
How can I seek concurrently, i.e. from multiple processes? (In this case I could simply open the file within fn(), but this is an MWE. In my actual case it is harder to do that.)
You cannot - Python I/O builds on C's I/O, and there is only one "current file position" per open file in C. That's inherently shared.
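That's also why your os.dup() attempt couldn't help: a duplicated descriptor refers to the same open file description as the original, so the two share a single file offset. A little sketch (mine, at the os module level rather than in your MWE) showing that:

    import os

    fd1 = os.open('file.txt', os.O_RDONLY)
    fd2 = os.dup(fd1)            # fd2 shares fd1's open file description

    os.lseek(fd1, 5, os.SEEK_SET)          # move the offset via fd1
    print os.lseek(fd2, 0, os.SEEK_CUR)    # prints 5 - fd2 sees the same offset

    os.close(fd1)
    os.close(fd2)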
What you can do is perform the seek+read under protection of an interprocess lock. Like, define:

    def process_init(lock):
        global seek_lock
        seek_lock = lock
and in the main process pass these to the Pool constructor:

    initializer=process_init, initargs=(multiprocessing.Lock(),)
Then, whenever you want to seek and read, do it under protection of that lock:

    with seek_lock:
        f.seek(k)
        char = f.read(1)
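Wired into your MWE, it would look roughly like this (a sketch on my part, untested; note that the sleep moves outside the with block):

    def fn():
        out = ''
        for _ in xrange(10):
            k = random.randint(0, len(s) - 1)
            with seek_lock:        # seek+read now happens atomically
                f.seek(k)
                char = f.read(1)
            time.sleep(random.randint(1, 4) / 10.)
            out += s[k] + ' ' + char + '\n'
        return out

    p = multiprocessing.Pool(initializer=process_init,
                             initargs=(multiprocessing.Lock(),))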
As with any lock, you want to do as little as logically necessary while it's held. This won't allow concurrent seeking, but it will prevent seeks in one process from interfering with seeks in other processes.
It would, of course, be better to open the file in each process, so that each process has its own notion of the file position - but you said you can't. Rethink that ;-)
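If you do manage to rethink it, the per-process open can also go in the initializer (again just a sketch, assuming each worker can open 'file.txt' itself):

    def process_init():
        global f
        # each worker opens its own file object, so each has its
        # own file position - no lock needed for seek+read then
        f = open('file.txt', 'rb')

    p = multiprocessing.Pool(initializer=process_init)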