I need a fast solution for random reads and writes of text snippets in Python. This is what I want to do:
- Write the snippet and record a pointer
- Use the pointer to retrieve the snippet
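In its simplest form, the two steps above might look like the sketch below, where the pointer is just an (offset, length) pair. It uses plain Python file objects rather than C functions so that it runs anywhere; the names `write_snippet` and `read_snippet` and the temporary file are hypothetical and only there for illustration.

```python
import os
import tempfile

def write_snippet(path, data):
    """Append data (bytes) to the file and return an (offset, length) pointer."""
    with open(path, 'ab') as fh:
        fh.seek(0, os.SEEK_END)   # make sure tell() reports the end of the file
        offset = fh.tell()
        fh.write(data)
    return offset, len(data)

def read_snippet(path, pointer):
    """Return the bytes that the (offset, length) pointer refers to."""
    offset, length = pointer
    with open(path, 'rb') as fh:
        fh.seek(offset)
        return fh.read(length)

# Usage with a throwaway file (hypothetical location)
path = os.path.join(tempfile.mkdtemp(), 'SnippetBase.txt')
p1 = write_snippet(path, b'first snippet')
p2 = write_snippet(path, b'second, longer snippet')
```

Only `p1` and `p2` would need to be stored elsewhere; the snippets themselves stay in the flat file.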
The snippets are of arbitrary length, and I have chosen to keep only the pointers in a database, not the snippets themselves. Simply replacing Python's file methods with C functions (solution 1) already made this pretty fast, and each pointer consists only of the "where" and the "how long" of the snippet. After that, I experimented with what I think is the real mechanism at work in Berkeley DB. I don't know what to call it; "paging" of some sort, perhaps?
The paging code definitely works and is 1.5 to 2 times faster than solution 1, but that is not a big gain, and it needs a 4-part pointer. Perhaps this method isn't worthwhile, but is there any room to improve it significantly?
The following is the code:
from collections import namedtuple
from ctypes import cdll,c_char_p,\
c_void_p,c_size_t,c_long,\
c_int,create_string_buffer
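libc = cdll.msvcrt
fopen = libc.fopen
fread = libc.fread
fwrite = libc.fwrite
fseek = libc.fseek
ftell = libc.ftell
fflush = libc.fflush
fclose = libc.fclose
# FILE* is a pointer, so declare it as c_void_p; the default int return
# type would truncate the handle on 64-bit builds
fopen.restype = c_void_p
fopen.argtypes = [c_char_p,c_char_p]
fread.argtypes = [c_void_p,c_size_t,c_size_t,c_void_p]
fwrite.argtypes = [c_void_p,c_size_t,c_size_t,c_void_p]
fseek.argtypes = [c_void_p,c_long,c_int]
ftell.argtypes = [c_void_p]
ftell.restype = c_long
fflush.argtypes = [c_void_p]
fclose.argtypes = [c_void_p]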
#######################################################
# The following is how to write a snippet into the SnippetBase file
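Pointer = namedtuple('pointer','blk1, start, nblk, length')
snippet = '''
blk1: the first blk where the snippet is
start: the start of this snippet
nblk: number of blocks this snippet takes
length: length of this snippet
'''
bsize = 4096 # bsize: block size
# 'ab' appends; 'wb' would truncate the file and destroy earlier snippets
fh = fopen('.\\SnippetBase.txt','ab')
fseek(fh,0,2) # 2 = SEEK_END, so ftell reports the current end of the file
pos1 = divmod(ftell(fh),bsize) # (block index, offset inside that block)
fwrite(snippet,c_size_t(len(snippet)),1,fh)
fflush(fh)
pos2 = divmod(ftell(fh),bsize)
# the class and the instance get separate names, so the class stays usable
ptr = Pointer(pos1[0],pos1[1],pos2[0]-pos1[0]+1,len(snippet))
fclose(fh)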
#######################################################
# The following is how to read the snippet from the SnippetBase file
fh = fopen('.\\SnippetBase.txt','rb')
fseek(fh,c_long(ptr.blk1*bsize),0) # 0 = SEEK_SET: offset from the start of the file
buff = create_string_buffer(ptr.nblk*bsize)
# element size 1, so a short read at the end of the file still counts as bytes read
fread(buff,c_size_t(1),c_size_t(ptr.nblk*bsize),fh)
print buffer(buff,ptr.start,ptr.length)
fclose(fh)
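For anyone who wants to try the block-aligned idea without msvcrt, here is a portable sketch of the same arithmetic using built-in file objects. The names `BlockPointer`, `write_paged` and `read_paged` are hypothetical, the temporary file is only for the demonstration, and this version is not benchmarked, so it says nothing about speed. One deliberate difference from the code above: `nblk` is computed by ceiling division on `start + length`, so a snippet that ends exactly on a block boundary does not count an extra block.

```python
import os
import tempfile
from collections import namedtuple

BSIZE = 4096  # block size, same value as bsize above
BlockPointer = namedtuple('BlockPointer', 'blk1 start nblk length')

def write_paged(path, data):
    """Append data (bytes) and return a 4-part, block-based pointer."""
    with open(path, 'ab') as fh:
        fh.seek(0, os.SEEK_END)
        blk1, start = divmod(fh.tell(), BSIZE)  # first block, offset inside it
        fh.write(data)
    # number of blocks covered by bytes start .. start+length-1 (at least one)
    nblk = max(1, (start + len(data) + BSIZE - 1) // BSIZE)
    return BlockPointer(blk1, start, nblk, len(data))

def read_paged(path, ptr):
    """Read whole blocks starting at blk1, then slice the snippet out."""
    with open(path, 'rb') as fh:
        fh.seek(ptr.blk1 * BSIZE)
        buff = fh.read(ptr.nblk * BSIZE)  # may come back short at end of file
    return buff[ptr.start:ptr.start + ptr.length]

# Usage with a throwaway file (hypothetical location)
path = os.path.join(tempfile.mkdtemp(), 'SnippetBase.txt')
a = write_paged(path, b'x' * 5000)  # spans blocks 0 and 1
b = write_paged(path, b'hello')     # starts inside block 1
```

Every read here starts on a block boundary, which is the whole point of the extra pointer fields; whether that alignment pays for the larger pointer is exactly what I am unsure about.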