On Wed, Jan 31, 2001, Matthew Zito wrote:
If you're looking for large scalability AND high performance, my preferred solution would be to have a relational database as the backend, but don't store any messages in it - simply pointers to their location on disk. Then store the messages without regard to intended username in a hashed directory structure. The POP3 server then gets the list of new messages from the database server, which could just be a list of filenames. Then, the POP3 server simply has to open the message to return it - it doesn't have to do an opendir(). Also, if you use the filename as the UIDL returned, there's no need to even stat() the file, again saving you a whole NFS call. The obvious downside is that you can't do a:
rm -f /users/j/o/h/n/johndoe.mbx
But, with 200k mailboxes, you should have an automated way to do that anyway.
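For illustration only, here is a minimal Python sketch of the layout described above. It is not from the original post: the function names, the spool root, the use of MD5 to pick directories, and the two-level fan-out are all assumptions. The idea shown is that the database hands back a list of filenames, the server computes each path directly (no opendir()), and the filename doubles as the UIDL unique-id (no stat()).

```python
import hashlib
import posixpath


def hashed_path(root, filename, levels=2):
    """Map a message filename to its location in a hashed directory tree.

    Hypothetical scheme: hash the filename and use successive pairs of
    hex digits as directory names, so the path is computed rather than
    discovered by scanning a directory.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    dirs = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return posixpath.join(root, *dirs, filename)


def uidl_listing(filenames):
    """Build 'msgnum unique-id' pairs for a UIDL response.

    The filename itself is the unique-id, so no file needs to be
    touched to answer the command. `filenames` stands in for the list
    the database would return for one user.
    """
    return ["%d %s" % (n, name) for n, name in enumerate(filenames, start=1)]
```

A server built this way would call hashed_path() only when a client actually retrieves a message. Deleting a mailbox then means asking the database for that user's filenames and unlinking each computed path, which is the automated cleanup mentioned above.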
Hah. Unlink the directory, and do a background fsck every few hours? :)

The trouble with the above format is that you're ignoring any locality that exists in the filesystem. For example, in Berkeley FFS, files in a given directory are allocated in the same cylinder group (or at least that is attempted), which under heavy load could actually give a slight performance boost on an FFS that isn't full.

I believe there was a paper covering this locality for web caches. Ah, yes:

"Reducing the Disk I/O of Web Proxy Server Caches"
- Carlos Maltzahn and Kathy J. Richardson, Compaq Computer Corporation, Network Systems Laboratory
- Dirk Grunwald, University of Colorado

Some (not all) of the concepts included there are relevant here. Other filesystems will have different allocation/layout policies, and additions such as "hinting" which can substantially speed up mail accesses.

But, this is off topic, and I digress. :-)

Adrian

-- 
Adrian Chadd                "Sex Change: a simple job of outside
<adrian@creative.net.au>      to inside plumbing."
                                - Some random movie