Re: Scalable Mail solution with NAS

31 Jan 2001

      On Wed, 31 Jan 2001, Eric Sobocinski wrote:
...
At 11:06 AM -0500, 01/31/2001, Sebastien Berube wrote:
...
One way to fix this
issue would be to use a hashing scheme to split the amount of actual
mailboxes into a subdirectory structure.  You could get something like
johndoe@yourdomain.com would have his mailbox in
/export/mailboxes/j/o/h/n/johndoe.mbox
so in /export/mailboxes, in order to find the j directory, you only have
about 36 directory entries or so to search.
This example breaks down, though, if you accept usernames of three or
fewer characters.
It's not hard to right-pad any short usernames before hashing.  For
instance, the username "bo" might hash as "bo__" and thus would end up in
the directory "/export/mailboxes/b/o/_/_/bo.mbox".  If you allow
non-alphanumerics you'll want to translate those to something innocuous as
well, or a name such as "bo.lee" will cause problems.
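A rough sketch of the padded scheme described above, in Python. The function name, the lowercase a-z/0-9 alphabet, and the choice to fold case are assumptions made for illustration; only the four-level layout and the "_" padding come from the messages above.

```python
import string

# Assumption: only lowercase letters and digits are used as directory
# names (roughly the "36 directory entries" mentioned above).
ALLOWED = set(string.ascii_lowercase + string.digits)


def mailbox_path(username, root="/export/mailboxes", depth=4, pad="_"):
    """Return the hashed mbox path for a username (hypothetical helper)."""
    # Translate anything outside the allowed alphabet to the pad
    # character, so a name like "bo.lee" cannot produce a "." directory.
    safe = "".join(c if c in ALLOWED else pad for c in username.lower())
    # Right-pad short names so every mailbox sits at the same depth.
    key = safe[:depth].ljust(depth, pad)
    return "/".join([root, *key, username + ".mbox"])


print(mailbox_path("johndoe"))  # /export/mailboxes/j/o/h/n/johndoe.mbox
print(mailbox_path("bo"))       # /export/mailboxes/b/o/_/_/bo.mbox
print(mailbox_path("bo.lee"))   # /export/mailboxes/b/o/_/l/bo.lee.mbox
```

The file name itself is left untouched here; only the directory components are sanitized.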
Well, hashing like that works well from the standpoint that it's very easy
for the software to find the mailbox.  It's going to make things like backups
very costly, though, because of all the nested directories.  Also, some
directories will end up badly imbalanced, since some name prefixes occur far
more often than others.

If you're going to use NFS, you probably want to use something like maildir
format, which is NFS-safe but becomes very costly as the number of messages
increases.  A lot of that has to do with the performance of the remote NFS
server - the underlying filesystem's performance in reading large directories
will make a BIG difference as far as that goes.  NetApps have excellent
large-directory performance, fwiw.
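To make the maildir trade-off concrete, here is a small sketch using Python's standard mailbox module. The helper names and the sample path are made up for illustration. Delivery writes the message into tmp/ and then moves it into new/, which is why no locking is needed; listing the mailbox, on the other hand, means reading the whole new/ directory, and that is the part that gets expensive as the message count grows.

```python
import mailbox
import os


def deliver(maildir_path, raw_message):
    """Deliver one message; returns the unique name maildir assigned.

    The parent directory of maildir_path must already exist.
    """
    box = mailbox.Maildir(maildir_path, create=True)
    # add() writes to tmp/ first, then moves the finished file into new/.
    return box.add(raw_message)


def list_new(maildir_path):
    """List undelivered messages; this is a full directory read."""
    return sorted(os.listdir(os.path.join(maildir_path, "new")))


if __name__ == "__main__":
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "johndoe")
    deliver(path, b"Subject: test\n\nhello\n")
    print(list_new(path))
```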

If you're looking for large scalability AND high performance, my preferred 
solution would be to have a relational database as the backend, but don't 
store any messages in it - simply pointers to their location on disk.  Then 
store the messages without regard to intended username in a hashed directory 
structure.   The pop3 server then gets the list of new messages from the 
database server, which could just be a list of filenames.  Then, the pop3 
server simply has to open the message to return it - it doesn't have to do an 
opendir().  Also, if you use the filename as the UIDL returned, there's no 
need to even stat() the file, again saving you a whole nfs call.   The 
obvious downside is that you can't simply do a:

rm -f /users/j/o/h/n/johndoe.mbx

But, with 200k mailboxes, you should have an automated way to do that anyway.
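For what it's worth, a minimal sketch of the database-as-index idea, assuming SQLite stands in for the relational backend. The table layout, function names, and the use of a random UUID as the filename are all assumptions for illustration, not a description of any existing system. The filename doubles as the UIDL, and its leading characters pick the hashed subdirectory, so the directories stay balanced regardless of username.

```python
import os
import sqlite3
import uuid


def open_index(db_path=":memory:"):
    """Open the index; it holds pointers only, never message bodies."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "username TEXT NOT NULL, filename TEXT NOT NULL UNIQUE, "
        "path TEXT NOT NULL)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS by_user ON messages (username)")
    return db


def store_message(db, spool, username, raw):
    """Write raw bytes into a hashed directory and record the pointer."""
    name = uuid.uuid4().hex  # unique filename, also usable as the UIDL
    subdir = os.path.join(spool, name[0:2], name[2:4])
    os.makedirs(subdir, exist_ok=True)
    path = os.path.join(subdir, name)
    with open(path, "wb") as f:
        f.write(raw)
    with db:  # commits on success
        db.execute("INSERT INTO messages VALUES (?, ?, ?)",
                   (username, name, path))
    return name


def list_messages(db, username):
    """What a pop3 server would ask for: (uidl, path) pairs, no opendir()."""
    rows = db.execute(
        "SELECT filename, path FROM messages WHERE username = ? "
        "ORDER BY rowid", (username,))
    return rows.fetchall()


def delete_mailbox(db, username):
    """The automated replacement for rm -f on a single mbox file."""
    paths = [p for _, p in list_messages(db, username)]
    for p in paths:
        os.remove(p)
    with db:
        db.execute("DELETE FROM messages WHERE username = ?", (username,))
    return len(paths)
```

With this layout the pop3 server opens each message directly by the path from the index, and removing a user becomes one query plus one unlink per message.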

Thanks,
Matt

-- 
Matthew J. Zito
Systems Engineer
Register.com, Inc., 11th Floor, 575 8th Avenue, New York, NY 10018
Ph: 212-798-9205
PGP Key Fingerprint: 4E AC E1 0B BE DD 7D BC  D2 06 B2 B0 BF 55 68 99
