An added twist appears to be that a full demand matrix is difficult to build without large flow-collection systems. I'm curious whether the problem can easily be approached without full insight into this data (per-link usage yes, but without end-to-end flows).
Methods for reducing the number of flow collection points are discussed in this paper: http://www.research.att.com/~jrex/papers/sigcomm00.pdf

Then there's always the MPLS-as-measurement-platform approach. Set up a mesh of LSPs and the per-LSP traffic counters give you edge-to-edge (or perhaps POP-to-POP, to reduce the number of LSPs) traffic demands. Less ambitious variations on this can be used to classify limited amounts of traffic for tactical TE. Something like Juniper's DCU (Destination Class Usage) might provide enough info, if 16 buckets is granular enough.
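To make the LSP-mesh idea concrete, here is a minimal sketch of rolling per-LSP byte counters up into a POP-to-POP demand matrix. It assumes you already have counter deltas per LSP for one polling interval (how you collect them is out of scope here), and that router names end in a POP code such as "cr1.nyc". The naming convention, `pop_of`, and `demand_matrix` are all hypothetical, made up for illustration.

```python
from collections import defaultdict


def pop_of(router: str) -> str:
    """Map a router name to its POP.

    Assumption: names look like 'cr1.nyc', with the POP as the last label.
    """
    return router.rsplit(".", 1)[-1]


def demand_matrix(samples, interval_s):
    """Aggregate per-LSP byte deltas into POP-to-POP demands in bits/s.

    samples: iterable of (ingress_router, egress_router, byte_delta),
             one entry per LSP for a single polling interval.
    interval_s: length of the polling interval in seconds.
    """
    demands = defaultdict(float)
    for ingress, egress, byte_delta in samples:
        src, dst = pop_of(ingress), pop_of(egress)
        if src == dst:
            # Intra-POP traffic is not an inter-POP demand; skip it.
            continue
        demands[(src, dst)] += byte_delta * 8 / interval_s
    return dict(demands)


if __name__ == "__main__":
    # Made-up counter deltas for a 5-minute polling interval.
    samples = [
        ("cr1.nyc", "cr1.sjc", 3_750_000_000),
        ("cr2.nyc", "cr1.sjc", 1_875_000_000),
        ("cr1.sjc", "cr2.nyc", 750_000_000),
    ]
    for (src, dst), bps in sorted(demand_matrix(samples, 300).items()):
        print(f"{src} -> {dst}: {bps / 1e6:.1f} Mbit/s")
```

Summing by POP rather than by router is what keeps the matrix small; a router-to-router matrix falls out of the same loop if you key on the router names instead.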
I've seen a few modeling applications, but nothing that attempts to solve the general optimization problem.
There have been a couple of interesting papers on the topic recently. See, for example, work by Thorup et al. and Ben-Ameur et al. I'm not aware of released code that actually implements these papers, though.

Bradley