You can configure pmacct to specify which properties of the received flow data it should aggregate its output on. For example, one could configure pmacct to store data using the following primitives:
($timeperiod, $entrypoint_router_id, $bgp_nexthop, $packet_count)
Here $timeperiod is something like a 5-minute range, and the post-processing software calculates the distance between the entrypoint router and the point where the flow would leave the network ($bgp_nexthop).
See 'aggregate' on http://wiki.pmacct.net/OfficialConfigKeys
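To make the idea concrete, here is a rough sketch in Python of what that aggregation plus post-processing step amounts to. This is not pmacct code or its API: the function names, the sample flows, and the distance table (which could be an IGP metric or a hop count between entry router and BGP next hop) are all made up for illustration.

```python
from collections import defaultdict

BUCKET_SECONDS = 300  # assumed 5-minute time periods


def aggregate(flows):
    """Sum packet counts per (timeperiod, entry router, BGP next hop).

    Each flow is a (unix_timestamp, entry_router, bgp_nexthop, packets)
    tuple; every other property of the flow is simply thrown away.
    """
    totals = defaultdict(int)
    for ts, entry_router, bgp_nexthop, packets in flows:
        period = ts - ts % BUCKET_SECONDS  # start of the 5-minute bucket
        totals[(period, entry_router, bgp_nexthop)] += packets
    return dict(totals)


def packet_distance(totals, distance):
    """Post-processing: weight each aggregate by entry-to-exit distance."""
    return sum(
        packets * distance[(entry, nexthop)]
        for (_period, entry, nexthop), packets in totals.items()
    )


# Hypothetical sample data: four flows, two entry routers, one exit.
flows = [
    (1000, "r1", "r9", 10),
    (1100, "r1", "r9", 5),
    (1300, "r1", "r9", 7),
    (1250, "r2", "r9", 3),
]
# Hypothetical distance table, e.g. hop count inside the network.
distance = {("r1", "r9"): 4, ("r2", "r9"): 2}

totals = aggregate(flows)
total_packet_hops = packet_distance(totals, distance)
```

With this sample the four flows collapse into three aggregate rows, which is the whole point: the stored data grows with the number of (period, entry, exit) combinations rather than with the number of flows.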
In short: you configure pmacct to throw away everything you don't need (maybe after some light pre-processing), and hope that what remains is small enough to fit in your cluster and at the same time offers enough insight to answer the question you set out to resolve.
it's late here, so i am a bit slower than usual. but could you explain in detail how this tests the hypothesis? even if all your traffic entered on a bgp hop and exited on a bgp hop, and all bgp entries set next_hop (which i think you do), you would be ignoring the 'distance' the packet traveled from source to get to your entry and traveled from your exit to get to the final destination.

randy