On Wed, Jul 28, 2021 at 6:04 AM Vimal <j.vimal@gmail.com> wrote:
> My intention is to run a web-crawling service on a public cloud. This service is geographically distributed, and therefore will run in multiple regions around the world inside AWS... this means there will be multiple AWS VPCs, each with their own NAT gateway, and traffic destined to websites that we crawl will appear to come from this NAT gateway's IP address.

Hello, AWS does not provide the ability to attach anycasted IP addresses to a NAT gateway, regardless of whether it would work, so that's the end of your quest.

> The reason I want a predictable IP is to communicate this IP to website owners so they can allow access from these IPs into their networks. I chose IP as an example; it can also be a subnet, but what I don't want to provide is a list of 100 different IP addresses without any predictability.

If you bring your own IP addresses, you can attach a separate /24 of them to your VPCs in each region, providing you with a single predictable range of source addresses. You will find it difficult and expensive to acquire that many IP addresses from the regional registries for the purpose you describe.

Silly question, but: for a web crawler, why do you care whether it has the limited geographic distribution that a cloud service provides? It's a parallel batch task. It doesn't exactly matter whether you have minimum latency.

Regards,
Bill Herrin

--
William Herrin
bill@herrin.us
https://bill.herrin.us/
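
A rough illustration of the /24-per-region suggestion above, as a minimal sketch rather than anything AWS-specific: it carves one contiguous block into /24s, assigns one to each region, and shows that a website owner would only need to allow the single covering prefix. The block `198.18.0.0/22` is a placeholder taken from reserved benchmarking space, not a real allocation, and the region list and the helper names `assign_per_region` and `is_crawler_address` are assumptions made up for this example.

```python
import ipaddress

# Placeholder block (reserved benchmarking space), standing in for
# whatever range you would actually bring to the cloud provider.
BYOIP_BLOCK = ipaddress.ip_network("198.18.0.0/22")

# Hypothetical set of regions the crawler runs in.
REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-1", "sa-east-1"]


def assign_per_region(block, regions, prefixlen=24):
    """Give each region its own /24 carved out of the covering block."""
    subnets = list(block.subnets(new_prefix=prefixlen))
    if len(regions) > len(subnets):
        raise ValueError(
            f"{block} holds only {len(subnets)} /{prefixlen} subnets, "
            f"but {len(regions)} regions were requested"
        )
    return dict(zip(regions, subnets))


def is_crawler_address(addr, block=BYOIP_BLOCK):
    """The check a website owner would make: one prefix, not 100 addresses."""
    return ipaddress.ip_address(addr) in block


assignments = assign_per_region(BYOIP_BLOCK, REGIONS)
for region, subnet in assignments.items():
    print(f"{region:16} {subnet}")
print("allow-list entry to publish:", BYOIP_BLOCK)
```

The arithmetic also shows where the cost comes from: every additional region consumes another full /24, so the covering block has to grow with the number of regions.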