RE: Routing issues to AWS environment.

9 May 2019

Job,
We have had a lot of dialog with the excellent people at the NTT NOC this week, easily over a couple of hours in total. We were told to talk to AWS directly and to have our customers talk to AWS. Basically an "it's not us" response. So we reached out to our buddies in NANOG. We have no way to get AWS to communicate with us; we don't peer with them directly, as we do with many other cloud providers out of the Equinix IX.

We have a workaround: we broke up the Ashburn /21 advertisements that included our customer IP assignments into /23 and /24 advertisements. Pushing more specific routes out through our Ashburn peers, rather than through our out-of-area peers such as those in Chicago, is helping. That has resolved our direct customer issues, but it leads us to believe that AWS isn't honoring our AS prepending where we have BGP peering in regions outside of Ashburn, VA.
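
For anyone following along, here is a rough sketch of why the more specifics help even if prepending is ignored: AS path length is only compared between routes for the same prefix, while forwarding always follows the longest matching prefix. The 10.20.0.0/21 block, the exit names, and the best_route helper are placeholders for illustration, not our real prefixes or anyone's actual routing table.

```python
import ipaddress

# Hypothetical aggregate; the real /21 is not given in this thread.
aggregate = ipaddress.ip_network("10.20.0.0/21")

# Deaggregate into the four /23s covered by the /21.
more_specifics = list(aggregate.subnets(new_prefix=23))

# Toy routing table as a remote network might see it: the aggregate is
# learned via Chicago, one more-specific /23 is learned via Ashburn.
routes = {
    aggregate: "Chicago",
    more_specifics[0]: "Ashburn",
}


def best_route(dest):
    """Return the exit for dest by longest-prefix match, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    # The most specific (longest) prefix wins, whatever the AS path looks like.
    return routes[max(matches, key=lambda net: net.prefixlen)]


print(best_route("10.20.1.10"))  # covered by the /23 -> Ashburn
print(best_route("10.20.6.10"))  # covered only by the /21 -> Chicago
```

The catch, of course, is that this only pulls traffic for the prefixes we deaggregate, and it grows the table for everyone else.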

The original issue is that our local customers in the DC region get routed from our AS over NTT into AWS in Ashburn for AWS-East region environments, but AWS is sending the return traffic over to Chicago to one of our other upstream peers. For a few select customers this is breaking their applications completely, either leaving them unable to connect or severely degrading performance and bringing the applications to a crawl. Yet we can push iperf traffic to our own AWS instances with zero packet loss and no perceivable issue other than the asymmetric routing, which adds around 30ms to the return latency versus the typical 2ms to 3ms. We do have Layer 2 between our POPs.

Is ignoring AS prepending common? Given my example issue, what direction would you normally take? 

Sincerely, 
Nick Ellermann

-----Original Message-----
From: NANOG <nanog-bounces+nellermann=broadaspect.com@nanog.org> On Behalf Of Job Snijders
Sent: Thursday, May 9, 2019 10:24
To: Chuck Church <chuckchurch@gmail.com>
Cc: nanog@nanog.org
Subject: Re: Routing issues to AWS environment.

Hi Chuck,

On Thu, May 09, 2019 at 06:34:21AM -0400, Chuck Church wrote:
> ...
> Are you sure the problem isn’t NTT? My buddy’s WISP peers with Spirit
> and had a boatload of problems with random packet loss affecting
> initially just SIP and RTP (both UDP). Spirit was blaming NTT.
> Problems went away when Spirit stopped peering with NTT yesterday.
> Path is through Telia now to their main SIP trunk provider.

I don't know the specifics of what you reference, but in a large geographically dispersed network like NTT's backbone, I can assure you there will always be something down somewhere. Issues can take on many forms: sometimes it is a customer specific issue related to a single interface, sometimes something larger is going on.

It is quite rare that the whole network is on fire, so in the general case it is good to investigate and consider each and every report about potential issues separately.

The excellent people at the NTT NOC are always available at noc@ntt.net or the phone numbers listed in PeeringDB.

Kind regards,

Job