We're seeing consistent +100ms latency increases to Verizon customers in Pennsylvania, during peak business hours for the past couple of weeks. If someone is able to assist, could they please contact me off-list?
Pennsylvania is largeish, maybe: "To Philadelphia customers behind devices X, Y, Z" or "Pittsburghers behind devices M, N, O" or something else helpful :)
On Tue, Mar 12, 2019 at 12:30 AM Phil Lavin <phil.lavin@cloudcall.com> wrote:
We’re seeing consistent +100ms latency increases to Verizon customers in Pennsylvania, during peak business hours for the past couple of weeks.
If someone is able to assist, could they please contact me off-list?
or something else helpful :)
Here are traceroutes, for those interested. Times are UTC. The issue is present to Verizon customers in both Pittsburgh and Blue Bell. I don't have any other PA Verizon customers to reference against, though all of our other Verizon customers outside of PA look fine.

phil@debian:~$ mtr -zwc1 108.16.123.123
Start: Tue Mar 12 00:19:43 2019
HOST: debian                                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS32334  192.30.36.123                                   0.0%     1    2.7   2.7   2.7   2.7   0.0
  2. AS???    10.11.11.1                                      0.0%     1    2.6   2.6   2.6   2.6   0.0
  3. AS2914   129.250.199.37                                  0.0%     1    1.5   1.5   1.5   1.5   0.0
  4. AS2914   ae-6.r24.nycmny01.us.bb.gin.ntt.net             0.0%     1    9.4   9.4   9.4   9.4   0.0
  5. AS2914   ae-1.r08.nycmny01.us.bb.gin.ntt.net             0.0%     1    6.6   6.6   6.6   6.6   0.0
  6. AS701    et-7-0-5.BR3.NYC4.ALTER.NET                     0.0%     1    8.5   8.5   8.5   8.5   0.0
  7. AS???    ???                                            100.0     1    0.0   0.0   0.0   0.0   0.0
  8. AS701    ae203-0.PHLAPA-VFTTP-302.verizon-gni.net        0.0%     1  137.2 137.2 137.2 137.2   0.0
  9. AS701    static-108-16-123-123.phlapa.fios.verizon.net   0.0%     1  118.4 118.4 118.4 118.4   0.0

phil@debian:~$ mtr -zwc1 108.16.123.123
Start: Tue Mar 12 07:48:25 2019
HOST: debian                                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS32334  192.30.36.123                                   0.0%     1    2.7   2.7   2.7   2.7   0.0
  2. AS???    10.11.11.1                                      0.0%     1    1.0   1.0   1.0   1.0   0.0
  3. AS2914   129.250.199.37                                  0.0%     1    2.9   2.9   2.9   2.9   0.0
  4. AS2914   ae-6.r24.nycmny01.us.bb.gin.ntt.net             0.0%     1    7.2   7.2   7.2   7.2   0.0
  5. AS2914   ae-1.r08.nycmny01.us.bb.gin.ntt.net             0.0%     1    9.1   9.1   9.1   9.1   0.0
  6. AS701    et-7-0-5.BR3.NYC4.ALTER.NET                     0.0%     1    7.1   7.1   7.1   7.1   0.0
  7. AS???    ???                                            100.0     1    0.0   0.0   0.0   0.0   0.0
  8. AS701    ae203-0.PHLAPA-VFTTP-302.verizon-gni.net        0.0%     1   14.7  14.7  14.7  14.7   0.0
  9. AS701    static-108-16-123-123.phlapa.fios.verizon.net   0.0%     1   17.8  17.8  17.8  17.8   0.0

Smokeping graph at https://ibb.co/g4VQR8k
On Tue, Mar 12, 2019 at 1:01 AM Phil Lavin <phil.lavin@cloudcall.com> wrote:
or something else helpful :)
Here are traceroutes, for those interested. Times are UTC. The issue is present to Verizon customers in both Pittsburgh and Blue Bell. I don't have any other PA Verizon customers to reference against, though all of our other Verizon customers outside of PA look fine.
phil@debian:~$ mtr -zwc1 108.16.123.123
Start: Tue Mar 12 00:19:43 2019
HOST: debian                                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS32334  192.30.36.123                                   0.0%     1    2.7   2.7   2.7   2.7   0.0
  2. AS???    10.11.11.1                                      0.0%     1    2.6   2.6   2.6   2.6   0.0
  3. AS2914   129.250.199.37                                  0.0%     1    1.5   1.5   1.5   1.5   0.0
  4. AS2914   ae-6.r24.nycmny01.us.bb.gin.ntt.net             0.0%     1    9.4   9.4   9.4   9.4   0.0
  5. AS2914   ae-1.r08.nycmny01.us.bb.gin.ntt.net             0.0%     1    6.6   6.6   6.6   6.6   0.0
  6. AS701    et-7-0-5.BR3.NYC4.ALTER.NET                     0.0%     1    8.5   8.5   8.5   8.5   0.0
  7. AS???    ???                                            100.0     1    0.0   0.0   0.0   0.0   0.0
  8. AS701    ae203-0.PHLAPA-VFTTP-302.verizon-gni.net        0.0%     1  137.2 137.2 137.2 137.2   0.0
  9. AS701    static-108-16-123-123.phlapa.fios.verizon.net   0.0%     1  118.4 118.4 118.4 118.4   0.0

phil@debian:~$ mtr -zwc1 108.16.123.123
Start: Tue Mar 12 07:48:25 2019
HOST: debian                                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS32334  192.30.36.123                                   0.0%     1    2.7   2.7   2.7   2.7   0.0
  2. AS???    10.11.11.1                                      0.0%     1    1.0   1.0   1.0   1.0   0.0
  3. AS2914   129.250.199.37                                  0.0%     1    2.9   2.9   2.9   2.9   0.0
  4. AS2914   ae-6.r24.nycmny01.us.bb.gin.ntt.net             0.0%     1    7.2   7.2   7.2   7.2   0.0
  5. AS2914   ae-1.r08.nycmny01.us.bb.gin.ntt.net             0.0%     1    9.1   9.1   9.1   9.1   0.0
  6. AS701    et-7-0-5.BR3.NYC4.ALTER.NET                     0.0%     1    7.1   7.1   7.1   7.1   0.0
  7. AS???    ???                                            100.0     1    0.0   0.0   0.0   0.0   0.0
  8. AS701    ae203-0.PHLAPA-VFTTP-302.verizon-gni.net        0.0%     1   14.7  14.7  14.7  14.7   0.0
  9. AS701    static-108-16-123-123.phlapa.fios.verizon.net   0.0%     1   17.8  17.8  17.8  17.8   0.0
I'm not in Philly, but from the IAD area the path back is via HE.net. It seems quick enough from IAD, but as a data point, PHL may head back via NYC or it may go through IAD and HE.net.
Smokeping graph at https://ibb.co/g4VQR8k
PSA to people running transit networks.

a) During congestion you are not buffering just the exceeding traffic; you will delay every packet in the class for the duration of the congestion
b) Adding buffering does not increase RX rate during persistent congestion, it only increases delay
c) Occasional persistent congestion is normal, because of how we've modeled the economics of transit
d) A typical device a transit network operates can add >100ms of latency on a single link, but you don't want more than 5ms of latency on a BB link

Fix for IOS-XR:

class BE
 bandwidth percent 50
 queue-limit 5 ms

Fix for Junos:

BE {
    transmit-rate percent 50;
    buffer-size temporal 5k;
}

The actual byte value programmed is interface_rate * percent_share * time. If your class is by design out-of-contract, that means your rate is actually higher, which means the programmed buffer byte value results in a smaller queueing delay. The configured byte value will only result in the configured queueing delay when the actual rate == g-rate.

The buffers are not large to facilitate buffering a single queue for 100ms; the buffers are large to support configurations with a large number of logical interfaces, each with a large number of queues. If you are configuring just a few queues, the assumption is that you are dimensioning your buffer sizes yourself.

Hopefully this motivates some networks to limit buffer sizes.

Thanks!

On Tue, Mar 12, 2019 at 9:32 AM Phil Lavin <phil.lavin@cloudcall.com> wrote:
We’re seeing consistent +100ms latency increases to Verizon customers in Pennsylvania, during peak business hours for the past couple of weeks.
If someone is able to assist, could they please contact me off-list?
-- ++ytti
From: NANOG <nanog-bounces@nanog.org> On Behalf Of Saku Ytti
Sent: Tuesday, March 12, 2019 7:58 AM
PSA to people running transit networks.
a) During congestion you are not buffering just the exceeding traffic; you will delay every packet in the class for the duration of the congestion
b) Adding buffering does not increase RX rate during persistent congestion, it only increases delay
c) Occasional persistent congestion is normal, because of how we've modeled the economics of transit
d) A typical device a transit network operates can add >100ms of latency on a single link, but you don't want more than 5ms of latency on a BB link
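A toy sketch of points (a) and (b), using made-up rates (1500 pps offered into a 1000 pps link for 10 seconds; none of these numbers come from the thread): the delivered count is the same for every buffer depth, only the standing delay, which every packet in the class pays, changes.

# Toy fluid model of a persistently congested FIFO class (illustrative only).
def simulate(buffer_pkts, arrival_pps=1500, service_pps=1000, seconds=10):
    queue = delivered = dropped = 0
    for _ in range(seconds):
        queue += arrival_pps              # packets offered this second
        tx = min(queue, service_pps)      # link drains at most service_pps
        queue -= tx
        delivered += tx
        if queue > buffer_pkts:           # tail-drop whatever does not fit
            dropped += queue - buffer_pkts
            queue = buffer_pkts
    standing_delay_ms = queue / service_pps * 1000.0
    return delivered, dropped, standing_delay_ms

for buf in (5, 100, 1000):                # roughly 5 ms, 100 ms, 1000 ms of buffer
    d, x, delay = simulate(buf)
    print(f"buffer={buf:>5} pkts  delivered={d}  dropped={x}  standing delay={delay:.0f} ms")
# delivered is identical in every run; only the queueing delay grows with the buffer.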
Fix for IOS-XR:

class BE
 bandwidth percent 50
 queue-limit 5 ms
Fix for Junos:

BE {
    transmit-rate percent 50;
    buffer-size temporal 5k;
}
The actual byte value programmed is interface_rate * percent_share * time. If your class is by design out-of-contract, that means your rate is actually higher, which means the programmed buffer byte value results in a smaller queueing delay. The configured byte value will only result in the configured queueing delay when the actual rate == g-rate.
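A rough worked example of that arithmetic, with illustrative numbers (a 10 Gb/s interface, a 50% share and a 5 ms temporal limit; none of these values are from the thread):

# buffer bytes programmed for a class = interface_rate * percent_share * time
interface_rate_bps = 10e9          # assumed 10 Gb/s interface
percent_share = 0.50               # class BE guaranteed 50% (its g-rate)
temporal_limit_s = 0.005           # configured 5 ms queue limit

g_rate_bps = interface_rate_bps * percent_share
buffer_bytes = g_rate_bps * temporal_limit_s / 8
print(f"programmed buffer: {buffer_bytes / 1e6:.2f} MB")                 # ~3.1 MB

# If the class runs out-of-contract, e.g. at 8 Gb/s, the same byte value drains
# faster, so the real worst-case queueing delay is below the configured 5 ms.
actual_rate_bps = 8e9
actual_delay_ms = buffer_bytes * 8 / actual_rate_bps * 1000
print(f"worst-case queueing delay at 8 Gb/s: {actual_delay_ms:.2f} ms")  # ~3.1 ms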
The buffers are not large to facilitate buffering a single queue for 100ms; the buffers are large to support configurations with a large number of logical interfaces, each with a large number of queues. If you are configuring just a few queues, the assumption is that you are dimensioning your buffer sizes yourself.
Hopefully this motivates some networks to limit buffer sizes.
Thanks!
+1 to that. The overall system works so much better if the network nodes don't interfere and instead report the actual network conditions accurately and in a timely fashion to the end hosts, i.e. by inducing drops as and when they occur. There are a number of papers on this topic, btw.

adam
We're seeing consistent +100ms latency increases to Verizon customers in Pennsylvania, during peak business hours for the past couple of weeks.
Verizon reached out shortly after my e-mail to say they had resolved the issue - latency has been within normal bounds since. Many thanks :)
participants (4)
- adamv0025@netconsultings.com
- Christopher Morrow
- Phil Lavin
- Saku Ytti