We started rolling out CGNAT about 6 months ago. It was smooth sailing for the first few months, but we eventually did run into a number of issues.
Our customer base is primarily FTTH with "dynamic" IP assignment via DHCP. Since connections are always-on, customer ONTs/routers get an IP assigned, and then when the lease is renewed, they request a new lease for the existing IP, and, in general, that request is granted. This gives customers the mistaken impression they have a static IP. From working with some customers who've needed to be moved from CGNAT back to public IP, my impression is that customers who are doing port-forwarding don't even bother with dynamic DNS. They just know they can connect to their IP, as they've never seen it change. We do offer/sell static IP, but pre-CGNAT, it was strictly for business customers, i.e. a residential customer could only get static IP service by converting their account to a business account. That may change in the near future.
One issue we didn't foresee has been IP Geo. We all knew that streaming services like Netflix use IP Geo to determine what content should be made available, but that's, AFAIK, limited to country or region. What we didn't anticipate is services like Hulu Live TV doing IP Geo down to the city level to determine which local channels are a subscriber's local channels. We're using Juniper MX gear and SPC3 cards for our CGNAT routers, each one having a single large external pool. Since we serve most of FL, one external pool can't IP Geo correctly for customers as far apart as Miami and Jacksonville hitting the same CGNAT router. We don't currently have an acceptable solution to this other than moving impacted customers off CGNAT.
One of the great unknowns (at least for us) with CGNAT was what our PBA settings should be, i.e. how large each port-block should be, and how many port-blocks to allow per customer. We started with 256x4 (256-port blocks, up to 4 blocks per customer). It seemed to work. We eventually noticed that we were logging port-block exceeded errors. This is one aspect where Juniper's CGNAT support is lacking. There's a counter for these errors, and it's available via SNMP, but there's no way to attribute the errors to subscriber IPs. We're polling the MIB and graphing it, so we know it's a continuing issue and can see when it's incrementing faster/slower, but Junos provides no means for determining whether "PBEs" are all being caused by a single customer, a handful of customers, etc. We have a JTAC case open on this.
As a quick & hopeful fix, we increased both the port-block size and the block limit. That helped, but didn't stop the errors. It also cut our CGNAT ratio by more than half (64:1 -> 28:1). If we stay at this ratio, we'll need much larger external pools than originally anticipated. Tuning these settings is kind of painful, as JTAC strongly recommends bouncing the CGNAT service any time CGNAT-related config changes are made. This means briefly breaking Internet access for all CGNAT'd customers.
For the PBEs, JTAC's suggestions so far have been to shorten some of the timeouts in the config and to keep doing what we're doing, which is a cron job that essentially runs "show services nat source port-block", parses the output looking for subscriber IPs that have used up the ports in several of their port-blocks, then runs "show services sessions source-prefix ..." and logs all of this. This at least gives us snapshots of "who's a heavy user right now" and lets us look at how they were using all their ports, e.g. was it BitTorrent, are they compromised and scanning the Internet for more systems to compromise, is it legit-looking traffic, just lots of it, etc.?
The latest CGNAT issue is a customer with a Palo Alto Networks firewall connected to our network, several of whose employees are our FTTH customers. On their PANW firewall, they're doing IP Geo-based filtering, limiting access to internal servers to "US IPs". Since we only CGNAT traffic to the external Internet, their on-net employees hit the firewall from their 100.64/10 IPs and get blocked. I suggested they whitelist 100.64/10, explaining that we block traffic from 100.64/10 from entering our network via peering and transit, so they can be assured anything from 100.64/10 came from inside our network / our customers. They say the firewall won't let them whitelist 100.64.0.0/10, giving an error that it's invalid IP space.
I know we're not the first to implement CGNAT, so I'm curious whether others have run into these sorts of issues, or others we haven't run into yet, and if so, how you solved them.
----------------------------------------------------------------------
Jon Lewis, MCP :)             |  I route
Blue Stream Fiber, Sr. Neteng |  therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
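The PBA trade-off in the post above is mostly arithmetic. Here is a minimal Python sketch, assuming the worst case where every subscriber holds its full block limit at once (real PBA hands out blocks on demand), and assuming the NAT may use either the whole 16-bit port space or only ports 1024-65535; which range a given platform actually allocates from is something to confirm in its own documentation. The function names and the 10,000-subscriber figure are hypothetical.

```python
import math


def subscribers_per_ip(block_size: int, max_blocks: int,
                       first_port: int = 1024, last_port: int = 65535) -> int:
    """Worst-case subscribers per public IPv4 address under port-block allocation.

    Assumes every subscriber can hold max_blocks blocks of block_size ports at once.
    """
    usable_ports = last_port - first_port + 1    # ports the NAT may hand out
    blocks_per_ip = usable_ports // block_size   # only whole blocks are usable
    return blocks_per_ip // max_blocks           # each subscriber may claim max_blocks


def public_ips_needed(subscribers: int, ratio: int) -> int:
    """Public addresses needed to carry `subscribers` at a given N:1 ratio."""
    return math.ceil(subscribers / ratio)


if __name__ == "__main__":
    # 256-port blocks x 4 blocks = 1024 ports per subscriber.
    print(subscribers_per_ip(256, 4, first_port=0))  # whole port space -> 64
    print(subscribers_per_ip(256, 4))                # ports 1024-65535 -> 63
    # Hypothetical 10,000 CGNAT subscribers at the before/after ratios.
    print(public_ips_needed(10_000, 64), public_ips_needed(10_000, 28))  # -> 157 358
```

The 64:1 figure for 256x4 matches the whole-port-space case. The post does not give the enlarged block size and limit behind 28:1, so the sketch does not try to reproduce that number; it only shows how quickly the external pool grows as the ratio falls.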
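The allowlist reasoning in the post above can be written as a small decision function. This is a hypothetical sketch of the logic, not PAN-OS configuration or any firewall's API: `geo_country` stands in for whatever geolocation lookup the firewall performs, and trusting 100.64.0.0/10 is sound only under the stated condition that the ISP drops that range at its peering and transit edges.

```python
import ipaddress
from typing import Callable

# RFC 6598 shared address space, used here for CGNAT inside addresses.
SHARED_ADDRESS_SPACE = ipaddress.ip_network("100.64.0.0/10")


def allow_source(src: str, geo_country: Callable[[str], str],
                 allowed_countries: frozenset = frozenset({"US"})) -> bool:
    """Hypothetical access decision for traffic reaching the protected servers.

    100.64.0.0/10 is trusted outright: it is not routed on the public Internet,
    and (by assumption) the ISP filters it at its borders, so it can only
    originate on-net. Everything else must geolocate to an allowed country.
    """
    addr = ipaddress.ip_address(src)
    if addr in SHARED_ADDRESS_SPACE:  # always False for IPv6 addresses
        return True
    return geo_country(str(addr)) in allowed_countries
```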
We have had very good success with A10 vThunder on rural broadband co-op networks for residential subscribers. No problems with the NAT aspect, literally zero. Operationally it just works: games, streaming, Xbox, Nintendo Switch, all just work. We typically do 32:1, or about 2000 UDP/TCP ports allocated per customer behind the A10. The closer you climb to 48:1, 64:1, 128:1, etc., the more noticeably the rate of CDN blocking ("you are behind a VPN") goes up.
If you have your LIDs (what A10 calls the inside IPs that get mapped to NAT pools) set up properly and your inside CGN 100.64/10 IP space sanely laid out, it's pretty easy. You can carve out pools for each market (say a couple of /21s or a /19), map each one to a pool of public IPs accordingly, and then lay out that public block with the correct data in your self-hosted geofeed.
We try to give all business customers a /32 public IP, either from a DHCP reservation or static assignment on an EVPN subnet, so business customers typically would not get CGN IPs. We also encourage them to enable v6 and get that set up where possible.
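The per-market pool layout described above ends in a geofeed file. A minimal Python sketch, assuming the RFC 8805 CSV layout (ip_prefix, alpha2code, region, city, postal_code, with the postal code left empty) and using hypothetical documentation prefixes and markets rather than real pools; the overlap check is an added safeguard, not something the post describes.

```python
import csv
import io
import ipaddress


def build_geofeed(pools):
    """Render RFC 8805-style geofeed CSV text.

    pools: iterable of (public_prefix, country, region, city) tuples, one per
    market-specific public NAT pool. The postal code column is left empty.
    """
    networks = []
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for prefix, country, region, city in pools:
        net = ipaddress.ip_network(prefix)  # strict: raises ValueError if host bits are set
        for seen in networks:
            if net.overlaps(seen):
                raise ValueError(f"{net} overlaps {seen}")
        networks.append(net)
        writer.writerow([str(net), country, region, city, ""])
    return out.getvalue()


if __name__ == "__main__":
    print(build_geofeed([
        ("198.51.100.0/25", "US", "US-FL", "Miami"),           # hypothetical south FL pool
        ("198.51.100.128/25", "US", "US-FL", "Jacksonville"),  # hypothetical north FL pool
    ]), end="")
```

Keeping one public pool per market is what makes a city-level entry like this possible; a single large external pool shared by distant markets can only be described with one location.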
Hi Jon,
So is this easier than what the mobile carriers are doing? That's 464XLAT, isn't it? Probably a sizeable portion of the traffic would be running native v6, right? Obviously that traffic wouldn't run into these sorts of problems.
Mike
On 10/8/24 12:19 PM, Jon Lewis wrote:
I'm not so sure about that. Our customers are all offered dual-stack (DHCPv6, DHCPv6-PD). Do any of the common streaming services support v6 yet? Last I checked, Hulu did not.
On Tue, 8 Oct 2024, Michael Thomas wrote:
On 10/8/24 1:19 PM, Jon Lewis wrote:
I'm not so sure about that. Our customers are all offered dual-stack (DHCPv6, DHCPv6-PD). Do any of the common streaming services support v6 yet? Last I checked, Hulu did not.
I just checked, and it looks like YouTube and Netflix do, which is a pretty good chunk. Not sure about Amazon Prime. I was actually thinking about social media, which I think is pretty well supported.
Mike
You may have run into this, but Hulu also limits (or did, before I canceled the service personally) the number of “homes” you can use it at, and they tracked this by IP. So, if your customer’s IP changes more than a few times a year, they will not be able to use the service they’re paying for. The last time I was responsible for this problem, I was looking at alternate solutions to do CGNAT on, and at reducing the domains from an architecture perspective; obviously both have big repercussions.
On Tue, Oct 8, 2024 at 7:10 PM Michael Thomas <mike@mtcc.com> wrote:
Does anyone know the penetration rate of IPv6 for home users (cable modem)? I know that some CPE doesn't even properly support IPv6, such as the equipment being handed out by RCN/Astound.
We just got our IPv6 allocation from ARIN, and everything here is now dual-stack. It was relatively painless.
On Tuesday, October 8th, 2024 at 3:19 PM, Jon Lewis <jlewis@lewis.org> wrote:
It's pretty high, at least in the U.S.
https://stats.labs.apnic.net/ipv6/US
Support in consumer electronics (TVs, game consoles) is weak, but a lot of home gateways are fine. Netflix and YouTube stream over IPv6, and I think Amazon Prime Video also does, but of course only if you're streaming to an IPv6-capable device.
https://www.vyncke.org/ipv6status/detailed.php?country=us
Definitely some laggards, but if you haven't looked in a while, you might be surprised.
Lee
-----Original Message----- From: NANOG <nanog-bounces+leehoward=hilcostreambank.com@nanog.org> On Behalf Of Lucien Hoydic via NANOG Sent: Tuesday, October 8, 2024 5:04 PM To: nanog@nanog.org Subject: Re: CGNAT growing pains
Also, ISP-embedded CDN caching was required to provide IPv6, IIRC, for most of mine, and I provided IPv6 subnets even where it was optional. Now I just need to enable IPv6 on the last-mile broadband and I'll be in business! I can't wait to see the results. As I previously stated, I do not want to plan growth for my CGNAT boundary; IPv6 is my (the) answer to relaxing the use of my CGNAT boundary. I've tested 6VPE successfully over my pre-existing IPv4 MPLS L3VPNs, and it's just another RT import/export to get IPv6 flowing naturally out to the Internet.
I've currently been testing FTTH in my lab with Calix CPE, and have IA_NA (WAN) addressing and IA_PD (LAN) prefix delegation working successfully. The Linux engineer(s) I work with are stumped at the moment on getting the new Kea DHCP server to provide all the same ISC DHCP v4 option handling that we want to carry into v6. Any advice is welcome.
-Aaron
On 10/9/2024 11:04 AM, Howard, Lee via NANOG wrote:
From what I've seen, rolling out dual-stack will take about 40% of your traffic to native v6. YMMV of course.
In addition to services that don't support v6, there are also devices (looking at you, Roku) that don't support it, or things like smart TVs that don't have it turned on by default, and most users aren't going to go poking that deep in the menus to enable it.
With respect to the port usage, I've seen some CGN solutions that pre-allocate a block of ports per inside IP, but allow overflow, so they will allocate additional blocks of ports as needed. That seems to be a good balance because you don't burn a ton of ports for lighter users, and the logging requirements are pretty minimal since a log only gets generated when an additional block is allocated. It does mean that one user's traffic could be popping out of two different public IPs.
On 10/10/24, 4:10 PM, "NANOG on behalf of Aaron Gould" <nanog-bounces+andrew.peterson=calix.com@nanog.org on behalf of aaron1@gvtc.com> wrote:
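The pre-allocate-plus-overflow scheme described above can be modelled in a few lines. This is a toy Python sketch, not any vendor's implementation: the class name, the tiny port range in the example, and the "prefer a public IP the subscriber already uses" rule are assumptions for illustration. It shows the two properties mentioned: one log entry per block allocation rather than per session, and a subscriber's overflow block landing on a second public IP once the first is full.

```python
from collections import defaultdict


class OverflowPortBlockAllocator:
    """Hypothetical toy CGN port-block allocator with on-demand overflow blocks.

    Each subscriber gets a block on first use and further blocks only when it
    needs them. One log entry is written per block allocation, not per session.
    """

    def __init__(self, public_ips, block_size=512, first_port=1024, last_port=65535):
        self.block_size = block_size
        # Free block start ports per public IP; a block covers start..start+block_size-1.
        self.free = {
            ip: list(range(first_port, last_port - block_size + 2, block_size))
            for ip in public_ips
        }
        self.owned = defaultdict(list)  # subscriber -> [(public_ip, start_port), ...]
        self.log = []

    def allocate_block(self, subscriber):
        """Allocate one more block, preferring a public IP the subscriber already uses."""
        used = [ip for ip, _ in self.owned[subscriber] if self.free[ip]]
        others = [ip for ip, starts in self.free.items() if starts]
        candidates = used or others
        if not candidates:
            raise RuntimeError(f"no free port blocks for {subscriber}")
        ip = candidates[0]
        start = self.free[ip].pop(0)
        self.owned[subscriber].append((ip, start))
        self.log.append(f"ALLOC {subscriber} {ip}:{start}-{start + self.block_size - 1}")
        return ip, start

    def public_ips_for(self, subscriber):
        """Distinct public IPs the subscriber's traffic may appear from."""
        return sorted({ip for ip, _ in self.owned[subscriber]})


if __name__ == "__main__":
    # Hypothetical: two public IPs with only two 4-port blocks each, to force overflow.
    nat = OverflowPortBlockAllocator(["192.0.2.1", "192.0.2.2"],
                                     block_size=4, first_port=1024, last_port=1031)
    nat.allocate_block("100.64.0.10")  # first block, on 192.0.2.1
    nat.allocate_block("100.64.0.11")  # fills 192.0.2.1
    nat.allocate_block("100.64.0.10")  # overflow lands on 192.0.2.2
    print(nat.public_ips_for("100.64.0.10"))  # ['192.0.2.1', '192.0.2.2']
    print("\n".join(nat.log))
```

Preferring the subscriber's existing public IP keeps most users on a single address; the split only happens when that address has no free blocks left.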
On Oct 10, 2024, at 3:16 PM, Andrew Peterson via NANOG <nanog@nanog.org> wrote:
From what I've seen, rolling out dual-stack will take about 40% of your traffic to native v6. YMMV of course.
At our university we see between 50 and 60% IPv6 usage, measured by inbound bandwidth. We have had IPv6 enabled everywhere on the network since 2008. If more browsers switched from Happy Eyeballs version 1 to Happy Eyeballs version 2, the percentage would go way up, and the traffic through our NAT boxes would shrink to a trickle.
"Based on our testing, this makes our Happy Eyeballs implementation go from roughly 50/50 IPv4/IPv6 in iOS 8 and Yosemite to ~99% IPv6 in iOS 9 and El Capitan betas."
https://mailarchive.ietf.org/arch/msg/v6ops/DYiI9v_O66RNbMJsx0NsatFkubQ/
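Happy Eyeballs v2 (RFC 8305) changes several things relative to v1 (RFC 6555), including how DNS answers are awaited and how address families are ordered for connection attempts. A minimal Python sketch of just the interleaving step, assuming the input is already sorted by preference (RFC 6724 destination-address sorting is out of scope) and a First Address Family Count of 1; the function name is hypothetical.

```python
import ipaddress
from itertools import zip_longest


def interleave_families(addresses, first_family_count=1):
    """Reorder preference-sorted addresses by interleaving IP families (RFC 8305 sec. 4).

    The family of the first address is treated as preferred (normally IPv6).
    first_family_count addresses of that family come first, then families alternate.
    """
    if not addresses:
        return []
    parsed = [ipaddress.ip_address(a) for a in addresses]
    preferred = parsed[0].version
    same = [a for a in parsed if a.version == preferred]
    other = [a for a in parsed if a.version != preferred]
    ordered = same[:first_family_count]
    # Alternate: one address of the other family, then one more of the preferred family.
    for o, s in zip_longest(other, same[first_family_count:]):
        if o is not None:
            ordered.append(o)
        if s is not None:
            ordered.append(s)
    return [str(a) for a in ordered]


if __name__ == "__main__":
    print(interleave_families(["2001:db8::1", "2001:db8::2", "192.0.2.1", "192.0.2.2"]))
    # ['2001:db8::1', '192.0.2.1', '2001:db8::2', '192.0.2.2']
```

Connection attempts are then started one at a time down this list, each after a Connection Attempt Delay (RFC 8305 recommends 250 ms by default) unless an earlier attempt has already succeeded. Python's asyncio exposes comparable behaviour through the `happy_eyeballs_delay` and `interleave` arguments of `loop.create_connection()` (Python 3.8+).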
In addition to services that don't support v6, there are also devices (looking at you, Roku) that don't support it, or things like smart TVs that don't have it turned on by default, and most users aren't going to go poking that deep in the menus to enable it.
With respect to the port usage, I've seen some CGN solutions that pre-allocate a block of ports per inside IP, but allow overflow, so they will allocate additional blocks of ports as needed. That seems to be a good balance because you don't burn a ton of ports for lighter users, and the logging requirements are pretty minimal since a log only gets generated when an additional block is allocated. It does mean that one user's traffic could be popping out of two different public IPs.
On 10/10/24, 4:10 PM, "NANOG on behalf of Aaron Gould" <nanog-bounces+andrew.peterson=calix.com@nanog.org <mailto:calix.com@nanog.org> on behalf of aaron1@gvtc.com <mailto:aaron1@gvtc.com>> wrote:
[You don't often get email from aaron1@gvtc.com <mailto:aaron1@gvtc.com>. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification <https://aka.ms/LearnAboutSenderIdentification> ]
[External Email]
also, isp-embedded cdn caching was required to provide ipv6, iirc for most of mine, and I provided ipv6 subnets even if it was optional. now i just need to enable ipv6 on the last mile broadband and I'll be in business! i can't wait to see the results. as I previously stated, I do not want to plan growth for my cgnat boundary...ipv6 is my (the) answer to relaxing the use of my cgnat boundary. i've tested 6vpe successfully over my pre-existing ipv4 mpls l3vpn's, and it's just another rt import/export to get ipv6 flowing naturally out to the internet.
I've been testing FTTH in my lab with Calix CPE, and have IA_NA (WAN) address assignment and IA_PD (LAN) prefix delegation working. The Linux engineer(s) I work with are stumped at the moment on getting the new Kea DHCP server to provide all the same ISC DHCP v4 option handling that we want to carry into v6. Any advice is welcome.
-Aaron
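For the Kea question above, here is a minimal sketch of the DHCPv6 side. It assumes Kea's JSON configuration layout: a `Dhcp6` map with `subnet6` entries carrying `pools` (IA_NA), `pd-pools` (IA_PD), and `option-data`. The helper name, interface, prefixes (documentation space), and DNS server are placeholders, and required keys vary between Kea releases, so check against the version in use. Options carried over from the existing DHCPv4 setup would go in `option-data`.

```python
import ipaddress
import json


def kea_dhcp6_config(interface, na_subnet, pd_prefix, delegated_len, dns_servers):
    """Build a minimal Kea Dhcp6 configuration dict (hypothetical helper, not part of Kea)."""
    na = ipaddress.IPv6Network(na_subnet)  # strict parsing rejects host bits
    pd = ipaddress.IPv6Network(pd_prefix)
    if not pd.prefixlen < delegated_len <= 64:
        raise ValueError("delegated-len must be longer than the PD pool prefix and no longer than /64")
    return {
        "Dhcp6": {
            "interfaces-config": {"interfaces": [interface]},
            "lease-database": {"type": "memfile"},
            "subnet6": [
                {
                    "id": 1,
                    "subnet": str(na),
                    "interface": interface,
                    # IA_NA: WAN addresses, taken from the first /80 of the subnet.
                    "pools": [{"pool": str(next(na.subnets(new_prefix=80)))}],
                    # IA_PD: LAN prefixes delegated to the CPE.
                    "pd-pools": [
                        {
                            "prefix": str(pd.network_address),
                            "prefix-len": pd.prefixlen,
                            "delegated-len": delegated_len,
                        }
                    ],
                    # Options being carried over from the DHCPv4 config would be added here.
                    "option-data": [{"name": "dns-servers", "data": ", ".join(dns_servers)}],
                }
            ],
        }
    }


if __name__ == "__main__":
    cfg = kea_dhcp6_config("eth0", "2001:db8:1::/64", "2001:db8:100::/40", 56, ["2001:db8::53"])
    print(json.dumps(cfg, indent=2))
```

A /40 PD pool delegating /56s yields 2^16 = 65,536 customer prefixes. ISC's Kea Migration Assistant (keama), distributed with ISC DHCP 4.4, may also help as a starting point for translating an existing dhcpd.conf.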
On 10/9/2024 11:04 AM, Howard, Lee via NANOG wrote:
It's pretty high, at least in the U.S.
https://stats.labs.apnic.net/ipv6/US
Support in consumer electronics (TVs, game consoles) is weak, but a lot of home gateways are fine. Netflix and YouTube stream over IPv6, and I think Amazon Prime Video also does, but of course only if you're streaming to an IPv6-capable device.
https://www.vyncke.org/ipv6status/detailed.php?country=us
Definitely some laggards, but if you haven't looked in a while, you might be surprised.
Lee
-----Original Message-----
From: NANOG <nanog-bounces+leehoward=hilcostreambank.com@nanog.org> On Behalf Of Lucien Hoydic via NANOG
Sent: Tuesday, October 8, 2024 5:04 PM
To: nanog@nanog.org
Subject: Re: CGNAT growing pains
Anyone know the penetration rate of IPv6 for home users (cable modem)? I know that some of the CPE doesn't even properly support IPv6, such as the stuff being handed out by RCN/Astound.
We just got our IPv6 allocation from ARIN and everything here is now dual-stack. It was relatively painless.
On Tuesday, October 8th, 2024 at 3:19 PM, Jon Lewis <jlewis@lewis.org> wrote:
I'm not so sure about that. Our customers are all offered dual-stack (DHCPv6, DHCPv6-PD). Do any of the common streaming services support v6 yet? Last I checked, Hulu did not.
On Tue, 8 Oct 2024, Michael Thomas wrote:
Hi Jon,
So is this easier than what the mobile carriers are doing -- 464xlat, isn't it? Probably a sizeable portion of the traffic would be running native v6, right? Obviously it wouldn't run into these sorts of problems.
Mike
On 10/8/24 12:19 PM, Jon Lewis wrote:
We started rolling out CGNAT about 6 months ago. It was smooth sailing for the first few months, but we eventually did run into a number of issues.
---------------------------------------------------------------------- Jon Lewis, MCP :) | I route Blue Stream Fiber, Sr. Neteng | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
-- -Aaron
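Some background on the 464XLAT question quoted above. In 464XLAT (RFC 6877), the CPE's CLAT translates IPv4 packets to IPv6, and the provider's PLAT (a stateful NAT64) translates them back. IPv4 destinations are carried inside an IPv6 prefix as defined in RFC 6052, and 64:ff9b::/96 is the well-known prefix for this, although operators often use a network-specific prefix instead. Below is a minimal sketch of the /96 embedding using Python's `ipaddress` module. The helper names are hypothetical, and the other RFC 6052 prefix lengths are deliberately not handled.

```python
import ipaddress

# RFC 6052 well-known prefix; operators may instead use a network-specific prefix.
WKP = ipaddress.IPv6Network("64:ff9b::/96")


def embed_v4(v4, prefix=WKP):
    """Synthesize an IPv4-embedded IPv6 address (only the /96 case of RFC 6052)."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    return ipaddress.IPv6Address(int(prefix.network_address) | int(ipaddress.IPv4Address(v4)))


def extract_v4(v6, prefix=WKP):
    """Recover the IPv4 address from the low 32 bits of a /96-embedded address."""
    addr = ipaddress.IPv6Address(v6)
    if prefix.prefixlen != 96 or addr not in prefix:
        raise ValueError(f"{addr} is not a /96-embedded address under {prefix}")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


if __name__ == "__main__":
    print(embed_v4("192.0.2.33"))           # 64:ff9b::c000:221
    print(extract_v4("64:ff9b::c000:221"))  # 192.0.2.33
```

Since only IPv4-only destinations take this path, any traffic that already runs over native v6 bypasses the translator entirely.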
Bruce Curtis Network Engineer / Information Technology NORTH DAKOTA STATE UNIVERSITY phone: 701.231.8527 bruce.curtis@ndsu.edu
https://www.google.com/intl/en/ipv6/statistics.html
On Tue, Oct 8, 2024 at 1:19 PM Jon Lewis <jlewis@lewis.org> wrote:
I'm not so sure about that. Our customers are all offered dual-stack (DHCPv6, DHCPv6-PD). Do any of the common streaming services support v6 yet? Last I checked, Hulu did not.
We have ~60,000 subs on FTTH, DSL, and cable modem behind several Juniper MX routers: MX960s with MS-MPC-128G (FTTH and CM) and MX104s with MS-MIC-16G (DSL), and they're doing well. We had some growing pains and issues, but they were resolved with APP (address pooling paired), EIM (endpoint-independent mapping), EIF (endpoint-independent filtering), and source-IP load balancing on the AMS interface. Also, since all my subs are in L3VPNs, I had to share the inet.0 metric with inet.3 to get MP-iBGP to see the other MXs as the least-cost route, to accomplish nice load balancing. We did about 3000 ports per sub: 100-port blocks with a max of 30 blocks (100*30=3000). We usually do a /24 or /23 at each MX960, and I recall a /25 at the DSL MX104s. I've actually seen MS-MPC-128G usage flat-line at approx. 65 Gbps during peak time... and even more recently I recall seeing about 70 Gbps. That's on a single MS-MPC-128G. I hope I don't have to upgrade to SPC... (or dual MS-MPC-128G). I'd rather do dual-stack IPv6 and bypass the CGNAT boundary; that's my current focus.
-Aaron
On 10/8/2024 2:19 PM, Jon Lewis wrote:
We started rolling out CGNAT about 6 months ago. It was smooth sailing for the first few months, but we eventually did run into a number of issues.
---------------------------------------------------------------------- Jon Lewis, MCP :) | I route Blue Stream Fiber, Sr. Neteng | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
-- -Aaron
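Here is a back-of-the-envelope check of the port-block numbers in the message above. It assumes ports 1024-65535 are usable on each public address, that every address in the pool is usable, and that every subscriber eventually fills all 30 blocks. This gives a worst-case floor. Because port-block allocation hands out blocks on demand, a pool normally carries more subscribers than this. The function name is illustrative.

```python
def pba_capacity(block_size, max_blocks, pool_prefixlen, first_port=1024, last_port=65535):
    """Worst-case PBA capacity if every subscriber fills all of its blocks.

    Returns (ports per subscriber, subscribers per public IP, subscribers per pool).
    Counts every address in the IPv4 pool and ignores leftover blocks on each IP.
    """
    ports_per_ip = last_port - first_port + 1
    blocks_per_ip = ports_per_ip // block_size
    subs_per_ip = blocks_per_ip // max_blocks
    pool_size = 2 ** (32 - pool_prefixlen)
    return block_size * max_blocks, subs_per_ip, subs_per_ip * pool_size


if __name__ == "__main__":
    for plen in (25, 24, 23):
        print(plen, pba_capacity(100, 30, plen))
```

With 100-port blocks and a 30-block limit, each public address has 64512 // 100 = 645 blocks. That supports 21 fully loaded subscribers at 3000 ports each, so a /25 covers 2,688 subscribers, a /24 covers 5,376, and a /23 covers 10,752 in the worst case.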
First, roll out IPv6 if you haven't yet. That should relieve a lot of pressure on your pool size, and it gives customers a workaround for some of the weird things ("Use the IPv6 address instead of IPv4.").
Second, build your own geofeed. You can create a CSV providing as much detail as you want, down to "This individual address is at this long/lat" if you want. Then publish the location of that file in whois. Short pointer: https://mailman.nanog.org/pipermail/nanog/2022-April/219080.html
After you've rolled out IPv6 you can consider 464xlat or MAP-T. Both work well, but both require support from the CPE. I've heard of a custom implementation that kicks a customer off the CGN/xlat/BR if it detects UPnP (i.e., a customer that needs port forwarding). It requires reprovisioning the CPE and a reboot, but two minutes of downtime probably prevents a support call.
Lee Howard
IPv4.Global
-----Original Message-----
From: NANOG <nanog-bounces+leehoward=hilcostreambank.com@nanog.org> On Behalf Of Jon Lewis
Sent: Tuesday, October 8, 2024 3:19 PM
To: nanog@nanog.org
Subject: CGNAT growing pains
We started rolling out CGNAT about 6 months ago. It was smooth sailing for the first few months, but we eventually did run into a number of issues.
Our customer base is primarily FTTH with "dynamic" IP assignment via DHCP. Since connections are always-on, customer ONTs/routers get an IP assigned, and then when the lease is renewed, they request a new lease for the existing IP, and, in general, that request is granted. This gives customers the mistaken impression they have a static IP.
I know we're not the first to implement CGNAT, so I'm curious if others have run into these sorts of issues, or others we haven't run into yet, and if so, how you solved them. ---------------------------------------------------------------------- Jon Lewis, MCP :) | I route Blue Stream Fiber, Sr. Neteng | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
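The sketch below illustrates Lee's geofeed suggestion. It assumes the RFC 8805 CSV format, where each row has an IP prefix, a country code, an ISO 3166-2 region code, a city, and a postal code, and lines starting with '#' are comments. Note that RFC 8805 carries these fields rather than coordinates. RFC 9092 describes how to reference the file from whois. The original post says each CGNAT router uses a single external pool, so a geofeed can presumably only help city-level lookups once that pool is split by market. The pool-to-city mapping and prefixes below are hypothetical (documentation space), and whether a given geolocation provider or streaming service consumes the feed is up to them.

```python
import csv
import io
import ipaddress


def geofeed_csv(entries):
    """Render (prefix, country, region, city, postal_code) rows as an RFC 8805 geofeed."""
    out = io.StringIO()
    out.write("# ip_prefix,alpha2code,region,city,postal_code\n")  # '#' lines are comments
    writer = csv.writer(out, lineterminator="\n")
    for prefix, country, region, city, postal in entries:
        net = ipaddress.ip_network(prefix)  # raises ValueError if host bits are set
        writer.writerow([str(net), country, region, city, postal])
    return out.getvalue()


# Hypothetical split of one CGNAT external pool by market (documentation prefixes).
pools = [
    ("198.51.100.0/25", "US", "US-FL", "Miami", ""),
    ("198.51.100.128/25", "US", "US-FL", "Jacksonville", ""),
]
print(geofeed_csv(pools), end="")
```

Empty trailing fields, such as an omitted postal code, are permitted by the format.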
Hi Jon,
Are you dual stack? v6 would solve some of these issues?
On Tue, Oct 8, 2024 at 12:20 PM Jon Lewis <jlewis@lewis.org> wrote:
We started rolling out CGNAT about 6 months ago. It was smooth sailing for the first few months, but we eventually did run into a number of issues.
---------------------------------------------------------------------- Jon Lewis, MCP :) | I route Blue Stream Fiber, Sr. Neteng | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
participants (10)
- Aaron Gould
- Andrew Peterson
- C. Jon Larsen
- Curtis, Bruce
- David Bass
- Howard, Lee
- Jon Lewis
- Lucien Hoydic
- Michael Thomas
- Tom Mitchell