If accurate interface stats are important to you, MX’s don’t support accurate SNMP Interface Utilization, ie they don’t comply with RFC2665/3635, which seems like a fairly basic thing to do but they decided not to, and has been impactful to me in the past. So, any SNMP monitoring of an interface will always show less utilization than what is actually occurring, possibly leading to a false sense of security, or delay in augmentation. Would also affect usage based billing, if you do that.

https://www.juniper.net/documentation/us/en/software/junos/network-mgmt/topi...

For M Series, T Series, and MX Series, the SNMP counters do not count the Ethernet header and frame check sequence (FCS). Therefore, the Ethernet header bytes and the FCS bytes are not included in the following four tables:

ifInOctets
ifOutOctets
ifHCInOctets
ifHCOutOctets

Thanks,
Michael Fiumano

From: NANOG On Behalf Of Mark Tinka
Sent: Monday, May 10, 2021 10:25 AM
To: nanog@nanog.org
Subject: Re: Juniper hardware recommendation

On 5/10/21 16:19, aaron1@gvtc.com wrote:

I prefer MX204 over the ACX5048. The ACX5048 can’t add L3 interface to an mpls layer 2 type of service. There are other limitations to the ACX5048 that cause me to want to possibly replace them with MX204’s. But in defense of the ACX5048, we have gotten some good mileage (a few years now) of good resi/busi bb over vrf’s and also carrier ethernet for businesses and lots of cell backhaul… so they are good for that. I’ve heard the ACX5448 was even better.

Trio will always provide better features, but come with the price tag to boot.

I’m looking at the MX240 for the SCB3E MPC10E hefty with 100 gig ports

You might want to look at the MX10003, in that case, as well. We are deploying those for 100Gbps service (customer-facing). Works out cheaper than offering 100Gbps service on the MX240/480/960 for the same task.

Mark.
At least it isn’t Arista, where SVI egress counters are disabled by default, and once enabled count everything UNLESS the packet egresses via a LAG!

Talk about being “impactful”, we’re having to buy new routers to insert behind them, just to count packets so we can bill accurately, and for that matter, have traffic graphs that work at all. :-(

Adam Thompson
Consultant, Infrastructure Services
100 - 135 Innovation Drive
Winnipeg, MB, R3T 6A8
(204) 977-6824 or 1-800-430-6404 (MB only)
athompson@merlin.mb.ca
www.merlin.mb.ca

From: NANOG On Behalf Of Michael Fiumano
Sent: Friday, May 14, 2021 12:06 PM
To: nanog@nanog.org
Subject: RE: Juniper hardware recommendation

If accurate interface stats are important to you, MX’s don’t support accurate SNMP Interface Utilization, ie they don’t comply with RFC2665/3635, which seems like a fairly basic thing to do but they decided not to, and has been impactful to me in the past. So, any SNMP monitoring of an interface will always show less utilization than what is actually occurring, possibly leading to a false sense of security, or delay in augmentation. Would also affect usage based billing, if you do that.
To echo Alain's comments earlier, the Juniper QFX 5100 series is stable, once you figure out all the shortcomings of the chipset. We aren't doing anything fancy, but have certainly bumped into our share of issues that have no workaround because it's a limitation of the physical hardware.

Since we're talking about counters, see if you can spot the error with IPv6 accounting in the output from our 5100 below (about 50% of our traffic is v6):

  Transit statistics:
   Input  bytes  :      284315487788005            412457312 bps
   Output bytes  :       39937401090441             29417528 bps
   Input  packets:         231391925059                39552 pps
   Output packets:          88278182551                10809 pps
  IPv6 transit statistics:
   Input  bytes  :                    0
   Output bytes  :                    0
   Input  packets:                    0
   Output packets:                    0

;-)

I believe the 5100 just announced EOL (https://support.juniper.net/support/eol/product/qfx_series/); I haven't had time to look at the replacement models to see if they behave any better.

Jason
Looks like its replacement is the 5120 series. The question is does the 5120 have the same limitations and similar chipset?

On Sun, May 16, 2021 at 7:06 AM Jason Healy <jhealy@suffieldacademy.org> wrote:

I believe the 5100 just announced EOL (https://support.juniper.net/support/eol/product/qfx_series/); I haven't had time to look at the replacement models to see if they behave any better.

Jason
All sounds like a bit of Broadcom to me :-).

Mark.

On 5/16/21 14:56, Colton Conor wrote:
Looks like its replacement is the 5120 series. The question is does the 5120 have the same limitations and similar chipset?
On Sun, 16 May 2021, Colton Conor wrote:
Looks like its replacement is the 5120 series. The question is does the 5120 have the same limitations and similar chipset?
Severely limited TCAM makes use of ACLs challenging.

----------------------------------------------------------------------
 Jon Lewis, MCP :)           |  I route
 StackPath, Sr. Neteng       |  therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
Hey Michael,
If accurate interface stats are important to you, MX’s don’t support accurate SNMP Interface Utilization, ie they don’t comply with RFC2665/3635, which seems like a fairly basic thing to do but they decided not to, and has been impactful to me in the past. So, any SNMP monitoring of an interface will always show less utilization than what is actually occurring, possibly leading to a false sense of security, or delay in augmentation. Would also affect usage based billing, if you do that.
Juniper has worked like this since day1 and shockingly the world doesn't care, people really don't care for accuracy. CLI and SNMP are both L3. If you want to report L2 'set chassis fpc N pic N account-layer2-overhead'.

However, who decided that L2 is right? To me only L1 is right, I don't care about L2 at all. So any system I'd use, I'd normalise the data to L1.

Ethernet on minimum size packets
L1 - 100%
L2 - 76%
L3 - 24%

Not sure why 76 is better than 24. Both are wrong and will cause operational confusion because people think the link is not congested. This is extremely poorly understood even by professionals, so poorly that people regularly think you can't get 100% utilisation, because you can't unless you normalise stats to L1 rate.

-- 
  ++ytti
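The percentages above can be reproduced with a little arithmetic. What follows is a minimal sketch, assuming untagged Ethernet, a 64-byte minimum frame (header and FCS included), 20 bytes of per-frame L1 overhead (7-byte preamble, 1-byte SFD, 12-byte inter-frame gap), and a bare 20-byte IPv4 header as the smallest L3 packet. That last figure is an assumption, chosen because it reproduces the 24% quoted above; all names are illustrative.

```python
# Share of the L1 line rate that each counting layer reports when a link is
# saturated with minimum-size Ethernet frames.
# Assumptions (not from the thread): untagged Ethernet, 64-byte minimum frame,
# and a bare 20-byte IPv4 header as the smallest L3 packet.

L1_OVERHEAD = 7 + 1 + 12   # preamble + SFD + inter-frame gap, in bytes
MIN_L2_FRAME = 64          # Ethernet header + padded payload + FCS
MIN_L3_PACKET = 20         # IPv4 header with no payload (assumed)

# Bytes that each minimum-size frame occupies on the wire.
wire_bytes = MIN_L2_FRAME + L1_OVERHEAD


def reported_share(counted_bytes: int) -> float:
    """Percent of line rate shown by a counter that counts `counted_bytes` per frame."""
    return 100.0 * counted_bytes / wire_bytes


shares = {
    "L1": reported_share(wire_bytes),
    "L2": reported_share(MIN_L2_FRAME),
    "L3": reported_share(MIN_L3_PACKET),
}

for layer, pct in shares.items():
    print(f"{layer} - {pct:.0f}%")
```

With larger packets the three figures converge, so the gap matters most on links carrying many small packets.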
On 5/15/21 10:38, Saku Ytti wrote:
Not sure why 76 is better than 24. Both are wrong and will cause operational confusion because people think the link is not congested. This is extremely poorly understood even by professionals, so poorly that people regularly think you can't get 100% utilisation, because you can't unless you normalise stats to L1 rate.
Because end users will demand compensation and lawyer time for only getting 195Mbps on their 200Mbps service. 195Mbps is not 200Mbps.

I've seen operators over-provision services simply to quiet-down the noise, i.e., they'll provision 210Mbps for a 200Mbps service. We don't do this, but I encourage all of my competitors to do so.

The example I always give is that if there were no seats on an aircraft, it'd carry significantly more people than otherwise advertised.

We try hard to educate customers about how the higher layers eat away at the lower ones re: capacity, and that's just how the system works. There probably isn't a single man-made technology that offers 100% efficiency. So I'm not about to go out of business giving you the optical illusion that my corner of earth will make it so.

In the end, it's easier to just let those customers go than spend human hours and money placating them.

Mark.
On Sat, 15 May 2021 at 13:00, Mark Tinka <mark@tinka.africa> wrote:
Because end users will demand compensation and lawyer time for only getting 195Mbps on their 200Mbps service. 195Mbps is not 200Mbps.
Customers and operators both have very little idea what they are doing. Most people have no idea what the policers are accounting for. And everything still works, without anyone understanding what they are doing. So mostly it's not a problem if you're doing L1, L2 or L3.

Of course your 100M physical interface is limited to L1 rate of 100M. If you provision that as a VLAN of 100M service, should you now sell L1, L2 or L3 of 100M? What are you doing? (Not you you, passive you; you are not representative, nanog is not representative. The passive you doesn't know which they are selling, and which they are selling changes with hardware upgrades, and they don't know it.)

-- 
  ++ytti
Hi!

On Sat, 2021-05-15 at 11:38 +0300, Saku Ytti wrote:
Juniper has worked like this since day1 and shockingly the world doesn't care, people really don't care for accuracy. CLI and SNMP are both L3. If you want to report L2 'set chassis fpc N pic N account-layer2-overhead'.
However, who decided that L2 is right? To me only L1 is right, I don't care about L2 at all. So any system I'd use, I'd normalise the data to L1.
Ethernet on minimum size packets
L1 - 100%
L2 - 76%
L3 - 24%
Not sure why 76 is better than 24. Both are wrong and will cause operational confusion because people think the link is not congested. This is extremely poorly understood even by professionals, so poorly that people regularly think you can't get 100% utilisation, because you can't unless you normalise stats to L1 rate.
How do you normalise? Use L2 or L3 octets stats, and use the number of packets to calculate the L2 and/or L1 overhead the stats are missing? Or do you have a better way?

Cheers,
Sander
On Mon, 17 May 2021 at 00:22, Sander Steffann <sander@steffann.nl> wrote:
How do you normalise? Use L2 or L3 octets stats, and use the number of packets to calculate the L2 and/or L1 overhead the stats are missing? Or do you have a better way?
That's the way one of my employers did it, and I can't think of a better way.

bytes += PPS*overhead

Overhead is likely 20 bytes (preamble, SFD, ifg). But it could also be 24B (FCS/CRC might be missing in what otherwise is claimed to be L2). You may need a lab to confirm what exactly is being counted.

This adjustment could be in DB or it could be render-time, both have pro and con.

-- 
  ++ytti
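The adjustment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and constant names are hypothetical, the inputs are counter deltas over one polling interval, and the correct per-packet overhead for a given platform still has to be confirmed in a lab.

```python
# Normalise interface counters to L1 rate: bytes += packets * overhead.
# All names here are illustrative; the right per-packet overhead depends on
# what the device actually counts and should be confirmed in a lab.

OVERHEAD_L2_WITH_FCS = 20      # preamble (7) + SFD (1) + inter-frame gap (12)
OVERHEAD_L2_WITHOUT_FCS = 24   # as above, plus the 4-byte FCS


def l1_bits_per_second(octets_delta: int, packets_delta: int,
                       interval_s: float, overhead_bytes: int) -> float:
    """Return the L1 rate for one polling interval from counter deltas."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    l1_bytes = octets_delta + packets_delta * overhead_bytes
    return l1_bytes * 8 / interval_s


def utilisation_pct(l1_bps: float, line_rate_bps: float) -> float:
    """Return L1 utilisation as a percentage of the physical line rate."""
    return 100.0 * l1_bps / line_rate_bps


# Example: one million 64-byte frames counted at L2 (FCS included) in one
# second on a 1 Gbit/s port.
bps = l1_bits_per_second(64_000_000, 1_000_000, 1.0, OVERHEAD_L2_WITH_FCS)
print(bps, utilisation_pct(bps, 1_000_000_000))
```

If the counters also exclude the Ethernet header and FCS, as quoted earlier in the thread for MX SNMP octet counters, the per-packet overhead on untagged Ethernet would be 14 + 4 + 20 = 38 bytes. The same function works whether it runs before the data is stored or at render time.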
Good monitoring software lets you do "preprocessing" before storing the monitored data in the database. Saku's formula should work well in this case.

I use Zabbix for monitoring big infrastructure. It has many advantages like:

- Push or pull metrics (dmz friendly)
- Can use many proxies (scales well)
- Preprocessing of data (fix vendors' mess)
- Alerts based on business logic through templates (proactive instead of reactive)
- Open source and has enterprise support (always nice to be able to call 1800 zabbix in case of emergency)
- Agent, agentless, discovery, snmp, java/jmx, telnet, ipmi, web scenarios, etc (never faced a corner case that can't be monitored so far)

Really awesome at infrastructure level.

Jean

-----Original Message-----
From: NANOG <nanog-bounces+jean=ddostest.me@nanog.org> On Behalf Of Saku Ytti
Sent: May 17, 2021 3:34 AM
To: Sander Steffann <sander@steffann.nl>
Cc: Michael Fiumano <mfiumano2@gmail.com>; nanog list <nanog@nanog.org>
Subject: Re: Juniper hardware recommendation

On Mon, 17 May 2021 at 00:22, Sander Steffann <sander@steffann.nl> wrote:

How do you normalise? Use L2 or L3 octets stats, and use the number of packets to calculate the L2 and/or L1 overhead the stats are missing? Or do you have a better way?

That's the way one of my employers did it, and I can't think of a better way.

bytes += PPS*overhead

Overhead is likely 20 bytes (preamble, SFD, ifg). But it could also be 24B (FCS/CRC might be missing in what otherwise is claimed to be L2). You may need a lab to confirm what exactly is being counted.

This adjustment could be in DB or it could be render-time, both have pro and con.

-- 
  ++ytti
participants (9)

- Adam Thompson
- Colton Conor
- Jason Healy
- Jean St-Laurent
- Jon Lewis
- Mark Tinka
- Michael Fiumano
- Saku Ytti
- Sander Steffann