Arista hardware health and environmental nagios plugin
Hello All,

Does anyone have a ready-to-use nagios/icinga plugin for hardware health and temperature monitoring of Arista devices (7050, 7280 and 7500) that they are willing to share?

With Google searches I can't find any available. Arista TAC replied: "nagios does snmp, so that should fit you needs"

There is https://github.com/ncsa/nagios-plugins which should be able to be augmented to do the extra checks. And with pyeapi it shouldn't be rocket science either (for a developer, which I am not).

If I were to ask our devops department to build it, it would probably be put at the back of a very long queue. So if there is anyone out there who is willing to share, it would be greatly appreciated.

Thanks,
Bas
See it as tweaking the wheel...

Now a perl script (with caching) to monitor VCP ports on QFX5100's is re-inventing the wheel, just because their engineers opted out of the usual way to handle network interfaces. They could have simply named them VCP-<Member ID>/0/x instead of naming them all VCP-255/0/x

-----
Alain Hebert ahebert@pubnix.net
PubNIX Inc.
50 boul. St-Charles
P.O. Box 26770
Beaconsfield, Quebec H9W 6G7
Tel: 514-990-5911 http://www.pubnix.net Fax: 514-990-9443

On 05/19/17 15:34, bas wrote:
Hello,
Message written by bas <kilobit@gmail.com> on 19.05.2017, at 21:34:
I had hoped not to have to re-invent the wheel.
Some custom scripts I use on 7050SX: https://github.com/piwanejko/Arista-monitoring-tools

Nagios checks:

CPU1 temperature
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006001'!'550'!'600'
CPU1 load
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.25.3.3.1.2.1'!'70'!'90'
CPU2 temperature
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006002'!'550'!'600'
CPU2 load
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.25.3.3.1.2.2'!'70'!'90'
CPU3 temperature
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006003'!'550'!'600'
CPU3 load
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.25.3.3.1.2.3'!'70'!'90'
CPU4 temperature
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006004'!'550'!'600'
CPU4 load
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.25.3.3.1.2.4'!'70'!'90'
Fan tray 1 status
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100601111'!''!'1'
Fan tray 2 status
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100602111'!''!'1'
Fan tray 3 status
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100603111'!''!'1'
Fan tray 4 status
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100604111'!''!'1'
Lower board temperature
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006011'!'500'!'600'
PSU1 fan status
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100711211'!''!'1'
PSU1 in current status
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100711103'!''!'1'
PSU1 in voltage status
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100711105'!''!'1'
PSU2 fan status
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100721211'!''!'1'
PSU2 in current status
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100721103'!''!'1'
PSU2 in voltage status
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100721105'!''!'1'
SUP temperature
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006005'!'550'!'600'
Upper board temperature
  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006009'!'500'!'600'
Uptime
  check_snmp_sw!'2c'!'public'!'.1.3.6.1.2.1.1.3.0'!'@60000:70000'!'60000:'

The check_snmp_sw command is defined as:

  check_snmp_sw -> check_snmp -H $HOSTADDRESS$ -P $ARG1$ -C $ARG2$ -o $ARG3$ -w $ARG4$ -c $ARG5$

I also made a custom script to check disk and memory utilization, but it's too old and terribly written to be shared.

Best regards,
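To see what one of these check strings turns into on the command line, here is a minimal Python sketch that expands the `!`-separated arguments into the check_snmp_sw template quoted above. The function name expand_check and the host address 192.0.2.10 are hypothetical, and the sketch assumes the single quotes are handled shell-style, so '' becomes an empty argument. It only builds the argument list; it does not run check_snmp.

```python
import shlex

# Template copied from the check_snmp_sw definition in this message.
CHECK_SNMP_SW = (
    "check_snmp -H $HOSTADDRESS$ -P $ARG1$ -C $ARG2$ "
    "-o $ARG3$ -w $ARG4$ -c $ARG5$"
)


def expand_check(check_command, host_address, template=CHECK_SNMP_SW):
    """Expand a 'name!arg1!arg2...' check string into an argv list.

    Hypothetical helper: it mimics macro substitution followed by
    shell-style quote handling, which is an assumption about how the
    monitoring system executes the command.
    """
    _name, *args = check_command.split("!")
    line = template.replace("$HOSTADDRESS$", host_address)
    for position, arg in enumerate(args, start=1):
        line = line.replace(f"$ARG{position}$", arg)
    # shlex.split strips the single quotes and keeps '' as an empty string.
    return shlex.split(line)


cpu1_temp = (
    "check_snmp_sw!2c!'public'!"
    "'.1.3.6.1.2.1.99.1.1.1.4.100006001'!'550'!'600'"
)
fan1_status = (
    "check_snmp_sw!2c!'public'!"
    "'.1.3.6.1.2.1.99.1.1.1.5.100601111'!''!'1'"
)

# 192.0.2.10 is a documentation address, not a real switch.
print(expand_check(cpu1_temp, "192.0.2.10"))
print(expand_check(fan1_status, "192.0.2.10"))
```

For the fan and PSU checks the warning threshold is empty and the critical threshold is 1, so, under the standard plugin threshold format, any status value above 1 is reported as critical.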
Bas,

Arista EOS supports ENTITY-SENSOR-MIB and exposes temperature sensors, etc., via that MIB, so you should be able to use any Nagios plugin that can pull ENTITY-SENSOR-MIB data for environmental monitoring. For example, https://exchange.nagios.org/directory/Plugins/Hardware/Others/check_entPhySensorValue/details

I haven't used that specific Nagios plugin myself -- it just turned up when I searched and looked like it would do the job.

To find the index of the temp sensor(s) you want to monitor (e.g. CPU, back panel, front panel, etc.) you can drop into a bash shell on your Arista switches, run something like "snmptable localhost ENTITY-MIB::entPhysicalTable", and look at the entPhysicalDescr column to see the available sensors. The actual sensor values are provided in ENTITY-SENSOR-MIB::entPhySensorTable. The indices in entPhySensorTable are constructed by adding entPhysicalContainedIn + entPhysicalParentRelPos. For example, on my switch I see a sensor named "Back-panel temp sensor" with entPhysicalContainedIn=100006000 and entPhysicalParentRelPos=3, so the index into the ENTITY-SENSOR-MIB::entPhySensorTable would be 100006000+3 = 100006003:

  $ snmpwalk localhost ENTITY-SENSOR-MIB::entPhySensorTable |grep 100006003
  ENTITY-SENSOR-MIB::entPhySensorType.100006003 = INTEGER: celsius(8)
  ENTITY-SENSOR-MIB::entPhySensorScale.100006003 = INTEGER: units(9)
  ENTITY-SENSOR-MIB::entPhySensorPrecision.100006003 = INTEGER: 1
  ENTITY-SENSOR-MIB::entPhySensorValue.100006003 = INTEGER: 326
  ENTITY-SENSOR-MIB::entPhySensorOperStatus.100006003 = INTEGER: ok(1)
  ENTITY-SENSOR-MIB::entPhySensorUnitsDisplay.100006003 = STRING: Celsius
  ENTITY-SENSOR-MIB::entPhySensorValueTimeStamp.100006003 = Timeticks: (1063007379) 123 days, 0:47:53.79
  ENTITY-SENSOR-MIB::entPhySensorValueUpdateRate.100006003 = Gauge32: 5000 milliseconds

The entPhySensorValue value of 326 means 32.6 degrees Celsius because entPhySensorPrecision=1 (meaning entPhySensorValue equals "degrees C times 10").
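The index arithmetic and the precision scaling described here can be sketched in a few lines of Python. The function names are hypothetical; the input numbers are chosen to match the index 100006003 and the value 326 in the snmpwalk output, and the sketch assumes entPhySensorScale is units(9), so no extra power-of-ten scaling is applied.

```python
# Column OIDs under ENTITY-SENSOR-MIB::entPhySensorTable.
ENT_PHY_SENSOR_VALUE = ".1.3.6.1.2.1.99.1.1.1.4"
ENT_PHY_SENSOR_OPER_STATUS = ".1.3.6.1.2.1.99.1.1.1.5"


def sensor_index(contained_in, parent_rel_pos):
    """Index into entPhySensorTable, following the rule described in
    this message: entPhysicalContainedIn + entPhysicalParentRelPos."""
    return contained_in + parent_rel_pos


def sensor_oid(column_oid, index):
    """Full numeric OID for one column of one sensor."""
    return f"{column_oid}.{index}"


def scaled_value(raw_value, precision):
    """Convert a raw entPhySensorValue to real units.

    precision is entPhySensorPrecision, the number of decimal places
    carried in the raw integer. Assumes entPhySensorScale is units(9).
    """
    return raw_value / (10 ** precision)


index = sensor_index(100006000, 3)
print(sensor_oid(ENT_PHY_SENSOR_VALUE, index))        # OID to poll for the reading
print(sensor_oid(ENT_PHY_SENSOR_OPER_STATUS, index))  # OID to poll for ok(1) status
print(scaled_value(326, 1))                           # 32.6 degrees Celsius
```

The same scaling explains why Nagios thresholds for these temperature OIDs need to be given in tenths of a degree when the check compares against the raw integer.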
Nathan

On Fri, May 19, 2017 at 1:08 PM, bas <kilobit@gmail.com> wrote:
participants (5)
- Alain Hebert
- bas
- Nathan Schrenk
- Piotr Iwanejko
- Stephen Satchell