Last week I attended E2EVC in Vienna, a non-commercial virtualization community event, and sat in on a session by Saša Mašić about an issue he had at a customer. In short, the problems were caused by a switch that couldn’t process IP packets because it overflowed, so Ethernet flow control temporarily stopped the transmission of some attached devices. Although I’m no network expert, I do know this can be a serious issue. After reading this article, you might agree.
If you’re looking for the PowerShell script, scroll to the bottom.
RxPause and TxPause
This week I was asked to troubleshoot performance issues at a customer and was informed that the core switches (Cisco Catalyst 4500) might be over capacity. A quick look at the flow control statistics (# show flowcontrol) showed that numerous pause frames had been received (RxPause) and sent (TxPause), a sign that flow control has been active.
Port       Send FlowControl  Receive FlowControl  RxPause  TxPause
           admin    oper     admin    oper
---------  -------- -------- -------- --------    -------  -------
Gi1/1      off      off      desired  on          0        0
Gi1/2      off      off      desired  off         0        0
Gi2/1      on       on       desired  on          0        0
Gi2/2      on       on       desired  on          72982    0
Gi2/3      on       on       desired  on          0        0
Gi2/4      on       on       desired  on          0        0
(...)
Po1        Unsupp.  Unsupp.  desired  off         0        0
Po2        Unsupp.  Unsupp.  desired  off         0        0
Po3        Unsupp.  Unsupp.  desired  off         0        315
Po4        Unsupp.  Unsupp.  desired  off         386006   12387
(...)
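To give an idea of how such output can be turned into usable data, here is a minimal sketch in Python (not the PowerShell script from this article). It assumes every data row has exactly seven whitespace-separated columns with the two pause counters last, as in the output above; the names FlowControlRow and parse_show_flowcontrol are made up for this illustration.

```python
from typing import List, NamedTuple


class FlowControlRow(NamedTuple):
    """One data row of 'show flowcontrol' (hypothetical helper type)."""
    port: str
    send_admin: str
    send_oper: str
    receive_admin: str
    receive_oper: str
    rx_pause: int
    tx_pause: int


def parse_show_flowcontrol(output: str) -> List[FlowControlRow]:
    """Return the data rows; headers, separators and '(...)' lines are skipped."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Assumption: a data row has 7 columns and ends in two integer counters.
        if len(fields) != 7:
            continue
        if not (fields[5].isdigit() and fields[6].isdigit()):
            continue
        rows.append(FlowControlRow(*fields[:5], int(fields[5]), int(fields[6])))
    return rows


SAMPLE_OUTPUT = """\
Port       Send FlowControl  Receive FlowControl  RxPause  TxPause
           admin    oper     admin    oper
---------  -------- -------- -------- --------    -------  -------
Gi1/1      off      off      desired  on          0        0
Gi2/2      on       on       desired  on          72982    0
Po3        Unsupp.  Unsupp.  desired  off         0        315
Po4        Unsupp.  Unsupp.  desired  off         386006   12387
"""

if __name__ == "__main__":
    # Show only the ports where flow control has actually been active.
    for row in parse_show_flowcontrol(SAMPLE_OUTPUT):
        if row.rx_pause or row.tx_pause:
            print(row.port, row.rx_pause, row.tx_pause)
```

Filtering on non-zero counters quickly shows which ports deserve a closer look.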
Since the IOS (not to be confused with the OS on Apple devices) was recently upgraded, the statistics were “fresh”. To troubleshoot, I needed to collect the metrics over a longer period from multiple (core) switches, so I needed an automated solution.
The easiest (and most obvious) method would be to use SNMP to query the statistics. I figured the “easiest” way to find the correct OID was to “walk” the values (query all exposed values) and match them with the values I found in the terminal session (# show flowcontrol). In short, the statistics weren’t exposed.
I needed an automated solution that queries multiple switches and stores the results in a file so I can analyze the data later. Since this is temporary, I didn’t find it necessary to store the data in a SQL database; a comma-separated values (CSV) file would be just fine (if you need to store it in SQL, contact me).
I’ve localized the ‘comma’ in the CSV file, since the separator differs between countries. A Dutch operating system, for instance, uses a semicolon.
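The idea of appending samples to a CSV file with a configurable separator can be sketched as follows. This is an illustration in Python, not the PowerShell function used in the script; the column names and the function name append_samples are assumptions. Python’s standard library has no portable way to look up the regional list separator, so it is passed in as a parameter here (in PowerShell the value is available as (Get-Culture).TextInfo.ListSeparator).

```python
import csv
import os

# Hypothetical column layout for one sample per port.
FIELDS = ["timestamp", "switch", "port", "rx_pause", "tx_pause"]


def append_samples(path, samples, delimiter=";"):
    """Append dict rows to a CSV file, writing the header only once."""
    # A header is needed only when the file is new or still empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter=delimiter)
        if write_header:
            writer.writeheader()
        writer.writerows(samples)


if __name__ == "__main__":
    append_samples(
        "flowcontrol.csv",
        [{"timestamp": "2012-11-05T10:00:00", "switch": "core1",
          "port": "Po4", "rx_pause": 386006, "tx_pause": 12387}],
    )
```

Calling the function once per polling run keeps all samples in a single file that opens directly in a spreadsheet, provided the separator matches the regional settings.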
The IOS image on the core switches didn’t include cryptography support, so I was unable to use SSH; therefore I used telnet to connect to the switches.
Usage: FlowControl.ps1 -remoteHost [-remotePort] [-username] -password [-enablePassword] [-timeout]
- remoteHost: The host to connect to (FQDN or IP)
- remotePort (optional): The port to connect to (default: 23)
- username (optional): The user to authenticate with (default: admin)
- password: The password to authenticate with
- enablePassword (optional): The enable password (default: same as password)
- timeout (optional): The timeout in seconds for each action (default: 30)
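To illustrate how these parameters come together in a telnet session, here is a rough sketch in Python. It is not the PowerShell script itself. It uses a plain socket because Python’s telnetlib module was removed in Python 3.13. The prompt strings (“Username:”, “Password:”, “>” and “#”), the login order and the function names are assumptions that will differ per device configuration; the sketch also assumes the command output itself contains no “#”.

```python
import socket
import time

# Telnet protocol command bytes (RFC 854).
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251


def read_until(sock, marker, timeout=30.0):
    """Read until marker (bytes) is seen, refusing all telnet option requests."""
    deadline = time.monotonic() + timeout
    raw = bytearray()   # unprocessed bytes, may end in a partial IAC sequence
    text = bytearray()  # payload with negotiation bytes removed
    while marker not in text:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("timed out waiting for %r" % marker)
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            raise TimeoutError("timed out waiting for %r" % marker)
        if not chunk:
            raise ConnectionError("connection closed by remote host")
        raw += chunk
        while raw:
            if raw[0] != IAC:
                end = raw.find(bytes([IAC]))
                end = len(raw) if end == -1 else end
                text += raw[:end]
                del raw[:end]
            elif len(raw) < 2:
                break  # wait for the rest of the sequence
            elif raw[1] == IAC:
                text.append(IAC)  # escaped literal 0xFF
                del raw[:2]
            elif raw[1] in (DO, DONT, WILL, WONT):
                if len(raw) < 3:
                    break
                if raw[1] == DO:
                    sock.sendall(bytes([IAC, WONT, raw[2]]))
                elif raw[1] == WILL:
                    sock.sendall(bytes([IAC, DONT, raw[2]]))
                del raw[:3]
            else:
                del raw[:2]  # other two-byte commands are ignored in this sketch
    return bytes(text)


def fetch_flowcontrol(remote_host, password, remote_port=23, username="admin",
                      enable_password=None, timeout=30.0):
    """Log in, enter enable mode and return the 'show flowcontrol' output."""
    enable_password = enable_password or password  # default: same as password
    address = (remote_host, remote_port)
    with socket.create_connection(address, timeout=timeout) as sock:
        def step(expect, reply):
            read_until(sock, expect, timeout)
            sock.sendall(reply.encode("ascii") + b"\r\n")

        step(b"Username:", username)          # assumed login prompt
        step(b"Password:", password)
        step(b">", "enable")                  # assumed user-mode prompt
        step(b"Password:", enable_password)
        step(b"#", "terminal length 0")       # disable paging of long output
        step(b"#", "show flowcontrol")
        return read_until(sock, b"#", timeout).decode("ascii", errors="replace")
```

Applying the timeout to each individual step, rather than to the whole session, matches the description of the timeout parameter above.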
The wheel is round, we all know that, so why should I reinvent it? I started from a PowerShell script by Glen Scales that does an SNMP telnet test and worked from there. Appending data to a CSV file is not possible by default, so I’ve used the altered function from Dmitry Sotnikov.