Arista PHY Detail

Sooooo long since last time, why you may ask? Well, ehm, one word: baby!

Back to the topic: I have had this 2110 network running for a while, and I have a blog post coming soon on that (part 2). I have some issues: a lot of CRC, input and symbol errors. They come and go, which makes them such a pain to troubleshoot.

So let's deep dive into this. I have set up telemetry, Prometheus is scraping it, and the data is presented in Grafana for nice dashboards and alerting. That is all fine and good, but then I started to get alerts telling me that there are errors on some of the links. Grafana did not lie; the counters on the interface said the same thing.

Spine2#show int eth61/1
Ethernet61/1 is up, line protocol is up (connected)
  Hardware is Ethernet, address is 985d.821f.b84d
  Description: Leaf2 - Ethernet29/1
  IPv6 link-local address is fe80::9a5d:82ff:fe1f:b84d/64
  Address being determined by SLAAC
  No IPv6 global unicast address is assigned
  IP MTU 9194 bytes , BW 100000000 kbit
  Full-duplex, 100Gb/s, auto negotiation: off, uni-link: n/a
  Up 5 days, 19 hours, 42 minutes, 36 seconds
  Loopback Mode : None
  0 link status changes since last clear
  Last clearing of "show interface" counters 5 days, 19:40:27 ago
  30 seconds input rate 33.3 kbps (0.0% with framing overhead), 54 packets/sec
  30 seconds output rate 60.9 kbps (0.0% with framing overhead), 87 packets/sec
     173735316708 packets input, 217253077423021 bytes
     Received 3 broadcasts, 173710958446 multicast
     4929 runts, 0 giants
     22052 input errors, 17123 CRC, 0 alignment, 18120 symbol, 0 input discards
     0 PAUSE input
     72666851232 packets output, 91623591182571 bytes
     Sent 1 broadcasts, 72642461862 multicast
     0 output errors, 0 collisions
     0 late collision, 0 deferred, 0 output discards
     0 PAUSE output

The next thing was to look at the dBm values for the QSFP; they looked good.

Spine2#show int eth61/1 transceiver 
If device is externally calibrated, only calibrated values are printed.
N/A: not applicable, Tx: transmit, Rx: receive.
mA: milliamperes, dBm: decibels (milliwatts).
                               Bias      Optical   Optical                
          Temp       Voltage   Current   Tx Power  Rx Power               
Port      (Celsius)  (Volts)   (mA)      (dBm)     (dBm)     Last Update  
-----     ---------  --------  --------  --------  --------  -------------------
Et61/1     41.68      3.27      35.53    1.81      0.23      0:00:03 ago

Then I looked at the other end:

Leaf2#show int eth29/1
 Ethernet29/1 is up, line protocol is up (connected)
  Hardware is Ethernet, address is fcbd.6786.4b2b
  Description: Spine2 - Ethernet61/1
  IPv6 link-local address is fe80::febd:67ff:fe86:4b2b/64
  Address being determined by SLAAC
  No IPv6 global unicast address is assigned
  IP MTU 9194 bytes , BW 100000000 kbit
  Full-duplex, 100Gb/s, auto negotiation: off, uni-link: n/a
  Up 6 days, 5 hours, 49 minutes, 59 seconds
  Loopback Mode : None
  8 link status changes since last clear
  Last clearing of "show interface" counters 21 days, 8:07:58 ago
  5 minutes input rate 14.3 Gbps (14.6% with framing overhead), 1420090 packets/sec
  5 minutes output rate 35.7 Gbps (36.3% with framing overhead), 3568027 packets/sec
     1673725141717 packets input, 2111527576144721 bytes
     Received 7 broadcasts, 1673613199109 multicast
     1 runts, 0 giants
     1 input errors, 0 CRC, 0 alignment, 0 symbol, 0 input discards
     0 PAUSE input
     3696743864070 packets output, 4621816202569007 bytes
     Sent 8 broadcasts, 3696632144932 multicast
     0 output errors, 0 collisions
     0 late collision, 0 deferred, 0 output discards
     0 PAUSE output
Leaf2#show int eth29/1 transceiver 
If device is externally calibrated, only calibrated values are printed.
N/A: not applicable, Tx: transmit, Rx: receive.
mA: milliamperes, dBm: decibels (milliwatts).
                               Bias      Optical   Optical                
          Temp       Voltage   Current   Tx Power  Rx Power               
Port      (Celsius)  (Volts)   (mA)      (dBm)     (dBm)     Last Update  
-----     ---------  --------  --------  --------  --------  -------------------
Et29/1     42.61      3.29      39.67    1.27      0.75      0:00:05 ago

Everything looked great at this end as well.

I cleaned all the fibers and QSFPs, but still the same. I changed the QSFP at both ends, same result, and I also recoded the QSFPs for Arista. No luck!

What to do when stuck? Ask my “friend” Google. I came across a command:

Spine2#show int eth61/1 phy detail 
Current System Time: Wed Sep  2 18:39:32 2020
Ethernet61/1
                              Current State     Changes            Last Change
                              -------------     -------            -----------
  PHY state                   linkUp                 11    6 days, 5:54:15 ago
  Interface state             up                     10    6 days, 5:54:14 ago
  HW resets                                           0                  never
  Transceiver                 100GBASE-CWDM4          6    8 days, 8:28:43 ago
  Transceiver SN              XXXXX     
  Oper speed                  100Gbps                                         
  Interrupt Count                                    12                       
  Transceiver Reset Count                             0                  never
  Transceiver Interrupt Count                         2    8 days, 8:28:40 ago
  Transceiver Smbus Failures                          0                  never
  Diags mode                  normalOperation
  Model                       Bcm56971-tscf16 (A)
  Loopback                    none
  Xcvr EEPROM read timeout                            0                  never
  Spurious xcvr detection                             0                  never
  DOM control/status fail                             0                       
  Presence indication         xcvrPresent             8    8 days, 8:28:50 ago
  Bad EEPROM checksums                                0                  never
  RX_LOS since system boot    False                   0                  never
  RX_LOS since insertion                              0                       
  TX_FAULT since system boot  False                   0                  never
  TX_FAULT since insertion                            0                       
  PMA/PMD RX signal detect    ok                      1  128 days, 2:20:43 ago
  Forward Error Correction    Reed-Solomon
  Reed-Solomon codeword size  528
  FEC alignment lock          ok                      1    6 days, 5:54:18 ago
  FEC corrected codewords     7565               107992            0:00:03 ago
  FEC uncorrected codewords   0                       0                  never
  FEC lane corrected symbols  
    Lane 0                    0                       0                  never
    Lane 1                    0                       0                  never
    Lane 2                    7571               107992            0:00:03 ago
    Lane 3                    0                       0                  never
  FEC lane mapping            
    FEC lane                  00 01 02 03
    PMA lane                  00 01 02 03
  PCS RX link status          up                      9    6 days, 5:54:17 ago
  PCS high BER                ok                      0                  never
  PCS BER                     12                  42019    8 days, 8:29:57 ago
  PCS error blocks            0                            6 days, 5:54:23 ago
Lane 0
RXPPM CLK90 CLKP1 PF(M,L) VGA DCO DFE(1,2,3,4,5,6)        TXPPM TXEQ(n1,m,p1,2,3)
    0    32     1    0,0   12   2  13, -8,  1, -1,  0,  0     0  7, 68,25, 0, 0

Right here we can see that FEC has to correct symbols on lane 2 and that the PCS BER is rising. This means there is a problem on layer 1.
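
A quick way to confirm this is to re-run the command a few minutes apart and filter on the interesting counters (just a sketch of how I keep an eye on it; the filter is only an example):

Spine2#show int eth61/1 phy detail | include FEC

If the corrected codewords and the lane counters keep climbing between runs, layer 1 is still struggling.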

Reed-Solomon

I will not rewrite Wikipedia here, so I will just link to the article: https://en.wikipedia.org/wiki/Reed–Solomon_error_correction

Reed–Solomon codes operate on a block of data treated as a set of finite-field elements called symbols. Reed–Solomon codes are able to detect and correct multiple symbol errors. By adding t = n − k check symbols to the data, a Reed–Solomon code can detect (but not correct) any combination of up to and including t erroneous symbols, or locate and correct up to and including ⌊t/2⌋ erroneous symbols at unknown locations. As an erasure code, it can correct up to and including t erasures at locations that are known and provided to the algorithm, or it can detect and correct combinations of errors and erasures. Reed–Solomon codes are also suitable as multiple-burst bit-error correcting codes, since a sequence of b + 1 consecutive bit errors can affect at most two symbols of size b. The choice of t is up to the designer of the code and may be selected within wide limits.

There are two basic types of Reed–Solomon codes – original view and BCH view – with BCH view being the most common, as BCH view decoders are faster and require less working storage than original view decoders.
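
To tie this back to the output above: the codeword size of 528 matches the RS(528,514) code used for 100G RS-FEC (assuming the standard IEEE 802.3 Clause 91 code, which is what the switch appears to report). That gives t = n − k = 528 − 514 = 14 check symbols per codeword, so the FEC can correct up to ⌊14/2⌋ = 7 symbol errors in each codeword. Anything beyond that ends up as an uncorrected codeword, and that is when the errors start to show up as CRC and symbol errors on the interface itself.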

PCS

PCS is the Physical Coding Sublayer. The PCS layer defines how Ethernet frames are encoded into groups of bits called “blocks” that are transmitted via the PMA/PMD layer.

Summary

So if you see the PCS BER and any of the FEC lane counters increasing, you can start to look at layer 1. That could be a problem with the QSFP or any of the cabling in between.

Traffic Loopback.

I found another command, traffic-loopback source system device mac, which is applied under the interface. What it does is loop the traffic back at the QSFP. What is the reason for that? The port where you apply it will go down and then come back up, while the other side will stay down. EOS then sends a small amount of traffic out towards the QSFP and it is looped back into the port, so EOS tricks itself into thinking it is sending traffic and receiving traffic back.
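
For reference, this is roughly how I applied and removed it (a sketch; the interface is just the one from this example, and remember that the link towards the neighbor stays down while the loopback is in place):

Spine2#configure
Spine2(config)#interface Ethernet61/1
Spine2(config-if-Et61/1)#traffic-loopback source system device mac
Spine2(config-if-Et61/1)#end
Spine2#show int eth61/1 phy detail
Spine2#configure
Spine2(config)#interface Ethernet61/1
Spine2(config-if-Et61/1)#no traffic-loopback
Spine2(config-if-Et61/1)#end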

What I used it for is to see whether it is the QSFP or the physical port that is faulty: if you still see the PCS BER increasing with the loopback in place, then it is not the cabling, it is the QSFP or the port. If you change the QSFP and still see the same problem, create a TAC case with all these findings and let them have a look as well.

Thoughts on SMPTE 2110 from the Network side.

This is kind of a part 1.1, or whatever you want to call it. Before starting on the technical side of 2110, I would just like to share my thoughts and experience so far with 2110 in an IP core.

For now I have a full 2110 network up and running; it is not yet in production, but it will be soon. To put it simply, in the end it is just multicast traffic that I need to transport, but there are some challenges that we need to address.

I am not a broadcast engineer, I am a network engineer, so on the broadcast side I am still a noob compared to many of the people I work with. I try to learn the broadcast side, but that will take some time. There are a lot of pieces that have to be put together to make this puzzle work.

One of the challenges in moving to IP is that the broadcast vendors do not understand networking at the level they should. The same goes for the network vendors; they do not understand, or lack knowledge of, how end customers actually use the broadcast equipment in a production. Many rely on the standards that are out there; in the network world we think of this as multicast streams that we need to handle. The challenge here is that PIM is not bandwidth aware, so we rely on ECMP to load balance for us. This is not a perfect solution, but it does work.

Another thing that scares me is that many vendors and broadcast people are designing this as a layer 2 network. As I wrote in part 1, that is a big NO NO! Yes, it is simple to understand and maybe to set up, but handling it when the network grows is a pain; just handling and controlling spanning tree can be a huge task and in some cases impossible.

There are solutions out there where vendors have made control software to handle this, but I am not a fan of taking the control plane out of the hardware. If that controller goes down you lose control, or in some cases the network, and a control plane on the switch is usually faster than one running on a server or some other hardware somewhere. I know some vendors or people do not agree with me on this, but this is just my experience. Another thing is that you can lose features that exist in the software on the network equipment because the control software does not support them yet, and you cannot manually configure the switch through the CLI because the control software does not like that.

So I think a solution in between is great: broadcast software that handles the endpoints and some manual multicast configuration is fine, as long as it does not touch the routing. We still have the issue on the network side then: bandwidth! In that case ECMP is the only option. Cisco has NBM (Non-Blocking Multicast), which uses policy maps to handle load sharing between links, and it is the local switch that handles this. That addresses the bandwidth issue somewhat, but it is not that smart. I have not tested it myself, so I will be careful about saying anything about it. I will test it soon.

After I have gained some more experience, I will write more posts with thoughts like these.

The list could go on, but I think I will leave it there for now; maybe I will add some more thoughts in the future. These are just thoughts I wrote down late one night after I put my 4-month-old baby to bed and had some time to spare, so sorry for any abbreviations. I would also like to hear other people's opinions on this. I have not enabled comments here yet, but I will soon; in the meantime, contact me on LinkedIn or comment on the post where I linked this article.

Moving to IP Series, part 1

PHOTO: BENJAMIN A. WARD / NEP

I now work as a Senior Network Engineer at NEP Norway. NEP is a live media/broadcast company: we do live television production and other television production, and we have a variety of customers in different areas, like sports, entertainment etc. In NEP Norway we have different departments, such as Broadcast, where I work, and Media Services.

In Norway we are now starting to move production over to IP. This is a project that has lasted for some time. The first phase was of course to define the scope of the project and whether we wanted to go all IP for everything. The first step is to move some parts over, especially the MCR.

The next part of this project for me was to look at what the requirements from the broadcast side were. I needed to read a lot about the SMPTE ST 2110 standard, which was not familiar to me; more on that later in another post. After that I looked at what others have done, both inside the organization and at other broadcast/media companies.

There is some debate about whether you should go for a layer 2 (NO!!!!!) or a layer 3 network. The majority of the people talking about layer 2 do not know enough about networking. It is an “easy” approach to get things up and running, but it does not scale: you have to deal with spanning-tree issues, large broadcast domains etc.

Reflections on the First Implementation of Cisco SD-Access with DNA Center in Europe

I wrote this on LinkedIn first; the URL to the article is here:
https://www.linkedin.com/pulse/reflections-first-implementation-cisco-sd-access-dna-center-pedersen-1/

One of our customers was in a transition phase and needed a new infrastructure to suit their needs. They were already very happy with Cisco as a network provider and decided to go all in – HyperFlex with Nexus in the data center and Cisco DNA and SD-Access for clients.

SDA is Cisco's new enterprise network flagship, combining Virtual Extensible LAN (VXLAN), Scalable Group Tags (SGT) and the Locator/ID Separation Protocol (LISP). Together these technologies provide streamlined deployments, upgrades, a distributed firewall deployed at the edge, and more. This being the first SDA installation in Europe, I asked myself: “How could this possibly go wrong?” Knowing full well that I would be playing with fire, I went in with an open mind.

Installation – Here we go!

SDA requires a few components – DNA Center, Cisco ISE, certain edge switches and an underlay.

I started the project with one of my favorite parts – physical racking and cabling of the DNA Center server. DNA Center comes installed on a Cisco UCS server; the only difference between this server and a regular one is the DNA sticker on it.

After all the “heavy” lifting, the power supply was connected and the server started up. Then it was time to consult the Cisco documentation. DNA Center comes with some predefined ports and some other goodies; for those who might be interested in what they are, please refer to the official documentation.

As the server booted up, the regular Cisco UCS boot screen was shown. For a brief second, I almost forgot the CIMC configuration key combination, but my fingers just hit the keyboard and brought it back on track. Next, I checked that the CIMC worked and dragged myself out of the server room – the fan noise and humming isn't exactly my favorite kind of music.

I was then finally able to sit down somewhere, and I logged into the CIMC and KVM. From there I could boot the server itself, and not just the management tool of the server. The server booted and the Maglev installation started; nothing fancy, though a Java client or Flash was needed. I was ready to move on to the next step.

After setting the IP addresses and removing the cluster link option, I noticed something strange.

The First Challenge

The port that held the IP address I had just configured was listed as disconnected in Maglev. I opened SecureCRT and logged into the core switch where DNA Center was connected. The port state was up/up. I checked the documentation, and I had chosen the right port and done the proper configuration steps.

I rebooted the server and Maglev came back up, but the problem was still there. I then changed the SFP and cable and booted it again; no luck, the problem was still there. I did all the configuration again, pressed enter and hoped the problems would vanish. Wrong this time as well: the configuration setup had stopped at DNS and NTP, as it could not reach these services. When all hope seemed lost, I rebooted the DNA Center one last time and the interface came up.

The rest of the installation went fine, and I just followed the Cisco documentation.

Basic – the beginning of the edge.

After installation of the DNA Center, I needed to connect it to Cisco ISE. This was one of the easiest things I have done in my life. Just minor certificate configuration and it was ready to go. To verify that it was working, I created some SGTs on ISE and checked that they populated in DNA Center.

The next step was to build the basic configuration, that is NTP, DNS, RADIUS etc. That was straightforward.

Building the underlay

The customer had moved into a new building and needed to start the migration to HyperFlex right away. Now there was a challenge: the way we planned to implement the underlay was with DNA Center, but the underlay had to be deployed before DNA Center arrived. We decided to deploy the core switches in a traditional fashion.

Then DNA Center arrived, and now I couldn't take down the core switches to rebuild them with Cisco DNA from the bottom up. From that point things got challenging. I knew that one of my colleagues in Denmark had spent some time with Cisco DNA in the lab, so I sent him an email asking for help. Being the great person that he is, he decided to help me out.

Underlay – Not everything goes on the top.

The fabric of SDA needs an underlay to transport the overlay VXLAN tunnels. With this design, it becomes much easier to deploy services as one only needs to extend the overlay. The underlay is only set up once, or when there is a need to scale out.

Core – Where all Tunnels

It is vital to follow the practices of SDA when building the underlay from scratch – do it as DNA Center would have done it. Luckily for me, my colleague in Denmark had already tested this in their lab, and the configuration was a standard layer 3 routed network with loopbacks. No more layer 2. We also needed a routing protocol, and we landed on IS-IS. Nothing fancy here. I started with the core and configured loopbacks and the links to a set of edge switches. Next came, of course, the IS-IS configuration. The following lists some of the configuration needed on the core switches.

#Core 1
interface Loopback0
 ip address 10.100.100.128 255.255.255.255
!
interface TenGigabitEthernet 1/1/7
 description *** To Firewall - Underlay ***
 no switchport
 ip address 10.100.101.2 255.255.255.248
 standby version 2
 standby 0 ip 10.100.101.1
 standby 0 preempt
 standby 0 priority 120
 standby 0 track 1 decrement 30
 standby 0 name HSRP-FW-UNDERLAY
 standby 0 authentication OfCoUrSeIcAnTgIvEtHaTaWaY
!
interface TenGigabitEthernet1/1/8
 description *** Edge Switch 1 - Port TenGigabitEthernet1/0/8***
 no switchport
 ip address 10.100.100.1 255.255.255.252
 ip router isis
 isis network point-to-point
!
router isis
 net 49.0001.0100.3204.7128.00
 domain-password cisco
 metric-style wide
 nsf ietf
 passive-interface Loopback0
!
ip route 0.0.0.0 0.0.0.0 10.100.101.4 
!
!
#Core 2
interface Loopback0
 ip address 10.100.100.129 255.255.255.255
!
interface TenGigabitEthernet 2/1/7
 description *** To Firewall - Underlay ***
 no switchport
 ip address 10.100.101.3 255.255.255.248
 standby version 2
 standby 0 ip 10.100.101.1
 standby 0 preempt
 standby 0 priority 100
 standby 0 track 1 decrement 30
 standby 0 name HSRP-FW-UNDERLAY
 standby 0 authentication OfCoUrSeIcAnTgIvEtHaTaWaY
!
interface TenGigabitEthernet2/1/8
 description *** Edge Switch 1 - Port TenGigabitEthernet2/0/8***
 no switchport
 ip address 10.100.100.5 255.255.255.252
 ip router isis
 isis network point-to-point
!
router isis
 net 49.0001.0100.3204.7129.00
 domain-password cisco
 metric-style wide
 nsf ietf
 passive-interface Loopback0
!
ip route 0.0.0.0 0.0.0.0 10.100.101.4
!

Edge – The Edge is Far Away

The next step was to configure the edge switches. To get this right, I had to configure both layer 2 and layer 3 connectivity. Why the layer 2? Because I did not want to run to the edge switches if something bad were to happen during the transition phase; it enabled me to use SSH through the management VLAN. Following is the basic edge switch configuration.

#Edge 1 - Stack
interface Loopback0
 ip address 10.100.100.130 255.255.255.255
!
interface TenGigabitEthernet1/1/8
 no switchport
 ip address 10.100.100.2 255.255.255.252
 ip router isis
 isis network point-to-point
!
interface TenGigabitEthernet2/1/8
 no switchport
 ip address 10.100.100.6 255.255.255.252
 ip router isis
 isis network point-to-point
!
router isis
 net 49.0001.0100.3204.7130.00
 domain-password cisco
 metric-style wide
 nsf ietf
 passive-interface Loopback0

With the underlay in place, we could finally start using DNA Center to provision the overlay.

Inventory – Where is everything?

Before DNA Center can start managing the overlay, it first needs to know what devices it needs to manage. Therefore, the next step was to have DNA Center discover the edge switches, then to configure some credentials and SNMP.
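
For the discovery to work, the switches must be reachable over SSH and SNMP with credentials that match what you enter in DNA Center. Below is a minimal sketch of that kind of device-side preparation on IOS-XE (the username, passwords and community string are made-up placeholders, not what we used):

username dnac-admin privilege 15 secret StrongPasswordHere
enable secret AnotherStrongPassword
snmp-server community dnac-ro RO
!
line vty 0 15
 login local
 transport input ssh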

I also needed to add the IP address pools that the customer needed. That's it – we could finally start provisioning the switches.

Provisioning – Push it!

This was the GUI that met us:

The process was quite simple. The goal here was to push basic configuration, like NTP, SNMP, RADIUS etc. It worked without any hassle and took about five minutes.

After the provisioning was finished, I could log in to the switches with my Active Directory account. How cool is that? Below is a screenshot of the inventory list.

Fabric – This time it’s on the top.

Finally, we could start building the overlay. This was simple as well: we just added the core switches as Control Plane/Border and the edge switches as edge nodes. The picture below was taken during the initial SDA overlay process.

After setting the initial overlay config, we just had to add the correct network to the proper ports. This is where we faced the biggest challenge.

Some floors worked, and some didn't. We tried to remove the fabric from the switches and add them back again. No joy. After struggling with it for quite some time, we called Cisco TAC.

After two minutes I had a TAC engineer on my ear and a WebEx session up and running with him. We troubleshot for 30 minutes, and after that he asked me to hold the line. Suddenly we were four on the line. We troubleshot for about two hours, and after that we found out that the traffic was dropped at the Control Plane/Border node. We could see the traffic going out through the firewall and back down to the Control Plane/Border node; from there the packets just disappeared. We used the built-in Wireshark capture tool on the switches to confirm this. The “funny” thing was that the PC got an IP address from DHCP, but all other traffic was dropped. After about one more hour, we stopped for the day.

The next day, the TAC engineers had given up and we contacted the development team. Three more people connected to our WebEx session. There I was, with four TAC engineers and three developers. I just leaned back in my chair and watched them work. Then one of the developers said that they could not find any obvious mistakes. I took a break and talked to the customer. We decided to revert the underlay to standard layer 2 and use it that way while the Cisco team worked on the case. That was not one of my proudest moments, but I understood the requirements of the customer. We removed the fabric and the provisioned configuration.

A few hours after we pulled the plug, I got an email from one of the developers. They had found the problem: the Catalyst 9500 had a memory leak. They contacted the Catalyst team, who released a new software version to fix it. Unfortunately, we could not roll back immediately, as it had gotten late and we had other issues to tend to.

The week after, we found a maintenance window to upgrade the Catalyst 9500, and after some more days I also found the time to test one of the edge switches. It worked; no traffic was being dropped. After that we just needed to deploy the rest of the switches.

The impression I am left with is that Cisco SDA and DNA Center are really great products and, with time, will be the new standard in client access.

If you have any questions or want to know more, don't hesitate to send me a message on LinkedIn.

Special thanks to my colleague Alexander Aasen for helping to write and proofread this article.