Arista PHY Detail

Sooooo long since last time. Why, you may ask? Well, ehm, one word: baby!

Back to the topic. I have had this 2110 network running for a while, and I have a blog post coming soon on that (part 2). I have some issues: a lot of CRC errors, input errors, symbol errors etc. They come and go, which makes them such a pain to troubleshoot.

So let’s dive deeper into this. I have set up telemetry for this and Prometheus is scraping it; the data is presented in Grafana for nice logs and alerting. That is fine and good, but then I started to get alerts telling me that there were errors on some of the links. Grafana did not lie; the counters under the interface said the same thing.

Spine2#show int eth61/1
Ethernet61/1 is up, line protocol is up (connected)
  Hardware is Ethernet, address is 985d.821f.b84d
  Description: Leaf2 - Ethernet29/1
  IPv6 link-local address is fe80::9a5d:82ff:fe1f:b84d/64
  Address being determined by SLAAC
  No IPv6 global unicast address is assigned
  IP MTU 9194 bytes , BW 100000000 kbit
  Full-duplex, 100Gb/s, auto negotiation: off, uni-link: n/a
  Up 5 days, 19 hours, 42 minutes, 36 seconds
  Loopback Mode : None
  0 link status changes since last clear
  Last clearing of "show interface" counters 5 days, 19:40:27 ago
  30 seconds input rate 33.3 kbps (0.0% with framing overhead), 54 packets/sec
  30 seconds output rate 60.9 kbps (0.0% with framing overhead), 87 packets/sec
     173735316708 packets input, 217253077423021 bytes
     Received 3 broadcasts, 173710958446 multicast
     4929 runts, 0 giants
     22052 input errors, 17123 CRC, 0 alignment, 18120 symbol, 0 input discards
     0 PAUSE input
     72666851232 packets output, 91623591182571 bytes
     Sent 1 broadcasts, 72642461862 multicast
     0 output errors, 0 collisions
     0 late collision, 0 deferred, 0 output discards
     0 PAUSE output

The next thing was to look at the dBm values for the QSFP. They looked good.

Spine2#show int eth61/1 transceiver 
If device is externally calibrated, only calibrated values are printed.
N/A: not applicable, Tx: transmit, Rx: receive.
mA: milliamperes, dBm: decibels (milliwatts).
                               Bias      Optical   Optical                
          Temp       Voltage   Current   Tx Power  Rx Power               
Port      (Celsius)  (Volts)   (mA)      (dBm)     (dBm)     Last Update  
-----     ---------  --------  --------  --------  --------  -------------------
Et61/1     41.68      3.27      35.53    1.81      0.23      0:00:03 ago

The next thing was to look at the other end:

Leaf2#show int eth29/1
Ethernet29/1 is up, line protocol is up (connected)
  Hardware is Ethernet, address is fcbd.6786.4b2b
  Description: Spine2 - Ethernet61/1
  IPv6 link-local address is fe80::febd:67ff:fe86:4b2b/64
  Address being determined by SLAAC
  No IPv6 global unicast address is assigned
  IP MTU 9194 bytes , BW 100000000 kbit
  Full-duplex, 100Gb/s, auto negotiation: off, uni-link: n/a
  Up 6 days, 5 hours, 49 minutes, 59 seconds
  Loopback Mode : None
  8 link status changes since last clear
  Last clearing of "show interface" counters 21 days, 8:07:58 ago
  5 minutes input rate 14.3 Gbps (14.6% with framing overhead), 1420090 packets/sec
  5 minutes output rate 35.7 Gbps (36.3% with framing overhead), 3568027 packets/sec
     1673725141717 packets input, 2111527576144721 bytes
     Received 7 broadcasts, 1673613199109 multicast
     1 runts, 0 giants
     1 input errors, 0 CRC, 0 alignment, 0 symbol, 0 input discards
     0 PAUSE input
     3696743864070 packets output, 4621816202569007 bytes
     Sent 8 broadcasts, 3696632144932 multicast
     0 output errors, 0 collisions
     0 late collision, 0 deferred, 0 output discards
     0 PAUSE output
Leaf2#show int eth29/1 transceiver 
If device is externally calibrated, only calibrated values are printed.
N/A: not applicable, Tx: transmit, Rx: receive.
mA: milliamperes, dBm: decibels (milliwatts).
                               Bias      Optical   Optical                
          Temp       Voltage   Current   Tx Power  Rx Power               
Port      (Celsius)  (Volts)   (mA)      (dBm)     (dBm)     Last Update  
-----     ---------  --------  --------  --------  --------  -------------------
Et29/1     42.61      3.29      39.67    1.27      0.75      0:00:05 ago

Everything looked great at this end.

I cleaned all the fiber and the QSFPs, but still the same. I changed the QSFP at both ends with the same result, and also recoded the QSFPs as Arista. No luck!

What to do when stuck? Ask my “friend” Google. I came across a command:

Spine2#show int eth61/1 phy detail 
Current System Time: Wed Sep  2 18:39:32 2020
Ethernet61/1
                              Current State     Changes            Last Change
                              -------------     -------            -----------
  PHY state                   linkUp                 11    6 days, 5:54:15 ago
  Interface state             up                     10    6 days, 5:54:14 ago
  HW resets                                           0                  never
  Transceiver                 100GBASE-CWDM4          6    8 days, 8:28:43 ago
  Transceiver SN              XXXXX     
  Oper speed                  100Gbps                                         
  Interrupt Count                                    12                       
  Transceiver Reset Count                             0                  never
  Transceiver Interrupt Count                         2    8 days, 8:28:40 ago
  Transceiver Smbus Failures                          0                  never
  Diags mode                  normalOperation
  Model                       Bcm56971-tscf16 (A)
  Loopback                    none
  Xcvr EEPROM read timeout                            0                  never
  Spurious xcvr detection                             0                  never
  DOM control/status fail                             0                       
  Presence indication         xcvrPresent             8    8 days, 8:28:50 ago
  Bad EEPROM checksums                                0                  never
  RX_LOS since system boot    False                   0                  never
  RX_LOS since insertion                              0                       
  TX_FAULT since system boot  False                   0                  never
  TX_FAULT since insertion                            0                       
  PMA/PMD RX signal detect    ok                      1  128 days, 2:20:43 ago
  Forward Error Correction    Reed-Solomon
  Reed-Solomon codeword size  528
  FEC alignment lock          ok                      1    6 days, 5:54:18 ago
  FEC corrected codewords     7565               107992            0:00:03 ago
  FEC uncorrected codewords   0                       0                  never
  FEC lane corrected symbols  
    Lane 0                    0                       0                  never
    Lane 1                    0                       0                  never
    Lane 2                    7571               107992            0:00:03 ago
    Lane 3                    0                       0                  never
  FEC lane mapping            
    FEC lane                  00 01 02 03
    PMA lane                  00 01 02 03
  PCS RX link status          up                      9    6 days, 5:54:17 ago
  PCS high BER                ok                      0                  never
  PCS BER                     12                  42019    8 days, 8:29:57 ago
  PCS error blocks            0                            6 days, 5:54:23 ago
Lane 0
RXPPM CLK90 CLKP1 PF(M,L) VGA DCO DFE(1,2,3,4,5,6)        TXPPM TXEQ(n1,m,p1,2,3)
    0    32     1    0,0   12   2  13, -8,  1, -1,  0,  0     0  7, 68,25, 0, 0

Right here we can see that FEC has to correct symbols on Lane 2 and that the PCS BER is rising. This means there are problems on layer 1.
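If you want to pull the per-lane numbers out of that output with a script instead of reading them by eye, here is a minimal Python sketch. It assumes the text layout shown above; the function names are my own, and what the two numeric columns mean exactly is not something I verified, so the script just reports any lane where either one is non-zero.

```python
import re

# Matches lines such as "    Lane 2    7571    107992    0:00:03 ago"
LANE_RE = re.compile(r"^\s*Lane\s+(\d+)\s+(\d+)\s+(\d+)\s+(.+?)\s*$")


def parse_fec_lanes(phy_detail_text):
    """Return {lane: (current_value, changes)} from the
    'FEC lane corrected symbols' block of 'show int ... phy detail'."""
    lanes = {}
    in_block = False
    for line in phy_detail_text.splitlines():
        if line.strip().startswith("FEC lane corrected symbols"):
            in_block = True
            continue
        if in_block:
            match = LANE_RE.match(line)
            if match is None:
                break  # the first non-lane line ends the block
            lanes[int(match.group(1))] = (int(match.group(2)), int(match.group(3)))
    return lanes


def noisy_lanes(lanes):
    """Lanes where either numeric column is non-zero."""
    return sorted(lane for lane, (value, changes) in lanes.items()
                  if value > 0 or changes > 0)


# Excerpt copied from the Spine2 output above.
SAMPLE = """\
  FEC uncorrected codewords   0                       0                  never
  FEC lane corrected symbols
    Lane 0                    0                       0                  never
    Lane 1                    0                       0                  never
    Lane 2                    7571               107992            0:00:03 ago
    Lane 3                    0                       0                  never
  FEC lane mapping
"""

print(noisy_lanes(parse_fec_lanes(SAMPLE)))
```

With the excerpt from Spine2 this points at Lane 2 only, which matches what the output shows.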

Overview

Reed-Solomon

I will not rewrite Wikipedia here, so I just link to the article: https://en.wikipedia.org/wiki/Reed–Solomon_error_correction

Reed–Solomon codes operate on a block of data treated as a set of finite-field elements called symbols. Reed–Solomon codes are able to detect and correct multiple symbol errors. By adding t = n − k check symbols to the data, a Reed–Solomon code can detect (but not correct) any combination of up to and including t erroneous symbols, or locate and correct up to and including ⌊t/2⌋ erroneous symbols at unknown locations. As an erasure code, it can correct up to and including t erasures at locations that are known and provided to the algorithm, or it can detect and correct combinations of errors and erasures. Reed–Solomon codes are also suitable as multiple-burst bit-error correcting codes, since a sequence of b + 1 consecutive bit errors can affect at most two symbols of size b. The choice of t is up to the designer of the code and may be selected within wide limits.

There are two basic types of Reed–Solomon codes – original view and BCH view – with BCH view being the most common, as BCH view decoders are faster and require less working storage than original view decoders.
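To put numbers on the t = n − k rule for this link, here is a small Python sketch. The codeword size n = 528 comes from the PHY output; k = 514 and the 10-bit symbol size are not shown there, they are my assumption based on the RS(528,514) FEC that IEEE 802.3 defines for 100G links.

```python
def rs_capability(n, k):
    """Error budget of a Reed-Solomon code with n total and k data symbols."""
    t = n - k  # number of check symbols
    return {
        "check_symbols": t,
        "max_detectable_errors": t,        # detect only, no correction
        "max_correctable_errors": t // 2,  # errors at unknown locations
    }


# n = 528 is the "Reed-Solomon codeword size" from the PHY output.
# k = 514 and 10-bit symbols are assumptions taken from RS(528,514).
N, K, SYMBOL_BITS = 528, 514, 10

budget = rs_capability(N, K)
print(budget)
print("codeword bits:", N * SYMBOL_BITS)
```

Under those assumptions there are 14 check symbols per codeword, so up to 7 bad symbols can be corrected. That would fit with the link staying up with 0 uncorrected codewords while Lane 2 keeps producing corrected symbols.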

PCS

PCS is the Physical Coding Sublayer. The PCS layer defines how Ethernet frames are encoded into groups of bits called “blocks” that are transmitted via the PMA/PMD layer.

Summary

So if you see the PCS BER and the corrected symbols on any of the lanes increasing, you can start to look at layer 1. It could be a problem with the QSFP or with any of the cabling in between.
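The "is it increasing" check can be scripted by comparing two polls. This is only a sketch: the counter names are hypothetical labels of mine, the first poll uses the values from the output earlier in this post, and the second poll is made up purely for illustration. I have also not checked whether these fields are cumulative or reset when read, so treat the comparison as an idea and not a finished monitor.

```python
def rising_counters(previous, current):
    """Return the names of counters whose value grew between two polls."""
    return sorted(name for name, value in current.items()
                  if value > previous.get(name, 0))


# First poll: values from the PHY detail output in this post.
first_poll = {"pcs_ber": 12, "lane0": 0, "lane1": 0, "lane2": 7571, "lane3": 0}

# Second poll: invented values, only here to show the comparison.
second_poll = {"pcs_ber": 15, "lane0": 0, "lane1": 0, "lane2": 9120, "lane3": 0}

print(rising_counters(first_poll, second_poll))
```

Anything that shows up in the result is a reason to start looking at the QSFP and the cabling for that link.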

Traffic Loopback

I found another command, traffic-loopback source system device mac, configured under the interface. What this does is loop the traffic on the QSFP. Why would you want that? The port on the side you configure it on will go down and then up, and the other side will go down. Next, EOS will send a small amount of traffic out to the QSFP, which loops it back into the port. EOS tricks itself into thinking it is sending traffic and receiving some traffic back.

What I used it for is to see whether the QSFP or the physical port is faulty. If you then see the PCS BER increasing, it is not the cabling; it’s the QSFP or the port. If you change the QSFP and still see the same problem, create a TAC case with all these findings and let them have a look as well.

Thoughts on SMPTE 2110 from the Network side.

This is kind of a part 1.1, or whatever you want to call it. Before starting on the technical side of 2110, I would just like to share my thoughts and experience so far on 2110 in an IP core.

For now I have a full 2110 network up and running. It’s not yet in production, but it will be soon. To put it simply, in the end it’s just multicast traffic that I need to transport, but there are some challenges that we need to address.

I am not a broadcast engineer, I am a network engineer, so on the broadcast side I am still a noob compared to many others I work with. I try to learn the broadcast side, but this will take some time. There are a lot of pieces that are put together to make this puzzle work.

One of the challenges in moving to IP is that the broadcast vendors do not understand networking at the level they should. The same goes for the network vendors: they do not understand, or lack knowledge of, how end customers use the broadcast equipment in a production. Many rely on the standards that are out there. In the network world we think about this as multicast streams that we need to handle. The challenge here is that PIM is not bandwidth aware, so we rely on ECMP to load balance for us. This is not a perfect solution, but it does work.
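To show what "not bandwidth aware" means in practice, here is a toy Python model. It is not how any real switch hashes: the SHA-256 hash, the uplink names, the addresses and the bit rates are all made up for illustration. The only point is that the uplink is picked from a hash of (source, group), and the size of the stream never enters the decision.

```python
import hashlib


def pick_uplink(source, group, uplinks):
    """Pick an uplink from a hash of (S,G). Bandwidth is not considered."""
    digest = hashlib.sha256(f"{source},{group}".encode()).digest()
    return uplinks[int.from_bytes(digest[:4], "big") % len(uplinks)]


def load_per_uplink(flows, uplinks):
    """Sum the bit rate (Gbps) that the hash places on each uplink."""
    load = {uplink: 0.0 for uplink in uplinks}
    for source, group, gbps in flows:
        load[pick_uplink(source, group, uplinks)] += gbps
    return load


# Hypothetical uplinks and flows: a few big video streams, many small ones.
UPLINKS = ["spine1", "spine2"]
FLOWS = [("10.0.0.%d" % i, "239.1.1.%d" % i, 12.0) for i in range(1, 5)]
FLOWS += [("10.0.1.%d" % i, "239.2.1.%d" % i, 0.01) for i in range(1, 21)]

for uplink, gbps in load_per_uplink(FLOWS, UPLINKS).items():
    print(f"{uplink}: {gbps:.2f} Gbps")
```

Depending on how the hash falls, the big streams can end up on the same uplink while the other one carries mostly small streams, and nothing in the selection will notice.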

Another thing that scares me is that many vendors and broadcast people are designing this as a layer 2 network. As I wrote in part 1, that is a big NO NO! Yes, it’s simple to understand and maybe to set up, but handling it when the network grows is a pain. Just handling and controlling Spanning Tree can be a huge task, and in some cases impossible.

There are solutions out there that use control software to handle this, but I am not a fan of taking the control plane out of the hardware. If that controller goes down you lose control, or in some cases the network, and a control plane on the switch is usually faster than one running on a server or other hardware somewhere. I know some vendors and people do not agree with me on this, but this is just my experience. Another thing is that you can lose features that are in the software on the network equipment because the control software does not support them yet, and you cannot manually configure the switch through the CLI because the control software does not like that.

So I think a solution in between is great: broadcast software that can do some manual multicast stuff and handle the endpoints is fine, as long as it doesn’t touch the routing. We still have the issue on the network side then: bandwidth! In this case ECMP is the only option. Cisco has NBM (Non-Blocking Multicast), which uses a policy map to handle load sharing between links, and it’s the local switch that handles this. This addresses the bandwidth issue somewhat, but it’s not that smart. I have not tested it myself, so I will be careful about saying anything about it. I will test it soon.

After some more experience I will write more on this thought post.

The list can go on, but I think I will leave it there for now. Maybe I will add some more thoughts in the future. These are just thoughts I wrote down late one night after I put my 4-month-old baby to bed and had some time to spare. Sorry for any abbreviations. I would also like to hear anyone else’s opinion on this. I have not enabled comments here yet, but will soon. In the meantime, contact me on LinkedIn or comment on the post where I linked this article.

Moving to IP Series, part 1

PHOTO: BENJAMIN A. WARD / NEP

Now I work as a Senior Network Engineer at NEP Norway. NEP is a live media/broadcast company. We do live television production and other television production. We have a variety of customers in different areas, like sports, entertainment etc. In NEP Norway we have different departments: Broadcast, where I work, and Media Services.

Now we are starting to move the production in Norway over to IP. This is a project that has lasted for some time. The first phase was of course to define the scope of the project and whether we wanted to go all IP for everything. The first step is to move some parts of it over, especially the MCR.

The next step in this project for me was to look at what the requirements from the broadcast side were. I needed to read a lot about the SMPTE ST 2110 standards, as those were not familiar to me. More on that later in another post. After that I looked at what others have done, both inside the organization and at other broadcast/media companies.

There is some debate about whether you should go for a Layer 2 (NO!!!!!) network or a Layer 3 network. The majority of the people who talk about Layer 2 do not know enough about networking. It’s an “easy” approach to get things up and running, but it does not scale. You have to deal with Spanning Tree issues, large broadcast domains etc.