Moving to SMPTE ST2110 Series, part 2

This series started as a good idea, and it still is, but it was badly planned. So I will shorten it a bit: to finish it, I will just write all the posts and hope they come out as well as I can manage.

It has been some time since I wrote the last part. Why, you may think? Here is a list of reasons:

  • A 22-month-old baby, parental leave etc. I don’t need any gym membership.
  • Kindergarten
  • Another thing is PPL-A (Private Pilot License), 850 pages of theoretical goodies I need to learn.
  • Building an IoT and smart home solution from scratch: soldering, coding etc.
  • Work, and a new job.
  • And some other stuff.

Sadly I have the same amount of time that everybody else has.

A little update about the project: we are now live on the 2110 network and it is working “well”. I will explain more.

The Building blocks.

From a network point of view there were two “challenges”: the design layout (spine/leaf topology or monolithic) and the vendor. To tackle the second one I first wanted to settle the first one, the layout.

There are advantages and disadvantages to both. A monolithic design with one big switch, in some cases a chassis with line cards, can work, but it would not scale the way we wanted in our design. Its advantage is that it is a non-blocking architecture, so there are no issues with uplinks. The disadvantage is that chassis are expensive, and when you have filled one up you need to buy a larger one or, if you can’t get a larger one, one more.

To get a design that scales and suits our needs, we chose a spine/leaf architecture. That means we have to deal with a blocking architecture and ensure that we always have enough uplinks, and this will probably cost more for now.
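
To make “enough uplinks” a bit more concrete, here is a minimal sketch in Python of the sum a leaf has to pass. The port counts and speeds are made-up example numbers, not our actual design, and the function name is just something I picked for the illustration.

```python
def oversubscription(downlink_gbps, uplink_gbps):
    """Return total downlink bandwidth divided by total uplink bandwidth.

    A result of 1.0 or lower means the uplinks can carry everything
    the downlinks can deliver at the same time.
    """
    total_up = sum(uplink_gbps)
    if total_up <= 0:
        raise ValueError("a leaf needs at least one uplink")
    return sum(downlink_gbps) / total_up


# Hypothetical leaf: 48 x 25G ports towards devices, 6 x 100G uplinks.
ratio = oversubscription([25] * 48, [100] * 6)
print(f"oversubscription {ratio:.1f}:1")
```

Anything above 1:1 means the leaf can block if every device sends at line rate, which is why the uplink count has to follow the number of streams you actually plan to carry.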

The next question was how to ensure that we have enough uplinks and that traffic is shared somewhat equally among the links. We looked at some solutions for this. There was Nevion VideoIPath, an SDN solution that I was not comfortable with, and it does not support 2110, only 2022. Next we looked at Cisco NBM. That one I liked; the only disadvantage was DCNM, too slow for now. There were some other SDN solutions as well, but they did not appeal to me. We went back to the “simple” one, ECMP. More on that later.
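
The idea behind ECMP can be sketched in a few lines of Python: a hash over some fields of the flow picks one of the equal-cost links, so one flow always stays on the same link while different flows spread out. This is only an illustration. The real hash lives in the switch ASIC and is not SHA-256, and the fields I hash on (source, multicast group, UDP port), the addresses and the interface names are all assumptions.

```python
import hashlib
from collections import Counter


def pick_uplink(src_ip, group_ip, udp_port, uplinks):
    """Pick one uplink for a flow by hashing its identifying fields."""
    key = f"{src_ip}|{group_ip}|{udp_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(uplinks)
    return uplinks[index]


# Hypothetical uplink names on a leaf.
uplinks = ["Ethernet49/1", "Ethernet50/1", "Ethernet51/1", "Ethernet52/1"]

# 200 hypothetical multicast flows from one sender.
flows = [("10.0.0.1", f"239.1.1.{i}", 5004) for i in range(1, 201)]
spread = Counter(pick_uplink(*flow, uplinks) for flow in flows)
for name in uplinks:
    print(name, spread[name])
```

Note that this balances the number of flows, not the bandwidth. The links only end up “somehow equally” loaded when the flows are of similar size.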

The next BIG thing is PTP. Here the point is not to have endless time but to be in sync. All streams need to get their time from the same source. This is important, for example, when putting video and sound together. Spoiler: we use Meinberg as the PTP source. Using PTP in a LAN is not an issue; getting it over a WAN and handling the delay is... to put it nicely, hell! More on PTP over WAN later. The second thing to be aware of is that you need switches that support PTP.
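
Why the WAN delay hurts can be shown with the standard PTP delay request-response arithmetic. A minimal sketch in Python, where t1 to t4 are the four timestamps of one Sync/Delay_Req exchange. The delays and the 3 µs clock offset are invented numbers, and `simulate` is a hypothetical helper only for this example.

```python
def ptp_estimate(t1, t2, t3, t4):
    """Return (mean_path_delay, offset_from_master) in the units given.

    t1: master sends Sync      t2: slave receives Sync
    t3: slave sends Delay_Req  t4: master receives Delay_Req
    """
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2
    offset = (t2 - t1) - mean_path_delay
    return mean_path_delay, offset


def simulate(true_offset, delay_to_slave, delay_to_master):
    """Build the four timestamps for one exchange (microseconds)."""
    t1 = 0
    t2 = t1 + delay_to_slave + true_offset   # read on the slave clock
    t3 = t2 + 100                            # slave answers a bit later
    t4 = t3 - true_offset + delay_to_master  # read on the master clock
    return t1, t2, t3, t4


# LAN-like: symmetric 5 us each way, slave clock 3 us ahead.
print(ptp_estimate(*simulate(3, 5, 5)))
# WAN-like: 2000 us one way, 2600 us the other, same 3 us offset.
print(ptp_estimate(*simulate(3, 2000, 2600)))
```

With symmetric delays the calculation recovers the real 3 µs offset. With the asymmetric path it reports -297 µs instead, so the error is half of the difference between the two directions, and PTP itself has no way to see that.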

For the vendor part, there were two choices, Arista and Cisco. Why those two? Good question! My Dutch colleagues tried to do this on Juniper and did not get it to work. They tried for a long time and had to give up, so they went for Arista. There are also Mellanox switches, FS switches etc. I wanted a vendor that I knew I could trust and that had implemented similar networks before. That narrowed it down to two vendors.

I did a comparison between them and had one-to-one talks to ask a lot of questions etc.

  • Cisco
    • Positive
      • Have NBM
      • Own ASIC design
      • Known good TAC
      • TAC supports third-party SFP vendors
      • Good price (was a little bit cheaper than Arista)
      • RTP support (Oh yeah)
      • Full support to get telemetry streaming out via OpenConfig.
    • Negative
      • DCNM (too slow)
  • Arista
    • Positive
      • Can point to more broadcast implementations than Cisco.
      • Good OS (EOS)
      • Really good SFP diagnostic
    • Negative
      • Expensive
      • Not that good TAC. Difficult to get support if running third-party SFPs.
      • Uses Broadcom ASICs.
      • To get all telemetry streams you need to use their expensive software.
      • No RTP support.

There are more, but this was just a quick overview. I will write a more in-depth article about the battle between Arista and Cisco for the broadcast and media industry.

What did we go for? Oh yes, Arista. Why? Again, good question. The short answer: because I had to. I will not elaborate on this for now, maybe later.

Now I had these pieces of the puzzle.

Sorry for any misspellings.

Arista PHY Detail

Sooooo long since last time. Why, you may ask? Well, ehm, one word: baby!

Back to the topic. I have had this 2110 network running for a while, and I have a blog post coming soon on that (part 2). I have some issues: a lot of CRC errors, input errors, symbol errors etc. They come and go. Such a pain to troubleshoot.

So let’s dive deeper into this. I have set up telemetry for this, and Prometheus is scraping it; the data is presented in Grafana for nice logs and alerting. That is all fine and good, but then I started to get alerts telling me that there were errors on some of the links. Grafana did not lie; the counters under the interface said the same thing.

Spine2#show int eth61/1
Ethernet61/1 is up, line protocol is up (connected)
  Hardware is Ethernet, address is 985d.821f.b84d
  Description: Leaf2 - Ethernet29/1
  IPv6 link-local address is fe80::9a5d:82ff:fe1f:b84d/64
  Address being determined by SLAAC
  No IPv6 global unicast address is assigned
  IP MTU 9194 bytes , BW 100000000 kbit
  Full-duplex, 100Gb/s, auto negotiation: off, uni-link: n/a
  Up 5 days, 19 hours, 42 minutes, 36 seconds
  Loopback Mode : None
  0 link status changes since last clear
  Last clearing of "show interface" counters 5 days, 19:40:27 ago
  30 seconds input rate 33.3 kbps (0.0% with framing overhead), 54 packets/sec
  30 seconds output rate 60.9 kbps (0.0% with framing overhead), 87 packets/sec
     173735316708 packets input, 217253077423021 bytes
     Received 3 broadcasts, 173710958446 multicast
     4929 runts, 0 giants
     22052 input errors, 17123 CRC, 0 alignment, 18120 symbol, 0 input discards
     0 PAUSE input
     72666851232 packets output, 91623591182571 bytes
     Sent 1 broadcasts, 72642461862 multicast
     0 output errors, 0 collisions
     0 late collision, 0 deferred, 0 output discards
     0 PAUSE output
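
If you want to pull these counters out of the CLI text yourself, for example to compare both ends of a link, a small parser is enough. This is a minimal sketch in Python that assumes the line looks exactly like in the output above; the function name and the dictionary keys are my own choice, not anything EOS provides.

```python
import re

# Matches the "input errors" line of "show interfaces" as printed above.
ERROR_LINE = re.compile(
    r"(\d+) input errors, (\d+) CRC, (\d+) alignment, "
    r"(\d+) symbol, (\d+) input discards"
)


def parse_input_errors(show_output):
    """Return the input error counters as a dict, or None if not found."""
    match = ERROR_LINE.search(show_output)
    if match is None:
        return None
    names = ("input_errors", "crc", "alignment", "symbol", "input_discards")
    return dict(zip(names, map(int, match.groups())))


sample = """
     4929 runts, 0 giants
     22052 input errors, 17123 CRC, 0 alignment, 18120 symbol, 0 input discards
     0 PAUSE input
"""
print(parse_input_errors(sample))
```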

The next thing here was to look at the dBm values for the QSFP. They looked good.

Spine2#show int eth61/1 transceiver 
If device is externally calibrated, only calibrated values are printed.
N/A: not applicable, Tx: transmit, Rx: receive.
mA: milliamperes, dBm: decibels (milliwatts).
                               Bias      Optical   Optical                
          Temp       Voltage   Current   Tx Power  Rx Power               
Port      (Celsius)  (Volts)   (mA)      (dBm)     (dBm)     Last Update  
-----     ---------  --------  --------  --------  --------  -------------------
Et61/1     41.68      3.27      35.53    1.81      0.23      0:00:03 ago

The next thing was to look at the other end:

Leaf2#show int eth29/1
Ethernet29/1 is up, line protocol is up (connected)
  Hardware is Ethernet, address is fcbd.6786.4b2b
  Description: Spine2 - Ethernet61/1
  IPv6 link-local address is fe80::febd:67ff:fe86:4b2b/64
  Address being determined by SLAAC
  No IPv6 global unicast address is assigned
  IP MTU 9194 bytes , BW 100000000 kbit
  Full-duplex, 100Gb/s, auto negotiation: off, uni-link: n/a
  Up 6 days, 5 hours, 49 minutes, 59 seconds
  Loopback Mode : None
  8 link status changes since last clear
  Last clearing of "show interface" counters 21 days, 8:07:58 ago
  5 minutes input rate 14.3 Gbps (14.6% with framing overhead), 1420090 packets/sec
  5 minutes output rate 35.7 Gbps (36.3% with framing overhead), 3568027 packets/sec
     1673725141717 packets input, 2111527576144721 bytes
     Received 7 broadcasts, 1673613199109 multicast
     1 runts, 0 giants
     1 input errors, 0 CRC, 0 alignment, 0 symbol, 0 input discards
     0 PAUSE input
     3696743864070 packets output, 4621816202569007 bytes
     Sent 8 broadcasts, 3696632144932 multicast
     0 output errors, 0 collisions
     0 late collision, 0 deferred, 0 output discards
     0 PAUSE output
Leaf2#show int eth29/1 transceiver 
If device is externally calibrated, only calibrated values are printed.
N/A: not applicable, Tx: transmit, Rx: receive.
mA: milliamperes, dBm: decibels (milliwatts).
                               Bias      Optical   Optical                
          Temp       Voltage   Current   Tx Power  Rx Power               
Port      (Celsius)  (Volts)   (mA)      (dBm)     (dBm)     Last Update  
-----     ---------  --------  --------  --------  --------  -------------------
Et29/1     42.61      3.29      39.67    1.27      0.75      0:00:05 ago

Everything looked great at this end.

I cleaned all the fiber and the QSFPs, but still the same. I changed the QSFPs at both ends with the same result, and I also recoded the QSFPs as Arista. No luck!

What to do when stuck? Ask my “friend” Google. I came across a command:

Spine2#show int eth61/1 phy detail 
Current System Time: Wed Sep  2 18:39:32 2020
Ethernet61/1
                              Current State     Changes            Last Change
                              -------------     -------            -----------
  PHY state                   linkUp                 11    6 days, 5:54:15 ago
  Interface state             up                     10    6 days, 5:54:14 ago
  HW resets                                           0                  never
  Transceiver                 100GBASE-CWDM4          6    8 days, 8:28:43 ago
  Transceiver SN              XXXXX     
  Oper speed                  100Gbps                                         
  Interrupt Count                                    12                       
  Transceiver Reset Count                             0                  never
  Transceiver Interrupt Count                         2    8 days, 8:28:40 ago
  Transceiver Smbus Failures                          0                  never
  Diags mode                  normalOperation
  Model                       Bcm56971-tscf16 (A)
  Loopback                    none
  Xcvr EEPROM read timeout                            0                  never
  Spurious xcvr detection                             0                  never
  DOM control/status fail                             0                       
  Presence indication         xcvrPresent             8    8 days, 8:28:50 ago
  Bad EEPROM checksums                                0                  never
  RX_LOS since system boot    False                   0                  never
  RX_LOS since insertion                              0                       
  TX_FAULT since system boot  False                   0                  never
  TX_FAULT since insertion                            0                       
  PMA/PMD RX signal detect    ok                      1  128 days, 2:20:43 ago
  Forward Error Correction    Reed-Solomon
  Reed-Solomon codeword size  528
  FEC alignment lock          ok                      1    6 days, 5:54:18 ago
  FEC corrected codewords     7565               107992            0:00:03 ago
  FEC uncorrected codewords   0                       0                  never
  FEC lane corrected symbols  
    Lane 0                    0                       0                  never
    Lane 1                    0                       0                  never
    Lane 2                    7571               107992            0:00:03 ago
    Lane 3                    0                       0                  never
  FEC lane mapping            
    FEC lane                  00 01 02 03
    PMA lane                  00 01 02 03
  PCS RX link status          up                      9    6 days, 5:54:17 ago
  PCS high BER                ok                      0                  never
  PCS BER                     12                  42019    8 days, 8:29:57 ago
  PCS error blocks            0                            6 days, 5:54:23 ago
Lane 0
RXPPM CLK90 CLKP1 PF(M,L) VGA DCO DFE(1,2,3,4,5,6)        TXPPM TXEQ(n1,m,p1,2,3)
    0    32     1    0,0   12   2  13, -8,  1, -1,  0,  0     0  7, 68,25, 0, 0

Right here we can see that FEC has to correct symbols on Lane 2 and that the PCS BER is rising. This means that there are problems on layer 1.
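
Picking out the lane that FEC is working on can be scripted as well. A minimal sketch in Python, assuming the “FEC lane corrected symbols” lines keep the layout shown above (lane number, current value, changes, last change). The regular expression and the function name are my own.

```python
import re

# "Lane N  <current>  <changes>  <last change>" lines only. The bare
# "Lane 0" line above the RXPPM table carries no counters and is skipped.
LANE_LINE = re.compile(
    r"^[ \t]*Lane (\d+)[ \t]+(\d+)[ \t]+(\d+)", re.MULTILINE
)


def fec_lanes_with_corrections(phy_detail):
    """Return {lane: (current, changes)} for lanes with corrected symbols."""
    lanes = {}
    for lane, current, changes in LANE_LINE.findall(phy_detail):
        if int(current) or int(changes):
            lanes[int(lane)] = (int(current), int(changes))
    return lanes


sample = """
  FEC lane corrected symbols
    Lane 0                    0                       0                  never
    Lane 1                    0                       0                  never
    Lane 2                    7571               107992            0:00:03 ago
    Lane 3                    0                       0                  never
Lane 0
RXPPM CLK90 CLKP1 PF(M,L) VGA DCO DFE(1,2,3,4,5,6)        TXPPM TXEQ(n1,m,p1,2,3)
"""
print(fec_lanes_with_corrections(sample))
```

For the output in this post only lane 2 comes back, which matches what you see by eye.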

Overview

Reed-Solomon

I will not rewrite Wikipedia here, so I will just link to the article: https://en.wikipedia.org/wiki/Reed–Solomon_error_correction

Reed–Solomon codes operate on a block of data treated as a set of finite-field elements called symbols. Reed–Solomon codes are able to detect and correct multiple symbol errors. By adding t = n − k check symbols to the data, a Reed–Solomon code can detect (but not correct) any combination of up to and including t erroneous symbols, or locate and correct up to and including ⌊t/2⌋ erroneous symbols at unknown locations. As an erasure code, it can correct up to and including t erasures at locations that are known and provided to the algorithm, or it can detect and correct combinations of errors and erasures. Reed–Solomon codes are also suitable as multiple-burst bit-error correcting codes, since a sequence of b + 1 consecutive bit errors can affect at most two symbols of size b. The choice of t is up to the designer of the code and may be selected within wide limits.

There are two basic types of Reed–Solomon codes – original view and BCH view – with BCH view being the most common, as BCH view decoders are faster and require less working storage than original view decoders.
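
To put numbers on the t = n − k rule: the phy detail output shows a codeword size of 528. Assuming this is the RS(528,514) code that IEEE 802.3 defines for 100G RS-FEC (the output only shows the 528, so k = 514 is my assumption), the arithmetic looks like this in Python.

```python
def rs_capability(n, k):
    """Return (check symbols t, symbol errors correctable at unknown positions)."""
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    t = n - k
    return t, t // 2


# n = 528 comes from the output above; k = 514 is assumed.
check_symbols, correctable = rs_capability(528, 514)
print(check_symbols, correctable)
```

So under that assumption each codeword carries 14 check symbols and FEC can repair up to 7 bad symbols in it. Those repairs are what show up under “FEC corrected codewords”; a codeword with more bad symbols than that should be counted under “FEC uncorrected codewords” instead.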

PCS

PCS is the Physical Coding Sublayer. The PCS layer defines how Ethernet frames are encoded into groups of bits called “blocks” that are transmitted via the PMA/PMD layer.

Summary

So if you see an increase in PCS BER and in the FEC counters on some of the lanes, you can start to look at layer 1. It could be a problem with the QSFP or any of the cabling in between.
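
“Increase” is the important word: a counter that is high but standing still is old news. A minimal sketch in Python of comparing two samples of the same counters. The counter names and the second sample are invented for the example; how you collect the samples (CLI, telemetry, Prometheus) is up to you.

```python
def rising_counters(before, after):
    """Return {name: increase} for every counter that went up."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] > before[name]
    }


# Hypothetical samples taken a minute apart.
first = {"pcs_ber": 42019, "fec_lane2_corrected": 107992, "fec_uncorrected": 0}
second = {"pcs_ber": 42031, "fec_lane2_corrected": 108450, "fec_uncorrected": 0}
print(rising_counters(first, second))
```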

Traffic Loopback.

I found another command, traffic-loopback source system device mac, which goes under the interface. What this does is loop the traffic on the QSFP. What is the reason for that? The port on the side you put this on will go down and then up, and the other side will go down. The next thing is that EOS will send a small amount of traffic out to the QSFP, and it will be looped back into the port. EOS tricks itself into thinking it is sending traffic and receiving some traffic back.

What I used it for is to see whether the QSFP or the physical port is faulty. If you then see PCS BER increasing, then it is not the cabling; it is the QSFP or the port. If you change the QSFP and still see the same problem, create a TAC case with all these findings and let them have a look as well.