Moving to SMPTE ST 2110 Series, part 2

This series started as a good idea, and it still is one, but it was badly planned, so I will shorten it a bit. To finish it, I will simply write the remaining posts and hope they come out the best way I can.

It has been a while since I wrote the last part. Why, you may think? Here is a list of reasons:

  • A 22-month-old baby, parental leave, etc. I don’t need a gym membership.
  • Kindergarten.
  • PPL-A, the Private Pilot License: 850 pages of theoretical goodies I need to learn.
  • Building an IoT and smart home solution from scratch: soldering, coding, etc.
  • Work, and a new job.
  • And some other stuff.

Sadly, I have the same amount of time as everybody else.

A little update on the project: we are now live on the ST 2110 network and it is working “well”. I will explain more.

The Building Blocks

From a network point of view there were two “challenges”: the design layout (spine/leaf topology or monolithic) and the vendor. To tackle the second one, I first had to settle the first one, the layout.

There are advantages and disadvantages to both. A monolithic design with one big switch, in some cases a chassis with line cards, can work, but it would not scale well with our design ideas. Its main advantage is that it is a non-blocking architecture, so you have no issues with uplinks. The disadvantages are that chassis are expensive, and once you have filled one up you need to buy a larger one, or, if a larger one does not exist, get another chassis.

To get the scale and a design that suits our needs, we chose a spine/leaf architecture. We have to deal with a blocking architecture and ensure that we always have enough uplinks, and this will probably cost more for now.

The next question was how we were going to ensure that we have enough uplinks and that the traffic is shared somewhat equally among the links. We looked at a few solutions. There was Nevion VideoIPath, an SDN solution I was not comfortable with, and it did not support 2110, only 2022. Next we looked at Cisco NBM, which I liked; the only disadvantage was DCNM, too slow for now. There were some other SDN solutions as well, but they did not appeal to me. So we went back to the “simple” one, ECMP. More on that later.
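
To make the ECMP idea a bit more concrete, here is a minimal sketch of what the leaf side of such an underlay can look like, written in Arista EOS-style syntax since that is where we ended up. The interface names, addressing, the choice of OSPF and the static RP are my own illustrative assumptions, not our production configuration, and exact commands vary by platform and software release.

! Leaf sketch: two routed uplinks towards the spines (illustrative only)
ip routing
!
interface Ethernet49/1
   description *** Uplink to Spine-1 ***
   no switchport
   ip address 10.10.0.1/31
   ip ospf network point-to-point
   ip pim sparse-mode
!
interface Ethernet50/1
   description *** Uplink to Spine-2 ***
   no switchport
   ip address 10.10.0.3/31
   ip ospf network point-to-point
   ip pim sparse-mode
!
router ospf 1
   router-id 10.10.255.11
   maximum-paths 4
   network 10.10.0.0/16 area 0.0.0.0
!
! Assumed static RP for the media multicast groups (RP design will differ in practice)
ip pim rp-address 10.10.255.1

Unicast traffic will hash across the equal-cost uplinks automatically; spreading the multicast trees themselves usually requires a per-(S,G) multipath option on the platform, and how evenly that works out in practice is exactly the part we had to keep an eye on.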

The next BIG thing is PTP. Here, time is not endless, but it has to be in sync: all streams need to get the same time source. This is, for example, important for putting video and audio together. Spoiler: we use Meinberg as the PTP source. Using PTP in a LAN is not an issue; getting it over a WAN and handling the delay is… to put it nicely, hell! More on PTP over WAN later. The second thing to be aware of is to get switches that support PTP.
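
As an illustration of what “switches that support PTP” means in practice, here is a rough boundary clock sketch in Arista EOS-style syntax. Domain 127 follows the common SMPTE ST 2059-2 default; the interface names and the idea of enabling PTP on both the grandmaster-facing and media-facing ports are assumptions for illustration, and commands vary between platforms and releases.

! Boundary clock sketch (illustrative only)
ptp mode boundary
ptp domain 127
!
interface Ethernet1/1
   description *** Towards Meinberg grandmaster ***
   ptp enable
!
interface Ethernet10/1
   description *** Towards media endpoint ***
   ptp enable

The point of running the switches as boundary clocks is to give every endpoint a nearby PTP master instead of having everything talk directly to the Meinberg units, which is what keeps PTP on the LAN the easy part.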

For the vendor part, there were two choices, Arista and Cisco. Why those two? Good question! My Dutch colleagues had tried to do this on Juniper and did not get it to work. They tried for a long time, had to give up, and went for Arista. There are also Mellanox switches, FS switches, etc., but I wanted a vendor that I knew I could trust and that had implemented similar networks before. That narrowed it down to two vendors.

I did a comparison between them, had one-to-one talks to ask a lot of questions, and so on.

  • Cisco
    • Positive
      • Has NBM
      • Own ASIC design
      • TAC known to be good
      • TAC supports third-party SFPs
      • Good price (was a little bit cheaper than Arista)
      • RTP support (oh yeah)
      • Full support for streaming telemetry via OpenConfig
    • Negative
      • DCNM (too slow)
  • Arista
    • Positive
      • Can point to more broadcast implementations than Cisco
      • Good OS (EOS)
      • Really good SFP diagnostics
    • Negative
      • Expensive
      • TAC not as good; difficult to get support if running third-party SFPs
      • Uses Broadcom ASICs
      • To get all telemetry streams, you need their expensive software
      • No RTP support

There is more, but this was just a quick overview. I will write a more in-depth article about the battle between Arista and Cisco for the broadcast and media industry.

What did we go for? Oh yes, Arista. Why? Again, good question. The short answer: because I had to. I will not elaborate on that for now, maybe later.

Now I had these pieces of the puzzle.

Sorry for any misspellings.

Reflections on the First Implementation of Cisco SD-Access with DNA Center in Europe

I wrote this on LinkedIn first; the URL to the article is here:
https://www.linkedin.com/pulse/reflections-first-implementation-cisco-sd-access-dna-center-pedersen-1/

One of our customers was in a transition phase and needed a new infrastructure to suit their needs. They were already very happy with Cisco as a network provider and decided to go all in – HyperFlex with Nexus in the data center and Cisco DNA with SD-Access for the clients.

SDA is Cisco's new enterprise network flagship, combining Virtual Extensible LAN (VXLAN), Scalable Group Tags (SGT) and the Locator/ID Separation Protocol (LISP). Together these technologies provide streamlined deployments, upgrades, a distributed firewall deployed at the edge, and more. This being the first SDA installation in Europe, I asked myself: “How could this possibly go wrong?” Knowing full well I would be playing with fire, I went in with an open mind.

Installation – Here we go!

SDA requires a few components – DNA Center, Cisco ISE, certain edge switches and an underlay.

I started the project with one of my favorite parts – physically racking and cabling the DNA Center server. DNA Center comes installed on a Cisco UCS server. The only difference between this server and a regular one is the DNA sticker on it.

After all the “heavy” lifting, the power supply was connected and the server started up. Then it was time to consult the Cisco documentation. DNA Center comes with some predefined ports and some other goodies. For those who might be interested in what they are, please refer to the official documentation.

As the server booted up, the regular Cisco UCS boot screen was shown. For a brief second, I almost forgot the CIMC configuration key combination, but my fingers just hit the keyboard and I was able to bring it back on track. Next, I checked that the CIMC worked and dragged myself out of the server room – the fan noise and humming isn’t exactly my favorite kind of music.

I was then finally able to sit down somewhere, and I logged into the CIMC and the KVM. From there I would try to boot the server itself, and not just the management tool of the server. The server booted, and the Maglev installation started – nothing fancy, though a Java client or Flash was needed. I was ready to move on to the next step.

After setting the IP addresses and removing the cluster link option, I noticed something strange.

The First Challenge

The port that held the IP address I had just configured was listed as disconnected in Maglev. I opened SecureCRT and logged into the core switch where DNA Center was connected. The port state was up/up. I checked the documentation, and I had chosen the right port and done the proper configuration steps.

I rebooted the server and Maglev came back up, but the problem was still there. I then changed the SFP and the cable, booted it again, no luck, the problem was still there. I did all the configuration again, pushed enter and hoped the problem would vanish. Wrong this time as well; the configuration setup stopped at DNS and NTP, as it could not reach these services. When all hope seemed lost, I rebooted the DNA Center one last time and the interface came up.

The rest of the installation went fine, and I just followed the Cisco documentation.

Basic – the beginning of the edge.

After installing DNA Center, I needed to connect it to Cisco ISE. This was one of the easiest things I have done in my life: just some minor certificate configuration and it was ready to go. To verify that it was working, I created more SGTs on ISE and checked that they populated in DNA Center.

The next step was to build the basic configuration, that is NTP, DNS, RADIUS, etc. That was straightforward.

Building the underlay

The customer had moved into a new building and needed to start the migration to HyperFlex right away. This posed a challenge: the plan was to implement the underlay with DNA Center, but the underlay had to be deployed before DNA Center had even arrived. We decided to deploy the core switches in a traditional fashion.

Then DNA Center arrived, and by that point I couldn't take down the core switches to rebuild them with Cisco DNA from the bottom up. From there, things got challenging. I knew that one of my colleagues in Denmark had spent some time with Cisco DNA in the lab, so I sent him an email asking for help. Being the great person that he is, he decided to help me out.

Underlay – Not everything goes on the top.

The fabric of SDA needs an underlay to transport the overlay VXLAN tunnels. With this design, it becomes much easier to deploy services as one only needs to extend the overlay. The underlay is only set up once, or when there is a need to scale out.

Core – Where All the Tunnels Meet

It is vital to follow the practices of SDA when building the underlay from scratch – do it as DNA Center would have done it. Luckily for me, my colleague in Denmark had already tested this in the lab, and the configuration was a standard layer 3 routed network with loopbacks. No more layer 2. We also needed a routing protocol, and we landed on IS-IS. Nothing fancy here. I started with the core, created the loopbacks and the links to a set of edge switches, and then, of course, added the IS-IS configuration. The following lists some of the configuration needed on the core switches.

#Core 1
interface Loopback0
 ip address 10.100.100.128 255.255.255.255
!
interface TenGigabitEthernet 1/1/7
 description *** To Firewall - Underlay ***
 no switchport
 ip address 10.100.101.2 255.255.255.248
 standby version 2
 standby 0 ip 10.100.101.1
 standby 0 preempt
 standby 0 priority 120
 standby 0 track 1 decrement 30
 standby 0 name HSRP-FW-UNDERLAY
 standby 0 authentication md5 key-string OfCoUrSeIcAnTgIvEtHaTaWaY
!
interface TenGigabitEthernet1/1/8
 description *** Edge Switch 1 - Port TenGigabitEthernet1/0/8 ***
 no switchport
 ip address 10.100.100.1 255.255.255.252
 ip router isis
 isis network point-to-point
!
router isis
 net 49.0001.0100.3204.7128.00
 domain-password cisco
 metric-style wide
 nsf ietf
 passive-interface Loopback0
!
ip route 0.0.0.0 0.0.0.0 10.100.101.4 
!
!
#Core 2
interface Loopback0
 ip address 10.100.100.129 255.255.255.255
!
interface TenGigabitEthernet 2/1/7
 description *** To Firewall - Underlay ***
 no switchport
 ip address 10.100.101.3 255.255.255.248
 standby version 2
 standby 0 ip 10.100.101.1
 standby 0 preempt
 standby 0 priority 100
 standby 0 track 1 decrement 30
 standby 0 name HSRP-FW-UNDERLAY
 standby 0 authentication md5 key-string OfCoUrSeIcAnTgIvEtHaTaWaY
!
interface TenGigabitEthernet2/1/8
 description *** Edge Switch 1 - Port TenGigabitEthernet2/0/8 ***
 no switchport
 ip address 10.100.100.5 255.255.255.252
 ip router isis
 isis network point-to-point
!
router isis
 net 49.0001.0100.3204.7129.00
 domain-password cisco
 metric-style wide
 nsf ietf
 passive-interface Loopback0
!
ip route 0.0.0.0 0.0.0.0 10.100.101.4
!

Edge – The Edge is Far Away

The next step was to configure the edge switches. To get this right, I had to configure both a layer 2 part and a layer 3 part. Why keep layer 2? Because I did not want to run down to the edge switches if something bad happened during the transition phase; a management VLAN let me reach them over SSH the whole time (a small sketch of that piece follows the routed configuration below). The following is the basic layer 3 edge switch configuration.

#Edge 1 - Stack
interface Loopback0
 ip address 10.100.100.130 255.255.255.255
!
interface TenGigabitEthernet1/1/8
 no switchport
 ip address 10.100.100.2 255.255.255.252
 ip router isis
 isis network point-to-point
!
interface TenGigabitEthernet2/1/8
 no switchport
 ip address 10.100.100.6 255.255.255.252
 ip router isis
 isis network point-to-point
!
router isis
 net 49.0001.0100.3204.7130.00
 domain-password cisco
 metric-style wide
 nsf ietf
 passive-interface Loopback0
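
The layer 2 piece mentioned above was essentially just a management VLAN with an SVI so I could still SSH to the stacks during the migration. A minimal sketch of that kind of configuration is shown below; the VLAN ID, addressing and port are illustrative assumptions, not the values from the actual deployment.

#Edge 1 - Management VLAN (illustrative only)
vlan 999
 name MGMT
!
interface Vlan999
 description *** Temporary management SVI during migration ***
 ip address 10.100.99.130 255.255.255.0
!
interface GigabitEthernet1/0/48
 description *** Trunk towards the existing layer 2 network ***
 switchport mode trunk
 switchport trunk allowed vlan 999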

With the underlay in place, we could finally start using DNA Center to provision the overlay.

Inventory – Where is everything?

Before DNA Center can start managing the overlay, it first needs to know what devices it needs to manage. Therefore, the next step was to have DNA Center discover the edge switches, then to configure some credentials and SNMP.
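
For the discovery to work, the switches themselves need to accept an SSH login and answer SNMP with the credentials you give DNA Center. A minimal sketch of that device-side preparation is shown below; the username, community string and passwords are placeholders, not what we used.

username dnac-admin privilege 15 secret SomeStrongPassword
!
snmp-server community dnac-ro RO
!
ip ssh version 2
!
line vty 0 15
 login local
 transport input ssh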

I also needed to add the IP address pools that the customer needed. That’s it – we could finally start provisioning the switches.

Provisioning – Push it!

This was the GUI that met us:

The process was quite simple. The goal here is to provide basic configuration, like NTP, SNMP, Radius etc. It worked without any hassle and took about five minutes.
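
To give an idea of what “provisioning” actually pushes, the RADIUS/AAA part ends up looking roughly like the hand-written sketch below. The server address, names and key are placeholders, and the real configuration is generated from DNA Center's templates, so the details will differ.

aaa new-model
!
radius server ISE-1
 address ipv4 10.100.50.10 auth-port 1812 acct-port 1813
 key SomeSharedSecret
!
aaa group server radius ISE-GROUP
 server name ISE-1
!
aaa authentication login default group ISE-GROUP local
aaa authorization exec default group ISE-GROUP local

With login authentication pointed at ISE like this, switch admin access can follow whatever identity source ISE is tied to.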

After the provisioning was finished, I could log in to the switches with my Active Directory account. How cool is that? Below is a screenshot of the inventory list.

Fabric – This time it’s on the top.

Finally, we could start building the overlay. This was simple as well: we just added the cores as Control Plane/Border nodes and the edge switches as edge nodes. The picture below was taken during the initial overlay SDA process.

After setting the initial overlay config, we just had to add the correct network to the proper ports. This is where we faced the biggest challenge.

Some floors worked, and some didn’t. We tried to remove the fabric from the switches and add it back again. No joy. After troubleshooting for quite some time, we called Cisco TAC.

After two minutes I had a TAC engineer on my ear and a WebEx session up and running with him. We troubleshot for 30 minutes, and after that he asked me to hold the line. Suddenly we were four on the line. We troubleshot for about two more hours, and then we found out that the traffic was dropped at the Control Plane/Border node. We could see the traffic going out through the firewall and back down to the Control Plane/Border node. From there the packets just disappeared. We used the built-in Wireshark capture tool on the switches to confirm this. The “funny” thing was that the PC got an IP address from DHCP, but all other traffic was dropped. After about one more hour, we stopped for the day.
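
For reference, the “built-in Wireshark” is the embedded packet capture feature on the Catalyst switches. The commands below are roughly what such a capture session looks like; the capture name and interface are examples, and the exact syntax varies between IOS-XE releases.

monitor capture CAP interface TenGigabitEthernet1/0/1 both
monitor capture CAP match ipv4 any any
monitor capture CAP start
monitor capture CAP stop
show monitor capture CAP buffer brief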

The next day, the TAC engineers had given up and we contacted the development team. Three more people connected to our WebEx session. There I was, with four TAC engineers and three developers. I just leaned back in my chair and watched them work. Then one of the developers said that they could not find any obvious mistakes. I took a break and talked to the customer. We decided to revert to a standard layer 2 network and run it that way while the Cisco team worked on the case. That was not one of my proudest moments, but I understood the requirements of the customer. We removed the fabric and the provisioned configuration.

A few hours after we pulled the plug, I got an email from one of the developers. They had found the problem: the Catalyst 9500 had a memory leak. They contacted the Catalyst team, who released a new software version to fix it. Unfortunately, we could not roll back immediately, as it had gotten late and we had other issues to tend to.

The week after, we found a maintenance window to upgrade the Catalyst 9500, and after some more days I also found the time to test one of the edge switches. It worked. No traffic was being dropped. After that, we just needed to deploy the rest of the switches.

The impression I am left with is that Cisco SDA and DNA Center are really great products and, with time, will become the new standard for client access.

If you have any questions or want to know more, don’t hesitate to send me a message on LinkedIn.

Special thanks to my colleague Alexander Aasen for helping to write and proofread this article.