I first published this article on LinkedIn.
One of our customers was in a transition phase and needed a new infrastructure to suit their needs. They were already very happy with Cisco as a network provider and decided to go all in – HyperFlex with Nexus in the data center and Cisco DNA and SD-Access (SDA) for clients.
SDA is Cisco's new enterprise network flagship, combining Virtual Extensible Local Area Network (VXLAN), Scalable Group Tags (SGT) and the Locator/ID Separation Protocol (LISP). Together these technologies provide streamlined deployments, upgrades, a distributed firewall deployed at the edge, and more. This being the first SDA installation in Europe, I asked myself: “How could this possibly go wrong?”. Knowing full well that I would be playing with fire, I went in with an open mind.
Installation – Here we go!
SDA requires a few components – DNA Center, Cisco ISE, certain edge switches and an underlay.
I started the project with one of my favorite parts – physical rack stacking and cabling of the DNA Center server. DNA Center comes installed on a Cisco UCS server. The only difference between this server and a regular one is the DNA sticker on it.
After all the “heavy” lifting, the power supply was connected and the server started up. Then it was time to consult the Cisco documentation. DNA Center comes with some predefined ports and some other goodies. For those who might be interested in what they are, please refer to the official documentation.
As the server booted up, the regular Cisco UCS boot screen was shown. For a brief second, I almost forgot the CIMC configuration key combination, but my fingers just hit the keyboard and I was able to get back on track. Next, I checked that the CIMC worked and dragged myself out of the server room – the fan noise and humming isn’t exactly my favorite kind of music.
I was then finally able to sit down somewhere, and I logged into the CIMC and KVM. From there I would try to boot the server itself, and not just the management tool of the server. The server booted and the Maglev installation started. Nothing fancy, though a Java client or Flash was needed. I was ready to move on to the next step.
After setting the IP addresses and removing the Cluster Link option, I noticed something strange.
The First Challenge
The port that held the IP address I had just configured was listed as disconnected in Maglev. I opened SecureCRT and logged into the core switch where DNA Center was connected. The port state was up/up. I checked the documentation, and I had chosen the right port and followed the proper configuration steps.
I rebooted the server and Maglev came back up, but the problem was still there. I then changed the SFP and the cable and booted it again. No luck. I redid all the configuration, pressed Enter and hoped the problem would vanish. Wrong again: this time the setup stopped at DNS and NTP because it could not reach those services. When all hope seemed lost, I rebooted the DNA Center one last time and the interface came up.
The rest of the installation went fine, and I just followed the Cisco documentation.
Basic – the beginning of the edge.
After the installation of DNA Center, I needed to connect it to Cisco ISE. This was one of the easiest things I have done in my life. Just some minor certificate configuration and it was ready to go. To verify that it was working, I created more SGTs on ISE and checked that they showed up in DNA Center.
The next step was to build the basic configuration, that is NTP, DNS, RADIUS etc. That was straightforward.
Building the underlay
The customer had moved into a new building and needed to start the migration to HyperFlex right away. Now there was a challenge: The way we planned to implement the underlay was with DNA Center, but the underlay had to be deployed before the DNA Center had arrived. We decided to deploy the core switches in a traditional fashion.
Then DNA Center arrived, and now I couldn't take down the core switches to rebuild them with Cisco DNA from the bottom up. From that point things got challenging. I knew that one of my colleagues in Denmark had spent some time with Cisco DNA in the lab, and I sent him an email asking for help. Being the great person that he is, he decided to help me out.
Underlay – Not everything goes on the top.
The fabric of SDA needs an underlay to transport the overlay VXLAN tunnels. With this design, it becomes much easier to deploy services as one only needs to extend the overlay. The underlay is only set up once, or when there is a need to scale out.
Core – Where all Tunnels
It is vital to follow the practices of SDA when building the underlay from scratch – do it as DNA Center would have done it. Luckily for me, my colleague in Denmark had already tested this in his lab, and the configuration was a standard layer 3 routed network with loopbacks. No more layer 2. We also needed a routing protocol, and we landed on IS-IS. Nothing fancy here. I started with the core and created the loopbacks and the links to a set of edge switches. Next came, of course, the IS-IS configuration. The following lists some of the configuration needed on the core switches.
#Core 1
interface Loopback0
 ip address 10.100.100.128 255.255.255.255
!
interface TenGigabitEthernet 1/1/7
 description *** To Firewall - Underlay ***
 no switchport
 ip address 10.100.101.2 255.255.255.248
 standby version 2
 standby 0 ip 10.100.101.1
 standby 0 preempt
 standby 0 priority 120
 standby 0 track 1 decrement 30
 standby 0 name HSRP-FW-UNDERLAY
 standby 0 password OfCoUrSeIcAnTgIvEtHaTaWaY
!
interface TenGigabitEthernet1/1/8
 description *** Edge Swithc 1 - Port TenGigabitEthernet1/0/8***
 no switchport
 ip address 10.100.100.1 255.255.255.252
 ip router isis
 isis network point-to-point
!
router isis
 net 49.0001.0100.3204.7128.00
 domain-password cisco
 metric-style wide
 nsf ietf
 passive-interface Loopback0
!
ip route 0.0.0.0 0.0.0.0 10.100.101.4
!
!
#Core 2
interface Loopback0
 ip address 10.100.100.129 255.255.255.255
!
interface TenGigabitEthernet 2/1/7
 description *** To Firewall - Underlay ***
 no switchport
 ip address 10.100.101.3 255.255.255.248
 standby version 2
 standby 0 ip 10.100.101.1
 standby 0 preempt
 standby 0 priority 100
 standby 0 track 1 decrement 30
 standby 0 name HSRP-FW-UNDERLAY
 standby 0 password OfCoUrSeIcAnTgIvEtHaTaWaY
!
interface TenGigabitEthernet2/1/8
 description *** Edge Swithc 1 - Port TenGigabitEthernet2/0/8***
 no switchport
 ip address 10.100.100.5 255.255.255.252
 ip router isis
 isis network point-to-point
!
router isis
 net 49.0001.0100.3204.7129.00
 domain-password cisco
 metric-style wide
 nsf ietf
 passive-interface Loopback0
!
ip route 0.0.0.0 0.0.0.0 10.100.101.4
!
Edge – The Edge is Far Away
The next step was to configure the edge switches. To get this right, I had to configure one layer 2 and one layer 3 link. Why the layer 2 link? Because I did not want to have to run to the edge switches if something bad were to happen during the transition phase. It let me reach the switches over SSH through the management VLAN. The following is the basic edge switch configuration.
#Edge 1 - Stack
interface Loopback0
 ip address 10.100.100.130 255.255.255.255
!
interface TenGigabitEthernet1/1/8
 no switchport
 ip address 10.100.100.2 255.255.255.252
 ip router isis
 isis network point-to-point
!
interface TenGigabitEthernet2/1/8
 no switchport
 ip address 10.100.100.6 255.255.255.252
 ip router isis
 isis network point-to-point
!
router isis
 net 49.0001.0100.3204.7130.00
 domain-password cisco
 metric-style wide
 nsf ietf
 passive-interface Loopback0
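Before handing a hand-built underlay like this over to DNA Center, it is worth checking that the routed links and loopbacks actually work. The commands below are a minimal sketch of such a check, run from Core 1 in privileged EXEC mode. They are standard IOS show commands and not something taken from the project notes, and the addresses are simply the loopbacks from the listings above.

```
! IS-IS adjacencies towards the edge switches should be in state UP
show isis neighbors
! The edge and core loopbacks (/32) should show up as IS-IS routes
show ip route isis
! HSRP state on the firewall-facing interface (Active on Core 1, Standby on Core 2)
show standby brief
! Loopback-to-loopback reachability, which is what the overlay tunnels rely on
ping 10.100.100.130 source Loopback0
```

If an adjacency is missing, mismatched `domain-password` values or a link that is not set to `isis network point-to-point` on both ends are the first things I would look at.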
With the underlay in place, we could finally start using DNA Center to provision the overlay.
Inventory – Where is everything?
Before DNA Center can start managing the overlay, it first needs to know which devices to manage. Therefore, the next step was to have DNA Center discover the edge switches and to configure some credentials and SNMP.
I also needed to add the IP address pools that the customer needed. That’s it, we could finally start provisioning the switches.
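For discovery to succeed, the switches themselves must already accept the CLI and SNMP credentials that are entered in DNA Center. The fragment below is a rough sketch of what that could look like on an edge switch. The hostname, domain name, user name and the values in angle brackets are placeholders I made up for illustration, not the ones used in this project, and SNMPv2c is assumed here only to keep the example short.

```
! Placeholder names and secrets; replace with your own
hostname EDGE-1
ip domain name example.net
! An RSA key pair is required before SSH can be enabled
crypto key generate rsa modulus 2048
ip ssh version 2
!
username dnac-admin privilege 15 secret <password>
enable secret <enable-password>
!
! Read-only SNMP community that matches the one entered in DNA Center
snmp-server community <ro-community> RO
!
line vty 0 15
 login local
 transport input ssh
```

The same user name, passwords and community string then go into the discovery credentials in DNA Center.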
Provisioning – Push it!
This was the GUI that met us:
The process was quite simple. The goal here is to provide the basic configuration, like NTP, SNMP, RADIUS etc. It worked without any hassle and took about five minutes.
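To give an idea of what this kind of baseline means at the CLI level, here is a hand-written sketch of NTP, DNS and RADIUS settings on a switch. It is not the configuration DNA Center generated for us, which is more extensive. The server addresses come from the 192.0.2.0/24 documentation range, and the names `ISE-1` and `ISE-GROUP` are made up for the example.

```
! Hypothetical server addresses; replace with your own
ntp server 192.0.2.10
ip name-server 192.0.2.11
!
aaa new-model
radius server ISE-1
 address ipv4 192.0.2.12 auth-port 1812 acct-port 1813
 key <shared-secret>
!
aaa group server radius ISE-GROUP
 server name ISE-1
!
! Log in via RADIUS first, fall back to local users if ISE is unreachable
aaa authentication login default group ISE-GROUP local
aaa authorization exec default group ISE-GROUP local
```

With a setup along these lines, the RADIUS server decides who may log in to the switch, which is what makes logging in with a directory account possible.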
After the provisioning was finished, I could log in to the switches with my Active Directory account. How cool is that? Below is a screenshot of the inventory list.
Fabric – This time it’s on the top.
Finally, we could start building the overlay. This was simple as well: we just added the core as Control Plane/Border and the edge switches as Edge. The picture below was taken during the initial SDA overlay process.
After setting the initial overlay config, we just had to add the correct network to the proper ports. This is where we faced the biggest challenge.
Some floors worked, and some didn’t. We tried removing the fabric from the switches and adding it back again. No joy. After troubleshooting for quite some time, we called Cisco TAC.
After two minutes I had a TAC engineer in my ear and a WebEx session up and running with him. We troubleshot for 30 minutes, and after that he asked me to hold the line. Suddenly there were four of us on the line. We troubleshot for about two hours and found that the traffic was being dropped at the Control Plane/Border node. We could see the traffic going out through the firewall and back down to the Control Plane/Border node. From there the packets just disappeared. We used the built-in Wireshark capture tool on the switches to confirm this. The “funny” thing was that the PC got an IP address from DHCP, but all other traffic was dropped. After about one more hour, we stopped for the day.
The next day, the TAC engineers had given up and we contacted the development team. Three more people joined our WebEx session. There I was, with four TAC engineers and three developers. I just leaned back in my chair and watched them work. Then one of the developers said that they could not find any obvious mistakes. I took a break and talked to the customer. We decided to revert the underlay to standard layer 2 and run it that way while the Cisco team worked on the case. That was not one of my proudest moments, but I understood the requirements of the customer. We removed the fabric and the provisioned configuration.
A few hours after we pulled the plug, I got an email from one of the developers. They had found the problem. The Catalyst 9500 had a memory leak. They contacted the Catalyst team, who released a new software version to fix it. Unfortunately, we could not roll back immediately, as it had gotten late and we had other issues to tend to.
The week after, we found a maintenance window to upgrade the Catalyst 9500, and a few days later I also found the time to test one of the edge switches. It worked. No traffic was being dropped. After that, we just needed to deploy the rest of the switches.
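For readers who have not used the capture tool on the switches, the sketch below shows roughly how an embedded packet capture is run on a Catalyst 9000 in privileged EXEC mode. It is not a transcript of what TAC did. The capture name `CAP` and the client address 192.0.2.50 are made up for the example, and the interface is the firewall-facing port from the core configuration earlier in the article.

```
! Capture traffic in both directions on the firewall-facing interface
monitor capture CAP interface TenGigabitEthernet1/1/7 both
! Limit the capture to one test client (hypothetical address)
monitor capture CAP match ipv4 host 192.0.2.50 any
monitor capture CAP buffer size 10
monitor capture CAP start
! ... reproduce the problem from the client, then stop ...
monitor capture CAP stop
! Quick look on the switch, or export the file and open it in Wireshark
show monitor capture CAP buffer brief
monitor capture CAP export flash:CAP.pcap
! Remove the capture point when done
no monitor capture CAP
```

Capturing on both sides of a suspect node, as we did around the Control Plane/Border node, is what shows whether packets enter a device and never leave it.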
My impression is that Cisco SDA and DNA Center are really great products that will, with time, become the new standard in client access.
If you have any questions or want to know more, don’t hesitate to send me a message on LinkedIn.
Special thanks to my colleague Alexander Aasen for helping to write and proofread this article.