David Ramsden – Network engineer, general geek, petrol + drum and bass head
7Feb/15

Cisco VSS, domain ID and virtual MAC addresses

The other weekend I connected a L2 circuit between two sites. At both ends were Cisco 6500 Catalyst switches running VSS. The interfaces they connected to were configured as L3 and EIGRP was run between the two sites to share routes. But as soon as they were connected the neighbors started flapping.

Troubleshooting started and as always you start at the lowest OSI layer and work up. Bingo! The issue was at Layer 2 as I could see ARP was incomplete on both sides for the neighbor addresses. Checking the MAC address for the interface the L2 circuit was connected to at site A and the MAC address for the interface the L2 circuit was connected to at site B showed the same MAC. How could this happen?

As mentioned in the first sentence both ends had a Cisco 6500 Catalyst switches running VSS. One of the first things you do when configuring VSS so set the switch virtual domain ID. Cisco recommend that you enable virtual MAC addresses (mac-address use-virtual) under the switch virtual domain. I'll explain why Cisco recommend this option. When when the first switch comes up, VSS uses the MAC address pool from that member and uses that pool across all L3 interfaces. This MAC address pool is maintained by VSS when one (and only one) switch is reloaded. But if the entire VSS is reloaded and the other switch happens to come up first the MAC address pool will change. This shouldn't be a huge deal but if there are any other devices out there that are ignoring gratuitous ARP they will require manual intervention to get them working which will cause further service disruption.

Hence Cisco recommend using mac-address use-virtual under the switch virtual domain ID. This ensures the same MAC address pool is used at all times. No exceptions. But the switch virtual domain ID is significant in determining the virtual MAC address pool. It's used in the formula to calculate this pool. As per the Cisco documentation:

The MAC address range reserved for the VSS is derived from a reserved pool of addresses with the domain ID encoded in the leading 6 bits of the last octet and trailing 2 bits of the previous octet of the mac-address. The last two bits of the first octet is allocated for protocol mac-address which is derived by adding the protocol ID (0 to 3) to the router MAC address.

When I checked both switches I found they both had a switch virtual domain ID of 10. Therefore the virtual MAC address on the L3 interfaces were both 0008.e3ff.fc28. We can use the formula to check this:

6th octet (28) to binary: 00101000
Remove trailing 2 bits: 001010
001010 (bin) to decimal: 10

But what are the options for fixing the problem where the MAC addresses are the same on both sides?

  1. On one side, under the L3 interface use mac-address H.H.H.H
  2. Change the switch virtual domain ID on one VSS - Possible to do but requires a complete outage as a VSS reload is required.
  3. Remove mac-address use-virtual from the switch virtual domain ID - Not recommended as discussed previously.

Option 1 seems like the most viable option but how do you guarantee the MAC address you manually assign is unique? Will there be issues in the future? If we pick an arbitrary number between 1 and 255 (switch virtual domain ?) we can then use the formula to calculate a "safe" MAC address as long as no one in the future connects a VSS with this arbitrary number as the switch virtual domain ID. I decided to choose 99.

Virtual MAC address with domain ID 10: 0008.e3ff.fc28
Last two octets (fc28) hex to binary: 1111110000101000
99 (dec) to binary: 01100011
Insert 01100011 after leading 6 bits and before trailing 2 bits: 1111110110001100
1111110110001100 (bin) to hex: fd8c

Hence if a switch virtual domain ID of 99 was used the virtual MAC address assigned to a L3 interface would be 0008.e3ff.fd8c.

Problem solved. It was just unfortunate that the switch virtual domain ID happened to be the same. No one ever saw the two sites needing to be connected this way. If you're deploying VSS in your organisation, work smart and use unique switch virtual domain IDs everywhere. If you happen to connect to a 3rd party first check if they're using VSS and if they are, check what they have as their switch virtual domain ID. If there's a conflict someone will need to manually set a MAC address on an interface.

29Dec/14

Root shell on DrayTek AP 800

The DrayTek AP 800 is a 2.4Ghz 802.11n Access Point with the ability to make it dual band, 2.4Ghz and 5Ghz, with an optional USB dongle. It supports multi-SSID with VLAN tagging, built in RADIUS server, per-SSID/station bandwidth control and can act as a bridge, repeater etc.

As with all of these SOHO products it'd built on Linux. Which means somewhere there is a root shell lurking.

The DrayTek AP 800 has telnet enabled out of the box. Establish a telnet connection and login as the admin user. You'll be dropped in to a restricted busybox shell. To make it slightly less restrictive type rddebug. This will let you use commands such as ps and echo.

Now spawn telnetd on a different port and invoke a full shell, with:

Telnet to the AP on port 2323 to be dropped in to a root shell.

This will likely also work with the AP 900 and the 2860 series. Leave a comment if you've tried it.

Tagged as: , , No Comments
25Oct/14

Show me the config!

Back in the days before stacks and VSS, doing a "show run" to look at something was easy. You pressed space a few times to page through the output and found the section you were interested in. Now when you've got a stack of several switches or switch running VSS with hundreds of interfaces you'll be pressing space forever and a day paging through the config.

In IOS you can pipe output to a selection of commands to filter output. Much like redirecting output using a pipe in UNIX or even Powershell.

Say for example you want to see the static routes in the running-config:

As with the above, regular expressions are acceptable too!

This is all well and good for commands such as "ip route" or "ntp" for example but what if you want to check an EIGRP config?

Oh. This isn't what we were expecting. This is because the EIGRP config is a section. Thankfully you can pipe to the "section" command:

Much easier than "sh run" and then paging through the output until you find what you're after.

And finally, if you're in configuration mode and want to check something with show, ping etc and don't want to have to drop back to priv exec mode, prefix your command with "do":

 

13Oct/14

Cisco Crypto ACLs – Do they really need to match?

When starting out with IPsec tunnels it seems to be a common misconception that the crypto ACL, sometimes referred to as the encryption domain or the interesting traffic, must match 100% or be mirrored at both peers or the tunnel won't come up. This isn't strictly true. Whilst the ISAKMP phase 1 and IPsec phase 2 proposals must match, the crypto ACL can be different.

Assume that at the local peer traffic to be encrypted originates from 10.0.0.0/24 and is destined for 192.168.0.0/24. The crypto ACL would be:

But what about the following?

IPsec phase 2 can still be established even though the crypto ACL isn't mirrored at the local and remove peer. The local peer specifies 10.0.0.0/24 but the remote peer specifies 10.0.0.0/8. In this scenario IPsec phase 2 can only be initiated from the peer that has the larger subnet. This is true for both Cisco ASA and IOS.

And in the example above, in the local peer's ACL there's a deny ACE but none on the remote peer's ACL. In this scenario any traffic originating on the local peer from 10.0.0.0/24 destined to 192.168.0.200/32 won't traverse the tunnel. The device (ASA or IOS router) will look at the next crypto map in the sequence and try to match traffic there. If no crypto maps are found it'll flow unencrypted out of the egress interface.

Obviously be careful with mismatching subnets and using deny ACEs in the crypto ACL because you may end up with traffic trying to enter the wrong tunnel and other strange things happening.

 

Tagged as: , , , , , , No Comments
9Sep/14

Reviewing Cisco ASA firewall rules

Today I was reviewing a firewall rule set on a Cisco ASA firewall. The firewall has around 399 ACLs (Access Control Lists) comprising of 7272 ACEs (Access Control Entries). Quite a task! Unfortunately I didn't have any tools to hand such as Cisco Security Manager or something like FirePac to audit the rules and give me some suggestions.

Stage 1 was to visually look at the ACLs and spot the obvious mistakes and remove them. Stage 2 was to then remove any unused names, objects and object-groups. I hacked up a Perl script to do this. The script reads the complete ASA config, gets all the names, objects and object-groups then works out which ones aren't referenced anywhere else:

Stage 3 was to work out which ACLs could be completely removed and which ACLs should be reviewed in more detail. If an ACL with or without ACEs, has a total of 0 hits it can (probably) be removed. If an ACL with ACEs has less than or equal to 100 hits it should be reviewed in more detail because the chances are some of the ACEs associated with it can be removed. A quick and dirty Perl script did the trick:

I found 181 ACLs that can be immediately removed and a further 16 to be reviewed. With an average of 18 ACEs per ACL, that equates to 3258 ACEs that can removed and 288 that may be able to be removed after a review.

By the end of this journey I should have reduce the rule set by at least 44.80%. After that the rule set just needs re-ordering to optimise the processing.

Tagged as: , , , , , 6 Comments
28Aug/14

Open curtains – VNC with no authentication

A few weeks ago I read an article about mass scanning the Internet for VNC servers that don't require authentication. Dubbed "open curtains" because it's like having your curtains open allowing anyone passing by to glance in. The person doesn't need to bypass any security in place or have a key to your door to get this access - it's open for everyone to take a peek inside.

To achieve this I used:

It's completely automated. No human interaction is required.

The nmap NSE script I wrote is as follows:

It's used with nmap to scan random targets:

The above continuously scans random addresses on TCP/5900 (the default VNC server port) and uses the open-curtains.nse script. The NSE script simply says "if the protocol is TCP and the port is 5900, execute the open-curtains.pl script passing the host IP and port number as arguments". The open-curtains.pl script is run in the background so things don't hang.

The open-curtains.pl Perl script I wrote is as follows:

This will first open a TCP connection to the target and negotiate the VNC protocol. The VNC server will first send an RFB header containing the server version (e.g. RFB 003.008). We then send back a response of "RFB 003.003" to tell the server what our client capabilities are. The server will then respond with the authentication mechanisms that are accepted. This is the important part. We only then continue if the server tells us that no security is configured (0x1). At no point do we try to bypass the security or try any passwords. Finally the vncsnapshot program is invoked to connect and take a screenshot from the VNC server.

And the results were quite interesting.

I found a lot of digital signage, all of which appear to be located in South Korea and mostly relating to "LG U+" (telecoms and mobile phone operator controlled by LG Group):

Quite a few desktops. Surprisingly a lot of them were Ubuntu desktops, proving that the Operating System is only as secure as the user makes it:

This desktop was pretty ironic. Look at the websites the user has open in Firefox:

vnc9

And an OS X desktop. Oddly enough it looks like someone has tried to pwn it but with win32 files?

vnc11

A few home automation type systems and embedded devices such as MFPs:

A few Danfoss terminals that appear to be running a UNIX like-OS popped up too:

vnc18A few Point of Sale/Back Office systems:

Worryingly one of these is likely pwned already and is connected to a C&C server via IRC. The mIRC registration web page that has opened gives the game away. Unfortunately the user is probably just closing the window each time:

vnc21

Other systems included a pharmacy system and a radar system on a ship:

Over the space of about 2 weeks I collected 399 screenshots. The process could be a lot quicker if something such as masscan were to be used. As of yet I've not found any SCADA systems but they are out there...

Tagged as: , , No Comments
27Aug/14

Custom kernel on a DigitalOcean droplet – the right way

A few days ago I decided to create a VPS, known as a "droplet", with DigitalOcean. They claim a deployment time of 55 seconds. And 55 seconds after hitting the button I had a Debian 7 x64 droplet running. The plan was to migrate my current VPS to this DigitalOcean droplet. The first task I always undertake with any Linux deployment is to create a custom stripped down kernel patched with grsecurity. However, unknown to me, the way DigitalOcean boot your droplet with KVM means that you can only use a kernel of their choice.

But don't panic because there is a workaround by utilising kexec. kexec is used to speed up reboots. When you power on any machine it goes through a POST procedure to initialise the hardware. When you reboot a machine it goes through this POST procedure again which really isn't needed if the hardware is already initialised. Normally kexec is invoked during the reboot runlevel to load a kernel in to memory and jump straight in to it, bypassing the hardware initialisation. The trick is to reverse this and have the machine utilise kexec during the startup runlevel, therefore jumping in to the custom kernel straight away and only utilising DigitalOcean's choice of kernel as a bootstrap.

I've seen a lot of hacks where people suggest modifying the init scripts such as /etc/init.d/rcS to kexec the custom kernel on boot before any init scripts are executed. But this is very hacky and you could easily end up in a situation where the custom kernel doesn't boot but because there's no way to stop kexec running you could have a completely unusable droplet.

In my opinion the correct way to do this is to write a proper LSB init script. Note that the below is specific to Debian and Ubuntu. The init script should also give you the option to abort the kexec process in case something goes wrong. At least that way the droplet will still boot with DigitalOcean's kernel and you can get in and fix things.

The LSB init script is as follows:

The above should be placed in /etc/init.d/droplet-kernel and chmod 755.

In addition you need /etc/default/droplet-kernel:

Now use update-rc.d to install the init script:

Now when the droplet boots, one of the first thing it does is look at running kexec to load and boot a custom kernel. The /etc/defaults/droplet-kernel file contains all the customisable options, such as easily enabling/disabling this process, where the custom kernel and initrd images are and the option to override the kernel arguments (such as rootfs, verbosity etc).

The script runs interactive and will hold the boot routine for 10 seconds, giving you the option to press Ctrl+C to abort the kexec process. This is especially useful if something has gone wrong with the kernel that will be kexec'd. Pressing Ctrl+C will mean the droplet continues to boot using DigitalOcean's kernel and you should be able to fix things up.

You'll notice that when the droplet is kexec'd, it appends "kexeced" to the kernel cmdline argument. It does this to prevent any loops from occurring. The init script checks if "kexeced" is the last argument and if it is won't try to kexec the kernel. Otherwise the droplet would get in to an endless reboot.

When the droplet is rebooted (runlevel 6), it removes the "kexeced" argument so that we get the option to Ctrl+C during the bootup again. This avoids the need to shutdown the droplet and power it on again for kexec to kick in.

Here it is in action:

2Aug/14

Automating mass Cisco IOS upgrades

This morning I needed to upgrade the IOS on 29 Cisco 3560G switches. Rather than login to each one, clean up the flash storage, FTP on the IOS image and set the boot image, I wrote a simple shell script and used clogin from RANCID to automate this task. Of course, nearly every Network Configuration Management platform that's any good should be able to do this but I prefer the personal touch.

The commands required on the switch were as follows:

First I tell IOS to not prompt on file operations. This makes automation easier as there's no need to deal with questions. Then I clean up the flash storage on the switch by removing any old IOS images. The IOS image is copied from an FTP server to the flash storage. The file prompt is put back to defaults and the boot system variable is set to the new IOS image. Finally the configuration is committed to NVRAM because at some point the switch will need to be reloaded.

The shell script will read in a list of IP addresses to connect to and then using clogin it'll login to each switch and execute the commands above.

The script I wrote is as follows:

A file called ips.txt has the list of IP addresses for the switches (one IP address per line). The commands listed above go in to a file called commands.txt. And lastly there's a file called clogin.txt that contains the login details that clogin needs. This would look like:

This tells clogin that there's no need to enter enable and to first try SSH and followed by telnet.

When the script is run it will grab the first IP address in ips.txt, execute clogin to login to the switch and then execute each command in commands.txt. When clogin exits, the IP address in ips.txt will be removed and placed in to a file called processed.txt. The script then prompts if it should continue to the next IP address, allowing you to review what happened to make sure the IOS image copied on OK.

This allowed me to upgrade 29 switches, whilst watching some morning TV and sipping a coffee with my feet up. All that's required now is a reload of each switch.

2Aug/14

Automating Cisco switch swap outs

So you can't automate the entire process unfortunately. You're still going to need to pull a late night and get your hands dirty...

Recently I was tasked with swapping out 4 old Cisco 10/100Mb switches with new 10/100/1000Mb switches. The old switches were a combination of Cisco 3560, 2950 and 3548 series. The old switches also had some old configurations that needed to be updated and the interface configurations weren't consistent. The interfaces also had different VLAN configurations and this was my main concern. What if I made a mistake? It's not very realistic to chase 192 ports and make sure every single one was working as expected.

Going back to a previous blog, should network engineers be programmers, writing a script to speed up the configuration process and eliminate any mistakes was the answer.

The process would be:

  1. Get the existing IOS configuration.
  2. Find the interfaces.
  3. Convert the old 10/100 ("FastEthernet") interfaces to 10/100/100 ("GigabitEthernet") interfaces.
  4. Extract the important parts of the interface configuration (description, speed/duplex if set, VLAN configuration, trunk configuration etc).
  5. Pump out the converted interfaces.
  6. Copy/paste in to a template after manually making sure it looks good.
  7. Upload the configuration to the startup-config of the new switch.
  8. Swap the switch out.

First of all, the existing IOS configurations are stored in SVN. They get here via RANCID. So I had the old configurations easily to hand. Also if anything did happen to go wrong on the night, this served as a good reference point.

My weapon of choice for this scripting task was Perl. The script I wrote took in an existing IOS configuration and extracted the physical interfaces and SVIs, converted them from FastEthernet to GigabitEthernet and grabbed all of the important stuff such as description, speed/duplex, VLAN configuration, trunk configuration, IP configuration it's an SVI etc.

Here it is:

I ran this against each of the 4 existing switch IOS configurations, checked the output quickly and then copied/pasted the interfaces in to my switch template that has everything else set such as Spanning Tree configuration, errdisable recovery, NTP, AAA, SNMP, syslog etc etc. VTP would take care of the VLAN database once the switch was connected to the core.

I connected each of the new switches to my laptop and configured the Vlan1 SVI with a temporary IP, then TFTP'd the switch template to the startup-config along with the IOS version required. Turn the switch off and on again to make sure it looks good and job done on the configuration front. Each switch took a very small amount of time to configure and I could be safe in the knowledge that all the interfaces were correct.

The 4 switches were swapped out in the dead of night. It took 4 hours from start to finish, including testing and monitoring. Out of roughly 192 ports there was only one device which didn't work the next morning and that was due to an auto-negotiation issue.In my book it was a very successful, painless and efficient change. One which I wasn't particularly looking forward to but thanks to a bit of scripting ended up being easy.

2Aug/14

Should network engineers be programmers?

Short answer: Yes.

Maybe not a programmer in the sense that you need to be proficient in C++, .NET, assembler, know UML etc but having some general programming knowledge is very useful. In my opinion and experience the most important programming skill to have is a fairly in-depth knowledge of a scripting language. Be that shell, Perl, Powershell or even batch scripts. A week doesn't go by where I don't write a script to help me with my day to day tasks. Either to automate a process or format some logs or debug output I've collected.

Personally my scripting language of choice is either shell or Perl. Shell for easy repetitive tasks and Perl for formatting data or even creating configurations. Here's a very simple example of a Perl script I wrote recently:

What does this do, apart from make my life simpler? It generates a Cisco IOS config with 29 LACP port channels and configures the physical interfaces. Then it's a case of running the script and copying/pasting the result in to the device. It also eliminates any human error. If you were having to create 29 port channels and configure 58 physical interfaces, the chances are you'll make a mistake. Such as forgetting to configure the interface as a trunk, setting the wrong channel group ID on the interface or generally getting in to a bit of a mess.

I'm going to post a few other blogs containing some scripts I've recently used to help automate tasks. Time to sharpen your scripting skills!