Figuring out why dd-wrt randomly restores to Factory Settings

I'm part of the committee at my local CIU club and have volunteered to offer technical support. Part of this included setting up and managing the network to provide WiFi throughout what is quite a large building ...

There wasn't much in the budget to set it all up, so I decided to buy a few reasonably decent routers (TP-Link Archer C6), and install the open-source dd-wrt firmware so that I could tinker with some of the advanced features available.

All was working well for quite a while, providing both a private and public network. It was quite important for the WiFi to be working, as this was the main connection method for the card machines ... No WiFi, no payments ...

Then suddenly the router in the main bar started restoring itself to factory settings. Initially I thought it was a one-off and thought nothing of it (mainly because all the logs were lost when it was restored) - I just configured it again and went on my merry way. But after a few weeks it happened again.

This time I decided to take a backup of the configuration after reconfiguring it (along with the other routers too), and also set up logging to Papertrail. A free account was enough, as I only needed to see recent logs and didn't need to retain them for any real length of time.

Then it happened again, and this time I managed to see the logs leading up to the restore, yay!

```
Kernel  Warning kernel [ 3637.780000] ath10k_pci 0000:00:00.0: failed to install key for vdev 1 peer [redacted] -145
Kernel  Error   kernel [ 3637.780000] wlan1.1: failed to remove key (0, [redacted]) from hardware (-145)
Kernel  Warning kernel [ 3640.790000] ath10k_pci 0000:00:00.0: wmi command 36890 timeout, restarting hardware
Kernel  Warning kernel [ 3640.790000] ath10k_pci 0000:00:00.0: failed to install key for vdev 1 peer [redacted] -11
Kernel  Error   kernel [ 3640.800000] wlan1.1: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-11)
Kernel  Warning kernel [ 3640.810000] ath10k_pci 0000:00:00.0: failed to install key for vdev 1 peer [redacted] -143
Kernel  Error   kernel [ 3640.820000] wlan1.1: failed to set key (1, ff:ff:ff:ff:ff:ff) to hardware (-143)
Kernel  Info    kernel [ 3640.930000] ieee80211 phy1: Hardware restart was requested
```

After a bit of searching through the dd-wrt forums, I found that dd-wrt will by default restore itself to factory settings if it fails to boot 5 times.

I could just try to increase the number of failed boots allowed, or turn that option off, but that probably wouldn't resolve the underlying issue. The router would most likely just restart constantly - which wouldn't be too healthy for it ...

So I had a look into what the actual error message is on about, particularly the log lines that mention `failed to install/set/remove key`. I can see it's happening on `wlan1.1`, which is the public WiFi. From what I found, it basically means that, for whatever reason, an attempt was made to install/set/remove the encryption key for the specified WLAN interface (`wlan1.1`) but was unsuccessful.
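The negative number at the end of each of those lines looks like a Linux errno value. As a rough sketch, and assuming the router's kernel uses the MIPS errno numbering (which differs from the x86 numbering on a typical desktop), the three codes in the logs can be decoded with a small hand-written table. The `MIPS_ERRNO` table and the `decode` helper are my own illustration, not part of dd-wrt or Papertrail.

```python
import re

# Assumption: MIPS Linux errno numbering. Only the codes seen in the logs are listed.
MIPS_ERRNO = {
    11: ("EAGAIN", "resource temporarily unavailable, try again"),
    143: ("ESHUTDOWN", "cannot send after transport endpoint shutdown"),
    145: ("ETIMEDOUT", "connection timed out"),
}

# Matches a trailing code written either as "... -145" or "... (-145)".
CODE_AT_END = re.compile(r"\(?-(\d+)\)?\s*$")


def decode(line):
    """Return (name, meaning) for the errno at the end of a log line, or None."""
    match = CODE_AT_END.search(line)
    if not match:
        return None
    return MIPS_ERRNO.get(int(match.group(1)))


if __name__ == "__main__":
    sample = "wlan1.1: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-11)"
    print(decode(sample))
```

If that numbering assumption holds, the sequence reads as a timeout talking to the WiFi chip, followed by "try again" and "shut down" errors while the hardware restarts, which would fit the `wmi command 36890 timeout, restarting hardware` line.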

Firstly, it could simply be a bug within the firmware, so I made sure that all the routers were upgraded to the latest firmware version. There's no official method of auto-updating, so they can sometimes fall behind the latest version.

There was also the suggestion that the WLAN hardware could be incompatible with some of the encryption algorithms that are available within dd-wrt. I had initially set it up with many of the protocols enabled, because there was some dated equipment that needed to be connected via WiFi, and also needed to support newer devices. Since setting up though, we had upgraded the older devices, so I looked into minimising the enabled protocols.

Lastly, there was the possibility that it's simply a hardware failure. I decided to test this theory by swapping the router with one from a different area that had fewer devices connecting to it. If the issue happened again on the original router in its new, quieter location, that would point to a hardware failure. If instead it was the router now located in the main bar that threw the error, I could assume the hardware wasn't at fault.

For good measure, now that I knew the error message, I set up an email alert from Papertrail. I also set up a generic alert for 'Error'.
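To make it clearer what the specific alert is looking for, here is a minimal sketch of the matching logic in Python. It is independent of Papertrail's own search syntax; the patterns are taken from the log lines above, and the `classify` function and its labels are names I made up for illustration.

```python
import re

# Patterns taken from the kernel log lines seen before the restore.
KEY_FAILURE = re.compile(r"failed to (install|set|remove) key")
HW_RESTART = re.compile(r"restarting hardware|Hardware restart was requested")


def classify(line):
    """Label a log line as 'hardware-restart', 'key-failure', or None."""
    # Check restarts first: they are the more serious symptom.
    if HW_RESTART.search(line):
        return "hardware-restart"
    if KEY_FAILURE.search(line):
        return "key-failure"
    return None


if __name__ == "__main__":
    lines = [
        "ath10k_pci 0000:00:00.0: wmi command 36890 timeout, restarting hardware",
        "wlan1.1: failed to set key (1, ff:ff:ff:ff:ff:ff) to hardware (-143)",
        "some unrelated message",
    ]
    for line in lines:
        print(classify(line), "<-", line)
```

The same function could be run over an exported log file to count how often each symptom shows up between restores.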

Now to play the waiting game!

I intend to just wait for any alerts that come through, and monitor the logs for any recurrence of the issue, or any new errors 🙂