Just fixed a problem that had been (unbeknownst to me) plaguing this place for who knows how long.
After rebooting a domain controller, the SQL servers would start throwing logon errors until it came back up. Most of the errors came from IIS sites that use Windows authentication, and a few even came from the app servers. What made it especially tricky was that it really looked like a Kerberos problem. Depending on which machine you were looking at, you’d get an error like
Login failed for user ''. The user is not associated with a trusted SQL Server connection.
or maybe
Logon Failure:
Reason: An error occurred during logon
… it’d go on to tell you what was wrong with Kerberos
and in one case
The Kerberos subsystem encountered a PAC verification failure
I also saw a few of these:
The failure code from authentication protocol Kerberos was The specified user does not exist
Long story short: I looked and poked and Googled, and couldn’t find anyone who had reported this problem before (and I’m usually pretty good at putting together search terms that get me what I want). Eventually I found a KB article that fit my hunch about what was happening. I don’t think I saw the “NO_SUCH_USER” code in any of the logs I looked through (maybe that’s what you get if you’re using NTLM?), but the rest of it sure sounded right. I tried the workaround on the domain controllers first (stopping the Netlogon service before rebooting) and didn’t get a single error.
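If you’d rather not rely on remembering the extra step, here’s a minimal Python sketch of it. It assumes the workaround is exactly “stop Netlogon, then reboot”; `net stop netlogon` and `shutdown /r /t 0` are standard Windows commands, but the helper names (`reboot_commands`, `reboot_domain_controller`) are made up for illustration. It defaults to a dry run that only prints the commands, and if other services depend on Netlogon on your DC, `net stop` may ask for confirmation.

```python
import subprocess
import sys

# Hypothetical helper, not from the KB article: it just encodes the
# workaround "stop the Netlogon service, then reboot the DC".
STOP_NETLOGON = ["net", "stop", "netlogon"]
REBOOT_NOW = ["shutdown", "/r", "/t", "0"]


def reboot_commands():
    """Return the commands to run, in order: stop Netlogon, then reboot."""
    return [STOP_NETLOGON, REBOOT_NOW]


def reboot_domain_controller(dry_run=True):
    """Run (or with dry_run=True, just print) the workaround commands."""
    cmds = reboot_commands()
    prefix = "would run:" if dry_run else "running:"
    for cmd in cmds:
        print(prefix, " ".join(cmd))
        if not dry_run:
            # check=True raises if a command fails, so the reboot is
            # skipped when Netlogon could not be stopped.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Only pass --really on the DC itself, from an elevated prompt.
    reboot_domain_controller(dry_run="--really" not in sys.argv)
```

Stopping with `check=True` is deliberate: if Netlogon doesn’t stop cleanly, you probably want to find out before the box goes down rather than after.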
I deserve a raise. 😀