Recently I started noticing a very strange problem. User Group Policy was not applying for a single user. All other users got their GPOs just fine, but for this one user policy failed to apply. This manifested itself in a couple of interesting ways:
- When you run gpresult under this user's account, you get an error that the user does not have RSoP data
- If you use folder redirection via GPO, as I was, this user suddenly does not get folder redirection applied. The user's "My Documents" folder is all of a sudden a local copy under %userprofile%, and it has nothing in it. If the user goes directly to the network-based documents folder, the original documents are of course all there.
As always, the first step in troubleshooting any Group Policy problem is to turn on UserEnvDebugLevel. I set it to 0x00030002 to get as much information as possible. I then ran gpupdate /force /target:user. Upon examining the userenv.log file I found the following information:
USERENV(c74.bb0) 11:54:24:953 LibMain: Process Name: C:\WINDOWS\system32\gpupdate.exe
USERENV(c74.e70) 11:54:24:985 RefreshPolicyEx: Entering with force refresh 0
...
USERENV(4bc.ed4) 11:54:29:811 ProcessGPOs: GetGPOInfo failed.
USERENV(4bc.ed4) 11:54:29:811 ProcessGPOs: No WMI logging done in this policy cycle.
USERENV(4bc.ed4) 11:54:29:811 ProcessGPOs: Processing failed with error 997.
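A verbose userenv.log gets long quickly, so it can help to filter it down to the lines that report a failure. The following is a minimal sketch of that, not a definitive parser: the line format is inferred only from the excerpt above, and the function names (parse_line, find_failures, error_codes) are my own. For what it is worth, 997 is the Win32 code ERROR_IO_PENDING, which does not say much about the real cause.

```python
import re

# Line shape inferred from the excerpt above:
#   USERENV(<pid>.<tid>) HH:MM:SS:mmm <Function>: <message>
# Reading the two hex numbers as process and thread IDs is an assumption.
LINE_RE = re.compile(
    r"^USERENV\((?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)\)\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}:\d{3})\s+"
    r"(?P<func>\w+):\s*(?P<msg>.*)$"
)

# Lines copied from the log excerpt in this post.
SAMPLE = [
    r"USERENV(c74.bb0) 11:54:24:953 LibMain: Process Name: C:\WINDOWS\system32\gpupdate.exe",
    r"USERENV(c74.e70) 11:54:24:985 RefreshPolicyEx: Entering with force refresh 0",
    r"USERENV(4bc.ed4) 11:54:29:811 ProcessGPOs: GetGPOInfo failed.",
    r"USERENV(4bc.ed4) 11:54:29:811 ProcessGPOs: No WMI logging done in this policy cycle.",
    r"USERENV(4bc.ed4) 11:54:29:811 ProcessGPOs: Processing failed with error 997.",
]


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def find_failures(lines):
    """Keep only the parsed entries whose message mentions a failure."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e and "fail" in e["msg"].lower()]


def error_codes(entries):
    """Pull numeric codes out of messages of the form '... error <number>'."""
    codes = []
    for entry in entries:
        match = re.search(r"error (\d+)", entry["msg"])
        if match:
            codes.append(int(match.group(1)))
    return codes


if __name__ == "__main__":
    for entry in find_failures(SAMPLE):
        print(entry["time"], entry["func"], entry["msg"])
```

To run it against a real log, read the lines from the file (normally %SystemRoot%\Debug\UserMode\userenv.log) and pass them to find_failures instead of SAMPLE.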
Error messages are good. Do a web search for error 997 and you find, well, not much. There is a newsgroup post where some guy has the same problem, and fixes it by deleting the user account and creating a new one. Thanks but no thanks. I'm not much for solutions to a small cut that involve amputating the limb the cut happens to be on.
Instead I went to the event logs on the DC, where I noticed relatively quickly, by filtering for event ID 540, that this user was not logging on with Kerberos. For some reason, when this user logged on, a quick Kerberos logon happened, followed by an almost immediate logoff, and then an NTLM logon. This is why Group Policy was not being applied - the user was not logging on with Kerberos. Without Kerberos you do not get Group Policy. Unfortunately, the event log does not tell us why this is happening.
At this point, I looped in my friend Jimmy, who has forgotten far more about Active Directory than I will ever know. Jimmy immediately suggested turning on some Kerberos debugging, which sounded like a really good idea. We used the following parameters:
Key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters
Value: LogLevel
Type: REG_DWORD
Data: 1
Key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Kdc
Value: KdcExtraLogLevel
Type: REG_DWORD
Data: 4
Key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Kdc
Value: KdcDebugLevel
Type: REG_DWORD
Data: 1
These values need to be in the specific keys listed here. I know that sounds obvious, but it must not have been so obvious to the authors of this KB article:
http://support.microsoft.com/kb/887993/en-us
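Since the exact key matters, it is less error-prone to generate the settings than to type them by hand. Here is a small sketch that turns the three values listed above into the text of a .reg file. The keys, value names and data come straight from the list above; the helper name build_reg_file is my own, and saving and importing the resulting text on the machine in question is not shown.

```python
# (key, value name, DWORD data) exactly as listed above.
KERBEROS_DEBUG_VALUES = [
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters",
     "LogLevel", 1),
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Kdc",
     "KdcExtraLogLevel", 4),
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Kdc",
     "KdcDebugLevel", 1),
]


def build_reg_file(values):
    """Return .reg file text that sets each (key, name, dword) triple."""
    # Group the values by key so each key header appears only once.
    by_key = {}
    for key, name, data in values:
        by_key.setdefault(key, []).append((name, data))

    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, entries in by_key.items():
        lines.append(f"[{key}]")
        for name, data in entries:
            # DWORD data is written as eight hex digits.
            lines.append(f'"{name}"=dword:{data:08x}')
        lines.append("")
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(KERBEROS_DEBUG_VALUES))
```

Remember to remove the values again once you are done; this kind of logging is meant for troubleshooting, not for everyday use.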
Once we did that we got this error in the event log:
Event Type: Error
Event Source: Kerberos
Event Category: None
Event ID: 3
Date: 11/25/2006
Time: 10:15:44
User: N/A
Computer: <server>
Description:
A Kerberos Error Message was received:
on logon session
Client Time:
Server Time: 18:15:44.0000 11/25/2006 Z
Error Code: 0xd KDC_ERR_BADOPTION
Extended Error: 0xc00000bb KLIN(0)
Client Realm:
Client Name:
Server Realm: <domain>
Server Name: host/<dc.domain>
Target Name: host/<dc.domain>@<domain>
Error Text:
File: 9
Line: ae0
Error Data is in record data.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 30 15 a1 03 02 01 03 a2 0.¡....¢
0008: 0e 04 0c bb 00 00 c0 00 ...»..À.
0010: 00 00 00 03 00 00 00 .......
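The record data at the bottom of the event is not random. It looks like a small ASN.1 DER structure wrapping a 12-byte payload, and the first four bytes of that payload, read little-endian, are the same 0xc00000bb shown as the Extended Error (that NTSTATUS value is STATUS_NOT_SUPPORTED). The sketch below walks the bytes to show this. Treating the payload as three 32-bit fields (status, a KLIN value and flags) is my assumption, based on the first two matching the "0xc00000bb KLIN(0)" text in the event; the function names are my own.

```python
import struct

# The 23 bytes from the Data section of the event above.
RECORD = bytes.fromhex(
    "30 15 a1 03 02 01 03 a2"
    "0e 04 0c bb 00 00 c0 00"
    "00 00 00 03 00 00 00"
)


def read_tlv(buf, pos):
    """Read one DER tag-length-value at pos; return (tag, value, next_pos).

    Only short-form lengths (< 128 bytes) are handled in this sketch.
    """
    tag = buf[pos]
    length = buf[pos + 1]
    if length & 0x80:
        raise ValueError("long-form DER length not handled in this sketch")
    start = pos + 2
    return tag, buf[start:start + length], start + length


def decode_record(record):
    """Return (data_type, status, klin, flags) from the event record data."""
    tag, body, _ = read_tlv(record, 0)        # 0x30: SEQUENCE
    if tag != 0x30:
        raise ValueError("expected a DER SEQUENCE")
    _, field1, pos = read_tlv(body, 0)        # 0xa1: context tag [1]
    _, type_bytes, _ = read_tlv(field1, 0)    # 0x02: INTEGER
    _, field2, _ = read_tlv(body, pos)        # 0xa2: context tag [2]
    _, payload, _ = read_tlv(field2, 0)       # 0x04: OCTET STRING
    # Assumed layout: three little-endian 32-bit unsigned integers.
    status, klin, flags = struct.unpack("<III", payload)
    return int.from_bytes(type_bytes, "big"), status, klin, flags


if __name__ == "__main__":
    data_type, status, klin, flags = decode_record(RECORD)
    print(f"type={data_type} status={status:#010x} klin={klin} flags={flags}")
```

So the record data carries the same extended error as the event text and, as far as I can tell, nothing beyond it.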
More error messages. As mentioned before, error messages are good; the only thing better is not having the problem in the first place. Unfortunately, we got nothing else here. There is virtually no information available on the KDC_ERR_BADOPTION error, just as there was none on the error 997 that we saw earlier.
This is where we got stuck. We had two very interesting error messages, but no information on what caused them. However, one thing seems really clear: the error shows up on the client side and happens only for a single user. In addition, we are seeing this on two different client computers for the same user. Since the user has a roaming profile, this allows us to discard several possible sources of the problem:
- It is almost certainly not related to the client computer since it happens on two different computers
- It is almost certainly not related to the DC since no other users are affected by this
The only other variable is the user. With a roaming profile, we should expect any corruption in the user's profile to manifest itself on more than one computer. To test this hypothesis we logged the user off, deleted the locally cached copy of the profile on the client, and renamed the server copy to something else. When the user next logs on, the client creates a new profile for the user based on the local Default profile and uploads it to the file server where the profile should be stored.
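Renaming the server copy, rather than deleting it, keeps the old profile around in case you need to pull something out of it later. A minimal sketch of that one step follows. The function name set_aside_profile and the ".bad-<timestamp>" suffix are my own inventions, and the share path in the usage line is purely hypothetical. It has to run while the user is logged off, and it does not touch the locally cached copy on the client.

```python
from datetime import datetime
from pathlib import Path


def set_aside_profile(profile_dir, now=None):
    """Rename a profile folder to '<name>.bad-<timestamp>' and return the new path.

    The naming scheme is an arbitrary choice for this sketch.
    """
    profile_dir = Path(profile_dir)
    if not profile_dir.is_dir():
        raise FileNotFoundError(f"no profile folder at {profile_dir}")

    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = profile_dir.with_name(f"{profile_dir.name}.bad-{stamp}")
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    # Rename in place so nothing is copied or deleted.
    profile_dir.rename(target)
    return target


if __name__ == "__main__":
    # Hypothetical path; substitute the real roaming profile location.
    print(set_aside_profile(r"\\fileserver\profiles$\someuser"))
```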
This procedure verified that it was indeed profile corruption. When the user logged on next, a complete Kerberos logon was negotiated and Group Policy was applied.
We still do not know the exact nature of the profile corruption, nor what caused it. We are still investigating though. For now, what we do know is that if you have this happen you can fix the problem with a slightly less painful procedure than complete user amputation - namely by forcing the user's profile to be re-created. This is obviously not a particularly graceful solution, and is a bit like amputating a significant piece of the user. However, in a domain environment, if encryption keys are stored in Active Directory, it is considerably less painful than deleting the user account and starting over, as the newsgroup post we pointed to earlier suggested.
If you have any information to give us on this error please let us know. In the meantime, we will keep searching.