Troubleshooting Machine GPO issues
Knowledge Base ID: KB12554
Version: 2.0
Published: 03 Nov 2009
Updated: 03 Nov 2009
Categories: IC_4000, IC_6000, OAC Enterprise, OAC FIPS Edition, IC_4500, IC_6500

Synopsis:
This document contains suggestions for troubleshooting Machine GPO issues in an 802.1x environment.

Problem:
This document is written with the assumption that all authentications are successful.

Before reading through this document, please be sure that Machine GPOs are working in a non-802.1x environment. Also ensure that your end-user machine is running a current service pack and that your network adapter’s driver is up to date.

Solution:

Note: Once a Windows machine has been assigned an IP address, Windows tries to establish a secure connection to the DC and then moves on to Machine GPO processing. If a machine is placed on a VLAN that does not have access to the DC before the Machine 802.1x connection, you may have difficulty getting Machine GPOs to run reliably.




When Machine GPOs run unreliably or not at all, it is important to identify where the break is occurring. Typically GPOs fail to run for one of two reasons: delays in authentication or delays in IP address assignment.

The simplest way to test that Machine Authentication and DHCP are working is to ping the test machine as it boots up. The more successful pings before the Ctrl-Alt-Del screen appears, the better.
Because there is no OAC UI to look at while a machine boots, troubleshooting Machine Authentication can be tricky. For that reason, use a level 5 OAC debug log and a packet capture to track events. For a packet capture in a wired environment, it is suggested that you mirror the port the test machine is connected to. For wireless networks, placement of the capturing machine is irrelevant.
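
The ping test described above can be scripted so that each attempt is time-stamped and the first reply is easy to spot. The following is a minimal Python sketch, not part of OAC. The host name "test-pc" in the usage comment is hypothetical, and the non-Windows branch assumes the Linux ping tool, where -W is a timeout in seconds.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=1):
    """Send one ICMP echo with the system ping tool; True if it exits with 0."""
    if platform.system() == "Windows":
        # -n = number of echoes, -w = timeout in milliseconds
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Assumption: Linux ping, where -c = count and -W = timeout in seconds
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def watch_boot(host, attempts, interval_s=1.0, ping=ping_once, sleep=time.sleep):
    """Ping `host` repeatedly and return a list of (timestamp, replied) tuples."""
    results = []
    for _ in range(attempts):
        results.append((datetime.now(), ping(host)))
        sleep(interval_s)
    return results


def first_reply(results):
    """Return the timestamp of the first successful ping, or None if none replied."""
    for stamp, replied in results:
        if replied:
            return stamp
    return None


# Example use (hypothetical host name), started just before rebooting the test machine:
# results = watch_boot("test-pc", attempts=120)
# print("first reply at", first_reply(results))
```

Compare the time of the first reply with the time the Ctrl-Alt-Del screen appears. An exit code of 0 is only a rough indicator of a reply, so treat the output as a guide and confirm with the packet capture.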

Note: It’s recommended to test GPOs with reboots. That way any solution will address the delays involved with services starting up.


Troubleshooting Basic Connectivity

  1. It is recommended to delete the existing OAC log before every test. The log is located in the directory below.
    C:\Documents and Settings\All Users\Application Data\Juniper Networks\Logging
  2. After deleting the logs, reboot the test machine and start the packet capture as the machine begins to boot.
  3. Once the “Ctrl-Alt-Del” screen appears, log in without waiting.
  4. When the desktop appears, bring up the OAC Client Manager and select Tools\Logs.
  5. In the log window select “save all”, then stop the packet capture.

    Don’t be concerned if the OAC log window is blank. When “save all” is selected OAC will create an OAC_Logs.zip file that contains your log.
  6. Extract the debuglog.log file from OAC_Logs.zip. If the log was deleted before the system was rebooted, the steps below will give a good picture of what occurred between system startup and user login.
  7. Search the log for one or more of the following events and note the time of each.
    • “WlEventStartup()”
    • \\HKLM\Software\Funk Software, Inc.\Odyssey\client\configuration\machine
    • New identity added (MachineAccount). MachineAccount in this case is the name of the profile configured in OAC under Machine Authentication.
    • Port is up, BSSID = 0180C2000003, SSID =
    • Processing EAP-Request/Identity: code
    • Processing EAP-Success: code = 3
    • Authorize() Renew outer IP address
    • S T A R T I N G W I N D O W S L O G O N

    You should have a list of events that looks something like this.

    1. 00120,09 2008/10/01 11:29:41.093 WlEventStartup()
    2. 00217,09 2008/10/01 11:29:42.234 Setting database path to \\HKLM\Software\Funk Software, Inc.\Odyssey\client\configuration\machine
    3. 00201,09 2008/10/01 11:29:42.296 New identity added (MachineAccount)
    4. 00170,09 2008/10/01 11:29:43.140 Port is up, BSSID = 0014F27DC790, SSID =
    5. 00188,09 2008/10/01 11:29:43.140 Processing EAP-Request/Identity: code = 1, id = 1, length = 38
    6. 00179,09 2008/10/01 11:29:44.359 EAP-Success: code = 3
    7. 00197,09 2008/10/01 11:29:45.359 CNetShimOdysseyKernelIO2NetInterfaceInstance::Authorize() Renew outer IP address
    8. 00197,09 2008/10/01 11:29:52.012 S T A R T I N G W I N D O W S L O G O N

Note: One way to distinguish between wired and wireless connections is to look at the value of SSID =. If there is a value after the equal sign, it is a wireless connection; if not, it is wired. This is useful in environments where users configure both wired and wireless connections and the two can be difficult to tell apart.


Understanding the events:

The events listed above tell us the following.
  1. 11:29:41.093 A Windows Login Startup Event occurred
  2. 11:29:42.234 OAC loaded the machine configuration
  3. 11:29:42.296 OAC loaded the machine profile
  4. 11:29:43.140 OAC is physically connected to a switch or AP
  5. 11:29:43.140 OAC has received an ID request from the switch or AP which starts the 802.1x negotiation
  6. 11:29:44.359 OAC received an Access Accept or successful authentication response.
  7. 11:29:45.359 OAC notifies Windows that the network interface is connected and that Windows should request an IP address.
  8. 11:29:52.012 The user has entered their credentials at the Ctrl-Alt-Del screen.

Taken together, this tells us that about 3 seconds elapsed between startup and a successful authentication, and that the machine authentication succeeded about 8 seconds before the user logged in. If Windows requests an IP address after OAC sends the “Renew outer IP address” notification and the DHCP server responds quickly, we would expect machine GPOs to run properly. This is where the packet capture comes in handy. Below is an example of what a typical 802.1x authentication looks like in a packet capture. After every test, ensure that the DHCP Agent sends a request shortly after the EAP Success appears in the trace. An example of GPO processing traffic is also included.
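
The two intervals discussed above can be reproduced from the log timestamps. The following Python sketch uses the timestamps from the sample log in this article; the helper name seconds_between is introduced here for illustration only.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S.%f"


def seconds_between(start, end):
    """Elapsed seconds between two OAC log timestamps given as strings."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Timestamps copied from the sample log in this article.
startup = "2008/10/01 11:29:41.093"      # WlEventStartup()
eap_success = "2008/10/01 11:29:44.359"  # EAP-Success: code = 3
logon = "2008/10/01 11:29:52.012"        # S T A R T I N G W I N D O W S L O G O N

auth_time = seconds_between(startup, eap_success)
head_start = seconds_between(eap_success, logon)
print(f"startup -> EAP-Success: {auth_time:.3f} s")
print(f"EAP-Success -> logon:   {head_start:.3f} s")
```

The larger the second interval, the more time Windows has to obtain an address and reach the DC before the user logs in.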


The initial “Negotiate Protocol Request” originates from either the client or a DC\file server.

In a typical DHCP exchange you would see a Discover from the client, an Offer from the DHCP server, a Request from the client, and finally an ACK from the server. The example above shows what happens when the machine is placed on a Guest VLAN prior to the machine authentication. When the machine successfully authenticates, it is placed in another VLAN. After the Machine Authentication completes, OAC instructs the DHCP Agent to request an address. By default, the DHCP Agent attempts to renew the address from the previous VLAN. As you can see, the server NAK’ed the request because it came from a different VLAN. Eventually the DHCP Agent sends a Discover to request a new address instead of trying to renew the old one, and is offered an address on the new VLAN. This whole exchange can take upwards of 40 seconds and may cause Machine GPOs to fail.
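
When reading the DHCP portion of a capture, it can help to reduce the exchange to a NAK count and the time taken to reach the first ACK. The Python sketch below assumes the message types and capture-relative times have already been read off the trace by hand. The sample values are invented for illustration and are not taken from a real capture.

```python
from dataclasses import dataclass


@dataclass
class DhcpMessage:
    seconds: float  # capture-relative time of the packet
    kind: str       # "DISCOVER", "OFFER", "REQUEST", "ACK" or "NAK"


def summarize(messages):
    """Return the NAK count and the seconds from the first message to the first ACK."""
    if not messages:
        raise ValueError("no DHCP messages supplied")
    start = messages[0].seconds
    naks = sum(1 for m in messages if m.kind == "NAK")
    ack_time = next((m.seconds for m in messages if m.kind == "ACK"), None)
    return {
        "naks": naks,
        "seconds_to_ack": None if ack_time is None else ack_time - start,
    }


# Invented example: a renewal attempt from the Guest VLAN is NAK'ed,
# and the client later falls back to a Discover on the new VLAN.
vlan_change = [
    DhcpMessage(0.0, "REQUEST"),
    DhcpMessage(0.5, "NAK"),
    DhcpMessage(38.0, "DISCOVER"),
    DhcpMessage(38.25, "OFFER"),
    DhcpMessage(38.5, "REQUEST"),
    DhcpMessage(38.75, "ACK"),
]
print(summarize(vlan_change))
```

A NAK count above zero combined with a long time to the first ACK matches the VLAN-change pattern described above.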


Known Problems

Machine Authentication is not starting quickly.
If the machine authentication does not start in a timely manner on Windows XP or Windows 2000, make sure that the Windows Wireless Zero Configuration (WZC) service is running. If the service is stopped, you may experience delays in adapter initialization.

Machine Authentication starts quickly but takes too long to complete.
There are a few things that can cause this.

Check the RADIUS server logs. Ensure that user lookups are completing quickly. If you think the lookup is the problem, you can attempt to confirm it by creating an account in the RADIUS server’s local user database and then configuring OAC to use those credentials. If the average connection is much faster, then the problem may be caused by latency on your Domain Controller.

If you are performing Host Checks at Machine Authentication, please see KB12348 for details on how to resolve the latency issue. KB12348 applies only to versions of OAC prior to 2.2R3, and it does not address Host Checks pertaining to antivirus, software firewalls, or anti-malware. Checks of this type can introduce significant delays because the services to be checked may not have started yet. One solution to this problem is to make the Juniper OAC service depend on the service you want to check, so that OAC starts only after that service has started. See this Microsoft tech-note for details on how to do that (http://support.microsoft.com/kb/193888 ).

Network latency or packet loss can cause an authentication to time out and restart quickly, giving the impression of one long connection. Search the OAC log starting from “Request/Identity: code” and ending with “EAP-Success: code = 3”. Follow the timestamps and look for gaps of 20 seconds or more. Typically these delays are caused upstream, so use your packet capture and RADIUS logs to track down the delay. Between the OAC log and the IC snapshot you should be able to track the whole authentication and find which packet is getting lost. If it is a case of true network latency or packet loss, the expectation is that different packets will be lost. If it is always the same packet, the problem could be more severe. Use packet captures and AP\Switch logs to find where the missing packet is getting lost or delayed.
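
The gap search described above can be automated. The following Python sketch scans the log between the two markers named in the text and reports consecutive timestamps that are 20 seconds or more apart. The timestamp layout is assumed from the sample log lines in this article, and the function name find_gaps is hypothetical.

```python
import re
from datetime import datetime, timedelta

STAMP_RE = re.compile(r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}")
START_MARK = "Request/Identity: code"
END_MARK = "EAP-Success: code = 3"


def find_gaps(lines, threshold=timedelta(seconds=20)):
    """Return (line_before, line_after, gap) for each gap >= threshold
    found between START_MARK and the next END_MARK."""
    gaps = []
    in_auth = False
    previous = None  # (timestamp, line) of the last stamped line in this authentication
    for line in lines:
        match = STAMP_RE.search(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(0), "%Y/%m/%d %H:%M:%S.%f")
        if not in_auth:
            if START_MARK in line:
                in_auth = True
                previous = (stamp, line)
            continue
        gap = stamp - previous[0]
        if gap >= threshold:
            gaps.append((previous[1], line, gap))
        previous = (stamp, line)
        if END_MARK in line:
            in_auth = False
    return gaps


# Example use on the extracted log (the file encoding is an assumption):
# with open("debuglog.log", errors="replace") as log:
#     for before, after, gap in find_gaps(log):
#         print(gap, "between:", before.strip(), "|", after.strip())
```

A restarted authentication stays inside the same search window until an EAP-Success is seen, so a timeout followed by a restart shows up as a gap. The lines on either side of each gap indicate which packet to look for in the packet capture.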

Once Machine Authentication and DHCP are working, retest your GPOs. If the GPOs still fail, try reading through the Microsoft tech-notes below. Failing that, contact Juniper and open a support call.


Troubleshooting Group Policy Problems
http://technet.microsoft.com/en-us/library/cc787386.aspx

Group Policy application fails on a computer that is running Windows 2000, Windows XP Service Pack 1, or Windows XP Service Pack 2
http://support.microsoft.com/default.aspx?scid=kb;en-us;840669

The Logon Script Does Not Run During the Initial Logon Process
http://support.microsoft.com/kb/302104/en-us

How Core Group Policy Works
http://technet.microsoft.com/en-us/library/cc784268.aspx

Fast Logon Optimization
http://technet.microsoft.com/en-us/library/cc780527.aspx
(see note at bottom)

Purpose:
Troubleshooting