Friday, 14 January 2011

Forefront TMG Enterprise Standalone Array does not start after server reboot

For the last couple of days my colleague and I were troubleshooting a brand new installation of a Forefront TMG Enterprise Standalone Array consisting of two nodes. After a server restart, the Forefront TMG Control service would not start. It would hang in a starting state for about 10 minutes and then eventually start, but the TMG Firewall service and all other TMG services that depend on the TMG Control service did not start because of this timeout. After that we could start the services manually and the TMG array worked without problems. The problem appeared only after a server reboot.

The environment:

  • Two Windows Server 2008 R2 Standard virtual machines on VMware ESX 3.5 Update 5
  • Forefront TMG 2010 Enterprise SP1 Software Update 1
  • Forefront Standalone Array in Workgroup mode with one node designated as Array Manager and the other as Array Member
  • Each node had a server certificate installed in the local computer store with Extended Key Usages for Server Authentication and Client Authentication

Here are some of the errors we were seeing in the event log:

The Microsoft Forefront TMG Control service hung on starting. 

The Microsoft Forefront TMG Firewall service depends on the Microsoft Forefront TMG Control service which failed to start because of the following error:
After starting, the service hung in a start-pending state.

The Microsoft Forefront TMG Managed Control service depends on the Microsoft Forefront TMG Control service which failed to start because of the following error:
After starting, the service hung in a start-pending state.

The Microsoft Forefront TMG Job Scheduler service depends on the Microsoft Forefront TMG Control service which failed to start because of the following error:
After starting, the service hung in a start-pending state.

Log Name:      Application
Source:        Windows Error Reporting
Date:          14.1.2011. 14:57:24
Event ID:      1001
Task Category: None
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      *************
Description:
Fault bucket , type 0
Event Name: ServiceHang
Response: Not available
Cab Id: 0

Problem signature:
P1: isactrl
P2: mspadmin.exe"
P3: 0.0.0.0
P4: 10
P5: 2
P6:
P7:
P8:
P9:
P10:

Attached files:

These files may be available here:

Analysis symbol:
Rechecking for solution: 0
Report Id: 363731e9-1fe6-11e0-846f-00155d274102
Report Status: 0



To better understand the problem, here are some technical details.

TMG members in a Standalone array communicate with the Array Manager, which has AD LDS (Active Directory Lightweight Directory Services) installed to provide configuration storage for the entire array. The Array Manager first saves the configuration to its local AD LDS instance, and the rest of the array members connect to it using Secure LDAP, which requires a server certificate with the "Server Authentication" key usage. Strictly speaking, only the Array Manager requires the certificate. The Array Members only need the root certificate of the Certification Authority that signed it, placed in their Trusted Root Certification Authorities store, so that they trust the Array Manager's certificate. But if the Array Manager fails, you have to manually promote one of the Array Members to Array Manager, and that node then needs a server certificate of its own. For that reason we installed a server certificate on both TMG computers.

Here comes the problem. Since the Array Member had its own server certificate with Extended Key Usage and Intended Purposes set to Server Authentication and Client Authentication, it would present that certificate as a client certificate when authenticating to the remote AD LDS service. This process is known as mutual authentication, or mutual TLS (mTLS). It seems that the TMG Control service does not like this behavior: it hangs for about 10-15 minutes until it times out, after which the dependent TMG services do not start. The problem happened on both TMG computers. Even though one of them was the Array Manager, it still needed to connect to its local AD LDS instance, and it too tried to mutually authenticate to the local AD LDS service.
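To make the difference between server-only TLS and mutual TLS a bit more concrete, here is a minimal sketch using Python's standard ssl module. It is only an illustration of the general idea and not what TMG or AD LDS does internally. The function name build_tls_context and its parameters are made up for this example.

```python
import ssl


def build_tls_context(ca_file=None, client_cert=None, client_key=None):
    """Build a TLS client context (hypothetical helper for illustration).

    Without client_cert the client only verifies the server (server-only TLS).
    With client_cert the client can also answer a certificate request from
    the server, which is what mutual TLS means.
    """
    # Server authentication: verify the server certificate chain and host name.
    # ca_file plays the role of the trusted root CA certificate; when it is
    # None, the system's default trust store is used instead.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)

    if client_cert is not None:
        # Client authentication: make a certificate and private key available
        # so the client can present them during the handshake.
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)

    return context


# Server-only TLS: nothing to present, the client just validates the server.
server_only = build_tls_context()
```

In this picture, an Array Member that only trusts the root CA corresponds to the call without a client certificate, while a node whose certificate is also valid for Client Authentication corresponds to the call with one.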

The solution was in fact quite simple, but very hard and frustrating to find because there were almost no relevant logs to look at. On each TMG computer, the certificate properties should be modified so that the Intended Purposes include only Server Authentication.

Here is how to do it:
  • Open the Certificates MMC snap-in and connect to Local Computer
  • Navigate to Personal > Certificates > your_computer_certificate (the certificate's common name should be the FQDN of your TMG computer)
  • Double-click the certificate, click the Details tab and click Edit Properties
  • Choose the "Enable only the following purposes" radio button and check only Server Authentication
  • Restart your computer and see if the TMG services start normally
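The effect of the steps above can be sketched in a few lines of Python. This is a simplified model and rests on one assumption: that Windows treats the usable purposes of a certificate as the intersection of the Extended Key Usage extension inside the certificate and the purposes enabled in its properties. The two OIDs are the standard ones for Server Authentication and Client Authentication. The function names are made up for this example.

```python
# Standard Extended Key Usage OIDs
SERVER_AUTH = "1.3.6.1.5.5.7.3.1"  # Server Authentication
CLIENT_AUTH = "1.3.6.1.5.5.7.3.2"  # Client Authentication


def effective_purposes(eku_extension, enabled_purposes=None):
    """Return the purposes a certificate can actually be used for.

    eku_extension:    OIDs listed in the certificate's EKU extension.
    enabled_purposes: OIDs ticked under "Enable only the following purposes",
                      or None if the properties were never edited.

    Assumption: the result is the intersection of the two sets. A certificate
    with no EKU extension at all is not modelled here.
    """
    eku = set(eku_extension)
    if enabled_purposes is None:
        return eku
    return eku & set(enabled_purposes)


def may_attempt_client_auth(eku_extension, enabled_purposes=None):
    """True if the certificate is still usable as a client certificate."""
    return CLIENT_AUTH in effective_purposes(eku_extension, enabled_purposes)


# Certificate as issued in our environment: both purposes in the EKU field.
issued_eku = [SERVER_AUTH, CLIENT_AUTH]

before_fix = may_attempt_client_auth(issued_eku)
after_fix = may_attempt_client_auth(issued_eku, enabled_purposes=[SERVER_AUTH])
```

Note that in this model the certificate itself is not changed. Only the set of enabled purposes is narrowed, which is why no new certificate has to be issued.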




Of course, if your certificate has only Server Authentication in the Extended Key Usage field, you will not experience this issue.

Microsoft also had something to say about this setting in the following article, although not in direct relation to this problem.


Client logon is slow and server certificates used for Web publishing are configured with the default purpose settings "Server Authentication" and "Client Authentication"
Issue: When Windows Server 2003 detects the default purpose setting of "Client Authentication", the operating system attempts to perform TLS with mutual authentication to the domain controller. The mutual authentication process requires ISA Server to have access to the private key of the server certificate with the "Client Authentication" setting enabled, and ISA Server does not (and should not) have this access.
Solution: Ensure that all server certificates do not have the default "Client Authentication" purpose enabled. You can disable this setting on the property pages of the relevant server certificate as follows:
Disable Client Authentication purpose on a certificate
1. Open the Certificates Microsoft Management Console (mmc) snap-in. To add the Certificate Manager to the mmc, do the following:
  • Click Start, and then click Run.
  • Type mmc and then press ENTER.
  • Select the File menu, and then select Add/Remove Snap-in.
  • In the Add/Remove Snap-in box, click Add.
  • Double-click the Certificates snap-in, select Computer Account, and then click Finish.
  • Select Local Computer, and then click Finish.
  • Close the dialog boxes.
2. In the Certificates mmc, click to expand the Certificates node, and then expand Personal.
3. Right-click the relevant certificate and then click Properties.
4. On the Details tab, click Edit Properties.
5. Select Enable only the following purposes, and clear the Client Authentication purpose.


Link to the entire article here.

While troubleshooting this issue, out of pure frustration we even replicated the entire environment on Windows Server 2008 SP2, and later on Hyper-V as the virtualization platform, to rule out any compatibility issues. In the end it seems that this little setting did the trick.

We also tried (read this carefully) Rollup 1 and Rollup 2 for Software Update 1 for Service Pack 1 for TMG 2010, just to be sure the entire environment was patched. We also read numerous blogs about TMG Control service dependency issues that can arise after installing TMG updates and rollups, but none of those fixes worked.

I really hope this article will someday save someone a lot of time :)

P.S.
Here is a link to a blog article that describes some other startup issues that you may have with TMG related to service dependency ordering.

4 comments:

  1. Today I wrote a "part 2" of this article which you can find here:
    http://www.itsolutionbraindumps.com/2011/01/how-to-properly-issue-certificate-for.html

    It further describes the hang issue with the TMG Control service.

  2. Thanks for the post. It saves me a lot of headache! Another way to solve this is to configure services as Automatic (Delayed) but your solution is more elegant.

    Cheers

  3. For TMG server certificate installation you can follow the thread, So Simple.

    http://networksupportblog.m4infotech.in/tag/tmg-server-certificate-installation/

    Thanks

  4. Microsoft provides a KB article for this issue:

    http://support.microsoft.com/kb/2659700/en-us
