Ports Required to Join a Windows Domain – Managing Windows Machines in a DMZ with SCCM

For those looking for the ports you need open, these are the rules I use for Windows 7 clients and a Windows Server 2008 R2 DC.

LDAP TCP in – 389
LDAP UDP in – 389
LDAP for Global Catalog TCP in – 3268
NetBIOS Name Resolution UDP in – 137
SAM/LSA TCP in – 445
SAM/LSA UDP in – 445
Secure LDAP TCP in – 636
Secure LDAP for Global Catalog TCP in – 3269
W32Time NTP UDP in – 123
RPC Endpoint Mapper TCP in – 135
RPC Dynamic TCP in – 49152-65535 (the Windows 2008+ default dynamic range)
DNS TCP and UDP in – 53
Kerberos V5 UDP in – 88
NetBIOS Datagram UDP in – 138
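
If you want the list in a scriptable form, or your rules happen to be managed through Windows Firewall, a rough netsh sketch of the inbound rules follows. This is untested, the rule names are illustrative, and a hardware perimeter firewall will have its own syntax:

netsh advfirewall firewall add rule name="Domain join TCP in" dir=in action=allow protocol=TCP localport=53,135,389,445,636,3268,3269
netsh advfirewall firewall add rule name="Domain join UDP in" dir=in action=allow protocol=UDP localport=53,88,123,137,138,389,445
netsh advfirewall firewall add rule name="Domain join RPC dynamic TCP in" dir=in action=allow protocol=TCP localport=49152-65535

The last rule covers the RPC dynamic range; 49152-65535 is the Windows 2008+ default.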

Now for the long story. (Note that the solution below is conceptual and hasn’t been tested in a lab yet; this is just me getting things written down. I’ll post an update after we put it in place.)

When managing machines that sit behind a firewall, you’ll need to open ports on that firewall to get them joined to a domain. I have an interesting situation coming up next week where we need to manage machines in my customer’s DMZ. In this customer’s current environment, the machines in the DMZ are workgroup machines that aren’t joined to a domain.

From an SCCM perspective, we can manage workgroup machines. However, we still need ports opened through the firewall to manage them. In some cases, network administrators don’t want all of their DMZ machines to go through the firewall that separates the DMZ from the internal network. We haven’t determined whether that’s a route we want to take, but I personally feel that the more machines that can route through the firewall, the larger your attack surface.

My proposed solution is to build an SCCM server in the DMZ and make it the only machine that communicates through the firewall; all of the other machines in the DMZ would talk to that server. It doesn’t even HAVE to be a secondary site; we could just build one box with an MP and DP (or whatever other site roles you need). A secondary site would just allow us to throttle bandwidth if needed.

Below is a diagram that we use for Internet-based client management; for managing machines in the DMZ, the process would be the same.

[Diagram: Network Diagram for Internet-Based Servers – Scenario 3 with No SQL Server Replica]

So in our case, since there will be fewer than one hundred machines in the DMZ that need to be managed, we’ll probably put all site roles on one box. We’ll then need to open the ports I referenced above to allow the server to join the domain. (Note that all site system roles MUST be part of a domain. They don’t have to be part of the same domain as the site server, but they do need to be part of a domain.) Once the server is joined to the domain, we’ll need to open either port 80 or port 443 (for HTTPS) outbound to allow the software update point to communicate through the firewall. The diagram says HTTPS, but we can use HTTP since we’ll be in a mixed mode environment; in a native mode environment you’d need to use HTTPS. We’ll also need SMB 445 open outbound so the site server can communicate with the other site roles in the perimeter network. Finally, we’ll need to create an inbound rule for SQL on port 1433 so the management point can communicate with the SQL DB.
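
In the same netsh terms (again just an untested sketch; the remoteip address below is a placeholder for the DMZ management point), those rules would look something like this on the DMZ box:

netsh advfirewall firewall add rule name="SCCM HTTP out" dir=out action=allow protocol=TCP remoteport=80
netsh advfirewall firewall add rule name="SCCM SMB out" dir=out action=allow protocol=TCP remoteport=445

And on the internal side, for the SQL box:

netsh advfirewall firewall add rule name="SQL in from DMZ MP" dir=in action=allow protocol=TCP localport=1433 remoteip=192.168.1.50

Swap remoteport=80 for 443 if you’re in native mode.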

Once we’ve done all that, we’ll need to set up the appropriate rights to allow for site system installation on our server.

Client Configuration (thanks to Chris Stauffer for these tips)

In our scenario, we will be able to map to the MP’s client share and install ccmsetup.exe. We can run ccmsetup.exe /MP:servername /logon SMSSITECODE=XYZ. We’ll likely also need to make some adjustments to the LMHOSTS and HOSTS files.
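
Put together, the manual install might look like the following. This is a sketch: the Z: drive letter and server name are placeholders, and SMS_XYZ\Client is the default client source share on a ConfigMgr 2007 site server:

net use Z: \\servername\SMS_XYZ\Client
Z:\ccmsetup.exe /MP:servername /logon SMSSITECODE=XYZ
net use Z: /delete

ccmsetup.exe copies its files locally before installing, so disconnecting the mapped drive afterward is safe.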

Note that these tips should work as long as you have firewall rules enabled for your clients to communicate through the firewall, which would be port 80 or any custom port you’ve allowed SCCM to use.

LMHOSTS file:

Add the SMS information to an LMHOSTS file, which you can copy to each client. Use the following as a guide (WS03DC01 is the SMS server name):

192.168.1.61 ws03dc01 #PRE
192.168.1.61 "SMS_SLP        \0x1A" #PRE
192.168.1.61 "SMS_MP         \0x1A" #PRE
192.168.1.61 "SMS_NLB        \0x1A" #PRE
# "12345678901234567890"
(Note that there must be exactly 20 characters between the quotation marks on each line: the name padded with spaces to 15 characters, plus the five-character \0x1A sequence. The last line is just a ruler to help with the spacing; it is not needed.)
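
Once the LMHOSTS file is in place, you don’t need a reboot for the #PRE entries to take effect. The built-in nbtstat utility (which is case-sensitive about its switches) can reload and verify them:

nbtstat -R
nbtstat -c

The -R switch purges and reloads the remote name cache from LMHOSTS, and -c lists the cache so you can confirm the entries loaded.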

HOSTS file:

Add the SMS information to a HOSTS file, which you can copy to each client. Use the following as a guide (WS03DC01 is the SMS server name):

192.168.1.61 ws03dc01.domain.lcl ws03dc01

Summary

Build a server in the DMZ
Open the following inbound ports on the firewall to allow the server in the DMZ to join the domain:

LDAP TCP in – 389
LDAP UDP in – 389
LDAP for Global Catalog TCP in – 3268
NetBIOS Name Resolution UDP in – 137
SAM/LSA TCP in – 445
SAM/LSA UDP in – 445
Secure LDAP TCP in – 636
Secure LDAP for Global Catalog TCP in – 3269
W32Time NTP UDP in – 123
RPC Endpoint Mapper TCP in – 135
RPC Dynamic TCP in – 49152-65535
DNS TCP and UDP in – 53
Kerberos V5 UDP in – 88
NetBIOS Datagram UDP in – 138

Open the following outbound ports on the firewall to allow SMB and HTTP traffic

SMB TCP/UDP – 445
HTTP TCP – 80 (or HTTPS TCP – 443 in native mode)

Open the following inbound port on the firewall to allow SQL traffic from the MP

SQL TCP – 1433

Give the site server on the intranet the appropriate rights on the site system server in the DMZ so the site system roles can install on it.

Set up the site roles (DP, MP, SUP)

Set up a test client and make changes to the LMHOSTS and HOSTS files (if needed; I’m not sure it’s necessary yet).

Install the SCCM client with ccmsetup.exe /MP:servername /logon SMSSITECODE=XYZ

Some may say to use an FSP. I don’t know that it’s necessary; these client installs will be done manually, and I’d rather look at the client logs.

We’ll see how well this works this week. I’ll post an update by the end of the week.

SCCM Fix: Error 0x80070643 or 0x8024200b when installing Office 2003 Updates by using WSUS or SCCM

In our SCCM test environment we were installing this month’s Microsoft security updates and noticed on one test machine that all of the Office 2003 updates were failing. I hadn’t seen this before, so I started to do a little digging.

Knowing that SCCM really doesn’t do much beyond calling the Windows Update Agent components, I first checked c:\windows\windowsupdate.log to see if there were any specific error messages explaining why these updates weren’t installing. I noticed the following:

Handler : MSI transaction completed. MSI: 0x80070643, Handler: 0x8024200b, Source: No, Reboot: 0

We basically have two error codes here. The log line tells us that WUA handed the update off to MSI, and MSI returned exit code 0x80070643; the handler (Windows Update, in this case) then reports its own code, 0x8024200b. Both indicate a failure, but the code we actually care about is the first one, because that’s what MSI itself returned. The second is basically just a generic failure code from the handler.

By doing a search, I found that 0x80070643 relates to the Office Source Engine service being disabled. Sure enough, the machine where we saw this issue had the service disabled. Once we set it back to Manual, everything worked as normal. See http://support.microsoft.com/kb/903772 for more information.
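
You can check and fix the service from a command prompt with the built-in sc utility (ose is the service name for the Office Source Engine; note the space required after start=):

sc qc ose
sc config ose start= demand

sc qc shows the current START_TYPE, and start= demand corresponds to Manual in the Services console.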

So then I got to thinking, “How many machines in my environment have this service disabled? Our help desk could get flooded with calls.” But then I realized that if this really were a widespread issue, we’d have seen a lot more failures with Office 2003 updates in previous months. Nonetheless, I was curious, so I made a query in SCCM to look for all machines with the Office Source Engine disabled. The query syntax is as follows:

select distinct SMS_R_System.NetbiosName from SMS_R_System inner join SMS_G_System_SERVICE on SMS_G_System_SERVICE.ResourceID = SMS_R_System.ResourceId where SMS_G_System_SERVICE.Name = "ose" and SMS_G_System_SERVICE.StartMode = "Disabled"

How many machines came back with this service disabled? 4 (out of over 2000).

I think we’re OK :)

SCCM: Administrator Console Locked up, Frozen, Hosed, or Hung solution

Today I received an email from a tech complaining that the Configuration Manager administrator console we have installed on a Windows 2003 terminal server was hosed up. I’ve seen this happen a few times. I’m not entirely sure why the console hangs, but the solution I’ve seen work most often is to delete the MMC-related files in the user’s profile that get created after the console has been used. You want to delete the sms and/or adminconsole files (I think the sms file comes from the old SMS 2003 console).

You can find the files at the following paths:

Windows 2003: C:\Documents and Settings\userid\Application Data\Microsoft\MMC

Windows 2008: C:\Users\userid\AppData\Roaming\Microsoft\MMC

Once the files have been deleted, log off, log back on, and run the console. This should fix the issue.
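
A quick sketch of the cleanup from a command prompt (the wildcards assume the file names above; %APPDATA% resolves to Application Data on Windows 2003 and to AppData\Roaming on Windows 2008, so the same lines work on both):

del /q "%APPDATA%\Microsoft\MMC\sms*"
del /q "%APPDATA%\Microsoft\MMC\adminconsole*"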

SCCM: How to Determine Content Download to Cache Issues

Over the past couple of days I’ve been fighting with an application that a 3rd party vendor packaged for us. The package in question is an MSI that calls numerous other files. In total, the package has over 3000 files and is about 180 MB in size.

Issue

The issue reported to me was that content was not downloading. My DataTransferService.log, CAS.log, and ContentTransferManager.log all looked good, and the client found a local DP to download the content from. However, when I looked at my cache folder, I saw that only a couple of MB of data had been downloaded, and it never increased in size.

So I figured BITS was the culprit. Unfortunately, there’s no good logging for BITS without doing some nasty logman tracing. The good news is that BITSAdmin is a great utility (at least for now; on a Windows 7 box, BITSAdmin itself warns that it is deprecated in favor of the new BITS PowerShell cmdlets, which I didn’t find all that useful).

In order to see what jobs are currently downloading, type in:

bitsadmin /list /allusers

This will give you output similar to the following:

BITSADMIN version 3.0 [ 7.5.7600 ]
BITS administration utility.
(C) Copyright 2000-2006 Microsoft Corp.

BITSAdmin is deprecated and is not guaranteed to be available in future versions of Windows.
Administrative tools for the BITS service are now provided by BITS PowerShell cmdlets.

{19A1D938-E1E9-437F-882E-1BFAABB707CB} 'CCMDTS Job' ERROR 146 / 3805 4752558 / UNKNOWN
Listed 1 job(s).

Notice the following line in the output:

{19A1D938-E1E9-437F-882E-1BFAABB707CB} 'CCMDTS Job' ERROR 146 / 3805 4752558 / UNKNOWN

This is no good. This basically means there’s an error somewhere in the transfer job.

Before we get into the next step of the solution, you must first understand what an SCCM distribution point is: an SCCM DP is simply a glorified web server. If you look at the DataTransferService.log file on any of your SCCM clients, you’ll see a lot of URLs that look like the following:

http://DOMAIN:80/SMS_DP_SMSPKGF$/CEN00119/System32/Redist/MS/System/msvcrt.dll

What your client is basically doing is grabbing the files from these URLs and storing them in your local cache directory underneath the package ID.

So back to BITS. Since we saw an error in the bitsadmin /list /allusers output, we need to find out exactly what that error is. The following command will show just that:

bitsadmin /info {19A1D938-E1E9-437F-882E-1BFAABB707CB} /verbose > c:\bits2.txt

This command pulls up the information about the failed BITS job we saw before; the /verbose switch includes the status of each file in the job, and we redirect the output to a file. Inside that file, we see the following:

BITSADMIN version 3.0 [ 7.5.7600 ]
BITS administration utility.
(C) Copyright 2000-2006 Microsoft Corp.

BITSAdmin is deprecated and is not guaranteed to be available in future versions of Windows.
Administrative tools for the BITS service are now provided by BITS PowerShell cmdlets.

GUID: {19A1D938-E1E9-437F-882E-1BFAABB707CB} DISPLAY: 'CCMDTS Job'
TYPE: DOWNLOAD STATE: ERROR OWNER: NT AUTHORITY\SYSTEM
PRIORITY: LOW FILES: 146 / 3805 BYTES: 4752558 / UNKNOWN
CREATION TIME: 5/12/2010 11:31:04 AM MODIFICATION TIME: 5/12/2010 11:33:18 AM
COMPLETION TIME: UNKNOWN ACL FLAGS:
NOTIFY INTERFACE: REGISTERED NOTIFICATION FLAGS: 11
RETRY DELAY: 60 NO PROGRESS TIMEOUT: 2419200 ERROR COUNT: 147
PROXY USAGE: NO_PROXY PROXY LIST: NULL PROXY BYPASS LIST: NULL
ERROR FILE:    http://DOMAIN:80/SMS_DP_SMSPKGF$/CEN00119/Program Files/Hummingbird/Connectivity/9.00/HostExplorer/SDK/Samples/OHIO/Visual C++ Samples/HEOhioSample/MyTabCtrl.cpp -> C:\Windows\system32\CCM\Cache\CEN00119.1.System\Program Files/Hummingbird/Connectivity/9.00/HostExplorer/SDK/Samples/OHIO/Visual C++ Samples/HEOhioSample/MyTabCtrl.cpp
ERROR CODE:    0x80190194 – HTTP status 404: The requested URL does not exist on the server.

ERROR CONTEXT: 0x00000005 – The error occurred while the remote file was being processed.

DESCRIPTION:
JOB FILES:

The above shows exactly what the issue is. An HTTP status of 404 indicates that the file cannot be found on the server. In my case, the package I am troubleshooting has two + signs in the Visual C++ Samples folder name. If I try to visit that URL directly (which you can do in normal situations), I get a 404, which confirms the error BITS is reporting.

To fix this issue, you need to change the path so that the folder doesn’t contain any special characters. In ASCII, a + sign is equivalent to 0x2B (i.e., %2B in a URL string), but the way SCCM handles this is poor. I’m not sure if this is a bug or just by design. The vendor was responsible for this pathing, but I can see how some apps would ship SDK folders containing C++ example code.
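
The fix itself is just a rename in the package source followed by a distribution point update. A sketch, with a hypothetical source path:

cd /d "D:\PkgSource\Hummingbird\HostExplorer\SDK\Samples\OHIO"
ren "Visual C++ Samples" "Visual CPP Samples"

After the rename, update the distribution points so clients get URLs without the + characters, and test the install in case the MSI references the old folder name.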

In any event, it took two days to figure this one out, but thankfully I did. Luckily, the users don’t need the SDK files :)


SCCM: Program failed (download failed – content mismatch) for advertisement failures

Today I noticed that on a few of our advertisements, every machine in the collection had received the “Program failed (download failed – content mismatch)” failure message. Searching on this error, I came across a blog post by Matthew Boyd that pointed to the following two possible causes:

Binary Differential Replication – If this is enabled in the package configuration, some packages seem to fail. I’m assuming they can’t handle this kind of replication, and several of the files become corrupt, creating a hash mismatch. This can be turned off by opening the package properties, going to the Data Source tab, and unchecking Enable binary differential replication. This wasn’t my problem, because I hadn’t enabled binary differential replication.

Hidden Files – Apparently, if the package source contains hidden files, SCCM may not calculate the correct hash for the package and clients can encounter an error. I found a quick way to check for this using the command line:

  1. Open a command window in the root directory that contains your package source files.
  2. Type dir /S /A:H and press Enter. Depending on the package, you may be presented with several directories containing hidden files.
  3. Removing the hidden attribute on all the files through the GUI would be tedious, so use this command instead: attrib -H /S
  4. Update the distribution points.

In my environment, BDR has never been an issue with content mismatches. So for the specific package I was working on, I looked at my package source, and sure enough, there were some hidden files in it.

My next step was to look at my entire package source folder, and I noticed that a lot of packages had hidden files. I went through their advertisement reports and, sure enough, those packages also had content mismatch failures.
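
Since hidden files were scattered across many package source folders, something like the following handles them all at once. This is a sketch: D:\PkgSource is a hypothetical source root, and each affected package still needs its distribution points updated afterward:

dir "D:\PkgSource" /S /A:H
attrib -H "D:\PkgSource\*" /S /D

The dir line lists every hidden file first so you can sanity-check what will change; the attrib line then strips the hidden attribute recursively, with /D including directories.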

Thanks to Matthew for posting the tip.


Windows 2008 and Event ID 2022

So recently the company I work for decided to rebuild a large SCCM (or ConfigMgr, for those of you in MS land) primary site onto new hardware. To give you a brief glimpse into the site configuration: this site has roughly 20,000 clients assigned to it, and the server houses the management point, software update point, reporting point, and distribution point. There are also quite a few secondary sites below it.

We did the recovery on a Friday and early Saturday morning we started to see the following in our event log:

The server was unable to find a free connection 148 times in the last 60 seconds.  This indicates a spike in network traffic.  If this is happening frequently, you should consider increasing the minimum number of free connections to add headroom.  To do that, modify the MinFreeConnections and MaxFreeConnections for the LanmanServer in the registry.

This basically killed all file shares on the box (the Server service had died), which made the despoolr.box\receive folder unreachable, so our central site couldn’t send anything down to this newly rebuilt child primary. Restarting the server resolved the issue, but only for about 10 hours, and then it started happening again.

We spent most of Monday on the phone with Microsoft diagnosing the issue, but with no real fix. We made some modifications under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters, adjusting/adding the MaxFreeConnections, MinFreeConnections, MaxWorkItems, MaxRawWorkItems, InitWorkItems, and MaxMpxCt values. I won’t list what we changed these to, since they didn’t solve the issue, but suffice it to say we were able to run for about 14 hours before the shares died again.
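
For anyone who wants to poke at the same values, reg query dumps the current LanmanServer parameters before you touch anything; the reg add line below just shows the change syntax, with a placeholder value name and data (deliberately, since our numbers didn’t fix anything):

reg query "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters"
reg add "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v MaxFreeConnections /t REG_DWORD /d 100 /f

Changes to these parameters generally need a Server service restart (or a reboot) to take effect.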

In doing my research, I found that Symantec could be the source of the problem, as referenced here and here. However, those articles refer to Symantec Endpoint Protection, which doesn’t exist in our environment. We do use Symantec, just not the Endpoint client.

In another round of research, I found that some other people were disabling Symantec. While we haven’t decided to take that step just yet, we’ve made some modifications to Symantec’s Auto-Protect scanning settings. Once testing is complete, I’ll post exactly what we did in hopes of helping someone else with the same problem. Stay tuned.

Update: It appears we may have found the solution to our problem. It’s a two-fold issue. The first part was the shares being killed by Symantec. Yes, Symantec was killing our shares. If you’re using a version of Symantec earlier than 10.2.3.3000, you’ll want to update to maintenance release 3 (MR3). That update should fix the issues with Auto-Protect making network shares unresponsive.

Our other issue wasn’t as easy to figure out, at least initially. We were experiencing 100% CPU utilization (across 16 cores, mind you). Multiple w3wp.exe worker processes were together eating all of our CPU just after a reboot. I found this odd, especially in the evening, because most of our users were out of the office and most of their machines should have been asleep or off. So initially I assumed this was not a load issue but a configuration issue (i.e., maybe IIS 7 needed some additional configuration).

Long story short, after getting MS on the phone and talking with their performance engineers, we noticed in PolicyAgent.log that there was an errant policy body on many of the machines assigned to this newly rebuilt site. Because processing of this policy kept failing, every client (~20,000) was trying to re-download it every 15 minutes! Apparently four quad-core processors (AMD Opterons at 2.3 GHz, I believe; sorry, I don’t have the exact models handy) will melt when IIS is processing MP policy-download requests from every client every 15 minutes.

Looking closer at the policy body, we determined via a lookup in the DB that the offending policy referred to our network access account. By changing the password on that account, the machines were able to download the newly updated policy, and all was well. At least I hope…

More to come…hopefully not :)
