Colin’s IT, Security and Working Life blog

January 28, 2010

Design for an Exchange 2010 Backup

Filed under: Documentation — chaplic @ 6:41 pm

Like most, I’ve been coming to terms with the storage performance requirements (or, lack thereof) in Exchange 2010.

For any previous Exchange deployment (certainly 2003), you’d start with a SAN and use features like snapshotting to ensure you could back up without affecting performance.

To my mind SANs remain stubbornly expensive for what’s actually delivered (I was just quoted over £3500 for a single 15K 600GB SAS disk which is certified to run in a SAN!).

So the fact I really don’t need one for Exchange 2010 is perfect.

But how do I back up?

Microsoft will tell you not to back up at all, and instead rely on the native protection and deleted-items retention.

I’m a little – just a little – less gung-ho than that, and I suspect many of my customers are, too.

There’s very little product choice on the market, or indeed much Microsoft collateral on how to back up Exchange 2010, so I thought I’d take a stab at a possible solution myself!

My objectives:

  • Don’t “waste” the genuine data-protection tools native to Exchange 2010
  • Prepare for the day when an Exchange corruption is replicated to ALL databases (however unlikely that might be)
  • Provide a longer-term archive.

 

Consider the following environment:

[Diagram: a single DAG with four copies of every database]

 

We’ve got a single DAG with four copies of every database. Let’s say, for argument’s sake, we’ve got 5,000 users with a 1GB mailbox each. Of course, our disks are directly attached and we’re using nice-and-cheap SATA storage on JBOD. Let’s use 1TB disks, because smaller ones are only beer money cheaper.

So far, so good, so like every other Exchange 2010 design piece. We’re leveraging the native protection and we’ve got four copies of the data.
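As a sanity check on the raw numbers, here’s a back-of-the-envelope sizing sketch. The 20% overhead (content index, logs, free space) and the ~900GB usable per 1TB disk are my assumptions for illustration, not figures from any sizing guide:

```python
import math

def disks_needed(users, mailbox_gb, copies, overhead=1.2, usable_per_disk_gb=900):
    """Rough count of 1 TB JBOD disks needed across all database copies.

    overhead: assumed multiplier for content index, logs and free space.
    usable_per_disk_gb: assumed usable capacity of a nominal 1 TB disk.
    """
    per_copy_gb = users * mailbox_gb * overhead
    total_gb = per_copy_gb * copies
    return math.ceil(total_gb / usable_per_disk_gb)

# 5,000 users, 1 GB each, 4 copies in the DAG
print(disks_needed(5000, 1, 4))  # → 27
```

So roughly 27 disks across the whole DAG, or about seven per server — comfortably cheap territory compared to the SAN quote above.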

But how to protect against the replicated corruption scenario?

 

 

[Diagram: the DAG with a lagged mailbox server and extra disk arrays attached]

 

I’m using another new feature of Exchange 2010: lagged replication. The server in question is always behind the others; in theory, should the “replicated corruption” scenario occur, we can take action before the corruption plays into our time-delayed mailbox server.

But how long a lag? Too short a delay and the corruption might be missed and played into the lagged database anyway; too long and invoking the lagged server risks losing mail.

My best-guess figure was about 24 hours; this is comparable to a normal restore if we don’t have log files.
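In Exchange 2010 the lag is configured per database copy with the Set-MailboxDatabaseCopy cmdlet; a 24-hour lag would look something like this (the database and server names are placeholders for your own):

```shell
REM Exchange Management Shell: set a 24-hour replay lag on the copy
REM held by the lagged server. "DB01" and "MBX4" are placeholder names.
REM ReplayLagTime format is days.hours:minutes:seconds.
Set-MailboxDatabaseCopy -Identity "DB01\MBX4" -ReplayLagTime 1.0:0:0
```

This is a config fragment rather than a runnable script; activating the lagged copy later involves replaying (or discarding) the queued logs, which is a manual, documented procedure.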

Now, observant types will have noticed there are extra disk arrays attached to the lagged mailbox server. To break with custom, these will be RAID5, and their purpose is to act as a file-share area for backup-to-disk operations. I’m doing disk-to-disk backups because:

  • I can, at very little infrastructure cost
  • Having recent backups online is always useful.

At the time of writing, the choice of backup products is underwhelming, so I’m going to use the built-in tool. The real downside is that I can only back up from the active node, so I need to be really careful about what I’m backing up, and when. Pumping the data across the network in good time might be tricky without the right network setup.

Most likely, one or two databases will get a full backup every night, with every database getting at least an incremental backup.
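That rotation is easy to sketch: every database gets an incremental nightly, and the full backups round-robin so only a couple of databases take the full-backup hit on any given night. The database names below are placeholders:

```python
def nightly_plan(databases, fulls_per_night=2):
    """Return (night, full_backup_dbs) pairs so every database
    gets exactly one full backup per cycle; all others that night
    get an incremental."""
    plan = []
    for night, start in enumerate(range(0, len(databases), fulls_per_night)):
        plan.append((night, databases[start:start + fulls_per_night]))
    return plan

dbs = ["DB01", "DB02", "DB03", "DB04", "DB05", "DB06"]
for night, fulls in nightly_plan(dbs):
    print(f"Night {night}: full={fulls}, incremental=all others")
```

With six databases and two fulls a night, the cycle repeats every three nights — short enough that the disk-to-disk area never holds a stale full.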

Now to the final part of the plan: the long-term archive. Hopefully never needed, but your operation might need to keep archives of data (this probably isn’t the solution for that; check out the other new Exchange archiving features). More likely, it’s needed when the CEO wants an email he deleted 12 months ago.

[Diagram: tape drive attached for archiving the disk-to-disk backup files]

Backup-to-tape therefore meets my need. I’m only going to send to tape the files produced by the disk-to-disk backup process, and I’m going to choose my timings wisely.

So there we have it: a fairly robust backup architecture? I’m hoping that as time progresses and products fill the void (like DPM 2010), this solution will look archaic, but for now it’s my best shot at what backup could look like.

January 11, 2010

Search entire domains for service accounts

Filed under: Programs and Scripts — chaplic @ 6:29 pm

 

Have you ever been in a scenario where you need to change the password on a service account, but don’t know which services on which servers use the account? You could pick through audit logs, and it still might not tell you if a service hasn’t been restarted recently. Regscan will visit all machines in your domain and give you a list of the machines that use that account.

[Screenshot: regscan output]

Usage

Simply enter

regscan account domain [textfile.txt]

where:

  • account is the account you are searching for. Don’t put the domain name first; regscan will pick out either notation from the service list
  • domain is the NetBIOS domain name to search
  • textfile.txt (optional, but recommended) specifies a list of servers to search, one per line. In large domains, this is more reliable than leaving the program to scan the domain for machines.
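The “either notation” matching is worth spelling out: a service’s “Log On As” value may be recorded as DOMAIN\account, account@domain, or a bare account name. A minimal sketch of that comparison (my illustration of the idea, not regscan’s actual source) might look like this:

```python
def matches_account(logon_as, account, domain):
    """Case-insensitively compare a service's logon value against the
    target account in all three common notations."""
    name = logon_as.strip().lower()
    account, domain = account.lower(), domain.lower()
    candidates = {account, f"{domain}\\{account}", f"{account}@{domain}"}
    return name in candidates

# "CORP" and "svc-sql" are hypothetical example names
print(matches_account(r"CORP\svc-sql", "svc-sql", "CORP"))  # → True
print(matches_account("svc-sql@corp", "svc-sql", "CORP"))   # → True
print(matches_account("LocalSystem", "svc-sql", "CORP"))    # → False
```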

Download

Grab the program here. Let us know how you get on with it.

January 1, 2010

Fixing Windows Update error 80244021

Filed under: Fault Finding — chaplic @ 8:11 pm

 

Spotted on a couple of my machines: Windows Update was not working, failing with the above error:

 

[Screenshot: Windows Update error 80244021]

 

The Microsoft TechNet article is pretty unhelpful, suggesting the Windows Update service is having trouble connecting, possibly due to an on-machine firewall blocking it.

Nothing that should be blocking it springs to mind, so my first concern is malware. A quick scan with Malwarebytes didn’t show anything; sadly, I know that doesn’t guarantee we’re OK. I had a quick look at the hosts file; nothing changed there. The IP addresses associated with the windowsupdate DNS names appeared to be OK. It did seem as if the PC was being blocked from getting updates.

So, what is actually happening when I click “Get updates” ?

I needed something to let me see behind the lovely chromed update UI. The tool I chose was Fiddler. Mainly used by people debugging websites, it also has the useful knack of sniffing all HTTP traffic from the machine. Let’s fire it up and hit the “try again” button:

 

[Screenshot: Fiddler capture of the update traffic]

 

We can see the update process requesting wuident.cab from a server called jelly.dessert.local.

Clearly, that machine doesn’t belong to Windows Update. Fortunately, there’s an explanation that’s less worrying than some uber-weird virus.

A few weeks ago, I needed a couple of hundred GB of disk space for some new VMs in a hurry. Being in a tight spot, I uninstalled WSUS, which conveniently was taking up about that much space; I then, of course, changed Group Policy so that my dozen or so machines talked to Windows Update directly.

It would appear, however, that a couple of machines have Group Policy update issues and never picked up the change from the local WSUS server to the Microsoft update servers.
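For anyone hitting the same thing: the WSUS policy settings live in the registry under the Windows Update policy key, so you can check whether a machine is still pointed at a dead WSUS server and then force a policy refresh. A sketch of the diagnosis and fix (run in an elevated command prompt; this assumes the stale values came from Group Policy, as mine did):

```shell
REM Check whether the machine is still pointed at a (now-dead) WSUS server:
reg query "HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate" /v WUServer

REM Force a Group Policy refresh so the machine picks up the change,
REM then restart the Windows Update service:
gpupdate /force
net stop wuauserv
net start wuauserv
```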

So it was a fairly predictable fix from there on in. But the original fault-finding would be soooo much easier with slightly more informative error messages, Microsoft!

 


Blog at WordPress.com.