Exchange DAG Node Failure – Force Switchover With Queues
Written by Allen White. Posted in Exchange 2010
I had this issue with a client last week. The system was Exchange 2010 with a two-node Database Availability Group (DAG). One of the Exchange nodes had gone offline, and because the failure was catastrophic this was permanent. I expected the second node to have kicked into action, but it had not. The mailbox database was down, and when I checked the replication status of the database copy on the second node, the copy queue length was 9223372036854775766.
Because of this, when I tried to force a failover I was greeted with the following error.
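If you want to check the copy status yourself, a command along these lines should show it. This is a minimal sketch: database1 and dagnode2 are the example names used in this post, and the second line is only there to show how close the reported queue length sits to the largest 64-bit signed integer (the difference is 41). That suggests the figure is a placeholder reported when the active copy cannot be reached rather than a real count of logs, but that is an interpretation, not something the error message states.

```powershell
# Show the state and queue lengths of the surviving copy
# (database1 and dagnode2 are the example names from this post).
Get-MailboxDatabaseCopyStatus -Identity database1\dagnode2 |
    Format-List Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState

# How far is the reported queue length from the largest 64-bit signed integer?
[long]::MaxValue - 9223372036854775766
```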
An Active Manager operation failed. Error The database action failed. Error: An error occurred while trying to validate the specified database copy for possible activation. Error: Database copy ‘Database1’ on server ‘dagnode2.domain.com’ has a copy queue length of 9223372036854725486 logs, which is too high to enable automatic recovery. You can use the Move-ActiveMailboxDatabase cmdlet with the -SkipLagChecks and -MountDialOverride parameters to move the database with loss. If the database isn’t mounted after successfully running Move-ActiveMailboxDatabase, use the Mount-Database cmdlet to mount the database.
I was pretty confident that no mail would be lost, as all my clients are in cached mode, so upon reconnecting to the CAS server they would sync their mail back up to the second mailbox server.
Upon running the command mentioned in the error, I was again greeted with red errors, this time stating that it could not start the Microsoft Exchange Search Indexing service on the failed node… that’s because it does not exist anymore, great.
To get around this we need to add a few extra flags to the command suggested in the error. They are as follows.
- -SkipActiveCopyChecks – The SkipActiveCopyChecks parameter specifies whether to skip checking the current active copy to see if it’s currently a seeding source for any passive databases. Be aware that when using this parameter, you can move a database that’s currently a seeding source, which cancels the seed operation.
- -SkipClientExperienceChecks – The SkipClientExperienceChecks parameter specifies whether to skip the search catalog (content index) state check to see if the search catalog is healthy and up to date. If the search catalog for the database copy you’re activating is in an unhealthy or unusable state and you use this parameter to skip the search catalog health check and activate the database copy, you will need to either re-crawl or reseed the search catalog.
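With this in mind, we run the following command to force the surviving node back into life even though its copy of the mailbox database is not fully synced. It also includes -SkipHealthChecks, which bypasses the health checks on the passive copy being activated.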
Move-ActiveMailboxDatabase database1 -ActivateOnServer dagnode2 -SkipHealthChecks -SkipActiveCopyChecks -SkipClientExperienceChecks -SkipLagChecks -MountDialOverride:BESTEFFORT
Once the command has run, your database will mount and clients will be able to connect. As mentioned, this works well for situations where you have a two-node DAG with one node down and the copy queue length prevents automatic failover.
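To confirm the move actually worked, something like the following should do. It is a sketch using the example database name from this post. The Mount-Database line is only needed if the first command reports the database as not mounted, which is the case the original error message warns about.

```powershell
# Confirm the database is mounted and on which server
# (the -Status switch is needed to populate these fields).
Get-MailboxDatabase -Identity database1 -Status |
    Format-List Name, Mounted, MountedOnServer

# Only if Mounted shows False: mount it manually, as the error text suggests.
Mount-Database -Identity database1
```

Because the content index check was skipped, it is also worth looking at ContentIndexState in the output of Get-MailboxDatabaseCopyStatus afterwards.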
Remove a Copy Of DAG Mailbox Database
Once you are up and running again you will need to tidy up the failed DAG node, first by removing the mailbox database copy from the failed server. Do this with the following command.
Remove-MailboxDatabaseCopy -Identity database1\dagnode1 -Confirm:$False
Remove a Node From DAG
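We should also remove the node from the DAG completely with the following command. The -ConfigurationOnly switch removes the server from the DAG's configuration in Active Directory without trying to contact the failed node.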
Remove-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer DAGNODE1 -ConfigurationOnly
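A quick way to check that the clean-up took effect might look like this. It is a sketch that reuses the example names from this post (database1 and a DAG called DAG), so adjust them to your environment.

```powershell
# Only the surviving copy should now be listed for the database.
Get-MailboxDatabaseCopyStatus -Identity database1 |
    Format-Table Name, Status -AutoSize

# The failed node should no longer appear in the DAG's server list.
Get-DatabaseAvailabilityGroup -Identity DAG -Status |
    Format-List Name, Servers, OperationalServers
```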
This little lot took me a few hours to get fixed, so hopefully this will save someone out there a lot of time.