Monday, July 14, 2014

Automatic failover on two data centers SQL AlwaysOn solution with minimum number of components


Using SQL AwaysOn you can very easily create solution where you have two SQL servers (physical or virtual) located to different data centers and all data is replicated between them.

Because in AlwaysOn shared storage is not needed you also don't need expensive storage replication systems and you can even keep all data on local storage.

Automatic failover challenges

If you want that failover is automatic when active node goes down or lose network connectivity, you need plan and test carefully how traffic between nodes and from nodes to quorum works on all situations.

There is of course multiple ways solve this but I will explain for you how we solved this with our network provider.

We are using node and file share majority on our quorum model because it is only model which can be used on this case. More information why witness share is only choice on this case you can find from here: http://blogs.technet.com/b/askpfeplat/archive/2012/06/27/clustering-what-exactly-is-a-file-share-witness-and-when-should-i-use-one.aspx

Our network provider was also created support request to Microsoft and got recommendation that both cluster nodes should always see witness share even if connection between data centers is lost. That was one important reason behind our solution.
Another one was that if both cluster nodes can see each others but lose connection to witness share it only causes alert to event log (which you should monitoring) and services keeps alive.

Our solution

Our solution to problem was create totally separated route from both data centers to third data center where witness share server is located. These routes are using even on physically layer totally different devices than connections between VLANs or to default gateway.

On our solution traffic between data centers 1 and 2 is using layer 2 so both cluster nodes are in same subnet and only witness share connections are routed.

Using that solution we got route to witness share working even connection between data centers will lost or normal route between VLANs/to internet is lost. Like Microsoft's support recommended.

Logical picture

Following picture explains how SQL network is split on logical layer and how routes are done from/to witness share server.

On this example you should use example following IP settings on SQL cluster nodes and persistent routes which you can see on picture.
  • SQL cluster node 1
    • IP: 10.10.10.100
    • Netmask: 255.255.255.0
    • Gateway: 10.10.10.1
  • SQL cluster node 2
    • IP: 10.10.10.200
    • Netmask: 255.255.255.0
    • Gateway: 10.10.10.1

Limitations/challenges on our solution

I figured out at least these limitations/challenges on our solution which you should remember.
  • You need very carefully check that you are using correct router on witness share connection. If you are misconfigured that you will lose route to witness share when connection between datacenter 1 and 2 is lost.
  • You need very carefully check that you are using IP address from correct piece of address list. If you are misconfigured that you will lose route from witness share server back to your cluster node.

Benefits on this solution

Biggest benefits behind this solution is that you get low cost SQL cluster solution which can handle even whole data center crash automatically. That is very important especially with applications which are store all important data to databases.

No comments:

Post a Comment