Thursday, July 7, 2016

Detecting and recovering failed H2 cluster nodes

Leave a Comment

After going through H2 developer guide I still don't understand how can I find out what cluster node(s) was/were failing and which database needs to be recovered in the event of temporary network failure.

Let's consider the following scenario:

  • H2 cluster started with N active nodes (is actually it true that H2 can support N>2, i.e. more than 2 cluster nodes?)
  • (lots DB updates, reads...)
  • Network connection with one (or several) cluster nodes gets down and node becomes invisible to the rest of the cluster
  • (lots of DB updates, reads...)
  • Network link with previously disconnected node(s) restored
  • It is discovered that cluster node was probably missing (as far as I can see SELECT VALUE FROM INFORMATION_SCHEMA.SETTINGS WHERE NAME='CLUSTER' starts responding with empty string if one node in cluster fails)

After this point it is unclear how to find out what nodes were failing? Obviously, I can do some basic check like comparing DB size, but it is unreliable.

  1. What is the recommended procedure to find out what node was missing in the cluster, esp. if query above responds with empty string?

  2. Another question - why urlTarget doesn't support multiple parameters? How I am supposed to use CreateCluster tool if multiple nodes in the cluster failed and I want to recover more than one?

  3. Also I don't understand how CreateCluster works if I had to stop the cluster and I don't want to actually recover any nodes? What's not clear to me is what I need to pass to CreateCluster tool if I don't actually need to copy database.

1 Answers

Answers 1

That is partially right SELECT VALUE FROM INFORMATION_SCHEMA.SETTINGS WHERE NAME='CLUSTER', will return an empty string when queried in standard mode.

However, you can get the list of servers by using Connection.getClientInfo() as well, but it is a two-step process. Paraphrased from h2database.com:

The list of properties returned by getClientInfo() includes a numServers property that returns the number of servers that are in the connection list. getClientInfo() also has properties server0..serverN, where N is the number of servers - 1. So to get the 2nd server from the list you use getClientInfo('server1').

Note: The serverX property only returns IP addresses and ports and not hostnames.

And before you say simple replication, yes that is default operation, but you can do more advanced things that are outside the scope of your question in clustered H2.

Here's the quote for what you're talking about:

Clustering can only be used in the server mode (the embedded mode does not support clustering). The cluster can be re-created using the CreateCluster tool without stopping the remaining server. Applications that are still connected are automatically disconnected, however when appending ;AUTO_RECONNECT=TRUE, they will recover from that.

So yes if the cluster stops, auto_reconnect is not enabled, and you stick with the basic query, you are stuck and it is difficult to find information. While most people will tell you to look through the API and or manual, they haven't had to look through this one so, my sympathies.

I find it way more useful to track through the error codes, because you get a real good idea of what you can do when you see how the failure is planned for ... here you go.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment