
Configuring a two node serviceguard Cluster (part 1)

Here I will describe how to configure a basic two-node HP Serviceguard cluster on HP-UX, using an Oracle RDBMS instance as the cluster package. I will only show the configuration steps; I am not going to discuss theory, such as why you would use a cluster or what a single point of failure is, since there is plenty of discussion of those topics on the internet. While doing this configuration I felt the lack of a well documented, step-by-step guide, so I will try to write one here.

Hardware Configuration:

  • I will use two HP 9000 series rp3440 servers, each with 2 CPUs and 4 GB of physical memory.
  • Each server has 4 network interface cards; we will use three of them.
  • For shared storage I will use an HP MSA 1000 array.

Software Configuration:

The operating system is HP-UX 11i v2 (11.23) MCOE (Mission Critical Operating Environment).

The HP-UX 11i Mission Critical Operating Environment provides all the capabilities of the base HP-UX 11i and Enterprise Operating Environments, plus certain critical add-on products for additional multi-system availability and performance management. The benefit is that you don’t need to install Serviceguard and the related filesets manually; they are installed during the OS installation.
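
If you want to confirm that the Serviceguard filesets really came with the MCOE installation, a quick check with swlist is enough. This is only a rough sketch; the exact product name string may vary between Serviceguard releases, and you can also run cmversion once the product is installed:

# swlist -l product | grep -i serviceguard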

For the heartbeat I will use a point-to-point Ethernet connection.

Here are the steps:

1. Configure the hardware first. Connect all required network cables: for this example, connect two NICs on each server to the public Ethernet network and connect the heartbeat interfaces with a point-to-point cable. Assign the LUNs from the storage array. The amount of shared storage required depends on the application; for this example, five or six LUNs of 50 GB each will be more than enough.

2. Install the operating system. Ensure that the shared LUNs are visible from both nodes (a quick check is shown after the listing below). The output of the ioscan -fnC disk command will look something like this:

# ioscan -fnC disk
Class     I  H/W Path       Driver     S/W State   H/W Type     Description
============================================================================
disk      0  0/0/2/0.0.0.0  sdisk      CLAIMED     DEVICE       TEAC    DV-28E-N
                           /dev/dsk/c0t0d0   /dev/rdsk/c0t0d0
disk      1  0/1/1/0.0.0    sdisk      CLAIMED     DEVICE       HP 146 GST3146707LC
                           /dev/dsk/c2t0d0   /dev/rdsk/c2t0d0
disk     16  0/1/1/0.1.0    sdisk      CLAIMED     DEVICE       HP 146 GST3146707LC
                           /dev/dsk/c2t1d0   /dev/rdsk/c2t1d0
disk      2  0/2/1/0/4/0.1.0.0.0.0.1    sdisk      CLAIMED     DEVICE  HP MSA VOLUME
                           /dev/dsk/c4t0d1   /dev/rdsk/c4t0d1
disk      4  0/2/1/0/4/0.1.0.0.0.0.2    sdisk      CLAIMED     DEVICE  HP MSA VOLUME
                           /dev/dsk/c4t0d2   /dev/rdsk/c4t0d2
disk      6  0/2/1/0/4/0.1.0.0.0.0.3    sdisk      CLAIMED     DEVICE  HP MSA VOLUME
                           /dev/dsk/c4t0d3   /dev/rdsk/c4t0d3
disk      8  0/2/1/0/4/0.1.0.0.0.0.4    sdisk      CLAIMED     DEVICE  HP MSA VOLUME
                           /dev/dsk/c4t0d4   /dev/rdsk/c4t0d4
disk     10  0/2/1/0/4/0.1.0.0.0.0.5    sdisk      CLAIMED     DEVICE  HP MSA VOLUME
                           /dev/dsk/c4t0d5   /dev/rdsk/c4t0d5
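
A simple way to confirm that both nodes see the same shared LUNs is to run the same ioscan on the second node and compare the entries. The grep pattern below assumes the LUNs appear with the "HP MSA VOLUME" description, as in the output above:

# ioscan -fnC disk | grep -i "MSA VOLUME"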

3. Assign the IP addresses on both nodes and make sure each node can ping the other on both the public interface IP and the heartbeat IP (a quick check is shown after the file listing below). The /etc/rc.config.d/netconf file will look something like this:

#cat /etc/rc.config.d/netconf
HOSTNAME="node01"
OPERATING_SYSTEM=HP-UX
LOOPBACK_ADDRESS=127.0.0.1
INTERFACE_NAME[0]="lan0"
IP_ADDRESS[0]="10.10.96.162"
SUBNET_MASK[0]="255.255.255.0"
BROADCAST_ADDRESS[0]=""
INTERFACE_STATE[0]=""
DHCP_ENABLE[0]=0
INTERFACE_MODULES[0]=""

INTERFACE_NAME[1]="lan1"
IP_ADDRESS[1]="1.1.1.34"
SUBNET_MASK[1]="255.255.255.0"
BROADCAST_ADDRESS[1]=""
INTERFACE_STATE[1]=""
DHCP_ENABLE[1]=0
INTERFACE_MODULES[1]=""

ROUTE_DESTINATION[0]=default
ROUTE_MASK[0]=""
ROUTE_GATEWAY[0]="10.10.96.1"
ROUTE_COUNT[0]=""

GATED=0
GATED_ARGS=""
RDPD=0
RARP=0

DEFAULT_INTERFACE_MODULES=""
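
Once the interfaces are up on both nodes, verify connectivity from node01. The addresses below are the ones used in this example, and -n sets the packet count in HP-UX ping:

# ping 10.10.96.164 -n 3      # public IP of node02
# ping 1.1.1.24 -n 3          # heartbeat IP of node02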

4. Add entries for all IP addresses (public and heartbeat) to the /etc/hosts file. Make sure that the /etc/hosts file is the same on both nodes. The following is a sample file:

# cat /etc/hosts
10.10.96.162    node01
10.10.96.164    node02
10.10.96.163    crm_db          # service IP / package IP
1.1.1.34        node01hb
1.1.1.24        node02hb
127.0.0.1       localhost       loopback

5. Create a VG to be used as the lock VG. It will act as the quorum device. Here is the procedure:

Create the physical volume (PV) first:

# pvcreate -f /dev/rdsk/c4t0d5
Physical volume "/dev/rdsk/c4t0d5" has been successfully created.

Note that you must use the raw device file (/dev/rdsk/...) when creating the PV, and the block device file (/dev/dsk/...) when creating the VG.

Create the group file for the volume group:

# mkdir /dev/vglock
# mknod /dev/vglock/group c 64 0x010000

The minor number (0x010000) of the group file must be unique across all VGs on the node. To check the minor numbers of the existing group files, use ‘ls -l /dev/*/group’.
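
For example, on a node that only has the root volume group vg00 so far, the listing might look like the sketch below (your permissions and timestamps will differ); since vg00 uses minor number 0x000000, the next free value 0x010000 is used for vglock:

# ls -l /dev/*/group
crw-r--r--   1 root   sys   64 0x000000 Jul  1 10:15 /dev/vg00/group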

Now create the volume group named vglock; this VG will be used as the lock VG:

# vgcreate -s 32 vglock /dev/dsk/c4t0d5
Increased the number of physical extents per physical volume to 6399.
Volume group "/dev/vglock" has been successfully created.
Volume Group configuration for /dev/vglock has been saved in /etc/lvmconf/vglock.conf

Deactivate the VG on node01, export the VG information to a map file, and transfer the file to the second node (node02 here) using the following commands:

# vgchange -a n vglock
Volume group "vglock" has been successfully changed.

# vgexport -p -v -s -m /etc/lvmconf/vglock.map vglock
Beginning the export process on Volume Group "vglock".
/dev/dsk/c4t0d5

# rcp /etc/lvmconf/vglock.map node02:/etc/lvmconf/

If you do not deactivate the VG before exporting it, you will get a warning, which you can safely ignore.

Use the following commands on the second node to create the required group file and import the lock VG using the map file transferred from the first node. The minor number of a VG must be the same on all cluster nodes.

# mkdir /dev/vglock

# mknod /dev/vglock/group c 64 0x010000

# vgimport -s -v -m /etc/lvmconf/vglock.map vglock
Beginning the import process on Volume Group "vglock".
Volume group "/dev/vglock" has been successfully created.
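
To be sure the import really worked, you can optionally activate the VG once on node02, inspect it, and deactivate it again (Serviceguard will later manage activation of the lock VG itself). This is just a sketch using the names from this example:

# vgchange -a y vglock
# vgdisplay -v vglock
# vgchange -a n vglock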

6. Now you are ready to create the cluster. Execute the following command to generate an ASCII cluster configuration file:

# cd /etc/cmcluster
# cmquerycl -v -n node01 -n node02 -C crmcluster.ascii
Looking for other clusters ... Done
Gathering storage information
Found 23 devices on node node01
Found 23 devices on node node02
Analysis of 46 devices should take approximately 5 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Found 2 volume groups on node node01
Found 2 volume groups on node node02
Analysis of 4 volume groups should take approximately 1 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Note: Disks were discovered which are not in use by either LVM or VxVM.
      Use pvcreate(1M) to initialize a disk for LVM or,
      use vxdiskadm(1M) to initialize a disk for VxVM.
Volume group /dev/vglock is configured differently on node node01 than on node node02
Volume group /dev/vglock is configured differently on node node02 than on node node01
Gathering network information
Beginning network probing
Completed network probing

Node Names:   node01
              node02

Bridged networks (local node information only - full probing was not performed):

1       lan0           (node01)

2       lan1           (node01)

5       lan0           (node02)

6       lan1           (node02)

IP subnets:

IPv4:

10.10.96.0         lan0      (node01)
                   lan0      (node02)

1.1.1.0             lan1      (node01)
                    lan1      (node02)

IPv6:

Possible Heartbeat IPs:

10.10.96.0                        10.10.96.162        (node01)
                                  10.10.96.164        (node02)

1.1.1.0                            1.1.1.34            (node01)
                                   1.1.1.24            (node02)

Possible Cluster Lock Devices:

/dev/dsk/c4t0d5    /dev/vglock          66 seconds

LVM volume groups:

/dev/vg00               node01

/dev/vglock             node01
                        node02

/dev/vg00               node02

LVM physical volumes:

/dev/vg00
/dev/dsk/c2t1d0    0/1/1/0.1.0                   node01

/dev/vglock
/dev/dsk/c5t0d7    0/2/1/0/4/0.1.0.255.0.0.7     node01

/dev/dsk/c4t0d7    0/2/1/0/4/0.1.0.0.0.0.7       node02
/dev/dsk/c5t0d7    0/2/1/0/4/0.1.0.255.0.0.7     node02

/dev/vg00
/dev/dsk/c2t1d0    0/1/1/0.1.0                   node02

LVM logical volumes:

Volume groups on node01:
/dev/vg00/lvol1                           FS MOUNTED   /stand
/dev/vg00/lvol2
/dev/vg00/lvol3                           FS MOUNTED   /
/dev/vg00/lvol4                           FS MOUNTED   /home
/dev/vg00/lvol5                           FS MOUNTED   /tmp
/dev/vg00/lvol6                           FS MOUNTED   /opt
/dev/vg00/lvol7                           FS MOUNTED   /usr
/dev/vg00/lvol8                           FS MOUNTED   /var

Volume groups on node02:
/dev/vg00/lvol1                           FS MOUNTED   /stand
/dev/vg00/lvol2
/dev/vg00/lvol3                           FS MOUNTED   /
/dev/vg00/lvol4                           FS MOUNTED   /home
/dev/vg00/lvol5                           FS MOUNTED   /tmp
/dev/vg00/lvol6                           FS MOUNTED   /opt
/dev/vg00/lvol7                           FS MOUNTED   /usr
/dev/vg00/lvol8                           FS MOUNTED   /var

Writing cluster data to crmcluster.ascii.

The above command builds the /etc/cmcluster/crmcluster.ascii file. This file defines the nodes, the disks, the LAN cards, and any other resources that will be part of the cluster. You can edit this file to change various cluster parameters such as HEARTBEAT_INTERVAL and NODE_TIMEOUT. Here is the content of the file:

# cat crmcluster.ascii 

CLUSTER_NAME            crmcluster

FIRST_CLUSTER_LOCK_VG           /dev/vglock

NODE_NAME               node01
  NETWORK_INTERFACE     lan0
    HEARTBEAT_IP        10.10.96.162
  NETWORK_INTERFACE     lan1
    HEARTBEAT_IP        1.1.1.34
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c5t0d7

NODE_NAME               node02
  NETWORK_INTERFACE     lan0
    HEARTBEAT_IP        10.10.96.164
  NETWORK_INTERFACE     lan1
    HEARTBEAT_IP        1.1.1.24
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c4t0d7

HEARTBEAT_INTERVAL           1000000
NODE_TIMEOUT                 2000000
AUTO_START_TIMEOUT           600000000
NETWORK_POLLING_INTERVAL     2000000
NETWORK_FAILURE_DETECTION    INOUT
MAX_CONFIGURED_PACKAGES      150
VOLUME_GROUP                 /dev/vglock

Once the cluster configuration file has been edited, use the cmcheckconf command to check it for errors:

# cd /etc/cmcluster
# cmcheckconf -v -C crmcluster.ascii
Checking cluster file: crmcluster.ascii
Note : a NODE_TIMEOUT value of 2000000 was found in line 127. For a
significant portion of installations, a higher setting is more appropriate.
Refer to the comments in the cluster configuration ascii file or Serviceguard
manual for more information on this parameter.
Checking nodes ... Done
Checking existing configuration ... Done
Gathering storage information
Found 2 devices on node node01
Found 3 devices on node node02
Analysis of 5 devices should take approximately 1 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Found 2 volume groups on node node01
Found 2 volume groups on node node02
Analysis of 4 volume groups should take approximately 1 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Volume group /dev/vglock is configured differently on node node01 than on node node02
Volume group /dev/vglock is configured differently on node node02 than on node node01
Gathering network information
Beginning network probing (this may take a while)
Completed network probing
Checking for inconsistencies
Adding node node01 to cluster crmcluster
Adding node node02 to cluster crmcluster
cmcheckconf: Verification completed with no errors found.
Use the cmapplyconf command to apply the configuration.

After the file has been verified to contain no errors, use the cmapplyconf command to create and distribute the cluster binary file:

# cmapplyconf -v -C crmcluster.ascii
Checking cluster file: crmcluster.ascii
Note : a NODE_TIMEOUT value of 2000000 was found in line 127. For a
significant portion of installations, a higher setting is more appropriate.
Refer to the comments in the cluster configuration ascii file or Serviceguard
manual for more information on this parameter.
Checking nodes ... Done
Checking existing configuration ... Done
Node node01 is refusing Serviceguard communication.
Please make sure that the proper security access is configured on node
node01 through either file-based access (pre-A.11.16 version) or role-based
access (version A.11.16 or higher) and/or that the host name lookup
on node node01 resolves the IP address correctly.
cmapplyconf: Failed to gather configuration information

As you can see from the above output, the command failed. Check the output carefully: it points to a problem with security access or host name resolution. I was stuck at this point for a significant amount of time. Then I found that there was no /etc/nsswitch.conf file, although nsswitch.files, nsswitch.dns and nsswitch.nis were present in the /etc directory. So I simply made /etc/nsswitch.files the active /etc/nsswitch.conf. You can use either of the following commands:

# cp /etc/nsswitch.files /etc/nsswitch.conf

or

# mv /etc/nsswitch.files /etc/nsswitch.conf
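
The hosts line in the resulting /etc/nsswitch.conf should resolve names from /etc/hosts first, for example "hosts: files [NOTFOUND=continue] dns". If the nodes still refuse Serviceguard communication after this, also check node authorization: on A.11.16 and later, Serviceguard reads /etc/cmcluster/cmclnodelist during configuration, and a minimal file like the sketch below (using this example's node names) authorizes root on each node:

# cat /etc/cmcluster/cmclnodelist
node01 root
node02 root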

Then run the cmapplyconf command again and press y when it asks for confirmation:

# cd /etc/cmcluster
# cmapplyconf -v -C crmcluster.ascii
Checking cluster file: crmcluster.ascii
Note : a NODE_TIMEOUT value of 2000000 was found in line 127. For a
significant portion of installations, a higher setting is more appropriate.
Refer to the comments in the cluster configuration ascii file or Serviceguard
manual for more information on this parameter.
Checking nodes ... Done
Checking existing configuration ... Done
Volume group /dev/vglock is configured differently on node node01 than on node node02
Volume group /dev/vglock is configured differently on node node02 than on node node01
Checking for inconsistencies
Modifying configuration on node node01
Modifying configuration on node node02
Modifying node node01 in cluster crmcluster
Modifying node node02 in cluster crmcluster

Modify the cluster configuration ([y]/n)? y
Marking/unmarking volume groups for use in the cluster
Completed the cluster creation

Now you can start the cluster using the cmruncl command:

# cmruncl -v
cmruncl: Validating network configuration...
cmruncl: Network validation complete
Waiting for cluster to form ..... done
Cluster successfully formed.
Check the syslog files on all nodes in the cluster to verify that no warnings occurred during startup.
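
A quick way to do that check is to look for the Serviceguard cluster daemon (cmcld) messages in the standard HP-UX syslog on each node:

# grep -i cmcld /var/adm/syslog/syslog.log | tail -20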

To check the cluster status, use the cmviewcl command:

# cmviewcl

CLUSTER        STATUS
crmcluster    up

  NODE           STATUS       STATE
  node01        up           running
  node02        up           running

The cluster is now configured, but no package has been configured yet. I will cover the package configuration in my next post.

Note: Cluster planning, installation, and administration require extensive knowledge of cluster technology. This tutorial is only for those who already know the basic concepts and just want to learn the basic configuration steps. It may also help people who have already worked with other high availability products such as IBM HACMP, Sun Cluster, or Veritas Cluster.


Written by Ahmed Sharif

July 11, 2011 at 12:02 PM
