openais: an alternative to clvm with cman
- Apr 23rd. 2009
- Posted in Code . Debian . HowTo . Networking . Xen
- By jlott
I’ve been battling lately with a lot of problems with cman, part of Red Hat Cluster Suite. Specifically, the fencing tool (fenced) is pretty much junk when you try to start using it with Xen dom0s. After much searching and gnashing of teeth I happened upon this mailing list post. The promise there is that you could take clvm and compile it against openais and get a cluster-aware LVM which doesn’t require the rest of Red Hat Cluster Suite (and its crappy documentation, crappy fencing, and general all-around crappiness). A little more searching turned up this web site from Olivier Le Cam which pretty much did 90% of the work for me.
After some testing I’m happy to say it appears to work smashingly. What follows is a somewhat more complete version of how to achieve the same results on Debian Lenny. Enjoy :)
The first thing that needs to be done is to get the Debian sources for clvm and modify them to use openais. After that we will compile new packages from that source, then set up openais on our cluster nodes.
Install all the dependencies we need to compile clvm
root@host:~# apt-get build-dep clvm
root@host:~# apt-get install libopenais-dev
Now download the source files and cd into our working directory
root@host:~# cd /usr/src/
root@host:/usr/src# apt-get source clvm
root@host:/usr/src# cd lvm2-2.02.39
Now we’ll modify a few files in the source:
- The first is debian/clvm.init. You’ll need to remove any references to cman or cluster.conf. You can download an already edited version here.
- The next is debian/control. Modify the dependencies (lvm2 without the version number, openais in place of cman) and modify the comments accordingly. A pre-edited version is here.
- The last file is debian/rules. Replace cman by openais in the configure options, and add the PATH where to find the openais libs. Again, a pre-made version is here.
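Before editing by hand, it can help to see where cman is still mentioned in those three files. The following is only a minimal sketch: the function name list_cman_refs is made up for this example, and it assumes you pass it the top of the unpacked source tree.

```shell
#!/bin/sh
# list_cman_refs DIR: print "file:line:text" for every mention of cman or
# cluster.conf in the three packaging files discussed above.
# The function name is hypothetical. DIR defaults to the current directory.
list_cman_refs() {
    dir=${1:-.}
    for f in debian/clvm.init debian/control debian/rules; do
        # Skip files that are not present in this source tree.
        [ -f "$dir/$f" ] || continue
        # grep -n prefixes the line number; sed prefixes the file name.
        grep -n -E 'cman|cluster\.conf' "$dir/$f" | sed "s|^|$f:|"
    done
    # "No matches" is not an error for this helper.
    return 0
}
```

Running list_cman_refs /usr/src/lvm2-2.02.39 before and after your edits gives a quick check that nothing was missed.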
For clarity’s sake, here is the actual code block from debian/rules:
$(STAMPS_DIR)/setup-deb: SOURCE_DIR = $(BUILD_DIR)/source
$(STAMPS_DIR)/setup-deb: DIR = $(BUILD_DIR)/build-deb
$(STAMPS_DIR)/setup-deb: $(STAMPS_DIR)/source
	rm -rf $(DIR)
	cp -al $(SOURCE_DIR) $(DIR)
	cd $(DIR); \
	./configure CFLAGS="$(CFLAGS)" \
		LDFLAGS="-L/usr/lib/openais" \
		$(CONFIGURE_FLAGS) \
		--with-optimisation="" \
		--with-clvmd=openais \
		--enable-readline
	touch $@
The last thing to do is update the internal version number of the clvm package and add some comments to the changelog:
root@host:/usr/src/lvm2-2.02.39# dch -i
Now go ahead and compile the package:
root@host:/usr/src/lvm2-2.02.39# dpkg-buildpackage -rfakeroot -uc -b
After the compilation completes you should have some shiny new .deb files in /usr/src. The one we are interested in is clvm_2.02.39-7.1_i386.deb (the actual version of yours may vary depending on what you put in the Debian changelog in the previous step).
Now that we’ve got our custom version of clvm compiled, it’s time to move on to the cluster nodes. On each node in the cluster, do the following…
Install openais and add a user for it:
root@node:~# apt-get install openais
root@node:~# mkdir -p /etc/ais
root@node:~# adduser --no-create-home --disabled-password --disabled-login --gecos openAIS ais
Now create the following config in /etc/ais/openais.conf. This is the most basic config you can have. All you need to do is set 192.168.1.0 to be your actual network address.
totem {
    version: 2
    secauth: off
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
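If you are not sure what the network address for bindnetaddr should be, it is the node's IP address ANDed with its netmask. A small sketch of that arithmetic in plain sh follows; the function name network_address is made up for this example.

```shell
#!/bin/sh
# network_address IP NETMASK: print the network address (IP AND NETMASK),
# i.e. the kind of value the config above expects for bindnetaddr.
# The function name is hypothetical.
network_address() {
    # Split both dotted quads on "." into eight positional parameters:
    # $1..$4 are the address octets, $5..$8 the netmask octets.
    oldifs=$IFS
    IFS=.
    set -- $1 $2
    IFS=$oldifs
    # AND each address octet with the matching netmask octet.
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}
```

For example, network_address 192.168.1.37 255.255.255.0 prints 192.168.1.0, which is the value used in the config above.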
openais does not include a proper Debian init script, so you can download one here and save it as /etc/init.d/openais. After that is done, add it to the proper runlevels by issuing:
root@node:~# update-rc.d openais start 62 S . start 50 0 6 .
Now we can install our custom deb and turn on LVM clustering
root@node:~# dpkg -i clvm_2.02.39-7.1_i386.deb
root@node:~# sed -i 's/^ locking_type = 1$/ locking_type = 3/' /etc/lvm/lvm.conf
That’s it! Reboot your cluster nodes and they should all be cluster aware now :)
Dear author,
I liked very much the excellent solution you published, so I installed it inside my 3-node cluster, hoping it will be soon embodied inside the lenny distro. Just like you, we need Xen and only Xen.
Maybe due to my ignorance, I need to solve some runtime problems yet. I do not think they are related to your solution, but not knowing the reason why they happen and method I can use to have a better idea, I’d like to know your opinion, if it were possible. I know nothing about openais and little about clvm.
It happens to me that commands like lvcreate and lvremove sometimes get blocked. The only workaround seems to be killing the process from another session, killing clvm and aisexec, and then starting openais and clvm again. After such an operation the chance that the command works and produces effects visible from the same or another node is high, though not absolute. It is like a brutal manual fencing.
I thought of a timeout problem for openais, but I have no info about its status and its operation. I also noticed that the installation created a directory /var/log/openais which is empty. There is no useful info in /var/log/syslog .
What do you suggest to do ?
Ciao and thank you for your patch.
Ezio.
ezio,
I have seen some of the same problems actually. I have discovered that it’s possible to recover from it by simply restarting the openais/clvm/lvm stack in the correct order. My best guess at this point is that there is some kind of poor interaction between openais and clvm, but I haven’t been able to test it much.
The problem is very intermittent too. Sometimes it will occur twice in one day. Other times I can go weeks and not have the issue at all.
I have plans for trying to get to the bottom of it but I don’t have any more information than that at this time. If you discover anything, please post it here for others.
Thanks!
- jcl
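The recovery described in the two comments above could be scripted roughly as follows. This is only a sketch under assumptions: the stop/start order is the one from Ezio's description (clvm down first, openais back up first), the init script paths are the ones used in this howto, and the function name and the RUN and SETTLE_SECONDS variables are made up for the example. It is a workaround, not a fix for the underlying hang.

```shell
#!/bin/sh
# restart_cluster_stack: stop clvm, stop openais, then start them again in
# the opposite order. The order is an assumption based on the workaround
# described in the comments above.
# Set RUN=echo to print the commands instead of executing them.
restart_cluster_stack() {
    run=${RUN:-}
    $run /etc/init.d/clvm stop
    $run /etc/init.d/openais stop
    # Give the daemons a moment to exit (hypothetical default of 2 seconds).
    sleep "${SETTLE_SECONDS:-2}"
    $run /etc/init.d/openais start
    $run /etc/init.d/clvm start
}
```

Calling it as RUN=echo restart_cluster_stack first shows what would be executed without touching the running daemons.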
Hi jcl,
My SAN is not in a stable state. I could not yet configure the multipath service against my active-passive controllers, since I am waiting for a multipath-compatible firmware upgrade, so my logs are full of errors because the nodes try to connect through FC connections to redundant devices which are not actually reachable. The drop in performance could be huge. Nevertheless I decided to test your environment, because I felt that your solution was excellent for our aims.
Since the problem described is more frequent in my case, and my nodes are quite full of syslog errors due to the problems above, this could mean that the problem is related to timeout problems, performance of the SAN and ways to configure timeouts in the clvm and openais packages.
Is /var/log/openais empty in your installation too?
I ordered 2 little AoE’s too and am studying your article about it.
Thanks.
ezio
ezio,
/var/log/openais is indeed empty on my installations. I’m not sure whether you have to enable logging somehow in openais.
There might be some way to configure openais/clvm timeout periods; I haven’t looked into that. It’s also possible that clvm assumes certain things about cman that aren’t true of openais, and that is causing the problem. Perhaps there have been improvements in the latest source code for both packages.
Another thing you could try, though it’s dangerous if you modify the cluster LVM metadata from different nodes at the same time, would be to put each openais instance into its own ring. That would trick clvm into thinking it had a quorate cluster, since each node is, in effect, the only node in its cluster.
- jcl
Thanks for the great writeup, however I’m having difficulty building this on lenny. At the dpkg-buildpackage stage I get this error.
/usr/src/lvm2-2.02.39# dpkg-buildpackage -rfakeroot -uc -b
dpkg-buildpackage: warning: using a gain-root-command while being root
dpkg-buildpackage: set CFLAGS to default value: -g -O2
dpkg-buildpackage: set CPPFLAGS to default value:
dpkg-buildpackage: set LDFLAGS to default value:
dpkg-buildpackage: set FFLAGS to default value: -g -O2
dpkg-buildpackage: set CXXFLAGS to default value: -g -O2
dpkg-buildpackage: source package lvm2
dpkg-buildpackage: source version 2.02.39-7.3
dpkg-buildpackage: source changed by root
dpkg-buildpackage: host architecture amd64
fakeroot debian/rules clean
/usr/bin/fakeroot: line 164: debian/rules: Permission denied
dpkg-buildpackage: failure: fakeroot debian/rules clean gave error exit status 126
Any ideas? Do you have a package anywhere for download? If I could get clvm going without all the other nonsense it would be amazing. Thanks!
Xen Fan,
Looks like you’re trying to use fakeroot when you’re already root.
Yeah, I was confused because you did the same thing above:
root@host:/usr/src/lvm2-2.02.39# dpkg-buildpackage -rfakeroot -uc -b
I’m not clear on where I need to become another user. A regular user won’t be able to create the build dir in /usr/src.
Xen Fan,
If you are already root, you don’t need to use -rfakeroot at all.
Regardless of userid or the use of fakeroot, the dpkg-buildpackage gives a rules error.
With fakeroot as root:
fakeroot debian/rules clean
/usr/bin/fakeroot: line 164: debian/rules: Permission denied
dpkg-buildpackage: failure: fakeroot debian/rules clean gave error exit status 126
Without fakeroot as root:
debian/rules clean
Can't exec "debian/rules": Permission denied at /usr/bin/dpkg-buildpackage line 475.
dpkg-buildpackage: failure: debian/rules clean failed with unknown exit code -1
With fakeroot as regular user:
fakeroot debian/rules clean
/usr/bin/fakeroot: line 164: debian/rules: Permission denied
dpkg-buildpackage: failure: fakeroot debian/rules clean gave error exit status 126
I appreciate your help and apologize for the annoyance, but not being a debian package guy I’ve been banging my head on this for a while. I can’t even figure out what it’s trying to do.
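The thread never settles what caused this. One plausible cause, not confirmed by anyone here, is that debian/rules lost its execute bit, for instance after being replaced with a downloaded copy: exit status 126 is what a shell reports when a command is found but cannot be executed. A minimal check, with a made-up function name:

```shell
#!/bin/sh
# check_rules_executable DIR: report whether DIR/debian/rules exists and is
# executable. The function name is hypothetical; DIR defaults to ".".
check_rules_executable() {
    rules="${1:-.}/debian/rules"
    if [ ! -f "$rules" ]; then
        echo "missing: $rules"
        return 2
    elif [ -x "$rules" ]; then
        echo "ok: $rules is executable"
    else
        echo "not executable: $rules (try: chmod +x $rules)"
        return 1
    fi
}
```

If the file turns out to be executable and the error persists, another thing worth checking is whether the filesystem holding /usr/src is mounted with the noexec option.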
I compiled your how-to and the cluster works fine. But there is a problem with clvmd… Look at this log when I launch it in debug mode :
# clvmd -d 1
CLVMD[3e5cd770]: Apr 9 13:56:41 CLVMD started
CLVMD[3e5cd770]: Apr 9 13:56:41 Our local node id is -1062714761
CLVMD[3e5cd770]: Apr 9 13:56:41 Add_internal_client, fd = 7
CLVMD[3e5cd770]: Apr 9 13:56:41 Connected to OpenAIS
CLVMD[3e5cd770]: Apr 9 13:56:41 Cluster ready, doing some more initialisation
CLVMD[3e5cd770]: Apr 9 13:56:41 starting LVM thread
CLVMD[40800950]: Apr 9 13:56:41 LVM thread function started
CLVMD[3e5cd770]: Apr 9 13:56:41 clvmd ready for work
CLVMD[3e5cd770]: Apr 9 13:56:41 Using timeout of 60 seconds
CLVMD[3e5cd770]: Apr 9 13:56:41 confchg callback. 1 joined, 0 left, 2 members
File descriptor 4 left open
File descriptor 5 left open
File descriptor 6 left open
WARNING: Locking disabled. Be careful! This could corrupt your metadata.
CLVMD[40800950]: Apr 9 13:56:41 LVM thread waiting for work
Did you see the WARNING ? The locking does not work, and I have no idea why ! my /etc/lvm/lvm.conf is modified to use internal locking, I can see lock/unlock requests in openais logs, but clvmd just doesn’t care about that !!!
Any help will be appreciated :)
Hi
Thanks for your howto, but I have a problem compiling the clvm package. I got this error message:
make[3]: Leaving directory `/usr/src/lvm2-2.02.39/debian/build/build-deb/daemons/clvmd'
make[3]: Entering directory `/usr/src/lvm2-2.02.39/debian/build/build-deb/daemons/clvmd'
gcc -c -I../../include -I/usr/src/lvm2-2.02.39/debian/build/build-deb//include -DUSE_OPENAIS -D_REENTRANT -DHAVE_CONFIG_H -g -O2 -g -O2 -fPIC -Wall -Wundef -Wshadow -Wcast-align -Wwrite-strings -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Winline -Wmissing-noreturn -Wformat-security -g -O2 -fPIC -Wall -Wundef -Wshadow -Wcast-align -Wwrite-strings -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Winline -Wmissing-noreturn -Wformat-security -fno-strict-aliasing -g -O2 -fPIC -Wall -Wundef -Wshadow -Wcast-align -Wwrite-strings -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Winline -Wmissing-noreturn -Wformat-security clvmd-command.c -o clvmd-command.o
In file included from clvmd-command.c:74:
clvmd-comms.h:79:29: error: openais/saAis.h: No such file or directory
clvmd-comms.h:80:35: error: openais/totem/totem.h: No such file or directory
In file included from clvmd-command.c:76:
clvmd.h:32: error: ‘SA_MAX_NAME_LENGTH’ undeclared here (not in a function)
make[3]: *** [clvmd-command.o] Error 1
make[3]: Leaving directory `/usr/src/lvm2-2.02.39/debian/build/build-deb/daemons/clvmd'
make[2]: *** [clvmd] Error 2
make[2]: Leaving directory `/usr/src/lvm2-2.02.39/debian/build/build-deb/daemons'
make[1]: *** [daemons] Error 2
make[1]: Leaving directory `/usr/src/lvm2-2.02.39/debian/build/build-deb'
make: *** [debian/stamps/build-deb] Error 2
dpkg-buildpackage: failure: debian/rules build gave error exit status 2
Any ideas about that?
regards
olly
Thanks for the great howto.
As of lvm2-2.02.66 (possibly earlier) you must also install libcorosync-dev in order to build your modified clvm.
I also suggest providing patches (using diff -u) rather than full file listings for the modified files, as this simplifies applying your changes to later source versions.
Dave
Users of dependency-managed run control will probably prefer this openais init script:
#!/bin/sh
#
### BEGIN INIT INFO
# Provides:          openais
# Required-Start:    $network $remote_fs $syslog
# Required-Stop:     $network $remote_fs $syslog
# Default-Start:     S
# Default-Stop:      0 6
# Short-Description: start and stop the openais cluster management daemon
### END INIT INFO
#
PATH=/sbin:/usr/sbin:/bin:/usr/bin
#PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DESC="OpenAIS Cluster Management Daemon"
NAME=openais
DAEMON=/usr/sbin/aisexec
SCRIPTNAME=/etc/init.d/openais
FLAGS=

# Do nothing if the daemon is not installed
test -f $DAEMON || exit 0

set -e

# Seconds to wait for the node to join the cluster (0 = wait forever)
JOIN_TIMEOUT=15

# Read configuration variable file if it is present
[ -r /etc/default/$NAME ] && . /etc/default/$NAME

case "$1" in
  start)
    echo -n "Starting $DESC: "
    start-stop-daemon --start --quiet -o --exec $DAEMON -- $FLAGS
    time=0
    # Poll once a second until openais-cfgtool reports status
    while [ "$JOIN_TIMEOUT" -eq 0 ] || [ "$time" -lt "$JOIN_TIMEOUT" ] ; do
        sleep 1
        # ">/dev/null 2>&1" rather than "&>": under /bin/sh (dash),
        # "&>" backgrounds the command and the test always succeeds
        if openais-cfgtool -s >/dev/null 2>&1 ; then
            echo "$NAME."
            exit 0
        else
            echo -n " . "
            time=$(($time + 1))
        fi
    done
    echo "FAILED"
    exit 1
    ;;
  stop)
    echo -n "Stopping $DESC: "
    start-stop-daemon --stop --quiet -o --exec $DAEMON
    echo "$NAME."
    ;;
  reload|force-reload)
    echo "Reloading $DESC configuration files."
    start-stop-daemon --stop --signal 1 --quiet -o --exec $DAEMON
    ;;
  restart)
    echo -n "Restarting $DESC: "
    start-stop-daemon --stop --quiet -o --exec $DAEMON
    sleep 1
    start-stop-daemon --start --quiet -o --exec $DAEMON -- $FLAGS
    echo "$NAME."
    ;;
  *)
    N=/etc/init.d/$NAME
    echo "Usage: $N {start|stop|restart|reload|force-reload}" >&2
    exit 1
    ;;
esac

exit 0
You should then also add openais to Required-Start and Required-Stop in clvm’s init script when updating the clvm deb config. That is, just replace cman on those lines with openais.
And a more robust sed expression for updating /etc/lvm/lvm.conf would be:
sed -i 's/^\([[:space:]]*locking_type = \)1$/\13/' /etc/lvm/lvm.conf
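To see what that expression does before letting it edit the real file, the same substitution can be run as a filter. This is a small sketch; the function name set_cluster_locking is made up for the example.

```shell
#!/bin/sh
# set_cluster_locking: read an lvm.conf on stdin and write it to stdout with
# "locking_type = 1" changed to "locking_type = 3", whatever the indentation.
# Lines where something other than whitespace precedes locking_type
# (a commented-out setting, for example) are left untouched.
# The function name is hypothetical.
set_cluster_locking() {
    # \1 re-inserts the captured indentation and key; the literal 3 follows.
    sed 's/^\([[:space:]]*locking_type = \)1$/\13/'
}
```

For a preview of the change, set_cluster_locking < /etc/lvm/lvm.conf | diff -u /etc/lvm/lvm.conf - shows the lines that would be altered without modifying anything.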
Thought I should update that the openais init script and everything related to it are not required on Squeeze. Use corosync instead.
Thanks for the excellent comments Dave :)
Thanks for the kind words ;)
I hit another issue with using a patched clvm on Debian Squeeze today which I have resolved using the following patch to /etc/init.d/corosync (thanks to Michael Schwartzkopff who has alerted the Debian corosync maintainer). As of corosync-1.2.1-4 this is not yet patched in the Debian package. I have provided the required change as a unified diff here for people who follow the above and hit this issue. In addition to this patch please add
OPENAIS_SERVICES=yes
to your /etc/default/corosync.
--- /etc/init.d/corosync 2011-01-03 00:49:16.000000000 +1000
+++ /etc/init.d/corosync.patched 2011-07-05 16:32:21.000000000 +1000
@@ -31,6 +31,11 @@
 exit 0
 fi
+if [ "$OPENAIS_SERVICES" = "yes" ]; then
+ export COROSYNC_DEFAULT_CONFIG_IFACE
+ : ${COROSYNC_DEFAULT_CONFIG_IFACE="openaisserviceenableexperimental:corosync_parser"}
+fi
+
 # Define LSB log_* functions.
 # Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
 . /lib/lsb/init-functions
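For anyone unfamiliar with the ": ${VAR=value}" line in that patch: it assigns the value only when the variable is not already set, so a setting that is already in the environment wins. A tiny sketch of the idiom on its own follows; the function name default_iface is made up, while the variable name and value are the ones from the patch.

```shell
#!/bin/sh
# default_iface: print the value the patched lines would end up exporting.
# ": ${VAR=value}" assigns value only when VAR is unset; an existing
# setting is left alone. The function name is hypothetical.
default_iface() {
    : ${COROSYNC_DEFAULT_CONFIG_IFACE="openaisserviceenableexperimental:corosync_parser"}
    echo "$COROSYNC_DEFAULT_CONFIG_IFACE"
}
```

With the variable unset, default_iface prints the openais value from the patch; with it set beforehand to something else, that other value is printed unchanged.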
To help Googlers find this page, the output you would receive when starting clvmd -d without the above patch would be similar to this:
root@xen5.tst:/etc/corosync# clvmd -d
CLVMD[7276f7a0]: Jul 5 16:27:12 CLVMD started
CLVMD[7276f7a0]: Jul 5 16:27:12 Cannot initialise OpenAIS lock service: 12
CLVMD[7276f7a0]: Jul 5 16:27:12 Can't initialise cluster interface
Can't initialise cluster interface
I’ve now also had success using clvm directly with corosync on Squeeze, by changing --with-clvmd=cman to --with-clvmd=corosync when compiling clvm.
The above patch to /etc/init.d/corosync is still needed if you plan to layer o2cb (OCFS2) on your cLVM as o2cb depends upon openais services.
Sorry JCL – I really should create my own blog ;)
Dave
So have you found that you don’t require fencing with clvmd then? What happens if the cluster goes split brain and then each tries to create a logical volume?
Of course you need fencing in any situation beyond initial lab testing. Incidentally I’ve had loss of cluster sync (resulting in 12-way split brain) several times in developing a 12 node clustered xen environment on top of this and miraculously suffered no disk corruption. But I also did not have any single-instance services (such as VMs) on top of lvm configured to start automatically if they disappeared from view of the cluster master.