Be careful: Upgrading Debian Jessie to Stretch, with Pacemaker, DRBD and a nested ext4/LVM hosted on VMware products

Detached DRBD (diskless)

A while ago I set up some new Pacemaker cluster nodes with a fresh Debian Stretch installation. I followed our standard installation guide and also created the shared replicated DRBD storage, but whenever I tried to mount the ext4 filesystem, DRBD detached the disks on both nodes with I/O errors. After recreating it, using other storage volumes and testing my ProLiant hardware (which I already suspected had a defect..), it still occurred. But somewhere in the middle of testing, a quicker setup without LVM worked fine, hum..

Much later I found this post (the only one about it at that time) on the DRBD-user mailing list: [0]
This means: if you use the combination VMware product -> Debian Stretch -> local storage -> DRBD -> LVM -> ext4, you are affected by this bug. It happens because VMware always advertises to the guest that it supports the “WRITE SAME” feature, which is wrong. The DRBD version shipped with Stretch supports WRITE SAME as well, so it tries to use the feature, which then fails.
This is, by the way, the same reason why VMware users see this in their dmesg:

WRITE SAME failed. Manually zeroing.

As a workaround I am now using systemd-tmpfiles to disable “WRITE SAME” for all attached block devices in the guest. Simply run the following:

for i in `find /sys/block/*/device/scsi_disk/*/max_write_same_blocks`; do echo "w $i - - - - 0"; done > /etc/tmpfiles.d/write_same.conf
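The tmpfiles.d entry only takes effect on the next boot. The following sketch demonstrates what the loop generates, against a fake sysfs tree under /tmp (the /tmp paths are only for illustration, on a real guest the loop above runs against /sys):

```shell
# Simulate the loop from above against a fake sysfs tree, so the
# generated tmpfiles.d line can be inspected safely.
base=/tmp/fakesys
mkdir -p "$base/block/sdb/device/scsi_disk/2:0:0:0"
echo 1 > "$base/block/sdb/device/scsi_disk/2:0:0:0/max_write_same_blocks"

for i in "$base"/block/*/device/scsi_disk/*/max_write_same_blocks; do
    echo "w $i - - - - 0"
done > /tmp/write_same.conf

cat /tmp/write_same.conf
# w /tmp/fakesys/block/sdb/device/scsi_disk/2:0:0:0/max_write_same_blocks - - - - 0
```

On the real system, `systemd-tmpfiles --create /etc/tmpfiles.d/write_same.conf` applies the file immediately, so you do not have to wait for a reboot.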


Pacemaker failovers with DRBD+LVM do not work

If you use DRBD with a nested LVM, you already had to add the following lines to your /etc/lvm/lvm.conf in past Debian releases (assuming that sdb and sdc are the DRBD backing devices):

filter = [ "r|/dev/sdb.*|", "r|/dev/sdc.*|" ]
write_cache_state = 0

With Debian Stretch this is not enough. Your failovers will end up in a broken state on the second node, because it cannot find your LVs and VGs. I found out that killing lvmetad helps, so I also added a global_filter (which, unlike filter, is honored by all LVM system components, including lvmetad):

global_filter = [ "r|/dev/sdb.*|", "r|/dev/sdc.*|" ]

But this also didn’t help.. My only solution was to disable lvmetad entirely (which I am not using anyway). Adding all of this in combination works for me now, and failovers are as smooth as with Jessie:

filter = [ "r|/dev/sdb.*|", "r|/dev/sdc.*|" ]
global_filter = [ "r|/dev/sdb.*|", "r|/dev/sdc.*|" ]
write_cache_state = 0
use_lvmetad = 0

(Note that filter, global_filter and write_cache_state live in the devices { } section of lvm.conf, while use_lvmetad belongs to the global { } section.)

Do not forget to update your initrd, so that the LVM configuration is updated on booting your server:

update-initramfs -k all -u

Reboot, that’s it :)

Package updates from May

There is some news about my packaging work from May:

  • OTRS
    • I have updated it to version 5.0.10
    • Also I have updated the jessie-backports version from 5.0.8 to 5.0.10
    • I have to test the new issue #825291 (database update with Postgres fails with a UTF-8 Perl error), maybe someone has got an idea?
  • needrestart
    • Thanks to Thomas Liske (the upstream author) for addressing almost all open bugs and wishes from the Debian BTS and GitHub. Version 2.8 fixes 6 Debian bugs
    • Already available in jessie-backports :)
  • geoip-database
    • As usual package updated and uploaded to jessie-backports and wheezy-backports-sloppy
  • geoip
    • Is someone here interested in fixing #811767 with GCC 6? I was not able to fix it
    • .. and if it compiles, the result segfaults :(
  • fglrx-driver
    • I have removed the fglrx-driver from the Debian sid/stretch repository
    • This means that fglrx in Debian is dead
    • You should use the amdgpu driver instead :)
  • icinga2
    • After some more new upstream releases I have updated the jessie-backports version to 2.4.10 and it works like a charm :)

Current jessie-backports packaging work in March


I have uploaded some new, already accepted packages to jessie-backports:

  • geoip-database
    • Regular monthly update to 20160317-1 in wheezy-backports-sloppy and jessie-backports
  • otrs2
    • Updated to current version 5.0.8-1~bpo8+1
  • glusterfs
    • Updated to version 3.7.9-1~bpo8+1
  • icinga2
    • After my last upload to bring icinga2 2.4.1 online in jessie-backports, I have updated the packaging now up to version 2.4.4-1~bpo8+1

I hope you enjoy the current versions in Debian 8. :-)

Debian Jessie 8.3: Short howto for Corosync+Pacemaker Active/Passive Cluster with two nodes and DRBD/LVM


Since I had to change my old “heartbeat v1” setup to a more modern Corosync+Pacemaker setup, because “heartbeat v1” does not support systemd (at first it looks like it is working, but it fails on service starts/stops), I want to share a simple setup:

  • Two nodes (node1-1 and node1-2)
  • Active/Passive setup
  • Shared IP (here:
  • Internal network on eth1 (here:
  • DRBD shared storage
  • LVM on top of DRBD
  • Multiple services, depending also on the DRBD/LVM storage

First you have to activate the jessie-backports repository, because the cluster stack is not available/broken in Debian Jessie. Install the required packages with:

apt-get install -t jessie-backports libqb0 fence-agents pacemaker corosync pacemaker-cli-utils crmsh drbd-utils

After that, configure your DRBD and your LVM (VG+LV) on top of it (there are enough tutorials out there for this).
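For reference, a minimal DRBD resource definition matching the names used further below could look like this (only a sketch: the addresses, port and the backing disk /dev/sdb are examples and have to be adjusted to your setup):

```
# /etc/drbd.d/r0.res -- sketch; addresses, port and backing disk are examples
resource r0 {
        device    /dev/drbd0;
        disk      /dev/sdb;
        meta-disk internal;
        on node1-1 {
                address;
        }
        on node1-2 {
                address;
        }
}
```

After the initial sync, the VG and LV used in the crm configuration below would then be created on top of it, e.g. with `pvcreate /dev/drbd0`, `vgcreate data1 /dev/drbd0` and `lvcreate -L 10G -n lv1 data1` (the size again is just an example).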

Then deploy this configuration to /etc/corosync/corosync.conf:

totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        clear_node_high_bit: yes
        crypto_cipher: none
        crypto_hash: none
        transport: udpu
        interface {
                ringnumber: 0
        }
}

logging {
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        provider: corosync_votequorum
        two_node: 1
        wait_for_all: 1
}

nodelist {
        node {
                ring0_addr: node1-1
        }
        node {
                ring0_addr: node1-2
        }
}

Both nodes require a passwordless SSH keypair, which is copied to the respective other node, so that each node can ssh into the other one.
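Generating such a keypair is a one-liner per node (a sketch; the key path under /tmp is only for demonstration, on the real nodes you would use root's default key and then ssh-copy-id to the respective other node):

```shell
# Create a passwordless keypair; -N "" sets an empty passphrase.
rm -f /tmp/cluster_id_rsa /tmp/cluster_id_rsa.pub
ssh-keygen -t rsa -b 4096 -N "" -q -f /tmp/cluster_id_rsa

# On a real node (example hostname from the setup above):
# ssh-copy-id -i /tmp/cluster_id_rsa.pub root@node1-2
```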

Then you can start with crm configure:

property stonith-enabled=no
property no-quorum-policy=ignore
property default-resource-stickiness=100

primitive DRBD_r0 ocf:linbit:drbd params drbd_resource="r0" op start interval="0" timeout="240" \
op stop interval="0" timeout="100" \
op monitor role=Master interval=59s timeout=30s \
op monitor role=Slave interval=60s timeout=30s
primitive LVM_r0 ocf:heartbeat:LVM params volgrpname="data1" op monitor interval="30s"
primitive SRV_MOUNT_1 ocf:heartbeat:Filesystem params device="/dev/mapper/data1-lv1" directory="/srv/storage" fstype="ext4" options="noatime,nodiratime,nobarrier" op monitor interval="40s"

primitive IP-rsc ocf:heartbeat:IPaddr2 params ip="" nic="eth0" cidr_netmask="24" meta migration-threshold=2 op monitor interval=20 timeout=60 on-fail=restart
primitive IPInt-rsc ocf:heartbeat:IPaddr2 params ip="" nic="eth1" cidr_netmask="24" meta migration-threshold=2 op monitor interval=20 timeout=60 on-fail=restart

primitive MariaDB-rsc lsb:mysql meta migration-threshold=2 op monitor interval=20 timeout=60 on-fail=restart
primitive Redis-rsc lsb:redis-server meta migration-threshold=2 op monitor interval=20 timeout=60 on-fail=restart
primitive Memcached-rsc lsb:memcached meta migration-threshold=2 op monitor interval=20 timeout=60 on-fail=restart
primitive PHPFPM-rsc lsb:php5-fpm meta migration-threshold=2 op monitor interval=20 timeout=60 on-fail=restart
primitive Apache2-rsc lsb:apache2 meta migration-threshold=2 op monitor interval=20 timeout=60 on-fail=restart
primitive Nginx-rsc lsb:nginx meta migration-threshold=2 op monitor interval=20 timeout=60 on-fail=restart

group APCLUSTER LVM_r0 SRV_MOUNT_1 IP-rsc IPInt-rsc MariaDB-rsc Redis-rsc Memcached-rsc PHPFPM-rsc Apache2-rsc Nginx-rsc
ms ms_DRBD_APCLUSTER DRBD_r0 meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"

order APCLUSTER_after_DRBD_r0 inf: ms_DRBD_APCLUSTER:promote APCLUSTER:start


The last two lines (the ms and order statements) gave me some headache. In short, they define that the DRBD device on the active node has to be the primary one, and that the “APCLUSTER” group has to be started on that host, since the LVM, the filesystem and the services need access to its data.
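Note that, depending on your setup and Pacemaker version, you may additionally want an explicit colocation constraint, so that the group is only ever placed on the node where DRBD is master (a sketch using the resource names from above):

```
colocation APCLUSTER_on_DRBD_Master inf: APCLUSTER ms_DRBD_APCLUSTER:Master
```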

Just a short copy & paste howto for a simple use case, without too many deep explanations..

More wheezy-backports work


Now you can install the following package versions from wheezy-backports:

  • apt-dater-host (Source split, 0.9.0-3+wheezy1 => 1.0.0-2~bpo70+1)
  • glusterfs (3.2.7-3+deb7u1 => 3.5.2-1~bpo70+1)
  • geoip-database (20141009-1~bpo70+1 => 20150209-1~bpo70+1)

geoip-database introduces a new package geoip-database-extra, which includes the free GeoIP City and GeoIP ASNum databases.

glusterfs will get an update in a few days to fix CVE-2014-3619.

Today's wheezy-backports work


I have updated geoip in wheezy-backports today from version 1.5.0-3~bpo70+1 to 1.6.2-4~bpo70+1, which includes also the new generators for the City and ASN database. This is also a prerequisite for the upcoming geoip-database updates!

For the otrs users: now you can also install otrs 3.3.9-3~bpo70+1 in Wheezy, instead of the really old version 3.2.11-1~bpo70+1.