Installing OpenShift Container Storage on IBM Cloud Bare Metal RedHat Openshift Kubernetes Service (ROKS)
This feature set, deploying OCS on bare metal, is still a work in progress and has not yet gone GA.[1] It should be coming soon, but the main issues are that there is no documentation specific to this deployment scenario and that the kubelet home directory on IBM ROKS is not the same as on Red Hat OpenShift Container Platform.
Install the OpenShift Container Storage Operator
- Do Section 1.1.1 in the official documentation.
- Skip Section 1.1.2 because this is not a typical install.
- Since these are RHEL-based nodes, do Section 1.1.3. In our case everything was already in place; we just needed to enable cephfs through SELinux:
# setsebool -P container_use_cephfs on
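To confirm the boolean took effect (a quick optional check, not part of the official steps):
# getsebool container_use_cephfs
container_use_cephfs --> on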
Prepare the Raw Storage on Each Node
These particular nodes had an SSD attached that was automatically picked up by multipath and also had a filesystem on it. We need to remove the multipath mapping and clear the disk using wipefs. To do this, we add the wwid of the local disk to the multipath blacklist, per RHEL guidelines for local SCSI storage devices, and then wipe the volume data.
- Identify the nodes. We are working with the first three. ROKS doesn't make it easy to see which ones we need; you may know this from the provisioning order, or you may be able to identify them by describing the nodes (a quick scan is also sketched after the listing below).
$ oc get nodes
NAME             STATUS   ROLES           AGE     VERSION
10.194.150.196   Ready    master,worker   7h44m   v1.16.2
10.194.150.201   Ready    master,worker   7h25m   v1.16.2
10.194.150.204   Ready    master,worker   7h44m   v1.16.2
10.194.150.205   Ready    master,worker   7d3h    v1.16.2
10.194.150.232   Ready    master,worker   7d3h    v1.16.2
10.194.150.244   Ready    master,worker   7d1h    v1.16.2
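If it is not obvious from the provisioning order which nodes carry the extra raw disk, a scan along these lines can help. This is only a sketch; it assumes the disk shows up as /dev/sdc on the storage nodes, as it did here.
for n in $(oc get nodes -o name); do
  echo "== ${n} =="
  # lsblk only prints something when the device exists on that node
  oc debug "${n}" -- chroot /host lsblk /dev/sdc 2>/dev/null
done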
- Drop into a debug pod to get a shell on the node. There are several ways to see the drive device.
$ oc debug node/10.194.150.196
Starting pod/10194150196-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.194.150.196
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.2# lsblk
NAME                                    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                       8:0    0   1.8T  0 disk
|-sda1                                    8:1    0   256M  0 part  /boot
|-sda2                                    8:2    0     1G  0 part
`-sda3                                    8:3    0   1.8T  0 part  /
sdb                                       8:16   0 893.8G  0 disk
`-sdb1                                    8:17   0 893.8G  0 part  /var/data
sdc                                       8:32   0   1.8T  0 disk
`-3600605b00b5982502642ac600341c0bb     253:0    0   1.8T  0 mpath
  `-3600605b00b5982502642ac600341c0bb1  253:1    0   1.8T  0 part
sh-4.2# ls /dev/mapper/
3600605b00b5982502642ac600341c0bb  3600605b00b5982502642ac600341c0bb1  control
sh-4.2# ls -lah /dev/sdc
brw-rw----. 1 root disk 8, 32 May  4 12:24 /dev/sdc
sh-4.2# ls -lah /dev/mapper/3600605b00b5982502642ac600341c0bb1
lrwxrwxrwx. 1 root root 7 May  4 12:24 /dev/mapper/3600605b00b5982502642ac600341c0bb1 -> ../dm-1
sh-4.2# dmsetup ls
3600605b00b5982502642ac600341c0bb1      (253:1)
3600605b00b5982502642ac600341c0bb       (253:0)
sh-4.2# multipath -ll
3600605b00b5982502642ac600341c0bb dm-0 AVAGO   ,MR9361-8i
size=1.7T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 0:2:2:0 sdc 8:32 active ready running
- Check the wwids and multipath.conf files.
sh-4.2# cat /etc/multipath/wwids
# Multipath wwids, Version : 1.0
# NOTE: This file is automatically maintained by multipath and multipathd.
# You should not need to edit this file in normal circumstances.
#
# Valid WWIDs:
/3600605b00b5982502642ac600341c0bb/
sh-4.2# cat /etc/multipath.conf
defaults {
    user_friendly_names no
    max_fds max
    flush_on_last_del yes
    queue_without_daemon no
    dev_loss_tmo infinity
    fast_io_fail_tmo 5
}
# All data under blacklist must be specific to your system.
blacklist {
    wwid "SAdaptec*"
    devnode "^hd[a-z]"
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^cciss.*"
}
devices {
    device {
        vendor "NETAPP"
        product "LUN"
        path_grouping_policy group_by_prio
        features "3 queue_if_no_path pg_init_retries 50"
        prio "alua"
        path_checker tur
        failback immediate
        path_selector "round-robin 0"
        hardware_handler "1 alua"
        rr_weight uniform
        rr_min_io 128
        uid_attribute ID_SERIAL
    }
}
- Note that you cannot use the /dev/sdc device name yet because of device mapper and multipath.
sh-4.2# mkdir /tmp/testa
sh-4.2# mount /dev/sdc /tmp/testa
mount: /dev/sdc is already mounted or /tmp/testa busy
sh-4.2# mount /dev/mapper/3600605b00b5982502642ac600341c0bb1 /tmp/testa
sh-4.2# umount /tmp/testa
- Back up multipath.conf and edit it to add your device's wwid to the blacklist.
sh-4.2# cp /etc/multipath.conf{,.bk}
sh-4.2# vi /etc/multipath.conf
sh-4.2# diff /etc/multipath.conf{,.bk}
11d10
< wwid 3600605b00b5982502642ac600341c0bb
sh-4.2# cat /etc/multipath.conf
defaults {
    user_friendly_names no
    max_fds max
    flush_on_last_del yes
    queue_without_daemon no
    dev_loss_tmo infinity
    fast_io_fail_tmo 5
}
# All data under blacklist must be specific to your system.
blacklist {
    wwid 3600605b00b5982502642ac600341c0bb
    wwid "SAdaptec*"
    devnode "^hd[a-z]"
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^cciss.*"
}
devices {
    device {
        vendor "NETAPP"
        product "LUN"
        path_grouping_policy group_by_prio
        features "3 queue_if_no_path pg_init_retries 50"
        prio "alua"
        path_checker tur
        failback immediate
        path_selector "round-robin 0"
        hardware_handler "1 alua"
        rr_weight uniform
        rr_min_io 128
        uid_attribute ID_SERIAL
    }
}
Nothing happens until you reload the service:
sh-4.2# lsblk
NAME                                    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                       8:0    0   1.8T  0 disk
|-sda1                                    8:1    0   256M  0 part  /boot
|-sda2                                    8:2    0     1G  0 part
`-sda3                                    8:3    0   1.8T  0 part  /
sdb                                       8:16   0 893.8G  0 disk
`-sdb1                                    8:17   0 893.8G  0 part  /var/data
sdc                                       8:32   0   1.8T  0 disk
|-sdc1                                    8:33   0   1.8T  0 part
`-3600605b00b5982502642ac600341c0bb     253:0    0   1.8T  0 mpath
  `-3600605b00b5982502642ac600341c0bb1  253:1    0   1.8T  0 part
sh-4.2# systemctl reload multipathd.service
sh-4.2# lsblk
NAME     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda        8:0    0   1.8T  0 disk
|-sda1     8:1    0   256M  0 part /boot
|-sda2     8:2    0     1G  0 part
`-sda3     8:3    0   1.8T  0 part /
sdb        8:16   0 893.8G  0 disk
`-sdb1     8:17   0 893.8G  0 part /var/data
sdc        8:32   0   1.8T  0 disk
`-sdc1     8:33   0   1.8T  0 part
sh-4.2# mount /dev/sdc1 /tmp/testa
sh-4.2# umount /tmp/testa
- Wipe the volume data (it came with a single XFS partition); a quick sanity check is sketched after this listing.
sh-4.2# wipefs /dev/sdc
offset               type
----------------------------------------------------------------
0x1fe                dos   [partition table]
sh-4.2# wipefs /dev/sdc --all --force
/dev/sdc: 2 bytes were erased at offset 0x000001fe (dos): 55 aa
/dev/sdc: calling ioctl to re-read partition table: Success
sh-4.2# wipefs /dev/sdc
sh-4.2# exit
exit
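Optionally, a quick sanity check that the disk is really clean; blkid should print nothing and lsblk -f should show no FSTYPE (this assumes the device is still /dev/sdc):
sh-4.2# blkid /dev/sdc
sh-4.2# lsblk -f /dev/sdc
NAME FSTYPE LABEL UUID MOUNTPOINT
sdc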
- Scripting it, so it can be repeated on the other nodes:
cp /etc/multipath.conf{,.bk}
sed -i 's/wwid \"SAdaptec\*\"/wwid \"SAdaptec\*\"\nwwid '$(multipath -ll | grep AVAGO | awk '{print $1}')'/g' /etc/multipath.conf
systemctl reload multipathd.service
wipefs /dev/sdc --all --force
Note
|
These RHEL systems are "ephemeral": if they are "reset" using the ibmcloud command, you will lose any alterations to the underlying OS. Further work and testing is needed before the machines and their storage can be reused correctly after a "reset". It is unclear to me whether there is a "Reboot" option that does not "reset" the OS. For the purposes of this POC it does not appear to matter, as long as we do not "reset" the machines.
|
Install the Local Storage Operator
- Do Section 1.2.2 in the official documentation.
- Create the LocalVolume object:
echo 'apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: ocs-backing
  namespace: local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - 10.194.150.196
        - 10.194.150.201
        - 10.194.150.204
  storageClassDevices:
  - devicePaths:
    - /dev/sdc
    storageClassName: local-ocs-backing
    volumeMode: Block' | oc create -f -
Note
|
Best practice suggested by the OCS documentation is to use the /dev/disk/by-id name of the disk rather than /dev/sdc, since short device names are not guaranteed to be stable across reboots. This was not originally done in this POC; a sketch of how to find the stable name follows this note.
|
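If you want to follow that best practice, something along these lines should work; the by-id name shown is only illustrative, so substitute whatever your node actually reports for the device.
# From a debug shell on each node, find the stable identifier behind /dev/sdc:
sh-4.2# ls -l /dev/disk/by-id/ | grep -w sdc
# Then reference that path in the LocalVolume instead of /dev/sdc, e.g.:
#   storageClassDevices:
#   - devicePaths:
#     - /dev/disk/by-id/wwn-0x600605b00b5982502642ac600341c0bb
#     storageClassName: local-ocs-backing
#     volumeMode: Block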
Results:
$ oc get pods
NAME READY STATUS RESTARTS AGE
local-ssd-file-platinum-kafka-local-diskmaker-77dz5 1/1 Running 0 17h
local-ssd-file-platinum-kafka-local-diskmaker-jzm6g 1/1 Running 0 17h
local-ssd-file-platinum-kafka-local-diskmaker-qnmb6 1/1 Running 0 17h
local-ssd-file-platinum-kafka-local-provisioner-66n4b 1/1 Running 0 17h
local-ssd-file-platinum-kafka-local-provisioner-gnr2x 1/1 Running 0 17h
local-ssd-file-platinum-kafka-local-provisioner-kzbfw 1/1 Running 0 17h
local-storage-operator-5df7f84f45-kttf4 1/1 Running 0 14h
ocs-backing-local-diskmaker-fkbqn 1/1 Running 0 4s
ocs-backing-local-diskmaker-l82n6 1/1 Running 0 4s
ocs-backing-local-diskmaker-zhl9d 1/1 Running 0 4s
ocs-backing-local-provisioner-5sqq4 1/1 Running 0 4s
ocs-backing-local-provisioner-n9mxg 1/1 Running 0 4s
ocs-backing-local-provisioner-wz4qk 1/1 Running 0 4s
Note that, as the logs show, the "diskmaker" pods take no action until a PVC is created that matches the PV.
$ oc logs -f ocs-backing-local-diskmaker-fkbqn
I0505 02:28:06.808416 1 diskmaker.go:23] Go Version: go1.12.12
I0505 02:28:06.808796 1 diskmaker.go:24] Go OS/Arch: linux/amd64
^C
The "provisioner" logs show the capture of the raw block device.
$ oc logs -f ocs-backing-local-provisioner-5sqq4
I0505 02:28:06.851623 1 common.go:320] StorageClass "local-ocs-backing" configured with MountDir "/mnt/local-storage/local-ocs-backing", HostDir "/mnt/local-storage/local-ocs-backing", VolumeMode "Block", FsType "", BlockCleanerCommand ["/scripts/quick_reset.sh"]
I0505 02:28:06.851784 1 main.go:63] Loaded configuration: {StorageClassConfig:map[local-ocs-backing:{HostDir:/mnt/local-storage/local-ocs-backing MountDir:/mnt/local-storage/local-ocs-backing BlockCleanerCommand:[/scripts/quick_reset.sh] VolumeMode:Block FsType:}] NodeLabelsForPV:[] UseAlphaAPI:false UseJobForCleaning:false MinResyncPeriod:{Duration:5m0s} UseNodeNameOnly:false LabelsForPV:map[storage.openshift.com/local-volume-owner-name:ocs-backing storage.openshift.com/local-volume-owner-namespace:local-storage]}
I0505 02:28:06.851817 1 main.go:64] Ready to run...
W0505 02:28:06.851829 1 main.go:73] MY_NAMESPACE environment variable not set, will be set to default.
W0505 02:28:06.851839 1 main.go:79] JOB_CONTAINER_IMAGE environment variable not set.
I0505 02:28:06.852418 1 common.go:382] Creating client using in-cluster config
I0505 02:28:06.884228 1 main.go:85] Starting controller
I0505 02:28:06.884272 1 main.go:100] Starting metrics server at :8080
I0505 02:28:06.884403 1 controller.go:45] Initializing volume cache
I0505 02:28:07.087009 1 controller.go:108] Controller started
E0505 02:28:07.087143 1 discovery.go:201] Error reading directory: open /mnt/local-storage/local-ocs-backing: no such file or directory
I0505 02:28:17.087958 1 discovery.go:304] Found new volume at host path "/mnt/local-storage/local-ocs-backing/sdc" with capacity 1919816826880, creating Local PV "local-pv-a0747d99", required volumeMode "Block"
I0505 02:28:17.102694 1 discovery.go:337] Created PV "local-pv-a0747d99" for volume at "/mnt/local-storage/local-ocs-backing/sdc"
I0505 02:28:17.102824 1 cache.go:55] Added pv "local-pv-a0747d99" to cache
I0505 02:28:17.116552 1 cache.go:64] Updated pv "local-pv-a0747d99" to cache
^C
$ oc logs -f ocs-backing-local-provisioner-n9mxg
I0505 02:28:06.628631 1 common.go:320] StorageClass "local-ocs-backing" configured with MountDir "/mnt/local-storage/local-ocs-backing", HostDir "/mnt/local-storage/local-ocs-backing", VolumeMode "Block", FsType "", BlockCleanerCommand ["/scripts/quick_reset.sh"]
I0505 02:28:06.628779 1 main.go:63] Loaded configuration: {StorageClassConfig:map[local-ocs-backing:{HostDir:/mnt/local-storage/local-ocs-backing MountDir:/mnt/local-storage/local-ocs-backing BlockCleanerCommand:[/scripts/quick_reset.sh] VolumeMode:Block FsType:}] NodeLabelsForPV:[] UseAlphaAPI:false UseJobForCleaning:false MinResyncPeriod:{Duration:5m0s} UseNodeNameOnly:false LabelsForPV:map[storage.openshift.com/local-volume-owner-name:ocs-backing storage.openshift.com/local-volume-owner-namespace:local-storage]}
I0505 02:28:06.628809 1 main.go:64] Ready to run...
W0505 02:28:06.628820 1 main.go:73] MY_NAMESPACE environment variable not set, will be set to default.
W0505 02:28:06.628831 1 main.go:79] JOB_CONTAINER_IMAGE environment variable not set.
I0505 02:28:06.629393 1 common.go:382] Creating client using in-cluster config
I0505 02:28:06.658010 1 main.go:85] Starting controller
I0505 02:28:06.658072 1 main.go:100] Starting metrics server at :8080
I0505 02:28:06.658185 1 controller.go:45] Initializing volume cache
I0505 02:28:06.860545 1 controller.go:108] Controller started
E0505 02:28:06.860666 1 discovery.go:201] Error reading directory: open /mnt/local-storage/local-ocs-backing: no such file or directory
I0505 02:28:16.861298 1 discovery.go:304] Found new volume at host path "/mnt/local-storage/local-ocs-backing/sdc" with capacity 1919816826880, creating Local PV "local-pv-4db9cb47", required volumeMode "Block"
I0505 02:28:16.877592 1 discovery.go:337] Created PV "local-pv-4db9cb47" for volume at "/mnt/local-storage/local-ocs-backing/sdc"
I0505 02:28:16.877745 1 cache.go:55] Added pv "local-pv-4db9cb47" to cache
I0505 02:28:16.887963 1 cache.go:64] Updated pv "local-pv-4db9cb47" to cache
^C
$ oc logs -f ocs-backing-local-provisioner-wz4qk
I0505 02:28:06.777835 1 common.go:320] StorageClass "local-ocs-backing" configured with MountDir "/mnt/local-storage/local-ocs-backing", HostDir "/mnt/local-storage/local-ocs-backing", VolumeMode "Block", FsType "", BlockCleanerCommand ["/scripts/quick_reset.sh"]
I0505 02:28:06.777974 1 main.go:63] Loaded configuration: {StorageClassConfig:map[local-ocs-backing:{HostDir:/mnt/local-storage/local-ocs-backing MountDir:/mnt/local-storage/local-ocs-backing BlockCleanerCommand:[/scripts/quick_reset.sh] VolumeMode:Block FsType:}] NodeLabelsForPV:[] UseAlphaAPI:false UseJobForCleaning:false MinResyncPeriod:{Duration:5m0s} UseNodeNameOnly:false LabelsForPV:map[storage.openshift.com/local-volume-owner-name:ocs-backing storage.openshift.com/local-volume-owner-namespace:local-storage]}
I0505 02:28:06.778007 1 main.go:64] Ready to run...
W0505 02:28:06.778018 1 main.go:73] MY_NAMESPACE environment variable not set, will be set to default.
W0505 02:28:06.778027 1 main.go:79] JOB_CONTAINER_IMAGE environment variable not set.
I0505 02:28:06.778609 1 common.go:382] Creating client using in-cluster config
I0505 02:28:06.803697 1 main.go:85] Starting controller
I0505 02:28:06.803728 1 main.go:100] Starting metrics server at :8080
I0505 02:28:06.803815 1 controller.go:45] Initializing volume cache
I0505 02:28:07.006479 1 controller.go:108] Controller started
E0505 02:28:07.006588 1 discovery.go:201] Error reading directory: open /mnt/local-storage/local-ocs-backing: no such file or directory
I0505 02:28:17.007375 1 discovery.go:304] Found new volume at host path "/mnt/local-storage/local-ocs-backing/sdc" with capacity 1919816826880, creating Local PV "local-pv-7f5385f2", required volumeMode "Block"
I0505 02:28:17.026660 1 discovery.go:337] Created PV "local-pv-7f5385f2" for volume at "/mnt/local-storage/local-ocs-backing/sdc"
I0505 02:28:17.026720 1 cache.go:55] Added pv "local-pv-7f5385f2" to cache
I0505 02:28:17.036906 1 cache.go:64] Updated pv "local-pv-7f5385f2" to cache
^C
The PersistentVolumes show up:
$ oc get pv | grep ocs-backing
local-pv-4db9cb47 1787Gi RWO Delete Available local-ocs-backing 22m
local-pv-7f5385f2 1787Gi RWO Delete Available local-ocs-backing 22m
local-pv-a0747d99 1787Gi RWO Delete Available local-ocs-backing 22m
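If you want to confirm which node backs each PV before layering the storage cluster on top of them, the node affinity is recorded on the PV. A quick check, using one of the PV names from above (the output shown is illustrative):
$ oc describe pv local-pv-a0747d99 | grep -A 3 'Node Affinity'
Node Affinity:
  Required Terms:
    Term 0:  kubernetes.io/hostname in [10.194.150.196]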
Create an OCS StorageCluster
- Edit the system:node ClusterRole so that nodes can manage volumeattachments. Alternatively, you can set enable-controller-attach-detach to true on the kubelet.[2] A quick way to verify the change is sketched after this listing.
$ oc edit clusterrole system\:node
- apiGroups:
  - storage.k8s.io
  resources:
  - volumeattachments
  verbs:
  - get
  - create
  - update
  - delete
  - list
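A quick way to verify the change took effect is to impersonate one of the node identities. This sketch assumes you are logged in as a user with impersonation rights (for example, cluster-admin):
$ oc auth can-i create volumeattachments.storage.k8s.io --as=system:node:10.194.150.196
yes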
- Do Section 1.2.6 in the official documentation to create a custom cluster.
$ echo 'apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  monDataDirHostPath: /var/lib/rook
  storageDeviceSets:
  - config: {}
    count: 1
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1787Gi
        storageClassName: local-ocs-backing
        volumeMode: Block
    name: ocs-deviceset
    placement: {}
    replica: 3
    resources: {}
  version: 4.3.0' | oc create -f -
Notice that our storage size matches the PV size, which is not one of the preset sizes offered in the GUI. Also, the mons are being hosted on the root disk under /var/lib/rook. You could alternatively back them with another PV if you have something else on your bare metal cluster to serve that purpose; in IBM Cloud this could be any of several options. We used the root disk for simplicity, as it was not critical to the POC. You can watch the deployment come up with the commands sketched below.
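These commands are not required, but they are useful for watching the deployment progress (they assume the openshift-storage project):
$ oc -n openshift-storage get storagecluster
$ oc -n openshift-storage get pods -w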
- Edit the DaemonSets and change all references to the kubelet location from /var/lib/kubelet to /var/data/kubelet:
$ oc edit ds csi-cephfsplugin
$ oc edit ds csi-rbdplugin
Automated, this would look like:
$ oc get ds csi-rbdplugin -o yaml | sed 's/var\/lib\/kubelet/var\/data\/kubelet/g' | oc apply -f -
$ oc get ds csi-cephfsplugin -o yaml | sed 's/var\/lib\/kubelet/var\/data\/kubelet/g' | oc apply -f -
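A quick check that no references to the old kubelet path remain; this assumes, as above, that your current project is openshift-storage, and a count of 0 is what you want:
$ oc get ds csi-rbdplugin csi-cephfsplugin -o yaml | grep -c '/var/lib/kubelet'
0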
Debugging
Sometimes the pods that prepare the OSDs (e.g. rook-ceph-osd-prepare-ocs-deviceset-0-0-zf9xk-jckrc) recycle after failing and crashing a few times.
The kubelet log on each node is very helpful: /var/log/kubelet.log.
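One way to follow it from a debug pod (a sketch; the grep filter is only a suggestion):
$ oc debug node/10.194.150.204
sh-4.2# chroot /host
sh-4.2# tail -f /var/log/kubelet.log | grep -iE 'error|failed'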
This error is persistent but benign:
May 11 21:30:34 kube-bqjdgjof0jub1eqc7830-cluster0-storage-00000d76 kubelet.service: E0511 21:30:34.616492 15432 goroutinemap.go:150] Operation for "/var/data/kubelet/plugins/openshift-storage.rbd.csi.ceph.com/csi.sock" failed. No retries permitted until 2020-05-11 21:32:36.616461186 -0500 CDT m=+637666.488684071 (durationBeforeRetry 2m2s). Error: "RegisterPlugin error -- failed to get plugin info using RPC GetInfo at socket /var/data/kubelet/plugins/openshift-storage.rbd.csi.ceph.com/csi.sock, err: rpc error: code = Unimplemented desc = unknown service pluginregistration.Registration"
This error indicates you did not edit the clusterrole correctly, or did not do it at all:
May 11 10:56:11 kube-bqjdgjof0jub1eqc7830-cluster0-storage-00000d76 kubelet.service: E0511 10:56:11.498116 15432 nestedpendingoperations.go:20] Operation for "\"kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com^0001-0011-openshift-storage-0000000000000002-abcba85c-9395-11ea-abcf-52f7713ae97\"" failed. No retries permitted until 2020-05-11 10:58:13.498080928 -0500 CDT m=+599603.370303765 (durationBeforeRetry 2m2s). Error: "AttachVolume.Attach failed for volume \"pvc-3797850b-b025-46fc-a177-242b86483049\" (UniqueName: \"kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com^001-0011-openshift-storage-0000000000000002-abcba85c-9395-11ea-abcf-52f77713ae97\") from node \"10.194.150.204\" : kubernetes.io/csi: attacher.Attach failed: volumeattachments.storage.k8s.io is forbidden: User \"system:node:10.194.150.204\" cannot create resource \"volumeattachments\" in API group \"storage.k8s.io\" at the cluster scope: can only get individual resources of this type"
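After fixing the ClusterRole, the pending attachments should clear on their own as the kubelet retries. Two quick ways to watch for anything still stuck (a sketch, not from the original POC):
$ oc get volumeattachments
$ oc -n openshift-storage get pods | grep -vE 'Running|Completed'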