Introduction

To control costs, the company plans to replace some of its HDFS use cases with object storage for data and compute engines such as Spark, i.e. a storage-compute separated design (a spark-submit sketch of how the two sides connect follows the plan below).

Plan:

  1. Deploy a distributed MinIO cluster
  2. Deploy a Spark cluster
  3. Run data-volume tests to check whether the setup can handle our batch-processing jobs
  4. Operate MinIO and Spark uniformly on Kubernetes
  5. ……
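
To make the storage-compute split concrete, the sketch below shows one way a Spark job could read from this MinIO cluster through the S3A connector. It is only an illustration: the endpoint, credentials, hadoop-aws version, job script, and bucket name are assumptions that must be adapted to the actual deployment.

# Sketch only: point Spark's S3A connector at the MinIO endpoint.
# my_job.py and s3a://test-bucket/ are hypothetical; pick a hadoop-aws version
# that matches the Hadoop build bundled with your Spark distribution.
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:3.3.4 \
  --conf spark.hadoop.fs.s3a.endpoint=http://spark01:9000 \
  --conf spark.hadoop.fs.s3a.access.key=minio \
  --conf spark.hadoop.fs.s3a.secret.key=minio123 \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.connection.ssl.enabled=false \
  my_job.py s3a://test-bucket/input/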

Environment

  • CentOS Linux release 7.9.2009 (Core)

  • 4 servers

  • 2 data disks per server

Hardware resource calculation

Erasure code calculator: https://min.io/product/erasure-code-calculator

Use it to work out how many servers and disks your scenario needs. For example, 4 servers × 2 disks gives 8 drives; with MinIO's default parity for an 8-drive erasure set (EC:4), roughly half of the raw capacity is usable.

Service installation

# Disk partitioning and formatting
# List the disks
fdisk -l

# Create a partition; /dev/sdc here is your data disk
parted -s -a optimal /dev/sdc mklabel gpt -- mkpart primary ext4 1 -1

# Create the filesystem on the new partition
mkfs.ext4 /dev/sdc1

# Check the data disk partitions and note the UUID
lsblk -f

# /data/minio2 is the mount point for this disk (each disk gets its own mount point)
# Edit /etc/fstab and add the nodelalloc mount option
# Option 1
vim /etc/fstab
UUID=15f25734-f39a-4187-a32b-34e0a62179ab /data/minio2 ext4 defaults,nodelalloc,noatime 0 2
# Option 2
cat >> /etc/fstab <<EOF
UUID=6973df10-39a1-4255-9611-4df95b9ac33d /data/minio2 ext4 defaults,nodelalloc,noatime 0 2
EOF

# Create the mount points (one per disk) and mount everything
mkdir -p /data/minio1 /data/minio2 && \
mount -a

# Verify the mounts
mount -t ext4

Make sure the disks on every server are formatted and mounted before carrying out the steps below.
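
Each server has two data disks, so the steps above are repeated per disk and per mount point. A rough per-node sketch follows; the device names /dev/sdb and /dev/sdc are assumptions, so confirm them with lsblk before running anything.

# Sketch only: format and mount both data disks on one node.
# Device names are assumed; verify with lsblk first.
i=1
for dev in /dev/sdb /dev/sdc; do
    parted -s -a optimal "$dev" mklabel gpt -- mkpart primary ext4 1 -1
    mkfs.ext4 "${dev}1"
    mkdir -p "/data/minio${i}"
    uuid=$(blkid -s UUID -o value "${dev}1")
    echo "UUID=${uuid} /data/minio${i} ext4 defaults,nodelalloc,noatime 0 2" >> /etc/fstab
    i=$((i + 1))
done
mount -a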

# Download the server binary
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
sudo mv minio /usr/local/bin/
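
The binary has to end up on all four nodes. As a sketch (hostnames taken from the MINIO_VOLUMES setting below; passwordless SSH between the nodes is assumed), it can be copied out and sanity-checked like this:

# Sketch only: copy the binary to the remaining nodes and print the version everywhere.
for host in spark02 spark03 spark04; do
    scp /usr/local/bin/minio "${host}:/usr/local/bin/minio"
    ssh "$host" "chmod +x /usr/local/bin/minio && /usr/local/bin/minio --version"
done
/usr/local/bin/minio --version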
# Environment file
$ vim /etc/default/minio.conf
MINIO_ROOT_USER=minio
MINIO_ROOT_PASSWORD=minio123
MINIO_PROMETHEUS_AUTH_TYPE="public"
MINIO_VOLUMES="http://spark0{1...4}:9000/data/minio{1...2}"
MINIO_OPTS="--console-address :9001"
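
The MINIO_VOLUMES expansion relies on the hostnames spark01 through spark04 resolving on every node, and the same environment file must exist on every node. If DNS does not already cover the hostnames, a minimal /etc/hosts sketch looks like this (the IP addresses are placeholders; substitute your own):

# Sketch only: hostname resolution for the four nodes (IP addresses are placeholders).
cat >> /etc/hosts <<EOF
172.26.1.231 spark01
172.26.1.232 spark02
172.26.1.233 spark03
172.26.1.234 spark04
EOF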
# systemd unit file
$ vim /etc/systemd/system/minio.service

[Unit]
Description=MinIO
Documentation=https://docs.min.io
Wants=network-online.target
After=network-online.target

[Service]
WorkingDirectory=/usr/local/bin

User=root
Group=root
EnvironmentFile=-/etc/default/minio.conf
ExecStartPre=/bin/bash -c "if [ -z \"${MINIO_VOLUMES}\" ]; then echo \"Variable MINIO_VOLUMES not set in /etc/default/minio.conf\"; exit 1; fi"
ExecStart=/usr/local/bin/minio server --certs-dir /etc/minio $MINIO_OPTS $MINIO_VOLUMES
Restart=always

[Install]
WantedBy=multi-user.target
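
The unit file and the environment file need to be identical on all four nodes, and the service should be enabled on each of them. A sketch, again assuming passwordless SSH and the spark01-04 hostnames:

# Sketch only: push the config and unit file to the other nodes, then enable the service everywhere.
for host in spark02 spark03 spark04; do
    scp /etc/default/minio.conf "${host}:/etc/default/minio.conf"
    scp /etc/systemd/system/minio.service "${host}:/etc/systemd/system/minio.service"
done
for host in spark01 spark02 spark03 spark04; do
    ssh "$host" "systemctl daemon-reload && systemctl enable --now minio"
done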
# Reload the unit files, restart the cluster, and check the status
$ systemctl daemon-reload && systemctl restart minio && systemctl status minio

# If anything fails, check the service logs
# Show the last 50 lines of the MinIO service log
journalctl -n 50 -u minio

Because the environment file sets MINIO_OPTS="--console-address :9001", the management console is exposed on port 9001, so you can also open the web console to confirm the deployment succeeded.
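
If firewalld is running on these CentOS 7 hosts (an assumption; skip this if it is disabled), the S3 API and console ports need to be opened on every node, roughly like this:

# Sketch only: open the S3 API port (9000) and the console port (9001) in firewalld.
firewall-cmd --permanent --add-port=9000/tcp
firewall-cmd --permanent --add-port=9001/tcp
firewall-cmd --reload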

Client

# Install the mc client
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/mc

# Check the cluster status
# Note: point the alias at the address/port where the MinIO S3 API is actually served;
# with the configuration above that is port 9000, unless a load balancer sits in front.
$ set +o history
$ mc alias set local http://172.26.1.231:8080 minio minio123
$ set -o history
$ mc admin info local

The distributed MinIO cluster is now reporting a healthy status; performance testing will be run against this cluster next.
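
Before any performance testing, a quick functional smoke test with mc is worthwhile. A minimal sketch (the bucket name test-bucket and the sample file are made up for illustration):

# Sketch only: create a bucket, upload a small object, list it, and read it back.
mc mb local/test-bucket
echo "hello minio" > /tmp/hello.txt
mc cp /tmp/hello.txt local/test-bucket/
mc ls local/test-bucket
mc cat local/test-bucket/hello.txt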

Server tuning script

Adapted from a script found online.

#!/bin/bash

cat > sysctl.conf <<EOF
# maximum number of open files/file descriptors
fs.file-max = 4194303

# use as little swap space as possible
vm.swappiness = 1

# prioritize application RAM against disk/swap cache
vm.vfs_cache_pressure = 50

# minimum free memory
vm.min_free_kbytes = 1000000

# follow mellanox best practices https://community.mellanox.com/s/article/linux-sysctl-tuning
# the following changes are recommended for improving IPv4 traffic performance by Mellanox

# disable the TCP timestamps option for better CPU utilization
net.ipv4.tcp_timestamps = 0

# enable the TCP selective acks option for better throughput
net.ipv4.tcp_sack = 1

# increase the maximum length of processor input queues
net.core.netdev_max_backlog = 250000

# increase the TCP maximum and default buffer sizes using setsockopt()
net.core.rmem_max = 4194304
net.core.wmem_max = 4194304
net.core.rmem_default = 4194304
net.core.wmem_default = 4194304
net.core.optmem_max = 4194304

# increase memory thresholds to prevent packet dropping:
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 65536 4194304

# enable low latency mode for TCP:
net.ipv4.tcp_low_latency = 1

# the following variable is used to tell the kernel how much of the socket buffer
# space should be used for TCP window size, and how much to save for an application
# buffer. A value of 1 means the socket buffer will be divided evenly between.
# TCP windows size and application.
net.ipv4.tcp_adv_win_scale = 1

# maximum number of incoming connections
net.core.somaxconn = 65535

# maximum number of packets queued (note: this overrides the netdev_max_backlog value set above)
net.core.netdev_max_backlog = 10000

# maximum queue length of half-open (SYN_RECV) connections
net.ipv4.tcp_max_syn_backlog = 4096

# time to wait (seconds) for FIN packet
net.ipv4.tcp_fin_timeout = 15

# disable icmp send redirects
net.ipv4.conf.all.send_redirects = 0

# disable icmp accept redirect
net.ipv4.conf.all.accept_redirects = 0

# drop packets with LSR or SSR
net.ipv4.conf.all.accept_source_route = 0

# MTU discovery, only enable when ICMP blackhole detected
net.ipv4.tcp_mtu_probing = 1

EOF

echo "Enabling system level tuning params"
sysctl --quiet --load sysctl.conf && rm -f sysctl.conf

# `Transparent Hugepage Support`*: This is a Linux kernel feature intended to improve
# performance by making more efficient use of processor’s memory-mapping hardware.
# But this may cause https://blogs.oracle.com/linux/performance-issues-with-transparent-huge-pages-thp
# for non-optimized applications. As most Linux distributions set it to `enabled=always` by default,
# we recommend changing this to `enabled=madvise`. This will allow applications optimized
# for transparent hugepages to obtain the performance benefits, while preventing the
# associated problems otherwise. Also, set `transparent_hugepage=madvise` on your kernel
# command line (e.g. in /etc/default/grub) to persistently set this value.

echo "Enabling THP madvise"
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
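
The echo above only lasts until the next reboot; the comment block recommends also putting transparent_hugepage=madvise on the kernel command line. A sketch for CentOS 7 with GRUB2, using grubby, which ships with the distribution:

# Sketch only: persist transparent_hugepage=madvise on the kernel command line.
grubby --update-kernel=ALL --args="transparent_hugepage=madvise"
# Verify the argument is present; it takes effect after the next reboot.
grubby --info=ALL | grep -o 'transparent_hugepage=[a-z]*' | sort -u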