storage - High availability storage

Question

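I would like to make about 2 TB available via NFS and CIFS. I am looking for a solution with 2 (or more) servers for high availability, with the ability to load balance across the servers if possible. Any suggestions for clustering or high-availability solutions?

This is for business use, and we plan to grow to 5-10 TB over the next few years. Our facility runs almost 24 hours a day, six days a week. We could tolerate 15-30 minutes of downtime, but we want to minimize data loss. I also want to minimize 3 AM calls.

We currently run one server with ZFS on Solaris, and we are looking at AVS for the HA part, but we have had minor issues with Solaris (the CIFS implementation doesn't work with Vista, etc.) that have held us up.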

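We have started looking at

DRBD over GFS (GFS for distributed lock capability)
Gluster (needs client pieces, no native CIFS support?)
Windows DFS (docs say it only replicates after a file closes?)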

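We are looking for a "black box" that serves up data.

We currently snapshot the data in ZFS and send the snapshots over the network to a remote datacenter for offsite backup.

Our original plan was to have a second machine and rsync every 10-15 minutes. The problem with a failure is that ongoing production processes would lose 15 minutes of data and be left "in the middle". It would almost be easier for them to start from the beginning than to figure out where to pick up in the middle. That is what drove us to look at HA solutions.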

score 6 · Accepted Answer

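I recently deployed hanfs using DRBD as the backend. In my case I'm running active/standby mode, but I've also tested it successfully using OCFS2 in primary/primary mode. Unfortunately there isn't much documentation on how best to achieve this, and most of what exists is barely useful at best. If you do go down the drbd route, I highly recommend joining the drbd mailing list and reading all of the documentation. Here's my ha/drbd setup and the script I wrote to handle ha's failures:

DRBD8 is required - this is provided by drbd8-utils and drbd8-source. Once these are installed (I believe they're provided by backports), you can use module-assistant to install it - m-a a-i drbd8. Either depmod -a or reboot at this point; if you depmod -a, you'll need to modprobe drbd.

You'll need a backend partition to use for drbd. Do not make this partition LVM, or you'll hit all sorts of problems. Do not put LVM on the drbd device either, or you'll hit all sorts of problems.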

Hanfs1's /etc/drbd.conf:

global {
        usage-count no;
}
common {
        protocol C;
        disk { on-io-error detach; }
}
resource export {
        syncer {
                rate 125M;
        }
        on hanfs2 {
                address         172.20.1.218:7789;
                device          /dev/drbd1;
                disk            /dev/sda3;
                meta-disk       internal;
        }
        on hanfs1 {
                address         172.20.1.219:7789;
                device          /dev/drbd1;
                disk            /dev/sda3;
                meta-disk       internal;
        }
}

Hanfs2's /etc/drbd.conf:


global {
        usage-count no;
}
common {
        protocol C;
        disk { on-io-error detach; }
}
resource export {
        syncer {
                rate 125M;
        }
        on hanfs2 {
                address         172.20.1.218:7789;
                device          /dev/drbd1;
                disk            /dev/sda3;
                meta-disk       internal;
        }
        on hanfs1 {
                address         172.20.1.219:7789;
                device          /dev/drbd1;
                disk            /dev/sda3;
                meta-disk       internal;
        }
}

With the configuration done, we need to bring up drbd next.

drbdadm create-md export
drbdadm attach export
drbdadm connect export

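We must now perform an initial synchronization of data - obviously, if this is a brand new drbd cluster, it doesn't matter which node you choose.

Once done, you'll need to mkfs.yourchoiceoffilesystem on your drbd device - the device in our config above is /dev/drbd1. http://www.drbd.org/users-guide/p-work.html is a useful document to read while working with drbd.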
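
For the initial synchronization step, here is a minimal sketch, assuming DRBD 8.x and the `export` resource from the config above. In DRBD 8 the usual way to pick the sync source is to force that node Primary with `--overwrite-data-of-peer`. The helper names (`run`, `start_initial_sync`, `sync_done`) are made up for illustration, and the `ds:` field is assumed to look the way it does in the `/proc/drbd` format that the wrapper script further down parses.

```shell
# Sketch only: run, start_initial_sync and sync_done are illustrative names.

# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run this on the ONE node whose data should win; the peer is overwritten.
start_initial_sync() {
    run drbdadm -- --overwrite-data-of-peer primary "$1"
}

# Succeeds once both disks report UpToDate in a /proc/drbd-style file.
sync_done() {
    grep -q 'ds:UpToDate/UpToDate' "${1:-/proc/drbd}"
}
```

Something like `start_initial_sync export` followed by `until sync_done; do sleep 10; done` would then wait for the sync to finish. Setting `DRY_RUN=1` first only prints the drbdadm command, which is handy for checking it before touching real data.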

Heartbeat

Install heartbeat2. (Pretty simple, apt-get install heartbeat2.)

/etc/ha.d/ha.cf on each machine should consist of:

hanfs1:


logfacility local0
keepalive 2
warntime 10
deadtime 30
initdead 120

ucast eth1 172.20.1.218

auto_failback no

node hanfs1
node hanfs2

hanfs2:


logfacility local0
keepalive 2
warntime 10
deadtime 30
initdead 120

ucast eth1 172.20.1.219

auto_failback no

node hanfs1
node hanfs2
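
The two ha.cf files differ only in the `ucast` line, which names the peer's address rather than the node's own (per the drbd.conf above, hanfs2 is 172.20.1.218 and hanfs1 is 172.20.1.219). A small sketch that makes that explicit, where `make_ha_cf` is a hypothetical helper and not something heartbeat ships:

```shell
# Sketch only: make_ha_cf is an illustrative helper, not part of heartbeat.
# $1 = the PEER's address on the heartbeat link (not this node's own).
make_ha_cf() {
    cat <<EOF
logfacility local0
keepalive 2
warntime 10
deadtime 30
initdead 120

ucast eth1 $1

auto_failback no

node hanfs1
node hanfs2
EOF
}

# On hanfs1 (peer hanfs2 is 172.20.1.218):
#   make_ha_cf 172.20.1.218 > /etc/ha.d/ha.cf
# On hanfs2 (peer hanfs1 is 172.20.1.219):
#   make_ha_cf 172.20.1.219 > /etc/ha.d/ha.cf
```

Generating both files from one template is just one way to keep the timing values from drifting apart between the nodes. Copying the file by hand and changing the one line works equally well.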

/etc/ha.d/haresources should be the same on both ha boxes:

hanfs1 IPaddr::172.20.1.230/24/eth1
hanfs1 HeartBeatWrapper

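I wrote a wrapper script to deal with the idiosyncrasies caused by nfs and drbd in a failover scenario. This script should exist within /etc/ha.d/resource.d/ on each machine, named HeartBeatWrapper to match the haresources entry.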



#!/bin/bash

# heartbeat fails hard.
# so this is a wrapper
# to get around that stupidity
# I'm just wrapping the heartbeat scripts, except for in the case of umount
# as they work, mostly

if [[ -e /tmp/heartbeatwrapper ]]; then
    runningpid=$(cat /tmp/heartbeatwrapper)
    if [[ -z $(ps --no-heading -p "$runningpid") ]]; then
        echo "PID found, but process seems dead.  Continuing."
    else
        echo "PID found, process is alive, exiting."
        exit 7
    fi
fi

echo $$ > /tmp/heartbeatwrapper

if [[ x$1 == "xstop" ]]; then

/etc/init.d/nfs-kernel-server stop #>/dev/null 2>&1

# NFS init script isn't LSB compatible, exit codes are 0 no matter what happens.
# Thanks guys, you really make my day with this bullshit.
# Because of the above, we just have to hope that nfs actually catches the signal
# to exit, and manages to shut down its connections.
# If it doesn't, we'll kill it later, then term any other nfs stuff afterwards.
# I found this to be an interesting insight into just how badly NFS is written.

sleep 1

#we don't want to shutdown nfs first!
#The lock files might go away, which would be bad.

#The above seems to not matter much, the only thing I've determined
#is that if you have anything mounted synchronously, it's going to break
#no matter what I do.  Basically, sync == screwed; in NFSv3 terms.      
#End result of failing over while a client that's synchronous is that   
#the client hangs waiting for its nfs server to come back - thing doesn't
#even bother to time out, or attempt a reconnect.                        
#async works as expected - it insta-reconnects as soon as a connection seems
#to be unstable, and continues to write data.  In all tests, md5sums have   
#remained the same with/without failover during transfer.                   

#So, we first unmount /export - this prevents drbd from having a shit-fit
#when we attempt to turn this node secondary.                            

#That's a lie too, to some degree. LVM is entirely to blame for why DRBD
#was refusing to unmount.  Don't get me wrong, having /export mounted doesn't
#help either, but still.                                                     
#fix a usecase where one or other are unmounted already, which causes us to terminate early.

if [[ "$(grep -o /varlibnfs/rpc_pipefs /etc/mtab)" ]]; then
    for ((test=1; test <= 10; test++)); do
        umount /export/varlibnfs/rpc_pipefs  >/dev/null 2>&1
        # save umount's exit status; the grep test below would overwrite $?
        rc=$?
        if [[ -z $(grep -o /varlibnfs/rpc_pipefs /etc/mtab) ]]; then
            break
        fi
        if [[ $rc -ne 0 ]]; then
            #try again, harder this time
            umount -l /var/lib/nfs/rpc_pipefs  >/dev/null 2>&1
            if [[ -z $(grep -o /varlibnfs/rpc_pipefs /etc/mtab) ]]; then
                break
            fi
        fi
    done
    # the counter only exceeds 10 when every attempt failed
    if [[ $test -gt 10 ]]; then
        rm -f /tmp/heartbeatwrapper
        echo "Problem unmounting rpc_pipefs"
        exit 1
    fi
fi

if [[ "$(grep -o /dev/drbd1 /etc/mtab)" ]]; then
    for ((test=1; test <= 10; test++)); do
        umount /export  >/dev/null 2>&1
        # save umount's exit status; the grep test below would overwrite $?
        rc=$?
        if [[ -z $(grep -o /dev/drbd1 /etc/mtab) ]]; then
            break
        fi
        if [[ $rc -ne 0 ]]; then
            #try again, harder this time
            umount -l /export  >/dev/null 2>&1
            if [[ -z $(grep -o /dev/drbd1 /etc/mtab) ]]; then
                break
            fi
        fi
    done
    # the counter only exceeds 10 when every attempt failed
    if [[ $test -gt 10 ]]; then
        rm -f /tmp/heartbeatwrapper
        echo "Problem unmounting /export"
        exit 1
    fi
fi


#now, it's important that we shut down nfs. it can't write to /export anymore, so that's fine.
#if we leave it running at this point, then drbd will screwup when trying to go to secondary.  
#See contradictory comment above for why this doesn't matter anymore.  These comments are left in
#entirely to remind me of the pain this caused me to resolve.  A bit like why churches have Jesus
#nailed onto a cross instead of chilling in a hammock.                                           

pidof nfsd | xargs kill -9 >/dev/null 2>&1

sleep 1                                   

if [[ -n $(ps aux | grep nfs | grep -v grep) ]]; then
    echo "nfs still running, trying to kill again"   
    pidof nfsd | xargs kill -9 >/dev/null 2>&1       
fi                                                   

sleep 1

/etc/init.d/nfs-kernel-server stop #>/dev/null 2>&1

sleep 1

#next we need to tear down drbd - easy with the heartbeat scripts
#it takes input as resourcename start|stop|status                
#First, we'll check to see if it's stopped                       

/etc/ha.d/resource.d/drbddisk export status >/dev/null 2>&1
if [[ $? -eq 2 ]]; then                                    
    echo "resource is already stopped for some reason..."  
else                                                       
    for ((i=1; i <= 10; i++)); do                          
        /etc/ha.d/resource.d/drbddisk export stop >/dev/null 2>&1
        if [[ $(egrep -o "st:[A-Za-z/]*" /proc/drbd | cut -d: -f2) == "Secondary/Secondary" ]] || [[ $(egrep -o "st:[A-Za-z/]*" /proc/drbd | cut -d: -f2) == "Secondary/Unknown" ]]; then                                                                                                                             
            echo "Successfully stopped DRBD"                                                                                                             
            break                                                                                                                                        
        else                                                                                                                                             
            echo "Failed to stop drbd for some reason"                                                                                                   
            cat /proc/drbd                                                                                                                               
            if [[ $i -eq 10 ]]; then                                                                                                                     
                    exit 50                                                                                                                              
            fi                                                                                                                                           
        fi                                                                                                                                               
    done                                                                                                                                                 
fi                                                                                                                                                       

rm -f /tmp/heartbeatwrapper                                                                                                                              
exit 0                                                                                                                                                   


elif [[ x$1 == "xstart" ]]; then

#start up drbd first
/etc/ha.d/resource.d/drbddisk export start >/dev/null 2>&1
if [[ $? -ne 0 ]]; then                                   
    echo "Something seems to have broken. Let's check possibilities..."
    testvar=$(egrep -o "st:[A-Za-z/]*" /proc/drbd | cut -d: -f2)       
    if [[ $testvar == "Primary/Unknown" ]] || [[ $testvar == "Primary/Secondary" ]]
    then                                                                           
        echo "All is fine, we are already the Primary for some reason"             
    elif [[ $testvar == "Secondary/Unknown" ]] || [[ $testvar == "Secondary/Secondary" ]]
    then                                                                                 
        echo "Trying to assume Primary again"                                            
        /etc/ha.d/resource.d/drbddisk export start >/dev/null 2>&1                       
        if [[ $? -ne 0 ]]; then                                                          
            echo "I give up, something's seriously broken here, and I can't help you to fix it."
            rm -f /tmp/heartbeatwrapper                                                         
            exit 127                                                                            
        fi                                                                                      
    fi                                                                                          
fi                                                                                              

sleep 1                                                                                         

#now we remount our partitions                                                                  

for ((test=1; test <= 10; test++)); do                                                          
    mount /dev/drbd1 /export >/tmp/mountoutput                                                  
    if [[ -n $(grep -o export /etc/mtab) ]]; then                                               
        break                                                                                   
    fi                                                                                          
done                                                                                            

# the counter only exceeds 10 when every mount attempt failed
if [[ $test -gt 10 ]]; then
    rm -f /tmp/heartbeatwrapper
    exit 125
fi


#I'm really unsure at this point of the side-effects of not having rpc_pipefs mounted.          
#The issue here, is that it cannot be mounted without nfs running, and we don't really want to start
#nfs up at this point, lest it ruin everything.                                                     
#For now, I'm leaving mine unmounted, it doesn't seem to cause any problems.                        

#Now we start up nfs.

/etc/init.d/nfs-kernel-server start >/dev/null 2>&1
if [[ $? -ne 0 ]]; then
    echo "There's not really that much that I can do to debug nfs issues."
    echo "probably your configuration is broken.  I'm terminating here."
    rm -f /tmp/heartbeatwrapper
    exit 129
fi

#And that's it, done.

rm -f /tmp/heartbeatwrapper
exit 0


elif [[ "x$1" == "xstatus" ]]; then

#Lets check to make sure nothing is broken.

#DRBD first
/etc/ha.d/resource.d/drbddisk export status >/dev/null 2>&1
if [[ $? -ne 0 ]]; then
    echo "stopped"
    rm -f /tmp/heartbeatwrapper
    exit 3
fi

#mounted?
grep -q drbd /etc/mtab >/dev/null 2>&1
if [[ $? -ne 0 ]]; then
    echo "stopped"
    rm -f /tmp/heartbeatwrapper
    exit 3
fi

#nfs running?
/etc/init.d/nfs-kernel-server status >/dev/null 2>&1
if [[ $? -ne 0 ]]; then
    echo "stopped"
    rm -f /tmp/heartbeatwrapper
    exit 3
fi

echo "running"
rm -f /tmp/heartbeatwrapper
exit 0


fi
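
The PID-file check at the top of the script can also be written with `kill -0`, which tests whether a PID exists without sending it a signal. This is only a sketch of that alternative, not what the script above does. `acquire_lock` is a made-up name, and it assumes the wrapper runs as root (as heartbeat resource scripts normally do), since `kill -0` also fails for live processes owned by another user.

```shell
# Sketch only: acquire_lock is an illustrative helper name.
# Returns 0 and records our PID if no live process holds the lock, else 1.
acquire_lock() {
    lockfile=$1
    if [ -e "$lockfile" ]; then
        holder=$(cat "$lockfile")
        # kill -0 sends no signal; it only tests whether the PID exists.
        if [ -n "$holder" ] && kill -0 "$holder" 2>/dev/null; then
            echo "PID found, process is alive, exiting."
            return 1
        fi
        echo "PID found, but process seems dead.  Continuing."
    fi
    echo $$ > "$lockfile"
}
```

It would be used as `acquire_lock /tmp/heartbeatwrapper || exit 7`. Pairing it with `trap 'rm -f /tmp/heartbeatwrapper' EXIT` could stand in for the repeated `rm -f` calls before each exit. Note the check and the write are not atomic, so two wrappers started at the same instant could still both get through.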

Once you've done all of the above, you'll just want to configure /etc/exports:

/export 172.20.1.0/255.255.255.0(rw,sync,fsid=1,no_root_squash)

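Then it's just a case of starting up heartbeat on both machines and issuing hb_takeover on one of them. You can test that it's working by making sure the one you issued the takeover on is primary - check /proc/drbd, that the device is mounted correctly, and that you can access nfs.

--

Best of luck. Setting it up from the ground up was, for me, an extremely painful experience.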
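
The first two of those checks can be scripted roughly like this. It is a sketch under assumptions: the helper names are made up, there is a single DRBD resource, and `/proc/drbd` uses the `st:` role field that the wrapper script parses (newer DRBD releases may label it differently).

```shell
# Sketch only: helper names are illustrative. Paths default to the live
# system files but can be overridden to test against saved copies.

# Print this node's DRBD role, e.g. "Primary", from the st: field.
drbd_local_role() {
    grep -Eo 'st:[A-Za-z/]*' "${1:-/proc/drbd}" | cut -d: -f2 | cut -d/ -f1
}

# Succeed if the DRBD device appears in the mount table.
export_mounted() {
    grep -q '^/dev/drbd1 ' "${1:-/etc/mtab}"
}

# Report whether this node looks like the active NFS head.
check_active() {
    if [ "$(drbd_local_role "$1")" = "Primary" ] && export_mounted "$2"; then
        echo "active"
    else
        echo "standby"
        return 1
    fi
}
```

Running `check_active` on the node that took over should print `active`, and on the other node `standby`. The NFS side still needs checking from a client, for example by mounting the export through the 172.20.1.230 service address.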

score 3 · Accepted Answer

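These days 2TB fits in one machine, so you've got options, from simple to complex. These all presume linux servers:

You can get poor-man's HA by setting up two machines and doing a periodic rsync from the main one to the backup.
You can use DRBD to mirror one from the other at the block level. This has the disadvantage of being somewhat difficult to expand in the future.
You can use OCFS2 to cluster the disks instead, for future expandability.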
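
For the rsync option, a minimal sketch. The `standby` hostname, the `sync_to_standby` name and the `run` dry-run helper are all assumptions for illustration, and it presumes passwordless ssh from the main box to the backup.

```shell
# Sketch only: sync_to_standby and the "standby" hostname are illustrative.

# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Mirror a directory tree to the standby host over ssh.
# -a preserves permissions, times and ownership; --delete removes files on
# the standby that no longer exist on the primary.
sync_to_standby() {
    run rsync -a --delete "$1/" "$2:$1/"
}
```

Calling `sync_to_standby /export standby` from cron every 10-15 minutes gives the periodic copy. Anything written since the last run is lost if the main box dies, which is the gap the block-level DRBD option closes.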

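There are also plenty of commercial solutions, but 2TB is a bit small for most of them these days.

You haven't mentioned your application yet, but if hot failover isn't necessary, and all you really want is something that will stand up to losing a disk or two, find a NAS that supports RAID-5, at least 4 drives, and hotswap and you should be good to go.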

score 1 · Accepted Answer

I would recommend NAS storage (Network Attached Storage).

HP has some nice ones you can choose from.

http://h18006.www1.hp.com/storage/aiostorage.html

as well as a clustered version:

http://h18006.www1.hp.com/storage/software/clusteredfs/index.html?jumpid=reg_R1002_USEN

score 0 · Accepted Answer

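There are two ways to go at this. The first is to just go buy a SAN or a NAS from Dell or HP and throw money at the problem. Modern storage hardware just makes all of this easy to do, saving your expertise for more core problems.

If you want to roll your own, take a look at using Linux with DRBD.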

http://www.drbd.org/

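DRBD allows you to create networked block devices. Think RAID 1 across two servers instead of just two disks. DRBD deployments are usually done using Heartbeat for failover in case one system dies.

I'm not sure about load balancing, but you might investigate and see if LVS can be used to load balance across your DRBD hosts: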

http://www.linuxvirtualserver.org/

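Lastly, let me just reiterate that you're probably going to save yourself a lot of time in the long run just forking out the money for a NAS.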

score 0 · Accepted Answer

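Are you looking for an "enterprise" solution or a "home" solution? It is hard to tell from your question, because 2TB is very small for an enterprise and a little on the high end for a home user (especially two servers). Could you clarify the need so we can discuss tradeoffs?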

score 0 · Accepted Answer

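I'm assuming from the body of your question that you're a business user? I purchased a 6TB RAID 5 unit from Silicon Mechanics and have it NAS attached, and my engineer installed NFS on our servers. Backups are performed via rsync to another large-capacity NAS.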

score 0 · Accepted Answer

Have a look at Amazon Simple Storage Service (Amazon S3)

http://www.amazon.com/S3-AWS-home-page-Money/b/ref=sc_fe_l_2?ie=UTF8&node=16427261&no=3435361&me=A36L942TSJ2AJA

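- This may be of interest re. high availability

Dear AWS Customer,

Many of you have asked us to let you know ahead of time about features and services that are currently under development so that you can better plan for how that functionality might integrate with your applications. To that end, we are excited to share some early details with you about a new offering we have under development here at AWS - a content delivery service.

This new service will provide you a high performance method of distributing content to end users, giving your customers low latency and high data transfer rates when they access your objects. The initial release will help developers and businesses who need to deliver popular, publicly readable content over HTTP connections. Our goal is to create a content delivery service that:

Lets developers and businesses get started easily - there are no minimum fees and no commitments. You will only pay for what you actually use.
Is simple and easy to use - a single, simple API call is all that is needed to get started delivering your content.
Works seamlessly with Amazon S3 - this gives you durable storage for the original, definitive versions of your files while making the content delivery service easier to use.
Has a global presence - we use a global network of edge locations on three continents to deliver your content from the most appropriate location.

You'll start by storing the original version of your objects in Amazon S3, making sure they are publicly readable. Then, you'll make a simple API call to register your bucket with the new content delivery service. This API call will return a new domain name for you to include in your web pages or application. When clients request an object using this domain name, they will be automatically routed to the nearest edge location for high performance delivery of your content. It's that simple.

We're currently working with a small group of private beta customers, and expect to have this service widely available before the end of the year. If you'd like to be notified when we launch, please let us know by clicking here.

Sincerely,

The Amazon Web Services Team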

score 0 · Accepted Answer

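Your best bet may be to work with experts who do this sort of thing for a living. These guys are actually in our office complex... I've had a chance to work with them on a similar project I was lead on.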

http://www.deltasquare.com/About

score 0 · Accepted Answer

I would suggest you visit the F5 site and check out http://www.f5.com/solutions/virtualization/file/

answered 2009-04-18T21:40:09.703

score 0 · Accepted Answer

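You can look at Mirror File System. It does file replication at the file system level. The same file on both the primary and backup systems is a live file.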

http://www.linux-ha.org/RelatedTechnologies/Filesystems
