
I am getting a "No space left on device" error when running my Amazon EMR jobs with m1.large as the instance type for the Hadoop instances created by the job flow. The job generates approximately 10 GB of data at most, and the capacity of an m1.large instance is supposed to be 2 x 420 GB (according to: EC2 instance types), so I am confused how just 10 GB of data could lead to a "disk space full" kind of message. I am aware that this error can also be produced by exhausting the total number of inodes allowed on the filesystem, but that limit is in the millions and I am pretty sure my job is not producing that many files.

I have also noticed that when I create an EC2 instance of type m1.large on its own, it is assigned a root volume of only 8 GB by default. Could the same thing be happening when EMR provisions its instances? If so, when do the 420 GB disks get allotted to an instance?

Also, here is the output of "df -hi", "mount", and "lsblk":

$ df -hi
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/xvda1              640K    100K    541K   16% /
tmpfs                   932K       3    932K    1% /lib/init/rw
udev                    930K     454    929K    1% /dev
tmpfs                   932K       3    932K    1% /dev/shm
ip-10-182-182-151.ec2.internal:/mapr
                        100G     50G     50G   50% /mapr

$ mount
/dev/xvda1 on / type ext3 (rw,noatime)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/var/run on /run type none (rw,bind)
/var/lock on /run/lock type none (rw,bind)
/dev/shm on /run/shm type none (rw,bind)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
ip-10-182-182-151.ec2.internal:/mapr on /mapr type nfs (rw,addr=10.182.182.151)

$ lsblk
NAME  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
xvda1 202:1    0    10G  0 disk /
xvdb  202:16   0   420G  0 disk 
xvdc  202:32   0   420G  0 disk
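
For reference, one generic way to tell block exhaustion from inode exhaustion and to see what is actually filling the small root volume (a sketch only; assumes shell access to the master node, and the depth/path values are illustrative):

# Compare block usage vs. inode usage on the root filesystem
$ df -h /
$ df -hi /
# Find the largest directories on the root filesystem only;
# -x keeps du from descending into the 420GB disks or the /mapr NFS mount
$ sudo du -xh --max-depth=2 / 2>/dev/null | sort -h | tail -n 15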


1 Answer


With help from @slayedbylucifer, I was able to determine that the problem was that, by default, HDFS on the cluster gets the entire disk space, leaving only the default 10GB mounted at / for the machine's local use. There is an option called --mfs-percentage which can be used (when using the MapR distribution of Hadoop) to specify the disk-space split between the local filesystem and HDFS. The local filesystem quota gets mounted at /var/tmp. Make sure the option mapred.local.dir is set to a directory inside /var/tmp, because that is where all the logs of the tasktracker attempts go, and for big jobs they can grow very large. In my case it was this logging that caused the disk-space error. I set --mfs-percentage to 60 and was able to run the job successfully after that.
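
For illustration, a rough sketch of how the split might be specified when launching the job flow with the legacy elastic-mapreduce CLI; apart from --mfs-percentage and mapred.local.dir (mentioned above), the flag values, edition string, and paths are assumptions and may differ in your setup:

# Hypothetical job flow creation passing --mfs-percentage to the MapR
# installer arguments; values here are examples only
$ elastic-mapreduce --create --alive \
    --instance-type m1.large --num-instances 3 \
    --supported-product mapr \
    --args "--edition,m5,--mfs-percentage,60"

# Then point mapred.local.dir at a directory under /var/tmp (the mount
# that holds the local-filesystem share), e.g. in mapred-site.xml:
#   <property>
#     <name>mapred.local.dir</name>
#     <value>/var/tmp/mapred/local</value>
#   </property>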

answered 2013-12-11T09:38:48.250