Linux: ncdu and mc to manage large directories

Creating, modifying, and deleting files are everyday tasks performed in any operating system, even more so by Sysadmins, Developers, and Programmers. For the most part, these tasks are fast enough when managing a handful of files. However, on Linux and especially with servers, you may, at some time, have to manage millions or even billions of files in a single directory.

For example, three weeks ago, I had to delete a directory containing 711,057,408 files because the StackLinux VPS hosting client could not delete them. I made a note then to blog about that issue. However, there are already many blog posts and Q&A threads with alternative commands for deleting large numbers of files.

So although we will briefly touch on rm command alternatives, I want to focus on two command-line tools for viewing and managing the Linux file system. They are mc and ncdu. More on those later.

“cannot execute [Argument list too long]” when using rm

Have you tried to delete the contents of a directory using the rm command, only to receive the following error? Cannot execute [Argument list too long]. This happens when rm is given a wildcard (for example, rm *) in a directory containing a very large number of files. The shell expands the wildcard into one argument per file, and the kernel refuses to start the command if the combined size of the arguments exceeds the ARG_MAX limit.

Check limit with:

getconf ARG_MAX
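
As a rough illustration, the reported value can be compared against the stack size limit, from which newer Linux kernels derive it. This is only a minimal sketch: the variable names are illustrative, and the exact relationship between the two numbers can differ by kernel and libc version.

```shell
#!/bin/sh
# Print ARG_MAX next to the soft stack limit for comparison.
arg_max=$(getconf ARG_MAX)
stack_kb=$(ulimit -s)    # soft stack limit in KiB, or "unlimited"

echo "ARG_MAX:     $arg_max bytes"
if [ "$stack_kb" = "unlimited" ]; then
    echo "stack limit: unlimited"
else
    echo "stack limit: $((stack_kb * 1024)) bytes"
    echo "stack / 4:   $((stack_kb * 1024 / 4)) bytes"
fi
```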

Since Linux 2.6.23, ARG_MAX is no longer hard-coded (see the git entry). Instead, it is limited to 1/4 of the stack size (ulimit -s), which ensures that the program can still run (see also the git diff of fs/exec.c). The same limit applies to cp, ls, mv, and any other command that is handed too many arguments, so the fix is to use commands that do not pass every file name on a single command line. For example, find -exec or find -delete (faster!):

find . -type f -delete

To delete only files with a specific extension:

find . -name "*.log" -type f -delete
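
To try both find variants safely before pointing them at a real directory, something like the following sketch can be run. It assumes a find that supports -delete (GNU and BSD find do); the scratch directory, file names, and file count are made up for the demonstration.

```shell
#!/bin/sh
# Work in a scratch directory so no real data is touched.
demo_dir=$(mktemp -d)

# Create 500 .log files and 500 .txt files.
i=1
while [ "$i" -le 500 ]; do
    : > "$demo_dir/file$i.log"
    : > "$demo_dir/file$i.txt"
    i=$((i + 1))
done

# Variant 1: find removes the matching files itself, no rm process is started.
find "$demo_dir" -name "*.log" -type f -delete
logs_left=$(find "$demo_dir" -name "*.log" -type f | wc -l | tr -d ' ')

# Variant 2: find hands the names to rm in batches ("+"),
# keeping each rm invocation below the argument size limit.
find "$demo_dir" -type f -exec rm -f {} +
files_left=$(find "$demo_dir" -type f | wc -l | tr -d ' ')

echo ".log files left: $logs_left"
echo "files left:      $files_left"

# Remove the now-empty scratch directory.
rmdir "$demo_dir"
```

Ending -exec with + rather than \; matters here: with \; find starts one rm process per file, which is far slower on millions of files.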

My favorite method, which in my case was about twice as fast as find -delete, uses rsync. Rsync is commonly used for synchronizing files between two different locations, usually remote, but they can also be on the same system. What we want to do is sync the target directory (the directory with a large number of files) with an empty directory. In my case, the /path/to/var/session/ directory had over seven hundred million files (a symptom of session management). So we first create an empty directory, empty_dir/ (it can be named anything):

mkdir empty_dir/

Next, we use rsync's --delete option, the counterpart of the -delete option we used with find. Example:

rsync -a --delete [empty directory] [target directory]

Or, for the case mentioned at the outset:

rsync -a --delete /path/to/empty_dir/ /path/to/var/session/

-a or --archive is equivalent to using -rlptgoD. It is a quick way of saying you want recursion and want to preserve almost everything (with -H being a notable omission). The only exception to the above equivalence is when --files-from is specified, in which case -r is not implied. Note that -a does not preserve hard links because finding multiply-linked files is expensive. You must separately specify -H. (see: man rsync)

What would usually have taken hours completed in about 10 to 15 minutes using rsync.
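
The rsync approach can be rehearsed on throwaway data first. The sketch below is only an illustration: both directories are temporary, the sess_ file names are invented, and the rsync call is skipped with a message if rsync is not installed.

```shell
#!/bin/sh
# Two scratch directories: one to be emptied, one that stays empty.
target_dir=$(mktemp -d)
empty_dir=$(mktemp -d)

# Populate the target with 200 dummy session files.
i=1
while [ "$i" -le 200 ]; do
    : > "$target_dir/sess_$i"
    i=$((i + 1))
done

if command -v rsync >/dev/null 2>&1; then
    # The trailing slashes mean "the contents of" each directory.
    # --delete removes everything in the target that is absent from the source.
    rsync -a --delete "$empty_dir/" "$target_dir/"
else
    echo "rsync is not installed; nothing was deleted" >&2
fi

remaining=$(ls -A "$target_dir" | wc -l | tr -d ' ')
echo "entries left in target: $remaining"

# Clean up the scratch directories.
rm -rf "$target_dir" "$empty_dir"
```

On a real directory, it is worth running the same command with -nv added first (dry run, verbose) to see what would be removed, and double-checking that the source and target are not swapped.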

ncdu (NCurses Disk Usage)

ncdu is a curses-based version of the well-known ‘du.’ It provides a fast way to see which directories are using disk space and to manage them. Users can navigate using the arrow keys and delete files taking up too much space by pressing the ‘d’ key.

To install on Debian or Ubuntu, run:

apt install ncdu

On CentOS, enable the EPEL repo, then install:

yum install epel-release
yum install ncdu

To delete a directory or file, select and press d. Type ? for a list of shortcuts.
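
On a busy server, it can help to separate the slow scan from the browsing. The following is a hedged sketch using ncdu's -x (stay on one file system), -0 (no progress output), -o (export the scan to a file), and -f (load an exported scan) options; the default directory and the variable names are assumptions made for illustration.

```shell
#!/bin/sh
# Directory to scan: first argument, or the current directory by default.
scan_dir=${1:-.}
export_file=$(mktemp)

if command -v ncdu >/dev/null 2>&1; then
    # Scan without opening the browser and save the result to a file.
    ncdu -0 -x -o "$export_file" "$scan_dir"
    echo "Scan saved to $export_file"
    # Browse the saved scan later, without rescanning:
    #   ncdu -f "$export_file"
else
    echo "ncdu is not installed" >&2
fi
```

When browsing a live directory on a production system, ncdu -r opens it in read-only mode, which disables the d key and guards against accidental deletions.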

mc (Midnight Commander) file manager

A much more powerful and feature-filled alternative is Midnight Commander (mc). ‘mc’ is a directory browser/file manager for Unix-like operating systems. Midnight Commander’s features include the ability to view the contents of RPM package files, work with archive formats, and act as an FTP client. Midnight Commander also includes a standalone editor called mcedit.

To install on Debian or Ubuntu, run:

apt install mc

On CentOS:

yum install mc

In mc, select a file or directory and press F8 on the keyboard to delete it. mc and ncdu complement already well-known commands such as ls, du, and df.

Also, read bash: /usr/bin/rm: Argument list too long – Solution

Reference:
mc guide (pdf): http://nawaz.org/media/docs/mc/mc.pdf
man mc: https://linux.die.net/man/1/mc
man ncdu: https://linux.die.net/man/1/ncdu
arg_max: https://www.in-ulm.de/~mascheck/various/argmax/
