How to Make the Linux find Command Execute Faster

find is one of the most useful Linux commands. It searches for files that match the criteria you give it, and can then execute a command (or commands) on every match. In this guide, we’ll show you a few ways to make a find command that runs another program finish a lot more quickly. Check out some of the examples below.

As a test environment, we have about 260 JPG files in a directory, along with some other file types. We’re going to use the find command to execute md5sum on the JPG files, which will give us a checksum of each one. We’ll try several different methods, using the time command to see which gives the speediest result.

Example 1. This is the usual way people use find. For a job like this, it’s actually among the slowest.

$ time find . -iname "*.jpg" -exec md5sum {} \;

real	0m4.575s
user	0m3.848s
sys	0m0.697s

About 4.5 seconds. The problem here is that find launches a separate md5sum process for every file, one after another. Let’s see how to run md5sum on multiple files at once.

Example 2. Rather than use \; at the end of our command, we’ll use +, which tells find to pass as many file names as will fit on one command line to a single md5sum invocation.

$ time find . -iname "*.jpg" -exec md5sum {} +

real	0m3.995s
user	0m3.543s
sys	0m0.447s

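There’s an improvement of a little over half a second, but it’s modest. On commands with a high startup cost, or with a very large number of files, batching can produce a big difference. For md5sum, where nearly all of the time is spent hashing (note the user time), it doesn’t.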
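To see the difference between \; and + without timing anything, you can count how many times the command actually runs. The following is a minimal sketch, not part of the original test: it uses a throwaway directory and three hypothetical empty files, and substitutes echo for md5sum so that each invocation prints exactly one line.

```shell
# Create a throwaway directory with three empty .jpg files (hypothetical names).
demo_dir=$(mktemp -d)
touch "$demo_dir/a.jpg" "$demo_dir/b.jpg" "$demo_dir/c.jpg"

# With \; find starts echo once per file, so we get one output line per file.
per_file=$(find "$demo_dir" -iname "*.jpg" -exec echo {} \; | wc -l)

# With + find appends all the file names to a single echo command,
# so echo runs once and prints a single line.
batched=$(find "$demo_dir" -iname "*.jpg" -exec echo {} + | wc -l)

echo "echo runs, one file per command: $per_file"
echo "echo runs, batched: $batched"

# Clean up the throwaway directory.
rm -r "$demo_dir"
```

With only three files, the batched form runs the command once. With thousands of files, find may still split them into a handful of batches to stay under the system’s command-line length limit.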

Example 3. We could try using xargs instead of the -exec option in find. Maybe that will help? The first command below will process multiple files at once, and the second command processes one at a time with -n 1.

$ time find . -iname "*.jpg" -print0 | xargs -0 md5sum

real	0m3.955s
user	0m3.487s
sys	0m0.469s

$ time find . -iname "*.jpg" -print0 | xargs -0 -n 1 md5sum

real	0m4.597s
user	0m3.858s
sys	0m0.744s

No, using xargs didn’t help much at all, but it can if we use the right options.

Example 4. Okay, this is the one you’re looking for. The best way to speed up find is indeed by using xargs in place of -exec, but also including the -P option in your command, which instructs xargs to run several commands in parallel. The operating system can then spread those processes across multiple CPU cores.

$ time find . -iname "*.jpg" -print0 | xargs -0 -n 1 -P 4 md5sum

real	0m1.526s
user	0m4.736s
sys	0m1.167s

There’s the time savings we’ve been looking for! We’ve used the -P 4 option so xargs runs up to four md5sum processes at the same time, and the -n 1 option so each process receives a single file. Notice that the user time is higher than the real time, which shows the work was spread across several cores. For various commands, you may find that adjusting the -n number leads to better or worse results. In our experiments with md5sum, the sweet spot seemed to be -n 10.

$ time find . -iname "*.jpg" -print0 | xargs -0 -n 10 -P 4 md5sum

real	0m1.421s
user	0m4.671s
sys	0m0.722s
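If it isn’t obvious how -n and -P interact, a small dry run can help. This sketch is only an illustration: it feeds five made-up arguments to xargs with printf (standing in for find -print0) and uses echo in place of md5sum, so you can see how the arguments are grouped.

```shell
# Feed five NUL-separated dummy arguments to xargs, as find -print0 would.
# -n 2 caps each command at two arguments, so echo runs three times.
groups=$(printf '%s\0' one two three four five | xargs -0 -n 2 echo)
echo "$groups"

# Adding -P 4 lets up to four of those commands run at once.
# The number of commands stays the same, but their output order
# is no longer guaranteed, so here we only count the lines.
parallel_runs=$(printf '%s\0' one two three four five | xargs -0 -n 2 -P 4 echo | wc -l)
echo "commands run with -P 4: $parallel_runs"
```

The first command prints "one two", "three four", and "five" on separate lines. A larger -n means fewer process launches, while a smaller -n spreads the work more evenly across the parallel slots, which is why the best value depends on the command and the files.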

Use Multiple CPU Cores

Essentially, the secret here was just to run more than one process at a time by specifying -P, so that more than one CPU core gets used, and to experiment with the -n option in xargs. Note that “cores” in this case means logical CPUs. So, if your CPU has 4 physical cores and simultaneous multithreading (two threads per core), it has 8 logical CPUs.

You can execute the following command to see how many logical CPUs you have.

$ getconf _NPROCESSORS_ONLN
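Rather than hard-coding -P 4, you could feed that number straight into xargs. This is a rough sketch that reuses the -n 10 value from our md5sum test. The cpus variable name and the fallback to 1 are our own assumptions, and one process per logical CPU is a reasonable starting point rather than a guaranteed optimum.

```shell
# Look up the number of logical CPUs, falling back to 1 if the query fails.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "running up to $cpus md5sum processes at once"

# Same pipeline as Example 4, with -P set from the detected CPU count.
find . -iname "*.jpg" -print0 | xargs -0 -n 10 -P "$cpus" md5sum
```

For jobs that are limited by disk speed rather than CPU, a lower -P value may work just as well, so it’s worth timing a few settings on your own files.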
