find is a versatile Linux command that searches for files and can then run a command (or several commands) on every match. In this guide, we’ll show you a few ways to make your find command run much faster. Check out the examples below.
As a test environment, I have about 260 JPG files in a directory, along with some other file types. We’re going to use the find command to run md5sum on the JPG files, which will give us a checksum of each one. We’ll try several different methods, using the time command to see which gives the speediest result.
Example 1. This is the usual way people use find. For a job like this, it’s actually the slowest.
$ time find . -iname "*.jpg" -exec md5sum {} \;
real 0m4.575s
user 0m3.848s
sys 0m0.697s
About 4.6 seconds. The problem here is that find starts a separate md5sum process for every file, one after another. Let’s see how to run md5sum on multiple files at once.
Example 2. Rather than use \; at the end of our command, we’ll use +, which tells find to pass multiple file names to a single md5sum invocation.
$ time find . -iname "*.jpg" -exec md5sum {} +
real 0m3.995s
user 0m3.543s
sys 0m0.447s
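There’s a slight improvement (about 0.6 seconds), but not much. For some commands, batching makes a big difference; for md5sum, it doesn’t, because most of the time goes to computing the checksums (note the user time) rather than to starting processes.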
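To make the difference between \; and + concrete, here is a small sketch that counts how many times find launches the command in each form. The scratch directory and file names are made up for illustration, and sh -c 'echo run' simply stands in for a real command such as md5sum.

```shell
# Create a scratch directory with five empty files (hypothetical names).
demo_dir=$(mktemp -d)
touch "$demo_dir"/a.jpg "$demo_dir"/b.jpg "$demo_dir"/c.jpg "$demo_dir"/d.jpg "$demo_dir"/e.jpg

# With \; find starts the command once per file, so "run" is printed once per file.
per_file_runs=$(find "$demo_dir" -iname "*.jpg" -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# With + find appends as many file names as fit to a single command line.
batched_runs=$(find "$demo_dir" -iname "*.jpg" -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "per-file form: $per_file_runs invocations"
echo "batched form:  $batched_runs invocations"

# Remove the scratch directory.
rm -rf "$demo_dir"
```

With only five short file names, the batched form needs a single invocation; with very long lists, find may still split the work into a few batches.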
Example 3. We could try using xargs instead of the -exec option in find. Maybe that will help? The first command below processes multiple files at once, and the second processes one at a time with -n 1.
$ time find . -iname "*.jpg" -print0 | xargs -0 md5sum
real 0m3.955s
user 0m3.487s
sys 0m0.469s
$ time find . -iname "*.jpg" -print0 | xargs -0 -n 1 md5sum
real 0m4.597s
user 0m3.858s
sys 0m0.744s
No, using xargs on its own didn’t help: the two timings are nearly identical to Examples 2 and 1, respectively. It can help, though, if we use the right options.
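The way xargs groups file names can be seen without touching any real files. In this sketch, the names are hypothetical and echo is placed in front of md5sum so that xargs prints each command line it would build instead of running it.

```shell
# Five hypothetical NUL-separated names, in the format find -print0 emits.
names() { printf '%s\0' one.jpg two.jpg three.jpg four.jpg five.jpg; }

# Default: xargs packs as many names as fit onto one command line.
all_at_once=$(names | xargs -0 echo md5sum)

# -n 2: at most two names per command line, so three command lines are built.
two_per_run=$(names | xargs -0 -n 2 echo md5sum)

echo "$all_at_once"
echo "$two_per_run"
```

The first echo shows one md5sum command line carrying all five names; the second shows three command lines, the last with a single leftover name.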
Example 4. Okay, this is the one you’re looking for. The best way to speed up find is indeed to use xargs in place of -exec, but with the -P option, which tells xargs to run several processes in parallel so the work can be spread across multiple CPU cores.
$ time find . -iname "*jpg" -print0 | xargs -0 -n 1 -P 4 md5sum
real 0m1.526s
user 0m4.736s
sys 0m1.167s
There’s the time savings we’ve been looking for! We’ve used the -P 4 option so xargs runs up to four md5sum processes at once, and the -n 1 option so each process receives a single file. Notice that user time actually went up (the total CPU work is no smaller), but real time dropped because that work ran in parallel. For other commands, adjusting the -n number could lead to better or worse results. In our experiments with md5sum, the sweet spot seemed to be -n 10.
$ time find . -iname "*jpg" -print0 | xargs -0 -n 10 -P 4 md5sum
real 0m1.421s
user 0m4.671s
sys 0m0.722s
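The effect of -P is easy to observe in isolation. In the rough sketch below, four one-second sleep commands stand in for a slow per-file job (an assumption for illustration, not the md5sum workload measured above), and date +%s is used for coarse whole-second timing.

```shell
# Four one-second tasks stand in for a slow per-file command (hypothetical workload).
start=$(date +%s)

# -n 1 gives each sleep one argument; -P 4 lets up to four of them run at once.
printf '%s\n' 1 1 1 1 | xargs -n 1 -P 4 sleep

end=$(date +%s)
parallel_seconds=$((end - start))

echo "four 1-second tasks with -P 4 took about ${parallel_seconds}s"
```

Run one after another, the same four tasks would need about four seconds; with -P 4 they should finish in roughly the time of a single task.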
Use Multiple CPU Cores
Essentially, the secret here was just to use more than one CPU core by specifying -P, and to experiment with the -n option in xargs. Note that “cores” in this case means logical CPUs. So, if your CPU has 4 cores and two hardware threads per core, it has 8 logical CPUs.
You can execute the following command to see how many logical CPUs you have.
$ getconf _NPROCESSORS_ONLN
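Rather than hard-coding -P 4, the logical CPU count can be fed straight into xargs. The following is a minimal sketch under a few assumptions: the scratch directory and file names are invented so that it runs anywhere, and -n 2 is chosen only so that the six sample files are split across several processes.

```shell
# Number of logical CPUs currently online; fall back to 1 if getconf cannot report it.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Scratch directory with six small files (hypothetical names).
demo_dir=$(mktemp -d)
for i in 1 2 3 4 5 6; do
  echo "sample $i" > "$demo_dir/photo$i.jpg"
done

# Run up to $cpus md5sum processes in parallel, two files per process,
# and count how many checksum lines come back.
checksummed=$(find "$demo_dir" -iname "*.jpg" -print0 | xargs -0 -n 2 -P "$cpus" md5sum | wc -l | tr -d ' ')

echo "checksummed $checksummed files using up to $cpus parallel processes"

# Remove the scratch directory.
rm -rf "$demo_dir"
```

Matching -P to the logical CPU count is a reasonable starting point for CPU-bound jobs like checksumming; for jobs limited by disk speed, a different value may work better, so it is worth timing a few.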