This article walks through the steps to download an entire website from the Wayback Machine at archive.org. This can be done from the Linux command line, with options that let us download a specific snapshot/timestamp of the website, or all captures available on the Wayback Machine.
We’ll be using the open source project Wayback Machine Downloader available on GitHub. Follow the instructions below to see how to use it and begin downloading a website of your choice from the Wayback Machine.
1. Since Wayback Machine Downloader is written in the Ruby programming language, we’ll need to have Ruby installed on our system. Use the appropriate command below to install it on your Linux distribution.
$ sudo apt install ruby      # Ubuntu, Debian, Linux Mint
$ sudo dnf install ruby      # Fedora, AlmaLinux, CentOS, RHEL
$ sudo pacman -S ruby        # Arch Linux, Manjaro
2. Once Ruby is installed, execute the following command to install Wayback Machine Downloader.
$ sudo gem install wayback_machine_downloader
3. To download a website, use the following command syntax. The program will create a websites directory in your present working directory and download the specified website there.
$ wayback_machine_downloader https://example.com
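The two helpers below are a minimal sketch for checking steps 2 and 3:

- have reports whether a command, such as wayback_machine_downloader, is on your PATH.
- backup_dir predicts the default output folder.

Both names are hypothetical and not part of the tool. backup_dir follows the project's README, which says files land in ./websites/example.com/ when no directory is given. It also assumes the URL includes a scheme such as https://.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Hypothetical helper: print the expected default output folder,
# websites/<host>, for a URL such as https://example.com/blog.
# Assumes the URL includes a scheme (http:// or https://).
backup_dir() {
  host=${1#*://}      # drop the scheme, e.g. "https://"
  host=${host%%/*}    # drop any path after the host name
  printf 'websites/%s\n' "$host"
}
```

For example:
$ have wayback_machine_downloader || echo "not on PATH"
$ ls "$(backup_dir https://example.com)"

The first check may fail even though gem install succeeded. In that case, run gem environment and look for the EXECUTABLE DIRECTORY it lists. That is where gem installs programs, and it needs to be on your PATH.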
That’s all there is to it, but there are a few handy options that you may like to know about. Take a look at the examples below, and use one or more of the following options in your commands if they suit you.
1. To specify a different directory for the website to download to, use the -d option.
$ wayback_machine_downloader https://example.com -d /path/to/download
2. To download all timestamps for a given website, specify the -s option.
$ wayback_machine_downloader -s https://example.com
3. You can use the -f (from timestamp) or -t (to timestamp) options to download all captures taken FROM a certain point forward, or all captures up TO a certain point. Specify the capture date in the 14-digit format (YYYYMMDDhhmmss) seen inside a captured URL (e.g. http://web.archive.org/web/20060716231334/http://example.com). Specify one or both of these options in your command to select a specific date range.
$ wayback_machine_downloader http://example.com -t 20100916231334
OR
$ wayback_machine_downloader http://example.com -f 20100916231334
OR
$ wayback_machine_downloader http://example.com -f 20100614225930 -t 20100916231334
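Working out 14-digit timestamps by hand is error-prone, and when both options are given the -f value should be earlier than the -t value. The shell functions below are a sketch of a few small conveniences for this. All four names are hypothetical.

- wb_ts converts a date to the YYYYMMDDhhmmss format. It treats the input as UTC, which is what Wayback timestamps use. It relies on GNU date's -d option, so it will not work with the BSD date on macOS.
- ts_from_url pulls the timestamp out of a capture URL.
- valid_range checks that a range runs forward.
- list_snapshots lists the per-capture folders that, according to the project's README, -s creates inside the site folder.

```shell
# Hypothetical helper: convert a UTC date/time to a 14-digit Wayback
# timestamp (YYYYMMDDhhmmss). Requires GNU date for the -d option.
wb_ts() {
  date -u -d "$1" +%Y%m%d%H%M%S
}

# Hypothetical helper: extract the timestamp from a capture URL such as
# http://web.archive.org/web/20060716231334/http://example.com
ts_from_url() {
  printf '%s\n' "$1" | sed -nE 's#^.*/web/([0-9]{14})[^/]*/.*$#\1#p'
}

# Hypothetical helper: succeed only if both arguments are 14-digit
# timestamps and the first (-f) is not later than the second (-t).
valid_range() {
  for ts in "$1" "$2"; do
    printf '%s\n' "$ts" | grep -Eq '^[0-9]{14}$' || return 1
  done
  [ "$1" -le "$2" ]
}

# Hypothetical helper: list the timestamp folders inside a site backup
# made with -s (e.g. websites/example.com), oldest first. Equal-length
# digit strings sort chronologically as plain text.
list_snapshots() {
  ls -1 "$1" | grep -E '^[0-9]{14}$' | sort
}
```

For example:
$ wayback_machine_downloader http://example.com -f "$(wb_ts 2010-06-14)" -t "$(wb_ts 2010-09-16)"
$ list_snapshots websites/example.com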
There are even more options available, but these are the commands we found most useful. Refer to the help page or GitHub repo for more information.
$ wayback_machine_downloader --help