[[!tag rust summain]]
I've been learning Rust lately. As part of that, I rewrote my summain program from Python to Rust (see summainrs). It's not quite a 1:1 rewrite: the Python version outputs RFC822-style records, the Rust one uses YAML. The Rust version is my first attempt at using multithreading, something I never added to the Python version.
Results:
- Input is a directory tree with 8.9 gigabytes of data in 9650 files and directories.
- Each file gets stat'd, and regular files get SHA256 computed.
- Run on a ThinkPad X220 laptop with a rotating hard disk: two CPU cores, four hyperthreads. Mostly idle, but with desktop-y things running in the background. (Not a very systematic benchmark.)
- Python version: 123 seconds wall clock time, 54 seconds user, 6 seconds system time.
- Rust version: 61 seconds wall clock (50% of the time), 56 seconds user (104%), and 4 seconds system time (67%).
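The percentages in the Rust line compare each Rust figure with the corresponding Python figure. A quick check of that arithmetic, using only the numbers from the list, also shows that finishing in half the wall-clock time is roughly a 2x speedup. The function names here are just for illustration.

```rust
/// Rust time as a percentage of the Python time for the same measurement.
fn percent_of(rust: f64, python: f64) -> f64 {
    rust / python * 100.0
}

/// How many times faster the Rust version is (Python time / Rust time).
fn speedup(python: f64, rust: f64) -> f64 {
    python / rust
}

fn main() {
    // (label, Python seconds, Rust seconds), copied from the results above.
    let rows = [("wall clock", 123.0, 61.0), ("user", 54.0, 56.0), ("system", 6.0, 4.0)];
    for (label, py, rs) in rows {
        println!("{label}: Rust takes {:.0}% of the Python time", percent_of(rs, py));
    }
    println!("wall-clock speedup: {:.2}x", speedup(123.0, 61.0));
}
```

User time barely changes (the same hashing work is done either way); the wall-clock gain comes from spreading that work over more cores at once.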
A nice speed improvement, I think, especially since the difference
between the single-threaded and multithreaded versions of the Rust
program is four characters (`par_iter` instead of `iter` in the
`process_chunk` function).
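`par_iter` is the parallel-iterator method from the rayon crate: swapping `iter` for `par_iter` turns a sequential iterator chain into one whose items are processed on rayon's thread pool. The actual `process_chunk` from summainrs isn't shown here, so what follows is a hypothetical, dependency-free approximation of the same idea. Assumptions: `FileRecord`, `process_file`, `process_chunk_serial` and `process_chunk_parallel` are made-up names; `std::thread::scope` (Rust 1.63+) stands in for rayon; and std's `DefaultHasher` stands in for SHA256, so the checksum is **not** cryptographic.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::Hasher;
use std::io::{self, Read};
use std::path::{Path, PathBuf};
use std::thread;

/// One record per path (hypothetical; the real program writes YAML).
#[derive(Debug, PartialEq)]
struct FileRecord {
    path: PathBuf,
    size: u64,
    // Stand-in checksum for regular files; the real program computes SHA256.
    checksum: Option<u64>,
}

/// Stat a path (without following symlinks) and checksum it if it is a regular file.
fn process_file(path: &Path) -> io::Result<FileRecord> {
    let meta = fs::symlink_metadata(path)?;
    let checksum = if meta.file_type().is_file() {
        let mut hasher = DefaultHasher::new();
        let mut file = fs::File::open(path)?;
        let mut buf = [0u8; 64 * 1024];
        loop {
            let n = file.read(&mut buf)?;
            if n == 0 {
                break; // end of file
            }
            hasher.write(&buf[..n]);
        }
        Some(hasher.finish())
    } else {
        None
    };
    Ok(FileRecord { path: path.to_path_buf(), size: meta.len(), checksum })
}

/// Single-threaded: the `iter()` version.
fn process_chunk_serial(paths: &[PathBuf]) -> Vec<io::Result<FileRecord>> {
    paths.iter().map(|p| process_file(p)).collect()
}

/// Multithreaded stand-in for the `par_iter()` version: split the chunk
/// into one slice per available CPU and process each slice on its own thread.
fn process_chunk_parallel(paths: &[PathBuf]) -> Vec<io::Result<FileRecord>> {
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Ceiling division; max(1) avoids chunks(0), which panics on empty input.
    let per_worker = ((paths.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = paths
            .chunks(per_worker)
            .map(|part| s.spawn(move || process_chunk_serial(part)))
            .collect();
        // Joining in spawn order keeps results in input order,
        // as rayon's collect() on par_iter() also does.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() -> io::Result<()> {
    // Demo: process the entries of the current directory.
    let paths: Vec<PathBuf> = fs::read_dir(".")?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<_>>()?;
    for record in process_chunk_parallel(&paths) {
        println!("{:?}", record);
    }
    Ok(())
}
```

Rayon makes the real change far smaller than this: it handles splitting the work and joining the results, which is why the single-threaded and multithreaded versions can differ by just the `par_` prefix.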