A random thought that Might Work, Who Knows(tm): first replace long-ish repetitions in the input with copy references stored separately from the literals (like zstd does). Then run just the blocks of literals through the BWT before entropy coding.

The idea is that the BWT can exploit however much context remains once the copies are snipped out, whether that's one byte or a few, and shrinking the input with the LZ-ish step may also make it faster. It might be strictly worse than PPM or basic context modeling like Brotli's, though; either of those can be "aware" of the preceding bytes even when those bytes come from copies rather than literals.
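To make the shape concrete, here's a toy sketch of that pipeline (all hypothetical and brute-force, nothing like a real codec): a greedy quadratic LZ77-style parse that splits the input into copy tokens and a literal stream, then a naive BWT over just the literals. A real implementation would bound the match search and build the BWT from a suffix array before handing the result to an entropy coder.

    def lz_parse(data: bytes, min_match: int = 4):
        """Greedy LZ77-ish parse: emit (offset, length) copy tokens for
        repeats of at least min_match bytes, raw literals otherwise.
        Brute-force O(n^2) match search -- illustration only."""
        i = 0
        tokens = []              # ('copy', offset, length) or ('lit', byte)
        literals = bytearray()   # literal bytes kept as their own stream
        while i < len(data):
            best_len = best_off = 0
            for j in range(i):   # scan every earlier position for a match
                k = 0
                while i + k < len(data) and data[j + k] == data[i + k]:
                    k += 1
                if k > best_len:
                    best_len, best_off = k, i - j
            if best_len >= min_match:
                tokens.append(('copy', best_off, best_len))
                i += best_len
            else:
                tokens.append(('lit', data[i]))
                literals.append(data[i])
                i += 1
        return tokens, bytes(literals)

    def bwt(s: bytes):
        """Naive BWT: sort all rotations, return the last column plus
        the index of the original rotation (needed to invert)."""
        if not s:
            return b"", 0
        n = len(s)
        rows = sorted(range(n), key=lambda r: s[r:] + s[:r])
        last = bytes(s[(r - 1) % n] for r in rows)
        return last, rows.index(0)

    data = b"the quick brown fox jumps; the quick brown fox naps"
    tokens, lits = lz_parse(data)
    transformed, primary = bwt(lits)
    # 'transformed' is what would feed the entropy coder; the copy
    # tokens live in a separate stream, much like zstd's sequences.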

It's implied in "successor to bzip2" and a lot of the comments, but it's worth highlighting that high-compression algorithms, especially ones that are also slow to decompress, are a pretty specialized niche now. Using zstd or brotli at low to medium settings sometimes speeds things up overall, because the reduction in network or storage transfer outweighs the CPU cost (especially when your storage transfers are network transfers). Even when compressing isn't a net speedup, the saved space costs fairly little time. And even lowish levels of zstd and brotli often eke out decent ratios on big inputs: with modern quantities of RAM, their megabytes-long history windows let them exploit long-range matches.
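As a concrete example of the long-window point, here's roughly what it looks like with the python-zstandard bindings (assuming that package; window_log and enable_ldm correspond to the zstd CLI's --long flag, and corpus.bin stands in for any large input):

    import zstandard as zstd

    big_input = open("corpus.bin", "rb").read()  # any large, repetitive blob

    # Low compression level, but a 128 MiB window with long-distance
    # matching enabled, so far-apart repeats still become cheap copies.
    params = zstd.ZstdCompressionParameters.from_level(
        3, window_log=27, enable_ldm=True)
    cctx = zstd.ZstdCompressor(compression_params=params)
    compressed = cctx.compress(big_input)

    # The decompressor must be allowed a matching window size.
    dctx = zstd.ZstdDecompressor(max_window_size=1 << 27)
    restored = dctx.decompress(compressed)
    assert restored == big_input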
