Somewhat surprisingly, tracking bytes_out_len as a running total makes a large difference in performance: it drops the three_level test time from 9.2s to 8.2s. My best guess is that adding with Z_SYNC_FLUSH produces a *lot* of small strings, and we were looping over all of them every time another string was added.

Real-world tests show an improvement, too:
  mysql, repack=2,nocopy time: 59.3 => 57.4
  bzr.dev, repack=2,nocopy time: 9.6 => 9.3
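A minimal sketch of the idea, assuming output chunks are collected in a list named bytes_out. The class name ChunkAccumulator and its add()/finish() methods are hypothetical and not bzr's actual groupcompress code. Each Z_SYNC_FLUSH emits a separate small chunk. Re-summing the list on every add (sum(map(len, bytes_out))) costs O(n) per call, so building a group costs O(n^2) overall. Keeping a running bytes_out_len makes each update O(1).

```python
import zlib


class ChunkAccumulator:
    """Hypothetical sketch: collect zlib output chunks and keep a running
    byte count instead of re-summing the chunk list on every add."""

    def __init__(self):
        self._compressor = zlib.compressobj()
        self.bytes_out = []
        # Invariant: bytes_out_len == sum(map(len, bytes_out))
        self.bytes_out_len = 0

    def add(self, data):
        out = self._compressor.compress(data)
        # Z_SYNC_FLUSH forces pending output out now, so every add()
        # appends another (usually small) chunk to bytes_out.
        out += self._compressor.flush(zlib.Z_SYNC_FLUSH)
        if out:
            self.bytes_out.append(out)
            # O(1) update, rather than sum(map(len, self.bytes_out)),
            # which would walk every chunk added so far.
            self.bytes_out_len += len(out)
        return self.bytes_out_len

    def finish(self):
        # Terminate the zlib stream (default mode is Z_FINISH).
        tail = self._compressor.flush()
        if tail:
            self.bytes_out.append(tail)
            self.bytes_out_len += len(tail)
        return self.bytes_out_len
```

Usage: call add() once per text and read the returned length to decide when a group is full, then call finish(). The joined chunks form a single valid zlib stream.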