---------------------------------------------------------------------------
On a new LZ codec implementation /3

It's actually not so obvious why CM is slower than LZ. It's usually implied that CM encodes data bit by bit, while LZ can encode whole strings, but lzma is based on a bitwise rc anyway, so we can directly compare the counts of rc calls:

+ ccm 5:             size=22004602  rc_count=800,000,000
+ lzma (original):   size=24560059  rc_count=236,322,579
+ bwt+bsc_qlfc:      size=20787422  rc_count=214,320,941 (26790117)
+ ppmd -o5 -m8 -r1:  size=24400108  rc_count=116,848,981

(rc_count is the rangecoder call count; for bitwise coders it's equal to the number of bits encoded)

This means that lzma's LZ transform already compresses enwik8 from 100M to 29,540,323 bytes, even before entropy coding. But from http://encode.ru/threads/379-BWT-with-compressed-input-data?p=7540&viewfull=1#post7540 it's known that an o1 bitcode can compress enwik8 to 34,736,128 (actually better, since it's not huffman), so this part can be significantly improved in CMs as well.

Anyway, CM and "fast algos" seem to be closer in terms of computation than expected. But when the best CM does 5Mb/s and LZMA decoding does 50Mb/s (with the same rangecoder and counters, even), there has to be some explanation. And that's probably the memory access difference: for LZ all statistics fit into L2, and for CM they don't.

An obvious solution is to make a CM with 4-8M memory usage, and indeed, PPMd -o5 -m8 -r1 compresses enwik8 to 24412885 at 8.5Mb/s. But that's still slow compared to LZ decoding. Well, statistical modelling _is_ slow too - more or less the maximum possible at that speed is 2 counters and simple mixing. And the usual issue there is that LZ can afford a 1G window at that speed, while CM cannot. The reason for that is asymmetry - LZ compression can easily be slower than CM, but its decoder doesn't have to access the statistics of all the possibilities.
In that sense, the main difference between LZ and CM is that LZ encodes distances, while CM encodes word indexes. And the main reason for the asymmetry is length coding, which can also be added to CM - as the length of the sequence until a misprediction (coding of a bit with p<0.5) - though that would require some masking. Another possibility (for CM asymmetry) is coding a flag for each byte, which would say whether this byte is referenced in the future. It's kinda hard to think of a corresponding masking scheme for this, except for the obvious update exclusion. The LZ way has other benefits too, though - like taking distance alignments into account, or excluding some blocks, or coding with continuations. Unfortunately it's basically impossible to mix these - integrating distance probabilities by referenced string prefix would be too slow.