---------------------------------------------------------------------------
Input compression techniques

Memory access has recently become very slow (compared to arithmetic), so it now makes sense to compress data before storing it - probably even page compression schemes can be useful (again). But the best opportunity for speed optimization appears when it's possible to perform the required operations directly on compressed data, without decoding. We save time on memory access, and in addition any sequential processing (string comparison/sorting, entropy coding) requires fewer iterations. There's a problem though - the compression method can't be spatially adaptive, because it has to provide random access to all the data.

I. Candidate compression methods

1. Static huffman compression (possibly o1 or o2) - optimal bitcode size and good access speed, but it changes the lexicographic order of compressed strings.

2. Optimal ordered code - an optimal bitcode which preserves the original alphabet order; however, compression is noticeably worse than with huffman.

3. Arithmetic code - the best compression and, in theory, preserves the lexicographic order of strings, but the code produced for the same string depends on the initial range value.

4. Arithmetic code with huffman freqs - the huffman code provides the optimal alignment of freqs to 2^n, and the arithmetic code then preserves the lexicographic order.

5. Fritcoding - two-stage coding where the 1st stage produces bits with a known skewed probability distribution, and the 2nd stage encodes fixed-size chunks of skewed bits (aka fractional bits, or "frits").
> tested, but likely not practical

6. Nonprefix bitcoding - it's possible to add special "escape codes" and implement a stable backtracking decoder which is not limited by huffman's prefix property. This method provides higher compression than huffman and has comparable encoding and decoding speeds. Unfortunately the alphabet order is not preserved, and a fast code generation method is unknown.
7. MSB-like higher-order compression. The context order in normal methods (huffman etc.) is limited because of speed/memory issues, but it may also be possible to quantize a longer context to 8-16 bits (the equivalent of o1-o2), which would be more efficient than 8-16 bits of direct context. The main expected benefit of this approach is that the same code can be used to compress any kind of data, so it can be used in sequential compression methods, while huffman coding etc. imply blockwise processing with possible additional redundancy (we may need to encode the huffman tables if they're required for decoding).
> Compression in MSB_v0 is far from impressive

II. Applications

1. BWT sorting - input compression can be easily applied here, and should especially improve the efficiency of radix sorting. Contextual compression significantly reduces the size of the actual input data (and thus of the compared strings etc.), but the contexts have to be stored with the pointers, which actually increases the required memory. Thus, the main current idea is the multiscan approach - compression makes it easy to split the set of strings into chunks of near-equal size (simply by dividing 0..2^32-1 into the necessary number of intervals), which lets us extract fixed-size BWT chunks on each pass - obviously with fixed memory use. This approach seems especially attractive given the current trend of adding more cores to cpus - sequential buffer scans are much cheaper than random access, and it's also possible to pass the generated BWT chunks to the postcoder without waiting for the whole block.

1a. Interval coding - arithmetic coding actually can be used if we don't restrict ourselves to actual compression. It's possible to encode substrings into fixed-size units (eg. 32-bit), which can be done in O(N) time recursively, and then all the possibilities of huffman coding apply, but with improved compression due to entropy coding (it doesn't save memory, but still reduces the iterations of string comparisons).
> Unfortunately with O(N) coding it's hard to determine the capacity of units
> (how many symbols are stored in each AC-coded unit), and when we know the
> capacity, the speed becomes O(N^2) (potentially).

2. LZ matchfinding. LZ with large windows has become pretty important, and input compression allows a significant (easily 2-3x) increase of the window size within the same memory. But LZ is (normally) sequential, and global static bitcoding looks like a very bad idea. Splitting the window into multiple blocks with independent compression can solve the compression ratio issue, but we mostly need a large window to find rare strings, and it's hard to determine whether a string exists anywhere, so with multiple blocks it would be slow. That's where MSB comes into the picture - it's especially compatible with LZ, because in LZ it's possible to look up the short matches directly (in hashtables), so a longer compression context won't be an issue.

3. CM speed optimization. Input compression directly affects the speed of a bitwise CM, because it simply reduces the number of bits to model and encode. But in CM the lower-order contexts are much more important, so it may be hard to apply an o4 MSB or something similar. Also, with hashtables it's impossible to remap the statistics to different codes once per block. Instead, we can use adaptive compression in this case (for symbol decomposition), which may be a good workaround (ppmd's symbol ranking is an example of that). However, a good CM also needs a data window, so it won't be a wonder if we actually need two different kinds of codes. Unfortunately adaptive decomposition is not compatible with probability-based counters, especially fsm counters, so static codes are still important here.

4. Bitmask compression. This is somewhat unrelated to all the previous stuff, but it's still a compression application. The idea is to compress the tree nodes by replacing empty leaves with bit flags.
So to a 256-symbol table we can add a 32-byte bitmask which shows which symbols are actually present.

III. Further research

1a. A precise arithmetic code not limited to 2^n freqs.
1b. An arithmetic code with redundancy, to preserve the order of strings despite the imprecision.
2. A complete BWT compressor with input compression and multipass radix sorting, preferably GPU-based.
3. LZ with window compression.
4. CM with efficient symbol decomposition, bitmask compression of tree nodes, and window compression.