---------------------------------------------------------------------------
Transfer+Storage solutions

1. Offline compression + realtime decoding

Sometimes we can prepare the data for downloading in advance, including installers and such. In this case asymmetric methods with slow compression can be used, as well as various kinds of dictionaries built for random access.

1.1. Sequential access

Many fast methods are blockwise, so they require downloading a whole block first, then decoding it. This can still run in parallel with the download, but there's a considerable delay. It is actually most convenient when we can immediately decode some data from just a few available compressed input bytes.

1.2. Sequential multi-threaded

Sometimes we want to read/decode multiple streams of data and produce a single output stream. This can be implemented with large-block random access methods, but that's somewhat redundant. A more efficient solution could use the data from neighbour streams with some delay (i.e. not the most recent data).

1.3. Random access

With random access, we have to be able to decode each block of data independently, and these blocks are usually small (e.g. 4k). But with offline compression we can build a common dictionary which can be used within all blocks. There are commonly lots of other specifics though, like arithmetic coding that preserves the lexicographic order of compressed records, end-of-record coding in that case, and dictionary references breaking the lexicographic order.

2. Realtime compression + decoding

Again, there's a delay problem, but there are others too: the need for high compression speed, much lower memory use (to allow compressing multiple streams at once), and the inability to use multi-pass processing and analysis. Also, sometimes there's a fixed maximum bitrate known for the channel, which is a problem: in offline schemes we can discard a whole compressed file if it expanded, but that's not so easy in no-delay online systems.
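The shared-dictionary idea from [1.3] can be sketched with zlib's preset-dictionary support. This is a minimal sketch; the record contents and the dictionary here are made up for illustration, and a real system would build the dictionary offline from a corpus of typical records:

```python
import zlib

# Hypothetical short records (~40 bytes) that share most of their vocabulary.
records = [b"user=alice;action=login;status=ok;ts=1",
           b"user=bob;action=logout;status=ok;ts=2"]

# A common dictionary built offline from typical record contents.
shared = b"ts=;status=ok;action=login;action=logout;user="

def pack(block: bytes) -> bytes:
    # Each block is compressed independently, but may reference the dictionary.
    c = zlib.compressobj(level=9, zdict=shared)
    return c.compress(block) + c.flush()

def unpack(blob: bytes) -> bytes:
    d = zlib.decompressobj(zdict=shared)
    return d.decompress(blob) + d.flush()

for r in records:
    blob = pack(r)
    assert unpack(blob) == r  # any block decodes on its own, given the dictionary
```

On records this short, plain per-block zlib barely compresses at all, while dictionary references typically shrink the block noticeably; that is the whole point of building the dictionary offline.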
Basically, we have to start by sending uncompressed data, while measuring its redundancy in the background, and a larger delay allows us to do this more efficiently. Sometimes the input data precisely fills the channel bitrate, so it's impossible to even add a flag showing whether the data is uncompressed. Also, in such a case, data redundancy itself cannot be used for data type detection, as incompressible input would likely be indistinguishable from data already compressed with other methods. So there's likely no perfect solution, but one practical possibility is using some kind of (relatively long) signature for compressed blocks, so that the chance of uncompressed data having the same signature would be low (for example, a hash of the compressed data).

2.1. Sequential access

Here we just need the best compression method within given (low-profile) speed and memory requirements (and low delay), but without many further quirks, so many existing codecs apply.

2.2. Sequential multi-threaded

With the lack of (offline) analysis, reusing the data known to be already received on the other side becomes much more important; even feedback from the decoder can be used on the transmitting side (how much data is decoded and available for use as context).

2.3. Random access

This is the simplest case, because all we can do is separately compress the blocks with the best available codec (within given constraints). But at least there's no no-delay requirement here, so more codecs fit (like LZ and BWT) and a simpler API can be used, working with whole blocks. Still, it's usually possible to improve things with static analysis, like adding the biggest possible dictionary to the codec itself etc, so it's like a static version of [1.3].

3. Bidirectional protocols

- updates (local diffs)
- online backup (remote diffs)
- p2p source finding (finding matches without actual data)
- file recovery after transmission
- file recovery without communication
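The compressed-block signature idea from section 2 can be sketched as follows. This is an illustrative sketch only (zlib and an 8-byte SHA-256 prefix stand in for whatever codec and hash a real system would use): a block is sent compressed only when it actually shrinks, and the decoder distinguishes the two cases by the signature alone, with no flag byte needed:

```python
import hashlib
import zlib

SIG_LEN = 8  # a 64-bit signature: ~2**-64 chance of a false positive

def encode_block(data: bytes) -> bytes:
    packed = zlib.compress(data, 6)
    if len(packed) + SIG_LEN < len(data):
        sig = hashlib.sha256(packed).digest()[:SIG_LEN]
        return sig + packed      # compressed block, marked by its signature
    return data                  # expanded block is sent raw, with no flag

def decode_block(block: bytes) -> bytes:
    sig, payload = block[:SIG_LEN], block[SIG_LEN:]
    if hashlib.sha256(payload).digest()[:SIG_LEN] == sig:
        return zlib.decompress(payload)
    return block                 # no valid signature -> data was raw
```

A raw block misdecodes only if its first 8 bytes happen to equal the hash of the remainder, which is the "low chance" the text describes; lengthening the signature trades bandwidth for a lower false-positive probability.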
4. Existing codecs and file compressors

- experimental LZ77 codecs

There are many of these, and they mostly show better results than "professional" codecs like zlib, but they commonly don't have any API usable for the applications listed above, use a lot of memory, can compress slower than CM or with a much worse ratio, and frequently work with large blocks (sometimes whole files). Now, imagine that we accept a request to transmit a file: we have to compress it, which takes some time, then transmit it, then decode it, and these times add up instead of overlapping. That is bad however small they are, because if the transmission speed is slow enough (and realistically it is slow - even 3-5MB/s), the delay makes the overall result worse than a much slower CM with no delay and better compression.

Tlz = S/Csp1 + S*Crt1/Tsp + S/Dsp1
vs
Tcm = Max( S/Csp2, S*Crt2/Tsp, S/Dsp2 )

Now, let's assume Csp1=Dsp1=100MB/s, Crt1=0.314801710 (etincelle LTCB = 314801710), Csp2=Dsp2=5MB/s, Crt2=0.182784655 (ccm LTCB = 182784655), and solve Tlz < Tcm (i.e. find when LZ results in faster transmission), with x=Tsp and S=1:

Tlz = 2/100000000 + 0.314801710/x
Tcm = Max( 1/5000000, 0.182784655/x )

// http://www.wolframalpha.com/input/?i=Solve[+0.182784655/x+%3C+1/5000000,+x]
0.182784655/x < 1/5000000
x > 913923

Case x > 913923:
// http://www.wolframalpha.com/input/?i=Solve[+2/100000000+%2B+0.314801710/x+%3C+1/5000000,+x]
2/100000000 + 0.314801710/x < 1/5000000
x > 1748900

Case x > 0 && x < 913923:
// http://www.wolframalpha.com/input/?i=Solve[+2/100000000+%2B+0.314801710/x+%3C+0.182784655/x,+x]
2/100000000 + 0.314801710/x < 0.182784655/x
no solution

So if the transfer speed is less than 1748900 bytes per second (~14mbps), it appears that we get smaller total times using a 20x slower CM whose output is ~42% smaller. Naturally, de/compression speeds faster than the transfer speed don't matter for methods which can be overlapped with transfer.
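The same break-even point can be checked numerically. A small sketch using the speed/ratio figures from the text, with x = Tsp in bytes per second and S = 1 (i.e. times per input byte):

```python
# Assumed figures from the text: etincelle (LZ) at 100 MB/s with ratio
# 0.3148, ccm (CM) at 5 MB/s with ratio 0.1828.
Csp1 = Dsp1 = 100e6      # LZ de/compression speed, bytes/s
Crt1 = 0.314801710       # LZ compressed ratio (etincelle, LTCB)
Csp2 = Dsp2 = 5e6        # CM de/compression speed, bytes/s
Crt2 = 0.182784655       # CM compressed ratio (ccm, LTCB)

def t_lz(x):
    # Sequential stages: compress, then transmit, then decompress.
    return 1/Csp1 + Crt1/x + 1/Dsp1

def t_cm(x):
    # Fully overlapped stages: total time is bounded by the slowest one.
    return max(1/Csp2, Crt2/x, 1/Dsp2)

# Scan upward for the break-even transfer speed where LZ first wins.
x = next(v for v in range(1_000_000, 2_000_000) if t_lz(v) < t_cm(v))
print(x)  # ~1748900 bytes/s: below this, the 20x slower CM wins overall
```

Below the break-even speed the transfer stage dominates, so the CM's smaller output beats the LZ's faster (but non-overlapped) processing.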
- experimental ROLZ codecs

In a way a good tradeoff, but they inherit the bad parts of both CM (symmetry) and LZ (bruteforce parsing optimizations, multiple-encoding redundancy, inability to handle complex data syntax), so they don't look really good compared to BWT.

- symbol ranking

These hardly make much sense usually, because the compression loss in such schemes (ranking + simple fast postcoder) is significant compared to CMs using the same data structure, while the ranking itself is not all that fast, so the speed gain isn't worth it.

- BWT codecs

These commonly offer the best ratio/memory/speed trade-off, and they can be partly overlapped with transmission (the postcoding parts), but they're inherently blockwise - with large block sizes, too, to make sense - and are only really good for specific types of data (mostly text).

- PPM codecs

For now there's only one known near-practical implementation - ppmd - but there are still things to solve: how to deal with the compression+time loss on memory overflows, and what to do with "worst cases" where processing becomes too slow (because of linear alphabet scanning) or compression deteriorates due to misdetection of the main model order. Still, this class seems to have some potential, if we can find solutions for its problems.

- Bitwise CM codecs

These seem most practical for now (ccm, m1), but having to process all data bit by bit is really troublesome.

- Bytewise CM codecs

These share all the problems of PPM, with the "bonus" of being slower by definition (mixing is used instead of plain statistics), but they show better compression.

- Future possibilities

The optimal hybrid could be a large-alphabet CM (but with bitwise coding) with parsing of specific data formats (e.g. text as a sequence of words), using direct access/trees for statistics lookups, and with an LZ preprocessor for removal of long/far matches.
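For reference, the ranking step underlying symbol-ranking schemes can be sketched as a plain move-to-front transform over the byte alphabet. This is a simplified stand-in (real SR coders rank within contexts rather than globally), but it shows both the idea - recently seen symbols get small ranks that a simple fast postcoder can encode cheaply - and why the ranking itself is not quite fast:

```python
def mtf_encode(data: bytes) -> list[int]:
    table = list(range(256))
    out = []
    for b in data:
        r = table.index(b)       # linear scan: the speed problem in question
        out.append(r)
        table.pop(r)
        table.insert(0, b)       # move the symbol to the front
    return out

def mtf_decode(ranks: list[int]) -> bytes:
    table = list(range(256))
    out = bytearray()
    for r in ranks:
        b = table.pop(r)
        table.insert(0, b)
        out.append(b)
    return bytes(out)
```

Runs of repeated symbols become runs of zero ranks, which is cheap to postcode; but every symbol still costs a scan and a list update, which is the overhead the comparison with same-structure CMs is about.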