---------------------------------------------------------------------------
Adding extra symbols to a byte stream

Problem: sometimes we have to merge multiple streams into one,
in which case it's necessary to provide a way to identify block
boundaries within the merged stream.

1. From the decoding side, the best way is to have length prefixes
for blocks. But on the encoding side, this requires either random access
to the output file (seek to the stream start and write the header), or
being able to cache whole streams, which is, in general, impossible.
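
A minimal sketch of [1], assuming a seekable output and a 4-byte
little-endian length field (both are illustration choices, not part of
any fixed format; write_block and read_blocks are hypothetical names):

```python
import io
import struct

def write_block(out, chunks):
    """Write one stream as <u32 length><payload> to a seekable file object."""
    header_pos = out.tell()
    out.write(b"\x00\x00\x00\x00")        # placeholder, patched below
    length = 0
    for chunk in chunks:                  # payload arrives piece by piece
        out.write(chunk)
        length += len(chunk)
    end_pos = out.tell()
    out.seek(header_pos)                  # this is the random access that [1] needs
    out.write(struct.pack("<I", length))
    out.seek(end_pos)

def read_blocks(inp):
    """Yield each block's payload; decoding needs no scanning at all."""
    while True:
        header = inp.read(4)
        if len(header) < 4:
            return
        (length,) = struct.unpack("<I", header)
        yield inp.read(length)

buf = io.BytesIO()
write_block(buf, [b"abc", b"de"])
write_block(buf, [b"xyz"])
buf.seek(0)
blocks = list(read_blocks(buf))
```

The seek back to header_pos is exactly what is unavailable when the output
is a pipe or a socket.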

2. Alternatively, we can add length headers (+ some flags) to
blocks of cacheable size. It's surely a solution, but handling is much more
complex than [1], especially at encoding (presuming that i/o operations
are done with aligned fixed-size blocks).
Well, one possible implementation is to write a 0 byte into the buffer,
then stream data until it's filled. So prefix byte = 0 would mean that
there are bufsize-1 bytes of stream data next, and !=0 would mean that
there's less... in which case we would be able to insert another prefix
byte if end-of-stream is reached. This would only work with bufsize=32k
or so, because otherwise the block length would require 3+ bytes to store,
and there would be a problem with handling the case where end-of-stream
is reached when there's only one byte of free space in the buffer.
(One solution to that would be storing 2-byte prefixes to each buffer
and adding 3rd byte when necessary; another is to provide a 2-byte encoding
for some special block lengths like bufsize-2).
Either way it's not so good, because even 1 extra byte per 64k would accumulate
to a noticeable number with large files (1526 bytes per 100M). Also, hardcoding
the block size into the format is bad.
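
A rough sketch of the general idea in [2], not of the 1-byte prefix trick
above: every chunk gets a 2-byte header holding a 15-bit length plus a
"last chunk" flag. The header layout and the names frame_stream and
unframe_streams are assumptions made for this example.

```python
import struct

MAX_CHUNK = 0x7FFF    # 15 bits of length
LAST_FLAG = 0x8000    # top bit marks the final chunk of a stream

def frame_stream(data, chunk_size=MAX_CHUNK):
    """Split one stream into <u16 header><payload> chunks."""
    out = bytearray()
    pos = 0
    while True:
        chunk = data[pos:pos + chunk_size]
        pos += len(chunk)
        last = pos >= len(data)
        out += struct.pack("<H", len(chunk) | (LAST_FLAG if last else 0))
        out += chunk
        if last:
            return bytes(out)

def unframe_streams(blob):
    """Recover the list of streams from concatenated framed streams."""
    streams, current, pos = [], bytearray(), 0
    while pos < len(blob):
        (header,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        length = header & MAX_CHUNK
        current += blob[pos:pos + length]
        pos += length
        if header & LAST_FLAG:            # stream boundary found
            streams.append(bytes(current))
            current = bytearray()
    return streams

a = bytes(range(256)) * 300               # 76800 bytes -> 3 chunks
merged = frame_stream(a) + frame_stream(b"tail")
overhead = len(frame_stream(a)) - len(a)  # 2 bytes per chunk
```

The cost grows linearly with the data: at one header byte per 64k buffer,
100,000,000 / 65,536 rounds up to the 1526 bytes quoted above.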

3. Escape prefix. E.g. EC 4B A7 00 = EC 4B A7,  EC 4B A7 01 = end-of-stream.
Now this is really easy to encode, but decoding is pretty painful - it requires
a messy state machine even to extract single bytes.
But overall it adds the least overhead, so it seems that we still need to find
a good implementation for buffered decoding.
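
To show where the pain comes from, here is one possible sketch of [3]: a
one-line encoder and a decoder that survives arbitrary buffer boundaries.
The class and function names are made up for this example. The mismatch
handling relies on the first marker byte (EC) not recurring inside the
marker; a marker like FF FF FF would need a more general fallback.

```python
MARKER = b"\xEC\x4B\xA7"
CODE_LITERAL = 0x00   # marker + 00 = the marker bytes themselves
CODE_END = 0x01       # marker + 01 = end-of-stream

def encode(data):
    """Mask every marker occurrence, then append the terminator."""
    return (data.replace(MARKER, MARKER + bytes([CODE_LITERAL]))
            + MARKER + bytes([CODE_END]))

class Decoder:
    def __init__(self):
        self.matched = 0        # marker bytes currently held back
        self.finished = False

    def feed(self, chunk):
        """Decode one input buffer and return the payload bytes it yields."""
        out = bytearray()
        for byte in chunk:
            if self.finished:
                break           # sketch: bytes after end-of-stream are ignored
            if self.matched == len(MARKER):
                # The byte after a complete marker is the control code.
                if byte == CODE_LITERAL:
                    out += MARKER
                elif byte == CODE_END:
                    self.finished = True
                else:
                    raise ValueError("unknown control code")
                self.matched = 0
            elif byte == MARKER[self.matched]:
                self.matched += 1
            else:
                # Mismatch: release the held-back bytes, then re-test this byte.
                out += MARKER[:self.matched]
                self.matched = 1 if byte == MARKER[0] else 0
                if not self.matched:
                    out.append(byte)
        return bytes(out)

encoded = encode(b"ab" + MARKER + b"cd")
dec = Decoder()
decoded = b"".join(dec.feed(encoded[i:i + 2]) for i in range(0, len(encoded), 2))
```

The held-back prefix (self.matched) is the state that has to live across
buffer refills, which is what makes a fast buffered decoder awkward.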

3a. Escape prefix with all same bytes (e.g. FF FF FF). Much easier to check,
but runs of the same byte in the stream would produce a huge overhead (like 25%),
and such runs are not unlikely with any byte value chosen for the escape code.
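
One way to arrive at a figure of that order, assuming one extra code byte
per masked marker as in [3] (escape_overhead is a hypothetical helper):

```python
def escape_overhead(data: bytes, marker: bytes) -> int:
    """Extra bytes added by masking every marker occurrence with one code byte."""
    return data.count(marker)     # non-overlapping matches, left to right

run = b"\xFF" * 3000
extra = escape_overhead(run, b"\xFF\xFF\xFF")
print(extra, extra / (len(run) + extra))        # 1000 0.25
print(escape_overhead(run, b"\xEC\x4B\xA7"))    # 0
```

So on a run of FF, every 3 input bytes cost 1 extra byte, and a quarter of
the encoded output is escape codes, while the mixed-byte marker adds nothing
on the same input.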

3b. Escape postfix. Store the payload byte before the marker - then the decoder
just has to skip 1 byte before a masked marker, and 4 bytes for a control code.
So this basically introduces a fixed 4-byte delay for the decoder, while [3]
has a complex path where marker bytes have to be returned one by one.
Still, with [3] the encoder is much simpler (it just has to write an extra 0
when the marker matches), and this doesn't really simplify the buffer processing.
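
One possible reading of [3b], as a sketch only: the code byte (00 = masked
marker, 01 = end-of-stream, the same values as in [3]) is written before
the marker, and the decoder looks 3 bytes ahead of the current byte. The
function names are hypothetical, and the decoder assumes the whole encoded
stream is already in one buffer.

```python
MARKER = b"\xEC\x4B\xA7"

def encode_postfix(data):
    """Put 00 before every marker in the data, then append 01 + marker."""
    # bytes.replace hides the real cost: a streaming encoder would have to
    # hold back output until it knows whether a marker is being completed.
    return data.replace(MARKER, b"\x00" + MARKER) + b"\x01" + MARKER

def decode_postfix(buf):
    """Return (payload, bytes_consumed) for one stream at the start of buf."""
    out = bytearray()
    n = len(MARKER)
    i = 0
    while i + 1 + n <= len(buf):
        if buf[i + 1:i + 1 + n] == MARKER:
            code = buf[i]
            if code == 0x01:
                return bytes(out), i + 1 + n   # skip 4 bytes: control code
            if code != 0x00:
                raise ValueError("unknown control code")
            i += 1          # skip 1 byte; the marker then flows through as data
        else:
            out.append(buf[i])
            i += 1
    raise ValueError("end-of-stream marker not found")

first = encode_postfix(b"ab" + MARKER + b"cd")
payload, used = decode_postfix(first + b"next stream")
```

Every byte goes through the same lookahead test, so there is no held-back
prefix to flush, but the decoder always runs 4 bytes behind its input.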