---------------------------------------------------------------------------
On a new LZ codec implementation /4

- flag per data byte for reference markup

- Is there a way to find a longest recent prefix for a given substring?
  (newer than the substring itself)

* it's possible to do aligned matching, e.g. check the hashtable for an
  8-byte string prefix; if there's a newer offset, the length is >=8 bytes, etc.
  Also, the symbol after a match can't be the symbol that would extend the
  match (otherwise the match would be longer), so some length values can be
  skipped. We'll end up with unary coding for lengths like that, I guess.

* an interesting idea is to decode only distances first

* if we'd encode far references with absolute offsets instead of distances,
  the length masking would automatically apply to all occurrences of similar
  strings at any other offsets. Or, actually, it would become a probability
  competition: other occurrences with shorter distance codes would have
  priority for smaller lengths. Thus it would basically become something like
  PPM with "length until misprediction" coding.

  There's an interesting difference from PPM though: after identifying a
  unique string there'd be no need to continue coding its symbols. And
  another: the reverse masking trick applies here too, so the first symbol
  of the next literal/match would be usable for skipping some length values.

  Length coding is not the only asymmetric thing there though; now that I
  think about it, there's also id coding (literal/match)... or is a literal
  the same as length=1?..

  But still, I guess it might be interesting to make such a PPM/LZ hybrid:
  the "match" concept might really simplify modelling, compared to the CM PoV.
  I mean, if we're identifying a string in the window by coding its prefix,
  then it's natural to only update statistics in the prefix context... same as
  with the distance model in LZ.

* as to alignment context in PPM-like coding, I guess it could still apply as context
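
The aligned-matching check and the length-skipping idea from the notes above can be sketched roughly like this. It is only an illustration under assumptions: the function names (`latest_offsets`, `newer_prefix_offset`, `allowed_lengths`) are made up here, a plain dict stands in for the hashtable, and overlapping matches (reference running past the end of the window) are simply not masked.

```python
def latest_offsets(window: bytes, k: int = 8) -> dict:
    """Map every k-byte string in the window to its latest start offset."""
    table = {}
    for i in range(len(window) - k + 1):
        table[window[i:i + k]] = i  # later offsets overwrite earlier ones
    return table


def newer_prefix_offset(window: bytes, table: dict, pos: int, k: int = 8):
    """Return a newer offset that starts with the same k bytes as window[pos:],
    or None. A hit means the newer occurrence matches for >= k bytes."""
    key = window[pos:pos + k]
    if len(key) < k:
        return None
    newest = table.get(key)
    if newest is not None and newest > pos:
        return newest
    return None


def allowed_lengths(window: bytes, ref: int, next_sym: int, max_len: int) -> list:
    """Lengths that remain possible for a match at `ref` once the symbol
    following the match (next_sym) is known.

    If window[ref + L] == next_sym, the match would have been at least one
    byte longer, so length L can be skipped by the length coder.
    """
    lengths = []
    for length in range(1, max_len + 1):
        cont = ref + length
        if cont < len(window) and window[cont] == next_sym:
            continue  # this length is impossible, skip it
        lengths.append(length)
    return lengths


if __name__ == "__main__":
    w = b"abcdefghXXabcdefghYY"
    t = latest_offsets(w)
    print(newer_prefix_offset(w, t, 0))   # newer copy of "abcdefgh" exists
    print(newer_prefix_offset(w, t, 10))  # already the newest copy
    print(allowed_lengths(b"abcabd", 0, ord("c"), 4))
```

With the masked list in hand, a unary length code only needs one flag per remaining candidate length, which is where the "skipping" saves bits.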