<Shelwien> test2019-04-29 22:58:36
<FunkyBob> o/2019-04-30 00:51:48
 yay, people came :)2019-04-30 00:52:07
<Shelwien> ...and went :)2019-04-30 01:48:01
 but we have bots!2019-04-30 01:48:17
<FunkyBob> :)2019-04-30 01:49:10
<Shelwien> btw, what do you think about fuzzy matches and delta matches (like bsdiff)?2019-04-30 01:50:20
 i think these would allow to make something interesting even without entropy coding2019-04-30 01:50:42
 ...and another thing is secondary LZ2019-04-30 01:51:02
<FunkyBob> delta matches?2019-04-30 01:51:53
 I've been considering doing some sort of entropy-lowering matching2019-04-30 01:52:07
<Shelwien> see how bsdiff works2019-04-30 01:52:18
 http://nishi.dreamhosters.com/u/bsdiff_sh3.rar2019-04-30 01:52:29
 its basically LZ2019-04-30 01:52:42
 although its matchfinder is based on BWT SA2019-04-30 01:52:54
 and instead of replacing strings with references2019-04-30 01:53:17
 it subtracts the matches from data2019-04-30 01:53:30
 so full matches just leave lots of zeroes2019-04-30 01:53:39
 ...which gets removed by a RLE pass later2019-04-30 01:54:15
<FunkyBob> hrm2019-04-30 01:54:39
<Shelwien> but it also allows bsdiff to work with fuzzy matches2019-04-30 01:54:52
 if there're some differences in the middle, they just remain in literal buffer2019-04-30 01:55:16
 and when you subtract code in v2 exe from code in v1 exe2019-04-30 01:55:39
<FunkyBob> hrm... similar to my thinking :)2019-04-30 01:55:40
<Shelwien> you get lots of same code, but addr differences in it2019-04-30 01:56:07
 and deltas of these addrs tend to be the same2019-04-30 01:56:23
 so these match deltas are successfully compressed by 2nd LZ pass2019-04-30 01:56:53
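A minimal sketch of the subtraction idea above (illustrative only, not bsdiff's actual code): instead of replacing a match with a reference, subtract the matched old bytes from the new bytes. Exact matches become runs of zeroes (removed by the RLE pass), fuzzy matches leave small residuals in place, and shifted addresses in exe code become repeating deltas that the second LZ pass can pick up.

    // delta_match: emit byte-wise differences for a match region (mod 256).
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    std::vector<uint8_t> delta_match(const std::vector<uint8_t>& oldv,
                                     const std::vector<uint8_t>& newv,
                                     size_t old_pos, size_t new_pos, size_t len) {
        std::vector<uint8_t> out(len);
        for (size_t i = 0; i < len; i++)
            out[i] = uint8_t(newv[new_pos + i] - oldv[old_pos + i]);
        return out;
    }

    int main() {
        // "v2" differs from "v1" in a single address-like byte.
        std::vector<uint8_t> v1 = {0x55, 0x89, 0xE5, 0x10, 0x90, 0xC3};
        std::vector<uint8_t> v2 = {0x55, 0x89, 0xE5, 0x14, 0x90, 0xC3};
        for (uint8_t b : delta_match(v1, v2, 0, 0, v1.size()))
            printf("%02X ", b);              // prints: 00 00 00 04 00 00
        printf("\n");
        return 0;
    }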
<FunkyBob> well, seems you've solved the last problem I had left :)2019-04-30 01:56:57
<Shelwien> but what I also found2019-04-30 01:57:12
* FunkyBob is currently working on a better structure for finding matches for his "basic lz" to make it faster2019-04-30 01:57:33
 is that bsdiff+lzma works much better than just lzma of same data2019-04-30 01:57:57
<FunkyBob> I also need to take some time and work out how to do some sort of "optimal" parsing2019-04-30 01:58:02
 Shelwien: valuable info2019-04-30 01:58:25
<Shelwien> I think its easier to start with dumb bruteforce matchfinding2019-04-30 01:58:54
<FunkyBob> I have2019-04-30 01:59:02
<Shelwien> like, build a digram hashchain2019-04-30 01:59:06
<FunkyBob> and I have a hash -> fixed array system2019-04-30 01:59:10
<Shelwien> and do full scans on it2019-04-30 01:59:12
 uh, not that2019-04-30 01:59:19
<FunkyBob> "digram hashchain"?2019-04-30 01:59:19
<Shelwien> digram = 2 symbols2019-04-30 01:59:48
<FunkyBob> ahhh2019-04-30 02:00:04
<Shelwien> and hashchain is the data structure which is used in most LZ libs, like zlib2019-04-30 02:00:16
<FunkyBob> yus2019-04-30 02:00:27
<Shelwien> where hashtable keeps the address of last context match2019-04-30 02:00:34
 and then a table parallel to data window (pointer per byte)2019-04-30 02:00:54
 keeps the previous context occurrence addr for each symbol2019-04-30 02:01:19
 so you get a linked list of all occurrences of these 2 bytes2019-04-30 02:02:09
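A minimal sketch of such a digram hashchain (names and layout are mine, not zlib's): head[] maps a 2-byte context to its most recent position, prev[] holds one back-pointer per window byte, and walking prev[] enumerates every earlier occurrence of the same 2 bytes.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct HashChain {
        std::vector<int> head, prev;
        explicit HashChain(size_t window) : head(65536, -1), prev(window, -1) {}

        // Index position i (data[i+1] must be valid).
        void insert(const uint8_t* data, int i) {
            unsigned h = data[i] | (data[i + 1] << 8);  // digram used as hash
            prev[i] = head[h];
            head[h] = i;
        }
        // Count earlier positions sharing the digram at i (match candidates).
        int candidates(const uint8_t* data, int i) const {
            unsigned h = data[i] | (data[i + 1] << 8);
            int n = 0;
            for (int p = head[h]; p >= 0; p = prev[p]) n++;
            return n;
        }
    };

    int main() {
        const uint8_t text[] = "abababab";   // "ab" occurs at 0, 2, 4, 6
        HashChain hc(sizeof(text));
        for (int i = 0; i < 6; i++) hc.insert(text, i);
        printf("%d\n", hc.candidates(text, 6));  // 3 earlier "ab" occurrences
        return 0;
    }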
<FunkyBob> neat2019-04-30 02:02:27
 right now, I'm going to get dinner :)2019-04-30 02:02:33
<Shelwien> its still faster than a linear scan2019-04-30 02:02:35
 and doesn't lose any matches2019-04-30 02:02:53
 though i've been thinking that it might be possible to force linear scans to become practical for small windows2019-04-30 02:03:37
 like with Boyer-Moore LUTs, or some vector magic2019-04-30 02:04:14
 btw, you can also look at this: https://encode.ru/threads/3005-TP-LINK-router-config-compression2019-04-30 02:06:47
 i didn't see format quite like that anywhere else (with interleaved bit and byte streams)2019-04-30 02:07:57
 its kinda annoying to encode, because code bytes have to be arranged to follow decoder logic2019-04-30 02:08:37
 but decoding should have the same speed/complexity as normal LZSS2019-04-30 02:09:16
 while supporting variable-size bitfields for distance/length2019-04-30 02:09:49
*** richox has joined the channel2019-04-30 02:59:34
*** unic0rn has joined the channel2019-04-30 03:21:11
<FunkyBob> back2019-04-30 03:41:54
 well, as I'm sure you've seen from my posts and my code, I'm still new to actually coding compression stuff, despite having studied it for a looong time :)2019-04-30 03:45:32
<richox> :)2019-04-30 03:57:45
<Shelwien> there're plenty of people who just invent compression from scratch for themselves2019-04-30 04:01:42
 like mahoney2019-04-30 04:01:50
 so the only requirement for writing a good compressor is programming in C/C++ :)2019-04-30 04:03:26
 everything else can be always reinvented, when you have a hard case of "not invented here syndrome" 2019-04-30 04:04:53
<FunkyBob> well, I'm only one hamming step from mahoney :P2019-04-30 04:05:14
<Shelwien> btw, did you see DCE?2019-04-30 04:05:25
<FunkyBob> ?2019-04-30 04:05:30
<Shelwien> http://mattmahoney.net/dc/dce.html2019-04-30 04:05:41
<FunkyBob> oh, right... yes2019-04-30 04:06:36
 thanks, though2019-04-30 04:06:40
 I'm partly using this exploration as a way to refresh my C skills2019-04-30 04:08:11
 but also... exploring how different changes match up with my mental models2019-04-30 04:08:24
<Shelwien> even if you don't want STL strings and vectors2019-04-30 04:09:56
 these days its a good idea to use C++ rather than C2019-04-30 04:10:10
 most of C code compiles as C++ anyway, except for some typecasts maybe2019-04-30 04:10:38
 but C++ also has templates2019-04-30 04:10:57
 class methods that let you avoid typing ptr-> before every name2019-04-30 04:11:38
<FunkyBob> most of the good bits of c++ I liked are in C now anyway :P2019-04-30 04:11:47
 there is that, yes2019-04-30 04:11:53
<Shelwien> nope, templates are not2019-04-30 04:11:58
<FunkyBob> no, that's true2019-04-30 04:12:08
<Shelwien> and they're very, very helpful for performance2019-04-30 04:12:12
 since without templates2019-04-30 04:12:19
<FunkyBob> my everyday language is Python, so classes are natural to me:)2019-04-30 04:12:20
<Shelwien> code usually ends up with lots of dynamic memory allocation2019-04-30 04:12:33
 and that's crazily slow these days2019-04-30 04:12:52
 and in C++ with templates2019-04-30 04:13:07
 I can just define everything statically2019-04-30 04:13:36
 for example, reflate library2019-04-30 04:13:50
 it has to decode deflate stream, encode it, then encode the diff with original stream2019-04-30 04:14:33
 and then there's recursion2019-04-30 04:14:52
 ie a deflate stream can contain a pdf file2019-04-30 04:15:02
 with its own deflate streams2019-04-30 04:15:07
 so, the reflate library does stream processing2019-04-30 04:15:45
<FunkyBob> I'll look into it2019-04-30 04:16:07
<Shelwien> takes an input stream, detects deflate in it, unpacks the data + adds diffs for lossless restore2019-04-30 04:16:15
 and there's no dynamic memory allocation at all2019-04-30 04:16:30
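A toy illustration of the templates-for-static-allocation point (my example, not reflate's actual code): with the buffer size as a compile-time parameter, all state can live in fixed storage, so there is no heap allocation on the hot path.

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    template <size_t WINDOW>
    struct StreamState {
        uint8_t win[WINDOW];                 // fixed window, no new/malloc
        size_t  pos = 0;
        void put(uint8_t c) { win[pos++ % WINDOW] = c; }
    };

    static StreamState<1 << 16> state;       // statically allocated, 64K window

    int main() {
        for (int i = 0; i < 100000; i++) state.put(uint8_t(i));
        printf("pos=%zu\n", state.pos);
        return 0;
    }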
<FunkyBob> sounds like what I use AdvanceCOMP for2019-04-30 04:16:41
<Shelwien> well, most of office formats these days use zip containers2019-04-30 04:19:06
 also apk, jar etc2019-04-30 04:19:19
 so any modern archiver needs something like that2019-04-30 04:19:35
 otherwise there's kinda nothing to compress :)2019-04-30 04:19:45
<FunkyBob> yeah, we use xlsx at work, so I repack that2019-04-30 04:21:07
<Shelwien> advancecomp is lossy, right?2019-04-30 04:24:01
 its a different kind of recompression then2019-04-30 04:24:37
 i meant something like https://github.com/schnaader/precomp-cpp2019-04-30 04:26:15
<FunkyBob> no, advancecomp uses a variety of deflate engines2019-04-30 04:26:43
  -1, --shrink-fast Compress fast (zlib)2019-04-30 04:26:57
  -2, --shrink-normal Compress normal (libdeflate)2019-04-30 04:26:57
  -3, --shrink-extra Compress extra (7z)2019-04-30 04:26:57
  -4, --shrink-insane Compress extreme (zopfli)2019-04-30 04:26:57
  -i N, --iter=N Compress iterations2019-04-30 04:27:14
<Shelwien> i mean, it can optimize the container, but can't compress it, then restore as it was, right?2019-04-30 04:27:20
<FunkyBob> it's solely for recompressing deflate streams2019-04-30 04:27:55
 either on their own, or in png or zip files2019-04-30 04:28:01
<Shelwien> but for a normal archiver, like 7-zip, you need a lossless method2019-04-30 04:28:08
<FunkyBob> it's lossless...2019-04-30 04:28:24
 it's deflate2019-04-30 04:28:26
<Shelwien> yes, but processing is lossy2019-04-30 04:28:42
<FunkyBob> well, you won't get back the original archive ... but you will get back the original decompressed data2019-04-30 04:29:46
<Shelwien> btw http://nishi.dreamhosters.com/u/stegzip_v2.rar :)2019-04-30 04:30:04
<FunkyBob> reminded me... I wanted to ping the ULZ thread for a Linux version2019-04-30 04:30:30
<Shelwien> just compile it?2019-04-30 04:30:45
<FunkyBob> last I saw, no source2019-04-30 04:30:53
<Shelwien> https://github.com/encode84/ulz2019-04-30 04:31:14
<FunkyBob> sorry, meant https://encode.ru/threads/550-Ultra-fast-LZ/page82019-04-30 04:31:41
*** richox has left the channel2019-04-30 04:31:47
 must have missed that post2019-04-30 04:32:58
<Shelwien> btw, as to optimal parsing2019-04-30 04:37:41
 its very simple for this kind of coders2019-04-30 04:37:52
 you just have to make a price/choice array with element per symbol2019-04-30 04:38:33
 and at each position try all possible choices (literal(s)/matches)2019-04-30 04:39:02
 pos[i+len].price = min( pos[i+len].price, pos[i].price + price(current_choice) );2019-04-30 04:40:04
 so when you reach the end of data2019-04-30 04:40:23
 at pos[filesize] you would have your optimal compressed size2019-04-30 04:40:41
 and then you have to do a backward pass to reconstruct the choices required to reach this len2019-04-30 04:41:34
 (choices are actually just stored too when new min is found)2019-04-30 04:41:51
 that's it2019-04-30 04:41:55
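A minimal sketch of that price/choice array (the bit costs are invented for illustration: 9 bits per literal, 17 per match; a real coder prices each choice with its model and tries every match length, not just the longest):

    #include <cstdint>
    #include <cstdio>
    #include <limits>
    #include <vector>

    struct Node { uint32_t price; int from; bool match; };
    const uint32_t INF = std::numeric_limits<uint32_t>::max();

    // matches[i] = longest match length available at position i (0 = none).
    std::vector<Node> parse(const std::vector<int>& matches) {
        size_t n = matches.size();
        std::vector<Node> pos(n + 1, {INF, -1, false});
        pos[0].price = 0;
        for (size_t i = 0; i < n; i++) {
            if (pos[i].price == INF) continue;
            if (pos[i].price + 9 < pos[i + 1].price)           // choice: literal
                pos[i + 1] = {pos[i].price + 9, (int)i, false};
            int len = matches[i];                              // choice: match
            if (len >= 2 && pos[i].price + 17 < pos[i + len].price)
                pos[i + len] = {pos[i].price + 17, (int)i, true};
        }
        return pos;  // pos[n].price = optimal size; choices stored via .from
    }

    int main() {
        std::vector<int> m = {0, 0, 0, 0, 4, 0, 0, 0};  // one len-4 match at pos 4
        auto pos = parse(m);
        printf("optimal size: %u bits\n", pos[m.size()].price);  // 4*9+17 = 53
        for (int i = (int)m.size(); i > 0; i = pos[i].from)      // backward pass
            printf("%s ends at %d\n", pos[i].match ? "match" : "literal", i);
        return 0;
    }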
<FunkyBob> yeh, I've readt some posts [from cbloom et al] on how... especially since there's no entropy codec2019-04-30 04:43:55
 right now I don't feel so bad, watching how long ulz is taking2019-04-30 04:44:05
 sure, at level 9 it beats lz4, but not for the time2019-04-30 04:46:13
<Shelwien> encode's usual problem is that he uses getc/putc for i/o :)2019-04-30 04:47:09
<FunkyBob> erk... I stopped that some time back :)2019-04-30 04:47:25
 am reading 8MB at a time...2019-04-30 04:47:29
 was 16M, but... reasons :)2019-04-30 04:47:37
<Shelwien> i checked and ulz has blockwise fread, so i guess i was wrong2019-04-30 04:48:16
 but still, that's 16M blocks, which is also bad for performance2019-04-30 04:48:42
 i/o needs to be either async, mmap, or small aligned blocks, at most 64k2019-04-30 04:49:19
 reading or writing unaligned chunks is slower because OS has to align it for you - it only can write whole clusters at once to file2019-04-30 04:50:13
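A trivial sketch of the small-block reading advice (64k is the figure given above; truly aligned/async i/o would go further, e.g. page-aligned buffers or overlapped reads):

    #include <cstdio>
    #include <cstdlib>

    int main(int argc, char** argv) {
        if (argc < 2) return 1;
        FILE* f = fopen(argv[1], "rb");
        if (!f) return 1;
        enum { BLOCK = 64 * 1024 };          // small block, multiple of cluster size
        unsigned char* buf = (unsigned char*)malloc(BLOCK);
        size_t n, total = 0;
        while ((n = fread(buf, 1, BLOCK, f)) > 0)
            total += n;                      // ...process the block here...
        printf("%zu bytes\n", total);
        free(buf);
        fclose(f);
        return 0;
    }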
<FunkyBob> given deflate is, what, 32k blocks, and lz4 is 64k...2019-04-30 04:50:19
 I should also test if aligning my 8-byte comparison helps2019-04-30 04:50:36
<Shelwien> I solved the i/o problem for myself with coroutines2019-04-30 04:51:00
 so frontend always does aligned i/o2019-04-30 04:51:36
<FunkyBob> so which are your tools?2019-04-30 04:51:37
 yeah, i have a fixed "main.c" I share between my tests2019-04-30 04:52:00
 reads 8M at a time,2019-04-30 04:52:16
<Shelwien> yeah, that's also not too good2019-04-30 04:52:30
 it blocks while reading that2019-04-30 04:52:36
<FunkyBob> point2019-04-30 04:52:47
 but for now it's consistent2019-04-30 04:52:53
<Shelwien> https://github.com/Shelwien/ppmd_sh/blob/master/pmd.cpp#L542019-04-30 04:53:26
 i don't have any bytewise LZs so its hard to compare2019-04-30 04:54:03
<FunkyBob> ah, ppmd2019-04-30 04:54:30
<Shelwien> found some old pictures2019-04-30 05:18:46
 https://sites.google.com/site/shelwien/bcm8e25.png2019-04-30 05:18:49
 (red = memory used, green = bytes read, blue = bytes written)2019-04-30 05:19:06
 (horizontal axis is time in seconds, vertical is volume in bytes)2019-04-30 05:20:25
 https://sites.google.com/site/shelwien/ccm_5.png2019-04-30 05:20:41
 ccm graph is especially nice because it's visible how it takes time to initialize 512M hashtable at start :)2019-04-30 05:21:39
*** richox has joined the channel2019-04-30 05:50:28
<richox> i no longer use c++ after a learned rust2019-04-30 05:51:43
 c++ sucks2019-04-30 05:51:46
<unic0rn> sadly, it sometimes can't be avoided2019-04-30 05:58:09
<FunkyBob> I'd rather avoid language wars in here, if possible :)2019-04-30 05:59:01
<Shelwien> rust uses llvm for compiling, right?2019-04-30 06:01:08
 unfortunately its not good enough yet2019-04-30 06:01:25
 gcc and intelc are better compilers, quite frequently2019-04-30 06:01:43
<FunkyBob> for C, perhaps2019-04-30 06:02:08
<Shelwien> for x86/x64, more like2019-04-30 06:02:31
 llvm is still pretty bad at auto-vectorization2019-04-30 06:03:00
 sometimes it can be better at scalar code - for example, i've got a faster build for precomp with clang2019-04-30 06:03:36
 but overall its still worse2019-04-30 06:04:04
  1.750s 5.687s: raw2unp_gcc820x.exe enwik9.gz x2019-04-30 06:05:14
  1.906s 6.672s: raw2unp_cl601x.exe enwik9.gz x2019-04-30 06:05:14
  1.328s 2.578s: raw2unp.exe enwik9.gz x -- IC19/x64 (1-2.578/3.469)*100 = 25.68%2019-04-30 06:05:15
 for example2019-04-30 06:05:17
 that's my deflate decoder on two 1gb samples2019-04-30 06:05:51
 sure, its getting better with time, but gcc also does2019-04-30 06:06:53
<FunkyBob> on a practical note, what the world needs is better offline deflate and brotle encoders :)2019-04-30 06:07:51
<Shelwien> somehow i dislike brotli a lot2019-04-30 06:08:18
 its encoder levels are unbalanced, thus useless2019-04-30 06:08:43
 we actually tested it for possible integration into our archive format2019-04-30 06:09:38
 but didn't find a use case for it2019-04-30 06:09:51
 either some other codec has better compression2019-04-30 06:10:12
 or brotli encoder is so slow, that nobody would use it2019-04-30 06:10:24
 zstd turned out much more practical2019-04-30 06:11:54
 but then, zstd level profiles are actually generated by a special tool, based on speed/ratio balance2019-04-30 06:12:43
<FunkyBob> yeah, I'm not all that enamoured of it, either... but it's supported2019-04-30 06:13:06
<Shelwien> i want to write my own parsing optimizer for deflate though2019-04-30 06:13:42
<FunkyBob> please do :)2019-04-30 06:14:07
<Shelwien> there won't be any practical use in it though2019-04-30 06:14:43
 there're plenty of these optimizers already2019-04-30 06:15:02
 in fact 7-zip only has an optimizing encoder for deflate2019-04-30 06:15:20
 which is kinda slow2019-04-30 06:15:31
 so at best I can expect combining 7z+defluff results in one coder2019-04-30 06:16:15
 so i'm actually more interested in lzma optimization2019-04-30 06:17:03
 you see, lzma has position context for tokens2019-04-30 06:17:56
 but its matchfinder only collects one most recent match per length2019-04-30 06:18:21
 basically it collects distances for matches from len=2 to 2732019-04-30 06:18:42
 and passes these to optimizer2019-04-30 06:18:56
 but taking into account position context, we'd actually need at least a match per len for each alignment2019-04-30 06:20:15
 and then2019-04-30 06:20:26
 lzma's rep-match system2019-04-30 06:20:37
 actually mostly works because of fuzzy matches2019-04-30 06:21:00
 if you'd look at lzma parsing results2019-04-30 06:22:06
 rep-matches are mostly used to skip "parameters" in patterns2019-04-30 06:23:07
 but lzma matchfinder doesn't have any support for fuzzy matches2019-04-30 06:23:37
 although parsing optimizer does check some specific patterns2019-04-30 06:24:19
 like match-literal-rep_match2019-04-30 06:24:26
 but overall it mostly uses them by accident2019-04-30 06:24:49
 so it won't be surprising if a proper parsing optimizer could improve lzma compression by 10% or so2019-04-30 06:26:00
 at least on structured data, like exes2019-04-30 06:26:08
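For reference, lzma's position context ("posState") is just the low pb bits of the stream position (pb=2 by default), which is why the same candidate match prices differently at different alignments:

    #include <cstdio>

    int main() {
        const unsigned pb = 2;                   // lzma default
        const unsigned posMask = (1u << pb) - 1;
        for (unsigned pos = 100; pos < 104; pos++)
            printf("pos %u -> posState %u\n", pos, pos & posMask);  // 0,1,2,3
        return 0;
    }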
*** Jibz has joined the channel2019-04-30 07:00:41
 still trying to optimize a counter FSM with SAT2019-04-30 07:09:38
 CBMC now loads the generated script and translates2019-04-30 07:11:11
 but can't reach solving stage - it hits 56G memory usage and quits2019-04-30 07:11:55
 I tried to precalculate some stuff to reduce size of generated source2019-04-30 07:12:53
 but looks like it somehow complicated things for CBMC2019-04-30 07:13:18
*** richox has left the channel2019-04-30 08:45:09
*** schnaader has joined the channel2019-04-30 09:28:09
*** schnaader has left the channel2019-04-30 10:17:30
*** Eppie has left the channel2019-04-30 11:28:17
*** Jibz has left the channel2019-04-30 12:00:55
*** Eppie has joined the channel2019-04-30 13:40:08
<Eppie> I've been working on this for a while, do any of you think anyone will find any value in it? https://github.com/andrew-epstein/paq_history2019-04-30 13:41:15
*** Eppie has left the channel2019-04-30 15:03:03
<Shelwien> "history" actually started from this: http://mattmahoney.net/dc/paq.html#neural2019-04-30 20:09:24
 i think its a good idea to archive them all on github2019-04-30 20:10:03
 although i'd appreciate something more detailed2019-04-30 20:10:54
 like the list of involved ideas (stochastic fsm counters?)2019-04-30 20:12:57
 and their sources2019-04-30 20:13:19
 like, SSE was first implemented by Dmitry Shkarin in ppmonstr, based on SEE in C.Bloom's ppmz2019-04-30 20:14:42
 also, another useful project2019-04-30 20:16:25
 would be splitting paq into individual components (rather than single .cpp source)2019-04-30 20:16:45
 and tracking their individual evolution over paq development2019-04-30 20:17:04
<FunkyBob> Shelwien: so, I've taken your advice, and am starting on a new tac... and, wishing TurboDebugger was still a thing :P2019-04-30 20:39:35
 time to learn me some gdb :P2019-04-30 20:40:08
<unic0rn> there are no bugs, only features ;)2019-04-30 21:13:22
<FunkyBob> well, this feature means my code never gets past the first match lookup :/2019-04-30 21:13:59
<Shelwien> i'm using ida pro these days, its almost like turbo debugger :)2019-04-30 21:44:52
 http://nishi.dreamhosters.com/u/ida_dbg1.png2019-04-30 21:48:18
<FunkyBob> looks purdy2019-04-30 21:58:27
 https://www.hex-rays.com/products/ida/debugger/index.shtml ?2019-04-30 21:58:48
<Shelwien> yes, it can be found in torrents, latest available version is 7.02019-04-30 22:35:31
 source-level debug needs exes compiled with debug-info2019-04-30 22:35:49
 also there's the decompiler plugin otherwise2019-04-30 22:36:20
<FunkyBob> well, gdb has a built in tui, which is not so bad...2019-04-30 22:38:01
<Shelwien> for most of the normal problems its more convenient to write some logs anyway2019-04-30 23:01:46
 but sometimes there're compiler-related problems2019-04-30 23:02:16
 where i need to see actual code to understand what happens2019-04-30 23:02:57
 and in these cases ida is useful2019-04-30 23:03:28
*** maniscalco_ has joined the channel2019-04-30 23:13:07
 hi~~2019-04-30 23:13:16
<maniscalco_> hi Eugene. glad to see a channel is back2019-04-30 23:17:35
<Shelwien> not much talking here, as usual :)2019-04-30 23:18:24
 how's your progress btw?2019-04-30 23:22:01
 your code is kinda scary because of STL though :)2019-04-30 23:22:09
<maniscalco_> what, all of that template stuff? 2019-04-30 23:22:54
 definitely get a speed boost though.2019-04-30 23:23:10
<Shelwien> MT itself is okay, sure2019-04-30 23:23:41
 but pthread is one thing, and std::thread is another2019-04-30 23:24:01
 std::thread is not properly supported by all compilers on all platforms2019-04-30 23:24:25
<maniscalco_> which platforms? (I don't get out much)2019-04-30 23:25:20
<Shelwien> i'm still on windows2019-04-30 23:25:31
 gcc/mingw just plain doesn't have std::thread2019-04-30 23:25:42
<maniscalco_> I've been toying with a different approach to recency coding lately. It looks like it's never going to be quiet able to keep up with MTF because it isn't vectorizable 2019-04-30 23:25:50
 Windows? Why?2019-04-30 23:25:56
<Shelwien> there're some unofficial 3rd-party implementations, but when I compiled msufsort with that, it crashed2019-04-30 23:26:34
<maniscalco_> really. hmm. I'll try to find some portable alternative perhaps2019-04-30 23:27:03
<Shelwien> pthread itself is ok2019-04-30 23:27:13
<maniscalco_> ok, well, that wouldn't be too hard to switch to I suppose2019-04-30 23:27:32
<Shelwien> that one is not supported by VS/IC, but there're working implementations2019-04-30 23:27:51
<maniscalco_> I really have to force myself to make time to finish v4 anyhow2019-04-30 23:28:08
<Shelwien> yeah :)2019-04-30 23:28:14
<maniscalco_> instead I'm screwing around with MTF alternatives ...2019-04-30 23:28:32
<Shelwien> well, BWT certainly needs some kind of solution for compression issue2019-04-30 23:28:57
 there's certainly plenty of potential for transform speed improvement2019-04-30 23:29:24
 but what to compress with it?2019-04-30 23:29:33
<maniscalco_> the logical next advance would be context based but also building up a model2019-04-30 23:29:51
 oh, I had some insights into bit wise bwt. 2019-04-30 23:30:06
 I remember that we were discussing it a while back. could it be better than 32N?2019-04-30 23:30:29
 and it definitely can be ... by a lot2019-04-30 23:30:39
<Shelwien> we tried to use BWT/qlfc in our archiver (basically a refactored bsc, because it uses random access to files)2019-04-30 23:30:45
<maniscalco_> what archiver is this?2019-04-30 23:31:04
<Shelwien> i added MT support and everything2019-04-30 23:31:04
 http://nishi.dreamhosters.com/u/7zdll_vF6.rar2019-04-30 23:31:24
 .pa format2019-04-30 23:31:26
<maniscalco_> I'll take a look2019-04-30 23:31:35
<Shelwien> i just added my selection of codecs to .7z basically2019-04-30 23:32:02
 BWT-wise its nothing interesting really2019-04-30 23:32:33
 though you can look at this: http://nishi.dreamhosters.com/7zcmd.html2019-04-30 23:32:47
 one mostly unknown interesting feature of 7-zip is its support for codec trees with multiple output streams2019-04-30 23:33:36
<maniscalco_> I'm pretty sure that bitwise BWT can be produced with a worst case of 8N space2019-04-30 23:33:50
 codec trees? 2019-04-30 23:34:25
<Shelwien> btw, i have a simple bitwise BWT there2019-04-30 23:35:00
<maniscalco_> expand 1 bit to 1 byte?2019-04-30 23:35:22
<Shelwien> yes, but still can estimate compression with it2019-04-30 23:35:45
 actually i needed it for cdm2019-04-30 23:36:04
 which is a kind of universal recompressor for bitcode formats2019-04-30 23:36:19
 cdm itself uses optimal parsing to detect and compress areas with slightly skewed distribution of 0/12019-04-30 23:36:57
<maniscalco_> there are special nuances of bitwise suffix arrays that allow it to be extremely fast for the second stage of the ITS. That insight, plus the fact that binary m99 can encode/decode in the hundreds of MB/sec single threaded is what motivated me to see if I could produce a superior recency transform.2019-04-30 23:37:27
<Shelwien> and bitwise BWT creates these for formats based on huffman coding2019-04-30 23:37:36
 https://encode.ru/threads/2742-Compressed-data-model?p=52493&viewfull=1#post524932019-04-30 23:37:59
<maniscalco_> I'll take a look. Stepping away, have to put the kids to bed.2019-04-30 23:38:58
<Shelwien> ah, ok2019-04-30 23:39:04
 as to codec trees btw2019-04-30 23:39:53
 7-zip has a system designed like directshow system2019-04-30 23:40:32
 https://en.wikipedia.org/wiki/DirectShow#Architecture2019-04-30 23:41:09
 video streams have a complex design with lots of replaceable parts2019-04-30 23:41:43
 container format parser, video/audio/subtitle streams, decoders for these, etc2019-04-30 23:42:29
 and now we kinda have the same situation in compression2019-04-30 23:43:06
 because there're various preprocessing methods for different file formats2019-04-30 23:43:50
 and sometimes preprocessors output multiple streams, which have to be compressed separately, etc2019-04-30 23:44:20
 for BWT thats even more important2019-04-30 23:46:17
* FunkyBob still remembers the derision when BWT was first announced2019-04-30 23:47:33
 derision?2019-04-30 23:48:04
<FunkyBob> yeah... people initially thought it was yet another "infinite compression" scan2019-04-30 23:48:28
<Shelwien> well, I didn't like how they used text preprocessing and compared the results to straight LZ/PPM2019-04-30 23:48:31
<FunkyBob> but upon careful reading, it was clear that's not what they claimed at all2019-04-30 23:48:50
<Shelwien> ah, no, we didn't have that, its a pretty obvious transform, with lots of previously known methods leading to it2019-04-30 23:49:20
 for example, we had static CM2019-04-30 23:49:44
 which generated order* statistics for a file2019-04-30 23:50:02
 then compressed these statistics instead of data2019-04-30 23:50:11
 also, ST2-ST4 are very obvious and easy to make2019-04-30 23:51:18
 so the only real novelty is how simple inverse BWT can be2019-04-30 23:54:36
 but that doesn't affect the compression2019-04-30 23:54:48
 still, there was a lot of excitement for a few years, when it appeared2019-04-30 23:55:59
 and comparing to compression algorithms like LZ77/8 and PPM, its certainly a step forward2019-04-30 23:57:00
 but unfortunately it loses to serial LZ/CM/PPM when they start using different contexts2019-04-30 23:58:15
 like position alignment or fuzzy matches in lzma2019-04-30 23:58:55
<FunkyBob> ok, I've fixed the stalling problem... now I just need to fix the "never finds a match" problem2019-05-01 00:23:43
<maniscalco_> There are improvements yet to be had in BWT. But it's going to require m03 like parsing and building up context models to make any significant advances.2019-05-01 00:24:26
 Plus It has other applications outside of compression2019-05-01 00:24:52
<Shelwien> sure, it even has applications in compression - for efficient data indexing2019-05-01 00:27:56
 i want to try making a LZ matchfinder with it2019-05-01 00:28:18
 by building SAs for small blocks (64k?) and linking them2019-05-01 00:29:08
 for 64k-or-smaller blocks it would be 2N for SA, 2N for links to prev SA, 1N for prefix lengths, 1N for actual data2019-05-01 00:31:03
<FunkyBob> yay, progress!2019-05-01 00:31:57
 first block got compression... 2019-05-01 00:32:02
<Shelwien> :)2019-05-01 00:32:05
<FunkyBob> and fixed2019-05-01 00:32:31
<Shelwien> wanna see the lzma bt4 matchfinder?2019-05-01 00:32:59
<FunkyBob> not sure I can brain enough to grok it right now :)2019-05-01 00:33:48
<Shelwien> http://nishi.dreamhosters.com/u/mf_bt4_v0.rar2019-05-01 00:33:59
 it generates readable output2019-05-01 00:34:19
<FunkyBob> 51,518,3282019-05-01 00:36:32
 is my initial result for enwik82019-05-01 00:36:40
<Shelwien> yeah, better switch to BWT i guess :)2019-05-01 00:37:30
 bsc output for enwik8 is 24M _before_ actual entropy coding2019-05-01 00:37:58
<maniscalco_> what are the steps for BSC again. BWT->QLFC and aynthing else?2019-05-01 00:42:33
<Shelwien> also LZP, segmentation2019-05-01 00:43:01
<maniscalco_> enwik8 is pretty stationary so its not difficult to expect it to be greatly reduced like that before entropy coding2019-05-01 00:43:15
<Shelwien> and QLFC is not just reverse-BWT2019-05-01 00:43:20
<maniscalco_> I'm not overly familiar with QLFC. only in passing2019-05-01 00:43:42
<Shelwien> * reverse-MTF i mean2019-05-01 00:43:44
 its actually reverse-MTF + RLE + VL bitcode + order1 CM2019-05-01 00:44:30
<maniscalco_> reverse mtf? explain please2019-05-01 00:45:35
<Shelwien> http://nishi.dreamhosters.com/u/bsc_usage.txt2019-05-01 00:45:40
<maniscalco_> jesus Eugine. you have a link for everything, don't you?2019-05-01 00:46:06
<Shelwien> that's my file dump, i just upload stuff2019-05-01 00:46:22
 mtf can be better explained in terms of distance coding2019-05-01 00:46:47
 mtf turns symbols into counts of different symbols between two occurrences of current symbol2019-05-01 00:47:32
<maniscalco_> yeah, i'm familar with that. DC is from early 2000s2019-05-01 00:47:37
 Binder, I believe was the guy's name.2019-05-01 00:48:06
<Shelwien> while DC encodes distance not counting new symbols2019-05-01 00:48:17
 and QLFC kinda uses MTF ranks2019-05-01 00:48:44
 but they're counts of new symbols not since last symbol occurrence, like normal2019-05-01 00:49:03
 but _to_ next occurrence2019-05-01 00:49:10
 so statistically its the same MTF2019-05-01 00:49:19
 but decoder knowns the actual symbol when reading its rank2019-05-01 00:49:41
 while normally with MTF we'd read the rank, then decode the symbol from it2019-05-01 00:49:59
 and apparently using actual symbol as context for its rank2019-05-01 00:50:19
 can improve entropy coding a lot2019-05-01 00:50:26
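For contrast, a minimal plain-MTF encoder (this is ordinary MTF, not bsc's QLFC): each symbol is replaced by its rank in a recency list and moved to the front. QLFC keeps the same ranks statistically, but as described above it arranges them so the decoder knows the symbol before reading its rank.

    #include <cstdint>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    std::vector<uint8_t> mtf_encode(const std::vector<uint8_t>& in) {
        std::vector<uint8_t> order(256), out;
        std::iota(order.begin(), order.end(), 0);       // recency list 0..255
        for (uint8_t c : in) {
            int r = 0;
            while (order[r] != c) r++;                  // rank = list position
            out.push_back((uint8_t)r);
            for (; r > 0; r--) order[r] = order[r - 1]; // move-to-front
            order[0] = c;
        }
        return out;
    }

    int main() {
        std::vector<uint8_t> bwtish = {'s','s','s','s','i','i','s','s'};
        for (uint8_t r : mtf_encode(bwtish))
            printf("%d ", r);    // runs of a symbol become runs of zero ranks
        printf("\n");
        return 0;
    }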
 there's been a recent improvement on that, i guess2019-05-01 00:58:09
 https://github.com/loxxous/Behemoth-Rank-Coding2019-05-01 00:58:10
 its similar, but also vectorized2019-05-01 00:58:28
<maniscalco_> yes, lucas' work.2019-05-01 00:58:55
<Shelwien> so, as to m032019-05-01 01:02:09
 i think it extracts the original prefix context for a symbol2019-05-01 01:02:31
<maniscalco_> it parses the boundaries of each context at each order on order at a time.2019-05-01 01:03:11
<Shelwien> so it becomes possible to compress as good as best prefix CM2019-05-01 01:03:17
 but it still processes symbols not in original order, but rather in order of contexts2019-05-01 01:03:47
 there're certainly some benefits to that2019-05-01 01:04:05
 but unfortunately plain 8-bit text is not that popular anymore2019-05-01 01:04:32
<maniscalco_> so if you had the input `swiss miss` then the BWT is `swm ssiiss` 2019-05-01 01:04:33
 The order zero context boundaries are trivially easy to locate for both the encoder and decoder:2019-05-01 01:04:52
<Shelwien> so the question is how to modify BWT2019-05-01 01:05:07
 to at least compress 16-bit alphabet for unicode2019-05-01 01:05:17
 or 2D for pictures2019-05-01 01:05:26
<maniscalco_> s|wm| |ssiis|s2019-05-01 01:05:44
<Shelwien> yeah, as i mentioned before, there've been some static CM attempts2019-05-01 01:06:23
<maniscalco_> there's no reason this couldn't be used for 16 bit symbols as well, really2019-05-01 01:06:46
 you do need a bit more space since symbols are now 2 bytes wide but I think that's it2019-05-01 01:07:08
<Shelwien> for 16bit i think there'd be scaling issues because of cpu cache size2019-05-01 01:07:17
 where with 8-bit symbol everything is nicely in L12019-05-01 01:07:31
 with 16-bit symbols it won't fit2019-05-01 01:07:41
<maniscalco_> sure, there would be a performance issue but in terms of algorithmic performance, its all the same2019-05-01 01:08:02
<Shelwien> anyway, ideally we'd need a generalized transform2019-05-01 01:08:51
 which would sort symbols based on whatever function of context2019-05-01 01:09:12
 its actually possible to implement like ST2019-05-01 01:09:36
<maniscalco_> I thought about that in the past. Somehow consider distance in the sort2019-05-01 01:09:45
<Shelwien> simple compute the index for current symbol and output the symbol to stream[index]2019-05-01 01:10:12
<maniscalco_> so that the transform wouldn't attempt to associate 'similar' contexts that are a million miles appart2019-05-01 01:10:15
<Shelwien> well, that's an big issue with BWT compression of sorted dictionaries :)2019-05-01 01:10:42
<maniscalco_> exactly. and why st4 does better2019-05-01 01:10:53
 it's 'too stupid' to see the really good correlations that are toxic in reality2019-05-01 01:11:41
<Shelwien> also, there was an interesting algorithm for serializing of pictures2019-05-01 01:11:59
<maniscalco_> so it chooses whatever is close by, which in the case of dictionaries, IS the most similar2019-05-01 01:12:02
<Shelwien> https://www.youtube.com/watch?v=cGGzZY8OYts2019-05-01 01:12:24
<maniscalco_> the repo on my github does the full m03 parsing. it wouldn't be hard to modify it to do 16 bit symbols instead.2019-05-01 01:13:18
 that would give you a really good idea as to how much it would degrade2019-05-01 01:13:38
<Shelwien> yeah, but we need it to work not much slower than 8-bit, otherwise there's no point2019-05-01 01:13:53
<maniscalco_> only one way to find out2019-05-01 01:15:10
<Shelwien> i got reminded of that idea by NNCP2019-05-01 01:15:50
 it has a self-made preprocessor which extends the alphabet to 16 bits2019-05-01 01:16:17
 then iteratively finds most frequent strings and assigns them to new symbols2019-05-01 01:16:36
 https://bellard.org/nncp/2019-05-01 01:17:05
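A hedged sketch of that alphabet-extension loop in byte-pair-encoding style (the spirit of the idea, not NNCP's actual preprocess.c): repeatedly give the most frequent adjacent pair a fresh symbol past the byte range, until the cut-off frequency is reached.

    #include <algorithm>
    #include <cstdio>
    #include <map>
    #include <vector>

    int main() {
        std::vector<int> s = {'a','b','a','b','c','a','b'};
        int next_sym = 256;                   // first symbol past the byte range
        for (;;) {
            std::map<std::pair<int,int>, int> freq;
            for (size_t i = 0; i + 1 < s.size(); i++)
                freq[{s[i], s[i + 1]}]++;
            auto best = std::max_element(freq.begin(), freq.end(),
                [](const auto& a, const auto& b) { return a.second < b.second; });
            if (best == freq.end() || best->second < 2) break;  // cut-off frequency
            std::vector<int> t;               // rewrite, replacing the best pair
            for (size_t i = 0; i < s.size(); ) {
                if (i + 1 < s.size() && s[i] == best->first.first
                                     && s[i + 1] == best->first.second) {
                    t.push_back(next_sym); i += 2;
                } else {
                    t.push_back(s[i++]);
                }
            }
            s.swap(t);
            next_sym++;
        }
        for (int c : s) printf("%d ", c);     // "ababcab" -> 256 256 99 256
        printf("\n");
        return 0;
    }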
 https://encode.ru/threads/2702-Adaptive-Lossless-Prediction-impl-in-C2019-05-01 01:21:10
<maniscalco_> so m03 has almost no performance loss for 16 bit symbols2019-05-01 02:05:26
 at least for the parsing2019-05-01 02:05:44
 I kept the BWT the same (8 bit) and then widened the BWT output to 16 bits per symbol2019-05-01 02:06:09
 and process as if they were always 16 bit symbols2019-05-01 02:06:17
 results for 16 bit enwik8: 2019-05-01 02:06:47
 ./build/release/bin/m19 3 ~/programming/testset/enwik/enwik8
 m19 - next generation BWT compressor. Author: M.A. Maniscalco (2018 - 2019)
 *** This build is pre-alpha and demonstrates both M19 and M03 context parsing only ***
 max order = 5317
 elapsed time: 8288 ms, 11.5067 MB/sec2019-05-01 02:06:49
 and 8 bit normal enwik8:2019-05-01 02:07:14
 ./build/release/bin/m19 3 ~/programming/testset/enwik/enwik8
 m19 - next generation BWT compressor. Author: M.A. Maniscalco (2018 - 2019)
 *** This build is pre-alpha and demonstrates both M19 and M03 context parsing only ***
 max order = 5317
 elapsed time: 8035 ms, 11.869 MB/sec2019-05-01 02:07:15
 +250ms2019-05-01 02:07:22
 about 2 sec of that time is just the BWT time2019-05-01 02:07:53
<Shelwien> can you try it with nncp preprocessor?2019-05-01 02:08:23
 https://bellard.org/nncp/nncp-2019-04-13.tar.gz2019-05-01 02:08:45
 just compile preprocess.c2019-05-01 02:09:21
 then run .\preprocess c words.txt enwik9 enwik9.bin 65500 22019-05-01 02:09:56
<maniscalco_> and come to think of it, that 250ms is probably the overhead for widening the array2019-05-01 02:13:58
 will do. just pushing the change to m03/19 first2019-05-01 02:14:16
 https://github.com/michaelmaniscalco/m19.git just in case anyone wants to confirm for themselves2019-05-01 02:22:27
<Shelwien> post bwt its less interesting, especially if you just expand 8 to 16 bits2019-05-01 02:24:31
 but that preprocessor can really use the full 16-bit alphabet2019-05-01 02:25:01
<maniscalco_> nncp-2019-04-13$ make
 gcc -O3 -Wall -Wpointer-arith -g -fno-math-errno -fno-trapping-math -MMD -Wno-format-truncation -c -o nncp.o nncp.c
 gcc -O3 -Wall -Wpointer-arith -g -fno-math-errno -fno-trapping-math -MMD -Wno-format-truncation -c -o cp_utils.o cp_utils.c
 gcc -O3 -Wall -Wpointer-arith -g -fno-math-errno -fno-trapping-math -MMD -Wno-format-truncation -c -o arith.o arith.c
 gcc -o nncp nncp.o cp_utils.o arith.o libnc.a -l2019-05-01 02:25:12
<Shelwien> ?2019-05-01 02:25:29
 no need really2019-05-01 02:25:33
<maniscalco_> libnc needs to be compiled with PIC2019-05-01 02:25:46
 Sure, its more interesting with 16 bit BWT. But I can't just write one of those up in 20 minutes (^:2019-05-01 02:26:42
<Shelwien> yes, i mean, just type "gcc preprocess.c"2019-05-01 02:26:43
<maniscalco_> k2019-05-01 02:26:49
 gcc preprocess.c
 /tmp/ccdVrLNV.o: In function `get_n_bits':
 preprocess.c:(.text+0x11c2): undefined reference to `log2'
 preprocess.c:(.text+0x12a6): undefined reference to `log2'
 /tmp/ccdVrLNV.o: In function `compute_score':
 preprocess.c:(.text+0x13b9): undefined reference to `log2'
 /tmp/ccdVrLNV.o: In function `compute_entropy':
 preprocess.c:(.text+0x1b16): undefined reference to `log2'
 preprocess.c:(.text+0x1c17): undefined2019-05-01 02:27:15
 log2 as in the math function ?2019-05-01 02:27:31
<Shelwien> its probably in headers, let me see...2019-05-01 02:28:03
 well, just define it then2019-05-01 02:28:54
<maniscalco_> added -lm at the end and it worked2019-05-01 02:29:36
<Shelwien> cool :)2019-05-01 02:29:44
<maniscalco_> no words.txt file though2019-05-01 02:34:32
<Shelwien> its generated by preprocessor2019-05-01 02:35:04
<maniscalco_> how long should the preprocessing take?2019-05-01 02:40:59
<Shelwien> algorithm is dumb, so not fast2019-05-01 02:42:07
 depends on last number - its a cut-off frequency2019-05-01 02:42:18
 the higher it is, the earlier it would quit2019-05-01 02:42:41
<maniscalco_> regarding m03 and m19 with 16 bit symbols. I can't test it with real 16 bit data because the algorithm only works for *real* bwt data. if I made fake 16 bit input it wouldn't work.2019-05-01 02:46:08
 To test a real world input I would need a BWT that does 16 bit symbols. Do you know of any?2019-05-01 02:46:45
<Shelwien> i can quickly make a qsort one2019-05-01 02:46:59
<maniscalco_> it would allow us to test the parsing with real data. although it shouldn't affect m03/19 performance at all in any case2019-05-01 02:47:46
 right about now we should be hearing the ghost of David Scott saying "you should be using BWTS instead"2019-05-01 02:49:32
 (^:2019-05-01 02:49:34
<Shelwien> http://nishi.dreamhosters.com/u/BWT16.cpp2019-05-01 02:51:51
 i actually added BWTS and bitwise BWTS to .pa, for cdm2019-05-01 02:52:19
 turned out to be the perfect transform for that kind of thing2019-05-01 02:52:46
<maniscalco_> thanks for the BWT .. still has an issue though. M03/19 needs to know the sentinel position in the BWT2019-05-01 03:04:49
 otherwise it will fail2019-05-01 03:04:56
 so it has to be a BWT that uses $ 2019-05-01 03:05:13
 and the preprocessor is still working away ...2019-05-01 03:05:38
<Shelwien> yeah, i also started it:2019-05-01 03:07:06
 Z:\003>timetest .\a c words.txt enwik9 enwik9.bin 65500 2562019-05-01 03:07:08
 input: 1000000000 bytes2019-05-01 03:07:08
 after case/space preprocessing: 1119415035 symbols2019-05-01 03:07:08
  4396 280965958 3483463112019-05-01 03:07:08
 first number is the number of allocated symbols2019-05-01 03:07:24
 http://nishi.dreamhosters.com/u/BWT16.rar2019-05-01 03:21:55
 more complete version with index and unBWT2019-05-01 03:22:14
 and it kinda uses EOF, if you mean the comparison wrap2019-05-01 03:22:54
 prev version sorts in reverse order2019-05-01 03:29:19
 but this one sorts in forward order2019-05-01 03:29:29
 and both don't wrap around, just stop at end of text buffer2019-05-01 03:29:48
<maniscalco_> ok. I'm going to sign off for the night and leave the preprocessor running overnight. hopefully it will be done by the morning. I'll let you know2019-05-01 03:31:24
<Shelwien> ok :)2019-05-01 03:31:39
 !next2019-05-01 03:34:42