*** NCDR has joined the channel2019-05-06 07:58:56
*** NCDR has left the channel2019-05-06 08:10:29
<FunkyBob> o/2019-05-06 12:54:24
<Shelwien> ?2019-05-06 12:56:53
<FunkyBob> waving hello2019-05-06 13:02:16
 just got to day1 of the PyCon sprints2019-05-06 13:02:33
 what timezone are you in, Shelwien ?2019-05-06 13:02:41
*** maniscalco_ has left the channel2019-05-06 13:08:29
<Shelwien> utc+22019-05-06 13:14:42
 just tested lztp with enwik82019-05-06 13:15:02
 32k window: 40,674,4122019-05-06 13:15:25
 64k window: 40,570,2242019-05-06 13:15:47
 greedy parsing2019-05-06 13:15:54
<FunkyBob> lztp?2019-05-06 13:16:00
 what's the algorithms?2019-05-06 13:16:27
<Shelwien> a mostly-bytewise lz scheme from tplink router2019-05-06 13:16:31
<FunkyBob> ah2019-05-06 13:16:35
 that's quite impressive2019-05-06 13:16:39
<Shelwien> i reverse-engineered it some time ago2019-05-06 13:16:47
 its kinda like lzss2019-05-06 13:16:57
 but with an interesting quirk2019-05-06 13:17:09
<FunkyBob> ulz can get about that, in ~1min on my machine2019-05-06 13:17:16
<Shelwien> d16 encoding takes 6s on enwik82019-05-06 13:17:57
<FunkyBob> d16?2019-05-06 13:18:36
<Shelwien> 1<<16 window2019-05-06 13:18:53
<FunkyBob> ah2019-05-06 13:18:57
<Shelwien> aka 64k2019-05-06 13:18:58
<FunkyBob> for reference, how long does gzip -9 take?2019-05-06 13:19:13
<Shelwien> 5.4s2019-05-06 13:19:40
 4.1s for lztp with d152019-05-06 13:20:43
<FunkyBob> hrm2019-05-06 13:21:21
<Shelwien> anyway, what i wanted to say2019-05-06 13:21:27
<FunkyBob> gzip -9 enwi8 for me takes ~29sec2019-05-06 13:21:34
<Shelwien> dunno, its not at it best atm2019-05-06 13:21:50
<FunkyBob> mmm?2019-05-06 13:21:50
<Shelwien> i have 4 threads of optimizer running2019-05-06 13:22:00
 ...is that lzss just uses packed bits for literal/match flags2019-05-06 13:22:28
 while lztp developer went a step further2019-05-06 13:22:47
<FunkyBob> hmm... must have had my laptop unplugged when I ran that.. it's only ~17sec now2019-05-06 13:23:08
 Shelwien: mmm?2019-05-06 13:23:16
<Shelwien> and turned it into an actual interleaved bitstream2019-05-06 13:23:16
 in lzss its just a flag byte2019-05-06 13:23:34
<FunkyBob> I'm toying with the idea of a larger history window, perhaps 20 bits2019-05-06 13:23:41
 ah, so a byte indicating the next 8 terms?2019-05-06 13:24:02
<Shelwien> but in lztp they also put bits of variable-length len/dist there2019-05-06 13:24:10
<FunkyBob> I was thinking I could use the first nibble for (match, size of len, 2bit size-of-offset)2019-05-06 13:24:36
<Shelwien> and then there's a need to rearrange bytes from bit stream and byte stream2019-05-06 13:24:45
<FunkyBob> where offset would be 4bit, 12bit, 20bit, or rep2019-05-06 13:24:46
<Shelwien> in order of how decoder would read them2019-05-06 13:24:52
<FunkyBob> ah, yes, I recall reading something of this2019-05-06 13:25:08
<Shelwien> but otherwise i think this scheme has potential2019-05-06 13:25:14
<FunkyBob> clearly2019-05-06 13:27:12
<Shelwien> btw, i was talking about this: https://encode.ru/threads/3005-TP-LINK-router-config-compression2019-05-06 13:30:02
<FunkyBob> yeah, I recall the thread2019-05-06 13:33:59
 am building up to trying a more "optimal" parsing... forward scan, reverse walk thing2019-05-06 13:58:22
 clearly it'll be slower, as it will require a match scan at _every_ byte2019-05-06 13:58:41
<Shelwien> yes, but only at encoding, so that's ok2019-05-06 14:13:31
 the common optimization is like "fb" thing in lzma2019-05-06 14:13:52
 you can set the threshold match len for greedy encoding2019-05-06 14:14:07
 ie if len>32, just encode the match2019-05-06 14:14:17
 its also efficient for splitting the data into separately optimized segments2019-05-06 14:14:44
 so lzma manages to be streamable despite optimal parsing2019-05-06 14:15:37
 also, i'd suggest looking into BWT2019-05-06 14:16:33
 its the most efficient way to find all matches at once :)2019-05-06 14:16:58
<FunkyBob> :)2019-05-06 14:17:22
 have just tried gating lazy parsing on match < 32... no noticeable difference for enwik8... trying 92019-05-06 14:21:54
 clearly I need to pick a different cutoff point... but, yah, I see the potential2019-05-06 14:33:50
<Shelwien> there's also usually a depth limit on hash chain ("mc" in lzma)2019-05-06 14:35:17
<FunkyBob> currently I search back as far as the window allows2019-05-06 14:43:06
<Shelwien> yeah, but that can fail on redundant data2019-05-06 14:44:02
<FunkyBob> "fail" how?2019-05-06 14:44:26
<Shelwien> like, generate the file like "abcd" x 10000002019-05-06 14:44:29
 and test the speed2019-05-06 14:44:32
<FunkyBob> right,o k2019-05-06 14:44:44
 degenerate case2019-05-06 14:44:47
<Shelwien> unfortunately its surprisingly common2019-05-06 14:45:04
 just consider filetypes like bmp or wav2019-05-06 14:45:13
 or some log files2019-05-06 14:45:26
<FunkyBob> hrm2019-05-06 14:46:06
<unic0rn> or XPADDING in exe2019-05-06 14:47:12
 altough that's not that long2019-05-06 14:47:25
<Shelwien> exes too, yeah2019-05-06 14:48:16
 they tend to have lots of tables and unrolled code2019-05-06 14:48:26
<FunkyBob> ok, so, thinking out aloud... if I scan forward, and at each position store the offset of the best match, but at the _end_ [here + match len] store the len... I can traverse back easily by knowing how far back to look for the match2019-05-06 15:12:26
 and only update the length if it's longer than the preveious stored one2019-05-06 15:12:48
<Shelwien> yes2019-05-06 15:14:33
 well, in general its slightly different2019-05-06 15:15:08
<FunkyBob> then again, I can store the full match details 2019-05-06 15:15:11
 at the offset of then end of the match2019-05-06 15:15:22
 Shelwien: oh?2019-05-06 15:15:26
<Shelwien> 1) you have to replace the token if its codelen is smaller, rather than just match len2019-05-06 15:15:46
<FunkyBob> yes2019-05-06 15:16:01
 but for now, all my tokens are the same size2019-05-06 15:16:09
<Shelwien> for example, shorter match might have shorter code2019-05-06 15:16:10
 2) you also have to count literal runs the same way2019-05-06 15:16:28
<FunkyBob> shall we say "only update if it's of a greater saving"2019-05-06 15:16:33
<Shelwien> yes2019-05-06 15:16:45
 its basically recursion2019-05-06 15:17:27
<FunkyBob> interesting... lz4 works on a max of 4MB blocks2019-05-06 15:18:07
*** danlock has joined the channel2019-05-06 16:08:16
<Shelwien> hi2019-05-06 16:09:55
<danlock> Greetings. I'm just looking around; I just came from the encode forums.2019-05-06 16:12:07
<FunkyBob> o/2019-05-06 16:12:50
 I'm playing around with possible lit/mat/offset encodings... find something decent that's not just "LZ4 again"2019-05-06 16:14:44
<danlock> :) I appreciate the welcome. I don't have any compression thoughts or questions right now, though. @FunkyBob good luck!2019-05-06 16:15:10
<Shelwien> FunkyBob: btw, how about making a lookup table for actual len values?2019-05-06 16:16:00
 danlock: it doesn't really have to be compression2019-05-06 16:16:24
 we just had a discussion about AI and quantum mechanics :)2019-05-06 16:16:42
<FunkyBob> Shelwien: hrm2019-05-06 16:16:56
<Shelwien> like, all len values up to 16 or so, and then something exponent-based2019-05-06 16:17:40
 and pack the len to 5 bits or so, the use saved bits for flags2019-05-06 16:18:11
<danlock> It's good to know I'm in here with like-minded people.2019-05-06 16:18:49
<FunkyBob> Shelwien: I wonder how much I'd lose by lowering match resolution as length grows...2019-05-06 16:20:16
<Shelwien> based on freqtables that i made - not much2019-05-06 16:20:51
<FunkyBob> like you were daying, exponent based.... encode unit + multiplier2019-05-06 16:21:06
 so 0001 would mean (1 * 1), but 0110 would be (4 * 2)2019-05-06 16:21:53
<Shelwien> well, you'd need all short lens, up to 16 or so2019-05-06 16:22:12
 but then it doesn't really matter whether its 32 or 332019-05-06 16:22:28
 especially with added support for rep-matches2019-05-06 16:22:42
<danlock> If the len were packed to ~5 bits, would that give you 3 bits for flags or 11? I guess I'm thinking of 8-bit bytes, which would mean your uncompressed len value would comprise two bytes, right?2019-05-06 16:27:35
<Shelwien> his current format is like this:2019-05-06 16:28:05
<FunkyBob> danlock: yeah, for now I'm trying to stick with byte-aligned2019-05-06 16:28:08
<Shelwien> 0LLLLLLL = literal run2019-05-06 16:28:16
 1LLLLLLL DDDDDDDD DDDDDDDD = match for 64k window2019-05-06 16:28:43
 and a known useful thing is rep-match2019-05-06 16:29:09
* danlock nods2019-05-06 16:29:38
 that is, a match with same distance as previous match2019-05-06 16:29:39
 10LLLLLL = rep-match2019-05-06 16:30:22
 11LLLLLL DDDDDDDD DDDDDDDD = normal match2019-05-06 16:30:34
<FunkyBob> danlock: my current conclusion as to why I lose to lz4 is they imply the match/literal field by _always_ having (literal run, match, offset)2019-05-06 16:30:43
 danlock: and can encode short runs in 4bits, where I can't2019-05-06 16:31:02
 Shelwien: will try2019-05-06 16:31:59
<Shelwien> another interesting thing2019-05-06 16:32:29
<danlock> Putting something into the spec is one way of reducing the number of things for the encoder to keep track of with extra bits2019-05-06 16:32:31
<Shelwien> is that although rep-match is a simple straight optimization2019-05-06 16:32:53
 which is also the reason for the near-useless "rep stack" in many codecs2019-05-06 16:33:26
 but the main compression improvement from rep-matches actually comes from fuzzy matches2019-05-06 16:34:01
 for example, book1 file has a <P ##>\n page number markup2019-05-06 16:34:32
 so lzma encodes it as match:"<P 5" literal:"2" rep-match:">\n"2019-05-06 16:35:10
 and since you don't yet have large dictionary or complex match-finding structures2019-05-06 16:35:48
 it means you can continue comparing strings after first mismatch2019-05-06 16:36:08
 ie _properly_ look for fuzzy matches2019-05-06 16:36:24
 where most coders just use them accidentally2019-05-06 16:36:42
 when its the only choice2019-05-06 16:36:51
<danlock> passive vs. active2019-05-06 16:37:53
<Shelwien> danlock: which color scheme do you like more? http://nishi.dreamhosters.com/chanloglist.pl http://nishi.dreamhosters.com/chanloglist.pl?&cs2 2019-05-06 16:38:09
<danlock> I like chanloglist.pl better than chanloglist.pl?&cs2, but people with dimmer displays might prefer the latter.2019-05-06 16:39:50
<Shelwien> :)2019-05-06 16:40:23
<FunkyBob> think I'll do some more debug analysis to see what rep-offset might gain me... as well as match sizes2019-05-06 16:42:02
 and lit lens2019-05-06 16:42:05
<Shelwien> might be better to have optimal parsing first2019-05-06 16:42:52
<unic0rn> FunkyBob: just curious, what's your current code size?2019-05-06 16:43:28
<Shelwien> 13k with mingw2019-05-06 16:45:36
<unic0rn> nah, i meant the source code2019-05-06 16:45:52
<Shelwien> around 7-8k2019-05-06 16:46:06
<unic0rn> i've mentioned 14k recently i think, but i was wrong. saw a status line, didn't realize it's from saving another file. actual encoder-in-progress is below 6k so far2019-05-06 16:46:32
<FunkyBob> 14464 stripped on x86-64 linux2019-05-06 16:46:40
<unic0rn> either i'm slow or turning it into rocket science.2019-05-06 16:46:49
<FunkyBob> 5728 for the C file2019-05-06 16:47:22
<unic0rn> there's no entropy coder?2019-05-06 16:48:09
<FunkyBob> no2019-05-06 16:48:13
<unic0rn> not yet? :)2019-05-06 16:48:50
<FunkyBob> oh, and a 1498 byte "main"2019-05-06 16:49:05
 no, not yet :)2019-05-06 16:49:08
 I did find an interesting algorithm for building prefix codes [like Huffman] that makes it fairly cheap to update to "near optiomal" encodings2019-05-06 16:49:43
<Shelwien> adaptive bitcoding is totally useless2019-05-06 16:50:22
 its always slower than adaptive arithmetic coding2019-05-06 16:50:38
 and compresses worse2019-05-06 16:50:44
<unic0rn> you may check deflate's wiki page, there's a short description of its huffman coding2019-05-06 16:51:15
 i would just go with range though2019-05-06 16:51:19
<Shelwien> deflate uses static huffman coding, not adaptive2019-05-06 16:51:39
<unic0rn> i know2019-05-06 16:51:50
<Shelwien> saves a length table for a block in block header2019-05-06 16:51:59
 with ~64k blocks usually2019-05-06 16:52:06
<unic0rn> well, since adaptive huffman is useless, that's i guess the only option as far as huffman is concerned2019-05-06 16:52:49
<Shelwien> well, did you read about fritcode?2019-05-06 16:53:18
<unic0rn> now i did. https://encode.ru/threads/1116-Charles-Blooms-new-huffman-optimisation2019-05-06 16:55:02
<Shelwien> yes2019-05-06 16:55:14
 its basically two-stage entropy coding2019-05-06 16:55:47
 blockwise-static code optimized for data2019-05-06 16:56:20
<unic0rn> i was considering fritcode earlier, that is, had that idea just kinda missing the point, i didn't think about adding another stage of entropy coding after that2019-05-06 16:56:32
<Shelwien> and fully static code to compress the first code, which has intentionally skewed bit probabilities2019-05-06 16:57:08
<unic0rn> but since fritcode is very simple, i was thinking about using one of the codes as an escape code2019-05-06 16:57:09
 for example, "set fritcode for last decoded symbol to 0"2019-05-06 16:57:42
<Shelwien> that's common - eg. deflate has a special code for end-of-block2019-05-06 16:57:50
<unic0rn> because rearranging this is far easier than rebuilding whole huffman tree2019-05-06 16:57:55
 so it could be adaptive2019-05-06 16:58:32
 and fast at the same time2019-05-06 16:58:43
<Shelwien> uh, no2019-05-06 16:58:44
 optimal fritcode is harder to build than huffman2019-05-06 16:59:05
 but same block-wise static as huffman2019-05-06 16:59:35
 can get you better compression2019-05-06 16:59:42
 since with fritcode, symbol codes are not limited with 1/2^x2019-05-06 17:00:00
 probabiilities, i mean2019-05-06 17:00:11
<unic0rn> well, encoder wouldn't be exactly simple, because it would have to make quite a few estimations to find the most optimal route for switching codes2019-05-06 17:01:17
 but decoding would be fast2019-05-06 17:01:22
<Shelwien> well, you can try2019-05-06 17:02:04
<unic0rn> should be clearly faster than adaptive huffman2019-05-06 17:02:07
 nah. i see no point, unless one's bound to not using range nor arithmetic, or ans2019-05-06 17:02:28
<FunkyBob> sorry, how is fritcode different to any other prefix scheme?2019-05-06 17:02:32
<Shelwien> but adaptive bitcode wasn't ever used in any practical coder in last 20 years or so2019-05-06 17:02:51
<FunkyBob> that post doesn't appear to explain it ... at all2019-05-06 17:03:17
<Shelwien> its simply inefficient because of current cpu design2019-05-06 17:03:18
 FunkyBob: as i said, its a two-stage entropy coding scheme2019-05-06 17:03:45
 the fritcode itself is like huffman, but based on skewed bit probabilities2019-05-06 17:04:10
 like 0=1/3, 1=2/3 instead of both 1/22019-05-06 17:04:24
 which results in longer codes and better precision of symbol probability approximation2019-05-06 17:04:49
 and then we just need a 2nd stage code for bytes of 1st code2019-05-06 17:05:10
<unic0rn> inefficient... lets talk about inefficient... elapsed time: 00:00:17.203 - current processing of 256k block, far from done, single core2019-05-06 17:05:24
<Shelwien> which can be static huffman, or arithmetic, or anything2019-05-06 17:05:41
 bits with skewed probability are packed to bytes, also with known probability distribution2019-05-06 17:06:51
<FunkyBob> unic0rn: using which algorithms?2019-05-06 17:08:17
<unic0rn> that information is classified ;)2019-05-06 17:08:36
 in short: my own2019-05-06 17:08:41
<FunkyBob> ok2019-05-06 17:09:09
 well, here's hoping they get great ratios... for that much time :P2019-05-06 17:09:21
<unic0rn> for now, there's hoping it'll work at all2019-05-06 17:09:42
 gotta finish stage 2 to have something possible to test with decompression, then add stage 3 - entropy coder2019-05-06 17:10:22
<FunkyBob> that's a good first goal :)2019-05-06 17:10:23
<unic0rn> for now i'm drowning in statistics from stage 12019-05-06 17:10:46
<Shelwien> NNCP basic speed is 100 bytes/s :)2019-05-06 17:11:17
<unic0rn> well, creating the slowest encoder around isn't exactly my goal, so i'll let it have that :P2019-05-06 17:12:00
<Shelwien> because of that he had to add MT and stream interleaving right away2019-05-06 17:13:15
 got it 10x faster with that2019-05-06 17:13:22
 i guess 1kb/s is much better :)2019-05-06 17:13:35
<unic0rn> at least i'm faster than that. and MT will help a lot, but that's left for the end2019-05-06 17:14:21
 well, in the end i'll probably rewrite half of it in assembler2019-05-06 17:14:47
<danlock> There are bound to be people who don't mind slow encoders, but if faster encoders with similar compression ratios exist, a slow encoder might not be the way to go. rewriting in ASM will probably improve the speed and decrease the size, but be careful not to introduce new bugs.2019-05-06 17:19:55
<unic0rn> i guess i wrote more crazy stuff in asm2019-05-06 17:22:00
<danlock> asm is great for reducing size, but it might limit the program to a specific type of processor2019-05-06 17:23:11
<Shelwien> asm is very inconvenient for compression research2019-05-06 17:23:55
 its one thing when you write a codec by format specification2019-05-06 17:24:21
 but when you have to actually experiment with it - tweak various parameters, precision etc2019-05-06 17:24:49
 asm just doesn't allow to do it2019-05-06 17:25:08
<danlock> If you're doing active research, a high-level language is probably better because it enables you to change specific parts of the encoder for testing more easily. Only after you've found the best possible way to do things should you work on optimization, IMO.2019-05-06 17:27:42
 Yeah. what Shelwien said. ;)2019-05-06 17:28:36
<Shelwien> in compression there's a very little choice of "high-level language" too2019-05-06 17:28:58
 basically from C and C++2019-05-06 17:29:08
<unic0rn> not exactly, danlock. i've got 2 definitions i knew for a fact won't change no matter what. rewrote them in asm, 3x speed increase.2019-05-06 17:29:33
 kinda helps with testing.2019-05-06 17:29:38
 and those were basically oneliners, more or less.2019-05-06 17:29:59
<Shelwien> yeah, C++ compiler would do it automatically2019-05-06 17:30:06
<unic0rn> i have my doubts about that. it would be faster, agreed on that, but those are pretty specific operations2019-05-06 17:31:01
<danlock> ...and that's why I have so few comments to make on the encode forums, which is why the fact I've been told I don't have permission to reply (the couple of times I've tried to reply) on the encode forums isn't a very big deal.2019-05-06 17:31:04
<Shelwien> uh, you probably didn't get the confirmation email2019-05-06 17:31:32
 what's your login on the forum? same?2019-05-06 17:31:43
<unic0rn> of course there are intrinsics, but then that's hardly not optimizing stuff until the very end2019-05-06 17:31:58
<danlock> yeah.. same2019-05-06 17:32:10
<Shelwien> yeah, "awaiting email confirmation"2019-05-06 17:32:41
<danlock> at least it (last time I checked) allows me to mark posts as helpful2019-05-06 17:32:42
<Shelwien> switched to normal, you should be able to post now2019-05-06 17:33:12
 C++ compiler can do function inlining and global optimizations2019-05-06 17:34:59
 so in many cases, the whole algorithm is just inlined in main()2019-05-06 17:35:27
<danlock> Thank you very much. Because you guys (the frequent posters, called "lurkers" by FunkyBob, IIRC, are so much more experienced and active than I am, though, I /very seldom/ have anything to say. Sometimes, though, I feel like I have a valid point.2019-05-06 17:36:02
<Shelwien> and then, it can also do expression optimizations when there're constants2019-05-06 17:36:02
 and there's template syntax2019-05-06 17:36:27
* danlock thinks.... . o O ( template syntax? )2019-05-06 17:37:02
 which lets you force the compiler to generate the code for multiple choices of specific constant value2019-05-06 17:37:06
<danlock> ahh.2019-05-06 17:37:22
<Shelwien> so you can get same performance like with constants2019-05-06 17:37:28
 but still choose their values in runtime2019-05-06 17:37:41
<danlock> Man, it's been too long since I was actively coding2019-05-06 17:37:42
<Shelwien> https://github.com/rarten/ooz/blob/master/compr_kraken.cpp#L5832019-05-06 17:38:48
 as example2019-05-06 17:38:55
<unic0rn> yeah, we had that discussion. thing is, in this case i don't think C++ compiler would magically figure out what i'm trying to do and use proper asm instruction on its own. and using intrinsics just throws the "compiler will do that for you" out of the window. in this case, i don't think it would. and sure, whole code in general, compiled by C++ compiler, would be faster. but as it is, it's much 2019-05-06 17:39:09
 easier to maintain for me, because of tons of optimizations built in by design to make it work at reasonable speed at all2019-05-06 17:39:09
 optimizations that would look much messier in C++, imho2019-05-06 17:39:46
<danlock> I took a Uni class in ANSI C and another C class later on (er... no, that was a Calculus class in which the instructor gave me his old copy of a book about coding in C++)2019-05-06 17:39:52
<Shelwien> sure, proper manually written code is better than what compilers can do2019-05-06 17:40:16
<unic0rn> i'll optimize it further later on, it'll be fast enough regardless of the language2019-05-06 17:40:20
<Shelwien> but then, you can usually only support a single target2019-05-06 17:40:40
 while there's at least 4 now2019-05-06 17:40:59
<danlock> :highfive: unic0rn2019-05-06 17:41:01
<Shelwien> SSE4/AVX/AVX2/AVX5122019-05-06 17:41:05
 well, language syntax is important2019-05-06 17:41:36
<danlock> AVX512. I think this is the first I've heard of that.2019-05-06 17:41:39
<Shelwien> uh, my cpu has it?2019-05-06 17:41:50
 language syntax strongly affects your choices2019-05-06 17:42:33
 you won't do stuff that is hard to write in your chosen language2019-05-06 17:42:55
<danlock> what cpu do you have?2019-05-06 17:42:58
<unic0rn> luckily, forth has almost none2019-05-06 17:43:02
<Shelwien> intel 7820X2019-05-06 17:43:14
 also, even with C++ I sometimes have to write custom preprocessors2019-05-06 17:44:01
 that is, scripts that generate c++ code from custom syntax2019-05-06 17:44:40
 here's an example: https://encode.ru/threads/3070-mod_CM-another-paq-submodel?p=59503&viewfull=1#post595032019-05-06 17:45:31
<danlock> Mine's an i5 4670K which runs at 4192MHz under load. CPU-Z v1.88.0.x64 says it supports MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, EM64T, VT-x, AES, AVX, AVX2, FMA32019-05-06 17:47:03
<unic0rn> https://pastebin.com/raw/UMTf92Py2019-05-06 17:47:33
 as for redefining syntax2019-05-06 17:47:41
<Shelwien> sure, but anything actually useful would be much harder2019-05-06 17:48:27
 like, how would you go about defining a custom float-point type in forth?2019-05-06 17:49:00
 like, you have a program that worked with lots of 32-bit floats2019-05-06 17:49:22
 then decided to save memory by reducing mantissa size to 8 bits2019-05-06 17:49:41
 and saving them as 16-bit values2019-05-06 17:49:53
<danlock> FORTH is a language I remember existing LONG ago... but I don't know anything else about it.2019-05-06 17:49:59
<Shelwien> its used in bitcoin2019-05-06 17:50:12
<FunkyBob> seriously?2019-05-06 17:50:26
<Shelwien> yes2019-05-06 17:50:32
<FunkyBob> for what?2019-05-06 17:50:37
<unic0rn> by rewriting stores and loads. 2 basic operations. assuming calculations can still run in single precision and all you want is storing them as half2019-05-06 17:50:45
<Shelwien> custom transaction checking2019-05-06 17:50:46
<unic0rn> that's basically rewriting 2 words, preferably in asm i guess2019-05-06 17:51:02
<Shelwien> unic0rn: how about working with 32-bit and 16-bit floats in the same code?2019-05-06 17:51:32
<unic0rn> then you need separate definitions based on precision, and preferably some conversion words as well2019-05-06 17:52:16
<Shelwien> well, in C++ its transparent2019-05-06 17:52:38
 i can just add a class with operator definitions2019-05-06 17:53:02
 and change the type of some variables to new class2019-05-06 17:53:18
<unic0rn> forth does nothing on its own. which, depending on what you want, may be a good thing2019-05-06 17:53:35
<Shelwien> code of actual algorithm would remain exactly the same2019-05-06 17:53:38
 yes, i know2019-05-06 17:53:50
 but it lacks too much of that "syntax sugar"2019-05-06 17:54:10
<unic0rn> well, in general, same here. you can change some basic blocks upon which the rest of the code is built, and it'll work like nothing happened2019-05-06 17:54:19
<Shelwien> it may be possible to implement on your own, but why bother?2019-05-06 17:54:24
<danlock> Shelwien: Are you suggesting ideas as a way of stimulating imagination and generating new ideas/thoughts/creativity to potentially find new methods for accomplishing what you'd like to do, or helping us to learn things you're already familiar with, or.. what?2019-05-06 17:56:02
 Whatever your answer, I like it.2019-05-06 17:56:14
 My uncle, a chemist (who is mostly focused on spectrography, but also other fascinating things), often asks me for ideas about how to do certain things and it's always been a mind-expanding thing for me.2019-05-06 17:58:03
<Shelwien> well, getting feedback is frequently useful2019-05-06 17:59:05
 i tried just writing down my thoughts about random things2019-05-06 17:59:41
 but explaining stuff to an actual person works better2019-05-06 18:00:19
<danlock> When I was learning about chemistry long ago, I learned (as most beginning chemistry students probably do) about atoms having protons and (usually) neutrons in their nucleus and electrons in their respective "shells" orbiting the nucleus, but it gets a lot more complicated than that. (I'm thinking of your having mentioned that you had been discussing AI and quantum mechanics)2019-05-06 18:01:16
<unic0rn> as for missing different targets, in case of my compression it may not matter much. kinda doubt any critical parts will be easily vectorizable. it will benefit from MT, but from SIMD... kinda doubt it2019-05-06 18:01:33
<Shelwien> unic0rn: you may be surprised...2019-05-06 18:02:04
 even basic stuff like memset/memcpy benefits from it2019-05-06 18:02:17
<danlock> I usually hope that things I say can "spark" a train of thought in someone that will somehow generate an "aha!" moment ...2019-05-06 18:02:36
<unic0rn> except that basic stuff isn't performance critical there2019-05-06 18:02:39
 what eats up the most time, are loops that are hardly vectorizable2019-05-06 18:03:14
<Shelwien> still, new extensions add more registers at least2019-05-06 18:03:47
 so there're all kind of indirect effects when its done by compiler2019-05-06 18:04:01
<danlock> unic0rn: can you say that basic stuff will never, under any circumstances, become performance-critical in the future?2019-05-06 18:04:04
<Shelwien> but of course you can't do it manually2019-05-06 18:04:19
 danlock: its supposedly a solution for the hutter-prize contest2019-05-06 18:04:50
 there's a time limit too, but its a few hours, so most of things would be ok2019-05-06 18:05:22
<unic0rn> danlock: when that basic stuff is called once per 256kB block, it simply doesn't matter2019-05-06 18:05:41
 because the rest is so much slower2019-05-06 18:05:59
<Shelwien> and its true that in theory forth could reduce the decoder size2019-05-06 18:06:13
<danlock> I was watching someone code in a game jam a few years ago and I was amazed at how much debugging is taken care of by the top level ... erm..... I can't think of the acronym right now... ...as the coder was writing the source.2019-05-06 18:06:20
<Shelwien> which is also counted as part of result in HP2019-05-06 18:06:24
<unic0rn> it most likely won't be submitted, assuming it beats the result.2019-05-06 18:07:14
<Shelwien> although i kinda doubt that its easy to find a compact forth JIT or interpreter2019-05-06 18:07:15
<unic0rn> there are a few, one can always write his own as well2019-05-06 18:07:57
<Shelwien> yes, but whats their code size?2019-05-06 18:08:35
<danlock> (his desktop was being streamed online... and I was remembering how much less "hand-holding" or "pointing out discrepancies" existed in Turbo Pascal in DOS back before my TBI (traumatic brain injury), which (the TBI) is why I'll often seem scatterbrained2019-05-06 18:09:09
 )2019-05-06 18:09:15
<unic0rn> the one i'm using, 36k2019-05-06 18:09:27
<danlock> 36k of source code? (I presume so, since that's what you were asking about earlier)2019-05-06 18:10:12
<Shelwien> so that's all you need to run HP decoder on a clean machine?2019-05-06 18:10:25
<unic0rn> no, 36k forth binary2019-05-06 18:10:25
<danlock> oh. (note to self: find an online FORTH reference)2019-05-06 18:10:59
<unic0rn> it loads ntdll, kernel32 and kernelbase, so yeah2019-05-06 18:11:03
<Shelwien> no bad i guess, but the size of phda9dec is 436362019-05-06 18:11:18
 system dlls are ok, but forth itself would be likely counted towards the entry size2019-05-06 18:11:47
<unic0rn> as you know, you can throw a ton of forth code on top, and the binary won't grow much2019-05-06 18:12:06
 of course it would be2019-05-06 18:12:12
<Shelwien> yeah, but 43k is the full decoder size atm2019-05-06 18:12:24
<unic0rn> this implementation allows bundling of your own code with the base binary. it actually gets there in compiled form2019-05-06 18:12:31
 because it performs compilation on the fly2019-05-06 18:12:44
<Shelwien> so you're handicapped with forth2019-05-06 18:12:54
<danlock> Ah. I guess it's formally called merely "Forth" ... my bad2019-05-06 18:13:02
<unic0rn> even if, not by much. compression improvement should be much bigger to matter at all2019-05-06 18:13:48
<Shelwien> about forth in bitcoin: https://sourceforge.net/p/bitcoin/mailman/message/31397880/2019-05-06 18:13:51
 uh, that's quite unlikely :)2019-05-06 18:14:15
<unic0rn> also, i don't care much about HP i guess. it's mostly a benchmark2019-05-06 18:14:48
 unlikely... maybe, maybe not2019-05-06 18:15:11
<Shelwien> sure, nncp won't win HP either, but might work as advertisement of fabrice's NN library, as expected2019-05-06 18:15:45
 so getting a good result could be useful for other stuff just as well2019-05-06 18:16:30
 but a good result would be 16M at least2019-05-06 18:16:48
 and even that requires a lot of work and multiple components2019-05-06 18:17:14
 the main problem is actually the memory limit though2019-05-06 18:18:32
<unic0rn> well, when i'll finish stage 2, i'll be able to test it without entropy coder, then i'll know 1) if it works or did i screw up something, and 2) what's the estimated ratio, by throwing the result at gzip2019-05-06 18:18:50
 the memory limit is actually not my problem2019-05-06 18:19:08
 i've solved that one2019-05-06 18:19:16
<Shelwien> better try paq8px for entropy estimation2019-05-06 18:19:16
 ok :)2019-05-06 18:20:19
 normally 1gb is barely enough to fit a bt4 match index for enwik82019-05-06 18:20:44
<danlock> The extent to which you should take into account the memory limit is related to the number of testers of your code you want, because they have finite memory limits2019-05-06 18:20:56
<Shelwien> and then there's no space left for statistics2019-05-06 18:20:56
 well, HP contest rules specify a lower memory limit than what is usually available these days2019-05-06 18:21:36
 (1gb)2019-05-06 18:21:39
<unic0rn> take into account that i'm using 256k blocks, but those are configurable. i could perhaps go with up to 4mb or so2019-05-06 18:21:57
<Shelwien> without solid compression the results would be even worse :)2019-05-06 18:22:30
 (ie whole file as one block)2019-05-06 18:22:50
<danlock> Steve Gibson at grc.com is the only person I can think of right now who typically codes exclusively in ASM. However, as he's developed his free authentication protocol, SQRL, he's had to use other IDEs for implementation on platforms other than x86.2019-05-06 18:24:08
<unic0rn> nah. i'll just put quantum entangled receiver in the decompressor and make it slow enough so i can transmit the whole file by hand2019-05-06 18:24:15
 oh wait... it doesn't work that way2019-05-06 18:24:20
<Shelwien> :)2019-05-06 18:24:52
<unic0rn> :P2019-05-06 18:25:06
<Shelwien> there's one tricky option that should available2019-05-06 18:25:19
 its in theory possible to use OS files as reference2019-05-06 18:25:41
<danlock> It annoys a lot of people when I say something which seems like a comment out of nowhere and like it will derail the conversation, but it's usually a direct reaction to something I've just read wherever I'm posting.2019-05-06 18:25:41
<unic0rn> danlock: i was toying with assembler on zx spectrum first. then i wrote custom video mode + graphics conventer entirely in asm on amiga 600.2019-05-06 18:26:00
 and then some 256b and similar demoscene stuff on pc2019-05-06 18:26:09
 all for fun2019-05-06 18:26:23
 Shelwien: i think that's forbidden2019-05-06 18:26:55
<Shelwien> i don't think it is2019-05-06 18:28:16
<unic0rn> you cannot use input from external sources, files included2019-05-06 18:28:24
<danlock> unic0rn: that's nifty. I love demoscene stuff. What skilled coders can get out of chipsets introduced in the 1970s (like the Atari 8bit computers) and early 80s (Commodore 64/etc.) and PCs in the '90s2019-05-06 18:28:32
<unic0rn> system libraries are allowed for calls like i/o, that's it2019-05-06 18:28:33
 also, Matt will run everything under Linux anyway2019-05-06 18:28:52
 and complain if it doesn't work, despite being an exe file2019-05-06 18:29:14
<Shelwien> maybe, but linux also has plenty of system files, with plenty of text inside2019-05-06 18:29:16
<danlock> I forgot to finish my sentence. sigh. What skilled coders can get... ...is really amazing.2019-05-06 18:29:24
<unic0rn> yeah. good luck accessing those under wine, if he removed the / mapping2019-05-06 18:29:42
<danlock> Mahoney?2019-05-06 18:29:49
<Shelwien> yes2019-05-06 18:29:57
 paq8px already has an option to train the model with its own exe2019-05-06 18:30:49
 and i don't think that usual system files is explicitly forbidden2019-05-06 18:31:11
 but its actually far from easy to make use of it2019-05-06 18:31:25
<danlock> I haven't seen much of what's going on with him since he stopped actively participating on encode.ru and his domain hasn't been as frequently updated either. It seemed like it coincided with a hurricane that hit the south Florida area a while back.2019-05-06 18:31:50
<unic0rn> it's actually nearly impossible. would work only with specific files in specific versions2019-05-06 18:31:56
 he's running.2019-05-06 18:32:10
 supposedly a lot.2019-05-06 18:32:17
<Shelwien> not necessarily2019-05-06 18:32:20
<danlock> isn't that always the case, though? He runs a lot.2019-05-06 18:32:41
<Shelwien> for example, you can locate useful text strings in system files2019-05-06 18:32:50
 and include their hashes in decoder 2019-05-06 18:32:59
<unic0rn> man pages2019-05-06 18:33:08
 free dictionary2019-05-06 18:33:10
<Shelwien> then find them by hashes on target system2019-05-06 18:33:14
 yeah2019-05-06 18:33:15
<danlock> the northern hemisphere was hit by heavy weather this past couple of seasons, though... not that it necessarily impacted running when in the lower latitudes2019-05-06 18:34:39
<Shelwien> danlock: what changed is that he retired from work, and decided to participate in all marathons in US2019-05-06 18:35:10
 so now has no time for programming2019-05-06 18:35:20
 https://www.facebook.com/mattmahoneyfl2019-05-06 18:36:10
<danlock> ahh. I knew about the retirement, but not about the desire to participate in all marathons. I know of three marathons (and an Iron Man competition, which he might not be interested in) that recently occurred in my region of the western US, and if he wants to run in them all it'll take many years just because many of them happen in different cities and states on the same weekends or same days. He's smart enough to determine an opt2019-05-06 18:38:17
<Shelwien> well, it seems to be exactly what he does2019-05-06 18:39:45
<danlock> If the desire is to participate in every single marathon, that's a much bigger thing to take on than just a walk/run/bike from coast to coast2019-05-06 18:39:46
<Shelwien> also he found a woman there :)2019-05-06 18:39:50
<danlock> that'll reduce his availability by a lot even if she's participating in all the marathons with him2019-05-06 18:40:34
<Shelwien> well, he somehow still finds time to post on quora2019-05-06 18:42:15
* danlock laments his own lack of energy and its debilitating effect on his life, which is more sleep than life some days, and more "awake but cognitively-absent" the rest of the time.2019-05-06 18:42:25
<danlock> before the TBI, thinking never, ever tired me. after I awoke from the 14-day coma, it did and I learned that my physical energy and cognitive energy are separate, and that the higher-level thinking goes away a lot more quickly than the rudimentary "alert but not so aware" state of living.2019-05-06 18:45:10
 I'm not looking for sympathy. I'm merely stating my personal experience.2019-05-06 18:46:07
 prior to the TBI, my memory was very good, and maybe even superb, but I didn't remember everything. Since then, I remember fewer events from each discrete time period, whatever you define that to be, and, as a result, time seems to pass much more quickly for me. *shrug*2019-05-06 18:48:07
<Shelwien> well, I was surprised by Matt's productivity at times2019-05-06 18:48:28
 like when he started zpaq project2019-05-06 18:48:37
<danlock> some people have so much energy2019-05-06 18:48:39
<Shelwien> but then, it was based on some pretty weird assumptions2019-05-06 18:49:18
 so despite investing a lot of work in it, its not really getting more popular2019-05-06 18:49:42
<danlock> Sometimes, weird assumptions are where the best results are found at times.2019-05-06 18:49:47
 doh. sorry for the redundancy... heh.2019-05-06 18:50:04
<Shelwien> unlike stuff like lz4, which needs 20 lines of C to decode :)2019-05-06 18:50:06
<danlock> brb2019-05-06 18:50:29
<Shelwien> well, the main feature in zpaq is forward compatibility2019-05-06 18:50:40
 the idea was actually introduced in rar2019-05-06 18:50:54
 and already proved to be useless there2019-05-06 18:51:14
 but Matt still decided to make his own version2019-05-06 18:51:26
 so now we have a zpaq config for pi number generation2019-05-06 18:52:01
 but its not fast, nor has good compression, nor good archive management syntax2019-05-06 18:52:59
 and even Matt seemingly stopped maintaining it2019-05-06 18:53:39
<danlock> How does that work (the pi generator)? I imagine it relies on zpaq's recursive addition from each version to the next of the archive.2019-05-06 18:53:49
<Shelwien> zpaq has a simple programming language in its config2019-05-06 18:54:33
<danlock> I checked that other C book I have from that calculus class... it's about C, not C++2019-05-06 18:54:59
<Shelwien> the idea is that a new compression algorithm can be written in that language2019-05-06 18:55:03
 and then old zpaq version would be able to decode the archive created by new version with new algorithms2019-05-06 18:55:37
<danlock> Have you experimented with that much?2019-05-06 18:55:40
<Shelwien> because it would add the config to archive2019-05-06 18:55:56
 https://encode.ru/threads/1211-Compressing-pi2019-05-06 18:56:11
<danlock> I'm glad he put it in the public domain or open sourced it, at least, so it's not going to remain in a backroom at Dell2019-05-06 18:57:10
<unic0rn> danlock: https://i.warosu.org/data/g/img/0563/02/1472388528965.png - Shelwien will probably disagree ;)2019-05-06 18:57:11
<Shelwien> i think it was his personal project, not related to dell2019-05-06 18:57:52
<danlock> I thought that C was more portable than C++ and has become more of a universal fallback as time has passed2019-05-06 18:58:16
<Shelwien> since its mostly portable via gcc anyway2019-05-06 18:58:41
 there'd be C++ when there's C2019-05-06 18:58:47
 and I agree that C++ standard library is bloated2019-05-06 18:59:05
 but nobody makes you use it2019-05-06 18:59:12
 most of C code can be compiled by C++ compiler anyway2019-05-06 18:59:33
 and C++ has new features which make programming more efficient2019-05-06 18:59:56
 like not having to write ptr-> before every variable2019-05-06 19:00:14
 or not having to copy-paste 10 versions of a function tuned to different constant values2019-05-06 19:00:47
<unic0rn> you really do love your templates2019-05-06 19:01:04
<Shelwien> yes; also coroutines2019-05-06 19:01:19
<danlock> Do some of those features require more commenting, or will they remain readable to anyone who understands C++ code?2019-05-06 19:01:23
 templates can be very useful (remarked the redundant one)2019-05-06 19:02:05
<Shelwien> the syntax is usually pretty simple2019-05-06 19:02:27
 but properly using it requires some experience2019-05-06 19:02:45
 since there're weird limits in places2019-05-06 19:03:06
 as advanced example you can see this: https://encode.ru/threads/3077-C-constexpr-experiments2019-05-06 19:04:09
<danlock> (re: zpaq's integrated encoder) I guess we need someone with Matt's endurance to take that idea and run with it (no pun intended)2019-05-06 19:08:06
<Shelwien> no, its just a failed idea2019-05-06 19:09:03
 like, its impossible to fit a new-style codec, like zstd, into zpaq2019-05-06 19:09:44
 because Matt decided that his entropy coding and file format are already perfect2019-05-06 19:10:40
 it'd be much better to provide a binary plugin interface2019-05-06 19:11:45
 and maybe asymmetric cryptography for plugin signing2019-05-06 19:12:13
<danlock> That's unfortunate. I thought he would be open to improvements/insertions of newer methods into that.2019-05-06 19:12:22
<Shelwien> he expected that it would become popular, and people would add support for everything necessary on their own2019-05-06 19:13:15
 its possible, but2019-05-06 19:13:39
 (1) too much work because zpaql is basically limited asm2019-05-06 19:13:56
<danlock> If he weren't trying to accomplish his running feat, I wouldn't be surprised if he changed zpaq to allow that. I know it's been changed before in ways that made it incompatible with prior versions2019-05-06 19:14:18
<Shelwien> (2) people don't like to give away their coders2019-05-06 19:14:38
<danlock> I don't know whether those were merely bugfixes, though2019-05-06 19:14:40
<Shelwien> (3) progress didn't stop at Matt's entropy coder2019-05-06 19:15:09
<danlock> unfortunately, compression time and decompression time are not asymmetric with zpaq2019-05-06 19:16:02
<Shelwien> they really are2019-05-06 19:16:38
 there's BWT and some kind of LZ even in standard version2019-05-06 19:17:01
<danlock> oh? I thought it was about the same time for each2019-05-06 19:17:03
<Shelwien> BWT is asymmetric, it doesn't need sorting on decode2019-05-06 19:17:21
 https://github.com/pothos/zpaqlpy2019-05-06 19:17:37
<danlock> so not strictly symmetric, but if you use a high-compression mode, won't it take nearly as long to decomp as it did to compress?2019-05-06 19:19:07
<Shelwien> well, CM _is_ symmetric2019-05-06 19:19:41
 and CM _does_ provide the best compression2019-05-06 19:20:02
 but if you want fast decompression, just don't use it?2019-05-06 19:20:17
<danlock> yeah.2019-05-06 19:20:59
 I've been using rz with the sfx creator provided by ScottX (or somebody at encode) to compress some files more than 7z will, because that includes the encoder with the file. The next version might not be compatible with what was last released by Christian, so that type of compression is needed.2019-05-06 19:27:00
 ...if I want to ensure I can extract what I compressed...2019-05-06 19:27:55
<Shelwien> well, Christian never does open source, so you're open of luck here :)2019-05-06 19:28:57
 you can try lzna/kraken i guess, since it was decompiled not long ago2019-05-06 19:29:24
 https://encode.ru/threads/3102-Oodle-compressor-open-source?p=59982&viewfull=1#post599822019-05-06 19:29:51
<danlock> I'll probably just do what I've been doing until Christian can find time to release his next binary (he indicated it has better compression for some things), unless something better from Bulat Ziganshin someone else comes along first.2019-05-06 19:33:24
 *or someone else2019-05-06 19:33:51
<Shelwien> sure2019-05-06 19:34:05
 https://encode.ru/threads/2989-exepack?p=57682&viewfull=1#post576822019-05-06 19:35:10
<danlock> Igor Pavlov increased compression in 18.05 for a lot of my older (and some recent) .7z files (decompressed and recompressed)2019-05-06 19:35:33
 That's great, and RAR is a good codec for a lot of things, but since updated versions of 7z started compressing better for types of files at which RAR was previously the very best, it's mostly on my PC just for quick path-aware recompressions of ZIP files and stuff.2019-05-06 19:39:37
 (files I'll usually go back to later and recompress using something which takes longer but will result in a smaller file)2019-05-06 19:40:30
<Shelwien> well, rar has its own better sfx, its just a demo2019-05-06 19:41:09
<danlock> I wasn't talking specificially about sfx archives, but yeah.. and I think Peazip includes some pre-compressed sfx generators that will make smaller sfx too. 2019-05-06 19:42:54
 Yikes. What I'm typing is starting to sound ... like I'm tired and/or hungry. time for some food. thanks for the conversation. :)2019-05-06 19:48:39
<Shelwien> :)2019-05-06 19:49:04
*** danlock has left the channel2019-05-06 21:36:39
*** danlock has joined the channel2019-05-06 21:39:20
<danlock> info2019-05-06 21:46:17
<unic0rn> ERROR: command not found. please upgrade to WINZIP .0072019-05-06 21:50:15
<danlock> :)2019-05-06 21:52:20
*** danlock has left the channel2019-05-06 23:49:44
*** maniscalco_ has joined the channel2019-05-07 01:38:43
<maniscalco_> quiet around here tonight, huh ?2019-05-07 01:42:12
*** maniscalco_ has left the channel2019-05-07 02:11:41
<unic0rn> and left.2019-05-07 02:26:28
 well, number one rule when you wanna get some coding done: stop looking at IRC2019-05-07 02:26:43
<FunkyBob> heh2019-05-07 02:35:35
<unic0rn> i'm coding. i'm not looking.2019-05-07 02:54:42
<FunkyBob> sure sure :P2019-05-07 02:57:57
<Shelwien> https://encode.ru/threads/3110-gcc-9-1-1-released?goto=newpost2019-05-07 04:47:12
 did a benchmark, somehow gcc82 has the best results?2019-05-07 05:42:23
 except for PGO which is troublesome to use in gcc2019-05-07 05:42:42
*** Jibz has joined the channel2019-05-07 08:08:47
 hi2019-05-07 08:08:55
<Jibz> hey :)2019-05-07 08:09:05
<Shelwien> what do you think would be the best token layout for LZ4-like coder?2019-05-07 08:10:08
 64k window, best compression?2019-05-07 08:10:31
<Jibz> probably depends on the type of data (text binary), one interesting aspect to me would be if it would be better to make a format that enables efficient optimal parsing (like lz4), or using things like repeat match offsets that improve compression but introduce dependencies2019-05-07 08:16:21
<Shelwien> repeat matches are easy enough to handle with optimal parsing2019-05-07 08:17:04
 lzma does it, for example2019-05-07 08:17:20
<Jibz> that would be for values of optimal that are not optimal then, right? :)2019-05-07 08:17:46
 I should probably say bit-optimal instead2019-05-07 08:18:22
<Shelwien> dynamic programming approach should work for any sequential compression method2019-05-07 08:18:24
 the problem with perfect parsing opt for formats with entropy coding2019-05-07 08:19:12
 is either it being not sequential (static bitcode with len table in block header)2019-05-07 08:19:46
 or having to store a copy of all statistics per element of optimizer token array2019-05-07 08:20:44
 lzma "solves" that by only updating counters once per 128 bytes in optimizer2019-05-07 08:21:36
<Jibz> yes, true2019-05-07 08:22:03
<Shelwien> but i think it should be still possible to make a perfect version2019-05-07 08:22:03
 by storing stat diffs per element2019-05-07 08:22:20
 i already did optimal parsing with entropy coding in cdm2019-05-07 08:22:46
 so it would be interesting to port it to lzma sometime2019-05-07 08:23:16
 btw, this is the element struct from lzma:2019-05-07 08:24:57
 struct COptimal {2019-05-07 08:24:59
  uint price;2019-05-07 08:24:59
  uint state;2019-05-07 08:24:59
  uint prev1IsChar;2019-05-07 08:24:59
  uint prev2;2019-05-07 08:24:59
  uint posPrev2;2019-05-07 08:24:59
  uint backPrev2;2019-05-07 08:25:00
  uint posPrev;2019-05-07 08:25:00
  uint backPrev;2019-05-07 08:25:01
  uint backs[LZMA_NUM_REPS];2019-05-07 08:25:01
 };2019-05-07 08:25:02
 so it tracks all 4 rep-distances2019-05-07 08:25:49
<Jibz> you could try out all different token layouts on a set of test file and see which gives the best compression on average? I assume people like charles, yann, and ilya already did this, but likely looking for a compromise between speed and ratio2019-05-07 08:29:52
<Shelwien> i don't think they really tried2019-05-07 08:30:57
 i never heard of anybody making an universal parsing optimizer, which would work with any layout2019-05-07 08:31:18
 so usually once they implement optimal parsing, the format is locked2019-05-07 08:31:50
 and while there're certainly some choices that are always good2019-05-07 08:32:29
 like literal runs vs literal flags2019-05-07 08:32:44
 or rep-distances2019-05-07 08:32:56
 but i don't think anybody tried to optimize a limited set of len or distance values2019-05-07 08:33:28
 an ever further step would be to optimize a whole state machine, i suppose2019-05-07 08:34:45
 btw2019-05-07 08:35:47
 for bytewise LZ2019-05-07 08:36:00
 if we could put input data into output buffer (probably at the end of buffer)2019-05-07 08:36:48
 the same code could be used to copy literal runs and matches2019-05-07 08:37:15
 might be useful?2019-05-07 08:37:17
<Jibz> hmm how do you mean? like a circular buffer?2019-05-07 08:38:45
<Shelwien> no, fast coders are usually memory-to-memory anyway2019-05-07 08:39:51
 but usually literals have to be copyied from input buffer to output2019-05-07 08:40:12
 and match string from output to output2019-05-07 08:40:22
 so it has to be different code2019-05-07 08:40:34
 but if we start by copying compressed input to the end of output buffer2019-05-07 08:40:57
 it becomes kinda the same2019-05-07 08:41:02
 btw2019-05-07 08:41:23
 i also used a similar tricky in my deflate decoder2019-05-07 08:41:35
 it has a 256-byte literal array just before the 32k window buffer2019-05-07 08:41:59
 and a literal is turned into a len=1;dist=pos+256-lit match2019-05-07 08:43:02
 and its still faster like that2019-05-07 08:43:29
 because otherwise there's an unpredictable branch on token type2019-05-07 08:43:49
<Jibz> interesting, and it is faster due to avoiding the branch not the literal table being in the same 64k cache?2019-05-07 08:45:44
<Shelwien> well, of course literal table not being cached would make it worse2019-05-07 08:47:09
 but match copy loop is vectorized2019-05-07 08:47:42
 and its still faster to copy a whole vector than do a if( is_literal ) { *ptr++=lit; } else {...}2019-05-07 08:48:46
<Jibz> nice idea :)2019-05-07 08:49:27
<Shelwien> another interesting thing that i have there is __assume( len<32 )2019-05-07 08:50:12
 modern compilers are rather forceful about converting copy loops to memcpy2019-05-07 08:50:42
 because they have an optimized/vectorized memcpy intrinsic2019-05-07 08:51:03
 so when i looked at asm of my match copy loop2019-05-07 08:51:21
 it actually had 3 versions there2019-05-07 08:51:43
 the last one being something like if( len>96 ) memcpy( ... );2019-05-07 08:52:11
 was very tricky to block that2019-05-07 08:52:56
 i mean, memcpy has versions for all vector extensions, plus loop preparation code at start2019-05-07 08:54:01
 so it may be efficient for copying megabytes of data2019-05-07 08:54:14
 but for LZ matches its certainly bad2019-05-07 08:54:22
 and compilers force-replace any kind of memory copy loop with it2019-05-07 08:54:39
<Jibz> that is interesting, though the use of assume seems dangerous in case len might be more2019-05-07 09:01:05
<Shelwien> https://godbolt.org/z/msimIx2019-05-07 09:02:05
 well, its not _that_ smart atm2019-05-07 09:02:21
 memcpy is apparently only in IntelC version2019-05-07 09:02:50
 but gcc still generates too many dumb branches2019-05-07 09:03:05
<Jibz> gcc has an assume with probability, might be interesting for this2019-05-07 09:05:05
<Shelwien> i tested it, doesn't seem to do anything good atm2019-05-07 09:05:32
<Jibz> okay :)2019-05-07 09:05:48
<Shelwien> builtin_expect is really important though2019-05-07 09:06:20
<Jibz> would have been nice if you could tell it the distribution of short to long matches and get a copy that matches2019-05-07 09:06:39
<Shelwien> its very useful for making various bounds-checking branches almost free2019-05-07 09:06:56
 you can actually do it easily enough2019-05-07 09:07:25
 via PGO2019-05-07 09:07:36
<Jibz> good point2019-05-07 09:07:47
<Shelwien> but its just not any useful atm2019-05-07 09:08:05
 IntelC PGO is much better than gcc's2019-05-07 09:08:19
 gcc's was never useful in my tests2019-05-07 09:08:29
 but even intel's doesn't do anything that smart2019-05-07 09:08:41
 !next2019-05-07 16:30:04