*** jack has left the channel2009-12-01 16:36:38
*** Guest4704955 has left the channel2009-12-01 16:44:47
*** Guest4704955 has joined the channel2009-12-01 16:59:25
*** pinc has left the channel2009-12-01 17:13:11
*** Krugz has joined the channel2009-12-01 17:45:58
*** toffer has joined the channel2009-12-01 18:16:18
*** Guest4704955 has left the channel2009-12-01 18:54:08
*** Guest4704955 has joined the channel2009-12-01 19:05:29
*** pinc has joined the channel2009-12-01 19:06:56
*** schnaader has joined the channel2009-12-01 20:23:15
<Shelwien> people are gathering for some reason, but nobody talks %)2009-12-01 20:33:09
<schnaader_afk> Will be talking in a few minutes :P 2009-12-01 20:33:54
<schnaader> Tada :)2009-12-01 20:40:43
 Well, it's the same for most IRC channels - even if 200 people are in, you may wait for half an hour without anyone talking and everything is filled with join/quit messages.2009-12-01 20:43:13
 And I think if you'd compare view/post numbers in the forum, you'd come to similar results :)2009-12-01 20:44:03
<Shelwien> sure2009-12-01 20:44:32
 anyway, this channel's log is much more readable than any other which I know of ;)2009-12-01 20:45:05
<schnaader> Yes, I think that's because people know each other quite a bit already and it's not a bunch of random people.2009-12-01 20:45:39
<Shelwien> ...though now I'm going afk - food calls ;)2009-12-01 20:46:43
<schnaader> :) OK, see ya2009-12-01 20:47:18
 Have a nice meal :)2009-12-01 20:47:52
*** Guest4704955 has left the channel2009-12-01 20:50:46
*** pinc has left the channel2009-12-01 20:51:07
*** Guest4704955 has joined the channel2009-12-01 21:05:26
*** Shelwien has left the channel2009-12-01 21:15:41
*** Guest9968193 has joined the channel2009-12-01 21:15:45
<Shelwien> btw, schnaader2009-12-01 21:18:35
 what happens when precomp encounters a broken deflate stream?2009-12-01 21:18:49
 like a file with remapped cluster in that VM image?2009-12-01 21:19:11
<schnaader> There can be different behaviours.2009-12-01 21:24:44
<Shelwien> i mean, can it extract just a single block for deflate stream?2009-12-01 21:25:17
<schnaader> Worst behaviour would be a deflate (or other) stream that stops somewhere and is followed by a big run of identical bytes, which could lead to a very big output stream; recompression would detect the failure in that case.2009-12-01 21:25:57
<Shelwien> %)2009-12-01 21:26:25
<schnaader> Did this experiment once with a torrent that hadn't finished downloading - not recommended ;)2009-12-01 21:27:01
 After the decompressed stream grew to several GB, Precomp stopped with "disk full"2009-12-01 21:27:28
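Precomp's actual handling isn't shown in this log; as a rough illustration of the problem just described, here is a minimal sketch of probing a candidate raw deflate stream with zlib while capping the decompressed output, so a broken stream followed by a long run of identical bytes can't fill the disk. The function name and the cap value are made up.

```cpp
// Sketch only: probe a candidate raw deflate stream with zlib, giving up
// once the decompressed output exceeds a fixed cap.
#include <zlib.h>
#include <vector>
#include <cstddef>

// Returns true if the stream inflated completely within the cap.
bool probe_deflate(const unsigned char* in, size_t in_len,
                   std::vector<unsigned char>& out,
                   size_t out_cap = 64u << 20)        // 64 MB cap, arbitrary
{
  z_stream zs = {};
  if (inflateInit2(&zs, -15) != Z_OK) return false;   // -15 = raw deflate
  zs.next_in  = const_cast<Bytef*>(in);
  zs.avail_in = static_cast<uInt>(in_len);

  unsigned char buf[65536];
  int ret;
  while (true) {
    zs.next_out  = buf;
    zs.avail_out = sizeof(buf);
    ret = inflate(&zs, Z_NO_FLUSH);
    out.insert(out.end(), buf, buf + (sizeof(buf) - zs.avail_out));
    if (ret == Z_STREAM_END) break;                         // complete stream
    if (ret != Z_OK) break;                                 // broken/truncated
    if (out.size() > out_cap) { ret = Z_BUF_ERROR; break; } // runaway output
  }
  inflateEnd(&zs);
  return ret == Z_STREAM_END;
}
```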
<Shelwien> well, it's not a problem with my approach - soundslimmer can losslessly process anything, even a non-mp3 file2009-12-01 21:28:09
<schnaader> But streams don't have to be complete, indeed. In most cases, decompression will just stop because the stream is invalid at some point, and recompression will see how long the match is.2009-12-01 21:28:37
<Shelwien> ah. its better than i thought, then ;)2009-12-01 21:29:15
<schnaader> There are almost always some rare "attack" cases you can construct, but it's like with hash collisions - they're not that likely to happen :)2009-12-01 21:29:28
 I even have these "penalty bytes" when some bytes of the compressed stream are different, but afterwards it's the same again.2009-12-01 21:30:02
<Shelwien> like patches?2009-12-01 21:30:19
<schnaader> Yes, but not that good - it only works if they synchronize again at the same position, so it will not work for "00 01 02 04 05" vs "00 01 02 03 04 05", but it will for "00 01 02 FF 04 05"2009-12-01 21:31:11
 Could be improved, but as I plan a complete rewrite that doesn't need brute force, it's not necessary :)2009-12-01 21:31:39
 These patches work on bytes, the rewrite will be able to directly correct matches and always re-synchronize successfully that way.2009-12-01 21:33:49
<Shelwien> like Levenshtein distance on bits? ;)2009-12-01 21:35:18
<schnaader> Yes, kind of, only that insertion is missing at the moment.2009-12-01 21:36:04
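The penalty-byte scheme described above (streams must line up again at the same offset, insertions are not handled) is small enough to sketch; the structure and names below are invented for illustration and are not Precomp's actual code.

```cpp
// Sketch of byte-level "penalty bytes": compare the recompressed stream with
// the original one and record the positions where single bytes differ, as
// long as both streams stay aligned at the same offsets (no insert/delete).
#include <cstdint>
#include <cstddef>
#include <vector>

struct PenaltyByte { size_t pos; uint8_t original; };

// Returns false if the streams differ in length or need too many patches;
// otherwise fills 'penalties' with the bytes to patch back on reconstruction.
bool diff_same_length(const std::vector<uint8_t>& original,
                      const std::vector<uint8_t>& recompressed,
                      std::vector<PenaltyByte>& penalties,
                      size_t max_penalties = 16)
{
  if (original.size() != recompressed.size()) return false;  // no resync possible
  penalties.clear();
  for (size_t i = 0; i < original.size(); ++i) {
    if (original[i] != recompressed[i]) {
      if (penalties.size() == max_penalties) return false;
      penalties.push_back({i, original[i]});  // byte to restore after recompression
    }
  }
  return true;
}
// "00 01 02 FF 04 05" vs "00 01 02 03 04 05" yields one penalty byte at pos 3;
// an inserted or deleted byte shifts everything and the check fails, as said above.
```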
 The rewrite will basically be my own deflate implementation instead of using zLib, so I can check the recompressed result parallel to decompression and put the deflate differences in a structure that can be appended to the decompressed stream.2009-12-01 21:37:15
<Shelwien> yeah2009-12-01 21:37:31
 i've got a puff.c clean for that too, but still haven't started on it ;)2009-12-01 21:37:51
 *cleaned2009-12-01 21:37:59
<schnaader> Like "01230123", "that dumb encoder didn't get the match, encode literals instead" :)2009-12-01 21:37:59
 I've got the decompression and most of the recompression now, but I have to add some ringbuffers to avoid using temporary files again :)2009-12-01 21:38:39
<Shelwien> %)2009-12-01 21:38:51
 btw, why don't you add some other preprocessing too?2009-12-01 21:39:07
 like that record/delta filter in ccm?2009-12-01 21:39:21
<schnaader> I thought about this, especially the 7-Zip + srep results brought that to my mind again.2009-12-01 21:39:44
<Shelwien> well, rep is separate stuff, it takes a lot of memory2009-12-01 21:40:18
<schnaader> It's also getting important with upcoming bZip2 compression-on-the-fly where we might want to reorder the data because we have 900 KB blocks.2009-12-01 21:40:19
<Shelwien> btw, did you see my explanation of what ccm does?2009-12-01 21:40:59
<schnaader> In the forum? Was a while ago, wasn't it?2009-12-01 21:41:47
<Shelwien> i don't quite remember myself ;)2009-12-01 21:42:04
 anyway, its fairly simple, but has a very nice effect2009-12-01 21:42:23
 ccm processes data in 64k blocks2009-12-01 21:42:37
 and reorders them by bytes if it finds any records2009-12-01 21:43:04
*** pinc has joined the channel2009-12-01 21:43:10
 so, 16bit stereo wav2009-12-01 21:43:16
 64k block turns into 4 x 16k byte blocks2009-12-01 21:43:37
 and there's delta too2009-12-01 21:43:45
*** pinc has left the channel2009-12-01 21:44:13
<schnaader> Ah, I see, like abcdabcdabcd => aaabbbcccddd2009-12-01 21:44:32
<Shelwien> yeah, but also with subtractions if necessary2009-12-01 21:45:19
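ccm's source isn't public, so the following is only a guess at the general shape of the transform being described: split a block with fixed-size records into per-byte columns, optionally delta-coding each column.

```cpp
// Sketch: reorder a block of fixed-size records into per-column planes
// ("abcdabcdabcd" -> "aaa bbb ccc ddd"), optionally with a byte-wise delta
// in each plane.  For 16-bit stereo WAV the record size is 4, so a 64 KB
// block becomes four 16 KB planes, as in the example above.
#include <cstdint>
#include <cstddef>
#include <vector>

std::vector<uint8_t> reorder_records(const std::vector<uint8_t>& block,
                                     size_t record_size, bool delta)
{
  std::vector<uint8_t> out;
  out.reserve(block.size());
  for (size_t col = 0; col < record_size; ++col) {
    uint8_t prev = 0;
    for (size_t pos = col; pos < block.size(); pos += record_size) {
      uint8_t b = block[pos];
      out.push_back(delta ? uint8_t(b - prev) : b);   // optional subtraction
      prev = b;
    }
  }
  return out;   // the inverse transform walks the planes back the same way
}
```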
<schnaader> How does it detect this? Does it only know some stream types, or does it use a general approach, checking some stats about the bytes?2009-12-01 21:45:45
<Shelwien> general afaik2009-12-01 21:46:54
 its a record filter2009-12-01 21:46:58
 it not only supports wavs2009-12-01 21:47:12
 but also images and tables with fixed records2009-12-01 21:47:21
<schnaader> Especially helpful if the record size isn't 2 or 4 bytes2009-12-01 21:47:44
<Shelwien> yeah2009-12-01 21:47:52
*** STalKer-X has joined the channel2009-12-01 21:47:56
 ?2009-12-01 21:48:02
<STalKer-X> *pow*2009-12-01 21:48:09
<schnaader> Could even be generalised to bits, but this would be harder to detect2009-12-01 21:48:12
<Shelwien> not much sense too, imho2009-12-01 21:48:35
<schnaader> Although if bit record size isn't prime, it's almost the same.2009-12-01 21:48:36
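How ccm actually detects records isn't stated here, so this is purely an assumption about how a general detector could work: count byte matches at every candidate distance within a block and pick the distance with the strongest byte-level autocorrelation.

```cpp
// Sketch of a generic record-size detector (an assumption, not ccm's real
// heuristic): for each candidate period, count byte matches at that distance
// within the block and keep the period with the best hit ratio.
#include <cstdint>
#include <cstddef>

size_t detect_record_size(const uint8_t* data, size_t n, size_t max_period = 64)
{
  size_t best_period = 1, best_hits = 0;
  for (size_t period = 2; period <= max_period && period < n; ++period) {
    size_t hits = 0;
    for (size_t i = period; i < n; ++i)
      hits += (data[i] == data[i - period]);
    // Compare hit ratios hits/(n-period) without floating point.
    if (hits * (n - best_period) > best_hits * (n - period)) {
      best_hits = hits;
      best_period = period;
    }
  }
  // A real filter would also require best_hits to clear some threshold before
  // treating the block as record-structured at all.
  return best_period;
}
```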
*** Guest4704955 has left the channel2009-12-01 21:48:52
<Shelwien> there's another problem though, with database records2009-12-01 21:48:53
 like a record can contain a string and a few numbers2009-12-01 21:49:24
 and encoding the string part by columns might not be a good idea2009-12-01 21:49:53
 as it could otherwise match something else2009-12-01 21:50:08
<schnaader> Well, in the bZip2 case, I can always apply different preprocessing and choose the best result for a given block.2009-12-01 21:51:18
 Of course, most of the worst cases can be detected beforehand anyway and not be preprocessed.2009-12-01 21:52:33
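The per-block selection just mentioned is simple to express; the filter list and the compressed-size callback below are hypothetical interfaces, not Precomp's.

```cpp
// Sketch: apply every candidate filter to a block, compress each result,
// and keep whichever filter gives the smallest output for that block.
#include <cstdint>
#include <cstddef>
#include <functional>
#include <vector>

using Block  = std::vector<uint8_t>;
using Filter = std::function<Block(const Block&)>;

struct Choice { size_t filter_id; Block filtered; };   // id 0 = "no filter"

Choice pick_best_filter(const Block& block,
                        const std::vector<Filter>& filters,
                        const std::function<size_t(const Block&)>& compressed_size)
{
  Choice best{0, block};
  size_t best_size = compressed_size(block);
  for (size_t i = 0; i < filters.size(); ++i) {
    Block candidate = filters[i](block);
    size_t sz = compressed_size(candidate);
    if (sz < best_size) {                    // strictly better only
      best = {i + 1, std::move(candidate)};
      best_size = sz;
    }
  }
  return best;   // the chosen id has to be stored per block for decoding
}
```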
<Shelwien> btw, intel's bzip is weird2009-12-01 21:54:16
 produces files of different sizes with the same modes ;)2009-12-01 21:54:38
<schnaader> One of the first filter ideas was for PDF data, do you know these "(word )<ASCII float numbers and PDF commands>(other )(words and perhaps some)(l)<...>(etters)" crap they're doing in there? Splitting it up into text, commands and binary-encoded floats would reeeeally help there :)2009-12-01 21:55:17
 Don't combine RNGs and compression ;)2009-12-01 21:56:27
 At least bZip2 isn't as bad as deflate - you know which mode was used and output will be the same most of the time, although still not 100% reliable.2009-12-01 21:57:45
<Shelwien> i guess there are just much fewer implementations ;)2009-12-01 21:58:11
 and pdf also has random stuff beside deflate... like that ascii85 etc2009-12-01 21:59:25
<schnaader> Yes, that's the main factor. Implementing BWT things isn't as easy as dealing with huffman codes and literal/match decisions.2009-12-01 21:59:26
 ascii85 is on my todo list, should have done this already :( Welcome, laziness ;)2009-12-01 21:59:51
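For reference, the ASCII85 transform mentioned here is tiny: 5 characters in '!'..'u' encode 4 bytes base-85, with 'z' as shorthand for four zero bytes (Adobe variant). A minimal decoder sketch follows; a lossless recompressor would additionally have to record whitespace and line breaks to reproduce the original text exactly.

```cpp
// Minimal ASCII85 (Adobe variant) decoder sketch.  Error handling is minimal
// and whitespace positions are simply discarded, so this alone is not yet a
// reversible transform.
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

std::vector<uint8_t> ascii85_decode(const std::string& in)
{
  std::vector<uint8_t> out;
  uint32_t group[5];
  int n = 0;
  auto flush = [&](int count) {               // count = chars in this group (2..5)
    uint32_t v = 0;
    for (int i = 0; i < 5; ++i)
      v = v * 85 + (i < count ? group[i] : 84);     // pad with 'u' (= 84)
    for (int i = 0; i < count - 1; ++i)
      out.push_back(uint8_t(v >> (24 - 8 * i)));    // big-endian bytes
  };
  for (char c : in) {
    if (std::isspace(static_cast<unsigned char>(c))) continue;
    if (c == '~') break;                      // start of the "~>" terminator
    if (c == 'z' && n == 0) { out.insert(out.end(), 4, 0); continue; }
    if (c < '!' || c > 'u') break;            // invalid character - just stop
    group[n++] = uint32_t(c - '!');
    if (n == 5) { flush(5); n = 0; }
  }
  if (n > 1) flush(n);                        // final partial group
  return out;
}
```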
 By the way, do you know anything about encrypted PDFs? I'm pretty sure decrypting could be done, but I'm not sure if encrypting it back with same results would be possible. Not to mention that Adobe pretty sure wouldn't like such things...2009-12-01 22:01:45
<Shelwien> well, i can recommend a decrypting utility if you want ;)2009-12-01 22:03:10
<schnaader> I have decryption sources as well, thanks, but nobody cares about re-encryption ;)2009-12-01 22:03:33
<Shelwien> well, i think it should be possible to reconstruct2009-12-01 22:04:52
 if you decrypt it yourself2009-12-01 22:05:09
<schnaader> Depends, as some (or most) of the algorithms seem to be asymmetrical, so you might have a public key, but perhaps would need the private key to re-encrypt, don't know...2009-12-01 22:05:55
<Shelwien> and as to adobe... maybe messing up something to avoid getting a usable decrypted pdf would be a good idea ;)2009-12-01 22:05:59
<schnaader> Output would be pretty messed up by the PCF format already, but I also thought about this, yes :)2009-12-01 22:07:06
<Shelwien> and keys should be available anyway, as software which does the encryption is available ;)2009-12-01 22:07:36
<schnaader> Right :)2009-12-01 22:07:53
<Shelwien> btw, what about text filters?2009-12-01 22:08:14
 including LIPT etc?2009-12-01 22:08:21
 like WRT?2009-12-01 22:08:33
<schnaader> LIPT? Google found "Leymann Inventory of Psychological Terror", lol2009-12-01 22:08:51
 Yes, WRT and especially HTML/XML filters also came to my mind, same thing as with most Precomp ideas - would take too long for now, other things have higher priority :)2009-12-01 22:10:04
 Although I have several code branches with such experiments at least using scripts and made-up examples.2009-12-01 22:10:26
<Shelwien> Length Index Preserving Transform2009-12-01 22:11:30
 your version was better though, at least makes some sense ;)2009-12-01 22:12:07
<schnaader> There's always some sort of psychological terror involved when it comes to compression :)2009-12-01 22:12:57
<Shelwien> still, there're simpler text filters too, which still help2009-12-01 22:15:11
 like "capital conversion"2009-12-01 22:15:17
 and punctuation padding2009-12-01 22:15:54
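Both filters named here are simple enough to show; the sketch below is a generic illustration (flag value and padding rule chosen arbitrarily, escaping of the flag byte omitted), not the transform of any particular WRT implementation.

```cpp
// Sketch of two classic text filters: "capital conversion" replaces an
// uppercase letter with a flag byte plus its lowercase form, and punctuation
// padding inserts one extra space after punctuation so following words keep
// their usual " word" context.  A real filter must also escape CAP_FLAG
// when it occurs in the input.
#include <cctype>
#include <string>

const char CAP_FLAG = 0x01;    // hypothetical flag byte

std::string capital_conversion(const std::string& in)
{
  std::string out;
  for (unsigned char c : in) {
    if (std::isupper(c)) { out += CAP_FLAG; out += char(std::tolower(c)); }
    else                 { out += char(c); }
  }
  return out;
}

std::string punctuation_padding(const std::string& in)
{
  std::string out;
  for (char c : in) {
    out += c;
    // Always add exactly one space after punctuation; the inverse filter
    // removes exactly one space after punctuation, so the pair is lossless.
    if (c == ',' || c == '.' || c == ';') out += ' ';
  }
  return out;
}
```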
<toffer> such stuff is more efficient when incorporated into the context generation :D2009-12-01 22:17:52
<schnaader> I think I should go for a cleaned-up, object-oriented version of Precomp in the beta phase (which will start soon - supporting multiple files and directories is the only big todo left for that), so generalising pre-/postprocessing would be easy and external DLLs could be used for quick tests.2009-12-01 22:18:14
<toffer> the compressor manually applies these transforms for already processed data to improve context clustering2009-12-01 22:18:18
<Shelwien> toffer: not quite, it also affects symbol decomposition2009-12-01 22:18:43
<toffer> not the decomposition itself, but the processed symbols2009-12-01 22:19:24
<Shelwien> ah2009-12-01 22:19:36
 btw, considering dlls2009-12-01 22:19:46
 did you see my precomp merged into a single exe? with packjpg?2009-12-01 22:19:58
<toffer> something i always wondered... how large is your source?2009-12-01 22:20:07
 @shelwien did you ever try to optimize such transforms?2009-12-01 22:21:37
<Shelwien> there's not much to optimize kinda2009-12-01 22:22:13
<toffer> a set of flags2009-12-01 22:22:28
<Shelwien> you either use it, or not2009-12-01 22:22:29
<schnaader> @Shelwien: Was this one of the posts in "How small could we get a Precomp SFX"? Something like this would be useful, although I thought about disabling PackJPG by default in the next version because the 2.4WIP version is too unstable.2009-12-01 22:22:31
<toffer> what to apply when2009-12-01 22:22:33
 for every model, of course - assuming cm2009-12-01 22:23:40
<Shelwien> schnaader: i made a tool called dllmerge, which resolves exe imports/exports with a statically linked dll and merges them2009-12-01 22:23:45
<toffer> well it worked with pthread+m12009-12-01 22:24:05
<Shelwien> worked with precomp too2009-12-01 22:24:12
<schnaader> @toffer: At the moment it's about 9000 LOC, 300 KB source size (excluding external GIF routines and zLib). It could be smaller though, as it's pretty messed up - for example there are try_decompression_(pdf/zip/...) routines that could be merged into one with some branches.2009-12-01 22:25:08
<toffer> @eugene: i see you did "hand"-tuning to your mtf ? that ranking function only used an enum as a constant2009-12-01 22:25:08
 ouch2009-12-01 22:25:21
<Shelwien> yeah2009-12-01 22:25:24
<toffer> 300kb2009-12-01 22:25:26
 i mean i got a few 1000 loc, but it's just2009-12-01 22:25:44
 80kb2009-12-01 22:25:47
 you should really consider c++2009-12-01 22:26:19
 afaik it was c?2009-12-01 22:26:24
 i mean templates are pretty useful for code generation2009-12-01 22:26:40
<schnaader> This is C++, but you're right, not using OO as I should :)2009-12-01 22:26:42
<toffer> ^^2009-12-01 22:26:53
<schnaader> You also see that routine merge lazyness in the EXE - 400 KB -> 130 KB with UPX.2009-12-01 22:26:59
<Shelwien> ;)2009-12-01 22:27:20
<toffer> usually such code attracts errors quite a bit2009-12-01 22:27:30
<schnaader> So I guess it could get down to about 3000 LOC easily, but it just wouldn't change much, so I didn't bother yet.2009-12-01 22:27:31
 @toffer: Yes, this is indeed the best argument for a rewrite.2009-12-01 22:28:01
 It also isn't helpful with most of the new features like compression-on-the-fly, where you have to replace all the fread/fwrite's you didn't generalize although you knew you should have :)2009-12-01 22:29:06
<Shelwien> http://en.wikipedia.org/wiki/Coroutine2009-12-01 22:29:51
<toffer> on the other hand i have serious trouble from time to time with the c++ stl - vector of vector of vector and some other rather basic stuff. checked the assembly and the code was wrong, causing random memory poking, etc.2009-12-01 22:30:26
<schnaader> And recursion would have been a lot easier without all those BAAAD global variables I have to push/pop now :(2009-12-01 22:31:03
<toffer> i mean an excessive usage of such c++ features reveals bugs quite often. 2009-12-01 22:31:04
 that's really ugly2009-12-01 22:31:17
 i got no global vars in my code at all2009-12-01 22:31:32
 ^^2009-12-01 22:31:33
<Shelwien> you have them in fact2009-12-01 22:31:49
 like _errno2009-12-01 22:31:52
<schnaader> Linux version will be an interesting thing because I'll do some valgrind experiments, could reveal some memory leaks/errors that are quite surely there.2009-12-01 22:31:55
<toffer> well that's the c library2009-12-01 22:32:03
 but not the stuff i've written2009-12-01 22:32:09
<Shelwien> ;)2009-12-01 22:32:15
<toffer> @eugene: i made some experiments for possible speedups.2009-12-01 22:32:44
<Shelwien> ?2009-12-01 22:32:58
<toffer> there's some potential in replacing hashing with direct lookups2009-12-01 22:33:00
 in m12009-12-01 22:33:01
 but that requires detecting, e.g., the order1 and order2 context masks2009-12-01 22:33:16
<Shelwien> ah. like what i did in mix_test?2009-12-01 22:33:21
<toffer> and special code to handle.2009-12-01 22:33:26
 short contexts only2009-12-01 22:33:35
 o1,22009-12-01 22:33:39
 you always used lookup tables afaik2009-12-01 22:33:46
<Shelwien> they don't need any hashing obviously ;)2009-12-01 22:33:54
<toffer> but it's 5-8% faster2009-12-01 22:34:21
 even with dumb code2009-12-01 22:34:26
<Shelwien> should be ;)2009-12-01 22:34:35
 that might be useful for you then - http://encode.dreamhosters.com/showthread.php?t=3962009-12-01 22:35:02
 you can check whether its a constant or variable2009-12-01 22:35:19
 and select direct lookups if its constant and mask fits into 64k2009-12-01 22:35:43
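A sketch of the selection rule being suggested: when the mask is known and fits into 64K entries, index a counter table directly, otherwise fall back to a hashed lookup. Counter type, table layout and the hash are placeholders, not m1 or mix_test internals.

```cpp
// Sketch: direct table indexing when the context mask fits into 64K entries,
// hashed lookup otherwise.
#include <cstdint>
#include <cstddef>
#include <vector>

struct ContextLookup {
  uint32_t mask;
  bool     direct;                 // true -> table indexed by (ctx & mask)
  unsigned hash_bits;
  std::vector<uint16_t> table;     // counter/probability slots (placeholder)

  explicit ContextLookup(uint32_t m, unsigned hbits = 22)
    : mask(m), direct(m <= 0xFFFFu), hash_bits(hbits)
  {
    // (ctx & mask) <= mask, so mask+1 entries always suffice in direct mode.
    table.resize(direct ? size_t(mask) + 1 : size_t(1) << hash_bits, 1u << 15);
  }

  uint16_t& slot(uint32_t ctx) {
    uint32_t c = ctx & mask;
    if (direct) return table[c];            // no hashing, no collisions
    uint32_t h = c * 2654435761u;           // multiplicative hash
    return table[h >> (32 - hash_bits)];    // top bits as index
  }
};
// Usage idea: an order-1 model with mask 0xFF gets a 256-entry direct table,
// an order-2 model with mask 0xFFFF a 64K one; anything wider gets hashed.
```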
<toffer> well more or less2009-12-01 22:36:22
 but that'd require making loadable parameters constant2009-12-01 22:36:41
<Shelwien> you can generate multiple versions in compile-time2009-12-01 22:37:15
 btw, the trick which i did in ccm_sh should be usable with gcc too, i think2009-12-01 22:37:33
<toffer> that would bloat the exe size multiple times2009-12-01 22:37:33
<Shelwien> yeah, so what?2009-12-01 22:37:42
 upx etc...2009-12-01 22:37:48
<toffer> that just sounds ill to me2009-12-01 22:38:12
 if i can simply have a single more if2009-12-01 22:38:21
<Shelwien> why not if its faster2009-12-01 22:38:21
 runtime if is bad2009-12-01 22:38:33
<toffer> one more if per model per byte2009-12-01 22:38:35
<Shelwien> worse than division2009-12-01 22:38:37
<toffer> if i'd do that per bit yes2009-12-01 22:38:53
 but that way it's acceptable2009-12-01 22:38:59
<Shelwien> whatever, the code would still be there anyway2009-12-01 22:39:07
 it also fragments code cache etc2009-12-01 22:39:25
 anyway, i was talking about the idea2009-12-01 22:39:40
 with compiling the same source multiple times with different macro parameters2009-12-01 22:39:59
 and linking it all together after all2009-12-01 22:40:15
 as i found, it gave me a considerable speedup2009-12-01 22:40:35
 because i was able to use separate PGO for decoder and encoder2009-12-01 22:40:55
 and different compiter options2009-12-01 22:41:05
 *compiler2009-12-01 22:41:09
 and their code ranges didn't overlap2009-12-01 22:41:21
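The "compile the same source several times with different macro parameters, then link the variants together" idea might look like the stub below; file names, macros and flags are illustrative only and not the actual ccm_sh or m1 build.

```cpp
// coder_unit.cpp -- hypothetical translation unit compiled several times with
// different macro parameters, then linked together; each object file can get
// its own PGO profile and compiler options, and the instances' code ranges
// don't overlap.  Build sketch (flags illustrative):
//   g++ -c -DCODER_NAME=coder_o1  -DCTX_MASK=0x00FF coder_unit.cpp -o coder_o1.o
//   g++ -c -DCODER_NAME=coder_o2  -DCTX_MASK=0xFFFF coder_unit.cpp -o coder_o2.o
//   g++ -c -DCODER_NAME=coder_gen -DCTX_MASK=0      coder_unit.cpp -o coder_gen.o
//   g++ main.o coder_o1.o coder_o2.o coder_gen.o -o codec
#include <cstdint>
#include <cstddef>

#ifndef CODER_NAME
#  define CODER_NAME coder_generic
#endif
#ifndef CTX_MASK
#  define CTX_MASK 0    // 0 = no fixed mask, take the generic (hashed) path
#endif

// The real model/coder body would live here; this stub only shows how the
// mask becomes a true compile-time constant inside each instance.
extern "C" size_t CODER_NAME(const uint8_t* in, size_t n, uint8_t* out)
{
  const uint32_t mask = CTX_MASK;                // constant-folded per instance
  uint32_t ctx = 0;
  for (size_t i = 0; i < n; ++i) {
    ctx = mask ? (((ctx << 8) | in[i]) & mask)   // specialized, direct-indexable
               : (ctx * 0x9E3779B1u + in[i]);    // generic fallback (hashed ctx)
    out[i] = uint8_t(ctx);                       // placeholder "coding" step
  }
  return n;
}
```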
<toffer> as you know all parameters must be run-time loadable2009-12-01 22:42:02
 i simply cannot use such an approach2009-12-01 22:42:07
<Shelwien> well, you can2009-12-01 22:42:35
 like, check the masks and select a codec version based on that2009-12-01 22:42:56
 with direct or hashed lookups2009-12-01 22:43:02
<toffer> yes, but i won't do that for *all* possible combinations2009-12-01 22:44:17
 since the number grows exponentially2009-12-01 22:44:23
 it still requires to inject some code2009-12-01 22:44:38
<Shelwien> well, runtime code generation is the best2009-12-01 22:44:51
 but damned C++ doesn't have such a feature2009-12-01 22:45:01
<toffer> not inject in that sense2009-12-01 22:45:01
 but it would be very nice, indeed2009-12-01 22:45:12
 as most parameters are just machine words2009-12-01 22:45:36
<Shelwien> ...afk, sorry2009-12-01 22:48:52
<schnaader> Better afk than your chair getting wet :P2009-12-01 22:49:46
<toffer> i just tested the code2009-12-01 22:50:37
 and hard-coded the lookups2009-12-01 22:50:43
 it's 1% faster2009-12-01 22:50:47
 not worth the effort2009-12-01 22:50:52
 5.66s -> 5.61s2009-12-01 22:51:09
<schnaader> Is that 1% constant or would it grow with more complex settings?2009-12-01 22:51:23
<toffer> constant2009-12-01 22:53:45
 it would grow, of course2009-12-01 22:53:54
<schnaader> OK, just thought about it because you said combinations could grow exponentially.2009-12-01 22:54:02
<toffer> but the number of possible different combinations i'd need to compile grows exponentially2009-12-01 22:54:13
<schnaader> Yay, 2 dev/null/nethack trophies this year :) http://nethack.kahrens.com/playertrophies.php?id=424&year=2009&place=First&size=Large2009-12-01 23:18:09
<toffer> dunno about that2009-12-01 23:25:07
 gonna watch family guy now2009-12-01 23:25:13
<schnaader> Family guy is so funny :) So have fun ;)2009-12-01 23:26:01
<toffer> really?2009-12-01 23:32:02
 well i like it a lot2009-12-01 23:32:05
<schnaader> I like the kind of strong, but still somewhat critical humor in it, like in American Dad or Drawn Together (or in the Simpsons, although not that extreme).2009-12-01 23:33:15
<toffer> well the simpsons are really great. for kids and for adults. i mean when i was a child i didn't understand all of the stuff in it reflecting something real2009-12-01 23:34:57
<schnaader> Yes, though it's a good mix so you still like it as a child :)2009-12-01 23:35:26
<toffer> cheers2009-12-01 23:38:34
 gn82009-12-02 00:55:59
*** toffer has left the channel2009-12-02 00:56:06
*** schnaader has left the channel2009-12-02 00:57:47
*** STalKer-Y has joined the channel2009-12-02 04:06:57
*** STalKer-X has left the channel2009-12-02 04:10:02
*** Krugz has left the channel2009-12-02 07:02:48
*** pinc has joined the channel2009-12-02 09:15:10
*** schnaader has joined the channel2009-12-02 14:59:45
*** schnaader has left the channel2009-12-02 15:15:06
*** toffer has joined the channel2009-12-02 15:28:12
 hi guys2009-12-02 15:28:50
<Shelwien> hi toffer, they're all bots2009-12-02 15:29:16
<toffer> erm?2009-12-02 15:29:27
 did you finally write some?2009-12-02 15:29:59
<Shelwien> no, they somehow appear even without me ;)2009-12-02 15:30:41
 though i did write complogger ;)2009-12-02 15:31:02
<toffer> well, yes2009-12-02 15:31:09
 but i thought pinc and asmodean are real2009-12-02 15:31:21
<Shelwien> well, sometimes, very rarely ;)2009-12-02 15:31:46
<pinc> yepp, sometimes I'm real ))2009-12-02 15:32:03
<toffer> you must be kidding - they're just idle2009-12-02 15:33:31
<Shelwien> of course, but in a sense, mirc without user is no different from complogger ;)2009-12-02 15:34:48
<toffer> so you're kidding ^^2009-12-02 15:35:51
<Shelwien> ...2009-12-02 15:36:11
 i'm writing a coroutine demo here2009-12-02 15:36:29
 rewritten that mtf utility using setjmp/longjmp2009-12-02 15:37:02
 and gcc is annoying me as usual2009-12-02 15:37:14
 i mean, it works with MSC/Intel, but not gcc2009-12-02 15:37:44
<toffer> how do you want to parallelize it?2009-12-02 15:48:02
<Shelwien> it's not about parallelizing2009-12-02 15:48:29
 its about building a data processing pipeline with readable syntax2009-12-02 15:49:13
<toffer> erm but...?2009-12-02 15:49:16
<Shelwien> well, i can post the current version, though it doesn't work with gcc yet2009-12-02 15:50:38
 i'm trying to fix that too, but its tricky2009-12-02 15:50:48
<toffer> i'll first have a look at coroutines2009-12-02 15:52:13
 could you grep it2009-12-02 15:52:19
 ?2009-12-02 15:52:20
<Shelwien> you can too2009-12-02 15:52:29
<toffer> !grep or something2009-12-02 15:52:33
 ah2009-12-02 15:52:39
 ^^2009-12-02 15:52:41
<Shelwien> ;)2009-12-02 15:52:42
<toffer> just guessed the syntax right2009-12-02 15:52:51
 !grep coroutine2009-12-02 15:52:54
 mh2009-12-02 15:53:09
 was it on wikipedia?2009-12-02 15:53:12
<Shelwien> its case-sensitive2009-12-02 15:53:12
 !grep Coro2009-12-02 15:53:18
<toffer> ah2009-12-02 15:53:23
 thanks2009-12-02 15:53:25
 btw i evaluated the speed gain of different implementations2009-12-02 15:58:00
 regarding table lookups2009-12-02 15:58:06
<Shelwien> ?2009-12-02 15:58:15
<toffer> i mean under certain circumstances it's beneficial to use lookup tables instead of hashign2009-12-02 15:58:45
 hashing2009-12-02 15:58:48
<Shelwien> well, hashing is lookup tables with randomized indexing2009-12-02 15:59:35
 of course direct indexing is faster2009-12-02 15:59:45
<toffer> i've written code to detect context masks like 0xff, 0x40ff, 0x405ff and the same for order 2. 2009-12-02 15:59:46
 and gonna use specialized codecs for either one or two directly addressable models2009-12-02 16:00:25
 in most cases i got order 1 and 2 anyway2009-12-02 16:00:39
<Shelwien> well, that's something too, i guess2009-12-02 16:01:04
 though i hope you don't only support fixed masks, but count mask bits2009-12-02 16:01:39
<toffer> i could do that but it'd require reordering the bits2009-12-02 16:02:25
 which is slow2009-12-02 16:02:38
<Shelwien> yeah, i think that would be still faster than hashing2009-12-02 16:02:45
<toffer> the fastest implementation i can think of is to have translation tables2009-12-02 16:03:22
 e.g. tab[c] is setup for mask m to stuff bits together2009-12-02 16:04:07
 but that'd still require a loop over 8 bytes2009-12-02 16:04:28
 which is slow2009-12-02 16:04:31
<Shelwien> well, yeah, though i just precompile the code for that2009-12-02 16:04:34
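The translation-table idea toffer describes can be sketched as a software bit-gather: one 256-entry table per context byte maps that byte's masked bits to their packed positions, so collecting the context bits is a few table reads and ORs. The setup code below is generic and only an illustration.

```cpp
// Sketch of per-byte translation tables that gather the masked bits of a
// 32-bit context into a compact index (a software "pext").  The tables are
// built once per mask; a lookup is one table read and an OR per context byte.
#include <cstdint>

struct BitGather {
  uint16_t tab[4][256];                       // one table per context byte

  // Software bit-extract: squeeze the bits of x selected by mask together.
  static uint32_t pext_soft(uint32_t x, uint32_t mask) {
    uint32_t res = 0, bit = 1;
    for (uint32_t m = mask; m; m &= m - 1) {  // walk the set mask bits, low to high
      if (x & m & -m) res |= bit;             // m & -m = lowest remaining mask bit
      bit <<= 1;
    }
    return res;
  }

  explicit BitGather(uint32_t mask) {
    for (int k = 0; k < 4; ++k)
      for (uint32_t b = 0; b < 256; ++b)
        tab[k][b] = uint16_t(pext_soft(b << (8 * k), mask));
  }

  // Packed index of (ctx & mask); valid for masks with at most 16 set bits.
  uint16_t index(uint32_t ctx) const {
    return uint16_t(tab[0][ctx & 0xFF] | tab[1][(ctx >> 8) & 0xFF] |
                    tab[2][(ctx >> 16) & 0xFF] | tab[3][ctx >> 24]);
  }
};
```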
 damned google finally completely dropped googlepages a few days ago2009-12-02 16:05:27
 its annoying as hell now2009-12-02 16:05:31
<toffer> ^^2009-12-02 16:05:32
 google is evil2009-12-02 16:05:35
<Shelwien> http://sites.google.com/site/shelwien/gmtf_v0a.rar2009-12-02 16:05:38
<toffer> i cannot get specialized code for all of that2009-12-02 16:05:44
 that's impossible2009-12-02 16:05:47
 even for a single mask2009-12-02 16:05:51
<Shelwien> that depends on what you want to do2009-12-02 16:06:07
<toffer> since it'd require to have 256^# of bytes to translate different pieces2009-12-02 16:06:09
 and you overestimate the speed gain of lookups2009-12-02 16:06:32
 the optimization gets 8%2009-12-02 16:06:43
 speed improvement2009-12-02 16:06:49
<Shelwien> but i don't think that being able to tune to data _and_ use new profiles right away with all possible speed optimization is that important2009-12-02 16:07:16
 so you can either build new versions by recompiling the model after retuning2009-12-02 16:08:18
 like i do, and zpaq now2009-12-02 16:08:23
 or you can also implement a generalized version2009-12-02 16:08:49
 which would support any profiles2009-12-02 16:08:57
 but won't be speed-optimized2009-12-02 16:09:02
<toffer> well i'd still like to get that speedup without specialisation2009-12-02 16:10:37
 i could modify my code generator to produce a header with the hard-coded parameters.2009-12-02 16:11:08
<Shelwien> yeah2009-12-02 16:11:14
<toffer> actually it was like that for previous version <= 0.22009-12-02 16:11:17
<Shelwien> and well, i don't see the point in losing the possible gain from specialization2009-12-02 16:11:37
<toffer> but my current optimizer approach is more generalized, thus supporting run-time parameter loading and multi-threading2009-12-02 16:11:39
 it's no loss2009-12-02 16:11:53
 if it cannot use lookup tables it switches to the current implementation: hash tables2009-12-02 16:12:17
<Shelwien> well, of course its no loss until you properly optimize the specialized version2009-12-02 16:12:30
*** chornobl has joined the channel2009-12-02 16:12:33
 bl?2009-12-02 16:13:03
<chornobl> i've shortened it2009-12-02 16:15:27
 since the old nick was banned2009-12-02 16:15:38
*** Krugz has joined the channel2009-12-02 16:15:46
 btw, there's a question about your p2p idea2009-12-02 16:20:09
<Shelwien> ?2009-12-02 16:20:23
<chornobl> how would it handle multiple nested files2009-12-02 16:21:11
 like an iso which contains a zip which contains a jpg2009-12-02 16:21:43
<Shelwien> well, its not quite related to p2p - that's more about matching recompressed data2009-12-02 16:21:54
<chornobl> anyway2009-12-02 16:22:05
<Shelwien> and afaiu, we can just compute multiple hashtables for a file2009-12-02 16:22:29
 i mean, there could be a matching compressed version of original file2009-12-02 16:22:58
 or, otherwise, some unpacked contents can match2009-12-02 16:23:16
 but either way, we can detect that2009-12-02 16:23:35
<chornobl> so it will be a hierarchical structure2009-12-02 16:24:02
<Shelwien> although reconstructing the file from multiple sources would be very tricky to implement2009-12-02 16:24:16
 i mean, if i downloaded half of the zip archive compressed2009-12-02 16:24:39
 and can't find any more seeds2009-12-02 16:24:48
*** sami has joined the channel2009-12-02 16:25:02
 and then i find other files supposedly contained there2009-12-02 16:25:06
<sami> hi!2009-12-02 16:25:14
<Shelwien> but unpacked, or with different compression2009-12-02 16:25:40
 still, thats better than nothing2009-12-02 16:25:59
 hi sami ;)2009-12-02 16:26:01
<toffer> hi2009-12-02 16:26:13
<Shelwien> sami: http://sites.google.com/site/shelwien/gmtf_v0a.rar2009-12-02 16:26:18
 its my upcoming coroutine demo (still buggy)2009-12-02 16:26:32
 do you have any suggestions?2009-12-02 16:26:41
<chornobl> depth of encapsulation should be limited, or manually controlled, to get a sane hash size2009-12-02 16:28:39
<Shelwien> sane hash size doesn't really matter for p2p2009-12-02 16:29:04
 as it won't be transferred anywhere until matches found2009-12-02 16:29:33
<chornobl> still, there should be some adaptivity, because a video file differs from the example mentioned above, so bits need to be spread differently between levels (1 vs 3)2009-12-02 16:34:03
<Shelwien> i don't understand2009-12-02 16:34:41
<chornobl> i mean more bits can be given to a video file (not precompressible)2009-12-02 16:36:12
<Shelwien> still don't know what you're talking about2009-12-02 16:36:40
 the idea is that we can find somebody who has a given data fragment2009-12-02 16:36:59
<chornobl> than to the first nested level of a same-sized iso (precompressible)2009-12-02 16:37:00
<Shelwien> by its hash2009-12-02 16:37:03
 and some data can have multiple representations2009-12-02 16:37:37
<chornobl> guess i lost some communication skills recently =)2009-12-02 16:38:00
<Shelwien> so we can index all or at least some of these2009-12-02 16:38:03
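A rough sketch of the indexing idea being described; the block size, the FNV hash and the "representation" interface are all made up for illustration. The point is just that fragments of every representation of the same payload (the compressed file, its unpacked form, ...) land in one index, so a peer's fragment can be matched in whichever form it exists.

```cpp
// Sketch: index fixed-size fragments of several representations of the same
// payload under one fragment-hash table, so a match can be found no matter
// which representation a peer happens to hold.
#include <cstdint>
#include <cstddef>
#include <map>
#include <vector>

struct FragmentRef { int representation; size_t offset; };
using FragmentIndex = std::multimap<uint64_t, FragmentRef>;

static uint64_t fnv1a(const uint8_t* p, size_t n) {
  uint64_t h = 1469598103934665603ull;              // FNV-1a 64-bit
  for (size_t i = 0; i < n; ++i) { h ^= p[i]; h *= 1099511628211ull; }
  return h;
}

void index_representation(FragmentIndex& index, int repr_id,
                          const std::vector<uint8_t>& data,
                          size_t block = 64 * 1024)  // arbitrary fragment size
{
  for (size_t off = 0; off + block <= data.size(); off += block)
    index.insert({fnv1a(&data[off], block), FragmentRef{repr_id, off}});
}
// A downloader hashes its own fragments the same way and queries the network;
// which representation a match lives in only matters during reconstruction.
```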
<toffer> having three specialized coding routines increases code size just by 20kb2009-12-02 16:38:08
<Shelwien> well, just think about it in asm terms2009-12-02 16:38:28
 its still a lot actually ;)2009-12-02 16:38:31
<toffer> it's 20% slower now... guess gcc didn't do inlining properly...2009-12-02 16:39:42
<Shelwien> ;)2009-12-02 16:40:18
<toffer> yep the bit coding routine isn't inlined2009-12-02 16:40:40
 well explicit template instantiation does the job2009-12-02 16:46:02
 let's see how large the exe will be ^^2009-12-02 16:46:08
<Shelwien> i'd remind the idea from ccm_sh2009-12-02 16:46:32
 you can separately compile multiple codec instances to separate object files2009-12-02 16:47:00
 and only then link them together2009-12-02 16:47:21
 its especially helpful if taking into account the PGO2009-12-02 16:47:47
<sami> http://compressionratings.com/s_ref.html the "new" test files2009-12-02 16:47:58
<Shelwien> did you see new Bulat's benchmark btw?2009-12-02 16:48:16
<sami> it appears that when sorting, the n/a entries get put at the top2009-12-02 16:48:28
 no, where is it?2009-12-02 16:48:32
<Shelwien> http://encode.dreamhosters.com/showthread.php?t=5072009-12-02 16:48:43
<sami> just noticed that bwtmix1 didn't get tested in these ref files, will fix that2009-12-02 16:49:04
<Shelwien> hope it won't die2009-12-02 16:49:19
 i mean, freeze ;)2009-12-02 16:49:30
<toffer> somehow that seems to hurt compiler optimizations2009-12-02 16:49:43
<chornobl> it won't grow too much either2009-12-02 16:49:48
<toffer> it's 10% slower now 2009-12-02 16:49:50
 >.<2009-12-02 16:49:53
<Shelwien> what does?2009-12-02 16:50:02
<chornobl> as the main purpose (i think) is to promote fa and new srep2009-12-02 16:50:22
<Shelwien> there's no sense in promoting the new srep (also it's slow, especially decoding)2009-12-02 16:50:54
 because people won't really care until he makes it internal2009-12-02 16:51:17
<chornobl> repack maniacs already care2009-12-02 16:51:50
<Shelwien> sami: http://encode.dreamhosters.com/showthread.php?p=10064#post100642009-12-02 16:52:34
*** pinc has left the channel2009-12-02 17:09:58
<sami> since bulat has public test file(s), which is reasonable, and all switches are run already, that pretty much guarantees I like the test. seems that this is a reasonable multithreading + long match test2009-12-02 17:10:59
<Shelwien> yeah, but i wonder about times2009-12-02 17:12:02
<toffer> somehow i get the best gcc results when the encoding and decoding routines are separately compiled, but both into the same .cpp2009-12-02 17:13:46
<sami> the nz times don't look very positive, but I guess those are possible. io is much more expensive than fa's, and -cd is slower than -cD, which is only possible with some very huge long match2009-12-02 17:14:06
 also I had to download the script just to find out how much memory nz is using; I wish that info was in the tables2009-12-02 17:15:04
<Shelwien> ;)2009-12-02 17:15:19
 toffer: yeah, that's what i suggested too2009-12-02 17:15:34
<toffer> not really2009-12-02 17:16:00
<Shelwien> ...meanwhile, it seems like i finally fixed that damned thing2009-12-02 17:16:08
 and it works with gcc now2009-12-02 17:16:12
<toffer> i mean separate .cpp files for the encoding and decoding instantiations hurt2009-12-02 17:16:22
 but both inside the same helps a bit2009-12-02 17:16:33
<Shelwien> not sure what do you mean then2009-12-02 17:17:03
 do you use separate .o files for encoder and decoder, or not?2009-12-02 17:17:28
<toffer> codec<ENCODE> in enc.o and codec<DECODE> in dec.o separately hurts code generation after profiling, but both in one file helps2009-12-02 17:18:01
 the thing which helps is to separate the codec instantiation from the driver code2009-12-02 17:18:41
<Shelwien> err... but you have to make different profiles for encoding and decoding, and use them properly2009-12-02 17:20:23
<toffer> i know2009-12-02 17:21:16
 it's still weird2009-12-02 17:21:24
 i got a command line switch to do both, encoding and decoding for profile generation2009-12-02 17:21:44
<Shelwien> yeah, but its bad actually2009-12-02 17:22:01
 you see, the compiler would think that they work at once2009-12-02 17:22:23
 (decoding and encoding)2009-12-02 17:22:28
 it only collects numbers of occurrences on branches etc2009-12-02 17:22:46
 but doesn't understand the order2009-12-02 17:22:55
 so if in if(cond) branch1; else branch2;2009-12-02 17:23:22
 branch1 is always taken in encoding2009-12-02 17:23:29
 and branch2 in decoding2009-12-02 17:23:33
 it'd think that branch1 probability is 0.52009-12-02 17:24:15
<toffer> that makes no sense if both routines are separate2009-12-02 17:24:46
<Shelwien> it doesn't understand that2009-12-02 17:24:57
 and it doesn't understand a thing about layouts2009-12-02 17:25:15
 so it would just generate functions in order of parsing2009-12-02 17:25:42
<toffer> i don't see where the problem should be. there is a specialized function for encoding. it got stats for that. and for decoding there's a specialized function, too2009-12-02 17:26:07
<Shelwien> as i said... it thinks that they work both at once2009-12-02 17:26:32
 so instead of optimizing each function alone2009-12-02 17:26:58
 it would try to optimize "whole program"2009-12-02 17:27:11
 28.547s 31.547s ccm_sh1d992009-12-02 17:28:25
 29.219s 29.891s ccm_sh1d9b2009-12-02 17:28:25
 28.187s 29.515s ccm_sh1d9e # modular build2009-12-02 17:28:25
<toffer> yes, i understand that. but the odd thing i wanted to point out is that doing it that way hurts the generated code2009-12-02 17:28:41
<Shelwien> here first line has global PGO2009-12-02 17:28:46
 and second decoder PGO2009-12-02 17:28:50
 and third has both2009-12-02 17:28:56
<toffer> separating the driver program and the encoder,decoder helps2009-12-02 17:29:17
 but having 3 separate components hurts2009-12-02 17:29:25
 component = object file2009-12-02 17:29:31
<Shelwien> well, i had 3 and it helped2009-12-02 17:29:39
 of course there're various alignment quirks etc2009-12-02 17:29:52
 which i avoided by using different COFF sections for modules2009-12-02 17:30:13
 dunno how to do it with gcc though2009-12-02 17:30:22
<toffer> i'm gonna do some exact speed tests now. up until now i got 4 models compressing enwik7 in 4.99 secs. a single m1 took 2.1 sec :D2009-12-02 17:32:57
 somehow i don't understand why it scales better than linear2009-12-02 17:33:14
<Shelwien> memory lookups overlap with computing?2009-12-02 17:33:42
<toffer> dunno2009-12-02 17:33:51
 but it looks odd to me2009-12-02 17:33:55
 gonna be back in 30 mins2009-12-02 17:34:00
 bye2009-12-02 17:34:03
<Shelwien> sami?2009-12-02 17:34:17
*** toffer has left the channel2009-12-02 17:34:25
<sami> Shelwien, did I miss something? I'm now looking at your mtf stuff2009-12-02 17:43:11
<Shelwien> http://ctxmodel.net/files/mix_test/gmtf_v1.rar2009-12-02 17:43:24
 supposedly i made it to work with gcc2009-12-02 17:43:42
 please check if you can2009-12-02 17:43:46
 and as to mtf, the version w/o coroutines might be easier to read - http://ctxmodel.net/files/mix_test/gmtf_v0.rar2009-12-02 17:44:33
*** Krugz has left the channel2009-12-02 17:46:01
<sami> g++ compiles it, but -Wall spills out a lot of stuff2009-12-02 17:46:57
<Shelwien> didn't check that, the question is whether it works at all, or crashes2009-12-02 17:47:36
 g++ mtf.cpp -o mtf2009-12-02 17:47:53
 ./mtf c book1bwt 12009-12-02 17:48:01
 ./mtf d 1 22009-12-02 17:48:04
 should produce file 2 identical to book1bwt2009-12-02 17:48:19
<sami> works fine for book1rbwt2009-12-02 17:48:47
<Shelwien> ok, great2009-12-02 17:48:58
 do you know a name for then weird MTF version then?2009-12-02 17:49:15
 *for that2009-12-02 17:49:21
<sami> hopefully I'll soon understand what the setjmp hackery is2009-12-02 17:49:24
<Shelwien> setjmp hackery is http://en.wikipedia.org/wiki/Coroutine2009-12-02 17:49:43
 after this i'm going to try writing all the coders in this style2009-12-02 17:50:37
 it allows using memory buffers and fast enough access to everything2009-12-02 17:51:12
 and it also allows writing completely separate modules with a simple API2009-12-02 17:51:47
 its not really necessary in this MTF example2009-12-02 17:52:04
 but even for something like a Unicode-to-UTF8 converter2009-12-02 17:53:14
 the main look with similar buffering would be much messier2009-12-02 17:53:33
 and with rangecoders2009-12-02 17:53:52
 I didn't really ever see a good library with a universal API2009-12-02 17:54:22
 *the main loop2009-12-02 17:55:24
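gmtf itself does this with setjmp/longjmp plus a stack trick, which is exactly what's fighting gcc above; the same "write the stage as a plain loop and suspend at every output byte" style is easier to show correctly with POSIX ucontext, so the sketch below uses that instead. The toy RLE stage stands in for a real coder stage.

```cpp
// Coroutine-style pipeline stage sketch using POSIX ucontext (gmtf uses
// setjmp/longjmp instead).  The stage is written as an ordinary nested loop
// and yield()s one output byte at a time; the consumer just calls next().
#include <ucontext.h>
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct ByteSource;
static ByteSource* g_active = nullptr;        // single instance in this sketch

struct ByteSource {
  ucontext_t caller{}, self{};
  char stack[64 * 1024];
  const uint8_t* in; size_t n;
  int  out  = -1;
  bool done = false;

  ByteSource(const uint8_t* data, size_t len) : in(data), n(len) {
    getcontext(&self);
    self.uc_stack.ss_sp   = stack;
    self.uc_stack.ss_size = sizeof(stack);
    self.uc_link          = &caller;          // where to go when run() returns
    g_active = this;
    makecontext(&self, trampoline, 0);
  }
  static void trampoline() { g_active->run(); g_active->done = true; }

  void yield(int b) { out = b; swapcontext(&self, &caller); }

  void run() {                                // the pipeline stage itself:
    for (size_t i = 0; i + 1 < n; i += 2)     // a toy RLE decoder, pairs of
      for (uint8_t k = 0; k < in[i]; ++k)     // <count, byte>; a rangecoder or
        yield(in[i + 1]);                     // MTF stage would sit here instead
  }

  int next() {                                // consumer side: pull one byte
    if (done) return -1;
    swapcontext(&caller, &self);
    return done ? -1 : out;
  }
};

int main() {
  const uint8_t rle[] = { 3, 'a', 2, 'b' };
  ByteSource src(rle, sizeof(rle));
  for (int c; (c = src.next()) >= 0; ) std::putchar(c);  // prints "aaabb"
  std::putchar('\n');
}
```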
<sami> don't know what to call this. I've seen a lot of this kind of stuff, I don't recall what they were called. I'm not saying I've seen exactly this though. do you have results for this?2009-12-02 18:04:55
<Shelwien> what kind? i can benchmark v0 vs v1 if you want, but its not very sensible2009-12-02 18:05:47
<sami> probably this is novel anyway2009-12-02 18:05:59
 I mean this kind of symbol ranking variants2009-12-02 18:06:26
<Shelwien> what's interesting2009-12-02 18:06:43
 is that it gains ~3k vs plain MTF2009-12-02 18:07:03
 (after entropy coding of book1rbwt)2009-12-02 18:07:18
 and also it might be actually faster than MTF2009-12-02 18:07:38
 because rank updates are skipped sometimes2009-12-02 18:08:01
<sami> what about the mtf that moves to rank 1 instead of 0 (and only to zero from one)?2009-12-02 18:08:11
<Shelwien> well, i can try that2009-12-02 18:08:47
 btw, this MTF topic appeared2009-12-02 18:09:16
 because of unary coding actually ;)2009-12-02 18:09:21
<sami> also to rank 2 instead of zero and only from <2 to 02009-12-02 18:09:30
<Shelwien> as unary coding uses some ranking2009-12-02 18:09:33
 235959, 232079, 231772, 2294962009-12-02 18:15:23
 mtf+fpaq0p, mtf1, mtf2, gMTF 2009-12-02 18:16:07
<sami> ok, nice2009-12-02 18:16:34
<Shelwien> mtf1 updates rank to rank<2?0:12009-12-02 18:16:39
 mtf2 - rank<3?0:22009-12-02 18:16:45
 its very easy to modify gMTF.inc to do that actually2009-12-02 18:17:06
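For reference, the three update rules being compared differ only in the destination rank; this is a generic sketch, not the actual gMTF.inc code.

```cpp
// Sketch of the MTF variants above: plain MTF moves the symbol to rank 0,
// mtf1 moves it to rank 1 unless it was already at rank 0 or 1
// (rank<2 ? 0 : 1), mtf2 to rank 2 unless rank<3 (rank<3 ? 0 : 2).
#include <cstdint>

struct MTF {
  uint8_t order[256];
  int     variant;                           // 0 = plain, 1 = mtf1, 2 = mtf2

  explicit MTF(int v) : variant(v) {
    for (int i = 0; i < 256; ++i) order[i] = uint8_t(i);
  }

  // Returns the rank of symbol c and applies the chosen update rule.
  int encode(uint8_t c) {
    int rank = 0;
    while (order[rank] != c) ++rank;
    int dest = 0;                                  // plain MTF: always to front
    if (variant == 1) dest = (rank < 2) ? 0 : 1;   // mtf1
    if (variant == 2) dest = (rank < 3) ? 0 : 2;   // mtf2
    for (int i = rank; i > dest; --i)              // close the gap
      order[i] = order[i - 1];
    order[dest] = c;
    return rank;                                   // rank goes to the entropy coder
  }
};
```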
<sami> although more testing would probably be needed, I mean some basic bwt fenwick-structured model, before we could say you killed mtf with this2009-12-02 18:17:59
 can you do one more quick test with obj2, mtf2 vs gmtf?2009-12-02 18:18:27
 or some other binary file2009-12-02 18:18:36
<Shelwien> obj2 or obj2bwt?2009-12-02 18:18:39
<sami> obj2bwt yeah2009-12-02 18:18:46
<Shelwien> ok, wait2009-12-02 18:18:52
<sami> the "more testing" is because the mtfs may just be too quick for the fpaq adapt speed, so it may favour your method2009-12-02 18:21:12
<Shelwien> 79724, 82177, 874872009-12-02 18:21:27
 mtf, mtf2, gmtf2009-12-02 18:21:32
<sami> ok2009-12-02 18:21:47
<Shelwien> gmtf has a parameter though2009-12-02 18:22:26
*** schnaader has joined the channel2009-12-02 18:31:01
<sami> so did anybody check the new benchmark data?2009-12-02 18:47:22
<Shelwien> yours? i did open it... and didn't see any benchmark results afair...2009-12-02 18:47:56
<sami> the links should be at the top of the page2009-12-02 18:48:17
<Shelwien> yeah2009-12-02 18:48:27
 i didn't get that actually ;)2009-12-02 18:48:44
 thought that links on files go to file data ;)2009-12-02 18:49:09
<sami> ok, perhaps I can try work around something to avoid that from happening :-)2009-12-02 18:49:59
<Shelwien> results for book1 seem kinda weird... do they include decoder size?2009-12-02 18:50:26
<sami> the second number is without decoder2009-12-02 18:50:47
 I mean the "w/o stub" column2009-12-02 18:50:59
 so xwrt is leading in book1 if we don't take into account the one-megabyte dictionary2009-12-02 18:51:46
<Shelwien> i think maybe you should add a coefficient to it or something2009-12-02 18:52:06
 because i can compile much smaller ash for sure2009-12-02 18:52:15
<sami> unfortunately that doesn't work because some programs use sfx2009-12-02 18:52:45
 or perhaps we can just do it anyway and ignore the sfx issue like now2009-12-02 18:53:12
 anyway, the whole point is to keep the decoder small2009-12-02 18:53:59
<Shelwien> but ppmy showing "better" results than paqs etc is just dumb2009-12-02 18:54:22
<sami> book1,obj2,geo are too small for test files2009-12-02 18:54:29
 I'm just including them for reference2009-12-02 18:54:41
<Shelwien> so i suggest to add a decoder size coefficient2009-12-02 18:54:48
 like if you compressed 10 such small files2009-12-02 18:55:02
*** jj has joined the channel2009-12-02 18:55:50
<schnaader> Have you checked what this precompressed part of FlashMX.pdf actually includes? There are some big images in it, worst case could be that this is mainly testing image compression, although this wouldn't be that unusual for typical PDFs.2009-12-02 18:55:57
<Shelwien> yeah, probably2009-12-02 18:56:21
<sami> schnaader, no unfortunately I didn't have time to check it2009-12-02 18:56:30
<schnaader> I think I'll have a look at it here.2009-12-02 18:57:37
<sami> but yes, I recognize that may be possible, that's why I didn't cut ohs.doc or vcfiu.hlp because I might just be sampling something less interesting2009-12-02 18:57:38
 Shelwien, so you suggest 0.1 is a good value?2009-12-02 18:58:39
<Shelwien> well, i think yes2009-12-02 18:59:06
<sami> perhaps I must provide a third size which has such coef, because I cannot replace the main size column because of compressors that use sfx2009-12-02 18:59:34
 it's also a drawback of the whole system that I cannot easily configure programs to run these tests with no sfx2009-12-02 19:00:00
<schnaader> Actually, the first 5 MB of FlashMX.pdf seem to be rather well mixed. About 3 or maybe 4 MB of it is image content, but there's also a lot of text in it.2009-12-02 19:03:42
<sami> schnaader so we got lucky :-)2009-12-02 19:04:55
 can you see a better offset there?2009-12-02 19:05:06
<Shelwien> yeah, but sorting by combined size makes it all weird2009-12-02 19:05:36
<sami> so that perhaps we could sample less of the image?2009-12-02 19:05:36
 Shelwien you can sort the tables by clicking at the column2009-12-02 19:06:15
<Shelwien> and get lots of n/a first, yeah ;)2009-12-02 19:06:58
<sami> right, but I will fix that2009-12-02 19:07:23
<schnaader> The part after the first big image (1.2 MB decompressed) seems fine, there's another big image block (2 x ~1.5 MB) later but I think that's far away from it. So you could try 5 MB with a 2 or 3 MB offset, but as images are pretty mixed up with text, I think it might not be worth the effort and you could just leave it like it is :)2009-12-02 19:08:53
<sami> please reload the pages, I forgot something - now they should look as they're supposed to2009-12-02 19:09:23
<Shelwien> could you add some visible separators between column titles too?2009-12-02 19:10:05
<sami> schnaader, ok. thanks2009-12-02 19:10:07
 Shelwien there should be pseudoseparators there already, but not between ct & dt (and not between cm & dm)2009-12-02 19:12:02
<Shelwien> i mean like "Size | w/o stub" instead of "Size w/o stub"2009-12-02 19:12:49
 i checked in chrome and still don't see any separators there2009-12-02 19:13:06
<sami> ok not on those table headers. I try to figure out something2009-12-02 19:13:43
*** toffer has joined the channel2009-12-02 19:22:30
* Guest4706822 slaps toffer around a bit with a large fishbot2009-12-02 19:24:21
<schnaader> Ouch.. no trouts here? I guess we'd better not misbehave...2009-12-02 19:24:52
<Guest4706822> not sure toffer's awake2009-12-02 19:25:08
<Shelwien> sleepwalking?2009-12-02 19:25:42
*** Guest4706822 has left the channel2009-12-02 19:27:31
<schnaader> That would be nice sleepwalking - "Last night I sleepwalked, logged in to IRC and coded some really nice compressors, now I have to understand what I did there, but the results are really impressive" :D2009-12-02 19:32:53
<Shelwien> it happens to me sometimes, when i have to get up and suddenly do something2009-12-02 19:34:48
 might not remember what i did later, especially if i'd return to sleeping after that ;)2009-12-02 19:35:06
<sami> I think 0.1 is too little. 100kb dictionary becomes only 10kb2009-12-02 19:58:15
<Shelwien> http://encode.dreamhosters.com/showthread.php?t=5092009-12-02 19:58:34
<sami> you state there that it's better than mtf. too early. i suggest increasing the fpaq0 adapt speed a bit for mtf2009-12-02 20:01:32
<Shelwien> i test it not only with fpaq0 there, but also with mix_test o2 coder too2009-12-02 20:02:05
<sami> do you have results for mtf2+mixtest vs gmtf+mixtest?2009-12-02 20:03:02
<Shelwien> wait...2009-12-02 20:03:17
<sami> I've never gotten around to writing a tool for myself to have various simple models at hand for tests like this. many times it would be useful2009-12-02 20:04:16
<Shelwien> mtf:224696, mtf2:221903, gmtf:2211402009-12-02 20:04:23
 well, i always just did it in the form of toolkits somehow2009-12-02 20:04:54
 unfortunately i had written lots of such tools in asm2009-12-02 20:06:02
 and they're not quite usable these days2009-12-02 20:06:14
<schnaader> But it's not like there's no more assembler out there ;) You could try to convert the code to fasm or some Windows assembler like masm32 (these are nice, it's possible to do WinApi calls with them), although with assembler code that's quite hard if it's old and you don't know exactly what it does anymore.2009-12-02 20:07:53
<Shelwien> unfortunately its tasm2009-12-02 20:08:33
 with very heavy use of macros and other specific things2009-12-02 20:08:49
 and then, also these are DOS-32 programs using DPMI2009-12-02 20:09:15
 they still work under XP now, in fact2009-12-02 20:09:27
 like I have a 2k old PPM implementation etc2009-12-02 20:09:41
<schnaader> Ah OK, I know these DOS memory things from old PowerBasic programs :) The days when you couldn't simply say "Give me 5 MB of memory"... better not port those monsters, yeah2009-12-02 20:10:46
<Shelwien> they're quite cool in fact2009-12-02 20:11:27
 i'd like very much to have a preprocessor like in masm/tasm for C++2009-12-02 20:11:48
 (though I use perl for that now)2009-12-02 20:11:59
 it's multipass and worked in a declarative programming style to a point2009-12-02 20:12:48
 like, i use a parity align macro somewhere2009-12-02 20:13:07
 it aligned functions in such a way that parity of low byte of their address was fixed2009-12-02 20:13:41
 was useful because of PF flag in x86 and JP/SETP etc stuff2009-12-02 20:14:04
 well, its kinda like what C++ templates could be, but are not ;)2009-12-02 20:14:42
<schnaader> Hehe, well C/C++ has never been perfect, although there were some attempts to improve it, but I guess it's just kind of too popular now so everybody wants to improve different things.2009-12-02 20:16:02
<Shelwien> here's an example: http://91.124.210.5/lng-ppm.txt2009-12-02 20:16:04
 all the unknown keywords are my macros basically2009-12-02 20:16:58
 like functions called by calls are only linked into the program when they're really called from somewhere ;)2009-12-02 20:17:31
 i'd probably program like that even now2009-12-02 20:18:28
<sami> Shelwien, how about this: reduce the size by min(decoder size, median decoder size)*0.92009-12-02 20:18:45
<Shelwien> it was much more powerful comparing to C/C++, as weird as it may sound2009-12-02 20:18:47
<schnaader> :) "dvd" is funny, guess it means "define variable data"2009-12-02 20:19:21
<Shelwien> yeah ;)2009-12-02 20:19:25
<sami> also we could do that for sfx programs as well, to approximate the decoder size2009-12-02 20:19:27
<Shelwien> not that i can say anything now2009-12-02 20:19:47
 we'd only be able to decide after looking at the resulting order, i think ;)2009-12-02 20:20:13
 and as i see it, XWRT winning at book1 is wrong, but PPMY winning is wrong too2009-12-02 20:21:13
<sami> well, again, book1 is too small for comparing 2 or more compressors, as I explain in the text2009-12-02 20:22:23
 ppmy still wins :-) http://compressionratings.com/sort.cgi?s_book1a.full.html+5+n2009-12-02 20:26:26
 this is x-min(stub/10,100000)2009-12-02 20:27:10
 no2009-12-02 20:27:24
 error2009-12-02 20:27:26
 this is correct I hope http://compressionratings.com/sort.cgi?s_book1c.full.html+5+n2009-12-02 20:29:16
*** toffer has left the channel2009-12-02 20:30:10
*** toffer has joined the channel2009-12-02 20:43:40
*** toffer has left the channel2009-12-02 20:44:43
 probably I need to make additional configs for sfx compressors for this test. I'll try to do that next weekend2009-12-02 21:01:21
*** sami has left the channel2009-12-02 21:01:45
*** schnaader has left the channel2009-12-02 21:07:00
*** chornobl has left the channel2009-12-02 21:09:52
*** Shelwien has left the channel2009-12-02 21:14:49
*** Guest9968193 has joined the channel2009-12-02 21:14:53
*** STalKer-Y has left the channel2009-12-02 21:22:29
*** STalKer-X has joined the channel2009-12-02 21:23:49
*** STalKer-X has left the channel2009-12-02 21:46:11
*** STalKer-X has joined the channel2009-12-02 21:56:39
*** STalKer-X has left the channel2009-12-02 21:56:43
*** toffer has joined the channel2009-12-02 22:16:22
*** toffer has left the channel2009-12-02 23:47:32
*** Krugz has joined the channel2009-12-03 00:58:43
*** bobzilla has joined the channel2009-12-03 05:48:07
*** pinc has joined the channel2009-12-03 06:27:06
*** pinc has left the channel2009-12-03 06:29:17
*** pinc has joined the channel2009-12-03 06:40:22
*** pinc has left the channel2009-12-03 06:41:34
*** bobzilla has left the channel2009-12-03 06:55:42
*** pinc has joined the channel2009-12-03 07:48:45
*** STalKer-X has joined the channel2009-12-03 10:23:26
<Shelwien> ...2009-12-03 11:06:53
<STalKer-X> o_o2009-12-03 11:19:41
* Shelwien goes to bring in another bot2009-12-03 11:21:38
*** compbooks has joined the channel2009-12-03 11:27:44
<Shelwien> !list2009-12-03 11:28:48
<Krugz> compbooks?2009-12-03 11:51:43
<Shelwien> only DCC articles for now2009-12-03 11:52:05
<Krugz> what do you plan to do with it? load it up with computer-related books?2009-12-03 11:52:31
<Shelwien> more like compression-related ;)2009-12-03 11:52:42
<Krugz> ahh ok2009-12-03 11:52:46
 sounds good2009-12-03 11:52:49
 I've been way too busy lately, but in a little while I'll have time to sit down and learn enough to be helpful, or at least interesting, around here2009-12-03 11:53:27
* Krugz hasn't slept yet, has class in an hour2009-12-03 11:53:43
<Shelwien> i don't think you really have to learn anything2009-12-03 11:54:27
<Krugz> ?2009-12-03 11:54:34
<Shelwien> i'm willing to talk about quite a lot of different things ;)2009-12-03 11:54:42
<Krugz> ya but I'm interested in data compression, not extremely or anything but enough that I'd be willing to sit down and learn more2009-12-03 11:55:10
 just don't have the time recently, plus not exactly sure where to get started2009-12-03 11:55:22
<Shelwien> statistics probably2009-12-03 11:55:45
<Krugz> really? hmm2009-12-03 11:55:55
<Shelwien> not whole course maybe2009-12-03 11:56:22
<Krugz> alright well I'll look into it when I get some time2009-12-03 11:57:18
 I have a bit more work to finish off, and then I have to study for finals2009-12-03 11:57:35
<Shelwien> but things like this: http://en.wikipedia.org/wiki/Maximum_likelihood#Examples2009-12-03 11:57:42
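To make the link to compression concrete (my illustration, not from the linked page): the maximum-likelihood estimate of a bit probability is just count/total, and an ideal coder driven by that estimate pays -log2(p) bits per occurrence.

```cpp
// Tiny worked example: after seeing k ones in n bits, the ML estimate is
// p = k/n, and an ideal arithmetic coder spends -log2(p) bits per '1' and
// -log2(1-p) bits per '0'.
#include <cmath>
#include <cstdio>

int main() {
  int n = 1000, k = 250;                 // 250 ones out of 1000 bits
  double p = double(k) / n;              // ML estimate: 0.25
  double bits = -(k * std::log2(p) + (n - k) * std::log2(1 - p));
  std::printf("p = %.2f, ideal size = %.1f bits (%.3f bits/bit)\n",
              p, bits, bits / n);        // ~811.3 bits, ~0.811 bits per bit
}
```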
<Krugz> but after that, I'm clear to learn whatever I feel like for a long while2009-12-03 11:57:46
<Shelwien> well, i'm not going anywhere as far as i can see2009-12-03 11:58:23
<Krugz> ah don't worry about it, I'm not going to drag you around to help me learn stuff :P2009-12-03 11:59:01
 if you suggest where to start and stuff, that should be good :O2009-12-03 11:59:11
<Shelwien> well, there are kinda no publications on real compression algorithms2009-12-03 11:59:53
 so i'd have to help one way or another2009-12-03 12:00:10
<Krugz> ya but I'm far from doing anything with an actual application2009-12-03 12:00:14
<Shelwien> "actual application" is something somewhat unrelated too, in fact ;)2009-12-03 12:00:58
<Krugz> I looked up BWT just a while ago, I understand the basic idea but there's definitely stuff I need to know before I really look into it2009-12-03 12:01:07
 ah not "actual application", I meant like, I'm far from being able to understand how a compression algorithm would work2009-12-03 12:01:44
<Shelwien> most people only use zip despite availability of compressors with much better performance2009-12-03 12:01:58
<Krugz> I use rar mostly2009-12-03 12:02:08
<Shelwien> same rar2009-12-03 12:02:12
 but rar is no better than zip really2009-12-03 12:02:19
<Krugz> I don't really need anything compressed much, I'm sloppy with my data2009-12-03 12:02:26
<Shelwien> same here2009-12-03 12:02:41
<Krugz> I just use it for packaging things to be sent around in one piece2009-12-03 12:02:46
<Shelwien> but as i said, there're lots of applications for statistical models2009-12-03 12:02:54
 and the best way to evaluate such models is by compression2009-12-03 12:03:09
<Krugz> hmm2009-12-03 12:03:31
 so using compression as a tool to test models?2009-12-03 12:03:41
<Shelwien> for example, i'm thinking about making a talking bot here2009-12-03 12:03:47
 which would generate text using a statistical model built from the channel log data2009-12-03 12:04:05
 for me, yes2009-12-03 12:04:22
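The "talking bot" idea needs surprisingly little code; the sketch below builds a character-level order-2 model from a log file and samples from it, purely as an illustration of "statistical model of the channel log".

```cpp
// Sketch: character-level order-2 model of a log file, sampled to produce
// vaguely log-like text.  Purely illustrative, nothing like a finished bot.
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <map>
#include <string>
#include <vector>

int main(int argc, char** argv) {
  if (argc < 2) { std::fprintf(stderr, "usage: %s logfile\n", argv[0]); return 1; }
  std::string text;
  if (FILE* f = std::fopen(argv[1], "rb")) {
    for (int c; (c = std::fgetc(f)) != EOF; ) text += char(c);
    std::fclose(f);
  }
  if (text.size() < 3) return 1;

  // context (two chars) -> observed following chars, with repeats, so picking
  // uniformly from the list reproduces the observed frequencies
  std::map<std::string, std::vector<char>> model;
  for (size_t i = 2; i < text.size(); ++i)
    model[text.substr(i - 2, 2)].push_back(text[i]);

  std::srand(unsigned(std::time(nullptr)));
  std::string ctx = text.substr(0, 2), generated = ctx;
  for (int i = 0; i < 300; ++i) {
    auto it = model.find(ctx);
    if (it == model.end() || it->second.empty()) break;
    generated += it->second[size_t(std::rand()) % it->second.size()];
    ctx = generated.substr(generated.size() - 2);
  }
  std::printf("%s\n", generated.c_str());
}
```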
<Krugz> ah ok2009-12-03 12:04:45
 I see what you're saying, I think2009-12-03 12:04:52
 jeez.. I'm getting really tired all at once2009-12-03 12:05:13
<Shelwien> ;)2009-12-03 12:05:19
<Krugz> not because of you, lol2009-12-03 12:05:23
 just lack of sleep2009-12-03 12:05:27
 hitting me just now2009-12-03 12:05:33
<Shelwien> i just got up not long ago ;)2009-12-03 12:06:02
<Krugz> I woke up about 24 hours ago, now2009-12-03 12:06:27
 I found an interesting e-book2009-12-03 12:07:00
 it's a puzzle book, I find those interesting from time to time2009-12-03 12:07:13
 this one was pretty well written, the puzzles are definitely entertaining2009-12-03 12:07:32
 ok I have to go shower, and maybe get something to eat2009-12-03 12:08:53
 I don't think I'll be back today/tonight 2009-12-03 12:09:24
 bye bye2009-12-03 12:09:28
*** Krugz has left the channel2009-12-03 12:09:35
*** Shelwien has left the channel2009-12-03 12:09:50
*** Shelwien has joined the channel2009-12-03 12:09:55
<Shelwien> !next2009-12-03 12:10:06