*** jack has left the channel | 2009-12-01 16:36:38 |
*** Guest4704955 has left the channel | 2009-12-01 16:44:47 |
*** Guest4704955 has joined the channel | 2009-12-01 16:59:25 |
*** pinc has left the channel | 2009-12-01 17:13:11 |
*** Krugz has joined the channel | 2009-12-01 17:45:58 |
*** toffer has joined the channel | 2009-12-01 18:16:18 |
*** Guest4704955 has left the channel | 2009-12-01 18:54:08 |
*** Guest4704955 has joined the channel | 2009-12-01 19:05:29 |
*** pinc has joined the channel | 2009-12-01 19:06:56 |
*** schnaader has joined the channel | 2009-12-01 20:23:15 |
<Shelwien> | people are gathering for some reason, but nobody talks %) | 2009-12-01 20:33:09 |
<schnaader_afk> | Will be talking in a few minutes :P | 2009-12-01 20:33:54 |
<schnaader> | Tada :) | 2009-12-01 20:40:43 |
| Well, it's the same for most IRC channels - even if 200 people are in, you may wait for half an hour without anyone talking and everything is filled with join/quit messages. | 2009-12-01 20:43:13 |
| And I think if you'd compare view/post numbers in the forum, you'd come to similar results :) | 2009-12-01 20:44:03 |
<Shelwien> | sure | 2009-12-01 20:44:32 |
| anyway, this channel's log is much more readable than any other which I know of ;) | 2009-12-01 20:45:05 |
<schnaader> | Yes, I think that's the fact that people know each other quite a bit already and it's not a bunch of random people. | 2009-12-01 20:45:39 |
<Shelwien> | ...though now I'm going afk - food calls ;) | 2009-12-01 20:46:43 |
<schnaader> | :) OK, see ya | 2009-12-01 20:47:18 |
| Have a nice meal :) | 2009-12-01 20:47:52 |
*** Guest4704955 has left the channel | 2009-12-01 20:50:46 |
*** pinc has left the channel | 2009-12-01 20:51:07 |
*** Guest4704955 has joined the channel | 2009-12-01 21:05:26 |
*** Shelwien has left the channel | 2009-12-01 21:15:41 |
*** Guest9968193 has joined the channel | 2009-12-01 21:15:45 |
<Shelwien> | btw, schnaader | 2009-12-01 21:18:35 |
| what happens when precomp encounters a broken deflate stream? | 2009-12-01 21:18:49 |
| like a file with remapped cluster in that VM image? | 2009-12-01 21:19:11 |
<schnaader> | There can be different behaviours. | 2009-12-01 21:24:44 |
<Shelwien> | i mean, can it extract just a single block for deflate stream? | 2009-12-01 21:25:17 |
<schnaader> | Worst behaviour would be a deflate (or other) stream that stops somewhere and is followed by a big bunch of same bytes which could lead to a very big output stream, recompression would detect failure in that case. | 2009-12-01 21:25:57 |
<Shelwien> | %) | 2009-12-01 21:26:25 |
<schnaader> | Did this experiment once with a torrent that hadn't finished downloading, not recommended ;) | 2009-12-01 21:27:01 |
| After the decompressed stream growing to several GB, Precomp stopped with "disk full" | 2009-12-01 21:27:28 |
<Shelwien> | well, its not a problem with my approach - soundslimmer can losslessly process anything, even not an mp3 file | 2009-12-01 21:28:09 |
<schnaader> | But streams haven't got to be complete, indeed. In most cases, compression will just stop because the stream is invalid at some point in this case and recompression will see how long the match is. | 2009-12-01 21:28:37 |
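A minimal sketch of the "stream stops at some point" case described above, using Python's zlib module rather than Precomp's own code. The payload and the cut-off point are made up for illustration; the idea is only that a truncated deflate stream still yields a valid prefix of the data, and that the decoder can tell it never reached the end.

```python
import zlib

def decode_prefix(stream: bytes):
    """Decode as much of a (possibly truncated) zlib stream as possible.

    Returns (decoded_bytes, reached_end). A truncated stream gives a
    prefix of the original data and reached_end == False.
    """
    d = zlib.decompressobj()
    decoded = d.decompress(stream)
    return decoded, d.eof

# Hypothetical test data: a compressible payload, cut off mid-stream.
payload = b"some compressible text " * 200
packed = zlib.compress(payload)
partial, complete = decode_prefix(packed[: len(packed) // 2])
```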
<Shelwien> | ah. its better than i thought, then ;) | 2009-12-01 21:29:15 |
<schnaader> | There are almost always some rare "attack" cases you can construct, but it's like with hash collisions - they're not that likely to happen :) | 2009-12-01 21:29:28 |
| I even have these "penalty bytes" when some bytes of the compressed stream are different, but afterwards it's the same again. | 2009-12-01 21:30:02 |
<Shelwien> | like patches? | 2009-12-01 21:30:19 |
<schnaader> | Yes, but not that good, only works if they synchronize again, so it will not work with "00 01 02 04 05", "00 01 02 03 04 05", but with "00 01 02 FF 04 05" | 2009-12-01 21:31:11 |
| Could be improved, but as I plan a complete rewrite that doesn't need brute force, it's not necessary :) | 2009-12-01 21:31:39 |
| These patches work on bytes, the rewrite will be able to directly correct matches and always re-synchronize successfully that way. | 2009-12-01 21:33:49 |
<Shelwien> | like Levenshtein distance on bits? ;) | 2009-12-01 21:35:18 |
<schnaader> | Yes, kind of, only that insertion is missing at the moment. | 2009-12-01 21:36:04 |
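The "penalty bytes" idea above can be sketched roughly as follows. This is an illustration under assumptions, not Precomp's actual format: the function names, the (offset, byte) patch layout, and the `max_penalties` limit are all hypothetical. It handles substitutions only, which matches the limitation mentioned in the chat (a missing or extra byte shifts everything and cannot be patched this way).

```python
def penalty_bytes(original: bytes, recompressed: bytes, max_penalties: int = 16):
    """Return a list of (offset, original_byte) patches, or None if the two
    streams cannot be reconciled by same-length byte substitutions."""
    if len(original) != len(recompressed):
        return None  # an insertion or deletion; the streams never re-synchronize
    patches = []
    for offset, (a, b) in enumerate(zip(original, recompressed)):
        if a != b:
            patches.append((offset, a))
            if len(patches) > max_penalties:
                return None  # too many differences to be worth storing
    return patches

def apply_penalty_bytes(recompressed: bytes, patches) -> bytes:
    """Restore the original stream from the recompressed one plus patches."""
    out = bytearray(recompressed)
    for offset, value in patches:
        out[offset] = value
    return bytes(out)
```

With the byte strings from the chat, "00 01 02 FF 04 05" against "00 01 02 03 04 05" gives a single patch at offset 3, while "00 01 02 04 05" is rejected.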
| The rewrite will basically be an own deflate implementation instead of using zLib, so I can check the recompressed result parallel to decompression and put the deflate differences in a structure that can be appended to the decompressed stream. | 2009-12-01 21:37:15 |
<Shelwien> | yeah | 2009-12-01 21:37:31 |
| i've got a puff.c clean for that too, but didn't start it still ;) | 2009-12-01 21:37:51 |
| *cleaned | 2009-12-01 21:37:59 |
<schnaader> | Like "01230123", "that dumb encoder didn't get the match, encode literals instead" :) | 2009-12-01 21:37:59 |
| I've got the decompression and most of the recompression now, but I have to add some ringbuffers to avoid using temporary files again :) | 2009-12-01 21:38:39 |
<Shelwien> | %) | 2009-12-01 21:38:51 |
| btw, why don't you add some other preprocessing too? | 2009-12-01 21:39:07 |
| like that record/delta filter in ccm? | 2009-12-01 21:39:21 |
<schnaader> | I thought about this, especially the 7-Zip + srep results brought that to my mind again. | 2009-12-01 21:39:44 |
<Shelwien> | well, rep is separate stuff, it takes a lot of memory | 2009-12-01 21:40:18 |
<schnaader> | It's also getting important with upcoming bZip2 compression-on-the-fly where we might want to reorder the data because we have 900 KB blocks. | 2009-12-01 21:40:19 |
<Shelwien> | btw, did you see my explanation of what ccm does? | 2009-12-01 21:40:59 |
<schnaader> | In the forum? Was a while ago, wasn't it? | 2009-12-01 21:41:47 |
<Shelwien> | i don't quite remember myself ;) | 2009-12-01 21:42:04 |
| anyway, its fairly simple, but has a very nice effect | 2009-12-01 21:42:23 |
| ccm processes data in 64k blocks | 2009-12-01 21:42:37 |
| and reorders them by bytes if it finds any records | 2009-12-01 21:43:04 |
*** pinc has joined the channel | 2009-12-01 21:43:10 |
| so, 16bit stereo wav | 2009-12-01 21:43:16 |
| 64k block turns into 4 x 16k byte blocks | 2009-12-01 21:43:37 |
| and there's delta too | 2009-12-01 21:43:45 |
*** pinc has left the channel | 2009-12-01 21:44:13 |
<schnaader> | Ah, I see, like abcdabcdabcd => aaabbbcccddd | 2009-12-01 21:44:32 |
<Shelwien> | yeah, but also with subtractions if necessary | 2009-12-01 21:45:19 |
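A minimal sketch of the reordering and delta step described above, assuming a known record size. This is not ccm's code: the function names are made up, the record size is passed in rather than detected, and the block length is assumed to be a multiple of the record size.

```python
def record_split(block: bytes, record_size: int) -> bytes:
    """Reorder abcdabcd... into aa..bb..cc..dd.. (one run per byte column)."""
    return b"".join(block[i::record_size] for i in range(record_size))

def record_join(data: bytes, record_size: int) -> bytes:
    """Inverse of record_split; len(data) must be a multiple of record_size."""
    n = len(data) // record_size
    out = bytearray(len(data))
    for i in range(record_size):
        out[i::record_size] = data[i * n:(i + 1) * n]
    return bytes(out)

def delta_encode(data: bytes) -> bytes:
    """Replace each byte by its difference to the previous byte (mod 256)."""
    prev = 0
    out = bytearray()
    for b in data:
        out.append((b - prev) & 0xFF)
        prev = b
    return bytes(out)

def delta_decode(data: bytes) -> bytes:
    """Inverse of delta_encode."""
    prev = 0
    out = bytearray()
    for d in data:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return bytes(out)
```

For a 16-bit stereo wav the record size would be 4, so a 64k block becomes four 16k columns, each of which can then be delta-coded separately.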
<schnaader> | How does it detect this? Does it know for some stream types only or does it use a general attempt by checking some stats about the bytes? | 2009-12-01 21:45:45 |
<Shelwien> | general afaik | 2009-12-01 21:46:54 |
| its a record filter | 2009-12-01 21:46:58 |
| it not only supports wavs | 2009-12-01 21:47:12 |
| but also images and tables with fixed records | 2009-12-01 21:47:21 |
<schnaader> | Especially helpful if not 2 or 4 bytes record size | 2009-12-01 21:47:44 |
<Shelwien> | yeah | 2009-12-01 21:47:52 |
*** STalKer-X has joined the channel | 2009-12-01 21:47:56 |
| ? | 2009-12-01 21:48:02 |
<STalKer-X> | *pow* | 2009-12-01 21:48:09 |
<schnaader> | Could even be generalised to bits, but this would be harder to detect | 2009-12-01 21:48:12 |
<Shelwien> | not much sense too, imho | 2009-12-01 21:48:35 |
<schnaader> | Although if bit record size isn't prime, it's almost the same. | 2009-12-01 21:48:36 |
*** Guest4704955 has left the channel | 2009-12-01 21:48:52 |
<Shelwien> | there's another problem though, with database records | 2009-12-01 21:48:53 |
| like a record can contain a string and a few numbers | 2009-12-01 21:49:24 |
| and encoding the string part by columns might not be a good idea | 2009-12-01 21:49:53 |
| as it could otherwise match something else | 2009-12-01 21:50:08 |
<schnaader> | Well, in the bZip2 case, I can always apply different preprocessing and choose the best result for a given block. | 2009-12-01 21:51:18 |
| Of course, most of the worst cases can be detected before, anyway and not be preprocessed. | 2009-12-01 21:52:33 |
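The "try different preprocessing and choose the best result for a given block" approach could look roughly like this. It is a sketch under assumptions: the filter set and the filter ids are invented for illustration, and Python's bz2 module stands in for whatever bZip2 code Precomp would use.

```python
import bz2

def interleave_split(block: bytes, n: int) -> bytes:
    """Group every n-th byte together (abab... -> aa..bb.. for n = 2)."""
    return b"".join(block[i::n] for i in range(n))

# Hypothetical candidate filters, keyed by an id that would be stored
# next to the block so the decoder knows which inverse to apply.
FILTERS = {
    0: lambda b: b,                       # no preprocessing
    1: lambda b: interleave_split(b, 2),  # 2-byte records
    2: lambda b: interleave_split(b, 4),  # 4-byte records
}

def best_filter(block: bytes):
    """Return (filter_id, compressed) for the smallest bz2 output."""
    results = ((fid, bz2.compress(f(block))) for fid, f in FILTERS.items())
    return min(results, key=lambda r: len(r[1]))
```

Because the unfiltered block is always one of the candidates, the chosen result is never larger than plain bz2 on that block; the cost is compressing each block once per filter.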
<Shelwien> | btw, intel's bzip is weird | 2009-12-01 21:54:16 |
| produces files of different size at the same modes ;) | 2009-12-01 21:54:38 |
<schnaader> | One of the first filter ideas was for PDF data, do you know these "(word )<ASCII float numbers and PDF commands>(other )(words and perhaps some)(l)<...>(etters)" crap they're doing in there? Splitting up to text, commands and encoding the floats binary would reeeeally help there :) | 2009-12-01 21:55:17 |
| Don't combine RNGs and compression ;) | 2009-12-01 21:56:27 |
| At least bZip2 isn't as bad as deflate - you know which mode was used and output will be the same most of the time, although still not 100% reliable. | 2009-12-01 21:57:45 |
<Shelwien> | i guess, there's just much less implementations ;) | 2009-12-01 21:58:11 |
| and pdf also has random stuff beside deflate... like that ascii85 etc | 2009-12-01 21:59:25 |
<schnaader> | Yes, that's the main factor. Not that easy to implement BWT things as with huffman codes and literal/match decisions. | 2009-12-01 21:59:26 |
| ascii85 is on my todo list, should have done this already :( Welcome laziness ;) | 2009-12-01 21:59:51 |
| By the way, do you know anything about encrypted PDFs? I'm pretty sure decrypting could be done, but I'm not sure if encrypting it back with same results would be possible. Not to mention that Adobe pretty sure wouldn't like such things... | 2009-12-01 22:01:45 |
<Shelwien> | well, i can recommend a decrypting utility if you want ;) | 2009-12-01 22:03:10 |
<schnaader> | I have decryption sources as well, thanks, but nobody cares about re-encryption ;) | 2009-12-01 22:03:33 |
<Shelwien> | well, i think it should be possible to reconstruct | 2009-12-01 22:04:52 |
| if you decrypt it yourself | 2009-12-01 22:05:09 |
<schnaader> | Depends, as some (or most) of the algorithms seem to be asymmetrical, so you might have a public key, but perhaps would need the private key to re-encrypt, don't know... | 2009-12-01 22:05:55 |
<Shelwien> | and as to adobe... maybe messing up something to avoid getting a usable decrypted pdf would be a good idea ;) | 2009-12-01 22:05:59 |
<schnaader> | Output would be pretty messed up by the PCF format already, but I also thought about this, yes :) | 2009-12-01 22:07:06 |
<Shelwien> | and keys should be available anyway, as software which does the encryption is available ;) | 2009-12-01 22:07:36 |
<schnaader> | Right :) | 2009-12-01 22:07:53 |
<Shelwien> | btw, what about text filters? | 2009-12-01 22:08:14 |
| including LIPT etc? | 2009-12-01 22:08:21 |
| like WRT? | 2009-12-01 22:08:33 |
<schnaader> | LIPT? Google found "Leymann Inventory of Psychological Terror", lol | 2009-12-01 22:08:51 |
| Yes, WRT and especially HTML/XML filters also came to my mind, same thing as with most Precomp ideas - would take too long for now, other things have higher priority :) | 2009-12-01 22:10:04 |
| Although I have several code branches with such experiments at least using scripts and made-up examples. | 2009-12-01 22:10:26 |
<Shelwien> | Length Index Preserving Transform | 2009-12-01 22:11:30 |
| your version was better though, at least makes some sense ;) | 2009-12-01 22:12:07 |
<schnaader> | There's always some sort of psychological terror involved when it comes to compression :) | 2009-12-01 22:12:57 |
<Shelwien> | still, there're simpler text filters too, which still help | 2009-12-01 22:15:11 |
| like "capital conversion" | 2009-12-01 22:15:17 |
| and punctuation padding | 2009-12-01 22:15:54 |
<toffer> | such stuff is more efficient when incorporated into the context generation :D | 2009-12-01 22:17:52 |
<schnaader> | I think I should go for a cleaned up object oriented version of Precomp in beta phase (which will start soon, supporting multiple files and directories is the only big todo left for that), so generalising pre-/postprocessing would be easy and external DLLs could be used for quick tests. | 2009-12-01 22:18:14 |
<toffer> | the compressor manually applies these transforms for already processed data to improve context clustering | 2009-12-01 22:18:18 |
<Shelwien> | toffer: not quite, it also affects symbol decomposition | 2009-12-01 22:18:43 |
<toffer> | not the decomposition itself, but the processed symbols | 2009-12-01 22:19:24 |
<Shelwien> | ah | 2009-12-01 22:19:36 |
| btw, considering dlls | 2009-12-01 22:19:46 |
| did you see my precomp merged into a single exe? with packjpg? | 2009-12-01 22:19:58 |
<toffer> | something i always wondered... how large is your source? | 2009-12-01 22:20:07 |
| @shelwien did you ever try to optimize such transforms? | 2009-12-01 22:21:37 |
<Shelwien> | there's not much to optimize kinda | 2009-12-01 22:22:13 |
<toffer> | a set of flags | 2009-12-01 22:22:28 |
<Shelwien> | you either use it, or not | 2009-12-01 22:22:29 |
<schnaader> | @Shelwien: Was this one of the posts in "How small could we get a Precomp SFX"? Something like this would be useful, although I thought about disabling PackJPG by default in the next version because the 2.4WIP version is too unstable. | 2009-12-01 22:22:31 |
<toffer> | what to apply when | 2009-12-01 22:22:33 |
| for every model, of course - assuming cm | 2009-12-01 22:23:40 |
<Shelwien> | schnaader: i made a tool called dllmerge, which resolves exe imports/exports with a statically bound dll and merges them | 2009-12-01 22:23:45 |
<toffer> | well it worked with pthread+m1 | 2009-12-01 22:24:05 |
<Shelwien> | worked with precomp too | 2009-12-01 22:24:12 |
<schnaader> | @toffer: At the moment it's about 9000 LOC, 300 KB source size (excluding external GIF routines and zLib). It could be smaller, though, as it's pretty messed up, for example there are try_decompression_(pdf/zip/...) routines that could be merged into one with some branches. | 2009-12-01 22:25:08 |
<toffer> | @eugene: i see you did "hand"-tuning to your mtf ? that ranking function only used an enum as a constant | 2009-12-01 22:25:08 |
| ouch | 2009-12-01 22:25:21 |
<Shelwien> | yeah | 2009-12-01 22:25:24 |
<toffer> | 300kb | 2009-12-01 22:25:26 |
| i mean i got a few 1000 loc, but it's just | 2009-12-01 22:25:44 |
| 80kb | 2009-12-01 22:25:47 |
| you should really consider c++ | 2009-12-01 22:26:19 |
| afaik it was c? | 2009-12-01 22:26:24 |
| i mean templates are pretty useful for code generation | 2009-12-01 22:26:40 |
<schnaader> | This is C++, but you're right, not using OO as I should :) | 2009-12-01 22:26:42 |
<toffer> | ^^ | 2009-12-01 22:26:53 |
<schnaader> | You also see that routine merge laziness in the EXE - 400 KB -> 130 KB with UPX. | 2009-12-01 22:26:59 |
<Shelwien> | ;) | 2009-12-01 22:27:20 |
<toffer> | usually such code attracts errors quite a bit | 2009-12-01 22:27:30 |
<schnaader> | So I guess LOC could get down to about 3000 LOC easily, but it just wouldn't change much, so I didn't bother yet. | 2009-12-01 22:27:31 |
| @toffer: Yes, this is indeed the best argument for a rewrite. | 2009-12-01 22:28:01 |
| It also isn't helpful with most of the new features like compression-on-the-fly where you have to replace all the fread/fwrite's you didn't generalize although you knew you should have done it :) | 2009-12-01 22:29:06 |
<Shelwien> | http://en.wikipedia.org/wiki/Coroutine | 2009-12-01 22:29:51 |
<toffer> | on the other hand i have serious trouble from time to time with c++ stl with vector of vector of vector and some other rather basic stuff. checked the assembly and the code was wrong causing random memory poking, etc. | 2009-12-01 22:30:26 |
<schnaader> | And recursion would have been a lot easier without all that BAAAD global variables I have to push/pop now :( | 2009-12-01 22:31:03 |
<toffer> | i mean an excessive usage of such c++ features reveals bugs quite often. | 2009-12-01 22:31:04 |
| that's really ugly | 2009-12-01 22:31:17 |
| i got no global vars in my code at all | 2009-12-01 22:31:32 |
| ^^ | 2009-12-01 22:31:33 |
<Shelwien> | you have them in fact | 2009-12-01 22:31:49 |
| like _errno | 2009-12-01 22:31:52 |
<schnaader> | Linux version will be an interesting thing because I'll do some valgrind experiments, could reveal some memory leaks/errors that are there quite sure. | 2009-12-01 22:31:55 |
<toffer> | well that's the c library | 2009-12-01 22:32:03 |
| but not the stuff i've written | 2009-12-01 22:32:09 |
<Shelwien> | ;) | 2009-12-01 22:32:15 |
<toffer> | @eugene: i made some experiments for possible speedups. there' | 2009-12-01 22:32:44 |
<Shelwien> | ? | 2009-12-01 22:32:58 |
<toffer> | there's some potential in replacing hashing with direct lookups | 2009-12-01 22:33:00 |
| in m1 | 2009-12-01 22:33:01 |
| but that requires to detect, e.g. order1 and 2 context mask | 2009-12-01 22:33:16 |
<Shelwien> | ah. like what i did in mix_test? | 2009-12-01 22:33:21 |
<toffer> | and special code to handle. | 2009-12-01 22:33:26 |
| short contexts only | 2009-12-01 22:33:35 |
| o1,2 | 2009-12-01 22:33:39 |
| you always used lookup tables afaik | 2009-12-01 22:33:46 |
<Shelwien> | they don't need any hashing obviously ;) | 2009-12-01 22:33:54 |
<toffer> | but it's 5-8% faster | 2009-12-01 22:34:21 |
| even with dumb code | 2009-12-01 22:34:26 |
<Shelwien> | should be ;) | 2009-12-01 22:34:35 |
| that might be useful for you then - http://encode.dreamhosters.com/showthread.php?t=396 | 2009-12-01 22:35:02 |
| you can check whether its a constant or variable | 2009-12-01 22:35:19 |
| and select direct lookups if its constant and mask fits into 64k | 2009-12-01 22:35:43 |
<toffer> | well more or less | 2009-12-01 22:36:22 |
| but that'd require to make loadable parameters constant | 2009-12-01 22:36:41 |
<Shelwien> | you can generate multiple versions in compile-time | 2009-12-01 22:37:15 |
| btw, the trick which i did in ccm_sh should be usable with gcc too i think | 2009-12-01 22:37:33 |
<toffer> | that would bloat the exe size multiple times | 2009-12-01 22:37:33 |
<Shelwien> | yeah, so what? | 2009-12-01 22:37:42 |
| upx etc... | 2009-12-01 22:37:48 |
<toffer> | that just sounds ill to me | 2009-12-01 22:38:12 |
| if i can simply have a single more if | 2009-12-01 22:38:21 |
<Shelwien> | why not if its faster | 2009-12-01 22:38:21 |
| runtime if is bad | 2009-12-01 22:38:33 |
<toffer> | one more if per model per byte | 2009-12-01 22:38:35 |
<Shelwien> | worse than division | 2009-12-01 22:38:37 |
<toffer> | if i'd do that per bit yes | 2009-12-01 22:38:53 |
| but that way it's acceptable | 2009-12-01 22:38:59 |
<Shelwien> | whatever, if the code would be still there | 2009-12-01 22:39:07 |
| it also fragments code cache etc | 2009-12-01 22:39:25 |
| anyway, i was talking about the idea | 2009-12-01 22:39:40 |
| with compiling the same source multiple times with different macro parameters | 2009-12-01 22:39:59 |
| and linking it all together after all | 2009-12-01 22:40:15 |
| as i found, it gave me a considerable speedup | 2009-12-01 22:40:35 |
| because i was able to use separate PGO for decoder and encoder | 2009-12-01 22:40:55 |
| and different compiter options | 2009-12-01 22:41:05 |
| *compiler | 2009-12-01 22:41:09 |
| and their code ranges didn't overlap | 2009-12-01 22:41:21 |
<toffer> | as you know all parameters must be run-time loadable | 2009-12-01 22:42:02 |
| i simply cannot use such an approach | 2009-12-01 22:42:07 |
<Shelwien> | well, you can | 2009-12-01 22:42:35 |
| like, check the masks and select a codec version based on that | 2009-12-01 22:42:56 |
| with direct or hashed lookups | 2009-12-01 22:43:02 |
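The "check the masks and select a codec version" step might be sketched like this. The names and the 16-bit threshold are assumptions chosen to match the "mask fits into 64k" remark earlier in the log; the two strings stand in for whatever specialized codec instances would actually be dispatched to.

```python
def popcount(mask: int) -> int:
    """Number of context bits selected by the mask."""
    return bin(mask).count("1")

def select_lookup(mask: int, max_direct_bits: int = 16) -> str:
    """Pick the table addressing scheme for one model.

    A mask selecting at most max_direct_bits bits needs at most
    2**max_direct_bits slots (64k for 16 bits), so it can be indexed
    directly; anything wider falls back to hashing.
    """
    return "direct" if popcount(mask) <= max_direct_bits else "hashed"

def select_codec(masks):
    """Choose one addressing scheme per model from run-time loaded masks."""
    return [select_lookup(m) for m in masks]
```

The selection runs once when a profile is loaded, so the per-byte or per-bit coding loop itself does not need the extra branch.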
<toffer> | yes, but i won't do that for *all* possible combinations | 2009-12-01 22:44:17 |
| since the number grows exponentially | 2009-12-01 22:44:23 |
| it still requires to inject some code | 2009-12-01 22:44:38 |
<Shelwien> | well, runtime code generation is the best | 2009-12-01 22:44:51 |
| but damned C++ doesn't have such a feature | 2009-12-01 22:45:01 |
<toffer> | not inject in that sense | 2009-12-01 22:45:01 |
| but it would be very nice, indeed | 2009-12-01 22:45:12 |
| as most parameters are just machine words | 2009-12-01 22:45:36 |
<Shelwien> | ...afk, sorry | 2009-12-01 22:48:52 |
<schnaader> | Better afk than your chair getting wet :P | 2009-12-01 22:49:46 |
<toffer> | i just tested the code | 2009-12-01 22:50:37 |
| and hard coded that lookups | 2009-12-01 22:50:43 |
| it's 1% faster | 2009-12-01 22:50:47 |
| not worth the effort | 2009-12-01 22:50:52 |
| 5.66s -> 5.61s | 2009-12-01 22:51:09 |
<schnaader> | Is that 1% constant or would it grow with more complex settings? | 2009-12-01 22:51:23 |
<toffer> | constant | 2009-12-01 22:53:45 |
| it would grow, of course | 2009-12-01 22:53:54 |
<schnaader> | OK, just thought about it because you said combinations could grow exponentially. | 2009-12-01 22:54:02 |
<toffer> | but the number of possible different combinations i'd need to compile grows exponentially | 2009-12-01 22:54:13 |
<schnaader> | Yay, 2 dev/null/nethack trophies this year :) http://nethack.kahrens.com/playertrophies.php?id=424&year=2009&place=First&size=Large | 2009-12-01 23:18:09 |
<toffer> | dunno about that | 2009-12-01 23:25:07 |
| gonna watch family guy now | 2009-12-01 23:25:13 |
<schnaader> | Family guy is so funny :) So have fun ;) | 2009-12-01 23:26:01 |
<toffer> | really? | 2009-12-01 23:32:02 |
| well i like it a lot | 2009-12-01 23:32:05 |
<schnaader> | I like the kind of strong, but still somewhat critical humor in it, like in American Dad or Drawn Together (or in the Simpsons, although not that extreme). | 2009-12-01 23:33:15 |
<toffer> | well the simpsons are really great. for kids and for adults. i mean when i was a child i didn't understand all of the stuff in it reflecting something real | 2009-12-01 23:34:57 |
<schnaader> | Yes, though it's a good mix so you still like it as a child :) | 2009-12-01 23:35:26 |
<toffer> | cheers | 2009-12-01 23:38:34 |
| gn8 | 2009-12-02 00:55:59 |
*** toffer has left the channel | 2009-12-02 00:56:06 |
*** schnaader has left the channel | 2009-12-02 00:57:47 |
*** STalKer-Y has joined the channel | 2009-12-02 04:06:57 |
*** STalKer-X has left the channel | 2009-12-02 04:10:02 |
*** Krugz has left the channel | 2009-12-02 07:02:48 |
*** pinc has joined the channel | 2009-12-02 09:15:10 |
*** schnaader has joined the channel | 2009-12-02 14:59:45 |
*** schnaader has left the channel | 2009-12-02 15:15:06 |
*** toffer has joined the channel | 2009-12-02 15:28:12 |
| hi guys | 2009-12-02 15:28:50 |
<Shelwien> | hi toffer, they're all bots | 2009-12-02 15:29:16 |
<toffer> | erm? | 2009-12-02 15:29:27 |
| did you finally write some? | 2009-12-02 15:29:59 |
<Shelwien> | no, they somehow appear even without me ;) | 2009-12-02 15:30:41 |
| though i did write complogger ;) | 2009-12-02 15:31:02 |
<toffer> | well, yes | 2009-12-02 15:31:09 |
| but i thought pinc and asmodean are real | 2009-12-02 15:31:21 |
<Shelwien> | well, sometimes, very rarely ;) | 2009-12-02 15:31:46 |
<pinc> | yepp, sometimes I'm real )) | 2009-12-02 15:32:03 |
<toffer> | you must be kidding - they're just idle | 2009-12-02 15:33:31 |
<Shelwien> | of course, but in a sense, mirc without user is no different from complogger ;) | 2009-12-02 15:34:48 |
<toffer> | so you're kidding ^^ | 2009-12-02 15:35:51 |
<Shelwien> | ... | 2009-12-02 15:36:11 |
| i'm writing a coroutine demo here | 2009-12-02 15:36:29 |
| rewritten that mtf utility using setjmp/longjmp | 2009-12-02 15:37:02 |
| and gcc is annoying me as usual | 2009-12-02 15:37:14 |
| i mean, it works with MSC/Intel, but not gcc | 2009-12-02 15:37:44 |
<toffer> | how do you want to parallelize it? | 2009-12-02 15:48:02 |
<Shelwien> | its not about paralleling | 2009-12-02 15:48:29 |
| its about building a data processing pipeline with readable syntax | 2009-12-02 15:49:13 |
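To illustrate the kind of pipeline meant here, a rough sketch using Python generators, which are a built-in form of coroutine. This is not the gmtf code: that demo uses setjmp/longjmp in C, and the stage names below are made up. Each stage pulls symbols from the previous one and yields results as soon as they are ready, so the stages read like plain loops but run interleaved.

```python
def read_bytes(data: bytes):
    """Source stage: yield the input one byte value at a time."""
    for b in data:
        yield b

def mtf_encode(symbols):
    """Move-to-front: yield the current rank of each symbol."""
    table = list(range(256))
    for s in symbols:
        rank = table.index(s)
        yield rank
        table.insert(0, table.pop(rank))

def mtf_decode(ranks):
    """Inverse move-to-front: turn ranks back into symbols."""
    table = list(range(256))
    for r in ranks:
        s = table.pop(r)
        yield s
        table.insert(0, s)

# Stages are chained by nesting; nothing runs until the result is consumed.
pipeline = mtf_decode(mtf_encode(read_bytes(b"banana")))
restored = bytes(pipeline)
```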
<toffer> | erm but...? | 2009-12-02 15:49:16 |
<Shelwien> | well, i can post the current version, though it doesn't work with gcc yet | 2009-12-02 15:50:38 |
| i'm trying to fix that too, but its tricky | 2009-12-02 15:50:48 |
<toffer> | i'll first have a look at coroutines | 2009-12-02 15:52:13 |
| could you grep it | 2009-12-02 15:52:19 |
| ? | 2009-12-02 15:52:20 |
<Shelwien> | you can too | 2009-12-02 15:52:29 |
<toffer> | !grep or something | 2009-12-02 15:52:33 |
| ah | 2009-12-02 15:52:39 |
| ^^ | 2009-12-02 15:52:41 |
<Shelwien> | ;) | 2009-12-02 15:52:42 |
<toffer> | just guessed the syntax right | 2009-12-02 15:52:51 |
| !grep coroutine | 2009-12-02 15:52:54 |
| mh | 2009-12-02 15:53:09 |
| was it on wikipedia? | 2009-12-02 15:53:12 |
<Shelwien> | its case-sensitive | 2009-12-02 15:53:12 |
| !grep Coro | 2009-12-02 15:53:18 |
<toffer> | ah | 2009-12-02 15:53:23 |
| thanks | 2009-12-02 15:53:25 |
| btw i evaluated the speed gain of different implementations | 2009-12-02 15:58:00 |
| regarding table lookups | 2009-12-02 15:58:06 |
<Shelwien> | ? | 2009-12-02 15:58:15 |
<toffer> | i mean under certain circumstances it's beneficial to use lookup tables instead of hashign | 2009-12-02 15:58:45 |
| hashing | 2009-12-02 15:58:48 |
<Shelwien> | well, hashing is lookup tables with randomized indexing | 2009-12-02 15:59:35 |
| of course direct indexing is faster | 2009-12-02 15:59:45 |
<toffer> | i've written code to detect context masks like 0xff, 0x40ff, 0x405ff and the same for order 2. | 2009-12-02 15:59:46 |
| and gonna use specialized codecs for either one or two directly addressable models | 2009-12-02 16:00:25 |
| in most cases i got order 1 and 2 anyway | 2009-12-02 16:00:39 |
<Shelwien> | well, that's something too, i guess | 2009-12-02 16:01:04 |
| though i hope you don't only support fixed masks, but count mask bits | 2009-12-02 16:01:39 |
<toffer> | i could do that but it'd require to reorder the bits | 2009-12-02 16:02:25 |
| which is slow | 2009-12-02 16:02:38 |
<Shelwien> | yeah, i think that would be still faster than hashing | 2009-12-02 16:02:45 |
<toffer> | the fastest implementation i can think of is to have translation tables | 2009-12-02 16:03:22 |
| e.g. tab[c] is setup for mask m to stuff bits together | 2009-12-02 16:04:07 |
| but that'd still require a loop over 8 bytes | 2009-12-02 16:04:28 |
| which is slow | 2009-12-02 16:04:31 |
<Shelwien> | well, yeah, though i just precompile the code for that | 2009-12-02 16:04:34 |
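The translation-table idea for arbitrary masks ("tab[c] is setup for mask m to stuff bits together") can be sketched as below. The layout is an assumption for illustration: one 256-entry table per context byte, each entry holding that byte's selected bits already shifted to their place in the packed index, so packing costs one lookup and one OR per context byte.

```python
def build_tables(mask: int, nbytes: int = 4):
    """Build one 256-entry table per byte position of the context."""
    tables = []
    shift = 0  # mask bits already consumed by lower byte positions
    for k in range(nbytes):
        m = (mask >> (8 * k)) & 0xFF
        tab = []
        for c in range(256):
            packed, out_bit = 0, 0
            for bit in range(8):
                if (m >> bit) & 1:
                    packed |= ((c >> bit) & 1) << out_bit
                    out_bit += 1
            tab.append(packed << shift)
        tables.append(tab)
        shift += bin(m).count("1")
    return tables

def pack_context(ctx: int, tables) -> int:
    """Squeeze the masked context bits into a dense table index."""
    index = 0
    for k, tab in enumerate(tables):
        index |= tab[(ctx >> (8 * k)) & 0xFF]
    return index
```

For the mask 0x40ff from the chat this yields a 9-bit index, so the model needs only 512 directly addressed slots.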
| damned google finally completely dropped googlepages a few days ago | 2009-12-02 16:05:27 |
| its annoying as hell now | 2009-12-02 16:05:31 |
<toffer> | ^^ | 2009-12-02 16:05:32 |
| google is evil | 2009-12-02 16:05:35 |
<Shelwien> | http://sites.google.com/site/shelwien/gmtf_v0a.rar | 2009-12-02 16:05:38 |
<toffer> | i cannot get specialized code for all of that | 2009-12-02 16:05:44 |
| that's impossible | 2009-12-02 16:05:47 |
| even for a single mask | 2009-12-02 16:05:51 |
<Shelwien> | that depends on what you want to do | 2009-12-02 16:06:07 |
<toffer> | since it'd require to have 256^# of bytes to translate different pieces | 2009-12-02 16:06:09 |
| and you overestimate the speed gain of lookups | 2009-12-02 16:06:32 |
| the optimization gets 8% | 2009-12-02 16:06:43 |
| speed improvement | 2009-12-02 16:06:49 |
<Shelwien> | but i don't think that being able to tune to data _and_ use new profiles right away with all possible speed optimization is that important | 2009-12-02 16:07:16 |
| so you can either build new versions by recompiling the model after retuning | 2009-12-02 16:08:18 |
| like i do, and zpaq now | 2009-12-02 16:08:23 |
| or you can also implement a generalized version | 2009-12-02 16:08:49 |
| which would support any profiles | 2009-12-02 16:08:57 |
| but won't be speed-optimized | 2009-12-02 16:09:02 |
<toffer> | well i still like to get that speed hit without specialisation | 2009-12-02 16:10:37 |
| i could modify my code generator to produce a header with the hard-coded parameters. | 2009-12-02 16:11:08 |
<Shelwien> | yeah | 2009-12-02 16:11:14 |
<toffer> | actually it was like that for previous version <= 0.2 | 2009-12-02 16:11:17 |
<Shelwien> | and well, i don't see the point with loosing the possible gain with specialization | 2009-12-02 16:11:37 |
<toffer> | but my current optimizer approach is more generalized thus support run-time parameter loading and multi threading | 2009-12-02 16:11:39 |
| it's no loss | 2009-12-02 16:11:53 |
| if it cannot use lookup tables it switches to the current implementation: hash tables | 2009-12-02 16:12:17 |
<Shelwien> | well, of course its no loss until you properly optimize the specialized version | 2009-12-02 16:12:30 |
*** chornobl has joined the channel | 2009-12-02 16:12:33 |
| bl? | 2009-12-02 16:13:03 |
<chornobl> | ive shortened it | 2009-12-02 16:15:27 |
| since old nick was banned | 2009-12-02 16:15:38 |
*** Krugz has joined the channel | 2009-12-02 16:15:46 |
| btw, there's a question about your p2p idea | 2009-12-02 16:20:09 |
<Shelwien> | ? | 2009-12-02 16:20:23 |
<chornobl> | how would it handle multiple nested files | 2009-12-02 16:21:11 |
| like iso which contains zip which contains jpg | 2009-12-02 16:21:43 |
<Shelwien> | well, its not quite related to p2p - that's more about matching recompressed data | 2009-12-02 16:21:54 |
<chornobl> | anyway | 2009-12-02 16:22:05 |
<Shelwien> | and afaiu, we can just compute multiple hashtables for a file | 2009-12-02 16:22:29 |
| i mean, there could be a matching compressed version of original file | 2009-12-02 16:22:58 |
| or, otherwise, some unpacked contents can match | 2009-12-02 16:23:16 |
| but either way, we can detect that | 2009-12-02 16:23:35 |
<chornobl> | so it will be hierarchical structure | 2009-12-02 16:24:02 |
<Shelwien> | although reconstructing the file from multiple sources would be very tricky to implement | 2009-12-02 16:24:16 |
| i mean, if i downloaded half of the zip archive compressed | 2009-12-02 16:24:39 |
| and can't find any more seeds | 2009-12-02 16:24:48 |
*** sami has joined the channel | 2009-12-02 16:25:02 |
| and then i find other files supposedly contained there | 2009-12-02 16:25:06 |
<sami> | hi! | 2009-12-02 16:25:14 |
<Shelwien> | but unpacked, or with different compression | 2009-12-02 16:25:40 |
| still, thats better than nothing | 2009-12-02 16:25:59 |
| hi sami ;) | 2009-12-02 16:26:01 |
<toffer> | hi | 2009-12-02 16:26:13 |
<Shelwien> | sami: http://sites.google.com/site/shelwien/gmtf_v0a.rar | 2009-12-02 16:26:18 |
| its my upcoming coroutine demo (still buggy) | 2009-12-02 16:26:32 |
| do you have any suggestions? | 2009-12-02 16:26:41 |
<chornobl> | depth of encapsulation should be limited, or manually controlled, to get sane hash size | 2009-12-02 16:28:39 |
<Shelwien> | sane hash size doesn't really matter for p2p | 2009-12-02 16:29:04 |
| as it won't be transferred anywhere until matches found | 2009-12-02 16:29:33 |
<chornobl> | still there should be some adaptivity, because video file differs from example mentioned above, so bits need to be spread differently between levels (1 vs 3) | 2009-12-02 16:34:03 |
<Shelwien> | i don't understand | 2009-12-02 16:34:41 |
<chornobl> | i mean more bits can be given to video file (not precompressible) | 2009-12-02 16:36:12 |
<Shelwien> | still don't know what are you talking about | 2009-12-02 16:36:40 |
| the idea is that we can find somebody who has a given data fragment | 2009-12-02 16:36:59 |
<chornobl> | than first nested level of same sized iso (precompressible) | 2009-12-02 16:37:00 |
<Shelwien> | by its hash | 2009-12-02 16:37:03 |
| and some data can have multiple representations | 2009-12-02 16:37:37 |
<chornobl> | guess i lost some communication skills recently =) | 2009-12-02 16:38:00 |
<Shelwien> | so we can index all or at least some of these | 2009-12-02 16:38:03 |
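One way to picture indexing multiple representations of the same data: hash fixed-size fragments of the file as stored and, where it can be unpacked, of the unpacked contents too. This is only a sketch of the idea, with assumptions throughout: the block size, the use of SHA-1, the dictionary layout, and zlib as the only container format handled are all chosen for illustration.

```python
import hashlib
import zlib

def block_hashes(data: bytes, block_size: int = 4096):
    """Hash every fixed-size fragment of the data."""
    return {hashlib.sha1(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)}

def index_representations(stored: bytes):
    """Index the stored bytes and, if they form a zlib stream, the
    unpacked bytes as well, so either form can be matched by hash."""
    index = {"stored": block_hashes(stored)}
    try:
        index["unpacked"] = block_hashes(zlib.decompress(stored))
    except zlib.error:
        pass  # not a zlib stream; only the stored form is indexed
    return index
```

A peer holding only the unpacked file would then still match the "unpacked" hashes of someone else's compressed copy, which is the case discussed above where no more seeds exist for the archive itself.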
<toffer> | having three specialized coding routines increases code size just by 20kb | 2009-12-02 16:38:08 |
<Shelwien> | well, just think about it in asm terms | 2009-12-02 16:38:28 |
| its still a lot actually ;) | 2009-12-02 16:38:31 |
<toffer> | it's 20% slower now... guess gcc didn't do inlining properly... | 2009-12-02 16:39:42 |
<Shelwien> | ;) | 2009-12-02 16:40:18 |
<toffer> | yep the bit coding routine isn't inlined | 2009-12-02 16:40:40 |
| well explicit template instantiation does the job | 2009-12-02 16:46:02 |
| let's see how large the exe will be ^^ | 2009-12-02 16:46:08 |
<Shelwien> | i'd remind the idea from ccm_sh | 2009-12-02 16:46:32 |
| you can separately compile multiple codec instances to separate object files | 2009-12-02 16:47:00 |
| and only then link them together | 2009-12-02 16:47:21 |
| its especially helpful if taking into account the PGO | 2009-12-02 16:47:47 |
<sami> | http://compressionratings.com/s_ref.html the "new" test files | 2009-12-02 16:47:58 |
<Shelwien> | did you see new Bulat's benchmark btw? | 2009-12-02 16:48:16 |
<sami> | it appears sorting the n/a gets put into the top | 2009-12-02 16:48:28 |
| no, where is it? | 2009-12-02 16:48:32 |
<Shelwien> | http://encode.dreamhosters.com/showthread.php?t=507 | 2009-12-02 16:48:43 |
<sami> | just noticed that bwtmix1 didn't get tested in these ref files, will fix that | 2009-12-02 16:49:04 |
<Shelwien> | hope it won't die | 2009-12-02 16:49:19 |
| i mean, freeze ;) | 2009-12-02 16:49:30 |
<toffer> | somehow that seems to hurt compiler optimizations | 2009-12-02 16:49:43 |
<chornobl> | it won't grow too much either | 2009-12-02 16:49:48 |
<toffer> | it's 10% slower now | 2009-12-02 16:49:50 |
| >.< | 2009-12-02 16:49:53 |
<Shelwien> | what does? | 2009-12-02 16:50:02 |
<chornobl> | as main purpose (i think) promote fa and new srep | 2009-12-02 16:50:22 |
<Shelwien> | there's no sense to promote new srep (also its slow, especially decoding) | 2009-12-02 16:50:54 |
| because people won't really care until he makes it internal | 2009-12-02 16:51:17 |
<chornobl> | repack maniacs already care | 2009-12-02 16:51:50 |
<Shelwien> | sami: http://encode.dreamhosters.com/showthread.php?p=10064#post10064 | 2009-12-02 16:52:34 |
*** pinc has left the channel | 2009-12-02 17:09:58 |
<sami> | since bulat has public test file(s), that is reasonable, and having all switches run already pretty much guarantees I like any test. seems that this is a reasonable multithreading + long match test | 2009-12-02 17:10:59 |
<Shelwien> | yeah, but i wonder about times | 2009-12-02 17:12:02 |
<toffer> | somehow i get best gcc results when the encoding and decoding routine are separately compiled. but both into the same .cpp | 2009-12-02 17:13:46 |
<sami> | the nz times don't look very positive, but I guess those are possible. io is much more expensive than fa and -cd is slower than -cD, which is only possible with some very huge long match | 2009-12-02 17:14:06 |
| also I had to download the script to find out even how much memory is nz using, I wish that info would be on the tables | 2009-12-02 17:15:04 |
<Shelwien> | ;) | 2009-12-02 17:15:19 |
| toffer: yeah, that's what i suggested too | 2009-12-02 17:15:34 |
<toffer> | not really | 2009-12-02 17:16:00 |
<Shelwien> | ...meanwhile, it seems like i finally fixed that damned thing | 2009-12-02 17:16:08 |
| and it works with gcc now | 2009-12-02 17:16:12 |
<toffer> | i mean separate cpp for encoding and decoding instantiation hurt | 2009-12-02 17:16:22 |
| but both inside the same helps a bit | 2009-12-02 17:16:33 |
<Shelwien> | not sure what do you mean then | 2009-12-02 17:17:03 |
| do you use separate .o files for encoder and decoder, or not? | 2009-12-02 17:17:28 |
<toffer> | codec<ENCODE> in enc.o and codec<DECODE> in dec.o separate hurts code generation after profiling. but both in one file helps | 2009-12-02 17:18:01 |
| the thing which helps is to separate the codec instantiation from the driver code | 2009-12-02 17:18:41 |
<Shelwien> | err... but you have to make different profiles for encoding and decoding, and use them properly | 2009-12-02 17:20:23 |
<toffer> | i know | 2009-12-02 17:21:16 |
| it's still weird | 2009-12-02 17:21:24 |
| i got a command line switch to do both, encoding and decoding for profile generation | 2009-12-02 17:21:44 |
<Shelwien> | yeah, but its bad actually | 2009-12-02 17:22:01 |
| you see, the compiler would think that they work at once | 2009-12-02 17:22:23 |
| (decoding and encoding) | 2009-12-02 17:22:28 |
| it only collects numbers of occurrences on branches etc | 2009-12-02 17:22:46 |
| but doesn't understand the order | 2009-12-02 17:22:55 |
| so if in if(cond) branch1; else branch2; | 2009-12-02 17:23:22 |
| branch1 is always taken in encoding | 2009-12-02 17:23:29 |
| and branch2 in decoding | 2009-12-02 17:23:33 |
| it'd think that branch1 probability is 0.5 | 2009-12-02 17:24:15 |
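The point about merged profiles can be shown with a toy model. This is only a sketch of the arithmetic, not how gcc actually stores profile data: `encode_run`, `decode_run` and `branch_probability` are hypothetical names, and it assumes the `if(cond)` in question sits in code shared by both passes.

```python
from collections import Counter

def branch_probability(profile, branch="branch1"):
    """Estimated probability that `branch` is taken, from raw hit counts."""
    total = sum(profile.values())
    return profile[branch] / total if total else None

# Hypothetical hit counts for the same if/else, collected in two separate runs.
encode_run = Counter(branch1=1000)   # encoding always takes branch1
decode_run = Counter(branch2=1000)   # decoding always takes branch2

# One profiling run that does both passes just adds the counters together.
merged = encode_run + decode_run

print(branch_probability(encode_run))  # encoder-only profile
print(branch_probability(decode_run))  # decoder-only profile
print(branch_probability(merged))      # combined profile
```

With separate profiles the branch looks fully predictable in each pass, while the merged counters make it look like a coin flip, which is the loss of information described above.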
<toffer> | that makes no sense if both routines are separate | 2009-12-02 17:24:46 |
<Shelwien> | it doesn't understand that | 2009-12-02 17:24:57 |
| and it doesn't understand a thing about layouts | 2009-12-02 17:25:15 |
| so it would just generate functions in order of parsing | 2009-12-02 17:25:42 |
<toffer> | i don't see where the problem should be. there is a specialized function for encoding. it got stats for that. and for decoding there's a specialized function, too | 2009-12-02 17:26:07 |
<Shelwien> | as i said... it thinks that they work both at once | 2009-12-02 17:26:32 |
| so instead of optimizing each function alone | 2009-12-02 17:26:58 |
| it would try to optimize "whole program" | 2009-12-02 17:27:11 |
| 28.547s 31.547s ccm_sh1d99 | 2009-12-02 17:28:25 |
| 29.219s 29.891s ccm_sh1d9b | 2009-12-02 17:28:25 |
| 28.187s 29.515s ccm_sh1d9e # modular build | 2009-12-02 17:28:25 |
<toffer> | yes, i understand that. but the odd thing i wanted to point out is that doing it that way hurts the generated code | 2009-12-02 17:28:41 |
<Shelwien> | here first line has global PGO | 2009-12-02 17:28:46 |
| and second decoder PGO | 2009-12-02 17:28:50 |
| and third has both | 2009-12-02 17:28:56 |
<toffer> | separating the driver program and the encoder,decoder helps | 2009-12-02 17:29:17 |
| but having 3 separate components hurts | 2009-12-02 17:29:25 |
| component = object file | 2009-12-02 17:29:31 |
<Shelwien> | well, i had 3 and it helped | 2009-12-02 17:29:39 |
| of course there're various alignment quirks etc | 2009-12-02 17:29:52 |
| which i avoided by using different COFF sections for modules | 2009-12-02 17:30:13 |
| dunno how to do it with gcc though | 2009-12-02 17:30:22 |
<toffer> | i gonna do some exact speed tests now. up until now i get 4 models compressing enwik7 in 4.99secs. a single m1 took 2.1sec :D | 2009-12-02 17:32:57 |
| somehow i don't understand why it scales better than linear | 2009-12-02 17:33:14 |
<Shelwien> | memory lookups overlap with computing? | 2009-12-02 17:33:42 |
<toffer> | dunno | 2009-12-02 17:33:51 |
| but it looks odd to me | 2009-12-02 17:33:55 |
| gonna be back in 30 mins | 2009-12-02 17:34:00 |
| bye | 2009-12-02 17:34:03 |
<Shelwien> | sami? | 2009-12-02 17:34:17 |
*** toffer has left the channel | 2009-12-02 17:34:25 |
<sami> | Shelwien, did I miss something? I'm now looking at your mtf stuff | 2009-12-02 17:43:11 |
<Shelwien> | http://ctxmodel.net/files/mix_test/gmtf_v1.rar | 2009-12-02 17:43:24 |
| supposedly i made it to work with gcc | 2009-12-02 17:43:42 |
| please check if you can | 2009-12-02 17:43:46 |
| and as to mtf, the version w/o coroutines might be easier to read - http://ctxmodel.net/files/mix_test/gmtf_v0.rar | 2009-12-02 17:44:33 |
*** Krugz has left the channel | 2009-12-02 17:46:01 |
<sami> | g++ compiles it, but -Wall spills out a lot of stuff | 2009-12-02 17:46:57 |
<Shelwien> | didn't check that, the question is whether it works at all, or crashes | 2009-12-02 17:47:36 |
| g++ mtf.cpp -o mtf | 2009-12-02 17:47:53 |
| ./mtf c book1bwt 1 | 2009-12-02 17:48:01 |
| ./mtf d 1 2 | 2009-12-02 17:48:04 |
| should produce file 2 identical to book1bwt | 2009-12-02 17:48:19 |
<sami> | works fine for book1rbwt | 2009-12-02 17:48:47 |
<Shelwien> | ok, great | 2009-12-02 17:48:58 |
| do you know a name for then weird MTF version then? | 2009-12-02 17:49:15 |
| *for that | 2009-12-02 17:49:21 |
<sami> | hopefully I understand soon what the setjmps hackery is | 2009-12-02 17:49:24 |
<Shelwien> | setjmp hackery is http://en.wikipedia.org/wiki/Coroutine | 2009-12-02 17:49:43 |
| after this i'm going to try writing all the coders in this style | 2009-12-02 17:50:37 |
| it allows to use memory buffers and fast enough access to everything | 2009-12-02 17:51:12 |
| and also allows to write completely separate modules with a simple API | 2009-12-02 17:51:47 |
| its not really necessary in this MTF example | 2009-12-02 17:52:04 |
| but already for something like Unicode-to-UTF8 converter | 2009-12-02 17:53:14 |
| the main look with similar buffering would be much messier | 2009-12-02 17:53:33 |
| and with rangecoders | 2009-12-02 17:53:52 |
| I didn't really ever see a good library with a universal API | 2009-12-02 17:54:22 |
| *the main loop | 2009-12-02 17:55:24 |
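A rough sketch of the coroutine style described above, using Python generators in place of the setjmp-based C++ in gmtf_v1 (which is not reproduced here). The names `read_byte`, `converter` and `run` are made up for illustration, and the input is assumed to be UTF-32LE code points arriving in arbitrarily split buffers. The converter body reads bytes as if from a plain stream; the suspension when a buffer runs out is hidden inside `read_byte`.

```python
def read_byte(state):
    """Return the next input byte, suspending the coroutine when the buffer is empty."""
    while state["pos"] >= len(state["buf"]):
        # Hand the output produced so far to the caller and wait for more input.
        state["buf"] = yield bytes(state["out"])
        state["out"].clear()
        state["pos"] = 0
    b = state["buf"][state["pos"]]
    state["pos"] += 1
    return b

def converter():
    """Coroutine: UTF-32LE bytes in (via send), UTF-8 bytes out."""
    state = {"buf": b"", "pos": 0, "out": bytearray()}
    while True:
        cp = 0
        for shift in (0, 8, 16, 24):
            # A code point may straddle two input buffers; the loop does not care.
            cp |= (yield from read_byte(state)) << shift
        state["out"] += chr(cp).encode("utf-8")

def run(chunks):
    """Driver: feed input buffers one by one and collect the output."""
    co = converter()
    next(co)  # run up to the first request for input
    result = b""
    for chunk in chunks:
        result += co.send(chunk)
    return result

text = "héllo €"
data = text.encode("utf-32-le")
print(run([data[:3], data[3:10], data[10:]]) == text.encode("utf-8"))
```

The driver only sees "give buffer, get buffer", which is roughly the simple module API mentioned above; without coroutines the converter would have to save its position inside a half-read code point by hand.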
<sami> | don't know what to call this. I've seen a lot of this kind of stuff, I don't recall what were they called. I'm not saying I've seen exactly this though. do you have results for this? | 2009-12-02 18:04:55 |
<Shelwien> | what kind? i can benchmark v0 vs v1 if you want, but its not very sensible | 2009-12-02 18:05:47 |
<sami> | probably this is novel anyway | 2009-12-02 18:05:59 |
| I mean this kind of symbol ranking variants | 2009-12-02 18:06:26 |
<Shelwien> | what's interesting | 2009-12-02 18:06:43 |
| is that it gains ~3k vs plain MTF | 2009-12-02 18:07:03 |
| (after entropy coding of book1rbwt) | 2009-12-02 18:07:18 |
| and also it might be actually faster than MTF | 2009-12-02 18:07:38 |
| because rank updates are skipped sometimes | 2009-12-02 18:08:01 |
<sami> | what about the mtf that moves to rank 1 instead of 0 (and only to zero from one)? | 2009-12-02 18:08:11 |
<Shelwien> | well, i can try that | 2009-12-02 18:08:47 |
| btw, this MTF topic appeared | 2009-12-02 18:09:16 |
| because of unary coding actually ;) | 2009-12-02 18:09:21 |
<sami> | also to rank 2 instead of zero and only from <2 to 0 | 2009-12-02 18:09:30 |
<Shelwien> | as unary coding uses some ranking | 2009-12-02 18:09:33 |
| 235959, 232079, 231772, 229496 | 2009-12-02 18:15:23 |
| mtf+fpaq0p, mtf1, mtf2, gMTF | 2009-12-02 18:16:07 |
<sami> | ok, nice | 2009-12-02 18:16:34 |
<Shelwien> | mtf1 updates rank to rank<2?0:1 | 2009-12-02 18:16:39 |
| mtf2 - rank<3?0:2 | 2009-12-02 18:16:45 |
| its very easy to modify gMTF.inc to do that actually | 2009-12-02 18:17:06 |
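The mtf1/mtf2 update rules quoted above can be sketched as one parameterized routine. This is an illustration only: it covers plain MTF and the two variants from the `rank<2?0:1` and `rank<3?0:2` formulas, not gMTF itself (its update rule is not spelled out in this log), and `mtf_encode`, `mtf_decode`, `threshold` and `target` are hypothetical names.

```python
def mtf_encode(data, threshold=1, target=0):
    """Rank-encode bytes. A symbol found at rank r moves to rank 0 if
    r < threshold, otherwise to rank `target`.
    plain MTF: threshold=1, target=0; mtf1: 2, 1; mtf2: 3, 2."""
    table = list(range(256))
    ranks = []
    for byte in data:
        r = table.index(byte)
        ranks.append(r)
        table.pop(r)
        table.insert(0 if r < threshold else target, byte)
    return ranks

def mtf_decode(ranks, threshold=1, target=0):
    """Inverse of mtf_encode with the same parameters."""
    table = list(range(256))
    out = bytearray()
    for r in ranks:
        byte = table.pop(r)
        out.append(byte)
        table.insert(0 if r < threshold else target, byte)
    return bytes(out)

print(mtf_encode(b"aab"))                         # plain MTF
print(mtf_encode(b"aab", threshold=2, target=1))  # mtf1
```

The linear `table.index` lookup keeps the sketch short; it says nothing about the speed of the real implementations.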
<sami> | although probably more testing would be needed, I mean some basic bwt fenwick structured model before we could say you killed mtf with this | 2009-12-02 18:17:59 |
| can you do one more quick test with obj2, mtf2 vs gmtf? | 2009-12-02 18:18:27 |
| or some other binary file | 2009-12-02 18:18:36 |
<Shelwien> | obj2 or obj2bwt? | 2009-12-02 18:18:39 |
<sami> | obj2bwt yeah | 2009-12-02 18:18:46 |
<Shelwien> | ok, wait | 2009-12-02 18:18:52 |
<sami> | the more testing is because the mtfs may be just too quick for fpaq adapt speed, so it may favour your method | 2009-12-02 18:21:12 |
<Shelwien> | 79724, 82177, 87487 | 2009-12-02 18:21:27 |
| mtf, mtf2, gmtf | 2009-12-02 18:21:32 |
<sami> | ok | 2009-12-02 18:21:47 |
<Shelwien> | gmtf has a parameter though | 2009-12-02 18:22:26 |
*** schnaader has joined the channel | 2009-12-02 18:31:01 |
<sami> | so did anybody check the new benchmark data? | 2009-12-02 18:47:22 |
<Shelwien> | yours? i did open it... and didn't see any benchmark results afair... | 2009-12-02 18:47:56 |
<sami> | the links should be at the top of the page | 2009-12-02 18:48:17 |
<Shelwien> | yeah | 2009-12-02 18:48:27 |
| i didn't get that actually ;) | 2009-12-02 18:48:44 |
| thought that links on files go to file data ;) | 2009-12-02 18:49:09 |
<sami> | ok, perhaps I can try work around something to avoid that from happening :-) | 2009-12-02 18:49:59 |
<Shelwien> | results for book1 seem kinda weird... do they include decoder size? | 2009-12-02 18:50:26 |
<sami> | the second number is without decoder | 2009-12-02 18:50:47 |
| I mean the "w/o stub" column | 2009-12-02 18:50:59 |
| so xwrt is leading in book1 if we don't take into account the one megabyte dictionary | 2009-12-02 18:51:46 |
<Shelwien> | i think maybe you should add a coefficient to it or something | 2009-12-02 18:52:06 |
| because i can compile much smaller ash for sure | 2009-12-02 18:52:15 |
<sami> | unfortunately that doesn't work because some programs use sfx | 2009-12-02 18:52:45 |
| or perhaps we can just do it anyway and ignore the sfx issue like now | 2009-12-02 18:53:12 |
| anyway, the whole point is to keep the decoder small | 2009-12-02 18:53:59 |
<Shelwien> | but ppmy showing "better" results than paqs etc is just dumb | 2009-12-02 18:54:22 |
<sami> | book1,obj2,geo are too small for test files | 2009-12-02 18:54:29 |
| I'm just including them for reference | 2009-12-02 18:54:41 |
<Shelwien> | so i suggest to add a decoder size coefficient | 2009-12-02 18:54:48 |
| like if you compressed 10 such small files | 2009-12-02 18:55:02 |
*** jj has joined the channel | 2009-12-02 18:55:50 |
<schnaader> | Have you checked what this precompressed part of FlashMX.pdf actually includes? There are some big images in it, worst case could be that this is mainly testing image compression, although this wouldn't be that unusual for typical PDFs. | 2009-12-02 18:55:57 |
<Shelwien> | yeah, probably | 2009-12-02 18:56:21 |
<sami> | schnaader, no unfortunately I didn't have time to check it | 2009-12-02 18:56:30 |
<schnaader> | I think I'll have a look at it here. | 2009-12-02 18:57:37 |
<sami> | but yes, I recognize that may be possible, that's why I didn't cut ohs.doc or vcfiu.hlp because I might just be sampling something less interesting | 2009-12-02 18:57:38 |
| Shelwien, so you suggest 0.1 is a good value? | 2009-12-02 18:58:39 |
<Shelwien> | well, i think yes | 2009-12-02 18:59:06 |
<sami> | perhaps I must provide a third size which has such coef, because I cannot replace the main size column because of compressors that use sfx | 2009-12-02 18:59:34 |
| it's also drawback of the whole system that I cannot easily configure programs to run these test with no sfx | 2009-12-02 19:00:00 |
<schnaader> | Actually, first 5 MB of FlashMX.pdf seem to be rather well mixed. There's about 3 or maybe 4 MB of it that's image content, but there also is a lot of text in it. | 2009-12-02 19:03:42 |
<sami> | schnaader so we got lucky :-) | 2009-12-02 19:04:55 |
| can you see a better offset there? | 2009-12-02 19:05:06 |
<Shelwien> | yeah, but sorting by combined size makes it all weird | 2009-12-02 19:05:36 |
<sami> | so that perhaps we could sample less of the image? | 2009-12-02 19:05:36 |
| Shelwien you can sort the tables by clicking at the column | 2009-12-02 19:06:15 |
<Shelwien> | and get lots of n/a first, yeah ;) | 2009-12-02 19:06:58 |
<sami> | right, but I will fix that | 2009-12-02 19:07:23 |
<schnaader> | The part after the first big image (1,2 MB decompressed) seems fine, there's another big image block (2*~1,5 MB) later but I think that's far away from it. So you could try 5 MB with a 2 or 3 MB offset, but as images are pretty mixed up with text, I think it could not be worth the effort and you could just leave it like it is :) | 2009-12-02 19:08:53 |
<sami> | please reload the pages, I forgot something, now they should look as they are supposed to | 2009-12-02 19:09:23 |
<Shelwien> | could you add some visible separators between column titles too? | 2009-12-02 19:10:05 |
<sami> | schnaader, ok. thanks | 2009-12-02 19:10:07 |
| Shelwien there should be pseudoseparators there already, but not between ct & dt (and not between cm & dm) | 2009-12-02 19:12:02 |
<Shelwien> | i mean like "Size | w/o stub" instead of "Size w/o stub" | 2009-12-02 19:12:49 |
| i checked in chrome and still don't see any separators there | 2009-12-02 19:13:06 |
<sami> | ok not on those table headers. I try to figure out something | 2009-12-02 19:13:43 |
*** toffer has joined the channel | 2009-12-02 19:22:30 |
* Guest4706822 slaps toffer around a bit with a large fishbot | 2009-12-02 19:24:21 |
<schnaader> | Ouch.. no trouts here? I guess we'd better not misbehave... | 2009-12-02 19:24:52 |
<Guest4706822> | not sure toffers awake | 2009-12-02 19:25:08 |
<Shelwien> | sleepwalking? | 2009-12-02 19:25:42 |
*** Guest4706822 has left the channel | 2009-12-02 19:27:31 |
<schnaader> | That would be nice sleepwalking - "Last night I sleepwalked, logged in to IRC and coded some really nice compressors, now I've to understand what I did there, but results are really impressing" :D | 2009-12-02 19:32:53 |
<Shelwien> | it happens sometimes with me, when i have to get up and suddenly do something | 2009-12-02 19:34:48 |
| might not remember what i did later, especially if i'd return to sleeping after that ;) | 2009-12-02 19:35:06 |
<sami> | I think 0.1 is too little. 100kb dictionary becomes only 10kb | 2009-12-02 19:58:15 |
<Shelwien> | http://encode.dreamhosters.com/showthread.php?t=509 | 2009-12-02 19:58:34 |
<sami> | you state there that it's better than mtf. too early. i suggest increasing the fpaq0 adapt speed a bit for mtf | 2009-12-02 20:01:32 |
<Shelwien> | i test it not only with fpaq0 there, but also with mix_test o2 coder too | 2009-12-02 20:02:05 |
<sami> | do you have results for mtf2+mixtest vs gmtf+mixtest? | 2009-12-02 20:03:02 |
<Shelwien> | wait... | 2009-12-02 20:03:17 |
<sami> | I've never gotten around to writing a tool for myself to have various simple models at hand for tests like this. many times it would be useful | 2009-12-02 20:04:16 |
<Shelwien> | mtf:224696, mtf2:221903, gmtf:221140 | 2009-12-02 20:04:23 |
| well, i always just did it in the form of toolkits somehow | 2009-12-02 20:04:54 |
| unfortunately lots of such tools I had written in asm | 2009-12-02 20:06:02 |
| and they're not quite usable these days | 2009-12-02 20:06:14 |
<schnaader> | But it's not like there's no more assembler out there ;) You could try to convert the code to fasm or some Windows assembler like masm32 (these are nice, it's possible to do WinApi calls with them), although with assembler code that's quite hard if its old and you don't know exactly what it does anymore. | 2009-12-02 20:07:53 |
<Shelwien> | unfortunately its tasm | 2009-12-02 20:08:33 |
| with very heavy use of macros and other specific things | 2009-12-02 20:08:49 |
| and then, also these are DOS-32 programs using DPMI | 2009-12-02 20:09:15 |
| they still work under XP now, in fact | 2009-12-02 20:09:27 |
| like I have a 2k old PPM implementation etc | 2009-12-02 20:09:41 |
<schnaader> | Ah OK, know these DOS memory things from old PowerBasic programs :) The days when you couldn't simply say "Give me 5 MB of memory"... better not port those monsters, yeah | 2009-12-02 20:10:46 |
<Shelwien> | they're quite cool in fact | 2009-12-02 20:11:27 |
| i'd like very much to have a preprocessor like in masm/tasm for C++ | 2009-12-02 20:11:48 |
| (though I use perl for that now) | 2009-12-02 20:11:59 |
| its multipass and worked in a style of declarative programming to a point | 2009-12-02 20:12:48 |
| like, i use a parity align macro somewhere | 2009-12-02 20:13:07 |
| it aligned functions in such a way that parity of low byte of their address was fixed | 2009-12-02 20:13:41 |
| was useful because of PF flag in x86 and JP/SETP etc stuff | 2009-12-02 20:14:04 |
| well, its kinda like what C++ templates could be, but are not ;) | 2009-12-02 20:14:42 |
<schnaader> | Hehe, well C/C++ has never been perfect, although there were some attempts to improve it, but I guess it's just kind of too popular now so everybody wants to improve different things. | 2009-12-02 20:16:02 |
<Shelwien> | here's an example: http://91.124.210.5/lng-ppm.txt | 2009-12-02 20:16:04 |
| all the unknown keywords are my macros basically | 2009-12-02 20:16:58 |
| like functions called by calls are only linked into the program when they're really called from somewhere ;) | 2009-12-02 20:17:31 |
| i'd probably program like that even now | 2009-12-02 20:18:28 |
<sami> | Shelwien, how about this: reduce from size min(decoder size, median decoder size)*0.9 | 2009-12-02 20:18:45 |
<Shelwien> | it was much more powerful comparing to C/C++, as weird as it may sound | 2009-12-02 20:18:47 |
<schnaader> | :) "dvd" is funny, guess it means "define variable data" | 2009-12-02 20:19:21 |
<Shelwien> | yeah ;) | 2009-12-02 20:19:25 |
<sami> | also we could do that for sfx programs as well, to approximate the decoder size | 2009-12-02 20:19:27 |
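A small sketch of the proposed adjustment, read as "subtract 90% of min(decoder size, median decoder size) from the combined size". The function name `adjusted_size`, the `keep` parameter and the sample numbers are invented for illustration; they are not values from the benchmark.

```python
from statistics import median

def adjusted_size(total_size, decoder_size, all_decoder_sizes, keep=0.1):
    """Combined size with most of the decoder discounted.

    total_size        -- compressed data plus decoder, in bytes
    decoder_size      -- this program's decoder (or stub) size
    all_decoder_sizes -- decoder sizes of all tested programs
    keep              -- fraction of the discounted part still counted
    The discount is capped at the median decoder size, so an unusually
    large decoder (e.g. one carrying a big dictionary) stays mostly counted.
    """
    cap = median(all_decoder_sizes)
    return total_size - (1.0 - keep) * min(decoder_size, cap)

decoders = [50_000, 100_000, 200_000]           # made-up decoder sizes
print(adjusted_size(300_000, 100_000, decoders))
print(adjusted_size(1_200_000, 1_000_000, decoders))
```

For sfx programs `decoder_size` is unknown, so passing the median there would match the "approximate the decoder size" idea.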
<Shelwien> | not that i can say anything now | 2009-12-02 20:19:47 |
| we only would be able to decide after looking at resulting order i think ;) | 2009-12-02 20:20:13 |
| and as i see it, XWRT winning at book1 is wrong, but PPMY winning is wrong too | 2009-12-02 20:21:13 |
<sami> | well, again book1 is too small for comparing 2 or more compressors as I explain in the text | 2009-12-02 20:22:23 |
| ppmy still wins :-) http://compressionratings.com/sort.cgi?s_book1a.full.html+5+n | 2009-12-02 20:26:26 |
| this is x-min(stub/10,100000) | 2009-12-02 20:27:10 |
| no | 2009-12-02 20:27:24 |
| error | 2009-12-02 20:27:26 |
| this is correct I hope http://compressionratings.com/sort.cgi?s_book1c.full.html+5+n | 2009-12-02 20:29:16 |
*** toffer has left the channel | 2009-12-02 20:30:10 |
*** toffer has joined the channel | 2009-12-02 20:43:40 |
*** toffer has left the channel | 2009-12-02 20:44:43 |
| probably I need to make additional configs for sfx compressors for this test. I try to do that next weekend | 2009-12-02 21:01:21 |
*** sami has left the channel | 2009-12-02 21:01:45 |
*** schnaader has left the channel | 2009-12-02 21:07:00 |
*** chornobl has left the channel | 2009-12-02 21:09:52 |
*** Shelwien has left the channel | 2009-12-02 21:14:49 |
*** Guest9968193 has joined the channel | 2009-12-02 21:14:53 |
*** STalKer-Y has left the channel | 2009-12-02 21:22:29 |
*** STalKer-X has joined the channel | 2009-12-02 21:23:49 |
*** STalKer-X has left the channel | 2009-12-02 21:46:11 |
*** STalKer-X has joined the channel | 2009-12-02 21:56:39 |
*** STalKer-X has left the channel | 2009-12-02 21:56:43 |
*** toffer has joined the channel | 2009-12-02 22:16:22 |
*** toffer has left the channel | 2009-12-02 23:47:32 |
*** Krugz has joined the channel | 2009-12-03 00:58:43 |
*** bobzilla has joined the channel | 2009-12-03 05:48:07 |
*** pinc has joined the channel | 2009-12-03 06:27:06 |
*** pinc has left the channel | 2009-12-03 06:29:17 |
*** pinc has joined the channel | 2009-12-03 06:40:22 |
*** pinc has left the channel | 2009-12-03 06:41:34 |
*** bobzilla has left the channel | 2009-12-03 06:55:42 |
*** pinc has joined the channel | 2009-12-03 07:48:45 |
*** STalKer-X has joined the channel | 2009-12-03 10:23:26 |
<Shelwien> | ... | 2009-12-03 11:06:53 |
<STalKer-X> | o_o | 2009-12-03 11:19:41 |
* Shelwien goes to bring in another bot | 2009-12-03 11:21:38 |
*** compbooks has joined the channel | 2009-12-03 11:27:44 |
<Shelwien> | !list | 2009-12-03 11:28:48 |
<Krugz> | compbooks? | 2009-12-03 11:51:43 |
<Shelwien> | only DCC articles for now | 2009-12-03 11:52:05 |
<Krugz> | what do you plan to do with it? load it up with computer-related books? | 2009-12-03 11:52:31 |
<Shelwien> | more like compression-related ;) | 2009-12-03 11:52:42 |
<Krugz> | ahh ok | 2009-12-03 11:52:46 |
| sounds good | 2009-12-03 11:52:49 |
| I've been way too busy lately, but in a little while I'll have time to sit down and learn enough to be helpful, or at least interesting, around here | 2009-12-03 11:53:27 |
* Krugz hasn't slept yet, has class in an hour | 2009-12-03 11:53:43 |
<Shelwien> | i don't think you really have to learn anything | 2009-12-03 11:54:27 |
<Krugz> | ? | 2009-12-03 11:54:34 |
<Shelwien> | i'm willing to talk about quite a lot of different things ;) | 2009-12-03 11:54:42 |
<Krugz> | ya but I'm interested in data compression, not extremely or anything but enough that I'd be willing to sit down and learn more | 2009-12-03 11:55:10 |
| just don't have the time recently, plus not exactly sure where to get started | 2009-12-03 11:55:22 |
<Shelwien> | statistics probably | 2009-12-03 11:55:45 |
<Krugz> | really? hmm | 2009-12-03 11:55:55 |
<Shelwien> | not whole course maybe | 2009-12-03 11:56:22 |
<Krugz> | alright well I'll look into it when I get some time | 2009-12-03 11:57:18 |
| I have a bit more work to finish off, and then I have to study for finals | 2009-12-03 11:57:35 |
<Shelwien> | but things like this: http://en.wikipedia.org/wiki/Maximum_likelihood#Examples | 2009-12-03 11:57:42 |
<Krugz> | but after that, I'm clear to learn whatever I feel like for a long while | 2009-12-03 11:57:46 |
<Shelwien> | well, i'm not going anywhere as far as i can see | 2009-12-03 11:58:23 |
<Krugz> | ah don't worry about it, I'm not going to drag you around to help me learn stuff :P | 2009-12-03 11:59:01 |
| if you suggest where to start and stuff, that should be good :O | 2009-12-03 11:59:11 |
<Shelwien> | well, there's kinda no publications on real compression algorithms | 2009-12-03 11:59:53 |
| so i'd have to help one way or another | 2009-12-03 12:00:10 |
<Krugz> | ya but I'm far from doing anything with an actual application | 2009-12-03 12:00:14 |
<Shelwien> | "actual application" is something somewhat unrelated too, in fact ;) | 2009-12-03 12:00:58 |
<Krugz> | I looked up BWT just a while ago, I understand the basic idea but there's definitely stuff I need to know before I really look into it | 2009-12-03 12:01:07 |
| ah not "actual application", I meant like, I'm far from being able to understand how a compression algorithm would work | 2009-12-03 12:01:44 |
<Shelwien> | most people only use zip despite availability of compressors with much better performance | 2009-12-03 12:01:58 |
<Krugz> | I use rar mostly | 2009-12-03 12:02:08 |
<Shelwien> | same rar | 2009-12-03 12:02:12 |
| but rar is no better than zip really | 2009-12-03 12:02:19 |
<Krugz> | I don't really need anything compressed much, I'm sloppy with my data | 2009-12-03 12:02:26 |
<Shelwien> | same here | 2009-12-03 12:02:41 |
<Krugz> | I just use it for packaging things to be sent around in one piece | 2009-12-03 12:02:46 |
<Shelwien> | but as i said, there're lots of application for statistical models | 2009-12-03 12:02:54 |
| and the best way to evaluate such models is by compression | 2009-12-03 12:03:09 |
<Krugz> | hmm | 2009-12-03 12:03:31 |
| so using compression as a tool to test models? | 2009-12-03 12:03:41 |
<Shelwien> | for example, i'm thinking about making a talking bot here | 2009-12-03 12:03:47 |
| which would generate text using a statistical model, by channel log data | 2009-12-03 12:04:05 |
| for me, yes | 2009-12-03 12:04:22 |
<Krugz> | ah ok | 2009-12-03 12:04:45 |
| I see what you're saying, I think | 2009-12-03 12:04:52 |
| jeez.. I'm getting really tired all at once | 2009-12-03 12:05:13 |
<Shelwien> | ;) | 2009-12-03 12:05:19 |
<Krugz> | not because of you, lol | 2009-12-03 12:05:23 |
| just lack of sleep | 2009-12-03 12:05:27 |
| hitting me just now | 2009-12-03 12:05:33 |
<Shelwien> | i just got up not long ago ;) | 2009-12-03 12:06:02 |
<Krugz> | I woke up about 24 hours ago, now | 2009-12-03 12:06:27 |
| I found an interesting e-book | 2009-12-03 12:07:00 |
| it's a puzzle book, I find those interesting from time to time | 2009-12-03 12:07:13 |
| this one was pretty well written, the puzzles are definitely entertaining | 2009-12-03 12:07:32 |
| ok I have to go shower, and maybe get something to eat | 2009-12-03 12:08:53 |
| I don't think I'll be back today/tonight | 2009-12-03 12:09:24 |
| bye bye | 2009-12-03 12:09:28 |
*** Krugz has left the channel | 2009-12-03 12:09:35 |
*** Shelwien has left the channel | 2009-12-03 12:09:50 |
*** Shelwien has joined the channel | 2009-12-03 12:09:55 |
<Shelwien> | !next | 2009-12-03 12:10:06 |