*** complogger has joined the channel | 2009-10-05 21:03:23 |
*** compbooks has joined the channel | 2009-10-05 21:04:00 |
*** toffer has joined the channel | 2009-10-05 21:30:42 |
<toffer> | hi! just wanted to say have a look at your ftp | 2009-10-05 21:30:48 |
<Shelwien> | hi | 2009-10-05 21:30:54 |
| dcc.7z | 2009-10-05 21:31:09 |
| 7M only | 2009-10-05 21:31:18 |
| now 10 | 2009-10-05 21:31:33 |
<toffer> | it's 78mb | 2009-10-05 21:31:38 |
| unstable connection | 2009-10-05 21:31:50 |
| started with 700kb/s | 2009-10-05 21:31:54 |
| now it's 100 | 2009-10-05 21:31:57 |
<Shelwien> | btw, i can enable a shell for you there, if you want ;) | 2009-10-05 21:33:38 |
<toffer> | just too late | 2009-10-05 21:35:39 |
| via ssh? | 2009-10-05 21:35:41 |
<Shelwien> | yeah | 2009-10-05 21:35:44 |
<toffer> | it was a proxy problem all the time | 2009-10-05 21:35:56 |
<Shelwien> | suspected that | 2009-10-05 21:36:09 |
| but i didn't mean for upload | 2009-10-05 21:36:13 |
| there're compilers and stuff | 2009-10-05 21:36:23 |
| so you'd be able to test m1 there or whatever ;) | 2009-10-05 21:36:39 |
| not really a lot of computing resources though | 2009-10-05 21:37:00 |
<toffer> | guess it's not much of use | 2009-10-05 21:37:13 |
| but having a good remote machine would be great | 2009-10-05 21:37:22 |
| gonna eat and have a beer now | 2009-10-05 21:37:42 |
<Shelwien> | %) | 2009-10-05 21:37:47 |
*** pinc has left the channel | 2009-10-05 22:19:00 |
<toffer> | gn8 | 2009-10-05 23:11:32 |
*** toffer has left the channel | 2009-10-05 23:11:35 |
*** pinc has joined the channel | 2009-10-06 06:31:19 |
*** Shelwien has left the channel | 2009-10-06 08:47:59 |
*** toffer has joined the channel | 2009-10-06 09:46:51 |
| hi | 2009-10-06 10:06:09 |
| somehow 9 bit precision is enough for stretch ^^ | 2009-10-06 11:02:44 |
*** Shelwien has joined the channel | 2009-10-06 14:04:25 |
<Shelwien> | http://toffer.dreamhosters.com/ ;) | 2009-10-06 14:05:28 |
<toffer> | Encoding...done. 22257252/100000000 bytes, 48.00 s that's for 32mb and approx. orders 1,2,4,6 | 2009-10-06 14:12:30 |
<Shelwien> | with 4 models? | 2009-10-06 14:13:03 |
<toffer> | yep | 2009-10-06 14:13:22 |
| i can get 0.8% improvement by raising memory | 2009-10-06 14:13:31 |
| but currently it's 4 ordinary models | 2009-10-06 14:13:55 |
| as i said i wanted to replace one of these with a match model | 2009-10-06 14:14:05 |
<Shelwien> | ccmx/bwt are around 20.8M... | 2009-10-06 14:15:22 |
<toffer> | that's not a fair comparison | 2009-10-06 14:15:37 |
| due to the match model | 2009-10-06 14:15:40 |
| for comparison | 2009-10-06 14:15:42 |
| lpaq1 with order 1246, 100000000 -> 22359789 in 114.21 sec. using 51 MB memory | 2009-10-06 14:15:55 |
| as you see the results are close together | 2009-10-06 14:16:10 |
<Shelwien> | well, i didn't say that it's a bad result ;) | 2009-10-06 14:16:19 |
<toffer> | and lpaq1 with m246 gets ccm like compression | 2009-10-06 14:16:23 |
| it's 20.8xxx.xxx ... | 2009-10-06 14:16:50 |
| ccmx like compression | 2009-10-06 14:16:55 |
<Shelwien> | just that imho its necessary to beat at least BWT | 2009-10-06 14:17:06 |
| and accidentally, ccmx has similar results ;) | 2009-10-06 14:17:35 |
<toffer> | as i said replacing a model with a match model will give the performance of ccmx | 2009-10-06 14:17:57 |
| at higher speeds :) | 2009-10-06 14:18:07 |
| at least ignoring filters | 2009-10-06 14:18:11 |
<Shelwien> | well, ccm filters don't apply to enwiks | 2009-10-06 14:18:27 |
<toffer> | not exactly | 2009-10-06 14:18:34 |
| it got some text preprocessing | 2009-10-06 14:18:43 |
<Shelwien> | well, plain ccm surely doesn't | 2009-10-06 14:19:00 |
<toffer> | it quantises an order1 context | 2009-10-06 14:19:22 |
<Shelwien> | !grep skymmer. | 2009-10-06 14:19:26 |
<toffer> | based on c>='a' && ... | 2009-10-06 14:19:28 |
<Shelwien> | damn | 2009-10-06 14:19:38 |
| !grep skymmer.narod | 2009-10-06 14:20:02 |
| ccm 5 = 22 003 958 | 2009-10-06 14:20:29 |
| ccmx 5 = 21 013 793 | 2009-10-06 14:20:34 |
| ccm_sh1d9e 5 = 22 004 883 | 2009-10-06 14:20:39 |
<toffer> | 5 is how much memory? | 2009-10-06 14:20:49 |
| as i said it's just 32mb for me | 2009-10-06 14:21:01 |
<Shelwien> | 550M | 2009-10-06 14:21:05 |
<toffer> | ^^ | 2009-10-06 14:21:08 |
| Allocated 262535 kB. | 2009-10-06 14:21:32 |
| Encoding...done. 21451718/100000000 bytes, 47.49 s | 2009-10-06 14:21:34 |
<Shelwien> | i just wanted to show that plain ccm doesn't have a text filter | 2009-10-06 14:21:56 |
<toffer> | it has | 2009-10-06 14:22:09 |
| look at the source | 2009-10-06 14:22:12 |
| it's not a filter | 2009-10-06 14:22:31 |
<Shelwien> | ah, that... | 2009-10-06 14:22:37 |
<toffer> | it just uses some text specific contexts | 2009-10-06 14:22:38 |
<Shelwien> | well, you would use some too | 2009-10-06 14:22:50 |
| if you optimized the masks etc ;) | 2009-10-06 14:22:58 |
| and 21.4 is certainly more impressive | 2009-10-06 14:23:31 |
<toffer> | it's just the increase in memory | 2009-10-06 14:23:59 |
| i'm testing 2gb now | 2009-10-06 14:24:06 |
| not much improvement | 2009-10-06 14:24:14 |
| Allocated 2097543 kB. | 2009-10-06 14:24:26 |
| Encoding...done. 21371640/100000000 bytes, 52.57 s | 2009-10-06 14:24:28 |
| i guess prior to the match model variant | 2009-10-06 14:24:36 |
| i'd release a plain 4 model variant | 2009-10-06 14:24:45 |
<Shelwien> | yeah, its ok | 2009-10-06 14:25:07 |
<toffer> | the little effect of memory increase from 256mb to 2gb is due to nibble caching | 2009-10-06 14:25:21 |
<Shelwien> | though how do you scale memory use for different models? | 2009-10-06 14:25:25 |
<toffer> | there're two hash tables | 2009-10-06 14:25:34 |
| w8 | 2009-10-06 14:25:36 |
| two hash tables for high and low nibbles | 2009-10-06 14:34:45 |
| a larger collision domain helps | 2009-10-06 14:34:59 |
| and i need to separate these due to nibble caching | 2009-10-06 14:35:10 |
| (which successfully removes 26% of cache misses) | 2009-10-06 14:35:27 |
<Shelwien> | well, what i meant | 2009-10-06 14:35:42 |
| is that it might be better to specify the hashtable size separately | 2009-10-06 14:35:57 |
<toffer> | ? | 2009-10-06 14:36:31 |
<Shelwien> | if two 16M hashtables work ok | 2009-10-06 14:36:45 |
| that doesn't mean that 32M+32M would be better than 32M+16M | 2009-10-06 14:37:00 |
| well, relative scales | 2009-10-06 14:37:15 |
| also, it would be probably good to optimize the parameters separately for different memory settings | 2009-10-06 14:40:19 |
| afair, m1 is now able to load some parameter profiles in runtime? | 2009-10-06 14:40:39 |
<toffer> | yes | 2009-10-06 14:56:09 |
| as to hash table division | 2009-10-06 15:01:47 |
| i already made some experiments | 2009-10-06 15:01:53 |
| increasing the high nibble's collision domain size helps, since it stores the nibble cache. it boosts both compression and speed | 2009-10-06 15:02:19 |
* Shelwien tried to use precomp to find the pdf titles, but failed | 2009-10-06 15:44:29 |
*** pinc has left the channel | 2009-10-06 15:47:47 |
| afair the file names should be in lexical order (compared to the index pdf) when you look at | 2009-10-06 15:47:48 |
<Shelwien> | you didn't include the indexes ;) | 2009-10-06 15:49:06 |
<toffer> | i did | 2009-10-06 15:49:20 |
| you need to look at the first few pdfs | 2009-10-06 15:49:34 |
| one of these is a toc sheet | 2009-10-06 15:49:47 |
<Shelwien> | %) | 2009-10-06 15:49:54 |
| found it, but i think getting the indexes off ieee would be more convenient | 2009-10-06 15:52:12 |
<toffer> | there's no ieee index list or something like that | 2009-10-06 15:53:07 |
<Shelwien> | there is | 2009-10-06 15:53:14 |
<toffer> | since all articles are just named ieeexplore.pdf | 2009-10-06 15:53:15 |
<Shelwien> | http://www.ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=30443&isYear=2005 | 2009-10-06 15:53:18 |
<toffer> | well, yes | 2009-10-06 15:53:40 |
| i thought you meant something like a file to download | 2009-10-06 15:53:52 |
| but i found nothing like that | 2009-10-06 15:54:01 |
<Shelwien> | well, i'd just make an index out of this, i guess | 2009-10-06 15:54:07 |
<toffer> | yeah the arnumbers map to pdf names | 2009-10-06 15:54:18 |
| btw with more models i was able to lower counter precision w/o any great compression loss | 2009-10-06 15:59:16 |
<Shelwien> | you mean output probability? | 2009-10-06 15:59:54 |
<toffer> | counters | 2009-10-06 16:00:04 |
<Shelwien> | i thought that you only use bytewise fsm? | 2009-10-06 16:00:08 |
<toffer> | you mean 256 states | 2009-10-06 16:00:24 |
| ? | 2009-10-06 16:00:52 |
<Shelwien> | yes | 2009-10-06 16:01:11 |
<toffer> | yeah i do | 2009-10-06 16:01:24 |
| i meant the counters within the sse maps | 2009-10-06 16:01:30 |
<Shelwien> | ah | 2009-10-06 16:02:57 |
<toffer> | in 0.4 i had 20 bit counters with notable compression improvement | 2009-10-06 16:03:15 |
| but due to stretch/squash mappings (which are 9/12 bits) 16 bit is sufficient | 2009-10-06 16:03:46 |
<Shelwien> | yeah... also counters are quantized anyway | 2009-10-06 16:04:28 |
<toffer> | yeah | 2009-10-06 16:13:13 |
| that's why i guess | 2009-10-06 16:13:16 |
| but i found a mapping of 9->15 bits to be optimal | 2009-10-06 16:13:29 |
| (stretch) | 2009-10-06 16:13:33 |
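For readers following the stretch/squash discussion: in PAQ-style mixing, stretch is the logit function and squash is its inverse, and both are usually served from small lookup tables. The sketch below is a minimal illustration in Python, assuming a stretch table indexed by the top 9 bits of a 16-bit probability. The names `STRETCH_TABLE` and `stretch_lookup` and the floating-point output are illustrative assumptions, not m1's actual tables or scaling.

```python
import math

def stretch(p: float) -> float:
    """Logit: maps a probability in (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))

def squash(x: float) -> float:
    """Logistic function, the inverse of stretch."""
    return 1.0 / (1.0 + math.exp(-x))

INDEX_BITS = 9  # assumption: the table is indexed by the top 9 bits

# One entry per quantisation bucket, evaluated at the bucket midpoint.
STRETCH_TABLE = [
    stretch((i + 0.5) / (1 << INDEX_BITS)) for i in range(1 << INDEX_BITS)
]

def stretch_lookup(p16: int) -> float:
    """Table-based stretch of a 16-bit probability (0..65535)."""
    return STRETCH_TABLE[p16 >> (16 - INDEX_BITS)]
```

With only 512 buckets, any counter bits below the top 9 never reach the table, which is one plausible reading of why wider counters stopped paying off here.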
| and again i removed a lot of parameters | 2009-10-06 16:13:47 |
| (which resulted from the extension to 4 models) | 2009-10-06 16:14:07 |
<Shelwien> | well, for me, optimization is a cheap resource, i guess | 2009-10-06 16:15:12 |
| for example, i didn't run any in more than a month ;) | 2009-10-06 16:15:51 |
| and i don't ever switch off the q9450 ;) | 2009-10-06 16:16:11 |
<toffer> | if i knew that i'd ask you to optimize | 2009-10-06 16:18:10 |
| some m1 stuff ^^ | 2009-10-06 16:18:14 |
| a ssh login on a 64 bit linux would be nice | 2009-10-06 16:18:41 |
<Shelwien> | only 32-bit, sorry ;) | 2009-10-06 16:19:18 |
<toffer> | it'd still work | 2009-10-06 16:19:38 |
<Shelwien> | you can try logging to that dreamhost server now | 2009-10-06 16:24:58 |
| same login/pass, same server, but ssh | 2009-10-06 16:25:10 |
<toffer> | thanks | 2009-10-06 16:25:51 |
| but it'd not be useful for optimization i guess | 2009-10-06 16:26:06 |
<Shelwien> | yeah | 2009-10-06 16:26:12 |
<toffer> | but i can use it for file uploads | 2009-10-06 16:26:13 |
<Shelwien> | yeah, as i said, http://toffer.dreamhosters.com now works | 2009-10-06 16:26:40 |
| and it supports scripts and stuff there btw | 2009-10-06 16:26:59 |
*** pinc has joined the channel | 2009-10-06 16:27:06 |
<toffer> | i thought the name is already used | 2009-10-06 16:27:07 |
<Shelwien> | as a username | 2009-10-06 16:27:23 |
| xx.dreamhosters.com are not automatic | 2009-10-06 16:27:39 |
<toffer> | thanks for the login | 2009-10-06 16:30:01 |
<Shelwien> | ;) | 2009-10-06 16:30:14 |
<toffer> | i increased the nibble cache w/o changing memory requirements drastically. now it can directly be compared to lpaq | 2009-10-06 16:38:09 |
| lpaq 1246 100000000 -> 22359789 in 112.95 sec. using 51 MB memory. | 2009-10-06 16:38:21 |
| Allocated 49543 kB. | 2009-10-06 16:38:43 |
| Encoding...done. 21956964/100000000 bytes, 46.87 s | 2009-10-06 16:38:44 |
| 0.4% better compression while being 2.4 times as fast | 2009-10-06 16:39:52 |
| and better than a comparable ccm without a match model | 2009-10-06 16:40:15 |
| i mean without having a match model | 2009-10-06 16:40:25 |
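The two figures quoted above can be rechecked from the pasted sizes and timings. A small sketch; the variable names are illustrative, and the "0.4%" is read here as percentage points of the original 100000000 bytes, which is an interpretation rather than something stated in the log.

```python
ORIGINAL = 100_000_000

lpaq_size, lpaq_secs = 22_359_789, 112.95   # lpaq, orders 1246
m1_size, m1_secs = 21_956_964, 46.87        # m1, 49543 kB allocated

# Saving expressed in percentage points of the original file size.
saved_points = (lpaq_size - m1_size) / ORIGINAL * 100

# Saving expressed relative to lpaq's own output size.
relative_gain = (lpaq_size - m1_size) / lpaq_size * 100

# Wall-clock speed ratio.
speedup = lpaq_secs / m1_secs

print(f"{saved_points:.2f} points, {relative_gain:.2f}% smaller, {speedup:.2f}x")
```

Measured against lpaq's output rather than the original file, the same gap is a bit under 2%.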
<Shelwien> | what about book1 vs ppmd? ;) | 2009-10-06 16:43:12 |
<toffer> | didn't try | 2009-10-06 16:45:13 |
| if you can give me some numbers i can compare it | 2009-10-06 16:45:20 |
| but still a match model provides much better performance | 2009-10-06 16:45:43 |
<Shelwien> | well, ppmd doesn't have a match model either ;) | 2009-10-06 16:46:11 |
| as to numbers... | 2009-10-06 16:46:50 |
<toffer> | well but it can increase its coding order which would have a similar effect | 2009-10-06 16:46:50 |
<Shelwien> | http://compression.ru/ds/ppmdj.rar | 2009-10-06 16:46:58 |
| i think you should be able to compile it even if you're on linux | 2009-10-06 16:47:19 |
| as to similar effect - not really | 2009-10-06 16:47:38 |
| because then it'd have to flush the tree more frequently | 2009-10-06 16:47:57 |
| which would be slower or/and hurt compression | 2009-10-06 16:48:14 |
| but with a small file, like book1 | 2009-10-06 16:48:38 |
| i think ppmd might be a good competition | 2009-10-06 16:48:54 |
<toffer> | well of course it has a similar effect | 2009-10-06 16:50:20 |
| higher orders indicate a greater prediction confidence | 2009-10-06 16:51:00 |
<Shelwien> | it _can_ have, but doesn't in practice | 2009-10-06 16:51:05 |
| ppmd would benefit from using a match model the same as m1 | 2009-10-06 16:51:26 |
<toffer> | having fixed models is a great difference | 2009-10-06 16:51:50 |
<Shelwien> | dunno | 2009-10-06 16:52:07 |
| what's interesting here is that ppmd is bytewise | 2009-10-06 16:52:18 |
| and it still would be slower than m1x2 maybe | 2009-10-06 16:52:35 |
| but it should have better compression | 2009-10-06 16:52:51 |
<toffer> | it doesn't work under linux | 2009-10-06 16:54:31 |
| it includes windows.h | 2009-10-06 16:54:38 |
<Shelwien> | did you use a makefile? | 2009-10-06 16:55:50 |
<toffer> | yeah | 2009-10-06 16:57:41 |
<Shelwien> | which one? ;) | 2009-10-06 16:57:52 |
| i'd suggest makefile.gmk for gcc ;) | 2009-10-06 16:58:07 |
<toffer> | i needed to modify the source | 2009-10-06 17:00:51 |
| some defines | 2009-10-06 17:00:52 |
| could you suggest any options? | 2009-10-06 17:01:09 |
<Shelwien> | default? | 2009-10-06 17:01:52 |
| well, also -o12 -m50 -r1 maybe | 2009-10-06 17:02:24 |
| btw, i made these indexes for dcc, do you need them? | 2009-10-06 17:05:05 |
<toffer> | you can upload these | 2009-10-06 17:05:58 |
<Shelwien> | done | 2009-10-06 17:06:47 |
<toffer> | Fast PPMII compressor for textual data, variant J, Oct 6 2009 | 2009-10-06 17:10:24 |
| book1: 768771 > 209823, 2.18 bpb, used: 5.4MB, speed: 5024 KB/sec | 2009-10-06 17:10:25 |
| Allocated 49543 kB. | 2009-10-06 17:10:45 |
| Encoding...done. 213728/768771 bytes, 0.36 s | 2009-10-06 17:10:46 |
| model initialisation takes quite a while | 2009-10-06 17:10:54 |
| i tuned ppmds order | 2009-10-06 17:11:02 |
| 6 seems to be optimal | 2009-10-06 17:11:13 |
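The bpb figure in the PPMd line is simply compressed bits per input byte, so the two book1 results can be put on the same scale. A short sketch using only the sizes pasted above; the function name is illustrative.

```python
def bits_per_byte(compressed: int, original: int) -> float:
    """Compressed size in bits divided by the original size in bytes."""
    return 8.0 * compressed / original

BOOK1 = 768_771

ppmd_bpb = bits_per_byte(209_823, BOOK1)  # PPMd var. J result
m1_bpb = bits_per_byte(213_728, BOOK1)    # m1 result

print(f"ppmd {ppmd_bpb:.2f} bpb, m1 {m1_bpb:.2f} bpb")
```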
<Shelwien> | for book1 maybe | 2009-10-06 17:11:38 |
| btw | 2009-10-06 17:12:37 |
| can you also download other dcc years? ;) | 2009-10-06 17:12:51 |
<toffer> | i thought you got these? | 2009-10-06 17:13:03 |
<Shelwien> | i'd build the similar indexes then | 2009-10-06 17:13:06 |
| and yeah, i have them | 2009-10-06 17:13:15 |
| but they're messy | 2009-10-06 17:13:24 |
| and i wonder about pdf quality | 2009-10-06 17:13:31 |
| anyway, they're not from ieee xplore | 2009-10-06 17:14:18 |
| so i'd prefer have it all in a consistent form ;) | 2009-10-06 17:14:41 |
<toffer> | i'll do that | 2009-10-06 17:15:02 |
| but not today | 2009-10-06 17:15:04 |
<Shelwien> | sure | 2009-10-06 17:15:08 |
| btw i just found that they've got a prize for an article ;) | 2009-10-06 17:16:29 |
| i'm not a student unfortunately ;) | 2009-10-06 17:16:37 |
<toffer> | retuned to book1 i get 2k improvement | 2009-10-06 17:18:37 |
| leaving contexts untouched | 2009-10-06 17:18:52 |
<Shelwien> | so 211? | 2009-10-06 17:19:00 |
<toffer> | 190 f* = 211732.000000 | 2009-10-06 17:19:01 |
<Shelwien> | similar to bwt with 4 models, i guess... | 2009-10-06 17:19:28 |
| but hopefully faster | 2009-10-06 17:19:38 |
<toffer> | a match model will do better as i said | 2009-10-06 17:19:50 |
<Shelwien> | not for book1 | 2009-10-06 17:19:58 |
<toffer> | and complete book1 tuning too | 2009-10-06 17:19:58 |
| sure? | 2009-10-06 17:20:06 |
<Shelwien> | there aren't many matches | 2009-10-06 17:20:22 |
<toffer> | did you ever couple a match model as in lpaq | 2009-10-06 17:20:23 |
<Shelwien> | order6 for ppmd is a proof of that | 2009-10-06 17:20:28 |
<toffer> | it's not about matches, it's about the correct determination of context order | 2009-10-06 17:20:38 |
<Shelwien> | we can test my new LZ i guess... | 2009-10-06 17:20:50 |
<toffer> | you mean that fma? | 2009-10-06 17:21:07 |
<Shelwien> | yeah | 2009-10-06 17:21:10 |
| i'd rewritten it yesterday | 2009-10-06 17:21:19 |
| and now it actually transforms and detransforms stuff | 2009-10-06 17:21:41 |
| only -9M off enwik9 though | 2009-10-06 17:22:02 |
<toffer> | so can you use it like rep now? | 2009-10-06 17:22:02 |
<Shelwien> | kinda | 2009-10-06 17:22:10 |
| it outputs two files though | 2009-10-06 17:22:18 |
| wonder if gcc compiles it... | 2009-10-06 17:23:52 |
<toffer> | not if you casted pointer to ints as you usually do | 2009-10-06 17:24:10 |
| ^^ | 2009-10-06 17:24:12 |
<Shelwien> | no, i've been more careful this time... in a way | 2009-10-06 17:24:35 |
| it compiles apparently | 2009-10-06 17:24:39 |
| got that? | 2009-10-06 17:27:01 |
| usage is | 2009-10-06 17:27:40 |
| fma-hash source literal_file structure_file | 2009-10-06 17:28:04 |
| fma-dec literal_file structure_file output | 2009-10-06 17:28:13 |
<toffer> | i didn't get anything | 2009-10-06 17:29:08 |
| did you upload it? | 2009-10-06 17:29:19 |
| i need the sources | 2009-10-06 17:29:29 |
<Shelwien> | err... PM'ed | 2009-10-06 17:29:30 |
<toffer> | pm? | 2009-10-06 17:29:42 |
<Shelwien> | http://shelwien.googlepages.com/fma_06.rar | 2009-10-06 17:30:15 |
<toffer> | Allocated 49543 kB. | 2009-10-06 17:33:08 |
| Encoding...done. 21956964/100000000 bytes, 46.87 s -> 2133 kb/s | 2009-10-06 17:33:10 |
| cm@e051:~/shared/testset/enwik$ ../../projects/temp/PPMd e -m50 -o10 -r1 enwik8 -f/dev/null | 2009-10-06 17:33:11 |
| Fast PPMII compressor for textual data, variant J, Oct 6 2009 | 2009-10-06 17:33:13 |
| enwik8:100000000 >22632473, 1.38 bpb, used: 39.0MB, speed: 1593 KB/sec | 2009-10-06 17:33:14 |
| cm@e051:~/shared/testset/enwik$ ../../projects/temp/PPMd e -m50 -o12 -r1 enwik8 -f/dev/null | 2009-10-06 17:33:16 |
| Fast PPMII compressor for textual data, variant J, Oct 6 2009 | 2009-10-06 17:33:17 |
| enwik8.pmd already exists, overwrite?: <Y>es, <N>o, <A>ll, <Q>uit?y | 2009-10-06 17:33:19 |
| enwik8:100000000 >22719967, 1.39 bpb, used: 49.7MB, speed: 1413 KB/sec | 2009-10-06 17:33:20 |
<Shelwien> | what? | 2009-10-06 17:33:33 |
<toffer> | it's faster than ppmd | 2009-10-06 17:33:53 |
| with better compression | 2009-10-06 17:34:02 |
| for enwik | 2009-10-06 17:34:09 |
| i'm retrying order 8 | 2009-10-06 17:34:19 |
| don't know what'd be best for e8 | 2009-10-06 17:34:26 |
<Shelwien> | well, for enwik there're memory troubles | 2009-10-06 17:34:43 |
| the thing which Shkarin does to free some memory is just too weird to do any good | 2009-10-06 17:35:06 |
<toffer> | don't know | 2009-10-06 17:35:58 |
| i got collisions | 2009-10-06 17:36:01 |
| but testing with limited memory should be fair | 2009-10-06 17:36:25 |
<Shelwien> | its still better than flushing all the stats after each few MBs | 2009-10-06 17:37:00 |
<toffer> | the best result i got for ppmd is order8 | 2009-10-06 17:37:03 |
| enwik8:100000000 >22524820, 1.37 bpb, used: 35.8MB, speed: 2106 KB/sec | 2009-10-06 17:37:05 |
| it runs at about the same speed here | 2009-10-06 17:37:13 |
| i used r1 | 2009-10-06 17:37:23 |
| which doesn't flush the model | 2009-10-06 17:37:33 |
<Shelwien> | it does | 2009-10-06 17:37:38 |
| unfortunately | 2009-10-06 17:37:42 |
<toffer> | there's not any better option built in | 2009-10-06 17:37:44 |
<Shelwien> | well, sure | 2009-10-06 17:37:49 |
<toffer> | -rN - set method of model restoration at memory insufficiency: | 2009-10-06 17:38:04 |
| -r0 - restart model from scratch (default) | 2009-10-06 17:38:05 |
| -r1 - cut off model (slow) | 2009-10-06 17:38:07 |
<Shelwien> | i know ;) | 2009-10-06 17:40:14 |
<toffer> | but that's not freeing the model | 2009-10-06 17:40:45 |
| it rebuilds it | 2009-10-06 17:40:49 |
<Shelwien> | it cuts it down until 75% memory is free | 2009-10-06 17:41:07 |
| and that's slow and hurts compression | 2009-10-06 17:41:19 |
<toffer> | still the memory constraint is 50 mb | 2009-10-06 17:41:32 |
<Shelwien> | well, memory issues are more complex to handle in ppmd | 2009-10-06 17:42:29 |
| but it doesn't mean that its fair to compare it knowing that | 2009-10-06 17:42:42 |
<toffer> | don't know | 2009-10-06 17:43:04 |
| i can increase the memory | 2009-10-06 17:43:08 |
<Shelwien> | yeah, the overflow handling has to be improved | 2009-10-06 17:43:12 |
| but atm we know that ppmd is bad at that | 2009-10-06 17:43:22 |
<toffer> | on the other hand i could say that it's unfair to compare it without having a match model | 2009-10-06 17:43:29 |
<Shelwien> | so whats the sense to hang on that? | 2009-10-06 17:43:35 |
<toffer> | nothing | 2009-10-06 17:43:40 |
| but it's a good ppm implementation | 2009-10-06 17:43:51 |
<Shelwien> | as i said... ppmd practically doesn't have a match model | 2009-10-06 17:43:52 |
<toffer> | it's not about coding matches separately | 2009-10-06 17:44:05 |
| it's just about knowing the coding order at each step | 2009-10-06 17:44:12 |
| which has a significant influence for fixed models | 2009-10-06 17:44:25 |
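The point about "knowing the coding order at each step" can be illustrated with a toy match model: it follows the most recent earlier occurrence of the current context, and the running match length serves as an estimate of how long a context currently applies. This is a minimal byte-level sketch under stated assumptions: a Python dict stands in for a hash table, the minimum length of 6 is arbitrary, and none of it is taken from lpaq or m1.

```python
class MatchModel:
    """Toy byte-level match model: predicts the byte that followed the
    most recent earlier occurrence of the current context."""

    def __init__(self, min_len: int = 6):
        self.min_len = min_len   # context length needed to start a match
        self.table = {}          # context bytes -> position just after it
        self.buf = bytearray()   # history of all bytes seen so far
        self.ptr = 0             # index in buf of the predicted next byte
        self.length = 0          # current match length, 0 = no match

    def predict(self):
        """Return (predicted_byte, match_length), or (None, 0) if unmatched."""
        if self.length > 0:
            return self.buf[self.ptr], self.length
        return None, 0

    def update(self, byte: int) -> None:
        # Extend the match if the prediction was right, otherwise drop it.
        if self.length > 0 and self.buf[self.ptr] == byte:
            self.length += 1
            self.ptr += 1
        else:
            self.length = 0
        self.buf.append(byte)
        if len(self.buf) >= self.min_len:
            ctx = bytes(self.buf[-self.min_len:])
            # Without an active match, try to start one from the table.
            if self.length == 0 and ctx in self.table:
                self.ptr = self.table[ctx]
                self.length = self.min_len
            self.table[ctx] = len(self.buf)
```

In a mixing compressor the match length would typically select or weight the prediction, so a long match behaves like a confident high-order context even when only a few fixed-order models exist.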
<Shelwien> | no, i mean that PPM in theory includes the features of a match model | 2009-10-06 17:44:47 |
| but in practice ppmd overflow handling cuts that down | 2009-10-06 17:45:06 |
<toffer> | still it can access higher order statistics | 2009-10-06 17:45:25 |
| but the overflow handling isn't the best. i agree here | 2009-10-06 17:45:45 |
<Shelwien> | well, you can set it to -o8 or whatever you want | 2009-10-06 17:45:45 |
| anyway, you can compare it with 2G of memory ;) | 2009-10-06 17:46:07 |
<toffer> | the best configuration i found was order 8 | 2009-10-06 17:46:12 |
| that's what i call unfair ^^ | 2009-10-06 17:46:26 |
| but a good thing would be to test on a file where no overflows happen | 2009-10-06 17:46:42 |
| and then i'd limit it to order 6 | 2009-10-06 17:47:40 |
<Shelwien> | that's kinda what i meant | 2009-10-06 17:48:12 |
<toffer> | i tried to compile fma | 2009-10-06 17:49:49 |
| without success | 2009-10-06 17:49:52 |
| but i'll stop for today | 2009-10-06 17:49:59 |
| since i need to do something for my thesis, too | 2009-10-06 17:50:06 |
| -.- | 2009-10-06 17:50:08 |
| just wasted too much time | 2009-10-06 17:50:13 |
| cm@e051:/mnt/shared_extern/projects/temp/fma/fma_06$ gcc -O3 -o fma-hash fma-hash.cpp -lstdc++ | 2009-10-06 17:50:26 |
| In file included from fma-hash.cpp:43: | 2009-10-06 17:50:28 |
| hashbuf.inc:29: error: expected constructor, destructor, or type conversion before ‘(’ token | 2009-10-06 17:50:29 |
| hashbuf.inc: In function ‘void HashBufInit()’: | 2009-10-06 17:50:31 |
| hashbuf.inc:36: error: ‘hridx’ was not declared in this scope | 2009-10-06 17:50:32 |
| hashbuf.inc: In function ‘uint HashFind()’: | 2009-10-06 17:50:34 |
| hashbuf.inc:47: error: ‘hridx’ was not declared in this scope | 2009-10-06 17:50:35 |
| hashbuf.inc: In function ‘void HashIndex()’: | 2009-10-06 17:50:37 |
<Shelwien> | well, i compiled it on DH | 2009-10-06 17:50:38 |
<toffer> | hashbuf.inc:68: error: ‘hridx’ was not declared in this scope | 2009-10-06 17:50:38 |
| cm@e051:/mnt/shared_extern/projects/temp/fma/fma_06$ | 2009-10-06 17:50:40 |
<Shelwien> | there's a __declspec accidentally | 2009-10-06 17:50:59 |
| but now it segfaults on any file | 2009-10-06 17:51:08 |
| well, i'd check what happens later, food calls ;) | 2009-10-06 17:51:38 |
<toffer> | enjoy | 2009-10-06 17:54:47 |
<Shelwien> | back... | 2009-10-06 18:12:46 |
| damn, why gcc always has to be so annoying... | 2009-10-06 18:27:59 |
| apparently, it doesn't allow negative array indexes | 2009-10-06 18:28:29 |
| now, why? | 2009-10-06 18:28:31 |
| ok, now it works | 2009-10-06 18:30:18 |
| toffer? | 2009-10-06 18:30:47 |
<toffer> | sorry | 2009-10-06 18:53:50 |
| as i said i'm gonna work on my thesis now | 2009-10-06 18:53:57 |
| just wasted too much time today | 2009-10-06 18:54:03 |
<Shelwien> | ...can't process enwik8 on dreamhost %) | 2009-10-06 19:00:30 |
*** pinc has left the channel | 2009-10-06 20:48:58 |
|            enwik8:200  enwik8:64  enwik8:32  Ecoli:200   Ecoli:64   Ecoli:32 | 2009-10-06 20:56:19 |
| literal      99782568   97838655   95601834    4596028    4581466    4572494 | 2009-10-06 20:56:19 |
| structure        4828     176308     728008        592       1708       3028 | 2009-10-06 20:56:19 |
| ppmd-o8m50   22508525   22459174   22480285    1172440    1169545    1168017 | 2009-10-06 20:56:19 |
| ... | 2009-10-06 20:56:19 |
|                E_coli     enwik8 | 2009-10-06 20:56:20 |
| source        4638690  100000000 | 2009-10-06 20:56:22 |
| ppmd-o8m50    1182206   22524842 | 2009-10-06 20:56:24 |
| fma preprocessing results ;) | 2009-10-06 20:56:49 |
| toffer? | 2009-10-06 20:56:52 |
<toffer> | yeah | 2009-10-06 21:01:43 |
| looks like it hurts on enwik | 2009-10-06 21:03:02 |
| probably it breaks the contexts | 2009-10-06 21:03:13 |
| ? | 2009-10-06 21:03:14 |
<Shelwien> | that's because ppmd can't compress the structure file | 2009-10-06 21:03:16 |
<toffer> | what does it contain? | 2009-10-06 21:03:40 |
<Shelwien> | literal lengths and match offsets/lens | 2009-10-06 21:03:57 |
| 12-byte records | 2009-10-06 21:04:11 |
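A 12-byte record holding a literal length plus a match offset and length would fit three 32-bit integers. The sketch below reads such records, assuming little-endian unsigned fields in the order (literal length, match offset, match length). That layout and the names `Record` and `read_records` are guesses for illustration, not taken from the fma source. It also skips any leftover bytes: the structure sizes listed above all leave a remainder of 4 when divided by 12, which might indicate a small header or trailer that this sketch does not model.

```python
import struct
from typing import Iterator, NamedTuple

class Record(NamedTuple):
    literal_len: int   # literal bytes preceding the match (assumed field)
    match_offset: int  # where the match source is located (assumed field)
    match_len: int     # number of matched bytes (assumed field)

# Assumed layout: three little-endian uint32 values, 12 bytes in total.
RECORD = struct.Struct("<III")

def read_records(blob: bytes) -> Iterator[Record]:
    """Yield every complete 12-byte record, ignoring trailing leftovers."""
    usable = len(blob) - len(blob) % RECORD.size
    for fields in RECORD.iter_unpack(blob[:usable]):
        yield Record(*fields)
```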
<asmodean> | hm | 2009-10-07 00:06:50 |
<Shelwien> | m? | 2009-10-07 00:07:03 |
<asmodean> | 6gb of pngs + masks i want to merge into 32-bit bitmaps and then LZMA | 2009-10-07 00:07:10 |
| wonder if i have enough temporary space for this ;p | 2009-10-07 00:07:18 |
| each file is 150mb heh | 2009-10-07 00:07:31 |
<Shelwien> | compress them first? ;) | 2009-10-07 00:07:38 |
| with pngcrush at least? | 2009-10-07 00:07:45 |
<asmodean> | i wish 7zip/winrar were smart enough to contextually decompress known formats | 2009-10-07 00:08:16 |
| like recognize zlib streams and decompress them before compressing ;p | 2009-10-07 00:08:35 |
<Shelwien> | well, precomp should support pngs | 2009-10-07 00:08:47 |
<asmodean> | precomp? | 2009-10-07 00:09:00 |
<Shelwien> | http://schnaader.info | 2009-10-07 00:09:11 |
<asmodean> | ah, well that's the idea but it still needs temp space | 2009-10-07 00:10:02 |
| that's what i am doing manually right now converting all the pngs to bitmaps ;p | 2009-10-07 00:10:13 |
| haha but look prepaq! | 2009-10-07 00:10:41 |
| those paq algorithms take like 5000,00000 years to run | 2009-10-07 00:10:57 |
<Shelwien> | there's lpaq i think | 2009-10-07 00:11:07 |
| its considerably faster (though not as good) | 2009-10-07 00:11:37 |
<asmodean> | yeah but lpaq is only on one file | 2009-10-07 00:11:45 |
| i need to gain efficiency from repetition between these files | 2009-10-07 00:11:53 |
<Shelwien> | huh. | 2009-10-07 00:12:16 |
| i think that's what my new tool is for ;) | 2009-10-07 00:12:23 |
| you can try this though: http://haskell.org/bz/rep.zip | 2009-10-07 00:12:52 |
<asmodean> | i'll just let it gobble up 200gb and then lzma it | 2009-10-07 00:13:07 |
<Shelwien> | not the best idea if there're many similarities | 2009-10-07 00:13:32 |
| try that rep first | 2009-10-07 00:13:40 |
| (+lzma) | 2009-10-07 00:13:44 |
<toffer> | gn8 guys | 2009-10-07 00:13:54 |
<asmodean> | what's it do? | 2009-10-07 00:13:57 |
*** toffer has left the channel | 2009-10-07 00:13:59 |
<Shelwien> | rep? finds long repetitions at large distances | 2009-10-07 00:14:19 |
| and removes these ;) | 2009-10-07 00:14:38 |
<asmodean> | huh | 2009-10-07 00:14:52 |
<Shelwien> | a preprocessor too, like precomp | 2009-10-07 00:14:55 |
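The idea of a long-repetition preprocessor can be sketched in a few lines: index fixed-size blocks of the data already seen, and when the bytes at the current position equal an indexed block, emit a back-reference instead of the bytes. This toy version is only meant to show the principle. The token format, the block-aligned index, and the names `rep_encode` and `rep_decode` are assumptions, and it is neither rep's actual algorithm nor its file format.

```python
def rep_encode(data: bytes, min_len: int = 32):
    """Return tokens: ('lit', bytes) or ('match', distance, length)."""
    table = {}       # block content -> earliest position of that block
    tokens = []
    indexed = 0      # next block-aligned position still to be indexed
    lit_start = 0    # start of the pending literal run
    i, n = 0, len(data)
    while i + min_len <= n:
        # Index every aligned block that lies entirely before position i.
        while indexed + min_len <= i:
            table.setdefault(data[indexed:indexed + min_len], indexed)
            indexed += min_len
        src = table.get(data[i:i + min_len])
        if src is None:
            i += 1
            continue
        # Extend the match forward as far as the bytes agree.
        length = min_len
        while i + length < n and data[src + length] == data[i + length]:
            length += 1
        if lit_start < i:
            tokens.append(("lit", data[lit_start:i]))
        tokens.append(("match", i - src, length))
        i += length
        lit_start = i
    if lit_start < n:
        tokens.append(("lit", data[lit_start:]))
    return tokens

def rep_decode(tokens) -> bytes:
    """Rebuild the original data from the token list."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out += token[1]
        else:
            _, distance, length = token
            start = len(out) - distance
            for k in range(length):   # byte-wise copy allows overlap
                out.append(out[start + k])
    return bytes(out)
```

The output of such a stage is not compressed itself; it only removes distant duplicates so that a following compressor with a limited window has less to miss.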
<asmodean> | let's see what it does to these images containing animations | 2009-10-07 00:15:00 |
| (very slight differences) | 2009-10-07 00:15:16 |
| haha | 2009-10-07 00:16:53 |
| ** Detailed line noise in Russian ******************* | 2009-10-07 00:17:05 |
<Shelwien> | ? | 2009-10-07 00:17:18 |
<asmodean> | the russian characters don't show well in my locale | 2009-10-07 00:17:43 |
<Shelwien> | its not necessary for it to work ;) | 2009-10-07 00:18:19 |
<asmodean> | it did a lot worse than png ;p | 2009-10-07 00:18:43 |
| ~38mb vs 22mb | 2009-10-07 00:18:51 |
<Shelwien> | that's ok probably | 2009-10-07 00:19:01 |
| as its a preprocessor | 2009-10-07 00:19:09 |
| and also there're options | 2009-10-07 00:19:17 |
<asmodean> | oh i thought it preprocessed and then compressed | 2009-10-07 00:19:18 |
<Shelwien> | its output would be smaller probably | 2009-10-07 00:19:41 |
<asmodean> | so if i were to rar this and a png we'd have a good test | 2009-10-07 00:19:51 |
<Shelwien> | if you'd run it like rep -l32 | 2009-10-07 00:20:01 |
| its default minmatchlen is 512 bytes | 2009-10-07 00:20:22 |
<asmodean> | for this data i suspect longer is better | 2009-10-07 00:20:40 |
| nope | 2009-10-07 00:20:47 |
| -l32 got it down to 17mb | 2009-10-07 00:20:53 |
<Shelwien> | sure | 2009-10-07 00:21:00 |
| and that should be still compressible | 2009-10-07 00:21:10 |
<asmodean> | yeah the 17mb rep output compressed down to 7mb with winrar | 2009-10-07 00:23:29 |
| png 22mb -> 21mb | 2009-10-07 00:23:34 |
| png needs some options other than deflate :P | 2009-10-07 00:24:00 |
<Shelwien> | it kinda has | 2009-10-07 00:24:20 |
| as i said, try pngcrush for it | 2009-10-07 00:24:28 |
<asmodean> | i have | 2009-10-07 00:24:33 |
<Shelwien> | and then pngout+deflopt if that's not enough | 2009-10-07 00:24:37 |
<asmodean> | pngcrush usually only gets 1-2% better than typical '-9' zlib compression | 2009-10-07 00:24:59 |
<Shelwien> | well, i meant that png actually has these delta filters | 2009-10-07 00:25:15 |
<asmodean> | yeah i know | 2009-10-07 00:25:23 |
| and you can use a different filter per line | 2009-10-07 00:25:34 |
| i think photoshop tries them all when saving pngs ;p | 2009-10-07 00:26:09 |
| makes it super slow but its files are slightly smaller | 2009-10-07 00:26:19 |
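The per-line filter choice mentioned here can be sketched as follows. PNG defines five filter types; the example implements three of them (None, Sub, Up) and picks the one with the smallest sum of absolute values when the filtered bytes are read as signed, which is the heuristic suggested in the PNG specification. Whether any particular encoder, Photoshop included, works this way is not confirmed by the log, and the function names are illustrative.

```python
def filter_none(line: bytes, prev: bytes, bpp: int) -> bytes:
    """Type 0: the scanline is stored unchanged."""
    return bytes(line)

def filter_sub(line: bytes, prev: bytes, bpp: int) -> bytes:
    """Type 1: difference to the byte one pixel (bpp bytes) to the left."""
    return bytes(
        (line[i] - (line[i - bpp] if i >= bpp else 0)) & 0xFF
        for i in range(len(line))
    )

def filter_up(line: bytes, prev: bytes, bpp: int) -> bytes:
    """Type 2: difference to the byte directly above in the prior line."""
    return bytes((line[i] - prev[i]) & 0xFF for i in range(len(line)))

# Filter type byte -> function; types 3 (Average) and 4 (Paeth) omitted.
FILTERS = {0: filter_none, 1: filter_sub, 2: filter_up}

def cost(filtered: bytes) -> int:
    """Sum of absolute values, treating each byte as a signed value."""
    return sum(b if b < 128 else 256 - b for b in filtered)

def choose_filter(line: bytes, prev: bytes, bpp: int):
    """Try every implemented filter and keep the cheapest one."""
    best = min(FILTERS, key=lambda t: cost(FILTERS[t](line, prev, bpp)))
    return best, FILTERS[best](line, prev, bpp)
```

For the first scanline of an image, `prev` would be a line of zero bytes of the same length.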
<Shelwien> | anyway, this rep has lots of options | 2009-10-07 00:26:36 |
| and you can experiment with these, starting with -v ;) | 2009-10-07 00:26:51 |
<asmodean> | haha well it's fun but for practical purposes i'm still going with LZMA on all the tgas ;p | 2009-10-07 00:27:17 |
| oh i should winrar the uncompressed file for comparison | 2009-10-07 00:27:32 |
| 18mb | 2009-10-07 00:29:10 |
| so rep helped a lot | 2009-10-07 00:29:17 |
<Shelwien> | rep+lzma should be better | 2009-10-07 00:30:03 |
| they all have their limits on dictionary and window size | 2009-10-07 00:30:25 |
| deflate had to work in 64k of memory so its worst of all | 2009-10-07 00:30:53 |
<asmodean> | yeah tweaking is always possible. for general purpose use though you're not going to tweak much | 2009-10-07 00:31:01 |
| at least winrar is pretty fast | 2009-10-07 00:31:14 |
<Shelwien> | and rar won't see a repetition further than 4M | 2009-10-07 00:31:17 |
<asmodean> | lzma is slow as SHIT | 2009-10-07 00:31:17 |
| it's like encasing your data in ice | 2009-10-07 00:31:23 |
| to get it out you have to chip away for days | 2009-10-07 00:31:28 |
<Shelwien> | ;) | 2009-10-07 00:31:39 |
| you know, its not necessary to use lzma at ultra mode ;) | 2009-10-07 00:31:58 |
<asmodean> | haha when i started out i even tweaked ultra mode's options upwards | 2009-10-07 00:32:15 |
| so "ultra" is a compromise for me now :P | 2009-10-07 00:32:23 |
| what's a good general purpose level of lzma? | 2009-10-07 00:32:36 |
<Shelwien> | well, anyway, even at its best lzma won't see matches further than 1G or something | 2009-10-07 00:33:09 |
| so rep is preferable for such cases | 2009-10-07 00:33:22 |
| rep + lzma with smaller window | 2009-10-07 00:33:29 |
| and i don't know what's a "general purpose" ;) | 2009-10-07 00:33:54 |
<asmodean> | i should do this compression on my htpc | 2009-10-07 00:34:10 |
| quad core | 2009-10-07 00:34:12 |
<Shelwien> | sure ;) | 2009-10-07 00:34:21 |
<asmodean> | it's faster than my server which runs underclocked at 1.9ghz instead of 3.2 :( | 2009-10-07 00:34:31 |
<Shelwien> | btw, did you see my vectorized rangecoder? ;) | 2009-10-07 00:34:47 |
<asmodean> | nope i've been ignoring this place ;) | 2009-10-07 00:35:00 |
<Shelwien> | sometimes i'm thinking about recommending it to Igor | 2009-10-07 00:35:01 |
| its in the topic (ccm) | 2009-10-07 00:35:10 |
| so if i'd do, you might have to reverse it someday ;) | 2009-10-07 00:35:43 |
<asmodean> | the worst thing is reversing some algorithm that feels like it came from a library but you can't tell which :> | 2009-10-07 00:37:06 |
<Shelwien> | i guess you didn't encounter much of intelc code ;) | 2009-10-07 00:37:57 |
<asmodean> | hm 7zip can't use more than 2 threads | 2009-10-07 00:38:02 |
| why because intelc inlines all over the place? | 2009-10-07 00:38:17 |
<Shelwien> | yeah, and does some even worse stuff ;) | 2009-10-07 00:38:31 |
| like reordering and vectorizing weird places | 2009-10-07 00:38:52 |
| once it vectorize rangecoder i/o %) | 2009-10-07 00:39:14 |
| *vectorized | 2009-10-07 00:39:20 |
| in fpaq0pv4B | 2009-10-07 00:39:23 |
<asmodean> | does intelc produce double-digit performance improvements just from recompiling with it? | 2009-10-07 00:39:54 |
<Shelwien> | well, if you count in percents, then yeah, probably | 2009-10-07 00:40:21 |
| also depending on tasks | 2009-10-07 00:40:30 |
| but i've got 20% speed improvement | 2009-10-07 00:42:26 |
| from recompiling unrar.dll with intelc | 2009-10-07 00:42:33 |
<asmodean> | surprised more people don't use it | 2009-10-07 00:42:37 |
<Shelwien> | they do in fact | 2009-10-07 00:42:45 |
<asmodean> | maybe i'm lucky the jp dudes haven't discovered it then :) | 2009-10-07 00:43:08 |
<Shelwien> | well, you can search for "GenuineIntel" string in executables and dlls ;) | 2009-10-07 00:44:32 |
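
The string search suggested above is easy to script. This is a minimal sketch: the function names `contains_marker` and `scan` and the suffix list are assumptions for illustration. Note that a hit is only a hint, since "GenuineIntel" is the CPUID vendor string and any code that checks the CPU vendor may embed it, not just binaries built with intelc.

```python
from pathlib import Path

MARKER = b"GenuineIntel"   # CPUID vendor string, stored as plain ASCII


def contains_marker(data, marker=MARKER):
    """Return True if the raw bytes contain the vendor string."""
    return marker in data


def scan(root, suffixes=(".exe", ".dll")):
    """List files under `root` with a matching suffix that contain MARKER."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            try:
                if contains_marker(path.read_bytes()):
                    hits.append(path)
            except OSError:
                continue   # unreadable file, skip it
    return hits


if __name__ == "__main__":
    import sys
    for hit in scan(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(hit)
```

Reading each file fully into memory keeps the sketch short; for very large binaries a chunked or memory-mapped search would be the more careful choice.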
<asmodean> | worst optimization headache i've seen is 'whole program' optimization | 2009-10-07 00:45:05 |
<Shelwien> | i always use it ;) | 2009-10-07 00:45:21 |
<asmodean> | embeds lovely assumptions about things being in registers/stack/etc across several levels of function calls | 2009-10-07 00:45:23 |
<Shelwien> | well, i'd use worse things in manual asm though ;) | 2009-10-07 00:49:44 |
| so its all good ;) | 2009-10-07 00:49:48 |
| at least compilers don't keep values in flags across function calls ;) | 2009-10-07 00:50:41 |
| and don't use esp for general i/o ;) | 2009-10-07 00:51:08 |
<asmodean> | heh but manual asm is much harder to write much of | 2009-10-07 00:51:24 |
<Shelwien> | not really, depending on task though | 2009-10-07 00:51:38 |
<asmodean> | i'll take a handful of manual asm over a whole executable of optimized bullshit any day ;p | 2009-10-07 00:51:41 |
<Shelwien> | huh. wanna look at my old compressor written in asm? | 2009-10-07 00:52:10 |
<asmodean> | not really :) i'd rather look at fun easy to understand LZSS variations over either :) | 2009-10-07 00:53:13 |
<Shelwien> | actually i only stopped using asm because of intelc ;) | 2009-10-07 00:53:23 |
| there're no tools for similar global optimization | 2009-10-07 00:53:43 |
| and automatic vectorization | 2009-10-07 00:53:47 |
<asmodean> | lately i spend a lot more time reversing crypto/obfuscation :( ugh | 2009-10-07 00:53:49 |
<Shelwien> | well, its all easy enough if they don't use low-level stuff | 2009-10-07 00:54:32 |
| like drivers etc | 2009-10-07 00:54:36 |
<asmodean> | the only system that bothers me does constant self-checks (anti-debugger, crcs of code sections etc) with inline code | 2009-10-07 00:55:28 |
| makes it terribly irritating to trace the algorithms | 2009-10-07 00:55:37 |
| so i gave up and just dynamically patched it in between checks :P | 2009-10-07 00:56:08 |
| heh | 2009-10-07 00:56:09 |
<Shelwien> | why don't you just block read access to code pages | 2009-10-07 00:56:24 |
| and handle the exception ;) | 2009-10-07 00:56:36 |
<asmodean> | well i'd have to automate backtracking to the code doing the check to get back to the mainline code | 2009-10-07 00:57:19 |
<Shelwien> | i mean, that'd allow you to sniff out all the locations | 2009-10-07 00:57:49 |
<asmodean> | i was finding the checks with memory breakpoints | 2009-10-07 00:58:18 |
<Shelwien> | good too, but there aren't many of these ;) | 2009-10-07 00:58:46 |
| and also protections sometimes block them | 2009-10-07 00:59:05 |
<asmodean> | this was a novice system. but since he coded it himself he could make things more annoying by embedding the protection in his algorithms | 2009-10-07 00:59:52 |
| like if it detected you, sometimes it doesn't fail. instead it heads off down some useless path of code | 2009-10-07 01:00:12 |
| which doesn't decrypt data correctly :P | 2009-10-07 01:00:26 |
<Shelwien> | worst case imho is when the program works, but slightly wrong | 2009-10-07 01:00:45 |
<asmodean> | right | 2009-10-07 01:00:50 |
<Shelwien> | !next | 2009-10-07 01:52:00 |