*** complogger has joined the channel2009-10-05 21:03:23
*** compbooks has joined the channel2009-10-05 21:04:00
*** toffer has joined the channel2009-10-05 21:30:42
<toffer> hi! just wanted to say have a look at your ftp2009-10-05 21:30:48
<Shelwien> hi2009-10-05 21:30:54
 dcc.7z2009-10-05 21:31:09
 7M only2009-10-05 21:31:18
 now 102009-10-05 21:31:33
<toffer> it's 78mb2009-10-05 21:31:38
 unstable connection2009-10-05 21:31:50
 started with 700kb/s2009-10-05 21:31:54
 now it's 1002009-10-05 21:31:57
<Shelwien> btw, i can enable a shell for you there, if you want ;)2009-10-05 21:33:38
<toffer> just too late2009-10-05 21:35:39
 via ssh?2009-10-05 21:35:41
<Shelwien> yeah2009-10-05 21:35:44
<toffer> it was a proxy problem all the time2009-10-05 21:35:56
<Shelwien> suspected that2009-10-05 21:36:09
 but i didn't mean for upload2009-10-05 21:36:13
 there're compilers and stuff2009-10-05 21:36:23
 so you'd be able to test m1 there or whatever ;)2009-10-05 21:36:39
 not really a lot of computing resources though2009-10-05 21:37:00
<toffer> guess it's not much of use2009-10-05 21:37:13
 but having a good remote machine would be great2009-10-05 21:37:22
 gonna eat and have a beer now2009-10-05 21:37:42
<Shelwien> %)2009-10-05 21:37:47
*** pinc has left the channel2009-10-05 22:19:00
<toffer> gn82009-10-05 23:11:32
*** toffer has left the channel2009-10-05 23:11:35
*** pinc has joined the channel2009-10-06 06:31:19
*** Shelwien has left the channel2009-10-06 08:47:59
*** toffer has joined the channel2009-10-06 09:46:51
 hi2009-10-06 10:06:09
 somehow 9 bit precision is enough for stretch ^^2009-10-06 11:02:44
*** Shelwien has joined the channel2009-10-06 14:04:25
<Shelwien> http://toffer.dreamhosters.com/ ;)2009-10-06 14:05:28
<toffer> Encoding...done. 22257252/100000000 bytes, 48.00 s that's for 32mb and approx. orders 1,2,4,62009-10-06 14:12:30
<Shelwien> with 4 models?2009-10-06 14:13:03
<toffer> yep2009-10-06 14:13:22
 i can get 0.8% improvement by raising memory2009-10-06 14:13:31
 but currently it's 4 ordinary models2009-10-06 14:13:55
 as i said i wanted to replace one of these with a match model2009-10-06 14:14:05
<Shelwien> ccmx/bwt are around 20.8M...2009-10-06 14:15:22
<toffer> that's not a fair comparison2009-10-06 14:15:37
 due to the match model2009-10-06 14:15:40
 for comparison2009-10-06 14:15:42
 lpaq1 with order 1246, 100000000 -> 22359789 in 114.21 sec. using 51 MB memory2009-10-06 14:15:55
 as you see the results are close together2009-10-06 14:16:10
<Shelwien> well, i didn't say that it's a bad result ;)2009-10-06 14:16:19
<toffer> and lpaq1 with m246 gets ccm like compression2009-10-06 14:16:23
 it's 20.8xxx.xxx ... 2009-10-06 14:16:50
 ccmx like compression2009-10-06 14:16:55
<Shelwien> just that imho its necessary to beat at least BWT2009-10-06 14:17:06
 and accidentally, ccmx has similar results ;)2009-10-06 14:17:35
<toffer> as i said replacing a model with a match model will give the performance of ccmx2009-10-06 14:17:57
 at higher speeds :)2009-10-06 14:18:07
 at least ignoring filterrs2009-10-06 14:18:11
 filters2009-10-06 14:18:16
<Shelwien> well, ccm filters don't apply to enwiks2009-10-06 14:18:27
<toffer> not exactly2009-10-06 14:18:34
 it got some text preprocessing2009-10-06 14:18:43
<Shelwien> well, plain ccm surely doesn't2009-10-06 14:19:00
<toffer> it quantises an order1 context2009-10-06 14:19:22
<Shelwien> !grep skymmer.2009-10-06 14:19:26
<toffer> based on c>='a' && ...2009-10-06 14:19:28
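A minimal sketch of what "quantising an order-1 context" can look like: the previous byte is mapped to a small class id, and that id (rather than the full byte) is used as context. The classes below are hypothetical and chosen only for illustration; the log elides the actual condition, so this is not ccm's mapping.

```python
def quantise_char(c: int) -> int:
    """Map a byte value to a small class id (hypothetical classes, not ccm's)."""
    if ord('a') <= c <= ord('z'):
        return 0  # lowercase letter
    if ord('A') <= c <= ord('Z'):
        return 1  # uppercase letter
    if ord('0') <= c <= ord('9'):
        return 2  # digit
    if c in (0x20, 0x09, 0x0A, 0x0D):
        return 3  # whitespace
    return 4      # everything else


def quantised_order1_context(prev_byte: int) -> int:
    """Order-1 context reduced from 256 possible values to 5 classes."""
    return quantise_char(prev_byte)
```

With only a handful of classes, statistics for such a context fill up much faster than for a full order-1 context, which is the usual reason for adding it next to the ordinary models.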
<Shelwien> damn2009-10-06 14:19:38
 !grep skymmer.narod2009-10-06 14:20:02
 ccm 5 = 22 003 9582009-10-06 14:20:29
 ccmx 5 = 21 013 7932009-10-06 14:20:34
 ccm_sh1d9e 5 = 22 004 8832009-10-06 14:20:39
<toffer> 5 is how much memory?2009-10-06 14:20:49
 as i said it's just 32mb for me2009-10-06 14:21:01
<Shelwien> 550M2009-10-06 14:21:05
<toffer> ^^2009-10-06 14:21:08
 Allocated 262535 kB.2009-10-06 14:21:32
 Encoding...done. 21451718/100000000 bytes, 47.49 s2009-10-06 14:21:34
<Shelwien> i just wanted to show that plain ccm doesn't have a text filter2009-10-06 14:21:56
<toffer> it has2009-10-06 14:22:09
 look at the source2009-10-06 14:22:12
 it's not a filter2009-10-06 14:22:31
<Shelwien> ah, that...2009-10-06 14:22:37
<toffer> it just uses some text specific contexts2009-10-06 14:22:38
<Shelwien> well, you would use some too2009-10-06 14:22:50
 if you optimized the masks etc ;)2009-10-06 14:22:58
 and 21.4 is certainly more impressive2009-10-06 14:23:31
<toffer> it's just the increase in memory2009-10-06 14:23:59
 i'm testing 2gb now2009-10-06 14:24:06
 not much improvement2009-10-06 14:24:14
 Allocated 2097543 kB.2009-10-06 14:24:26
 Encoding...done. 21371640/100000000 bytes, 52.57 s2009-10-06 14:24:28
 i guess prior to the match model variant2009-10-06 14:24:36
 i'd release a plain 4 model variant2009-10-06 14:24:45
<Shelwien> yeah, its ok2009-10-06 14:25:07
<toffer> the little effect of memory increase from 256mb to 2gb is due to nibble caching2009-10-06 14:25:21
<Shelwien> though how do you scale memory use for different models?2009-10-06 14:25:25
<toffer> there're two hash tables2009-10-06 14:25:34
 w82009-10-06 14:25:36
 two hash tables for high and low nibbles2009-10-06 14:34:45
 a larger collision domain helps2009-10-06 14:34:59
 and i need to separate these due to nibble caching2009-10-06 14:35:10
 (which successfully removes 26% of cache misses)2009-10-06 14:35:27
<Shelwien> well, what i meant2009-10-06 14:35:42
 is that it might be better to specify the hashtable size separately2009-10-06 14:35:57
<toffer> ?2009-10-06 14:36:31
<Shelwien> if two 16M hashtables work ok2009-10-06 14:36:45
 that doesn't mean that 32M+32M would be better than 32M+16M2009-10-06 14:37:00
 well, relative scales2009-10-06 14:37:15
 also, it would be probably good to optimize the parameters separately for different memory settings2009-10-06 14:40:19
 afair, m1 is now able to load some parameter profiles in runtime?2009-10-06 14:40:39
<toffer> es2009-10-06 14:56:09
 yes2009-10-06 14:56:13
 as to hash table division2009-10-06 15:01:47
 i already made some experiment2009-10-06 15:01:53
 increasing the high nibble's collision domain size helps, since it stores the nibble cache. it boosts both, compression and speed2009-10-06 15:02:19
* Shelwien tried to use precomp to find the pdf titles, but failed2009-10-06 15:44:29
*** pinc has left the channel2009-10-06 15:47:47
 afair the file names should be in lexical order (compared to the index pdf) when you look at 2009-10-06 15:47:48
<Shelwien> you didn't include the indexes ;)2009-10-06 15:49:06
<toffer> i did2009-10-06 15:49:20
 you need to look at the first few pdfs2009-10-06 15:49:34
 one of these is a toc sheet2009-10-06 15:49:47
<Shelwien> %)2009-10-06 15:49:54
 found it, but i think getting the indexes off ieee would be more convenient2009-10-06 15:52:12
<toffer> there's no ieee index list or something like that2009-10-06 15:53:07
<Shelwien> there is2009-10-06 15:53:14
<toffer> since all articles are just named ieeexplore.pdf2009-10-06 15:53:15
<Shelwien> http://www.ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=30443&isYear=20052009-10-06 15:53:18
<toffer> well, yes2009-10-06 15:53:40
 i thought you meant something like a file to download2009-10-06 15:53:52
 but i found nothing like that2009-10-06 15:54:01
<Shelwien> well, i'd just make an index out of this, i guess2009-10-06 15:54:07
<toffer> yeah the arnumbers map to pdf names2009-10-06 15:54:18
 btw with more models i was able to lower counter precision w/o any great compression loss2009-10-06 15:59:16
<Shelwien> you mean output probability?2009-10-06 15:59:54
<toffer> counters2009-10-06 16:00:04
<Shelwien> i thought that you only use bytewise fsm?2009-10-06 16:00:08
<toffer> you mean 256 states2009-10-06 16:00:24
 ?2009-10-06 16:00:52
<Shelwien> yes2009-10-06 16:01:11
<toffer> yeah i do2009-10-06 16:01:24
 i meant the counters within the sse maps2009-10-06 16:01:30
<Shelwien> ah2009-10-06 16:02:57
<toffer> in 0.4 i had 20 bit counters with notable compression improvement2009-10-06 16:03:15
 but due to stretch/squash mappings (which are 9/12 bits) 16 bit is sufficient2009-10-06 16:03:46
<Shelwien> yeah... also counters are quantized anyway2009-10-06 16:04:28
<toffer> yeah2009-10-06 16:13:13
 that's why i guess2009-10-06 16:13:16
 but i found a mapping of 9->15 bits to be optimal2009-10-06 16:13:29
 (stretch)2009-10-06 16:13:33
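For reference, in PAQ-style mixing stretch is the logit function ln(p/(1-p)) and squash is its inverse, and both are usually implemented as lookup tables. The sketch below builds a stretch table indexed by a 9-bit quantised probability. The bucket midpoints and the output scale are assumptions made for illustration, not M1's actual tables.

```python
import math


def squash(x: float) -> float:
    """Logistic function: maps the stretched domain back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def stretch(p: float) -> float:
    """Logit function, the inverse of squash; requires 0 < p < 1."""
    return math.log(p / (1.0 - p))


def build_stretch_table(index_bits: int = 9, scale: float = 256.0) -> list:
    """Tabulate stretch() over 2**index_bits probability buckets.

    Each bucket is evaluated at its midpoint so p never reaches 0 or 1.
    `scale` (an assumed value) sets the fixed-point resolution of the output.
    """
    n = 1 << index_bits
    return [round(stretch((i + 0.5) / n) * scale) for i in range(n)]
```

A coarser index loses little because stretch is nearly linear around p = 0.5 and the counters feeding it are quantised anyway, which fits the observation in the log that 9 index bits are enough.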
 and again i removed alot of parameters2009-10-06 16:13:47
 (which resulted from an extension to 4 models)2009-10-06 16:14:07
<Shelwien> well, for me, optimization is a cheap resource, i guess2009-10-06 16:15:12
 for example, i didn't run any in more than a month ;)2009-10-06 16:15:51
 and i don't ever switch off the q9450 ;)2009-10-06 16:16:11
<toffer> if i knew that i'd ask you to optimize2009-10-06 16:18:10
 some m1 stuff ^^2009-10-06 16:18:14
 a ssh login on a 64 bit linux would be nice2009-10-06 16:18:41
<Shelwien> only 32-bit, sorry ;)2009-10-06 16:19:18
<toffer> it'd still work2009-10-06 16:19:38
<Shelwien> you can try logging to that dreamhost server now2009-10-06 16:24:58
 same login/pass, same server, but ssh2009-10-06 16:25:10
<toffer> thanks2009-10-06 16:25:51
 but it'd not be useful for optimization i guess2009-10-06 16:26:06
<Shelwien> yeah2009-10-06 16:26:12
<toffer> but i can use it for file uploads2009-10-06 16:26:13
<Shelwien> yeah, as i said, http://toffer.dreamhosters.com now works2009-10-06 16:26:40
 and it supports scripts and stuff there btw2009-10-06 16:26:59
*** pinc has joined the channel2009-10-06 16:27:06
<toffer> i thought the name is already used2009-10-06 16:27:07
<Shelwien> as a username2009-10-06 16:27:23
 xx.dreamhosters.com are not automatic2009-10-06 16:27:39
<toffer> thanks for the login2009-10-06 16:30:01
<Shelwien> ;)2009-10-06 16:30:14
<toffer> i increased the nibble cache w/o changing memory requirements drastically. now it can directly be compared to lpaq2009-10-06 16:38:09
 lpaq 1246 100000000 -> 22359789 in 112.95 sec. using 51 MB memory.2009-10-06 16:38:21
 Allocated 49543 kB.2009-10-06 16:38:43
 Encoding...done. 21956964/100000000 bytes, 46.87 s2009-10-06 16:38:44
 0.4% better compression while being 2.4 times as fast2009-10-06 16:39:52
 and better than a comparable ccm without a match model2009-10-06 16:40:15
 i mean without having a match model2009-10-06 16:40:25
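The two figures can be checked from the numbers pasted above. The sketch below computes the size gain both relative to the original file and relative to the lpaq output; the "0.4%" matches the first reading (about 0.40% of the 100000000-byte input), while relative to lpaq's output the saving is about 1.8%. The function and key names are illustrative.

```python
def compare(original: int, size_a: int, secs_a: float,
            size_b: int, secs_b: float) -> dict:
    """Compare compressor B against compressor A on the same input."""
    saved = size_a - size_b
    return {
        "gain_vs_original_pct": 100.0 * saved / original,  # share of the input size
        "gain_vs_a_pct": 100.0 * saved / size_a,           # share of A's output size
        "speedup": secs_a / secs_b,                        # how many times faster B is
    }


# lpaq (order 1246) versus the 49543 kB run, figures taken from the log
result = compare(100_000_000, 22_359_789, 112.95, 21_956_964, 46.87)
```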
<Shelwien> what about book1 vs ppmd? ;)2009-10-06 16:43:12
<toffer> didn't try2009-10-06 16:45:13
 if you can give me some numbers i can compare it2009-10-06 16:45:20
 but still a match model provides much better performance2009-10-06 16:45:43
<Shelwien> well, ppmd doesn't have a match model either ;)2009-10-06 16:46:11
 as to numbers...2009-10-06 16:46:50
<toffer> well but it can increase its coding order which would have a similar effect2009-10-06 16:46:50
<Shelwien> http://compression.ru/ds/ppmdj.rar2009-10-06 16:46:58
 i think you should be able to compile it even if you're on linux2009-10-06 16:47:19
 as to similar effect - not really2009-10-06 16:47:38
 because then it'd have to flush the tree more frequently2009-10-06 16:47:57
 which would be slower or/and hurt compression2009-10-06 16:48:14
 but with a small file, like book12009-10-06 16:48:38
 i think ppmd might be a good competition2009-10-06 16:48:54
<toffer> well of course it has a similar effect2009-10-06 16:50:20
 higher orders indicate a greater prediction confidence2009-10-06 16:51:00
<Shelwien> it _can_ have, but doesn't in practice2009-10-06 16:51:05
 ppmd would benefit from using a match model the same as m12009-10-06 16:51:26
<toffer> having fixed models is a great difference2009-10-06 16:51:50
<Shelwien> dunno2009-10-06 16:52:07
 what's interesting here is that ppmd is bytewise2009-10-06 16:52:18
 and it still would be slower than m1x2 maybe2009-10-06 16:52:35
 but it should have better compression2009-10-06 16:52:51
<toffer> it doesn' work under linux2009-10-06 16:54:31
 doesn't2009-10-06 16:54:34
 it includes windows.h2009-10-06 16:54:38
<Shelwien> did you use a makefile?2009-10-06 16:55:50
<toffer> yeah2009-10-06 16:57:41
<Shelwien> which one? ;)2009-10-06 16:57:52
 i'd suggest makefile.gmk for gcc ;)2009-10-06 16:58:07
<toffer> i needed to modify the source2009-10-06 17:00:51
 some defines2009-10-06 17:00:52
 could you suggest any options?2009-10-06 17:01:09
<Shelwien> default?2009-10-06 17:01:52
 well, also -o12 -m50 -r1 maybe2009-10-06 17:02:24
 btw, i made these indexes for dcc, do you need them?2009-10-06 17:05:05
<toffer> you can upload these2009-10-06 17:05:58
<Shelwien> done2009-10-06 17:06:47
<toffer> Fast PPMII compressor for textual data, variant J, Oct 6 20092009-10-06 17:10:24
  book1: 768771 > 209823, 2.18 bpb, used: 5.4MB, speed: 5024 KB/sec2009-10-06 17:10:25
 Allocated 49543 kB.2009-10-06 17:10:45
 Encoding...done. 213728/768771 bytes, 0.36 s2009-10-06 17:10:46
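The "bpb" figure printed by PPMd is bits of output per byte of input, so the two book1 results can be put on the same scale. A small sketch using only the sizes shown above; the helper name is illustrative.

```python
def bits_per_byte(compressed_size: int, original_size: int) -> float:
    """Compressed bits spent per input byte."""
    return 8.0 * compressed_size / original_size


BOOK1_SIZE = 768771
ppmd_bpb = bits_per_byte(209823, BOOK1_SIZE)  # PPMd variant J result from the log
m1_bpb = bits_per_byte(213728, BOOK1_SIZE)    # the 49543 kB run from the log
```

On these numbers PPMd comes out at roughly 2.18 bpb, matching its own report, and the other run at roughly 2.22 bpb.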
 model initialisation takes quite a while2009-10-06 17:10:54
 i tuned ppmds order2009-10-06 17:11:02
 6 seems to be optimal2009-10-06 17:11:13
<Shelwien> for book1 maybe2009-10-06 17:11:38
 btw2009-10-06 17:12:37
 can you also download other dcc years? ;)2009-10-06 17:12:51
<toffer> i thought you got these?2009-10-06 17:13:03
<Shelwien> i'd build the similar indexes then2009-10-06 17:13:06
 and yeah, i have them2009-10-06 17:13:15
 but they're messy2009-10-06 17:13:24
 and i wonder about pdf quality2009-10-06 17:13:31
 anyway, they're not from ieee xplore2009-10-06 17:14:18
 so i'd prefer have it all in a consistent form ;)2009-10-06 17:14:41
<toffer> i'll do that2009-10-06 17:15:02
 but not today2009-10-06 17:15:04
<Shelwien> sure2009-10-06 17:15:08
 btw i just found that they've got a prize for an article ;)2009-10-06 17:16:29
 i'm not a student unfortunately ;)2009-10-06 17:16:37
<toffer> retuned to book1 i get 2k improvement2009-10-06 17:18:37
 leaving contexts untouched2009-10-06 17:18:52
<Shelwien> so 211?2009-10-06 17:19:00
<toffer> 190 f* = 211732.0000002009-10-06 17:19:01
<Shelwien> similar to bwt with 4 models, i guess...2009-10-06 17:19:28
 but hopefully faster2009-10-06 17:19:38
<toffer> a match model will do better as i said2009-10-06 17:19:50
<Shelwien> not for book12009-10-06 17:19:58
<toffer> and complete book1 tuning too2009-10-06 17:19:58
 sure?2009-10-06 17:20:06
<Shelwien> there're not much matches2009-10-06 17:20:22
<toffer> did you ever couple a match model as in lpaq2009-10-06 17:20:23
<Shelwien> order6 for ppmd is a proof of that2009-10-06 17:20:28
<toffer> it's not about matches, it's about the correct determination of context order2009-10-06 17:20:38
<Shelwien> we can test my new LZ i guess...2009-10-06 17:20:50
<toffer> you mean that fma?2009-10-06 17:21:07
<Shelwien> yeah2009-10-06 17:21:10
 i'd rewritten it yesterday2009-10-06 17:21:19
 and now it actually transforms and detransforms stuff2009-10-06 17:21:41
 only -9M off enwik9 though2009-10-06 17:22:02
<toffer> so can you use it like rep now?2009-10-06 17:22:02
<Shelwien> kinda2009-10-06 17:22:10
 it outputs two files though2009-10-06 17:22:18
 wonder if gcc compiles it...2009-10-06 17:23:52
<toffer> not if you casted pointer to ints as you usually do2009-10-06 17:24:10
 ^^2009-10-06 17:24:12
<Shelwien> no, i've been more careful this time... in a way2009-10-06 17:24:35
 it compiles apparently2009-10-06 17:24:39
 got that?2009-10-06 17:27:01
 usage is2009-10-06 17:27:40
 fma-hash source literal_file structure_file2009-10-06 17:28:04
 fma-dec literal_file structure_file output2009-10-06 17:28:13
<toffer> i didn't get anything2009-10-06 17:29:08
 did you upload it?2009-10-06 17:29:19
 i need the sources2009-10-06 17:29:29
<Shelwien> err... PM'ed2009-10-06 17:29:30
<toffer> pm?2009-10-06 17:29:42
<Shelwien> http://shelwien.googlepages.com/fma_06.rar2009-10-06 17:30:15
<toffer> Allocated 49543 kB.2009-10-06 17:33:08
  Encoding...done. 21956964/100000000 bytes, 46.87 s -> 2133 kb/s2009-10-06 17:33:10
 cm@e051:~/shared/testset/enwik$ ../../projects/temp/PPMd e -m50 -o10 -r1 enwik8 -f/dev/null2009-10-06 17:33:11
 Fast PPMII compressor for textual data, variant J, Oct 6 20092009-10-06 17:33:13
  enwik8:100000000 >22632473, 1.38 bpb, used: 39.0MB, speed: 1593 KB/sec2009-10-06 17:33:14
 cm@e051:~/shared/testset/enwik$ ../../projects/temp/PPMd e -m50 -o12 -r1 enwik8 -f/dev/null2009-10-06 17:33:16
 Fast PPMII compressor for textual data, variant J, Oct 6 20092009-10-06 17:33:17
 enwik8.pmd already exists, overwrite?: <Y>es, <N>o, <A>ll, <Q>uit?y2009-10-06 17:33:19
  enwik8:100000000 >22719967, 1.39 bpb, used: 49.7MB, speed: 1413 KB/sec2009-10-06 17:33:20
<Shelwien> what?2009-10-06 17:33:33
<toffer> it's faster than ppmd2009-10-06 17:33:53
 with better compression2009-10-06 17:34:02
 for enwik2009-10-06 17:34:09
 i'm retrying order 82009-10-06 17:34:19
 dunnot know what'd be best for e82009-10-06 17:34:26
<Shelwien> well, for enwik there're memory troubles2009-10-06 17:34:43
 the thing which Shkarin does to free some memory is just too weird to do any good2009-10-06 17:35:06
<toffer> dunnot know2009-10-06 17:35:58
 i got collision2009-10-06 17:36:01
 collisions2009-10-06 17:36:06
 but testing with limited memory should be fair2009-10-06 17:36:25
<Shelwien> its still better than flushing all the stats after each few MBs2009-10-06 17:37:00
<toffer> the best result i got for ppmd is order82009-10-06 17:37:03
  enwik8:100000000 >22524820, 1.37 bpb, used: 35.8MB, speed: 2106 KB/sec2009-10-06 17:37:05
 it runs at about the same speed here2009-10-06 17:37:13
 i used r12009-10-06 17:37:23
 which doesn't flush the model2009-10-06 17:37:33
<Shelwien> it does2009-10-06 17:37:38
 unfortunately2009-10-06 17:37:42
<toffer> there's not any better option built in2009-10-06 17:37:44
<Shelwien> well, sure2009-10-06 17:37:49
<toffer>  -rN - set method of model restoration at memory insufficiency:2009-10-06 17:38:04
  -r0 - restart model from scratch (default)2009-10-06 17:38:05
  -r1 - cut off model (slow)2009-10-06 17:38:07
<Shelwien> i know ;)2009-10-06 17:40:14
<toffer> but that's not freeing the model2009-10-06 17:40:45
 it rebuilds it2009-10-06 17:40:49
<Shelwien> it cuts it down until 75% memory is free2009-10-06 17:41:07
 and that's slow and hurts compression2009-10-06 17:41:19
<toffer> still the memory constrain 50 mb2009-10-06 17:41:32
 is2009-10-06 17:41:39
<Shelwien> well, memory issues are more complex to handle in ppmd2009-10-06 17:42:29
 but it doesn't mean that its fair to compare it knowing that2009-10-06 17:42:42
<toffer> dunnot know2009-10-06 17:43:04
 i can increase the memory2009-10-06 17:43:08
<Shelwien> yeah, the overflow handling has to be improved2009-10-06 17:43:12
 but atm we know that ppmd is bad at that2009-10-06 17:43:22
<toffer> on the other hand i could say that it's unfair to compare it without having a match model2009-10-06 17:43:29
<Shelwien> so whats the sense to hang on that?2009-10-06 17:43:35
<toffer> nothing2009-10-06 17:43:40
 but it's a good ppm implementation2009-10-06 17:43:51
<Shelwien> as i said... ppmd practically doesn't have a match model2009-10-06 17:43:52
<toffer> it's not about coding matches separately2009-10-06 17:44:05
 it's just about knowing the coding order at each step2009-10-06 17:44:12
 which has a significant influence for fixed models2009-10-06 17:44:25
<Shelwien> no, i mean that PPM in theory includes the features of a match model2009-10-06 17:44:47
 but in practice ppmd overflow handling cuts that down2009-10-06 17:45:06
<toffer> still it can access higher order statistics2009-10-06 17:45:25
 but the overflow handling isn't the best. i agree here2009-10-06 17:45:45
<Shelwien> well, you can set it to -o8 or whatever you want2009-10-06 17:45:45
 anyway, you can compare it with 2G of memory ;)2009-10-06 17:46:07
<toffer> the best configuration i found was order 82009-10-06 17:46:12
 that's what i call unfair ^^2009-10-06 17:46:26
 but a good thing would be to test on a file where no overflows happen2009-10-06 17:46:42
 and then i'd limit it to order 62009-10-06 17:47:40
<Shelwien> that's kinda what i meant2009-10-06 17:48:12
<toffer> i tried to compile fma2009-10-06 17:49:49
 without success2009-10-06 17:49:52
 but i'll stop for today2009-10-06 17:49:59
 since i need to do something for my thesis, too2009-10-06 17:50:06
 -.-2009-10-06 17:50:08
 just wasted too much time2009-10-06 17:50:13
 cm@e051:/mnt/shared_extern/projects/temp/fma/fma_06$ gcc -O3 -o fma-hash fma-hash.cpp -lstdc++2009-10-06 17:50:26
 In file included from fma-hash.cpp:43:2009-10-06 17:50:28
 hashbuf.inc:29: error: expected constructor, destructor, or type conversion before ‘(’ token2009-10-06 17:50:29
 hashbuf.inc: In function ‘void HashBufInit()’:2009-10-06 17:50:31
 hashbuf.inc:36: error: ‘hridx’ was not declared in this scope2009-10-06 17:50:32
 hashbuf.inc: In function ‘uint HashFind()’:2009-10-06 17:50:34
 hashbuf.inc:47: error: ‘hridx’ was not declared in this scope2009-10-06 17:50:35
 hashbuf.inc: In function ‘void HashIndex()’:2009-10-06 17:50:37
<Shelwien> well, i compiled it on DH2009-10-06 17:50:38
<toffer> hashbuf.inc:68: error: ‘hridx’ was not declared in this scope2009-10-06 17:50:38
 cm@e051:/mnt/shared_extern/projects/temp/fma/fma_06$ 2009-10-06 17:50:40
<Shelwien> there's a __declspec accidentally2009-10-06 17:50:59
 but now it segfaults on any file2009-10-06 17:51:08
 well, i'd check what happens later, food calls ;)2009-10-06 17:51:38
<toffer> enjoy2009-10-06 17:54:47
<Shelwien> back...2009-10-06 18:12:46
 damn, why gcc always has to be so annoying...2009-10-06 18:27:59
 apparently, it doesn't allow negative array indexes2009-10-06 18:28:29
 now, why?2009-10-06 18:28:31
 ok, now it works2009-10-06 18:30:18
 toffer?2009-10-06 18:30:47
<toffer> sorry2009-10-06 18:53:50
 as i said i'm gonna work on my thesis now2009-10-06 18:53:57
 just wasted too much time today2009-10-06 18:54:03
<Shelwien> ...can't process enwik8 on dreamhost %)2009-10-06 19:00:30
*** pinc has left the channel2009-10-06 20:48:58
            enwik8:200 enwik8:64 enwik8:32 Ecoli:200 Ecoli:64 Ecoli:322009-10-06 20:56:19
 literal      99782568  97838655  95601834   4596028  4581466  45724942009-10-06 20:56:19
 structure        4828    176308    728008       592     1708     30282009-10-06 20:56:19
 ppmd-o8m50   22508525  22459174  22480285   1172440  1169545  11680172009-10-06 20:56:19
 ...2009-10-06 20:56:19
             E_coli    enwik82009-10-06 20:56:20
 source     4638690 1000000002009-10-06 20:56:22
 ppmd-o8m50 1182206  225248422009-10-06 20:56:24
 fma preprocessing results ;)2009-10-06 20:56:49
 toffer?2009-10-06 20:56:52
<toffer> yeah2009-10-06 21:01:43
 looks like it hurts on enwik2009-10-06 21:03:02
 probably it breaks the contexts2009-10-06 21:03:13
 ?2009-10-06 21:03:14
<Shelwien> that's because ppmd can't compress the structure file2009-10-06 21:03:16
<toffer> what does it contain?2009-10-06 21:03:40
<Shelwien> literal lengths and match offsets/lens2009-10-06 21:03:57
 12-byte records2009-10-06 21:04:11
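A sketch of how such a literal/structure pair could be recombined. The log only says the structure file holds literal lengths and match offsets/lengths in 12-byte records; the field order, the little-endian 32-bit fields, and the reading of the offset as a backward distance are all assumptions. The structure sizes posted above (4828, 176308, 728008, 592, 1708, 3028) each leave a remainder of 4 when divided by 12, which would be consistent with 4 extra bytes besides the records; what those bytes hold is not stated, and the sketch ignores them.

```python
import struct

# Assumed record layout: literal_len, match_dist, match_len as little-endian uint32
RECORD = struct.Struct("<III")


def pack_records(records) -> bytes:
    """Serialise (literal_len, match_dist, match_len) tuples into 12-byte records."""
    return b"".join(RECORD.pack(*r) for r in records)


def unpack_records(blob: bytes) -> list:
    """Parse a blob whose length is a multiple of 12 into record tuples."""
    return [RECORD.unpack_from(blob, i) for i in range(0, len(blob), RECORD.size)]


def detransform(literals: bytes, structure: bytes) -> bytes:
    """Rebuild the source: copy literals, then copy each match from earlier output."""
    out = bytearray()
    pos = 0
    for lit_len, dist, match_len in unpack_records(structure):
        out += literals[pos:pos + lit_len]
        pos += lit_len
        for _ in range(match_len):
            out.append(out[-dist])  # byte-wise copy also handles overlapping matches
    out += literals[pos:]           # assumed: trailing literals follow the last match
    return bytes(out)
```

Fixed-width binary fields like these give a byte-oriented text model little to work with, which fits the remark that ppmd cannot compress the structure file.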
<asmodean> hm2009-10-07 00:06:50
<Shelwien> m?2009-10-07 00:07:03
<asmodean> 6gb of pngs + masks i want to merge into 32-bit bitmaps and then LZMA2009-10-07 00:07:10
 wonder if i have enough temporary space for this ;p2009-10-07 00:07:18
 each file is 150mb heh2009-10-07 00:07:31
<Shelwien> compress them first? ;)2009-10-07 00:07:38
 with pngcrush at least?2009-10-07 00:07:45
<asmodean> i wish 7zip/winrar were smart enough to contextually decompress known formats2009-10-07 00:08:16
 like recognize zlib streams and decompress them before compressing ;p2009-10-07 00:08:35
<Shelwien> well, precomp should support pngs2009-10-07 00:08:47
<asmodean> precomp?2009-10-07 00:09:00
<Shelwien> http://schnaader.info2009-10-07 00:09:11
<asmodean> ah, well that's the idea but it still needs temp space2009-10-07 00:10:02
 that's what i am doing manually right now convertin all the pngs to bitmaps ;p2009-10-07 00:10:13
 converting2009-10-07 00:10:16
 haha but look prepaq!2009-10-07 00:10:41
 those paq algorithms take like 5000,00000 years to run2009-10-07 00:10:57
<Shelwien> there's lpaq i think2009-10-07 00:11:07
 its considerably faster (though not as good)2009-10-07 00:11:37
<asmodean> yeah but lpaq is only on one file2009-10-07 00:11:45
 i need to gain efficiency form repetition between these files2009-10-07 00:11:53
 from2009-10-07 00:11:56
<Shelwien> huh.2009-10-07 00:12:16
 i think that's what my new tool is for ;)2009-10-07 00:12:23
 you can try this though: http://haskell.org/bz/rep.zip2009-10-07 00:12:52
<asmodean> i'll just let it gobble up 200gb and then lzma it2009-10-07 00:13:07
<Shelwien> not the best idea if there're many similarities2009-10-07 00:13:32
 try that rep first2009-10-07 00:13:40
 (+lzma)2009-10-07 00:13:44
<toffer> gn8 guys2009-10-07 00:13:54
<asmodean> what's it do?2009-10-07 00:13:57
*** toffer has left the channel2009-10-07 00:13:59
<Shelwien> rep? finds long repetitions at large distances2009-10-07 00:14:19
 and removes these ;)2009-10-07 00:14:38
<asmodean> huh2009-10-07 00:14:52
<Shelwien> a preprocessor too, like precomp2009-10-07 00:14:55
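A minimal sketch of the general idea of finding long repetitions at large distances: index fixed-size chunks of the data already seen, look up the chunk at the current position, and extend any hit forward. This is an illustration under simple assumptions (a plain dictionary over aligned chunks, no backward extension), not rep's actual algorithm or data structures.

```python
def find_long_repeats(data: bytes, min_len: int = 32) -> list:
    """Return (position, distance, length) for repeats of at least min_len bytes."""
    seen = {}     # chunk contents -> earliest position of an aligned chunk
    matches = []
    i = 0
    indexed = 0   # next aligned position still to be indexed
    while i + min_len <= len(data):
        # Index aligned chunks that lie entirely before the current position.
        while indexed + min_len <= i:
            seen.setdefault(data[indexed:indexed + min_len], indexed)
            indexed += min_len
        src = seen.get(data[i:i + min_len])
        if src is None:
            i += 1
            continue
        # Extend the match forward as far as the bytes keep agreeing.
        length = min_len
        while i + length < len(data) and data[src + length] == data[i + length]:
            length += 1
        matches.append((i, i - src, length))
        i += length
    return matches
```

Because only one entry per aligned chunk is kept, the index stays small relative to the data, so repeats can be found across distances far larger than the window of the compressor that runs afterwards. A smaller `min_len` finds more repeats at the cost of a larger index.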
<asmodean> let's see what it does to these images containing animations2009-10-07 00:15:00
 (very slight differences)2009-10-07 00:15:16
 haha2009-10-07 00:16:53
 ** Detailed line noise in Russian *******************2009-10-07 00:17:05
<Shelwien> ?2009-10-07 00:17:18
<asmodean> the russian characters don't show well in my locale2009-10-07 00:17:43
<Shelwien> its not necessary for it to work ;)2009-10-07 00:18:19
<asmodean> it did a lot worse than png ;p2009-10-07 00:18:43
 ~38mb vs 22mb2009-10-07 00:18:51
<Shelwien> that's ok probably2009-10-07 00:19:01
 as its a preprocessor2009-10-07 00:19:09
 and also there're options2009-10-07 00:19:17
<asmodean> oh i thought it preprocessed and then compressed2009-10-07 00:19:18
<Shelwien> its output would be smaller probably2009-10-07 00:19:41
<asmodean> so if i were to rar this and a png we'd have a good test2009-10-07 00:19:51
<Shelwien> if you'd run it like rep -l322009-10-07 00:20:01
 its default minmatchlen is 512 bytes2009-10-07 00:20:22
<asmodean> for this data i suspect longer is better2009-10-07 00:20:40
 nope2009-10-07 00:20:47
 -l32 got it down to 17mb2009-10-07 00:20:53
<Shelwien> sure2009-10-07 00:21:00
 and that should be still compressible2009-10-07 00:21:10
<asmodean> yeah the 17mb rep output compressed down to 7mb with winrar2009-10-07 00:23:29
 png 22mb -> 21mb2009-10-07 00:23:34
 png needs some options other than deflate :P2009-10-07 00:24:00
<Shelwien> it kinda has2009-10-07 00:24:20
 as i said, try pngcrush for it2009-10-07 00:24:28
<asmodean> i have2009-10-07 00:24:33
<Shelwien> and then pngout+deflopt if that's not enough2009-10-07 00:24:37
<asmodean> pngcrush usually only gets 1-2% better than typical '-9' zlib compression2009-10-07 00:24:59
<Shelwien> well, i meant that png actually has these delta filters2009-10-07 00:25:15
<asmodean> yeah i know2009-10-07 00:25:23
 and you can use a different filter per line2009-10-07 00:25:34
 i think photoshop tries them all when saving pngs ;p2009-10-07 00:26:09
 makes it super slow but its files are slightly smaller2009-10-07 00:26:19
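A sketch of per-line filter selection. PNG defines five filter types (None, Sub, Up, Average, Paeth) and an encoder may pick a different one for each scanline; the sketch covers only the first three and uses the common heuristic of choosing the filter whose output has the smallest sum of absolute values when bytes are read as signed. The function names are illustrative, and this is not a description of what any particular encoder does.

```python
def filter_row(kind: int, row: bytes, prev: bytes, bpp: int) -> bytes:
    """Apply PNG filter 0 (None), 1 (Sub) or 2 (Up) to one scanline."""
    out = bytearray(len(row))
    for i, x in enumerate(row):
        left = row[i - bpp] if i >= bpp else 0  # same channel of the pixel to the left
        up = prev[i]                            # same byte in the previous scanline
        pred = (0, left, up)[kind]
        out[i] = (x - pred) & 0xFF
    return bytes(out)


def cost(filtered: bytes) -> int:
    """Sum of absolute values, treating each byte as a signed difference."""
    return sum(b if b < 128 else 256 - b for b in filtered)


def choose_filters(rows: list, bpp: int) -> list:
    """Pick the cheapest of the three filters for every scanline."""
    prev = bytes(len(rows[0]))  # the line above the first scanline counts as zeros
    chosen = []
    for row in rows:
        best = min(range(3), key=lambda k: cost(filter_row(k, row, prev, bpp)))
        chosen.append((best, filter_row(best, row, prev, bpp)))
        prev = row
    return chosen
```

Trying every filter on every line multiplies the filtering work, which is in line with the remark that exhaustive selection is slow but yields slightly smaller files.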
<Shelwien> anyway, this rep has lots of options2009-10-07 00:26:36
 and you can experiment with these, starting with -v ;)2009-10-07 00:26:51
<asmodean> haha well it's fun but for practical purposes i'm still going with LZMA on all the tgas ;p2009-10-07 00:27:17
 oh i should winrar the uncompressed file for comparison2009-10-07 00:27:32
 18mb2009-10-07 00:29:10
 so rep helped a lot2009-10-07 00:29:17
<Shelwien> rep+lzma should be better2009-10-07 00:30:03
 they all have their limits on dictionary and window size2009-10-07 00:30:25
 deflate had to work in 64k of memory so its worst of all2009-10-07 00:30:53
<asmodean> yeah tweaking is always possible. for general purpose use though you're not going to tweak much2009-10-07 00:31:01
 at least winrar is pretty fast2009-10-07 00:31:14
<Shelwien> and rar won't see a repetition further than 4M2009-10-07 00:31:17
<asmodean> lzma is slow as SHIT2009-10-07 00:31:17
 it's like encasing your data in ice2009-10-07 00:31:23
 to get it out you have to chip away for days2009-10-07 00:31:28
<Shelwien> ;)2009-10-07 00:31:39
 you know, its not necessary to use lzma at ultra mode ;)2009-10-07 00:31:58
<asmodean> haha when i started out i even tweaked ultra mode's options upwards2009-10-07 00:32:15
 so "ultra" is a compromise for me now :P2009-10-07 00:32:23
 what's a good general purpose level of lzma?2009-10-07 00:32:36
<Shelwien> well, anyway, even at its best lzma won't see matches further than 1G or something2009-10-07 00:33:09
 so rep is preferable for such cases2009-10-07 00:33:22
 rep + lzma with smaller window2009-10-07 00:33:29
 and i don't know what's a "general purpose" ;)2009-10-07 00:33:54
<asmodean> i should do this compression on my htpc2009-10-07 00:34:10
 quad core2009-10-07 00:34:12
<Shelwien> sure ;)2009-10-07 00:34:21
<asmodean> it's faster than my server which runs underclocked at 1.9ghz instead of 3.2 :(2009-10-07 00:34:31
<Shelwien> btw, did you see my vectorized rangecoder? ;)2009-10-07 00:34:47
<asmodean> nope i've been ignoring this place ;)2009-10-07 00:35:00
<Shelwien> sometimes i'm thinking about recommending it to Igor2009-10-07 00:35:01
 its in the topic (ccm)2009-10-07 00:35:10
 so if i'd do, you might have to reverse it someday ;)2009-10-07 00:35:43
<asmodean> the worst thing is reversing some algorithm that feels like it came from a library but you can't tell which :>2009-10-07 00:37:06
<Shelwien> i guess you didn't encounter much of intelc code ;)2009-10-07 00:37:57
<asmodean> hm 7zip can't use more than 2 threads2009-10-07 00:38:02
 why because intelc inlines all over the place?2009-10-07 00:38:17
<Shelwien> yeah, and does some even worse stuff ;)2009-10-07 00:38:31
 like reordering and vectorizing weird places2009-10-07 00:38:52
 once it vectorize rangecoder i/o %)2009-10-07 00:39:14
 *vectorized2009-10-07 00:39:20
 in fpaq0pv4B2009-10-07 00:39:23
<asmodean> does intelc produce double-digit performance improvements just recompiling with it?2009-10-07 00:39:54
<Shelwien> well, if you count in percents, then yeah, probably2009-10-07 00:40:21
 also depending on tasks2009-10-07 00:40:30
 but i've got 20% speed improvement2009-10-07 00:42:26
 from recompiling unrar.dll with intelc2009-10-07 00:42:33
<asmodean> surprised more people don't use it2009-10-07 00:42:37
<Shelwien> they do in fact2009-10-07 00:42:45
<asmodean> maybe i'm lucky the jp dudes haven't discovered it then :)2009-10-07 00:43:08
<Shelwien> well, you can search for "GenuineIntel" string in executables and dlls ;)2009-10-07 00:44:32
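A small sketch of that check: scan a binary for the byte string in chunks, keeping a short overlap so a hit spanning a chunk boundary is not missed. "GenuineIntel" is the CPUID vendor string, so a hit is only a hint; any code that tests the CPU vendor may contain it. The function name and parameters are illustrative.

```python
def contains_marker(path: str, marker: bytes = b"GenuineIntel",
                    chunk_size: int = 1 << 20) -> bool:
    """Return True if `marker` occurs anywhere in the file at `path`."""
    overlap = len(marker) - 1
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if marker in window:
                return True
            # Keep the last bytes so a marker split across two reads is still found.
            tail = window[-overlap:] if overlap else b""
```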
<asmodean> worst optimization headache i've seen is 'whole program' optimization2009-10-07 00:45:05
<Shelwien> i always use it ;)2009-10-07 00:45:21
<asmodean> embeds lovely assumptions about things being in registers/stack/etc across several levels of function calls2009-10-07 00:45:23
<Shelwien> well, i'd use worse things in manual asm though ;)2009-10-07 00:49:44
 so its all good ;)2009-10-07 00:49:48
 at least compilers don't keep values in flags across function calls ;)2009-10-07 00:50:41
 and don't use esp for general i/o ;)2009-10-07 00:51:08
<asmodean> heh but manual asm is much harder to write much of2009-10-07 00:51:24
<Shelwien> not really, depending on task though2009-10-07 00:51:38
<asmodean> i'll take a handful of manual asm over a whole executable of optimized bullshit any day ;p2009-10-07 00:51:41
<Shelwien> huh. wanna look at my old compressor written in asm?2009-10-07 00:52:10
<asmodean> not really :) i'd rather look at fun easy to understand LZSS variations over either :)2009-10-07 00:53:13
<Shelwien> actually i only stopped using asm because of intelc ;)2009-10-07 00:53:23
 there're no tools for similar global optimization2009-10-07 00:53:43
 and automatic vectorization2009-10-07 00:53:47
<asmodean> lately i spend a lot more time reversing crypto/obfuscation :( ugh2009-10-07 00:53:49
<Shelwien> well, its all easy enough if they don't use low-level stuff2009-10-07 00:54:32
 like drivers etc2009-10-07 00:54:36
<asmodean> the only system that bothers me does constant self-checks (anti-debugger, crcs of code sections etc) with inline code2009-10-07 00:55:28
 makes it terribly irritating to trace the algorithms2009-10-07 00:55:37
 so i gave up and just dynamically patched it in between checks :P2009-10-07 00:56:08
 heh2009-10-07 00:56:09
<Shelwien> why don't you just block read access to code pages2009-10-07 00:56:24
 and handle the exception ;)2009-10-07 00:56:36
<asmodean> well i'd have to automate backtracking to the code doing the check to get back to the mainline code2009-10-07 00:57:19
<Shelwien> i mean, that'd allow you to sniff out all the locations2009-10-07 00:57:49
<asmodean> i was finding the checks with memory breakpoints2009-10-07 00:58:18
<Shelwien> good too, but there're not much of these ;)2009-10-07 00:58:46
 and also protections sometimes block them2009-10-07 00:59:05
<asmodean> this was a novice system. but since he coded it himself he could make things more annoying by embedding the protection in his algorithms2009-10-07 00:59:52
 like if it detected you, it doesn't fail sometimes. instead it heads off down some useless path of code 2009-10-07 01:00:12
 which doesn't decrypt data correctly :P2009-10-07 01:00:26
<Shelwien> worst case imho is when program works, but slightly wrong2009-10-07 01:00:45
<asmodean> right2009-10-07 01:00:50
<Shelwien> !next2009-10-07 01:52:00