*** toffer has joined the channel2009-09-30 21:44:05
<toffer> somehow my browser crashed2009-09-30 21:48:20
 somehow i think i really need to grab a standalone irc client ^^2009-09-30 21:48:57
<Shelwien> mirc?2009-09-30 21:49:10
 ah, linux...2009-09-30 21:49:17
 well, there're many too2009-09-30 21:49:21
<toffer> as i said it needs to be cross platform2009-09-30 21:50:57
 since i got all important software for windows and linux2009-09-30 21:51:11
<Shelwien> well, firefox should have a built-in irc client2009-09-30 21:51:32
<toffer> mostly openoffice, latex, inkscape, octave, codeblocks, gcc/mingw, firefox, thunderbird2009-09-30 21:51:35
 really2009-09-30 21:51:40
<Shelwien> yeah, try opening irc://irc.irchighway.org/compression in firefox2009-09-30 21:52:01
*** toffer has left the channel2009-09-30 21:53:47
*** cm has joined the channel2009-09-30 21:54:47
<cm> well i had to install it2009-09-30 21:55:00
 but the webchat looks far more convenient to me2009-09-30 21:55:15
<Shelwien> ;)2009-09-30 21:55:45
<toffer> at least here's copy and paste functionality2009-09-30 21:57:13
 looks a bit ugly, but ok - guess i'll keep it2009-09-30 21:57:23
<Shelwien> yeah, also webchat won't be able to access bots2009-09-30 21:57:49
 ...though i guess DCC doesn't work there either...2009-09-30 21:58:13
<toffer> [ERROR] Internal error dispatching command “dcc-accept”.2009-09-30 21:58:41
 [ERROR] Must be in REQUESTED state and direction GET.2009-09-30 21:58:42
<Shelwien> internal errors are fun2009-09-30 21:59:04
 i guess it would accept a file though2009-09-30 21:59:38
<toffer> an internal error report is still better than a segfault or a failed assertion :)2009-09-30 22:00:26
<Shelwien> ...doesn't accept a file either, it seems2009-09-30 22:00:54
<toffer> well i did accept2009-09-30 22:01:18
 but it doesn't work due to university firewall i guess2009-09-30 22:01:32
 maybe if you change the port2009-09-30 22:01:45
 <10002009-09-30 22:01:48
<Shelwien> nah, its direct p2p2009-09-30 22:02:17
<toffer> direct connections are blocked for ports > 10002009-09-30 22:02:38
<Shelwien> well, whatever2009-09-30 22:02:38
<toffer> ] Got DCC File Transfer offer from “Shelwien” (91.124.210.54:1024) 2009-09-30 22:02:47
<Shelwien> well, i can try...2009-09-30 22:03:08
<toffer> you don't need to2009-09-30 22:03:22
 we got ftp and stuff so why bother2009-09-30 22:03:30
 ^^2009-09-30 22:03:32
<Shelwien> hmm... maybe works...2009-09-30 22:04:42
 blocking ports >1000 is weird though...2009-09-30 22:04:58
*** toffer has left the channel2009-09-30 22:05:37
*** toffer has joined the channel2009-09-30 22:06:37
<toffer> weird2009-09-30 22:06:43
<Shelwien> on my side it said "transfer complete" ;)2009-09-30 22:09:57
<toffer> -rwxrwxr-x 1 cm cm 0 2009-10-01 00:04 /mnt/shared_extern/temp/arith-speedups.pdf2009-09-30 22:10:50
 well here it just froze2009-09-30 22:10:56
<Shelwien> ...2009-09-30 22:11:03
 well, i wasn't going to rely on that for sending you files anyway ;)2009-09-30 22:11:31
<toffer> yep. me not either2009-09-30 22:11:45
<Shelwien> but actually irc is pretty convenient for filesharing2009-09-30 22:11:50
 more than icq at least2009-09-30 22:12:08
 btw, that pdf reminded me2009-09-30 22:13:43
 did you check whether you have (or can have) access to DCC articles on ieee site?2009-09-30 22:14:03
<toffer> i guess yes2009-09-30 22:15:49
 at least i once tried2009-09-30 22:15:58
 but not everything was accessible though2009-09-30 22:16:25
<Shelwien> ?2009-09-30 22:16:42
 also, i wonder if i'd be able to access that if i'd pay that membership free..2009-09-30 22:17:44
 *fee2009-09-30 22:17:47
<toffer> dunnot know2009-09-30 22:20:53
 it's some university license2009-09-30 22:20:59
 they sell some kind of access packages2009-09-30 22:21:09
 and the university got some kind of access all flatrate2009-09-30 22:21:23
<Shelwien> well, it'd be really nice of you 2009-09-30 22:23:24
 if you could download all the DCC articles since 2005 ;)2009-09-30 22:23:47
 or explain how else i can get them without paying $25 per article ;)2009-09-30 22:24:06
<toffer> no problem2009-09-30 22:24:15
 but it'll take some time2009-09-30 22:24:34
<Shelwien> i'm not in a hurry at all2009-09-30 22:24:58
 i'm just trying to collect them ;)2009-09-30 22:25:04
<toffer> is there any easy way to get all the urls2009-09-30 22:25:06
 cause i could use wget then2009-09-30 22:25:16
<Shelwien> that's probably unlikely, because you can't download them without logging in2009-09-30 22:25:52
 hell, they sell them ;)2009-09-30 22:25:56
 i'd write some perl parsers in such cases though2009-09-30 22:26:31
 but dunno whether you'd bother with that2009-09-30 22:26:41
<toffer> i don't need to log in2009-09-30 22:26:54
 it's detected via ip ranges2009-09-30 22:27:08
<Shelwien> you have to check it yourself then ;)2009-09-30 22:27:33
 its not like that for me unfortunately ;)2009-09-30 22:27:45
 or i guess you can setup a proxy for me there ;)2009-09-30 22:27:57
 eg. i like this one - http://www.3proxy.ru/download/2009-09-30 22:29:16
 btw http://91.124.210.54/list.txt2009-09-30 22:29:30
 list of compression-related OCRed books i have2009-09-30 22:30:00
*** pinc has joined the channel2009-09-30 22:37:38
 btw, any progress with m1? ;)2009-09-30 22:52:49
 like multiple submodels? ;)2009-09-30 22:52:57
 btw, i guess you're not going to try anything funny with rangecoders?2009-09-30 22:53:45
<toffer> i played around with wget2009-09-30 23:00:49
 and these ursl2009-09-30 23:00:52
 urls2009-09-30 23:00:53
 i can automatically download everything now ^^2009-09-30 23:01:03
 well i've made a mixing scheme which combines static estimations only via additions2009-09-30 23:01:49
 i can now implement a mixing tree via that2009-09-30 23:02:45
<Shelwien> i'm not sure why you would need static mixing2009-09-30 23:04:08
 can't you just make a fsm to work like that?2009-09-30 23:04:21
<toffer> downloading dcc 20052009-09-30 23:10:17
 it doesn't require any auxiliary memory2009-09-30 23:10:40
<Shelwien> and what's good in that and static mixing anyway? ;)2009-09-30 23:11:38
<toffer> it's used as a context for sse2009-09-30 23:13:47
 after a few papers wget failed "All online seats are currently occupied."2009-09-30 23:14:03
<Shelwien> %)2009-09-30 23:14:18
 i guess you can setup a retry there...2009-09-30 23:14:33
<toffer> yep2009-09-30 23:14:46
 anyway you can be sure to get that stuff next week2009-09-30 23:15:57
 all of it2009-09-30 23:15:58
 well maybe they block such script downloads (quickly accessing urls)2009-09-30 23:16:59
<Shelwien> try adding some sleep between wget calls maybe?2009-09-30 23:19:21
<toffer> it's a cookie issue2009-09-30 23:23:34
 but i can fix that i guess2009-09-30 23:23:39
 seems to work now2009-09-30 23:33:47
 fetching everything2009-09-30 23:33:52
 ^^2009-09-30 23:33:54
<Shelwien> %)2009-09-30 23:36:14
<toffer> mh2009-09-30 23:41:49
 somehow it still doesnt2009-09-30 23:41:53
<Shelwien> delays and retries?2009-09-30 23:42:34
<toffer> retries don't work2009-09-30 23:43:29
 but i can run the script with something like 15min delay2009-09-30 23:43:39
 that'd be a normal usage pattern2009-09-30 23:43:44
 that's the session expire limit2009-09-30 23:43:57
<Shelwien> no, i mean2009-09-30 23:44:10
 you can check if its saved a pdf2009-09-30 23:44:19
*** pinc|mirror has joined the channel2009-09-30 23:44:23
<toffer> i know2009-09-30 23:44:25
<Shelwien> and retry if it didn't2009-09-30 23:44:29
<toffer> that's what i did2009-09-30 23:44:30
 but the html page it generated2009-09-30 23:44:37
 said that the seat limit is reached2009-09-30 23:44:50
 and seats expire after 15 minutes of inactivity2009-09-30 23:45:00
<Shelwien> ok, though i still think that there's no sense to wait for 15 min2009-09-30 23:45:28
 just add some delay and check what it stored2009-09-30 23:46:18
<toffer> the cookies store some session id2009-09-30 23:46:44
<Shelwien> yeah2009-09-30 23:46:57
 you can get it with wget i thin2009-09-30 23:47:05
<toffer> but re-using the cookies doesn't work somehow2009-09-30 23:47:07
<Shelwien> *think2009-09-30 23:47:07
<toffer> i used --load-cookies and --save-cookies2009-09-30 23:47:30
<Shelwien> you can open some html page which gives you a cookie first2009-09-30 23:47:32
 yeah2009-09-30 23:47:34
 and then get the pdf2009-09-30 23:47:39
 but there might be an additional option necessary2009-09-30 23:47:58
 like2009-09-30 23:48:13
 --keep-session-cookies load and save session (non-permanent) cookies.2009-09-30 23:48:13
<toffer> that's what grep told me too ^^2009-09-30 23:49:07
*** pinc has left the channel2009-09-30 23:49:45
 it seems to work with a 30sec delay2009-09-30 23:51:55
 guess tomorrow everything will be here2009-09-30 23:54:09
 but i gonna go to bed now2009-09-30 23:54:13
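A minimal sketch of the download loop discussed above: session cookies kept between calls, a fixed delay, and a retry whenever the saved file is not actually a PDF. It only wraps the wget command line mentioned in the chat; the function names, the `cookies.txt` file name, and the `paper_NNNN.pdf` naming are assumptions for illustration, not the script toffer ran.

```python
import subprocess
import time
from pathlib import Path


def wget_command(url, out_path, cookie_file="cookies.txt"):
    """Build a wget call that loads, saves and keeps session cookies."""
    return [
        "wget",
        "--load-cookies", cookie_file,
        "--save-cookies", cookie_file,
        "--keep-session-cookies",   # session cookies are otherwise not saved
        "-O", str(out_path),
        url,
    ]


def looks_like_pdf(path):
    """True if the file exists and starts with the PDF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(5) == b"%PDF-"


def fetch_all(urls, out_dir, delay=30, max_tries=5,
              run=subprocess.run, sleep=time.sleep):
    """Download each URL, retrying when an HTML error page was saved instead.

    Returns the list of URLs that never produced a PDF.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for n, url in enumerate(urls):
        target = out_dir / f"paper_{n:04d}.pdf"   # hypothetical naming scheme
        for _ in range(max_tries):
            run(wget_command(url, target), check=False)
            sleep(delay)                          # pause after every request
            if looks_like_pdf(target):
                break
        else:
            failed.append(url)
    return failed
```

The 30 second default mirrors the delay that seemed to work in the chat. `run` and `sleep` are parameters only so the loop can be exercised without touching the network.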
<Shelwien> i should too, i guess2009-09-30 23:54:33
 bye ;)2009-09-30 23:54:35
<toffer> gn82009-09-30 23:55:01
*** toffer has left the channel2009-09-30 23:55:09
*** pinc|mirror has left the channel2009-10-01 00:18:22
*** Shelwien has left the channel2009-10-01 02:06:42
*** pinc has joined the channel2009-10-01 07:19:17
*** Shelwien has joined the channel2009-10-01 07:32:53
*** pinc has left the channel2009-10-01 08:03:19
*** pinc has joined the channel2009-10-01 08:30:09
*** pinc|mirror has joined the channel2009-10-01 08:40:26
*** pinc has left the channel2009-10-01 08:41:12
*** toffer has joined the channel2009-10-01 08:58:56
<Shelwien> hi2009-10-01 08:59:37
<toffer> slept already?2009-10-01 08:59:45
<Shelwien> not sure2009-10-01 09:00:07
<toffer> well you have to know2009-10-01 09:01:49
 otherwise i guess you have serious trouble -.-2009-10-01 09:02:00
<Shelwien> not sure whether it counts ;)2009-10-01 09:02:10
<toffer> yesterday i downloaded the whole dcc 20052009-10-01 09:02:23
 that additional cookie option helped2009-10-01 09:02:31
<Shelwien> ;)2009-10-01 09:02:36
<toffer> the rest will be here today2009-10-01 09:02:43
 but currently i'm under windows2009-10-01 09:02:54
<Shelwien> there's wget for windows too ;)2009-10-01 09:03:07
 anyway, do you have a place where to upload it?2009-10-01 09:05:02
<toffer> yeah2009-10-01 09:06:32
 i'll upload when i got everything2009-10-01 09:08:45
<Shelwien> sure2009-10-01 09:08:52
<toffer> but there're a lot of "papers" which are just a bit more than an abstract2009-10-01 09:09:01
<Shelwien> weird2009-10-01 09:09:22
<toffer> and what else happened at your side?2009-10-01 09:12:28
<Shelwien> nothing probably2009-10-01 09:12:54
*** Shelwien has left the channel2009-10-01 09:13:55
*** Guest9968193 has joined the channel2009-10-01 09:13:59
<Shelwien> thinking how to make that damned LZ thing work in one pass2009-10-01 09:17:25
 there's a data window and hashtable window2009-10-01 09:20:10
 and now i have to undo changes in these when a match is found2009-10-01 09:21:00
<toffer>  if it's just arithmetic ops 2009-10-01 09:23:16
 you could simply undo it2009-10-01 09:23:24
<Shelwien> not really2009-10-01 09:24:14
<toffer> dunnot know about your internal structure2009-10-01 09:24:21
 a+b-a=b2009-10-01 09:24:29
<Shelwien> there's another window for rolling hashes2009-10-01 09:24:30
 and that's not the main problem2009-10-01 09:24:56
 just that its too complicated and annoying2009-10-01 09:25:15
 for a task like this2009-10-01 09:25:22
 well, i guess it could be much simpler if i didn't try to speed-optimize it ;)2009-10-01 09:26:14
 atm processes enwik9 in 7s btw2009-10-01 09:26:31
 finding matches at any distance2009-10-01 09:27:14
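A rough sketch of finding long repeats at any distance with a rolling hash and a hashtable of block hashes. This is not Shelwien's implementation (his code is not shown in the log); the function name, the block size of `min_len // 2`, and the hash constants are assumptions picked for illustration. Only one offset is kept per hash, so some repeats can be missed.

```python
def find_long_matches(data: bytes, min_len: int = 100):
    """Return (position, source, length) triples for repeats >= min_len bytes."""
    block = max(1, min_len // 2)     # a repeat >= min_len covers an aligned block
    base, mod = 257, (1 << 61) - 1   # illustrative polynomial hash parameters
    n = len(data)
    matches = []
    if n < block:
        return matches

    top = pow(base, block - 1, mod)  # weight of the byte leaving the window
    h = 0
    for b in data[:block]:
        h = (h * base + b) % mod

    table = {}      # hash of an aligned block -> offset of its first occurrence
    covered = 0     # end of the last reported match; no lookups before it
    i = 0           # start of the current window data[i:i+block]
    while True:
        if i >= covered:
            src = table.get(h)
            # verify the bytes, since different blocks can share a hash
            if src is not None and data[src:src + block] == data[i:i + block]:
                back = 0
                while (src - back > 0 and i - back > covered
                       and data[src - back - 1] == data[i - back - 1]):
                    back += 1
                start, s0 = i - back, src - back
                length = back + block
                while start + length < n and data[s0 + length] == data[start + length]:
                    length += 1
                if length >= min_len:
                    matches.append((start, s0, length))
                    covered = start + length
        if i % block == 0:
            table.setdefault(h, i)   # index aligned blocks only
        if i + block >= n:
            break
        # roll the window one byte forward
        h = ((h - data[i] * top) * base + data[i + block]) % mod
        i += 1
    return matches
```

Indexing only aligned blocks keeps the table small, which is one reason a lower length limit costs more memory: halving `min_len` doubles the number of table entries.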
<toffer> that's pretty good2009-10-01 09:28:02
 btw how are the statistics 2009-10-01 09:28:11
<Shelwien> ?2009-10-01 09:28:18
<toffer> e.g., a match length distribution 2009-10-01 09:28:24
<Shelwien> well, it finds only longer matches2009-10-01 09:28:51
 like 100+ bytes2009-10-01 09:28:54
<toffer> still acceptable as a preprocessor2009-10-01 09:29:23
 could you lower that limit to say 202009-10-01 09:29:37
<Shelwien> well, there're algorithm-specific quirks2009-10-01 09:30:12
 but yes, basically2009-10-01 09:30:21
 "100" is just a random setting2009-10-01 09:30:38
 in fact, i think it should still support changing it as a commandline option %)2009-10-01 09:30:57
<toffer> well how much compression do you get when replacing 100 byte strings with 1 byte tokens2009-10-01 09:31:22
<Shelwien> there're 16500 matches with >100 len2009-10-01 09:33:18
<toffer> and altogether2009-10-01 09:34:14
 just 1.6%2009-10-01 09:34:20
 whoops2009-10-01 09:34:58
 0.162009-10-01 09:35:02
 guess to use it as a preprocessor one needs to drastically lower the length2009-10-01 09:35:28
 the match length distribution i've seen looked like a laplacian distribution2009-10-01 09:36:10
 can you easily lower the length limit?2009-10-01 09:36:50
<Shelwien> 7747637 bytes in matched strings2009-10-01 09:38:50
<toffer> for e9?2009-10-01 09:39:30
<Shelwien> yeah2009-10-01 09:39:39
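The percentages above can be checked with a few lines of arithmetic. The only figure not taken from the chat is the size of enwik9, which is 10^9 bytes; the helper names are made up for this note.

```python
ENWIK9_SIZE = 10**9  # enwik9 is the first 10^9 bytes of the Wikipedia dump


def fraction_covered(matched_bytes, total=ENWIK9_SIZE):
    """Share of the input that lies inside matched strings."""
    return matched_bytes / total


def bytes_saved(n_matches, matched_bytes, token_len=1):
    """Bytes removed if every match is replaced by a token of token_len bytes."""
    return matched_bytes - n_matches * token_len


# Lower bound used in the chat: 16500 matches of exactly 100 bytes each.
lower_bound = 16500 * 100
print(f"{100 * fraction_covered(lower_bound):.3f}%")   # the "0.16" figure

# Actual total reported for the matched strings.
print(f"{100 * fraction_covered(7747637):.3f}%")
print(bytes_saved(16500, 7747637))
```

Even with the real total of 7747637 matched bytes, well under 1% of the file is covered, which fits the remark that the length limit would need to drop a lot for use as a preprocessor.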
<toffer> what about a match length of 202009-10-01 09:39:58
<Shelwien> its kinda troublesome, but i'd try2009-10-01 09:41:19
 29s like that2009-10-01 09:41:49
 ...and a 500M hashtable... figures...2009-10-01 09:42:15
 4.6M matches like that2009-10-01 09:43:55
 ...but i guess i won't be able to sum it up with console utils2009-10-01 09:44:40
<toffer> ok2009-10-01 09:46:21
 ^^2009-10-01 09:46:23
<Shelwien> actually there's not much sense in LZ for enwik anyway2009-10-01 09:47:47
 even in template-based articles there aren't that many complete matches2009-10-01 09:48:19
<toffer> i guess there're long matches (>15 bytes) due to repeating phrases from time to time. and xml stuff2009-10-01 09:48:51
<Shelwien> well, maybe some kind of more advanced analysis would help2009-10-01 09:48:53
 well, dunno about 20-byte matches2009-10-01 09:49:25
 but i looked at longer ones2009-10-01 09:49:32
 and there was stuff like reference links in some soccer articles %)2009-10-01 09:49:55
*** pinc has joined the channel2009-10-01 10:22:34
*** pinc|mirror has left the channel2009-10-01 10:26:30
<toffer> changing m1 to 3 models just required to add 2 lines of code to the main model :)2009-10-01 11:49:05
<Shelwien> ;)2009-10-01 11:49:27
<toffer> a prediction is found via2009-10-01 11:49:37
 sse2d( quant[model0.state], mix2( quant[model1], quant[model2] ) )2009-10-01 11:50:22
 mix2 is static2009-10-01 11:50:24
 can be made dynamic...2009-10-01 11:50:52
 but let's see how that performs2009-10-01 11:50:57
<Shelwien> i'd not call that 3 models with static mix2009-10-01 11:51:06
<toffer> the reason for mixing quantised predictions is to take advantage of the nonlinearity contained in the fsm probability quantisation. in fact it's like mixing stretched predictions2009-10-01 11:53:19
 and how would you call that?2009-10-01 11:53:28
 i really like the simplicity in that scheme. the prediction function just is sse( q[s0] + (q[s1]+q[s2]>>N) )2009-10-01 11:57:31
 the whole complexity is moved into initialization2009-10-01 11:57:47
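A small sketch of the scheme as described: each model contributes a quantised stretched probability, two of them are mixed statically with an add and a shift, and the result plus the third value select a cell in a 2D SSE table. Everything concrete here (8 quantisation levels, step 0.5, the update rate, the class and function names) is an assumption for illustration; m1's real tables and state machines are not shown in the log.

```python
import math

LEVELS = 8  # hypothetical: quantised stretch values lie in [-LEVELS, LEVELS]


def stretch(p):
    """Logistic-domain value ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def squash(x):
    """Inverse of stretch."""
    return 1.0 / (1.0 + math.exp(-x))


def quantise(p, step=0.5):
    """Map a probability to a small integer in the stretched domain."""
    q = round(stretch(p) / step)
    return max(-LEVELS, min(LEVELS, q))


def mix2(q1, q2, shift=1):
    """Static mix: only an addition and a shift, no weights to adapt."""
    return (q1 + q2) >> shift


class SSE2D:
    """Adaptive probability table indexed by two quantised inputs."""

    def __init__(self, step=0.5, rate=0.02):
        size = 2 * LEVELS + 1
        self.rate = rate
        # initialise each cell to the average of its two inputs
        self.table = [[squash((a + b - 2 * LEVELS) * step / 2)
                       for b in range(size)] for a in range(size)]

    def predict(self, qa, qb):
        return self.table[qa + LEVELS][qb + LEVELS]

    def update(self, qa, qb, bit):
        row = self.table[qa + LEVELS]
        row[qb + LEVELS] += (bit - row[qb + LEVELS]) * self.rate


# quant[] would normally be precomputed per fsm state; here it is computed directly.
q0, q1, q2 = quantise(0.9), quantise(0.75), quantise(0.9)
sse = SSE2D()
p = sse.predict(q0, mix2(q1, q2))   # sse2d(quant[s0], mix2(quant[s1], quant[s2]))
sse.update(q0, mix2(q1, q2), 1)     # adapt the selected cell towards the coded bit
```

With shift 1 the static mix is a plain average in the stretched domain, which corresponds to the fixed w=0.5 setting tried later in the log.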
<Shelwien> "encode" likes that too ;)2009-10-01 11:58:05
<toffer> just look at distribution or fs classes2009-10-01 11:58:07
 fsm2009-10-01 11:58:12
 hey - it's no lz2009-10-01 11:58:26
 ^^2009-10-01 11:58:27
<Shelwien> no i meant his bcm etc2009-10-01 11:58:43
<toffer> so how would you call that scheme?2009-10-01 11:59:07
<Shelwien> that's still 2 models i think, one of these is just more complex than a simple counter2009-10-01 12:00:08
 imho its a matter of "degree of freedom"2009-10-01 12:00:46
 and adaptivity2009-10-01 12:00:55
<toffer> well there're 3 context mask2009-10-01 12:01:30
 s2009-10-01 12:01:32
<Shelwien> but not that i care if it'd really improve the compression ;)2009-10-01 12:01:39
<toffer> i came up with two layouts2009-10-01 12:02:10
 this one2009-10-01 12:02:19
 and the obvious solution of mixing 2xm12009-10-01 12:02:40
 well for 3 models i'd fix the other input2009-10-01 12:03:09
<Shelwien> it'd be interesting though, if you could compare static mix vs adaptive mix vs sse2 there2009-10-01 12:06:18
<toffer> i will do that2009-10-01 12:09:23
 but that'll take some time2009-10-01 12:09:28
 and "some" should be months, since my spare time shrinks more and more2009-10-01 12:09:44
 at least now there's a flexible framework to "plug together" and quantise distributions represented as an array of counters2009-10-01 12:12:10
<Shelwien> err... why would it take months to replace that static mix with another sse2?2009-10-01 12:13:42
<toffer> no more time2009-10-01 12:18:02
 and in future work2009-10-01 12:18:09
 i guess the best overall variant will be mix(sse2,sse2)2009-10-01 12:22:37
<Shelwien> yeah, probably2009-10-01 12:23:01
 sse2(sse2()) might work too, but its unstable2009-10-01 12:23:35
<toffer> wow there're the first failed assertions :)2009-10-01 12:28:33
<Shelwien> %)2009-10-01 12:28:46
<toffer> it was just a typo in the source2009-10-01 12:42:21
 optimization is running now2009-10-01 12:42:25
 the optimizer blocked the statically mixed models...2009-10-01 13:06:25
<Shelwien> huh...2009-10-01 13:06:51
<toffer> that's because the parameter which controls blocking is just 7 bit and the two models are configured via 2x80 bit2009-10-01 13:06:54
<Shelwien> well, why don't you start with it then?2009-10-01 13:07:03
 i mean, optimize that static mix first2009-10-01 13:07:13
 and then add sse2 and another model2009-10-01 13:07:23
<toffer> so to quickly improve compression it's more likely to just drop that 8 bit2009-10-01 13:07:30
 erm 72009-10-01 13:08:12
 well i fixed that parameter now2009-10-01 13:08:17
 and guess what2009-10-01 13:11:23
 it's blocked again2009-10-01 13:11:28
 via another parameter2009-10-01 13:11:31
 ^^2009-10-01 13:11:33
<Shelwien> ;)2009-10-01 13:11:39
<toffer> adaptive mixing won't do that2009-10-01 13:11:42
 i'll now fix the static mix to w=0.5 to see what happens2009-10-01 13:11:58
 now there's another problem with collisions2009-10-01 13:27:28
 ^^2009-10-01 13:27:41
 it doesn't stop2009-10-01 13:27:46
 guess i'll have to try layout 2 that weekend taking this lesson into account2009-10-01 13:28:34
*** toffer has left the channel2009-10-01 13:42:54
*** toffer has joined the channel2009-10-01 14:10:42
*** pinc has left the channel2009-10-01 15:31:08
*** toffer has left the channel2009-10-01 18:22:12
<Shelwien> !next2009-10-01 19:56:41