[2009-09-28 10:15:18] *** complogger has joined the channel
[2009-09-28 10:15:59] *** compbooks has joined the channel
[2009-09-28 14:34:16] *** toffer has joined the channel
[2009-09-28 14:34:29] <toffer> hi
[2009-09-28 15:40:20] *** pinc has left the channel
[2009-09-28 16:04:26] <Shelwien> hi toffer
[2009-09-28 16:15:03] <toffer> i recently found the term w* - w in the cost function for entropy-optimal linear mixing, w* = (y-p0)/(p1-p0), p_mix = (p1-p0)*w + p0
[2009-09-28 16:15:32]  and i improved my time series predictor
[2009-09-28 16:16:17]  my advisor at Fraunhofer was surprised that i developed stuff similar to his approach (well, the results of data analysis) for that
[2009-09-28 16:23:25] <Shelwien> huh...
[2009-09-28 16:23:37]  it's exactly how my linear mixer works... %)
[2009-09-28 16:24:12] <toffer> there is a mistake
[2009-09-28 16:24:14]  i know
[2009-09-28 16:24:25]  it is "not y" in the numerator
[2009-09-28 16:25:04]  a single dcost/dw comes out as 1/(w*-w), up to the sign convention
[2009-09-28 16:25:21]  so again that w*-w is in the denominator
[2009-09-28 16:25:42]  i just wanted to say that this does show similarities
[2009-09-28 16:27:31]  and again
[2009-09-28 16:27:41]  i forgot to insert the entropy
[2009-09-28 16:27:44]  to be exact, the cost is
[2009-09-28 16:28:22]  -[ y ln p_mix + (1-y) ln(1-p_mix) ]
[2009-09-28 16:28:24]  grml
[2009-09-28 16:28:29]  you know what i mean
[2009-09-28 16:28:29]  ^^
[2009-09-28 16:28:36]  never made that many typos
[2009-09-28 16:28:41]  in such a short time
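A minimal sketch of the cost gradient discussed above, assuming the P(y=0) convention that comes up later in the log (so that w* = (y-p0)/(p1-p0) without the "not y"); the finite-difference check confirms dcost/dw = 1/(w* - w):

```python
import math

def mix(p0, p1, w):
    # linear mix: p_mix = (p1 - p0) * w + p0
    return (p1 - p0) * w + p0

def cost(p0, p1, w, y):
    # coding cost of bit y when p_mix estimates P(y=0)
    p = mix(p0, p1, w)
    return -math.log(p) if y == 0 else -math.log(1.0 - p)

def w_star(p0, p1, y):
    # the "ideal" weight that would have predicted bit y exactly
    return (y - p0) / (p1 - p0)

# check dcost/dw == 1 / (w* - w) by finite differences
p0, p1, w = 0.2, 0.7, 0.4
for y in (0, 1):
    h = 1e-6
    num = (cost(p0, p1, w + h, y) - cost(p0, p1, w - h, y)) / (2 * h)
    ana = 1.0 / (w_star(p0, p1, y) - w)
    assert abs(num - ana) < 1e-5
```

With the P(y=1) convention the same identity holds once "not y" replaces y in the numerator, which is what the correction above is about.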
[2009-09-28 16:29:10] <Shelwien> ...
[2009-09-28 16:29:23] <toffer> i just found it interesting
[2009-09-28 16:30:47] <Shelwien> ...well, as for me, i was recently thinking about mapping of statistics
[2009-09-28 16:31:25]  like, computing the p.d. for DCT coefficients from the p.d. of image pixels ;)
[2009-09-28 16:32:30] <toffer> why not
[2009-09-28 16:32:55]  that Bayesian stuff is really good for optimization
[2009-09-28 16:33:02] <Shelwien> well, it has many uses actually
[2009-09-28 16:33:14]  i think it should be applicable to your time series as well
[2009-09-28 16:33:25] <toffer> i can hardly change anything now
[2009-09-28 16:33:32]  it'll be finished in 2 to 3 weeks
[2009-09-28 16:33:40]  afterwards i'm going to correct some mistakes
[2009-09-28 16:33:42]  and get it printed
[2009-09-28 16:35:14]  guess what the optimal solution for an entropy-optimal linear mixer is
[2009-09-28 16:35:23]  i just calculated it
[2009-09-28 16:35:42]  it's an average of these w*
[2009-09-28 16:38:35]  w_opt = (sum c_k w*_k) / (sum c_k)
[2009-09-28 16:38:46]  that's pretty much like a linear counter
[2009-09-28 16:39:02]  but y_k is replaced with w*_k
[2009-09-28 16:39:56]  in the stationary case k->oo it's like an exponential decay of the weights on the w*_k
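That closed form can be checked directly: with exponentially decaying c_k, the normalized sum coincides (up to a vanishing initial term) with the incremental counter-style update w += rate * (w*_k - w). The decay value and the random w* sequence are illustrative:

```python
import random

lam = 0.95
random.seed(1)
ws = [random.random() for _ in range(2000)]   # stand-in sequence of w*_k values

# closed form: w_opt = (sum c_k w*_k) / (sum c_k), with exponential weights c_k
n = len(ws)
c = [lam ** (n - 1 - k) for k in range(n)]
w_closed = sum(ck * wk for ck, wk in zip(c, ws)) / sum(c)

# incremental "linear counter" style update: w += (1 - lam) * (w*_k - w)
w = ws[0]
for wk in ws[1:]:
    w += (1 - lam) * (wk - w)

assert abs(w - w_closed) < 1e-6
```

The residual difference is the exponentially suppressed contribution of the initial value, which is what "in the stationary case k->oo" refers to.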
[2009-09-28 16:40:11] <Shelwien> whatever... it all depends on the practical implementation anyway
[2009-09-28 16:40:19] <toffer> so my approximation and your counter backprop are really the same thing
[2009-09-28 16:40:51]  at least you now know that your scheme is optimal
[2009-09-28 16:41:39]  well, implementation is step 2, right after knowing what to do
[2009-09-28 16:43:03]  did you use P(y=0) or P(y=1)?
[2009-09-28 16:45:45] <Shelwien> my models usually work with P(y=0)
[2009-09-28 16:46:31] <toffer> well, in that case w* is (y-p0)/(p1-p0) ;)
[2009-09-28 16:46:52] <Shelwien> i guess ;)
[2009-09-28 16:47:14]  but why a linear mixer all of a sudden? ;)
[2009-09-28 16:48:23] <toffer> since a piecewise linear function can approximate anything
[2009-09-28 16:48:39]  as you know, for linear mixing i looked up weights in a context formed from p0,p1
[2009-09-28 16:49:54]  and nonlinear weighting can be emulated by making w a (linear) function of nonlinear transformations of p0,p1
[2009-09-28 16:50:10]  thus i can mirror the behaviour of stretch/squash
[2009-09-28 16:50:24]  while still having the ability to get an optimal solution
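One way to read that (a sketch of the idea, not toffer's actual code): keep a small table of weights indexed by quantized stretch(p0), stretch(p1), and pull each cell toward w*. stretch/squash here are the usual logit/sigmoid pair; the table size, rate, and gap threshold are made-up values:

```python
import math

def stretch(p):
    # logit
    return math.log(p / (1.0 - p))

def squash(x):
    # inverse of stretch (sigmoid)
    return 1.0 / (1.0 + math.exp(-x))

def bucket(p, n=8, lim=4.0):
    # quantize stretch(p) into n buckets over [-lim, lim]
    x = max(-lim, min(lim, stretch(p)))
    return min(n - 1, int((x + lim) / (2 * lim) * n))

# weight table indexed by the (p0, p1) context
N = 8
w_table = [[0.5] * N for _ in range(N)]

def mix_and_update(p0, p1, y, rate=0.02):
    i, j = bucket(p0), bucket(p1)
    w = w_table[i][j]
    p_mix = (p1 - p0) * w + p0
    if abs(p1 - p0) > 1e-3:          # w* is undefined for p0 == p1
        ws = (y - p0) / (p1 - p0)
        w_table[i][j] += rate * (ws - w)
    return p_mix
```

The mix itself stays linear, so the closed-form optimum per context still applies; the nonlinearity only enters through the context selection.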
[2009-09-28 16:50:29] <Shelwien> %)
[2009-09-28 16:50:59] <toffer> which cannot be calculated for paq style mixing
[2009-09-28 16:51:13] <Shelwien> well, i'd like to somehow extend the max likelihood approach (from logistic mixing) to some other areas
[2009-09-28 16:51:17] <toffer> at least not in a closed form
[2009-09-28 16:51:28] <Shelwien> but so far i couldn't really think of anything
[2009-09-28 16:51:38] <toffer> parameter estimation?
[2009-09-28 16:53:24]  why don't you make a new optimizer
[2009-09-28 16:54:02]  that Bayesian stuff i read about simultaneously estimates: 1. the cost function parameters and 2. the search process parameters
[2009-09-28 16:54:15]  that sounded specific
[2009-09-28 16:54:23]  but was pretty much general
[2009-09-28 16:54:26]  and very powerful
[2009-09-28 16:58:46] <Shelwien> no, i wasn't talking about optimization
[2009-09-28 16:59:24]  if you remember, there was that explanation of logistic mixing via string probabilities
[2009-09-28 16:59:57] <toffer> for me it's still more an interpretation than an explanation
[2009-09-28 16:59:58] <Shelwien> so the logistic mixer appeared to be an approximation of static switching between static estimations
[2009-09-28 17:00:55]  Matt clearly explained after that that it was derived purely ad hoc
[2009-09-28 17:01:57]  and my interpretation, or whatever it is, at least provides a "physical" background for it
[2009-09-28 17:03:02]  anyway, it looks fairly solid to me - with verification by an alternative update formula and all
[2009-09-28 17:03:10] <toffer> the intention of optimization is enough for me
[2009-09-28 17:03:19]  and the characteristics of the contained transform
[2009-09-28 17:03:44] <Shelwien> so i'd like to apply a similar approach somewhere else
[2009-09-28 17:04:41]  for example, maybe we can somehow build a logistic mixer for p=~0.5
[2009-09-28 17:04:53]  which would be more efficient than the linear one ;)
[2009-09-28 17:05:38] <toffer> near 0.5 logistic mixing behaves almost like linear mixing
[2009-09-28 17:05:58] <Shelwien> yeah, but it's worse
[2009-09-28 17:06:17] <toffer> not really, in that case
[2009-09-28 17:06:29]  but the logistic behaviour is very different near 0/1
[2009-09-28 17:06:30] <Shelwien> maybe only a little, but taking into account that linear mixing is also much simpler...
[2009-09-28 17:06:46]  as i mentioned, i tried it
[2009-09-28 17:06:46] <toffer> especially for weights far from 0/1
[2009-09-28 17:07:07] <Shelwien> i was expecting a lot after the mix_test experiments
[2009-09-28 17:07:39]  and tried to apply that logistic mix2 in my audio codec's entropy model
[2009-09-28 17:08:33]  and it was significantly worse under the same circumstances (with only the mixer parameters optimized)
[2009-09-28 17:08:42] <toffer> logistic mixing just works well iff predictions near 0/1 provide high confidence
[2009-09-28 17:09:09] <Shelwien> well, as i said, my case is more about p=~0.5
[2009-09-28 17:09:20]  like that zero flag model
[2009-09-28 17:10:35]  but i wonder whether that "explanation" can be adjusted somehow
[2009-09-28 17:10:59]  like, maybe a single ~0.5 prediction can be split into a mix of 0/1 and 0.5
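toffer's point about the near-0.5 regime is easy to check numerically: stretch is approximately linear around p = 0.5, so a logistic mix with the same weights stays very close to the plain linear mix there, while diverging near 0/1. The weight values below are illustrative:

```python
import math

def stretch(p):
    return math.log(p / (1.0 - p))

def squash(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear_mix(p0, p1, a):
    return a * p0 + (1.0 - a) * p1

def logistic_mix(p0, p1, a):
    # PAQ-style mixing in the stretched domain, same weights
    return squash(a * stretch(p0) + (1.0 - a) * stretch(p1))

# near p = 0.5 the two mixes almost coincide ...
worst = 0.0
for i in range(11):
    for j in range(11):
        p0 = 0.45 + 0.01 * i
        p1 = 0.45 + 0.01 * j
        worst = max(worst, abs(linear_mix(p0, p1, 0.3) - logistic_mix(p0, p1, 0.3)))
assert worst < 1e-3

# ... while near 0/1 they differ noticeably
assert abs(linear_mix(0.99, 0.6, 0.5) - logistic_mix(0.99, 0.6, 0.5)) > 0.01
```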
[2009-09-28 17:11:20] <toffer> the actual question is why the overall result is worse: because logistic mixing doesn't behave "linear" enough near 0.5, or because it behaves "logistic" for other values
[2009-09-28 17:12:29] <Shelwien> actually it could be explained by another feature of my linear mixer
[2009-09-28 17:13:01]  it only updates the weight when the distance between the inputs is large enough
[2009-09-28 17:14:00] <toffer> the essence is the same as in my previous statement.
[2009-09-28 17:14:13]  since it depends on the model output
[2009-09-28 17:14:32] <Shelwien> so that mixer can leave some cases untouched
[2009-09-28 17:15:00]  like, one of the inputs is good enough, so let's leave it as is
[2009-09-28 17:15:26]  but a logistic mixer always transforms the inputs
[2009-09-28 17:15:44]  and the integer implementation is not very precise
[2009-09-28 17:17:12] <toffer> well, logistic mixing itself doesn't. but the optimization technique Matt proposed does
[2009-09-28 17:18:30]  and you don't need to do that
[2009-09-28 17:19:10] <Shelwien> i don't understand you at all ;)
[2009-09-28 17:19:34] <toffer> the update rule in paq is a gradient step
[2009-09-28 17:19:42]  that's just implementation
[2009-09-28 17:19:49]  you can do it differently
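The gradient step toffer refers to is the usual PAQ-style weight update in the stretched domain. A minimal sketch; the learning rate and the synthetic input models are made up for illustration:

```python
import math
import random

def stretch(p):
    return math.log(p / (1.0 - p))

def squash(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(42)
w = [0.0, 0.0]                # mixer weights, one per input model
rate = 0.01                   # learning rate (illustrative)

for _ in range(5000):
    y = random.randint(0, 1)
    p_good = 0.9 if y == 1 else 0.1      # informative model
    p_noise = random.uniform(0.2, 0.8)   # uninformative model
    x = [stretch(p_good), stretch(p_noise)]
    p = squash(w[0] * x[0] + w[1] * x[1])
    # gradient step on the coding cost: w_i += rate * x_i * (y - p)
    for i in range(2):
        w[i] += rate * x[i] * (y - p)

# the informative input ends up with the dominant weight
assert w[0] > 1.0 and abs(w[1]) < w[0]
```

Note that this update fires on every bit, which is exactly the "always transforms the inputs" behaviour discussed above.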
[2009-09-28 17:22:03] <Shelwien> i was saying that one explanation for logistic mixing having poor performance in that case
[2009-09-28 17:22:22]  is precision/performance issues in my mix2
[2009-09-28 17:24:30] <toffer> yes, but you don't need to "update" if p0 ~ p1
[2009-09-28 17:25:09] <Shelwien> yeah, i tried adding that condition, and it even helped a little
[2009-09-28 17:25:44]  my point was that the logistic mixer always modifies the inputs
[2009-09-28 17:26:05] <toffer> you'd need to do that with the w* approach in the linear domain, too
[2009-09-28 17:26:13]  it's just an implementation issue
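The guard both of them describe is a one-line condition; a sketch of a linear-mixer update that skips the step when the inputs are too close (the threshold and rate values are illustrative):

```python
def update_weight(w, p0, p1, y, rate=0.05, min_gap=0.01):
    # skip the update when p0 ~ p1: w* = (y - p0)/(p1 - p0) blows up,
    # and the mix barely depends on w there anyway
    if abs(p1 - p0) < min_gap:
        return w
    w_star = (y - p0) / (p1 - p0)
    return w + rate * (w_star - w)

assert update_weight(0.5, 0.400, 0.401, 1) == 0.5   # too close: left untouched
assert update_weight(0.5, 0.2, 0.7, 1) > 0.5        # moves toward w* = 1.6
```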
[2009-09-28 17:26:32] <Shelwien> ...anyway, that's not really interesting
[2009-09-28 17:26:50] <toffer> that's right, definitely :)
[2009-09-28 17:26:53] <Shelwien> i was trying to discuss alternative applications of the same string probability approach
[2009-09-28 17:27:08]  for example, CTW is one of them
[2009-09-28 17:27:45]  or, for another example
[2009-09-28 17:27:59]  we can try replacing the switching in the formula
[2009-09-28 17:28:02]  with linear mixing
[2009-09-28 17:29:37] <toffer> well, i don't know that much about ctw since i lost interest in it after noticing that it's rather inefficient
[2009-09-28 17:30:10]  the other idea might be interesting for sure
[2009-09-28 17:30:52] <Shelwien> then there's one more thing
[2009-09-28 17:31:10]  afair, in my likelihood interpretation of logistic mixing
[2009-09-28 17:31:35]  the bit probability estimation was derived from likelihoods
[2009-09-28 17:32:24]  and the mixer weight was optimized by only that
[2009-09-28 17:33:34]  but maybe we should try optimizing by the whole codelength
[2009-09-28 17:33:49]  well, i already did that with BFA though
[2009-09-28 17:34:25]  and the current mixer implementations (both logistic and linear) seem to have better performance
[2009-09-28 17:34:54] <toffer> these are adaptive
[2009-09-28 17:35:01]  bfa is limited to a fixed window
[2009-09-28 17:35:03] <Shelwien> BFA is adaptive too, of course
[2009-09-28 17:35:21]  and not really fixed-window, though there was a variant like that too
[2009-09-28 17:35:34] <toffer> afaik bfa searches for an optimal parameter
[2009-09-28 17:35:36]  via brute force
[2009-09-28 17:35:48] <Shelwien> the most straightforward BFA implementation
[2009-09-28 17:35:55] <toffer> for a *single* situation
[2009-09-28 17:35:57]  but
[2009-09-28 17:36:23]  both paq and linear mixer updates are based on adaptation, taking previous results into account
[2009-09-28 17:36:24] <Shelwien> is accumulating (exponentially weighted) codelengths with different parameter values
[2009-09-28 17:36:34] <toffer> that is inefficient
[2009-09-28 17:36:42]  since it introduces phase delays
[2009-09-28 17:36:59]  which isn't the case for gradient descent
[2009-09-28 17:37:04] <Shelwien> in a way, but that's not always bad
[2009-09-28 17:37:38] <toffer> well, yes
[2009-09-28 17:38:09]  but consider what iterative minimization actually does
[2009-09-28 17:38:25]  the cost function slightly changes at each step
[2009-09-28 17:38:39]  thus you can assume it's constant across a few steps
[2009-09-28 17:38:56]  and the parameter noise gets averaged out
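Shelwien's description of the straightforward BFA scheme (accumulating exponentially weighted codelengths for a set of candidate parameter values, then picking the cheapest) can be sketched as follows; the candidate grid, decay, and bit source are made up for illustration:

```python
import math
import random

random.seed(7)

# candidate parameter values (here: fixed probabilities for a biased bit source)
candidates = [0.1 * k for k in range(1, 10)]
acc = [0.0] * len(candidates)    # exponentially weighted codelength per candidate
decay = 0.999

for _ in range(3000):
    y = 1 if random.random() < 0.8 else 0   # source with P(y=1) = 0.8
    for i, p in enumerate(candidates):
        cl = -math.log2(p if y == 1 else 1.0 - p)   # codelength under candidate p
        acc[i] = decay * acc[i] + cl

best = candidates[min(range(len(candidates)), key=lambda i: acc[i])]
assert 0.7 <= best <= 0.9   # the winner tracks the true bias
```

The decay is what makes the selection adaptive rather than fixed-window, and the lag of that decay is the "phase delay" toffer objects to.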
[2009-09-28 17:39:41] <toffer> and i need to get to a supermarket now to buy something to eat
[2009-09-28 17:39:46] <Shelwien> ;)
[2009-09-28 17:39:47] <toffer> i'll be back in 30 mins
[2009-09-28 17:39:53] <Shelwien> good luck ;)
[2009-09-28 17:40:36] <toffer> and that dna guy is really annoying to me
[2009-09-28 17:41:12] <Shelwien> yeah, but the dna sequence itself is interesting
[2009-09-28 17:41:26]  currently using it for FMA testing ;)
[2009-09-28 17:41:31]  (e.coli)
[2009-09-28 17:44:56]  also there's stuff like "complementary palindromes"
[2009-09-28 17:45:17]  about which i'm not sure how to model with CM
[2009-09-28 17:45:50]  the specialized implementations which support it, though
[2009-09-28 17:46:23]  are LZ-like and have worse results than ash ;)
[2009-09-28 19:18:48] *** pinc has joined the channel
[2009-09-28 19:49:28] *** pinc has left the channel
[2009-09-28 19:54:57]  !quit