*** Shelwien has left the channel		2009-09-09 02:53:59
*** pinc has joined the channel		2009-09-09 07:08:27
*** pinc has left the channel		2009-09-09 08:22:31
*** pinc has joined the channel		2009-09-09 08:45:45
*** Shelwien has joined the channel		2009-09-09 11:01:08
<osman>	here is a really lazy programmer: http://imafrogg.com/blog/jpeg-text-compression/	2009-09-09 11:15:47
	:)	2009-09-09 11:15:49
<Shelwien>	...	2009-09-09 11:17:08
	as funny as it may sound, there's some sense in using the visual text representation for compression	2009-09-09 11:21:17
	(and other tasks too)	2009-09-09 11:21:50
<osman>	it's another topic i think	2009-09-09 11:22:09
	i know exactly what you mean	2009-09-09 11:22:18
<Shelwien>	like, how spammers write some keywords in their mails?	2009-09-09 11:22:19
	also the same applies to the audio version ;)	2009-09-09 11:23:18
	but both are only usable as contexts, not as the main stream	2009-09-09 11:23:41
<osman>	yep. that's the point IMO	2009-09-09 11:29:52
<Shelwien>	...	2009-09-09 11:34:32
	btw, what do you think about my static compression idea?	2009-09-09 11:35:14
	i mean, the one with log2(c[i]^c) contexts?	2009-09-09 11:35:42
	i was thinking about fma-delta	2009-09-09 11:36:47
	and, well, if there's a data window, and larger window is better	2009-09-09 11:37:23
	then it might be reasonable to compress the data in there ;)	2009-09-09 11:37:38
	and then, for hashing it also makes sense to use some compression	2009-09-09 11:39:03
	so it seems more practical to use the same coding for both	2009-09-09 11:39:37
	but hashing requires that coding to be completely static	2009-09-09 11:39:53
	because otherwise hashes for different files won't match ;)	2009-09-09 11:40:14
<osman>	looks interesting at least :)	2009-09-09 11:44:06
<Shelwien>	do you understand the idea?	2009-09-09 11:44:23
	basically it's like extended RLE	2009-09-09 11:44:31
<osman>	why do you use log2(c[i]^c) as context?	2009-09-09 11:44:42
<Shelwien>	number of matching MSBs actually	2009-09-09 11:44:58
	in context byte and next byte	2009-09-09 11:45:12
<osman>	ah..ok	2009-09-09 11:45:32
<Shelwien>	and of course i mean to use multiple such contexts	2009-09-09 11:45:35
	like 4-5-6	2009-09-09 11:45:41
<osman>	so, it's somehow an extended REP like coder?	2009-09-09 11:45:53
<Shelwien>	err... what is?	2009-09-09 11:46:05
<osman>	i mean whole idea	2009-09-09 11:46:24
	over a greater distance than the actual window size	2009-09-09 11:46:39
<Shelwien>	in a way, more or less	2009-09-09 11:46:48
	it's an engine for fast finding of long matches	2009-09-09 11:47:10
	i already posted the remote diff-patch kit based on that	2009-09-09 11:47:35
<osman>	yeah. i remember	2009-09-09 11:47:49
<Shelwien>	and next i'm thinking of writing a tool similar to xdelta	2009-09-09 11:48:08
	but that requires keeping a data window for better efficiency	2009-09-09 11:48:30
	and i'm thinking that it might be cool to compress the data in window ;)	2009-09-09 11:48:53
	btw, here's that game of mine - http://shelwien.googlepages.com/hopters.com	2009-09-09 11:55:42
	seems to be ok at 50k in dosbox too	2009-09-09 11:57:47
	arrows/ASWD and left/right shifts	2009-09-09 11:58:11
<osman>	hehe...	2009-09-09 11:59:23
	there is a funny bug	2009-09-09 11:59:28
	even after "exploding" i can still shoot :)	2009-09-09 11:59:40
<Shelwien>	it's not a bug ;)	2009-09-09 11:59:47
	it's justice ;)	2009-09-09 11:59:50
<osman>	i have an "ultra-futuristic" helicopter now. i can move the exploded helicopter with almost no effort 8-)	2009-09-09 12:00:47
<Shelwien>	;)	2009-09-09 12:01:05
<osman>	i think it's really good. i can't see any technical differences between dangerous dave (afair) and yours	2009-09-09 12:01:57
	and back then i really liked "dave" :)	2009-09-09 12:02:09
<Shelwien>	yeah, it's actually even playable	2009-09-09 12:02:20
	and there was even some networking support %)	2009-09-09 12:02:49
	very weird though	2009-09-09 12:02:57
<osman>	it would be good if i could play against the machine	2009-09-09 12:03:14
<Shelwien>	i made a 2nd keyboard emulator TSR ;)	2009-09-09 12:03:20
<osman>	playing with "myself" made thinking	2009-09-09 12:03:35
	cool %)	2009-09-09 12:03:44
<Shelwien>	it was transmitting the keypresses from a different machine	2009-09-09 12:03:48
	and pushing them into local keyboard controller	2009-09-09 12:04:06
	btw, its undocumented	2009-09-09 12:04:12
	but there was a way to store your own value into port 60h	2009-09-09 12:04:35
	and generate IRQ1 even	2009-09-09 12:04:39
	it was originally made for MK3 fights though ;)	2009-09-09 12:05:18
	to avoid keyboard blocking ;)	2009-09-09 12:05:23
<osman>	back in the win9x days, i tried to read the keyboard, com port, mouse etc	2009-09-09 12:05:40
	and thought "what if i try to read all ports in a specific range" %)	2009-09-09 12:06:03
	voila! i got a "guaranteed" computer freezer :)	2009-09-09 12:06:23
<Shelwien>	well, dunno how to get that with reading	2009-09-09 12:06:46
	but writing would certainly work	2009-09-09 12:06:58
<osman>	with "in" instruction	2009-09-09 12:06:58
<Shelwien>	for example, there was that 8042 timer	2009-09-09 12:07:08
	one channel of which controlled the memory refresh\	2009-09-09 12:07:18
	so it was possible to make programs to run a little faster	2009-09-09 12:07:51
	with the risk of memory loss	2009-09-09 12:08:03
<osman>	%)	2009-09-09 12:08:57
*** pinc has left the channel		2009-09-09 15:07:47
<Shelwien>	btw	2009-09-09 16:23:57
	how to print exactly what i want on my printer still remains the question	2009-09-09 16:24:15
	but i've got another idea	2009-09-09 16:24:23
	instead, i can show a picture on the screen ;)	2009-09-09 16:24:45
	and take a photo	2009-09-09 16:24:56
	and then recover the information out of it	2009-09-09 16:25:09
	its a considerably different task	2009-09-09 16:25:38
	but would be still a good application for my error-correction ideas ;)	2009-09-09 16:26:02
*** Simon|B has joined the channel		2009-09-09 17:44:58
*** toffer has joined the channel		2009-09-09 17:45:24
<toffer>	hi	2009-09-09 17:46:45
<Shelwien>	hi	2009-09-09 17:46:53
* Shelwien is writing the log2(c^c[i]) static coder		2009-09-09 17:47:16
<toffer>	sorry that i hardly participate - the deadline for my thesis is approaching ^^	2009-09-09 17:47:17
	?	2009-09-09 17:48:03
<Shelwien>	i told you before	2009-09-09 17:48:18
	that i'd like to use some compression before hashing etc	2009-09-09 17:48:41
	to improve randomness etc	2009-09-09 17:48:47
	but it has to be a static model, same for all files	2009-09-09 17:49:14
<toffer>	"before" was some time ago	2009-09-09 17:49:36
<Shelwien>	so i'd invented something like extended RLE	2009-09-09 17:49:37
<toffer>	and how does it work?	2009-09-09 17:50:58
	"in short"	2009-09-09 17:51:03
	since i'll be leaving in ~20 minutes	2009-09-09 17:51:12
<Shelwien>	as i said... c[i] are previous symbols, and c is current one	2009-09-09 17:51:37
	and context is something like	2009-09-09 17:51:53
<toffer>	well i read the expression differently :)	2009-09-09 17:52:10
<Shelwien>	log2(c^c[0])	2009-09-09 17:52:11
<toffer>	ok	2009-09-09 17:52:19
<Shelwien>	log2(c[0]^c[1])	2009-09-09 17:52:20
	etc	2009-09-09 17:52:20
<toffer>	i do remember that	2009-09-09 17:52:21
<Shelwien>	basically the number of matching MSBs in symbols	2009-09-09 17:52:39
	well, it works more or less	2009-09-09 17:52:58
<toffer>	do you have any results already?	2009-09-09 17:52:59
<Shelwien>	with order-4 like that	2009-09-09 17:53:07
	9*9*9*9 contexts	2009-09-09 17:53:13
<toffer>	just 9 ?	2009-09-09 17:53:33
<Shelwien>	3.1M->2.1M calgary.tar compression	2009-09-09 17:53:39
	matching bits	2009-09-09 17:53:49
	0..8	2009-09-09 17:53:52
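A minimal Python sketch of the context described above, under stated assumptions. `matching_msbs(a, b)` counts the equal leading bits of two bytes (0..8), which for a != b equals 7 - floor(log2(a ^ b)); a 256-entry LUT stands in for the log2/bsr, as in the log. The log does not spell out how the order-4 context is assembled, so this assumes the value coded at each step is the match count between the current byte and the previous one, and that the previous four match counts form a base-9 context index (9^4 = 6561 contexts). The names `MSB_LUT`, `matching_msbs` and `msb_stream` are hypothetical.

```python
from typing import Iterator, Tuple

# 256-entry lookup table: number of equal leading bits for x = a ^ b.
# x == 0 gives 8; otherwise 8 - bit_length(x) == 7 - floor(log2(x)).
MSB_LUT = [8 - x.bit_length() for x in range(256)]


def matching_msbs(a: int, b: int) -> int:
    """Count matching most-significant bits of two bytes (0..8)."""
    return MSB_LUT[(a ^ b) & 0xFF]


def msb_stream(data: bytes, order: int = 4) -> Iterator[Tuple[int, int]]:
    """Yield (context_index, match_count) for every byte of data.

    Assumptions (not from the original log): the byte before the start is 0,
    and missing history entries count as 0 matching bits.
    """
    history = [0] * order  # most recent match count first
    prev = 0
    for c in data:
        m = matching_msbs(c, prev)
        ctx = 0
        for h in history:  # base-9 digits -> index in range(9 ** order)
            ctx = ctx * 9 + h
        yield ctx, m
        history = [m] + history[:-1]
        prev = c
```

A static coder would then keep one fixed distribution of match counts per context index and send the low bits separately. For m < 8 the bit right after the matching prefix is known to differ from the previous byte, so only the remaining 7 - m bits carry new information; that part is not shown.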
<toffer>	ok	2009-09-09 17:54:39
<Shelwien>	have to do more tests	2009-09-09 17:54:53
	and maybe extend the context	2009-09-09 17:54:58
<toffer>	i guess that kind of context quantisation is well suited for redundant data	2009-09-09 17:55:03
<Shelwien>	but i think this would be usable	2009-09-09 17:55:08
<toffer>	i mean directly	2009-09-09 17:55:25
<Shelwien>	the whole point is	2009-09-09 17:55:26
	to compress redundant data	2009-09-09 17:55:31
<toffer>	not as a generator	2009-09-09 17:55:37
<Shelwien>	and to not expand anything	2009-09-09 17:55:39
	and it has to be a static model	2009-09-09 17:55:48
	and that's the idea i've got	2009-09-09 17:56:02
	maybe you can suggest something else to apply in this case?	2009-09-09 17:56:54
<toffer>	well i cannot imagine anything which would be that fast	2009-09-09 17:57:34
	since it's just a lookup	2009-09-09 17:57:39
<Shelwien>	yeah, but i'm talking about the model	2009-09-09 17:57:55
	do you have any alternative ideas for a model	2009-09-09 17:58:12
	which would be static	2009-09-09 17:58:20
	would allow some compression sometimes	2009-09-09 17:58:31
	and would not significantly expand anything	2009-09-09 17:58:39
	despite being static	2009-09-09 17:58:43
<toffer>	some alphabet decomposition based on prefix codes (e.g. huffman)	2009-09-09 17:59:27
	would hardly expand anything	2009-09-09 17:59:38
	and provide some compression	2009-09-09 17:59:47
<Shelwien>	well, obviously i plan to use huffman with this coding	2009-09-09 17:59:54
	but plain static huffman won't work	2009-09-09 18:00:05
<toffer>	not static	2009-09-09 18:00:09
	dynamic	2009-09-09 18:00:14
<Shelwien>	not static can't be used in this case	2009-09-09 18:00:21
<toffer>	but that's still a two pass process	2009-09-09 18:00:23
	you can store the tree	2009-09-09 18:00:36
<Shelwien>	as i need encoded block hashes in different files	2009-09-09 18:00:39
	to match	2009-09-09 18:00:41
	(for equal substrings)	2009-09-09 18:01:06
<toffer>	it's for your diff?	2009-09-09 18:01:15
<Shelwien>	for all of it	2009-09-09 18:01:24
	i've started writing it now	2009-09-09 18:01:33
<toffer>	well storing a huffman tree would be bad for a diff	2009-09-09 18:01:40
<Shelwien>	because fma-delta needs a data window	2009-09-09 18:01:45
<toffer>	but still be acceptable for compression	2009-09-09 18:01:47
<Shelwien>	and more data would fit into the window in compressed form	2009-09-09 18:02:01
	and i need this for better hashing of redundant data anyway	2009-09-09 18:02:23
<toffer>	what about reusing unused symbols?	2009-09-09 18:02:51
<Shelwien>	diff just won't work with a stored huffman tree ;)	2009-09-09 18:02:52
<toffer>	that's why i asked for the application	2009-09-09 18:03:05
<Shelwien>	and LZ-like algos won't work either ;)	2009-09-09 18:03:12
<toffer>	or extending the alphabet to 9 bit and do some ngram replacement	2009-09-09 18:03:18
<Shelwien>	won't work	2009-09-09 18:03:30
<toffer>	why?	2009-09-09 18:03:35
	mh well ok for diffing it won't	2009-09-09 18:03:59
<Shelwien>	ngram replacement might help, but there're plans to use such filters separately from FMA engine anyway	2009-09-09 18:05:09
	(FMA = far match analysis)	2009-09-09 18:05:19
	and shrinking the alphabet	2009-09-09 18:05:44
	won't work because some files would have full alphabet	2009-09-09 18:05:56
	and the same substrings	2009-09-09 18:06:00
<toffer>	i cannot imagine anything atm	2009-09-09 18:07:55
<Shelwien>	why, there's a lot	2009-09-09 18:08:10
<toffer>	at least nothing which isn't adaptive	2009-09-09 18:08:11
<Shelwien>	for example, MTF can be applicable	2009-09-09 18:08:16
	with some restrictions	2009-09-09 18:08:29
<toffer>	but mtf is adaptive	2009-09-09 18:08:46
<Shelwien>	yeah, but its adaptivity can be contained in a small window	2009-09-09 18:09:03
	and i only need such a coding	2009-09-09 18:09:36
	that in equal 512-byte blocks in different files	2009-09-09 18:09:52
	hashes of at least one 256-byte substring would match	2009-09-09 18:10:05
	but mtf has a different problem	2009-09-09 18:10:44
	i don't know how to prevent it from being redundant on random data ;)	2009-09-09 18:11:05
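One way to read "its adaptivity can be contained in a small window": rebuild the MTF list from only the last `window` bytes, so the code for a byte depends on nothing older. Two files sharing a substring longer than the window then produce identical codes for that substring once its first `window` bytes have been seen. This is a hypothetical construction (`windowed_mtf`, the escape ordering and `window=16` are all assumptions), and it does nothing about the expansion on random data mentioned above.

```python
from typing import List


def _recent_symbols(buf, i: int, window: int) -> List[int]:
    """Distinct symbols of buf[i-window:i], most recently seen first."""
    recent: List[int] = []
    for b in reversed(buf[max(0, i - window):i]):
        if b not in recent:
            recent.append(b)
    return recent


def windowed_mtf(data: bytes, window: int = 16) -> List[int]:
    """MTF ranks whose state depends only on the previous `window` bytes."""
    out = []
    for i, c in enumerate(data):
        recent = _recent_symbols(data, i, window)
        if c in recent:
            out.append(recent.index(c))
        else:
            # Escape: symbols absent from the window follow in plain byte order.
            rest = [s for s in range(256) if s not in recent]
            out.append(len(recent) + rest.index(c))
    return out


def windowed_mtf_decode(codes: List[int], window: int = 16) -> bytes:
    """Inverse of windowed_mtf: the decoder rebuilds the same window state."""
    out = bytearray()
    for i, r in enumerate(codes):
        recent = _recent_symbols(out, i, window)
        if r < len(recent):
            out.append(recent[r])
        else:
            rest = [s for s in range(256) if s not in recent]
            out.append(rest[r - len(recent)])
    return bytes(out)
```

Every rank stays in 0..255, so the output is still one symbol per input byte; whether a static entropy coder on top of these ranks avoids expanding random data is exactly the open problem from the log.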
<toffer>	maybe you should restate the exact requirements	2009-09-09 18:12:11
<Shelwien>	i need a model, which would provide some compression for redundant data	2009-09-09 18:13:10
	and won't expand random etc data	2009-09-09 18:13:18
	and the codes of a substring in different files' encodings	2009-09-09 18:14:02
	still have to match if the substrings match	2009-09-09 18:14:11
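Setting the compression step aside, the 512/256 requirement stated earlier can be met the way rsync-style tools do it (an assumption about approach, not Shelwien's FMA code): hash the 256-byte windows of one file only at offsets that are multiples of 256, and scan the other file with a rolling hash at every offset. Any 512-byte substring of the first file fully contains one aligned 256-byte window, so an equal 512-byte block in the second file is guaranteed to hit it. The polynomial hash parameters and function names below are arbitrary choices.

```python
from typing import Dict, List, Tuple

BASE = 257
MOD = (1 << 61) - 1  # Mersenne prime modulus for the polynomial hash


def window_hashes(data: bytes, w: int) -> List[int]:
    """hashes[j] = polynomial hash of data[j:j+w], updated in O(1) per step."""
    if len(data) < w:
        return []
    top = pow(BASE, w - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in data[:w]:
        h = (h * BASE + b) % MOD
    hashes = [h]
    for i in range(w, len(data)):
        h = ((h - data[i - w] * top) * BASE + data[i]) % MOD
        hashes.append(h)
    return hashes


def find_shared_windows(a: bytes, b: bytes, w: int = 256) -> List[Tuple[int, int]]:
    """Return (offset_in_a, offset_in_b) for equal w-byte windows,
    with a indexed only at multiples of w and b scanned at every offset."""
    ha = window_hashes(a, w)
    index: Dict[int, List[int]] = {}
    for j in range(0, len(ha), w):
        index.setdefault(ha[j], []).append(j)
    matches = []
    for k, h in enumerate(window_hashes(b, w)):
        for j in index.get(h, []):
            if a[j:j + w] == b[k:k + w]:  # verify, hashes can collide
                matches.append((j, k))
    return matches
```

The asymmetry is the design choice: only one side needs the cheap aligned index, and only the scanning side pays for a hash per byte.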
*** pinc has joined the channel		2009-09-09 18:15:48
<toffer>	gonna leave now. back again later on	2009-09-09 18:23:00
	bye	2009-09-09 18:23:02
*** toffer has left the channel		2009-09-09 18:23:06
<Shelwien>	;)	2009-09-09 18:23:11
*** asmodean has left the channel		2009-09-09 18:48:00
*** pinc has left the channel		2009-09-09 18:48:00
*** Simon|B has left the channel		2009-09-09 18:48:00
*** Shelwien has left the channel		2009-09-09 18:48:00
*** osman has left the channel		2009-09-09 18:48:00
*** Shelwien has joined the channel		2009-09-09 18:48:01
*** pinc has joined the channel		2009-09-09 18:48:01
*** Simon|B has joined the channel		2009-09-09 18:48:01
*** osman has joined the channel		2009-09-09 18:48:01
*** asmodean has joined the channel		2009-09-09 18:48:01
* ChanServ This channel has been registered with ChanServ.		2009-09-09 18:48:01
<osman>	hi shelwien	2009-09-09 19:11:17
	seems i have found something weird again :)	2009-09-09 19:11:27
	you know pattern matching is an important part of an archiver	2009-09-09 19:11:56
	so, i've worked on it.	2009-09-09 19:12:03
	but at some point i realized that actually we can't easily do it, because unicode encodings are variable-length, so we can't treat strings as plain arrays	2009-09-09 19:12:43
	to confirm my idea, i looked at sami's fnmatch and the 7-zip wildcard source	2009-09-09 19:13:21
	they all "assume" that strings are basically arrays and that each array element represents a single character	2009-09-09 19:14:15
	so in the end, both 7zip and sami's code should fail on asian languages with "?" wildcards %)	2009-09-09 19:14:59
	what do you think about it?	2009-09-09 19:29:56
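A sketch of the point about "?": if the matcher walks code points instead of UTF-16 code units (or UTF-8 bytes), "?" consumes a whole character even outside the BMP. Python `str` is already a sequence of code points, so the classic iterative `*`/`?` matcher below works on characters. `wildcard_match` is a hypothetical name, and it ignores case folding, path separators and 8.3 short names.

```python
def wildcard_match(pattern: str, name: str) -> bool:
    """'?' matches exactly one code point, '*' matches any run (incl. empty)."""
    p = n = 0
    star_p, star_n = -1, 0  # position of last '*' and where it started matching
    while n < len(name):
        if p < len(pattern) and (pattern[p] == "?" or pattern[p] == name[n]):
            p += 1
            n += 1
        elif p < len(pattern) and pattern[p] == "*":
            star_p, star_n = p, n  # try matching '*' against the empty string
            p += 1
        elif star_p != -1:
            star_n += 1  # backtrack: let the last '*' swallow one more character
            p, n = star_p + 1, star_n
        else:
            return False
    while p < len(pattern) and pattern[p] == "*":
        p += 1  # trailing stars match the empty remainder
    return p == len(pattern)
```

Run on the same name split into UTF-16 code units (six units for three non-BMP characters), "???" no longer matches, which is the failure described above.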
<Shelwien>	there's probably a lot of other problems anyway	2009-09-09 19:39:28
	like sami's works imho don't support filename shortcuts like PROGRA~1 for "Program Files"	2009-09-09 19:40:04
	don't remember about nz, but "archiver template" doesn't for sure	2009-09-09 19:40:29
	also, i don't think that console archivers actually need anything more complex than *.exe	2009-09-09 19:41:37
<osman>	imagine someone wants to archive only files with 3-letter names; they will surely use "???" as the pattern	2009-09-09 19:46:46
<Shelwien>	yeah, you can imagine anything, but did you ever use something like that? ;)	2009-09-09 19:47:31
<osman>	but in asian languages unicode codepoints are sometimes > 0xFFFF, so both "archiver template" and 7zip will fail to match correctly	2009-09-09 19:47:52
	you are right. i didn't. but what if someone does? ;)	2009-09-09 19:48:16
	i wouldn't call that as "unicode" support	2009-09-09 19:48:28
<Shelwien>	there're GUIs etc anyway	2009-09-09 19:48:29
	which normally don't have such features at all ;)	2009-09-09 19:48:45
<osman>	even winrar can fail in that area ;)	2009-09-09 19:48:49
<Shelwien>	whatever	2009-09-09 19:49:11
	i'm just trying to say that building a perfect pattern matcher	2009-09-09 19:49:20
	might be not practical	2009-09-09 19:49:25
<osman>	because i didn't see any special string handling in the unrar source. afair, filenames are stored as UTF-16 in the archive	2009-09-09 19:49:50
<Shelwien>	at least, if it'd slow down the file enumeration for more common patterns	2009-09-09 19:49:59
	but well	2009-09-09 19:50:38
	if we're gonna work with utf8 anyway	2009-09-09 19:50:46
	then supporting this makes sense ;)	2009-09-09 19:51:03
<osman>	yeah. don't forget, i'm working on both linux and windows simultaneously now.	2009-09-09 19:51:25
	so, i'm considering both utf-8 and utf-16	2009-09-09 19:51:38
<Shelwien>	why?	2009-09-09 19:51:49
<osman>	for taking some ideas, i have just downloaded linux kernel %)	2009-09-09 19:51:53
<Shelwien>	just convert utf-16 to utf-8	2009-09-09 19:51:56
<osman>	i realized that working with utf-8 can be a high overload	2009-09-09 19:52:29
	so, i'll use utf-16 under windows and utf-8 under posix compliant OSes	2009-09-09 19:52:46
	for only internal representation	2009-09-09 19:53:13
	but, in archive data etc, i'll always use utf-8	2009-09-09 19:53:33
	"my heart will go on utf-8" :)	2009-09-09 19:53:50
<Shelwien>	what kind of "overload"?	2009-09-09 19:54:05
	i don't think that utf8-utf16 conversion would be any slower than wcscpy	2009-09-09 19:54:54
<osman>	conversion on API calls and checking surrogates for ensuring character length	2009-09-09 19:54:59
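For the UTF-16 side, counting characters without decoding pairs only needs to skip low (trailing) surrogates, 0xDC00..0xDFFF, since each non-BMP code point is stored as one high plus one low surrogate. A sketch assuming well-formed UTF-16 input; the helper names are hypothetical.

```python
from typing import List


def utf16_units(s: str) -> List[int]:
    """Split a string into UTF-16 code units (little-endian encode, then pair bytes)."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def utf16_codepoint_count(units: List[int]) -> int:
    """Count code points: every unit except a low surrogate starts a character."""
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)
```

For example, U+1F600 becomes the pair 0xD83D, 0xDE00, so "a\U0001F600b" is 4 units but 3 code points.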
<Shelwien>	dunno	2009-09-09 19:55:18
	i think that utf8 would be actually faster as it would be more compact	2009-09-09 19:55:32
<osman>	ahhh...actually even my str length function is wrong now %)	2009-09-09 19:55:44
	seems using two different representations could cause a real "headache" %)	2009-09-09 19:56:13
*** pinc|mirror has joined the channel		2009-09-09 19:56:13
<Shelwien>	its very easy to count symbols in utf8 strings	2009-09-09 19:56:19
	as you can just ignore some codes	2009-09-09 19:56:35
<osman>	do you know a "shortcut"?	2009-09-09 19:56:36
<Shelwien>	?	2009-09-09 19:56:48
<osman>	i mean a easy way	2009-09-09 19:57:02
	without handling surrogates	2009-09-09 19:57:10
<Shelwien>	as i said... in utf8 it seems simple	2009-09-09 19:57:22
<osman>	more preciesly less branches	2009-09-09 19:57:26
<Shelwien>	just ignore the 10xxxxxx codes	2009-09-09 19:57:41
*** pinc has left the channel		2009-09-09 19:59:55
<osman>	len += ((c & 128) != 0) or something like that?	2009-09-09 20:00:28
*** pinc|mirror has left the channel		2009-09-09 20:00:53
<Shelwien>	not exactly	2009-09-09 20:01:29
	(c & 0xC0) != 0x80	2009-09-09 20:01:41
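Ignoring the 10xxxxxx continuation bytes amounts to counting every byte whose top two bits are not 10, i.e. `(c & 0xC0) != 0x80`: ASCII bytes (0xxxxxxx) and lead bytes (11xxxxxx) each start exactly one code point. A sketch assuming the input is valid UTF-8; `utf8_length` is a hypothetical name.

```python
def utf8_length(data: bytes) -> int:
    """Number of code points in valid UTF-8: count all non-continuation bytes."""
    # (c & 0xC0) != 0x80 is a bool; summing bools adds 1 per ASCII or lead byte.
    return sum((c & 0xC0) != 0x80 for c in data)
```

In C the same test is a single compare per byte, with no branching on how long each sequence is.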
<osman>	7zip has been frozen while extracting linux kernel %)	2009-09-09 20:03:27
<Shelwien>	?	2009-09-09 20:04:02
<osman>	i mean did not respond for a long time	2009-09-09 20:04:36
	btw, why do almost all archivers first extract files to temp and then move them to the actual extraction target?	2009-09-09 20:32:20
<Shelwien>	"all"?	2009-09-09 20:32:39
	freearc maybe, as it's weird	2009-09-09 20:32:50
	though as to reasons	2009-09-09 20:33:22
<osman>	7zip and rar do that too	2009-09-09 20:33:38
<Shelwien>	the destination file might exist	2009-09-09 20:33:39
	and if extracted file has the same name	2009-09-09 20:33:54
	but, for example, is broken	2009-09-09 20:34:04
	they make sure that it won't overwrite anything	2009-09-09 20:34:15
	or something	2009-09-09 20:34:21
<osman>	they can ask at least	2009-09-09 20:34:29
<Shelwien>	anyway, they extract stuff to tempfiles, yeah	2009-09-09 20:34:35
<osman>	this both doubles required time and disk space	2009-09-09 20:34:46
<Shelwien>	but i think they should create these tempfiles on the target drive	2009-09-09 20:34:49
	otherwise it takes too long to move the data	2009-09-09 20:35:11
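A sketch of "create these tempfiles on the target drive": write into a temp file in the destination directory, then rename it over the target. On the same filesystem `os.replace` is a rename rather than a copy (atomic on POSIX), and an existing file is only replaced once the new data is complete. `extract_via_tempfile` and its `chunks` argument (an iterable of decompressed byte blocks) are hypothetical.

```python
import contextlib
import os
import tempfile
from typing import Iterable


def extract_via_tempfile(target_path: str, chunks: Iterable[bytes]) -> None:
    """Write chunks to a temp file next to target_path, then move it into place."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Temp file lives on the same drive/filesystem as the target, so the final
    # move is a rename rather than a copy of all the data.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".extract-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, target_path)  # overwrites an existing target
    except BaseException:
        # Extraction failed: drop the partial file, leave any old target untouched.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

This keeps the "don't clobber a good file with a broken one" property without the extra copy across drives.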
<osman>	all of them create the tempfiles in the temp directory regardless of the target drive. so i always have to "clean" my C: drive	2009-09-09 20:35:48
<Shelwien>	dunno really	2009-09-09 20:36:35
<osman>	it's really annoying for me	2009-09-09 20:36:49
	i sometimes could not extract some iso files or dvd movies	2009-09-09 20:37:05
<Shelwien>	i still don't think that console rar works like that	2009-09-09 20:37:23
<osman>	it might not be	2009-09-09 20:37:40
<Shelwien>	...huh?! %)	2009-09-09 20:42:52
	seems that my msb coder compresses archives ;)	2009-09-09 20:43:25
	a little ;)	2009-09-09 20:43:28
<osman>	i mean console rar might not fit "extract to temp" rule	2009-09-09 20:43:43
	you mean even compressed data?	2009-09-09 20:43:53
<Shelwien>	well, original rar 269456 bytes	2009-09-09 20:44:10
	compressed 269003	2009-09-09 20:44:15
<osman>	for a static coder, it's very good IMO	2009-09-09 20:44:41
<Shelwien>	well, i suspect that's because of statistics	2009-09-09 20:45:09
<osman>	http://www.koders.com/c/fid856C2F4B1D04931B2005712C658E2DC3D181154E.aspx	2009-09-09 20:57:09
	seems nobody is perfect :/	2009-09-09 20:57:21
	this source also doesn't take utf-8's variable length into account	2009-09-09 20:58:03
<Shelwien>	...and nobody cares ;)	2009-09-09 20:58:28
*** Simon|B has left the channel		2009-09-09 20:59:21
<osman>	are you sure?	2009-09-09 21:03:23
	asian people are really angry with whoever developed the unicode set, because most of their characters are in the range > 0xFFFF	2009-09-09 21:04:04
<Shelwien>	not japanese i think ;)	2009-09-09 21:04:44
<osman>	if we consider that there are ~3 billion chinese. and considering whole world population is around ~5-6 billion. we should take care IMO :)	2009-09-09 21:04:47
<Shelwien>	its not that bad actually ;)	2009-09-09 21:05:29
<osman>	you know that most spoken language is actually chinese not english :)	2009-09-09 21:05:31
<Shelwien>	sure	2009-09-09 21:06:08
	english is not even second apparently ;)	2009-09-09 21:06:18
*** toffer has joined the channel		2009-09-09 21:12:40
<Shelwien>	toffer: i made the coder and it compresses book1 to ~570k	2009-09-09 21:17:25
	and what's more funny, it compresses archives %)	2009-09-09 21:17:37
<toffer>	hi	2009-09-09 21:19:07
	archives still have a header and stuff like this	2009-09-09 21:19:15
<Shelwien>	yeah	2009-09-09 21:19:26
	<osman> you mean even compressed data?	2009-09-09 21:19:35
	<Shelwien> well, original rar 269456 bytes	2009-09-09 21:19:35
	<Shelwien> compressed 269003	2009-09-09 21:19:35
<toffer>	that's just 400 bytes	2009-09-09 21:19:51
<Shelwien>	yeah, but its not expanded ;)	2009-09-09 21:20:07
	which is good ;)	2009-09-09 21:20:10
<osman>	then try to compress a 7zip or winrk archive :) afair, their headers are also compressed	2009-09-09 21:20:14
<Shelwien>	some m1*.7z	2009-09-09 21:21:06
	78510 -> 78159 ;)	2009-09-09 21:21:17
<osman>	hehe	2009-09-09 21:21:27
<toffer>	i'd only count that if it scales on large archives	2009-09-09 21:21:42
<Shelwien>	probably does, if there're lots of files	2009-09-09 21:22:01
	there's probably some small redundancy	2009-09-09 21:22:16
<toffer>	(if thre're not lots files in the header)	2009-09-09 21:22:19
<Shelwien>	like an rc stream start/end etc	2009-09-09 21:22:23
<toffer>	file names and stuff like that	2009-09-09 21:22:26
<Shelwien>	scales	2009-09-09 21:23:10
<osman>	what about your mkv video test? it's really hard to compress IMO	2009-09-09 21:23:30
<Shelwien>	3k difference on 10M zip archive	2009-09-09 21:23:31
	wow...	2009-09-09 21:24:25
<toffer>	and how much kb does zip save if you zip the zipfile again ... zip! :)	2009-09-09 21:24:31
<Shelwien>	23k on that mkv	2009-09-09 21:25:01
<osman>	hehe. it might outperform at least BIT :)	2009-09-09 21:25:23
<Shelwien>	well, some of that is certainly due to statistics volume	2009-09-09 21:26:24
	its not perfectly static yet	2009-09-09 21:26:36
	but things like 3k and 23k are certainly much larger than stats	2009-09-09 21:27:04
	i think that's because its able to detect compressible substrings	2009-09-09 21:28:02
	i mean, if there're not much msb matches in context, it just leaves it alone	2009-09-09 21:28:55
	seems like a not-too-bad algo for detection and maybe segmentation	2009-09-09 21:29:53
<osman>	do you use trunc(log2(c[i]^c) * k) or just trunc(log2(c[i]^c)) ?	2009-09-09 21:31:02
	i mean 9 contexts or more?	2009-09-09 21:31:22
<Shelwien>	"just" and i don't really use log2 at all ;)	2009-09-09 21:31:36
	there're 9^4 contexts	2009-09-09 21:31:47
<osman>	yep. last one is actually a bsr instruction :)	2009-09-09 21:31:55
<Shelwien>	LUT in my case	2009-09-09 21:32:05
<osman>	try bsr. it might help.... but maybe not. because, you have a single LUT and it can be highly cached	2009-09-09 21:32:46
<Shelwien>	actually i'd have a single LUT per whole context index	2009-09-09 21:33:11
	well, maybe	2009-09-09 21:33:27
	i mean, these *9 are not really good ;)	2009-09-09 21:34:02
	even if they're done via LEA's actually ;)	2009-09-09 21:34:20
<osman>	:)	2009-09-09 21:34:41
<Shelwien>	wonder if i should move the case bit to lsb or something %)	2009-09-09 21:36:41
<osman>	it might scale like before :)	2009-09-09 21:37:04
	because lsbs are mostly noisy	2009-09-09 21:37:15
<Shelwien>	i mean, A/a case	2009-09-09 21:37:24
<osman>	aa...ok. got it	2009-09-09 21:37:51
	it can help :)	2009-09-09 21:37:55
	just optimize your reorder for that :)	2009-09-09 21:38:16
<Shelwien>	i thought that too	2009-09-09 21:38:26
<osman>	it may be more helpful	2009-09-09 21:38:27
<Shelwien>	not reorder, just bit order in the byte ;)	2009-09-09 21:38:41
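What moving the case bit to the LSB would do to the matching-MSB contexts, as a sketch: in ASCII, upper and lower case letters differ only in bit 5 (0x20), so 'A' (0x41) and 'a' (0x61) share just 2 leading bits. Permuting the bits so that bit 5 becomes bit 0 (bits 0..4 shift up by one, bits 6..7 stay) makes them share 7. The exact permutation is an assumption; any invertible one that sends bit 5 low would do, and it also reorders non-letters, which a static model would have to live with.

```python
def case_bit_to_lsb(c: int) -> int:
    """Move bit 5 (ASCII case bit) to bit 0; bits 0..4 move up to 1..5."""
    return (c & 0xC0) | ((c & 0x1F) << 1) | ((c >> 5) & 1)


def case_bit_from_lsb(c: int) -> int:
    """Inverse permutation: bit 0 goes back to bit 5, bits 1..5 back to 0..4."""
    return (c & 0xC0) | ((c & 1) << 5) | ((c >> 1) & 0x1F)


def matching_msbs(a: int, b: int) -> int:
    """Equal leading bits of two bytes, 0..8."""
    return 8 - ((a ^ b) & 0xFF).bit_length()
```

After the permutation 'A' maps to 0x42 and 'a' to 0x43, so a case change only costs the last bit instead of breaking the match after two bits.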
<osman>	if you are not as lazy as me, then why not? :)	2009-09-09 21:39:09
	i would probably start reorder optimization and sleep after that :)	2009-09-09 21:39:26
<Shelwien>	well, i'd do that	2009-09-09 21:39:29
	i'd have to convert it to huffman anyway	2009-09-09 21:40:00
<osman>	btw, i realized that actually GCC comes from another dimension of the space %) it won't compile most of my sources %)	2009-09-09 21:42:14
	*it doesn't compile	2009-09-09 21:42:24
<Shelwien>	yeah	2009-09-09 21:42:35
	the main problem is that it not only has a whole different runtime library	2009-09-09 21:42:54
	but also has some annoying C++ syntax incompatibilities	2009-09-09 21:43:10
<osman>	yep. definitely	2009-09-09 21:43:24
	probably i'll use intelc for posix platforms in the end %)	2009-09-09 21:43:57
<Shelwien>	yeah, might be a good idea	2009-09-09 21:44:12
	though i didn't hear about IC for freebsd	2009-09-09 21:44:27
<osman>	freebsd is posix compliant too. if i could even "execute" some simple command in freebsd, i would test my linux compile in there	2009-09-09 21:45:09
	freebsd is a real nightmare	2009-09-09 21:45:19
	it eventually crashes after starting GUI	2009-09-09 21:45:39
	i can't use it in vmware	2009-09-09 21:46:15
	i can only see the prompt	2009-09-09 21:46:27
<Shelwien>	well, its vmware problem, not freebsd's	2009-09-09 21:46:41
<osman>	most of commands are incompatible with linux distros'	2009-09-09 21:46:45
	if i could not run it, then i can't test it right? :) so, it doesn't matter it's about vmware or not	2009-09-09 21:47:30
	:)	2009-09-09 21:47:33
	seems i'll start to test macos x :)	2009-09-09 21:48:22
	it's posix compliant too :)	2009-09-09 21:48:34
	"In UTF-8, characters outside the basic multilingual plane are not a special case. UTF-16 is often mistaken to be the obsolete constant-length UCS-2 encoding, leading to code that works for most text but suddenly fails for non-BMP characters. It's better to implement support for the entire range of Unicode from the start."	2009-09-09 22:12:10
	from Wikipedia :)	2009-09-09 22:12:15
*** toffer has left the channel		2009-09-09 22:13:33
	"...Japanese and the Korean UTF-8 article on Wikipedia take more space if saved as UTF-16 than the original UTF-8 version" i think this is a really good reason to use utf8 :)	2009-09-09 22:15:18
<Shelwien>	err... i think many things take more space in utf-16 than in utf-8 ;)	2009-09-09 22:26:15
<osman>	but, considering asian languages... it is a bit of a surprise to see utf-8 is more compact	2009-09-09 22:26:54
<Shelwien>	you know, there're spaces and stuff too	2009-09-09 22:27:44
<osman>	yep. that's the point in here actually :)	2009-09-09 22:28:14
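A quick way to check the spaces-and-markup point: BMP CJK characters take 3 bytes in UTF-8 versus 2 in UTF-16, but ASCII takes 1 versus 2, so mixed text often comes out smaller in UTF-8. The sample strings in the test are arbitrary.

```python
from typing import Tuple


def encoded_sizes(s: str) -> Tuple[int, int]:
    """Return (utf8_bytes, utf16_bytes) for s, UTF-16 counted without a BOM."""
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))
```

Pure "日本語" is 9 bytes in UTF-8 and 6 in UTF-16, but "日本語 text" is already 14 versus 16, and wrapping it as "<p>日本語</p>" gives 16 versus 20.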
*** toffer has joined the channel		2009-09-09 22:41:37
*** toffer has left the channel		2009-09-09 23:45:13
<Shelwien>	!next	2009-09-09 23:55:00