*** complogger has joined the channel		2009-09-28 20:24:37
<Shelwien>	!grep toffer>.*m1	2009-09-28 20:24:42
*** pinc has left the channel		2009-09-28 21:00:24
<toffer>	gn8	2009-09-28 22:06:29
*** toffer has left the channel		2009-09-28 22:06:32
*** BitGenix has joined the channel		2009-09-29 06:34:35
<BitGenix>	does anyone speak Spanish?	2009-09-29 06:36:23
*** BitGenix has left the channel		2009-09-29 06:39:19
*** toffer has joined the channel		2009-09-29 08:30:01
<toffer>	cheers	2009-09-29 08:30:06
*** toffer has left the channel		2009-09-29 09:01:22
*** pinc has joined the channel		2009-09-29 09:01:39
*** Shelwien has left the channel		2009-09-29 10:17:59
*** Shelwien has joined the channel		2009-09-29 12:57:15
*** Shelwien has left the channel		2009-09-29 13:06:33
*** pinc|mirror has joined the channel		2009-09-29 14:13:06
*** pinc has left the channel		2009-09-29 14:13:06
*** pinc|mirror has left the channel		2009-09-29 14:46:07
*** pinc has joined the channel		2009-09-29 17:45:21
*** pinc has left the channel		2009-09-29 20:47:31
*** pinc has joined the channel		2009-09-30 08:17:11
*** toffer has joined the channel		2009-09-30 10:17:27
	hi	2009-09-30 10:17:31
<pinc>	hi	2009-09-30 11:19:54
<toffer>	no activity this time	2009-09-30 12:22:28
	-.-	2009-09-30 12:22:32
<pinc>	shelwien is absent )	2009-09-30 12:25:46
*** Shelwien has joined the channel		2009-09-30 12:47:53
*** toffer has left the channel		2009-09-30 13:01:15
*** LELLO has joined the channel		2009-09-30 14:23:28
<LELLO>	ciao	2009-09-30 14:23:33
<Shelwien>	hi?	2009-09-30 14:23:41
*** LELLO has left the channel		2009-09-30 14:24:03
*** pinc has left the channel		2009-09-30 15:25:26
*** toffer has joined the channel		2009-09-30 18:30:08
<toffer>	hi again	2009-09-30 18:30:16
<Shelwien>	hi	2009-09-30 18:59:02
*** osman has joined the channel		2009-09-30 19:29:11
<osman>	hello everyone :)	2009-09-30 19:29:21
<Shelwien>	hi	2009-09-30 19:29:29
<osman>	how are you?	2009-09-30 19:29:36
<Shelwien>	added a search function here a few days ago	2009-09-30 19:30:00
	!grep osman>.*ccm_sh	2009-09-30 19:30:09
<osman>	:)	2009-09-30 19:30:28
	nice	2009-09-30 19:30:30
	christian should not see that feature ;)	2009-09-30 19:31:03
<Shelwien>	it only prints the first 10 occurrences in the channel though	2009-09-30 19:31:11
	but supports regexps ;)	2009-09-30 19:31:17
<osman>	regexp!? %)	2009-09-30 19:31:31
	i always have some problems interpreting them	2009-09-30 19:31:46
<Shelwien>	pay more attention to the command i used ;)	2009-09-30 19:31:52
	.* means any number (including zero) of any characters	2009-09-30 19:32:10
<osman>	is there a quick n' dirty manual to explain regexp? i always have problems with them	2009-09-30 19:33:00
<Shelwien>	i guess http://en.wikipedia.org/wiki/Regexp	2009-09-30 19:34:30
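To illustrate the search command above: a minimal Python sketch of a log grep that takes a regexp and returns only the first 10 matching lines. The function name `grep_log` and the sample lines are invented for illustration; this is not the actual bot code, which is not shown in the channel.

```python
import re

def grep_log(lines, pattern, limit=10):
    """Return at most `limit` lines that match the regexp `pattern`."""
    rx = re.compile(pattern)
    hits = []
    for line in lines:
        # search() looks for the pattern anywhere in the line
        if rx.search(line):
            hits.append(line)
            if len(hits) == limit:
                break
    return hits

# Hypothetical log lines, made up for this example.
sample_log = [
    "<osman>\tccm_sh is ready",
    "<toffer>\tm1 update",
    "<osman>\thello everyone :)",
    "<Shelwien>\tosman> said ccm_sh",
]

# "osman>" followed by any characters, then "ccm_sh"
for hit in grep_log(sample_log, r"osman>.*ccm_sh"):
    print(hit)
```

In the pattern, `osman>` pins the speaker and `.*` skips over whatever sits between the nick and the keyword.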
<osman>	:)	2009-09-30 19:34:37
	is there any news from "your side"?	2009-09-30 19:35:40
<Shelwien>	<Nishi> meanwhile, i made some progress with that fma LZ yesterday	2009-09-30 19:37:53
	<Nishi> one interesting thing	2009-09-30 19:37:53
	<Nishi> is that i found two different 100-byte strings in enwik9	2009-09-30 19:37:53
	<Nishi> which have the same crc32 ;)	2009-09-30 19:37:53
	<Nishi> fixed a defect in my hashing algorithm thanks to that ;)	2009-09-30 19:37:53
	<Nishi> hash collision basically ;)	2009-09-30 19:37:54
	<Nishi> and what i'm doing now is not diff	2009-09-30 19:37:56
	<Nishi> its more like rep probably	2009-09-30 19:37:58
	<Nishi> but what i want to get in the end is different from rep too	2009-09-30 19:38:00
	<Nishi> its like incremental LZ or something	2009-09-30 19:38:02
	<Nishi> an "archive" with literal data	2009-09-30 19:38:04
	<Nishi> and files are converted to match structure with references to archive	2009-09-30 19:38:06
	<Nishi> (and all new literal data from files are added to the archive)	2009-09-30 19:38:08
	<Nishi> so in a way its like diff... or like rep... but is basically something new (maybe) ;)	2009-09-30 19:38:10
<osman>	good news :)	2009-09-30 19:39:00
	btw, afair crc32 has a really good distribution over the whole range. and finding different strings for equal crc values is not an easy task	2009-09-30 19:40:05
	especially it's really hard for me to find in "natural" data (like natural text)	2009-09-30 19:41:34
<Shelwien>	yeah, but its a fact that enwik8 contains at least two 100-byte strings with the same crc32 ;)	2009-09-30 19:42:34
	but actually that collision in enwik9	2009-09-30 19:43:45
	allowed me to fix a bug in my hashing algo	2009-09-30 19:43:57
<osman>	the more interesting thing is that they are also the same length %)	2009-09-30 19:43:58
<Shelwien>	they'd not match otherwise	2009-09-30 19:44:20
	thats where all fun is	2009-09-30 19:44:30
<osman>	did you ever consider using the native CRC32 instruction? :)	2009-09-30 19:45:05
<Shelwien>	actually my "fragment hash" contains anchor hash (32 bits but 7 of these are 0), fragment hash (plain crc32 which matched there) and fragment length	2009-09-30 19:45:28
	but there was a bug-feature	2009-09-30 19:45:50
	the rolling hash at the end of fragment was used for anchor	2009-09-30 19:46:14
<osman>	what do you "exactly" mean by "anchor hash"?	2009-09-30 19:47:22
<Shelwien>	so if the fragment hash matched, it was really easy for the whole hash to match	2009-09-30 19:47:28
	err... did you see fma-diff (its in the topic)?	2009-09-30 19:48:01
	...	2009-09-30 19:57:20
	and as to idea of anchored hashing...	2009-09-30 19:57:59
	various block hashing is used in various circumstances, right?	2009-09-30 19:58:16
	like in torrent files?	2009-09-30 19:58:19
	but to find a matching block in a file by torrent hashes	2009-09-30 19:59:02
	(eg. a different version of the file for which torrent was generated)	2009-09-30 19:59:20
	we'd have to compute the hashes for all the blocks with the same size in the file	2009-09-30 20:00:09
	with all bytewise offsets	2009-09-30 20:00:13
	which is slow	2009-09-30 20:01:04
	and completely impossible if we'd like to find matching blocks just by two torrent files, without the actual data	2009-09-30 20:01:47
	so the idea is to use a different approach instead of fixed size blocks at fixed offsets	2009-09-30 20:02:34
	specifically, to split the file into blocks by some properties of data itself	2009-09-30 20:03:18
	and now, my "anchor hashes" are that "data property"	2009-09-30 20:04:24
	currently it's a rolling crc32 with 7 zero LSBs	2009-09-30 20:04:47
	so, the file is split into hashed blocks by these anchor hashes	2009-09-30 20:05:40
	and if ~300 bytes match in two files, that means at least one block hash should match in their hashtables (with 7 zero bits, anchors are ~128 bytes apart on average)	2009-09-30 20:07:13
	so long matches can be found without looking up stuff at each byte	2009-09-30 20:07:50
	and even without having the actual content to match against	2009-09-30 20:08:07
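The anchored hashing idea described above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the fma-diff code: the 16-byte window is an assumed value (the chat does not give one), the window CRC is recomputed at every position instead of being rolled, and each block is keyed only by its crc32 and length. Blocks before the first and after the last anchor are ignored, since they depend on where the file starts and ends.

```python
import random
import zlib

WINDOW = 16   # assumed window length for the anchor hash (not given in the chat)
MASK = 0x7F   # anchor condition from the chat: 7 zero low bits

def anchors(data):
    """End positions whose preceding WINDOW bytes hash to 7 zero low bits."""
    # Recomputing crc32 per window is slow; a real rolling hash would update it.
    return [end for end in range(WINDOW, len(data) + 1)
            if zlib.crc32(data[end - WINDOW:end]) & MASK == 0]

def block_hashes(data):
    """Set of (crc32, length) for every block between two consecutive anchors."""
    cuts = anchors(data)
    return {(zlib.crc32(data[a:b]), b - a) for a, b in zip(cuts, cuts[1:])}

def random_bytes(rng, n):
    return bytes(rng.getrandbits(8) for _ in range(n))

# Two made-up "files" sharing a 4000-byte run at different offsets.
rng = random.Random(2009)
shared = random_bytes(rng, 4000)
file_a = random_bytes(rng, 500) + shared + random_bytes(rng, 300)
file_b = random_bytes(rng, 777) + shared

# Only the hash sets are compared; the file contents are not needed here.
common = block_hashes(file_a) & block_hashes(file_b)
print(len(common), "block hashes in common")
```

Because the anchors depend only on the bytes in the window, the shared run is cut at the same places in both files regardless of its offset, so the blocks inside it produce identical hashes.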
*** toffer has left the channel		2009-09-30 21:28:55
*** osman has left the channel		2009-09-30 21:29:02
	!next	2009-09-30 21:38:36