*** complogger has joined the channel2009-09-28 20:24:37
<Shelwien> !grep toffer>.*m12009-09-28 20:24:42
*** pinc has left the channel2009-09-28 21:00:24
<toffer> gn82009-09-28 22:06:29
*** toffer has left the channel2009-09-28 22:06:32
*** BitGenix has joined the channel2009-09-29 06:34:35
<BitGenix> does anyone speak Spanish?2009-09-29 06:36:23
*** BitGenix has left the channel2009-09-29 06:39:19
*** toffer has joined the channel2009-09-29 08:30:01
<toffer> cheers2009-09-29 08:30:06
*** toffer has left the channel2009-09-29 09:01:22
*** pinc has joined the channel2009-09-29 09:01:39
*** Shelwien has left the channel2009-09-29 10:17:59
*** Shelwien has joined the channel2009-09-29 12:57:15
*** Shelwien has left the channel2009-09-29 13:06:33
*** pinc|mirror has joined the channel2009-09-29 14:13:06
*** pinc has left the channel2009-09-29 14:13:06
*** pinc|mirror has left the channel2009-09-29 14:46:07
*** pinc has joined the channel2009-09-29 17:45:21
*** pinc has left the channel2009-09-29 20:47:31
*** pinc has joined the channel2009-09-30 08:17:11
*** toffer has joined the channel2009-09-30 10:17:27
 hi2009-09-30 10:17:31
<pinc> hi2009-09-30 11:19:54
<toffer> no activity this time2009-09-30 12:22:28
 -.-2009-09-30 12:22:32
<pinc> shelwien is absent )2009-09-30 12:25:46
*** Shelwien has joined the channel2009-09-30 12:47:53
*** toffer has left the channel2009-09-30 13:01:15
*** LELLO has joined the channel2009-09-30 14:23:28
<LELLO> ciao (hi)2009-09-30 14:23:33
<Shelwien> hi?2009-09-30 14:23:41
*** LELLO has left the channel2009-09-30 14:24:03
*** pinc has left the channel2009-09-30 15:25:26
*** toffer has joined the channel2009-09-30 18:30:08
<toffer> hi again2009-09-30 18:30:16
<Shelwien> hi2009-09-30 18:59:02
*** osman has joined the channel2009-09-30 19:29:11
<osman> hello everyone :)2009-09-30 19:29:21
<Shelwien> hi2009-09-30 19:29:29
<osman> how are you?2009-09-30 19:29:36
<Shelwien> added a search function here a few days ago2009-09-30 19:30:00
 !grep osman>.*ccm_sh2009-09-30 19:30:09
<osman> :)2009-09-30 19:30:28
 nice2009-09-30 19:30:30
 christian should not see that feature ;)2009-09-30 19:31:03
<Shelwien> it only prints the first 10 occurrences in the channel though2009-09-30 19:31:11
 but supports regexps ;)2009-09-30 19:31:17
<osman> regexp!? %)2009-09-30 19:31:31
 i always have some problems interpreting them2009-09-30 19:31:46
<Shelwien> pay more attention to the command i used ;)2009-09-30 19:31:52
 .* means any number of any symbols2009-09-30 19:32:10
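A minimal Python sketch of how a `!grep`-style search could behave, assuming the bot applies the pattern with `re.search` anywhere in each logged line and stops after the first 10 hits. The log only says that it prints the first 10 occurrences and supports regexps; the helper name `grep_log` is hypothetical. The sample lines are taken from this log.

```python
import re

def grep_log(lines, pattern, limit=10):
    """Return up to `limit` lines in which `pattern` matches somewhere (hypothetical helper)."""
    rx = re.compile(pattern)
    hits = []
    for line in lines:
        if rx.search(line):          # search, not match: the pattern may start anywhere in the line
            hits.append(line)
            if len(hits) == limit:   # mimic "only prints the first 10 occurrences"
                break
    return hits

log = [
    "<osman> hello everyone :)",
    "<Shelwien> hi",
    "<osman> how are you?",
    "<osman> :)",
]

# "osman>" must appear literally, ".*" then matches any run of characters
# (possibly an empty one), and "\)" is an escaped closing parenthesis.
print(grep_log(log, r"osman>.*:\)"))  # ['<osman> hello everyone :)', '<osman> :)']
```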
<osman> is there a quick n' dirty manual that explains regexps? i always have problems with them2009-09-30 19:33:00
<Shelwien> i guess http://en.wikipedia.org/wiki/Regexp2009-09-30 19:34:30
<osman> :)2009-09-30 19:34:37
 is there any news from "your side"?2009-09-30 19:35:40
<Shelwien> <Nishi> meanwhile, i made some progress with that fma LZ yesterday2009-09-30 19:37:53
 <Nishi> one interesting thing2009-09-30 19:37:53
 <Nishi> is that i found two different 100-byte strings in enwik92009-09-30 19:37:53
 <Nishi> which have the same crc32 ;)2009-09-30 19:37:53
 <Nishi> fixed a defect in my hashing algorithm thanks to that ;)2009-09-30 19:37:53
 <Nishi> hash collision basically ;)2009-09-30 19:37:54
 <Nishi> and what i'm doing now is not diff2009-09-30 19:37:56
 <Nishi> it's more like rep, probably2009-09-30 19:37:58
 <Nishi> but what i want to get in the end is different from rep too2009-09-30 19:38:00
 <Nishi> it's like incremental LZ or something2009-09-30 19:38:02
 <Nishi> an "archive" with literal data2009-09-30 19:38:04
 <Nishi> and files are converted to a match structure with references to the archive2009-09-30 19:38:06
 <Nishi> (and all new literal data from the files is added to the archive)2009-09-30 19:38:08
 <Nishi> so in a way it's like diff... or like rep... but it's basically something new (maybe) ;)2009-09-30 19:38:10
<osman> good news :)2009-09-30 19:39:00
 btw, afair crc32 has a really good distribution over the whole range, and finding different strings with equal crc values is not an easy task2009-09-30 19:40:05
 and it's especially hard to find them in "natural" data (like natural text)2009-09-30 19:41:34
<Shelwien> yeah, but it's a fact that enwik9 contains at least two 100-byte strings with the same crc32 ;)2009-09-30 19:42:34
 but actually that collision in enwik92009-09-30 19:43:45
 allowed me to fix a bug in my hashing algo2009-09-30 19:43:57
<osman> the more interesting thing is that they are also the same length %)2009-09-30 19:43:58
<Shelwien> they'd not match otherwise2009-09-30 19:44:20
 that's where all the fun is2009-09-30 19:44:30
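The disagreement about how hard crc32 collisions are can be put in numbers with the birthday bound. Assuming crc32 behaves like an idealized uniform 32-bit hash over n distinct inputs, the expected number of colliding pairs is n(n-1)/2 / 2^32, which passes 1 at about n = 92,700. enwik9 is 10^9 bytes, so even a small fraction of its 100-byte windows being distinct should make a collision expected rather than surprising. The log does not say how many distinct windows were hashed; the n values below are illustrative only, and identical repeated windows do not count as collisions.

```python
def expected_collisions(n: int, bits: int = 32) -> float:
    """Expected number of colliding pairs among n distinct inputs
    under an idealized uniform hash with 2**bits possible values."""
    return n * (n - 1) / 2 / 2**bits

# Illustrative input counts, not measurements from enwik8/enwik9.
for n in (10**4, 10**5, 10**6, 10**7):
    print(f"n = {n:>10,}: ~{expected_collisions(n):,.2f} colliding pairs")
```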
<osman> did you ever consider using the native CRC32 instruction? :)2009-09-30 19:45:05
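One detail relevant to that question: the SSE4.2 `crc32` instruction computes CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78), not the zlib/IEEE CRC-32 (reflected polynomial 0xEDB88320), so switching to it would change every hash value. The bitwise Python sketch below (the function name `crc32_bitwise` is hypothetical) differs between the two only in the polynomial, and reproduces the standard check values for the string "123456789".

```python
import zlib

CRC32_POLY = 0xEDB88320   # IEEE 802.3 / zlib CRC-32, reflected polynomial
CRC32C_POLY = 0x82F63B78  # Castagnoli CRC-32C, what the SSE4.2 crc32 instruction computes

def crc32_bitwise(data: bytes, poly: int) -> int:
    """Reflected CRC-32 with initial value and final XOR 0xFFFFFFFF, one bit at a time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

msg = b"123456789"
print(hex(crc32_bitwise(msg, CRC32_POLY)), hex(zlib.crc32(msg)))  # 0xcbf43926 0xcbf43926
print(hex(crc32_bitwise(msg, CRC32C_POLY)))                      # 0xe3069283
```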
<Shelwien> actually my "fragment hash" contains the anchor hash (32 bits, but 7 of them are 0), the fragment hash (plain crc32, which is what matched here) and the fragment length2009-09-30 19:45:28
 but there was a bug-feature2009-09-30 19:45:50
 the rolling hash at the end of the fragment was used as the anchor2009-09-30 19:46:14
<osman> what do you "exactly" mean by "anchor hash"?2009-09-30 19:47:22
<Shelwien> so if the fragment hash matched, it was really easy for the whole hash to match2009-09-30 19:47:28
 err... did you see fma-diff (it's in the topic)?2009-09-30 19:48:01
 ...2009-09-30 19:57:20
 and as to the idea of anchored hashing...2009-09-30 19:57:59
 block hashing is used in various circumstances, right?2009-09-30 19:58:16
 like in torrent files?2009-09-30 19:58:19
 but to find a matching block in a file using torrent hashes2009-09-30 19:59:02
 (eg. a different version of the file for which torrent was generated)2009-09-30 19:59:20
 we'd have to compute the hashes of all blocks of the same size in the file2009-09-30 20:00:09
 at all bytewise offsets2009-09-30 20:00:13
 which is slow2009-09-30 20:01:04
 and completely impossible if we'd like to find matching blocks just by two torrent files, without the actual data2009-09-30 20:01:47
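A rough Python sketch of the brute-force search described above, assuming fixed-size blocks hashed with SHA-1 (as BitTorrent v1 does for its pieces). `find_block` is a hypothetical helper: it rehashes the window at every byte offset, so it costs about (len(data) - block_size + 1) hash computations of block_size bytes each, and it needs the data itself, which is exactly the limitation pointed out above.

```python
import hashlib
import random

def find_block(data: bytes, block_size: int, wanted: bytes) -> list[int]:
    """Return every byte offset whose block_size-byte block has SHA-1 digest `wanted`."""
    hits = []
    for off in range(len(data) - block_size + 1):
        if hashlib.sha1(data[off:off + block_size]).digest() == wanted:
            hits.append(off)
    return hits

# Illustrative data: a "new version" of a file with 8 bytes inserted at the front,
# which shifts every block away from its original fixed offset.
old = random.Random(42).randbytes(4096)
piece = hashlib.sha1(old[512:576]).digest()   # hash of the 64-byte block at old offset 512
new = b"inserted" + old
print(find_block(new, 64, piece))             # the block now starts at offset 520
```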
 so the idea is to use a different approach instead of fixed-size blocks at fixed offsets2009-09-30 20:02:34
 specifically, to split the file into blocks by some properties of data itself2009-09-30 20:03:18
 and now, my "anchor hashes" are that "data property"2009-09-30 20:04:24
 currently it's a rolling crc32 value with its 7 lsb equal to zero2009-09-30 20:04:47
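A minimal Python sketch of anchor detection with a rolling crc32. Two assumptions are not taken from the log: the window length (`WINDOW = 32` here is hypothetical), and the use of the "raw" CRC register (zero initial value, no final XOR). The raw register is linear in the data, so the byte leaving the window can be removed by XOR-ing one precomputed table entry. An anchor is reported wherever the rolling value has its 7 lsb equal to zero, which on random-looking data happens roughly once every 128 bytes.

```python
import random

WINDOW = 32         # hypothetical window length; the log does not state the real one
ANCHOR_MASK = 0x7F  # anchor when the 7 least significant bits are zero

def _crc_table() -> list[int]:
    """Standard byte table for the reflected CRC-32 polynomial 0xEDB88320."""
    table = []
    for b in range(256):
        c = b
        for _ in range(8):
            c = (c >> 1) ^ (0xEDB88320 if c & 1 else 0)
        table.append(c)
    return table

TABLE = _crc_table()

def raw_crc(data: bytes) -> int:
    """CRC-32 register with zero init and no final XOR (linear in the data)."""
    r = 0
    for x in data:
        r = (r >> 8) ^ TABLE[(r ^ x) & 0xFF]
    return r

# OUT[b] is the contribution of byte b once WINDOW more bytes follow it;
# XOR-ing it out removes that byte from the front of the window.
OUT = [raw_crc(bytes([b]) + bytes(WINDOW)) for b in range(256)]

def anchors(data: bytes) -> list[int]:
    """Positions p where raw_crc(data[p - WINDOW:p]) has its 7 lsb equal to zero."""
    result, r = [], 0
    for i, c in enumerate(data):
        r = (r >> 8) ^ TABLE[(r ^ c) & 0xFF]  # append the new byte
        if i >= WINDOW:
            r ^= OUT[data[i - WINDOW]]        # drop the byte that left the window
        if i + 1 >= WINDOW and (r & ANCHOR_MASK) == 0:
            result.append(i + 1)
    return result

sample = random.Random(7).randbytes(5000)
print(len(anchors(sample)), "anchors in 5000 random bytes (roughly 5000/128 expected)")
```

Because an anchor depends only on the last WINDOW bytes, the same content yields the same anchors in any file, whatever its offset there.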
 so, the file is split into hashed blocks by these anchor hashes2009-09-30 20:05:40
 and if ~300 bytes match in two files, that means that at least one hash would match in their hashtables2009-09-30 20:07:13
 so long matches can be found without looking up stuff at each byte2009-09-30 20:07:50
 and even without having the actual content to match against2009-09-30 20:08:07
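A sketch of the whole scheme in Python, under the same assumptions as a plain rolling-crc32 anchor search (hypothetical 32-byte window, raw rolling CRC-32, anchors where the 7 lsb are zero). Each file is reduced to a table mapping (fragment length, zlib crc32 of the fragment) to the fragment's offset. Shelwien's key also includes the anchor hash, which this sketch leaves out. The hypothetical `shared_fragments` compares two such tables only, so no file content is consulted, and identical content is found even when it sits at different offsets in the two files.

```python
import random
import zlib

WINDOW = 32  # hypothetical window length (not given in the log)
MASK = 0x7F  # anchor condition: 7 zero least significant bits

def _crc_table() -> list[int]:
    table = []
    for b in range(256):
        c = b
        for _ in range(8):
            c = (c >> 1) ^ (0xEDB88320 if c & 1 else 0)
        table.append(c)
    return table

TABLE = _crc_table()

def _raw(data: bytes) -> int:
    """Raw CRC-32 register (zero init, no final XOR)."""
    r = 0
    for x in data:
        r = (r >> 8) ^ TABLE[(r ^ x) & 0xFF]
    return r

OUT = [_raw(bytes([b]) + bytes(WINDOW)) for b in range(256)]  # removes the outgoing byte

def fragment_table(data: bytes) -> dict:
    """Map (length, crc32) -> offset for every fragment between consecutive anchors."""
    table, start, r = {}, 0, 0
    for i, c in enumerate(data):
        r = (r >> 8) ^ TABLE[(r ^ c) & 0xFF]
        if i >= WINDOW:
            r ^= OUT[data[i - WINDOW]]
        if i + 1 >= WINDOW and (r & MASK) == 0:  # anchor: close the current fragment
            table.setdefault((i + 1 - start, zlib.crc32(data[start:i + 1])), start)
            start = i + 1
    if start < len(data):                        # trailing fragment after the last anchor
        table.setdefault((len(data) - start, zlib.crc32(data[start:])), start)
    return table

def shared_fragments(table_a: dict, table_b: dict) -> list:
    """(offset_a, offset_b, length) for keys present in both tables; uses the tables only."""
    return sorted((table_a[k], table_b[k], k[0]) for k in table_a.keys() & table_b.keys())

rng = random.Random(1)
a = rng.randbytes(20000)
b = rng.randbytes(500) + a[5000:15000] + rng.randbytes(500)  # shared middle part at a new offset
matches = shared_fragments(fragment_table(a), fragment_table(b))
print(len(matches), "shared fragments; first:", matches[:3])
```

Only fragments that lie entirely inside the shared 10000 bytes can match, and a lookup is needed once per fragment rather than once per byte.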
*** toffer has left the channel2009-09-30 21:28:55
*** osman has left the channel2009-09-30 21:29:02
 !next2009-09-30 21:38:36