*** complogger has joined the channel | 2009-09-28 20:24:37 |
<Shelwien> | !grep toffer>.*m1 | 2009-09-28 20:24:42 |
*** pinc has left the channel | 2009-09-28 21:00:24 |
<toffer> | gn8 | 2009-09-28 22:06:29 |
*** toffer has left the channel | 2009-09-28 22:06:32 |
*** BitGenix has joined the channel | 2009-09-29 06:34:35 |
<BitGenix> | does anyone speak Spanish? | 2009-09-29 06:36:23 |
*** BitGenix has left the channel | 2009-09-29 06:39:19 |
*** toffer has joined the channel | 2009-09-29 08:30:01 |
<toffer> | cheers | 2009-09-29 08:30:06 |
*** toffer has left the channel | 2009-09-29 09:01:22 |
*** pinc has joined the channel | 2009-09-29 09:01:39 |
*** Shelwien has left the channel | 2009-09-29 10:17:59 |
*** Shelwien has joined the channel | 2009-09-29 12:57:15 |
*** Shelwien has left the channel | 2009-09-29 13:06:33 |
*** pinc|mirror has joined the channel | 2009-09-29 14:13:06 |
*** pinc has left the channel | 2009-09-29 14:13:06 |
*** pinc|mirror has left the channel | 2009-09-29 14:46:07 |
*** pinc has joined the channel | 2009-09-29 17:45:21 |
*** pinc has left the channel | 2009-09-29 20:47:31 |
*** pinc has joined the channel | 2009-09-30 08:17:11 |
*** toffer has joined the channel | 2009-09-30 10:17:27 |
| hi | 2009-09-30 10:17:31 |
<pinc> | hi | 2009-09-30 11:19:54 |
<toffer> | no activity this time | 2009-09-30 12:22:28 |
| -.- | 2009-09-30 12:22:32 |
<pinc> | shelwien is absent ) | 2009-09-30 12:25:46 |
*** Shelwien has joined the channel | 2009-09-30 12:47:53 |
*** toffer has left the channel | 2009-09-30 13:01:15 |
*** LELLO has joined the channel | 2009-09-30 14:23:28 |
<LELLO> | ciao [Italian: "hi"] | 2009-09-30 14:23:33 |
<Shelwien> | hi? | 2009-09-30 14:23:41 |
*** LELLO has left the channel | 2009-09-30 14:24:03 |
*** pinc has left the channel | 2009-09-30 15:25:26 |
*** toffer has joined the channel | 2009-09-30 18:30:08 |
<toffer> | hi again | 2009-09-30 18:30:16 |
<Shelwien> | hi | 2009-09-30 18:59:02 |
*** osman has joined the channel | 2009-09-30 19:29:11 |
<osman> | hello everyone :) | 2009-09-30 19:29:21 |
<Shelwien> | hi | 2009-09-30 19:29:29 |
<osman> | how are you? | 2009-09-30 19:29:36 |
<Shelwien> | added a search function here a few days ago | 2009-09-30 19:30:00 |
| !grep osman>.*ccm_sh | 2009-09-30 19:30:09 |
<osman> | :) | 2009-09-30 19:30:28 |
| nice | 2009-09-30 19:30:30 |
| christian should not see that feature ;) | 2009-09-30 19:31:03 |
<Shelwien> | it only prints the first 10 occurrences in the channel, though | 2009-09-30 19:31:11 |
| but supports regexps ;) | 2009-09-30 19:31:17 |
<osman> | regexp!? %) | 2009-09-30 19:31:31 |
| i always have some trouble interpreting them | 2009-09-30 19:31:46 |
<Shelwien> | pay more attention to the command i used ;) | 2009-09-30 19:31:52 |
| .* means any number of any symbols | 2009-09-30 19:32:10 |
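For anyone who, like osman, finds regexps opaque: a minimal Python sketch of what the `!grep` pattern above matches. The log lines are invented, the bot's own implementation isn't shown in this log, and the 10-result cap only mirrors what Shelwien says it does.

```python
import re

# "osman>" literally, then ".*" (any number of any characters), then "ccm_sh"
PATTERN = re.compile(r"osman>.*ccm_sh")

def grep(lines, pattern=PATTERN, limit=10):
    """Return at most `limit` lines containing a match (the cap mirrors the
    bot's described behaviour; this is not the bot's code)."""
    return [line for line in lines if pattern.search(line)][:limit]

# Invented log lines in this channel's "<nick> | text" format
log = [
    "<osman> | did anyone try ccm_sh on enwik8?",  # matches
    "<Shelwien> | osman: have you seen ccm_sh?",   # "osman:" is not "osman>"
    "<osman> | hello everyone :)",                 # no "ccm_sh" after "osman>"
]

print(grep(log))  # ['<osman> | did anyone try ccm_sh on enwik8?']
```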
<osman> | is there a quick 'n' dirty manual that explains regexps? i always have problems with them | 2009-09-30 19:33:00 |
<Shelwien> | i guess http://en.wikipedia.org/wiki/Regexp | 2009-09-30 19:34:30 |
<osman> | :) | 2009-09-30 19:34:37 |
| is there any news from "your side"? | 2009-09-30 19:35:40 |
<Shelwien> | <Nishi> meanwhile, i made some progress with that fma LZ yesterday | 2009-09-30 19:37:53 |
| <Nishi> one interesting thing | 2009-09-30 19:37:53 |
| <Nishi> is that i found two different 100-byte strings in enwik9 | 2009-09-30 19:37:53 |
| <Nishi> which have the same crc32 ;) | 2009-09-30 19:37:53 |
| <Nishi> fixed a defect in my hashing algorithm thanks to that ;) | 2009-09-30 19:37:53 |
| <Nishi> hash collision basically ;) | 2009-09-30 19:37:54 |
| <Nishi> and what i'm doing now is not diff | 2009-09-30 19:37:56 |
| <Nishi> it's more like rep, probably | 2009-09-30 19:37:58 |
| <Nishi> but what i want to get in the end is different from rep too | 2009-09-30 19:38:00 |
| <Nishi> it's like incremental LZ or something | 2009-09-30 19:38:02 |
| <Nishi> an "archive" with literal data | 2009-09-30 19:38:04 |
| <Nishi> and files are converted to match structure with references to archive | 2009-09-30 19:38:06 |
| <Nishi> (and all new literal data from files are added to the archive) | 2009-09-30 19:38:08 |
| <Nishi> so in a way it's like diff... or like rep... but it's basically something new (maybe) ;) | 2009-09-30 19:38:10 |
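A toy Python sketch of the "archive plus references" scheme Nishi describes. It is not the fma LZ code. It assumes a naive greedy matcher that takes the first occurrence of an 8-byte prefix (`MIN_MATCH` is a hypothetical parameter) and extends it. Every file becomes a list of (offset, length) references into one shared archive. Bytes with no match are appended to the archive as new literal data and then referenced like any other match.

```python
MIN_MATCH = 8  # hypothetical: shorter repeats are stored as new literals

def encode(data, archive):
    """Turn `data` into (offset, length) references into `archive` (a bytearray).
    Unmatched bytes are appended to the archive, so it only grows by new literals."""
    refs = []
    i = 0
    while i < len(data):
        chunk = data[i:i + MIN_MATCH]
        pos = archive.find(chunk) if len(chunk) == MIN_MATCH else -1
        if pos >= 0:
            # greedy: extend the first occurrence as far as it keeps matching
            length = MIN_MATCH
            while (i + length < len(data) and pos + length < len(archive)
                   and archive[pos + length] == data[i + length]):
                length += 1
        else:
            # no match: add the byte to the archive and reference it there
            pos, length = len(archive), 1
            archive.append(data[i])
        # merge with the previous reference if the two are contiguous in the archive
        if refs and refs[-1][0] + refs[-1][1] == pos:
            refs[-1] = (refs[-1][0], refs[-1][1] + length)
        else:
            refs.append((pos, length))
        i += length
    return refs

def decode(refs, archive):
    """Rebuild a file from its references (the archive is append-only)."""
    return b"".join(bytes(archive[off:off + n]) for off, n in refs)

archive = bytearray()
v1 = b"the quick brown fox jumps over the lazy dog. " * 3
v2 = b"the quick brown fox jumps over the lazy cat. " * 3
refs1 = encode(v1, archive)
grown_by_v1 = len(archive)
refs2 = encode(v2, archive)
print(grown_by_v1, len(archive) - grown_by_v1)  # archive growth caused by each file
```

Taking the first occurrence keeps the sketch short. A practical matcher would index the archive rather than scan it with `find`.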
<osman> | good news :) | 2009-09-30 19:39:00 |
| btw, afair crc32 has a really good distribution over its whole range, and finding different strings with equal crc values is not an easy task | 2009-09-30 19:40:05 |
| especially in "natural" data (like natural text), i'd expect that to be really hard | 2009-09-30 19:41:34 |
<Shelwien> | yeah, but it's a fact that enwik8 contains at least two 100-byte strings with the same crc32 ;) | 2009-09-30 19:42:34 |
| but actually that collision in enwik9 | 2009-09-30 19:43:45 |
| allowed me to fix a bug in my hashing algo | 2009-09-30 19:43:57 |
<osman> | the more interesting thing is that they're also the same length %) | 2009-09-30 19:43:58 |
<Shelwien> | they'd not match otherwise | 2009-09-30 19:44:20 |
| that's where all the fun is | 2009-09-30 19:44:30 |
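Some back-of-envelope arithmetic on why such a collision is less surprising than it sounds. Two simplifying assumptions apply: crc32 behaves like a uniform random 32-bit function, and every 100-byte window is distinct. Under these assumptions, the birthday bound gives about n(n-1)/2 / 2³² colliding pairs among n windows. enwik9 is 10⁹ bytes and enwik8 is 10⁸. The sketch also finds a crc32 collision by brute force over pseudo-random 16-byte strings. The function names and parameters are hypothetical.

```python
import random
import zlib

def expected_crc32_pairs(n):
    """Birthday-bound estimate: expected number of colliding pairs among n
    distinct inputs, assuming crc32 acts like a uniform random 32-bit hash."""
    return n * (n - 1) / 2 / 2**32

def find_crc32_collision(count=400_000, length=16, seed=1):
    """Brute-force birthday search over pseudo-random `length`-byte strings.
    Returns two different strings with the same crc32, or None."""
    rng = random.Random(seed)
    seen = {}
    for _ in range(count):
        s = rng.getrandbits(8 * length).to_bytes(length, "little")
        h = zlib.crc32(s)
        if h in seen and seen[h] != s:
            return seen[h], s
        seen[h] = s
    return None

# number of 100-byte windows in enwik9 (10**9 bytes) and enwik8 (10**8 bytes)
print(expected_crc32_pairs(10**9 - 99))  # ~1.16e8
print(expected_crc32_pairs(10**8 - 99))  # ~1.16e6
print(find_crc32_collision())
```

Real text windows overlap and repeat, so these figures are only an order-of-magnitude argument. Still, collisions among ~10⁸ or 10⁹ hashed windows are expected rather than exceptional.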
<osman> | did you ever consider using the native CRC32 instruction? :) | 2009-09-30 19:45:05 |
<Shelwien> | actually my "fragment hash" consists of the anchor hash (32 bits, but 7 of them are 0), the fragment's own hash (the plain crc32 that matched there) and the fragment length | 2009-09-30 19:45:28 |
| but there was a bug-feature | 2009-09-30 19:45:50 |
| the rolling hash at the end of the fragment was used as the anchor | 2009-09-30 19:46:14 |
<osman> | what do you "exactly" mean by "anchor hash"? | 2009-09-30 19:47:22 |
<Shelwien> | so if the fragment hash matched, it was really easy for the whole hash to match | 2009-09-30 19:47:28 |
| err... did you see fma-diff (it's in the topic)? | 2009-09-30 19:48:01 |
| ... | 2009-09-30 19:57:20 |
| and as to the idea of anchored hashing... | 2009-09-30 19:57:59 |
| various kinds of block hashing are used in various circumstances, right? | 2009-09-30 19:58:16 |
| like in torrent files? | 2009-09-30 19:58:19 |
| but to find a matching block in a file by torrent hashes | 2009-09-30 19:59:02 |
| (eg. a different version of the file for which torrent was generated) | 2009-09-30 19:59:20 |
| we'd have to compute the hashes of all blocks of that size in the file | 2009-09-30 20:00:09 |
| at every byte offset | 2009-09-30 20:00:13 |
| which is slow | 2009-09-30 20:01:04 |
| and completely impossible if we'd like to find matching blocks just by two torrent files, without the actual data | 2009-09-30 20:01:47 |
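To make that cost concrete, here is a Python sketch of the brute-force search. Piece hashes are SHA-1 digests of fixed-size pieces at fixed offsets, as in BitTorrent v1 metadata. Locating those pieces in another version of the file means hashing a window at every byte offset. The function names, piece size and test data are illustrative only.

```python
import hashlib
import random

def piece_hashes(data, piece_size):
    """SHA-1 of each fixed-size piece at fixed offsets (BitTorrent v1 style)."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]

def find_pieces(new_data, hashes, piece_size):
    """Brute force: hash the piece_size window at EVERY byte offset of new_data.
    Returns {piece_index: offset}; costs len(new_data) - piece_size + 1 hashes."""
    wanted = {h: idx for idx, h in enumerate(hashes)}
    found = {}
    for off in range(len(new_data) - piece_size + 1):
        idx = wanted.get(hashlib.sha1(new_data[off:off + piece_size]).digest())
        if idx is not None and idx not in found:
            found[idx] = off
    return found

rng = random.Random(3)
old = rng.getrandbits(8 * 4096).to_bytes(4096, "little")
new = b"inserted" + old                 # same data, shifted by 8 bytes
hashes = piece_hashes(old, 256)         # 16 pieces
found = find_pieces(new, hashes, 256)
print(len(found), found[0], found[15])  # 16 8 3848
```

The loop slides over `new` itself. With only two .torrent files there is no data to slide over, which is why that case is impossible.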
| so the idea is to use a different approach instead of fixed-size blocks at fixed offsets | 2009-09-30 20:02:34 |
| specifically, to split the file into blocks by some property of the data itself | 2009-09-30 20:03:18 |
| and now, my "anchor hashes" are that "data property" | 2009-09-30 20:04:24 |
| currently it's a rolling crc32 with the 7 lsb equal to 0 | 2009-09-30 20:04:47 |
| so, the file is split into hashed blocks by these anchor hashes | 2009-09-30 20:05:40 |
| and if ~300 bytes match in two files, that means that at least one hash would match in their hashtables | 2009-09-30 20:07:13 |
| so long matches can be found without looking up stuff at each byte | 2009-09-30 20:07:50 |
| and even without having the actual content to match against | 2009-09-30 20:08:07 |
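A minimal Python sketch of anchored splitting and table-only matching. Only the 7-zero-LSB anchor rule comes from the log. The rest are assumptions:

* the 32-byte window is hypothetical, since no window size is given;
* crc32 of the window is recomputed at every position rather than actually rolled;
* the per-fragment key is simplified to (crc32, length), without the anchor hash.

```python
import random
import zlib

WINDOW = 32   # hypothetical window size; the log doesn't give one
MASK = 0x7F   # anchor where the 7 low bits of the window's crc32 are zero

def anchors(data):
    """Yield cut positions: after byte i whenever crc32(data[i-WINDOW+1 : i+1])
    has 7 zero LSBs. Recomputed per position for clarity; a real rolling
    crc32 would update the value in O(1) per byte."""
    for i in range(WINDOW - 1, len(data)):
        if zlib.crc32(data[i - WINDOW + 1:i + 1]) & MASK == 0:
            yield i + 1

def split(data):
    """Split data into (offset, fragment) pairs ending at anchors
    (roughly 128 bytes per fragment on average)."""
    fragments, start = [], 0
    for cut in anchors(data):
        fragments.append((start, data[start:cut]))
        start = cut
    if start < len(data):
        fragments.append((start, data[start:]))
    return fragments

def fragment_table(data):
    """Simplified per-file table: (crc32, length) of each fragment -> offset."""
    return {(zlib.crc32(frag), len(frag)): off for off, frag in split(data)}

def common_fragments(table_a, table_b):
    """Match two files using only their tables, not their contents.
    Returns sorted (offset_in_a, offset_in_b, length) triples."""
    return sorted((table_a[k], table_b[k], k[1])
                  for k in table_a.keys() & table_b.keys())

rng = random.Random(7)
a = rng.getrandbits(8 * 20000).to_bytes(20000, "little")
b = b"xyz" + a                      # insertion shifts every offset by 3
matches = common_fragments(fragment_table(a), fragment_table(b))
print(len(split(a)), len(matches))  # every fragment of a except the first is found
print(matches[:3])
```

Each cut depends only on the window of bytes just before it. So inserting bytes near the start of a file disturbs only the first fragment. Every later fragment keeps its key and simply appears at a shifted offset.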
*** toffer has left the channel | 2009-09-30 21:28:55 |
*** osman has left the channel | 2009-09-30 21:29:02 |
| !next | 2009-09-30 21:38:36 |