X264asm

An edited version can be found here: X264 asm intro
2010-11-22 00:35:03 < Jumpyshoes> cool
2010-11-22 00:35:09 < Dark_Shikari> open common/x86/predict-a.asm
2010-11-22 00:35:26 < Dark_Shikari> go to predict_4x4_dc_mmxext
2010-11-22 00:35:29 < Dark_Shikari> this function does the following
2010-11-22 00:35:32 < Dark_Shikari>   A B C D
2010-11-22 00:35:34 < Dark_Shikari> E X X X X
2010-11-22 00:35:36 < Dark_Shikari> F X X X X
2010-11-22 00:35:37 < Dark_Shikari> G X X X X
2010-11-22 00:35:40 < Dark_Shikari> H X X X X
2010-11-22 00:35:48 < Kovensky> ascii graphs :D
2010-11-22 00:35:51 < Dark_Shikari> It calculates (A+B+C+D+E+F+G+H+4)>>3, and sets all the Xs equal to that value.
2010-11-22 00:36:06 < Dark_Shikari> where those are 8-bit pixels in a 2D array with a stride of FDEC_STRIDE.
2010-11-22 00:36:12 < Dark_Shikari> got that?  throw questions at any time
2010-11-22 00:36:19 < Jumpyshoes> actually, can you hold on for one minute?
2010-11-22 00:36:28 < Dark_Shikari> ok
2010-11-22 00:36:42 < Jumpyshoes> being a dumbass, i forgot x264 was on my other computer
2010-11-22 00:36:46 < Kovensky> durf, iTerm2 locked up
2010-11-22 00:38:54 < Jumpyshoes> okay
2010-11-22 00:38:56 < Jumpyshoes> ready now
2010-11-22 00:40:15  * Dark_Shikari waits for response
2010-11-22 00:40:33 < Jumpyshoes> i think i should look up the intel docs
2010-11-22 00:40:42 < Dark_Shikari> ?
2010-11-22 00:40:50 < Dark_Shikari> I asked you if you understood my explanation of what a function does.
2010-11-22 00:40:52 < Jumpyshoes> well, it's a bunch of assembly
2010-11-22 00:40:53 < Jumpyshoes> OH
2010-11-22 00:40:54 < Dark_Shikari> This has absolutely nothing to do with intel.
2010-11-22 00:40:59 < Jumpyshoes> yes, i understand what the function does
2010-11-22 00:41:19 < Dark_Shikari> ok, good
2010-11-22 00:41:42 < Jumpyshoes> actually, what is FDEC_STRIDE?
2010-11-22 00:41:54 < Dark_Shikari> x264 does all its pixel operations on the current macroblock in a temporary buffer
2010-11-22 00:41:59 < Dark_Shikari> of constant stride
2010-11-22 00:42:02 < Dark_Shikari> it's faster that way, and better on cache
2010-11-22 00:42:16 < Jumpyshoes> what's a stride? <_<
2010-11-22 00:42:16 < Dark_Shikari> so for example, motion compensation will write to this buffer
2010-11-22 00:42:18 < Dark_Shikari> or intra prediction
2010-11-22 00:42:26 < Kovensky> a stride is the distance from one line to the next IIRC
2010-11-22 00:42:43 < Dark_Shikari> stride is the distance between (x,y) and (x,y+1)
2010-11-22 00:43:07 < Jumpyshoes> i see
2010-11-22 00:43:36 < Dark_Shikari> so to get from one row to the next
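For reference, the behaviour described so far can be written as a minimal C sketch. The name predict_4x4_dc_ref is hypothetical, and FDEC_STRIDE is assumed to be 32 here purely for illustration; the real value comes from the x264 headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stride of the temporary macroblock buffer, for illustration. */
#define FDEC_STRIDE 32

/* src points at the top-left X of the 4x4 block. The row above (A..D) and
 * the column to the left (E..H) must be readable. */
static void predict_4x4_dc_ref(uint8_t *src)
{
    assert(src != NULL);
    int sum = 4;                              /* rounding term */
    for (int i = 0; i < 4; i++) {
        sum += src[i - FDEC_STRIDE];          /* A, B, C, D */
        sum += src[i * FDEC_STRIDE - 1];      /* E, F, G, H */
    }
    uint8_t dc = (uint8_t)(sum >> 3);
    for (int y = 0; y < 4; y++)               /* set all 16 Xs */
        for (int x = 0; x < 4; x++)
            src[y * FDEC_STRIDE + x] = dc;
}
```

Moving from (x,y) to (x,y+1) is just adding FDEC_STRIDE to the index, which is all "stride" means here.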
2010-11-22 00:43:51 < Dark_Shikari> now that you understand what the function does, let's look at the asm
2010-11-22 00:43:54 < Dark_Shikari> cglobal predict_4x4_dc_mmxext, 1,4
2010-11-22 00:44:01 < Kovensky> yay, asm class
2010-11-22 00:44:02  * Kovensky watches
2010-11-22 00:44:12 < Dark_Shikari> cglobal: declares a function accessible from outside of asm
2010-11-22 00:44:20 < Dark_Shikari> the function's name is x264_predict_4x4_dc_mmxext
2010-11-22 00:44:24 < Dark_Shikari> the x264_ is auto-added.
2010-11-22 00:44:33 < Dark_Shikari> the "1" means "we have one argument.  Put it in r0."
2010-11-22 00:44:36 < Dark_Shikari> that argument is uint8_t *src
2010-11-22 00:44:46 < Dark_Shikari> if we had a second argument, we'd say "2" and the second one would go in r1.
2010-11-22 00:44:50 < Dark_Shikari> and if we had a third, it'd go in r2.
2010-11-22 00:44:51 < Dark_Shikari> etc
2010-11-22 00:44:53 < Dark_Shikari> got that?
2010-11-22 00:45:04 < Dark_Shikari> so at the start of the function, r0 contains uint8_t *src.
2010-11-22 00:45:04 < Jumpyshoes> that argument is uint8_t *src <-- what does this mean?
2010-11-22 00:45:09 < Dark_Shikari> ; void predict_4x4_dc( uint8_t *src )
2010-11-22 00:45:11 < Dark_Shikari> hurr hurr
2010-11-22 00:45:15 < Jumpyshoes> oh
2010-11-22 00:45:16 < Dark_Shikari> it's a function argument
2010-11-22 00:45:16 < Jumpyshoes> okay
2010-11-22 00:45:36 < Jumpyshoes> what tells the function that it's uint8_t?
2010-11-22 00:45:39 < Dark_Shikari> Nothing.
2010-11-22 00:45:42 < Dark_Shikari> It doesn't need to know.
2010-11-22 00:45:44 < Dark_Shikari> types are a Cism
2010-11-22 00:45:58 < Jumpyshoes> right
2010-11-22 00:45:59 < Jumpyshoes> true
2010-11-22 00:46:03 < Dark_Shikari> the "4" means we want x264 to give us 4 registers to use.
2010-11-22 00:46:05 < Dark_Shikari> r0, r1, r2, r3.
2010-11-22 00:46:10 < Dark_Shikari> This, of course, includes the r0 used for the parameter.
2010-11-22 00:46:17 < Dark_Shikari> So in short, after the first line:
2010-11-22 00:46:19 < Dark_Shikari> r0 = src
2010-11-22 00:46:22 < Dark_Shikari> r1/r2/r3 = free
2010-11-22 00:46:26 < Dark_Shikari> r4 and up: can't use.
2010-11-22 00:46:37 < Kovensky> that's x86inc.asm's doing right?
2010-11-22 00:46:42 < Dark_Shikari> yes, but we aren't going into that
2010-11-22 00:46:58 < Jumpyshoes> i assume it means we can use, but if you do, it'll screw around with something you don't want to?
2010-11-22 00:47:14 < Kovensky> which is why you can't use it
2010-11-22 00:47:19 < Dark_Shikari> ^
2010-11-22 00:47:23 < Jumpyshoes> kk
2010-11-22 00:47:31 < Dark_Shikari> So now, this function as you can see has 4 real steps
2010-11-22 00:47:36 < Dark_Shikari> 1) Sum up A through D
2010-11-22 00:47:39 < Dark_Shikari> 2) Sum up E through H
2010-11-22 00:47:46 < Dark_Shikari> 3) Do the math to get our final value
2010-11-22 00:47:50 < Dark_Shikari> 4) Store it into the 16 output Xs
2010-11-22 00:47:59 < Dark_Shikari> so let's see how this asm implements these.
2010-11-22 00:48:22 < Dark_Shikari> First, we'll look at step 1)
2010-11-22 00:48:27 < Dark_Shikari> pxor mm7, mm7: mm7 is now zeroed.
2010-11-22 00:48:31 < Dark_Shikari> mm7 is a 64-bit register.
2010-11-22 00:48:36 < Dark_Shikari> xor, as you might know, is a nice way to zero things.
2010-11-22 00:48:40 < Jumpyshoes> how do you tell how large a register is?
2010-11-22 00:48:51 < Dark_Shikari> mm* is 64-bit
2010-11-22 00:48:54 < Dark_Shikari> xmm* is 128-bit
2010-11-22 00:48:55 < Kovensky> the mm registers have a fixed size
2010-11-22 00:48:58 < Jumpyshoes> ah, okay
2010-11-22 00:49:09 < Dark_Shikari> stop me at any point if you are missing something.
2010-11-22 00:49:13 < Dark_Shikari> so, now mm7 is zero.
2010-11-22 00:49:20 < Dark_Shikari> movd mm0, [r0-FDEC_STRIDE]
2010-11-22 00:49:24 < Kovensky> only the general purpose registers are wordsize-dependent on x86
2010-11-22 00:49:31 < Dark_Shikari> this sets mm0 equal to {A,B,C,D,0,0,0,0}
2010-11-22 00:49:42 < Jumpyshoes> oh, and how do we know the mm* registers are free?
2010-11-22 00:49:45 < Dark_Shikari> They always are.
2010-11-22 00:49:48 < Jumpyshoes> oh
2010-11-22 00:49:50 < Jumpyshoes> kk
2010-11-22 00:49:59 < Dark_Shikari> in x86, b = byte, w = word (16-bit), d = doubleword (32-bit), q = quadword (64-bit), dq = double quadword (128-bit)
2010-11-22 00:50:04 < Dark_Shikari> so movd = move doubleword
2010-11-22 00:50:05 < Dark_Shikari> = move 32 bits
2010-11-22 00:50:14 < Dark_Shikari> so movd to mm0 will load data to the first 4 bytes
2010-11-22 00:50:16 < Dark_Shikari> and zero the rest.
2010-11-22 00:50:20 < Dark_Shikari> thus mm0 is now ABCD0000
2010-11-22 00:50:32 < Dark_Shikari> [r0-FDEC_STRIDE] is equivalent to *(src-FDEC_STRIDE)
2010-11-22 00:50:35 < Dark_Shikari> in Cstyle
2010-11-22 00:50:43 < Dark_Shikari> Hence why it points to ABCD.
2010-11-22 00:50:56 < Jumpyshoes> kk
2010-11-22 00:51:08 < Dark_Shikari> got it so far?
2010-11-22 00:51:11 < Jumpyshoes> yup
2010-11-22 00:51:24 < Jumpyshoes> i tried to dabble in asm at some point in time
2010-11-22 00:51:28 < Jumpyshoes> then got frustrated and gave up
2010-11-22 00:51:31 < Jumpyshoes> <-- lazy ass
2010-11-22 00:51:48 < Kovensky> 21:35.33 Dark_Shikari:  A B C D
2010-11-22 00:51:48 < Kovensky> 21:35.35 Dark_Shikari: E X X X X
2010-11-22 00:51:48 < Kovensky> are the "A B C D" on top of the "X X X X" or do they start on top of the "E"
2010-11-22 00:51:55 < Dark_Shikari> former
2010-11-22 00:51:58 < Dark_Shikari> your IRC client sucks
2010-11-22 00:52:00 < Dark_Shikari> your spacing is wrong
2010-11-22 00:52:11 < jarod> nothing wrong here
2010-11-22 00:52:13 < Dark_Shikari> use a monospaced font
2010-11-22 00:52:18 < Dark_Shikari> Jumpyshoes: next
2010-11-22 00:52:18 < Kovensky> I'm using osaka-mono
2010-11-22 00:52:20 < Kovensky> well, whatever
2010-11-22 00:52:22 < Dark_Shikari> uint16_t psadbw( uint8_t in[8], uint8_t out[8] )
2010-11-22 00:52:23 < Dark_Shikari> {
2010-11-22 00:52:23 < Dark_Shikari> 	uint16_t sum = 0;
2010-11-22 00:52:23 < Dark_Shikari> 	for(int i = 0; i < 8; i++)
2010-11-22 00:52:23 < Dark_Shikari> 		sum += abs(in[i]-out[i]);
2010-11-22 00:52:25 < Dark_Shikari> 	return sum;
2010-11-22 00:52:27 < Dark_Shikari> }
2010-11-22 00:52:33 < Dark_Shikari> that's what psadbw does
2010-11-22 00:52:45 < Dark_Shikari> parse that for a moment, and tell me when you're ready
2010-11-22 00:53:02 < Jumpyshoes> where is the sum stored?
2010-11-22 00:53:14 < Kovensky> packed SAD byte words?
2010-11-22 00:53:26 < Dark_Shikari> psadbw X, Y
2010-11-22 00:53:30 < Dark_Shikari> X is where the output is stored.
2010-11-22 00:53:37 < Dark_Shikari> So X is overwritten.
2010-11-22 00:53:40 < Jumpyshoes> ah
2010-11-22 00:53:43 < Dark_Shikari> so it's stored in the low 16 bits of X.
2010-11-22 00:53:59 < Dark_Shikari> now, of course, mm7 is zero!
2010-11-22 00:54:13 < Dark_Shikari> so we get abs(A-0) + abs(B-0) + abs(C-0) + abs(D-0) + abs(0-0) ...
2010-11-22 00:54:16 < Dark_Shikari> or A+B+C+D.
2010-11-22 00:54:28 < Dark_Shikari> So after psadbw mm0, mm7, mm0 is A+B+C+D and mm7 is still zero.
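The pxor / movd / psadbw sequence can be modelled in C as a rough sketch. The helper names (movd_load, psadbw_model, sum_top_row) are made up for illustration, and the 64-bit registers are represented as 8-byte arrays in memory order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Models movd mm, [mem]: load 4 bytes into the low half, zero the rest. */
static void movd_load(uint8_t reg[8], const uint8_t *mem)
{
    assert(mem != NULL);
    memset(reg, 0, 8);
    memcpy(reg, mem, 4);
}

/* Models psadbw on two 64-bit registers: sum of absolute byte differences. */
static uint16_t psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    uint16_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (uint16_t)abs(a[i] - b[i]);
    return sum;
}

/* Step 1 of the function: A+B+C+D via a SAD against zero. */
static int sum_top_row(const uint8_t *top)
{
    uint8_t mm0[8];
    uint8_t mm7[8] = {0};          /* pxor mm7, mm7 */
    movd_load(mm0, top);           /* movd mm0, [r0-FDEC_STRIDE] */
    return psadbw_model(mm0, mm7); /* psadbw mm0, mm7 */
}
```

Because the upper four bytes of mm0 are zero after the movd, they contribute abs(0-0) to the sum and drop out.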
2010-11-22 00:54:30 < Dark_Shikari> Got that?
2010-11-22 00:54:31 < Jumpyshoes> wow, in three commands
2010-11-22 00:54:36 < Jumpyshoes> yea
2010-11-22 00:54:46 < Kovensky> nice trick
2010-11-22 00:54:46 < Dark_Shikari> Now, we move the result to "r3d", a general purpose register
2010-11-22 00:54:54 < Dark_Shikari> and get moving with part 2) of the algorithm.
2010-11-22 00:55:03 < Dark_Shikari> Note: the suffix 'd' means the 32-bit version, as opposed to the native-size version.
2010-11-22 00:55:13 < Jumpyshoes> is r3d one of the things that come with the 4 registers that are free>.
2010-11-22 00:55:14 < Jumpyshoes> ?
2010-11-22 00:55:15 < Dark_Shikari> This is an optimization: on 64-bit, using 32-bit versions of registers results in smaller instruction opcode sizes.
2010-11-22 00:55:20 < Dark_Shikari> So it's really just r3.
2010-11-22 00:55:24 < Dark_Shikari> r0, r1, r2, r3 are the 4 that are free.
2010-11-22 00:55:27 < Dark_Shikari> So we're using r3.
2010-11-22 00:55:31 < Jumpyshoes> kk
2010-11-22 00:55:36 < Dark_Shikari> So now r0 has our source pointer, and r3 has A+B+C+D.
2010-11-22 00:55:49 < Dark_Shikari> Now, while the CPU is busy doing that, we'll go and do part 2), the E+F+G+H.
2010-11-22 00:55:53 < Kovensky> what does the movzx do?
2010-11-22 00:55:59 < Dark_Shikari> we'll get to that
2010-11-22 00:56:01 < Dark_Shikari> Unfortunately, these bytes aren't in a straight line.
2010-11-22 00:56:07 < Dark_Shikari> So we can't just load EFGH and sad them.
2010-11-22 00:56:15 < Dark_Shikari> We'll have to do it the naive/slow way.
2010-11-22 00:56:26 < Dark_Shikari> well, s/straight line/adjacent in memory/
2010-11-22 00:56:34 < Kovensky> oh, so %rep is a looping macro
2010-11-22 00:56:43 < Dark_Shikari> so, now we're going to load E, F, G, H
2010-11-22 00:56:48 < Jumpyshoes> oh
2010-11-22 00:56:50 < Dark_Shikari> now you might notice some preprocessor commands here.
2010-11-22 00:56:57 < Dark_Shikari> %assign, %rep, etc are preprocessor commands
2010-11-22 00:57:04 < Dark_Shikari> so, first step: load E into r1d
2010-11-22 00:57:09 < Dark_Shikari> "movzx" means "move, with zero extend"
2010-11-22 00:57:15 < Dark_Shikari> movzx r1d, byte [r0-1]
2010-11-22 00:57:20 < Dark_Shikari> in C this would be:
2010-11-22 00:57:25 < Dark_Shikari> int r1d = r0[-1];
2010-11-22 00:58:00 < Jumpyshoes> my C is a bit rusty, what does that do? does it just take the location in memory before r0[0]?
2010-11-22 00:58:01 < Dark_Shikari> got that?
2010-11-22 00:58:07 < Dark_Shikari> *(r0-1)
2010-11-22 00:58:08 < Dark_Shikari> yes
2010-11-22 00:58:14 < Dark_Shikari> [] is just a dereference of a pointer
2010-11-22 00:58:23 < Jumpyshoes> ah
2010-11-22 00:58:24 < Dark_Shikari> *(r0-1) = r0[-1] = (r0-1)[0]
2010-11-22 00:58:51 < Dark_Shikari> So, here's what these 7 lines look like after the macro runs
2010-11-22 00:58:53 < Kovensky> what is r0-1 in that ascii matrix?
2010-11-22 00:59:09 < Dark_Shikari> E.
2010-11-22 00:59:19 < Dark_Shikari>     movzx  r1d, byte [r0-1]
2010-11-22 00:59:19 < Dark_Shikari>     movzx  r2d, byte [r0+FDEC_STRIDE*1-1]
2010-11-22 00:59:19 < Dark_Shikari>     add    r1d, r2d
2010-11-22 00:59:19 < Dark_Shikari>     movzx  r2d, byte [r0+FDEC_STRIDE*2-1]
2010-11-22 00:59:19 < Dark_Shikari>     add    r1d, r2d
2010-11-22 00:59:21 < Dark_Shikari>     movzx  r2d, byte [r0+FDEC_STRIDE*3-1]
2010-11-22 00:59:24 < Dark_Shikari>     add    r1d, r2d
2010-11-22 00:59:26 < Dark_Shikari> in order:
2010-11-22 00:59:29 < Dark_Shikari> load E
2010-11-22 00:59:31 < Dark_Shikari> load F
2010-11-22 00:59:34 < Dark_Shikari> add F to E
2010-11-22 00:59:36 < Dark_Shikari> load G
2010-11-22 00:59:39 < Dark_Shikari> add G to E
2010-11-22 00:59:41 < Dark_Shikari> load H
2010-11-22 00:59:44 < Dark_Shikari> add H to E
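The same seven instructions, written as a small C sketch with the loop left rolled up. The function name sum_left_column is hypothetical and FDEC_STRIDE is assumed to be 32 for illustration; the variable names mirror the registers in the pasted code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: buffer stride, for illustration only. */
#define FDEC_STRIDE 32

/* Step 2 of the function: E+F+G+H, one byte load at a time. */
static int sum_left_column(const uint8_t *src)
{
    assert(src != NULL);
    int r1 = src[-1];                        /* movzx r1d, byte [r0-1]   (E) */
    for (int n = 1; n <= 3; n++) {           /* the repeated pair, n = 1..3 */
        int r2 = src[FDEC_STRIDE * n - 1];   /* movzx r2d, byte [r0+FDEC_STRIDE*n-1] */
        r1 += r2;                            /* add r1d, r2d */
    }
    return r1;
}
```

In the asm the loop does not exist at run time: the preprocessor expands it, so n is a constant folded into each address.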
2010-11-22 00:59:47 < Dark_Shikari> any questions about that?
2010-11-22 01:00:01 < Dark_Shikari> by the way, feel free to ask questions about WHY the code is like that, too, not just why it's correct.
2010-11-22 01:00:23 < Jumpyshoes> i'm good so far
2010-11-22 01:00:29 < Dark_Shikari> ok, now we have to do step 3
2010-11-22 01:00:36 < Dark_Shikari> calculating A+B+C+D+E+F+G+H+4 >> 3
2010-11-22 01:00:36 < Jumpyshoes> actually
2010-11-22 01:00:42 < Jumpyshoes> where is n stored?
2010-11-22 01:00:46 < Dark_Shikari> it isn't.
2010-11-22 01:00:49 < Jumpyshoes> oh
2010-11-22 01:00:53 < Dark_Shikari> It's a preprocessor variable.
2010-11-22 01:00:56 < Jumpyshoes> oh, so it's like a macro?
2010-11-22 01:00:58 < Dark_Shikari> Yes
2010-11-22 01:00:59 < Dark_Shikari> It is a macro
2010-11-22 01:01:00 < Kovensky> yes, it's a macro
2010-11-22 01:01:06 < Jumpyshoes> that is handy
2010-11-22 01:01:07 < Dark_Shikari> Note how I pasted the after-preprocessor code above.
2010-11-22 01:01:11 < Jumpyshoes> yea
2010-11-22 01:01:13 < Jumpyshoes> now i see
2010-11-22 01:01:14 < Kovensky> everything starting with % in yasm syntax is a macro
2010-11-22 01:01:14 < Dark_Shikari> No n left.
2010-11-22 01:01:24 < Dark_Shikari> Now, so let's do step 3.
2010-11-22 01:01:31 < Dark_Shikari> lea is the best non-simd opcode in x86
2010-11-22 01:01:41 < Dark_Shikari> first, let's go over x86 addressing
2010-11-22 01:01:45 < Dark_Shikari> what you can put inside the brackets is not infinite.
2010-11-22 01:01:50 < Dark_Shikari> Here's the capabilities, specifically:
2010-11-22 01:02:00 < Dark_Shikari> [REG1 + REG2 * {1,2,4,8} + CONST]
2010-11-22 01:02:12 < Dark_Shikari> a register, plus another register * 1/2/4/8, plus a constant (positive or negative).
2010-11-22 01:02:23 < Dark_Shikari> As you might note, this is pretty useful for accessing things like arrays
2010-11-22 01:02:40 < Dark_Shikari> e.g. array[n+5], where array is an int array, would be
2010-11-22 01:02:45 < Dark_Shikari> [array + n*4 + 20]
2010-11-22 01:02:49 < Kovensky> I suppose the [r0+FDEC_STRIDE*n-1] bit gets simplified on assembly to [register + const]?
2010-11-22 01:02:49 < Dark_Shikari> got that?
2010-11-22 01:02:53 < Dark_Shikari> Kovensky: yes
2010-11-22 01:02:58 < Dark_Shikari> yasm sums up constants for you.
2010-11-22 01:03:00 < Jumpyshoes> yea, that's nice
2010-11-22 01:03:10 < Dark_Shikari> so, as you might note, that's a pretty powerful addressing system.
2010-11-22 01:03:16 < Dark_Shikari> That's more powerful than, say... "add".
2010-11-22 01:03:24 < Dark_Shikari> So why not expose it in an instruction to let us use it for math?
2010-11-22 01:03:26 < Dark_Shikari> So Intel did.
2010-11-22 01:03:39 < Dark_Shikari> lea X, [expr] sets X equal to the value of expr.
2010-11-22 01:03:42 < Dark_Shikari> just as fast as add.
2010-11-22 01:03:55 < Dark_Shikari> so that lea does r1d = r1 + r3 + 4
2010-11-22 01:04:07 < Jumpyshoes> wait, how does that work?
2010-11-22 01:04:11 < Dark_Shikari> how does what work
2010-11-22 01:04:23 < Jumpyshoes> so [] is addressing
2010-11-22 01:04:28 < Dark_Shikari> yes
2010-11-22 01:04:31 < Kovensky> lea runs the [REG1 + REG2 * {1,2,4,8} + CONST] math on its second argument and adds to the first
2010-11-22 01:04:31 < Jumpyshoes> oh
2010-11-22 01:04:33 < Dark_Shikari> lea doesn't actually address it
2010-11-22 01:04:37 < Jumpyshoes> okay
2010-11-22 01:04:40 < Dark_Shikari> It just calculates the result and stores it
2010-11-22 01:04:42 < Dark_Shikari> instead of going to memory.
2010-11-22 01:04:46 < Jumpyshoes> and it's faster than add?
2010-11-22 01:04:50 < Dark_Shikari> It's just as fast
2010-11-22 01:04:53 < Dark_Shikari> Except that you can do more with it.
2010-11-22 01:04:58 < Kovensky> faster since you're doing 3 sums in one
2010-11-22 01:05:02 < Kovensky> if you look it that way
2010-11-22 01:05:04 < Jumpyshoes> o
2010-11-22 01:05:05 < Jumpyshoes> true
2010-11-22 01:05:06 < Kovensky> but cyclewise it's the same speed
2010-11-22 01:05:06 < Dark_Shikari> now, technically, you can do more adds per cycle than lea, so you shouldn't go replacing all your adds with lea
2010-11-22 01:05:14 < Kovensky> hm
2010-11-22 01:05:17 < Dark_Shikari> But if you can use it to do more than one thing at a time, it's a big win.
2010-11-22 01:05:26 < Dark_Shikari> So this lets us add r3, and add 4, in one op.
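A minimal C sketch of what lea computes, assuming 32-bit operands. lea_model is a hypothetical name; the point is only that the address expression is evaluated as plain arithmetic and nothing is read from memory.

```c
#include <assert.h>
#include <stdint.h>

/* Models lea dst, [base + index*scale + disp]. */
static uint32_t lea_model(uint32_t base, uint32_t index, unsigned scale, int32_t disp)
{
    /* x86 addressing only allows these four scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;
}
```

With r1 = E+F+G+H and r3 = A+B+C+D, lea_model(r1, r3, 1, 4) gives the full sum plus the rounding term in one step, which would otherwise take two adds.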
2010-11-22 01:05:31 < Dark_Shikari> Got that
2010-11-22 01:05:31 < Dark_Shikari> ?
2010-11-22 01:05:33 < Jumpyshoes> yup
2010-11-22 01:05:43 < Dark_Shikari> now shr r1d, 3: there's one that you can probably figure out yourself ;)
2010-11-22 01:06:01 < Sean_McG> hm, shift halfword right?
2010-11-22 01:06:05 < Dark_Shikari> just shift right
2010-11-22 01:06:14 < Jumpyshoes> handy
2010-11-22 01:06:18 < Jumpyshoes> why are we doing this?
2010-11-22 01:06:25 < Dark_Shikari> doing what
2010-11-22 01:06:28 < Jumpyshoes> shifting right
2010-11-22 01:06:32 < Kovensky> it's part of the >>3 in the equation he gave during the description
2010-11-22 01:06:37 < Jumpyshoes> right
2010-11-22 01:06:39 < Kovensky> >>3 = /(2^3) = /8
2010-11-22 01:06:50 < Dark_Shikari> DC prediction consists of averaging the pixels surrounding the block
2010-11-22 01:06:52 < Dark_Shikari> using correct rounding
2010-11-22 01:06:57 < Dark_Shikari> and then filling in the block with the result
2010-11-22 01:07:04 < Dark_Shikari> hence A+B+C+D+E+F+G+H+4 >> 3
2010-11-22 01:07:08 < Dark_Shikari> +4 for correct rounding
2010-11-22 01:07:10 < Dark_Shikari>  >> 3 to divide
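A tiny C sketch of why the +4 matters, using hypothetical helper names. Shifting right by 3 alone truncates; adding half the divisor first rounds to nearest.

```c
#include <assert.h>

/* Divide a non-negative sum of 8 pixels by 8, dropping the remainder. */
static int avg8_truncated(int sum)
{
    assert(sum >= 0);
    return sum >> 3;
}

/* Same division, but rounded to nearest: add 8/2 = 4 before shifting. */
static int avg8_rounded(int sum)
{
    assert(sum >= 0);
    return (sum + 4) >> 3;
}
```

For a sum of 12 the exact average is 1.5: the truncated version yields 1 and the rounded version yields 2.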
2010-11-22 01:07:17 < Jumpyshoes> smart
2010-11-22 01:07:25 < Dark_Shikari> now for the final part: storing the results
2010-11-22 01:07:28 < Kovensky> the same trick as adding +0.5 to a float so you get it rounded when you cast to integer
2010-11-22 01:07:37 < Dark_Shikari> imul r1d, 0x01010101
2010-11-22 01:07:42 < Dark_Shikari> this is called a "splat" and you may have seen it in C as well
2010-11-22 01:07:48 < Kovensky> splat? lol
2010-11-22 01:07:48 < Dark_Shikari> we're turning an 8-bit value into 4x that value
2010-11-22 01:07:51 < Dark_Shikari> e.g. A -> AAAAA
2010-11-22 01:07:55 < Dark_Shikari> er, AAAA
2010-11-22 01:07:58 < Jumpyshoes> i have never seen this before
2010-11-22 01:08:08 < Jumpyshoes> how does this work?
2010-11-22 01:08:09 < Dark_Shikari> so now we have a 32-bit register (r1d) with one copy of A in each byte of that register.
2010-11-22 01:08:15 < Jumpyshoes> oh nevermind, i get it
2010-11-22 01:08:17 < Dark_Shikari> A * 0x01010101 = A A A A
2010-11-22 01:08:30 < Dark_Shikari> Now we go ahead and store this 4 times.
2010-11-22 01:08:35 < Dark_Shikari> And we're done.
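The splat and the four stores, as a C sketch under the same assumptions as before (FDEC_STRIDE taken to be 32; splat8 and fill_4x4 are hypothetical names).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: buffer stride, for illustration only. */
#define FDEC_STRIDE 32

/* imul r1d, 0x01010101: copy an 8-bit value into all four bytes. */
static uint32_t splat8(uint32_t v)
{
    assert(v <= 0xFF);
    return v * 0x01010101u;
}

/* Step 4 of the function: one 32-bit store per row, four rows. */
static void fill_4x4(uint8_t *src, uint32_t dc)
{
    uint32_t row = splat8(dc);
    for (int y = 0; y < 4; y++)
        memcpy(src + y * FDEC_STRIDE, &row, 4);
}
```

The multiply works because 0x01010101 is 1 + 2^8 + 2^16 + 2^24, so the product is the value added to itself at each byte position with no carries between them.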
2010-11-22 01:08:47 < Jumpyshoes> woah
2010-11-22 01:08:51 < Dark_Shikari> Finally, we RET: x264 will automatically clean up after us.
2010-11-22 01:09:03 < Kovensky> emms is only needed on sse code, right?
2010-11-22 01:09:04 < Jumpyshoes> how much faster is this in asm than C?
2010-11-22 01:09:10 < Dark_Shikari> Not much.
2010-11-22 01:09:14 < Dark_Shikari> The only reason it's faster is psadbw.
2010-11-22 01:09:23 < Dark_Shikari> Everything else is something GCC can do with properly written C.
2010-11-22 01:09:34 < Dark_Shikari> I use it as an example because it's simple, and it mixes a lot of ideas in one function.
2010-11-22 01:09:38 < Dark_Shikari> well, as a first example.
2010-11-22 01:09:44 < Jumpyshoes> it does
2010-11-22 01:09:57 < Jumpyshoes> so what's the point of having it in asm if it's only slightly faster?
2010-11-22 01:09:57 < Dark_Shikari> It's probably 2-3 clocks faster at most.
2010-11-22 01:10:00 < Dark_Shikari> Because we can.
2010-11-22 01:10:02 < Dark_Shikari> lol
2010-11-22 01:10:06 < Jumpyshoes> of course
2010-11-22 01:10:09 < Jumpyshoes> this is open source
2010-11-22 01:10:12 < Dark_Shikari> Because it probably took 5 minutes to write.
2010-11-22 01:10:17 < Kovensky> isn't that a function that's called multiple times per frame too?
2010-11-22 01:10:22 < Dark_Shikari> Kovensky: understatement
2010-11-22 01:10:29 < Dark_Shikari> and this function only takes like 10 clocks
2010-11-22 01:10:33 < Dark_Shikari> so saving 2 clocks is kind of meaningful there
2010-11-22 01:10:34 < Dark_Shikari> (relatively)
2010-11-22 01:10:42 < Kovensky> then yea, I guess 3 clocks per MB on a really hot function is worth it =p
2010-11-22 01:10:53 < Dark_Shikari> anyways, that's a simple one.  Let's go on to some other concepts.
2010-11-22 01:11:06 < Dark_Shikari> Throw any questions you have at me about this before we go.
2010-11-22 01:11:34 < Jumpyshoes> are the preprocessor macros in yasm or x264?
2010-11-22 01:11:43 < Jumpyshoes> (also brb 1 minute)
2010-11-22 01:11:49 < Dark_Shikari> x264 has its own macro system written in yasm
2010-11-22 01:11:49 < Kovensky> they're in yasm
2010-11-22 01:11:56 < Dark_Shikari> which handles stuff like arguments, pushing and popping of registers
2010-11-22 01:12:00 < Dark_Shikari> and many more things which we will see soon
2010-11-22 01:12:03 < Dark_Shikari> we call this "x264asm"
2010-11-22 01:12:07 < Dark_Shikari> ffmpeg also uses this.
2010-11-22 01:12:14 < Kovensky> wasn't it "pengvado asm"? :>
2010-11-22 01:12:19 < Dark_Shikari> It's under a BSD license, so anyone in any project can and should use it to make their life less painful.
2010-11-22 01:12:34 < Kovensky> or did you rename to "x264asm" after "pasm" was already taken? :p
2010-11-22 01:12:47 < Dark_Shikari> bugmaster also wrote some of it
2010-11-22 01:13:12 < Dark_Shikari> ok brb I'm grabbing some food
2010-11-22 01:13:18 < Kovensky> right, the win64 part
2010-11-22 01:14:12 < Jumpyshoes> okay back
2010-11-22 01:14:15 < Jumpyshoes> whenever you're ready
2010-11-22 01:15:13 < Jumpyshoes> i wonder how much of this i will retain
2010-11-22 01:15:35 < j0sh> is this room logged somewhere?
2010-11-22 01:15:44 < Jumpyshoes> well, my hard drive
2010-11-22 01:15:59 < Kovensky> I have logs since ~2008 I think
2010-11-22 01:18:43 < Dark_Shikari> pengvado logs this
2010-11-22 01:18:52 < Dark_Shikari> Jumpyshoes: you don't need to retain individual instructions etc, you can look those up
2010-11-22 01:18:59 < Dark_Shikari> ok, next
2010-11-22 01:19:03 < Dark_Shikari> you may have noticed that psadbw is fucking awesome.
2010-11-22 01:19:18 < Jumpyshoes> it does like 8 things in one
2010-11-22 01:19:22 < Dark_Shikari> abs() is typically 4 instructions on x86
2010-11-22 01:19:24 < Kovensky> 22:18.52 Dark_Shikari: Jumpyshoes: you don't need to retain individual instructions etc, you can look those up <-- indeed; you just need to know they exist so you know to look them up =p
2010-11-22 01:19:33 < Dark_Shikari> psadbw does 8 subtracts
2010-11-22 01:19:34 < Jumpyshoes> o
2010-11-22 01:19:36 < Dark_Shikari> 8 absolute values on those results
2010-11-22 01:19:40 < Dark_Shikari> and then adds them up
2010-11-22 01:19:45 < Dark_Shikari> that's 8 + 32 + 7
2010-11-22 01:19:46 < Jumpyshoes> that's a lot
2010-11-22 01:19:48 < Dark_Shikari> 47 instructions in one
2010-11-22 01:19:51 < Jumpyshoes> why is abs so slow?
2010-11-22 01:19:51 < Dark_Shikari> (at least, 47 equivalent)
2010-11-22 01:19:57 < Dark_Shikari> abs isn't slow, there's just no instruction for it
2010-11-22 01:20:00 < Dark_Shikari> the typical algorithm is
2010-11-22 01:20:03 < Dark_Shikari> int sign = x >> 31;
2010-11-22 01:20:14 < Dark_Shikari> (x ^ sign) - sign;
2010-11-22 01:20:20 < Dark_Shikari> this needs a mov on x86, so that's 4 instructions.
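The two lines above, wrapped into a complete C function as a sketch. abs_branchless is a hypothetical name, and the code assumes the compiler implements >> on negative signed integers as an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free absolute value. */
static int32_t abs_branchless(int32_t x)
{
    assert(x != INT32_MIN);      /* -INT32_MIN is not representable */
    int32_t sign = x >> 31;      /* 0 if x >= 0, -1 (all ones) if x < 0 */
    return (x ^ sign) - sign;    /* x unchanged, or (~x) + 1 == -x */
}
```

Counting operations gives a copy of x, a shift, an xor and a subtract, which matches the four instructions mentioned.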
2010-11-22 01:20:33 < Jumpyshoes> oh
2010-11-22 01:20:46 < Jumpyshoes> okay
2010-11-22 01:20:47 < Dark_Shikari> So psadbw is pretty awesome.
2010-11-22 01:20:52 < Jumpyshoes> indeed
2010-11-22 01:20:53 < Dark_Shikari> It's very awesome for doing what its name implies you should do with it
2010-11-22 01:20:56 < Dark_Shikari> That is -- SADs
2010-11-22 01:20:58 < Dark_Shikari> sum of absolute differences
2010-11-22 01:21:05 < Dark_Shikari> so let's open sad-a.asm and hop down to line 95
2010-11-22 01:21:17 < Dark_Shikari> also open common/pixel.c and look at the first function: SAD
2010-11-22 01:21:23 < Dark_Shikari> This function is pretty simple.  You should be able to see how it works.
2010-11-22 01:21:29 < Dark_Shikari> If you have any questions about its details, ask (the C, not the asm)
2010-11-22 01:21:36 < Dark_Shikari> look only at the C for now.
2010-11-22 01:21:51 < Jumpyshoes> pixel_sad_%1x%2_mmxext <-- you can have % in function names?
2010-11-22 01:21:57 < Dark_Shikari> We'll get to that.
2010-11-22 01:22:35 < Dark_Shikari> so as you'll notice, the C SAD has 7 different versions
2010-11-22 01:22:40 < Dark_Shikari> for 16x16, 16x8, 8x16...
2010-11-22 01:22:45 < Dark_Shikari> and it's instantiated via a macro.
2010-11-22 01:23:18 < Jumpyshoes> okay, so for the C function
2010-11-22 01:23:25 < Jumpyshoes> how do you pass the pix1 and i_stride_pix1 arguments?
2010-11-22 01:23:33 < Dark_Shikari> One's a pointer, one's the stride.
2010-11-22 01:23:35 < Dark_Shikari> They're just normal params.
2010-11-22 01:23:52 < Dark_Shikari> the function has 4 parameters: two sources, two strides.
2010-11-22 01:23:59 < Jumpyshoes> i mean, the define only has 3 parameters
2010-11-22 01:24:03 < Kovensky> they come from image data
2010-11-22 01:24:08 < Dark_Shikari> The define is defining things that ARENT parameters.
2010-11-22 01:24:21 < Dark_Shikari> the name of the function
2010-11-22 01:24:23 < Dark_Shikari> the width, and the height
2010-11-22 01:24:25 < Kovensky> yes, the define just defines the name and the length of the sad
2010-11-22 01:24:27 < Dark_Shikari> all those are HARDCODED upon compile time
2010-11-22 01:24:34 < Jumpyshoes> oh
2010-11-22 01:24:35 < Jumpyshoes> right
2010-11-22 01:24:36 < Dark_Shikari> into 7 different versions of the function
2010-11-22 01:24:38 < Dark_Shikari> with 7 different names.
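A C sketch of this kind of macro instantiation. It is modelled on the idea discussed here, not copied from common/pixel.c: the macro name PIXEL_SAD_SKETCH, the function names and the exact parameter types are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* name, width and height are fixed at compile time; the two pointers and
 * two strides are ordinary run-time parameters. */
#define PIXEL_SAD_SKETCH(name, lx, ly)                                  \
static int name(const uint8_t *pix1, intptr_t stride1,                  \
                const uint8_t *pix2, intptr_t stride2)                  \
{                                                                       \
    int sum = 0;                                                        \
    assert(pix1 != NULL && pix2 != NULL);                               \
    for (int y = 0; y < (ly); y++) {                                    \
        for (int x = 0; x < (lx); x++)                                  \
            sum += abs(pix1[x] - pix2[x]);                              \
        pix1 += stride1;                                                \
        pix2 += stride2;                                                \
    }                                                                   \
    return sum;                                                         \
}

/* Two of the sizes; each line produces a separate function. */
PIXEL_SAD_SKETCH(sad_sketch_8x8, 8, 8)
PIXEL_SAD_SKETCH(sad_sketch_4x4, 4, 4)
```

Because the width and height are constants inside each generated function, the compiler can unroll the loops for every size separately.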
2010-11-22 01:24:50 < Jumpyshoes> right, i see
2010-11-22 01:25:06 < Dark_Shikari> so, for our asm, we also need 7 versions
2010-11-22 01:25:07 < Jumpyshoes> i haven't used defines extensively before, so you might get more stupid questions
2010-11-22 01:25:13 < Dark_Shikari> that's fine, no such thing as stupid questions
2010-11-22 01:25:19 < Dark_Shikari> and we also don't want to write the function 7 times, just like in the case of C we didn't.
2010-11-22 01:25:21 < j0sh> only stupid mistakes :)
2010-11-22 01:25:28 < Dark_Shikari> so in the asm, we define a macro
2010-11-22 01:25:30 < Dark_Shikari> %macro SAD 2
2010-11-22 01:25:30 < Kovensky> better than not ask and misunderstand everything
2010-11-22 01:25:35 < Dark_Shikari> that means this macro has two parameters.
2010-11-22 01:25:43 < Jumpyshoes> oh, %1 and %2?
2010-11-22 01:25:45 < Dark_Shikari> They are accessed as %1 and %2.
2010-11-22 01:25:53 < Dark_Shikari> we call the macro 7 times, one for each size.
2010-11-22 01:26:33 < Dark_Shikari> the function takes 4 args (as you'd expect)
2010-11-22 01:26:36 < Dark_Shikari> and needs 4 regs (just the args)
2010-11-22 01:26:41 < Jumpyshoes> and SAD_INC_2x%1P is another macro?
2010-11-22 01:26:46 < Dark_Shikari> Yes, it's one of three macros
2010-11-22 01:26:47 < Dark_Shikari> look above
2010-11-22 01:26:53 < Dark_Shikari> each one does 2 rows worth of SAD
2010-11-22 01:26:56 < Jumpyshoes> oh, cool
2010-11-22 01:26:57 < Dark_Shikari> for width 4, width 8, and width 16.
2010-11-22 01:27:02 < Dark_Shikari> so it picks the right one based on the width
2010-11-22 01:27:07 < Dark_Shikari> and it %reps it based on the height
2010-11-22 01:27:10 < Kovensky> punpckldq <-- cute instruction name
2010-11-22 01:27:34 < Dark_Shikari> now, start analyzing the 3 macros above (the sad macros) and trying to figure out how they work.
2010-11-22 01:27:37 < Dark_Shikari> ask questions.
2010-11-22 01:27:53 < Dark_Shikari> note mm0 is the accumulator
2010-11-22 01:27:56 < Dark_Shikari> which is why it's zeroed at the start.
2010-11-22 01:27:58 < Jumpyshoes> the order of args is the same as in the C function?
2010-11-22 01:28:01 < Dark_Shikari> yes
2010-11-22 01:28:04 < Jumpyshoes> kk
2010-11-22 01:28:20 < Jumpyshoes> what does punpckldq do?
2010-11-22 01:28:24 < Kovensky> ^
2010-11-22 01:28:30 < Dark_Shikari> good question!
2010-11-22 01:28:39 < Dark_Shikari> punpck is a set of instructions that interleave their arguments in some fashion.
2010-11-22 01:28:47 < Dark_Shikari> to start with, it can be l or h
2010-11-22 01:28:48 < Dark_Shikari> low or high
2010-11-22 01:29:01 < Dark_Shikari> so punpckl__ ABCD, EFGH will use AB and EF.
2010-11-22 01:29:07 < Dark_Shikari> And punpckh__ ABCD, EFGH will use CD and GH.
2010-11-22 01:29:19 < Kovensky> hurf, little endian
2010-11-22 01:29:20 < Dark_Shikari> the next two letters are the source size, and destination size.
2010-11-22 01:29:26 < Dark_Shikari> for example, punpcklbw interleaves bytes, to create words.
2010-11-22 01:29:36 < Dark_Shikari> So punpcklbw ABCD, EFGH gives you AEBF.
2010-11-22 01:29:51 < Jumpyshoes> oh, okay
2010-11-22 01:29:53 < Dark_Shikari> if the letters are bytes.
2010-11-22 01:30:07 < Dark_Shikari> so punpckldq ABCDEFGH, IJKLMNOP
2010-11-22 01:30:12 < Dark_Shikari> gives us ABCDIJKL
2010-11-22 01:30:23 < Dark_Shikari> so in other words, it stuffs the two sets of 4 bytes we just loaded into one register.
2010-11-22 01:30:26 < Dark_Shikari> So we can do only one SAD, instead of two.
2010-11-22 01:30:43 < Dark_Shikari> punpcklbw ABCD0000, EFGH0000 --> ABCDEFGH
2010-11-22 01:30:47 < Dark_Shikari> er, punpckldq
2010-11-22 01:31:02 < Dark_Shikari> so it effectively concatenates mm1 and mm2 for us.
2010-11-22 01:31:16 < Dark_Shikari> if we didn't do this, we'd have to do twice as many sads and adds.
2010-11-22 01:31:42 < Dark_Shikari> we do this because the registers are width 8, but our sad is width 4.
2010-11-22 01:31:50 < Dark_Shikari> So we need to stuff sad information side by side to fill the whole reg.
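A C sketch of the two unpack variants discussed, for 64-bit mm registers modelled as 8-byte arrays in memory order (index 0 is the lowest byte). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* punpckldq dst, src: keep the low dword of dst, put the low dword of src
 * into the high half. ABCD0000, EFGH0000 -> ABCDEFGH. */
static void punpckldq_model(uint8_t dst[8], const uint8_t src[8])
{
    assert(dst != NULL && src != NULL);
    memcpy(dst + 4, src, 4);
}

/* punpcklbw dst, src: interleave the low four bytes of dst and src. */
static void punpcklbw_model(uint8_t dst[8], const uint8_t src[8])
{
    uint8_t out[8];
    assert(dst != NULL && src != NULL);
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = dst[i];
        out[2 * i + 1] = src[i];
    }
    memcpy(dst, out, 8);
}
```

For the width-4 SAD, punpckldq is what packs row 0 and row 1 of one input into a single register so that one psadbw covers both rows.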
2010-11-22 01:32:31 < Jumpyshoes> why are we punpckldq'ing the [r0+r1] and not [r0]?
2010-11-22 01:32:41 < Jumpyshoes> oh wait, nevermind
2010-11-22 01:32:59 < Dark_Shikari> we're concatenating row 0 and row 1
2010-11-22 01:33:02 < Dark_Shikari> of each input.
2010-11-22 01:33:06 < Kovensky> so the punpckldq does mm1 = mm1 & 0xFFFF<<16 | [src+stride]>>16?
2010-11-22 01:33:30 < Dark_Shikari> no, each input is 32 bits
2010-11-22 01:33:30 < Dark_Shikari> not 16
2010-11-22 01:33:38 < Kovensky> orz
2010-11-22 01:33:40 < Dark_Shikari> low 32 of src1, low 32 of src2, combine to make 64 bit output
2010-11-22 01:33:44 < Kovensky> it's the same idea though right?
2010-11-22 01:33:59 < Dark_Shikari> not really, it doesn't right shift anything
2010-11-22 01:34:14 < Kovensky> hm
2010-11-22 01:34:18 < Kovensky> me failing at bit math here
2010-11-22 01:34:28 < Jumpyshoes> lea     r0,     [r0+2*r1] <-- why are we doing this step?
2010-11-22 01:34:42 < Jumpyshoes> doesn't it move r0 over 2*r1?
2010-11-22 01:34:49 < Dark_Shikari> btw Jumpyshoes http://alien.dowling.edu/~rohit/nasmdocb.html
2010-11-22 01:34:59 < Jumpyshoes> and change the arg?
2010-11-22 01:34:59 < Dark_Shikari> Jumpyshoes: we're incrementing the pointer by 2*stride
2010-11-22 01:35:12 < Jumpyshoes> does the C code do that?
2010-11-22 01:35:13 < Kovensky> you can do whatever you want with the arg
2010-11-22 01:35:21 < Jumpyshoes> oh, the rep
2010-11-22 01:36:02 < Jumpyshoes> okay, i get what SAD_INC_2x4P does
2010-11-22 01:36:03 < Jumpyshoes> woohoo
2010-11-22 01:36:11 < Dark_Shikari> the others work similarly
2010-11-22 01:36:15 < Dark_Shikari> except without the punpck magic
2010-11-22 01:36:17 < Dark_Shikari> because they don't need it.
2010-11-22 01:36:35 < Jumpyshoes> wait, why is the lea out of order in SAD_INC_2x8P?
2010-11-22 01:36:43 < Jumpyshoes> by out of order i mean not next to each other
2010-11-22 01:36:44 < Kovensky> I'm still finishing 2x4...
2010-11-22 01:36:46 < Dark_Shikari> No particular reason.
2010-11-22 01:36:50 < Kovensky> k, got it
2010-11-22 01:36:52 < Jumpyshoes> oh, okay
2010-11-22 01:38:10 < Dark_Shikari> http://alien.dowling.edu/~rohit/nasmdocb.html have this open in another window for reference
2010-11-22 01:38:13 < Dark_Shikari> very very useful
2010-11-22 01:38:19 < Jumpyshoes> yea, have it open
2010-11-22 01:38:37 < Jumpyshoes> so we rep the SAD for however many times so the 2x%1 is completed?
2010-11-22 01:39:03 < Dark_Shikari> yes
2010-11-22 01:39:03 < Kovensky> well, the SADs work in two rows at a time
2010-11-22 01:39:08 < Dark_Shikari> so if it's height 8
2010-11-22 01:39:10 < Dark_Shikari> we rep it 4 times
2010-11-22 01:39:12 < Dark_Shikari> 4*2 = 8
2010-11-22 01:39:13 < Kovensky> so you just need to do for rows/2 times
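The "two rows per step, repeated height/2 times" structure can be sketched in C as follows. This is a model under assumptions, not the x264 macro itself: psadbw_model and sad_4xh_model are hypothetical names, and the 8-byte arrays stand in for MMX registers.

```c
#include <stdint.h>
#include <string.h>

/* Model of psadbw on one 8-byte register: sum of |a[i] - b[i]|. */
unsigned psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
    return sum;
}

/* 4-pixel-wide SAD over `height` rows, two rows per step. */
unsigned sad_4xh_model(const uint8_t *pix1, int stride1,
                       const uint8_t *pix2, int stride2, int height)
{
    unsigned acc = 0;
    for (int rep = 0; rep < height / 2; rep++) {
        uint8_t r1[8], r2[8];
        memcpy(r1,     pix1,           4);  /* load row 0 (4 bytes)        */
        memcpy(r1 + 4, pix1 + stride1, 4);  /* concatenate row 1 next to it */
        memcpy(r2,     pix2,           4);
        memcpy(r2 + 4, pix2 + stride2, 4);
        acc  += psadbw_model(r1, r2);       /* one SAD and one add per pair */
        pix1 += 2 * stride1;                /* advance both pointers by 2 rows */
        pix2 += 2 * stride2;
    }
    return acc;
}
```

For height 8 the loop body runs 4 times, matching the 4*2 = 8 count given in the log.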
2010-11-22 01:39:24 < Jumpyshoes> ah
2010-11-22 01:39:40 < Kovensky> I dunno if I understood 2x8 / 2x16 or not; I have no questions about them but I also doubt that I'll remember this after a day
2010-11-22 01:39:51 < Jumpyshoes> movq    mm2,    [r0+8] <-- why are we adding the 8?
2010-11-22 01:40:03 < Jumpyshoes> if it's movq
2010-11-22 01:40:20 < Kovensky> Jumpyshoes: because it's working now on 2 "columns" of 8 bytes each
2010-11-22 01:40:31 < Jumpyshoes> oh, right
2010-11-22 01:40:37 < Kovensky> Dark_Shikari: why are strides not hardcoded btw?
2010-11-22 01:41:05 < Dark_Shikari> Kovensky: SAD can be called on a reference frame
2010-11-22 01:41:08 < Dark_Shikari> thus variable stride
2010-11-22 01:41:37 < Kovensky> I don't really get it but then I'd have to study more of x264 to know the reference frame memory layout
2010-11-22 01:41:47 < Kovensky> oh wait
2010-11-22 01:41:50 < Kovensky> I got it now lol
2010-11-22 01:41:54 < Dark_Shikari> it's called on frames, as opposed to some temporary block of memory =p
2010-11-22 01:42:26 < Jumpyshoes> okay, i think i understand the SAD_INC_* functions now
2010-11-22 01:42:31 < Dark_Shikari> now, for the kicker
2010-11-22 01:42:33 < Jumpyshoes> and the SAD
2010-11-22 01:42:41 < Dark_Shikari> the 16x16 SAD function declared here is 15 times faster than C.
2010-11-22 01:42:45 < Jumpyshoes> wat
2010-11-22 01:43:13 < Jumpyshoes> why is it so much faster?
2010-11-22 01:43:18 < Dark_Shikari> psadbw
2010-11-22 01:43:33 < Jumpyshoes> oh, because you're doing the abs in the C function
2010-11-22 01:44:24 < Jumpyshoes> okay, that is pretty awesome
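For comparison, a plain C SAD looks roughly like the sketch below. It is written for illustration and is not copied from x264; the name sad_16x16_c is an assumption. Every pixel costs a subtract, an abs and an add, whereas a single psadbw covers a whole register of bytes at once.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference: sum of absolute differences over a 16x16 block. */
int sad_16x16_c(const uint8_t *pix1, int stride1,
                const uint8_t *pix2, int stride2)
{
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - pix2[x]);   /* 256 of these in total */
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```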
2010-11-22 01:44:40 < Dark_Shikari> now let's get a bit to how we measure performance
2010-11-22 01:44:46 < Dark_Shikari> for any asm instruction, there are three things that matter
2010-11-22 01:44:56 < Dark_Shikari> latency, inverse throughput, and execution units
2010-11-22 01:45:05 < Dark_Shikari> the first two are represented like this
2010-11-22 01:45:06 < Dark_Shikari> "3/1"
2010-11-22 01:45:07 < Kovensky> inverse throughput?
2010-11-22 01:45:15 < Dark_Shikari> this means a psadbw takes 3 clocks to finish from when it's started
2010-11-22 01:45:19 < Dark_Shikari> and you can do one of them per cycle.
2010-11-22 01:45:30 < Jumpyshoes> okay, what are all three? <_<
2010-11-22 01:45:41 < Dark_Shikari> another example is "mov"
2010-11-22 01:45:48 < Dark_Shikari> mov between two registers is 1/0.33
2010-11-22 01:45:53 < Dark_Shikari> takes 1 cycle, and you can do 3 per clock.
2010-11-22 01:45:56 < Dark_Shikari> execution unit usage is a bit trickier.
2010-11-22 01:46:03 < Dark_Shikari> Not all execution units can do all instructions.
2010-11-22 01:46:11 < Dark_Shikari> Intel chips have 6 execution units:
2010-11-22 01:46:14 < Dark_Shikari> p0, p1, p2, p3, p4, p5
2010-11-22 01:46:15 < Jumpyshoes> wait, what is latency?
2010-11-22 01:46:20 < Dark_Shikari> time from start to finish, in clocks
2010-11-22 01:46:24 < Jumpyshoes> and inverse throughput and execution units
2010-11-22 01:46:33 < Dark_Shikari> inverse throughput is how many clocks each one costs when you issue a bunch back to back, so 1 means one per clock and 0.33 means three per clock.
2010-11-22 01:46:41 < Dark_Shikari> execution units are the things in the chip that do stuff.
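The difference between latency and inverse throughput shows up when you compare a dependent chain with independent instructions. The sketch below is a deliberately simplified model (one instruction type, nothing else competing for the unit), and both function names are made up for illustration.

```c
/* n instructions where each one needs the previous result:
   every instruction waits out the full latency. */
double dependent_chain_cycles(int n, double latency)
{
    return n * latency;
}

/* n independent instructions: a new one can start every
   inv_throughput clocks, and the last one still needs `latency`
   clocks to finish. */
double independent_cycles(int n, double latency, double inv_throughput)
{
    return n > 0 ? latency + (n - 1) * inv_throughput : 0.0;
}
```

With the "3/1" figures quoted for psadbw, eight dependent ones would take 24 clocks in this model, while eight independent ones would take 10.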
2010-11-22 01:46:41 < Jumpyshoes> oh
2010-11-22 01:46:53 < Dark_Shikari> of these 6 execution units, three can do math: p0, p1, p5.
2010-11-22 01:47:02 < Dark_Shikari> psadbw, for example, can only use one of these (p1)
2010-11-22 01:47:06 < Dark_Shikari> pxor can use all three
2010-11-22 01:47:08 < Dark_Shikari> and so forth
2010-11-22 01:47:18 < Dark_Shikari> generally execution units aren't important until you get into serious optimizing
2010-11-22 01:47:24 < Dark_Shikari> but they can often affect the best instruction choices
2010-11-22 01:47:31 < Dark_Shikari> for example, if an execution unit is sitting around doing nothing for a whole function.
2010-11-22 01:47:48 < Dark_Shikari> the instruction tables sheet here http://agner.org/optimize/ has all the information on latency, execution units, and inverse throughput
2010-11-22 01:47:52 < Dark_Shikari> for a wide variety of CPUs
2010-11-22 01:48:26 < Kovensky> I suppose AMD are roughly the same, for compatibility?
2010-11-22 01:48:28 < Jumpyshoes> how about branching? i heard branching fucks you
2010-11-22 01:48:35 < Dark_Shikari> not generally unless it's unpredictable
2010-11-22 01:48:41 < Kovensky> branch mispredictions do
2010-11-22 01:48:45 < Dark_Shikari> we can get to a case of that later if you want.
2010-11-22 01:48:51 < Dark_Shikari> now, let's just analyze SAD.
2010-11-22 01:48:57 < Dark_Shikari> suppose we want to analyze the 8x8 SAD
2010-11-22 01:49:02 < Dark_Shikari> in this function we do:
2010-11-22 01:49:03 < Dark_Shikari> 8 SADs
2010-11-22 01:49:07 < Dark_Shikari> 8 adds (accumulates)
2010-11-22 01:49:09 < Dark_Shikari> 16 loads
2010-11-22 01:49:34 < Dark_Shikari> plus the start, end, and calling overhead
2010-11-22 01:49:44 < Dark_Shikari> 8 SADs: takes 8 cycles (inverse throughput of 1)
2010-11-22 01:49:54 < Dark_Shikari> 8 adds: takes 8 cycles (inverse throughput of 1), and can run at the same time as SADs
2010-11-22 01:50:01 < Dark_Shikari> 16 loads: takes 16 cycles, and can run at the same time as the above.
2010-11-22 01:50:04 < Dark_Shikari> So the loads are the bottleneck.
2010-11-22 01:50:13 < Dark_Shikari> This is an important thing to understand: it's possible for one type of operation to bottleneck a function.
2010-11-22 01:50:19 < Dark_Shikari> Loads are a common example.
2010-11-22 01:50:34 < Dark_Shikari> In this case, SAD is *so fast* that it is effectively free, as we're sitting around waiting for loads the whole time.
2010-11-22 01:50:47 < Jumpyshoes> oh.
2010-11-22 01:50:56 < Dark_Shikari> the actual runtime of the function is about 22 clocks.
2010-11-22 01:51:02 < Dark_Shikari> Which is fitting for 16 + start + end + overhead.
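The bottleneck estimate above amounts to taking the largest per-type cycle count. The sketch below assumes, as the log does, that each operation type has an inverse throughput of 1 and that the three types can overlap completely; op_counts and bottleneck_cycles are hypothetical names.

```c
/* Operation counts for one function. */
typedef struct {
    int sads;
    int adds;
    int loads;
} op_counts;

/* If the three kinds of work overlap perfectly, the slowest kind
   sets the lower bound on the cycle count. */
int bottleneck_cycles(op_counts c)
{
    int m = c.sads;
    if (c.adds  > m) m = c.adds;
    if (c.loads > m) m = c.loads;
    return m;
}
```

For the 8x8 SAD counts (8, 8, 16) this gives 16, and start, end and call overhead come on top of that.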
2010-11-22 01:51:20 < Dark_Shikari> so that's some basic performance analysis for you.
2010-11-22 01:51:29 < Jumpyshoes> is there anything that does this automatically for you?
2010-11-22 01:51:30 < Dark_Shikari> How long the function should take in theory, how long each instruction takes in theory, and how you can be bottlenecked.
2010-11-22 01:51:36 < Dark_Shikari> analysis?  not really.
2010-11-22 01:51:43 < Dark_Shikari> There are intel performance counters and such on the chip
2010-11-22 01:51:46 < Dark_Shikari> but they're not magic
2010-11-22 01:51:53 < Dark_Shikari> It might be useful to have some kind of tool to analyze asm functions
2010-11-22 01:52:08 < Jumpyshoes> ah
2010-11-22 01:52:25 < Dark_Shikari> in general though, intuition is a powerful tool.
2010-11-22 01:53:02 < Jumpyshoes> i see
2010-11-22 01:53:15 < Dark_Shikari> so let's move on to some examples of powerful x264 macros.
2010-11-22 01:53:45 < Jumpyshoes> cool
2010-11-22 01:53:46 < Dark_Shikari> actually, let's start with something simpler
2010-11-22 01:53:49 < Dark_Shikari> pixel_avg2_w16_sse2
2010-11-22 01:53:51 < Dark_Shikari> mc-a.asm
2010-11-22 01:53:54 < Dark_Shikari> find it, ping me when you have
2010-11-22 01:54:19  * Kovensky found it
2010-11-22 01:54:27 < Jumpyshoes> found it
2010-11-22 01:55:20 < Dark_Shikari> ok, so this function interpolates between two inputs, and outputs to an output
2010-11-22 01:55:31 < Dark_Shikari> the interpolation is the simplest possible
2010-11-22 01:55:33 < Dark_Shikari> (A+B+1)>>1
2010-11-22 01:55:56 < Dark_Shikari> look at the function signature above
2010-11-22 01:55:59 < Dark_Shikari> ; void pixel_avg2_w4( uint8_t *dst, int dst_stride,
2010-11-22 01:55:59 < Dark_Shikari> etc
2010-11-22 01:56:12 < Dark_Shikari> so this function takes inputs from src1 and src2, averages them together, and writes to dst
2010-11-22 01:56:16 < Dark_Shikari> src1 and src2 have src_stride
2010-11-22 01:56:19 < Dark_Shikari> and dst has dst_stride.
2010-11-22 01:56:20 < Dark_Shikari> got it?
2010-11-22 01:56:50 < Jumpyshoes> what's the height?
2010-11-22 01:57:12 < Dark_Shikari> how many lines to interpolate.
2010-11-22 01:57:17 < Jumpyshoes> ah
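In C, the operation described here would look something like the sketch below. The explicit width parameter is an assumption added for illustration (the asm functions encode the width in their names), and pixel_avg2_ref is a made-up name.

```c
#include <stdint.h>

/* Rounded average of two sources, written row by row.
   src1 and src2 share src_stride; dst has its own stride. */
void pixel_avg2_ref(uint8_t *dst, int dst_stride,
                    const uint8_t *src1, int src_stride,
                    const uint8_t *src2, int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (uint8_t)((src1[x] + src2[x] + 1) >> 1);
        dst  += dst_stride;
        src1 += src_stride;
        src2 += src_stride;
    }
}
```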
2010-11-22 01:58:29 < Dark_Shikari> now this function uses xmm registers (128-bit)
2010-11-22 01:58:31 < Dark_Shikari> so it does 16 bytes at a time
2010-11-22 01:58:44 < Dark_Shikari> all 128-bit loads must be aligned unless movdqu is used.
2010-11-22 01:58:50 < Dark_Shikari> since our inputs are unaligned, this is a lot of movdqu.
2010-11-22 01:59:05 < Jumpyshoes> and what does movdqu do?
2010-11-22 01:59:18 < Dark_Shikari> loads 128 bits from an unaligned source
2010-11-22 01:59:28 < Jumpyshoes> ah
2010-11-22 01:59:33 < Kovensky> why sub r2 from r4?
2010-11-22 01:59:40 < Dark_Shikari> ah, now here's a fun trick
2010-11-22 01:59:47 < Dark_Shikari> we need to increment three pointers, right?
2010-11-22 01:59:50 < Jumpyshoes> yea, aren't you subtracting addresses?
2010-11-22 01:59:51 < Dark_Shikari> src1, src2, dst
2010-11-22 01:59:58 < Dark_Shikari> But src1 and src2 have the same stride.
2010-11-22 02:00:03 < Dark_Shikari> So they're being incremented by the same amount.
2010-11-22 02:00:11 < Dark_Shikari> So we can take src2 and represent it as an offset from src1.
2010-11-22 02:00:14 < Dark_Shikari> Then we only have to increment src1.
2010-11-22 02:00:18 < Dark_Shikari> One lea removed per iteration, bam.
2010-11-22 02:00:28 < Kovensky> and use r6 as the offset + stride?
2010-11-22 02:00:47 < Dark_Shikari> yes
2010-11-22 02:01:31 < Jumpyshoes> that is a nice trick
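The same pointer trick can be written in C. This is a sketch with a hypothetical name and an added width parameter; note that in standard C the subtraction is only defined when src1 and src2 point into the same array, which fits the case of two positions inside one reference frame.

```c
#include <stddef.h>
#include <stdint.h>

/* Same result as a straightforward two-pointer loop, but src2 is kept
   as a fixed offset from src1, so only src1 (and dst) get incremented. */
void pixel_avg2_offset(uint8_t *dst, int dst_stride,
                       const uint8_t *src1, int src_stride,
                       const uint8_t *src2, int width, int height)
{
    ptrdiff_t off = src2 - src1;          /* like "sub r4, r2" in the asm */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (uint8_t)((src1[x] + src1[x + off] + 1) >> 1);
        dst  += dst_stride;
        src1 += src_stride;               /* the only source pointer that moves */
    }
}
```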
2010-11-22 02:01:52 < Dark_Shikari> so look through that function and see if there's anything you don't know about
2010-11-22 02:01:53 < Dark_Shikari> and ask questions.
2010-11-22 02:02:16 < Jumpyshoes> god how do you keep track of which argument is which
2010-11-22 02:02:33 < Kovensky> I copied the description from w4
2010-11-22 02:02:37 < Kovensky> and annotated it
2010-11-22 02:02:45 < Kovensky> ; void pixel_avg2_w4( uint8_t *dst (r0), int dst_stride (r1),
2010-11-22 02:02:48 < Kovensky> ;                     uint8_t *src1 (r2), int src_stride (r3),
2010-11-22 02:02:51 < Kovensky> ;                     uint8_t *src2 (r4), int height (r5) );
2010-11-22 02:02:55 < Jumpyshoes> good idea
2010-11-22 02:03:04 < Dark_Shikari> Jumpyshoes: we have a system I'll show you later that helps you keep track of registers.
2010-11-22 02:03:13 < Dark_Shikari> or, well, makes it easier to.
2010-11-22 02:03:41 < Jumpyshoes> pavgb i assume does some sort of averaging?
2010-11-22 02:04:04 < Dark_Shikari> yes, (A+B+1)>>1 for each pair of input pixels
2010-11-22 02:04:26 < Jumpyshoes> http://www.tommesani.com/SSEPrimer.html ooh this has pretty diagrams
2010-11-22 02:04:43 < Kovensky> that one was easy to read, but I didn't bother much about u vs a
2010-11-22 02:04:55 < Jumpyshoes> movdqa - mov dq to aligned?
2010-11-22 02:05:13 < Dark_Shikari> same as movdqu, except for aligned
2010-11-22 02:05:36 < Jumpyshoes> ah
2010-11-22 02:05:46 < Dark_Shikari> the output is always aligned, as we control it
2010-11-22 02:05:52 < Dark_Shikari> the input is an arbitrary pointer into a reference frame
2010-11-22 02:05:54 < Dark_Shikari> and so it could be anything.
2010-11-22 02:05:56 < Jumpyshoes> hoho one of the four billion jumps that exists in x86
2010-11-22 02:06:04 < Dark_Shikari> jump if greater than
2010-11-22 02:06:07 < Jumpyshoes> jump greater?
2010-11-22 02:06:16 < Dark_Shikari> so if r5d > 0
2010-11-22 02:06:18 < Jumpyshoes> ah
2010-11-22 02:06:35 < Jumpyshoes> so why two?
2010-11-22 02:06:44 < Dark_Shikari> it handles two rows at a time.
2010-11-22 02:06:46 < Jumpyshoes> oh, right
2010-11-22 02:07:04 < Kovensky> hm
2010-11-22 02:07:14 < Kovensky> movdq moves doublequads
2010-11-22 02:07:21 < Kovensky> but the registers are only quads
2010-11-22 02:07:28 < Kovensky> unless on 64bit
2010-11-22 02:07:32 < Kovensky> so what does it do
2010-11-22 02:07:36 < Jumpyshoes> xmm2 is 128, isn't it?
2010-11-22 02:07:37 < Kovensky> oh wait, not registers, memory addresses
2010-11-22 02:07:43 < Kovensky> failed there
2010-11-22 02:07:52 < Jumpyshoes> i thought it pads 0s
2010-11-22 02:07:58 < Dark_Shikari> all the moves here are 128-bit
2010-11-22 02:07:59 < Dark_Shikari> so there's no padding
2010-11-22 02:08:11 < Kovensky> no, it moves 128bits from wherever the registers point to xmm / viceversa
2010-11-22 02:08:24 < Jumpyshoes> ah, right
2010-11-22 02:08:29 < Kovensky> I was failing just now and reading as if it was moving the register contents to xmm
2010-11-22 02:09:20 < Jumpyshoes> okay, this is pretty awesome
2010-11-22 02:09:26 < Jumpyshoes> except i have a headache now
2010-11-22 02:09:37 < wipple> Dark_Shikari: i fixed configure --> http://cccp.project357.com/p/f1860e321
2010-11-22 02:09:46 < wipple> any other good way to fix?
2010-11-22 02:10:13 < Dark_Shikari> wipple: if it works, I'm fine with it.  you want to package that with an updated version of your other patch?
2010-11-22 02:10:38 < wipple> Dark_Shikari: http://cccp.project357.com/p/f3c6e06e4
2010-11-22 02:11:04 < Kovensky> Dark_Shikari: how much faster is the SSE2 version of this func compared to C
2010-11-22 02:11:53 < Dark_Shikari> wipple: applied
2010-11-22 02:12:03 < Dark_Shikari> Kovensky: about 11 times faster
2010-11-22 02:12:20 < Jumpyshoes> holy crap
2010-11-22 02:12:59 < Jumpyshoes> okay, i think i get it
2010-11-22 02:13:27 < Dark_Shikari> also, the REP_RET  you might have been wondering about
2010-11-22 02:13:33 < Dark_Shikari> in short, if you have a RET after a jump, use REP_RET.
2010-11-22 02:13:36 < Dark_Shikari> Blame AMD.
2010-11-22 02:13:51 < Jumpyshoes> oh
2010-11-22 02:13:54 < Kovensky> REP_RET is one of x86inc's macros I suppose
2010-11-22 02:14:35 < Jumpyshoes> i sure hope these GCI tasks are easy
2010-11-22 02:14:48 < Jumpyshoes> but in other news, this is pretty cool
2010-11-22 02:14:50 < Kovensky> they are once you get the hang of it
2010-11-22 02:14:54 < Jumpyshoes> how optimized this can get
2010-11-22 02:14:58 < Kovensky> since you just need to take any silly function
2010-11-22 02:15:00 < Kovensky> and write asm for it
2010-11-22 02:15:11 < Kovensky> if I had merged x264-audio already, you could write asm for my resampler lol
2010-11-22 02:15:18 < Kovensky> atm it's pure C
2010-11-22 02:15:24 < Kovensky> er
2010-11-22 02:15:31 < Kovensky> s/resample/sample format converter/
2010-11-22 02:15:38 < Dark_Shikari> so, now let's look at some horrible macro abuse
2010-11-22 02:15:38 < Jumpyshoes> ah
2010-11-22 02:15:38 < Kovensky> +r
2010-11-22 02:15:41 < Jumpyshoes> derp
2010-11-22 02:15:56 < Dark_Shikari> dct-a.asm
2010-11-22 02:15:59 < Dark_Shikari> cglobal add4x4_idct_mmx, 2,2
2010-11-22 02:16:02 < Dark_Shikari> this does an inverse DCT
2010-11-22 02:16:09 < Dark_Shikari> steps:
2010-11-22 02:16:12 < Dark_Shikari> 1.  Load dct coeffs.
2010-11-22 02:16:14 < Dark_Shikari> 2.  1D IDCT.
2010-11-22 02:16:17 < Dark_Shikari> 3.  Transpose.
2010-11-22 02:16:22 < Dark_Shikari> 4.  1D IDCT.
2010-11-22 02:16:33 < Dark_Shikari> 5.  Load pixels, add idct output, clamp, store.
2010-11-22 02:16:44 < Dark_Shikari> You might notice that this function looks curiously simple!
2010-11-22 02:16:47 < Kovensky> the IDCT macro itself is probably cute
2010-11-22 02:16:54 < Jumpyshoes> what does transpose do?
2010-11-22 02:17:05 < Dark_Shikari> Exactly what you think it does.
2010-11-22 02:17:06 < Kovensky> just a regular matrix transpose
2010-11-22 02:17:19 < Jumpyshoes> isn't it 1D though?
2010-11-22 02:17:27 < Kovensky> no, the source is 2D
2010-11-22 02:17:32 < Kovensky> but the IDCT is 1D
2010-11-22 02:17:34 < Jumpyshoes> oh, it is
2010-11-22 02:17:52 < Kovensky> however, if you do it on the matrix on both orientations, it works like a 2D IDCT... somehow...
2010-11-22 02:17:58 < Kovensky> idk the maths behind it lol
2010-11-22 02:18:00 < Dark_Shikari> it's called a "separable transform"
2010-11-22 02:18:08 < Dark_Shikari> it means you can do a 2D transform by doing two 1D transforms
2010-11-22 02:18:09 < Dark_Shikari> one in each direction
2010-11-22 02:18:12 < Dark_Shikari> the transform is designed that way.
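Separability can be checked numerically. The sketch below uses an example 4x4 integer matrix M, applies a 1D transform to every row, transposes, repeats, and compares against the direct 2D formula M * X * M^T. All names are made up for illustration, and the final transpose is only there to put the result back in its original orientation.

```c
/* Example 4x4 integer transform matrix (an illustrative choice). */
static const int M[4][4] = {
    { 1,  1,  1,  1 },
    { 2,  1, -1, -2 },
    { 1, -1, -1,  1 },
    { 1, -2,  2, -1 },
};

/* 1D transform applied to every row: row <- M * row. */
void transform_rows(int b[4][4])
{
    for (int r = 0; r < 4; r++) {
        int t[4];
        for (int i = 0; i < 4; i++) {
            t[i] = 0;
            for (int k = 0; k < 4; k++)
                t[i] += M[i][k] * b[r][k];
        }
        for (int i = 0; i < 4; i++)
            b[r][i] = t[i];
    }
}

/* In-place matrix transpose. */
void transpose4x4(int b[4][4])
{
    for (int i = 0; i < 4; i++)
        for (int j = i + 1; j < 4; j++) {
            int t = b[i][j];
            b[i][j] = b[j][i];
            b[j][i] = t;
        }
}

/* 2D transform built from two 1D passes. */
void transform_2d_separable(int b[4][4])
{
    transform_rows(b);
    transpose4x4(b);
    transform_rows(b);
    transpose4x4(b);
}

/* Direct 2D transform: out = M * in * M^T. */
void transform_2d_direct(int in[4][4], int out[4][4])
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            out[i][j] = 0;
            for (int k = 0; k < 4; k++)
                for (int l = 0; l < 4; l++)
                    out[i][j] += M[i][k] * in[k][l] * M[j][l];
        }
}
```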
2010-11-22 02:18:24 < Jumpyshoes> right, works like the DCT derp derp
2010-11-22 02:18:29 < Kovensky> is that specific of the HCT or has always been part of the DCT
2010-11-22 02:18:40 < Jumpyshoes> concentrates the energy and shit woohoo
2010-11-22 02:18:47 < Dark_Shikari> now notice how simple it is.
2010-11-22 02:18:57 < Kovensky> I know nothing about the DCT, and I've read the wikipedia page like 5 times ._.
2010-11-22 02:18:57 < Dark_Shikari> Macros hide all the complexity in little manageable chunks.
2010-11-22 02:19:03 < Jumpyshoes> well, you're calling IDCT4_1D
2010-11-22 02:19:06 < Dark_Shikari> Which can be edited separately.
2010-11-22 02:19:16 < Kovensky> ,skip_prologue?
2010-11-22 02:19:27 < Dark_Shikari> Kovensky: there are functions that call this, and have already set up the registers
2010-11-22 02:19:30 < Dark_Shikari> so they jump directly to the start
2010-11-22 02:19:34 < Dark_Shikari> instead of the init part
2010-11-22 02:19:42 < Dark_Shikari> *asm functions that call this
2010-11-22 02:19:49 < Kovensky> I see
2010-11-22 02:19:53 < Dark_Shikari> Now, here's the fun part
2010-11-22 02:19:53 < Kovensky> cheaters
2010-11-22 02:19:54 < Kovensky> :P
2010-11-22 02:20:07 < Dark_Shikari> IDCT and transpose are both composed of submacros and so on
2010-11-22 02:20:12 < Jumpyshoes> oh god
2010-11-22 02:20:17 < Dark_Shikari> for example
2010-11-22 02:20:21 < Dark_Shikari> a transpose is a series of BUTTERFLY operations
2010-11-22 02:20:23 < Dark_Shikari> see x86util.asm
2010-11-22 02:20:26 < Dark_Shikari> it's actually pretty simple
2010-11-22 02:20:33 < Kovensky> BUTTERFLY?
2010-11-22 02:21:02 < Dark_Shikari> the catch is that in many cases, these macros output to different registers than they input from
2010-11-22 02:21:11 < Dark_Shikari> so, in a crappy asm language, you'd have to track every single register manually
2010-11-22 02:21:15 < Dark_Shikari> which would make you go batshit insane
2010-11-22 02:21:16 < Jumpyshoes> argh
2010-11-22 02:21:30 < Jumpyshoes> i hope there's a way around this
2010-11-22 02:21:52 < Kovensky> why is the butterfly named butterfly
2010-11-22 02:21:59 < Dark_Shikari> Jumpyshoes: But...
2010-11-22 02:21:59 < Jumpyshoes> that too
2010-11-22 02:22:09 < Dark_Shikari> in x264asm, you can do this:
2010-11-22 02:22:12 < Dark_Shikari> SWAP 2,3
2010-11-22 02:22:15 < Dark_Shikari> now m2 and m3 are swapped.
2010-11-22 02:22:16 < Dark_Shikari> From now on.
2010-11-22 02:22:22 < Dark_Shikari> It's the same as exchanging those registers' values.
2010-11-22 02:22:25 < Dark_Shikari> But done without any ops.
2010-11-22 02:22:30 < Dark_Shikari> Because it swaps all future uses of those registers.
2010-11-22 02:22:41 < Dark_Shikari> Thus you offload the task of tracking registers to the assembler.
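One way to picture SWAP is as a renaming table between register names and physical registers. The C sketch below models that idea at run time purely for illustration; in x264asm the renaming happens while assembling, so no instructions are emitted. The regfile type and all function names are hypothetical.

```c
/* Eight physical registers plus a name -> physical index table. */
typedef struct {
    int phys[8];   /* contents of the physical registers      */
    int map[8];    /* map[n] = physical register behind "m<n>" */
} regfile;

void regfile_init(regfile *rf)
{
    for (int i = 0; i < 8; i++) {
        rf->phys[i] = 0;
        rf->map[i]  = i;
    }
}

/* SWAP a,b: exchange what the two names refer to. No data moves. */
void SWAP_model(regfile *rf, int a, int b)
{
    int t = rf->map[a];
    rf->map[a] = rf->map[b];
    rf->map[b] = t;
}

/* Access the register currently known by the given name. */
int *reg(regfile *rf, int name)
{
    return &rf->phys[rf->map[name]];
}
```

After SWAP_model(&rf, 2, 3), every later use of name 2 reaches the value that used to be called 3, while the physical contents stay where they were.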
2010-11-22 02:22:42 < Kovensky> now that's evil macro usage
2010-11-22 02:22:50 < j0sh> Kovensky: because the swaps look like a butterfly https://secure.wikimedia.org/wikipedia/en/wiki/Butterfly_diagram
2010-11-22 02:22:50 < Jumpyshoes> that is awesome
2010-11-22 02:22:59 < Dark_Shikari> "m0, m1, m2" are aliased to mm0, mm1, mm2 etc if INIT_MMX is set
2010-11-22 02:23:07 < Dark_Shikari> and xmm0, xmm1, xmm2... if INIT_XMM is set.
2010-11-22 02:23:15 < Dark_Shikari> and mmsize is 8 in the former case, 16 in the latter.
2010-11-22 02:23:19 < Kovensky> yeah, was about to ask what the m%d were
2010-11-22 02:23:20 < Dark_Shikari> So you can declare a single function
2010-11-22 02:23:27 < Dark_Shikari> then initialize it for both mmx and sse!
2010-11-22 02:23:28 < Dark_Shikari> in one go!
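A rough C analogy for "write once, instantiate for both register widths" is a macro that stamps out the same body with a different MMSIZE. This is only an analogy with made-up names; it assumes n is a multiple of the register size.

```c
#include <stdint.h>

/* One function body, instantiated once per register size. */
#define DEFINE_ADD_BYTES(suffix, MMSIZE)                              \
void add_bytes_##suffix(uint8_t *dst, const uint8_t *src, int n)      \
{                                                                     \
    for (int i = 0; i < n; i += MMSIZE)      /* one register per step */ \
        for (int j = 0; j < MMSIZE; j++)     /* lanes in that register */ \
            dst[i + j] = (uint8_t)(dst[i + j] + src[i + j]);          \
}

DEFINE_ADD_BYTES(mmx, 8)     /* 8-byte registers  */
DEFINE_ADD_BYTES(sse2, 16)   /* 16-byte registers */
```

Both instantiations compute the same result; only the amount of data handled per step differs.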
2010-11-22 02:23:58 < Dark_Shikari> here's a simple example: denoise, in quant-a.asm
2010-11-22 02:24:01 < Dark_Shikari> line 748
2010-11-22 02:24:09 < Dark_Shikari> it loops over the coefficients in a dct block and denoises them
2010-11-22 02:24:13 < Kovensky> why are the movqs in a weird order?
2010-11-22 02:24:19 < Dark_Shikari> it's initted for both mmx and sse trivially
2010-11-22 02:24:20 < Kovensky> on add4x4_idct
2010-11-22 02:24:22 < Jumpyshoes> wait, where is it?
2010-11-22 02:24:41 < Dark_Shikari> quant-a.asm
2010-11-22 02:24:47 < Dark_Shikari> Kovensky: that's the order they're used
2010-11-22 02:24:52 < Dark_Shikari> so it's a bit faster to do it that way
2010-11-22 02:25:16 < Kovensky> so it's about the execution unit optimization I guess
2010-11-22 02:25:20 < Dark_Shikari> no, just ordering
2010-11-22 02:25:29 < Dark_Shikari> the cpu generally doesn't reorder loads/stores that much
2010-11-22 02:25:29 < Jumpyshoes> oh, is there an updated version of x264?
2010-11-22 02:25:36 < Dark_Shikari> git pull
2010-11-22 02:25:37 < Jumpyshoes> cause 748 for me is     bsf   ecx, r3
2010-11-22 02:25:38 < Dark_Shikari> now you have the latest
2010-11-22 02:25:59 < Dark_Shikari> btw, add_idct_mmx 4x4 is about ~5.8x faster than c
2010-11-22 02:26:17 < Kovensky> 748 is on zigzag_scan for me...
2010-11-22 02:26:19  * Kovensky goes pull
2010-11-22 02:26:47 < Dark_Shikari> 788-798 is where we init three copies of this function
2010-11-22 02:26:50 < Dark_Shikari> mmx, sse2, and ssse3.
2010-11-22 02:26:58 < Dark_Shikari> for mmx vs sse2, we just change from INIT_MMX to INIT_XMM
2010-11-22 02:27:09 < Dark_Shikari> for sse2 vs ssse3, we change PABSW and PSIGNW to use the pabsw and psignw instructions, instead of emulations thereof.
2010-11-22 02:27:20 < Dark_Shikari> (SSSE3 adds a "sign restore" and "absolute value" instruction)
2010-11-22 02:27:27 < Dark_Shikari> which are really really useful.
2010-11-22 02:27:47 < Jumpyshoes> okaaaaaay, so
2010-11-22 02:27:51 < Jumpyshoes> finally caught up
2010-11-22 02:28:02 < Jumpyshoes> why are some asm instructions capitalized?
2010-11-22 02:28:47 < Jumpyshoes> like PSIGNW
2010-11-22 02:29:27 < Jumpyshoes> i mean, why is it capitalized while other instructions aren't?
2010-11-22 02:29:28 < Dark_Shikari> PSIGNW isn't an instruction, it's a macro
2010-11-22 02:29:43 < Jumpyshoes> o
2010-11-22 02:29:43 < Dark_Shikari> we %define it to PSIGNW_MMX for the mmx implementation
2010-11-22 02:29:48 < Dark_Shikari> and when we make the ssse3 version
2010-11-22 02:29:53 < Dark_Shikari> we %define it to PSIGNW_SSSE3
2010-11-22 02:29:57 < Jumpyshoes> i see
2010-11-22 02:30:10 < Dark_Shikari> the latter of which... is just psignw.
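Per 16-bit lane, the two SSSE3 instructions and one possible emulation of the sign restore can be modelled in C as below. The emulation shown is an assumption for illustration and not necessarily what the PSIGNW_MMX macro does; the function names are made up.

```c
#include <stdint.h>

/* pabsw, one lane: absolute value. (For -32768 the conversion back to
   int16_t is implementation-defined in C; that edge case is ignored.) */
int16_t pabsw_model(int16_t x)
{
    return (int16_t)(x < 0 ? -x : x);
}

/* psignw a, b, one lane: negate a if b < 0, zero it if b == 0,
   leave it alone if b > 0. */
int16_t psignw_model(int16_t a, int16_t b)
{
    if (b < 0)  return (int16_t)-a;
    if (b == 0) return 0;
    return a;
}

/* Emulation with a sign mask: m is all ones when b is negative,
   and (a ^ m) - m negates a exactly in that case. Unlike psignw,
   this returns a (not 0) when b == 0. */
int16_t sign_restore_emul(int16_t a, int16_t b)
{
    int m = -(b < 0);               /* 0 or -1 */
    return (int16_t)((a ^ m) - m);
}
```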
2010-11-22 02:30:20 < Kovensky> so I heard you like instructions so we put instructions on your instructions so you can...
2010-11-22 02:30:30 < Dark_Shikari> by the way
2010-11-22 02:30:32 < Dark_Shikari> in INIT_XMM
2010-11-22 02:30:33 < Dark_Shikari> mova == movdqa
2010-11-22 02:30:35 < Dark_Shikari> movh == movq
2010-11-22 02:30:38 < Dark_Shikari> movu == movdqu
2010-11-22 02:30:42 < Dark_Shikari> in INIT_MMX
2010-11-22 02:30:43 < Dark_Shikari> mova == movq
2010-11-22 02:30:45 < Dark_Shikari> movh == movd
2010-11-22 02:30:46 < Dark_Shikari> movu == movq
2010-11-22 02:30:51 < Dark_Shikari> mova == move aligned
2010-11-22 02:30:53 < Dark_Shikari> movh == move half
2010-11-22 02:30:56 < Dark_Shikari> movu == move unaligned
2010-11-22 02:31:55 < Jumpyshoes> oh boy
2010-11-22 02:32:28 < Jumpyshoes> should i try to take a look at this denoise function?
2010-11-22 02:32:37 < Dark_Shikari> Yes, feel free to look at the C.
2010-11-22 02:32:39 < Dark_Shikari> It's not very complicated.
2010-11-22 02:32:48 < Jumpyshoes> where can i find the C?
2010-11-22 02:32:51 < Dark_Shikari> C is in common/quant.c
2010-11-22 02:32:53 < Dark_Shikari> as you might expect.
2010-11-22 02:33:47 < Jumpyshoes> wait, so why is the macro 1-2?
2010-11-22 02:33:52 < Jumpyshoes> DENOISE_DCT 1-2
2010-11-22 02:33:59 < Dark_Shikari> variable number of arguments
2010-11-22 02:34:03 < Dark_Shikari> Ah, I forgot the third number.
2010-11-22 02:34:08 < Dark_Shikari> cglobal name, X, Y, Z
2010-11-22 02:34:11 < Dark_Shikari> We only covered the X and Y.
2010-11-22 02:34:29 < Dark_Shikari> on win64, xmmregs 6-15 need to be saved.
2010-11-22 02:34:44 < Dark_Shikari> so if we use more than 6 xmmregs, we need to tell x264 about it
2010-11-22 02:34:48 < Dark_Shikari> the last number is the number of xmmregs used.
2010-11-22 02:34:49 < Dark_Shikari> It's optional.
2010-11-22 02:34:55 < Dark_Shikari> So if we are using mmx, we don't bother setting it.
2010-11-22 02:34:58 < Dark_Shikari> and it defaults to 0.
2010-11-22 02:35:47 < Jumpyshoes> why is the C so simple and why is the asm so long
2010-11-22 02:35:56 < Dark_Shikari> 1) C is more expressive than asm
2010-11-22 02:36:03 < Dark_Shikari> 2) the asm is unrolled, doing more iterations per loop than the C
2010-11-22 02:36:10 < Jumpyshoes> oh
2010-11-22 02:37:25 < Dark_Shikari> asm is almost always longer than C.
2010-11-22 02:37:44 < Kovensky> what does pabsw do
2010-11-22 02:37:47 < Dark_Shikari> absolute value
2010-11-22 02:37:53 < Kovensky> it isn't in that nasm reference :(
2010-11-22 02:38:16 < Dark_Shikari> sure, because it's newer than sse2
2010-11-22 02:38:37 < Kovensky> so pabsw dst, src strips the signs from src and stores result on dst?
2010-11-22 02:38:43 < Dark_Shikari> yes
2010-11-22 02:40:35 < Kovensky> and psignw?
2010-11-22 02:43:03 < Jumpyshoes> you know what blows my mind? there's no xor or shr in the asm code
2010-11-22 02:43:06 < Dark_Shikari> Kovensky: restores sign
2010-11-22 02:43:24 < Dark_Shikari> Jumpyshoes: the c code is optimized
2010-11-22 02:43:26 < Kovensky> I'm atm parsing the bunch of punpck
2010-11-22 02:43:34 < Dark_Shikari> it's really more optimized than it needs to be
2010-11-22 02:43:41 < Dark_Shikari> the +/^ is just abs
2010-11-22 02:43:46 < Jumpyshoes> oh
2010-11-22 02:43:47 < Dark_Shikari> and a sign restore at the end
2010-11-22 02:43:54 < Dark_Shikari> the C was effectively rewritten to be more like the asm
2010-11-22 02:44:26 < Kovensky> actually, I lost track of the registers already lol
2010-11-22 02:44:42 < Jumpyshoes> soooooooo did i
2010-11-22 02:45:18 < Dark_Shikari> add comments next to them or give them names if you need to
2010-11-22 02:45:21 < Kovensky> why is m6 only read from
2010-11-22 02:45:27 < Dark_Shikari> m6 is a zero register
2010-11-22 02:45:31 < Dark_Shikari> it's initted at the start
2010-11-22 02:45:45 < Jumpyshoes> it interleaves 0s?
2010-11-22 02:45:51 < Kovensky> so it just supplies 0s to the punpcks
2010-11-22 02:46:08 < Dark_Shikari> yes
2010-11-22 02:46:13 < Dark_Shikari> Jumpyshoes: to convert from 16-bit to 32-bit
2010-11-22 02:46:15 < Dark_Shikari> you add zeroes
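Interleaving with a zero register widens unsigned 16-bit values to 32 bits. The sketch below models punpcklwd and punpckhwd on an 8-byte register, with hypothetical function names; each output element is one little-endian 32-bit lane.

```c
#include <stdint.h>

/* punpcklwd a, b on an 8-byte register: interleave the two low words.
   Each 32-bit output lane is a[i] in the low half, b[i] in the high half. */
void punpcklwd_model(const uint16_t a[4], const uint16_t b[4], uint32_t out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = (uint32_t)a[i] | ((uint32_t)b[i] << 16);
}

/* punpckhwd a, b: the same for the two high words. */
void punpckhwd_model(const uint16_t a[4], const uint16_t b[4], uint32_t out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = (uint32_t)a[i + 2] | ((uint32_t)b[i + 2] << 16);
}
```

When b is all zeros, each output lane is simply the corresponding word of a, zero-extended. That is only a correct widening for non-negative values.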
2010-11-22 02:46:22 < Kovensky> I actually should go sleep right now, finals week start tomorrow and it's 14 minutes to tomorrow =p
2010-11-22 02:46:32 < Jumpyshoes> you know what my OCD self hates?
2010-11-22 02:46:36 < Jumpyshoes> >paddd     m4, [r1+r3*4+0*mmsize]
2010-11-22 02:46:40 < Jumpyshoes> f-f-f-f-f-fuck
2010-11-22 02:46:47 < Dark_Shikari> what
2010-11-22 02:46:52 < Jumpyshoes> m4 goes with 0
2010-11-22 02:46:58 < Dark_Shikari> oh
2010-11-22 02:46:58 < Dark_Shikari> lol
2010-11-22 02:47:08 < Dark_Shikari> those were just regs I ended up with at the end
2010-11-22 02:47:11 < Dark_Shikari> that's how you write these things
2010-11-22 02:47:12 < Dark_Shikari> =p
2010-11-22 02:48:36 < Jumpyshoes> okay
2010-11-22 02:48:40 < Jumpyshoes> i get the gist of this function
2010-11-22 02:49:08 < Kovensky> gl Jumpyshoes, I'll read the backlog later
2010-11-22 02:49:08 < Dark_Shikari> it subtracts a value from each dct coeff
2010-11-22 02:49:16  * Kovensky sleeps
2010-11-22 02:49:20 < Jumpyshoes> yea, from the offsets?
2010-11-22 02:49:22 < Dark_Shikari> (limiting it to zero, so they don't go negative)
2010-11-22 02:49:22 < Jumpyshoes> cya Kovensky
2010-11-22 02:49:25 < Dark_Shikari> it subtracts the offsets
2010-11-22 02:49:31 < Dark_Shikari> then it adds the amounts subtracted to the accumulators
2010-11-22 02:49:40 < Dark_Shikari> which are then used later in x264 to create new offsets.
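A scalar sketch of this kind of denoise loop, using the add/xor trick for abs and sign restore, might look like the code below. It is written from the description here rather than copied from common/quant.c: the name, the types, and the choice to accumulate each coefficient's magnitude before the offset is subtracted are all assumptions, so check the real source for the exact bookkeeping.

```c
#include <stdint.h>

/* For each coefficient: take the magnitude, accumulate it, subtract
   the offset (clamping at zero), then put the original sign back. */
void denoise_dct_sketch(int16_t *dct, uint32_t *sum,
                        const uint16_t *offset, int size)
{
    for (int i = 0; i < size; i++) {
        int level = dct[i];
        int sign  = -(level < 0);          /* 0, or -1 for negatives   */
        level = (level + sign) ^ sign;     /* branchless abs           */
        sum[i] += (uint32_t)level;         /* accumulator (assumption) */
        level -= offset[i];
        dct[i] = level < 0 ? 0             /* never cross zero         */
                           : (int16_t)((level ^ sign) - sign);
    }
}
```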
2010-11-22 02:50:17 < Jumpyshoes> i see
2010-11-22 02:50:47 < Jumpyshoes> wait, now what's the condition for this loop to exit?
2010-11-22 02:51:01 < Jumpyshoes> OH
2010-11-22 02:51:02 < Jumpyshoes> the sub
2010-11-22 02:51:05 < Jumpyshoes> WAY UP TOP
2010-11-22 02:51:23 < Dark_Shikari> yeah, it kind of went walkabout
2010-11-22 02:52:02 < Jumpyshoes> okay
2010-11-22 02:52:05 < Jumpyshoes> now i think i understand
2010-11-22 02:52:48 < Jumpyshoes> understanding 15 lines of asm takes me 30 minutes
2010-11-22 02:52:50 < Jumpyshoes> woohoo
2010-11-22 02:52:52 < Dark_Shikari> That's normal
2010-11-22 02:53:12 < Dark_Shikari> for a newbie, writing is often easier than reading.
2010-11-22 02:53:18 < Dark_Shikari> because when writing, you already know what you want
2010-11-22 02:53:20 < Jumpyshoes> are you serious?
2010-11-22 02:53:23 < Dark_Shikari> when reading, you have to figure out what someone else wanted.
2010-11-22 02:53:38 < Jumpyshoes> true
2010-11-22 02:53:43 < Dark_Shikari> And you can pattern your functions on others
2010-11-22 02:53:50 < Dark_Shikari> for example, like 1/3 of the functions in x264 are looping over some pixel input
2010-11-22 02:53:57 < Dark_Shikari> using the same basic template.
2010-11-22 02:54:04 < Jumpyshoes> oh
2010-11-22 02:54:12 < Jumpyshoes> i have a feeling GCI is gonna kick my ass
2010-11-22 02:54:15 < Dark_Shikari> in general, it's not as hard as you think.
2010-11-22 02:54:17 < Jumpyshoes> but that's not the point
2010-11-22 02:54:28 < Dark_Shikari> Actually, let's do that.  Let's write a function
2010-11-22 02:54:29 < Jumpyshoes> do all asm functions have a C equivalent?
2010-11-22 02:54:32 < Dark_Shikari> Yes
2010-11-22 02:54:41 < kierank> [02:53] Dark_Shikari: when reading, you have to figure out what someone else wanted. --> or if it's compiler created asm who knows what you have to figure out
2010-11-22 02:54:48 < Jumpyshoes> oh god, i'm gonna get my ass kicked
2010-11-22 02:54:48 < Dark_Shikari> lol
2010-11-22 02:55:51 < Jumpyshoes> well, at least there's c
2010-11-22 02:55:53 < Jumpyshoes> to help me
2010-11-22 02:55:57 < Jumpyshoes> read this
2010-11-22 02:56:17 < Dark_Shikari> void foo( int16_t *dst, int16_t *src1, int16_t *src2 )
2010-11-22 02:56:17 < Dark_Shikari> {
2010-11-22 02:56:17 < Dark_Shikari>     for( int i = 0; i < 64; i++ )
2010-11-22 02:56:17 < Dark_Shikari>         dst[i] = src1[i] - src2[i];
2010-11-22 02:56:18 < Dark_Shikari> }
2010-11-22 02:56:20 < Dark_Shikari> implement this.
2010-11-22 02:56:25 < Dark_Shikari> ask questions as you go
2010-11-22 02:56:25 < Jumpyshoes> ooooh boy
2010-11-22 02:56:28 < Jumpyshoes> very well
2010-11-22 02:56:39 < Dark_Shikari> cglobal foo_mmxext, 3,3
2010-11-22 02:56:41 < Dark_Shikari> or sse2, take your pick
2010-11-22 02:57:14 < Jumpyshoes> i hope you don't expect it to be optimized
2010-11-22 02:57:54 < Dark_Shikari> I expect it to be reasonably fast, i.e. using SIMD
2010-11-22 02:58:07 < Dark_Shikari> feel free to stop at any point and ask any question about anything
2010-11-22 02:58:28 < Jumpyshoes> okay
2010-11-22 03:03:32 < Jumpyshoes> wait, so a qw is 128bits?
2010-11-22 03:03:42 < Dark_Shikari> no, quadword is 64
2010-11-22 03:03:44 < Dark_Shikari> double quadword is 128
2010-11-22 03:03:47 < Jumpyshoes> er
2010-11-22 03:03:49 < Jumpyshoes> yea
2010-11-22 03:03:49 < Dark_Shikari> word is 16
2010-11-22 03:03:53 < Dark_Shikari> quad... 4*...
2010-11-22 03:05:17 < darkbringer> what about overflows?
2010-11-22 03:05:27 < Jumpyshoes> oh, yes <_<
2010-11-22 03:06:10 < Dark_Shikari> not possible as far as I see
2010-11-22 03:06:14 < Dark_Shikari> they're all int16_t
2010-11-22 03:06:29 < Dark_Shikari> I mean, they could happen, but whatever, you won't have to care about them
2010-11-22 03:08:09 < Jumpyshoes> okay, i think this is horribly wrong
2010-11-22 03:08:17 < Dark_Shikari> use pastebin btw instead of pasting tons of shit
2010-11-22 03:08:42 < Jumpyshoes> this is going to be wrong and i will be laughed at <_<
2010-11-22 03:09:02 < Jumpyshoes> well whatever
2010-11-22 03:09:05 < Jumpyshoes> i'm used to looking dumb
2010-11-22 03:10:28 < Jumpyshoes> http://pastebin.com/2JCFdD4R
2010-11-22 03:11:30 < Dark_Shikari> psubw should be lowercase, it's an instruction
2010-11-22 03:11:41 < Dark_Shikari> your function has no stores
2010-11-22 03:11:41 < Jumpyshoes> oh
2010-11-22 03:11:59 < Jumpyshoes> stores?
2010-11-22 03:12:05 < Dark_Shikari> you know, storing your output
2010-11-22 03:12:14 < Jumpyshoes> oops
2010-11-22 03:12:17 < Dark_Shikari> also, n*8, because each iteration handles 8 bytes.
2010-11-22 03:12:27 < Dark_Shikari> also, can you make that a loop instead of %rep?
2010-11-22 03:12:57 < Jumpyshoes> grah, 8 * 8 = 64
2010-11-22 03:13:21 < Jumpyshoes> wouldn't i need another variable for a loop?
2010-11-22 03:13:52 < Dark_Shikari> Yes.
2010-11-22 03:13:54 < Dark_Shikari> you could do something like
2010-11-22 03:13:59 < Dark_Shikari> mov r3d, 8
2010-11-22 03:14:03 < Dark_Shikari> then dec r3d on each iteration
2010-11-22 03:14:17 < Jumpyshoes> then wouldn't i need another variable declared in the func?
2010-11-22 03:14:24 < Dark_Shikari> yes, you'd have to do 3,4 instead of 3,3
2010-11-22 03:14:35 < Jumpyshoes> oh, okay
2010-11-22 03:14:52 < Dark_Shikari> also, psubw can take input from memory
2010-11-22 03:14:54 < Dark_Shikari> so you only need one load
2010-11-22 03:14:55 < Dark_Shikari> i.e.
2010-11-22 03:14:57 < Dark_Shikari> movq mm0, blah
2010-11-22 03:15:00 < Dark_Shikari> psubw mm0, blah2
2010-11-22 03:15:19 < Jumpyshoes> ooh
2010-11-22 03:15:46 < Jumpyshoes> that is nice
2010-11-22 03:15:59 < Dark_Shikari> all instructions can access memory in their second argument.
2010-11-22 03:16:05 < Dark_Shikari> well, almost all.
2010-11-22 03:16:27 < Jumpyshoes> what was the command for subtracting 1, dec?
2010-11-22 03:16:47 < Dark_Shikari> yes
2010-11-22 03:16:50 < Dark_Shikari> it's like sub val, 1
2010-11-22 03:16:54 < Jumpyshoes> right
2010-11-22 03:16:58 < Dark_Shikari> decrement
2010-11-22 03:17:23 < Jumpyshoes> oh, and do i need a ret?
2010-11-22 03:18:33 < Dark_Shikari> yes, at the end
2010-11-22 03:18:41 < Dark_Shikari> just like in c
2010-11-22 03:21:06 < Jumpyshoes> http://pastebin.com/HLCed9Jv
2010-11-22 03:21:32 < Dark_Shikari> .loop should have a :
2010-11-22 03:21:42 < Jumpyshoes> oh
2010-11-22 03:21:47 < Dark_Shikari> in address expressions, you have to use native sizes
2010-11-22 03:21:50 < Dark_Shikari> so inside the [], no d
2010-11-22 03:22:01 < Dark_Shikari> other than that, you're done!
2010-11-22 03:22:19 < Jumpyshoes> woah, that only took three revisions
2010-11-22 03:22:37 < Jumpyshoes> actually wait
2010-11-22 03:22:51 < Jumpyshoes> i'm dealing with int16, and looping 64 times
2010-11-22 03:23:05 < Jumpyshoes> wouldn't i need to do more than 8 loops? since mm is 64 bits
2010-11-22 03:23:27 < Dark_Shikari> ah yes, you'll have to do 16 loops.
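The pastebin contents are not reproduced in the log, so the following is only a sketch of what the finished MMX loop would compute, written as plain C. It assumes a counter that starts at 16 and is decremented at the top of each iteration, with each iteration handling one 8-byte register (four int16_t values). The name foo_mmx_model and the register assignments in the comments are illustrative guesses, not the actual paste.

```c
#include <stdint.h>
#include <string.h>

/* Reference version, as given earlier in the log. */
void foo_c( int16_t *dst, const int16_t *src1, const int16_t *src2 )
{
    for( int i = 0; i < 64; i++ )
        dst[i] = src1[i] - src2[i];
}

/* Hypothetical model of the MMX loop: 64 values * 2 bytes = 128 bytes,
 * and one mm register holds 8 bytes, so 16 iterations are needed. */
void foo_mmx_model( int16_t *dst, const int16_t *src1, const int16_t *src2 )
{
    int n = 16;                                  /* mov  r3d, 16 */
    do
    {
        int16_t lanes[4];                        /* stands in for mm0 */
        n--;                                     /* dec  r3d */
        memcpy( lanes, src1 + n*4, 8 );          /* movq mm0, [r1+r3*8] */
        for( int k = 0; k < 4; k++ )             /* psubw mm0, [r2+r3*8] */
            lanes[k] = (int16_t)(lanes[k] - src2[n*4+k]);
        memcpy( dst + n*4, lanes, 8 );           /* movq [r0+r3*8], mm0 */
    } while( n > 0 );                            /* jg   .loop */
}
```

Calling both functions on the same inputs should give identical output buffers. With a starting count of 8 instead of 16, the model would leave the first half of dst untouched, which is the mistake caught above.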
2010-11-22 03:23:41 < Dark_Shikari> now, btw, here's the big nice part about writing x264 asm
2010-11-22 03:23:44 < Dark_Shikari> make checkasm;./checkasm
2010-11-22 03:24:02 < Jumpyshoes> where do i do that?
2010-11-22 03:24:26 < Dark_Shikari> in your terminal
2010-11-22 03:24:44 < Jumpyshoes> right
2010-11-22 03:25:00 < Jumpyshoes> lots of warnings thar
2010-11-22 03:25:37 < Jumpyshoes> oh and there's an error
2010-11-22 03:25:43 < Dark_Shikari> error?  what'd you do
2010-11-22 03:25:50 < Jumpyshoes> i have no clue
2010-11-22 03:26:08 < Jumpyshoes> common/x86/const-a.asm:50: error: undefined symbol `BIT_DEPTH' (first use)
2010-11-22 03:26:09 < Jumpyshoes> common/x86/const-a.asm:50: error:  (Each undefined symbol is reported only once.)
2010-11-22 03:26:14 < Dark_Shikari> you need to reconfigure
2010-11-22 03:26:14 < darkbringer> ./configure
2010-11-22 03:26:17 < Jumpyshoes> oh right
2010-11-22 03:26:18 < Jumpyshoes> since i pulled it
2010-11-22 03:28:04 < Jumpyshoes> x264: All tests passed Yeah :)
2010-11-22 03:28:22 < Dark_Shikari> you just ran unit tests on every asm function in x264.
2010-11-22 03:28:27 < Jumpyshoes> woah that is cool
2010-11-22 03:28:40 < Jumpyshoes> so i can test my func
2010-11-22 03:28:41 < Jumpyshoes> by adding it?
2010-11-22 03:28:58 < Dark_Shikari> yes
2010-11-22 03:29:02 < Dark_Shikari> well, I mean
2010-11-22 03:29:06 < Dark_Shikari> your function doesn't have a C equivalent in x264
2010-11-22 03:29:07 < Dark_Shikari> so not really
2010-11-22 03:29:09 < Jumpyshoes> oh
2010-11-22 03:29:10 < Jumpyshoes> right
2010-11-22 03:29:12 < Dark_Shikari> but for anything with a C equivalent, it can test it
2010-11-22 03:29:17 < Jumpyshoes> sexy
2010-11-22 03:29:17 < Dark_Shikari> it of course needs unit test code in checkasm.c
2010-11-22 03:29:24 < Dark_Shikari> but for  all existing C functions, there's unit test code
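checkasm.c itself is not shown in the log. As a rough sketch of the idea described here (run the C reference and the optimized version on the same random input and compare the outputs), something like the following could be used. All names (foo_ref, foo_unrolled, foo_broken, check_foo) are made up for illustration, and this is not x264's actual harness.

```c
#include <stdint.h>
#include <string.h>

typedef void (*foo_fn)( int16_t *dst, const int16_t *src1, const int16_t *src2 );

/* C reference. */
static void foo_ref( int16_t *dst, const int16_t *src1, const int16_t *src2 )
{
    for( int i = 0; i < 64; i++ )
        dst[i] = (int16_t)(src1[i] - src2[i]);
}

/* Stand-in for an optimized version: four values per iteration. */
static void foo_unrolled( int16_t *dst, const int16_t *src1, const int16_t *src2 )
{
    for( int i = 0; i < 64; i += 4 )
    {
        dst[i+0] = (int16_t)(src1[i+0] - src2[i+0]);
        dst[i+1] = (int16_t)(src1[i+1] - src2[i+1]);
        dst[i+2] = (int16_t)(src1[i+2] - src2[i+2]);
        dst[i+3] = (int16_t)(src1[i+3] - src2[i+3]);
    }
}

/* Deliberately wrong: only half the data, like looping 8 times instead of 16. */
static void foo_broken( int16_t *dst, const int16_t *src1, const int16_t *src2 )
{
    for( int i = 0; i < 32; i++ )
        dst[i] = (int16_t)(src1[i] - src2[i]);
}

/* Small deterministic pseudo-random generator, so failures are reproducible. */
static uint32_t lcg_state = 1;
static int16_t rand16( void )
{
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return (int16_t)(lcg_state >> 16);
}

/* Returns 1 if opt matches ref on every trial, 0 on the first mismatch. */
int check_foo( foo_fn ref, foo_fn opt, int trials )
{
    int16_t src1[64], src2[64], out_ref[64], out_opt[64];
    for( int t = 0; t < trials; t++ )
    {
        for( int i = 0; i < 64; i++ )
        {
            src1[i] = rand16();
            src2[i] = rand16();
        }
        memset( out_ref, 0, sizeof(out_ref) );
        memset( out_opt, 0, sizeof(out_opt) );
        ref( out_ref, src1, src2 );
        opt( out_opt, src1, src2 );
        if( memcmp( out_ref, out_opt, sizeof(out_ref) ) )
            return 0;
    }
    return 1;
}
```

Here check_foo( foo_ref, foo_unrolled, 100 ) should pass, while check_foo( foo_ref, foo_broken, 100 ) should fail because the second half of the output is never written.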
2010-11-22 03:29:59 < Jumpyshoes> nice
2010-11-22 03:32:01 < Dark_Shikari> also
2010-11-22 03:32:03 < Dark_Shikari> ./checkasm --bench
2010-11-22 03:33:22 < Jumpyshoes> is that the number of clock cycles?
2010-11-22 03:33:32 < Dark_Shikari> 10ths of a clock cycle
2010-11-22 03:33:48 < Jumpyshoes> oh
2010-11-22 03:33:51 < Jumpyshoes> crazy
2010-11-22 03:33:56 < Dark_Shikari> note not all benches are quite accurate, particularly in the case of functions heavily bound by branch mispredictions
2010-11-22 03:34:07 < Jumpyshoes> add4x4_idct is like
2010-11-22 03:34:09 < Dark_Shikari> most commonly where C is branchy
2010-11-22 03:34:11 < Dark_Shikari> and asm isn't
2010-11-22 03:34:14 < Dark_Shikari> but most aren't like that.
2010-11-22 03:34:16 < Jumpyshoes> ah
2010-11-22 03:34:27 < Jumpyshoes> when do GCI tasks come out?
2010-11-22 03:34:34 < Dark_Shikari> probably tomorrow
2010-11-22 03:34:38 < Dark_Shikari> I really need to get to writing up the rest of them
2010-11-22 03:34:40 < Dark_Shikari> we only have 5
2010-11-22 03:34:42 < Dark_Shikari> I need more :/
2010-11-22 03:35:03 < Jumpyshoes> can you add stuff as the contest progresses?
2010-11-22 03:35:13 < Dark_Shikari> I hope so
2010-11-22 03:35:18 < Dark_Shikari> according to them I think yes
2010-11-22 03:35:27 < Jumpyshoes> cool
2010-11-22 03:35:28 < Dark_Shikari> as they said you can have a repeatable task just by re-adding it after someone takes it
2010-11-22 03:36:42 < Jumpyshoes> how hard are these tasks?
2010-11-22 03:37:10 < Dark_Shikari> http://wiki.videolan.org/X264_GCodeIn_Ideas
2010-11-22 03:38:37 < Jumpyshoes> interesting
2010-11-22 03:42:49 < ps-auxw> Dark_Shikari: Just wondering, is there a reason that PIXEL_SAD_C has a separate name argument, instead of constructing the function name using lx/ly and ## concatenation operator?
2010-11-22 03:42:59 < Dark_Shikari> ps-auxw: nobody did it 5 years ago when it was written
2010-11-22 03:43:01 < Dark_Shikari> and it hasn't been modified since
2010-11-22 03:43:07 < ps-auxw> I see.
2010-11-22 03:53:46 < ps-auxw> Would a patch be welcome? ;)
2010-11-22 03:55:30 < Dark_Shikari> not really, kind of a waste of a patch =p
2010-11-22 03:55:50 < ps-auxw> True, true.
2010-11-22 08:42:15 < xxthink> < Dark_Shikari> x264 at crf 18 is "almost lossless" too
2010-11-22 08:42:42 < xxthink> what's the recommended GOP structure that x264 should use to save the video content?
2010-11-22 08:42:46 < xxthink> all I frames?
2010-11-22 08:43:20 < xxthink> sorry, wrong channel
2010-11-22 08:45:09 < rfw> so, i heard you guys have GCI tasks :D
2010-11-22 08:45:17 < rfw> hurt me if people have been here about this before me
2010-11-22 08:47:20 < dj_tjerk> you can check the logs anytime ;)
2010-11-22 08:47:49 < rfw> heh
2010-11-22 08:47:54 < rfw> the google site is still being lol :(
2010-11-22 08:48:25 < dj_tjerk> http://wiki.videolan.org/X264_GCodeIn_Ideas <-- just some ideas, D_S will make an official list tomorrow/today (timezones)
2010-11-22 08:48:36 < rfw> ah
2010-11-22 08:48:43 < rfw> yeah i looked at that
2010-11-22 08:48:51 < rfw> the regression testing tool looks fun
2010-11-22 08:48:58 < dj_tjerk> but if you check the logs (see topic) you see D_S giving his awesome asm explanation
2010-11-22 08:49:20 < rfw> 4MB of bzipped logs
2010-11-22 08:49:31 < rfw> how big is that uncompressed
2010-11-22 08:49:33 < dj_tjerk> log*
2010-11-22 08:49:35 < dj_tjerk> 17MB
2010-11-22 08:49:40 < rfw> derp
2010-11-22 08:51:18 < rfw> --- Log opened Thu Jul 24 01:08:59 2008
2010-11-22 08:51:26 < rfw> this is going to be fun to page through
2010-11-22 08:51:31 < dj_tjerk> yeh.. you might wanna start reading from the bottom ;)
2010-11-22 08:51:43 < dj_tjerk> and then scroll up to wherever his awesome asm explanation starts
2010-11-22 08:53:42 < rfw> reading uncolored logs really isn't fun
2010-11-22 08:54:05 < rfw> "I asked you if you understood my explanation of what a function does."
2010-11-22 08:54:08 < rfw> somewhere near here?
2010-11-22 08:54:44 < Kovensky> just a bit more above
2010-11-22 08:54:49 < rfw> ohi Kovensky
2010-11-22 08:56:02 < rfw> god i can't be bothered with this today
2010-11-22 08:56:13 < rfw> it's probably going to be like this for another half a day
2010-11-22 09:01:34 < rfw> bah, i'm not really having much luck at all
2010-11-22 09:01:41 < rfw> i guess i'll be around tomorrow then
2010-11-22 09:01:51 < dj_tjerk> :?
2010-11-22 09:01:51 < rfw> night
2010-11-22 09:02:04 < rfw> the gci website is still giving me 500 errors
2010-11-22 09:02:09 < dj_tjerk> oh.. :|
2010-11-22 09:02:12 < rfw> yeah :|
2010-11-22 09:02:27 < rfw> probably from all the other high school kids ddosing google
2010-11-22 09:04:21 < rfw> tomorrow then :(
2010-11-22 09:04:22 < rfw> night
2010-11-22 09:08:16 < Kovensky> http://pastebin.ca/1998693 <-- my version of that for loop (SSE2)
2010-11-22 09:09:18 < Kovensky> though it's pretty much the same as the mmxext, but with movdqu and xmm instead of movq and mm ._.
2010-11-22 09:23:09 < Kovensky> http://pastebin.ca/1998700 <-- uh, moving the sub up, fixes out of bounds read / writes
2010-11-22 09:58:34 < koda|work> hi all
2010-11-22 09:59:21 < koda|work> is anyone experiencing
2010-11-22 09:59:24 < koda|work> common/x86/const-a.asm:50: error: undefined symbol `BIT_DEPTH' (first use)
2010-11-22 09:59:24 < koda|work> ?
2010-11-22 10:00:44 < koda|work> oh nevermind i forgot to configure cleanly
2010-11-22 10:06:12 < Dark_Shikari> hah
2010-11-22 10:24:39 < wipple> Dark_Shikari: sorry, i found a mistake in my first patch
2010-11-22 10:25:06 < wipple> line 207 should be +if [ "armv6" = "yes" ]; then
2010-11-22 10:26:01 < Dark_Shikari> applied
2010-11-22 10:26:12 < wipple> thx
2010-11-22 10:34:34 < Kovensky> Dark_Shikari: is my loop there correct? (as in, do I need to do it by subtracting 2 at a time?)
2010-11-22 10:34:46 < Kovensky> also, is it common for asm code to work backwards through memory?
2010-11-22 10:35:01 < Dark_Shikari> link?
2010-11-22 10:35:11 < Dark_Shikari> it's common to work backwards to avoid the "cmp" before the jump.
2010-11-22 10:35:12 < Kovensky> http://pastebin.ca/1998700
2010-11-22 10:35:30 < Dark_Shikari> should be mov r3d, 16 to start, not 8
2010-11-22 10:36:10 < Dark_Shikari> also if your source is unaligned
2010-11-22 10:36:19 < Dark_Shikari> you can't use memory arguments for psubw
2010-11-22 10:36:29 < Kovensky> I see
2010-11-22 10:36:50 < Kovensky> so then I either assume the whole thing is aligned and use movdqa, or use two movdqus before the psubw
2010-11-22 10:37:04 < Dark_Shikari> yes
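The two options discussed here can be sketched with SSE2 intrinsics in C. This is an illustration and not the code from the pastebin links: the function names are made up, and the compiler decides whether a load is folded into the subtract as a memory operand. The point it shows is that the aligned variant is only valid when all three pointers are 16-byte aligned, while the unaligned variant loads both sources into registers first. A plain C fallback is included only so the sketch still builds on targets without SSE2.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Unaligned: two movdqu-style loads, then a register-register psubw. */
void foo_sse2_unaligned( int16_t *dst, const int16_t *src1, const int16_t *src2 )
{
    for( int i = 0; i < 64; i += 8 )
    {
        __m128i a = _mm_loadu_si128( (const __m128i *)(src1 + i) );
        __m128i b = _mm_loadu_si128( (const __m128i *)(src2 + i) );
        _mm_storeu_si128( (__m128i *)(dst + i), _mm_sub_epi16( a, b ) );
    }
}

/* Aligned: movdqa-style accesses; every pointer must be 16-byte aligned,
 * otherwise the access faults. */
void foo_sse2_aligned( int16_t *dst, const int16_t *src1, const int16_t *src2 )
{
    for( int i = 0; i < 64; i += 8 )
    {
        __m128i a = _mm_load_si128( (const __m128i *)(src1 + i) );
        __m128i b = _mm_load_si128( (const __m128i *)(src2 + i) );
        _mm_store_si128( (__m128i *)(dst + i), _mm_sub_epi16( a, b ) );
    }
}
#else
/* No SSE2 available: scalar stand-ins with the same results. */
static void foo_scalar( int16_t *dst, const int16_t *src1, const int16_t *src2 )
{
    for( int i = 0; i < 64; i++ )
        dst[i] = (int16_t)(src1[i] - src2[i]);
}
void foo_sse2_unaligned( int16_t *dst, const int16_t *src1, const int16_t *src2 )
{
    foo_scalar( dst, src1, src2 );
}
void foo_sse2_aligned( int16_t *dst, const int16_t *src1, const int16_t *src2 )
{
    foo_scalar( dst, src1, src2 );
}
#endif
```

Eight iterations of 16 bytes cover the same 128 bytes that the MMX version covers in 16 iterations of 8 bytes.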
2010-11-22 10:38:06 < Kovensky> I think the 8 was left there from when I tried using *16 on the addressing (which ofc failed to assemble)
2010-11-22 10:38:35 < Dark_Shikari> the only purpose of the * is to save instruction space by using "dec"
2010-11-22 10:38:36 < Dark_Shikari> instead of sub
2010-11-22 10:38:43 < Dark_Shikari> so for clarity in your case you might as well just mov r3d, 128
2010-11-22 10:38:45 < Dark_Shikari> and sub 16 on each iteration
2010-11-22 10:38:54 < Kovensky> http://pastebin.ca/1998753 <-- unaligned, http://pastebin.ca/1998756 <-- aligned
2010-11-22 10:38:58 < Kovensky> Dark_Shikari: ok
2010-11-22 10:40:24 < Kovensky> http://pastebin.ca/1998759
2010-11-22 10:40:49 < Dark_Shikari> I'd put the sub above the jg, and mov 112 instead of 128.
2010-11-22 10:40:55 < Dark_Shikari> to reduce data dependencies.
2010-11-22 10:41:19 < Dark_Shikari> and do jge instead of jg
2010-11-22 10:41:39 < Dark_Shikari> http://pastebin.ca/1998761
2010-11-22 10:42:04 < Kovensky> feels more natural, except for starting at 112
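The suggested loop shape (start at byte offset 112, subtract 16 at the bottom, loop while the result is still non-negative) can be modelled in C as below. The pastebin itself is not in the log, so this is a guess at the control flow only. foo_countdown is a hypothetical name, and the asm mnemonics in the comments indicate which step each line stands for.

```c
#include <stdint.h>
#include <string.h>

void foo_countdown( int16_t *dst, const int16_t *src1, const int16_t *src2 )
{
    int off = 112;                       /* mov r3d, 112: last 16-byte chunk of 128 bytes */
    do
    {
        int16_t a[8], b[8];              /* stand in for two xmm registers */
        memcpy( a, (const uint8_t *)src1 + off, 16 );   /* load from src1 + off */
        memcpy( b, (const uint8_t *)src2 + off, 16 );   /* load from src2 + off */
        for( int k = 0; k < 8; k++ )                    /* psubw */
            a[k] = (int16_t)(a[k] - b[k]);
        memcpy( (uint8_t *)dst + off, a, 16 );          /* store to dst + off */
        off -= 16;                       /* sub r3d, 16: sets the flags */
    } while( off >= 0 );                 /* jge .loop: no separate cmp needed */
}
```

The offsets visited are 112, 96, ..., 16, 0. The loop exits when the subtraction takes the offset to -16, so nothing outside the 128-byte buffers is touched. Starting at 128 with the same structure would read and write one chunk past the end.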
2010-11-22 10:45:39 < Dark_Shikari> lololol, in the vp8 experimental branch they removed the low pass filtering from the H and V predictions
2010-11-22 10:45:45 < Dark_Shikari> .... making them the same as h264's
2010-11-22 10:48:08 < Kovensky> lol
2010-11-22 10:52:01 < koda|work> hey Dark_Shikari, i once found a patch that brought speedbuffering to x264
2010-11-22 10:52:12 < koda|work> is there any plan to implement it?
2010-11-22 10:52:37 < Dark_Shikari> it can be done outside of x264
2010-11-22 10:52:55 < Dark_Shikari> with the presets it should be easy to do now
2010-11-22 10:53:24 < koda|work> that is a 'no'? :p
2010-11-22 10:53:39 < Dark_Shikari> Probably not given that you can do it outside x264 in just a few lines of code
2010-11-22 10:54:39 < koda|work> but how would you do that in something like vlc?
2010-11-22 10:54:52 < Dark_Shikari> ?
2010-11-22 10:57:04 < koda|work> i mean, i patched x264 to have speed and speed-buffer options and then i use them from vlc when i'm transcoding the video and send it on the net
2010-11-22 10:57:29 < koda|work> without speed and speed-buffer the decoded video would not appear fluid
2010-11-22 10:57:58 < Dark_Shikari> speed is for the encoder, not the decoder
2010-11-22 10:58:12 < Dark_Shikari> it's only "not fluid" if you try to use really slow settings when you can't handle them
2010-11-22 10:58:44 < Dark_Shikari> in VLC you could pretty easily add speed buffer code
2010-11-22 10:58:50 < Dark_Shikari> e.g. adjust encoding speed settings based on how far behind you are
2010-11-22 10:59:28 < koda|work> i see...
2010-11-22 10:59:40 < koda|work> i'll do more testing then
2010-11-22 10:59:52 < Dark_Shikari> speedcontrol is useful so that you can _always_ use the _slowest_ settings possible
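As a very rough sketch of doing speed control outside x264, an application could map how much slack it has left onto a preset name and reconfigure the encoder accordingly. Everything here is an assumption for illustration: pick_preset is a made-up helper, the linear mapping is arbitrary, and the preset list is simply the usual x264 preset names ordered from fastest to slowest. A real implementation would need measured encode times per preset.

```c
#include <stddef.h>

/* Presets ordered from fastest to slowest. */
static const char *const presets[] = {
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow"
};
#define NUM_PRESETS (sizeof(presets) / sizeof(presets[0]))

/* slack: 0.0 means the encoder is as far behind as the buffer allows,
 * 1.0 means it has fully caught up. Returns the preset to use next. */
const char *pick_preset( double slack )
{
    if( slack < 0.0 ) slack = 0.0;
    if( slack > 1.0 ) slack = 1.0;
    size_t idx = (size_t)( slack * (NUM_PRESETS - 1) + 0.5 );
    return presets[idx];
}
```

With this mapping the encoder drops to the fastest settings only when it is about to fall behind, and otherwise drifts toward the slowest settings it can sustain.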
2010-11-22 11:05:40 < Alex_W> Dark_Shikari: do you know if professionally encoded blu-rays use explicit wpred?
2010-11-22 11:06:14 < Dark_Shikari> I think so
2010-11-22 11:06:19 < Dark_Shikari> prolly depends on the encoder
2010-11-22 11:07:56 < Alex_W> so i wonder if they do it differently to x264 for compatibility reasons, i guess they don't use dupes at all...
2010-11-22 11:08:13 < Dark_Shikari> most encoders don't
2010-11-22 11:08:53 < Dark_Shikari> so quick, GCI starts today
2010-11-22 11:09:02 < Dark_Shikari> we need more tasks
2010-11-22 11:10:01 < Alex_W> then i wonder if it would be possible to have a blu-ray compatible weightp option in x264? ( i mean one that doesn't break on mediatek chipsets)
2010-11-22 11:10:11 < Alex_W> what kind of tasks are you looking for?
2010-11-22 11:10:18 < Dark_Shikari> http://wiki.videolan.org/X264_GCodeIn_Ideas
2010-11-22 11:10:23 < Dark_Shikari> anything no harder than these
2010-11-22 11:11:26 < Alex_W> maybe some psy testing?
2010-11-22 11:11:39 < Dark_Shikari> we would need some psy things to test.
2010-11-22 11:11:48 < Alex_W> aq-mode 1 vs 2
2010-11-22 11:12:08 < Dark_Shikari> if you want to create a psy curve for someone to test, feel free
2010-11-22 11:12:11 < Dark_Shikari> e.g. by adjusting AQ, etc, etc
2010-11-22 11:12:39 < Dark_Shikari> but I think we need to have something available to test
2010-11-22 11:12:42 < Dark_Shikari> as opposed to making them write it
2010-11-22 11:15:21 < Dark_Shikari> import ideas from http://wiki.videolan.org/X264_TODO etc
2010-11-22 11:19:58 < Dark_Shikari> Alex_W: tl;dr this is a chance to test stuff
2010-11-22 11:20:02 < Dark_Shikari> prepare stuff to be tested.
2010-11-22 11:20:12 < Alex_W> well i'm already testing a different AQ curve at low variances, though i think the one i'm using atm is probably too aggressive
2010-11-22 11:23:10 < Alex_W> unfortunately it seems like preserving low variance dark areas is going to take a lot of bits, at least based on testing with an elevated black level
2010-11-22 11:25:59 < Alex_W> also i think RD bskip might be a problem for these areas as well
2010-11-22 11:30:18 < Alex_W> anyway here's my current change to aq-mode 1 at low variances: http://pastebin.com/UVfMCC5p
2010-11-22 11:38:36 < Dark_Shikari> I wonder if we can calculate the theoretically "correct" value?
2010-11-22 11:38:41 < Dark_Shikari> i.e. considering the effect of BIT_DEPTH
2010-11-22 11:38:49 < Dark_Shikari> er, correct curve
2010-11-22 11:39:02 < Dark_Shikari> in other words, there are two contributors to variance-related quality:
2010-11-22 11:39:06 < Dark_Shikari> 1) the normal effect
2010-11-22 11:39:08 < Dark_Shikari> 2) truncation
2010-11-22 11:39:17 < Dark_Shikari> at lower variance, the effect of 2) rises and the effect of 1) drops
2010-11-22 11:39:22 < Dark_Shikari> well, the relative effect of 1) drops
2010-11-22 11:50:10 < Alex_W> well i'm certainly open to any suggestions on how to improve this
2010-11-22 11:52:11 < Alex_W> how low should the maximum negative QP offset be anyway? (i mean for variance=1)
2010-11-22 11:52:25 < Dark_Shikari> -20?
2010-11-22 11:52:44 < Dark_Shikari> -15?
2010-11-22 11:53:26 < Alex_W> right now it's approximately the same as the one that was originally used in aq-mode 1 which was around -14.4
2010-11-22 11:54:30 < Alex_W> and the difference between variance 512 and variance 1 is around -9 atm with my new curve
2010-11-22 11:54:37 < Dark_Shikari> what did it use to be?
2010-11-22 11:55:25 < Alex_W> the difference between 512 and 1 or the maximum negative offset for variance 1?
2010-11-22 11:55:32 < Dark_Shikari> former
2010-11-22 11:55:38 < Alex_W> lemme check
2010-11-22 11:56:15 < Dark_Shikari> ok, I added a psy test task
2010-11-22 11:56:16 < Dark_Shikari> what else
2010-11-22 11:56:19 < Dark_Shikari> up to 6 tasks
2010-11-22 11:57:37 < Alex_W> actually the difference in QPs is about the same but the shape of the curve between those two points is much different now
2010-11-22 11:57:47 < Dark_Shikari> ah.
2010-11-22 11:59:15 < Alex_W> but this new curve will likely increase bitrate quite a lot on clips with lots of low variance areas at the same crf
2010-11-22 12:00:48 < Alex_W> but i really wonder if there could be a better way to deal with these areas than just throwing huge amounts of bits at them to exactly preserve their noise/dither
2010-11-22 12:05:37 < Alex_W> Dark_Shikari: btw do you have any test content that shows noticeable banding/blocking in low variance areas even at reasonably high bitrates with the current aq-mode 1?
2010-11-22 12:06:08 < Dark_Shikari> dunno.  you could just create an artificial gradient
2010-11-22 12:06:30 < Dark_Shikari> possible solution: quantization-aware dither
2010-11-22 12:06:40 < Alex_W> yeah i was thinking about doing that
2010-11-22 12:06:55 < Alex_W> explain?
2010-11-22 12:07:15 < Dark_Shikari> i.e. just make dither patterns consisting of basis functions
2010-11-22 12:07:43 < Dark_Shikari> not sure how you would get that, but it's a desired result
2010-11-22 12:07:48 < Alex_W> yes but how do you decide which basis functions to use
2010-11-22 12:08:00 < Dark_Shikari> the highest frequency one.
2010-11-22 12:08:24 < Dark_Shikari> to misquote gmaxwell (I think it was), "blur it out, then cover it up with ants"
2010-11-22 12:08:31 < Alex_W> so [7][7] in the 8x8 transform?
2010-11-22 12:08:34 < Dark_Shikari> yes
2010-11-22 12:08:57 < Alex_W> just one basis function or a combination of a few different ones?
2010-11-22 12:08:57 < Dark_Shikari> This might naturally happen if we did floyd-steinberg in the dct
2010-11-22 12:09:07 < Dark_Shikari> or some other batching of nearby coeffs
2010-11-22 12:09:45 < Alex_W> well this is really what psy-trellis would do ideally
2010-11-22 12:11:08 < Alex_W> but yes if the noise has a low enough magnitude i think it could definitely be worthwhile to just remove it and then replace it with some that looks similar but costs a lot fewer bits
2010-11-22 12:11:28 < Dark_Shikari> I wasn't thinking of that
2010-11-22 12:11:32 < Dark_Shikari> I was thinking of e.g. 10-bit input
2010-11-22 12:11:38 < Dark_Shikari> and then dithering internally to x264 somehow
2010-11-22 12:11:42 < Dark_Shikari> not necessarily literally that
2010-11-22 12:11:54 < Dark_Shikari> but rather doing something AS A DITHER
2010-11-22 12:11:57 < Dark_Shikari> not as REPLACING NOISE
2010-11-22 12:14:58 < Alex_W> well either way i think this would be much better than having to code some MBs as low as QP 6 just to stop the damn blocking/banding
2010-11-22 12:15:18 < Dark_Shikari> if you had a quantizer that actually retained HF energy you wouldn't really need qp 6
2010-11-22 12:15:54 < jenny`> hey dark - i figured out the issues i was having (if you are curious)
2010-11-22 12:15:58 < Dark_Shikari> ?
2010-11-22 12:16:19 < jenny`> the server was sending compressed frames at 30 Hz but the client wasnt able to keep up
2010-11-22 12:16:37 < Dark_Shikari> common problem in systems with no client feedback for saying "I'm too slow"
2010-11-22 12:16:57 < jenny`> decode was taking too long, and it indeed was buffering @ the network level
2010-11-22 12:17:18 < jenny`> yep yep
2010-11-22 12:19:45 < horlicks> thanks for the correction DS :p
2010-11-22 12:20:45 < Dark_Shikari> weightp is an explicit weighting applied to one input
2010-11-22 12:20:56 < Dark_Shikari> weightb is an implicit (or, optionally, explicit) weighting applied to two inputs
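The arithmetic behind the two kinds of weighting can be sketched per pixel as follows. This assumes 8-bit pixels and follows the general shape of the H.264 weighted prediction formulas as I understand them. The function names are made up, and the code is not taken from x264.

```c
#include <stdint.h>

static uint8_t clip_pixel( int v )
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* weightp-style: one reference pixel, an explicit weight w with
 * denominator 2^log2_denom, plus an offset o. */
uint8_t weight_uni( uint8_t ref, int w, int o, int log2_denom )
{
    int v = ref * w;
    if( log2_denom >= 1 )
        v = ( v + (1 << (log2_denom - 1)) ) >> log2_denom;
    return clip_pixel( v + o );
}

/* weightb-style: two reference pixels blended with weights that sum to 64.
 * In implicit mode the weights come from frame distances, not the bitstream. */
uint8_t weight_bi( uint8_t ref0, uint8_t ref1, int w0 )
{
    int w1 = 64 - w0;
    return clip_pixel( ( ref0 * w0 + ref1 * w1 + 32 ) >> 6 );
}
```

For example, weight_uni( 100, 16, 0, 5 ) halves the reference (a fade toward black), and weight_bi( 100, 200, 32 ) is the plain average of the two references.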
2010-11-22 12:21:24 < horlicks> yeah I understand, seems kinda obvious now
2010-11-22 12:22:02 < horlicks> maybe that's something I can do after mbaff :)
2010-11-22 12:22:18 < Dark_Shikari> what is "that"
2010-11-22 12:22:39 < horlicks> one sec
2010-11-22 12:23:09 < horlicks> "Make weightp work with interlacing. Preferably abuse reference duplication to make it useful for MBAFF."
2010-11-22 12:23:16 < Dark_Shikari> ah yes
2010-11-22 12:23:36 < horlicks> anyway, I'm off
2010-11-22 12:23:48 < Dark_Shikari> \o
2010-11-22 12:31:54 < Alex_W> <Dark_Shikari> I wonder if we can calculate the theoretically "correct" curve? <-- how would you go about doing this anyway?
2010-11-22 12:32:08 < Dark_Shikari> calculate the effect of truncation on quality loss
2010-11-22 12:32:26 < Dark_Shikari> that is:
2010-11-22 12:32:33 < Dark_Shikari> 1) Assume a Laplace distribution of coefficients.
2010-11-22 12:33:05 < Dark_Shikari> 2) Calculate quality loss due to the quantization process
2010-11-22 12:33:11 < Dark_Shikari> 3) Calculate quality loss due to truncation upon idct
2010-11-22 12:33:24 < Dark_Shikari> 4) create a curve by combining the two
2010-11-22 12:34:36 < Alex_W> i see, i doubt that the distribution would be laplacian for noise/dither in these cases though
2010-11-22 12:34:48 < Dark_Shikari> why not?
2010-11-22 12:36:56 < Alex_W> well if it's laplacian then bitrate should probably double every 4 - 6 QPs right? from what i've seen so far it doubles much quicker than that
2010-11-22 12:37:50 < Dark_Shikari> it does once you get to the smooth domain of the curve
2010-11-22 12:38:03 < Dark_Shikari> the reason it doesn't seem to do that is because there's a threshold beyond which everything is zeroed
2010-11-22 12:38:07 < Dark_Shikari> because the magnitude of the noise is so low
2010-11-22 12:38:20 < Dark_Shikari> so you need to get beyond the discontinuous part of the curve
2010-11-22 12:38:23 < Dark_Shikari> then it's smooth and laplacian
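The threshold effect can be seen with a toy quantizer. The sketch below uses plain round-to-nearest division by a step size, which is a simplification (x264's real quantizer uses per-coefficient multipliers and a deadzone), and count_nonzero is a hypothetical helper. For low-magnitude coefficients the count of surviving values stays flat and then collapses to zero once the step size passes twice the largest magnitude.

```c
#include <stdlib.h>

/* Quantize each coefficient with round-to-nearest and return how many
 * quantized levels are nonzero. */
int count_nonzero( const int *coefs, int n, int qstep )
{
    int nz = 0;
    for( int i = 0; i < n; i++ )
    {
        int level = ( abs( coefs[i] ) + qstep / 2 ) / qstep;
        nz += level != 0;
    }
    return nz;
}
```

For the coefficients { 3, -2, 1, 0, -3, 2, 1, -1 }, step sizes 1 and 2 keep seven nonzero levels, step size 4 keeps four, and step size 8 keeps none. This is the discontinuous part of the curve mentioned above.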
2010-11-22 12:40:48 < Alex_W> also because the idct always rounds up should we try to compensate for this by adding small negative offsets to the DC coeff or would that be useless?
2010-11-22 12:41:00 < Dark_Shikari> probably useless
2010-11-22 14:34:52 < Dark_Shikari> pengvado: ping
2010-11-22 14:35:28 < Dark_Shikari> of the patches at http://pastebin.com/rW4dsM9J , which can I commit now?
2010-11-22 14:41:42 < Dark_Shikari> I will push them all today if you don't complain
2010-11-22 14:52:50 < jarod> so that patch
2010-11-22 14:52:53 < jarod> --version
2010-11-22 14:53:07 < jarod> is such inaccurate info useful?
2010-11-22 14:53:53 < Dark_Shikari> go away troll
2010-11-22 14:54:07 < jarod> its not to troll
2010-11-22 14:54:23 < jarod> just saying one revision can make a huge difference
2010-11-22 14:54:38 < jarod> unless you meant kierank
2010-11-22 14:55:37 < jarod> but if you want troll
2010-11-22 14:55:48 < jarod> allow me to <?php header('Location: http://www.webmproject.org/tools/vp8-sdk/'); ?>
2010-11-22 14:57:52 < jarod> your mood swings are worst than a 8 month pregnant female
2010-11-22 14:59:05 < pengvado> which one revision is inaccurate?
2010-11-22 14:59:48 < pengvado> Dark_Shikari: 1,3,4,6,7,8 ok
2010-11-22 15:00:00 < pengvado> 2: I agree with mru's comments regarding nulls
2010-11-22 15:00:37 < pengvado> 5: that's a lot of repetitions of that if/else block. sounds like a job for a macro.
2010-11-22 15:00:52 < kierank> oh dear. one of the replicated x264 blu-rays has a problem
2010-11-22 15:00:57 < Dark_Shikari> oh no
2010-11-22 15:01:08 < Dark_Shikari> do we know what the issue is?
2010-11-22 15:01:10 < Dark_Shikari> and if it's x264's problem?
2010-11-22 15:01:20 < kierank> "We've tested the disc extensively without bumping into any problems, however after replication the customer is complaining the disc is pixelating at a certain scene in the movie."
2010-11-22 15:01:52 < Dark_Shikari> :>
2010-11-22 15:01:56 < Dark_Shikari> Do we have a sample?
2010-11-22 15:02:12 < Dark_Shikari> Sean_McG: ^
2010-11-22 15:02:38 < Dark_Shikari> what does "replicated" mean -- they've already printed all the blu-rays?
2010-11-22 15:02:58 < kierank> I fear that's exactly what it means.
2010-11-22 15:03:43 < Dark_Shikari> what could cause that, a problem on a particular Blu-ray player that they didn't test on?
2010-11-22 15:03:58 < jarod> not testing and showing results @ devs ftf.... fuck commercialism, i hope its my fault
2010-11-22 15:04:03 < kierank> I guess
2010-11-22 15:05:49 < Dark_Shikari> kierank: can we get them to give a sample or something under NDA?
2010-11-22 15:05:50 < Dark_Shikari> or whatnot
2010-11-22 15:06:05 < kierank> yes i will ask
2010-11-22 15:06:10 < Dark_Shikari> so we can see if we have to commit seppuku or not
2010-11-22 15:06:19 < tjoener> I know there is a problem in x264 with dark scenes (bedroom scene with ellie and carl in up)
2010-11-22 15:06:27 < tjoener> it gets quite pixelated
2010-11-22 15:06:32 < Dark_Shikari> tjoener: that's not what they mean
2010-11-22 15:06:40 < Dark_Shikari> that isn't a "problem in x264"
2010-11-22 15:06:43 < tjoener> could be
2010-11-22 15:06:43 < Dark_Shikari> that's "you not using enough bitrate" =p
2010-11-22 15:06:48 < tjoener> haha
2010-11-22 15:06:51 < CIA-98> x264: James Darnley  master * r8eaf8a66d5 x264/filters/video/resize.c: Fix resize filter rounding code
2010-11-22 15:06:53 < tjoener> well its only in dark scenes
2010-11-22 15:06:54 < Dark_Shikari> at blu-ray bitrates, such a thing is basically meaningless
2010-11-22 15:06:59 < Dark_Shikari> or more accurately
2010-11-22 15:07:02 < tjoener> bright scenes are very VERY nice
2010-11-22 15:07:02 < Dark_Shikari> your monitor is very badly calibrated
2010-11-22 15:07:03 < CIA-98> x264: Anton Mitrofanov  master * raf1a7413af x264/encoder/ (encoder.c slicetype.c):
2010-11-22 15:07:03 < CIA-98> x264: Fix regression in chroma weightp
2010-11-22 15:07:03 < CIA-98> x264: Missing cache calls could cause artifacts, encoder/decoder desync.
2010-11-22 15:07:11 < Dark_Shikari> x264 assumes your monitor is calibrated correctly
2010-11-22 15:07:12 < tjoener> well ok
2010-11-22 15:07:13 < Dark_Shikari> i.e. blacks are black
2010-11-22 15:07:19 < tjoener> my monitor could be the issue
2010-11-22 15:07:19 < Dark_Shikari> if blacks are not black, the results will look very bad
2010-11-22 15:07:27 < tjoener> monitors nowadays are quite crappy
2010-11-22 15:07:42 < tjoener> havent had the time to look into it yet
2010-11-22 15:08:01 < tjoener> I've got a second one though, old acer flatscreen with far better colours than the new one
2010-11-22 15:08:39 < Dark_Shikari> kierank: you can tell them that I can investigate it to see if there's anything weird about it that could cause problems in broken players, etc
2010-11-22 15:08:44 < Dark_Shikari> i.e. any "common bug triggers"
2010-11-22 15:09:40 < Dark_Shikari> I have a hunch, but I can't say what it is until I get a sample.
2010-11-22 15:10:21 < tjoener> weightp (like flash?)
2010-11-22 15:10:35 < Dark_Shikari> no
2010-11-22 15:25:08 < Dark_Shikari> kierank: well, if this sinks everything, I'd like to know
2010-11-22 15:26:45 < kierank> ok
2010-11-22 16:11:58 < Jumpyshoes> hey Dark_Shikari, would it be too much if i took on the GCI task of writing an assembly function?
2010-11-22 16:12:20 < Dark_Shikari> Of course not.
2010-11-22 16:12:23 < Dark_Shikari> That's a repeatable task.
2010-11-22 16:12:28 < Jumpyshoes> cool
2010-11-22 16:12:29 < Dark_Shikari> If you take it, we'll add it in again
2010-11-22 16:12:33 < Jumpyshoes> i think i will then
2010-11-22 16:12:33 < holger_> Jumpyshoes:  do you have something specific on your mind?
2010-11-22 16:12:40 < Dark_Shikari> Yeah, come up with something first
2010-11-22 16:12:42 < Jumpyshoes> oh
2010-11-22 16:12:45 < Jumpyshoes> i'll need to do that at home
2010-11-22 16:12:51 < Dark_Shikari> keep in mind, almost all new opportunities in asm functions are in the category of "high bit depth"
2010-11-22 16:12:52 < Jumpyshoes> and look for a simple function
2010-11-22 16:12:59 < Dark_Shikari> as, before high bit depth, we had everything done basically (for x86, at least)
2010-11-22 16:13:08 < Dark_Shikari> but now with high bit depth, there's tons of missing functions
2010-11-22 16:13:12 < Dark_Shikari> for example, dequant.
2010-11-22 16:13:23 < Jumpyshoes> so, for example, 10-bit?
2010-11-22 16:13:32 < Dark_Shikari> high bit depth encompasses >8-bit
2010-11-22 16:13:38 < Dark_Shikari> the thing that makes it different is:
2010-11-22 16:13:41 < Dark_Shikari> 1) pixels are 16-bit instead of 8-bit
2010-11-22 16:13:45 < Dark_Shikari> 2) dct coeffs are 32-bit instead of 16-bit
2010-11-22 16:13:48 < Dark_Shikari> thus all the asm is different.
2010-11-22 16:14:06 < Jumpyshoes> but, the same algorithms apply, no?
2010-11-22 16:14:09 < Dark_Shikari> yes
2010-11-22 16:14:18 < holger_> 3) some things can now overflow that couldn't before
2010-11-22 16:14:25 < Dark_Shikari> 4) some things that could overflow before, now can't
2010-11-22 16:14:37 < Dark_Shikari> e.g. in 8-bit, they overflowed, but in 16-bit, since all our inputs are no more than 10-bit, they don't
2010-11-22 16:15:08 < Dark_Shikari> e.g. in dequant, previously, we had two dequant branches, based on whether or not it needed 32-bit or 16-bit intermediate precision to work
2010-11-22 16:15:08 < Jumpyshoes> isn't that easier to take care of then?
2010-11-22 16:15:12 < Dark_Shikari> but now.... we can just do 32-bit
2010-11-22 16:15:16 < Dark_Shikari> and have no branch
2010-11-22 16:15:26 < Dark_Shikari> some things are easier, some are harder.
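One way to read point 4 can be sketched in a few lines. This is an illustration only, not x264 code: `lane_sum` is a made-up helper that mimics a wrapping packed-integer add, and the 16-pixel sum is an assumed workload.

```python
def lane_sum(pixels, lane_bits):
    """Sum pixels in an unsigned lane of `lane_bits` bits, wrapping on overflow."""
    mask = (1 << lane_bits) - 1
    total = 0
    for p in pixels:
        total = (total + p) & mask  # wrap like a packed integer add would
    return total

# 16 maximum-valued 8-bit pixels in an 8-bit lane: the true sum (4080) wraps to 240.
eight_bit = lane_sum([255] * 16, 8)

# 16 maximum-valued 10-bit pixels in a 16-bit lane: 16 * 1023 = 16368 fits.
ten_bit = lane_sum([1023] * 16, 16)

print(eight_bit, ten_bit)
```

The 16-bit lane leaves six spare bits above a 10-bit input, so short sums like this need no widening step.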
2010-11-22 16:15:38 < Jumpyshoes> i see
2010-11-22 16:16:17 < Jumpyshoes> is the high bit stuff in the same location in the code?
2010-11-22 16:17:42 < holger_> pretty much, yeah. look for ifdef HIGH_BIT_DEPTH
2010-11-22 16:18:27 < Jumpyshoes> thanks
2010-11-22 16:18:29 < holger_> to see which functions are missing you could probably just run checkasm with and without high bit depth and compare the output
2010-11-22 16:18:41 < holger_> erm. checkasm --bench of course ;)
2010-11-22 16:21:46 < Jumpyshoes> how do i run checkasm w/ high bit depth?
2010-11-22 16:22:04 < holger_> you configure for high bit depth, checkasm gets built accordingly
2010-11-22 16:22:08 < BugMaster|work> compile it with high bit depth
2010-11-22 16:22:12 < BugMaster|work> configure --bit-depth=10
2010-11-22 16:22:26 < Jumpyshoes> ah, okay, thanks
2010-11-22 16:23:22 < kierank> ^^ first gci person!!!
2010-11-22 16:23:37 < holger_> we often have more than one asm function btw. _mmx, _sse2, _ssse3, _sse4 are self-explanatory. then there is _sse2_misalign (amd k10 allows misaligned data loads), _c32/_c64 (cacheline split optimized versions), _fastshuffle (denotes somewhat decent ssse3, excludes conroe and atom)
2010-11-22 16:23:55 < Jumpyshoes> kierank: i'm a dumbass so don't expect much
2010-11-22 16:27:14 < holger_> asm isn't that hard in itself. i see dark_shikari already gave you his crash course. you just need to visualize what you're doing (or if you're bad at that, waste a lot of paper documenting what you think you're doing ;)
2010-11-22 16:27:49 < callahanafk> Just pick one and make a bad attempt.  Then DK's head will explode and he'll fix it for you.
2010-11-22 16:27:51 < Jumpyshoes> well, it takes me a while to get used to stuff
2010-11-22 16:28:00 < Jumpyshoes> haha, my initials are actually DK
2010-11-22 16:28:07 < callahan> DS rather
2010-11-22 16:28:45 < Jumpyshoes> hopefully it isn't too bad
2010-11-22 16:28:54 < Jumpyshoes> it only took me three tries to write a three line C function!
2010-11-22 16:30:01 < holger_> the harder part is debugging asm if it doesn't work. so you need to think before you code. even more than in any other language.
2010-11-22 16:30:12 < Jumpyshoes> i'm bad at thinking
2010-11-22 16:30:22 < Jumpyshoes> this will be interesting
2010-11-22 16:32:04 < Dark_Shikari> so anyways, if more GCI people come on and I'm not here
2010-11-22 16:32:08 < Dark_Shikari> and nobody else is here to help them
2010-11-22 16:32:12 < Dark_Shikari> advise them to wait for me
2010-11-22 16:32:16 < holger_> (visualize: have a mental picture of which register holds what, which values go where, etc. write it down if you have to)
2010-11-22 16:33:02 < callahan> Meh, just pick one and do it, then fix it 10 times until it's right.
2010-11-22 16:33:21 < Jumpyshoes> (more like 400 for me)
2010-11-22 16:33:38 < Dark_Shikari> stop underestimating yourself
2010-11-22 16:33:47 < callahan> If that's what it takes, start sooner rather than later :)
2010-11-22 16:34:03 < Jumpyshoes> good thing i have two months
2010-11-22 16:35:17 < Jumpyshoes> i have the lowbit and highbit functions if anyone wants them
2010-11-22 16:36:01 < holger_> Jumpyshoes:  there is another task you can do to get your feet wet. (not a gci task though, because we can't seriously expect anyone to succeed) pick any asm function, try to modify it (do things differently, think outside the box) and keep it working. if you manage to get anything faster *g* we'll gladly add a gci task for you to submit your results and get credit
2010-11-22 16:36:11 < Dark_Shikari> What holger_ said.
2010-11-22 16:36:37 < Jumpyshoes> cool
2010-11-22 16:36:57 < Jumpyshoes> that sounds more viable
2010-11-22 16:37:04 < Dark_Shikari> no, I'd say it's harder
2010-11-22 16:37:09 < Jumpyshoes> oh really?
2010-11-22 16:37:19 < Dark_Shikari> making an asm function faster than C is easy
2010-11-22 16:37:27 < Dark_Shikari> making an asm function faster than holger_'s is lunatic-mode
2010-11-22 16:37:36 < Jumpyshoes> ._.
2010-11-22 16:37:49 < Dark_Shikari> of course, not all the asm functions are by holger, so it's not that hard!
2010-11-22 16:38:06 < Dark_Shikari> In general though, it's a good thing to do so that you learn why code is written like it is
2010-11-22 16:38:12 < Dark_Shikari> which is a very fast way to learn
2010-11-22 16:38:27 < Dark_Shikari> if you learn why function A is written with instructions B, C, and D, you'll know when to use them yourself, etc.
2010-11-22 16:38:44 < Jumpyshoes> i needa go, ttyl
2010-11-22 16:53:29 < Kovensky> < Dark_Shikari> making an asm function faster than holger_'s is lunatic-mode <-- you mean extra stage ^ 3
2010-11-22 16:56:40 < holger_> the cheat mode being "do it on k10"
2010-11-22 16:56:58 < holger_> because we haven't been doing much k10-specific opts yet
2010-11-22 17:47:46 < bgm0> if anyone's interested, there's some good reading material here: http://www.stanford.edu/class/ee368b/handouts.html
2010-11-22 18:16:48 < rfw> mornin' #x264dev
2010-11-22 18:17:32 < rfw> just stating my interest for doing the GCI regression test tool task
2010-11-22 18:18:10 < Kovensky> hi rofflwaffls
2010-11-22 18:18:17 < rfw> hi brazilian
2010-11-22 18:18:28 < Kovensky> you kiwi :<
2010-11-22 18:18:28 < rfw> You have requested to claim this task and the request is pending. Please don't submit any work until the request is approved.
2010-11-22 18:18:29 < rfw> :<
2010-11-22 18:18:31 < Kovensky> ok, back to topic
2010-11-22 18:18:32 < Kovensky> lol
2010-11-22 18:19:26 < rfw> i'm not totally sure how regression testing works though
2010-11-22 18:19:36 < rfw> is it like unit testing but not unit testing
2010-11-22 18:19:42 < JEEB> っ the FATE system in ffmpeg
2010-11-22 18:20:29 < rfw> so, you have the basic tests, like building and configuring
2010-11-22 18:20:47 < rfw> then you have a few tests that don't have pass or fail
2010-11-22 18:20:53 < rfw> you just store the results of them
2010-11-22 18:20:59 < rfw> then you compare those from version to version?
2010-11-22 18:22:50 < kierank> rfw: for x264 it will be things like major filesize changes or major changes in psnr/ssim that are unexpected
2010-11-22 18:22:57 < Kovensky> well
2010-11-22 18:22:59 < kierank> or perhaps things like vbv underflows
2010-11-22 18:23:19 < rfw> you're going to tell me what those are, right
2010-11-22 18:23:21 < kierank> segfaults too
2010-11-22 18:23:25 < Kovensky> IIRC all what kierank said
2010-11-22 18:23:33 < rfw> because i have no clue ;_;
2010-11-22 18:23:34 < Kovensky> I'm epic lagging here so whatever I type is only appearing 20 seconds later D:
2010-11-22 18:24:03 < kierank> rfw: well for example if you try to encode at 1000kbps and new commits are pushed. The next version ends up creating a file at 1500kbps that's a regression
2010-11-22 18:24:15 < rfw> ah
2010-11-22 18:24:22 < JEEB> and x264 outputs PSNR/SSIM data too
2010-11-22 18:24:27 < rfw> no i meant the psnr/ssim part
2010-11-22 18:24:28 < rfw> lol
2010-11-22 18:24:34 < JEEB> so you can check that and compare to before
2010-11-22 18:24:43 < kierank> psnr/ssim is a mathematical measurement of the closeness to the source
2010-11-22 18:24:56 < kierank> if there's a major drop in that it's likely that something went wrong
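For reference, the textbook PSNR formula is 10*log10(peak^2 / MSE). A minimal sketch follows, assuming the two inputs are flat sequences of pixel values from one plane; x264's own reporting may differ in detail, and `psnr` here is an illustrative helper, not an x264 function.

```python
import math

def psnr(reference, decoded, bit_depth=8):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no error at all
    peak = (1 << bit_depth) - 1  # 255 for 8-bit, 1023 for 10-bit
    return 10.0 * math.log10(peak * peak / mse)
```

Higher is closer to the source, so a sudden drop between two revisions on the same input and settings is the signal worth flagging.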
2010-11-22 18:25:04 < rfw> ah
2010-11-22 18:25:07 < rfw> got it
2010-11-22 18:25:09 < kierank> ideally the regression tool will do a git bisect and find out what patch caused it
2010-11-22 18:25:14 < kierank> caused the regression i mean
2010-11-22 18:25:14 < Kovensky> it can be off, but by a max deviation
2010-11-22 18:25:24 < rfw> so basically, all you need is a tool that stores the data of each revision
2010-11-22 18:25:31 < rfw> then compares it with other revisions?
2010-11-22 18:25:44 < Kovensky> hurf, I should just stop writing since anything I say is already answered before it appears <_<
2010-11-22 18:25:54  * rfw pats Kovensky
2010-11-22 18:26:04 < kierank> rfw: yes something like that
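The store-then-compare idea, with the "max deviation" Kovensky mentions, could look roughly like this. Everything here is hypothetical: the metric names, the numbers, and the thresholds are placeholders, not values from x264 or from the GCI task.

```python
def find_regressions(baseline, candidate, tolerances):
    """Return {metric: (old, new)} for metrics that moved more than the allowed relative deviation."""
    flagged = {}
    for name, allowed in tolerances.items():
        old, new = baseline[name], candidate[name]
        if old == 0:
            changed = new != 0
        else:
            changed = abs(new - old) / abs(old) > allowed
        if changed:
            flagged[name] = (old, new)
    return flagged

# Hypothetical stored results for two revisions of the same encode.
rev_a = {"bitrate_kbps": 1000.0, "psnr_db": 40.0, "ssim": 0.980}
rev_b = {"bitrate_kbps": 1500.0, "psnr_db": 40.1, "ssim": 0.979}
limits = {"bitrate_kbps": 0.05, "psnr_db": 0.02, "ssim": 0.01}

print(find_regressions(rev_a, rev_b, limits))
```

With these numbers only the bitrate is flagged, matching kierank's 1000kbps vs 1500kbps example. The check is symmetric, so an unexpected improvement is flagged too, which is probably what you want to look at anyway.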
2010-11-22 18:26:16 < rfw> seems fun to do
2010-11-22 18:26:52 < Kovensky> and have a cute and fluffy web interface for people to see
2010-11-22 18:27:05 < kierank> Kovensky: not necessarily
2010-11-22 18:27:07 < Kovensky> well, disregard the "cute and fluffy" part, unless you really want to :">
2010-11-22 18:27:31 < Kovensky> well true, not necessary
2010-11-22 18:27:39 < Kovensky> I'm just borrowing from FATE at this point
2010-11-22 18:27:46 < Kovensky> but a simple tool to run locally is already p. good
2010-11-22 18:28:11 < Kovensky> s/but //
2010-11-22 18:31:56 < rfw> can i just have cute and fluffy cli output
2010-11-22 18:31:58 < rfw> with colors
2010-11-22 18:32:44 < holger_> rfw: speed regressions are interesting too. and obviously you'd want quite a diverse selection of videos, not just one.
2010-11-22 18:33:17 < rfw> well, i'm probably going for a unit-testing fixtures approach
2010-11-22 18:33:53 < rfw> so i guess that could be easily done
2010-11-22 18:34:16 < holger_> it's quite possible for every single routine to become faster while x264 overall becomes slower.
2010-11-22 18:34:37 < holger_> we'd probably see that if we unrolled everything
2010-11-22 18:35:28 < rfw> heh
2010-11-22 18:35:30 < rfw> isn't that always the case
2010-11-22 18:35:32 < nattofriends> funroll loops !
2010-11-22 18:36:35 < callahan> I always read that as fun roll
2010-11-22 18:36:39 < holger_> just an example of something unit testing wouldn't catch
2010-11-22 18:37:54  * microchip_ rollin'
2010-11-22 18:37:59 < nattofriends> callahan: exactly
2010-11-22 19:23:37 < rfw> so Kovensky
2010-11-22 19:23:41 < rfw> where is Dark_Shikari :(
2010-11-22 19:24:42 < Kovensky> somewhere in california
2010-11-22 19:26:45 < rfw> no i mean
2010-11-22 19:27:00 < rfw> didn't you say he was always here
2010-11-22 19:27:15 < Kovensky> :P
2010-11-22 19:27:26 < astrange> can't always be awake
2010-11-22 19:27:41 < JEEB> he sleeps 2-3h at random times of the day and he will look through his pings as he gets back to his keyboard
2010-11-22 19:27:44 < JEEB> just stay on the channel
2010-11-22 19:27:46 < BugMaster_> sometimes he pretends to sleep
2010-11-22 19:27:48 < rfw> lol
2010-11-22 19:29:23 < rfw> writing python at 8am doesn't seem to be an agreeable experience
2010-11-22 19:29:38  * Kovensky flames with perl
2010-11-22 19:29:53 < rfw> TOO BAD YOU'RE GETTING YOUR REGRESSION TESTING TOOL IN PYTHON
2010-11-22 19:30:18 < Kovensky> nuu ;_;
2010-11-22 19:30:31 < JEEB> python is mighty fine
2010-11-22 19:30:32 < rfw> :D
2010-11-22 19:31:35 < tjoener> indeed
2010-11-22 19:31:39 < tjoener> I like Python
2010-11-22 19:31:41 < tjoener> got a book
2010-11-22 19:32:30 < rfw> i hope it's not an orly book
2010-11-22 19:32:36 < komisar> tjoener: .chm from python distribution is fine
2010-11-22 19:32:39 < komisar> :)
2010-11-22 19:32:44 < tjoener> heh, I know
2010-11-22 19:32:53 < tjoener> ive been reading the online documentation
2010-11-22 19:33:05 < tjoener> but I dont know of anything to make with it
2010-11-22 19:33:12 < tjoener> so I'm kind of letting it be
2010-11-22 19:33:51 < komisar> very simple and powerful script-language
2010-11-22 19:34:08 < holger_> python makes everything seem so easy. and sometimes it does. in which case your problem was either a) really easy or b) didn't need much performance
2010-11-22 19:34:37 < tjoener> well yeah
2010-11-22 19:34:40 < tjoener> thats my idea too
2010-11-22 19:34:49 < tjoener> I never find a good use for a scripting language
2010-11-22 19:35:09 < tjoener> except for maybe batch/shell scripts, for which there are batch/shell scripts
2010-11-22 19:35:18 < elenril> python isn't just scripting
2010-11-22 19:35:30 < JEEB> > batch scripts
2010-11-22 19:35:31  * elenril wrote an mpd client in it
2010-11-22 19:35:31 < JEEB> eww
2010-11-22 19:35:35 < tjoener> yeah I know
2010-11-22 19:35:36 < djahandarie> I use Haskell for most everything I need
2010-11-22 19:35:54 < Kovensky> tjoener: do you seriously consider batch scripts for stuff? D:
2010-11-22 19:35:57 < Kovensky> plz2use perl
2010-11-22 19:36:15 < tjoener> naah, mostly ridiculously evil stuff
2010-11-22 19:36:24 < tjoener> running a few programs on a file
2010-11-22 19:36:28 < tjoener> not much else
2010-11-22 19:36:31 < Kovensky> though if you need good internationalization support you're better off with python3, assuming they did the win32 part right
2010-11-22 19:37:00 < elenril> >implying anybody uses python3
2010-11-22 19:37:12 < tjoener> whats wrong with python 3?
2010-11-22 19:37:17 < tjoener> except for print()
2010-11-22 19:37:42 < elenril> not enough support
2010-11-22 19:37:51 < tjoener> ah
2010-11-22 19:38:01  * elenril can't port his client to py3 because qt4 in debian only supports py2
2010-11-22 19:38:18 < elenril> also scipy/numpy only support py2 iirc
2010-11-22 19:38:19 < tjoener> hmmm
2010-11-22 19:38:26 < tjoener> numpy is nice
2010-11-22 19:39:46 < Kovensky> what is numpy
2010-11-22 19:39:48 < Kovensky> numbers for python? :P
2010-11-22 19:39:54  * elenril wishes they taught physicists how2write readable/maintainable code
2010-11-22 19:40:10 < elenril> Kovensky: http://numpy.scipy.org/ =p
2010-11-22 19:40:25 < komisar> py3/py2 useful for self-made script/program... in mingw/*nix/windows... :)
2010-11-22 19:40:30 < tjoener> http://www.boingboing.net/2010/11/22/howto-make-a-stupend.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+boingboing%2FiBag+%28Boing+Boing%29&utm_content=Google+Reader
2010-11-22 19:40:35 < tjoener> lol
2010-11-22 19:40:36 < tjoener> loved those when I was a kid
2010-11-22 19:41:13 < tjoener> I'm thinking of skipping python and looking at something functional
2010-11-22 19:41:21 < rfw> [08:39:47] <Kovensky> numbers for python? :P <-- o u
2010-11-22 19:41:25 < tjoener> like lisp or scheme or ...
2010-11-22 19:41:34  * Kovensky votes for common lisp
2010-11-22 19:42:23 < tjoener> yeah
2010-11-22 19:42:32 < tjoener> lisp isnt that bad
2010-11-22 19:42:35 < tjoener> F#? :)
2010-11-22 19:42:42 < Kovensky> idk about F#
2010-11-22 19:43:15 < tjoener> naah
2010-11-22 19:43:21 < tjoener> I'm too addicted to .NET already
2010-11-22 19:43:26 < tjoener> should learn something else
2010-11-22 19:43:34 < elenril> >.net
2010-11-22 19:43:39  * elenril runs away screaming
2010-11-22 19:43:52 < tjoener> elenril: It's frikkin nice
2010-11-22 19:44:28 < elenril> <a reply bashing microsoft>
2010-11-22 19:45:32 < tjoener> and what about Mono?
2010-11-22 19:45:50 < Kovensky> reply bashing novell?
2010-11-22 19:45:58 < tjoener> lol :D
2010-11-22 19:46:19 < tjoener> Kovensky: whats the most used common lisp interpreter/compiler/runtime?
2010-11-22 19:46:40 < tjoener> I dont know what's with all the people bashing .NET and Mono
2010-11-22 19:46:54 < komisar> .net -- sucks
2010-11-22 19:46:54 < tjoener> What did that language ever do to you? :)
2010-11-22 19:47:05 < elenril> didn't novell get bought by microsoft sometime today?
2010-11-22 19:49:36 < astrange> the most common free one is sbcl
2010-11-22 19:49:48 < tjoener> lets see
2010-11-22 19:51:37 < BugMaster> -> #x264 or anywhere else
2010-11-22 19:52:21 < BugMaster> or this is for google project?
2010-11-22 19:52:25 < tjoener> sorry BugMaster
2010-11-22 19:55:21 < rfw> well then, that's a basic unit testing framework done
2010-11-22 19:55:44 < astrange> unit or regression testing?
2010-11-22 19:55:49 < rfw> well
2010-11-22 19:55:56 < rfw> i'm starting with a unit testing framework
2010-11-22 19:56:01 < rfw> then adding regression testing to it
2010-11-22 19:56:11 < rfw> since regression testing has most aspects of unit testing
2010-11-22 19:56:44 < astrange> not necessarily, running the entire system and making sure it has the same outputs (when you want it to have the same outputs) is an effective regression test
2010-11-22 19:57:11 < rfw> well, i have classes that encapsulate results that compare values from revision to revision
2010-11-22 19:57:33 < rfw> the whole thing is horrendously extensible
2010-11-22 19:59:49 < holger_> rfw: we already have unit testing. for comparing asm implementations of functions to their c reference.
2010-11-22 20:00:16 < rfw> i know
2010-11-22 20:00:25 < rfw> but unit tests only have pass/fail results, right?
2010-11-22 20:06:29 < BugMaster> and now the funny/sad fact. after fixing weight-p bug results of gcc 4.4.5 build and gcc 4.5.2 builds still differ: http://privatepaste.com/b4380671ef
2010-11-22 20:06:29 < BugMaster> no artifacts and no encoder/decoder desync, but weight-p is used more and chroma psnr is higher in *gcc 4.5.2* build
2010-11-22 20:06:29 < BugMaster> so it seems like miscompilation in *gcc 4.4.5* which probably also prevent previous bug to occur in this version
2010-11-22 20:06:29 < BugMaster> Dark_Shikari: ^
2010-11-22 20:08:17 < BugMaster> or strange miscompilation in 4.5.2 which increase quality
2010-11-22 20:08:33 < nattofriends> ^ i like this idea
2010-11-22 20:09:44 < Kovensky> why
2010-11-22 20:09:52 < tjoener> just compile it all with -S and see what the asm does :)
2010-11-22 20:09:56 < Kovensky> it means that gcc provides better results while broken than when working properly
2010-11-22 20:09:57 < BugMaster> --asm and --no-asm version identical for the same build. so this time it is not asm
2010-11-22 20:11:44 < tjoener> and no-asm on both versions?
2010-11-22 20:12:01 < BugMaster> no-asm between version differ
2010-11-22 20:13:15 < tjoener> stupid gcc
2010-11-22 20:13:36 < tjoener> I dont understand why they cant make a program compile right under different MINOR versions
2010-11-22 20:13:55 < tjoener> Isnt that the whole point of a compiler?
2010-11-22 20:15:27 < dj_tjerk> someone test with icc ;)
2010-11-22 20:15:30 < BugMaster> hm. gcc 4.5.2 with -O0 same result as 4.4.5
2010-11-22 20:15:49 < JEEB> interesting
2010-11-22 20:16:03 < BugMaster> so now I more think about "or strange miscompilation in 4.5.2 which increase quality"
2010-11-22 20:16:30 < BugMaster> that mean we missed some way to improve weight-p :-D
2010-11-22 20:16:35 < JEEB> law
2010-11-22 20:16:37 < JEEB> l
2010-11-22 20:16:39 < Dark_Shikari> BugMaster: lolwhat
2010-11-22 20:16:57 < Dark_Shikari> rfw: we already have unit tests
2010-11-22 20:17:01 < Gramner> it means gcc has gained artificial intelligence and is able to improve the algorithms
2010-11-22 20:17:02 < Dark_Shikari> we don't need more of it
2010-11-22 20:17:06 < rfw> i know
2010-11-22 20:17:11 < rfw> but what i mean
2010-11-22 20:17:29 < rfw> is you have test cases that return results which can be compared from one revision to another
2010-11-22 20:18:06 < Dark_Shikari> explicit test cases are a bad idea, it should test permutations
2010-11-22 20:18:20 < rfw> what do you mean exactly
2010-11-22 20:18:41 < tjoener> BugMaster: what about the different tests like those in fprofiled?
2010-11-22 20:19:15 < BugMaster> "we missed some way to improve weight-p" <- probably use weight-p on chroma without weighted luma. because currently we don't analyse it, as I understand, if we don't have luma weight
2010-11-22 20:19:19 < tjoener> isnt that something that can narrow down what function gcc likes to change
2010-11-22 20:19:20 < tjoener> ?
2010-11-22 20:20:00 < Dark_Shikari> BugMaster: I tested that
2010-11-22 20:20:12 < Dark_Shikari> useless to run when luma isn't weighted
2010-11-22 20:20:14 < Dark_Shikari> and it saves a ton of time
2010-11-22 20:20:40 < Dark_Shikari> rfw: see my regression test script
2010-11-22 20:20:43 < Dark_Shikari> http://pastebin.com/sHkmCak5
2010-11-22 20:20:49 < Dark_Shikari> it works via permutations
2010-11-22 20:20:59 < Dark_Shikari> whenever I find a case that my script missed, I make it 2x slower, lol
2010-11-22 20:21:04 < Dark_Shikari> by adding another loop
2010-11-22 20:21:13 < BugMaster> dunno. but this is the only thing what I think can be miscompiled and gain quality
2010-11-22 20:21:23 < rfw> i wish i could read that lol
2010-11-22 20:21:36 < Dark_Shikari> rfw: didn't the regression test task specifically tell you to look at mine? =p
2010-11-22 20:21:37 < JEEB> it has lots of for loops
2010-11-22 20:21:44 < Dark_Shikari> anyways my script is 2 minutes of work and shit
2010-11-22 20:21:50 < Dark_Shikari> I'd rather randomly test than permutation-test
2010-11-22 20:21:52 < Dark_Shikari> permutation test is too slow
2010-11-22 20:22:12 < Dark_Shikari> "random" being rather pseudo, so it's replicable
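Replicable "random" testing usually just means a fixed seed. A small sketch, assuming each option is a simple on/off switch; the option names and the helper are placeholders, not taken from the script linked above.

```python
import random

def sample_option_sets(options, count, seed):
    """Return `count` pseudo-random on/off assignments; the same seed gives the same list."""
    rng = random.Random(seed)  # private generator, so other code can't disturb the sequence
    return [{name: rng.random() < 0.5 for name in options} for _ in range(count)]

flags = ["A", "B", "C", "D", "E"]  # placeholder option names
first = sample_option_sets(flags, 4, seed=264)
again = sample_option_sets(flags, 4, seed=264)
```

Logging the seed next to a failure is enough to rerun exactly the same option sets later.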
2010-11-22 20:22:25 < rfw> well
2010-11-22 20:22:33 < rfw> can't you just do permutations in a single test case
2010-11-22 20:22:35 < Amit> hi, i'd like to start working on the assembly task from the Google Code In contest, it says there that i should come here
2010-11-22 20:22:40 < Dark_Shikari> welcome!
2010-11-22 20:22:58 < Dark_Shikari> rfw: permutations define 2^N tests instead of N tests
2010-11-22 20:23:05 < Dark_Shikari> where N is the number of test cases you've written
2010-11-22 20:23:29 < Dark_Shikari> er, ok
2010-11-22 20:23:37 < Dark_Shikari> Amit: yes, you're in the right place
2010-11-22 20:23:42 < Amit> great
2010-11-22 20:23:50 < Amit> what do i need to do now?
2010-11-22 20:24:02 < rfw> i'm still kinda confused
2010-11-22 20:24:04 < rfw> lol
2010-11-22 20:24:11 < rfw> how does that work
2010-11-22 20:24:17 < Dark_Shikari> rfw: ok, suppose I have 5 options, each of which can be 0 or 1
2010-11-22 20:24:18 < rfw> how are the tests comparable to each other?
2010-11-22 20:24:26 < Dark_Shikari> that involves 32 tests.
2010-11-22 20:24:31 < Dark_Shikari> A0 B0 C0 D0 E0
2010-11-22 20:24:34 < Dark_Shikari> A0 B0 C0 D0 E1...
2010-11-22 20:24:36 < Dark_Shikari> etc
2010-11-22 20:24:53 < rfw> so you're comparing each test against another?
2010-11-22 20:24:55 < Dark_Shikari> Amit: ok, you'll need to check out a copy of x264
2010-11-22 20:24:56 < Dark_Shikari> rfw: no
2010-11-22 20:25:04 < Dark_Shikari> the regression test makes sure that JM decodes the file in the same way x264 thinks it should.
2010-11-22 20:25:14 < Dark_Shikari> This is the simplest and easiest form of regression test.
2010-11-22 20:25:17 < Dark_Shikari> It does not find all bugs.
2010-11-22 20:25:21 < Dark_Shikari> But it finds most really really bad bugs
2010-11-22 20:25:34 < Dark_Shikari> Amit: sorry, even my typing speed is limited here
2010-11-22 20:25:35 < rfw> so you compare one set of results
2010-11-22 20:25:38 < rfw> to another
2010-11-22 20:25:39 < dj_tjerk> --dump-yuv?
2010-11-22 20:25:42 < Dark_Shikari> rfw: --dump-yuv
2010-11-22 20:25:45 < Amit> no probs :P help him first
2010-11-22 20:25:50 < Dark_Shikari> it dumps x264's internal representation of the video
2010-11-22 20:25:55 < Dark_Shikari> with that, you can compare to what the decoder sees
2010-11-22 20:26:02 < Dark_Shikari> if it doesn't match bit-exact, it's wrong
2010-11-22 20:26:05 < rfw> ah
2010-11-22 20:26:19 < Dark_Shikari> This is exacerbated by the fact that the decoder is not always 100% trustworthy ;) ;)
2010-11-22 20:26:25 < Dark_Shikari> That is, JM has bugs.
2010-11-22 20:26:34 < Dark_Shikari> Amit: so, have you done asm before?
2010-11-22 20:26:34 < rfw> that seems more of a unit test though
2010-11-22 20:26:40 < Amit> yes
2010-11-22 20:26:52 < pengvado> the whole encoder isn't a "unit"
2010-11-22 20:27:05 < Dark_Shikari> unit test usually means test one function
2010-11-22 20:27:17 < Dark_Shikari> which usually means you've gone so batshit crazy on your OO that you should be thrown into a fire
2010-11-22 20:27:24 < Dark_Shikari> Amit: x86, altivec, or neon?
2010-11-22 20:27:29 < Amit> x86
2010-11-22 20:27:38 < Dark_Shikari> ok great, have you done mmx or sse?
2010-11-22 20:28:43 < Dark_Shikari> (not required, obviously you can just use your friendly neighborhood nasm manual and instruction table to look things up if you haven't)
2010-11-22 20:29:54 < Amit> i think that i haven't done any of them
2010-11-22 20:29:55 < pengvado> well, it would be nice if we could unit test everything. but asm is the only case where it's easy.
2010-11-22 20:30:04 < Dark_Shikari> yeah.
2010-11-22 20:30:06 < Amit> i studied the language but haven't yet worked on a real project
2010-11-22 20:30:32 < Dark_Shikari> you will have to do some learning, as x264 uses its own set of yasm macros to make asm coding easier
2010-11-22 20:30:37 < Dark_Shikari> i.e. to do things like popping, pushing, etc for you
2010-11-22 20:30:41 < Dark_Shikari> to track the stack
2010-11-22 20:30:52 < Dark_Shikari> but of course, those are intended to make it easier, not harder.
2010-11-22 20:31:04 < Amit> ok
2010-11-22 20:31:06 < Dark_Shikari> so your tasks in order:
2010-11-22 20:31:08 < Dark_Shikari> 1) check out x264
2010-11-22 20:31:10 < Dark_Shikari> 2) build x264
2010-11-22 20:31:14 < Dark_Shikari> 3) make checkasm
2010-11-22 20:31:17 < Dark_Shikari> 4) run checkasm
2010-11-22 20:31:27 < Dark_Shikari> you've now run the x264 asm unit tester, which automatically checks all enabled asm functions in x264 for errors.
2010-11-22 20:31:40 < Dark_Shikari> once you have that, you can go to pick a function to write.
2010-11-22 20:32:05 < tjoener> its that easy?
2010-11-22 20:32:16 < Dark_Shikari> No, the writing is harder ;)
2010-11-22 20:32:23 < Dark_Shikari> the number 1 rule of any of this: ask questions.  it is better to ask a dumb question than to get stuck in confusion.
2010-11-22 20:32:23 < tjoener> hmmm, maybe I'll look into the asm then :)
2010-11-22 20:32:36 < Dark_Shikari> also, there are no stupid questions, just stupid people.
2010-11-22 20:32:37 < tjoener> used some intrinsics, so I know my way around SSE abit
2010-11-22 20:32:52 < Dark_Shikari> btw, also, the asm tasks are unlimited
2010-11-22 20:33:00 < Dark_Shikari> i.e. if we run out on GCI interface we can add more later
2010-11-22 20:33:07 < Dark_Shikari> so don't worry if there aren't any left at any point.
2010-11-22 20:33:20 < Dark_Shikari> well, I mean, they are limited by how much is available to complete, but I'm pretty sure we won't run out of that.
2010-11-22 20:34:06 < Amit> where do i find which functions should be rewritten?
2010-11-22 20:34:19 < tjoener> maybe some MPSADBW ?
2010-11-22 20:34:46 < Dark_Shikari> Amit: well let's give an introduction to the situation first
2010-11-22 20:34:58 < Dark_Shikari> In normal bit depth mode, pixels are 8-bit values (uint8_t)
2010-11-22 20:35:04 < Dark_Shikari> and dct coefficients are 16-bit values (int16_t)
2010-11-22 20:35:19 < Dark_Shikari> for this case, nearly all possible x86 asm is written and reasonably well-optimized.
2010-11-22 20:35:26 < Dark_Shikari> High bit depth is new.
2010-11-22 20:35:31 < Dark_Shikari> In high bit depth, pixels are uint16_t.
2010-11-22 20:35:36 < Dark_Shikari> and dct coeffs are int32_t.
2010-11-22 20:35:43 < Dark_Shikari> There's less asm in this mode; many critical functions are missing.
2010-11-22 20:35:57 < Dark_Shikari> You can see this by running checkasm normally, then reconfiguring with --bit-depth=10, rebuilding checkasm, and running again.
2010-11-22 20:36:24 < Dark_Shikari> By the way, if you decide you can optimize an existing asm function -- and you succeed -- I'll give you a GCI task-equivalent for that too.
2010-11-22 20:37:36 < Amit> ok
2010-11-22 20:37:58 < Dark_Shikari> Now, of course, you're expected to ask questions as we go through this
2010-11-22 20:38:09 < Dark_Shikari> steps to adding a new asm function, or version of an asm function:
2010-11-22 20:38:28 < Dark_Shikari> 1) All C code equivalents for asm functions is in common/X.c, where X is the name of the function category
2010-11-22 20:38:37 < Dark_Shikari> so for example DCT code is in common/dct.c
2010-11-22 20:39:07 < Dark_Shikari> 2) You'll find functions in those files that load function pointers for all the asm functions which exist.  Add yours there under the appropriate place (MMX, SSE, etc)
2010-11-22 20:39:16 < Dark_Shikari> 3) Add your function declaration next to all the others in the appropriate .h file in common/x86/*
2010-11-22 20:39:23 < Dark_Shikari> 4) Find the appropriate asm file in common/x86/*.asm
2010-11-22 20:39:42 < Dark_Shikari> 5) Find the existing 8-bit code for the function, and figure out how it works, preferably by asking questions in addition to staring at it
2010-11-22 20:39:55 < Dark_Shikari> (while keeping the C code as a reference)
2010-11-22 20:40:15 < Dark_Shikari> 6) write your version of the high bit-depth version.
2010-11-22 20:40:28 < Dark_Shikari> So for example, one that's missing is dequant, in quant-a.asm, with quant.c for the C stuff.
2010-11-22 20:40:59 < Dark_Shikari> The C version of this function is about 10 lines and should be pretty easy to read.
2010-11-22 20:42:03 < Dark_Shikari> don't worry too much about optimization when first writing the function: I can make comments on it later.
2010-11-22 20:42:25 < Dark_Shikari> a good way to spot missing functions is to run ./checkasm --bench with and without --bit-depth=10 in configure.
2010-11-22 20:42:37 < Dark_Shikari> it does a benchmark of *all* asm functions and nicely lists them for you.
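Comparing the two benchmark listings can be automated. The sketch below assumes each benchmark line looks like `label: number`; that format, the function labels, and the sample text are assumptions for illustration, so check them against what your own checkasm build actually prints.

```python
def bench_names(text):
    """Collect the label before the colon on each 'label: number' line."""
    names = set()
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep and value.strip().lstrip("-").isdigit():
            names.add(label.strip())
    return names

def missing_in_second(first_run, second_run):
    """Labels benchmarked in the first run but absent from the second."""
    return sorted(bench_names(first_run) - bench_names(second_run))

# Made-up sample text, not real checkasm output.
normal = "dequant_4x4_c: 120\ndequant_4x4_mmx: 40\nsad_8x8_c: 90\n"
high_depth = "dequant_4x4_c: 150\nsad_8x8_c: 110\n"

print(missing_in_second(normal, high_depth))
```

In practice you would save the output of each run to a file and feed both files in; whatever shows up in the result is a candidate function to write.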
2010-11-22 20:45:56 < callahan> Dark_Shikari: Emergency mode works well so far.  You should commit it...
2010-11-22 20:46:18 < Dark_Shikari> I'll get to it, maybe this week or next
2010-11-22 20:46:22 < Dark_Shikari> I have a few other changes I have to make
2010-11-22 20:46:23 < Dark_Shikari> and I'm busy
2010-11-22 20:46:30 < nattofriends> emergency mode?
2010-11-22 20:46:40 < nattofriends> does that give x264 big red flashing lights?
2010-11-22 20:46:41 < callahan> Yeah, GCI craziness
2010-11-22 20:46:46 < Dark_Shikari> lol
2010-11-22 20:46:55 < Dark_Shikari> it makes x264 not die when you throw /dev/random at it
2010-11-22 20:47:54 < rfw> Dark_Shikari: can i ask more dumb questions about regression testing
2010-11-22 20:48:06 < rfw> since i still don't seem to be 100% clear about it
2010-11-22 20:48:08 < Dark_Shikari> Don't ask to ask
2010-11-22 20:48:12 < komisar> Dark_Shikari: what about auto-levels patch?
2010-11-22 20:48:13 < rfw> well
2010-11-22 20:48:13 < Dark_Shikari> just ask
2010-11-22 20:48:19 < rfw> i still don't get what you mean by permutations
2010-11-22 20:48:19 < Dark_Shikari> komisar: oh, that should be higher priority, I'll get it in next
2010-11-22 20:49:00 < Dark_Shikari> Suppose I have options A, B, and C.
2010-11-22 20:49:03 < rfw> oh, i get it now
2010-11-22 20:49:09 < Dark_Shikari> I can test with no options
2010-11-22 20:49:09 < rfw> replacing ; with ;\n helped a lot
2010-11-22 20:49:10 < Dark_Shikari> just A
2010-11-22 20:49:12 < Dark_Shikari> just B
2010-11-22 20:49:12 < Dark_Shikari> just C
2010-11-22 20:49:15 < Dark_Shikari> A and B
2010-11-22 20:49:17 < Dark_Shikari> B and C
2010-11-22 20:49:20 < Dark_Shikari> A, B and C
2010-11-22 20:49:20 < rfw> yeah i understand
2010-11-22 20:49:23 < Dark_Shikari> ok, so combinations, not permutations
2010-11-22 20:49:25 < Dark_Shikari> but whatever! ;)
2010-11-22 20:49:35 < rfw> cartesian product actually :)
2010-11-22 20:50:23 < Gramner> you missed "A and C"!
2010-11-22 20:50:30 < Dark_Shikari> ah yes.
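Enumerating the option sets mechanically avoids missing one. A minimal sketch using the standard library's `itertools.product`; the option names are just the letters from the example above.

```python
from itertools import product

def all_option_sets(options):
    """Yield every on/off assignment of the given options (2**N of them)."""
    for bits in product((False, True), repeat=len(options)):
        yield tuple(name for name, on in zip(options, bits) if on)

sets = list(all_option_sets(["A", "B", "C"]))
print(len(sets))
```

Three options give 8 sets, including the empty set and "A and C"; five options give the 32 tests mentioned earlier, which is why each extra option doubles the run time.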
2010-11-22 20:56:41 < tjoener> damnit
2010-11-22 20:56:47 < tjoener> terminal locked my alt keys :@
2010-11-22 20:58:12 < j-b> Dark_Shikari: http://www.google-melange.com/gci/task/show/google/gci2010/videolan/t128908283186 and http://www.google-melange.com/gci/task/show/google/gci2010/videolan/t128908435924
2010-11-22 20:58:38 < Dark_Shikari> I know
2010-11-22 20:58:41 < Dark_Shikari> they've already shown up here
2010-11-22 21:13:52 < j-b> Dark_Shikari: so you need to approve at some point, I guess
2010-11-22 21:14:40 < Dark_Shikari> oh I do?
2010-11-22 21:16:02 < Dark_Shikari> done
2010-11-22 21:45:54 < rfw> i think i'm taking too much time to time to make a test suite than to actually make the tests
2010-11-22 21:46:05 < rfw> and wow that sentence was badly formed
2010-11-22 21:47:16 < Dark_Shikari> rfw: the tests are easy
2010-11-22 21:47:19 < Dark_Shikari> the hard part is interpreting them
2010-11-22 21:47:26 < Dark_Shikari> that is, for example
2010-11-22 21:47:28 < rfw> ah
2010-11-22 21:47:33 < Dark_Shikari> really easy interpretation: compare yuvs, see if encoder matches decoder
2010-11-22 21:47:38 < Dark_Shikari> that's trivial: a binary decision
2010-11-22 21:47:39 < Dark_Shikari> 1 or 0
2010-11-22 21:47:42 < rfw> yeah
2010-11-22 21:47:43 < Dark_Shikari> is it right or not.
2010-11-22 21:47:52 < Dark_Shikari> But of course, we might also want to know what frame it's wrong on, if it failed.
2010-11-22 21:47:57 < Dark_Shikari> Or if the decoder failed to output all the frames.
2010-11-22 21:47:59 < Dark_Shikari> i.e. if it crashed, etc.
2010-11-22 21:48:07 < Dark_Shikari> And what macroblock it's wrong on (i.e. where in the frame)
2010-11-22 21:48:17 < Dark_Shikari> And then, we might want to investigate more complex things
2010-11-22 21:48:21 < Dark_Shikari> like for example, compare PSNR between two revisions
2010-11-22 21:48:27 < rfw> do you have examples testing for macroblocks
2010-11-22 21:48:37 < Dark_Shikari> well that's very easy, you just have to do the math
2010-11-22 21:48:40 < rfw> since i have no idea about how YUV data works
2010-11-22 21:48:46 < Dark_Shikari> each frame is laid out as follows:
2010-11-22 21:49:05 < Dark_Shikari> a 2D array of width WIDTH and height HEIGHT, containing Y
2010-11-22 21:49:12 < Dark_Shikari> a 2D array of width WIDTH/2 and height HEIGHT/2, containing U
2010-11-22 21:49:13 < Dark_Shikari> a 2D array of width WIDTH/2 and height HEIGHT/2, containing V
2010-11-22 21:49:17 < rfw> ah
2010-11-22 21:49:20 < Dark_Shikari> so the total size is 1.5 * WIDTH * HEIGHT
2010-11-22 21:49:26 < rfw> so i just spew out the part where it mismatches?
2010-11-22 21:49:28 < Dark_Shikari> so knowing the size of each frame, you can figure out from any offset in the file
2010-11-22 21:49:31 < Dark_Shikari> a) what frame it's in
2010-11-22 21:49:34 < Dark_Shikari> b) what macroblock it's in
2010-11-22 21:49:42 < rfw> how is a macroblock defined
2010-11-22 21:49:51 < Dark_Shikari> the frame is split into 16x16 blocks
2010-11-22 21:49:59 < Dark_Shikari> (well, 16x16 of luma, obviously each 16x16 of Y is 8x8 of U and V)
2010-11-22 21:50:04 < Dark_Shikari> since U and V are half size
2010-11-22 21:50:08 < Dark_Shikari> these are arranged in raster order
2010-11-22 21:50:11 < rfw> ah
2010-11-22 21:50:15 < Dark_Shikari> so for example, frame 57 might be bad at macroblock (5,17)
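
The layout described above is enough to map a byte offset back to a frame and macroblock. A rough Python sketch, assuming 8-bit planar 4:2:0 data with even width and height; `locate_offset` is a hypothetical name, and reporting the macroblock as (x, y) is an assumption about the convention:

```python
def locate_offset(offset, width, height):
    """Map a byte offset in a raw 8-bit YUV 4:2:0 file to
    (frame, plane, mb_x, mb_y)."""
    luma_size = width * height
    chroma_w, chroma_h = width // 2, height // 2
    chroma_size = chroma_w * chroma_h
    frame_size = luma_size + 2 * chroma_size  # 1.5 * width * height

    frame, pos = divmod(offset, frame_size)
    if pos < luma_size:
        # Luma: a macroblock covers 16x16 samples.
        plane, plane_w, block = "Y", width, 16
    elif pos < luma_size + chroma_size:
        # Chroma planes are half size, so a macroblock covers 8x8 samples.
        plane, plane_w, block = "U", chroma_w, 8
        pos -= luma_size
    else:
        plane, plane_w, block = "V", chroma_w, 8
        pos -= luma_size + chroma_size

    y, x = divmod(pos, plane_w)  # samples are stored in raster order
    return frame, plane, x // block, y // block
```

For a 352x288 clip, the first luma sample of macroblock (5,17) in frame 57 sits at offset `57 * 152064 + 17 * 16 * 352 + 5 * 16`.
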
2010-11-22 21:50:38 < rfw> i see, so
2010-11-22 21:50:44 < rfw> for a rudimentary comparison of two outputs
2010-11-22 21:50:47 < rfw> i could just checksum them
2010-11-22 21:50:51 < rfw> then if they mismatch
2010-11-22 21:50:57 < rfw> i compare frames, then macroblocks?
2010-11-22 21:52:01 < Dark_Shikari> no
2010-11-22 21:52:12 < Dark_Shikari> first you check to see if the file sizes are the same
2010-11-22 21:52:14 < Dark_Shikari> if not, report an error
2010-11-22 21:52:28 < Dark_Shikari> second, you find the first byte in the two files that differs
2010-11-22 21:52:33 < Dark_Shikari> then you calculate which frame and MB it belongs to.
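
The first two steps (size check, then first differing byte) might look like the Python sketch below. `first_difference` and its `chunk_size` parameter are hypothetical; reading in fixed-size chunks is a design choice made here so that memory use does not grow with the file size:

```python
import os

def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the offset of the first differing byte, or None if the
    files are identical. Raises ValueError if the sizes differ."""
    size_a, size_b = os.path.getsize(path_a), os.path.getsize(path_b)
    if size_a != size_b:
        raise ValueError(f"file sizes differ: {size_a} vs {size_b} bytes")

    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if not chunk_a:
                return None  # reached the end without a mismatch
            if chunk_a != chunk_b:
                # Only scan byte by byte inside the chunk that differs.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
            offset += len(chunk_a)
```

The returned offset can then be converted to a frame and macroblock using the frame size.
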
2010-11-22 21:52:58 < rfw> ah
2010-11-22 21:53:05 < rfw> this sounded so much easier before
2010-11-22 21:54:04 < Dark_Shikari> the more fancy part of the tool is wrapping around this comparison
2010-11-22 21:54:05 < Dark_Shikari> for example
2010-11-22 21:54:13 < Dark_Shikari> suppose I know the latest x264 is broken
2010-11-22 21:54:16 < Dark_Shikari> and I want to know what broke it
2010-11-22 21:54:28 < Dark_Shikari> your script could run git bisect using your comparison script
2010-11-22 21:54:35 < Dark_Shikari> so in effect, it's modular:
2010-11-22 21:54:37 < rfw> yeah
2010-11-22 21:54:47 < Dark_Shikari> 1) Comparison: comparing two revisions in some way (whether it be via yuv dump, psnr, etc)
2010-11-22 21:54:53 < rfw> but i'm decoupling the scm from the core so i can't really run git bisect
2010-11-22 21:54:58 < Dark_Shikari> 2) How we use those comparisons (checking multiple revisions, etc)
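
The "which revision broke it" layer can be sketched without tying it to git. This is not `git bisect` itself, only the same binary-search idea over an ordered list; `first_bad_revision` and the `is_good` callback are hypothetical names, and the sketch assumes the first revision is good, the last is bad, and a break stays broken:

```python
def first_bad_revision(revisions, is_good):
    """Binary-search an ordered list of revisions for the first bad one.

    `is_good(rev)` stands in for any comparison (yuv dump, PSNR, ...).
    """
    lo, hi = 0, len(revisions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]

# Example with made-up revision numbers: everything from 37 on is broken.
print(first_bad_revision(list(range(100)), lambda rev: rev < 37))
```

Because the comparison is passed in as a function, the same driver works for any of the comparison modes mentioned above.
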
2010-11-22 21:55:32 < rfw> well, what my idea is is to create a couple of fixtures which can be used to test every revision
2010-11-22 21:55:40 < rfw> which generates a dump of the results
2010-11-22 21:55:49 < rfw> then those can be compared with the tool from revision to revision
2010-11-22 21:56:39 < Dark_Shikari> well, keep in mind that whatever yours does, it must do more than my crappy script
2010-11-22 21:56:42 < Dark_Shikari> ;)
2010-11-22 21:56:44 < Dark_Shikari> i.e. enough to make it worthwhile
2010-11-22 21:56:49 < rfw> oh it already does more
2010-11-22 21:56:50 < rfw> lol
2010-11-22 21:56:52 < Dark_Shikari> it should also be self-contained and not require really fancy other libs
2010-11-22 21:56:55 < Dark_Shikari> if possible
2010-11-22 21:57:02 < Dark_Shikari> though that's not required if you get a big benefit from some other lib
2010-11-22 21:57:09 < rfw> well
2010-11-22 21:57:10 < Dark_Shikari> just don't cover it in boost or something
2010-11-22 21:57:14 < rfw> oh it's python
2010-11-22 21:57:15 < rfw> lol
2010-11-22 21:57:17 < Dark_Shikari> good
2010-11-22 21:57:17 < Dark_Shikari> lol
2010-11-22 21:57:25 < rfw> boost gives me nightmares
2010-11-22 21:57:32 < Dark_Shikari> you're not the only one
2010-11-22 21:57:48 < Dark_Shikari> when I worked at facebook, one of my coworkers attempted to use almost every single boost feature in every project
2010-11-22 21:58:17 < rfw> how many screens of template errors does that make
2010-11-22 21:58:34 < Dark_Shikari> I've seen a single *symbol* in a C++ program that was over 30 kilobytes
2010-11-22 21:59:25 < Jumpyshoes> hey Dark_Shikari, i'm taking a look at some functions, and some call macros (like TRANSPOSE4x4W), does the higher bit depth have an equivalent of these, or do i need to check on a case by case basis?
2010-11-22 21:59:29 < checkers> I thought facebook used PHP? Do they write some stuff in C++?
2010-11-22 21:59:39 < Dark_Shikari> flvtool++ is C++
2010-11-22 21:59:42 < Dark_Shikari> along with lots of other internal tools
2010-11-22 21:59:43 < djahandarie> Facebook uses all sorts of languages, but yeah mainly PHP
2010-11-22 21:59:53 < Dark_Shikari> Jumpyshoes: If there isn't one, you'll have to write it!
2010-11-22 22:00:05 < Dark_Shikari> A good start is to look at functions that use TRANSPOSE4x4W
2010-11-22 22:00:19 < Dark_Shikari> if there's a high bit depth version, the high bit depth version likely calls the high bit depth version of TRANSPOSE4x4W
2010-11-22 22:01:11 < Jumpyshoes> let's see
2010-11-22 22:01:14 < Dark_Shikari> since the normal and high bit depth do the same thing, just with different data types.
2010-11-22 22:01:26 < Jumpyshoes> TRANSPOSE4x4W calls other macros haha
2010-11-22 22:02:12 < Dark_Shikari> yup, SBUTTERFLY
2010-11-22 22:02:15 < Dark_Shikari> a transpose is a series of butterflies
2010-11-22 22:02:23 < Dark_Shikari> see "butterfly diagram" on wikipedia
2010-11-22 22:02:42 < Jumpyshoes> would i need to write that too?
2010-11-22 22:03:37 < Dark_Shikari> no, a butterfly is a generic operation
2010-11-22 22:03:44 < Dark_Shikari> not restrained to any data type
2010-11-22 22:03:49 < Dark_Shikari> that is, you tell it the data type to take
2010-11-22 22:03:51 < Dark_Shikari> it's a three line macro anyways!
2010-11-22 22:04:00 < Jumpyshoes> true
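
To see why a transpose is a series of butterflies, here is a small Python model. It treats a butterfly as an interleave of two rows (what the unpack-low/unpack-high instructions do); the function names are hypothetical, and the real macros' register ordering and swaps are not reproduced:

```python
def sbutterfly(a, b, unit):
    """Interleave `unit`-sized groups of rows a and b.
    Returns (low half, high half) of the interleaved result."""
    mixed = []
    for i in range(0, len(a), unit):
        mixed += a[i:i + unit] + b[i:i + unit]
    half = len(mixed) // 2
    return mixed[:half], mixed[half:]

def transpose4x4(rows):
    """Transpose a 4x4 matrix with two rounds of butterflies."""
    r0, r1, r2, r3 = rows
    # Round 1: interleave single elements of neighbouring rows.
    lo01, hi01 = sbutterfly(r0, r1, 1)
    lo23, hi23 = sbutterfly(r2, r3, 1)
    # Round 2: interleave pairs of elements.
    c0, c1 = sbutterfly(lo01, lo23, 2)
    c2, c3 = sbutterfly(hi01, hi23, 2)
    return [c0, c1, c2, c3]

matrix = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(transpose4x4(matrix))
```

Nothing in the model depends on what an element is, which matches the point that the butterfly is generic over the data type.
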
2010-11-22 22:04:36 < Jumpyshoes> TRANSPOSE4x4W <-- what does the W at the end signify?
2010-11-22 22:05:11 < Dark_Shikari> words
2010-11-22 22:05:13 < Dark_Shikari> 16-bit
2010-11-22 22:05:27 < Dark_Shikari> so if the low bit depth function uses 16-bit intermediate data
2010-11-22 22:05:35 < Dark_Shikari> and you decide to use 32-bit intermediates instead for high bit depth
2010-11-22 22:05:38 < Dark_Shikari> you'll need TRANSPOSE4x4D
2010-11-22 22:05:53 < Jumpyshoes> ah, i see
2010-11-22 22:07:33 < Dark_Shikari> so transpose doesn't have a native "bit depth"
2010-11-22 22:07:37 < Dark_Shikari> rather it has a size of element it operates on
2010-11-22 22:07:42 < Dark_Shikari> and that may or may not be the size you need.
2010-11-22 22:08:32 < Jumpyshoes> i see
2010-11-22 22:08:54 < Dark_Shikari> note that intermediate sizes aren't always higher in high bit depth
2010-11-22 22:09:02 < Dark_Shikari> for example, in satd, you can still get away with 16-bit intermediates with 10-bit input
2010-11-22 22:09:05 < Dark_Shikari> this was a huge boon for speed
2010-11-22 22:09:40 < Dark_Shikari> (satd is one of those that's done, since it's the most important asm function in x264)
2010-11-22 22:10:03 < Jumpyshoes> out of curiosity, why do the sse* asm functions seem to be longer and more complicated than the mmx ones?
2010-11-22 22:10:50 < Dark_Shikari> depends, which ones are you looking at?
2010-11-22 22:11:10 < Jumpyshoes> add4x4_idct
2010-11-22 22:11:17 < Dark_Shikari> that's a special case
2010-11-22 22:11:24 < Dark_Shikari> a 4x4 idct needs, at most, 64 bits worth of register
2010-11-22 22:11:28 < Dark_Shikari> i.e. 4 16-bit values
2010-11-22 22:11:31 < Dark_Shikari> because, well, it's 4 wide
2010-11-22 22:11:33 < Dark_Shikari> and the values are 16-bit
2010-11-22 22:11:35 < Dark_Shikari> so, naturally
2010-11-22 22:11:48 < Dark_Shikari> So, if you want to take advantage of SSE, you do multiple idcts at a time
2010-11-22 22:11:55 < Dark_Shikari> as is done in the larger ones
2010-11-22 22:11:57 < Dark_Shikari> But...
2010-11-22 22:12:09 < Dark_Shikari> in some places in x264, we can only do a 4x4 idct for technical reasons, i.e. we can only do one at a time
2010-11-22 22:12:27 < Dark_Shikari> So, for the fun of it and the challenge, holger wrote an SSE4 implementation that tries to take advantage of the extra register space
2010-11-22 22:12:30 < Dark_Shikari> through horrible horrible munging
2010-11-22 22:12:40 < Jumpyshoes> oh
2010-11-22 22:12:42 < Dark_Shikari> It's a decent bit faster.
2010-11-22 22:12:53 < Dark_Shikari> But it's far more complicated and harder to read because it's doing things in a completely nonintuitive, ugly way.
2010-11-22 22:12:57 < Dark_Shikari> In order to stuff more values into one register.
2010-11-22 22:13:22 < Jumpyshoes> yea, it sure looks harder to read
2010-11-22 22:13:51 < Dark_Shikari> remember width 4 SAD?
2010-11-22 22:13:53 < Dark_Shikari> with the punpckldq?
2010-11-22 22:14:01 < Dark_Shikari> that's an example of munging to take advantage of a register wider than your data.
2010-11-22 22:14:11 < Dark_Shikari> sse4 idct 4x4 is another one.
2010-11-22 22:14:41 < Jumpyshoes> aahh
2010-11-22 22:14:42 < Dark_Shikari> In general, in high bit depth, you'll almost never have this problem, since high bit depth has much larger data types.
2010-11-22 22:14:53 < Dark_Shikari> 4x4 idct, btw, I think has no high bit depth implementation.
2010-11-22 22:15:02 < Dark_Shikari> so that could be a good starting point
2010-11-22 22:15:03 < Jumpyshoes> yea, as far as i can tell, it doesn't
2010-11-22 22:15:07 < Jumpyshoes> which is why i was looking at it
2010-11-22 22:15:12 < Dark_Shikari> Yeah, that would be a good one
2010-11-22 22:15:22 < Dark_Shikari> Not too complex, pretty straightforward
2010-11-22 22:15:24 < Jumpyshoes> do you have a link that actually explains h264's DCT?
2010-11-22 22:15:31 < Dark_Shikari> As long as you stay very very far away from the sse4 one
2010-11-22 22:15:37 < Dark_Shikari> how much do you know about DCTs?
2010-11-22 22:15:46 < Jumpyshoes> i know JPEG's DCT
2010-11-22 22:16:04 < Jumpyshoes> performs a separable transformation which compacts energy
2010-11-22 22:16:09 < Jumpyshoes> to the top coefficients
2010-11-22 22:16:17 < Jumpyshoes> so you can effectively 0 out the lower ones
2010-11-22 22:16:19 < Jumpyshoes> w/ quantization
2010-11-22 22:16:35 < Dark_Shikari> JPEG's is just *THE* DCT
2010-11-22 22:16:40 < Dark_Shikari> that is what a dct is
2010-11-22 22:16:51 < Dark_Shikari> H.264's transform is often colloquially known as the HCT (H.264 Cosine Transform) as it isn't the DCT.
2010-11-22 22:16:56 < Dark_Shikari> It's a very very rough approximation.
2010-11-22 22:16:58 < Jumpyshoes> i heard h264's only used integer arithmetic?
2010-11-22 22:17:03 < Dark_Shikari> More than just that
2010-11-22 22:17:09 < Dark_Shikari> all real implementations of DCTs are integer-only
2010-11-22 22:17:22 < Dark_Shikari> and all modern video ones (VC-1, H.264, VP8, RV40, etc) are bit-exact, too.
2010-11-22 22:17:26 < Dark_Shikari> But H.264's is extra-special
2010-11-22 22:17:37 < Dark_Shikari> It requires nothing more than shifts and adds.
2010-11-22 22:17:41 < Dark_Shikari> and subtracts, I guess.
2010-11-22 22:17:51 < Dark_Shikari> The 4x4 dct in particular requires no shifts larger than 1.
2010-11-22 22:18:00 < Dark_Shikari> this makes it more like a Hadamard transform than a DCT.
2010-11-22 22:18:13 < Dark_Shikari> It's slightly less compression-efficient (~1%) than the ideal DCT, but way way way way way faster.
2010-11-22 22:18:21 < Dark_Shikari> Now, here's the catch with this
2010-11-22 22:18:51 < Dark_Shikari> if you analyse this DCT, you'll find out that it isn't properly normalized, i.e. the values outputted by the DCT and inputted to the iDCT are scaled differently per coefficient
2010-11-22 22:18:58 < Dark_Shikari> this is one of the consequences of the extremely simple implementation.
2010-11-22 22:19:09 < Dark_Shikari> So what they did... they rolled these numbers into the quantization scaling factors.
2010-11-22 22:19:16 < Dark_Shikari> Thus making multiplying by them cost nothing.
2010-11-22 22:19:23 < Dark_Shikari> Since you're already multiplying for quantization.
2010-11-22 22:19:40 < Dark_Shikari> so dct -> quant -> dequant -> idct
2010-11-22 22:19:48 < Dark_Shikari> "quant" adds in the scaling factors for the dct
2010-11-22 22:19:53 < Dark_Shikari> and "dequant" adds in the scaling factors for the idct.
2010-11-22 22:19:58 < Dark_Shikari> (in addition to doing what they normally do)
2010-11-22 22:20:01 < Jumpyshoes> huh, i see
2010-11-22 22:20:12 < Dark_Shikari> You don't have to know about this when implementing a dct or idct, but it explains how it works.
2010-11-22 22:20:41 < Jumpyshoes> well, i need to figure out what the C code does
2010-11-22 22:20:52 < Dark_Shikari> This isn't involved in figuring out what it does.
2010-11-22 22:20:59 < Jumpyshoes> true
2010-11-22 22:21:00 < Dark_Shikari> The C code just does a 1D transform in each direction.
2010-11-22 22:21:13 < Dark_Shikari> and the transform is rather obvious and simple, the challenge is if you want to figure out why it works.
2010-11-22 22:21:20 < Dark_Shikari> Which you don't have to, of course.
2010-11-22 22:21:26 < Dark_Shikari> since that's math and math is hard
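
A sketch of "a 1D transform in each direction" for the 4x4 inverse transform, using only adds, subtracts, and shifts by 1. This follows the usual H.264 inverse butterfly as far as I know it; the function names, the rows-then-columns order, and the list-of-lists layout are assumptions made for the sketch, so check the C code before relying on it for bit-exactness:

```python
def idct4_1d(d0, d1, d2, d3):
    """One 1D pass: only adds, subtracts and shifts by 1."""
    s02 = d0 + d2
    d02 = d0 - d2
    s13 = d1 + (d3 >> 1)
    d13 = (d1 >> 1) - d3
    return [s02 + s13, d02 + d13, d02 - d13, s02 - s13]

def idct4x4(coeffs):
    """Apply the 1D pass to every row, then to every column,
    then round and scale down by 64."""
    rows = [idct4_1d(*row) for row in coeffs]
    cols = [idct4_1d(*col) for col in zip(*rows)]
    # cols[j][i] holds the value for row i, column j.
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]

# A block with only a DC coefficient decodes to a flat residual.
dc_only = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(idct4x4(dc_only))
```

The result is a residual that gets added to the prediction, which is why the output still needs clipping afterwards.
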
2010-11-22 22:21:37 < Jumpyshoes> what does x264_clip_pixel do?
2010-11-22 22:22:05 < Dark_Shikari> It clips a pixel to within the valid range.
2010-11-22 22:22:10 < Dark_Shikari> 0-255 for 8-bit
2010-11-22 22:22:13 < Dark_Shikari> 0-511 for 9-bit
2010-11-22 22:22:13 < Dark_Shikari> etc etc
2010-11-22 22:22:28 < Jumpyshoes> ah
2010-11-22 22:22:30 < Dark_Shikari> This is because, after an idct, pixel values can go outside the valid range.
2010-11-22 22:22:33 < Dark_Shikari> the idct is required to clamp its output.
2010-11-22 22:22:41 < Dark_Shikari> to avoid overflow
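
The clipping described above, as a minimal Python sketch. `clip_pixel` here is a stand-in written for illustration, with a `bit_depth` parameter that is an assumption of the sketch rather than the signature of x264's function:

```python
def clip_pixel(value, bit_depth=8):
    """Clamp a reconstructed value to the valid pixel range."""
    max_value = (1 << bit_depth) - 1  # 255 for 8-bit, 511 for 9-bit, ...
    return min(max(value, 0), max_value)

print(clip_pixel(300), clip_pixel(-5), clip_pixel(300, bit_depth=9))
```
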
2010-11-22 22:23:13 < Jumpyshoes> i see
2010-11-22 22:24:50 < Dark_Shikari> this is trickier for high bit depth
2010-11-22 22:24:57 < Jumpyshoes> does IDCT4_1D depend on the input size?
2010-11-22 22:25:01 < Dark_Shikari> if our pixels are 8-bit, we can just do packuswb
2010-11-22 22:25:04 < Dark_Shikari> which automatically saturates
2010-11-22 22:25:17 < Dark_Shikari> but for larger values we can't.
2010-11-22 22:25:22 < Dark_Shikari> Jumpyshoes: yes, go look at it
2010-11-22 22:25:26 < Dark_Shikari> it probably uses word math (paddw, etc)
2010-11-22 22:26:02 < Jumpyshoes> it calls a bunch of SUMSUBD2_AB and stuff
2010-11-22 22:26:31 < Dark_Shikari> go look at what those do
2010-11-22 22:26:35 < Dark_Shikari> they're pretty trivial
2010-11-22 22:26:40 < Dark_Shikari> a SUMSUB calculates A+B and A-B
2010-11-22 22:26:42 < Dark_Shikari> for inputs A and B
2010-11-22 22:26:44 < Dark_Shikari> (surprise!)
2010-11-22 22:27:40 < Jumpyshoes> ah, so it does depend on the bit depth
2010-11-22 22:28:19 < Dark_Shikari> You could modify SUMSUB to take an argument for the data element size
2010-11-22 22:28:24 < Dark_Shikari> e.g. "w" for 16-bit, "d" for 32-bit
2010-11-22 22:28:30 < Dark_Shikari> then go around and find all existing calls to SUMSUB and add that
2010-11-22 22:28:33 < Dark_Shikari> make sure it still works
2010-11-22 22:28:39 < Dark_Shikari> then write your own passing "d" instead of "w"
2010-11-22 22:28:45 < Dark_Shikari> so for example
2010-11-22 22:28:57 < Dark_Shikari> SUMSUB mm0, mm1 would become SUMSUB w, mm0, mm1
2010-11-22 22:29:00 < Dark_Shikari> or similar
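
A Python model of what the extra size argument changes. The names `sumsub`, `wrap_signed`, and `ELEMENT_BITS` are hypothetical, and lists stand in for SIMD registers; the only point is that the same A+B / A-B operation wraps at 16 bits for "w" and at 32 bits for "d":

```python
ELEMENT_BITS = {"w": 16, "d": 32}

def wrap_signed(value, bits):
    """Wrap an integer to a signed two's-complement value of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def sumsub(size, a, b):
    """Return (a + b, a - b) element-wise, wrapping at the element size."""
    bits = ELEMENT_BITS[size]
    sums = [wrap_signed(x + y, bits) for x, y in zip(a, b)]
    diffs = [wrap_signed(x - y, bits) for x, y in zip(a, b)]
    return sums, diffs

print(sumsub("w", [30000], [10000]))  # the sum overflows 16 bits
print(sumsub("d", [30000], [10000]))  # the sum fits in 32 bits
```

This is why the choice of intermediate size matters when moving a function to high bit depth.
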
2010-11-22 22:29:13 < Jumpyshoes> wouldn't i need to replace it in every SUMSUB then?
2010-11-22 22:30:14 < Dark_Shikari> yes, you'd have to add the "w" argument to every SUMSUB
2010-11-22 22:30:19 < Dark_Shikari> There aren't that many calls though
2010-11-22 22:30:27 < Dark_Shikari> most SUMSUB calls are in util
2010-11-22 22:30:30 < Dark_Shikari> i.e. as part of other things
2010-11-22 22:30:31 < Dark_Shikari> like transforms
2010-11-22 22:30:40 < Jumpyshoes> ah
2010-11-22 22:30:44 < Dark_Shikari> i.e. few functions call sumsub directly, they call it through macros
2010-11-22 22:30:48 < Dark_Shikari> like IDCT4_1D
2010-11-22 22:31:20 < Jumpyshoes> okay, time to try this out
2010-11-22 22:33:00 < Jumpyshoes> what is passed if i have "w" as an argument?
2010-11-22 22:33:11 < Jumpyshoes> is it just "w" in %0?
2010-11-22 22:33:20 < Jumpyshoes> er, %1
2010-11-22 22:33:43 < Dark_Shikari> yes
2010-11-22 22:33:44 < Dark_Shikari> so for example
2010-11-22 22:33:46 < Dark_Shikari> paddw becomes padd%1
2010-11-22 22:33:50 < Dark_Shikari> cool, eh?
2010-11-22 22:33:54 < Jumpyshoes> indeed
2010-11-22 22:34:02 < Dark_Shikari> also, on an unrelated note, %0 is the number of macro parameters.
2010-11-22 22:34:08 < Dark_Shikari> This is useful when making variadic macros.
2010-11-22 22:34:49 < Jumpyshoes> yasm is pretty handy
2010-11-22 22:35:30 < Dark_Shikari> yasm's preprocess is basically what C's should have been.
2010-11-22 22:35:33 < Dark_Shikari> *preprocessor
2010-11-22 22:37:15 < Jumpyshoes> what does mova do?
2010-11-22 22:37:36 < Dark_Shikari> as mentioned yesterday, it's part of the templating system
2010-11-22 22:37:39 < Dark_Shikari> if INIT_MMX is used, it's movq
2010-11-22 22:37:43 < Dark_Shikari> if INIT_XMM is used, it'a movdqa
2010-11-22 22:37:44 < Jumpyshoes> er, right
2010-11-22 22:37:44 < Dark_Shikari> *it's
2010-11-22 22:37:49 < Dark_Shikari> in short, "move a whole register, aligned"
2010-11-22 22:37:59 < Dark_Shikari> in that macro it's just a register-to-register move.
2010-11-22 22:38:02 < Dark_Shikari> iirc
2010-11-22 22:44:55 < Sean_McG> 10:02 < Dark_Shikari> Sean_McG: ^ <-- what was this all about, it's scrolled off
2010-11-22 22:45:41 < Dark_Shikari> pengvado had a comment for you
2010-11-22 22:45:43 < Dark_Shikari> grab the log and read it
2010-11-22 22:45:46 < Dark_Shikari> it's why I didn't commit your fix
2010-11-22 22:47:44 < Sean_McG> oh regarding Mans note about 'strings'? Yeah, I understand... not sure how to fix it anymore though.
2010-11-22 22:48:51 < Sean_McG> not discarding the whole commit though, are we? can we just roll back the change to configure?
2010-11-22 22:49:38 < Sean_McG> back later -- meeting a co-worker for dinner
2010-11-22 23:09:09 < Dark_Shikari> Sean_McG: I just discarded it for now
2010-11-22 23:09:12 < Dark_Shikari> it'll be in the next commit spree if you fix it
2010-11-22 23:16:42 < rfw> i should probably make my output easier to read
2010-11-22 23:17:36 < rfw> Dark_Shikari: http://pastebin.com/g58FejEe is this painful in any way to read
2010-11-22 23:18:33 < Dark_Shikari> It probably shouldn't use 4chan terminology.
2010-11-22 23:18:47 < rfw> oh those are just my half-assed commit messages
2010-11-22 23:18:48 < Dark_Shikari> Also most of that should be omitted in non-verbose mode
2010-11-22 23:18:58 < rfw> from my testing repo
2010-11-22 23:19:00 < rfw> don't mind that lol
2010-11-22 23:19:01 < Dark_Shikari> in non-verbose mode, it should tell me just what I want to know
2010-11-22 23:19:06 < Dark_Shikari> i.e. "X is broken, Y works"
2010-11-22 23:19:11 < Dark_Shikari> not all the things it did to discover that.
2010-11-22 23:19:14 < Dark_Shikari> in verbose mode it can tell me that
2010-11-22 23:58:13 < kemuri-_9> what is the purpose of that large configure change for the 1/0 vs 1/undef in the preprocessor vars?
--- Day changed Tue Nov 23 2010
2010-11-23 00:00:58 < holger_> <Dark_Shikari> But it's far more complicated and harder to read because it's doing things in a completely nonintuitive, ugly way. <-- mostly due to the fact that this actually was an exercise in optimizing for atom. you only took the sse4 version, so it's not entirely obvious anymore ;)
2010-11-23 00:02:04 < Kovensky> kemuri-_9: being able to just use #if instead of having to use #ifdef
2010-11-23 00:02:28 < kemuri-_9> the preprocessor defaults undef to 0 though
2010-11-23 00:02:41 < Kovensky> and complains about it
2010-11-23 00:02:47 < kemuri-_9> what compiler?
2010-11-23 00:03:28 < Kovensky> gcc
2010-11-23 00:04:02 < pengvado> #if works on undef, but any other forms of conditionals don't
2010-11-23 00:04:27 < pengvado> ?:, &&, if(), ...
2010-11-23 00:04:32 < kemuri-_9> do we use any other conditionals on those values?
2010-11-23 00:04:49 < pengvado> not yet
2010-11-23 00:05:41 < kemuri-_9> it would probably be easier to check at the end of the ./configure to see which values are missing from the config.h than if (x=yes) define x 1 else define x 0
2010-11-23 00:05:50 < kemuri-_9> which is being splattered everywhere
2010-11-23 00:06:44 < Kovensky> that's... awfully verbose
2010-11-23 00:06:55 < Kovensky> just define x $(x=yes)
2010-11-23 00:08:13 < pengvado> Kovensky: what language is that? not bash
2010-11-23 00:08:47 < kemuri-_9> define x [ $x = yes ]  <--- would this not be what he was trying to do?
2010-11-23 00:08:50 < Kovensky> it isn't, I'm just copying his pseudocode
2010-11-23 00:09:10 < Kovensky> define x $([ $x = yes ] && echo 1 || echo 0)
2010-11-23 00:09:17 < Kovensky> er
2010-11-23 00:09:32 < Kovensky> hm nvm, that's correct
2010-11-23 00:09:36 < Kovensky> $? would give wrong results
2010-11-23 00:11:22 < kemuri-_9> and why all the changes to the makefile/config.mak? changing them to findstring x 1  from findstring x  works doesn't it?
2010-11-23 00:11:30 < rfw> time for lunch then more regression testing
2010-11-23 00:27:06 < Sean_McG> DS: both my submissions?
2010-11-23 00:27:55 < rfw> there, that's the testing framework done
2010-11-23 00:28:02 < rfw> what regression tests am i required to implement?
2010-11-23 00:29:53 < Sean_McG> DS: my 2nd submission regarding the VIS asm is still OK, yes?
2010-11-23 00:32:56 < Sean_McG> DS: for the 'configure' change... is it acceptable if we just extend the length of the array and manually terminate the string?
2010-11-23 00:33:14 < Sean_McG> DS: and no one has been able to provide me with a platform to test where that breaks either.
2010-11-23 00:40:23 < Dark_Shikari> Sean_McG: I don't care about configure.
2010-11-23 00:40:27 < Dark_Shikari> All that matters is that pengvado is satisfied
2010-11-23 00:40:36 < jarod> and the wife
2010-11-23 00:40:37 < Dark_Shikari> rfw: ok, the most basic is the two dump yuv tests
2010-11-23 00:40:38 < Sean_McG> OK... I'll try again
2010-11-23 00:40:41 < Dark_Shikari> compare vs JM and ffmpeg
2010-11-23 00:40:51 < Dark_Shikari> ffmpeg has some known decoding bugs, so JM is more important
2010-11-23 00:40:54 < Dark_Shikari> but they work in the same way
2010-11-23 00:41:00 < Dark_Shikari> for JM, just run ldecod -i inputfile
2010-11-23 00:41:00 < rfw> ah so
2010-11-23 00:41:02 < Dark_Shikari> it'll output to test_dec.yuv
2010-11-23 00:41:07 < rfw> i get JM and ffmpeg
2010-11-23 00:41:07 < Sean_McG> also, does r1789 fix the same artifacting issue BugMaster was talking about yesterday, with gcc 4.5.x?
2010-11-23 00:41:14 < rfw> and get x264 to dump a yuv
2010-11-23 00:41:15 < Dark_Shikari> for ffmpeg, do ffmpeg -i input -pix_fmt yuv420p -f rawvideo output.yuv
2010-11-23 00:41:20 < Dark_Shikari> Sean_McG: yes
2010-11-23 00:41:22 < rfw> yeah i know how to use ffmpeg
2010-11-23 00:41:23 < Sean_McG> OK cool
2010-11-23 00:41:29  * Sean_McG rebuilds
2010-11-23 00:43:20 < rfw> time to get my msbuild ou
2010-11-23 00:43:20 < rfw> t
2010-11-23 00:44:01 < Dark_Shikari> msbuild?
2010-11-23 00:45:46 < rfw> yeah, to build the JM solution
2010-11-23 00:45:53 < Sean_McG> DS: can I revert my patch in my local tree without mucking with the commits I did afterwards? (I admit I'm so used to cvs and svn that I still don't fully grok how git workflows are supposed to be done)
2010-11-23 00:46:04 < Dark_Shikari> "revert"?
2010-11-23 00:46:10 < rfw> c:\users\tony\downloads\jm17.2\jm\lcommon\inc\win32.h(39): fatal error C1083: Cannot open include file: 'omp.h': No such file or directory [C:\Users\Tony\Downloads\jm17.2\JM\jm_vc9.sln]
2010-11-23 00:46:10 < rfw>  118 Error(s)
2010-11-23 00:46:15 < rfw> this isn't going very well
2010-11-23 00:46:20 < Sean_McG> DS: pull it out so I can edit it and recommit later
2010-11-23 00:46:27 < Dark_Shikari> git rebase -i
2010-11-23 00:46:36 < Dark_Shikari> rfw: compile in cygwin/mingw
2010-11-23 00:46:38 < Sean_McG> what does that do, cherry pick?
2010-11-23 00:46:44 < Dark_Shikari> it lets you edit any past commit
2010-11-23 00:46:50 < Dark_Shikari> just replace the word "pick" with the word "edit"
2010-11-23 00:46:53 < Dark_Shikari> for the commit you want
2010-11-23 00:47:01 < Dark_Shikari> and then when you're done, amend it and git rebase --continue
2010-11-23 00:47:24 < kemuri-_9> http://kemuri9.net/dev/misc/jm/jm17.2.rar  <--- i built this a while back for someone else in msvc, can use it if you want
2010-11-23 00:47:35 < Dark_Shikari> I just use cygwin, works fine here :>
2010-11-23 00:47:42 < rfw> but i just finished building with cygwin :<
2010-11-23 00:48:04 < kemuri-_9> pengvado: what do you think of http://pastebin.com/ZAb3crs1 ?
2010-11-23 00:48:09 < Dark_Shikari> that doesn't look like an error from cygwin
2010-11-23 00:48:26 < rfw> no i mean
2010-11-23 00:48:32 < rfw> i just built it with cygwin
2010-11-23 00:48:36 < rfw> (successfully)
2010-11-23 00:48:46 < rfw> that was from vs2010 before
2010-11-23 00:50:52 < Dark_Shikari> well that's what you get for trying to use vs2010
2010-11-23 00:51:09 < rfw> :<
2010-11-23 00:51:11 < kemuri-_9> lol
2010-11-23 00:51:52 < kemuri-_9> speaking of which i should reinstall vs2008 and icl so i can mess with that crap again
2010-11-23 00:56:07 < Jumpyshoes> Dark_Shikari: so i fell asleep, but would this work for SUMSUB_BA: http://pastebin.com/6Fms1SAB ?
2010-11-23 00:57:09 < Dark_Shikari> no
2010-11-23 00:57:44 < rfw> Dark_Shikari: how do i make x264 output raw YUV again?
2010-11-23 00:57:59 < Dark_Shikari> Jumpyshoes: you probably want something like http://pastebin.com/xewaKQPt
2010-11-23 00:58:05 < checkers> -o file.yuv
2010-11-23 00:58:05 < Dark_Shikari> rfw: --dump-yuv file.yuv
2010-11-23 00:58:09 < Jumpyshoes> oh
2010-11-23 00:58:50 < rfw> ah okay
2010-11-23 00:58:51 < Jumpyshoes> thanks
2010-11-23 00:59:53 < rfw> Dark_Shikari: do you have any specific file or anything to test it on
2010-11-23 00:59:58 < rfw> or do i just grab one of my videos
2010-11-23 01:00:48 < Dark_Shikari> rfw: http://media.xiph.org/video/derf/ pick one
2010-11-23 01:00:59 < Dark_Shikari> if you want to test breaking x264 to see if your tool detects it
2010-11-23 01:01:13 < Dark_Shikari> in common/mc.c, find pixel_avg_wxh
2010-11-23 01:01:14 < Dark_Shikari> remove the +1
2010-11-23 01:01:23 < Dark_Shikari> then recompile and run x264 with --no-asm
2010-11-23 01:01:29 < Dark_Shikari> it will fail all tests on the first b-frame
2010-11-23 01:01:34 < rfw> ah
2010-11-23 01:02:34 < Dark_Shikari> I chose that function because I and P-frames will still work
2010-11-23 01:02:38 < Dark_Shikari> so it doesn't completely break things
2010-11-23 01:02:41 < Dark_Shikari> allowing more meaningful testing
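
Assuming the "+1" in question is the rounding term of a two-sample average, this Python sketch shows how small the deliberate break is. Both function names are hypothetical and only model a single pixel pair:

```python
def pixel_avg(a, b):
    """Average of two samples, rounding halves up."""
    return (a + b + 1) >> 1

def pixel_avg_broken(a, b):
    """The same average with the +1 removed: halves round down."""
    return (a + b) >> 1

# The two versions differ only when a + b is odd.
for a, b in [(1, 2), (2, 4), (100, 101)]:
    print(a, b, pixel_avg(a, b), pixel_avg_broken(a, b))
```

A one-off difference like this is invisible to the eye but still makes the encoder's output disagree with a decoder's, which is what a byte-exact yuv comparison catches.
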
2010-11-23 01:08:21 < Sean_McG> alright, I guess I need someone who gives a rip about this endian test... doesn't mean anything to Windows people. Does pengvado use Windows or Linux or..?
2010-11-23 01:08:52 < Sean_McG> there *is* an autoconf macro: AC_C_BIGENDIAN... should I just adapt that?
2010-11-23 01:09:39 < rfw> Fixture result: PASS, took 6.9245 seconds.
2010-11-23 01:09:39 < rfw> Project result: PASS, took 6.9245 seconds.
2010-11-23 01:09:40 < rfw> \o/
2010-11-23 01:09:52 < Dark_Shikari> \o/
2010-11-23 01:10:57 < rfw> Fixture result: FAIL.
2010-11-23 01:10:58 < rfw> Project result: FAIL.
2010-11-23 01:11:00 < rfw> it can fail too \o/
2010-11-23 01:11:11 < Dark_Shikari> Now we just have to make that more descriptive.
2010-11-23 01:11:18 < rfw> yup
2010-11-23 01:11:35 < Jumpyshoes> Dark_Shikari http://pastebin.com/T0jbAwwv would this work for the rest?
2010-11-23 01:11:39  * Dark_Shikari very much likes finding high school students who can python better than he can
2010-11-23 01:11:48 < Dark_Shikari> also python is now a verb
2010-11-23 01:11:54 < rfw> http://pastebin.com/GF1czXrr
2010-11-23 01:12:02 < rfw> this probably would be much better suited for python projects, lol
2010-11-23 01:12:15 < Dark_Shikari> Jumpyshoes: lgtm, you now have to modify the rest of x264 asm to call these correctly
2010-11-23 01:12:18 < Dark_Shikari> i.e. grep for them, add ws
2010-11-23 01:12:27 < Dark_Shikari> once you're done, compile in bit-depth 8 mode, and checkasm
2010-11-23 01:12:31 < Dark_Shikari> and fix until it works
2010-11-23 01:12:38 < Jumpyshoes> ws?
2010-11-23 01:13:14 < Dark_Shikari> plural w
2010-11-23 01:13:25 < Jumpyshoes> oh right
2010-11-23 01:13:57 < Jumpyshoes> there's like 3 pages of SUMSUBs, is there any way to replace them in all files?
2010-11-23 01:14:06 < Dark_Shikari> find/replace
2010-11-23 01:14:13 < Dark_Shikari> SUMSUB -> SUMSUB w,
2010-11-23 01:14:22 < Jumpyshoes> can i ignore spacing for now?
2010-11-23 01:14:29 < Dark_Shikari> find/replace should be able to handle spacing.
2010-11-23 01:14:33 < Dark_Shikari> Anyways, it's just 15 minutes of work =p
2010-11-23 01:14:34 < Jumpyshoes> true
2010-11-23 01:14:42 < Jumpyshoes> <-- lazy butt
2010-11-23 01:15:08 < Dark_Shikari> welcome to the club
2010-11-23 01:15:20 < kierank> [01:15] Jumpyshoes: <-- lazy butt --> you'll fit right in
2010-11-23 01:15:34 < Dark_Shikari> pretty much
2010-11-23 01:15:47 < Dark_Shikari> like the other day when I spent 2 hours to finish chroma weightp.
2010-11-23 01:15:47 < Jumpyshoes> oh excellent
2010-11-23 01:15:49 < Dark_Shikari> wait -- you say -- that's work!
2010-11-23 01:15:57 < Dark_Shikari> But it took one full year for me to get around to doing it
2010-11-23 01:16:03 < Jumpyshoes> LOL
2010-11-23 01:16:08 < Jumpyshoes> you sound exactly like one of my friends
2010-11-23 01:16:19 < Dark_Shikari> and I only did it on a dare from boiled_sugar
2010-11-23 01:17:07 < kemuri-_9> so dares are useful motivation?
2010-11-23 01:17:25 < Kovensky> 22:16.19 Dark_Shikari: and I only did it on a dare from boiled_sugar <-- lol
2010-11-23 01:17:26 < Kovensky> what was it
2010-11-23 01:18:03 < Dark_Shikari> he kept yelling at me to do chroma weightp
2010-11-23 01:18:07 < Dark_Shikari> so I said I'd do it that weekend
2010-11-23 01:18:08 < Dark_Shikari> so I did
2010-11-23 01:18:40 < Jumpyshoes> haha, 97 of the SUMSUBs are arm
2010-11-23 01:18:47 < Jumpyshoes> that's half my work done!
2010-11-23 01:18:49 < Dark_Shikari> saves you some time eh!
2010-11-23 01:18:50 < Dark_Shikari> lol
2010-11-23 01:25:51 < rfw> Running test yuv_compare... failed, due to x264-output.yuv is not the same as jm-output.yuv (69076e2a5a9a76d5b483abe953b4a6ce vs 19becfba7cf53d773756095d8a355938)
2010-11-23 01:26:04 < rfw> there's some more verbosity
2010-11-23 01:26:06 < Dark_Shikari> don't need to hash imo
2010-11-23 01:26:08 < Dark_Shikari> waste of time
2010-11-23 01:26:16 < rfw> yeah but i haven't implemented all the macroblocking
2010-11-23 01:26:17 < rfw> and stuff
2010-11-23 01:26:24 < rfw> so i'm just doing that for now
2010-11-23 01:27:14 < Dark_Shikari> you could just give a file offset
2010-11-23 01:27:17 < Dark_Shikari> like diff does
2010-11-23 01:27:27 < rfw> ah
2010-11-23 01:27:34 < Dark_Shikari> btw, you should test your thing on files larger than ram
2010-11-23 01:27:38 < Dark_Shikari> i.e. you should not rely on O(N) ram usage
2010-11-23 01:27:44 < Dark_Shikari> because yuv files get really fucking huge.
2010-11-23 01:28:44 < rfw> yeah i know
2010-11-23 01:28:55 < rfw> i'm reading it into a buffer and then puking it out again
2010-11-23 01:29:13 < Dark_Shikari> yeah, don't do that
2010-11-23 01:29:24 < rfw> what?
2010-11-23 01:29:30 < Dark_Shikari> I mean, don't read the whole file into a buffer
2010-11-23 01:29:35 < rfw> no i'm not doing that
2010-11-23 01:29:35 < rfw> lol
2010-11-23 01:29:36 < Dark_Shikari> oh ok
2010-11-23 01:29:37 < Dark_Shikari> lol
2010-11-23 01:29:41 < rfw> i'm reading 8k bytes at a time
2010-11-23 01:29:46 < Dark_Shikari> good, so you're not like the author of flvtool2
2010-11-23 01:29:49 < rfw> lol
2010-11-23 01:29:58 < Dark_Shikari> aka "ruby programmers"
2010-11-23 01:30:10 < rfw> i could never get into ruby
2010-11-23 01:30:11 < rfw> lol
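The two suggestions above (report a file offset like diff does, and never hold a whole YUV file in memory) fit together in one small routine. The following is a minimal sketch, not rfw's actual code; the function name `first_difference` and the 8192-byte chunk size are assumptions chosen to match the "8k bytes at a time" remark.

```python
import os
import tempfile

def first_difference(path_a, path_b, chunk_size=8192):
    """Return the byte offset of the first difference, or None if identical.

    Only one chunk per file is held in memory at a time, so memory use
    does not grow with file size.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # One file is a prefix of the other: they differ where
                # the shorter one ends.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended with no difference
            offset += len(a)

# Demo on two small temporary files that differ at byte 10000.
with tempfile.TemporaryDirectory() as tmp:
    a_path = os.path.join(tmp, "a.yuv")
    b_path = os.path.join(tmp, "b.yuv")
    data = bytes(20000)
    with open(a_path, "wb") as f:
        f.write(data)
    with open(b_path, "wb") as f:
        f.write(data[:10000] + b"\x01" + data[10001:])
    print(first_difference(a_path, b_path))  # 10000
```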
2010-11-23 01:30:54 < Dark_Shikari> btw, remember that your script will have to pass through pengvado
2010-11-23 01:30:59 < Dark_Shikari> think of him as gandalf, and you as the balrog
2010-11-23 01:31:21 < rfw> it's not so much a script as a huge python library
2010-11-23 01:31:36 < kemuri-_9> lol @ that analogy
2010-11-23 01:31:44 < rfw> wait a minute, doesn't that mean i lose
2010-11-23 01:33:22 < Dark_Shikari> lol
2010-11-23 01:33:30 < Dark_Shikari> it's a great analogy for professors
2010-11-23 01:33:33 < Dark_Shikari> YOU SHALL NOT PASS etc
2010-11-23 01:33:46 < Dark_Shikari> "huge python library"?
2010-11-23 01:34:00 < rfw> well
2010-11-23 01:34:12 < rfw> http://pastebin.com/GF1czXrr
2010-11-23 01:34:35 < Dark_Shikari> what's digress?
2010-11-23 01:34:43 < rfw> what i called it
2010-11-23 01:34:49 < Dark_Shikari> Oh, and where will we get that?
2010-11-23 01:34:59 < rfw> i haven't finished writing it you know
2010-11-23 01:34:59 < rfw> :3
2010-11-23 01:35:01 < Dark_Shikari> will it be included in the source tree or just a lib the regtest tool requires?
2010-11-23 01:35:04 < Dark_Shikari> lol
2010-11-23 01:35:07 < Dark_Shikari> hahaha
2010-11-23 01:35:14 < Dark_Shikari> works for me as long as it works =p
2010-11-23 01:36:55 < Dark_Shikari> the main issue being that since your tool isn't part of x264, dependency issues aren't as big a deal
2010-11-23 01:37:03 < Dark_Shikari> for x264 itself, we make a big deal about not requiring external dependencies
2010-11-23 01:37:07 < Dark_Shikari> but for dev tools, whateva
2010-11-23 01:37:36 < rfw> well
2010-11-23 01:37:39 < rfw> it only depends on python
2010-11-23 01:38:05 < rfw> i guess you could include my package that's more or less standalone in the source tree
2010-11-23 01:38:23 < rfw> (i sorta wanted to use it for my own projects in future too :P)
2010-11-23 01:38:32 < Dark_Shikari> we could put it in extras/
2010-11-23 01:38:50 < Dark_Shikari> like we put the avisynth C header there, and getopt, and other system-dependent imports that not everyone has
2010-11-23 01:39:22 < rfw> ah
2010-11-23 01:47:49 < rfw> alright Dark_Shikari
2010-11-23 01:47:51 < rfw> that's that done
2010-11-23 01:47:53 < rfw> what else do i need
2010-11-23 01:48:10 < Jumpyshoes> hey Dark_Shikari, i'm getting the error: undefined symbol `w' in preprocessor, how do i set it to be a word again? bw is in other places in the code, is it that?
2010-11-23 01:48:46 < Dark_Shikari> where do you get this error?
2010-11-23 01:48:53 < Dark_Shikari> you will only get that error if you attempt to use it as a symbol
2010-11-23 01:48:55 < Dark_Shikari> which you shouldn't
2010-11-23 01:49:16 < Jumpyshoes> common/x86/dct-a.asm:178: error: (SUMSUBD2_AB:3) undefined symbol `w' in preprocessor
2010-11-23 01:49:58 < Dark_Shikari> what's the third line of SUMSUBd@_AB?
2010-11-23 01:50:18 < Dark_Shikari> Ah, 79-85 in your paste is wrong
2010-11-23 01:50:21 < Dark_Shikari> it should be psra%1
2010-11-23 01:50:24 < Dark_Shikari> like in the other cases
2010-11-23 01:51:09 < Jumpyshoes> oh
2010-11-23 01:51:20 < Dark_Shikari> if you want to do a comparison, you'd do
2010-11-23 01:51:22 < Dark_Shikari> %ifidn %1, w
2010-11-23 01:51:26 < Dark_Shikari> But you don't need to.
2010-11-23 01:51:40 < Jumpyshoes> google lied to me ;-;
2010-11-23 01:52:09 < Jumpyshoes> no it didn't
2010-11-23 01:52:11 < Jumpyshoes> i'm just a dumbass
2010-11-23 01:52:23 < rfw> you know if google lied to us this competition could all be a lie
2010-11-23 01:52:24 < rfw> !!
2010-11-23 01:52:44 < Jumpyshoes> oh shit
2010-11-23 01:52:52 < Jumpyshoes> are you also a high school student?
2010-11-23 01:52:55 < rfw> yeah
2010-11-23 01:52:57 < rfw> lol
2010-11-23 01:53:11 < rfw> i don't know shit about asm though
2010-11-23 01:53:15 < Jumpyshoes> where are you from?
2010-11-23 01:53:21 < rfw> new zealand
2010-11-23 01:53:23 < Dark_Shikari> google lied to you?
2010-11-23 01:53:24 < rfw> lol
2010-11-23 01:53:39 < Jumpyshoes> i googled the instruction
2010-11-23 01:53:41 < Dark_Shikari> GOOGLE LIED, PEOPLE DIED
2010-11-23 01:53:43 < Jumpyshoes> and uh
2010-11-23 01:53:50 < Jumpyshoes> missed the instruction in the pages that came up
2010-11-23 01:53:51 < Jumpyshoes> somehow
2010-11-23 01:53:55 < Dark_Shikari> ?
2010-11-23 01:54:09 < rfw> where are you from, Jumpyshoes
2010-11-23 01:54:27 < rfw> washdc.fios.verizon.net lolnevermind
2010-11-23 01:54:30 < Jumpyshoes> yup
2010-11-23 01:54:43 < Jumpyshoes> i googled PSRAD but somehow thought it didn't exist
2010-11-23 01:54:48 < Dark_Shikari> Jumpyshoes: google lies
2010-11-23 01:54:51 < Dark_Shikari> =p
2010-11-23 01:55:07 < Dark_Shikari> psrad is too rad for google
2010-11-23 01:55:29 < Jumpyshoes> well, it did show up
2010-11-23 01:55:36 < Jumpyshoes> i just somehow missed every page that came up
2010-11-23 01:55:38 < Jumpyshoes> or something
2010-11-23 01:56:13 < pengvado> Sean_McG: AC_C_BIGENDIAN is implemented by 200 lines of m4. if by "adapt" you mean "throw out everything except the grep that's just like the one we already have", ok.
2010-11-23 01:56:26 < pengvado> null-termination is good enough for me
2010-11-23 01:57:21 < Sean_McG> pengvado: how do you "null terminate" an int?
2010-11-23 01:57:58 < pengvado> int i[2] = {0x42494745,0};
2010-11-23 01:58:20 < Sean_McG> and those are guaranteed to be sequential and fix the issue that Mans was pointing out?
2010-11-23 01:58:24 < pengvado> yes
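As far as this exchange goes, the idea is to compile a file containing that array and then grep the object file for a byte string: the constant 0x42494745 is laid out as "BIGE" on a big-endian target and "EGIB" on a little-endian one, and the second element supplies the zero bytes that terminate it. The sketch below only illustrates that byte layout with Python's `struct` module; it is not the configure test itself.

```python
import struct
import sys

MAGIC = 0x42494745  # bytes 'B', 'I', 'G', 'E' from most to least significant

big = struct.pack(">I", MAGIC)      # layout on a big-endian target
little = struct.pack("<I", MAGIC)   # layout on a little-endian target
print(big, little)                  # b'BIGE' b'EGIB'

# What the machine running this script would store in memory.
native = struct.pack("=I", MAGIC)
print("big" if native == big else "little", "(sys.byteorder says", sys.byteorder + ")")
```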
2010-11-23 01:58:44 < Sean_McG> hmmm, OK. I'll fix and resend my patch as soon as 1790 is built
2010-11-23 01:59:24 < pengvado> mru is right that clang's intermediate representation isn't real asm and thus still breaks, but since I don't know of any method that *would* work there, sucks to be clang
2010-11-23 01:59:35 < Dark_Shikari> lol
2010-11-23 01:59:40 < Sean_McG> aye, that was my thought too
2010-11-23 02:00:18 < Sean_McG> the endian test would have to be "if { not building with clang }"
2010-11-23 02:01:20 < Sean_McG> and here we go... fprofiled just finished on my Sol10 VM. just need to package it up and post it on my webserver
2010-11-23 02:01:41 < Jumpyshoes> okay Dark_Shikari
2010-11-23 02:01:43 < Jumpyshoes> i think i am done
2010-11-23 02:02:18 < Dark_Shikari> does it pass checkasm?
2010-11-23 02:02:21 < Jumpyshoes> yea
2010-11-23 02:02:27 < Dark_Shikari> in 8 and 10-bit?
2010-11-23 02:02:30 < Jumpyshoes> oh
2010-11-23 02:02:34 < Jumpyshoes> haven't tested 10bit
2010-11-23 02:02:39 < Dark_Shikari> just check to make sure you didn't break it
2010-11-23 02:02:49 < Dark_Shikari> Once you're done with that, you can write your IDCT_1D high bit depth version
2010-11-23 02:02:50 < Dark_Shikari> or, even better...
2010-11-23 02:02:56 < Dark_Shikari> make IDCT4_1D take a w or d :) :)
2010-11-23 02:03:04 < Jumpyshoes> i actually have a geo quiz tomorrow
2010-11-23 02:03:07 < Jumpyshoes> that everyone failed
2010-11-23 02:03:10 < Jumpyshoes> even though it's GEO
2010-11-23 02:03:16 < Jumpyshoes> so i actually need to study for that
2010-11-23 02:03:22 < pengvado> kemuri-_9: `[ ... ]` fails, because [ sets an exit code whereas `` captures stdout
2010-11-23 02:03:25 < rfw> how old are you Jumpyshoes, out of interest
2010-11-23 02:03:27 < Jumpyshoes> 17
2010-11-23 02:03:31 < Jumpyshoes> you?
2010-11-23 02:03:32 < pengvado> just let the !GPL case be handled by the same loop as everything else
2010-11-23 02:03:33 < rfw> 16
2010-11-23 02:03:39 < kemuri-_9> ok
2010-11-23 02:03:40 < Jumpyshoes> <-- 12th grade
2010-11-23 02:03:40 < Dark_Shikari> I was 17 when I started on x264
2010-11-23 02:03:50 < Jumpyshoes> i can't wait for that point in school
2010-11-23 02:03:55 < Jumpyshoes> where i can just give up
2010-11-23 02:03:55  * Sean_McG is... not that young :(
2010-11-23 02:03:56 < rfw> <-- some wacky new zealand schooling level
2010-11-23 02:03:58 < Dark_Shikari> Jumpyshoes: the "I got accepted to college, now I can slack"
2010-11-23 02:04:04 < Jumpyshoes> yes.
2010-11-23 02:04:12 < Dark_Shikari> "senioritis"
2010-11-23 02:04:13 < rfw> heh i'm applying for university
2010-11-23 02:04:16 < Jumpyshoes> same
2010-11-23 02:04:33 < rfw> well not so much applying as having my parents tell me i can then i can't then i can
2010-11-23 02:05:24 < rfw> anyway Dark_Shikari
2010-11-23 02:05:28 < rfw> what are all the tests i need to implement?
2010-11-23 02:05:40 < Dark_Shikari> it's less so need and moreso want
2010-11-23 02:05:47 < Dark_Shikari> But there are a few basic tests:
2010-11-23 02:05:48 < rfw> lol
2010-11-23 02:05:51 < Dark_Shikari> 1) Compare YUV: JM
2010-11-23 02:05:54 < rfw> done
2010-11-23 02:05:55 < Dark_Shikari> 2) Compare YUV: ffmpeg
2010-11-23 02:06:07 < Dark_Shikari> 3) test PSNR: just run x264 and grab the Global PSNR value
2010-11-23 02:06:13 < Dark_Shikari> 4) test SSIM: same, except grab the SSIM value
2010-11-23 02:06:13 < Jumpyshoes> okay, i think i broke something
2010-11-23 02:06:15 < Jumpyshoes> checkasm isn't compile
2010-11-23 02:06:20 < Jumpyshoes> compiling
2010-11-23 02:06:21 < rfw> is that all?
2010-11-23 02:06:27 < Dark_Shikari> Jumpyshoes: ?
2010-11-23 02:06:39 < Dark_Shikari> rfw: well, then you have to implement something to let it do useful things with git
2010-11-23 02:06:42 < Dark_Shikari> like tell me where things broke.
2010-11-23 02:06:48 < Dark_Shikari> and finally, you have to test multiple x264 options, that's the main thing
2010-11-23 02:06:48 < rfw> i have bisection integration already
2010-11-23 02:06:57 < Dark_Shikari> i.e. be able to run test 1) with different x264 options, automatically
2010-11-23 02:07:00 < Dark_Shikari> to attempt to find various bugs
2010-11-23 02:07:00 < kemuri-_9> pengvado: care for any particular order in the HAVE list?
2010-11-23 02:07:08 < Dark_Shikari> that's the most important part
2010-11-23 02:07:11 < Dark_Shikari> testing one set of options isn't helpful
2010-11-23 02:07:15 < Dark_Shikari> testing a ton is useful
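Running the same test over many option sets can be sketched as a Cartesian product over an option matrix. The flags below are real x264 options, but the values picked, the file names, and the helper `command_lines` are assumptions made up for illustration; this is not how rfw's tool is actually organised.

```python
import itertools

# Hypothetical option matrix: real x264 flags, illustrative values.
MATRIX = {
    "--preset": ["ultrafast", "medium", "veryslow"],
    "--bframes": ["0", "3"],
    "--ref": ["1", "4"],
}

def command_lines(source, output, matrix=MATRIX):
    """Yield one x264 argument list per combination of option values."""
    keys = list(matrix)
    for values in itertools.product(*(matrix[k] for k in keys)):
        args = ["x264"]
        for key, value in zip(keys, values):
            args += [key, value]
        yield args + ["-o", output, source]

# "input.y4m" and "out.264" are placeholder file names.
for cmd in command_lines("input.y4m", "out.264"):
    print(" ".join(cmd))
```

Each argument list can then be handed to the YUV comparison test, so a failure is reported together with the exact options that triggered it.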
2010-11-23 02:07:48 < Jumpyshoes> for some reason, i think trying to compile 8-bit, and then 10-bit breaks gcc until i restart cygwin
2010-11-23 02:08:19 < Dark_Shikari> make distclean
2010-11-23 02:08:27 < Jumpyshoes> oh
2010-11-23 02:08:31 < Jumpyshoes> that will probably solve things
2010-11-23 02:08:43 < Sean_McG> aye, you have to do that because HIGH_BIT_DEPTH is in config.mak
2010-11-23 02:10:18 < rfw> Running test ffmpeg... failed: x264-output.yuv is not the same size as ffmpeg-output.yuv (11404800 vs 9542016)
2010-11-23 02:10:21 < rfw> lol i think my ffmpeg is broken
2010-11-23 02:12:08 < Jumpyshoes> sexy, checkasm passed
2010-11-23 02:12:18 < Dark_Shikari> rfw: ffmpeg commandline?
2010-11-23 02:12:29 < rfw> [ "ffmpeg", "-i", "akiyo_qcif.264", "-pix_fmt", "yuv420p", "-f", "rawvideo", "ffmpeg-output.yuv" ]
2010-11-23 02:12:33 < pengvado> kemuri-_9: no preference
2010-11-23 02:12:58 < rfw> actually
2010-11-23 02:13:03 < rfw> i probably don't need all those params
2010-11-23 02:13:26 < Dark_Shikari> none of those should affect file length, hmmph
2010-11-23 02:14:03 < rfw> maybe my ffmpeg just sucks
2010-11-23 02:14:04 < rfw> lol
2010-11-23 02:14:56 < Dark_Shikari> you could look at the output of ffmpeg
2010-11-23 02:15:19 < rfw> hold on
2010-11-23 02:15:22 < rfw> let me just ignore filesize
2010-11-23 02:15:58 < rfw> Running test ffmpeg... failed: x264-output.yuv is not the same as ffmpeg-output.yuv (offset 269008)
2010-11-23 02:16:14 < Dark_Shikari> check the ffmpeg output on stderr
2010-11-23 02:16:15 < Jumpyshoes> does SWAP care about whether it's w, or d? i don't think so
2010-11-23 02:16:39 < Dark_Shikari> nope
2010-11-23 02:16:42 < Dark_Shikari> SWAP just swaps registers
2010-11-23 02:17:00 < Jumpyshoes> cool
2010-11-23 02:17:04 < rfw> oh also is there a flag to stop ffmpeg telling me if i want to overwrite
2010-11-23 02:17:19 < Jumpyshoes> can i claim IDCT4_1D, even though it isn't actually a function?
2010-11-23 02:17:42 < Dark_Shikari> "claim"?
2010-11-23 02:17:54 < Dark_Shikari> I mean, you'll have to write a function using it at some point!
2010-11-23 02:17:57 < Jumpyshoes> true
2010-11-23 02:17:58 < Dark_Shikari> even if it's nigh-identical to the mmx
2010-11-23 02:18:01 < Jumpyshoes> okay, might as well do it
2010-11-23 02:18:19 < Dark_Shikari> "writing a function" just means creating a new function really
2010-11-23 02:18:25 < Dark_Shikari> whether or not it involved writing
2010-11-23 02:18:47 < rfw> video:9318kB audio:0kB global headers:0kB muxing overhead 0.000000%
2010-11-23 02:18:57 < rfw> oh
2010-11-23 02:18:57 < rfw> Seems stream 0 codec frame rate differs from container frame rate: 29.97 (30000/1001) -> 25.00 (25/1)
2010-11-23 02:19:11 < rfw> that's probably it, then?
2010-11-23 02:19:12 < Jumpyshoes> i forgot the fricking function i was planning to write in the first place
2010-11-23 02:19:44 < espes> Dark_Shikari: For the GCI task, off the top of your head do you know any basic candidates for neon-ing?
2010-11-23 02:20:32 < Dark_Shikari> Tons of stuff
2010-11-23 02:20:36 < Dark_Shikari> oh you mean easy ones?
2010-11-23 02:20:44 < Dark_Shikari> I mean, x264 has tons and tons of functions that are missing neon
2010-11-23 02:20:48 < Dark_Shikari> including many simple ones, like variance
2010-11-23 02:20:55 < Dark_Shikari> and chroma mc needs to be updated
2010-11-23 02:21:08 < Dark_Shikari> your best bet is to just run a profile and look at all the C dsp functions high on the list :)
2010-11-23 02:21:23 < Dark_Shikari> Jumpyshoes: idct 4x4
2010-11-23 02:21:25 < espes> Dark_Shikari: I'll have a look, thanks.
2010-11-23 02:21:35 < Jumpyshoes> i thought that was a macro?
2010-11-23 02:21:51 < Dark_Shikari> add_idct4_mmx?  no, that's a function
2010-11-23 02:21:52 < Dark_Shikari> or whatever it's called
2010-11-23 02:22:15 < Jumpyshoes> oh, add4x4_idct_mmx i believe
2010-11-23 02:22:38 < rfw> where is the psnr data in the x264 output?
2010-11-23 02:22:39 < Sean_McG> sweet... can confirm that 1789 fixes the corruption here too
2010-11-23 02:23:58 < pengvado> rfw: did you enable the --psnr option?
2010-11-23 02:24:12 < rfw> oh that makes more sense
2010-11-23 02:24:12 < rfw> lol
2010-11-23 02:24:25 < rfw> thanks
2010-11-23 02:27:37 < kemuri-_9> Dark_Shikari: http://pastebin.com/99HiveUE instead of the patch you have currently
2010-11-23 02:29:50 < rfw> alright, that's the psnr done
2010-11-23 02:30:09 < rfw> i hope you don't mind regular expressions
2010-11-23 02:31:53 < Dark_Shikari> kemuri-_9: done
2010-11-23 02:32:34 < rfw> x264 [[]info[]]: PSNR Mean Y:\d+[.]\d+ U:\d+[.]\d+ V:\d+[.]\d+ Avg:\d+[.]\d+ Global:(\d+[.]\d+) kb/s:\d+[.]\d+
2010-11-23 02:32:37 < rfw> this should be fine, right
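A slightly looser variant of that pattern, which only anchors on the "PSNR Mean" prefix and the "Global:" field, might look like this. The sample log line and its numbers are invented for the example, the exact layout of x264's summary line is an assumption taken from the regex quoted above, and `global_psnr` is a hypothetical helper name.

```python
import re

# Capture the number after "Global:" on the "PSNR Mean" summary line.
GLOBAL_PSNR = re.compile(r"PSNR Mean\b.*\bGlobal:(\d+(?:\.\d+)?)")

def global_psnr(log_text):
    """Return the Global PSNR as a float, or None if the line is absent."""
    match = GLOBAL_PSNR.search(log_text)
    return float(match.group(1)) if match else None

# Invented sample line; the numbers are not real measurements.
sample = ("x264 [info]: PSNR Mean Y:38.512 U:41.204 V:42.013 "
          "Avg:39.381 Global:39.102 kb/s:112.34")
print(global_psnr(sample))
```

Returning None when the line is missing makes it easy to notice a run where `--psnr` was not passed.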
2010-11-23 02:33:14 < Jumpyshoes> hrm, 16*4 fits in an mm*, so isn't changing IDCT4_1D really easy?
2010-11-23 02:33:24 < Dark_Shikari> 32*4 doesn't
2010-11-23 02:33:30 < Dark_Shikari> Of course, if you convert the function to use SSE instead of MMX
2010-11-23 02:33:32 < Dark_Shikari> it's DEAD simple
2010-11-23 02:33:36 < Dark_Shikari> i.e. nothing whatsoever changes
2010-11-23 02:33:41 < Dark_Shikari> since 4x32 is the same as 4x16
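The arithmetic behind this point: a 64-bit MMX register holds four 16-bit words, a 128-bit XMM register holds four 32-bit dwords, so one row of a 4x4 block still fits in one register after widening the coefficients, provided the register type changes too. A tiny sketch of the lane counts (the helper name `lanes` is made up):

```python
def lanes(register_bits, element_bits):
    """Number of elements of a given width that fit in one register."""
    return register_bits // element_bits

MMX_BITS, XMM_BITS = 64, 128

print(lanes(MMX_BITS, 16))  # 4: a row of 16-bit coefficients fits in mm*
print(lanes(XMM_BITS, 32))  # 4: a row of 32-bit coefficients fits in xmm*
print(lanes(MMX_BITS, 32))  # 2: in MMX a 32-bit row needs two registers
```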
2010-11-23 02:34:06 < Jumpyshoes> oh right, these are DCT coefficients
2010-11-23 02:34:39 < Jumpyshoes> oh gosh, that's kinda annoying in mmx
2010-11-23 02:34:43 < Dark_Shikari> You just do it twice.
2010-11-23 02:34:50 < Dark_Shikari> OK, so the transpose will get a bit tricky, that's it.
2010-11-23 02:34:52 < Dark_Shikari> But do the SSE version first.
2010-11-23 02:34:58 < Dark_Shikari> Since the SSE version is going to be mindnumbingly trivial
2010-11-23 02:35:04 < Dark_Shikari> i.e. swap w for d, swap mm for xmm, done
2010-11-23 02:35:06 < Dark_Shikari> in fact
2010-11-23 02:35:08 < Dark_Shikari> you could just template it
2010-11-23 02:35:23 < Jumpyshoes> template it?
2010-11-23 02:35:48 < Sean_McG> DS: is my HIGH_BIT_DEPTH fix going to be pushed next commit storm, or was there an issue with that too?
2010-11-23 02:36:39 < Dark_Shikari> Sean_McG: I think that's fine.
2010-11-23 02:36:42 < Sean_McG> OK
2010-11-23 02:36:43 < Dark_Shikari> pengvado only had issues with two patches.
2010-11-23 02:36:45 < Dark_Shikari> Only one was yours.
2010-11-23 02:36:58 < Sean_McG> I'll be sending a new one RSN
2010-11-23 02:37:44 < Jumpyshoes> wait Dark_Shikari, IDCT4_1D uses the aliases already
2010-11-23 02:37:53 < Jumpyshoes> so m%* already
2010-11-23 02:38:00 < Jumpyshoes> isn't it basically set for SSE?
2010-11-23 02:40:05 < Dark_Shikari> Yup
2010-11-23 02:40:08 < Dark_Shikari> All you have to do is template the function
2010-11-23 02:40:10 < Dark_Shikari> i.e.
2010-11-23 02:40:18 < Dark_Shikari> %macro MY_MACRO_NAME 1
2010-11-23 02:40:25 < Dark_Shikari> cglobal function_name_%1
2010-11-23 02:40:26 < Dark_Shikari> ...
2010-11-23 02:40:28 < Dark_Shikari> %endmacro
2010-11-23 02:40:28 < Dark_Shikari> then
2010-11-23 02:40:40 < Dark_Shikari> INIT_MMX
2010-11-23 02:40:42 < Dark_Shikari> MY_MACRO_NAME mmx
2010-11-23 02:40:44 < Dark_Shikari> INIT_XMM
2010-11-23 02:40:47 < Dark_Shikari> MY_MACRO_NAME sse2
2010-11-23 02:40:55 < Dark_Shikari> though of course you'll have to intersperse the appropriate high bit depth ifdefs
2010-11-23 02:41:10 < Jumpyshoes> and i would need to redo the transpose
2010-11-23 02:41:15 < Jumpyshoes> oh boy
2010-11-23 02:41:26 < Jumpyshoes> wait
2010-11-23 02:41:33 < Jumpyshoes> there's no transpose in the IDCT4_1D
2010-11-23 02:41:34 < Jumpyshoes> yay
2010-11-23 02:42:11 < Dark_Shikari> You'll need a TRANSPOSE_4x4D, yes
2010-11-23 02:42:14 < Dark_Shikari> I think there already is one though
2010-11-23 02:42:22 < Jumpyshoes> yea, there is
2010-11-23 02:53:22 < rfw> Dark_Shikari: http://pastebin.com/GuM6bSXZ
2010-11-23 02:54:32 < Dark_Shikari> nice
2010-11-23 02:55:15 < rfw> i think my thresholds are a bit too high
2010-11-23 02:55:25 < rfw> by how much do psnr and ssim differ?
2010-11-23 02:55:28 < Dark_Shikari> "thresholds"?
2010-11-23 02:55:35 < rfw> you know
2010-11-23 02:55:42 < rfw> floating point arithmetic
2010-11-23 02:55:50 < rfw> or should i just take the value as it is
2010-11-23 02:55:59 < rfw> without converting it to a double
2010-11-23 02:56:51 < Dark_Shikari> I'm not sure what you're talking about
2010-11-23 02:56:58 < rfw> well
2010-11-23 02:57:02 < rfw> they're floating point numbers, right
2010-11-23 02:57:06 < Dark_Shikari> yeah
2010-11-23 02:57:12 < rfw> so you can't perform direct comparisons
2010-11-23 02:57:20 < rfw> so how much allowance should i give the values of psnr and ssim
2010-11-23 02:57:25 < Dark_Shikari> "allowance"?
2010-11-23 02:57:28 < Dark_Shikari> this isn't for a regression test
2010-11-23 02:57:30 < Dark_Shikari> at least, not directly
2010-11-23 02:57:33 < rfw> well
2010-11-23 02:57:36 < Dark_Shikari> this is so we can see change in psnr and ssim over time at a given setting
2010-11-23 02:57:39 < Dark_Shikari> e.g. to make a graph
2010-11-23 02:57:42 < rfw> oh
2010-11-23 02:57:47 < rfw> ah oh
2010-11-23 02:58:01 < rfw> i need to stop thinking of this as a unit test
2010-11-23 02:58:07 < Dark_Shikari> yeah
2010-11-23 02:58:53 < nattofriends> LIFE AS UNIT TEST
2010-11-23 03:01:01 < Dark_Shikari> Also, you script should be able to catch things like crashes
2010-11-23 03:01:04 < Dark_Shikari> i.e. say "x264 crashed"
2010-11-23 03:01:10 < Dark_Shikari> or, more usefully, SIGSEGV etc
2010-11-23 03:01:14 < Dark_Shikari> *your script
2010-11-23 03:01:21 < Dark_Shikari> As that's important to know too.
2010-11-23 03:01:21 < Jumpyshoes> oh hi nattofriends
2010-11-23 03:01:29 < Sean_McG> not using dejagnu for unit tests? ;)
2010-11-23 03:01:33 < Dark_Shikari> another TJ resident?
2010-11-23 03:01:34 < nattofriends> hi Jumpyshoes
2010-11-23 03:01:41 < Jumpyshoes> nope
2010-11-23 03:01:47 < Jumpyshoes> know him from elsewhere
2010-11-23 03:01:54 < Dark_Shikari> the story of my life
2010-11-23 03:02:18 < rfw> i swear i've seen nattofriends somewhere else too
2010-11-23 03:02:26 < nattofriends> darkhold?
2010-11-23 03:02:31 < rfw> oh probably
2010-11-23 03:02:47 < rfw> i've probably seen everyone there is on that terrible network
2010-11-23 03:03:42 < rfw> time to compile a program that segfaults
2010-11-23 03:03:43 < kierank> [03:02] Dark_Shikari: the story of my life --> lol
2010-11-23 03:04:03 < kierank> are you the Dark_Shikari from eve-online...
2010-11-23 03:04:28 < Dark_Shikari> hurr hurr yes
2010-11-23 03:04:30 < Dark_Shikari> exactly
2010-11-23 03:05:27 < rfw> int main() { int *a = 0; int b = *a; }
2010-11-23 03:05:34 < rfw> let's see what happens when i replace x264.exe with this
2010-11-23 03:05:41 < Dark_Shikari> you could just add a 1/0
2010-11-23 03:06:22 < rfw> heh it just gives me the windows blah blah has stopped working dialog
2010-11-23 03:06:36 < Jumpyshoes> wait Dark_Shikari, can you macro on macros? basically can i template IDCT4_1D even if it's a macro?
2010-11-23 03:06:53 < rfw> not sure how to handle segfaults in subprocesses
2010-11-23 03:07:29 < Sean_McG> OK... I'mma go watch Panty & Stocking
2010-11-23 03:08:56 < rfw> how do you even detect a segfault in a subprocess
2010-11-23 03:09:16 < Sean_McG> you can't
2010-11-23 03:09:26 < rfw> herp
2010-11-23 03:09:37 < Sean_McG> other than grep the return text
2010-11-23 03:09:50 < rfw> well i already dump the output to the terminal
2010-11-23 03:09:53 < rfw> so i guess that's fine
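On POSIX systems there is in fact a way to tell from the parent: when a child is killed by a signal, Python's `subprocess` reports the return code as the negative signal number. The sketch below assumes a POSIX platform and Python 3.5 or later; native Windows reports crashes differently and is not covered. The helper `describe_exit` is a made-up name, and the "crasher" is a stand-in child that sends itself SIGSEGV rather than a real x264 binary.

```python
import signal
import subprocess
import sys

def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # POSIX: a negative value means "terminated by signal -returncode".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "crashed: " + name
    return "failed: exit code %d" % returncode

# Stand-in for a crashing encoder: a child that kills itself with SIGSEGV.
crasher = [sys.executable, "-c",
           "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
result = subprocess.run(crasher)
print(describe_exit(result.returncode))
```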
2010-11-23 03:10:37 < Dark_Shikari> Jumpyshoes: just make the size a parameter
2010-11-23 03:10:41 < Dark_Shikari> IDCT4_1D w, blah, blah, blah
2010-11-23 03:10:44 < Dark_Shikari> or d, blah, blah, blah
2010-11-23 03:10:54 < Jumpyshoes> ah, okay
2010-11-23 03:12:30 < Jumpyshoes> so when i reference variables, does it reset inside the inner macro?
2010-11-23 03:12:35 < Dark_Shikari> ?
2010-11-23 03:12:56 < Jumpyshoes> so i have something like
2010-11-23 03:13:02 < Jumpyshoes> %macro IDCT 2
2010-11-23 03:13:02 < Jumpyshoes> %macro IDCT4_1D_%1 %2 5-6
2010-11-23 03:13:03 < Jumpyshoes> i guess
2010-11-23 03:13:14 < Dark_Shikari> No
2010-11-23 03:13:16 < Dark_Shikari> don't do that
2010-11-23 03:13:18 < Dark_Shikari> there's no reason to
2010-11-23 03:13:45 < Jumpyshoes> oh
2010-11-23 03:16:37 < rfw> time to clone the git repo and build it o9k times
2010-11-23 03:17:34 < Dark_Shikari> Sean_McG: you get a faster response if you post things here
2010-11-23 03:18:25 < Sean_McG> DS: I like a record, but I can pastebin it if you like?
2010-11-23 03:19:18 < rfw>       0 [main] python 5232 C:\cygwin\bin\python.exe: *** fatal error - unable to remap \\?\C:\cygwin\lib\python2.6\lib-dynload\select.dll to same address as parent: 0x360000 != 0x3F0000
2010-11-23 03:19:22 < rfw> wtf cygwin ;_;
2010-11-23 03:20:13 < Dark_Shikari> Sean_McG: sure
2010-11-23 03:20:22 < Dark_Shikari> rfw: I think I've seen that before lol
2010-11-23 03:20:54 < rfw> oh i have to rebaseall
2010-11-23 03:22:34 < Sean_McG> http://www.pastebin.ca/1999626
2010-11-23 03:24:06 < Dark_Shikari> Sean_McG: applied
2010-11-23 03:24:11 < Sean_McG> thank you
2010-11-23 03:24:21 < Sean_McG> OK, I go watch anime now ^^;
2010-11-23 03:29:22 < rfw> Dark_Shikari: Running test ffmpeg... passed, took 2.7280 seconds (True)
2010-11-23 03:29:28 < rfw> i think my x264 was just broken
2010-11-23 03:29:28 < rfw> lol
2010-11-23 03:29:34 < Dark_Shikari> lol
2010-11-23 03:29:55 < Dark_Shikari> it might be useful to automatically check if ffmpeg or jm or similar is installed
2010-11-23 03:29:59 < Dark_Shikari> and tell the user if it isn't, and don't run the test
2010-11-23 03:30:06 < rfw> ah
2010-11-23 03:30:06 < rfw> well
2010-11-23 03:30:19 < rfw> i just check if it's in the path?
2010-11-23 03:30:34 < Dark_Shikari> I guess?
2010-11-23 03:30:38 < Dark_Shikari> you could add an option to specify paths
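Checking the path, with an optional per-tool override, could be sketched with `shutil.which` from the Python standard library. The helper `locate` is a made-up name, and `ldecod` as the name of the JM decoder binary is an assumption that should be checked against the local JM build.

```python
import shutil
import sys

def locate(tools, overrides=None):
    """Split `tools` into those found on the system and those missing.

    `overrides` maps a tool name to an explicit path supplied by the
    user; anything not overridden is looked up on PATH.
    """
    overrides = overrides or {}
    found, missing = {}, []
    for name in tools:
        path = shutil.which(overrides.get(name, name))
        if path:
            found[name] = path
        else:
            missing.append(name)
    return found, missing

# "ldecod" is assumed to be the JM decoder's executable name.
found, missing = locate(["ffmpeg", "ldecod"])
for name in missing:
    print("skipping tests that need %s: not found" % name)
```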
2010-11-23 03:31:50 < Dark_Shikari> also, one issue we have is that there are some known decoding issues in JM
2010-11-23 03:31:54 < Dark_Shikari> particularly, it doesn't support x264's lossless mode
2010-11-23 03:32:07 < Dark_Shikari> so if we e.g. just randomly tried parameters, we'd get some spurious failures
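One way to avoid those spurious failures is to filter option sets before handing them to the JM comparison. The following is a hypothetical sketch: `jm_can_decode` is an invented helper, and it assumes lossless mode is requested as `--qp 0`, which would need extending for any other way of selecting it.

```python
def jm_can_decode(options):
    """Return False for option lists that JM is known not to handle.

    Assumption: lossless encoding is requested with "--qp 0".
    """
    for flag, value in zip(options, options[1:]):
        if flag == "--qp" and value == "0":
            return False
    return True

candidates = [
    ["--preset", "medium", "--qp", "0"],    # lossless: skip for JM
    ["--preset", "medium", "--qp", "26"],
]
for opts in candidates:
    status = "compare with JM" if jm_can_decode(opts) else "skip JM"
    print(" ".join(opts), "->", status)
```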
2010-11-23 03:33:00 < rfw> ah
2010-11-23 03:33:09 < rfw> Running test ffmpeg... failed: x264-output.yuv is not the same size as ffmpeg-output.yuv (11404800 vs 9542016)
2010-11-23 03:33:13 < rfw> never mind it is my ffmpeg
2010-11-23 03:33:19 < rfw> why do i have so many ffmpegs installed
2010-11-23 03:35:05 < Dark_Shikari> they're like gremlins
2010-11-23 03:35:09 < Dark_Shikari> don't feed them video after midnight
2010-11-23 03:36:06 < rfw> ahaha i keep deleting folders that are open in explorer
2010-11-23 03:36:09 < rfw> causing it to crash
2010-11-23 03:40:15 < rfw> Dark_Shikari: are there any revisions that don't compile?
2010-11-23 03:43:22 < kemuri-_9> revisions of x264?
2010-11-23 03:44:27 < rfw> yeah
2010-11-23 03:48:59 < kemuri-_9> hmm... i think there have been a few, they're usually fairly rare (well it depends on the architecture, x86 being the primary - others are usually broken on a major change until someone coughs up a patch)
2010-11-23 03:49:27 < rfw> ah
2010-11-23 03:49:33 < rfw> i'm just looking to test my automatic bisection
2010-11-23 03:53:17 < Dark_Shikari> probably nothing won't compile on x86 on gcc 3.4.5
2010-11-23 03:53:20 < Dark_Shikari> i.e. what I compile on
2010-11-23 03:53:23 < Dark_Shikari> you could easily test it though!
2010-11-23 03:53:24 < Dark_Shikari> make a local commit!
2010-11-23 03:53:25 < Dark_Shikari> :)
2010-11-23 04:02:00 < rfw> lol
2010-11-23 04:10:05 < rfw> Bisecting between 8eaf8a (good) and HEAD (bad)...
2010-11-23 04:10:17 < rfw> let's hope this works
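The core of a bisection between a known-good and a known-bad revision is a binary search. The sketch below is not rfw's implementation; it assumes a linear list of revisions ordered oldest to newest, that the first is good and the last is bad, and that once a revision is bad all later ones are too. The revision numbers are invented, and `is_good` stands in for "build this revision and run the tests".

```python
def bisect(revisions, is_good):
    """Return the first bad revision.

    Requires revisions[0] to be good and revisions[-1] to be bad.
    """
    lo, hi = 0, len(revisions) - 1   # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]

# Invented example: revisions 1700..1790, with the breakage at 1776.
revisions = list(range(1700, 1791))
tested = []

def is_good(rev):
    tested.append(rev)
    return rev < 1776

print(bisect(revisions, is_good), "found after", len(tested), "builds")
```

In practice `git bisect` does the same job on real history, including non-linear history; the point here is only that the number of builds grows with the logarithm of the revision count.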
2010-11-23 04:11:51 < Dark_Shikari> also, GCI students, feel free to hang out in #x264
2010-11-23 04:12:06 < Dark_Shikari> where we discuss a wide variety of topics that are, in fact, totally x264-related
2010-11-23 04:12:12 < Dark_Shikari> like touhou and starcraft
2010-11-23 04:17:03 < rfw> coincidentally i have touhou music on
2010-11-23 04:17:39 < Dark_Shikari> I would if I wasn't watching the gomtv live starcraft stream
2010-11-23 04:17:56 < Dark_Shikari> what's with this high correlation between "working on x264" and "touhou"
2010-11-23 04:18:36 < rfw> :P
2010-11-23 04:18:46 < saintdev> sadly touhou is not such a good candidate for ffaac hacking :(
2010-11-23 04:19:09 < Dark_Shikari> why not
2010-11-23 04:19:13 < Dark_Shikari> there's touhou music for every genre
2010-11-23 04:19:46 < rfw> Project result: FAILED, took 5.0750 seconds.
2010-11-23 04:19:46 < rfw> No tests failed on 85d4e2df22e8a894b3746f9a5ead33bfdbee9566; revision is good.
2010-11-23 04:19:52 < rfw> wtf i definitely broke something here
2010-11-23 04:20:05 < Dark_Shikari> also can you make it take revision numbers instead of git hashes?
2010-11-23 04:20:07 < Dark_Shikari> see version.sh
2010-11-23 04:20:19 < rfw> let me fix this first lol
2010-11-23 04:20:31 < Dark_Shikari> well, both would be good too
2010-11-23 04:20:32 < Dark_Shikari> and yeah
2010-11-23 04:24:05 < rfw> 8f95149b2a1943930968f098c904d84ef4b33555 determined to be bad.
2010-11-23 04:24:06 < rfw> 8f95149b2a1943930968f098c904d84ef4b33555 eye am broken!
2010-11-23 04:24:08 < rfw> hoory
2010-11-23 04:24:11 < rfw> +a
2010-11-23 04:25:21 < Dark_Shikari> the strongest revision?
2010-11-23 04:26:16 < rfw> lol
2010-11-23 04:26:21 < rfw> glad you get the reference
2010-11-23 04:26:43 < Dark_Shikari> I get references, even really really stupid ones
2010-11-23 04:26:52 < Dark_Shikari> pun (fortunately) not intended
2010-11-23 04:26:59 < rfw> lol
2010-11-23 04:31:37 < saintdev> <fozzy bear>baka baka baka</fozzy>
2010-11-23 04:54:34 < Alex_W> so any news on the problem with the replicated blu-ray that kierank mentioned yesterday?
2010-11-23 04:57:21 < Dark_Shikari> ask him
2010-11-23 04:57:30 < Dark_Shikari> he had an interesting quote on it but I can't give it to you
2010-11-23 04:57:34 < Dark_Shikari> without ok from him
2010-11-23 04:57:45 < Dark_Shikari> (tl;dr: they don't think it's x264's problem)
2010-11-23 05:02:39 < Alex_W> i see, so there are no known issues with the revision they're using that could cause any compatibility problems?
2010-11-23 05:03:52 < Dark_Shikari> don't know what revision
2010-11-23 05:03:57 < Dark_Shikari> but that's doubtful
2010-11-23 05:04:00 < Dark_Shikari> they did massive testing
2010-11-23 05:04:03 < Dark_Shikari> on tons of boxes
2010-11-23 05:04:10 < Dark_Shikari> and the box that's having problems is also having subtitle issues
2010-11-23 05:04:14 < Dark_Shikari> so they think it isn't x264-related
2010-11-23 05:05:27 < Alex_W> right, so it could just be a faulty box
2010-11-23 05:09:53 < rfw> Dark_Shikari: http://pastebin.com/JxgRJ0kh
2010-11-23 05:10:02 < rfw> looking for something like this?
2010-11-23 05:11:40 < Dark_Shikari> except with revision #s, that looks really awesome
2010-11-23 05:11:46 < rfw> lol
2010-11-23 05:13:03 < Dark_Shikari> psnr test should use --tune psnr
2010-11-23 05:13:05 < Dark_Shikari> ssim test should use --tune ssim
2010-11-23 05:13:14 < Dark_Shikari> and we'll need some way to integrate "trying lots of options" into this
2010-11-23 05:15:41 < rfw> --tune psnr in the regression tester/
2010-11-23 05:15:44 < rfw> ?
2010-11-23 05:16:11 < Dark_Shikari> in the x264 options
2010-11-23 05:16:15 < rfw> ah
2010-11-23 05:16:20 < Dark_Shikari> when measuring psnr we use tune psnr (to tell x264 to optimize for psnr)
2010-11-23 05:16:27 < Dark_Shikari> psnr values aren't very useful if the encoder isn't optimizing for it
2010-11-23 05:17:29 < rfw> oh goddammit notepad++
2010-11-23 05:17:32 < rfw> clicked the wrong button
2010-11-23 05:18:15 < Dark_Shikari> hmm, there is one thing that bugs me
2010-11-23 05:18:19 < Dark_Shikari> "PSNR" is generally tested with a single set of settings
2010-11-23 05:18:27 < Dark_Shikari> i.e. I want to know "how psnr has changed over time given some set of settings"
2010-11-23 05:18:29 < Dark_Shikari> But...
2010-11-23 05:18:40 < Dark_Shikari> for regressions, we really just want to know whether it's working or not.
2010-11-23 05:18:49 < Dark_Shikari> so regressions conceptually cover all settings
2010-11-23 05:18:54 < Dark_Shikari> while a psnr or ssim test covers just one.
2010-11-23 05:41:22 < wipple> Dark_Shikari: my first patch was replaced by kemuri-_9's?
2010-11-23 05:41:27 < Dark_Shikari> yes
2010-11-23 05:41:51 < wipple> ic
2010-11-23 05:42:03 < Dark_Shikari> your second one, the one that actually affects output, is still in =p
2010-11-23 05:43:39 < wipple> i have to fix second patch
2010-11-23 05:44:00 < Dark_Shikari> ah ok
2010-11-23 05:44:08 < Dark_Shikari> give me the new version and I'll update it
2010-11-23 05:55:43 < Jumpyshoes> hey Dark_Shikari, i thought about it some more, and wouldn't add4x4_idct basically need to be rewritten from scratch for mmx? mmx can only hold 64bits, so you need to have all the registers used to store all the data (if coef are 32bit)
2010-11-23 05:55:50 < Dark_Shikari> Yes
2010-11-23 05:55:55 < Dark_Shikari> It wouldn't be very *difficult*
2010-11-23 05:56:07 < Dark_Shikari> the approach would generally be the same
2010-11-23 05:56:12 < Dark_Shikari> but you certainly can't just template it outright
2010-11-23 05:56:16 < Jumpyshoes> ah
2010-11-23 05:56:20 < Dark_Shikari> If you want to make that your second asm function, you could
2010-11-23 05:56:23 < Jumpyshoes> i think i can for SSE though
2010-11-23 05:56:28 < Dark_Shikari> for the first, do see
2010-11-23 05:56:29 < Dark_Shikari> *sse
2010-11-23 05:56:40 < Jumpyshoes> okay
2010-11-23 06:00:59 < Jumpyshoes> what is pw_32?
2010-11-23 06:01:39 < Dark_Shikari> {32, 32, 32, 32...}
2010-11-23 06:01:43 < Dark_Shikari> words
2010-11-23 06:01:45 < Dark_Shikari> packed word 32
2010-11-23 06:01:48 < Dark_Shikari> see const-a.asm
2010-11-23 06:01:50 < Dark_Shikari> where they're declared
2010-11-23 06:01:53 < Jumpyshoes> ah
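As a rough illustration of what "packed word 32" means, the sketch below builds the byte layout of such a constant: the 16-bit value 32 repeated across a register. The helper name `packed_words` and the choice of eight words (one 16-byte SSE register) are assumptions for illustration; the real constant lives in const-a.asm.

```python
import struct

def packed_words(value, count=8):
    """Pack `count` copies of a 16-bit value, little-endian.

    count=8 is an assumption: eight words fill one 16-byte SSE register.
    """
    return struct.pack("<%dH" % count, *([value] * count))

if __name__ == "__main__":
    # Each word is 32 (0x0020), stored low byte first.
    print(packed_words(32).hex())
```

Every 16-bit lane holds the same value, so a single packed add applies it to all lanes at once.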
2010-11-23 06:06:15 < rfw> class factories -- ah the miracles of python metaprogramming
2010-11-23 06:06:43 < Dark_Shikari> factoryfactoryfactory
2010-11-23 06:07:40 < rfw> time to implement your clusterfuck permutations
2010-11-23 06:07:44 < rfw> :3
2010-11-23 06:07:59 < saintdev> just write it in brainfuck
2010-11-23 06:08:03 < rfw> this isn't going to look pretty though
2010-11-23 06:12:34 < nattofriends> SingletonFactorySingleton
2010-11-23 06:13:22 < rfw> that doesn't even make sense!
2010-11-23 06:13:49 < Jumpyshoes> error: instruction expected after label <-- what does this error mean?
2010-11-23 06:14:05 < Dark_Shikari> syntax error
2010-11-23 06:14:11 < Jumpyshoes> oh
2010-11-23 06:14:12 < Jumpyshoes> bleh
2010-11-23 06:14:41 < rfw> http://pastebin.com/X3x7Fb57
2010-11-23 06:15:16 < Dark_Shikari> rfw: it might be nicer to try to cover more options than mine, but without as exhaustive a search
2010-11-23 06:15:22 < Dark_Shikari> mine for example only tests CRF ratecontrol mode
2010-11-23 06:15:25 < Dark_Shikari> not ABR, CBR, 2-pass, etc
2010-11-23 06:16:04 < Dark_Shikari> for example, you can use a pseudorandom combination of options
2010-11-23 06:16:13 < rfw> TypeError: unbound method run() must be called with YUVOutputComparison_film___ultrafast instance as first argument (got nothing instead)
2010-11-23 06:16:14 < rfw> blah wtf
2010-11-23 06:16:38 < rfw> oh
2010-11-23 06:16:42 < rfw> forgot to instantiate the class
2010-11-23 06:16:43 < rfw> hurr hurr
2010-11-23 06:20:41 < Jumpyshoes> Dark_Shikari, if i write an implementation of an asm function, how do i test it?
2010-11-23 06:20:49 < Dark_Shikari> 1) declare it in the appropriate C header file
2010-11-23 06:20:58 < Dark_Shikari> 2) assign it to the appropriate function pointer (in your case, in common/dct.c)
2010-11-23 06:21:02 < Dark_Shikari> there's a big init function
2010-11-23 06:21:10 < Dark_Shikari> with stuff like dctf->myfunc = myfuncname_sse2;
2010-11-23 06:21:16 < Dark_Shikari> put it in the right place based on its type (sse2, etc)
2010-11-23 06:21:21 < Dark_Shikari> 3) make checkasm;./checkasm
2010-11-23 06:21:27 < Dark_Shikari> if yours is high bit depth, of course, make sure that:
2010-11-23 06:21:40 < Dark_Shikari> a) it's assigned under the #if high bit depth in common/dct.c
2010-11-23 06:21:44 < Dark_Shikari> b) you configured with high bit depth
2010-11-23 06:22:40 < Jumpyshoes> k
2010-11-23 06:32:40 < Jumpyshoes> oh wow, it compiled
2010-11-23 06:32:44 < Jumpyshoes> off to a good start
2010-11-23 06:32:54 < Jumpyshoes> segfault!
2010-11-23 06:32:58 < Jumpyshoes> lovel
2010-11-23 06:32:59 < Jumpyshoes> y
2010-11-23 06:38:51 < wipple> Dark_Shikari: i misunderstood, don't need to fix my second patch.
2010-11-23 06:38:56 < Dark_Shikari> ah ok
2010-11-23 06:39:40 < wipple> it might be better to fix commit message? i replaced avformat_license() with swscale_license()
2010-11-23 06:39:49 < wipple> but there in no mention
2010-11-23 06:39:53 < wipple> about it
2010-11-23 06:42:08 < Jumpyshoes> yay, no more segfaulting
2010-11-23 06:43:23 < wipple> *there _is_ no mention
2010-11-23 06:44:04 < Jumpyshoes> wait, why can i see pw_32, but not pw_64?
2010-11-23 06:45:29 < rfw> Dark_Shikari: cartesian product regression testing sure takes lolforever
2010-11-23 06:45:45 < rfw> i better implement that pseudorandom thing
2010-11-23 06:47:15 < Dark_Shikari> rfw: yeah, my thought is to define a massive set of parameters and some sane ranges
2010-11-23 06:47:27 < Dark_Shikari> and make your script take a seed + a number of tests
2010-11-23 06:47:31 < Dark_Shikari> and it randomly picks combinations thereof
2010-11-23 06:50:04 < rfw>     if randint(0, 4):
2010-11-23 06:50:04 < rfw>         YUVOutputComparison.test_jm = disabled("randomly disabled")(YUVOutputComparison.test_jm)
2010-11-23 06:50:05 < rfw>         YUVOutputComparison.test_ffmpeg = disabled("randomly disabled")(YUVOutputComparison.test_ffmpeg)
2010-11-23 06:50:05 < rfw> :3
2010-11-23 06:50:21 < Dark_Shikari> I don't mean random choice of tests
2010-11-23 06:50:24 < Dark_Shikari> I mean random choice of parameters =p
2010-11-23 06:50:30 < rfw> yeah
2010-11-23 06:50:34 < rfw> it is a random choice of parameters
2010-11-23 06:50:52 < Dark_Shikari> btw, the parameter array (and sane ranges) should be easily editable by us
2010-11-23 06:50:54 < Dark_Shikari> so we can change it later
2010-11-23 06:51:01 < rfw> http://pastebin.com/FRdfXbpp
2010-11-23 06:51:03 < Dark_Shikari> i.e. with new params, etc
2010-11-23 06:51:05 < rfw> it's sorta editable
2010-11-23 06:51:05 < rfw> lol
2010-11-23 06:51:20 < Dark_Shikari> It will need to be about 100 lines longer than that
2010-11-23 06:51:27 < Dark_Shikari> so that format will probably not be very readable
2010-11-23 06:51:34 < Dark_Shikari> i.e. you'll have to mentally match up the for index with the line number
2010-11-23 06:51:49 < Dark_Shikari> so if this was C, I'd want something like this
2010-11-23 06:51:52 < rfw> derp
2010-11-23 06:51:59 < Dark_Shikari> {"subme", 0, 10}
2010-11-23 06:52:09 < Dark_Shikari> makes sense?
2010-11-23 06:52:38 < rfw> not really no
2010-11-23 06:52:40 < rfw> lol
2010-11-23 06:52:53 < Dark_Shikari> in order to generate a commandline, for each of N parameters, you pick one of M values
2010-11-23 06:53:03 < Dark_Shikari> so for example, if your parameters are A and B
2010-11-23 06:53:06 < Dark_Shikari> and their min values are 0 and max are 10
2010-11-23 06:53:10 < rfw> ah
2010-11-23 06:53:10 < Dark_Shikari> you might do --A 5 --B 7
2010-11-23 06:53:13 < rfw> ah i get you
2010-11-23 06:53:14 < Dark_Shikari> or --A 2 --B 9
2010-11-23 06:53:37 < rfw> you're not giving those 4 points away that easily are you ;__;
2010-11-23 06:53:48 < Dark_Shikari> This isn't actually that hard, I imagine it's just a big array that you iterate over.
2010-11-23 06:54:08 < Dark_Shikari> If you make it extensible, you don't have to write the whole thing
2010-11-23 06:54:10 < Dark_Shikari> We can add it later.
2010-11-23 06:54:13 < rfw> yeah
2010-11-23 06:54:17 < rfw> it is extensible
2010-11-23 06:54:17 < rfw> lol
2010-11-23 06:54:21 < rfw> to the point it's silly
2010-11-23 06:54:30 < Dark_Shikari> A for loop like that is not very readable with 100 options
2010-11-23 06:54:45 < rfw> well
2010-11-23 06:54:46 < rfw> it is
2010-11-23 06:54:50 < rfw> you just add more elements to the array
2010-11-23 06:54:56 < rfw> i flattened your nested loop
2010-11-23 06:54:58 < rfw> into one loop
2010-11-23 06:55:07 < Dark_Shikari> for option in A,C,Q,Y,X,D,I,S,B,W,C
2010-11-23 06:55:08 < Dark_Shikari> 1
2010-11-23 06:55:08 < Dark_Shikari> 2
2010-11-23 06:55:08 < Dark_Shikari> 3
2010-11-23 06:55:08 < Dark_Shikari> 4
2010-11-23 06:55:11 < Dark_Shikari> 5
2010-11-23 06:55:13 < Dark_Shikari> 6
2010-11-23 06:55:16 < Dark_Shikari> 7
2010-11-23 06:55:18 < Dark_Shikari> 8
2010-11-23 06:55:21 < Dark_Shikari> 9
2010-11-23 06:55:24 < Dark_Shikari> which option goes with which line?
2010-11-23 06:55:26 < Dark_Shikari> fucked if I know!
2010-11-23 06:55:30 < rfw> um
2010-11-23 06:55:31 < rfw> what?
2010-11-23 06:55:35 < rfw> YUVOutputComparison.options = filter(None, reduce(operator.add, [ opt.split(" ") for opt in options ]))
2010-11-23 06:55:36 < Dark_Shikari> your for loop has all your option names on one line
2010-11-23 06:55:40 < Dark_Shikari> the one you pasted
2010-11-23 06:55:43 < rfw> oh
2010-11-23 06:55:44 < rfw> whoops
2010-11-23 06:55:46 < Dark_Shikari> and then it has one line for each option
2010-11-23 06:55:53 < Dark_Shikari> and mapping the two to each other is difficult visually
2010-11-23 06:56:00 < rfw> well
2010-11-23 06:56:02 < rfw> it's fixed now :p
2010-11-23 06:56:13 < Dark_Shikari> good =
2010-11-23 06:56:14 < Dark_Shikari> =p
2010-11-23 06:56:16 < rfw> http://pastebin.com/G0qeRJqZ
2010-11-23 06:56:32 < rfw> just add more parameters to the product function
2010-11-23 06:56:33 < rfw> lol
2010-11-23 06:56:53 < Dark_Shikari> ah, sweet
2010-11-23 06:57:03 < Dark_Shikari> what's the randint 0,4 for?
2010-11-23 06:57:07 < rfw> oh
2010-11-23 06:57:10 < rfw> random test selection
2010-11-23 06:57:14 < rfw> so you don't run all of them
2010-11-23 06:58:09 < Dark_Shikari> huh?
2010-11-23 06:58:27 < rfw> well
2010-11-23 06:58:38 < rfw> it generates a random number from 0-4
2010-11-23 06:58:48 < rfw> so there's a 1 in 5 chance of a test running
2010-11-23 06:59:01 < rfw> you can seed it of course
2010-11-23 06:59:03 < Dark_Shikari> ?  I don't see how that applies to the algorithm I mentioned
2010-11-23 06:59:04 < rfw> so it's consistent
2010-11-23 06:59:13 < Dark_Shikari> for any given test, you should randomly assemble a commandline
2010-11-23 06:59:13 < rfw> wait
2010-11-23 06:59:18 < Dark_Shikari> at no point in there is a "random chance of running a test"
2010-11-23 06:59:18 < rfw> yeah
2010-11-23 06:59:30 < rfw> isn't this more or less the same thing
2010-11-23 06:59:32 < Dark_Shikari> If you're going to iterate over all possible commandlines
2010-11-23 06:59:34 < Dark_Shikari> and only run some of them
2010-11-23 06:59:40 < Dark_Shikari> well, good luck when there are 10^100 possible commandlines
2010-11-23 06:59:50 < rfw> wait what
2010-11-23 06:59:52 < rfw> i'm confused now
2010-11-23 07:00:00 < Dark_Shikari> You will run N tests.
2010-11-23 07:00:07 < Dark_Shikari> In each test, you will randomly assemble a commandline.
2010-11-23 07:00:13 < Dark_Shikari> And test it.
2010-11-23 07:00:21 < Dark_Shikari> The randomness will be based on a user-supplied seed.
2010-11-23 07:00:25 < Dark_Shikari> The N will be user-supplied.
2010-11-23 07:00:30 < Jumpyshoes> Dark_Shikari: are there different constants for high bit?
2010-11-23 07:00:31 < rfw> but aren't i randomly assembling a command here
2010-11-23 07:00:38 < Dark_Shikari> No, you're iterating over all possible commandlines
2010-11-23 07:00:41 < Dark_Shikari> and only running some of them
2010-11-23 07:00:42 < Dark_Shikari> right?
2010-11-23 07:00:45 < rfw> yeah
2010-11-23 07:00:46 < rfw> isn't that
2010-11-23 07:00:50 < Dark_Shikari> Jumpyshoes: if you need a new one, make it
2010-11-23 07:00:51 < rfw> functionally equivalent?
2010-11-23 07:00:53 < Dark_Shikari> NO!
2010-11-23 07:00:56 < rfw> :(
2010-11-23 07:00:57 < Dark_Shikari> If there are 10^100 possible commandlines
2010-11-23 07:00:58 < Dark_Shikari> and I want 100 of them
2010-11-23 07:01:03 < rfw> oh
2010-11-23 07:01:06 < Dark_Shikari> your solution will take longer than the life of the universe!
2010-11-23 07:01:19 < rfw> derp
2010-11-23 07:01:27 < Jumpyshoes> i made one in the const-a.asm file
2010-11-23 07:01:33 < Dark_Shikari> Jumpyshoes: make sure there isn't already one
2010-11-23 07:01:35 < Dark_Shikari> there might be a pd_32
2010-11-23 07:01:38 < Jumpyshoes> but yasm can't see it for some reason
2010-11-23 07:01:39 < Jumpyshoes> oh
2010-11-23 07:01:46 < Dark_Shikari> you have to add it in your file
2010-11-23 07:01:48 < Dark_Shikari> extern pd_32
2010-11-23 07:01:53 < Dark_Shikari> see the top of your file
2010-11-23 07:01:56 < Jumpyshoes> aah
2010-11-23 07:02:07 < rfw> actually Dark_Shikari i don't see how
2010-11-23 07:02:10 < rfw> if i just randomized the options
2010-11-23 07:02:17 < rfw> then picked the first 100
2010-11-23 07:02:19 < Dark_Shikari> No
2010-11-23 07:02:24 < Dark_Shikari> that isn't what I asked for
2010-11-23 07:02:26 < Dark_Shikari> read what I said again
2010-11-23 07:02:30 < Dark_Shikari> let me write it in C, ok?
2010-11-23 07:02:34 < rfw> um okay
2010-11-23 07:02:37 < Dark_Shikari> for(int i = 0; i < numruns; i++) {
2010-11-23 07:02:55 < Dark_Shikari> String commandline = randomCommandline();
2010-11-23 07:03:02 < Dark_Shikari> test(commandline);
2010-11-23 07:03:03 < Dark_Shikari> }
2010-11-23 07:03:11 < Dark_Shikari> "randomcommandline" generates one, exactly one, random commandline.
2010-11-23 07:03:14 < Dark_Shikari> just one.  not zero, not two, one.
2010-11-23 07:03:50 < Dark_Shikari> you aren't shuffling all possible commandlines
2010-11-23 07:03:55 < Dark_Shikari> you are just generating one random one
2010-11-23 07:04:02 < Dark_Shikari> randomCommandline is as follows:
2010-11-23 07:04:07 < rfw> yes i get what you mean
2010-11-23 07:04:09 < Dark_Shikari> for all options in optionArray:
2010-11-23 07:04:26 < rfw> but what i'm saying is shuffle the individual subcommands
2010-11-23 07:04:28 < Dark_Shikari>     optionValue = random option in optionArray[option]
2010-11-23 07:04:30 < rfw> like [ "--tune %s" % t for t in ("film", "zerolatency") ]
2010-11-23 07:04:38 < rfw> wait never mind i'm being silly
2010-11-23 07:04:51 < rfw> yeah i see what you mean
2010-11-23 07:05:07 < rfw> i feel kinda dumb now ._.
2010-11-23 07:05:36 < Dark_Shikari> feeling dumb is fine, happens all the time
2010-11-23 07:05:41 < Dark_Shikari> just grep the commit logs for "10L"
2010-11-23 07:06:26 < Dark_Shikari> well, 1[0]*[l|L]
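The seeded random-commandline scheme described above can be sketched in Python roughly as follows. This is a minimal sketch, not rfw's actual tester: the `OPTIONS` table, its ranges, and the function names are hypothetical, chosen only to show one value being picked per option without ever enumerating all combinations.

```python
import random

# Hypothetical option table: (flag, allowed values). Illustrative only,
# not x264's full option set; meant to be easy to extend line by line.
OPTIONS = [
    ("--subme", list(range(0, 11))),
    ("--ref", list(range(1, 17))),
    ("--bframes", list(range(0, 17))),
    ("--tune", ["film", "zerolatency"]),
]

def random_commandline(rng):
    """Build exactly one command line by picking one value per option."""
    parts = []
    for flag, values in OPTIONS:
        parts += [flag, str(rng.choice(values))]
    return parts

def generate_tests(seed, num_runs):
    """Return num_runs command lines; the same seed gives the same list."""
    rng = random.Random(seed)
    return [random_commandline(rng) for _ in range(num_runs)]

if __name__ == "__main__":
    for cmd in generate_tests(seed=1234, num_runs=3):
        print(" ".join(cmd))
```

The cost is proportional to the number of runs times the number of options, regardless of how many combinations exist, and keeping each option on its own line of the table keeps names and ranges visually paired.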
2010-11-23 07:12:23 < Dark_Shikari> so in terms of committing this regression tester of yours
2010-11-23 07:12:29 < Dark_Shikari> what about the actual lib it depends on which you said isn't finished?
2010-11-23 07:23:18 < rfw> it's part of my regression tester
2010-11-23 07:23:27 < rfw> sorry, had to do something
2010-11-23 07:24:27 < Dark_Shikari> no worries, not going to push you or anything
2010-11-23 07:27:08 < rfw> Dark_Shikari: http://pastebin.com/vybYV0VR
2010-11-23 07:27:16 < rfw> is this more along the lines of what you were looking for?
2010-11-23 07:27:30 < Dark_Shikari> yup, exactly
2010-11-23 07:27:33 < rfw> :D
2010-11-23 07:27:38 < Dark_Shikari> fyi I can't really read python very well
2010-11-23 07:27:42 < Dark_Shikari> I haven't used it in like 5 years
2010-11-23 07:27:42 < rfw> lol
2010-11-23 07:27:52 < Dark_Shikari> I last used python seriously for AI class in high school
2010-11-23 07:27:52 < rfw> it's more readable than perl
2010-11-23 07:28:05 < rfw> you know, the old elbows on keyboard joke
2010-11-23 07:28:36 < Dark_Shikari> yes yes
2010-11-23 07:29:39 < rfw> also, version numbers instead of commit hashes?
2010-11-23 07:29:47 < Dark_Shikari> see version.sh
2010-11-23 07:29:57 < Dark_Shikari> since x264 uses a completely linear git tree, version numbers are more usable than hashes
2010-11-23 07:30:09 < rfw> why did i just try to run it in cmd
2010-11-23 07:30:14 < Dark_Shikari> e.g. the current x264 is r1790
2010-11-23 07:32:30 < rfw> oh god i can't read bash
2010-11-23 07:33:46 < Dark_Shikari> You don't have to
2010-11-23 07:33:49 < Dark_Shikari> It just basically counts commits.
2010-11-23 07:33:49 < Dark_Shikari> lol
2010-11-23 07:34:01 < Dark_Shikari> pengvado could explain it though.
2010-11-23 07:34:13 < rfw> so why not just
2010-11-23 07:34:15 < rfw> git log --format=oneline | wc -l
2010-11-23 07:34:16 < rfw> :|
2010-11-23 07:34:36 < rfw> why the git rev-list origin/master | sort | join config.git-hash - | wc -l | awk '{print $1}'
2010-11-23 07:35:24 < Dark_Shikari> Because it does a bit of extra magic
2010-11-23 07:35:28 < Dark_Shikari> specifically, it does the following
2010-11-23 07:35:33 < Dark_Shikari> suppose latest x264 is 1790
2010-11-23 07:35:33 < dj_tjerk> origin/master doesn't make any sense if you do local regression testing does it?
2010-11-23 07:35:36 < Dark_Shikari> suppose I have 5 local commits
2010-11-23 07:35:41 < Dark_Shikari> it'll show my version as 1790+5
2010-11-23 07:35:47 < Dark_Shikari> suppose I also have a modified local tree when I do it
2010-11-23 07:35:50 < Dark_Shikari> it'll show my version as 1790+5M
2010-11-23 07:35:53 < rfw> oh
2010-11-23 07:35:53 < Dark_Shikari> or something like that
2010-11-23 07:35:58 < rfw> i see
2010-11-23 07:36:43 < Dark_Shikari> in your case, it might be useful to distinguish between local and otherwise
2010-11-23 07:36:52 < Dark_Shikari> e.g. if my bug is in revision 1790+4
2010-11-23 07:36:55 < Dark_Shikari> as opposed to 1794
2010-11-23 07:37:08 < rfw> ah
2010-11-23 07:37:10 < Dark_Shikari> but the latter might be fine too.
2010-11-23 07:39:13 < rfw> is there any reason why the bash script does a sort
2010-11-23 07:39:23 < rfw> oh nvm
2010-11-23 07:40:37 < Dark_Shikari> You'll probably want to cache these revision numbers at the start to avoid recalculating them repeatedly.
2010-11-23 07:40:43 < Dark_Shikari> i.e. the mapping of hash to revnum
2010-11-23 07:40:45 < Dark_Shikari> or similar
2010-11-23 07:40:51 < rfw> yeah
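A minimal sketch of the two ideas above, assuming a linear history: cache the hash-to-revision-number mapping once, and format versions in the "1790+5M" style. Both function names are hypothetical, and the exact output of version.sh may differ ("or something like that"); the hash list would come from git in a real tool but is passed in here.

```python
def build_revnum_map(hashes_oldest_first):
    """Map each commit hash to its 1-based position in a linear history."""
    return {h: i for i, h in enumerate(hashes_oldest_first, start=1)}

def format_version(upstream_count, local_count=0, modified=False):
    """Format a version like 1790, 1790+5 or 1790+5M (assumed format)."""
    text = str(upstream_count)
    if local_count:
        text += "+%d" % local_count
    if modified:
        text += "M"
    return text

if __name__ == "__main__":
    revs = build_revnum_map(["aaa", "bbb", "ccc"])  # placeholder hashes
    print(revs["ccc"])                   # position of the newest commit
    print(format_version(1790, 5, True))
```

Building the map once at startup avoids recounting commits for every revision under test.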
2010-11-23 07:53:22 < rfw> welp, now to write a routine to  convert it the other way
2010-11-23 07:58:19 < rfw> Testing project x264 at revision 1790+4M...
2010-11-23 08:01:43 < Dark_Shikari> You could just internally use hashes
2010-11-23 08:01:46 < Dark_Shikari> and convert only for display purposes
2010-11-23 08:01:49 < rfw> i do
2010-11-23 08:01:55 < Dark_Shikari> though I guess you need to convert the other way for user input
2010-11-23 08:01:59 < Dark_Shikari> i.e. "I want to test revision X through Y"
2010-11-23 08:02:05 < rfw> yeah
2010-11-23 08:02:14 < Dark_Shikari> Though the most common thing for that isn't revision numbers
2010-11-23 08:02:17 < Dark_Shikari> but rather something like
2010-11-23 08:02:19 < Dark_Shikari> HEAD~10
2010-11-23 08:02:22 < Dark_Shikari> like in git parlance
2010-11-23 08:02:26 < Dark_Shikari> i.e. "I want to test the last 10 revisions"
2010-11-23 08:02:28 < rfw> safsagdshasd
2010-11-23 08:02:35 < rfw> TIME TO FIX
2010-11-23 08:02:46 < Dark_Shikari> anyways, let's get down how this interface will work
2010-11-23 08:02:57 < Dark_Shikari> so you don't keep changing things
2010-11-23 08:02:59 < Dark_Shikari> and doing unnecessary work
2010-11-23 08:03:05 < Dark_Shikari> if you have to use hashes, that's ok
2010-11-23 08:03:10 < rfw> nah it's fine
2010-11-23 08:03:24 < rfw> i'm just checking if ~ is in revnumber
2010-11-23 08:03:31 < rfw> though
2010-11-23 08:03:42 < rfw> then what if somebody tries to 1780+4~3
2010-11-23 08:03:57 < rfw> can i just assume nobody's going to do that
2010-11-23 08:05:19 < Dark_Shikari> lol
2010-11-23 08:05:20 < Dark_Shikari> probably
2010-11-23 08:05:24 < Dark_Shikari> yes you can assume people are not stupid
2010-11-23 08:05:41 < rfw> \o/
2010-11-23 08:06:44 < Dark_Shikari> this is for devs, not users
2010-11-23 08:06:48 < Dark_Shikari> it can have less tolerance for bad input
2010-11-23 08:07:01 < Dark_Shikari> though it should at least error out cleanly if you give it stupid shit
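One way the input handling discussed above might look, as a sketch under assumptions: revision numbers such as "1790+4" are parsed directly, anything else is passed through as a git ref such as "HEAD~10", and mixed forms such as "1780+4~3" fail with a clean error. The function name and the returned tuples are hypothetical, not rfw's actual interface.

```python
import re

# "1790", "1790+4", or "1790+4M" (assumed revision-number syntax).
_REVNUM = re.compile(r"^(\d+)(?:\+(\d+))?M?$")
# A revision number followed by a git suffix, e.g. "1780+4~3".
_MIXED = re.compile(r"^\d+\+\d+M?[~^]")

def parse_revision_spec(spec):
    """Classify user input as a revision number or a git ref."""
    spec = spec.strip()
    if not spec:
        raise ValueError("empty revision spec")
    m = _REVNUM.match(spec)
    if m:
        return ("revnum", int(m.group(1)), int(m.group(2) or 0))
    if _MIXED.match(spec):
        raise ValueError("cannot combine a revision number with ~ or ^: %r" % spec)
    # Anything else is handed to git unchanged (e.g. HEAD~10 or a hash).
    return ("gitref", spec)

if __name__ == "__main__":
    print(parse_revision_spec("1790+4"))
    print(parse_revision_spec("HEAD~10"))
```

Note that an all-digit abbreviated hash would be read as a revision number here; a real tool would need a rule for that case.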
2010-11-23 08:07:29 < rfw> why the hell is windows not letting me delete the x264 folder
2010-11-23 08:07:44 < rfw> "you need the computer administrator's permissions to change this folder"
2010-11-23 08:07:49 < rfw> but i am the computer administrator :(
2010-11-23 08:11:06 < Dark_Shikari> try not having it open in a cygwin window
2010-11-23 08:11:37 < rfw> i don't
2010-11-23 08:11:42 < rfw> i don't even have ownership of the folder
2010-11-23 08:11:50 < rfw> wtf
2010-11-23 08:12:10 < Dark_Shikari> lol
2010-11-23 08:12:16 < Dark_Shikari> try removing it in cygwin
2010-11-23 08:12:31 < rfw> nothing
2010-11-23 08:12:43 < rfw> $ cd x264
2010-11-23 08:12:44 < rfw> bash: cd: x264: Permission denied
2010-11-23 08:12:45 < rfw> :(
2010-11-23 08:12:49 < rfw> can't even enter it
2010-11-23 08:13:08 < rfw> drwxr-x---  1 ????????       ????????    0 2010-11-23 21:05 x264
2010-11-23 08:13:39 < rfw> i can't even get group/owner information
2010-11-23 08:14:22 < nattofriends> restart! restart!
2010-11-23 08:14:44 < rfw> nevar!
2010-11-23 08:16:59 < Dark_Shikari> o.0
2010-11-23 08:20:51 < Rodeo> Dark_Shikari: FWIW, http://paste.handbrake.fr/pastebin.php?show=1883
2010-11-23 08:22:46 < Rodeo> not that the current behavior bothers me (on the contrary), but if this is the correct behavior, then you have a patch
2010-11-23 08:24:49 < Dark_Shikari> I dunno if that'll work
2010-11-23 08:25:09 < Dark_Shikari> I don't think the cplxr_sum stuff is used in 2-pass
2010-11-23 08:26:51 < Rodeo> well, without the patch, I get the final ratefactor for ./x264 infile -o outfile --pass 1 bitrate b
2010-11-23 08:26:57 < Rodeo> with the patch, it's not printed
2010-11-23 08:27:02 < Dark_Shikari> Is it *accurate*?
2010-11-23 08:27:03 < rfw> wow, the gci rankings haven't changed at all
2010-11-23 08:27:26 < Dark_Shikari> Rodeo: wait I'm confused
2010-11-23 08:27:29 < Dark_Shikari> what does your patch do?
2010-11-23 08:27:33 < Dark_Shikari> are you trying to print it in 2-pass mode?
2010-11-23 08:27:36 < Dark_Shikari> on the second pass?
2010-11-23 08:27:37 < Dark_Shikari> or what
2010-11-23 08:27:52 < Dark_Shikari> rfw: what, trying to win the grand prize? =p
2010-11-23 08:27:56 < Rodeo> currently, it's printed in the first pass of a 2-pass encode
2010-11-23 08:28:04 < Rodeo> with that patch, this no longer happens
2010-11-23 08:28:07 < rfw> i'm allowed to be ambitious, no :p
2010-11-23 08:28:19 < Dark_Shikari> Rodeo: er, that's removing a feature
2010-11-23 08:28:45 < Dark_Shikari> rfw: I assume the grand prize will be won by a very very dedicated nerd in a very small basement, taking massive numbers of overrated tasks.
2010-11-23 08:28:49 < Rodeo> OK, I thought that it wasn't supposed to do this
2010-11-23 08:28:52 < rfw> lol
2010-11-23 08:28:52 < Dark_Shikari> No, it is.
2010-11-23 08:28:57 < Rodeo> but if current behavior is OK, then ignore my patch
2010-11-23 08:29:05 < Dark_Shikari> Rodeo: I thought you wanted it in the second pass
2010-11-23 08:29:07 < Dark_Shikari> _that_ would be cool.
2010-11-23 08:29:08 < rfw> i don't want to make surveys or posters
2010-11-23 08:29:16 < Dark_Shikari> rfw: I lol at those tasks
2010-11-23 08:29:20 < Dark_Shikari> MAKE A POSTER FOR MY PROJECT
2010-11-23 08:29:24 < Dark_Shikari> Actually, here's an idea
2010-11-23 08:29:25 < Rodeo> gotta go, I'll see what I can do about the latter later
2010-11-23 08:29:25 < rfw> lolol
2010-11-23 08:29:28 < Dark_Shikari> outsource
2010-11-23 08:29:36 < Dark_Shikari> find the easiest projects which can be done for like $5
2010-11-23 08:29:38 < Dark_Shikari> and outsource them
2010-11-23 08:29:46 < Dark_Shikari> through some mechanical turk like website
2010-11-23 08:29:54 < Dark_Shikari> 2) ???
2010-11-23 08:29:55 < Dark_Shikari> 3) profit
2010-11-23 08:30:33 < rfw> oh god, i have to rethink my side-by-side comparison output
2010-11-23 08:30:38 < rfw> because of the lol fixtures
2010-11-23 08:31:07 < Dark_Shikari> I feel sorry for people trying to game GCI
2010-11-23 08:31:23 < Dark_Shikari> it's just so extraordinarily silly
2010-11-23 08:33:51 < Dark_Shikari> Let's hope pengvado can read python to review your code =p
2010-11-23 08:34:38 < rfw> is there no more ./configure?
2010-11-23 08:35:12 < rfw> either that or something is very wrong
2010-11-23 08:35:15 < Dark_Shikari> o.0
2010-11-23 08:35:30 < rfw> where did it go
2010-11-23 08:36:58 < rfw> oh
2010-11-23 08:37:01 < rfw> there we go
2010-11-23 08:37:52 < rfw> i had my revision list upside down
2010-11-23 08:48:41 < Kovensky> 05:13.39 rfw: i can't even get group/owner information <-- broken ACLs?
2010-11-23 08:48:55 < rfw> Kovensky: looks like it
2010-11-23 08:48:59 < rfw> super-broken ACLs
2010-11-23 08:49:10 < Kovensky> try chown yourusername:Administrator -R /path/to/folder && chmod 777 -R /path/to/folder
2010-11-23 08:49:25 < Kovensky> if that doesn't work, you may need to chkdsk
2010-11-23 08:49:31 < nattofriends> takeown
2010-11-23 08:49:35 < Kovensky> Administrators*
2010-11-23 08:49:36 < rfw> nattofriends: did that
2010-11-23 08:49:44 < rfw> chown: cannot read directory `../x264': Permission denied
2010-11-23 08:49:51 < rfw> ;_;
2010-11-23 08:49:58 < nattofriends> can copy folder?
2010-11-23 08:50:01 < rfw> nope
2010-11-23 08:50:26 < nattofriends> drop to a bootcd with ntfs mounting, try copying it to FAT
2010-11-23 08:50:34 < nattofriends> then reboot and copy elsewhere?
2010-11-23 08:51:09 < Kovensky> the bootcd will just give more errors
2010-11-23 08:51:20 < Kovensky> run a chkdsk on readonly mode, if it whines you found your culprit
2010-11-23 08:51:29 < Dark_Shikari> Aren't there windows tools to override ACLs?
2010-11-23 08:52:09 < Kovensky> Dark_Shikari: they're not working
2010-11-23 08:52:21 < Kovensky> which is why I assume the ACLs are completely broken there
2010-11-23 09:09:55 < rfw> well then, i think i've finished
2010-11-23 09:10:02 < rfw> oh wait
2010-11-23 09:10:09 < rfw> still haven't done that macroblock frame calculation thing
2010-11-23 09:10:10 < rfw> hnngh
2010-11-23 09:10:15 < Dark_Shikari> that's not too hard =p
2010-11-23 09:10:17 < Dark_Shikari> that's like 3 lines of code
2010-11-23 09:10:23 < Dark_Shikari> though you have to know the video width and height to do that.
2010-11-23 09:10:31 < rfw> but i'm tired
2010-11-23 09:10:31 < rfw> D:
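The log does not say exactly which calculation is meant. Assuming it is the number of 16x16 macroblocks in a frame, which does need the video width and height, a sketch could look like this (the function name is hypothetical):

```python
def macroblocks_per_frame(width, height, mb_size=16):
    """Count macroblocks covering a frame, rounding partial blocks up."""
    mb_w = (width + mb_size - 1) // mb_size
    mb_h = (height + mb_size - 1) // mb_size
    return mb_w * mb_h

if __name__ == "__main__":
    print(macroblocks_per_frame(1920, 1080))
```

Rounding up matters for heights like 1080, which is not a multiple of 16 and so needs 68 macroblock rows.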
2010-11-23 09:10:46 < rfw> also do you mind if i commit my library to github
2010-11-23 09:10:57 < rfw> since i'm paranoid
2010-11-23 09:11:02 < rfw> about my computer exploding
2010-11-23 09:11:04 < rfw> while i sleep
2010-11-23 09:25:20 < J_Darnley> What a lot of discussion there was overnight.  It nearly ran off the top of my buffer
2010-11-23 09:25:31 < tjoener> indeed J_Darnley
2010-11-23 09:25:34 < tjoener> just read them too
2010-11-23 09:26:22 < Kovensky> means your buffer is too small :>
2010-11-23 09:27:28 < Dark_Shikari> pengvado: can you explain cbr_decay?
2010-11-23 09:28:11 < Dark_Shikari> and specifically the formula used to calculate it
2010-11-23 09:28:18 < Dark_Shikari> and what "1.0" vs "0.0" etc would mean
2010-11-23 09:29:46 < Dark_Shikari> and why is printing of ratefactor disabled with high cbr decay?
2010-11-23 09:29:59 < Dark_Shikari> high cbr decay can happen with e.g. --vbv-bufsize 30000 --vbv-maxrate 40000 --bitrate 4000
2010-11-23 09:32:35 < Rodeo> and low cbr decay when vbv maxrate and/or bufsize are closer to the avg. bitrate
2010-11-23 09:33:25 < Rodeo> I used these high values in my test, but I think I'd have gotten the same results with no vbv at all
2010-11-23 09:37:31 < Dark_Shikari> Oh, I see
2010-11-23 09:37:40 < Dark_Shikari> 1.0 --> no cbr decay
2010-11-23 09:37:47 < Dark_Shikari> less than 1.0 --> lots of cbr decay
2010-11-23 09:38:01 < Dark_Shikari> ratefactor is only printed if there's no cbr decay, or nearly so
2010-11-23 09:38:23 < Dark_Shikari> there's no cbr decay in my example (1.0) because vbv is much larger than bitrate.
2010-11-23 09:44:31 < Rodeo> that was my guess
2010-11-23 09:44:58 < Rodeo> doesn't explain what cbr decay actually is though
2010-11-23 09:45:09 < Rodeo> though it probably makes more sense to you
2010-11-23 09:45:15 < Dark_Shikari> rc->cplxr_sum *= rc->cbr_decay;
2010-11-23 09:45:17 < Dark_Shikari> rc->wanted_bits_window *= rc->cbr_decay;
2010-11-23 09:45:24 < Dark_Shikari> it's how fast the ABR state decays over time
2010-11-23 09:45:26 < Dark_Shikari> 1.0 has no effect
2010-11-23 09:45:30 < Dark_Shikari> 0.0 means it immediately decays
2010-11-23 09:45:36 < Dark_Shikari> in CBR, you want it to decay fast
2010-11-23 09:45:44 < Dark_Shikari> in ABR with no VBV, you don't want it to decay at all
2010-11-23 09:45:52 < Dark_Shikari> so it's a measure of a guess at the effect of CBR on the ABR algorithm
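A toy model of the decay described above, in Python rather than x264's C: a state value is multiplied by `cbr_decay` once per frame. Only the multiply is taken from the two lines quoted in the log; the per-frame accumulation x264 also performs is left out, and the function name is hypothetical.

```python
def decay_state(value, cbr_decay, frames):
    """Multiply value by cbr_decay once per frame and return the result."""
    for _ in range(frames):
        value *= cbr_decay
    return value

if __name__ == "__main__":
    for decay in (1.0, 0.9, 0.0):
        print(decay, decay_state(100.0, decay, 10))
```

With 1.0 the state keeps its full history, with 0.0 it is forgotten after one frame, and values in between forget older frames gradually.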
2010-11-23 09:48:13 < Rodeo> so there's no issue then
2010-11-23 09:48:35 < Dark_Shikari> well, it might be useful to have ratefactor if cbr_decay is < 1
2010-11-23 09:48:39 < Dark_Shikari> I don't know if it'd be meaningful or not
2010-11-23 11:47:46 < Alex_W> Dark_Shikari: I just saw your post on doom9 about using weightp with 1 ref for blu-ray compatibility, is it confirmed then that dupes are the problem on mediatek chipsets?
2010-11-23 12:01:31 < j-b> Dark_Shikari: http://socghop.appspot.com/gci/task/show/google/gci2010/videolan/t129045717568
2010-11-23 12:02:46 < kierank> j-b: that guy already failed ;)
2010-11-23 12:06:13 < kierank> melange is awful
2010-11-23 12:11:45 < Dark_Shikari> Alex_W: yes
2010-11-23 12:11:56 < Dark_Shikari> j-b: what?
2010-11-23 12:12:04 < Dark_Shikari> 4032 hours lol?
2010-11-23 12:13:07 < Alex_W> ok, then would you accept a patch that disables dupes? maybe something like --weightp bluray?
2010-11-23 12:13:37 < Dark_Shikari> I don't like the idea of actively supporting broken players
2010-11-23 12:13:42 < Dark_Shikari> it creates more fragmentation
2010-11-23 12:13:58 < JEEB> Alex_W, wipple wrote something similar IIRC
2010-11-23 12:14:09 < JEEB> but yeah, gotta go by D_S's opinion here
2010-11-23 12:14:15 < Dark_Shikari> I think we could have a patch, maybe
2010-11-23 12:14:20 < Dark_Shikari> but committed?  probably not
2010-11-23 12:15:12 < Alex_W> the problem is that replicators probably won't use weightp at all because of this, imo a slightly less useful weightp without dupes is better than no weightp at all...
2010-11-23 12:15:47 < Alex_W> also we already have an --open-gop bluray, how is this any different?
2010-11-23 12:16:25 < JEEB> because it's in the spec :|
2010-11-23 12:16:56 < JEEB> normal weightp is completely within the spec and most players play it fine
2010-11-23 12:17:09 < JEEB> (as it is in the spec AFAICS)
2010-11-23 12:18:02 < wipple> i wrote this patch before ---> http://cccp.project357.com/p/f3b5731a7
2010-11-23 12:18:14 < Dark_Shikari> Alex_W: also, at very low quants, weightp is less useful
2010-11-23 12:18:17 < wipple> and rejected by Dark_Shikari
2010-11-23 12:18:40 < Alex_W> the problem is that the spec probably doesn't really matter to replicators, the only thing that's going to matter to them is whether real world players will actually play the discs without any problems
2010-11-23 12:18:55 < j-b> Dark_Shikari: I believe you should approve it
2010-11-23 12:19:17 < Dark_Shikari> j-b: really, for someone who didn't follow instructions and show up?
2010-11-23 12:19:20 < Dark_Shikari> I guess I can hope they do after.
2010-11-23 12:19:44 < j-b> Dark_Shikari: then reject it
2010-11-23 12:19:53 < Dark_Shikari> well, they might be waiting for acceptance
2010-11-23 12:19:55 < Dark_Shikari> and then show up
2010-11-23 12:19:59 < Dark_Shikari> so whatever I'll accept can't hurt
2010-11-23 12:20:06 < j-b> melange is awful
2010-11-23 12:22:47 < Alex_W> Dark_Shikari: wipple's patch seems like it should be fine to me, it could be interesting to be able to disable dupes in ordinary encodes anyway
2010-11-23 12:23:37 < Dark_Shikari> dunno
2010-11-23 12:24:07 < Dark_Shikari> we could add a b_bluray mode to enable stupid blu-ray hacks
2010-11-23 12:24:07 < JEEB> is it explicitly dupes that all of those players have/had problems with?
2010-11-23 12:24:16 < Dark_Shikari> there's only one chipset with a bug
2010-11-23 12:24:17 < Dark_Shikari> mediatek
2010-11-23 12:24:20 < Dark_Shikari> and it's fixed in a firmware update
2010-11-23 12:24:46 < Alex_W> b_bluray would be the same as --device bluray right?
2010-11-23 12:25:41 < Dark_Shikari> no
2010-11-23 12:25:55 < Dark_Shikari> it'd be for stupid shit like the max frame size limitation we stuck in
2010-11-23 12:26:00 < Dark_Shikari> it's currently under if( hrd && level 4.1)
2010-11-23 12:26:22 < Alex_W> what frame size limitation is that?
2010-11-23 12:26:37 < Dark_Shikari> mincr
2010-11-23 12:26:41 < Dark_Shikari>         /* Blu-ray requires this */
2010-11-23 12:26:41 < Dark_Shikari>         if( l->level_idc == 41 && h->param.i_nal_hrd )
2010-11-23 12:26:41 < Dark_Shikari>             mincr = 4;
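A minimal sketch of why a larger mincr (minimum compression ratio) means a smaller maximum frame size. This is a simplification, not x264's actual rate control code: the helper name `max_frame_bits` and its parameters are hypothetical, and the real constraint in the H.264 spec also depends on the level's macroblock rate and on frame timing. The only idea shown is "raw size of the picture divided by mincr".

```c
#include <assert.h>
#include <stdint.h>

/* Raw size of one 4:2:0 macroblock in samples: 16x16 luma plus two 8x8 chroma blocks. */
enum { SAMPLES_PER_MB = 16 * 16 + 2 * 8 * 8 };  /* 384 */

/* Hypothetical helper: a simplified upper bound on the coded size of one frame,
 * in bits, given the frame size in macroblocks and a minimum compression ratio.
 * Assumption: the bound is just (raw picture bits) / mincr. */
static int64_t max_frame_bits(int mb_width, int mb_height, int bit_depth, int mincr)
{
    assert(mincr > 0);
    int64_t raw_bits = (int64_t)SAMPLES_PER_MB * bit_depth * mb_width * mb_height;
    return raw_bits / mincr;
}
```

For a 1920x1088 frame (120x68 macroblocks) at 8 bits, `max_frame_bits(120, 68, 8, 4)` gives half of what `max_frame_bits(120, 68, 8, 2)` gives, so forcing mincr to 4 halves the largest allowed frame under this simplified bound.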
2010-11-23 12:26:58 < Kovensky> 09:13.37 Dark_Shikari: I don't like the idea of actively supporting broken players <-- shifting the blame? :)
2010-11-23 12:27:14 < JEEB> So that problem was with dupes? I still don't get it, I get it's a single chipset but was the problem explicitly with dupes regarding weightp?
2010-11-23 12:27:15 < Dark_Shikari> it's my fault that mediatek fucked up?
2010-11-23 12:27:18 < Dark_Shikari> yes
2010-11-23 12:27:21 < Dark_Shikari> it's a single chipset
2010-11-23 12:27:22 < JEEB> ok
2010-11-23 12:27:25 < Dark_Shikari> which has ALREADY BEEN FIXED
2010-11-23 12:27:29 < JEEB> :)
2010-11-23 12:27:29 < Dark_Shikari> and which people need to stop bitching about
2010-11-23 12:27:34 < JEEB> Indeed
2010-11-23 12:28:00 < Alex_W> it's only fixed if people actually bother to update their firmware though :/
2010-11-23 12:28:10 < JEEB> I wish you could update the firmware through blu-ray discs
2010-11-23 12:28:23 < JEEB> "To watch this movie you need to update your player bleh bleh bleh"
2010-11-23 12:28:41 < Kovensky> "And if the power fail or otherwise stuff goes wrong you'll have yourself a heavy brick"
2010-11-23 12:28:42 < kierank> JEEB: well that implies they have a decent internet connection at hand
2010-11-23 12:28:43 < Dark_Shikari> Alex_W: discs already come with instructions on how to upgrade firmware
2010-11-23 12:28:45 < Kovensky> +s
2010-11-23 12:28:52 < Alex_W> ok, so apart from the mincr thing and dupes are there any other stupid hacks that blu-ray requires?
2010-11-23 12:29:10 < Dark_Shikari> not that aren't encapsulated in other options
2010-11-23 12:29:20 < JEEB> kierank, I mean having the updates for those crappy pieces of shit on the disc... :| But I guess that's impossible.
2010-11-23 12:29:41 < Kovensky> Dark_Shikari: the opengop one? I mean, the bluray suboption is specifically a bluray hack, isn't it
2010-11-23 12:29:46 < Dark_Shikari> oh, that's true.
2010-11-23 12:29:59 < Alex_W> ok so that too then
2010-11-23 12:30:14 < Dark_Shikari> ok, proposal
2010-11-23 12:30:22 < Dark_Shikari> replace weightp 1 with weightp without dupes.
2010-11-23 12:30:32 < Dark_Shikari> that is, 1 becomes SMART minus dupes
2010-11-23 12:30:38 < Dark_Shikari> instead of SMART minus analysis
2010-11-23 12:30:45 < JEEB> sounds good 'nuff
2010-11-23 12:30:46 < Dark_Shikari> I will commit that patch
2010-11-23 12:31:25 < Alex_W> ok that sounds reasonable
2010-11-23 12:31:42 < Dark_Shikari> this is in part because analysis is really faster than we expected it to be
2010-11-23 12:31:50 < Dark_Shikari> I'll have to retune the presets but that's not too hard
2010-11-23 12:31:53 < Dark_Shikari> just give me a patch and I'll do the rest
2010-11-23 12:32:18  * Alex_W goes to look at the code
2010-11-23 12:32:38 < Dark_Shikari> NB: grep for all places where we check against WEIGHTP_SMART
2010-11-23 12:32:43 < Dark_Shikari> most of those, we'll have to just check for weightp
2010-11-23 12:32:49 < Dark_Shikari> also note WEIGHTP_FAKE as an internal value
2010-11-23 12:32:53 < Dark_Shikari> (for psy + weightp = 0)
2010-11-23 12:33:08 < Dark_Shikari> and we can replace BLIND with WEIGHTP_SIMPLE or something
2010-11-23 12:37:49 < Alex_W> so WEIGHTP_SIMPLE would be weightp without dupes and WEIGHTP_SMART would be weightp + dupes?
2010-11-23 12:38:13 < pengvado> Dark_Shikari: you got your explanation of cbr_decay?
2010-11-23 12:38:14 < pengvado> we don't print rate_factor in cbr mode, because it would be the rate_factor of just the last few seconds of the movie
2010-11-23 12:39:05 < Dark_Shikari> what about ABR with some VBV?
2010-11-23 12:39:31 < Dark_Shikari> I guess given how cbr_decay works, it wouldn't work.
2010-11-23 12:39:34 < Dark_Shikari> you'd have to somehow recalculate it.
2010-11-23 12:39:39 < Dark_Shikari> sorta makes sense.
2010-11-23 12:40:41 < pengvado> the soft threshold of "cbr mode is when vbv-maxrate < 1.5*bitrate" is not at all tested, but yes
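The soft threshold pengvado quotes can be written as a one-line predicate. This is only a sketch of the rule as stated in the chat; the function name `is_cbr_mode` and the kbps units are assumptions, and this is not copied from x264's rate control.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate for the rule quoted above:
 * "cbr mode is when vbv-maxrate < 1.5*bitrate". */
static bool is_cbr_mode(int vbv_maxrate_kbps, int bitrate_kbps)
{
    assert(bitrate_kbps > 0);
    return vbv_maxrate_kbps < 1.5 * bitrate_kbps;
}
```

With a 1000 kbps target, a vbv-maxrate of 1499 would count as CBR under this rule, while 1500 would not.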
2010-11-23 12:46:48 < Alex_W> why does WEIGHTP_SMART add 2 dupes atm?
2010-11-23 12:47:10 < Dark_Shikari> the normal blind dupe, plus the smart dupe
2010-11-23 12:47:23 < Dark_Shikari> that is, there are three copies of a given frame:
2010-11-23 12:47:28 < Dark_Shikari> OPTIMAL (optimal weight)
2010-11-23 12:47:32 < Dark_Shikari> OPTIMAL-1 (blind dupe of optimal)
2010-11-23 12:47:35 < Dark_Shikari> ORIGINAL (original frame)
2010-11-23 12:47:43 < Dark_Shikari> if OPTIMAL and ORIGINAL are the same, ORIGINAL is omitted.
2010-11-23 12:47:46 < Dark_Shikari> i.e. if there's no weight
2010-11-23 12:47:58 < Dark_Shikari> having both optimal and original is very useful if part of the frame isn't fading in
2010-11-23 12:48:04 < Dark_Shikari> or is otherwise useful unweighted
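The three copies described above can be sketched as a small function that lists which copies of a reference frame exist. The enum and function names are hypothetical and this is not x264's reference list code; the ordering after the first entry is also an assumption. It only encodes the rule from the chat: OPTIMAL and its blind dupe are always present, and ORIGINAL is added only when a weight is actually in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the copies of one reference frame. */
typedef enum { COPY_OPTIMAL, COPY_OPTIMAL_MINUS_1, COPY_ORIGINAL } copy_kind;

/* Fills out[] with the copies that would exist and returns how many there are.
 * has_weight is false when OPTIMAL and ORIGINAL would be identical. */
static int list_frame_copies(bool has_weight, copy_kind out[3])
{
    assert(out != NULL);
    int n = 0;
    out[n++] = COPY_OPTIMAL;          /* frame with the optimal weight */
    out[n++] = COPY_OPTIMAL_MINUS_1;  /* blind dupe of the optimal copy */
    if (has_weight)
        out[n++] = COPY_ORIGINAL;     /* unweighted copy, omitted if there is no weight */
    return n;
}
```

Under these assumptions a weighted frame yields three entries (two of them dupes) and an unweighted frame yields two.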
2010-11-23 12:49:01 < Alex_W> well i'll have to change that so that optimal replaces original for the new weightp 1 right?
2010-11-23 12:49:13 < Dark_Shikari> no, ORIGINAL is a dupe
2010-11-23 12:49:16 < Dark_Shikari> optimal already does replace original
2010-11-23 12:49:18 < Dark_Shikari> i.e. ref0 is optimal
2010-11-23 12:49:23 < Dark_Shikari> all you have to do is _stop_ the creation of dupes
2010-11-23 12:49:26 < Alex_W> ah
2010-11-23 12:49:29 < Dark_Shikari> there's no new code to be added
2010-11-23 12:51:28 < Alex_W> lines 242 and 243 in common/macroblock.c can be removed then right since WEIGHTP_SIMPLE won't add any dupes?
2010-11-23 12:51:54 < Dark_Shikari> yes
2010-11-23 12:52:00 < Alex_W> k
2010-11-23 12:52:20 < Dark_Shikari> also we could just rename them
2010-11-23 12:52:22 < Dark_Shikari> to 0, 1, 2, 3
2010-11-23 12:52:24 < Dark_Shikari> 1 == fast
2010-11-23 12:52:25 < Dark_Shikari> 2 == medium
2010-11-23 12:52:28 < Dark_Shikari> 3 == slow (I'll do this)
2010-11-23 12:52:35 < Dark_Shikari> anyways, for now, just do your part and I'll futz with the rest later
2010-11-23 12:52:41 < Dark_Shikari> brb sleep
2010-11-23 12:53:16 < Alex_W> what about > 8 bit, do we want dupes in WEIGHTP_SMART?
2010-11-23 12:54:22 < Dark_Shikari> not the -1 dupe
2010-11-23 12:54:25 < Dark_Shikari> yes the original vs optimal
2010-11-23 12:54:45 < Alex_W> k
2010-11-23 15:27:56  * koda|work pings Dark_Shikari
2010-11-23 17:39:47 < reid_> Is this the channel for Google Code-In?
2010-11-23 17:41:16 < kierank> yes
2010-11-23 17:41:31 < kierank> reid_: yes
2010-11-23 17:42:48 < reid_> I am looking at completing one of the 'C to assembly function' problems and it said to come here first
2010-11-23 17:45:31 < kierank> reid_: how much assembly do you know?
2010-11-23 17:48:58 < reid_> I know how the language works, but I would have to look up most commands. I wanted to look at the code before making a decision to see if it was out of my skill level.
2010-11-23 17:49:49 < reid_> Is there a link to the specific methods in question?
2010-11-23 17:51:31 < kierank> reid_: Dark_Shikari will teach you when he wakes up
2010-11-23 17:52:06 < callahan> reid_: Basically they are the 10 bit functions that are slow when you run the checkasm program.
2010-11-23 17:52:22 < callahan> Although I couldn't tell you which ones other people are working on.
2010-11-23 17:52:57 < callahan> Or that it matters.  Step 1 would be to get the code and compile it for 10 bit, then run checkasm.
2010-11-23 17:53:08 < callahan> While you wait for D_S.
2010-11-23 17:54:43 < reid_> where is the code? There is no reference to it on the task page.
2010-11-23 17:55:17 < callahan> git clone git://git.videolan.org/x264.git
2010-11-23 17:55:56 < callahan> Will check you out a copy, assuming that you have the git program installed.
2010-11-23 17:56:10 < callahan> If not start by googling up and installing git.
2010-11-23 17:56:27 < jarod> what is the main x264 google code in website?
2010-11-23 17:57:07 < reid_> is it an x86 instruction set?
2010-11-23 17:57:30 < callahan> git is a revision control system
2010-11-23 17:57:37 < callahan> for managing code bases
2010-11-23 17:59:04 < jarod> reid_ what link did you use to get here? :P
2010-11-23 18:01:07 < reid_> From the GCI task list on google-melange.com, it said it is required to come here before claiming the task.
2010-11-23 18:01:35 < kierank> reid_: what OS are you on?
2010-11-23 18:01:41 < jarod> yes that's good, i just couldn't find it :)
2010-11-23 18:03:30 < reid_> Right now i'm on an Ubuntu variant.
2010-11-23 18:05:13 < kierank> reid_: then download the git package, open a terminal and run the command callahan posted
2010-11-23 18:08:27 < reid_> Ok it completed.
2010-11-23 18:09:56 < callahan> the source will be in the x264 directory
2010-11-23 18:10:18 < holger_> and the asm will be in x264/common/x86
2010-11-23 18:11:10 < callahan> I'm not sure what the configure command is to get checkasm and 10 bit going.  Anyone else?
2010-11-23 18:12:14 < holger_> ./configure --bit-depth=10 ; make ; make checkasm
2010-11-23 18:12:32 < callahan> Yeah that, except use && instead of ;
2010-11-23 18:12:39 < callahan> in case one fails :)
2010-11-23 18:12:46 < jarod> \doc
2010-11-23 18:12:47 < jarod> regression_test.txt
2010-11-23 18:12:52 < jarod> svn co svn://svn.videolan.org/x264/trunk x264
2010-11-23 18:12:54 < jarod> =)
2010-11-23 18:14:49 < callahan> reid_: anyway, do the holger_ commands to compile it.
2010-11-23 18:14:53 < holger_> i'm currently wondering which of the .asm files would be the least scary for a noob to look at. can't decide. they all seem to have parts that are going to be way over the head for someone who's pretty new to asm.
2010-11-23 18:15:36 < reid_> so what exactly am I doing with this code?
2010-11-23 18:15:52 < callahan> Shikari has been recommending people try either one of two things.  Pick a missing 10 bit asm function and implement it, or make random changes to an existing function and see if it's faster.
2010-11-23 18:16:10 < reid_> Am I just translating a few functions into c?
2010-11-23 18:16:19 < callahan> reid_: You're making it faster, either by implementing a missing function or making one of the existing ones faster.
2010-11-23 18:17:03 < holger_> erm. you're going to translate c into asm. or optimize existing asm.
2010-11-23 18:17:34 < callahan> reid_: Best would be to get it to compile and poke around while you wait for Dark_Shikari to get back.
2010-11-23 18:17:43 < irock> personally I thought sad was easiest to understand and write; it was the first code I implemented for high depth
2010-11-23 18:18:49 < callahan> get checkasm running first, that gives you a profile of the existing asm functions.
2010-11-23 18:20:56 < holger_> configure for 10 bit, build, run ./checkasm --bench. configure for 8 bit, do the same. compare the output. the differences tell you what's missing.
2010-11-23 18:22:22 < reid_> so checkasm will tell you how efficient your code is?
2010-11-23 18:22:53 < callahan> A good estimate, yes
2010-11-23 18:22:56 < holger_> it measures runtime. for the c reference and the asm routines (often there is more than one, for different instruction sets)
2010-11-23 18:23:39 < holger_> so pick a routine that exists for 8 bit but not 10 bit, look it up in the source and see if you can understand how it works.
2010-11-23 18:25:08 < reid_> where can I get checkasm. Do I need a specific repository?
2010-11-23 18:25:23 < holger_> you already have it. it's in the tools directory. make checkasm builds it.
2010-11-23 18:25:54 < callahan> It's an x264 specific regression test.
2010-11-23 18:26:20 < reid_> Oh wow sorry, I thought it was a tool.
2010-11-23 18:26:27 < holger_> yup that too. if you work on optimizing asm, it's good to have a way of telling if you broke it.
2010-11-23 18:29:13 < reid_> Am I supposed to get tons of errors?
2010-11-23 18:29:16 < callahan> no
2010-11-23 18:29:17 < irock> no
2010-11-23 18:29:21 < holger_> if you get stuck trying to understand the asm, dark_shikari is probably happy to give you his crash course to x264 asm.
2010-11-23 18:29:55 < irock> reid_: make sure to do a make clean before/after you reconfigure for another bit depth.
2010-11-23 18:30:30 < reid_> I pointed my terminal over to /x264/tools and ran 'make checkasm'
2010-11-23 18:30:30 < holger_> oh and you want to install yasm if you don't have it already
2010-11-23 18:30:45 < irock> you should be standing in x264 root
2010-11-23 18:30:45 < callahan> nah, make checkasm from the x264 directory
2010-11-23 18:31:00 < callahan> also make sure you ran the configure command listed above first
2010-11-23 18:31:17 < callahan> ./configure && make && make checkasm
2010-11-23 18:31:21 < callahan> will get you the 8 bit code
2010-11-23 18:37:28 < reid_> Ok I got it. did that just compile the whole project?
2010-11-23 18:37:34 < irock> yep
2010-11-23 18:38:10 < irock> now you have two binaries, x264 and checkasm
2010-11-23 18:38:28 < reid_> This is true.
2010-11-23 18:38:30 < irock> if you run ./checkasm --bench the aforementioned benchmark will be created
2010-11-23 18:39:03 < irock> tip: pipe it to a file for review
2010-11-23 18:42:55 < reid_> It gave me a big long list of what I assume to be the ASM method name followed by a number representing how long it took?
2010-11-23 18:43:04 < irock> correct
2010-11-23 18:43:30 < irock> the measured number of cycles it took to be exact iirc
2010-11-23 18:44:01 < irock> as you can see, the mmx/sse2/ssse3/... are faster than their C equivalents
2010-11-23 18:45:15 < irock> now, if you run `make clean && ./configure --bit-depth=10 && make && make checkasm` you end up with two new binaries
2010-11-23 18:46:51 < reid_> Ok it's running.
2010-11-23 18:47:17 < reid_> So what are you looking for to consider the task completed?
2010-11-23 18:48:24 < irock> if you compare the output from 8-bit (first run) checkasm and the output of 10-bit (second run) checkasm you'll see that some functions are missing in 10-bit mode
2010-11-23 18:48:49 < irock> quite a lot of asm optimized functions actually
2010-11-23 18:49:24 < irock> so what we're looking for is you creating one function that doesn't exist yet for 10-bit
2010-11-23 18:50:00 < irock> OR rewrite an existing one for either 8-bit or 10-bit to make it faster
2010-11-23 18:50:00 < reid_> A method that exists in c but not in asm, for 10 bit mode?
2010-11-23 18:50:06 < irock> yes
2010-11-23 18:51:30 < irock> but, rewriting a 8-bit asm function is probably very hard, 10-bit version maybe not so (I wrote that, and I started learning asm in july)
2010-11-23 18:52:13 < reid_> So once I complete my method do I upload the source file containing the method I added/optimized and tell you which one it is?
2010-11-23 18:52:56 < irock> well, I think we could help you get started if you choose one and let us know to begin with
2010-11-23 18:54:12 < irock> however, I suggest that you wait an hour or two for Dark_Shikari to wake up. he'll teach you the basics of x264 asm
2010-11-23 18:54:55 < reid_> Ok I'll monitor the channel.
2010-11-23 18:56:04 < reid_> I just requested the task, thanks for all the help guys.
2010-11-23 19:03:16 < lnandor> hello everyone!
2010-11-23 19:45:44 < rfw> i think i'm done
2010-11-23 19:45:47 < rfw> who do i give this to
2010-11-23 19:49:00 < reid_> Is anyone from VideoLAN on here?
2010-11-23 19:50:19 < BugMaster> reid_: try ping j-b
2010-11-23 19:50:49 < dj_tjerk> pastebin for review?
2010-11-23 19:50:58 < reid_> ping j-b
2010-11-23 19:51:46 < reid_> Who is in charge of accepting a claim request?
2010-11-23 19:52:06 < Jumpyshoes> Dark_Shikari as far as i'm aware
2010-11-23 19:52:17 < reid_> Is he up yet?
2010-11-23 19:54:22 < reid_> Jumpyshoes, are you part of the videoLAN team?
2010-11-23 19:54:25 < nattofriends> does Dark_Shikari sleep?
2010-11-23 19:54:35 < Jumpyshoes> reid_: no
2010-11-23 19:55:00 < rfw> sure is GCI in here
2010-11-23 19:55:30 < Jumpyshoes> i actually need to talk to Dark_Shikari myself
2010-11-23 19:55:34 < reid_> I think it's pretty much only GCI in here.
2010-11-23 19:55:34 < Jumpyshoes> since i'm failing at this asm thing
2010-11-23 19:55:40 < rfw> <-- GCI
2010-11-23 19:55:43 < rfw> so is Jumpyshoes
2010-11-23 19:55:44 < rfw> lol
2010-11-23 19:55:58 < reid_> jumpyshoes, yea me too.
2010-11-23 19:56:09 < mferrell> are you guys doing the code-in thing?
2010-11-23 19:56:13 < rfw> yeah
2010-11-23 19:56:37 < reid_> Jumpyshoes, are you writing the 10 bit asm functions?
2010-11-23 19:56:47 < Jumpyshoes> yes
2010-11-23 19:56:49 < Jumpyshoes> i've gotten a few macros down
2010-11-23 19:56:58 < rfw> this makes me glad i didn't choose to do the ASM thing
2010-11-23 19:57:20 < Jumpyshoes> it's fucking with my mind
2010-11-23 19:57:42 < reid_> I havent found any that are written only in c and not asm.
2010-11-23 19:58:12 < irock> Jumpyshoes: what function are you working on?
2010-11-23 19:58:31 < Jumpyshoes> add4x4_idct or w/e
2010-11-23 19:58:41 < Jumpyshoes> add4x4_idct, yea
2010-11-23 19:58:51 < Jumpyshoes> which comes along with 15 different macros
2010-11-23 19:59:03 < reid_> where are you looking?
2010-11-23 19:59:18 < Jumpyshoes> what do you mean by that?
2010-11-23 19:59:43 < reid_> are you using checkasm?
2010-11-23 20:00:00 < Jumpyshoes> yea
2010-11-23 20:00:33 < irock> Jumpyshoes: do you have a paste of your current work?
2010-11-23 20:00:49 < reid_> are you looking through the list it dumps out?
2010-11-23 20:00:56 < Jumpyshoes> my current work spans across about 5 files, but i have the current ASM function i'm working on
2010-11-23 20:01:07 < Jumpyshoes> i can also point out what the problem is (probably)
2010-11-23 20:01:26 < irock> Jumpyshoes: or run git diff
2010-11-23 20:01:43 < Jumpyshoes> currently, the code is a mess though, since i've been mucking around with it
2010-11-23 20:01:54 < irock> ok, I remember that :) np
2010-11-23 20:02:14 < Jumpyshoes> hrm?
2010-11-23 20:02:54 < irock> I remember how it was when I wrote my first asm this summer
2010-11-23 20:03:10 < Jumpyshoes> oh
2010-11-23 20:03:14 < Jumpyshoes> it's terrible
2010-11-23 20:03:32 < Jumpyshoes> i'm not actually sure what something in the original function is doing
2010-11-23 20:03:39 < Jumpyshoes> so i was planning on asking someone about it
2010-11-23 20:03:44 < irock> good idea
2010-11-23 20:06:25 < irock> well, starting to ask is good to begin with
2010-11-23 20:06:45 < Jumpyshoes> i was planning on waiting till D_S got back
2010-11-23 20:07:21 < irock> alright
2010-11-23 20:07:24 < Jumpyshoes> unless you feel like helping me
2010-11-23 20:07:53 < rfw> Jumpyshoes: what's your task rated
2010-11-23 20:08:06 < Jumpyshoes> not sure, whatever the code a function in assembly is
2010-11-23 20:08:15 < irock> Jumpyshoes: I might be able to help, but I need a question first
2010-11-23 20:08:21 < Jumpyshoes> google's awesome open source tool to list projects crashes my browser, so i can't tell
2010-11-23 20:08:22 < rfw> i would check but the list tasks page takes fucking forever to load
2010-11-23 20:08:27 < rfw> haha
2010-11-23 20:08:30 < reid_> difficult
2010-11-23 20:08:34 < rfw> ah
2010-11-23 20:08:44 < kierank> yes melange is completely useless
2010-11-23 20:08:50 < reid_> yes the list crashes all my browsers.
2010-11-23 20:08:58 < irock> yep, the lists particularly
2010-11-23 20:08:59 < Jumpyshoes> irock: http://pastebin.com/jyhm0z4w reid_
2010-11-23 20:09:04 < Jumpyshoes> oops, fail tab completion
2010-11-23 20:09:13 < rfw> i think i'm going to go for plone next
2010-11-23 20:09:18 < Jumpyshoes> FDEC_STRIDE = 32
2010-11-23 20:09:27 < Jumpyshoes> i don't understand what the STORE_DIFF  m0, m4, m7, [r0+0*FDEC_STRIDE] is doing
2010-11-23 20:09:30 < Jumpyshoes> (etc.)
2010-11-23 20:09:35 < Jumpyshoes> since it's moving up by 32 each time
2010-11-23 20:09:54 < Jumpyshoes> and that's a lot of bytes
2010-11-23 20:11:53 < irock> actually you need to double FDEC_STRIDE for high depth
2010-11-23 20:12:18 < irock> check sub4x4_dct_mmx e.g
2010-11-23 20:12:25 < Jumpyshoes> yea, but that's secondary
2010-11-23 20:13:06 < reid_> Jumpyshoes: has your request been accepted yet?
2010-11-23 20:13:07 < Jumpyshoes> doesn't that go beyond a 4x4 array of bytes
2010-11-23 20:13:19 < Jumpyshoes> reid_: nope
2010-11-23 20:13:42 < Jumpyshoes> i mean, i didn't file for one
2010-11-23 20:13:44 < Jumpyshoes> but D_S knows
2010-11-23 20:14:19 < irock> well, it's not a 4x4 array of bytes in high depth, it's a 4x4 array of uint16_t.
2010-11-23 20:14:32 < Jumpyshoes> i mean, i'm just talking about 8 bit for now
2010-11-23 20:14:46 < irock> ah, ok
2010-11-23 20:15:02 < Jumpyshoes> doesn't 32 go over the 1x4 row?
2010-11-23 20:15:36 < rfw> oh god
2010-11-23 20:15:39 < rfw> melange is broken
2010-11-23 20:15:43 < rfw> Time to complete:	 6 mins
2010-11-23 20:15:54 < rfw> I HAVE 6 MINUTES TO COMPLETE THIS 7 DAY TASK I STARTED YESTERDAY
2010-11-23 20:15:59 < Jumpyshoes> LOL
2010-11-23 20:16:12 < rfw> Time to complete:	 59 mins
2010-11-23 20:16:16 < rfw> slightly better!
2010-11-23 20:16:20 < reid_> to be fair, melange never really worked to begin with.
2010-11-23 20:16:28 < Jumpyshoes> interesting how they use crappy open source software to promote open source
2010-11-23 20:17:32 < rfw> i think i should join #melange
2010-11-23 20:17:37 < irock> Jumpyshoes: take a look at the C code
2010-11-23 20:17:41 < rfw> and ask why the fuck i have 58 minutes left
2010-11-23 20:17:50 < reid_> rf
2010-11-23 20:18:00 < Jumpyshoes> yea, what about it?
2010-11-23 20:18:04 < irock> we're writing 4 pixels first, then we add the stride, write 4 more pixels, and so on
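What irock describes can be sketched in plain C for the 8-bit case: write 4 pixels, step the pointer by the stride, and repeat for 4 rows. The function name `fill_4x4` is hypothetical and this is not the x264 C reference; `FDEC_STRIDE` uses the value 32 quoted earlier in the chat. The point is that a 4x4 block lives inside a wider buffer, so each row starts `FDEC_STRIDE` bytes after the previous one, not 4 bytes after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDEC_STRIDE 32  /* value quoted in the chat for the 8-bit case */

/* Hypothetical helper: set every pixel of a 4x4 block to the same value.
 * dst points at the top-left pixel of the block inside a buffer whose rows
 * are FDEC_STRIDE bytes apart. */
static void fill_4x4(uint8_t *dst, uint8_t value)
{
    assert(dst != NULL);
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = value;      /* write the 4 pixels of this row */
        dst += FDEC_STRIDE;      /* jump to the same column in the next row */
    }
}
```

Only bytes 0..3 of each of the four rows are touched; the other 28 bytes per row belong to neighbouring data in the buffer and are left alone.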
2010-11-23 20:18:33 < Jumpyshoes> OH MY GOD
2010-11-23 20:18:37 < BugMaster> Dark_Shikari: http://privatepaste.com/8b80558011
2010-11-23 20:18:37 < BugMaster> For information. It fixes a non-critical difference between gcc 4.4.5 and gcc 4.5.2, which was due to different processing of division by zero (early termination doesn't work in gcc 4.5.2 with division by zero).
2010-11-23 20:18:37 < BugMaster> Removed one weight_cache which was really needed only due to the absence of luma weights initialization (which can be set in lookahead)
2010-11-23 20:18:40 < Jumpyshoes> i've had this file open for AGES
2010-11-23 20:18:49 < Jumpyshoes> and i haven't noticed that
2010-11-23 20:18:51 < Jumpyshoes> thank you irock
2010-11-23 20:19:13 < irock> Jumpyshoes: just ask :)
2010-11-23 20:19:15 < tjoener> Jumpyshoes: Microsoft used Windows to promote themselves
2010-11-23 20:19:26 < tjoener> just use crappy software and youre rich :)
2010-11-23 20:19:46 < Jumpyshoes> i'm on windows 7 and i like it
2010-11-23 20:19:52 < Jumpyshoes> DON'T BE HATIN'
2010-11-23 20:19:55  * JEEBsv high-fives BugMaster 
2010-11-23 20:20:02 < tjoener> Ive got win7too
2010-11-23 20:20:03 < rfw> Jumpyshoes: me too, but not the liking part
2010-11-23 20:20:05 < JEEBsv> great, so I guess that problem is mostly behind us the?
2010-11-23 20:20:06 < JEEBsv> *then
2010-11-23 20:20:14 < tjoener> I meant in the old days
2010-11-23 20:20:56 < reid_> can we port c methods from 8bit to 10 bit or do they have to be the asm ones?
2010-11-23 20:21:08 < Jumpyshoes> ask Dark_Shikari
2010-11-23 20:21:21 < irock> reid_: the C functions are already working for high depth (10-bit)
2010-11-23 20:21:24 < reid_> where is he?
2010-11-23 20:21:32 < reid_> is he still sleeping!
2010-11-23 20:21:35 < Jumpyshoes> he's probably sleeping or something
2010-11-23 20:21:42 < tjoener> sleeping?
2010-11-23 20:21:49 < tjoener> DS does not need sleep!
2010-11-23 20:21:53 < tjoener> He's undead
2010-11-23 20:21:55 < Jumpyshoes> no clue, out with hot girls or something else
2010-11-23 20:21:55 < tjoener> or immortal
2010-11-23 20:21:57 < tjoener> or whatever
2010-11-23 20:22:09 < tjoener> now THAT would be the only reason to get away :)
2010-11-23 20:22:32 < irock> reid_: if you can't wait for a live tutorial, you can check the logs a few days back
2010-11-23 20:22:48 < Jumpyshoes> oh yea, D_S gave me a crash course on asm
2010-11-23 20:22:49 < irock> I think Jumpyshoes and someone else have done the tutorial quite recently
2010-11-23 20:23:02 < irock> check topic for the logs
2010-11-23 20:23:32 < reid_> irock: the c functions are not showing up on checkasm
2010-11-23 20:23:46 < reid_> at least not all of them.
2010-11-23 20:23:51 < irock> reid_: they are only showing up if there's an asm equivalent
2010-11-23 20:23:55 < Jumpyshoes> yea
2010-11-23 20:24:03 < reid_> oh
2010-11-23 20:25:57 < reid_> is there an easy way to tell where the functions are located?
2010-11-23 20:26:21 < irock> ~all x86 asm functions are located in common/x86
2010-11-23 20:26:58 < irock> sometimes it's quite easy to guess from the name, otherwise you could ask or grep
2010-11-23 20:27:11 < reid_> I know, I was wondering if there was a way so i dont have to go hunting for them.
2010-11-23 20:27:30 < reid_> I'm about to write a java app to do it for me.
2010-11-23 20:28:05 < irock> that's overkill
2010-11-23 20:28:37 < reid_> yea, but it will take my mind off of asm.
2010-11-23 20:29:50 < kierank> ctags?
2010-11-23 20:30:36 < Gramner> >java
2010-11-23 20:30:41 < Gramner> blasphemy
2010-11-23 20:32:06 < Dark_Shikari> reid_: /me is now here
2010-11-23 20:33:06 < Dark_Shikari> rfw: keep in mind that if melange sucks and randomly steals all your time, I'm happy to re-issue the tasks.
2010-11-23 20:33:13 < rfw> :D
2010-11-23 20:33:24 < rfw> apparently they know about it, though
2010-11-23 20:33:44 < Dark_Shikari> BugMaster: applied
2010-11-23 20:34:09 < rfw> Dark_Shikari: so do i have to wait for pengvado to look at my script?
2010-11-23 20:34:19 < Dark_Shikari> When you think you're done, post a git format-patch.
2010-11-23 20:34:32 < Dark_Shikari> And then the other devs here can try it out!
2010-11-23 20:34:44 < rfw> put it in extra?
2010-11-23 20:35:08 < Dark_Shikari> tools/
2010-11-23 20:35:10 < Dark_Shikari> like checkasm
2010-11-23 20:35:34 < reid_> are the 10 bit methods compiled when HIGH_BIT_DEPTH is defined and 8 bit methods compiled when it is not?
2010-11-23 20:35:48 < Dark_Shikari> Yup
2010-11-23 20:36:01 < Dark_Shikari> The primary difference between 8-bit and 10-bit, for the asm:
2010-11-23 20:36:03 < Dark_Shikari> for 8-bit
2010-11-23 20:36:13 < Dark_Shikari> pixels are uint8_t, and dct coeffs are int16_t
2010-11-23 20:36:15 < Dark_Shikari> for 10-bit
2010-11-23 20:36:21 < Dark_Shikari> pixels are uint16_t, and dct coeffs are int32_t
2010-11-23 20:36:27 < Dark_Shikari> (of course, the pixels can't exceed 1023 in that case)
2010-11-23 20:36:51 < Dark_Shikari> Some functions may need more internal precision than they did with 8-bit.
2010-11-23 20:36:53 < Dark_Shikari> Others might not.
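The type differences listed above can be sketched as a pair of typedefs selected at compile time. This is an illustration under assumptions, not a copy of x264's headers: the names `pixel`, `dctcoef`, `BIT_DEPTH`, `PIXEL_MAX` and `clip_pixel` are chosen here for the example, and the switch is keyed on whether `HIGH_BIT_DEPTH` is defined, as described in the chat.

```c
#include <assert.h>
#include <stdint.h>

#ifdef HIGH_BIT_DEPTH
typedef uint16_t pixel;    /* 10-bit pixels stored in 16 bits */
typedef int32_t  dctcoef;  /* wider DCT coefficients */
#define BIT_DEPTH 10
#else
typedef uint8_t  pixel;    /* 8-bit pixels */
typedef int16_t  dctcoef;
#define BIT_DEPTH 8
#endif

/* Largest legal pixel value: 255 for 8-bit, 1023 for 10-bit. */
#define PIXEL_MAX ((1 << BIT_DEPTH) - 1)

/* In both configurations a coefficient is twice as wide as a pixel. */
static_assert(sizeof(dctcoef) == 2 * sizeof(pixel), "dctcoef should be twice the pixel width");

/* Hypothetical helper: clamp an intermediate result to the legal pixel range. */
static pixel clip_pixel(int v)
{
    if (v < 0)
        return 0;
    if (v > PIXEL_MAX)
        return (pixel)PIXEL_MAX;
    return (pixel)v;
}
```

Note that in the 10-bit case the storage type could hold values up to 65535, so the clamp to `PIXEL_MAX` is what keeps pixels from exceeding 1023.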
2010-11-23 20:37:01 < JEEBsv> rfw: what kind of a script did you do?
2010-11-23 20:37:18 < Dark_Shikari> regression tester
2010-11-23 20:37:20 < Dark_Shikari> that does magic
2010-11-23 20:37:25 < reid_> So All I have to do is change the data types and make sure nothing overflows and what not?
2010-11-23 20:38:31 < Dark_Shikari> of course, this involves writing new asm
2010-11-23 20:38:37 < rfw> JEEBsv: python
2010-11-23 20:38:39 < Dark_Shikari> you can't just "change the data types" -- at least not usually
2010-11-23 20:38:47 < Dark_Shikari> of course, sometimes you can
2010-11-23 20:38:48 < rfw> JEEBsv: a whole pile of python... :V
2010-11-23 20:39:01 < Dark_Shikari> for example, in Jumpyshoes' case, he found a function that could be converted from MMX 8-bit to SSE 10-bit with almost no changes
2010-11-23 20:39:09 < Dark_Shikari> as in MMX 8-bit, it worked on 4x16-bit values
2010-11-23 20:39:09 < JEEBsv> rfw: I'll be looking forward to it
2010-11-23 20:39:15 < Dark_Shikari> and in SSE 10-bit, it'll work on 4x32-bit values
2010-11-23 20:39:26 < rfw> JEEBsv: that sounds ominous
2010-11-23 20:39:26 < Dark_Shikari> so the basic structure stayed identical
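The reason the structure can stay identical is simple arithmetic: both the register width and the element width double, so the number of values per register stays at 4. A small sketch, assuming a 64-bit MMX register and a 128-bit XMM register; the helper name `lanes` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { MMX_BITS = 64, XMM_BITS = 128 };  /* register widths in bits */

/* Hypothetical helper: how many elements of a given size fit in one register. */
static int lanes(int register_bits, int element_bytes)
{
    assert(element_bytes > 0 && register_bits % (8 * element_bytes) == 0);
    return register_bits / (8 * element_bytes);
}
```

`lanes(MMX_BITS, sizeof(int16_t))` and `lanes(XMM_BITS, sizeof(int32_t))` both come out to 4, which matches the 4x16-bit and 4x32-bit cases mentioned above.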
2010-11-23 20:40:11 < Jumpyshoes> except i'm failing at life
2010-11-23 20:40:35 < reid_> what are dct's?
2010-11-23 20:40:37 < Dark_Shikari> s/failing at life/not asking questions and posting your code/
2010-11-23 20:40:44 < Jumpyshoes> well
2010-11-23 20:40:49 < Jumpyshoes> i need to go in 10 minutes
2010-11-23 20:40:53 < Jumpyshoes> school being over and all
2010-11-23 20:40:59 < Jumpyshoes> so i will be doing that when i get back home
2010-11-23 20:41:02 < Dark_Shikari>  ok =p
2010-11-23 20:41:13 < Dark_Shikari> reid_: a DCT is a discrete cosine transform.  You don't have to understand how/why it works, just that there's nice pretty C code for it in common/dct.c
2010-11-23 20:41:53 < reid_> so all I have to worry about is pixel depth?
2010-11-23 20:42:38 < Dark_Shikari> and all consequences thereof
2010-11-23 20:42:45 < Dark_Shikari> and learning to write asm in x264
2010-11-23 20:43:19 < reid_> what is x264? what architecture is that?
2010-11-23 20:43:31 < Dark_Shikari> x264 is the program you're planning to work on
2010-11-23 20:43:47 < Dark_Shikari> the architectures to pick from are x86 (MMX/SSE), PPC (Altivec), and ARM (NEON)
2010-11-23 20:44:53 < rfw> fuck
2010-11-23 20:44:54 < rfw> fuuuuuuuuck
2010-11-23 20:45:02 < reid_> all I see is MMX/SSE
2010-11-23 20:45:08 < Dark_Shikari> NEON would be in common/arm
2010-11-23 20:45:10 < rfw> i just deleted
2010-11-23 20:45:12 < rfw> fuuuuuuuuck
2010-11-23 20:45:13 < Dark_Shikari> altivec would be in common/ppc
2010-11-23 20:45:18 < Jumpyshoes> reid_: change directory!
2010-11-23 20:45:22 < Dark_Shikari> rfw: stop.  stop right now.
2010-11-23 20:45:25 < Dark_Shikari> don't do anything else
2010-11-23 20:45:28 < rfw> too late
2010-11-23 20:45:31 < Dark_Shikari> Go get a data recovery tool
2010-11-23 20:45:34 < rfw> oh
2010-11-23 20:45:55 < irock> rfw: what file system are you using?
2010-11-23 20:45:57 < rfw> ntfs
2010-11-23 20:46:00 < Dark_Shikari> By the way, this is why you POST YOUR SHIT
2010-11-23 20:46:10 < irock> ah, then it's probably np
2010-11-23 20:46:15 < rfw> okay, i have testdisk
2010-11-23 20:46:17 < Dark_Shikari> Do not write to any other files
2010-11-23 20:46:24 < reid_> i'm in the root.
2010-11-23 20:46:35 < Jumpyshoes> okay going home
2010-11-23 20:46:40 < Jumpyshoes> i shall bug you with questions later
2010-11-23 20:46:54 < reid_> but it doesn't matter i'm only working on x86
2010-11-23 20:47:02 < Dark_Shikari> rfw: http://www.officerecovery.com/index.htm#freeundelete
2010-11-23 20:47:27 < Dark_Shikari> This fact also dictates the following procedure for using FreeUndelete:
2010-11-23 20:47:27 < Dark_Shikari>    1. Stop any activity on the disk you are going to undelete files from! Remember that writing to that disk can damage the contents of the deleted files. Examples of disastrous activity include: copying files to the disk, installing programs there or running programs that use the disk as their swap media.
2010-11-23 20:47:32 < Dark_Shikari>    2. Download and install FreeUndelete. Whenever possible, save the setup executable and install the program to a disk that does not hold files you need to undelete.
2010-11-23 20:47:36 < Dark_Shikari>    3. Run and use FreeUndelete.
2010-11-23 20:47:36 < Dark_Shikari> I've used it before.
2010-11-23 20:47:47 < Dark_Shikari> Now go grab a flash drive, stick it in, install freeundelete on it, and get your files back
2010-11-23 20:47:50 < Dark_Shikari> and next time, learn to post your shit.
2010-11-23 20:47:53 < rfw> yeah
2010-11-23 20:47:59 < Dark_Shikari> because stuff on the internet can't be deleted.
2010-11-23 20:48:00 < rfw> i'm so silly
2010-11-23 20:48:14 < irock> (or learn to do regular backups)
2010-11-23 20:48:22 < JEEBsv> A true (9) you are, aren't you rfw :3
2010-11-23 20:48:36 < rfw> genius
2010-11-23 20:48:45 < Dark_Shikari> Or just use a revision control system
2010-11-23 20:48:46 < Dark_Shikari> like git
2010-11-23 20:48:48 < Dark_Shikari> which saves your shit
2010-11-23 20:48:50 < JEEBsv> yeah
2010-11-23 20:48:52 < reid_> who backs up hourly?
2010-11-23 20:48:57 < Dark_Shikari> Someone who uses git
2010-11-23 20:48:58 < rfw> i did but
2010-11-23 20:49:01 < irock> reid_: I do
2010-11-23 20:49:06 < JEEBsv> you rm -rf'd the whole folder?
2010-11-23 20:49:08 < rfw> yeah
2010-11-23 20:49:09 < rfw> >_>
2010-11-23 20:49:14 < rfw> need to learn to look
2010-11-23 20:49:15 < rfw> when i type
2010-11-23 20:51:54 < rfw> Dark_Shikari: not there D:
2010-11-23 20:52:04 < rfw> it's not reading the Users folder
2010-11-23 20:52:12 < Dark_Shikari> I highly doubt you've managed to run the entire program on your entire drive already.
2010-11-23 20:52:18 < Dark_Shikari> It is much slower than that.
2010-11-23 20:52:30 < rfw> i scanned C
2010-11-23 20:52:34 < rfw> and it finished
2010-11-23 20:52:45 < Dark_Shikari> Did you check stuff like the unnamed folders and files and such?
2010-11-23 20:52:50 < Dark_Shikari> i.e. the stuff with no metadata?
2010-11-23 20:52:58 < rfw> hm, where?
2010-11-23 20:53:02 < rfw> it only shows 3 folders
2010-11-23 20:53:08 < Dark_Shikari> Open them.
2010-11-23 20:53:11 < rfw> programdat, program files and config.msi
2010-11-23 20:53:40 < Dark_Shikari> Also, don't you still have your editor open?
2010-11-23 20:53:49 < rfw> no
2010-11-23 20:53:55 < rfw> well
2010-11-23 20:53:59 < rfw> i only have test_x264.py open
2010-11-23 20:54:26 < Dark_Shikari> well then you're good aren't you?
2010-11-23 20:55:15 < rfw> no
2010-11-23 20:55:25 < rfw> since all of digress is gone
2010-11-23 20:55:49 < Dark_Shikari> er... you were making a library
2010-11-23 20:55:51 < Dark_Shikari> and you never put it on github?
2010-11-23 20:56:07 < Dark_Shikari> you never copied it into the x264 directory, instead you _moved_ it?
2010-11-23 20:56:10 < Dark_Shikari> leaving no original?
2010-11-23 20:56:24 < Dark_Shikari> what kind of crack are you on?
2010-11-23 20:56:30 < rfw> i don't know
2010-11-23 20:56:32 < rfw> it's 9a,
2010-11-23 20:56:33 < rfw> 9am
2010-11-23 20:56:36 < rfw> i don't think clearly
2010-11-23 20:56:50 < Dark_Shikari> er... this means you've been actively stupid for the past couple months you've been working on this "digress" thing
2010-11-23 20:56:59 < rfw> er no
2010-11-23 20:57:03 < rfw> i just started yesterday
2010-11-23 20:57:08 < rfw> i wrote a library first, then the tests
2010-11-23 20:57:12 < rfw> as part of the whole thing
2010-11-23 20:57:20 < rfw> i don't write a regression testing suite for no reason, you know
2010-11-23 20:57:34 < Dark_Shikari> maybe you can regression-test your brain first?
2010-11-23 20:57:48 < rfw> i probably need to D:
2010-11-23 20:58:13 < Dark_Shikari> check your editor temp files or something
2010-11-23 20:58:18 < rfw> oh
2010-11-23 20:58:20 < rfw> recuva works
2010-11-23 20:58:36 < JEEBsv> 'grats
2010-11-23 20:58:38 < Dark_Shikari> Now, once you recover this
2010-11-23 20:58:40 < Dark_Shikari> make a repository on github
2010-11-23 20:58:43 < Dark_Shikari> and commit your shit.
2010-11-23 20:58:52 < Dark_Shikari> and don't ever do that again.
2010-11-23 20:59:08 < rfw> yes sir
2010-11-23 20:59:13 < tjoener> yeah rfw recuva helped me out a tight spot a few times (read thesis)
2010-11-23 20:59:20 < Dark_Shikari> lol
2010-11-23 20:59:22 < JEEBsv> lol
2010-11-23 20:59:24 < Dark_Shikari> >thesis oh god
2010-11-23 20:59:30 < JEEBsv> it helped me get back some rare'ish files
2010-11-23 20:59:40 < JEEBsv> although I think I wasn't using recuva then, but that other open source app
2010-11-23 21:00:17 < holger_> as Dark_Shikari said: that was pretty lucky. at least you got your stuff back. also note we do not accept the "dog ate my homework" excuse, so you better recover your work from the dog erm disk ;) ;)
2010-11-23 21:00:36 < Dark_Shikari> github ---> your closest friend
2010-11-23 21:00:38 < Dark_Shikari> or gitosis or whatever
2010-11-23 21:00:48 < rfw> >estimated time left 7 hours
2010-11-23 21:01:01 < koda> umh would there be any hope of having this http://forum.doom9.org/showthread.php?t=154606 included in the stable branch?
2010-11-23 21:01:06 < tjoener> I've got a public git repo, if someone wants to post some code for backup I could probably arrange something easy
2010-11-23 21:01:23 < tjoener> a few extra ssh keys or whatnot
2010-11-23 21:01:34 < koda> this patch to be more precise http://pastebin.com/ErL2eAW8
2010-11-23 21:01:56 < reid_> how do we submit our code?
2010-11-23 21:02:08 < JEEBsv> git format-patch
2010-11-23 21:02:12 < Dark_Shikari> you make a local commit
2010-11-23 21:02:14 < Dark_Shikari> you use git format-patch
2010-11-23 21:02:17 < Dark_Shikari> and then we review it!
2010-11-23 21:02:20 < Dark_Shikari> You should do this obvious
2010-11-23 21:02:21 < Dark_Shikari> er, often
2010-11-23 21:02:26 < Dark_Shikari> preferably, before it's even working!
2010-11-23 21:02:34 < Dark_Shikari> Because when things are broken, we may be able to help you figure out what's wrong
2010-11-23 21:02:47 < Dark_Shikari> git format-patch -> pastebin
2010-11-23 21:02:59 < Dark_Shikari> pastebin.com, pastebin.ca, etc
2010-11-23 21:03:28 < reid_> im using gitg
2010-11-23 21:04:25 < rfw> haha yes
2010-11-23 21:04:28 < rfw> got them all back
2010-11-23 21:04:29 < rfw> now
2010-11-23 21:04:31 < rfw> time to not be stupid
2010-11-23 21:04:43 < JEEBsv> time to add, commit and push to something non-local
2010-11-23 21:04:51 < rfw> oh wait
2010-11-23 21:04:55 < rfw> >.pyc
2010-11-23 21:04:59 < rfw> >_>
2010-11-23 21:05:00 < holger_> and it's always a good idea to sync your work directory to another machine, locally or even remote, if you have one. on linux you could set up a cron job running rsync for that
2010-11-23 21:05:10 < rfw> well, that was half-helpful
2010-11-23 21:07:20 < reid_> you want all the source?
2010-11-23 21:07:44  * koda pings Dark_Shikari with http://pastebin.com/ErL2eAW8
2010-11-23 21:08:19 < rfw> yeah
2010-11-23 21:09:00 < kierank> koda: the plan is to get it into the main branch
2010-11-23 21:09:02 < kierank> but it needs some work
2010-11-23 21:09:19 < Dark_Shikari> koda: there's no way that will get into trunk in its current state
2010-11-23 21:09:31 < Dark_Shikari> and really IMO speedcontrol should be outside libx264
2010-11-23 21:09:42 < Dark_Shikari> the only justification for it being _in_ x264 is to stop people from having to implement it themselves
2010-11-23 21:10:03 < koda> Dark_Shikari: it's just so convenient to have it all in one tool
2010-11-23 21:10:05 < Dark_Shikari> rfw: you can decompile python
2010-11-23 21:10:11 < koda> kierank: how can i help in the process?
2010-11-23 21:10:17 < rfw> Dark_Shikari: not very well
2010-11-23 21:10:26 < Dark_Shikari> at least it's not .pyo
2010-11-23 21:10:44 < irock> .pyc even have comments, no?
2010-11-23 21:19:26 < tjoener> I'm off
2010-11-23 21:19:27 < tjoener> taa
2010-11-23 21:21:41 < rfw> YES
2010-11-23 21:21:44 < rfw> THANK YOU UNPYC
2010-11-23 21:22:08 < reid_> what are you using python for?
2010-11-23 21:22:17 < rfw> regression testing thing
2010-11-23 21:23:39 < dj_tjerk> pastebin nao
2010-11-23 21:24:19 < rfw> oh, it failed on my more complex files
2010-11-23 21:24:46 < rfw> sigh
2010-11-23 21:24:51 < rfw> at this rate i'm going to have to withdraw
2010-11-23 21:25:35 < reid_> seriously melange
2010-11-23 21:29:11 < checkers> the worst part about stupid backup stories is nobody learns from them except the protagonist
2010-11-23 21:31:58 < dj_tjerk> rfw > writing python doesn't take that much time does it?
2010-11-23 21:32:26 < checkers> 0/10
2010-11-23 21:32:27 < rfw> i suppose not but
2010-11-23 21:32:34 < rfw> :(
2010-11-23 21:33:57 < checkers> ps use dropbox
2010-11-23 21:34:47 < checkers> also it's probably worth trying another undelete tool
2010-11-23 21:35:08 < checkers> http://ntfsundelete.com/ <-- I generally use this one
2010-11-23 21:36:44 < rfw> everything just froze
2010-11-23 21:36:47 < rfw> argh not having a good day
2010-11-23 21:44:03 < callahan> Sounds like you're learning a lot :)
2010-11-23 22:00:25 < doron> Hi, I registered for your assembly task in the Google Code-in competition. Can you please confirm me and explain what I need to do?
2010-11-23 22:01:00 < JEEBsv> one thing: keep on the channel, devs will talk to you eventually
2010-11-23 22:01:53 < doron> ah thanks
2010-11-23 22:02:04 < reid_> doron:  do you know assembly?
2010-11-23 22:02:16 < rfw> i guess i'm going to have to go with plan B
2010-11-23 22:02:22 < doron> yes
2010-11-23 22:07:29 < Dark_Shikari> rfw: it took you a day
2010-11-23 22:07:33 < Dark_Shikari> surely couldn't take that long to rewrite parts of it =p
2010-11-23 22:07:42 < Dark_Shikari> doron: will get to you in a moment, busy in class atm
2010-11-23 22:08:00 < Dark_Shikari> in the meantime you can check the log in the topic and see above where all previous students asked the same question ;)
2010-11-23 22:15:22 < checkers> back at college?
2010-11-23 22:15:31 < checkers> I thought you had a big break over christmas?
2010-11-23 22:15:43 < Dark_Shikari> is it christmas yet?
2010-11-23 22:15:45 < checkers> oh, you have that during your summer, dont you
2010-11-23 22:15:57 < Dark_Shikari> no...
2010-11-23 22:15:59 < astrange> it's not even thanksgiving break yet
2010-11-23 22:16:09 < Dark_Shikari> the stores seem to think it's already christmas
2010-11-23 22:16:11 < Dark_Shikari> but nobody else does
2010-11-23 22:16:12 < checkers> uni people here got out a few weeks ago
2010-11-23 22:16:12 < astrange> actually i think it is at every GA school except this one
2010-11-23 22:16:20 < checkers> so they have a break from mid-november to march 1
2010-11-23 22:16:50 < rfw> Dark_Shikari: i think i might withdraw from this
2010-11-23 22:16:54 < rfw> i'm sorry :(
2010-11-23 22:16:59 < Dark_Shikari> rfw: but you had such a cool script!
2010-11-23 22:17:03 < rfw> i know!
2010-11-23 22:17:08 < rfw> i can't believe how fucking stupid i am
2010-11-23 22:17:10 < Dark_Shikari> get a better python decompiler
2010-11-23 22:17:14 < rfw> i tried everything
2010-11-23 22:17:26 < Dark_Shikari> go put in three hours and rewrite it, it's easier when you've done it before
2010-11-23 22:18:03 < rfw> time to figure out all the classes i had
2010-11-23 22:19:36 < rfw> or i could make it
2010-11-23 22:19:37 < rfw> um
2010-11-23 22:19:46 < rfw> better, stronger and faster
2010-11-23 22:19:49 < Dark_Shikari> rewriting is great, it means you can fix all the stupid things you did the first time
2010-11-23 22:19:52 < Dark_Shikari> lol
2010-11-23 22:20:01 < rfw> yeah
2010-11-23 22:20:06 < rfw> but that's one day wasyed
2010-11-23 22:20:08 < rfw> wasted
2010-11-23 22:20:13 < rfw> well
2010-11-23 22:20:15 < checkers> I believe you mean harder, better, faster, stronger
2010-11-23 22:20:16 < rfw> i guess i get the experience
2010-11-23 22:20:20 < Dark_Shikari> no it isn't, because most of the day was spent figuring out how to do things
2010-11-23 22:20:22 < Dark_Shikari> and what needed to be done
2010-11-23 22:20:24 < rfw> yup
2010-11-23 22:20:32 < Dark_Shikari> you're actually more than half done at this point.
2010-11-23 22:20:41 < Dark_Shikari> This time around, do it with git, and make commits often
2010-11-23 22:20:42 < rfw> checkers: are you going to throw a longer and thicker in there too
2010-11-23 22:21:04 < Dark_Shikari> rfw: more than after hour never
2010-11-23 22:21:09 < Dark_Shikari> our work is never over (oh wait)
2010-11-23 22:53:17 < Jumpyshoes> Dark_Shikari: http://pastebin.com/hyNZL78m here's the code that wasn't working for me
2010-11-23 22:54:29 < Jumpyshoes> i changed the mov to move double quadwords, transpose/idct to d, pw --> pd, and used the updated version of STORE_DIFF
2010-11-23 22:54:33 < Jumpyshoes> isn't working though
2010-11-23 22:55:08 < Dark_Shikari> 1) is pd_32 times 4 or times 2?
2010-11-23 22:55:13 < Dark_Shikari> i.e. is it 4 values or 2?  check that, it should be 4
2010-11-23 22:55:37 < Jumpyshoes> let's see
2010-11-23 22:55:38 < Jumpyshoes> const pw_32,       times 8 dw 32
2010-11-23 22:55:41 < Jumpyshoes> const pd_32,       times 8 dd 32
2010-11-23 22:55:46 < Dark_Shikari> it should be times 4 unless you didn't create it
2010-11-23 22:55:54 < Jumpyshoes> oh
2010-11-23 22:55:56 < Jumpyshoes> i created it
2010-11-23 22:55:57 < Dark_Shikari> that won't affect it though
2010-11-23 22:56:00 < Dark_Shikari> 8 is obviously enough
2010-11-23 22:56:06 < Dark_Shikari> where's your updated store_diff, how about posting the whole patch?
2010-11-23 22:56:14 < Dark_Shikari> you mean checkasm fails, right?
2010-11-23 22:56:20 < Jumpyshoes> yea, checkasm fails
2010-11-23 22:56:30 < Jumpyshoes> and store_diff was already written
2010-11-23 22:56:39 < Jumpyshoes> how do i post a whole patch?
2010-11-23 22:56:50 < Dark_Shikari> STORE_DIFF was written for high bit depth?
2010-11-23 22:56:52 < Dark_Shikari> git diff
2010-11-23 22:56:56 < Jumpyshoes> yea, it was
2010-11-23 22:57:27 < Jumpyshoes> git diff prints to the terminal, how do i get it to a file? pipe?
2010-11-23 22:57:48 < dj_tjerk> > test.diff
2010-11-23 22:58:25 < dj_tjerk> if you didn't tinker with color settings, it should be fine (i think you can set it to always output color, and not check if it's outputting to a file or stdout)
2010-11-23 22:59:08 < Jumpyshoes> and i just pastebin it?
2010-11-23 23:00:38 < Dark_Shikari> yes
2010-11-23 23:00:50 < Dark_Shikari> also, you can add printfs to the checkasm unit test
2010-11-23 23:00:54 < Dark_Shikari> to look at the output and try to see what might be wrong
2010-11-23 23:01:03 < Jumpyshoes> how do i do that?
2010-11-23 23:01:24 < Dark_Shikari> by editing checkasm.c
2010-11-23 23:01:26 < Dark_Shikari> and adding printfs
2010-11-23 23:01:28 < Jumpyshoes> http://pastebin.com/6r91qxH0 i did a bunch of mucking around
2010-11-23 23:01:28 < Dark_Shikari> assuming you know basic C
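The kind of printf debugging suggested here can be sketched as a small C helper that prints each position where the C reference output and the asm output disagree. This is a hypothetical helper for illustration only: the name `dump_block_diff`, its parameters, and the element-based stride are assumptions, not something that exists in checkasm.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical debugging helper, not part of checkasm.c: prints every
 * position where the C reference and the asm output differ and returns
 * the number of mismatches. The stride is in elements, not bytes. */
static int dump_block_diff(const uint16_t *ref, const uint16_t *asm_out,
                           int width, int height, int stride)
{
    assert(ref && asm_out && width <= stride);
    int mismatches = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            unsigned r = ref[y * stride + x];
            unsigned a = asm_out[y * stride + x];
            if (r != a) {
                printf("(%d,%d): c=%u asm=%u\n", x, y, r, a);
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

Seeing which rows or columns are wrong, and by how much, often narrows the bug down faster than a bare pass/fail result.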
2010-11-23 23:01:41 < Jumpyshoes> why on earth would i try to learn asm before C ._.
2010-11-23 23:01:44 < Dark_Shikari> you should use "diff" highlighting for your pastebins
2010-11-23 23:01:55 < Jumpyshoes> oops, sorry
2010-11-23 23:02:41 < Dark_Shikari> aha.
2010-11-23 23:02:48 < Dark_Shikari> Your idct needs to clamp!
2010-11-23 23:03:08 < Jumpyshoes> oh
2010-11-23 23:03:15 < Jumpyshoes> lesse
2010-11-23 23:03:20 < Jumpyshoes> does the clamp happen after the IDCT?
2010-11-23 23:03:31 < Jumpyshoes> seems like it does
2010-11-23 23:03:35 < Dark_Shikari> ok, so here's how it works
2010-11-23 23:03:37 < Dark_Shikari> look at how STORE_DIFF works
2010-11-23 23:03:59 < Dark_Shikari> actually.  wait a minute.  where is store_diff actually used?
2010-11-23 23:04:12 < Jumpyshoes>     STORE_DIFF m0, m4, m5, [r0+0*FDEC_STRIDE], [r0+1*FDEC_STRIDE]
2010-11-23 23:04:14 < Jumpyshoes> ...
2010-11-23 23:04:16 < Jumpyshoes>     STORE_DIFF m3, m4, m5, [r0+6*FDEC_STRIDE], [r0+7*FDEC_STRIDE]
2010-11-23 23:04:32 < Dark_Shikari> .... oh god
2010-11-23 23:04:33 < Dark_Shikari> irock: !
2010-11-23 23:04:35 < Dark_Shikari> come on
2010-11-23 23:04:44 < Dark_Shikari> your STORE_DIFF doesn't do a STORE_DIFF, it's used for a completely unrelated purpose
2010-11-23 23:04:47 < Dark_Shikari> lol
2010-11-23 23:04:51 < Jumpyshoes> :(
2010-11-23 23:04:59 < Jumpyshoes> why is it called the same thing
2010-11-23 23:05:03 < Dark_Shikari> irock
2010-11-23 23:05:04 < Dark_Shikari> blame him
2010-11-23 23:05:09 < Dark_Shikari> anyways, for now, make your own macro that does it
2010-11-23 23:05:13 < Dark_Shikari> first start by copying the low bit depth STORE_DIFF
2010-11-23 23:05:21 < Dark_Shikari> second, you'll need to change it around a bit
2010-11-23 23:05:26 < Jumpyshoes> well, i can't exactly delete his
2010-11-23 23:05:26 < Dark_Shikari> punpcklbw will become punpcklwd
2010-11-23 23:05:31 < Dark_Shikari> No, just call yours STORE_DIFF2
2010-11-23 23:05:35 < Dark_Shikari> I'll fix it later.
2010-11-23 23:05:41 < Jumpyshoes> oh, kk
2010-11-23 23:05:42 < Dark_Shikari> so:
2010-11-23 23:05:44 < Dark_Shikari>     movh       %2, %4
2010-11-23 23:05:44 < Dark_Shikari>     punpcklbw  %2, %3
2010-11-23 23:05:44 < Dark_Shikari>     psraw      %1, 6
2010-11-23 23:05:44 < Dark_Shikari>     paddsw     %1, %2
2010-11-23 23:05:47 < Dark_Shikari>     packuswb   %1, %1
2010-11-23 23:05:49 < Dark_Shikari>     movh       %4, %1
2010-11-23 23:05:54 < Dark_Shikari> we need to modify this for high bit depth and add clamping.
2010-11-23 23:06:01 < Dark_Shikari> punpcklbw -> punpcklwd (since the data type is 16-bit)
2010-11-23 23:06:15 < Dark_Shikari> psraw -> psrad
2010-11-23 23:06:22 < pengvado> how about pack first, then add?
2010-11-23 23:06:27 < Dark_Shikari> Oh, you could do that too
2010-11-23 23:06:32 < Dark_Shikari> yeah, that'd be better, you can pack before adding
2010-11-23 23:06:41 < Jumpyshoes> what?
2010-11-23 23:06:49 < Dark_Shikari> Jumpyshoes: remember, the formula is
2010-11-23 23:07:02 < Dark_Shikari> pix + ((dctcoeff + 32) >> 6)
2010-11-23 23:07:08 < Dark_Shikari> the above code does that
2010-11-23 23:07:18 < Dark_Shikari> so we could do something like this:
2010-11-23 23:07:33 < Dark_Shikari> oh, but remember the 32 is already handled inside the idct above, so you don't have to worry about that
2010-11-23 23:07:37 < Dark_Shikari> so we're really doing
2010-11-23 23:07:38 < Dark_Shikari> pix + (dctcoeff >> 6)
2010-11-23 23:07:47 < Dark_Shikari> the pd_32 is the adding of the 32.
2010-11-23 23:07:53 < Jumpyshoes> yea
2010-11-23 23:07:53 < Dark_Shikari> so, here's what we do:
2010-11-23 23:07:59 < Dark_Shikari> 1) dctcoeff >>= 6
2010-11-23 23:08:03 < Dark_Shikari> 2) pack dctcoeff from 32-bit to 16-bit
2010-11-23 23:08:07 < Dark_Shikari> 3) add dctcoeff to pix
2010-11-23 23:08:24 < Dark_Shikari> 4) clamp pix to [0,(1<<BIT_DEPTH)-1]
2010-11-23 23:08:28 < Dark_Shikari> 5) store
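The five steps above can be sketched for a single pixel in scalar C. This is a minimal reference model, not x264's code: the function names are hypothetical, `coeff` is assumed to already include the +32 rounding term (as the discussion says the idct handles it), and the shift helper exists only to avoid relying on how `>>` treats negative values in C.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right by 6 that does not depend on the
 * implementation-defined behaviour of >> on negative values. */
static int32_t asr6(int32_t x)
{
    return x >= 0 ? x >> 6 : ~((~x) >> 6);
}

/* Scalar model of steps 1-5 for one pixel (hypothetical name).
 * `coeff` is the idct output with the +32 rounding already added. */
static uint16_t store_diff_ref(uint16_t pix, int32_t coeff, int bit_depth)
{
    assert(bit_depth >= 8 && bit_depth <= 15);
    int32_t max = (1 << bit_depth) - 1;
    int32_t v = asr6(coeff);            /* 1) dctcoeff >>= 6            */
    if (v > INT16_MAX) v = INT16_MAX;   /* 2) pack 32-bit -> 16-bit,    */
    if (v < INT16_MIN) v = INT16_MIN;   /*    with signed saturation    */
    v += pix;                           /* 3) add dctcoeff to pix       */
    if (v < 0)   v = 0;                 /* 4) clamp to [0, max]         */
    if (v > max) v = max;
    return (uint16_t)v;                 /* 5) value to store            */
}
```

A model like this is handy as a yardstick when comparing against the asm output one pixel at a time.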
2010-11-23 23:08:38 < Dark_Shikari> so, right here in chat, write out the instructions you'd use to do each step
2010-11-23 23:08:47 < Jumpyshoes> in that order?
2010-11-23 23:08:50 < Dark_Shikari> yes
2010-11-23 23:09:11 < Jumpyshoes> i assume pack means (in C talk) to cast?
2010-11-23 23:11:46 < Dark_Shikari> not quite, you're rearranging the data values
2010-11-23 23:11:51 < Dark_Shikari> e.g. 4x32-bit -> 4x16-bit
2010-11-23 23:11:58 < Dark_Shikari> this clearly involves modifying the data in the register
2010-11-23 23:12:33 < Jumpyshoes> ah
2010-11-23 23:13:18 < Jumpyshoes> 1) dctcoeff >>= 6: psrad      %1, 6
2010-11-23 23:13:41 < Dark_Shikari> just use variables in your asm for now
2010-11-23 23:13:45 < Dark_Shikari> e.g. psrad dctcoeff, 6
2010-11-23 23:13:48 < Dark_Shikari> consider it pseudocode
2010-11-23 23:13:50 < Jumpyshoes> ah okay
2010-11-23 23:14:18 < Jumpyshoes> i have no idea how to pack data, unless you can do it by shifting like crazy
2010-11-23 23:14:53 < Dark_Shikari> using the pack instruction
2010-11-23 23:14:54 < Dark_Shikari> duh
2010-11-23 23:15:10 < Dark_Shikari> use your manual
2010-11-23 23:15:12 < Dark_Shikari> packssdw
2010-11-23 23:15:34 < Jumpyshoes> oh
2010-11-23 23:15:40 < Jumpyshoes> bleh, i need to get better at looking stuff up
2010-11-23 23:16:44 < Jumpyshoes> 2) pack dctcoeff from 32-bit to 16-bit: packssdw tmp1, dctcoeff
2010-11-23 23:16:55 < Dark_Shikari> no
2010-11-23 23:17:01 < Dark_Shikari> go read what packssdw does
2010-11-23 23:17:46 < Jumpyshoes> oh derp
2010-11-23 23:17:49 < Dark_Shikari> and look at how it's used in the existing STORE_DIFF
2010-11-23 23:18:06 < rfw> Dark_Shikari: it is better \o/
2010-11-23 23:18:18 < rfw> though i still have to rewrite all that fucking bisection code
2010-11-23 23:18:37 < Jumpyshoes> wait, how can you tell if an instruction can be used on the same register?
2010-11-23 23:18:40 < Jumpyshoes> PACKSSDW xmm1,
2010-11-23 23:18:41 < Jumpyshoes> xmm2/m128
2010-11-23 23:18:57 < Dark_Shikari> Jumpyshoes: does it exist?  if so, it can be used on the same register
2010-11-23 23:19:04 < Jumpyshoes> oh <_<
2010-11-23 23:19:19 < Dark_Shikari> irock: I am completely lost with regards to the end of your dct mmx functions.  why do we need 32-bit dct coeff precision if your dct function can't even generate that?
2010-11-23 23:19:31 < Jumpyshoes> then, 2) pack dctcoeff from 32-bit to 16-bit: packssdw dctcoeff, dctcoeff ?
2010-11-23 23:19:58 < Dark_Shikari> yes
2010-11-23 23:23:16 < pengvado> that works, but you probably want to pack 2 regs together. no sense in wasting half the throughput.
2010-11-23 23:23:29 < Dark_Shikari> and what pengvado said
2010-11-23 23:23:31 < Dark_Shikari> so you can do something like
2010-11-23 23:23:39 < Dark_Shikari> psrad dctcoeffreg1, 6
2010-11-23 23:23:41 < Dark_Shikari> psrad dctcoeffreg2, 6
2010-11-23 23:23:50 < Dark_Shikari> packssdw dctcoeffreg1, dctcoeffreg2
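What `packssdw dst, src` does to two 128-bit registers can be modelled in scalar C: each signed 32-bit value is saturated to signed 16-bit, the four values from the destination land in the low half of the result and the four from the source in the high half. The function names below are hypothetical and the model is only a sketch of the instruction's semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a signed 32-bit value to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of `packssdw dst, src` on 128-bit registers:
 * dst's four dwords become words 0-3, src's four dwords become words 4-7. */
static void packssdw_model(int16_t out[8],
                           const int32_t dst[4], const int32_t src[4])
{
    assert(out && dst && src);
    for (int i = 0; i < 4; i++) {
        out[i]     = sat16(dst[i]);
        out[i + 4] = sat16(src[i]);
    }
}
```

This is why packing a register with itself works but wastes half the result: the same four values simply appear in both halves.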
2010-11-23 23:25:14 < Jumpyshoes> wouldn't i need to modify the storediff? i thought only one set of dctcoeff were passed at a time
2010-11-23 23:25:23 < Dark_Shikari> you're writing your own
2010-11-23 23:25:29 < Dark_Shikari> if you want, start by not even making it a macro
2010-11-23 23:25:31 < Dark_Shikari> you can macroize it later
2010-11-23 23:25:34 < Dark_Shikari> just write out the instructions
2010-11-23 23:25:42 < Jumpyshoes> right
2010-11-23 23:27:40 < Dark_Shikari>  so do 1) and 2) for all your dct coeff registers
2010-11-23 23:28:53 < Jumpyshoes> psrad dctcoeffreg1, 6
2010-11-23 23:28:53 < Jumpyshoes> psrad dctcoeffreg2, 6
2010-11-23 23:28:53 < Jumpyshoes> packssdw dctcoeffreg1, dctcoeffreg2
2010-11-23 23:28:53 < Jumpyshoes> psrad dctcoeffreg3, 6
2010-11-23 23:28:53 < Jumpyshoes> psrad dctcoeffreg4, 6
2010-11-23 23:28:54 < Jumpyshoes> packssdw dctcoeffreg3, dctcoeffreg4
2010-11-23 23:28:58 < Dark_Shikari> yes
2010-11-23 23:29:13 < Jumpyshoes> oh, and now they're 16bit
2010-11-23 23:29:19 < Dark_Shikari> now load up your pixels, which are also 16-bit, and add them
2010-11-23 23:29:28 < Jumpyshoes> right
2010-11-23 23:29:29 < Dark_Shikari> so e.g.
2010-11-23 23:29:36 < Dark_Shikari> movq tmp1, [r0+0*FDEC_STRIDE]
2010-11-23 23:29:41 < Dark_Shikari> movq tmp2, [r0+1*FDEC_STRIDE]
2010-11-23 23:29:44 < Dark_Shikari> punpcklqdq tmp1, tmp2
2010-11-23 23:29:50 < Dark_Shikari> which gives you two rows of pixels in "tmp1"
2010-11-23 23:30:01 < Dark_Shikari> which is what you want, as you have two rows of dct coeffs in "dctcoeffreg1"
2010-11-23 23:30:03 < rfw> To git@github.com:rofflwaffls/digress.git
2010-11-23 23:30:03 < rfw>  * [new branch]      master -> master
2010-11-23 23:30:09 < rfw> there
2010-11-23 23:30:12 < Jumpyshoes> yea
2010-11-23 23:30:37 < Dark_Shikari> actually, better yet
2010-11-23 23:30:38 < Dark_Shikari> do
2010-11-23 23:30:41 < Dark_Shikari> movq tmp1, [r0+0*FDEC_STRIDE]
2010-11-23 23:30:46 < Dark_Shikari> movhps tmp1, [r0+1*FDEC_STRIDE]
2010-11-23 23:30:51 < Dark_Shikari> same thing, two instructions instead of three.
2010-11-23 23:30:57 < Dark_Shikari> movhps --> move to the high half
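The movq + movhps load can be pictured in scalar C as two 8-byte copies into the two halves of a 16-byte register. This is only a sketch: `load_two_rows` is a hypothetical name, and `stride_bytes` stands in for FDEC_STRIDE, whose unit (bytes here) is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of
 *     movq   tmp1, [r0+0*stride]
 *     movhps tmp1, [r0+1*stride]
 * Four 16-bit pixels from each of two rows end up in one register. */
static void load_two_rows(uint16_t reg[8], const uint8_t *r0, int stride_bytes)
{
    assert(reg && r0);
    memcpy(reg,     r0,                8); /* movq:   low 64 bits  */
    memcpy(reg + 4, r0 + stride_bytes, 8); /* movhps: high 64 bits */
}
```

The copies do not interpret the data at all, which is the point made above: a move is a move, whether the instruction is nominally integer or floating point.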
2010-11-23 23:31:09 < Jumpyshoes> oh, isn't that for floating point though?
2010-11-23 23:31:15 < Dark_Shikari> It's a move, do you think the cpu cares? =p
2010-11-23 23:31:24 < Jumpyshoes> true
2010-11-23 23:31:40 < Jumpyshoes> now i need to add these
2010-11-23 23:33:23 < Jumpyshoes> paddsw dctcoeffreg1, tmp1
2010-11-23 23:33:23 < Jumpyshoes> paddsw dctcoeffreg3, tmp2
2010-11-23 23:33:23 < Jumpyshoes> ?
2010-11-23 23:35:12 < Dark_Shikari> yes
2010-11-23 23:36:32 < Jumpyshoes> can i use PACKSSDW or something similar to clamp? or will that not work
2010-11-23 23:37:09 < Dark_Shikari> ah, now here comes in the problem
2010-11-23 23:37:18 < Dark_Shikari> you're clamping to between, for example
2010-11-23 23:37:19 < Dark_Shikari> 0 and 1023
2010-11-23 23:37:22 < Dark_Shikari> there's no magic way to do that?
2010-11-23 23:37:24 < Dark_Shikari> *that!
2010-11-23 23:37:28 < Dark_Shikari> because they're not data type sizes
2010-11-23 23:37:34 < Dark_Shikari> i.e. it's not 0 and 65535
2010-11-23 23:37:35 < Dark_Shikari> or 0 and 255
2010-11-23 23:37:50 < Dark_Shikari> so here's what you do
2010-11-23 23:37:58 < Jumpyshoes> bleh, true
2010-11-23 23:38:05 < Dark_Shikari> pmaxsw myval, {0}
2010-11-23 23:38:13 < Dark_Shikari> pminsw myval, {MAX_PIXEL}
2010-11-23 23:38:17 < Dark_Shikari> where {0} is pw_0
2010-11-23 23:38:23 < Dark_Shikari> and max_pixel is pw_max
2010-11-23 23:38:33 < Dark_Shikari> which is defined as times 8 dw ((1<<BIT_DEPTH)-1)
2010-11-23 23:38:37 < Dark_Shikari> or pw_pixelmax
2010-11-23 23:38:39 < Dark_Shikari> or something descriptive
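The pmaxsw/pminsw pair is a lane-wise clamp on eight signed 16-bit values. A scalar C sketch of that, with the upper bound taken as (1 << bit_depth) - 1 to match the clamp range given in step 4 earlier (0 to 1023 for 10-bit); `clipw_model` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of
 *     pmaxsw v, {0}
 *     pminsw v, {pixel_max}
 * applied to eight signed 16-bit lanes. */
static void clipw_model(int16_t v[8], int bit_depth)
{
    assert(bit_depth >= 8 && bit_depth <= 15);
    const int16_t pixel_max = (int16_t)((1 << bit_depth) - 1);
    for (int i = 0; i < 8; i++) {
        if (v[i] < 0)         v[i] = 0;         /* pmaxsw with zero      */
        if (v[i] > pixel_max) v[i] = pixel_max; /* pminsw with pixel max */
    }
}
```

Because both instructions compare as signed words, negative sums from the add step are caught by the max with zero before the min is applied.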
2010-11-23 23:39:00 < Dark_Shikari> also, pb_0 will (obviously) serve as a good pw_0.
2010-11-23 23:39:06 < Jumpyshoes> there's already a pw_pixel_max, but it's times 8
2010-11-23 23:39:11 < Dark_Shikari> That's what you need
2010-11-23 23:39:16 < Dark_Shikari> use it
2010-11-23 23:39:17 < Jumpyshoes> oh right, i have 8 of them
2010-11-23 23:39:31 < pengvado> there's also a macro CLIPW
2010-11-23 23:39:37 < Dark_Shikari> Oh, nice
2010-11-23 23:39:38 < Jumpyshoes> o, can i use that?
2010-11-23 23:39:57 < Dark_Shikari> pengvado: so, why do we need 32-bit dct coefficients for 10-bit?  I just did a dct between min pixel value and max pixel value, and checkasm still passed
2010-11-23 23:40:03 < Dark_Shikari> Or is it because of idct scaling factors?
2010-11-23 23:40:15 < Dark_Shikari> i.e. I assume I'm missing something
2010-11-23 23:40:16 < Dark_Shikari> Jumpyshoes: go check it out
2010-11-23 23:40:23 < Dark_Shikari> it's probably already used for exactly this purpose!
2010-11-23 23:40:53 < Jumpyshoes> yea
2010-11-23 23:41:02 < Jumpyshoes> does exactly what you pasted
2010-11-23 23:41:33 < Jumpyshoes> should i use pb_0 or can i just use 0?
2010-11-23 23:41:57 < Jumpyshoes> or i guess i could define a pw_pixel_min
2010-11-23 23:41:58 < Dark_Shikari> if you already have a zero register, use it
2010-11-23 23:42:05 < Dark_Shikari> guess what, you do
2010-11-23 23:42:18 < Dark_Shikari> m7
2010-11-23 23:42:24 < Jumpyshoes> right, xor'd it up top
2010-11-23 23:42:40 < Jumpyshoes> CLIPW dctcoeffreg1, m7, pw_pixel_max ;m7 = 0
2010-11-23 23:42:40 < Jumpyshoes> CLIPW dctcoeffreg3, m7, pw_pixel_max
2010-11-23 23:42:50 < Dark_Shikari> load pw_pixel_max into a register to avoid loading it twice
2010-11-23 23:43:42 < Jumpyshoes> movq tmp1, pw_pixel_max
2010-11-23 23:43:43 < Jumpyshoes> CLIPW dctcoeffreg1, m7, tmp1 ;m7 = 0
2010-11-23 23:43:43 < Jumpyshoes> CLIPW dctcoeffreg3, m7, tmp1
2010-11-23 23:44:00 < Dark_Shikari> yes
2010-11-23 23:44:25 < Jumpyshoes> okay, now i need to store it
2010-11-23 23:44:50 < Dark_Shikari> movq outputrow1, dctcoeffreg1
2010-11-23 23:44:54 < Dark_Shikari> movhps outputrow1, dctcoeffreg1
2010-11-23 23:44:56 < Dark_Shikari> etc
2010-11-23 23:44:58 < Dark_Shikari> er
2010-11-23 23:45:01 < Dark_Shikari> movhps outputrow2, dctcoeffreg1
2010-11-23 23:45:06 < Dark_Shikari> where outputrow1 is a memory pointer, etc
2010-11-23 23:45:45 < Jumpyshoes> oh, that is handy
2010-11-23 23:45:53 < Dark_Shikari> movq and movhps work in both directions
2010-11-23 23:45:54 < Dark_Shikari> loading and storing
2010-11-23 23:46:01 < j-b> reid_: pong
2010-11-23 23:46:50 < Jumpyshoes> movq [r0+0*FDEC_STRIDE], dctcoeffreg1
2010-11-23 23:46:50 < Jumpyshoes> movhps [r0+4*FDEC_STRIDE], dctcoeffreg1
2010-11-23 23:46:50 < Jumpyshoes> movq [r0+8*FDEC_STRIDE], dctcoeffreg3
2010-11-23 23:46:50 < Jumpyshoes> movhps [r0+12*FDEC_STRIDE], dctcoeffreg3
2010-11-23 23:46:59 < Dark_Shikari> think about where you're outputting
2010-11-23 23:47:08 < Dark_Shikari> and look at the low bit depth version.
2010-11-23 23:47:35 < Jumpyshoes> oh right, you're adding the FDEC_STRIDE
2010-11-23 23:47:54 < Jumpyshoes> bleh, dinner, bbl
2010-11-23 23:48:00 < Dark_Shikari> damn east coasters
2010-11-23 23:48:12 < Jumpyshoes> but i just need to change it to 0,1,2,3?
2010-11-23 23:53:06 < Dark_Shikari> yes
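The corrected store pattern, with two rows per register and row indices 0, 1, 2, 3, can be sketched in scalar C. The function names are hypothetical and `stride_bytes` again stands in for FDEC_STRIDE, assumed to be in bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of
 *     movq   [r0+0*stride], reg
 *     movhps [r0+1*stride], reg */
static void store_two_rows(uint8_t *r0, int stride_bytes, const uint16_t reg[8])
{
    assert(r0 && reg);
    memcpy(r0,                reg,     8); /* movq:   low 64 bits  */
    memcpy(r0 + stride_bytes, reg + 4, 8); /* movhps: high 64 bits */
}

/* A 4x4 block held in two registers: reg1 has rows 0-1, reg3 has rows 2-3. */
static void store_4x4(uint8_t *r0, int stride_bytes,
                      const uint16_t reg1[8], const uint16_t reg3[8])
{
    store_two_rows(r0 + 0 * stride_bytes, stride_bytes, reg1); /* rows 0, 1 */
    store_two_rows(r0 + 2 * stride_bytes, stride_bytes, reg3); /* rows 2, 3 */
}
```

Each half of a register is one row, so consecutive halves go to consecutive rows rather than to every fourth row.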
--- Day changed Wed Nov 24 2010
2010-11-24 00:02:55 < pengvado> Dark_Shikari: I don't know why your test passes.
2010-11-24 00:03:42 < pengvado> fdct8's dc is the sum of 2^6 samples, each of which is an 11-bit difference between two pixels. total: 17 bits.
2010-11-24 00:04:00 < pengvado> and I think some of the ac coefs have another bit of expansion.
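The bit-growth arithmetic for the DC term can be checked with a few lines of C. This sketch assumes, as stated above, that the DC coefficient is the plain sum of 64 pixel differences; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest two's-complement width that can hold every value in [-v, v]. */
static int signed_bits(int64_t v)
{
    assert(v >= 0);
    int bits = 1;
    while ((((int64_t)1 << (bits - 1)) - 1) < v)
        bits++;
    return bits;
}

/* Largest |DC| of an 8x8 forward DCT whose DC term is the sum of
 * 64 pixel differences at the given bit depth. */
static int64_t max_dc_8x8(int bit_depth)
{
    assert(bit_depth >= 1 && bit_depth <= 16);
    int64_t max_diff = ((int64_t)1 << bit_depth) - 1; /* |a - b| <= 2^depth - 1 */
    return 64 * max_diff;
}
```

For 10-bit pixels the largest difference is 1023 (11 bits signed) and 64 * 1023 = 65472, which needs 17 signed bits and so no longer fits a 16-bit word. For 8-bit pixels the same sum is 64 * 255 = 16320, which still fits in 16 bits.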
2010-11-24 00:04:55 < pengvado> do we need to generate some maximal edge cases, like we do for hadamard?
2010-11-24 00:05:03 < Dark_Shikari> I think that would be a good idea
2010-11-24 00:06:51 < pengvado> btw, some of the butterfly passes could still be 16bit
2010-11-24 00:07:36 < Dark_Shikari> yeah probably
2010-11-24 00:17:24 < Dark_Shikari> (patches welcome)
2010-11-24 00:22:08 < Dark_Shikari> Jumpyshoes: does it pass checkasm yet? =p
2010-11-24 00:22:44 < rfw> Dark_Shikari: i'm sorta not regretting deleting everything now
2010-11-24 00:22:49 < Dark_Shikari> lol
2010-11-24 00:22:59 < rfw> fixed all the weird ass conventions i had before
2010-11-24 00:23:07 < Dark_Shikari> "I told you so"
2010-11-24 00:23:10 < rfw> lol
2010-11-24 00:23:16 < rfw> should i delete it again when i finish this :D
2010-11-24 00:23:18 < Dark_Shikari> Now commit early and often!
2010-11-24 00:23:20 < Dark_Shikari> and push too
2010-11-24 00:23:25 < rfw> Total 14 (delta 6), reused 0 (delta 0)
2010-11-24 00:23:26 < rfw> To git@github.com:rofflwaffls/digress.git
2010-11-24 00:23:26 < rfw>    2ee921b..12af40a  master -> master
2010-11-24 00:23:28 < rfw> :D
2010-11-24 00:23:56 < Dark_Shikari> I have saved another poor soul from data loss despair with github!
2010-11-24 00:24:11 < rfw> i usually do commit everything to github
2010-11-24 00:24:14 < rfw> not sure why i didn't with this
2010-11-24 00:26:39 < espes> Dark_Shikari: So like, I requested that GCI task.
2010-11-24 00:26:46 < Dark_Shikari> which one?
2010-11-24 00:26:51 < espes> Dark_Shikari: filter.
2010-11-24 00:26:57 < Dark_Shikari> That task is repeatable
2010-11-24 00:27:08 < Dark_Shikari> i.e. if you say you want to do it, you can go do it, no problem
2010-11-24 00:27:22 < Dark_Shikari> I'll go accept you, which filter do you want to do?
2010-11-24 00:27:48 < Dark_Shikari> j-b: please approve my added duplicates of the filter task
2010-11-24 00:28:00 < espes> Dark_Shikari: I was thinking something basic. 2xsai maybe.
2010-11-24 00:28:31 < Dark_Shikari> we already have a resizer, though "interesting" resizing algorithms might be, well, interesting
2010-11-24 00:28:53 < espes> Dark_Shikari: The justification is that you do have tasvideos linked from your homepage :P
2010-11-24 00:28:53 < Dark_Shikari> hq2x/3x/4x might be interesting, though if I recall correctly those are written in pure asm and thus extremely scary
2010-11-24 00:28:58 < Dark_Shikari> tasvideos?
2010-11-24 00:29:13 < Dark_Shikari> Oh, wow, that list hasn't been updated in a million years
2010-11-24 00:29:14 < espes> Dark_Shikari: Video game speed runs.
2010-11-24 00:29:15 < Dark_Shikari> I guess they are listed
2010-11-24 00:29:40 < Dark_Shikari> well, it is fitting, considering that my x264 demos page consists almost entirely of video game clips
2010-11-24 00:29:43 < Dark_Shikari> e.g. http://x264.nl/developers/Dark_Shikari/Flash/extra.html
2010-11-24 00:30:52 < espes> Dark_Shikari: besides, it's easy enough to rip 2xsai out of mplayer.
2010-11-24 00:31:16 < Dark_Shikari> didn't even know mplayer did
2010-11-24 00:31:37 < Dark_Shikari> sounds cool enough, I'll go for that
2010-11-24 00:31:45 < espes> Dark_Shikari: sweet.
2010-11-24 00:31:48 < Dark_Shikari> just keep in mind...
2010-11-24 00:31:55 < Dark_Shikari> all x264 tasks must go through pengvado
2010-11-24 00:31:59 < Dark_Shikari> that is, he must approve your code
2010-11-24 00:32:10 < Dark_Shikari> which means you must pass the code review =p
2010-11-24 00:32:37 < pengvado> and pengvado generally disapproves of forking libavfilter
2010-11-24 00:33:18 < Dark_Shikari> good point.
2010-11-24 00:33:35 < kemuri-_9> hmm someone's doing a filter?
2010-11-24 00:33:36 < Dark_Shikari> actually, pengvado, why didn't you say that earlier? =p
2010-11-24 00:33:56 < pengvado> didn't I say it enough times, whenever anyone has proposed to add any filter to x264?
2010-11-24 00:34:06 < Dark_Shikari> You never reviewed any filters.
2010-11-24 00:34:14 < Dark_Shikari> pad, hqdn3d, etc are sitting around unreviewed
2010-11-24 00:34:36 < Dark_Shikari> I'd be fine with 2xsai being ported to libavfilter though!
2010-11-24 00:34:48 < Dark_Shikari> as long as someone writes an interface to transparently access libavfilter filters in x264
2010-11-24 00:34:52 < Dark_Shikari> but someone can do that later
2010-11-24 00:35:27 < Dark_Shikari> pengvado: I'd really like you to say something certain here on this topic
2010-11-24 00:35:33 < pengvado> ok, so I only complained about yadif and hqdn3d
2010-11-24 00:35:44 < Dark_Shikari> like "I don't want any more filters in x264"
2010-11-24 00:35:50 < Dark_Shikari> or "I won't review any more filters in x264, but you can commit them"
2010-11-24 00:35:59 < Dark_Shikari> or "let's just stuff everything in libavfilter and dump the ones in x264"
2010-11-24 00:36:13 < pengvado> let's just stuff everything in libavfilter
2010-11-24 00:36:16 < Dark_Shikari> for espes, it doesn't really matter -- x264 or libavfilter are both easy enough
2010-11-24 00:36:40 < kemuri-_9> <_<
2010-11-24 00:36:41 < Dark_Shikari> ok, good, we'll swap the tasks over to libavfilter then
2010-11-24 00:36:48 < Dark_Shikari> Should have said this 6 months ago, slowpoke.
2010-11-24 00:37:32 < Dark_Shikari> now, pengvado, are you going to volunteer to do that?
2010-11-24 00:37:36 < Dark_Shikari> because nobody else wants to.
2010-11-24 00:37:49 < Dark_Shikari> that is, we have one primary problem: there is no depth filter in libavfilter
2010-11-24 00:37:51 < Dark_Shikari> and will probably never be
2010-11-24 00:37:56 < Dark_Shikari> and we need that filter no matter what
2010-11-24 00:38:25 < Dark_Shikari> so we need at least some of the x264 filter chain at some point.
2010-11-24 00:38:44 < pengvado> never will be because libav* doesn't support anything other than 8 and 16bit?
2010-11-24 00:38:51 < Dark_Shikari> yes
2010-11-24 00:39:06 < Dark_Shikari> and there are 100 bikesheds between here and that being reality
2010-11-24 00:39:11  * checkers waits for "I wrote a patch for that in 2006"
2010-11-24 00:39:38 < kierank> ffmpeg - "there's a patch for that"
2010-11-24 00:40:07 < rfw> http://code.google.com/p/soc/source/detail?r=57762a491b i don't think i trust google any more
2010-11-24 00:40:08 < Dark_Shikari> so I'd like a straightforward statement of what you want us to do, and what part you'll take in it
2010-11-24 00:40:14 < kemuri-_9> so what all filters are there for libavfilter atm?
2010-11-24 00:40:19 < Dark_Shikari> kemuri-_9: a lot
2010-11-24 00:40:28 < Dark_Shikari> rfw: OH GOD
2010-11-24 00:40:32 < Dark_Shikari> homebrew time/date code
2010-11-24 00:40:33 < Dark_Shikari> AGHAGHAHSDLFKJASDLFKJ
2010-11-24 00:40:46 < kierank> and they say google is nih?
2010-11-24 00:41:03 < rfw> lolol
2010-11-24 00:41:32 < rfw> god i think i should go out for a walk
2010-11-24 00:41:35 < kierank> there was something from google where they accidentally used memset instead of memcpy
2010-11-24 00:41:41 < Dark_Shikari> lol what
2010-11-24 00:41:52 < pengvado> bleh, so keep the depth filter, and anything else that can't be merged into libavfilter short of really forking it.
2010-11-24 00:42:19 < Jumpyshoes> bleh, still doesn't pass checkasm
2010-11-24 00:42:19 < pengvado> any filter that *is* already in libavfilter, or could be developed there instead of here, gets a wrapper instead of a paste
2010-11-24 00:42:27 < Dark_Shikari> pengvado: wait wait
2010-11-24 00:42:30 < Dark_Shikari> should we wrap INDIVIDUAL filters
2010-11-24 00:42:31 < Dark_Shikari> or LIBAVFILTER?
2010-11-24 00:42:49 < Dark_Shikari> Jumpyshoes: pastebin the code
2010-11-24 00:43:47 < Jumpyshoes> >We are currently upgrading our software, we will return in a few minutes.
2010-11-24 00:43:49 < Jumpyshoes> hohohoho
2010-11-24 00:43:57 < Dark_Shikari> use a different pastebin
2010-11-24 00:44:14 < pengvado> you mean because if we use libavfilter's filterchain generation code, then it would support only libavfilter?
2010-11-24 00:44:30 < Dark_Shikari> no I'm just asking whether or not we should keep our filterchain code.
2010-11-24 00:44:36 < espes> Jumpyshoes: codepad
2010-11-24 00:44:37 < Dark_Shikari> or whether we should wrap each filter separately
2010-11-24 00:44:57 < pengvado> does ours do anything that libavfilter doesn't and won't?
2010-11-24 00:45:03 < Dark_Shikari> I don't know.
2010-11-24 00:45:06 < Dark_Shikari> Ask kemuri-_9
2010-11-24 00:45:25 < kemuri-_9> i don't know enough about libavfilter to say anything on that
2010-11-24 00:46:01 < Kovensky> though when the filtering system thing was started libavfilter was still vaporware
2010-11-24 00:46:07 < Kovensky> isn't it still vaporware, or is it getting somewhere now?
2010-11-24 00:46:13 < Dark_Shikari> no, it works
2010-11-24 00:46:19 < Jumpyshoes> http://privatepaste.com/2ebbfd4a33
2010-11-24 00:46:46 < Dark_Shikari> Jumpyshoes: 1) replace m7 with m6
2010-11-24 00:46:49 < Dark_Shikari> 2) replace 2,2 with 2,2,7
2010-11-24 00:47:18 < Jumpyshoes> should i xor out m6 too then?
2010-11-24 00:47:24 < Dark_Shikari> yes, I mean replace all instances
2010-11-24 00:47:29 < Dark_Shikari> this won't affect correctness
2010-11-24 00:47:30 < Jumpyshoes> kk
2010-11-24 00:48:09 < bcoudurier> hi guys
2010-11-24 00:48:10 < pengvado> verdict: getting rid of duplicate code is good, but O(1) code for filterchain generation is less important than all of the individual filters
2010-11-24 00:48:11  * Dark_Shikari summons bcoudurier 
2010-11-24 00:48:41 < Dark_Shikari> Jumpyshoes: looks largely right.  this means it's debugging time ;)
2010-11-24 00:48:58 < Jumpyshoes> sigh
2010-11-24 00:49:09 < Dark_Shikari> see TEST_IDCT in checkasm.c
2010-11-24 00:49:16 < Dark_Shikari>         if( memcmp( pbuf3, pbuf4, 32*32 * sizeof(pixel) ) ) \
2010-11-24 00:49:22 < Dark_Shikari> before that line, print out the contents of pbuf3 and pbuf4, and compare
2010-11-24 00:49:42 < Dark_Shikari> this will help you find your ug
2010-11-24 00:49:45 < Dark_Shikari> *bug
2010-11-24 00:49:46 < Jumpyshoes> what are pbuf3 and pbuf4?
2010-11-24 00:49:56 < Jumpyshoes> strings?
2010-11-24 00:50:02 < Dark_Shikari> pixel buffers
2010-11-24 00:50:04 < Dark_Shikari> just memory.
2010-11-24 00:50:07 < Jumpyshoes> ah
2010-11-24 00:50:28 < Jumpyshoes> how do i print just a block of memory?
2010-11-24 00:50:35 < Dark_Shikari> pengvado: so this means....
2010-11-24 00:50:37 < Dark_Shikari> Jumpyshoes:
2010-11-24 00:50:49 < Dark_Shikari> oh yeah, and you should print this inside the if where it fails
2010-11-24 00:50:52 < Dark_Shikari> so it only prints for the broken one
2010-11-24 00:50:55 < Dark_Shikari> for( int y = 0; y < 4; y++ )
2010-11-24 00:51:01 < Dark_Shikari> for( int x = 0; x < 4; x++ )
2010-11-24 00:51:08  * kemuri-_9 sees a vf_scale and looks
2010-11-24 00:51:23 < Dark_Shikari> printf("%d ",pbuf3[x+y*FEC_STRIDE])
2010-11-24 00:51:25 < Dark_Shikari> *FDEC_STRIDE
2010-11-24 00:51:28 < Dark_Shikari> plus whatever line breaks you need
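The loop Dark_Shikari describes can be written out as a small helper. This is only a sketch under assumptions: `dump_4x4` and `first_bad_row` are hypothetical names that are not part of checkasm, `FDEC_STRIDE` is assumed to be 32, and `pixel` is assumed to be a 16-bit type as in a high-bit-depth build.

```c
#include <stdint.h>
#include <stdio.h>

#define FDEC_STRIDE 32          /* assumed value for this sketch */
typedef uint16_t pixel;         /* assumed: high-bit-depth build */

/* Hypothetical helper: print a 4x4 block, one row per line. */
static void dump_4x4( FILE *f, const char *name, const pixel *buf )
{
    fprintf( f, "%s:\n", name );
    for( int y = 0; y < 4; y++ )
    {
        for( int x = 0; x < 4; x++ )
            fprintf( f, "%4d ", buf[x + y*FDEC_STRIDE] );
        fprintf( f, "\n" );
    }
}

/* Hypothetical helper: first row of the 4x4 block where the
 * two buffers differ, or -1 if all 16 pixels match. */
static int first_bad_row( const pixel *a, const pixel *b )
{
    for( int y = 0; y < 4; y++ )
        for( int x = 0; x < 4; x++ )
            if( a[x + y*FDEC_STRIDE] != b[x + y*FDEC_STRIDE] )
                return y;
    return -1;
}
```

Calling `dump_4x4` on both buffers inside the failing `if` keeps the output limited to the broken case, and knowing which row goes wrong first narrows down where the stores are misplaced.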
2010-11-24 00:53:37 < pengvado> if there is something that libavfilter's framework doesn't support, and bikeshedding prevents patching libavfilter, and I can't just commit it to libavfilter anyway, then I'll tolerate a duplicate filterchain generator
2010-11-24 00:54:20 < Dark_Shikari> wrappers allow us to make the syntax less retarded while avoiding duplicating internal code
2010-11-24 00:54:29 < Dark_Shikari> now, the real problem here is just one of getting people to implement things
2010-11-24 00:54:36 < Dark_Shikari> without someone willing to do it, it won't happen
2010-11-24 00:54:46 < Dark_Shikari> and I don't want to sit around for 1 year with a half-completed filter system
2010-11-24 00:54:50 < Dark_Shikari> that half-duplicates half-code
2010-11-24 00:54:52 < Dark_Shikari> in a half-assed manner
2010-11-24 00:55:05 < Jumpyshoes> hrm, so the first row is right, everything from there dies
2010-11-24 00:55:32 < Dark_Shikari> the exact way in which it's wrong is often useful.
2010-11-24 00:55:46 < Dark_Shikari> *useful to know
2010-11-24 00:56:05 < bcoudurier> pengvado, I'll personally fight against any bikeshedding
2010-11-24 00:56:29 < Dark_Shikari> kemuri-_9 / J_Darnley since you're the people actually writing this stuff, I'd like to know your thoughts
2010-11-24 00:56:39 < Dark_Shikari> you can get the 16-bit hqdn3d committed to libavfilter
2010-11-24 00:56:51 < Jumpyshoes> http://privatepaste.com/b452d32e20 first row seems right, everything else seems to be random
2010-11-24 00:57:08 < Dark_Shikari> Jumpyshoes: oh duh
2010-11-24 00:57:11 < Dark_Shikari> 0*FDEC_STRIDE
2010-11-24 00:57:13 < Dark_Shikari> 2*FDEC_STRIDE
2010-11-24 00:57:14 < Dark_Shikari> 4*FDEC_STRIDE
2010-11-24 00:57:16 < Dark_Shikari> not 0/1/2/3
2010-11-24 00:57:20 < Dark_Shikari> because the pixels are 2 bytes each
2010-11-24 00:57:24 < Jumpyshoes> derp
2010-11-24 00:57:30 < Dark_Shikari> I should have caught that damnit.
2010-11-24 00:57:39 < Jumpyshoes> i shoulda too
2010-11-24 00:57:45 < Jumpyshoes> i considered that for a very long time
2010-11-24 00:57:51 < Jumpyshoes> when i was using the wrong STORE_DIFF
2010-11-24 00:58:12 < Jumpyshoes> x264: All tests passed Yeah :)
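The bug fixed here is a byte-versus-pixel stride mix-up: assembly addresses are in bytes, so with 2-byte pixels each row starts `2*FDEC_STRIDE` bytes after the previous one. A minimal sketch of the arithmetic follows, assuming `FDEC_STRIDE` is 32 and counts pixels; `row_byte_offset` is a hypothetical name used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define FDEC_STRIDE 32  /* assumed value, counted in pixels */

/* Hypothetical helper: byte offset of the start of row y
 * for a buffer whose pixels are bytes_per_pixel wide. */
static size_t row_byte_offset( int y, size_t bytes_per_pixel )
{
    return (size_t)y * FDEC_STRIDE * bytes_per_pixel;
}
```

With 1-byte pixels rows 0, 1, 2 start at 0, 1, 2 times `FDEC_STRIDE` bytes; with 2-byte pixels the same rows start at 0, 2, 4 times `FDEC_STRIDE` bytes, which matches the correction given in the chat.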
2010-11-24 00:58:19 < kemuri-_9> "AVFilterPad type. Only video supported now, hopefully someone will add audio in the future." <--- lol, this concept looks familiar
2010-11-24 00:58:38 < pengvado> I didn't comment on it because I thought Dark_Shikari noticed the error and was socratically questioning Jumpyshoes
2010-11-24 00:58:56 < Jumpyshoes> ;-;
2010-11-24 00:59:13 < Dark_Shikari> LOL
2010-11-24 00:59:16 < Kovensky> lol
2010-11-24 00:59:34 < Kovensky> Dark_Shikari: loren.html?
2010-11-24 00:59:42 < Dark_Shikari> ohdear.jpg
2010-11-24 00:59:46 < Jumpyshoes> okay, should i macro-ize this, or leave it as is?
2010-11-24 00:59:59 < Jumpyshoes> i feel like if i mess with it, it'll blow up beyond all repair and i will be sad
2010-11-24 01:00:30 < Dark_Shikari> make a macro right above your function
2010-11-24 01:00:32 < Dark_Shikari> called STORE_DIFFx2
2010-11-24 01:00:45 < Dark_Shikari> it should do two rows worth
2010-11-24 01:00:50 < Dark_Shikari> so then call it twice.
2010-11-24 01:01:05 < Jumpyshoes> sigh, very well
2010-11-24 01:01:13 < Dark_Shikari> You'll be writing your first macro!
2010-11-24 01:01:36 < Jumpyshoes> i will!
2010-11-24 01:02:48 < bcoudurier> kemuri-_9 audio is added right now
2010-11-24 01:04:10 < kemuri-_9> I have no current problems with focusing filtering into libavfilter, it's just more difficult for me as i wrote the current filtering system for x264cli and now i have to learn a completely different one.
2010-11-24 01:05:21 < Dark_Shikari> kemuri-_9: if you want we can keep the x264 filter chain and just outsource individual filters
2010-11-24 01:05:25 < Dark_Shikari> the main issue is:
2010-11-24 01:05:29 < Dark_Shikari> a) libavfilter is getting lots of free development with cool filters
2010-11-24 01:05:41 < Dark_Shikari> b) bcoudurier has been yelling at me like a chicken with its head cut off
2010-11-24 01:05:44 < Dark_Shikari> c) pengvado has spoken
2010-11-24 01:06:12 < kierank> the downside is all the bikeshedding
2010-11-24 01:06:43 < bcoudurier> I hear you
2010-11-24 01:07:09 < bcoudurier> bikeshedding is terrible, I'll do something about it
2010-11-24 01:07:20 < kemuri-_9> at the bare minimum i agree with using what's available in libavfilter in some form or fashion (whether we ditch the current system or wrap it), we'll have to discuss putting new filters into libavfilter as they come along and see if libavfilter will accept them or say 'gtfo'
2010-11-24 01:07:49 < Dark_Shikari> so, bcoudurier, you can get people to help mentor for libavfilter?
2010-11-24 01:07:54 < Dark_Shikari> if so, we'll also need them to mentor kemuri-_9 =p
2010-11-24 01:08:01 < kemuri-_9> :<
2010-11-24 01:08:05 < kemuri-_9> sadly
2010-11-24 01:08:08 < kierank> and can the bitdepth problem be sorted out ;)
2010-11-24 01:08:35 < Dark_Shikari> we're keeping the depth filter for now
2010-11-24 01:08:40 < kierank> the advantage to the x264 filter system was it didn't try to be a jack of all trades
2010-11-24 01:08:43 < Dark_Shikari> i.e. libavfilter will still do 8-bit and 16-bit
2010-11-24 01:08:46 < Dark_Shikari> and not 10-bit
2010-11-24 01:08:51 < Dark_Shikari> and the depth filter will convert for us
2010-11-24 01:08:59 < Dark_Shikari> kierank: we can keep doing that, actually
2010-11-24 01:09:02 < Dark_Shikari> if we just wrap filters
2010-11-24 01:09:07 < Dark_Shikari> then the filters will only do what we let them do
2010-11-24 01:09:50 < bcoudurier> yes, I can mentor as well, and we'll be glad to do so
2010-11-24 01:09:59 < bcoudurier> kemuri, ask me any question
2010-11-24 01:11:07 < kemuri-_9> the only current problem i see immediately is the csp difference between libavfilter and the x264cli filter system, e.g. x264cli recognizes some csps that libavfilter does not (and technically vice versa)
2010-11-24 01:12:23 < kemuri-_9> bcoudurier: thanks i'll ask when the time comes, which is not largely now, i have other things i want to get done for x264 before really messing with libavfilter
2010-11-24 01:12:41 < Dark_Shikari> bcoudurier: we do want to smuggle students into ffmpeg for GCI though
2010-11-24 01:12:43 < Dark_Shikari> and that's really near term
2010-11-24 01:12:48 < Dark_Shikari> GCI literally just started and I got fucking flooded today
2010-11-24 01:13:37 < kemuri-_9> how's that going to work, ffmpeg didn't get accepted by GCI right?
2010-11-24 01:13:52 < Dark_Shikari> exactly
2010-11-24 01:13:56 < Dark_Shikari> and we want video filters in x264
2010-11-24 01:13:56 < rfw> i assume it's something like "hay kid wanna work in ffmpeg"
2010-11-24 01:14:06 < Dark_Shikari> so everyone who "does a filter" for x264
2010-11-24 01:14:09 < Dark_Shikari> will instead port to libavfilter!
2010-11-24 01:14:11 < Dark_Shikari> problem solved.
2010-11-24 01:14:19 < Dark_Shikari> trivial.
2010-11-24 01:14:34 < Dark_Shikari> We intended to do this from the start -- the instant ffmpeg wasn't accepted, we were going to add Videolan/x264 tasks that were just to work on ffmpeg.
2010-11-24 01:14:37 < Dark_Shikari> It's perfectly legit =p
2010-11-24 01:14:49 < rfw> apart from the fact it's not videolan? :D
2010-11-24 01:14:51 < kemuri-_9> seems like cheating the system but not like i a real voice in the issue
2010-11-24 01:14:58 < kemuri-_9> i have a*
2010-11-24 01:15:00 < kierank> boo hoo
2010-11-24 01:15:02 < kierank> google is so sad
2010-11-24 01:15:04 < Dark_Shikari> videolan uses ffmpeg
2010-11-24 01:15:04 < kierank> ;)
2010-11-24 01:15:06 < Dark_Shikari> x264 uses ffmpeg
2010-11-24 01:15:08 < Dark_Shikari> I think it's legit
2010-11-24 01:15:09 < Dark_Shikari> =p
2010-11-24 01:15:14 < rfw> yes but ffmpeg isn't part of videolan :3
2010-11-24 01:15:28 < Dark_Shikari> It might as well be, videolan is going to host ffmpeg's source now
2010-11-24 01:15:29 < kierank> Dark_Shikari: I think it's legit
2010-11-24 01:15:34 < kierank> x264 needs filters
2010-11-24 01:15:35 < Dark_Shikari> they're moving to git, and videolan will host it
2010-11-24 01:15:41 < rfw> ah
2010-11-24 01:15:49 < kierank> we decide that we're going to use libavfilter because of duplicated work
2010-11-24 01:15:50 < rfw> all hail git \o/
2010-11-24 01:15:52 < kierank> that's it
2010-11-24 01:15:53 < Dark_Shikari> kierank: yup
2010-11-24 01:16:03 < Dark_Shikari> seems fine to me.
2010-11-24 01:17:36 < rfw> after writing so many finally clauses i think i see why people use finally
2010-11-24 01:17:51 < Dark_Shikari> you should have started that sentence with "finally"
2010-11-24 01:18:04 < rfw> s/after/finally after/
2010-11-24 01:18:09 < rfw> problem solved
2010-11-24 01:18:50 < rfw> now how did i do my bisection again
2010-11-24 01:18:55 < Dark_Shikari> magic
2010-11-24 01:19:09 < rfw> i hate magic :(
2010-11-24 01:20:16 < Jumpyshoes> Dark_Shikari, I AM DONE
2010-11-24 01:20:30 < Dark_Shikari> woohoooo
2010-11-24 01:20:35 < Dark_Shikari> now I just have to harass irock to fix his shit
2010-11-24 01:20:40 < JEEBsv> lol
2010-11-24 01:20:43 < Jumpyshoes> now what do i do
2010-11-24 01:20:46 < Dark_Shikari> and harass pengvado to make a checkasm test that demonstrates that irock's shit is broken
2010-11-24 01:20:50 < Dark_Shikari> Jumpyshoes: post your patch!
2010-11-24 01:21:00 < Jumpyshoes> do a diff?
2010-11-24 01:21:07 < Dark_Shikari> yes
2010-11-24 01:21:17 < Dark_Shikari> then I will need from you three things!
2010-11-24 01:21:49 < Jumpyshoes> ?
2010-11-24 01:21:50 < Dark_Shikari> Your name, your email, and a response to the thing I send you via email!
2010-11-24 01:22:18 < kemuri-_9> CLA powar
2010-11-24 01:22:24 < Jumpyshoes> i should probably test the 8-bit
2010-11-24 01:22:31 < Jumpyshoes> to make sure i didn't massively f something else up
2010-11-24 01:22:49 < Dark_Shikari> Yup, of course
2010-11-24 01:22:52 < Dark_Shikari> and post the patch so I can review it!
2010-11-24 01:22:57 < Dark_Shikari> I will nitpick it in minor ways!
2010-11-24 01:23:01 < Dark_Shikari> lol
2010-11-24 01:24:07 < Jumpyshoes> after i test 8-bit i'll post
2010-11-24 01:25:15 < Jumpyshoes> great
2010-11-24 01:25:27 < Jumpyshoes> http://pastebin.com/nzybLAmG inb4 i get my ass handed to me
2010-11-24 01:26:11 < Dark_Shikari> email sent
2010-11-24 01:26:17 < Dark_Shikari> now for review!
2010-11-24 01:26:51 < Dark_Shikari> 266/269/271: drop the extra spaces
2010-11-24 01:27:03 < Dark_Shikari> in that whole macro, align everything so that the commas are aligned
2010-11-24 01:27:08 < Dark_Shikari> 272: extra spaces, etc
2010-11-24 01:27:14 < Dark_Shikari> 293 has extra spaces too
2010-11-24 01:27:23 < Dark_Shikari> does your editor have weird tabbing or something?
2010-11-24 01:27:27 < Dark_Shikari> fyi we use 4-space tabs
2010-11-24 01:27:30 < Jumpyshoes> oh, yea
2010-11-24 01:27:31 < Dark_Shikari> 306-307: pointless change
2010-11-24 01:27:33 < Jumpyshoes> notepad++ defaults to single tab
2010-11-24 01:27:47 < Dark_Shikari> fix that, and find all your tabs and fix them
2010-11-24 01:27:51 < Dark_Shikari> 595: pointless change
2010-11-24 01:27:55 < Dark_Shikari> ok, first pass done.
2010-11-24 01:27:57 < Dark_Shikari> fix those and post it again
2010-11-24 01:28:00 < Jumpyshoes> kk
2010-11-24 01:28:50 < Jumpyshoes> oh btw
2010-11-24 01:28:52 < Dark_Shikari> ?
2010-11-24 01:28:54 < Jumpyshoes> i don't think this is an updated version of x264
2010-11-24 01:29:00 < Dark_Shikari> what do you mean
2010-11-24 01:29:00 < espes> Dark_Shikari: so, what, am I still writing an x264 filter?
2010-11-24 01:29:05 < Jumpyshoes> since i pulled 17xx and didn't bother to pull again
2010-11-24 01:29:10 < Dark_Shikari> espes: you'll be writing a libavfilter filter -- much the same thing
2010-11-24 01:29:14 < Dark_Shikari> Jumpyshoes: ok, here's how you do this
2010-11-24 01:29:16 < espes> Dark_Shikari: right.
2010-11-24 01:29:17 < Dark_Shikari> git commit -a
2010-11-24 01:29:21 < Dark_Shikari> <type in your commit message, etc, etc>
2010-11-24 01:29:24 < Dark_Shikari> git pull --rebase
2010-11-24 01:29:26 < Dark_Shikari> now you have the latest version
2010-11-24 01:29:33 < Dark_Shikari> then, to modify your commit, git commit -a --amend
2010-11-24 01:29:37 < Dark_Shikari> to make a diff
2010-11-24 01:29:41 < Dark_Shikari> git format-patch HEAD~1 --stdout > file.
2010-11-24 01:29:48 < Dark_Shikari> you can also add your authorship to the patch
2010-11-24 01:30:05 < Dark_Shikari> by adding --author="My Name <myemail@email.com>" to git commit
2010-11-24 01:30:10 < Dark_Shikari> which will list your name on it
2010-11-24 01:30:41 < Dark_Shikari> espes: bcoudurier here is responsible for getting someone to help you with this.  he's a cool french dude.
2010-11-24 01:31:03 < espes> Dark_Shikari: ehh, I think libavfilter already supports loading libmpcodec filters, so it'll already be available. :\
2010-11-24 01:31:24 < astrange> it doesn't
2010-11-24 01:31:39 < Jumpyshoes> out of curiosity, why is ftp://ftp.videolan.org/pub/videolan/x264/snapshots/ dead?
2010-11-24 01:32:15 < Dark_Shikari> Jumpyshoes: blame j-b
2010-11-24 01:32:16 < Dark_Shikari> it's all j-b's fault
2010-11-24 01:32:22 < astrange> a lot of libmpcodec filters are a mess anyway (timestamps are broken, only x86 is supported properly, no code reviews since forever)
2010-11-24 01:32:32 < Dark_Shikari> Yeah, we want to move libmpcodecs into libavfilter anyways
2010-11-24 01:32:38 < Dark_Shikari> libmpcodecs is part of mplayer and therefore rather dead
2010-11-24 01:32:46 < Jumpyshoes> i don't actually know what i did in the useless changes
2010-11-24 01:32:56 < Dark_Shikari> Jumpyshoes: shit happens, we do it too
2010-11-24 01:33:09 < espes> Ok, so I move the filter from libmpcodec to libavfilter and make the code not suck.
2010-11-24 01:33:14 < Dark_Shikari> Basically that's it.
2010-11-24 01:33:20 < Dark_Shikari> And 4 points per filter.
2010-11-24 01:33:40 < Dark_Shikari> Pick your favorites.
2010-11-24 01:40:07 < Jumpyshoes> oh damn, my diff doesn't show anything anymore
2010-11-24 01:40:51 < JEEBsv> you committed it?
2010-11-24 01:41:05 < JEEBsv> then git format-patch HEAD~1 if it's just one commit
2010-11-24 01:41:57 < Dark_Shikari> brb, I'm going to grab dinner
2010-11-24 01:42:00 < Jumpyshoes> i mean, all my changes are committed already, so diff isn't finding anything <_<
2010-11-24 01:42:06 < Jumpyshoes> or something
2010-11-24 01:42:10 < astrange> http://pastebin.com/EdLuGCN7 those are the useful filters, i think
2010-11-24 01:42:12 < Dark_Shikari> use format-patch
2010-11-24 01:42:15 < Dark_Shikari> espes: ^
2010-11-24 01:42:27 < astrange> actually ass is the most useful filter. but i don't think you could do that one
2010-11-24 01:42:31 < Dark_Shikari> lol
2010-11-24 01:42:31 < JEEBsv> Jumpyshoes: point on HEAD~1 (or other number depending on how many commits)
2010-11-24 01:43:01 < JEEBsv> thus, git format-patch HEAD~1 will produce you one patch :3 (or the stdout way)
2010-11-24 01:43:13 < JEEBsv> if you have more commits, f.ex. three -- HEAD~3
2010-11-24 01:43:33 < espes> astrange: I'm told gradfun doesn't really work as a preprocessing filter...
2010-11-24 01:43:44 < espes> (At least that's the note in the mplayer's implementation)
2010-11-24 01:43:47 < Jumpyshoes> oh, so i actually have two commits
2010-11-24 01:44:00 < Jumpyshoes> how do i compare my current with two commits ago?
2010-11-24 01:44:19 < astrange> it sometimes works on bad inputs that spp/pp leave blocks on, but yes
2010-11-24 01:44:22 < JEEBsv> git diff HEAD~2 ?
2010-11-24 01:44:31 < Jumpyshoes> that gives 2 patches
2010-11-24 01:44:38 < Jumpyshoes> so D_S can't see my changes
2010-11-24 01:44:53 < JEEBsv> use the stdout way of format-patch
2010-11-24 01:45:02 < JEEBsv> and thus concat it into one
2010-11-24 01:45:10 < Jumpyshoes> ah
2010-11-24 01:45:15 < Jumpyshoes> thanks
2010-11-24 01:46:27 < espes> astrange: isn't colorspace conversion already done by the scale filter?
2010-11-24 01:47:12 < Jumpyshoes> Dark_Shikari: http://pastebin.com/YASKQLHV
2010-11-24 01:47:32 < espes> But if you think shelling out to swscale isn't really optimal :\
2010-11-24 01:48:10 < astrange> nope, the issue is ignored. but that wouldn't be a bad idea
2010-11-24 01:51:22 < JEEBsv> espes: gradfun has its uses even before encoding :) Just that it indeed needs more bits. Thus if the player can fix the banding, it's better of course.
2010-11-24 01:53:39 < Jumpyshoes> speaking of filters
2010-11-24 01:53:49 < Jumpyshoes> whatever happened to halomaker3000deluxe
2010-11-24 01:54:14 < JEEBsv> lol
2010-11-24 01:54:42 < rfw> python's documentation is quite questionable in places
2010-11-24 01:54:47 < rfw> parser.add_option("-q", "--quiet",
2010-11-24 01:54:47 < rfw>                   action="store_false", dest="verbose",
2010-11-24 01:54:47 < rfw>                   help="be vewwy quiet (I'm hunting wabbits)")
2010-11-24 01:54:52 < rfw> i mean, what.
2010-11-24 01:55:23 < Jumpyshoes> i eat rabbits
2010-11-24 01:55:58 < rfw> lol
2010-11-24 01:56:07 < Jumpyshoes> i have actually eaten a rabbit before
2010-11-24 01:56:19 < rfw> same
2010-11-24 01:56:48 < rfw> Dark_Shikari: i think i'm finished (lolagain)
2010-11-24 01:56:55 < rfw> do i have to implement linear revisions again
2010-11-24 01:57:45 < JEEBsv> Jumpyshoes: I think they made a new filter adding pretty fabulous rainbows on #darkhold IIRC
2010-11-24 01:58:06 < Jumpyshoes> what
2010-11-24 01:58:14 < Jumpyshoes> HM3KD WITH RAINBOWS?
2010-11-24 01:58:18 < Jumpyshoes> WHERE THE FUCK CAN I FIND THIS?
2010-11-24 01:58:32 < rfw> i wrote an hlsl shader that rainbowed everything
2010-11-24 01:58:41 < JEEBsv> You'll have to ask on #darkhold , I don't remember who wrote it atm
2010-11-24 01:58:59 < Jumpyshoes> darn
2010-11-24 01:59:00 < Jumpyshoes> this rizon?
2010-11-24 01:59:07 < JEEBsv> yah, rizon
2010-11-24 01:59:08 < rfw> why are we all on rizon
2010-11-24 01:59:25 < rfw> half of this channel is on rizon
2010-11-24 01:59:27 < rfw> :(
2010-11-24 02:05:21 < Dark_Shikari> espes: gradfun2db WOULD be useful as a preprocessing filter with x264's 10-bit
2010-11-24 02:05:36 < Dark_Shikari> testing has shown this
2010-11-24 02:05:54 < Dark_Shikari> Jumpyshoes: you need to merge your patches
2010-11-24 02:05:59 < Dark_Shikari> you inadvertently made two local commits instead of one
2010-11-24 02:06:01 < Jumpyshoes> how do i do that?
2010-11-24 02:06:04 < Dark_Shikari> if you do git rebase -i HEAD~2
2010-11-24 02:06:06 < Jumpyshoes> oh
2010-11-24 02:06:10 < Dark_Shikari> you can change one of your commits to "squash" from "pick"
2010-11-24 02:06:15 < Dark_Shikari> and it will squash the commits together
2010-11-24 02:06:22 < Dark_Shikari> remember, to edit a commit, use --amend
2010-11-24 02:06:56 < rfw> almost rm -rf'd digress there
2010-11-24 02:06:57 < rfw> whoops
2010-11-24 02:07:17 < Dark_Shikari> you should be banned from using rm
2010-11-24 02:07:27 < rfw> lol
2010-11-24 02:07:30 < rfw> well i'm on windows
2010-11-24 02:07:34 < rfw> so del /s /q works too
2010-11-24 02:07:44 < Dark_Shikari> don't use that either
2010-11-24 02:08:02 < rfw> well i think i'm done
2010-11-24 02:08:07 < JEEBsv>  /s /q isn't even easy to write :/
2010-11-24 02:08:12 < Dark_Shikari> linear revisions? =p
2010-11-24 02:08:16 < rfw> do i have to
2010-11-24 02:08:17 < rfw> ;_;
2010-11-24 02:08:23 < Dark_Shikari> ok then don't
2010-11-24 02:08:28 < rfw> \o/
2010-11-24 02:08:29 < Dark_Shikari> format-patch please :)
2010-11-24 02:08:33 < Dark_Shikari> don't forget to git add
2010-11-24 02:08:57 < rfw> hold on
2010-11-24 02:09:01 < rfw> let me write a setuptools package
2010-11-24 02:09:26 < Jumpyshoes> wait, how do i merge commits?
2010-11-24 02:09:31 < rfw> git merge?
2010-11-24 02:09:33 < Dark_Shikari> use git rebase -i as I said
2010-11-24 02:09:38 < Dark_Shikari> and replace "pick" for one with "squash"
2010-11-24 02:09:41 < Dark_Shikari> read the info it gives you
2010-11-24 02:09:43 < Dark_Shikari> also, git help is your friend
2010-11-24 02:09:52 < Dark_Shikari> git rebase -i HEAD~2 will let you modify the last 2 commits
2010-11-24 02:09:56 < Dark_Shikari> one of the ways to modify them is squashing
2010-11-24 02:10:01 < rfw> my git --help launches a help browser, except it doesn't
2010-11-24 02:10:22 < Dark_Shikari> git help commandname
2010-11-24 02:10:24 < Dark_Shikari> git help rebase
2010-11-24 02:10:29 < Dark_Shikari> git help help
2010-11-24 02:13:33 < rfw> whoops
2010-11-24 02:13:38 < rfw> i filled my directory with .patch files
2010-11-24 02:13:45 < rfw> there we go
2010-11-24 02:13:48 < rfw> where do i submit this?
2010-11-24 02:13:59 < Dark_Shikari> pastebin
2010-11-24 02:14:11 < rfw> oh you wanted it in tools, right?
2010-11-24 02:14:15 < Dark_Shikari> yes
2010-11-24 02:14:22 < rfw> ah let me fix that
2010-11-24 02:15:49 < Jumpyshoes> how do i resume an interactive git rebase?
2010-11-24 02:16:21 < kemuri-_9> git rebase --continue
2010-11-24 02:16:49 < rfw> http://pastebin.com/zLbbE2KL
2010-11-24 02:17:27 < Dark_Shikari> rfw: does that include digress?
2010-11-24 02:17:30 < rfw> oh
2010-11-24 02:17:30 < rfw> no
2010-11-24 02:17:36 < rfw> but that's on my github repo
2010-11-24 02:17:36 < Dark_Shikari> well it won't run without it, will it?
2010-11-24 02:17:41 < JEEBsv> lol
2010-11-24 02:17:42 < Dark_Shikari> =p
2010-11-24 02:17:45 < rfw> so
2010-11-24 02:17:48 < rfw> put that in tools too?
2010-11-24 02:17:50 < Dark_Shikari> I guess?
2010-11-24 02:17:54 < rfw> heh
2010-11-24 02:18:01 < rfw> am i going to have to include python
2010-11-24 02:18:05 < rfw> and glibc :p
2010-11-24 02:18:54 < Dark_Shikari> lol
2010-11-24 02:18:59 < Dark_Shikari> digress isn't something you can install with apt-get =p
2010-11-24 02:19:30 < Jumpyshoes> http://pastebin.com/GBSUtExQ
2010-11-24 02:20:37 < rfw> maybe i should do a debian task next
2010-11-24 02:20:41 < rfw> and include it in their apt repositories
2010-11-24 02:20:42 < rfw> :D
2010-11-24 02:21:14 < rfw> http://pastebin.com/Atb7j4DV
2010-11-24 02:22:05 < Dark_Shikari> Jumpyshoes: in your STORE_DIFFx2, the , should all be vertically aligned
2010-11-24 02:22:07 < Dark_Shikari> the first , in each line, that is
2010-11-24 02:22:15 < Dark_Shikari> so add spaces as necessary
2010-11-24 02:22:16 < Dark_Shikari> e.g.
2010-11-24 02:22:21 < Dark_Shikari> packssdw %1, %2
2010-11-24 02:22:24 < Jumpyshoes> oh right
2010-11-24 02:22:26 < Dark_Shikari> movq     %3, %4
2010-11-24 02:22:46 < Dark_Shikari> junk spaces are still around on line 306
2010-11-24 02:22:57 < Dark_Shikari> 304 line break is needless
2010-11-24 02:23:05 < Dark_Shikari> other than that lgtm
2010-11-24 02:23:26 < Jumpyshoes> kk
2010-11-24 02:25:53 < rfw> actually Dark_Shikari
2010-11-24 02:25:59 < rfw> can't i add my repo as a submodule
2010-11-24 02:26:18 < Dark_Shikari> I would really rather not do that.
2010-11-24 02:26:23 < rfw> i guess
2010-11-24 02:26:25 < Dark_Shikari> pengvado has the final word though
2010-11-24 02:26:32 < Dark_Shikari> actually yeah, pengvado, can you comment on this in general?
2010-11-24 02:26:37 < Dark_Shikari> python regression test script of magic
2010-11-24 02:27:23 < pengvado> you mean comment on the magic part, or the python?
2010-11-24 02:28:17 < Dark_Shikari> lol
2010-11-24 02:28:25 < Dark_Shikari> python isn't magic?
2010-11-24 02:28:56 < rfw> now to delete every file recovery tool i installed
2010-11-24 02:29:13 < Dark_Shikari> lol
2010-11-24 02:29:19 < pengvado> I don't see an "import magic", so no
2010-11-24 02:29:47 < rfw> i should've probably called it magic, then
2010-11-24 02:30:04 < Dark_Shikari> pengvado: seriously though
2010-11-24 02:30:20 < Dark_Shikari> topics like "can we just dump this in tools/"
2010-11-24 02:32:08 < pengvado> I haven't been following the featurelist, but my general comment on "can we just dump a regression tester in tools/" is "yes"
2010-11-24 02:32:21 < Dark_Shikari> pengvado: any particular features you want to make sure are in there?
2010-11-24 02:32:26 < Dark_Shikari> since I imagine you might want to use it too
2010-11-24 02:33:24 < Jumpyshoes> http://pastebin.com/CEAfyzMd okay, tested w/ 8 and 10 bit
2010-11-24 02:35:13 < Dark_Shikari> er
2010-11-24 02:35:15 < Dark_Shikari> you're pxoring m7
2010-11-24 02:35:21 < Dark_Shikari> and then using m6?
2010-11-24 02:36:13 < Jumpyshoes> well, pxor was used in the other function
2010-11-24 02:36:24 < Dark_Shikari> you're pxoring m7
2010-11-24 02:36:25 < Jumpyshoes> oh wait
2010-11-24 02:36:26 < Dark_Shikari> you never use m7 again
2010-11-24 02:36:26 < Jumpyshoes> you're right
2010-11-24 02:36:27 < Jumpyshoes> i am
2010-11-24 02:36:49 < Jumpyshoes> yea, i can get rid of that
2010-11-24 02:36:53 < Dark_Shikari> I'll fix it locally
2010-11-24 02:36:54 < Jumpyshoes> the dangers of copy pasta
2010-11-24 02:37:44 < Dark_Shikari> aha
2010-11-24 02:37:46 < Dark_Shikari> your patch fails to apply
2010-11-24 02:37:48 < Dark_Shikari> trailing whitespace
2010-11-24 02:37:52 < Dark_Shikari> lines 294, 297, 298, 299
2010-11-24 02:38:47 < Jumpyshoes> wait, what?
2010-11-24 02:38:57 < Dark_Shikari> whitespace at the end of lines
2010-11-24 02:39:04 < Dark_Shikari> git yells about them when I try to apply your patch
2010-11-24 02:39:11 < Jumpyshoes> oh
2010-11-24 02:39:14 < Jumpyshoes> you can't do that?
2010-11-24 02:39:23 < Dark_Shikari> I guess I could edit the patch manually ok
2010-11-24 02:39:29 < Jumpyshoes> sorry, i didn't know
2010-11-24 02:40:25 < kemuri-_9> notepad++ > textfx edit > trim trailing whitespace  <--- works wonders
2010-11-24 02:41:56 < Dark_Shikari> Jumpyshoes:
2010-11-24 02:41:59 < Dark_Shikari> original function: 188 cycles
2010-11-24 02:42:00 < Dark_Shikari> yours: 30
2010-11-24 02:42:13 < pengvado> kemuri-_9: does it parse diffs and not strip syntactically significant trailing whitespace?
2010-11-24 02:42:28 < Jumpyshoes> :O?
2010-11-24 02:42:29 < Dark_Shikari> pengvado: I think he meant on the code
2010-11-24 02:42:30 < Dark_Shikari> =p
2010-11-24 02:42:33 < Dark_Shikari> not the diff
2010-11-24 02:42:35 < Jumpyshoes> so speedup of about 6x?
2010-11-24 02:42:38 < Dark_Shikari> yes, for that function
2010-11-24 02:42:53 < Jumpyshoes> woah, that's cool
2010-11-24 02:43:31 < Dark_Shikari> http://pastebin.com/e0aa966g
2010-11-24 02:43:33 < kemuri-_9> yeah, on the code - should do it before making diffs/commits to catch where you might've gotten careless
2010-11-24 02:43:35 < Dark_Shikari> that's the final version with commit message
2010-11-24 02:43:44 < Dark_Shikari> I'll handle any more changes pengvado wants to make before we push it
2010-11-24 02:43:49 < astrange> s/^ +$//g
2010-11-24 02:43:58 < astrange> take out the ^
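(Editor's sketch) The trailing-whitespace cleanup discussed above could look roughly like the following when run on source text before making the diff. The function name is hypothetical. As pengvado points out, this should be applied to the code and not to a patch file, since a context line for an empty source line in a unified diff is a single space and must not be stripped.

```python
import re

# Spaces or tabs at the end of a line (the regex above with the ^ taken out,
# extended to tabs).
TRAILING_WS = re.compile(r"[ \t]+$")

def strip_trailing_whitespace(text):
    """Return text with trailing spaces/tabs removed from every line."""
    lines = text.split("\n")
    return "\n".join(TRAILING_WS.sub("", line) for line in lines)
```

Running this over a source file before `git diff` would avoid the complaint git makes when applying the patch.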
2010-11-24 02:44:09 < Jumpyshoes> woah, i'm the first GCI dude?
2010-11-24 02:44:14 < Dark_Shikari> The first to finish yes
2010-11-24 02:44:19 < pengvado> Dark_Shikari, rfw: vague wishlist: report whether output did or did not change (distinct from whether it changed enough to visibly affect bitrate/psnr). and the vaguer part: what if it's supposed to change some configurations and not others?
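(Editor's sketch) One way to read the first wishlist item is to report two separate facts per test: whether the output bitstream is byte-identical, and whether a quality metric moved by more than some tolerance. The function names, the use of SHA-1 and the tolerance value below are assumptions for illustration, not taken from rfw's script.

```python
import hashlib

def digest(data):
    """Hex digest of an encoded stream, used only to test for identity."""
    return hashlib.sha1(data).hexdigest()

def compare_outputs(old_stream, new_stream, old_psnr, new_psnr,
                    psnr_tolerance=0.01):
    """Report bit-exactness separately from a visible metric change."""
    bit_exact = digest(old_stream) == digest(new_stream)
    metrics_moved = abs(new_psnr - old_psnr) > psnr_tolerance
    return {"bit_exact": bit_exact, "metrics_moved": metrics_moved}
```

The vaguer second item (a change that should affect some configurations and not others) could then be expressed as an expected `bit_exact` value per configuration, which this sketch does not attempt.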
2010-11-24 02:44:19 < Jumpyshoes> :O
2010-11-24 02:44:22 < Jumpyshoes> i should sign up
2010-11-24 02:45:04 < Jumpyshoes> Dark_Shikari, so i just take a code an asm func task on the page?
2010-11-24 02:45:17 < Dark_Shikari> you didn't take one?  lol
2010-11-24 02:45:19 < Dark_Shikari> go take one then
2010-11-24 02:45:20 < Jumpyshoes> nope
2010-11-24 02:45:42 < Jumpyshoes> what in the world is melange? cause it's fricking slow
2010-11-24 02:46:06 < Dark_Shikari> Shit.
2010-11-24 02:46:07 < Dark_Shikari> That's what it is.
2010-11-24 02:46:26 < Jumpyshoes> sigh
2010-11-24 02:46:34 < pengvado> Jumpyshoes: 319: prototypes of highdepth-aware functions should use dctcoef* and pixel*
2010-11-24 02:46:34 < Jumpyshoes> using shitty open source to promote open source
2010-11-24 02:46:37 < Jumpyshoes> good way to go
2010-11-24 02:46:48 < Dark_Shikari> ah yes
2010-11-24 02:46:56 < rfw> Dark_Shikari: what do you mean by the second part
2010-11-24 02:46:58 < Dark_Shikari> pengvado: wait.  what?
2010-11-24 02:47:01 < Jumpyshoes> ?
2010-11-24 02:47:05 < Dark_Shikari> I don't think we've done that so far
2010-11-24 02:47:15 < Dark_Shikari> since the asm functions only support one version at a time
2010-11-24 02:47:25 < Jumpyshoes> oops, Dark_Shikari, can you add another asm func task?
2010-11-24 02:47:29 < Jumpyshoes> i think both are taken
2010-11-24 02:47:50 < Dark_Shikari> I think we'll have to wait for j-b to show up
2010-11-24 02:47:53 < Dark_Shikari> yeah they're getting taken really fast
2010-11-24 02:47:55 < Dark_Shikari> I'll have to go spam them
2010-11-24 02:48:14 < Dark_Shikari> pengvado: it seems this wasn't done for anything else in dct.h.  what do you want to do?
2010-11-24 02:48:45 < pengvado> in dct.h I see 3 functions with both versions and a bunch of 8bit-only
2010-11-24 02:49:17 < Dark_Shikari> ah, I see
2010-11-24 02:49:17 < Dark_Shikari> ok
2010-11-24 02:49:31 < Dark_Shikari> but the function only has one version
2010-11-24 02:49:33 < Dark_Shikari> it doesn't exist for 8-bit
2010-11-24 02:49:37 < Dark_Shikari> does it make sense to have it be "templated" then?
2010-11-24 02:49:50 < pengvado> hmm, do we want to use the prototypes to keep track of which functions have which versions, or change everything once and for all to dctcoef/pixel so that it doesn't have to change again with each patch to implement a few functions?
2010-11-24 02:50:04 < Dark_Shikari> I like the first idea
2010-11-24 02:50:21 < pengvado> ok, then leave it
2010-11-24 02:51:02 < pengvado> 319: vertical align
2010-11-24 02:52:07 < Dark_Shikari> how so?  just the [16]?
2010-11-24 02:52:13 < Dark_Shikari> the rest is impossible to align without shifting all the others
2010-11-24 02:52:39 < Dark_Shikari> Jumpyshoes: also, feel free to watch, this is how code reviews work =p
2010-11-24 02:52:47 < Jumpyshoes> yea, i am
2010-11-24 02:52:56 < pengvado> shifting the paren would make more match up
2010-11-24 02:52:57 < Jumpyshoes> i feel sorta bad <_<
2010-11-24 02:53:05 < Jumpyshoes> so many mistakes
2010-11-24 02:53:13 < Dark_Shikari> Jumpyshoes: this is how it starts for everyone
2010-11-24 02:53:21 < Dark_Shikari> even _my_ patches get 250 nitpicks before committing
2010-11-24 02:53:26 < Dark_Shikari> this is how we make good code
2010-11-24 02:53:29 < Dark_Shikari> we bitch about each others'
2010-11-24 02:53:34 < Jumpyshoes> haha
2010-11-24 02:53:35 < Dark_Shikari> pengvado: you mean shifting all of them?  or what
2010-11-24 02:53:53 < pengvado> just the new one
2010-11-24 02:54:12 < Dark_Shikari> so
2010-11-24 02:54:13 < Dark_Shikari> void x264_add4x4_idct_mmx       ( uint8_t *p_dst, int16_t dct    [16] );
2010-11-24 02:54:13 < Dark_Shikari> void x264_add4x4_idct_sse2     ( uint16_t *p_dst, int32_t dct    [16] );
2010-11-24 02:54:14 < Dark_Shikari> ?
2010-11-24 02:54:16 < pengvado> yes
2010-11-24 02:54:19 < Dark_Shikari> ok
2010-11-24 02:54:21 < Dark_Shikari> done
2010-11-24 02:54:56 < Dark_Shikari> espes: claim accepted
2010-11-24 02:55:02 < Dark_Shikari> fuck melange
2010-11-24 02:55:22 < Jumpyshoes> so whenever the claims come out
2010-11-24 02:55:39 < pengvado> if I'm not holding Jumpyshoes responsible for optimizing arithmetic width since I haven't written that test yet, then patch ok
2010-11-24 02:55:51 < Dark_Shikari> pengvado: isn't that only possible with fdct?
2010-11-24 02:56:03 < Dark_Shikari> or is there a stage that it's possible with in idct?
2010-11-24 02:56:09 < Dark_Shikari> and yeah, I think that's reasonable
2010-11-24 02:56:17 < Dark_Shikari> I mean, the fdct is clearly _broken_ now, so...
2010-11-24 02:56:52 < pengvado> hmm, idct inputs should fit well within 16bit, but they're a 32bit type since pre-quant coefs need 32bit, so it's at least inconvenient
2010-11-24 02:57:07 < Dark_Shikari> why would idct inputs fit within 16-bit if dct outputs won't?
2010-11-24 02:58:00 < pengvado> because quant rescales things? at least I think it does.
2010-11-24 02:58:06 < Dark_Shikari> by that large a margin?  I'm not sure
2010-11-24 03:00:01 < rfw> x264.c: In function ‘help’: x264.c:351: error: ‘X264_VERSION’ undeclared (first use in this function)
2010-11-24 03:00:05 < rfw> the hell
2010-11-24 03:00:18 < Dark_Shikari> configure failed
2010-11-24 03:00:19 < Dark_Shikari> or something
2010-11-24 03:00:22 < Dark_Shikari> X264_VERSION gets written by configure
2010-11-24 03:00:35 < Dark_Shikari> into config.h
2010-11-24 03:00:40 < Dark_Shikari> pengvado: oh, another thing I wanted to bring up
2010-11-24 03:00:44 < pengvado> oh, my bad. I was counting bits of expansion, assuming that if idct expands things by 7 bits and is >>6 then it must have pretty small inputs. but that's wrong. it just means that it's syntactically possible to write a bitstream where idct is very very clipped, and even if it isn't there's no way to know which of the dct coefs that magnitude is in.
2010-11-24 03:00:56 < rfw> huh that's funny
2010-11-24 03:01:01 < rfw> it didn't catch the return code
2010-11-24 03:01:14 < Dark_Shikari> a customer has requested that ffmpeg be able to link to a commercial x264
2010-11-24 03:01:18 < Dark_Shikari> while ffmpeg remains LGPL
2010-11-24 03:01:21 < Dark_Shikari> this of course makes sense.
2010-11-24 03:01:30 < Dark_Shikari> But this requires that ffmpeg configure KNOW that x264 is commercially licensed, and not GPL.
2010-11-24 03:01:41 < Dark_Shikari> I considered an installed x264_config.h, similar to ffmpeg's.
2010-11-24 03:01:45 < Dark_Shikari> What do you think should be done?
2010-11-24 03:02:35 < rfw> oh lol
2010-11-24 03:02:36 < rfw> windows encoding
2010-11-24 03:02:50 < rfw> i swear i turned autocrlf off too
2010-11-24 03:04:29 < Jumpyshoes> Dark_Shikari, is my patch okay, or do i need to do what pengvado said?
2010-11-24 03:04:53 < pengvado> patch ok
2010-11-24 03:04:58 < Dark_Shikari> you're good
2010-11-24 03:05:04 < Jumpyshoes> o
2010-11-24 03:05:04 < Jumpyshoes> yay
2010-11-24 03:05:31 < pengvado> I can't think of any way better than x264_config.h
2010-11-24 03:05:43 < Dark_Shikari> is there anything else we should move to x264_config.h?
2010-11-24 03:05:44 < Dark_Shikari> bit depth?
2010-11-24 03:05:58 < pengvado> of course people will install an x264_config.h of one configuration and a binary of another
2010-11-24 03:06:19 < pengvado> yes
2010-11-24 03:06:23 < Dark_Shikari> the problem is that if you rely on an API call to check license
2010-11-24 03:06:29 < Dark_Shikari> it breaks cross-compiling
2010-11-24 03:06:31 < Dark_Shikari> so this leaves us two options
2010-11-24 03:06:33 < Dark_Shikari> x264_config.h
2010-11-24 03:06:39 < Dark_Shikari> or API call + runtime check in ffmpeg if LGPL is on
2010-11-24 03:07:08 < pengvado> or x264_config.h + runtime check for consistency
2010-11-24 03:07:42 < Dark_Shikari> so both bit depth and license will have APIs, *plus* config.h entries?
2010-11-24 03:08:02 < Dark_Shikari> bit_depth is a variable instead of a function call, should license be a variable too, or should we make both function calls?
2010-11-24 03:11:48 < pengvado> otoh, mismatched license doesn't cause segfaults, so maybe that doesn't need a consistency check
2010-11-24 03:11:52 < pengvado> bitdepth does though
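(Editor's sketch) The "x264_config.h + runtime check for consistency" idea amounts to comparing a value recorded at build time with what the loaded library reports. The macro names `X264_BIT_DEPTH` and `X264_GPL` and both function names below are hypothetical, since the chat had not settled on any names at this point, and the runtime value is simply passed in as an argument.

```python
import re

# Matches lines such as "#define X264_BIT_DEPTH 10".
DEFINE_RE = re.compile(r"^\s*#\s*define\s+(\w+)\s+(\d+)\s*$")

def parse_config_header(text):
    """Collect integer #defines from the text of an installed config header."""
    values = {}
    for line in text.splitlines():
        match = DEFINE_RE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values

def check_bit_depth(header_values, runtime_bit_depth):
    """Fail loudly if the header and the linked library disagree."""
    expected = header_values.get("X264_BIT_DEPTH")  # hypothetical macro name
    if expected is None:
        raise KeyError("header does not define X264_BIT_DEPTH")
    if expected != runtime_bit_depth:
        raise RuntimeError(
            "header says %d-bit, library says %d-bit"
            % (expected, runtime_bit_depth))
    return True
```

Following pengvado's remark, only bit depth gets the check here; a mismatched license value would not crash anything, so it is parsed but not verified.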
2010-11-24 03:13:37 < Jumpyshoes> Dark_Shikari, if my other patch is good, can i claim dct4x4dc_mmx ?
2010-11-24 03:13:43 < Dark_Shikari> Jumpyshoes: sure.
2010-11-24 03:13:46 < Jumpyshoes> cool
2010-11-24 03:14:18 < Dark_Shikari> important thing to note about that
2010-11-24 03:14:24 < Dark_Shikari> That function used to only use 16-bit math
2010-11-24 03:14:29 < Dark_Shikari> and thus looked very similar to all the other nearby functions
2010-11-24 03:14:40 < Dark_Shikari> however, we found that if the input was all black for one input, and all white for the other
2010-11-24 03:14:43 < Dark_Shikari> i.e. 00000 vs 255 255 255
2010-11-24 03:14:47 < Dark_Shikari> it overflowed
2010-11-24 03:14:54 < Dark_Shikari> so it emulates one extra bit of precision: see SUMSUB_17BIT
2010-11-24 03:14:59 < Dark_Shikari> However... you're doing it in 32-bit.
2010-11-24 03:15:01 < Dark_Shikari> So you don't need that.
2010-11-24 03:15:04 < Jumpyshoes> yea
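(Editor's sketch) The overflow described above can be reproduced with a small model of a 4x4 DC transform. This is a loose model of a two-pass 4x4 Hadamard with a final rounding shift, not a copy of x264's code. It assumes each input DC can reach 16 * 255 = 4080 (an all-white block against an all-black one) and that 16-bit SIMD adds wrap around instead of saturating.

```python
def wrap16(x):
    """Wrap an integer to signed 16-bit, like one lane of a 16-bit SIMD add."""
    return ((x + 0x8000) & 0xFFFF) - 0x8000

def dct4x4dc(d, wrap=lambda x: x):
    """Two-pass 4x4 Hadamard on 16 DC values; `wrap` models the lane width."""
    tmp = [0] * 16
    for i in range(4):
        s01 = wrap(d[i*4+0] + d[i*4+1])
        d01 = wrap(d[i*4+0] - d[i*4+1])
        s23 = wrap(d[i*4+2] + d[i*4+3])
        d23 = wrap(d[i*4+2] - d[i*4+3])
        tmp[0*4+i] = wrap(s01 + s23)
        tmp[1*4+i] = wrap(s01 - s23)
        tmp[2*4+i] = wrap(d01 - d23)
        tmp[3*4+i] = wrap(d01 + d23)
    out = [0] * 16
    for i in range(4):
        s01 = wrap(tmp[i*4+0] + tmp[i*4+1])
        d01 = wrap(tmp[i*4+0] - tmp[i*4+1])
        s23 = wrap(tmp[i*4+2] + tmp[i*4+3])
        d23 = wrap(tmp[i*4+2] - tmp[i*4+3])
        # The sum before the rounding shift is where a 17th bit is needed.
        out[i*4+0] = wrap(wrap(s01 + s23) + 1) >> 1
        out[i*4+1] = wrap(wrap(s01 - s23) + 1) >> 1
        out[i*4+2] = wrap(wrap(d01 - d23) + 1) >> 1
        out[i*4+3] = wrap(wrap(d01 + d23) + 1) >> 1
    return out

worst_case = [4080] * 16
wide = dct4x4dc(worst_case)            # unlimited precision
narrow = dct4x4dc(worst_case, wrap16)  # everything forced into 16 bits
```

With the worst-case input the final sum is 16 * 4080 = 65280, which fits after the >>1 but not before it. That is why the 16-bit version has to emulate one extra bit, and why doing the arithmetic in 32-bit makes the hack unnecessary.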
2010-11-24 03:15:27 < Jumpyshoes> also, what is     movq   m7, [pw_8000] ; convert to unsigned and back, so that pavgw works          supposed to do, m7 isn't used in the function
2010-11-24 03:15:34 < Jumpyshoes> oh yes it is
2010-11-24 03:15:37 < Jumpyshoes> i'm blind <_<
2010-11-24 03:15:51 < Dark_Shikari> You should go view the git history for that file
2010-11-24 03:15:53 < Dark_Shikari> and find where it was changed
2010-11-24 03:15:56 < Dark_Shikari> and use the old version as you reference.
2010-11-24 03:16:01 < Dark_Shikari> *your
2010-11-24 03:16:06 < Dark_Shikari> It will be more applicable.
2010-11-24 03:16:20 < Jumpyshoes> oh, okay
2010-11-24 03:16:23 < Dark_Shikari> The 17bit, convert to unsigned, etc are pure hacks to quickly emulate the extra bit.
2010-11-24 03:17:55  * pengvado sleeps
2010-11-24 03:20:13 < rfw> Dark_Shikari: so what else am i going to need
2010-11-24 03:24:07 < Dark_Shikari> rfw: link to the latest patch again?
2010-11-24 03:24:21 < rfw> um hold on
2010-11-24 03:24:26 < rfw> i changed a few weird things
2010-11-24 03:24:29 < Dark_Shikari> I will locally commit it, but it'll need to go through two tests before it's done:
2010-11-24 03:24:46 < Dark_Shikari> 1) I'll have to use it and be satisfied (along with a few other devs, like bugmaster and pengvado)
2010-11-24 03:24:51 < Dark_Shikari> 2) pengvado's code review
2010-11-24 03:24:55 < Dark_Shikari> of course, if pengvado isn't interested in python
2010-11-24 03:24:58 < Dark_Shikari> he might just say "ok"
2010-11-24 03:25:08 < Dark_Shikari> because it's not part of the encoder, standards are probably lower.
2010-11-24 03:25:16 < Dark_Shikari> i.e. functioning is more important than being pretty
2010-11-24 03:26:00 < Jumpyshoes> Dark_Shikari, do you have an idea when the tasks will come up?
2010-11-24 03:26:09 < Dark_Shikari> when j-b wakes up
2010-11-24 03:26:21 < Dark_Shikari> he's in france, so do the math
2010-11-24 03:26:33 < JEEBsv> ~3AM in France now I'd guess
2010-11-24 03:26:48 < Jumpyshoes> ah, okay
2010-11-24 03:27:02 < rfw> Dark_Shikari: does that mean pengvado is perl supremacist like Kovensky
2010-11-24 03:27:44 < Dark_Shikari> maybe.
2010-11-24 03:27:47 < Dark_Shikari> I don't really know
2010-11-24 03:27:54 < Dark_Shikari> there are many theories.
2010-11-24 03:28:01 < rfw> http://pastebin.com/nDgwpk6S
2010-11-24 03:28:02 < Dark_Shikari> some say that he's an alien
2010-11-24 03:28:20 < rfw> what exactly is a pengvado
2010-11-24 03:28:45 < JEEBsv> Nobody knows for sure. Only one thing is sure... they fork()
2010-11-24 03:28:59 < Dark_Shikari> My guess is a youkai of some sort.
2010-11-24 03:29:18 < Dark_Shikari> Doesn't like sunlight, has thought processes alien to a typical human... seems a perfect match.
2010-11-24 03:29:22 < rfw> does he have a funny hat
2010-11-24 03:29:59 < Dark_Shikari> well, everyone knows that youkai that hide out in human society don't wear their funny hats in public.
2010-11-24 03:30:18 < rfw> :(
2010-11-24 03:33:58 < Jumpyshoes> was HADAMARD4_1D phased out or something?
2010-11-24 03:34:20 < Dark_Shikari> No, that change was made because HADAMARD4_1D didn't keep the necessary precision
2010-11-24 03:34:34 < Jumpyshoes> i mean, i can't find it
2010-11-24 03:34:38 < Dark_Shikari> oh
2010-11-24 03:34:55 < Dark_Shikari> I think it was, yeah
2010-11-24 03:35:03 < Jumpyshoes> just cause?
2010-11-24 03:35:06 < Dark_Shikari> wow, holger has gone missing
2010-11-24 03:35:13 < Jumpyshoes> or because of precision issues
2010-11-24 03:35:23 < Jumpyshoes> cause i think i can bring it back for 32-bit... maybe
2010-11-24 03:35:24 < Dark_Shikari> I'm guessing it relates to holger's changes
2010-11-24 03:35:30 < Jumpyshoes> ah, okay
2010-11-24 03:35:32 < Dark_Shikari> no, you just need WALSH4_1D
2010-11-24 03:35:35 < Jumpyshoes> i'll bug him tomorrow or somethin
2010-11-24 03:35:36 < Jumpyshoes> g
2010-11-24 03:35:38 < Dark_Shikari>     SUMSUB_BADC w, m1, m0, m3, m2, m4
2010-11-24 03:35:38 < Dark_Shikari>     SWAP 0, 1
2010-11-24 03:35:38 < Dark_Shikari>     SWAP 2, 3
2010-11-24 03:35:38 < Dark_Shikari>     SUMSUB_17BIT 0,2,4,7
2010-11-24 03:35:38 < Dark_Shikari>     SUMSUB_17BIT 1,3,5,7
2010-11-24 03:35:41 < Dark_Shikari> these lines replace a
2010-11-24 03:35:44 < Dark_Shikari>     WALSH4_1D  0,1,2,3,4
2010-11-24 03:35:48 < Dark_Shikari> You don't need HADAMARD
2010-11-24 03:35:57 < Dark_Shikari> we renamed it to WALSH, I think, because it was a more accurate name
2010-11-24 03:36:07 < Dark_Shikari> as it's a walsh-ordered transform
2010-11-24 03:36:10 < Jumpyshoes> oh, so HADAMARD4_1D is WALSH4_1D?
2010-11-24 03:36:16 < Jumpyshoes> (taking a look at the older revision)
2010-11-24 03:37:15 < Dark_Shikari> I think so
2010-11-24 03:37:25 < Dark_Shikari> anyways just look at the current code and mentally replace the thing I said above
2010-11-24 03:37:28 < Dark_Shikari> with WALSH4_1D
2010-11-24 03:37:36 < Dark_Shikari> It's basically that.
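(Editor's sketch) A 4-point Walsh-Hadamard step is two layers of sum/difference butterflies. The sketch below shows only that structure; the output ordering and register handling of x264's actual WALSH4_1D and SUMSUB macros are not reproduced, and both function names are made up for illustration.

```python
def sumsub(a, b):
    """One butterfly: return (a + b, a - b)."""
    return a + b, a - b

def walsh4_1d(x0, x1, x2, x3):
    """Two butterfly layers over four values."""
    s01, d01 = sumsub(x0, x1)
    s23, d23 = sumsub(x2, x3)
    t0, t1 = sumsub(s01, s23)   # s01 + s23, s01 - s23
    t3, t2 = sumsub(d01, d23)   # d01 + d23, d01 - d23
    return [t0, t1, t2, t3]
```

Each layer can grow the values by one bit, which is why lane width matters once the inputs are already large.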
2010-11-24 03:37:42 < Jumpyshoes> ah
2010-11-24 03:40:36 < darkbringer> rfw, i'm not a developer, but would not it be better not to hardcode akiyo_qcif everywhere? I mean it is quite possible that some people would want to use different clips for testing.
2010-11-24 03:40:48 < Dark_Shikari> Yeah, that.
2010-11-24 03:40:52 < rfw> oh
2010-11-24 03:40:54 < Dark_Shikari> it should be a cli argument
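(Editor's sketch) Making the test clip a command-line argument could be as small as this. The option name `--clip` and the function name are hypothetical; the default simply keeps the clip name mentioned above so existing behaviour would not change.

```python
import argparse

def parse_args(argv=None):
    """Parse options for a regression test run; argv=None reads sys.argv."""
    parser = argparse.ArgumentParser(description="x264 regression test sketch")
    parser.add_argument(
        "--clip",
        default="akiyo_qcif",  # previously hardcoded clip name
        help="input clip to encode for the tests")
    return parser.parse_args(argv)
```

Calling `parse_args(["--clip", "some_other_clip"])` would then select a different input without touching the script.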
2010-11-24 03:40:58 < rfw> cirno.tiff
2010-11-24 03:41:00 < JEEBsv> Yes, indeed
2010-11-24 03:41:08  * JEEBsv pats (9)-rfw
2010-11-24 03:41:35 < Dark_Shikari> well next time I make a stupid mistake I can be dar(9)shikari
2010-11-24 03:43:14 < Jumpyshoes> oh yea, one last question
2010-11-24 03:43:28 < Jumpyshoes> when do revisions usually go up? i'm too lazy to commit and change everything
2010-11-24 03:43:32 < Jumpyshoes> so i want to pull the latest one
2010-11-24 03:43:50 < Dark_Shikari> we make a big push every 1-2 weeks
2010-11-24 03:43:57 < Dark_Shikari> it depends how many changes there are, and how urgent they are
2010-11-24 03:44:05 < Dark_Shikari> along with the push comes out a newsletter documenting the changes
2010-11-24 03:44:12 < Dark_Shikari> pushes usually contain 8-24 commits
2010-11-24 03:44:18 < Jumpyshoes> ah, okay
2010-11-24 03:44:21 < Dark_Shikari> there may also be bugfix pushes soon after if bugs come up
2010-11-24 03:44:24 < Dark_Shikari> these usually contain 1-5 commits
2010-11-24 03:44:24 < Jumpyshoes> so i guess i'll work on it now
2010-11-24 03:44:49 < Dark_Shikari> it's not really an issue though, asm changes rarely have any conflicts with anything else
2010-11-24 03:45:22 < reid_> Dark_Shikari: what is the relation between  VLC and x264?
2010-11-24 03:45:29 < Jumpyshoes> oh, i guess that's true
2010-11-24 03:45:30 < JEEBsv> reid_: same host
2010-11-24 03:45:42 < Dark_Shikari> reid_: Videolan is a big organization that hosts a lot of projects
2010-11-24 03:45:42 < JEEBsv> and under the same videolan umbrella in GSoC etc.
2010-11-24 03:45:46 < Dark_Shikari> those projects include VLC and x264
2010-11-24 03:46:08 < Dark_Shikari> So "Videolan" gets accepted to Google Code-In
2010-11-24 03:46:13 < Dark_Shikari> which means that any Videolan project can submit tasks
2010-11-24 03:46:27 < Dark_Shikari> such as VLC, Videolan itself (website, etc), VLMC, x264, etc
2010-11-24 03:46:45 < reid_> is x264 an application or more like a library?
2010-11-24 03:46:50 < JEEBsv> both
2010-11-24 03:46:51 < darkbringer> both
2010-11-24 03:46:59 < Dark_Shikari> both
2010-11-24 03:47:04 < JEEBsv> very strong command line app, as well as a highly versatile library
2010-11-24 03:47:14 < Dark_Shikari> x264cli and libx264 as they're called
2010-11-24 03:48:08 < Dark_Shikari> JEEBsv: yes, it's very strong, it can leap tall buildings in a single bound and make DC 40 fort saves
2010-11-24 03:48:22 < JEEBsv> haha
2010-11-24 03:49:37 < reid_> I ask because i'm looking at the filtering system.
2010-11-24 03:50:42 < Jumpyshoes> i need a better computer
2010-11-24 03:50:54 < Dark_Shikari> reid_: the filtering task has changed slightly on decree from our BDFL
2010-11-24 03:50:59 < Jumpyshoes> this laptop is hot and x264 compiles too slowly
2010-11-24 03:51:00 < Dark_Shikari> it's now porting new filters into libavfilter ;)
2010-11-24 03:51:07 < Dark_Shikari> Which we'll then use from x264
2010-11-24 03:51:10 < reid_> Jumpyshoes: celeron ftw!
2010-11-24 03:51:17 < Jumpyshoes> oh
2010-11-24 03:51:20 < Jumpyshoes> i have an i5
2010-11-24 03:51:22 < Jumpyshoes> won a laptop
2010-11-24 03:51:32 < Dark_Shikari> >x264 compiles too slowly
2010-11-24 03:51:33 < Dark_Shikari> make -j8
2010-11-24 03:51:35 < Dark_Shikari> problem solved
2010-11-24 03:51:41 < Jumpyshoes> what?
2010-11-24 03:51:45 < reid_> you can't get too much better than i5.
2010-11-24 03:51:52 < Jumpyshoes> oh yes you can
2010-11-24 03:51:58 < Dark_Shikari> I have a 1.6ghz i7
2010-11-24 03:51:59 < Dark_Shikari> it's fast enough
2010-11-24 03:52:03 < Jumpyshoes> i used a $3000 computer over the summer
2010-11-24 03:52:04 < Sean_McG> you can get a 16-way Xeon
2010-11-24 03:52:07  * Sean_McG lollerblades
2010-11-24 03:52:15 < Jumpyshoes> i miss the days when i could open vs2010 in half a second
2010-11-24 03:52:18 < Jumpyshoes> instead of two minutes
2010-11-24 03:52:27 < Dark_Shikari> yes, that's called "time stop"
2010-11-24 03:52:37 < Dark_Shikari> And unless you're a 9th level sorc/wizard, or Sakuya, you can't do that.
2010-11-24 03:52:41 < Dark_Shikari> er, 9th spell level
2010-11-24 03:52:43 < Jumpyshoes> it had an SSD
2010-11-24 03:52:46 < reid_> Sean_McG: in a laptop?
2010-11-24 03:52:49 < Dark_Shikari> Oh, that's the other way to do it.
2010-11-24 03:52:53 < Sean_McG> reid_: hrm, no.
2010-11-24 03:52:56 < Jumpyshoes> yea
2010-11-24 03:52:57 < Jumpyshoes> i love SSDs