Difference between revisions of "SoC x264 2009"

From VideoLAN Wiki
Latest revision as of 08:46, 5 February 2019

x264 has loads of possibilities for SoC 2009 projects. Those listed here are merely suggestions; other ideas are always possible if they fit the time and difficulty constraints of SoC.

This is part of the VideoLAN candidature for Google Summer of Code 2009.

  • Mentor (and author of this page): Jason Garrett-Glaser (Dark Shikari)
  • Possible backup mentor: Holger Lubitz

Introduction to x264

x264 is probably the most efficient, compression-wise, open source video encoder there is. It is quite competitive with commercial encoders, outclassing a large number of them.

While not actually part of VLC or ffmpeg (it has its own codebase), it is a major library used by both, licensed under the GPL, in addition to being a standalone encoder. As the only major open-source H.264 encoder, x264 has a near-complete monopoly on H.264 encoding in the consumer world, along with being used by many major corporations, including YouTube and Facebook. Some companies, such as Avail Media, have in the past offered bounties on improvements to the encoder.

An overview of x264's algorithm can be found at http://akuvian.org/src/x264/overview_x264_v8_5.pdf.

Skills needed

These are required for all listed projects and probably anything not listed, too.

  • Basic C programming.
  • Basic understanding of video encoding, or at least willingness to read up on the topic before the summer begins.
  • Confidence in your ability to learn the basics of the following and similar topics (though not all projects will require them):
      • Discrete cosine transform and similar frequency transforms
      • Motion estimation and compensation
      • Quantization and entropy encoding

A PDF with a chapter that can serve as a primer to video compression can be found at http://www.mediafire.com/download.php?auxd23m2snw (new version at http://dl.dropbox.com/u/2701213/pdfs/0470516925Video.pdf). It also has more specific chapters on MPEG-4 Part 2 and Part 10.

Projects

We will probably accept only one or two students this year, so you will have to prove that you can complete your project over the summer; see the qualification tasks.

Optimization (x86)

x264 prides itself on being one of the most optimized programs in existence while still being reasonably readable and maintainable. This project is about furthering that goal: make it even faster without sacrificing code quality.

This is probably the hardest task, as x264 is already so absurdly optimized that going significantly further will be very difficult. For that reason, the qualification task for this project (see below) requires you to prove that you can achieve at least some speedup. The goal is at least a 5-10% performance increase in x264 on x86 (ideally on other CPUs too, though processor-specific optimizations are allowed). Beyond the obvious tasks of writing more assembly and finding places where the C code can be further optimized, some unconventional ideas are:

  • Cache profiling: try to minimize cache misses.
  • Code size profiling: estimate how many clock cycles a cache line of code is "worth", and use that figure to decide where reducing the code size of existing assembly pays off.
  • Aliasing optimizations: find places where aliasing is hurting the compiler's ability to optimize.

Not recommended for anyone other than hardcore assembly gurus.
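As a small illustration of the aliasing point, here is a C99 sketch. The function names are hypothetical and are not x264's actual pixel functions. Both functions compute the same rounded average of two pixel rows. In the first, the compiler must allow for `dst` overlapping `src1` or `src2`, which can prevent vectorization or force runtime overlap checks. In the second, `restrict` promises that the buffers never overlap. Whether this gives a measurable speedup depends on the compiler and the surrounding code, so profile before and after.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Without restrict: the compiler must allow for dst overlapping src1/src2,
 * which can block vectorization or force runtime overlap checks. */
static void pixel_avg_alias(uint8_t *dst, const uint8_t *src1,
                            const uint8_t *src2, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((src1[i] + src2[i] + 1) >> 1);
}

/* With restrict (C99): the caller promises the three buffers never overlap,
 * so the compiler may keep loads in registers and vectorize freely. */
static void pixel_avg_restrict(uint8_t *restrict dst,
                               const uint8_t *restrict src1,
                               const uint8_t *restrict src2, size_t n)
{
    /* Cheap debug check for the most obvious violation (identical buffers). */
    assert(n == 0 || (dst != src1 && dst != src2));
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((src1[i] + src2[i] + 1) >> 1);
}
```

The two versions must produce identical output. Only the code the compiler is allowed to generate differs.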

ARM Support

x264 currently focuses on x86 and, to a lesser extent, PowerPC. ARM is becoming more important every year, yet x264 currently cannot even run on some ARM CPUs, there is no configure support, and there is no ARM assembly at all. This task would be to set up ARM support in configure, fix all unaligned accesses in the C code (there's probably only one), and then write ARM assembly for all major DSP functions. ffmpeg already has some similar ARM functions for H.264 decoding that could be useful to you here. The goal is to make x264 at least 4-5 times faster on ARM by implementing all the major assembly functions.

Also not recommended for anyone other than hardcore assembly gurus.

4:4:4 and 4:2:2 Colorspaces

x264 currently only supports the 4:2:0 colorspace, also known as YV12. However, many professional applications require higher chroma precision in the form of 4:2:2 or 4:4:4 subsampling. Such sampling is also useful for many artificial video sources, like video game captures and presentations, where 4:2:0 sampling blurs sharp chroma edges. This is a large project that, unlike the previous two, probably does not require modifying any assembly if you don't want to. Where things would be easier if the assembly were modified, the other developers will probably be willing to do it for you if you aren't an assembly guru.

The project mostly consists of changing every place in the code where YV12 is assumed so that the chroma format becomes variable. It also involves handling the new syntax elements and adding a new Hadamard transform for the new chroma DC blocks. It may also involve modifying x264_scan8, which could be a rather obnoxious task given how many parts of x264 assume certain properties of it.

This project will involve a lot of coding and a lot of debugging, but none of it should be particularly complex.
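The YV12 assumption mostly appears as hard-coded halving of chroma dimensions. The sketch below makes the subsampling explicit. It uses a hypothetical `csp_t` type and helpers, with enum values borrowed from H.264's `chroma_format_idc`, and computes frame sizes and the number of 4x4 chroma blocks per macroblock for each format. That block count also determines the shape of the chroma DC array: 2x2 for 4:2:0 and 2x4 (two wide, four tall) for 4:2:2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chroma-format type; values mirror H.264's chroma_format_idc. */
typedef enum { CSP_I420 = 1, CSP_I422 = 2, CSP_I444 = 3 } csp_t;

/* Horizontal (sx) and vertical (sy) chroma subsampling shifts. */
static void chroma_shift(csp_t csp, int *sx, int *sy)
{
    switch (csp) {
    case CSP_I420: *sx = 1; *sy = 1; break;  /* half width, half height */
    case CSP_I422: *sx = 1; *sy = 0; break;  /* half width, full height */
    default:       *sx = 0; *sy = 0; break;  /* 4:4:4: full resolution  */
    }
}

/* Bytes for one 8-bit frame: a luma plane plus two chroma planes. */
static size_t frame_size(csp_t csp, int width, int height)
{
    int sx, sy;
    chroma_shift(csp, &sx, &sy);
    assert(width % 2 == 0 && height % 2 == 0);
    size_t luma = (size_t)width * (size_t)height;
    size_t chroma = (size_t)(width >> sx) * (size_t)(height >> sy);
    return luma + 2 * chroma;
}

/* 4x4 chroma blocks per macroblock, per chroma plane: 4 for 4:2:0 (8x8),
 * 8 for 4:2:2 (8x16), 16 for 4:4:4 (16x16).  Code that hard-codes 4 here
 * is exactly the kind of YV12 assumption the project has to remove. */
static int chroma_blocks_per_mb(csp_t csp)
{
    int sx, sy;
    chroma_shift(csp, &sx, &sy);
    return ((16 >> sx) / 4) * ((16 >> sy) / 4);
}
```

For a 1920x1080 frame this gives 3110400 bytes for 4:2:0, 4147200 for 4:2:2 and 6220800 for 4:4:4.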

GPU Motion Estimation

In practice, this probably means CUDA, as OpenCL is not supported by anything yet.

While porting x264 entirely to CUDA is an insane task, putting a lookahead motion estimation on the GPU could be useful both for quality and performance. Tricky catches here include:

  • The motion estimation method has to support x264's threading model. The easiest way to do this might be, as suggested, to make it into a lookahead function that is run on entire frames before the main encoding begins.
  • The algorithm has to make basic mode decisions well enough to be comparable with x264's basic SATD mode decision. This means you will have to implement the hpel and qpel interpolation algorithms, along with the SATD (sum of absolute Hadamard-transformed differences) comparison method.
  • Interlaced mode.
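For reference, here is a plain C sketch of the 4x4 SATD mentioned above. The residual between the current and reference blocks goes through a 4x4 Hadamard transform, and the absolute values of the coefficients are summed. This is the unnormalized textbook form. x264's own SATD functions apply their own scaling, so treat this as a test reference or for relative comparisons, not as a drop-in replacement. One straightforward way to get a 16x16 SATD is to sum the sixteen 4x4 results.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Unnormalized 4x4 SATD: Hadamard-transform the residual (cur - ref) and
 * sum the absolute values of all 16 coefficients. */
static int satd_4x4(const uint8_t *cur, int cur_stride,
                    const uint8_t *ref, int ref_stride)
{
    int d[4][4], t[4][4], sum = 0;
    assert(cur != NULL && ref != NULL);
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = cur[y * cur_stride + x] - ref[y * ref_stride + x];

    /* Horizontal 4-point Hadamard butterflies on each row. */
    for (int y = 0; y < 4; y++) {
        int a0 = d[y][0] + d[y][1], a1 = d[y][0] - d[y][1];
        int a2 = d[y][2] + d[y][3], a3 = d[y][2] - d[y][3];
        t[y][0] = a0 + a2; t[y][1] = a1 + a3;
        t[y][2] = a0 - a2; t[y][3] = a1 - a3;
    }
    /* Vertical butterflies on each column, accumulating |coefficient|. */
    for (int x = 0; x < 4; x++) {
        int a0 = t[0][x] + t[1][x], a1 = t[0][x] - t[1][x];
        int a2 = t[2][x] + t[3][x], a3 = t[2][x] - t[3][x];
        sum += abs(a0 + a2) + abs(a1 + a3) + abs(a0 - a2) + abs(a1 - a3);
    }
    return sum;
}
```

Compare this with SAD. A uniform residual of 1 gives SAD 16 and SATD 16. A single-pixel residual of 1 gives SAD 1 but SATD 16. SATD therefore approximates the cost of coding the residual after a transform, rather than just its raw magnitude.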

The general algorithm that has been agreed on after a great deal of discussion is the hierarchical search method. More description of this method is in the Qualification Tasks section.

This project is not recommended unless you have a very significant amount of experience with CUDA.

Weighted P-frame Prediction

Of the projects listed, this is the only one with the potential to significantly improve encoding quality. Weighted P-frame prediction lets you assign a weight to each frame in the current frame's reference list: a value by which all of that reference frame's pixels are multiplied. This is incredibly useful for fades, camera flashes, etc. However, it requires both an algorithm good enough to find optimal weighting factors and one efficient enough to be useful in practice.

A patch for this already exists (http://akuvian.org/src/x264/x264_wpredp.0.diff), but it is so old (from the early days of x264) that it is practically useless except as a guide to how to start implementing the feature now.

Note that the challenge here is twofold: add support for weighted P-frame prediction and make an algorithm good enough and fast enough to make the feature useful.
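The per-sample operation itself is simple. The sketch below applies H.264's explicit weighted prediction formula for 8-bit samples, pred = ((p * w + 2^(logWD-1)) >> logWD) + o, clipped to 0..255. The standard also includes the additive offset o, which the description above does not mention. The weight estimator is a deliberately naive heuristic for global fades, based on the ratio of mean brightness. It is only an illustration, not x264's algorithm, and it is not a solution to the "good enough algorithm" half of the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Explicit weighted prediction of one 8-bit sample (H.264 formula):
 * ((p * w + 2^(logWD-1)) >> logWD) + o, clipped to [0, 255].
 * Negative weights rely on arithmetic right shift, as in the standard. */
static uint8_t weight_sample(uint8_t p, int log2_denom, int weight, int offset)
{
    assert(log2_denom >= 0 && log2_denom <= 7);
    int v = log2_denom >= 1
          ? ((p * weight + (1 << (log2_denom - 1))) >> log2_denom) + offset
          : p * weight + offset;
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Illustrative only: one weight for a global fade, estimated as the ratio of
 * mean brightness in units of 1/2^log2_denom, clamped to at most 127
 * (H.264 allows -128..127; this heuristic never produces negative weights). */
static int estimate_fade_weight(const uint8_t *cur, const uint8_t *ref,
                                size_t n, int log2_denom)
{
    uint64_t sum_cur = 0, sum_ref = 0;
    for (size_t i = 0; i < n; i++) {
        sum_cur += cur[i];
        sum_ref += ref[i];
    }
    uint64_t w = sum_ref ? ((sum_cur << log2_denom) + sum_ref / 2) / sum_ref
                         : (uint64_t)1 << log2_denom;  /* black reference: 1.0 */
    return w > 127 ? 127 : (int)w;
}
```

With log2_denom = 5, a weight of 32 means 1.0. A frame that has faded to half the brightness of its reference yields a weight of 16, and applying that weight to the reference reproduces the faded pixels.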

Qualification tasks

This year, the qualification tasks will represent the start of your summer project. We're willing to give all the technical help you need, but of course we won't write the code for you. "Passing" a qualification task is at the mentor's discretion. Note these are designed to be difficult and help lead you into your main project. If you can't do the qualification task for the project, you surely cannot do the project either!

Again, we will guide you through as much of the codebase as you need to do your work. This page is not supposed to give you all the information you need to do these tasks: you are expected to contact us for more information. Feel free to ask tons of questions on the #x264dev IRC channel on Freenode.

Optimization (x86)

If you're interested in the optimization task, the qualification task is to speed up x264 on x86 (32 or 64-bit) by 1-2% on "normal settings" without changing the output. This is much harder than it sounds.
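"Without changing the output" means every optimized function must be bit-exact with the C code it replaces. x264 has its own tool for this kind of check (checkasm). The sketch below is not taken from it and only illustrates the idea. It shows a reference 16x16 SAD, a hypothetical "optimized" variant, and a fixed-seed random comparison between them. Speed still has to be measured separately, for example by timing whole encodes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reference 16x16 SAD: the "golden" C version any optimization must match. */
static int sad_16x16_ref(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    int sum = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sum += abs(a[y * sa + x] - b[y * sb + x]);
    return sum;
}

/* Hypothetical candidate: four independent accumulators shorten the
 * dependency chain.  It must return exactly the same value for every input. */
static int sad_16x16_fast(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int y = 0; y < 16; y++, a += sa, b += sb)
        for (int x = 0; x < 16; x += 4) {
            s0 += abs(a[x]     - b[x]);
            s1 += abs(a[x + 1] - b[x + 1]);
            s2 += abs(a[x + 2] - b[x + 2]);
            s3 += abs(a[x + 3] - b[x + 3]);
        }
    return s0 + s1 + s2 + s3;
}

/* Compare both versions on pseudo-random blocks (fixed-seed LCG so any
 * failure is reproducible).  Returns the number of mismatches. */
static int check_sad(int trials)
{
    static uint8_t a[16 * 32], b[16 * 32];   /* 16 rows, stride 32 */
    uint32_t seed = 12345;
    int mismatches = 0;
    assert(trials > 0);
    for (int t = 0; t < trials; t++) {
        for (int i = 0; i < 16 * 32; i++) {
            seed = seed * 1664525u + 1013904223u;
            a[i] = (uint8_t)(seed >> 24);
            seed = seed * 1664525u + 1013904223u;
            b[i] = (uint8_t)(seed >> 24);
        }
        if (sad_16x16_ref(a, 32, b, 32) != sad_16x16_fast(a, 32, b, 32))
            mismatches++;
    }
    return mismatches;
}
```

Random inputs should be supplemented with edge cases, such as all-zero versus all-255 blocks, which is where overflow or saturation bugs in SIMD code tend to appear.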

ARM Support

If you're interested in the ARM task, your qualification task will be to:

  • Fix the unaligned access bug in the bitstream writer.
  • Write NEON SIMD assembly for at least a few of the simpler significant DSP functions (SAD, SATD, etc).
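A common source of unaligned-access bugs is writing a 32-bit word through a cast pointer at an arbitrary byte address. That faults on CPUs that do not support unaligned word stores, including some ARM cores, and it is undefined behavior in C. The sketch below shows two portable alternatives. The helper names are hypothetical, and x264's actual bitstream writer may be structured differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Problematic pattern (p may not be 4-byte aligned):
 *     *(uint32_t *)p = word;
 * Portable alternative 1: store byte by byte, most significant byte first,
 * which is also the byte order an H.264 bitstream uses. */
static void write_be32(uint8_t *p, uint32_t word)
{
    assert(p != NULL);
    p[0] = (uint8_t)(word >> 24);
    p[1] = (uint8_t)(word >> 16);
    p[2] = (uint8_t)(word >> 8);
    p[3] = (uint8_t)word;
}

/* Portable alternative 2: memcpy lets the compiler pick an unaligned-safe
 * store (or a single store where the target allows it).  It writes in host
 * byte order, so a big-endian bitstream still needs a byte swap on
 * little-endian CPUs. */
static void write_u32_host(uint8_t *p, uint32_t word)
{
    memcpy(p, &word, sizeof word);
}
```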

4:4:4 and 4:2:2 Colorspaces

If you're interested in working on this project, your task is to produce an x264-encoded bitstream in 4:4:4 or 4:2:2 format. It does not actually have to be remotely viewable (that is, you don't have to implement any of the code to handle motion compensation, deblocking, or anything else involving 4:4:4/4:2:2 chroma data), but the bitstream has to be written correctly (correct syntax elements). The patch you write for this will be the starting point for your main project.
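On the bitstream side, the first fields that change are in the sequence parameter set. `chroma_format_idc` is coded as an unsigned Exp-Golomb value (ue(v)). When it equals 3, it is followed by the one-bit `separate_colour_plane_flag`. These fields are only present for the High family of profiles, for example High 4:2:2 (profile_idc 122) and High 4:4:4 Predictive (profile_idc 244). The sketch below uses a minimal, hypothetical bit writer rather than x264's own. It omits the bit-depth fields and the rest of the SPS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit writer; buf must be zero-initialized. */
typedef struct { uint8_t *buf; size_t bitpos; } bitwriter;

static void put_bits(bitwriter *bw, uint32_t value, int n)
{
    for (int i = n - 1; i >= 0; i--, bw->bitpos++)
        if ((value >> i) & 1)
            bw->buf[bw->bitpos >> 3] |= (uint8_t)(0x80 >> (bw->bitpos & 7));
}

/* Exp-Golomb ue(v): for codeNum k, write M zero bits followed by the
 * (M+1)-bit value k+1, where M = floor(log2(k+1)). */
static void put_ue(bitwriter *bw, uint32_t k)
{
    uint32_t v = k + 1;
    int m = 0;
    while ((v >> (m + 1)) != 0)
        m++;
    put_bits(bw, 0, m);
    put_bits(bw, v, m + 1);
}

/* chroma_format_idc: 0 = monochrome, 1 = 4:2:0, 2 = 4:2:2, 3 = 4:4:4. */
static void write_chroma_format(bitwriter *bw, int chroma_format_idc)
{
    assert(chroma_format_idc >= 0 && chroma_format_idc <= 3);
    put_ue(bw, (uint32_t)chroma_format_idc);
    if (chroma_format_idc == 3)
        put_bits(bw, 0, 1);   /* separate_colour_plane_flag = 0 */
}
```

For example, 4:2:2 is written as the three bits 011, and 4:4:4 as 00100 followed by a 0 flag bit.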

GPU Motion Estimation

Your task for this project will be to write a C version of your final algorithm. It doesn't need to deal with any of the corner cases; all it has to do is run before the main encoding loop and decide the motion vectors for the frame. It doesn't even have to work with threading, and it doesn't have to support sub-16x16 partitions either. The hierarchical search works as follows:

  • Set N equal to 2^M, where M is an integer. A common choice is M = 4.
  • WHILE N is greater than 1:
      • Downscale the image (from the original) by a factor of N.
      • Do an ordinary diamond motion search on the image with block size 16x16. Take the predicted motion vector to be the median of the top, left, and top-right motion vectors (as in H.264 MV prediction), but use the motion vectors from the previous iteration, not the current one (this is what allows you to parallelize the search with CUDA).
      • After searching, split each block's motion vector into 4 separate (but equal) motion vectors. These are the starting points for the searches in the next iteration, so each iteration refines the result at a smaller downscale factor.
      • N = N/2
  • Do a final refine at no downscale at all.
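The steps above can be sketched in C as follows. This is a minimal single-threaded sketch. Several of its details are assumptions, not part of the description:

  • Vectors are integer-pel only.
  • The cost is SAD plus a simple `LAMBDA * |mv - pred|` term.
  • Downscaling uses a box filter.
  • Unavailable neighbors count as (0,0).
  • Each of the four child vectors is doubled when the next, finer level is searched. We read "equal" as equal after rescaling to the finer resolution.

All names and constants are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { int x, y; } mv_t;
enum { LAMBDA = 4, ME_RANGE = 16 };  /* illustrative values, not x264's */

static int clampi(int v, int lo, int hi) { return v < lo ? lo : v > hi ? hi : v; }
static int median3(int a, int b, int c) { return clampi(c, a < b ? a : b, a > b ? a : b); }

/* Box-filter downscale of a w x h plane by factor n (n == 1 just copies). */
static void downscale(const uint8_t *src, int w, int h, int n, uint8_t *dst)
{
    for (int y = 0; y < h / n; y++)
        for (int x = 0; x < w / n; x++) {
            int sum = 0;
            for (int j = 0; j < n; j++)
                for (int i = 0; i < n; i++)
                    sum += src[(y * n + j) * w + x * n + i];
            dst[y * (w / n) + x] = (uint8_t)((sum + n * n / 2) / (n * n));
        }
}

/* SAD of 16x16 block (bx,by) displaced by mv, plus an MV cost against pred.
 * Returns INT_MAX if the displaced block would leave the plane. */
static int cost(const uint8_t *cur, const uint8_t *ref, int w, int h,
                int bx, int by, mv_t mv, mv_t pred)
{
    int x0 = bx * 16 + mv.x, y0 = by * 16 + mv.y, sad = 0;
    if (x0 < 0 || y0 < 0 || x0 + 16 > w || y0 + 16 > h)
        return INT_MAX;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += abs(cur[(by * 16 + y) * w + bx * 16 + x] - ref[(y0 + y) * w + x0 + x]);
    return sad + LAMBDA * (abs(mv.x - pred.x) + abs(mv.y - pred.y));
}

/* One level: radius-1 diamond search per block.  Start point and predictor
 * (median of left, top, top-right) both come from the previous iteration's
 * field, so blocks are independent; missing neighbors count as (0,0). */
static void search_level(const uint8_t *cur, const uint8_t *ref, int w, int h,
                         const mv_t *prev, mv_t *out)
{
    static const int dx[4] = { 0, 0, -1, 1 }, dy[4] = { -1, 1, 0, 0 };
    const mv_t z = { 0, 0 };
    int bw = w / 16, bh = h / 16;
    for (int by = 0; by < bh; by++)
        for (int bx = 0; bx < bw; bx++) {
            mv_t l = bx > 0 ? prev[by * bw + bx - 1] : z;
            mv_t t = by > 0 ? prev[(by - 1) * bw + bx] : z;
            mv_t tr = (by > 0 && bx + 1 < bw) ? prev[(by - 1) * bw + bx + 1] : z;
            mv_t pred = { median3(l.x, t.x, tr.x), median3(l.y, t.y, tr.y) };
            mv_t best = prev[by * bw + bx];
            best.x = clampi(best.x, -bx * 16, w - 16 - bx * 16);  /* start in-frame */
            best.y = clampi(best.y, -by * 16, h - 16 - by * 16);
            int best_cost = cost(cur, ref, w, h, bx, by, best, pred);
            for (int iter = 0; iter < ME_RANGE; iter++) {
                mv_t center = best;
                for (int k = 0; k < 4; k++) {
                    mv_t c = { center.x + dx[k], center.y + dy[k] };
                    int cc = cost(cur, ref, w, h, bx, by, c, pred);
                    if (cc < best_cost) { best_cost = cc; best = c; }
                }
                if (best.x == center.x && best.y == center.y)
                    break;  /* no neighbor was better: converged */
            }
            out[by * bw + bx] = best;
        }
}

/* Hierarchical search from N = 2^m down to full resolution.  Returns a
 * malloc'd (w/16) x (h/16) MV field; w and h must be multiples of 16 << m.
 * Allocation-failure handling is omitted for brevity. */
static mv_t *hierarchical_me(const uint8_t *cur, const uint8_t *ref, int w, int h, int m)
{
    assert(m >= 0 && w % (16 << m) == 0 && h % (16 << m) == 0);
    int n = 1 << m;
    mv_t *prev = calloc((size_t)(w / n / 16) * (h / n / 16), sizeof *prev);
    uint8_t *c = malloc((size_t)w * h), *r = malloc((size_t)w * h);
    for (;; n /= 2) {
        int lw = w / n, lh = h / n, bw = lw / 16, bh = lh / 16;
        mv_t *out = malloc((size_t)bw * bh * sizeof *out);
        downscale(cur, w, h, n, c);
        downscale(ref, w, h, n, r);
        search_level(c, r, lw, lh, prev, out);
        free(prev);
        if (n == 1) {  /* the full-resolution pass was the final refine */
            free(c);
            free(r);
            return out;
        }
        /* Split each MV into 4 equal children, doubled for the 2x finer level. */
        prev = malloc((size_t)bw * bh * 4 * sizeof *prev);
        for (int y = 0; y < 2 * bh; y++)
            for (int x = 0; x < 2 * bw; x++) {
                mv_t v = out[(y / 2) * bw + x / 2];
                prev[y * 2 * bw + x] = (mv_t){ 2 * v.x, 2 * v.y };
            }
        free(out);
    }
}
```

Each block's predictor and starting point come only from the previous level's field. As a result, the iterations of the `bx`/`by` loops in `search_level` do not depend on each other. That is the property a CUDA version would exploit, for example by giving each block its own thread.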

Weighted P-frame Prediction

The task for this project is to add support for Weighted P-frame Prediction--but not to write any sort of handling of it from an algorithmic standpoint, so no algorithm to decide the weights. That is, all the weights are just set to some constant value until we come up with a better way to do things. As with the other tasks, you can make many simplifications here to avoid corner cases, such as ignoring multithreading. Another simplification you can make is to only allow a weight on the first reference frame, or even have the program be completely ignorant of the weighted version of the frame until the final encode. All that matters is you get the basic framework working.

Contact info

If you are interested, drop by #x264dev or #x264 on Freenode.

You should also contact the admin jb.


VideoLAN Google Summer of Code (GSoC/SoC) mentoring projects
2007 · 2008 · 2009 · 2010 · 2011 (GCi 2011 · SOCIS x264 2011) · 2012 · 2013 · 2016 · 2017 · 2018 · 2019 · 2020 · 2021 · 2022 · 2023 · 2024