Difference between revisions of "SoC x264 2010"
Revision as of 22:43, 28 January 2010
- 1 Introduction to x264 and Summer of Code
- 2 Guide to getting involved
- 3 Skills needed
- 4 Projects
- 5 Qualification tasks
- 6 Contact info
Introduction to x264 and Summer of Code
x264 is the most popular open source video compression software in the world, used worldwide for applications such as web video, television broadcast, and Blu-ray authoring. It outclasses practically all commercial implementations in both speed and compression. While not actually part of VLC or ffmpeg, it is a major library used by both, licensed under the GPL. Due to its popularity in the commercial world (for example, YouTube and Facebook rely on it), many companies have offered bounties in the past for features and improvements that they found useful.
But don't let all that scare you. There are still plenty of projects that a student can effectively get involved in. Do remember that the project ideas listed here are merely suggestions; other ideas are always possible if they fit the time and difficulty constraints of SoC.
x264 is part of the VideoLAN candidature for Google Summer of Code 2010.
- Lead mentor (and author of this page): Jason Garrett-Glaser (Dark Shikari)
- Possible mentors: David Conrad (Yuvi)
An overview of x264's structure and algorithms can be found here. It is somewhat outdated, but still mostly accurate. Do note that understanding this is not necessary for all projects.
Guide to getting involved
Skills needed
These are required for all listed projects and probably anything not listed, too.
- Basic C programming.
- Basic understanding of video encoding, or at least willingness to do the appropriate reading up on the topic before the summer begins.
- To work on anything related directly to the encoder core (not all projects), you'll need to do some significant background reading on relevant topics.
- A PDF with a chapter that can serve as a primer to video compression can be found here. It also has some more specific chapters on MPEG-4 Part 2 and Part 10.
Projects
We are probably only accepting 3-5 students this year. Thus, you will have to prove that you are absolutely able to do your project over the summer--see the qualification tasks.
Optimization (x86 and/or ARM)
x264 prides itself on being one of the most optimized programs in existence while still being reasonably readable and maintainable. This project is about furthering that goal: make it even faster without sacrificing code quality.
This is probably the hardest task, as x264 is already so absurdly optimized that going significantly further is going to be very difficult. As such, the qualification task for this project (see below) requires you to prove that you can get at least somewhere. The goal here is at least a 5-10% performance increase in x264 on x86 or ARM (and ideally, other CPUs too, but processor-specific optimizations are allowed here). Some unconventional ideas in addition to the obvious tasks of writing loads of assembly and finding cases where the C code can be further optimized:
- Cache profiling: try to minimize cache misses.
- Code size profiling: find the inherent "value" in clock cycles of a line of cache and use this to try to optimize the code size of existing assembly.
- Aliasing optimizations: find places where aliasing is hurting the compiler's ability to optimize.
Not recommended for anyone other than hardcore assembly gurus.
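To make the aliasing point above concrete, here is a minimal sketch in C. The function names are hypothetical and are not taken from x264. In the first version the compiler generally cannot rule out that a store to dst[i] changes *bias, so it may have to reload the bias on every iteration. The second version uses C99 restrict to promise that the pointers never overlap, which gives the compiler more freedom to keep the bias in a register and vectorize the loop. Whether this helps in any given spot has to be measured.

```c
#include <stdint.h>

/* dst and bias have the same type and might overlap, so the compiler
 * generally has to assume each store to dst[i] could modify *bias. */
static void add_bias(int16_t *dst, const int16_t *src, const int16_t *bias, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (int16_t)(src[i] + *bias);
}

/* restrict is a promise from the caller that the three pointers never
 * overlap; the compiler may then load *bias once and vectorize freely.
 * Calling this with overlapping buffers is undefined behaviour. */
static void add_bias_restrict(int16_t *restrict dst, const int16_t *restrict src,
                              const int16_t *restrict bias, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (int16_t)(src[i] + *bias);
}
```

Both functions compute the same result for non-overlapping buffers; the only difference is what the compiler is allowed to assume.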
4:4:4 and 4:2:2 colorspaces
x264 currently only supports the 4:2:0 colorspace, also known as YV12. However, many professional applications require higher precision in the form of 4:2:2 or 4:4:4 chroma subsampling. Furthermore, such sampling is useful for many artificial video sources, like video game captures and presentations, where sharp chroma edges are blurred by the 4:2:0 sampling. This is a large project that, unlike the previous two, probably does not involve modifying any assembly if you don't want to. In cases where things would be easier if the assembly were modified, the other developers will probably be willing to do it for you if you aren't an assembly guru.
The project mostly consists of finding every place in the code where YV12 is assumed and making it variable. It also involves handling the potential new syntax elements and adding a new Hadamard transform for the new chroma DC channels. It may also involve modifying x264_scan8, which could be a rather obnoxious task given how many parts of x264 assume certain properties of it.
This project will involve a lot of coding and a lot of debugging, but none of it should be particularly complex.
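As an illustration of what "making it variable" means, here is a minimal sketch of deriving chroma plane dimensions from the subsampling format instead of hardcoding the 4:2:0 halving. The enum and function names are hypothetical and are not x264's actual identifiers. Rounding odd luma dimensions up is an assumption of this sketch.

```c
/* Hypothetical format enum; not x264's real colorspace constants. */
typedef enum { CHROMA_420, CHROMA_422, CHROMA_444 } chroma_format_t;

/* 4:2:0 and 4:2:2 halve the chroma width; 4:4:4 keeps it. */
static int chroma_h_shift(chroma_format_t f) { return f == CHROMA_444 ? 0 : 1; }

/* Only 4:2:0 halves the chroma height. */
static int chroma_v_shift(chroma_format_t f) { return f == CHROMA_420 ? 1 : 0; }

/* Chroma plane size for a given luma size (odd sizes round up: an assumption). */
static int chroma_width(int luma_w, chroma_format_t f)
{
    int s = chroma_h_shift(f);
    return (luma_w + (1 << s) - 1) >> s;
}

static int chroma_height(int luma_h, chroma_format_t f)
{
    int s = chroma_v_shift(f);
    return (luma_h + (1 << s) - 1) >> s;
}

/* Number of 4x4 chroma blocks per plane in one 16x16 macroblock. */
static int chroma_blocks_per_mb(chroma_format_t f)
{
    return ((16 >> chroma_h_shift(f)) / 4) * ((16 >> chroma_v_shift(f)) / 4);
}
```

With these helpers a macroblock has 4 chroma 4x4 blocks per plane in 4:2:0, 8 in 4:2:2, and 16 in 4:4:4, which is why code that assumes a fixed chroma block count has to change.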
GPU motion estimation
While porting x264 entirely to CUDA or OpenCL is an insane task, there are three possible methods that could be used to offload some work to the GPU:
- High-complexity motion search designed to get useful predictors to be used by the main motion search.
- Massively parallelized lookahead motion search, designed to do a lot of the work normally done in the lookahead thread. May also improve B-frame decision and other parts of the lookahead.
- Motion search designed to completely replace x264's main motion search: would require a lot of threading trickery to sync it perfectly with the main encoder threads.
The general algorithm that has been agreed on after a great deal of discussion is the hierarchical search method. More description of this method is in the Qualification Tasks section.
This project is not recommended unless you have a very significant amount of experience with CUDA or OpenCL.
One of x264's current projects is to create a more powerful, general-purpose frontend that is user-friendly and Just Works. In short, you'll be able to run x264 input -o output and generate a perfectly good output file with high-quality video and audio, without messing with any settings. We've got the video down pat, but audio will be a whole separate matter. This project will involve the following steps:
- Add some audio handling framework to the main x264 CLI app.
- Add audio input support to the FFMS, LAVF, and Avisynth input modules.
- Add audio encoding support using libavcodec (or, if preferred, libvorbis directly). We plan to support Vorbis and AAC.
- Add audio muxing support to the MP4, FLV, and MKV output modules.
- Optional: Add an audio sync engine so that the user can change framerates and still have the audio in sync.
- Optional: Support audio resampling and downsampling (again, using libavcodec).
This is not nearly as hard as it looks, but will involve touching a whole lot of the main frontend code and learning a lot about how applications such as ffmpeg and VLC work. It's a great project for anyone who wants to get involved in x264 but doesn't think they have the skills to work on the encoder core.
Adaptive MBAFF support
x264 currently supports interlaced encoding, but only if every single macroblock pair of the image is coded as interlaced. Compression can be greatly improved if we allow mixing progressive and interlaced blocks in the same image. This, however, requires a huge number of internal changes:
- x264_macroblock_cache_load, the function that loads relevant neighbor data into the caches for the encoding process, will need to be about 3 times more complex.
- Some parts of MBAFF cannot be abstracted away by stuffing them in cache_load; the top left/right motion vectors for MV prediction are an example of this.
- CABAC entropy coding will need some significant modifications in order to hand more calculations off to cache_load.
- We'll have to find an efficient way to pick between progressive and interlaced coding for each block.
- Deblocking will require many nasty changes.
- Many, many other changes will need to be made!
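To give a feel for why the neighbor handling gets so much more complicated, here is a minimal sketch of how the rows of a macroblock map onto picture rows in MBAFF. In MBAFF the picture is divided into macroblock pairs covering 16x32 luma pixels. In a progressive (frame) pair the top macroblock is the upper 16 rows and the bottom macroblock is the lower 16 rows. In an interlaced (field) pair the top macroblock holds the 16 even rows and the bottom macroblock holds the 16 odd rows. The function name is hypothetical and is not part of x264.

```c
/* Picture row of luma row y (0..15) of one macroblock in an MBAFF pair.
 * pair_y:    index of the macroblock-pair row (each pair is 32 rows tall)
 * is_bottom: 0 for the top macroblock of the pair, 1 for the bottom one
 * is_field:  1 if the pair is coded as interlaced, 0 if progressive */
static int mbaff_picture_row(int pair_y, int is_bottom, int is_field, int y)
{
    int first_row = pair_y * 32;
    if (is_field)
        return first_row + is_bottom + 2 * y;   /* every other row of the pair */
    return first_row + 16 * is_bottom + y;      /* contiguous 16-row half */
}
```

Because a field pair and a frame pair slice the same 32 rows differently, the "top neighbor" of a macroblock depends on how both the current pair and the pair above it are coded, which is where most of the extra cases in cache_load come from.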
The skills required here are significant: a deep understanding of H.264, a significant understanding of x264 and libavcodec, and a lot of dedication. If this were all there was to it, this would unquestionably be the hardest project here.
But there is one thing you have going for you...
There's already a patch for it!
It's incredibly badly written, inefficient, outdated, and covered with bugs--but it exists! And furthermore, libavcodec's H.264 decoder already supports adaptive MBAFF. All of this contributes to a huge set of available resources for this project. Now, for the gotchas.
- There are some parts of the task that could be omitted. For example, the deblocking changes aren't necessary to produce a working output stream, and could be done later.
- There are some features that would be harder to implement with MBAFF that we already have working currently (e.g. Constrained Intra). These can probably simply be thrown away, i.e. not allow them in interlaced mode.
And now for the final bonus: there are a lot of companies who want this feature. They are willing to pay a lot of money for it. There is an outstanding $7500 bounty for this task, and at least one other company has promised "at least twice that" to whoever can complete adaptive MBAFF and get it committed to the x264 trunk. In short, there is huge money here if you are skilled enough.
Qualification tasks
This year, the qualification tasks will represent the start of your summer project. We're willing to give all the technical help you need, but of course we won't write the code for you. "Passing" a qualification task is at the mentor's discretion. Note these are designed to be difficult and help lead you into your main project. If you can't do the qualification task for the project, you surely cannot do the project either!
To reiterate, we will guide you through as much of the codebase as you need to do your work. This page is not supposed to give you all the information you need to do these tasks: you are expected to contact us for more information. Feel free to ask tons of questions on the #x264dev IRC channel on Freenode.
If you're interested in the optimization task, the qualification task is to speed up x264 on x86 (32 or 64-bit) by 1-2% on "normal settings" without changing the output. This is much harder than it sounds. For ARM, the threshold would probably be a bit higher, as x264 is not as heavily optimized for ARM.
4:4:4 and 4:2:2 Colorspaces
If you're interested in working on this project, your task is to produce an x264-encoded bitstream in 4:4:4 or 4:2:2 format. It does not actually have to be remotely viewable (that is, you don't have to implement any of the code to handle motion compensation, deblocking, or anything else involving 4:4:4/4:2:2 chroma data), but the bitstream has to be written correctly (correct syntax elements). The patch you write for this will be the starting point for your main project.
GPU Motion Estimation
Your task for this project will be to write a C version of your final algorithm. It doesn't need to deal with any of the corner cases; all it has to do is run before the main encoding loop, deciding the motion vectors for the frame. It doesn't even have to work with threading. It doesn't have to support sub-16x16 partitions either. The hierarchical search works via the following algorithm.
- Set N equal to 2^M, where M is an integer. A common M is 4.
- WHILE N is greater than 1:
  - Downscale the image (from the original) by a factor of N.
  - Do an ordinary diamond motion search on the image with block size 16x16. Assume the predicted motion vector to be equal to the median of the top, left, and top right motion vectors (as per H.264 MV prediction)... but use the motion vectors from the previous iteration, not the current for these (this is what allows you to parallelize things with CUDA).
  - For each block after searching, split the motion vectors of that block into 4 separate (but equal) motion vectors. These will be used as the starting point for the searches in the next iteration. Each iteration progressively refines the result at a progressively lesser downscale.
  - N = N/2
- Do a final refine at no downscale at all.
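The steps above can be sketched in plain C roughly as follows. This is only a minimal illustration under several assumptions, not the expected solution: all names are hypothetical, the search is full-pel only, the cost is plain SAD with no motion vector bit cost, the median predictor is used simply as an alternative starting point, unavailable neighbors are replaced by the block's own vector, and vectors are doubled when they are split because the next level has twice the resolution.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { int x, y; } mv_t;   /* full-pel motion vector */

static int clampi(int v, int lo, int hi) { return v < lo ? lo : v > hi ? hi : v; }
static int median3(int a, int b, int c) { return clampi(c, a < b ? a : b, a < b ? b : a); }

/* Box-filter downscale by n (n = 1 is a plain copy). Error handling omitted. */
static uint8_t *downscale(const uint8_t *src, int w, int h, int n)
{
    int dw = w / n, dh = h / n;
    uint8_t *dst = malloc((size_t)dw * dh);
    for (int y = 0; y < dh; y++)
        for (int x = 0; x < dw; x++) {
            int sum = 0;
            for (int j = 0; j < n; j++)
                for (int i = 0; i < n; i++)
                    sum += src[(y * n + j) * w + x * n + i];
            dst[y * dw + x] = (uint8_t)((sum + n * n / 2) / (n * n));
        }
    return dst;
}

/* 16x16 SAD of the block at (bx,by) in cur against ref displaced by mv.
 * Reference coordinates are clamped to the picture (edge extension). */
static int sad16(const uint8_t *cur, const uint8_t *ref, int w, int h,
                 int bx, int by, mv_t mv)
{
    int sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) {
            int rx = clampi(bx + x + mv.x, 0, w - 1);
            int ry = clampi(by + y + mv.y, 0, h - 1);
            sad += abs(cur[(by + y) * w + bx + x] - ref[ry * w + rx]);
        }
    return sad;
}

/* Small-diamond search: step to the best of 4 neighbours until none improves. */
static mv_t diamond(const uint8_t *cur, const uint8_t *ref, int w, int h,
                    int bx, int by, mv_t mv)
{
    static const int dx[4] = { 0, 0, -1, 1 }, dy[4] = { -1, 1, 0, 0 };
    int best = sad16(cur, ref, w, h, bx, by, mv);
    for (int moved = 1; moved;) {
        mv_t center = mv;
        moved = 0;
        for (int k = 0; k < 4; k++) {
            mv_t c = { center.x + dx[k], center.y + dy[k] };
            int s = sad16(cur, ref, w, h, bx, by, c);
            if (s < best) { best = s; mv = c; moved = 1; }
        }
    }
    return mv;
}

/* w and h must be multiples of 16 << m. Returns (w/16)*(h/16) MVs; caller frees. */
static mv_t *hierarchical_search(const uint8_t *cur, const uint8_t *ref,
                                 int w, int h, int m)
{
    int n = 1 << m;
    mv_t *prev = calloc((size_t)(w / n / 16) * (h / n / 16), sizeof(*prev));
    for (;; n /= 2) {
        int lw = w / n, lh = h / n, bw = lw / 16, bh = lh / 16;
        uint8_t *c = downscale(cur, w, h, n), *r = downscale(ref, w, h, n);
        mv_t *next = malloc((size_t)bw * bh * sizeof(*next));
        for (int by = 0; by < bh; by++)
            for (int bx = 0; bx < bw; bx++) {
                /* All neighbours come from prev (the previous iteration), so
                 * every block of this pass could be searched in parallel. */
                mv_t s = prev[by * bw + bx];
                mv_t l = bx > 0 ? prev[by * bw + bx - 1] : s;
                mv_t t = by > 0 ? prev[(by - 1) * bw + bx] : s;
                mv_t tr = by > 0 && bx + 1 < bw ? prev[(by - 1) * bw + bx + 1] : s;
                mv_t p = { median3(l.x, t.x, tr.x), median3(l.y, t.y, tr.y) };
                if (sad16(c, r, lw, lh, bx * 16, by * 16, p) <
                    sad16(c, r, lw, lh, bx * 16, by * 16, s))
                    s = p;
                next[by * bw + bx] = diamond(c, r, lw, lh, bx * 16, by * 16, s);
            }
        free(c); free(r); free(prev);
        if (n == 1)
            return next;               /* final refine at no downscale is done */
        /* Split each MV into 4 equal MVs for the next, 2x larger level. */
        prev = malloc((size_t)4 * bw * bh * sizeof(*prev));
        for (int y = 0; y < 2 * bh; y++)
            for (int x = 0; x < 2 * bw; x++) {
                mv_t v = next[(y / 2) * bw + x / 2];
                prev[y * 2 * bw + x] = (mv_t){ 2 * v.x, 2 * v.y };
            }
        free(next);
    }
}
```

The inner block loop reads only from prev and writes only to next, which is the property that would let a CUDA or OpenCL port run one thread group per block at each level.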
Your task for this project will be to do one step of the process. Pick the input module of your choice and add audio input to it. Pick the output module of your choice and add audio muxing to it. Add the simplest audio encoding method you like to the main encoder core. Finally, find some simple method to link the two together and thus have an x264 that, when using those specific modules, can encode audio. This will get you a good start on the main project without forcing you into the hairiest parts of the problem, since you can pick the easiest modules to work with and just do those.
Adaptive MBAFF support
This is a big task, so the qualification task will be a very small subset of it. Specifically, you must make a patch that allows bit-exact intra-only encoding with random interlaced vs progressive macroblock choices. You don't need to support CABAC or deblocking either: it just has to work. This eliminates almost all the hard parts: you don't have to mess with motion vectors, CABAC, deblocking, or any of the hard stuff.
What makes this qualification task a bit trickier is that your patch must be written from scratch (you can use the existing material for reference, but no copy-pasting) and it must be nearly committable. In other words, it must be good enough work to demonstrate that you are able to write high enough quality code to finish the full patch effectively.
Contact info
If you are interested, drop by #x264dev or #x264 on Freenode.
You should also contact the admin jb.