Difference between revisions of "SoC x264 2010"
|Line 101:||Line 101:|
===GPU Motion Estimation===
===GPU Motion Estimation===
Your task for this project will be to write a C version of your final algorithm. It doesn't need to deal with any of the corner cases; all it has to do is run before the main encoding loop, deciding the motion vectors for the frame. It doesn't even have to work with threading. It doesn't have to support sub-16x16 partitions either.
Your task for this project will be to write a C version of your final algorithm. It doesn't need to deal with any of the corner cases; all it has to do is run before the main encoding loop, deciding the motion vectors for the frame. It doesn't even have to work with threading. It doesn't have to support sub-16x16 partitions either. hierarchical search works via the following algorithm
* Set N equal to 2^M, where M is an integer. A common M is 4.
* Set N equal to 2^M, where M is an integer. A common M is 4.
Revision as of 05:04, 17 March 2010
- 1 Introduction to x264 and Summer of Code
- 2 Guide to getting involved
- 3 Skills needed
- 4 Projects
- 5 Qualification tasks
- 6 Contact info
Introduction to x264 and Summer of Code
x264 is the most popular open source video compression software in the world, used worldwide for applications such as web video, television broadcast, and Blu-ray authoring. It outclasses practically all commercial implementations both speed and compression-wise. While not actually part of VLC or ffmpeg, it is a major library used by both, licensed under the GPL. Due to its popularity in the commercial world (for example, Youtube and Facebook rely on it), many companies have offered bounties in the past for features and improvements that they found useful.
But don't let that all scare you. There's still plenty of projects that a student can effectively get involved in.
x264 is part of the VideoLAN candidature for Google Summer of Code 2010.
- Lead mentor (and author of this page): Jason Garrett-Glaser (Dark Shikari)
- Possible others mentors: David Conrad (Yuvi)
An overview of x264's structure and algorithms can be found here. It is somewhat outdated, but still mostly accurate. Do note that understanding this is not necessary for all projects.
Guide to getting involved
- Hop on IRC--whenever! We have a friendly community that's happy to help and to talk about pretty much whatever. #x264 is the general discussion/user help channel and #x264dev is the development-only channel (Freenode server). If you've never used IRC, grab Chatzilla. There is no such thing as a stupid question--only stupid people--so don't worry about sounding dumb. Just get involved!
- Ask questions--I cannot stress this enough. We've had students who couldn't get anywhere because they were stuck--but didn't ask questions! There's no shame in asking questions; that's how everyone here got involved.
- Stay on IRC. Even if you have nothing to say, being around lets you get to know the community, get a feel for what's going on, and even sometimes help out by pointing the semicolon I left at the end of my if statement. This applies during the project period too: we expect all students to always be on IRC. This doesn't mean you have to be actually active all the time, but whenever you're free, you should be at least pingable on IRC, and whenever you're working on Summer of Code, you should be active on IRC.
- The best option for staying online is to get a remote shell account of some sort and leave an IRSSI client on 24/7 in a screen. This lets you easily come back in the morning and see if you missed anything important. checkers can provide you with a shell if you end up being accepted as a student.
- Even if you don't get a slot as a student, or you don't qualify for Summer of Code, we will happily provide you with the exact same support that we would give a student: it is quite reasonable and not at all uncommon for us to have projects of similar scale to Summer of Code be done by students who are not part of the Summer of Code program.
These are required for all listed projects and probably anything not listed, too.
- Basic C programming.
- Basic understanding of video encoding, or at least willingness to do the appropriate reading up on the topic before the summer begins. Most people who get involved don't know much to start with; we don't expect you to!
- To work on anything related directly to the encoder core (not all projects), you'll need to do some significant background reading on relevant topics.
- A PDF with a chapter that can serve as a primer to video compression can be found here. It also has some more specific chapters on MPEG-4 Part 10 (H.264).
We are probably only accepting 3-4 students this year. Thus, you will have to prove that you are absolutely able to do your project over the summer--see the qualification tasks. Do remember that the project ideas listed here are merely suggestions; other ideas are always possible if they fit the time and difficulty constraints of SoC. For example, if you have an idea to improve compression (e.g. implementing a new algorithm) that you think we can benefit from, feel free to bring it up.
Optimization (x86 and/or ARM)
x264 prides itself on being one of the most optimized programs in existence while still being reasonably readable and maintainable. This project is about furthering that goal: make it even faster without sacrificing code quality.
This is probably the hardest task, as x264 is already so absurdly optimized that going significantly further is going to be very difficult. As such, the qualification task for this project (see below) requires you to prove that you can get at least somewhere. The goal here is at least a 5-10% performance increase in x264 on x86 or ARM (and ideally, other CPUs too, but processor-specific optimizations are allowed here). Some unconventional ideas in addition to the obvious tasks of writing loads of assembly and finding cases where the C code can be further optimized:
- Cache profiling: try to minimize cache misses.
- Code size profiling: find the inherent "value" in clock cycles of a line of cache and use this to try to optimize the code size of existing assembly.
- Aliasing optimizations: find places where aliasing is hurting the compiler's ability to optimize.
Not recommended for anyone other than hardcore assembly gurus.
4:4:4 and 4:2:2 colorspaces
x264 currently only supports the 4:2:0 colorspace, also known as YV12. However, many profession applications require higher precision in the form of 4:2:2 or 4:4:4 chroma subsampling. Furthermore, such sampling is useful for many artificial video sources, like video game captures and presentations, where sharp chroma edges are blurred by the 4:2:0 sampling. This is a large project that, unlike the previous, probably does not involve modifying any assembly if you don't want to. In cases where things would be easier if the assembly was modified, the other developers will be willing to do it for you if you aren't an assembly guru.
The project mostly covers changing every place in the code where YV12 was assumed--and making it variable. It also involves handling the potential new syntax elements and adding a new Hadamard transform for the new chroma DC channels. It also may involve modifying x264_scan8, which could be a rather obnoxious task given how many parts of x264 assume certain properties of it.
This project will involve a lot of coding and a lot of debugging, but none of it should be particularly complex.
GPU motion estimation
While porting x264 entirely to CUDA or OpenCL is an insane task, there are three possible methods that could be used to offload some work to the GPU:
- High-complexity motion search designed to get useful predictors to be used by the main motion search.
- Massively parallelized lookahead motion search, designed to do a lot of the work normally done in the lookahead thread. May also improve B-frame decision and other parts of the lookahead.
- Motion search designed to completely replace x264's main motion search: would require a lot of threading trickery to sync it perfectly with the main encoder threads.
The general algorithm that has been agreed on after a great deal of discussion is the hierarchical search method. If you have a better idea, feel free to propose it, of course. More description of this method is in the Qualification Tasks section.
This project is not recommended unless you have a very significant amount of experience with CUDA or OpenCL.
10-bit encoding support
Higher bit depth allows not only professional applications, but also allows us to significantly improve quality: tests have shown up to 10-15% improvements just from the extra precision! However, this would involve making a lot of internal changes. But there's one thing that makes it relatively easy: we're willing to template the encoder to do this. In other words, you can feel free to define a PIXEL type (uint8_t if compiled without 10-bit, uint16_t if compiled with) and use that to replace all of the current uint8_t image code. This will allow moving to higher-bit encoding relatively easy without too much effort on your part.
There are a few other things that make this task easier than it may seem:
- While this task in theory requires a lot of new assembly code, it is not at all required for the project: it can be written later. For now, we can stick to no assembly only. However, having experience with or being willing to learn assembly is a big plus for this project, especially if you think you can finish early.
- x264 rarely access pixels directly outside of DSP functions, and the DSP functions can be easily modified. This greatly reduces the amount of code to modify.
If you finish early, there will be many optimizations we can make:
- Perform motion estimation on an 8-bit version of the frame, but encoding on a full-res version of the frame (motion estimation is vastly faster, much more than 2x faster, on 8-bit).
- Start writing assembly code.
One of x264's current projects is to create a more powerful, general-purpose frontend that is user-friendly and Just Works. In short, you'll be able to run 'x264 input -o output' and generate a perfectly good output file with high-quality video and audio, without messing with any settings. We've got the video down pat, but audio will be whole separate matter. This project will involve the following steps:
- Add some audio handling framework to the main x264 CLI app.
- Add audio input support to the FFMS, LAVF, and Avisynth input modules.
- Add audio encoding support using libavcodec (or, if preferred, libvorbis directly). We plan to support Vorbis and AAC.
- Add audio muxing support to the MP4, FLV, and MKV output modules.
- Optional: Add an audio sync engine so that the user can change framerates and still have the audio in sync.
- Optional: Support audio resampling and downsampling (again, using libavcodec).
This is not nearly as hard as it looks, but will involve touching a whole lot of the main frontend code and learning a lot about how applications such as ffmpeg and VLC work. It's a great project for anyone who wants to get involved in x264 but doesn't think they have the skills to work on the encoder core.
Non-local RD optimization
x264's biggest weakness is that it only considers the optimal decisions for the current macroblock; it isn't aware of the effects of its decision on the future. There may be significant benefits to be gained via non-local methods, such as iterative optimization. However, especially in H.264, non-local RD is very difficult to do efficiently. This is a project primarily targeted at someone already familiar with video compression: in particular, it *requires* that you have at least some idea with regards to how you would do it! Your idea must not only improve compression, but also do so in some sane amount of time (a 2x speed cost might be tolerable, 400x most certainly not).
This is probably not very difficult from a coding standpoint and is really more of an algorithmic problem. Since we haven't done it already, that of course means it's a hard algorithmic problem.
Unlike many other projects, such as ffmpeg, x264's policy for qualification tasks is to use tasks that are directly useful to you for your summer project. That is, the projects directly lead you into the start of your project and create a base for you to work off for the rest of the summer. This is, in our opinion, much better than making you work on something completely unrelated. We're willing to give all the technical help you need, but of course we won't write the code for you. "Passing" a qualification task is at the mentor's discretion. Note these are designed to be relatively difficult and help lead you into your main project. If you can't do the qualification task for the project, you surely cannot do the project either!
Again, to reiterate, we will guide you through as much of the codebase as you need to do your work. This page is not supposed to give you all the information you need to do these tasks: you are expected to contact us for more information. Feel free to ask tons of questions. On #x264dev IRC channel on Freenode, of course.
None of these tasks are supposed to take more than a few days to a week of work. If you successfully complete one, we will almost surely accept you as a student.
If you're interested in the optimization task, the qualification task is to speed up x264 on x86 (32 or 64-bit) by 1-2% on "normal settings" without changing the output. This is much harder than it sounds. For ARM, the threshold would probably be a bit higher, as x264 is not as heavily optimized for ARM.
4:4:4 and 4:2:2 Colorspaces
If you're interested in working on this project, your task is to produce an x264-encoded bitstream in 4:4:4 or 4:2:2 format. It does not actually have to be at all watchable (that is, you don't have to implement any of the code to handle motion compensation, deblocking, or anything else involving 4:4:4/4:2:2 chroma data), but the bitstream has to be written correctly (correct syntax elements). The patch you write for this will be the starting point for your main project.
GPU Motion Estimation
Your task for this project will be to write a C version of your final algorithm. It doesn't need to deal with any of the corner cases; all it has to do is run before the main encoding loop, deciding the motion vectors for the frame. It doesn't even have to work with threading. It doesn't have to support sub-16x16 partitions either. Assuming you didn't propose another, the hierarchical search works via the following algorithm:
- Set N equal to 2^M, where M is an integer. A common M is 4.
- WHILE N is greater than 1:
- Downscale the image (from the original) by a factor of N.
- Do an ordinary diamond motion search on the image with block size 16x16. Assume the predicted motion vector to be equal to the median of the top, left, and top right motion vectors (as per H.264 MV prediction)... but use the motion vectors from the previous iteration, not the current for these (this is what allows you to parallelize things with CUDA).
- For each block after searching, split the motion vectors of that block into 4 separate (but equal) motion vectors. These will be used as the starting point for the searches in the next iteration. Each iteration progressively refines the result at a progressively lesser downscale.
- N = N/2
- Do a final refine at no downscale at all.
10-bit encoding support
Modify x264 to write a valid 10-bit bitstream. This is much easier than it sounds: you can "cheat" by simply writing out all the pixels as normal and only modifying the necessary header elements and syntax elements to make it work. Obviously this will look wrong, but all that matters is that it be a valid stream. The conditions of this are similar to that of the 4:2:2/4:4:4 project above. The stream must be decodable by JM, the H.264 reference decoder.
Your task for this project will be to do one step of the process. Pick the input module of your choice and add audio input to it. Pick the output module of your choice and add audio muxing to it. Add the simplest audio encoding method you like to the main encoder core. Finally, find some simple method to link the two together and thus have an x264 that, when using those specific modules, can encode audio. This will get you a good start on the main project without forcing you into the hairiest parts of the problem, since you can pick the easiest modules to work with and just do those. If you want, you can "cheat" by just passing through audio instead of encoding it.
Non-local RD optimization
Implement an extremely minimal prototype of some part of your idea. It can be ugly, hacky, and limited; all that matters is you demonstrate that you can take an idea and turn it into code in x264. Bonus points if the idea actually works.
If you are interested, drop by #x264dev or #x264 on Freenode and ping Dark Shikari.
You should also contact the admin jb.