X264 TODO
Dark Shikari (talk | contribs)
Revision as of 22:12, 13 September 2010
This page contains an incomplete list of things available in x264 for you to do. It's organized into sections covering various parts of x264.
Some useful resources: Dark Shikari's pile of junk, Pengvado's pile of junk.
If you're interested in doing any of this, drop by #x264dev on Freenode IRC. There are no experience or educational requirements for doing any of this, though you are expected to know how to code.
Motion Estimation
- Sequential elimination (SEA), used for exhaustive search, might be more generally applicable to algorithms like UMH, by letting us skip a lot of SADs. The downside is we won't be able to use SAD_X4 anymore.
- (T)ESA is currently wrong for motion searches done on weightp duplicates. This effect is minuscule, but it still should be fixed.
- Hierarchical motion estimation might be a useful way to catch very long motion vectors without the cost of UMH or ESA. It might also help regularize motion.
- Somehow take into account the effect of motion vector decision on future blocks.
- Hierarchical motion estimation
- Approximations from lookahead MVs
- Iterative ME (as per Snow)
- Trellis motion estimation
- We don't need to check all 11 predictors all the time for 16x16 fullpel motion search.
- But how do we know which ones we can afford to skip, and when?
- Xvid and libtheora have algorithms for this, but the former's is almost surely 100% useless and the latter doesn't seem impressive either.
- libtheora does fullpel motion estimation on the source pixels instead of decoded pixels. Does this give a better starting point for the subpel search and discourage "weird" MVs?
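The sequential elimination idea from the first item above rests on the bound |sum(cur) - sum(cand)| <= SAD(cur, cand): a candidate whose bound already reaches the best cost so far cannot win, so its SAD can be skipped. The following is a minimal C sketch of that pruning inside a plain full search. All names (`sea_search`, `sea_result_t`, the helpers) are hypothetical and not x264's; it ignores motion vector bit cost and recomputes block sums naively, where a real implementation would presumably use precomputed sums. It also shows why SAD_X4-style batching is lost: candidates are accepted or rejected one at a time.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: best vector, its SAD, and how many SADs ran. */
typedef struct { int mx, my, cost, sads_computed; } sea_result_t;

/* Sum of all pixels in a bw x bh block. */
static int block_sum(const uint8_t *p, int stride, int bw, int bh)
{
    int sum = 0;
    for (int y = 0; y < bh; y++)
        for (int x = 0; x < bw; x++)
            sum += p[y * stride + x];
    return sum;
}

/* Plain C sum of absolute differences between two blocks. */
static int block_sad(const uint8_t *a, int a_stride,
                     const uint8_t *b, int b_stride, int bw, int bh)
{
    int sad = 0;
    for (int y = 0; y < bh; y++)
        for (int x = 0; x < bw; x++)
            sad += abs(a[y * a_stride + x] - b[y * b_stride + x]);
    return sad;
}

/* Full search over [-range, range] on both axes with SEA pruning.
 * `ref` points at the co-located block; the caller must guarantee that
 * `range` pixels of valid data surround it on every side. */
static sea_result_t sea_search(const uint8_t *cur, int cur_stride,
                               const uint8_t *ref, int ref_stride,
                               int bw, int bh, int range)
{
    sea_result_t best = { 0, 0, INT_MAX, 0 };
    int cur_sum = block_sum(cur, cur_stride, bw, bh);

    for (int my = -range; my <= range; my++)
        for (int mx = -range; mx <= range; mx++) {
            const uint8_t *cand = ref + my * ref_stride + mx;
            /* Lower bound on the SAD of this candidate. */
            int bound = abs(cur_sum - block_sum(cand, ref_stride, bw, bh));
            if (bound >= best.cost)
                continue; /* cannot beat the current best: skip the SAD */
            int sad = block_sad(cur, cur_stride, cand, ref_stride, bw, bh);
            best.sads_computed++;
            if (sad < best.cost) {
                best.cost = sad;
                best.mx = mx;
                best.my = my;
            }
        }
    return best;
}
```

Applying the same test inside a pattern search such as UMH would mean checking the bound before each candidate SAD, which is the trade-off against batched SAD calls described above.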
Intra Analysis
- Make the early terminations smarter. Currently they're just hacks -- some statistical analysis might be useful.
- SAD (subme 1) i8x8 vs i4x4 decision is a bit bad. Can it be improved without significant speed loss?
Mode Decision
- Can we find more ways to skip more motion searches in multiref?
- On extremely fast encoding settings, fast skip is actually kind of slow. But anything dumber (e.g. SAD) is completely useless. Is there some better balance that can be achieved here?
- Chroma-aware mode decision for B-frames?
- See the TODOs for deblock-aware RD in common/deblock.c.
Psy
- Psy-RD is a hack. It works, but it's a hack. If you apply QNS with Psy-RD as the metric, it goes way overboard and gives terrible results. This means that Psy-RD only works because normal mode decision is limited in the way it can modify the image to better suit the metric. Is there a way to make it better?
- Should RD be linear at all? Perhaps we should weight more heavily against low quality blocks and also try to ignore minuscule distortion that viewers can't see.
- Psy-trellis (and maybe psy-RD?) are too strong at very high QPs.
- Psy-trellis should be merged with Psy-RD. There are patches for this, but they probably won't be committed until psy-RD itself is fixed.
- RD should take into account local variance.
- Lambda should be varied on a per-DCT-block basis instead of a per-macroblock basis.
- Lambda should be picked independent of quantizer (i.e. with greater precision).
- Classic problem: a block is mostly high complexity but has a small area of low complexity. How do we judge whether that area is important? Good example: sharp text on background with film grain; grain gets blurred out because of the text.
- If we think it's important all the time, we ruin the quality of many clips that rely on raising complexity on edges (Touhou).
- Should motion estimation lambda be as high as it is at very high quantizers? There's some value to capturing "true motion"...
Lookahead
- Lookahead should be multithreaded, either by splitting the frame (sliced threads) or running multiple frame analysis calls at once.
- Temporal MV predictors in lookahead? There's a patch for these somewhere, but they biased frame-type decisions heavily in favor of B-frames, likely by improving the motion search.
- Should lookahead use variable lambda based on quantizer (esp. due to adaptive quant)? If so, should it take into account estimated ratecontrol quantizer, too? If so, how?
Quantization
- CAVLC "trellis" is a hack. It works, but it's a hack. Make it better. See the TODOs in encoder/rdo.c.
- There's room for something between trellis and deadzone in terms of complexity. libvpx has a good example -- it biases towards zero-runs in its "medium speed" quantizer. This can't be SIMD'd easily, but is still vastly faster than trellis. A nonlinear quantizer (be more likely to round up larger coefficients) might also be useful.
- Floyd-Steinberg for quantization? Try pushing quantization error to nearby DCT coefficients. Should this go from high to low or low to high?
- Energy-preserving quantizer -- maintain L1 (or maybe L2? I'm not sure) energy. Should we maintain it in the spatial domain (post-iDCT) or residual domain? Probably the former.
- Decimation is currently just a ripoff of the JVT recommended algorithm. Can we do this more optimally? With RD?
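To make the deadzone and error-diffusion items above concrete, here is a minimal C sketch. It is not x264's quantizer: `quant_deadzone` and `quant_diffuse` are hypothetical names, coefficients are assumed to be already in scan order, and the diffusion is a one-dimensional simplification that carries the whole quantization error to the next coefficient rather than splitting it among several neighbours as Floyd-Steinberg does. The `high_to_low` flag exists only to let both diffusion directions be tried.

```c
#include <stdlib.h>

/* Deadzone quantizer: a rounding offset `bias` below qstep/2 widens the
 * zero bin compared with round-to-nearest. */
static int quant_deadzone(int coef, int qstep, int bias)
{
    int level = (abs(coef) + bias) / qstep;
    return coef < 0 ? -level : level;
}

/* Quantize n coefficients in scan order, pushing each coefficient's
 * quantization error onto the next coefficient visited.
 * high_to_low != 0 walks from the last coefficient to the first. */
static void quant_diffuse(const int *coef, int *level, int n,
                          int qstep, int bias, int high_to_low)
{
    int carry = 0;
    for (int k = 0; k < n; k++) {
        int i = high_to_low ? n - 1 - k : k;
        int c = coef[i] + carry;         /* coefficient plus inherited error */
        level[i] = quant_deadzone(c, qstep, bias);
        carry = c - level[i] * qstep;    /* error left after dequantization */
    }
}
```

With coefficients {10, 3, 3, 3}, a step of 8 and a bias of 2, the plain deadzone quantizer keeps only the first coefficient, while the diffusing variant also emits a level at the third position because the accumulated error crosses the threshold there.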
Transform
- Analyze the error characteristics of the fDCT. Is there any way to make it more accurate without much speed loss? Particularly at extremely low quantizers, this might help.
- Before forward transform, run a "blocking filter" that acts as the approximate inverse of the deblock filter. See this paper.
Interlacing
- Finish adaptive MBAFF. Talk to horlicks about this one.
- Make slice-max-mbs/slice-max-size work with interlacing. Pretty much requires a good portion of adaptive MBAFF.
- Lookahead currently blend-deinterlaces to get the lowres. Is this a good idea? Is there something better that isn't much slower?
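As a rough illustration of what blend-deinterlacing into a half-resolution plane means, the C sketch below averages each 2x2 block of the source, so every output pixel mixes one top-field line with one bottom-field line. This is an assumption-laden simplification, not x264's actual lowres filter, and `lowres_blend` is a hypothetical name.

```c
#include <stdint.h>

/* Build a half-resolution plane. Each output pixel is the rounded mean of
 * a 2x2 source block, which blends an even (top-field) line with the odd
 * (bottom-field) line below it. */
static void lowres_blend(const uint8_t *src, int src_stride,
                         uint8_t *dst, int dst_stride,
                         int dst_w, int dst_h)
{
    for (int y = 0; y < dst_h; y++)
        for (int x = 0; x < dst_w; x++) {
            const uint8_t *top = src + (2 * y) * src_stride + 2 * x;
            const uint8_t *bot = top + src_stride;
            dst[y * dst_stride + x] =
                (uint8_t)((top[0] + top[1] + bot[0] + bot[1] + 2) >> 2);
        }
}
```

The blend discards the temporal difference between the two fields, which is presumably why the item above asks whether something better exists at similar cost.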
Weighted Prediction
- Make weightp work with interlacing. Preferably abuse reference duplication to make it useful for MBAFF.
- Make weightp work with chroma. Talk to DylanZA about getting his current patch for this one.
- Finish K-means decision for weightp. Talk to DylanZA about getting his current patch for this one.
- Add explicit weighting for B-frames, too. This helps in nonlinear fades, among other cases.
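For reference, explicit weighting in H.264 scales each reference sample by a weight, rounds and shifts by a log2 denominator, then adds an offset and clips. The C sketch below shows the single-reference, 8-bit case only; the function names are illustrative and not taken from x264, and the bidirectional case needed for B-frames combines two references and is not shown.

```c
#include <stdint.h>

/* Clamp to the 8-bit sample range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Single-reference explicit weighted prediction for one 8-bit sample.
 * Assumes an arithmetic right shift when the weight is negative, which C
 * leaves implementation-defined. */
static uint8_t weight_pixel(uint8_t ref, int weight, int log2_denom, int offset)
{
    int v = log2_denom >= 1
          ? (ref * weight + (1 << (log2_denom - 1))) >> log2_denom
          : ref * weight;
    return clip_u8(v + offset);
}
```

With a denominator of 32 (`log2_denom` of 5), a weight of 32 leaves the sample unchanged and a weight of 16 halves it, which is the kind of scaling a fade needs.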
Ratecontrol
- VBV might be able to utilize the ability to re-encode a row of the frame for improved accuracy.
- Maybe re-encode everything in case of an underflow that row-reencoding can't fix? This might be better than underflowing.
- Current per-frame VBV is a hack. It only adapts per row and is O(N^2), where N is the number of rows. An O(N) solution would be able to react more often and thus be more accurate.
- Make the frame size and row size predictors better. They currently are kind of crappy.
- Ratecontrol code as a whole is a bit of a mess. It could be improved. There's a lot of cruft left over that is probably not needed now, like qcomp.
- 2-pass VBV is actually more likely to underflow than 1-pass because it doesn't adapt as aggressively and trusts first pass data a lot. This trust is often misplaced if the first pass was a fast one. This should be improved.
- 2-pass macroblock-tree: if we added the ability to do macroblock-tree on real encoded data, we'd get better results (particularly with repeating patterns and multiref, such as an anime character's mouth moving).
- Macroblock-tree: make it more psy-aware. Maybe we should cap how much it lowers the quantizer on extremely static scenes? This might tie into the "just-noticeable error" issue in RD.
GPU
- Motion estimation?
- Methods
- Hierarchical?
- 2D Wave?
- Something else?
- "Easy": lookahead motion estimation
- Extremely high parallelism, hundreds of frame searches (each with thousands of searches) at once.
- "Hard": main motion estimation
- Difficult synchronization issues, not as heavily parallel in terms of number of macroblocks, but far more partition sizes and refs to search.
- But potentially more useful...
- Other things?
x86 assembly
- Finish AVX support for x86inc.asm. Talk to Dark Shikari for a patch.
- Optimize more for the Phenom.
- Optimize for the upcoming Sandy Bridge.
- Work on 10-bit asm. Talk to irock about this.
- Convince holger to commit his local patches.
Other assembly
- NEON assembly is nowhere near complete.
- Chroma MC needs to be rewritten for NV12 support.
- Altivec assembly is very lacking.
Other CPU optimizations
- x264 needs more prefetching. How many L1 and L2 cache misses (particularly L1) can we get rid of via smart prefetching in the right places? Warning: this is often hard to benchmark.
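As a generic illustration of software prefetching (not a claim about where x264 needs it), the C sketch below walks a pixel region row by row and hints the cache about a row a few lines ahead. It relies on the GCC/Clang builtin `__builtin_prefetch` and falls back to a no-op elsewhere; `sum_region_prefetch` and the `ahead` distance are hypothetical, and the right distance has to be found by measurement.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
/* Read prefetch with high temporal locality. */
#define PREFETCH(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH(p) ((void)0)
#endif

/* Sum a w x h pixel region, prefetching the start of the row `ahead`
 * lines below the one being processed. Only the first cache line of that
 * row is hinted; wide rows would need one hint per cache line. */
static uint32_t sum_region_prefetch(const uint8_t *src, int stride,
                                    int w, int h, int ahead)
{
    uint32_t sum = 0;
    for (int y = 0; y < h; y++) {
        if (y + ahead < h)
            PREFETCH(src + (size_t)(y + ahead) * stride);
        for (int x = 0; x < w; x++)
            sum += src[(size_t)y * stride + x];
    }
    return sum;
}
```

A prefetch never changes the result, only the timing, which is part of why the benchmarking warning above applies.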
x264CLI
- Finish audio support. Talk to Kovensky about this one.
- Add more filters.
- Deinterlacers (YADIF).
- Denoisers (HQDN3D?).
- IVTC, decomb?