Difference between revisions of "X264 TODO"

From VideoLAN Wiki
m (+{{Back to|Category:x264}})
 
(22 intermediate revisions by 8 users not shown)
Line 1: Line 1:
 +
{{Lowercase}}
 +
{{Back to|Category:x264}}
 
This page contains an incomplete list of things available in x264 for you to do. It's organized into sections covering various parts of x264.  
 
This page contains an incomplete list of things available in x264 for you to do. It's organized into sections covering various parts of x264.  
  
Line 11: Line 13:
 
*Sequential elimination (SEA), used for exhaustive search, might be more generally applicable to algorithms like UMH, by letting us skip a lot of SADs. The downside is we won't be able to use SAD_X4 anymore.  
 
*Sequential elimination (SEA), used for exhaustive search, might be more generally applicable to algorithms like UMH, by letting us skip a lot of SADs. The downside is we won't be able to use SAD_X4 anymore.  
 
*(T)ESA is currently wrong for motion searches done on weightp duplicates. This effect is minuscule, but it should still be fixed.  
 
*(T)ESA is currently wrong for motion searches done on weightp duplicates. This effect is minuscule, but it should still be fixed.  
*Hierarchical motion estimation might be a useful way to catch very long motion vectors without the cost of UMH or ESA. It might also help regularize motion.  
+
*Hierarchical motion estimation might be a useful way to catch very long motion vectors without the cost of UMH or ESA. It might also help regularize motion.
 +
**I have a patch for this in the lookahead, but it didn't help much, since it only added predictors.
 
*Somehow take into account the effect of motion vector decision on future blocks.  
 
*Somehow take into account the effect of motion vector decision on future blocks.  
 
**Hierarchical motion estimation  
 
**Hierarchical motion estimation  
Line 22: Line 25:
 
*libtheora does fullpel motion estimation on the source pixels instead of decoded pixels. Does this give a better starting point for the subpel search and discourage "weird" MVs?  
 
*libtheora does fullpel motion estimation on the source pixels instead of decoded pixels. Does this give a better starting point for the subpel search and discourage "weird" MVs?  
 
*With extremely fast encoding settings (subme 0), can we rip off lookahead MVs instead of doing a real search?  
 
*With extremely fast encoding settings (subme 0), can we rip off lookahead MVs instead of doing a real search?  
 +
**This seems to be awful from my testing, but maybe there's something we can do?
 
*Try sub-8x8 partitions in B-frames. Is it at all useful?  
 
*Try sub-8x8 partitions in B-frames. Is it at all useful?  
 
*Try bidir motion estimation for fullpel. That is, considering L1's MV when doing L0 (or vice versa). Xvid does this. How much does it help?  
 
*Try bidir motion estimation for fullpel. That is, considering L1's MV when doing L0 (or vice versa). Xvid does this. How much does it help?  
Line 29: Line 33:
 
=== Intra Analysis ===
 
=== Intra Analysis ===
  
*Make the early terminations smarter. Currently they're just hacks -- some statistical analysis might be useful.  
+
*Make the early terminations smarter. Currently they're just hacks -- some statistical analysis might be useful.
 +
**With the SSSE3-based fast intra analysis, we no longer do any early terminations for different modes, at least in SAD/SATD analysis.  But there might still be improvements to be made.
 
*SAD (subme 1) i8x8 vs i4x4 decision is a bit bad. Can it be improved without significant speed loss?
 
*SAD (subme 1) i8x8 vs i4x4 decision is a bit bad. Can it be improved without significant speed loss?
  
 
=== Mode Decision ===
 
=== Mode Decision ===
  
*Can we find more ways to skip more motion searches in multiref?  
+
*Can we find more ways to skip more motion searches in multiref?
*On extremely fast encoding settings, fast skip is actually kind of slow. But anything dumber (e.g. SAD) is completely useless. Is there some better balance that can be achieved here?  
+
**A while back, I tried using weaker motion searches on older refs.  This helped a bit for speed-vs-compression, but is ironically the opposite of what one wants; older refs will be harder to find good MVs in, and therefore really need better searches.
*See the TODOs for deblock-aware RD in common/deblock.c.  
+
*On extremely fast encoding settings, fast skip is actually kind of slow. But anything dumber (e.g. SAD) is completely useless. Is there some better balance that can be achieved here?
*Is there a faster way than SA8D/SATD to do 8x8dct vs 4x4dct mode decision? At very fast settings, the time this uses is nontrivial.  
+
**Can we do something smart by analyzing fenc?  It's impossible to tell whether a block is motionless by looking at fdec, but looking at the source pixels is useful.  There's still complexity such as lower-QP-than-reference though.
 +
*See the TODOs for deblock-aware RD in common/deblock.c.
 +
**I tried correcting weightp references for deblock RDO, but it didn't help.
 +
**I tried chroma, too, and again, it didn't help measurably.
 +
*Is there a faster way than SA8D/SATD to do 8x8dct vs 4x4dct mode decision? At very fast settings, the time this uses is nontrivial.
 +
**Doing a merged 4x4/8x8 SATD would help here, but would require new asm.
 
*Is there a faster way than RD to do 8x8dct vs 4x4dct mode decision that's still better than SATD? RD takes over an order of magnitude more time than SATD, so it might be useful to have something in the middle.  
 
*Is there a faster way than RD to do 8x8dct vs 4x4dct mode decision that's still better than SATD? RD takes over an order of magnitude more time than SATD, so it might be useful to have something in the middle.  
 
*Is there some value to swapping the mode decision metric from SATD to SA8D if we think that the macroblock will use 8x8dct? [http://akuvian.org/src/x264/x264_dct8_guess.diff This has been tried before], but only helped if our guess was extremely good (better than we could get in reality).  
 
*Is there some value to swapping the mode decision metric from SATD to SA8D if we think that the macroblock will use 8x8dct? [http://akuvian.org/src/x264/x264_dct8_guess.diff This has been tried before], but only helped if our guess was extremely good (better than we could get in reality).  
*With trellis 2, can we skip most of CABAC and CAVLC bit cost calculation?  
+
*With trellis 2, can we skip most of CABAC and CAVLC bit cost calculation?
*How about a "brute force" mode decision that takes no shortcuts (no early ref termination in p8x8, no SATD thresholds, etc)?
+
*How about saving CABAC state between each trellis call, rather than basing them all on the CABAC state at the start of the macroblock?
 +
*Make subme=11 not do thresholding in qpel RD and bidir RD.
  
 
=== Psy ===
 
=== Psy ===
Line 59: Line 70:
 
=== Lookahead ===
 
=== Lookahead ===
  
*'''Lookahead should be multithreaded, either by splitting the frame (sliced threads) or running multiple frame analysis calls at once.'''
 
 
*Temporal MV predictors in lookahead? There's a patch for these somewhere, but they biased heavily in favor of B-frames, likely by improving the motion search.  
 
*Temporal MV predictors in lookahead? There's a patch for these somewhere, but they biased heavily in favor of B-frames, likely by improving the motion search.  
 
*Should lookahead use variable lambda based on quantizer (esp. due to adaptive quant)? If so, should it take into account estimated ratecontrol quantizer, too? If so, how?
 
*Should lookahead use variable lambda based on quantizer (esp. due to adaptive quant)? If so, should it take into account estimated ratecontrol quantizer, too? If so, how?
 +
*B-adapt 1 could be made quite a bit better -- it's important because it's used on all the fast speed modes (and even the defaults).  "Harbour 4CIF" is a good example of a clip where it performs noticeably poorly.
  
 
=== Quantization ===
 
=== Quantization ===
  
*CAVLC "trellis" is a hack. It works, but it's a hack. Make it better. See the TODOs in encoder/rdo.c.  
+
*CAVLC "trellis" is a hack. It works, but it's a hack. Make it better. See the TODOs in encoder/rdo.c.
*There's room for something between trellis and deadzone in terms of complexity. libvpx has a good example -- it biases towards zero-runs in its "medium speed" quantizer. This can't be SIMD'd easily, but is still vastly faster than trellis. A nonlinear quantizer (be more likely to round up larger coefficients) might also be useful.  
+
**This is doubly important now, as CABAC trellis has been made way faster, but CAVLC hasn't.  Many of the CABAC trellis improvements can be backported.
 +
*There's room for something between trellis and deadzone in terms of complexity. libvpx has a good example -- it biases towards zero-runs in its "medium speed" quantizer. This can't be SIMD'd easily, but is still vastly faster than trellis. A nonlinear quantizer (be more likely to round up larger coefficients) might also be useful.
 +
**How useful is this with an entropy coder that doesn't really bias towards zero-runs, as in CABAC?
 
*Floyd-Steinberg for quantization? Try pushing quantization error to nearby DCT coefficients. Should this go from high to low or low to high?  
 
*Floyd-Steinberg for quantization? Try pushing quantization error to nearby DCT coefficients. Should this go from high to low or low to high?  
 
*Energy-preserving quantizer -- maintain L1 (or maybe L2? I'm not sure) energy. Should we maintain it in the spatial domain (post-iDCT) or residual domain? Probably the former.  
 
*Energy-preserving quantizer -- maintain L1 (or maybe L2? I'm not sure) energy. Should we maintain it in the spatial domain (post-iDCT) or residual domain? Probably the former.  
 +
**See [https://github.com/saintdev/x264-devel/compare/enquant-base...energy-quant saintdev's github] for one attempt at this.
 
*Decimation is currently just a ripoff of the JVT recommended algorithm. Can we do this more optimally? With RD?
 
*Decimation is currently just a ripoff of the JVT recommended algorithm. Can we do this more optimally? With RD?
  
Line 91: Line 105:
 
=== Ratecontrol ===
 
=== Ratecontrol ===
  
*VBV might be able to utilize the ability to re-encode a row of the frame for improved accuracy.
 
**Maybe re-encode everything in case of an underflow that row-reencoding can't fix? This might be better than underflowing.
 
 
*Current per-frame VBV is a hack. It only adapts per row and is O(N^2), where N is the number of rows. An O(N) solution would be able to react more often and thus be more accurate.  
 
*Current per-frame VBV is a hack. It only adapts per row and is O(N^2), where N is the number of rows. An O(N) solution would be able to react more often and thus be more accurate.  
 
*Make the frame size and row size predictors better. They currently are kind of crappy.  
 
*Make the frame size and row size predictors better. They currently are kind of crappy.  
*Ratecontrol code as a whole is a bit of a mess. It could be improved. There's a lot of cruft left over that is probably not needed now, like qblur.  
+
*Ratecontrol code as a whole is a bit of a mess. It could be improved. There's a lot of cruft left over that is probably not needed now, like qblur.
*2-pass VBV is actually more likely to underflow than 1-pass because it doesn't adapt as aggressively and trusts first pass data a lot. This trust is often misplaced if the first pass was a fast one. This should be improved.  
+
*1-pass ratecontrol often can't adapt fast enough when there are lots of threads (12, 16, 24, etc), especially with smallish VBV buffers.  Improve this?
*2-pass macroblock-tree: if we added the ability to do macroblock-tree on real encoded data, we'd get better results (particularly with repeating patterns and multiref, such as an anime character's mouth moving).  
+
*2-pass VBV is actually a bit more likely to underflow than 1-pass because it doesn't adapt as aggressively and trusts first pass data a lot. This trust is often misplaced if the first pass was a fast one. This should be improved.
 +
**2-pass is still better in the case of many threads, due to the above.
 +
*2-pass macroblock-tree: if we added the ability to do macroblock-tree on real encoded data, we'd get better results (particularly with repeating patterns and multiref, such as an anime character's mouth moving).
 
*Macroblock-tree: make it more psy-aware. Maybe we should cap how much it lowers the quantizer on extremely static scenes? This might tie into the "just-noticeable error" issue in RD.
 
*Macroblock-tree: make it more psy-aware. Maybe we should cap how much it lowers the quantizer on extremely static scenes? This might tie into the "just-noticeable error" issue in RD.
  
Line 113: Line 127:
 
***But potentially more useful...  
 
***But potentially more useful...  
 
*Other things?
 
*Other things?
 
=== x86 assembly ===
 
 
*Optimize more for the Phenom.
 
*Yell at holger to commit his local patches.
 
*Make a merged SA8D/SATD for the 8x8dct mode decision, since the two share most of their calculation. Hadamard_ac already does this, but slightly differently.
 
  
 
=== Other assembly ===
 
=== Other assembly ===
  
*NEON assembly is nowhere near complete.  
+
* A lot of ARM assembly is done. What is missing is mostly high bit-depth support.
**Chroma MC needs to be rewritten for NV12 support.  
+
* Altivec assembly is very lacking.
*Altivec assembly is very lacking.
 
*SPARC VIS assembly is only available when high bit-depth is disabled.
 
  
 
=== Other CPU optimizations ===
 
=== Other CPU optimizations ===
Line 134: Line 140:
 
=== Other features ===
 
=== Other features ===
  
*MPEG-2 encoding support  
+
*MPEG-2 encoding support
*VP8 encoding support
+
**[https://github.com/kierank/x262/wiki/TODO x262]
*'''4:2:2 colorspace support'''
 
 
*Support for SMPTE timecodes  
 
*Support for SMPTE timecodes  
 
*Merge speedcontrol  
 
*Merge speedcontrol  
 
*Mixed lossless/lossy encoding.
 
*Mixed lossless/lossy encoding.
 +
*Segment re-encoding
  
 
=== x264CLI ===
 
=== x264CLI ===
  
*Finish audio support. Talk to Kovensky about this one.
+
*Finish audio support. Talk to Kovensky about this one.
*Make the filtering system aware of fullrange vs TV range.  
+
*Make the filtering system aware of BT.601 vs BT.709.
*Make the filtering system aware of BT.601 vs BT.709.  
+
*Use libavfilter instead of duplicating the filters in x264.
*Add more filters.
 
**Deinterlacers (YADIF).
 
**Denoisers (HQDN3D?).
 
**IVTC, decomb?
 
*Merge L-SMASH mp4 muxer.
 
*Add TS muxing support using HRD. Talk to kierank about this one.  
 
 
*Add --device support.  
 
*Add --device support.  
 
*Add automatic --level restriction support.
 
*Add automatic --level restriction support.
  
=== SOCIS x264 Profile ===
 
 
CPU: ARM V7 PMNC, speed 0 MHz (estimated)
 
 
Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 100000
 
  
<pre>samples  %        image name              symbol name
+
[[Category:x264]]
9764    17.8387  x264                     mc_chroma
 
3132      5.7221  x264                    x264_pixel_avg2_w16_neon
 
2706      4.9438  x264                    x264_me_search_ref
 
2697      4.9274  x264                    refine_subpel
 
2490      4.5492  x264                    x264_quant_4x4_trellis
 
2089      3.8166  x264                    x264_pixel_avg2_w8_neon
 
2014      3.6795  x264                    x264_pixel_satd_8x4
 
1959      3.5791  x264                    get_ref_neon
 
1309      2.3915  x264                    x264_pixel_sad_16x16_neon
 
1125      2.0554  x264                    x264_macroblock_encode
 
1089      1.9896  x264                    x264_macroblock_analyse
 
807      1.4744  x264                    x264_satd_8x4v_8x8h_neon
 
780      1.4250  x264                    x264_rd_cost_mb
 
724      1.3227  x264                    x264_satd_8x8_neon
 
679      1.2405  x264                    x264_pixel_sad_x4_16x16_neon
 
672      1.2277  x264                    x264_pixel_sad_x4_8x8_neon
 
638      1.1656  x264                    x264_pixel_satd_4x4_neon
 
634      1.1583  x264                    x264_macroblock_cache_load_progressive
 
633      1.1565  x264                    x264_pixel_satd_4x4
 
601      1.0980  x264                    x264_pixel_sad_8x8_neon
 
571      1.0432  x264                    x264_quant_4x4_neon
 
528      0.9646  x264                    x264_slicetype_mb_cost
 
500      0.9135  x264                    x264_mb_predict_mv
 
492      0.8989  x264                    x264_pixel_satd_8x8_neon
 
473      0.8642  x264                    x264_mb_analyse_intra
 
471      0.8605  x264                    x264_mb_encode_8x8_chroma
 
454      0.8295  x264                    x264_pixel_sad_x3_16x16_neon
 
452      0.8258  x264                    x264_macroblock_tree_propagate
 
404      0.7381  x264                    x264_pixel_sad_x3_8x8_neon
 
395      0.7217  x264                    x264_satd_16x4_neon
 
386      0.7052  libc-2.9.so              /lib/libc-2.9.so
 
367      0.6705  x264                    x264_mb_predict_mv_ref16x16
 
353      0.6449  x264                    x264_sub8x4_dct_neon
 
309      0.5645  x264                    x264_macroblock_cache_save
 
293      0.5353  x264                    x264_hadamard_ac_8x8_neon
 
281      0.5134  x264                    x264_analyse_update_cache
 
280      0.5116  x264                    x264_mb_analyse_inter_b8x8_mixed_ref
 
276      0.5042  x264                    x264_slice_write
 
270      0.4933  x264                    x264_cabac_encode_decision_c
 
267      0.4878  x264                    x264_mb_analyse_inter_b16x16
 
253      0.4622  x264                    x264_cabac_mb_mvd
 
235      0.4293  x264                    block_residual_write_cabac
 
223      0.4074  x264                    x264_decimate_score16
 
214      0.3910  x264                    __aeabi_fdiv
 
209      0.3818  x264                    x264_pixel_var2_8x8_neon
 
204      0.3727  x264                    __aeabi_fadd
 
201      0.3672  x264                    deblock_strength_c
 
197      0.3599  x264                    x264_pixel_avg2_w20_neon
 
197      0.3599  x264                    x264_pixel_avg_w16_neon
 
187      0.3416  x264                    x264_mc_copy_w16_aligned_neon
 
184      0.3362  x264                    x264_cabac_mb_type
 
181      0.3307  x264                    load_deinterleave_8x8x2_fenc
 
178      0.3252  x264                    x264_macroblock_write_cabac
 
177      0.3234  x264                    x264_mb_mc_01xywh
 
168      0.3069  x264                    x264_mb_mc_0xywh
 
167      0.3051  x264                    mc_luma_neon
 
165      0.3015  x264                    x264_pixel_ssd_16x16_neon
 
161      0.2941  x264                    x264_quant_dc_trellis
 
159      0.2905  x264                    x264_pixel_avg_w8_neon
 
156      0.2850  x264                    x264_mc_copy_w8_neon
 
156      0.2850  x264                    x264_pixel_satd_16x16_neon
 
155      0.2832  x264                    x264_mb_analyse_inter_p16x16
 
151      0.2759  x264                    x264_pixel_ssd_8x8_neon
 
150      0.2740  x264                    memcpy_aligned_8_16_neon
 
148      0.2704  x264                    __aeabi_fmul
 
143      0.2613  x264                    x264_mb_encode_i4x4
 
134      0.2448  x264                    mbtree_propagate_cost
 
124      0.2265  x264                    x264_pixel_sad_8x16_neon
 
112      0.2046  x264                    x264_mc_weight_w16_offsetsub_neon
 
111      0.2028  x264                    x264_pixel_sad_x4_8x16_neon
 
110      0.2010  x264                    x264_mb_predict_mv_direct16x16
 
108      0.1973  x264                    x264_frame_init_lowres_core_neon
 
98        0.1790  x264                    x264_plane_copy_interleave_c
 
93        0.1699  x264                    block_residual_write_cabac
 
91        0.1663  x264                    __floatsisf
 
88        0.1608  x264                    x264_mb_mc
 
87        0.1589  x264                    x264_cabac_mb_ref
 
87        0.1589  x264                    x264_dequant_4x4_neon
 
86        0.1571  x264                    __aeabi_l2f
 
86        0.1571  x264                    x264_satd_4x8_8x4_end_neon
 
85        0.1553  x264                    x264_pixel_var_16x16_neon
 
82        0.1498  x264                    store_interleave_8x8x2
 
82        0.1498  x264                    x264_ratecontrol_mb_qp
 
80        0.1462  x264                    x264_pixel_sad_x4_16x8_neon
 
79        0.1443  x264                    x264_mb_predict_mv_16x16
 
79        0.1443  x264                    x264_pixel_sad_16x8_neon
 
76        0.1389  x264                    x264_mc_weight_w8_offsetsub_neon
 
72        0.1315  x264                    x264_prefetch_fenc_arm
 
71        0.1297  x264                    x264_mb_analyse_b_rd
 
70        0.1279  x264                    x264_add8x4_idct_neon
 
70        0.1279  x264                    x264_frame_deblock_row
 
70        0.1279  x264                    x264_pixel_sad_x3_8x16_neon
 
68        0.1242  x264                    x264_pixel_hadamard_ac_16x16_neon
 
61        0.1114  x264                    x264_coeff_last16_neon
 
57        0.1041  x264                    deblock_v_chroma_c
 
56        0.1023  x264                    x264_predict_16x16_h_c
 
55        0.1005  x264                    x264_predict_4x4_hd_c
 
54        0.0987  x264                    x264_predict_8x8_vr_c
 
53        0.0968  x264                    x264_mb_encode_i16x16
 
53        0.0968  x264                    x264_predict_4x4_vl_c
 
52        0.0950  x264                    x264_hpel_filter_c_neon
 
52        0.0950  x264                    x264_hpel_filter_v_neon
 
52        0.0950  x264                    x264_predict_4x4_vr_c
 
52        0.0950  x264                    x264_predict_8x8_filter_c
 
50        0.0913  x264                    x264_mb_analyse_intra_chroma
 
50        0.0913  x264                    x264_mb_analyse_p_rd
 
50        0.0913  x264                    x264_ratecontrol_mb
 
49        0.0895  x264                    x264_mb_mc_8x8
 
49        0.0895  x264                    x264_predict_8x8_hd_c
 
47        0.0859  x264                    memcpy_aligned_16_16_neon
 
47        0.0859  x264                    x264_mb_mc_1xywh
 
46        0.0840  x264                    x264_pixel_satd_16x8_neon
 
45        0.0822  x264                    x264_cabac_encode_terminal_c
 
45        0.0822  x264                    x264_cabac_mb_mvd
 
45        0.0822  x264                    x264_pixel_sad_x3_16x8_neon
 
45        0.0822  x264                    x264_zigzag_scan_4x4_frame_neon
 
44        0.0804  x264                    deblock_h_chroma_c
 
44        0.0804  x264                    x264_predict_8x8_vl_c
 
44        0.0804  x264                    x264_predict_8x8c_p_neon
 
43        0.0786  x264                    x264_pixel_satd_16x16
 
43        0.0786  x264                    x264_predict_8x8c_dc_c
 
41        0.0749  x264                    x264_macroblock_deblock_strength
 
40        0.0731  x264                    x264_copy_column8
 
40        0.0731  x264                    x264_memcpy_aligned_neon
 
38        0.0694  x264                    x264_predict_8x8_ddl_c
 
38        0.0694  x264                    x264_predict_8x8_ddr_c
 
37        0.0676  x264                    x264_mb_analyse_inter_b8x16
 
37        0.0676  x264                    x264_mc_copy_w16_neon
 
36        0.0658  x264                    x264_deblock_h_luma_neon
 
36        0.0658  x264                    x264_predict_16x16_v_c
 
36        0.0658  x264                    x264_predict_4x4_hu_c
 
36        0.0658  x264                    x264_sub4x4_dct_neon
 
36        0.0658  x264                    x264_sub8x8_dct_dc_neon
 
35        0.0639  x264                    x264_intra_satd_x3_4x4
 
35        0.0639  x264                    x264_mb_analyse_inter_b16x8
 
35        0.0639  x264                    x264_me_refine_bidir_satd
 
35        0.0639  x264                    x264_pixel_satd_4x8_neon
 
34        0.0621  x264                    x264_cabac_mb_type
 
34        0.0621  x264                    x264_predict_16x16_dc_c
 
33        0.0603  x264                    x264_hpel_filter_h_neon
 
33        0.0603  x264                    x264_pixel_satd_8x16_neon
 
32        0.0585  x264                    x264_frame_expand_border_lowres
 
31        0.0566  x264                    x264_predict_4x4_ddr_armv6
 
29        0.0530  x264                    x264_macroblock_probe_skip
 
29        0.0530  x264                    x264_mc_weight_w8_neon
 
29        0.0530  x264                    x264_predict_16x16_p_neon
 
28        0.0512  x264                    memcpy_aligned_8_8_neon
 
28        0.0512  x264                    x264_mb_predict_mv_pskip
 
28        0.0512  x264                    x264_predict_8x8_hu_c
 
28        0.0512  x264                    x264_sub16x16_dct_neon
 
27        0.0493  x264                    x264_me_refine_qpel_refdupe
 
27        0.0493  x264                    x264_pixel_avg_w4_neon
 
26        0.0475  x264                    __fixsfsi
 
26        0.0475  x264                    x264_add4x4_idct_neon
 
26        0.0475  x264                    x264_cabac_mb_ref
 
26        0.0475  x264                    x264_intra_satd_x3_8x8c
 
24        0.0438  x264                    x264_intra_satd_x3_16x16
 
24        0.0438  x264                    x264_pixel_satd_8x4_neon
 
24        0.0438  x264                    x264_quant_2x2_dc_neon
 
23        0.0420  x264                    x264_weight_cost_luma
 
22        0.0402  x264                    x264_predict_8x8c_h_c
 
21        0.0384  x264                    __aeabi_fcmpgt
 
21        0.0384  x264                    x264_predict_4x4_dc_c
 
20        0.0365  x264                    x264_ac_energy_mb
 
20        0.0365  x264                    x264_slicetype_frame_cost
 
19        0.0347  x264                    x264_cabac_encode_bypass_c
 
19        0.0347  x264                    x264_deblock_v_luma_neon
 
18        0.0329  x264                    memcpy_aligned_16_8_neon
 
17        0.0311  x264                    x264_decimate_score15
 
17        0.0311  x264                    x264_intra_rd
 
16        0.0292  x264                    x264_cabac_mb_skip
 
16        0.0292  x264                    x264_var_end
 
15        0.0274  x264                    __cmpsf2
 
15        0.0274  x264                    x264_predict_4x4_h_c
 
14        0.0256  x264                    x264_pixel_avg_8x8_neon
 
14        0.0256  x264                    x264_pixel_avg_weight_w16_add_add_neon
 
13        0.0238  x264                    deblock_v_luma_intra_c
 
13        0.0238  x264                    x264_frame_expand_border
 
13        0.0238  x264                    x264_mc_weight_w8_offsetadd_neon
 
13        0.0238  x264                    x264_predict_4x4_ddl_neon
 
12        0.0219  x264                    x264_frame_expand_border_filtered
 
12        0.0219  x264                    x264_memzero_aligned_neon
 
12        0.0219  x264                    x264_pixel_var_8x8_neon
 
11        0.0201  x264                    x264_mb_cache_mv_b16x8
 
11        0.0201  x264                    x264_predict_4x4_v_c
 
10        0.0183  x264                    x264_cabac_encode_ue_bypass
 
10        0.0183  x264                    x264_macroblock_cache_load_neighbours_deblock
 
9        0.0164  x264                    idct_dequant_2x2_dconly
 
9        0.0164  x264                    x264_mb_analyse_transform_rd
 
9        0.0164  x264                    x264_pixel_avg_weight_w8_add_add_neon
 
9        0.0164  x264                    x264_predict_4x4_dc_armv6
 
8        0.0146  x264                    x264_prefetch_ref_arm
 
7        0.0128  x264                    x264_coeff_last15_neon
 
7        0.0128  x264                    x264_prefetch_fenc
 
6        0.0110  x264                    x264_add8x8_idct_dc_neon
 
6        0.0110  x264                    x264_add8x8_idct_neon
 
6        0.0110  x264                    x264_pixel_avg_16x16_neon
 
6        0.0110  x264                    x264_predict_8x8c_dc_neon
 
6        0.0110  x264                    x264_weight_scale_plane
 
5        0.0091  x264                    __aeabi_cfrcmple
 
5        0.0091  x264                    x264_adaptive_quant_frame
 
5        0.0091  x264                    x264_macroblock_tree_finish
 
5        0.0091  x264                    x264_mb_cache_mv_b8x16
 
5        0.0091  x264                    x264_pixel_avg_4x4_neon
 
5        0.0091  x264                    x264_predict_8x8c_v_c
 
4        0.0073  x264                    __aeabi_ui2f
 
4        0.0073  x264                    deblock_h_chroma_intra_c
 
4        0.0073  x264                    deblock_h_luma_intra_c
 
4        0.0073  x264                    x264_fdec_filter_row
 
4        0.0073  x264                    x264_frame_init_lowres
 
4        0.0073  x264                    x264_predict_16x16_h_neon
 
4        0.0073  x264                    x264_predict_4x4_h_armv6
 
3        0.0055  x264                    __aeabi_cfcmple
 
3        0.0055  x264                    __divdf3
 
3        0.0055  x264                    deblock_v_chroma_intra_c
 
3        0.0055  x264                    x264_dequant_4x4_dc_neon
 
3        0.0055  x264                    x264_encoder_encode
 
3        0.0055  x264                    x264_frame_filter
 
3        0.0055  x264                    x264_nal_escape_c
 
3        0.0055  x264                    x264_pixel_avg_8x16_neon
 
3        0.0055  x264                    x264_predict_16x16_dc_neon
 
3        0.0055  x264                    x264_rc_analyse_slice
 
2        0.0037  libpthread-2.9.so        /lib/libpthread-2.9.so
 
2        0.0037  x264                    __subsf3
 
2        0.0037  x264                    x264_analyse_init_costs
 
2        0.0037  x264                    x264_coeff_last4_arm
 
2        0.0037  x264                    x264_encoder_frame_end
 
2        0.0037  x264                    x264_predict_16x16_dc_top_neon
 
2        0.0037  x264                    x264_quant_4x4_dc_neon
 
2        0.0037  x264                    x264_sub8x8_dct_neon
 
2        0.0037  x264                    x264_weight_cost_init_luma
 
1        0.0018  libm-2.9.so              /lib/libm-2.9.so
 
1        0.0018  x264                    __aeabi_d2f
 
1        0.0018  x264                    __aeabi_f2d
 
1        0.0018  x264                    __aeabi_fcmplt
 
1        0.0018  x264                    __aeabi_uidivmod
 
1        0.0018  x264                    __cmpdf2
 
1        0.0018  x264                    __divdi3
 
1        0.0018  x264                    __muldf3
 
1        0.0018  x264                    __udivdi3
 
1        0.0018  x264                    bs_write_ue_big
 
1        0.0018  x264                    hpel_filter_neon
 
1        0.0018  x264                    optimize_chroma_dc
 
1        0.0018  x264                    x264_add16x16_idct_dc_neon
 
1        0.0018  x264                    x264_dct4x4dc_neon
 
1        0.0018  x264                    x264_frame_copy_picture
 
1        0.0018  x264                    x264_frame_push_unused
 
1        0.0018  x264                    x264_free
 
1        0.0018  x264                    x264_macroblock_cache_mv_4_2
 
1        0.0018  x264                    x264_macroblock_slice_init
 
1        0.0018  x264                    x264_pixel_avg_4x8_neon
 
1        0.0018  x264                    x264_pixel_avg_8x4_neon
 
1        0.0018  x264                    x264_predict_16x16_v_neon</pre>
 

Latest revision as of 08:35, 27 March 2019

This page contains an incomplete list of things available in x264 for you to do. It's organized into sections covering various parts of x264.

Some useful resources: Dark Shikari's pile of junk, Pengvado's pile of junk.

If you're interested in doing any of this, drop by #x264dev on Freenode IRC. There are no experience or educational requirements for doing any of this, though you are expected to know how to code.

Bolded features may have companies willing to sponsor or provide bounties. This is not complete either; just because it's not bolded doesn't mean there aren't resources out there. If your company is interested in offering a bounty, drop by IRC.

Motion Estimation

  • Sequential elimination (SEA), used for exhaustive search, might be more generally applicable to algorithms like UMH, by letting us skip a lot of SADs. The downside is we won't be able to use SAD_X4 anymore.
  • (T)ESA is currently wrong for motion searches done on weightp duplicates. This effect is minuscule, but it still should be fixed.
  • Hierarchical motion estimation might be a useful way to catch very long motion vectors without the cost of UMH or ESA. It might also help regularize motion.
    • I have a patch for this in the lookahead, but it didn't help much, since it only added predictors.
  • Somehow take into account the effect of motion vector decision on future blocks.
    • Hierarchical motion estimation
    • Approximations from lookahead MVs
    • Iterative ME (as per Snow)
    • Trellis motion estimation
  • We don't need to check all 11 predictors all the time for 16x16 fullpel motion search.
    • But how do we know which ones we can afford to skip, and when?
    • Xvid and libtheora have algorithms for this, but the former's is almost surely 100% useless and the latter doesn't seem impressive either.
  • libtheora does fullpel motion estimation on the source pixels instead of decoded pixels. Does this give a better starting point for the subpel search and discourage "weird" MVs?
  • With extremely fast encoding settings (subme 0), can we rip off lookahead MVs instead of doing a real search?
    • This seems to be awful from my testing, but maybe there's something we can do?
  • Try sub-8x8 partitions in B-frames. Is it at all useful?
  • Try bidir motion estimation for fullpel. That is, considering L1's MV when doing L0 (or vice versa). Xvid does this. How much does it help?
  • Fullpel chroma ME?
    • For TESA?
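
The SEA idea in the first bullet rests on the inequality |sum(cur) - sum(ref)| <= SAD(cur, ref): a candidate whose block-sum difference is already no better than the best SAD found so far can be skipped without computing its SAD. The following is a minimal sketch of that test in an exhaustive search, not x264's implementation; `BS`, `block_sum`, `block_sad`, and `sea_search` are hypothetical names chosen for illustration.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BS 8 /* block size; arbitrary for this sketch */

/* Sum of all pixels in a BSxBS block. */
static int block_sum(const uint8_t *p, int stride)
{
    int s = 0;
    for (int y = 0; y < BS; y++)
        for (int x = 0; x < BS; x++)
            s += p[y * stride + x];
    return s;
}

/* Plain sum of absolute differences between two BSxBS blocks. */
static int block_sad(const uint8_t *a, int a_stride,
                     const uint8_t *b, int b_stride)
{
    int s = 0;
    for (int y = 0; y < BS; y++)
        for (int x = 0; x < BS; x++)
            s += abs(a[y * a_stride + x] - b[y * b_stride + x]);
    return s;
}

/* Exhaustive search of every BSxBS position inside a ref_w x ref_h area.
 * Returns the best SAD; *sads_done counts the SADs that were not skipped. */
static int sea_search(const uint8_t *cur, int cur_stride,
                      const uint8_t *ref, int ref_stride,
                      int ref_w, int ref_h,
                      int *best_x, int *best_y, int *sads_done)
{
    int cur_sum = block_sum(cur, cur_stride);
    int best = INT_MAX;
    *best_x = *best_y = 0;
    *sads_done = 0;
    for (int y = 0; y + BS <= ref_h; y++)
        for (int x = 0; x + BS <= ref_w; x++) {
            const uint8_t *cand = ref + y * ref_stride + x;
            /* |sum(cur) - sum(cand)| <= SAD(cur, cand), so a candidate
             * whose bound is not below the best SAD cannot win. */
            if (abs(cur_sum - block_sum(cand, ref_stride)) >= best)
                continue;
            int s = block_sad(cur, cur_stride, cand, ref_stride);
            (*sads_done)++;
            if (s < best) { best = s; *best_x = x; *best_y = y; }
        }
    return best;
}
```

Here the candidate sum is recomputed per position, which costs about as much as the SAD it avoids; a practical version would read candidate sums from a precomputed table. The skip decision is per candidate, which is why it conflicts with evaluating candidates in batches of four as SAD_X4 does.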

Intra Analysis

  • Make the early terminations smarter. Currently they're just hacks -- some statistical analysis might be useful.
    • With the SSSE3-based fast intra analysis, we no longer do any early terminations for different modes, at least in SAD/SATD analysis. But there might still be improvements to be made.
  • SAD (subme 1) i8x8 vs i4x4 decision is a bit bad. Can it be improved without significant speed loss?

Mode Decision

  • Can we find more ways to skip more motion searches in multiref?
    • A while back, I tried using weaker motion searches on older refs. This helped a bit for speed-vs-compression, but is ironically the opposite of what one wants; older refs will be harder to find good MVs in, and therefore really need better searches.
  • On extremely fast encoding settings, fast skip is actually kind of slow. But anything dumber (e.g. SAD) is completely useless. Is there some better balance that can be achieved here?
    • Can we do something smart by analyzing fenc? It's impossible to tell whether a block is motionless by looking at fdec, but looking at the source pixels is useful. There's still complexity such as lower-QP-than-reference though.
  • See the TODOs for deblock-aware RD in common/deblock.c.
    • I tried correcting weightp references for deblock RDO, but it didn't help.
    • I tried chroma, too, and again, it didn't help measurably.
  • Is there a faster way than SA8D/SATD to do 8x8dct vs 4x4dct mode decision? At very fast settings, the time this uses is nontrivial.
    • Doing a merged 4x4/8x8 SATD would help here, but would require new asm.
  • Is there a faster way than RD to do 8x8dct vs 4x4dct mode decision that's still better than SATD? RD takes over an order of magnitude more time than SATD, so it might be useful to have something in the middle.
  • Is there some value to swapping the mode decision metric from SATD to SA8D if we think that the macroblock will use 8x8dct? This has been tried before, but only helped if our guess was extremely good (better than we could get in reality).
  • With trellis 2, can we skip most of CABAC and CAVLC bit cost calculation?
  • How about saving CABAC state between each trellis call, rather than basing them all on the CABAC state at the start of the macroblock?
  • Make subme=11 not do thresholding in qpel RD and bidir RD.
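
For reference, the SATD metric discussed in several bullets above is the sum of absolute values of a Hadamard-transformed difference block. The sketch below is a plain scalar 4x4 version, assuming a halving normalization at the end (conventions differ between implementations); `satd_4x4` is a hypothetical name and this is not x264's asm or C code.

```c
#include <stdint.h>
#include <stdlib.h>

/* 4x4 SATD: Hadamard-transform the pixel differences, then sum magnitudes. */
static int satd_4x4(const uint8_t *a, int a_stride,
                    const uint8_t *b, int b_stride)
{
    int d[4][4], t[4][4];
    int sum = 0;

    /* Difference block. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * a_stride + x] - b[y * b_stride + x];

    /* Horizontal 4-point Hadamard on each row. */
    for (int y = 0; y < 4; y++) {
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = d01 + d23;
        t[y][3] = d01 - d23;
    }

    /* Vertical 4-point Hadamard on each column, accumulating magnitudes. */
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(d01 + d23) + abs(d01 - d23);
    }

    return sum / 2; /* normalization assumed for this sketch */
}
```

A merged 4x4/8x8 variant, as suggested above, would presumably reuse the 4x4 butterflies as the first stages of the 8x8 transform; that part is not shown here.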

Psy

  • Psy-RD is a hack. It works, but it's a hack. If you apply QNS with Psy-RD as the metric, it goes way overboard and gives terrible results. This means that Psy-RD only works because normal mode decision is limited in the way it can modify the image to better suit the metric. Is there a way to make it better?
  • Should RD be linear at all? Perhaps we should weight more heavily against low quality blocks and also try to ignore minuscule distortion that viewers can't see.
  • Psy-trellis (and maybe psy-RD?) are too strong at very high QPs.
  • Psy-trellis should be merged with Psy-RD. There are patches for this, but they probably won't be committed until psy-RD itself is fixed.
  • RD should take into account local variance.
  • Lambda should be varied on a per-DCT-block basis instead of a per-macroblock basis.
  • Lambda should be picked independent of quantizer (i.e. with greater precision).
  • Classic problem: a block is mostly high complexity but has a small area of low complexity. How do we judge whether that area is important? Good example: sharp text on background with film grain; grain gets blurred out because of the text.
    • If we think it's important all the time, we ruin the quality of many clips that rely on raising complexity on edges (Touhou).
  • Should motion estimation lambda be as high as it is at very high quantizers? There's some value to capturing "true motion"...
  • Macroblock tree correlates pretty well with visual perception, in that its concept of "high complexity" matches the visual one, except for local illumination changes. Talk to Dark Shikari for a patch.

Lookahead

  • Temporal MV predictors in lookahead? There's a patch for these somewhere, but they biased heavily in favor of B-frames, likely by improving the motion search.
  • Should lookahead use variable lambda based on quantizer (esp. due to adaptive quant)? If so, should it take into account estimated ratecontrol quantizer, too? If so, how?
  • B-adapt 1 could be made quite a bit better -- it's important because it's used on all the fast speed modes (and even the defaults). "Harbour 4CIF" is a good example of a clip where it does noticeably badly.

Quantization

  • CAVLC "trellis" is a hack. It works, but it's a hack. Make it better. See the TODOs in encoder/rdo.c.
    • This is doubly important now, as CABAC trellis has been made way faster, but CAVLC hasn't. Many of the CABAC trellis improvements can be backported.
  • There's room for something between trellis and deadzone in terms of complexity. libvpx has a good example -- it biases towards zero-runs in its "medium speed" quantizer. This can't be SIMD'd easily, but is still vastly faster than trellis. A nonlinear quantizer (be more likely to round up larger coefficients) might also be useful.
    • How useful is this with an entropy coder that doesn't really bias towards zero-runs, as in CABAC?
  • Floyd-Steinberg for quantization? Try pushing quantization error to nearby DCT coefficients. Should this go from high to low or low to high?
  • Energy-preserving quantizer -- maintain L1 (or maybe L2? I'm not sure) energy. Should we maintain it in the spatial domain (post-iDCT) or residual domain? Probably the former.
  • Decimation is currently just a ripoff of the JVT recommended algorithm. Can we do this more optimally? With RD?
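
One way to picture the Floyd-Steinberg bullet above is a one-dimensional error diffusion along the scan order: each coefficient's quantization error is carried into the next coefficient before that one is quantized. This is only a sketch under assumptions: it goes from low to high frequency, pushes the whole error to a single neighbour, and uses a plain divide-and-round quantizer with a hypothetical `qstep`. None of the names come from x264.

```c
#include <stdint.h>

/* Round-to-nearest division for signed values; qstep must be positive. */
static int quant_round(int v, int qstep)
{
    return v >= 0 ? (v + qstep / 2) / qstep
                  : -((-v + qstep / 2) / qstep);
}

/* Quantize n coefficients in scan order, carrying each coefficient's
 * quantization error forward into the next coefficient. */
static void quant_diffuse(const int16_t *coef, int16_t *level,
                          int n, int qstep)
{
    int carry = 0; /* error left over from the previous coefficient */
    for (int i = 0; i < n; i++) {
        int target = coef[i] + carry;
        int q = quant_round(target, qstep);
        level[i] = (int16_t)q;
        carry = target - q * qstep;
    }
}
```

With coefficients {7, 7, 7, 7} and `qstep` 10, plain rounding gives levels {1, 1, 1, 1}, while this sketch gives {1, 0, 1, 1}, so the dequantized total (30) lands closer to the source total (28). Whether that trade is worth anything in rate-distortion terms is exactly the open question in the bullet.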

Transform

  • Analyze the error characteristics of the fDCT. Is there any way to make it more accurate without much speed loss? Particularly at extremely low quantizers, this might help.
  • Before forward transform, run a "blocking filter" that acts as the approximate inverse of the deblock filter. See this paper.

Interlacing

  • Lookahead currently blend-deinterlaces to get the lowres. Is this a good idea? Is there something better that isn't much slower?
  • Constrained intra + adaptive MBAFF. Does anyone care about this?
  • Adaptive PAFF + MBAFF: PAFF performs better than adaptive MBAFF on high-motion scenes because it can predict from the previous field.

Weighted Prediction

  • Make weightp work with interlacing. Preferably abuse reference duplication to make it useful for MBAFF.
  • Finish K-means decision for weightp. Talk to DylanZA about getting his current patch for this one.
  • Add explicit weighting for B-frames, too. This helps in nonlinear fades, among other cases.
  • Improve weighted prediction analysis to do more searching based on an estimated offset vs scale gradient.
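
As a starting point for thinking about scale versus offset, the sketch below fits cur ≈ scale * ref + offset over co-located pixels by ordinary least squares. It is an illustration only, not x264's weightp analysis: it ignores motion, works in floating point rather than the integer weight and denominator that H.264 signals, and `weight_fit_t` and `fit_weight` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    double scale;
    double offset;
} weight_fit_t;

/* Least-squares fit of cur[i] ~= scale * ref[i] + offset over n pixels. */
static weight_fit_t fit_weight(const uint8_t *ref, const uint8_t *cur, int n)
{
    double sr = 0, sc = 0, srr = 0, src = 0;
    weight_fit_t w = { 1.0, 0.0 }; /* identity weight as the fallback */

    for (int i = 0; i < n; i++) {
        sr  += ref[i];
        sc  += cur[i];
        srr += (double)ref[i] * ref[i];
        src += (double)ref[i] * cur[i];
    }
    if (n <= 0)
        return w;

    double var = n * srr - sr * sr; /* n^2 times the variance of ref */
    if (var > 0) {
        w.scale  = (n * src - sr * sc) / var;
        w.offset = (sc - w.scale * sr) / n;
    } else {
        /* Flat reference: scale is not identifiable, fit offset only. */
        w.offset = (sc - sr) / n;
    }
    return w;
}
```

A fade that halves brightness and adds 10 would come out as scale 0.5 and offset 10. A search such as the one proposed above could then refine around this estimate instead of starting from scratch.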

Ratecontrol

  • Current per-frame VBV is a hack. It only adapts per row and is O(N^2), where N is the number of rows. An O(N) solution would be able to react more often and thus be more accurate.
  • Make the frame size and row size predictors better. They currently are kind of crappy.
  • Ratecontrol code as a whole is a bit of a mess. It could be improved. There's a lot of cruft left over that is probably not needed now, like qblur.
  • 1-pass ratecontrol often can't adapt fast enough when there are lots of threads (12, 16, 24, etc), especially with smallish VBV buffers. Improve this?
  • 2-pass VBV is actually a bit more likely to underflow than 1-pass because it doesn't adapt as aggressively and trusts first pass data a lot. This trust is often misplaced if the first pass was a fast one. This should be improved.
    • 2-pass is still better in the case of many threads, due to the above.
  • 2-pass macroblock-tree: if we added the ability to do macroblock-tree on real encoded data, we'd get better results (particularly with repeating patterns and multiref, such as an anime character's mouth moving).
  • Macroblock-tree: make it more psy-aware. Maybe we should cap how much it lowers the quantizer on extremely static scenes? This might tie into the "just-noticeable error" issue in RD.

GPU

  • Motion estimation?
    • Methods
      • Hierarchical?
      • 2D Wave?
      • Something else?
    • "Easy": lookahead motion estimation
      • Extremely high parallelism, hundreds of frame searches (each with thousands of searches) at once.
    • "Hard": main motion estimation
      • Difficult synchronization issues, not as heavily parallel in terms of number of macroblocks, but far more partition sizes and refs to search.
      • But potentially more useful...
  • Other things?

Other assembly

  • A lot of ARM assembly is done. What's missing is mostly for high bit depth.
  • Altivec assembly is very lacking.

Other CPU optimizations

  • x264 needs more prefetching. How many L1 and L2 cache misses (particularly L1) can we get rid of via smart prefetching in the right places? Warning: this is often hard to benchmark.
  • Different CPUs take different relative times for some functions. Is this enough (particularly across architectures) to justify different encoding settings for different CPUs?
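
To make the prefetching item concrete, here is a minimal sketch of a scalar 16-wide SAD that hints the cache about a reference row a couple of rows ahead. It assumes a GCC- or Clang-compatible compiler for `__builtin_prefetch` and falls back to a no-op elsewhere; `PREFETCH_READ`, `ROWS_AHEAD`, and `sad_16xh_prefetch` are hypothetical names, and the prefetch distance is a guess that would need benchmarking on each target CPU.

```c
#include <stdint.h>
#include <stdlib.h>

#if defined(__GNUC__) || defined(__clang__)
/* Read prefetch with high temporal locality. */
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

#define ROWS_AHEAD 2 /* prefetch distance in rows; a tuning guess */

/* SAD of a 16-pixel-wide, h-row block, prefetching upcoming ref rows. */
static int sad_16xh_prefetch(const uint8_t *cur, int cur_stride,
                             const uint8_t *ref, int ref_stride, int h)
{
    int s = 0;
    for (int y = 0; y < h; y++) {
        /* Only hint rows that are still inside the block. */
        if (y + ROWS_AHEAD < h)
            PREFETCH_READ(ref + (y + ROWS_AHEAD) * ref_stride);
        for (int x = 0; x < 16; x++)
            s += abs(cur[y * cur_stride + x] - ref[y * ref_stride + x]);
    }
    return s;
}
```

The prefetch never changes the result, only the timing, which is part of why its effect is hard to benchmark: the gain depends on what is already in cache when the function is called.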

Other features

  • MPEG-2 encoding support
  • Support for SMPTE timecodes
  • Merge speedcontrol
  • Mixed lossless/lossy encoding.
  • Segment re-encoding

x264CLI

  • Finish audio support. Talk to Kovensky about this one.
  • Make the filtering system aware of BT.601 vs BT.709.
  • Use libavfilter instead of duplicating the filters in x264.
  • Add --device support.
  • Add automatic --level restriction support.