X264 TODO
This page contains an incomplete list of things available in x264 for you to do. It's organized into sections covering various parts of x264.
Some useful resources: Dark Shikari's pile of junk, Pengvado's pile of junk.
If you're interested in doing any of this, drop by #x264dev on Freenode IRC. There are no experience or educational requirements for doing any of this, though you are expected to know how to code.
Bolded features may have companies willing to sponsor or provide bounties. This is not complete either; just because it's not bolded doesn't mean there aren't resources out there. If your company is interested in offering a bounty, drop by IRC.
Motion Estimation
- Sequential elimination (SEA), used for exhaustive search, might be more generally applicable to algorithms like UMH, by letting us skip a lot of SADs. The downside is we won't be able to use SAD_X4 anymore.
- (T)ESA is currently wrong for motion searches done on weightp duplicates. The effect is minuscule, but it should still be fixed.
- Hierarchical motion estimation might be a useful way to catch very long motion vectors without the cost of UMH or ESA. It might also help regularize motion.
- Somehow take into account the effect of motion vector decision on future blocks.
  - Hierarchical motion estimation
  - Approximations from lookahead MVs
  - Iterative ME (as per Snow)
  - Trellis motion estimation
- We don't need to check all 11 predictors all the time for 16x16 fullpel motion search.
  - But how do we know which ones we can afford to skip, and when?
  - Xvid and libtheora have algorithms for this, but the former's is almost surely 100% useless and the latter doesn't seem impressive either.
- libtheora does fullpel motion estimation on the source pixels instead of decoded pixels. Does this give a better starting point for the subpel search and discourage "weird" MVs?
- With extremely fast encoding settings (subme 0), can we rip off lookahead MVs instead of doing a real search?
- Try sub-8x8 partitions in B-frames. Is it at all useful?
- Try bidir motion estimation for fullpel. That is, considering L1's MV when doing L0 (or vice versa). Xvid does this. How much does it help?
- Fullpel chroma ME?
  - For TESA?
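The sequential elimination item at the top of this list relies on a simple bound: the absolute difference between the pixel sums of two blocks can never exceed their SAD, so a candidate whose bound already reaches the best SAD so far can be skipped without computing its SAD. Below is a minimal C sketch of that test. It is not x264's code; `block_sum`, `block_sad` and `sea_can_skip` are hypothetical names, and a real implementation would presumably take the reference sums from precomputed data (for example an integral image) instead of summing each candidate.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of all pixels in a w x h block (hypothetical helper, not an x264 function). */
static int block_sum(const uint8_t *p, int stride, int w, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sum += p[y * stride + x];
    return sum;
}

/* Plain sum of absolute differences between two w x h blocks. */
static int block_sad(const uint8_t *a, int a_stride,
                     const uint8_t *b, int b_stride, int w, int h)
{
    int sad = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sad += abs(a[y * a_stride + x] - b[y * b_stride + x]);
    return sad;
}

/* Returns 1 if a candidate can be skipped without computing its SAD.
 * |sum(cur) - sum(ref)| is a lower bound on SAD(cur, ref), so a candidate
 * whose bound already reaches best_sad cannot beat the current best. */
static int sea_can_skip(int cur_sum, int ref_sum, int best_sad)
{
    return abs(cur_sum - ref_sum) >= best_sad;
}
```

The check costs one subtraction per candidate, which is why it is attractive for searches that would otherwise compute many SADs. It is evaluated one candidate at a time, though, which fits poorly with computing several SADs in one batched call.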
Intra Analysis
- Make the early terminations smarter. Currently they're just hacks -- some statistical analysis might be useful.
- SAD (subme 1) i8x8 vs i4x4 decision is a bit bad. Can it be improved without significant speed loss?
Mode Decision
- Can we find more ways to skip more motion searches in multiref?
- On extremely fast encoding settings, fast skip is actually kind of slow. But anything dumber (e.g. SAD) is completely useless. Is there some better balance that can be achieved here?
- See the TODOs for deblock-aware RD in common/deblock.c.
- Is there a faster way than SA8D/SATD to do 8x8dct vs 4x4dct mode decision? At very fast settings, the time this uses is nontrivial.
- Is there a faster way than RD to do 8x8dct vs 4x4dct mode decision that's still better than SATD? RD takes over an order of magnitude more time than SATD, so it might be useful to have something in the middle.
- Is there some value to swapping the mode decision metric from SATD to SA8D if we think that the macroblock will use 8x8dct? This has been tried before, but only helped if our guess was extremely good (better than we could get in reality).
- With trellis 2, can we skip most of CABAC and CAVLC bit cost calculation?
- How about a "brute force" mode decision that takes no shortcuts (no early ref termination in p8x8, no SATD thresholds, etc)?
Psy
- Psy-RD is a hack. It works, but it's a hack. If you apply QNS with Psy-RD as the metric, it goes way overboard and gives terrible results. This means that Psy-RD only works because normal mode decision is limited in the way it can modify the image to better suit the metric. Is there a way to make it better?
- Should RD be linear at all? Perhaps we should weight more heavily against low quality blocks and also try to ignore minuscule distortion that viewers can't see.
- Psy-trellis (and maybe psy-RD?) are too strong at very high QPs.
- Psy-trellis should be merged with Psy-RD. There are patches for this, but they probably won't be committed until psy-RD itself is fixed.
- RD should take into account local variance.
- Lambda should be varied on a per-DCT-block basis instead of a per-macroblock basis.
- Lambda should be picked independent of quantizer (i.e. with greater precision).
- Classic problem: a block is mostly high complexity but has a small area of low complexity. How do we judge whether that area is important? Good example: sharp text on background with film grain; grain gets blurred out because of the text.
  - If we think it's important all the time, we ruin the quality of many clips that rely on raising complexity on edges (Touhou).
- Should motion estimation lambda be as high as it is at very high quantizers? There's some value to capturing "true motion"...
- Macroblock tree correlates pretty well with visual perception, in that its concept of "high complexity" matches the visual concept well, except for local illumination changes. Talk to Dark Shikari for a patch.
Lookahead
- Lookahead should be multithreaded, either by splitting the frame (sliced threads) or running multiple frame analysis calls at once.
- Temporal MV predictors in lookahead? There's a patch for these somewhere, but they biased heavily in favor of B-frames, likely by improving the motion search.
- Should lookahead use variable lambda based on quantizer (esp. due to adaptive quant)? If so, should it take into account estimated ratecontrol quantizer, too? If so, how?
Quantization
- CAVLC "trellis" is a hack. It works, but it's a hack. Make it better. See the TODOs in encoder/rdo.c.
- There's room for something between trellis and deadzone in terms of complexity. libvpx has a good example -- it biases towards zero-runs in its "medium speed" quantizer. This can't be SIMD'd easily, but is still vastly faster than trellis. A nonlinear quantizer (be more likely to round up larger coefficients) might also be useful.
- Floyd-Steinberg for quantization? Try pushing quantization error to nearby DCT coefficients. Should this go from high to low or low to high?
- Energy-preserving quantizer -- maintain L1 (or maybe L2? I'm not sure) energy. Should we maintain it in the spatial domain (post-iDCT) or residual domain? Probably the former.
- Decimation is currently just a ripoff of the JVT recommended algorithm. Can we do this more optimally? With RD?
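To make two of the ideas above concrete, here is a small C sketch of a scalar quantizer with a rounding offset (a smaller offset widens the dead zone around zero) and of an error-diffusion variant that carries each coefficient's reconstruction error into the next coefficient. This is an illustration under simplifying assumptions, not x264's quantizer: `quant_deadzone` and `quant_diffuse` are hypothetical names, the step size is a plain integer divisor, and the diffusion simply follows whatever order the caller passes the coefficients in, so it takes no position on the high-to-low versus low-to-high question.

```c
#include <stdlib.h>

/* Scalar quantizer with a rounding offset (hypothetical helper).
 * offset = step / 2 rounds to nearest; a smaller offset widens the
 * dead zone, so more small coefficients quantize to zero. */
static int quant_deadzone(int coef, int step, int offset)
{
    int level = (abs(coef) + offset) / step;
    return coef < 0 ? -level : level;
}

/* Error-diffusion variant: the reconstruction error of each coefficient
 * is added to the next coefficient before that one is quantized. */
static void quant_diffuse(const int *coef, int *level, int n, int step, int offset)
{
    int carry = 0;
    for (int i = 0; i < n; i++) {
        int target = coef[i] + carry;
        level[i] = quant_deadzone(target, step, offset);
        carry = target - level[i] * step;  /* error left over at this position */
    }
}
```

With four coefficients of value 7, a step of 10 and an offset of 2, the plain quantizer zeroes everything, while the diffusing version produces levels 0, 1, 1, 1 and so keeps most of the energy. Whether that trade is worth the bits is exactly the open question.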
Transform
- Analyze the error characteristics of the fDCT. Is there any way to make it more accurate without much speed loss? Particularly at extremely low quantizers, this might help.
- Before forward transform, run a "blocking filter" that acts as the approximate inverse of the deblock filter. See this paper.
Interlacing
- Lookahead currently blend-deinterlaces to get the lowres. Is this a good idea? Is there something better that isn't much slower?
- Constrained intra + adaptive MBAFF. Does anyone care about this?
- PAFF + MBAFF adaptive: PAFF performs better than adaptive MBAFF on high-motion scenes because it can predict from the previous field.
Weighted Prediction
- Make weightp work with interlacing. Preferably abuse reference duplication to make it useful for MBAFF.
- Finish K-means decision for weightp. Talk to DylanZA about getting his current patch for this one.
- Add explicit weighting for B-frames, too. This helps in nonlinear fades, among other cases.
- Improve weighted prediction analysis to do more searching based on an estimated offset vs scale gradient.
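For readers unfamiliar with the scale and offset terms used in the items above, explicit weighted prediction applies a per-reference scale and offset to each predicted sample. The C sketch below shows that arithmetic for 8-bit samples in the form H.264 uses for explicit weighting of a single reference. The function name is hypothetical, this is not x264's implementation, and it assumes a non-negative scale so that the right shift is well defined in C.

```c
#include <stdint.h>

/* Weighted prediction of one 8-bit sample (hypothetical helper):
 * result = clip(((ref * scale + rounding) >> log2_denom) + offset).
 * A scale of (1 << log2_denom) with offset 0 leaves the sample unchanged.
 * Assumes scale >= 0. */
static uint8_t weight_pixel(uint8_t ref, int scale, int log2_denom, int offset)
{
    int v;
    if (log2_denom > 0)
        v = ((ref * scale + (1 << (log2_denom - 1))) >> log2_denom) + offset;
    else
        v = ref * scale + offset;
    if (v < 0)   v = 0;
    if (v > 255) v = 255;
    return (uint8_t)v;
}
```

A fade to black is mostly a change in scale, while a uniform brightness shift is mostly a change in offset. Real fades often mix the two, which is why the analysis has to search over both.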
Ratecontrol
- VBV might be able to utilize the ability to re-encode a row of the frame for improved accuracy.
  - Maybe re-encode everything in case of an underflow that row-reencoding can't fix? This might be better than underflowing.
- Current per-frame VBV is a hack. It only adapts per row and is O(N^2), where N is the number of rows. An O(N) solution would be able to react more often and thus be more accurate.
- Make the frame size and row size predictors better. They currently are kind of crappy.
- Ratecontrol code as a whole is a bit of a mess. It could be improved. There's a lot of cruft left over that is probably not needed now, like qblur.
- 2-pass VBV is actually more likely to underflow than 1-pass because it doesn't adapt as aggressively and trusts first pass data a lot. This trust is often misplaced if the first pass was a fast one. This should be improved.
- 2-pass macroblock-tree: if we added the ability to do macroblock-tree on real encoded data, we'd get better results (particularly with repeating patterns and multiref, such as an anime character's mouth moving).
- Macroblock-tree: make it more psy-aware. Maybe we should cap how much it lowers the quantizer on extremely static scenes? This might tie into the "just-noticeable error" issue in RD.
GPU
- Motion estimation?
  - Methods
    - Hierarchical?
    - 2D Wave?
    - Something else?
  - "Easy": lookahead motion estimation
    - Extremely high parallelism, hundreds of frame searches (each with thousands of searches) at once.
  - "Hard": main motion estimation
    - Difficult synchronization issues, not as heavily parallel in terms of number of macroblocks, but far more partition sizes and refs to search.
    - But potentially more useful...
- Methods
- Other things?
x86 assembly
- Optimize more for the Phenom.
- Yell at holger to commit his local patches.
- Make a merged SA8D/SATD for the 8x8dct mode decision, since the two share most of their calculation. Hadamard_ac already does this, but slightly differently.
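The sharing mentioned in the last item can be seen directly from the structure of the Hadamard transform: an 8x8 Hadamard transform equals four 4x4 Hadamard transforms (one per quadrant) followed by one extra butterfly stage across the quadrants. The C sketch below computes both sums from the same four 4x4 transforms. It is plain reference code rather than assembly, the function names are hypothetical, and the results are left unnormalized, so they will not match x264's scaled SATD and SA8D values.

```c
#include <stdint.h>
#include <stdlib.h>

/* In-place 4-point Hadamard butterfly on values spaced `step` apart. */
static void hadamard4(int *v, int step)
{
    int a = v[0] + v[step],          b = v[0] - v[step];
    int c = v[2 * step] + v[3 * step], d = v[2 * step] - v[3 * step];
    v[0]        = a + c;
    v[step]     = b + d;
    v[2 * step] = a - c;
    v[3 * step] = b - d;
}

/* 2D 4x4 Hadamard transform of the difference of two 4x4 blocks. */
static void diff_hadamard4x4(const uint8_t *a, int a_stride,
                             const uint8_t *b, int b_stride, int out[16])
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            out[4 * y + x] = a[y * a_stride + x] - b[y * b_stride + x];
    for (int y = 0; y < 4; y++) hadamard4(out + 4 * y, 1);  /* rows */
    for (int x = 0; x < 4; x++) hadamard4(out + x, 4);      /* columns */
}

/* Unnormalized SATD (four 4x4 transforms) and SA8D (one 8x8 transform)
 * of an 8x8 block, sharing the four quadrant transforms. */
static void satd_sa8d_8x8(const uint8_t *a, int a_stride,
                          const uint8_t *b, int b_stride,
                          int *satd, int *sa8d)
{
    int q[4][16];
    for (int i = 0; i < 4; i++) {
        int ox = (i & 1) * 4, oy = (i >> 1) * 4;
        diff_hadamard4x4(a + oy * a_stride + ox, a_stride,
                         b + oy * b_stride + ox, b_stride, q[i]);
    }
    *satd = 0;
    *sa8d = 0;
    for (int k = 0; k < 16; k++) {
        int tl = q[0][k], tr = q[1][k], bl = q[2][k], br = q[3][k];
        *satd += abs(tl) + abs(tr) + abs(bl) + abs(br);
        /* One extra 2x2 butterfly across quadrants completes the 8x8 transform. */
        *sa8d += abs(tl + tr + bl + br) + abs(tl - tr + bl - br)
               + abs(tl + tr - bl - br) + abs(tl - tr - bl + br);
    }
}
```

The only work specific to SA8D is the final loop, which is small compared with the four quadrant transforms. That is the argument for a merged function.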
Other assembly
- NEON assembly is nowhere near complete.
- Chroma MC needs to be rewritten for NV12 support.
- Altivec assembly is very lacking.
- SPARC VIS assembly is only available when high bit-depth is disabled.
Other CPU optimizations
- x264 needs more prefetching. How many L1 and L2 cache misses (particularly L1) can we get rid of via smart prefetching in the right places? Warning: this is often hard to benchmark.
- Different CPUs take different relative times for some functions. Is this enough (particularly across architectures) to justify different encoding settings for different CPUs?
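As a rough illustration of the prefetching item, the C sketch below issues a software prefetch for a row a few lines ahead of the one being processed, using the GCC/Clang builtin `__builtin_prefetch`. The function name and the `ahead` parameter are hypothetical, and the right prefetch distance depends on the CPU and the access pattern, which is part of why this is hard to benchmark.

```c
#include <stdint.h>

/* Sums a w x h block while prefetching the row `ahead` lines below the
 * current one (hypothetical helper; requires GCC or Clang).
 * Arguments to the builtin: address, 0 = read access, 3 = keep in all cache levels. */
static unsigned sum_block_prefetch(const uint8_t *p, int stride,
                                   int w, int h, int ahead)
{
    unsigned sum = 0;
    for (int y = 0; y < h; y++) {
        if (y + ahead < h)
            __builtin_prefetch(p + (y + ahead) * stride, 0, 3);
        for (int x = 0; x < w; x++)
            sum += p[y * stride + x];
    }
    return sum;
}
```

A prefetch is only a hint and never changes the result, so correctness tests pass regardless. Only cache-miss counters or careful timing show whether it helped.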
Other features
- MPEG-2 encoding support
- VP8 encoding support
- 4:2:2 colorspace support
- Support for SMPTE timecodes
- Merge speedcontrol
- Mixed lossless/lossy encoding.
x264CLI
- Finish audio support. Talk to Kovensky about this one.
- Make the filtering system aware of fullrange vs TV range.
- Make the filtering system aware of BT.601 vs BT.709.
- Add more filters.
  - Deinterlacers (YADIF).
  - Denoisers (HQDN3D?).
  - IVTC, decomb?
- Merge L-SMASH mp4 muxer.
- Add TS muxing support using HRD. Talk to kierank about this one.
- Add --device support.
- Add automatic --level restriction support.
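For the fullrange versus TV range item above, the difference comes down to how 8-bit luma codes map to brightness: TV (limited) range uses 16 to 235, full range uses 0 to 255. The C sketch below converts 8-bit luma between the two with rounding and clipping. The function names are hypothetical, chroma (which uses 16 to 240 in limited range) is not handled, and this is not part of x264's filtering system.

```c
#include <stdint.h>

/* Clamp an int to the 8-bit range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* 8-bit luma, TV range (16..235) to full range (0..255), rounded. */
static uint8_t luma_tv_to_full(uint8_t y)
{
    return clip_u8(((y - 16) * 255 + 109) / 219);
}

/* 8-bit luma, full range (0..255) to TV range (16..235), rounded. */
static uint8_t luma_full_to_tv(uint8_t y)
{
    return clip_u8((y * 219 + 127) / 255 + 16);
}
```

A filter that is unaware of the range would treat code 16 as dark grey instead of black, which is the kind of error the TODO item is meant to prevent.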
SOCIS x264 Profile
CPU: ARM V7 PMNC, speed 0 MHz (estimated)
Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name         symbol name
9764     17.8387  x264               mc_chroma
3132     5.7221   x264               x264_pixel_avg2_w16_neon
2706     4.9438   x264               x264_me_search_ref
2697     4.9274   x264               refine_subpel
2490     4.5492   x264               x264_quant_4x4_trellis
2089     3.8166   x264               x264_pixel_avg2_w8_neon
2014     3.6795   x264               x264_pixel_satd_8x4
1959     3.5791   x264               get_ref_neon
1309     2.3915   x264               x264_pixel_sad_16x16_neon
1125     2.0554   x264               x264_macroblock_encode
1089     1.9896   x264               x264_macroblock_analyse
807      1.4744   x264               x264_satd_8x4v_8x8h_neon
780      1.4250   x264               x264_rd_cost_mb
724      1.3227   x264               x264_satd_8x8_neon
679      1.2405   x264               x264_pixel_sad_x4_16x16_neon
672      1.2277   x264               x264_pixel_sad_x4_8x8_neon
638      1.1656   x264               x264_pixel_satd_4x4_neon
634      1.1583   x264               x264_macroblock_cache_load_progressive
633      1.1565   x264               x264_pixel_satd_4x4
601      1.0980   x264               x264_pixel_sad_8x8_neon
571      1.0432   x264               x264_quant_4x4_neon
528      0.9646   x264               x264_slicetype_mb_cost
500      0.9135   x264               x264_mb_predict_mv
492      0.8989   x264               x264_pixel_satd_8x8_neon
473      0.8642   x264               x264_mb_analyse_intra
471      0.8605   x264               x264_mb_encode_8x8_chroma
454      0.8295   x264               x264_pixel_sad_x3_16x16_neon
452      0.8258   x264               x264_macroblock_tree_propagate
404      0.7381   x264               x264_pixel_sad_x3_8x8_neon
395      0.7217   x264               x264_satd_16x4_neon
386      0.7052   libc-2.9.so        /lib/libc-2.9.so
367      0.6705   x264               x264_mb_predict_mv_ref16x16
353      0.6449   x264               x264_sub8x4_dct_neon
309      0.5645   x264               x264_macroblock_cache_save
293      0.5353   x264               x264_hadamard_ac_8x8_neon
281      0.5134   x264               x264_analyse_update_cache
280      0.5116   x264               x264_mb_analyse_inter_b8x8_mixed_ref
276      0.5042   x264               x264_slice_write
270      0.4933   x264               x264_cabac_encode_decision_c
267      0.4878   x264               x264_mb_analyse_inter_b16x16
253      0.4622   x264               x264_cabac_mb_mvd
235      0.4293   x264               block_residual_write_cabac
223      0.4074   x264               x264_decimate_score16
214      0.3910   x264               __aeabi_fdiv
209      0.3818   x264               x264_pixel_var2_8x8_neon
204      0.3727   x264               __aeabi_fadd
201      0.3672   x264               deblock_strength_c
197      0.3599   x264               x264_pixel_avg2_w20_neon
197      0.3599   x264               x264_pixel_avg_w16_neon
187      0.3416   x264               x264_mc_copy_w16_aligned_neon
184      0.3362   x264               x264_cabac_mb_type
181      0.3307   x264               load_deinterleave_8x8x2_fenc
178      0.3252   x264               x264_macroblock_write_cabac
177      0.3234   x264               x264_mb_mc_01xywh
168      0.3069   x264               x264_mb_mc_0xywh
167      0.3051   x264               mc_luma_neon
165      0.3015   x264               x264_pixel_ssd_16x16_neon
161      0.2941   x264               x264_quant_dc_trellis
159      0.2905   x264               x264_pixel_avg_w8_neon
156      0.2850   x264               x264_mc_copy_w8_neon
156      0.2850   x264               x264_pixel_satd_16x16_neon
155      0.2832   x264               x264_mb_analyse_inter_p16x16
151      0.2759   x264               x264_pixel_ssd_8x8_neon
150      0.2740   x264               memcpy_aligned_8_16_neon
148      0.2704   x264               __aeabi_fmul
143      0.2613   x264               x264_mb_encode_i4x4
134      0.2448   x264               mbtree_propagate_cost
124      0.2265   x264               x264_pixel_sad_8x16_neon
112      0.2046   x264               x264_mc_weight_w16_offsetsub_neon
111      0.2028   x264               x264_pixel_sad_x4_8x16_neon
110      0.2010   x264               x264_mb_predict_mv_direct16x16
108      0.1973   x264               x264_frame_init_lowres_core_neon
98       0.1790   x264               x264_plane_copy_interleave_c
93       0.1699   x264               block_residual_write_cabac
91       0.1663   x264               __floatsisf
88       0.1608   x264               x264_mb_mc
87       0.1589   x264               x264_cabac_mb_ref
87       0.1589   x264               x264_dequant_4x4_neon
86       0.1571   x264               __aeabi_l2f
86       0.1571   x264               x264_satd_4x8_8x4_end_neon
85       0.1553   x264               x264_pixel_var_16x16_neon
82       0.1498   x264               store_interleave_8x8x2
82       0.1498   x264               x264_ratecontrol_mb_qp
80       0.1462   x264               x264_pixel_sad_x4_16x8_neon
79       0.1443   x264               x264_mb_predict_mv_16x16
79       0.1443   x264               x264_pixel_sad_16x8_neon
76       0.1389   x264               x264_mc_weight_w8_offsetsub_neon
72       0.1315   x264               x264_prefetch_fenc_arm
71       0.1297   x264               x264_mb_analyse_b_rd
70       0.1279   x264               x264_add8x4_idct_neon
70       0.1279   x264               x264_frame_deblock_row
70       0.1279   x264               x264_pixel_sad_x3_8x16_neon
68       0.1242   x264               x264_pixel_hadamard_ac_16x16_neon
61       0.1114   x264               x264_coeff_last16_neon
57       0.1041   x264               deblock_v_chroma_c
56       0.1023   x264               x264_predict_16x16_h_c
55       0.1005   x264               x264_predict_4x4_hd_c
54       0.0987   x264               x264_predict_8x8_vr_c
53       0.0968   x264               x264_mb_encode_i16x16
53       0.0968   x264               x264_predict_4x4_vl_c
52       0.0950   x264               x264_hpel_filter_c_neon
52       0.0950   x264               x264_hpel_filter_v_neon
52       0.0950   x264               x264_predict_4x4_vr_c
52       0.0950   x264               x264_predict_8x8_filter_c
50       0.0913   x264               x264_mb_analyse_intra_chroma
50       0.0913   x264               x264_mb_analyse_p_rd
50       0.0913   x264               x264_ratecontrol_mb
49       0.0895   x264               x264_mb_mc_8x8
49       0.0895   x264               x264_predict_8x8_hd_c
47       0.0859   x264               memcpy_aligned_16_16_neon
47       0.0859   x264               x264_mb_mc_1xywh
46       0.0840   x264               x264_pixel_satd_16x8_neon
45       0.0822   x264               x264_cabac_encode_terminal_c
45       0.0822   x264               x264_cabac_mb_mvd
45       0.0822   x264               x264_pixel_sad_x3_16x8_neon
45       0.0822   x264               x264_zigzag_scan_4x4_frame_neon
44       0.0804   x264               deblock_h_chroma_c
44       0.0804   x264               x264_predict_8x8_vl_c
44       0.0804   x264               x264_predict_8x8c_p_neon
43       0.0786   x264               x264_pixel_satd_16x16
43       0.0786   x264               x264_predict_8x8c_dc_c
41       0.0749   x264               x264_macroblock_deblock_strength
40       0.0731   x264               x264_copy_column8
40       0.0731   x264               x264_memcpy_aligned_neon
38       0.0694   x264               x264_predict_8x8_ddl_c
38       0.0694   x264               x264_predict_8x8_ddr_c
37       0.0676   x264               x264_mb_analyse_inter_b8x16
37       0.0676   x264               x264_mc_copy_w16_neon
36       0.0658   x264               x264_deblock_h_luma_neon
36       0.0658   x264               x264_predict_16x16_v_c
36       0.0658   x264               x264_predict_4x4_hu_c
36       0.0658   x264               x264_sub4x4_dct_neon
36       0.0658   x264               x264_sub8x8_dct_dc_neon
35       0.0639   x264               x264_intra_satd_x3_4x4
35       0.0639   x264               x264_mb_analyse_inter_b16x8
35       0.0639   x264               x264_me_refine_bidir_satd
35       0.0639   x264               x264_pixel_satd_4x8_neon
34       0.0621   x264               x264_cabac_mb_type
34       0.0621   x264               x264_predict_16x16_dc_c
33       0.0603   x264               x264_hpel_filter_h_neon
33       0.0603   x264               x264_pixel_satd_8x16_neon
32       0.0585   x264               x264_frame_expand_border_lowres
31       0.0566   x264               x264_predict_4x4_ddr_armv6
29       0.0530   x264               x264_macroblock_probe_skip
29       0.0530   x264               x264_mc_weight_w8_neon
29       0.0530   x264               x264_predict_16x16_p_neon
28       0.0512   x264               memcpy_aligned_8_8_neon
28       0.0512   x264               x264_mb_predict_mv_pskip
28       0.0512   x264               x264_predict_8x8_hu_c
28       0.0512   x264               x264_sub16x16_dct_neon
27       0.0493   x264               x264_me_refine_qpel_refdupe
27       0.0493   x264               x264_pixel_avg_w4_neon
26       0.0475   x264               __fixsfsi
26       0.0475   x264               x264_add4x4_idct_neon
26       0.0475   x264               x264_cabac_mb_ref
26       0.0475   x264               x264_intra_satd_x3_8x8c
24       0.0438   x264               x264_intra_satd_x3_16x16
24       0.0438   x264               x264_pixel_satd_8x4_neon
24       0.0438   x264               x264_quant_2x2_dc_neon
23       0.0420   x264               x264_weight_cost_luma
22       0.0402   x264               x264_predict_8x8c_h_c
21       0.0384   x264               __aeabi_fcmpgt
21       0.0384   x264               x264_predict_4x4_dc_c
20       0.0365   x264               x264_ac_energy_mb
20       0.0365   x264               x264_slicetype_frame_cost
19       0.0347   x264               x264_cabac_encode_bypass_c
19       0.0347   x264               x264_deblock_v_luma_neon
18       0.0329   x264               memcpy_aligned_16_8_neon
17       0.0311   x264               x264_decimate_score15
17       0.0311   x264               x264_intra_rd
16       0.0292   x264               x264_cabac_mb_skip
16       0.0292   x264               x264_var_end
15       0.0274   x264               __cmpsf2
15       0.0274   x264               x264_predict_4x4_h_c
14       0.0256   x264               x264_pixel_avg_8x8_neon
14       0.0256   x264               x264_pixel_avg_weight_w16_add_add_neon
13       0.0238   x264               deblock_v_luma_intra_c
13       0.0238   x264               x264_frame_expand_border
13       0.0238   x264               x264_mc_weight_w8_offsetadd_neon
13       0.0238   x264               x264_predict_4x4_ddl_neon
12       0.0219   x264               x264_frame_expand_border_filtered
12       0.0219   x264               x264_memzero_aligned_neon
12       0.0219   x264               x264_pixel_var_8x8_neon
11       0.0201   x264               x264_mb_cache_mv_b16x8
11       0.0201   x264               x264_predict_4x4_v_c
10       0.0183   x264               x264_cabac_encode_ue_bypass
10       0.0183   x264               x264_macroblock_cache_load_neighbours_deblock
9        0.0164   x264               idct_dequant_2x2_dconly
9        0.0164   x264               x264_mb_analyse_transform_rd
9        0.0164   x264               x264_pixel_avg_weight_w8_add_add_neon
9        0.0164   x264               x264_predict_4x4_dc_armv6
8        0.0146   x264               x264_prefetch_ref_arm
7        0.0128   x264               x264_coeff_last15_neon
7        0.0128   x264               x264_prefetch_fenc
6        0.0110   x264               x264_add8x8_idct_dc_neon
6        0.0110   x264               x264_add8x8_idct_neon
6        0.0110   x264               x264_pixel_avg_16x16_neon
6        0.0110   x264               x264_predict_8x8c_dc_neon
6        0.0110   x264               x264_weight_scale_plane
5        0.0091   x264               __aeabi_cfrcmple
5        0.0091   x264               x264_adaptive_quant_frame
5        0.0091   x264               x264_macroblock_tree_finish
5        0.0091   x264               x264_mb_cache_mv_b8x16
5        0.0091   x264               x264_pixel_avg_4x4_neon
5        0.0091   x264               x264_predict_8x8c_v_c
4        0.0073   x264               __aeabi_ui2f
4        0.0073   x264               deblock_h_chroma_intra_c
4        0.0073   x264               deblock_h_luma_intra_c
4        0.0073   x264               x264_fdec_filter_row
4        0.0073   x264               x264_frame_init_lowres
4        0.0073   x264               x264_predict_16x16_h_neon
4        0.0073   x264               x264_predict_4x4_h_armv6
3        0.0055   x264               __aeabi_cfcmple
3        0.0055   x264               __divdf3
3        0.0055   x264               deblock_v_chroma_intra_c
3        0.0055   x264               x264_dequant_4x4_dc_neon
3        0.0055   x264               x264_encoder_encode
3        0.0055   x264               x264_frame_filter
3        0.0055   x264               x264_nal_escape_c
3        0.0055   x264               x264_pixel_avg_8x16_neon
3        0.0055   x264               x264_predict_16x16_dc_neon
3        0.0055   x264               x264_rc_analyse_slice
2        0.0037   libpthread-2.9.so  /lib/libpthread-2.9.so
2        0.0037   x264               __subsf3
2        0.0037   x264               x264_analyse_init_costs
2        0.0037   x264               x264_coeff_last4_arm
2        0.0037   x264               x264_encoder_frame_end
2        0.0037   x264               x264_predict_16x16_dc_top_neon
2        0.0037   x264               x264_quant_4x4_dc_neon
2        0.0037   x264               x264_sub8x8_dct_neon
2        0.0037   x264               x264_weight_cost_init_luma
1        0.0018   libm-2.9.so        /lib/libm-2.9.so
1        0.0018   x264               __aeabi_d2f
1        0.0018   x264               __aeabi_f2d
1        0.0018   x264               __aeabi_fcmplt
1        0.0018   x264               __aeabi_uidivmod
1        0.0018   x264               __cmpdf2
1        0.0018   x264               __divdi3
1        0.0018   x264               __muldf3
1        0.0018   x264               __udivdi3
1        0.0018   x264               bs_write_ue_big
1        0.0018   x264               hpel_filter_neon
1        0.0018   x264               optimize_chroma_dc
1        0.0018   x264               x264_add16x16_idct_dc_neon
1        0.0018   x264               x264_dct4x4dc_neon
1        0.0018   x264               x264_frame_copy_picture
1        0.0018   x264               x264_frame_push_unused
1        0.0018   x264               x264_free
1        0.0018   x264               x264_macroblock_cache_mv_4_2
1        0.0018   x264               x264_macroblock_slice_init
1        0.0018   x264               x264_pixel_avg_4x8_neon
1        0.0018   x264               x264_pixel_avg_8x4_neon
1        0.0018   x264               x264_predict_16x16_v_neon