Difference between revisions of "X264 TODO"

Revision as of 19:58, 15 August 2011

This page contains an incomplete list of things available in x264 for you to do. It's organized into sections covering various parts of x264.

Some useful resources: Dark Shikari's pile of junk, Pengvado's pile of junk.

If you're interested in doing any of this, drop by #x264dev on Freenode IRC. There are no experience or educational requirements for doing any of this, though you are expected to know how to code.

Bolded features may have companies willing to sponsor or provide bounties. This is not complete either; just because it's not bolded doesn't mean there aren't resources out there. If your company is interested in offering a bounty, drop by IRC.

Motion Estimation

Sequential elimination (SEA), used for exhaustive search, might be more generally applicable to algorithms like UMH, by letting us skip a lot of SADs. The downside is we won't be able to use SAD_X4 anymore.
(T)ESA is currently wrong for motion searches done on weightp duplicates. This effect is minuscule, but it still should be fixed.
Hierarchical motion estimation might be a useful way to catch very long motion vectors without the cost of UMH or ESA. It might also help regularize motion.
Somehow take into account the effect of motion vector decision on future blocks.
- Hierarchical motion estimation
- Approximations from lookahead MVs
- Iterative ME (as per Snow)
- Trellis motion estimation
We don't need to check all 11 predictors all the time for 16x16 fullpel motion search.
- But how do we know which ones we can afford to skip, and when?
- Xvid and libtheora have algorithms for this, but the former's is almost surely 100% useless and the latter doesn't seem impressive either.
libtheora does fullpel motion estimation on the source pixels instead of decoded pixels. Does this give a better starting point for the subpel search and discourage "weird" MVs?
With extremely fast encoding settings (subme 0), can we rip off lookahead MVs instead of doing a real search?
Try sub-8x8 partitions in B-frames. Is it at all useful?
Try bidir motion estimation for fullpel. That is, considering L1's MV when doing L0 (or vice versa). Xvid does this. How much does it help?
Fullpel chroma ME?
- For TESA?
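
For anyone unfamiliar with the SEA item above: the pruning rests on the bound |sum(cur) - sum(cand)| <= SAD(cur, cand), so a candidate whose bound already matches or exceeds the best SAD can be skipped without computing its SAD. The following is a minimal sketch in plain C, not x264's code; the function names (block_sum, block_sad, sea_search) are hypothetical, and it recomputes the candidate sum each time where a real encoder would presumably use precomputed sums.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of all pixels in a w x h block. */
static int block_sum(const uint8_t *p, int stride, int w, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sum += p[y * stride + x];
    return sum;
}

/* Plain sum of absolute differences between two w x h blocks. */
static int block_sad(const uint8_t *a, int a_stride,
                     const uint8_t *b, int b_stride, int w, int h)
{
    int sad = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sad += abs(a[y * a_stride + x] - b[y * b_stride + x]);
    return sad;
}

/* Exhaustive search over [-range, range] in x and y with SEA pruning.
 * `ref` points at the co-located block; the caller must guarantee `range`
 * pixels of valid data on every side. Returns the best SAD, stores the best
 * MV in (*best_mx, *best_my) and counts the SADs actually computed. */
static int sea_search(const uint8_t *cur, int cur_stride,
                      const uint8_t *ref, int ref_stride,
                      int w, int h, int range,
                      int *best_mx, int *best_my, int *sads_computed)
{
    assert(w > 0 && h > 0 && range >= 0);
    int cur_sum = block_sum(cur, cur_stride, w, h);
    int best = INT_MAX;
    *best_mx = *best_my = 0;
    *sads_computed = 0;
    for (int my = -range; my <= range; my++)
        for (int mx = -range; mx <= range; mx++) {
            const uint8_t *cand = ref + my * ref_stride + mx;
            /* Lower bound on the SAD of this candidate. */
            int bound = abs(cur_sum - block_sum(cand, ref_stride, w, h));
            if (bound >= best)
                continue; /* cannot beat the current best: skip the SAD */
            int sad = block_sad(cur, cur_stride, cand, ref_stride, w, h);
            (*sads_computed)++;
            if (sad < best) {
                best = sad;
                *best_mx = mx;
                *best_my = my;
            }
        }
    return best;
}
```

Note that every candidate is tested on its own against the running best, which is why this does not combine well with a batched SAD such as SAD_X4.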

Intra Analysis

Make the early terminations smarter. Currently they're just hacks -- some statistical analysis might be useful.
SAD (subme 1) i8x8 vs i4x4 decision is a bit bad. Can it be improved without significant speed loss?

Mode Decision

Can we find more ways to skip more motion searches in multiref?
On extremely fast encoding settings, fast skip is actually kind of slow. But anything dumber (e.g. SAD) is completely useless. Is there some better balance that can be achieved here?
See the TODOs for deblock-aware RD in common/deblock.c.
Is there a faster way than SA8D/SATD to do 8x8dct vs 4x4dct mode decision? At very fast settings, the time this uses is nontrivial.
Is there a faster way than RD to do 8x8dct vs 4x4dct mode decision that's still better than SATD? RD takes over an order of magnitude more time than SATD, so it might be useful to have something in the middle.
Is there some value to swapping the mode decision metric from SATD to SA8D if we think that the macroblock will use 8x8dct? This has been tried before, but only helped if our guess was extremely good (better than we could get in reality).
With trellis 2, can we skip most of CABAC and CAVLC bit cost calculation?
How about a "brute force" mode decision that takes no shortcuts (no early ref termination in p8x8, no SATD thresholds, etc.)?
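
For reference, SATD in the items above is the sum of absolute values of the Hadamard-transformed residual. Below is a minimal 4x4 sketch in plain C. It is not x264's optimized routine: the name satd_4x4_sketch is hypothetical and the result is left unnormalized, whereas real encoders usually scale it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Unnormalized 4x4 SATD: Hadamard-transform the residual a - b, then sum
 * the absolute values of the 16 coefficients. */
static int satd_4x4_sketch(const uint8_t *a, int a_stride,
                           const uint8_t *b, int b_stride)
{
    assert(a != NULL && b != NULL);
    int t[4][4];

    /* Horizontal 4-point Hadamard butterflies on each residual row. */
    for (int y = 0; y < 4; y++) {
        int d0 = a[y * a_stride + 0] - b[y * b_stride + 0];
        int d1 = a[y * a_stride + 1] - b[y * b_stride + 1];
        int d2 = a[y * a_stride + 2] - b[y * b_stride + 2];
        int d3 = a[y * a_stride + 3] - b[y * b_stride + 3];
        int s01 = d0 + d1, m01 = d0 - d1;
        int s23 = d2 + d3, m23 = d2 - d3;
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = m01 + m23;
        t[y][3] = m01 - m23;
    }

    /* Vertical butterflies on each column, accumulating magnitudes. */
    int sum = 0;
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x], m01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], m23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(m01 + m23) + abs(m01 - m23);
    }
    return sum;
}
```

SA8D is the same idea with an 8x8 transform, which is why the two share most of their butterflies.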

Psy

Psy-RD is a hack. It works, but it's a hack. If you apply QNS with Psy-RD as the metric, it goes way overboard and gives terrible results. This means that Psy-RD only works because normal mode decision is limited in the way it can modify the image to better suit the metric. Is there a way to make it better?
Should RD be linear at all? Perhaps we should weight more heavily against low quality blocks and also try to ignore minuscule distortion that viewers can't see.
Psy-trellis (and maybe psy-RD?) are too strong at very high QPs.
Psy-trellis should be merged with Psy-RD. There are patches for this, but they probably won't be committed until psy-RD itself is fixed.
RD should take into account local variance.
Lambda should be varied on a per-DCT-block basis instead of a per-macroblock basis.
Lambda should be picked independent of quantizer (i.e. with greater precision).
Classic problem: a block is mostly high complexity but has a small area of low complexity. How do we judge whether that area is important? Good example: sharp text on background with film grain; grain gets blurred out because of the text.
- If we think it's important all the time, we ruin the quality of many clips that rely on raising complexity on edges (Touhou).
Should motion estimation lambda be as high as it is at very high quantizers? There's some value to capturing "true motion"...
Macroblock tree correlates pretty well with visual perception in that its concept of "high complexity" matches well with the visual concept, except for local illumination changes. Talk to Dark Shikari for a patch.
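
As background for the "should RD be linear" and lambda questions above, the usual Lagrangian cost is J = D + lambda * R, and mode decision picks the candidate with the lowest J. A minimal sketch, assuming an 8.8 fixed-point lambda; the names rd_cost and rd_pick are hypothetical and this is not x264's actual cost function.

```c
#include <assert.h>
#include <stdint.h>

/* Linear rate-distortion cost J = D + lambda * R.
 * lambda_fp8 is 8.8 fixed point, so 256 means lambda = 1.0. */
static int64_t rd_cost(int64_t distortion, int64_t bits, int64_t lambda_fp8)
{
    assert(distortion >= 0 && bits >= 0 && lambda_fp8 >= 0);
    return distortion + ((bits * lambda_fp8 + 128) >> 8);
}

/* Index of the cheapest of n candidates; ties go to the lowest index. */
static int rd_pick(const int64_t *distortion, const int64_t *bits, int n,
                   int64_t lambda_fp8)
{
    assert(n > 0);
    int best = 0;
    int64_t best_cost = rd_cost(distortion[0], bits[0], lambda_fp8);
    for (int i = 1; i < n; i++) {
        int64_t cost = rd_cost(distortion[i], bits[i], lambda_fp8);
        if (cost < best_cost) {
            best_cost = cost;
            best = i;
        }
    }
    return best;
}
```

With candidates (D=1000, R=50) and (D=900, R=80), lambda = 4 picks the first and lambda = 2 picks the second. A nonlinear or psy-aware variant would replace the distortion term rather than the overall structure.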

Lookahead

Lookahead should be multithreaded, either by splitting the frame (sliced threads) or running multiple frame analysis calls at once.
Temporal MV predictors in lookahead? There's a patch for these somewhere, but they biased heavily in favor of B-frames, likely by improving the motion search.
Should lookahead use variable lambda based on quantizer (esp. due to adaptive quant)? If so, should it take into account estimated ratecontrol quantizer, too? If so, how?

Quantization

CAVLC "trellis" is a hack. It works, but it's a hack. Make it better. See the TODOs in encoder/rdo.c.
There's room for something between trellis and deadzone in terms of complexity. libvpx has a good example -- it biases towards zero-runs in its "medium speed" quantizer. This can't be SIMD'd easily, but is still vastly faster than trellis. A nonlinear quantizer (be more likely to round up larger coefficients) might also be useful.
Floyd-Steinberg for quantization? Try pushing quantization error to nearby DCT coefficients. Should this go from high to low or low to high?
Energy-preserving quantizer -- maintain L1 (or maybe L2? I'm not sure) energy. Should we maintain it in the spatial domain (post-iDCT) or residual domain? Probably the former.
Decimation is currently just a ripoff of the JVT recommended algorithm. Can we do this more optimally? With RD?
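
For context, the "deadzone" end of the complexity range above looks roughly like this: a uniform quantizer whose rounding offset is below half a step, which widens the zero bin. This is a sketch under simplifying assumptions (one scalar step and offset for the whole block, hypothetical function names), not x264's quantizer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Quantize one coefficient with step size `step` and rounding offset
 * `offset` in [0, step). offset = step / 2 is plain rounding; smaller
 * offsets widen the dead zone around zero. */
static int quant_deadzone(int coef, int step, int offset)
{
    assert(step > 0 && offset >= 0 && offset < step);
    int level = (abs(coef) + offset) / step;
    return coef < 0 ? -level : level;
}

/* Quantize n coefficients in place; returns how many stay nonzero. */
static int quant_block(int16_t *coef, int n, int step, int offset)
{
    int nonzero = 0;
    for (int i = 0; i < n; i++) {
        coef[i] = (int16_t)quant_deadzone(coef[i], step, offset);
        nonzero += coef[i] != 0;
    }
    return nonzero;
}
```

With step 8, a coefficient of 5 becomes 1 with offset 4 but 0 with offset 1. Anything "between deadzone and trellis" would add some context dependence (zero-runs, neighbouring coefficients) on top of this per-coefficient decision.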

Transform

Analyze the error characteristics of the fDCT. Is there any way to make it more accurate without much speed loss? Particularly at extremely low quantizers, this might help.
Before forward transform, run a "blocking filter" that acts as the approximate inverse of the deblock filter. See this paper.

Interlacing

Lookahead currently blend-deinterlaces to get the lowres. Is this a good idea? Is there something better that isn't much slower?
Constrained intra + adaptive MBAFF. Does anyone care about this?
Adaptive PAFF + MBAFF: PAFF performs better than adaptive MBAFF on high-motion scenes because it can predict from the previous field.

Weighted Prediction

Make weightp work with interlacing. Preferably abuse reference duplication to make it useful for MBAFF.
Finish K-means decision for weightp. Talk to DylanZA about getting his current patch for this one.
Add explicit weighting for B-frames, too. This helps in nonlinear fades, among other cases.
Improve weighted prediction analysis to do more searching based on an estimated offset vs scale gradient.
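
As a reminder of what the scale and offset in these items do, here is a sketch of explicit weighting applied to a single reference block, following the usual H.264 form ((p * scale + round) >> log2_denom) + offset with clipping to 8 bits. The function names are hypothetical, and the sketch restricts scale to non-negative values so that the right shift is portable C.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Apply scale / 2^log2_denom plus offset to a w x h block of 8-bit pixels. */
static void weight_block(uint8_t *dst, int dst_stride,
                         const uint8_t *src, int src_stride,
                         int w, int h,
                         int scale, int log2_denom, int offset)
{
    /* Assumption of this sketch: non-negative scale only. */
    assert(scale >= 0 && log2_denom >= 0 && log2_denom <= 7);
    int round = log2_denom > 0 ? 1 << (log2_denom - 1) : 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int p = src[y * src_stride + x];
            dst[y * dst_stride + x] =
                clip_u8(((p * scale + round) >> log2_denom) + offset);
        }
}
```

With log2_denom = 5, scale = 32 is the identity and scale = 16 halves the brightness, which is the kind of fade the analysis has to detect.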

Ratecontrol

VBV might be able to utilize the ability to re-encode a row of the frame for improved accuracy.
- Maybe re-encode everything in case of an underflow that row-reencoding can't fix? This might be better than underflowing.
Current per-frame VBV is a hack. It only adapts per row and is O(N^2), where N is the number of rows. An O(N) solution would be able to react more often and thus be more accurate.
Make the frame size and row size predictors better. They currently are kind of crappy.
Ratecontrol code as a whole is a bit of a mess. It could be improved. There's a lot of cruft left over that is probably not needed now, like qblur.
2-pass VBV is actually more likely to underflow than 1-pass because it doesn't adapt as aggressively and trusts first pass data a lot. This trust is often misplaced if the first pass was a fast one. This should be improved.
2-pass macroblock-tree: if we added the ability to do macroblock-tree on real encoded data, we'd get better results (particularly with repeating patterns and multiref, such as an anime character's mouth moving).
Macroblock-tree: make it more psy-aware. Maybe we should cap how much it lowers the quantizer on extremely static scenes? This might tie into the "just-noticeable error" issue in RD.
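
For readers new to VBV, the underflow these items talk about can be pictured with a simple leaky-bucket model: each frame drains its coded size from the buffer, and the channel refills it at a constant rate. The sketch below is an illustration with hypothetical names and a simplified update order, not x264's ratecontrol code.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int64_t size;           /* buffer capacity in bits */
    int64_t fill;           /* bits currently in the buffer */
    int64_t bits_per_frame; /* channel bitrate divided by frame rate */
} vbv_model;

/* Remove one coded frame from the buffer, then refill it for one frame
 * interval. Returns 1 if the frame was larger than the bits available
 * (underflow), 0 otherwise. */
static int vbv_step(vbv_model *v, int64_t frame_bits)
{
    assert(v->size > 0 && frame_bits >= 0);
    int underflow = frame_bits > v->fill;
    v->fill = underflow ? 0 : v->fill - frame_bits;
    v->fill += v->bits_per_frame;
    if (v->fill > v->size)
        v->fill = v->size; /* the buffer cannot hold more than its size */
    return underflow;
}
```

Row-level VBV amounts to predicting, partway through a frame, whether the final frame_bits will exceed fill, and raising the quantizer (or re-encoding) if it would.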

GPU

Motion estimation?
- Methods
  - Hierarchical?
  - 2D Wave?
  - Something else?
- "Easy": lookahead motion estimation
  - Extremely high parallelism, hundreds of frame searches (each with thousands of searches) at once.
- "Hard": main motion estimation
  - Difficult synchronization issues, not as heavily parallel in terms of number of macroblocks, but far more partition sizes and refs to search.
  - But potentially more useful...
Other things?

x86 assembly

Optimize more for the Phenom.
Yell at holger to commit his local patches.
Make a merged SA8D/SATD for the 8x8dct mode decision, since the two share most of their calculation. Hadamard_ac already does this, but slightly differently.

Other assembly

NEON assembly is nowhere near complete.
- Chroma MC needs to be rewritten for NV12 support.
Altivec assembly is very lacking.
SPARC VIS assembly is only available when high bit-depth is disabled.

Other CPU optimizations

x264 needs more prefetching. How many L1 and L2 cache misses (particularly L1) can we get rid of via smart prefetching in the right places? Warning: this is often hard to benchmark.
Different CPUs take different relative times for some functions. Is this enough (particularly across architectures) to justify different encoding settings for different CPUs?
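
To make the prefetching item concrete, here is a sketch of software prefetch in a row-by-row SAD loop using the GCC/Clang builtin __builtin_prefetch. The function name, the PREFETCH_READ macro and the `ahead` distance are hypothetical choices for illustration; whether this helps at all depends on the CPU and access pattern, so it has to be measured.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)(p)) /* no-op on other compilers */
#endif

/* SAD over a w x h block that asks the CPU to start loading the rows
 * `ahead` lines below the one currently being processed. */
static int sad_prefetch(const uint8_t *a, int a_stride,
                        const uint8_t *b, int b_stride,
                        int w, int h, int ahead)
{
    assert(w > 0 && h > 0 && ahead > 0);
    int sad = 0;
    for (int y = 0; y < h; y++) {
        if (y + ahead < h) { /* stay inside the block */
            PREFETCH_READ(a + (y + ahead) * a_stride);
            PREFETCH_READ(b + (y + ahead) * b_stride);
        }
        for (int x = 0; x < w; x++)
            sad += abs(a[y * a_stride + x] - b[y * b_stride + x]);
    }
    return sad;
}
```

The result is identical with or without the prefetch hints; only the timing changes, which is part of why this is hard to benchmark.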

Other features

MPEG-2 encoding support
VP8 encoding support
4:2:2 colorspace support
Support for SMPTE timecodes
Merge speedcontrol
Mixed lossless/lossy encoding.

x264CLI

Finish audio support. Talk to Kovensky about this one.
Make the filtering system aware of fullrange vs TV range.
Make the filtering system aware of BT.601 vs BT.709.
Add more filters.
- Deinterlacers (YADIF).
- Denoisers (HQDN3D?).
- IVTC, decomb?
Merge L-SMASH mp4 muxer.
Add TS muxing support using HRD. Talk to kierank about this one.
Add --device support.
Add automatic --level restriction support.

SOCIS x264 Profile

CPU: ARM V7 PMNC, speed 0 MHz (estimated)

Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 100000

samples  %        image name               symbol name
9764     17.8387  x264                     mc_chroma
3132      5.7221  x264                     x264_pixel_avg2_w16_neon
2706      4.9438  x264                     x264_me_search_ref
2697      4.9274  x264                     refine_subpel
2490      4.5492  x264                     x264_quant_4x4_trellis
2089      3.8166  x264                     x264_pixel_avg2_w8_neon
2014      3.6795  x264                     x264_pixel_satd_8x4
1959      3.5791  x264                     get_ref_neon
1309      2.3915  x264                     x264_pixel_sad_16x16_neon
1125      2.0554  x264                     x264_macroblock_encode
1089      1.9896  x264                     x264_macroblock_analyse
807       1.4744  x264                     x264_satd_8x4v_8x8h_neon
780       1.4250  x264                     x264_rd_cost_mb
724       1.3227  x264                     x264_satd_8x8_neon
679       1.2405  x264                     x264_pixel_sad_x4_16x16_neon
672       1.2277  x264                     x264_pixel_sad_x4_8x8_neon
638       1.1656  x264                     x264_pixel_satd_4x4_neon
634       1.1583  x264                     x264_macroblock_cache_load_progressive
633       1.1565  x264                     x264_pixel_satd_4x4
601       1.0980  x264                     x264_pixel_sad_8x8_neon
571       1.0432  x264                     x264_quant_4x4_neon
528       0.9646  x264                     x264_slicetype_mb_cost
500       0.9135  x264                     x264_mb_predict_mv
492       0.8989  x264                     x264_pixel_satd_8x8_neon
473       0.8642  x264                     x264_mb_analyse_intra
471       0.8605  x264                     x264_mb_encode_8x8_chroma
454       0.8295  x264                     x264_pixel_sad_x3_16x16_neon
452       0.8258  x264                     x264_macroblock_tree_propagate
404       0.7381  x264                     x264_pixel_sad_x3_8x8_neon
395       0.7217  x264                     x264_satd_16x4_neon
386       0.7052  libc-2.9.so              /lib/libc-2.9.so
367       0.6705  x264                     x264_mb_predict_mv_ref16x16
353       0.6449  x264                     x264_sub8x4_dct_neon
309       0.5645  x264                     x264_macroblock_cache_save
293       0.5353  x264                     x264_hadamard_ac_8x8_neon
281       0.5134  x264                     x264_analyse_update_cache
280       0.5116  x264                     x264_mb_analyse_inter_b8x8_mixed_ref
276       0.5042  x264                     x264_slice_write
270       0.4933  x264                     x264_cabac_encode_decision_c
267       0.4878  x264                     x264_mb_analyse_inter_b16x16
253       0.4622  x264                     x264_cabac_mb_mvd
235       0.4293  x264                     block_residual_write_cabac
223       0.4074  x264                     x264_decimate_score16
214       0.3910  x264                     __aeabi_fdiv
209       0.3818  x264                     x264_pixel_var2_8x8_neon
204       0.3727  x264                     __aeabi_fadd
201       0.3672  x264                     deblock_strength_c
197       0.3599  x264                     x264_pixel_avg2_w20_neon
197       0.3599  x264                     x264_pixel_avg_w16_neon
187       0.3416  x264                     x264_mc_copy_w16_aligned_neon
184       0.3362  x264                     x264_cabac_mb_type
181       0.3307  x264                     load_deinterleave_8x8x2_fenc
178       0.3252  x264                     x264_macroblock_write_cabac
177       0.3234  x264                     x264_mb_mc_01xywh
168       0.3069  x264                     x264_mb_mc_0xywh
167       0.3051  x264                     mc_luma_neon
165       0.3015  x264                     x264_pixel_ssd_16x16_neon
161       0.2941  x264                     x264_quant_dc_trellis
159       0.2905  x264                     x264_pixel_avg_w8_neon
156       0.2850  x264                     x264_mc_copy_w8_neon
156       0.2850  x264                     x264_pixel_satd_16x16_neon
155       0.2832  x264                     x264_mb_analyse_inter_p16x16
151       0.2759  x264                     x264_pixel_ssd_8x8_neon
150       0.2740  x264                     memcpy_aligned_8_16_neon
148       0.2704  x264                     __aeabi_fmul
143       0.2613  x264                     x264_mb_encode_i4x4
134       0.2448  x264                     mbtree_propagate_cost
124       0.2265  x264                     x264_pixel_sad_8x16_neon
112       0.2046  x264                     x264_mc_weight_w16_offsetsub_neon
111       0.2028  x264                     x264_pixel_sad_x4_8x16_neon
110       0.2010  x264                     x264_mb_predict_mv_direct16x16
108       0.1973  x264                     x264_frame_init_lowres_core_neon
98        0.1790  x264                     x264_plane_copy_interleave_c
93        0.1699  x264                     block_residual_write_cabac
91        0.1663  x264                     __floatsisf
88        0.1608  x264                     x264_mb_mc
87        0.1589  x264                     x264_cabac_mb_ref
87        0.1589  x264                     x264_dequant_4x4_neon
86        0.1571  x264                     __aeabi_l2f
86        0.1571  x264                     x264_satd_4x8_8x4_end_neon
85        0.1553  x264                     x264_pixel_var_16x16_neon
82        0.1498  x264                     store_interleave_8x8x2
82        0.1498  x264                     x264_ratecontrol_mb_qp
80        0.1462  x264                     x264_pixel_sad_x4_16x8_neon
79        0.1443  x264                     x264_mb_predict_mv_16x16
79        0.1443  x264                     x264_pixel_sad_16x8_neon
76        0.1389  x264                     x264_mc_weight_w8_offsetsub_neon
72        0.1315  x264                     x264_prefetch_fenc_arm
71        0.1297  x264                     x264_mb_analyse_b_rd
70        0.1279  x264                     x264_add8x4_idct_neon
70        0.1279  x264                     x264_frame_deblock_row
70        0.1279  x264                     x264_pixel_sad_x3_8x16_neon
68        0.1242  x264                     x264_pixel_hadamard_ac_16x16_neon
61        0.1114  x264                     x264_coeff_last16_neon
57        0.1041  x264                     deblock_v_chroma_c
56        0.1023  x264                     x264_predict_16x16_h_c
55        0.1005  x264                     x264_predict_4x4_hd_c
54        0.0987  x264                     x264_predict_8x8_vr_c
53        0.0968  x264                     x264_mb_encode_i16x16
53        0.0968  x264                     x264_predict_4x4_vl_c
52        0.0950  x264                     x264_hpel_filter_c_neon
52        0.0950  x264                     x264_hpel_filter_v_neon
52        0.0950  x264                     x264_predict_4x4_vr_c
52        0.0950  x264                     x264_predict_8x8_filter_c
50        0.0913  x264                     x264_mb_analyse_intra_chroma
50        0.0913  x264                     x264_mb_analyse_p_rd
50        0.0913  x264                     x264_ratecontrol_mb
49        0.0895  x264                     x264_mb_mc_8x8
49        0.0895  x264                     x264_predict_8x8_hd_c
47        0.0859  x264                     memcpy_aligned_16_16_neon
47        0.0859  x264                     x264_mb_mc_1xywh
46        0.0840  x264                     x264_pixel_satd_16x8_neon
45        0.0822  x264                     x264_cabac_encode_terminal_c
45        0.0822  x264                     x264_cabac_mb_mvd
45        0.0822  x264                     x264_pixel_sad_x3_16x8_neon
45        0.0822  x264                     x264_zigzag_scan_4x4_frame_neon
44        0.0804  x264                     deblock_h_chroma_c
44        0.0804  x264                     x264_predict_8x8_vl_c
44        0.0804  x264                     x264_predict_8x8c_p_neon
43        0.0786  x264                     x264_pixel_satd_16x16
43        0.0786  x264                     x264_predict_8x8c_dc_c
41        0.0749  x264                     x264_macroblock_deblock_strength
40        0.0731  x264                     x264_copy_column8
40        0.0731  x264                     x264_memcpy_aligned_neon
38        0.0694  x264                     x264_predict_8x8_ddl_c
38        0.0694  x264                     x264_predict_8x8_ddr_c
37        0.0676  x264                     x264_mb_analyse_inter_b8x16
37        0.0676  x264                     x264_mc_copy_w16_neon
36        0.0658  x264                     x264_deblock_h_luma_neon
36        0.0658  x264                     x264_predict_16x16_v_c
36        0.0658  x264                     x264_predict_4x4_hu_c
36        0.0658  x264                     x264_sub4x4_dct_neon
36        0.0658  x264                     x264_sub8x8_dct_dc_neon
35        0.0639  x264                     x264_intra_satd_x3_4x4
35        0.0639  x264                     x264_mb_analyse_inter_b16x8
35        0.0639  x264                     x264_me_refine_bidir_satd
35        0.0639  x264                     x264_pixel_satd_4x8_neon
34        0.0621  x264                     x264_cabac_mb_type
34        0.0621  x264                     x264_predict_16x16_dc_c
33        0.0603  x264                     x264_hpel_filter_h_neon
33        0.0603  x264                     x264_pixel_satd_8x16_neon
32        0.0585  x264                     x264_frame_expand_border_lowres
31        0.0566  x264                     x264_predict_4x4_ddr_armv6
29        0.0530  x264                     x264_macroblock_probe_skip
29        0.0530  x264                     x264_mc_weight_w8_neon
29        0.0530  x264                     x264_predict_16x16_p_neon
28        0.0512  x264                     memcpy_aligned_8_8_neon
28        0.0512  x264                     x264_mb_predict_mv_pskip
28        0.0512  x264                     x264_predict_8x8_hu_c
28        0.0512  x264                     x264_sub16x16_dct_neon
27        0.0493  x264                     x264_me_refine_qpel_refdupe
27        0.0493  x264                     x264_pixel_avg_w4_neon
26        0.0475  x264                     __fixsfsi
26        0.0475  x264                     x264_add4x4_idct_neon
26        0.0475  x264                     x264_cabac_mb_ref
26        0.0475  x264                     x264_intra_satd_x3_8x8c
24        0.0438  x264                     x264_intra_satd_x3_16x16
24        0.0438  x264                     x264_pixel_satd_8x4_neon
24        0.0438  x264                     x264_quant_2x2_dc_neon
23        0.0420  x264                     x264_weight_cost_luma
22        0.0402  x264                     x264_predict_8x8c_h_c
21        0.0384  x264                     __aeabi_fcmpgt
21        0.0384  x264                     x264_predict_4x4_dc_c
20        0.0365  x264                     x264_ac_energy_mb
20        0.0365  x264                     x264_slicetype_frame_cost
19        0.0347  x264                     x264_cabac_encode_bypass_c
19        0.0347  x264                     x264_deblock_v_luma_neon
18        0.0329  x264                     memcpy_aligned_16_8_neon
17        0.0311  x264                     x264_decimate_score15
17        0.0311  x264                     x264_intra_rd
16        0.0292  x264                     x264_cabac_mb_skip
16        0.0292  x264                     x264_var_end
15        0.0274  x264                     __cmpsf2
15        0.0274  x264                     x264_predict_4x4_h_c
14        0.0256  x264                     x264_pixel_avg_8x8_neon
14        0.0256  x264                     x264_pixel_avg_weight_w16_add_add_neon
13        0.0238  x264                     deblock_v_luma_intra_c
13        0.0238  x264                     x264_frame_expand_border
13        0.0238  x264                     x264_mc_weight_w8_offsetadd_neon
13        0.0238  x264                     x264_predict_4x4_ddl_neon
12        0.0219  x264                     x264_frame_expand_border_filtered
12        0.0219  x264                     x264_memzero_aligned_neon
12        0.0219  x264                     x264_pixel_var_8x8_neon
11        0.0201  x264                     x264_mb_cache_mv_b16x8
11        0.0201  x264                     x264_predict_4x4_v_c
10        0.0183  x264                     x264_cabac_encode_ue_bypass
10        0.0183  x264                     x264_macroblock_cache_load_neighbours_deblock
9         0.0164  x264                     idct_dequant_2x2_dconly
9         0.0164  x264                     x264_mb_analyse_transform_rd
9         0.0164  x264                     x264_pixel_avg_weight_w8_add_add_neon
9         0.0164  x264                     x264_predict_4x4_dc_armv6
8         0.0146  x264                     x264_prefetch_ref_arm
7         0.0128  x264                     x264_coeff_last15_neon
7         0.0128  x264                     x264_prefetch_fenc
6         0.0110  x264                     x264_add8x8_idct_dc_neon
6         0.0110  x264                     x264_add8x8_idct_neon
6         0.0110  x264                     x264_pixel_avg_16x16_neon
6         0.0110  x264                     x264_predict_8x8c_dc_neon
6         0.0110  x264                     x264_weight_scale_plane
5         0.0091  x264                     __aeabi_cfrcmple
5         0.0091  x264                     x264_adaptive_quant_frame
5         0.0091  x264                     x264_macroblock_tree_finish
5         0.0091  x264                     x264_mb_cache_mv_b8x16
5         0.0091  x264                     x264_pixel_avg_4x4_neon
5         0.0091  x264                     x264_predict_8x8c_v_c
4         0.0073  x264                     __aeabi_ui2f
4         0.0073  x264                     deblock_h_chroma_intra_c
4         0.0073  x264                     deblock_h_luma_intra_c
4         0.0073  x264                     x264_fdec_filter_row
4         0.0073  x264                     x264_frame_init_lowres
4         0.0073  x264                     x264_predict_16x16_h_neon
4         0.0073  x264                     x264_predict_4x4_h_armv6
3         0.0055  x264                     __aeabi_cfcmple
3         0.0055  x264                     __divdf3
3         0.0055  x264                     deblock_v_chroma_intra_c
3         0.0055  x264                     x264_dequant_4x4_dc_neon
3         0.0055  x264                     x264_encoder_encode
3         0.0055  x264                     x264_frame_filter
3         0.0055  x264                     x264_nal_escape_c
3         0.0055  x264                     x264_pixel_avg_8x16_neon
3         0.0055  x264                     x264_predict_16x16_dc_neon
3         0.0055  x264                     x264_rc_analyse_slice
2         0.0037  libpthread-2.9.so        /lib/libpthread-2.9.so
2         0.0037  x264                     __subsf3
2         0.0037  x264                     x264_analyse_init_costs
2         0.0037  x264                     x264_coeff_last4_arm
2         0.0037  x264                     x264_encoder_frame_end
2         0.0037  x264                     x264_predict_16x16_dc_top_neon
2         0.0037  x264                     x264_quant_4x4_dc_neon
2         0.0037  x264                     x264_sub8x8_dct_neon
2         0.0037  x264                     x264_weight_cost_init_luma
1         0.0018  libm-2.9.so              /lib/libm-2.9.so
1         0.0018  x264                     __aeabi_d2f
1         0.0018  x264                     __aeabi_f2d
1         0.0018  x264                     __aeabi_fcmplt
1         0.0018  x264                     __aeabi_uidivmod
1         0.0018  x264                     __cmpdf2
1         0.0018  x264                     __divdi3
1         0.0018  x264                     __muldf3
1         0.0018  x264                     __udivdi3
1         0.0018  x264                     bs_write_ue_big
1         0.0018  x264                     hpel_filter_neon
1         0.0018  x264                     optimize_chroma_dc
1         0.0018  x264                     x264_add16x16_idct_dc_neon
1         0.0018  x264                     x264_dct4x4dc_neon
1         0.0018  x264                     x264_frame_copy_picture
1         0.0018  x264                     x264_frame_push_unused
1         0.0018  x264                     x264_free
1         0.0018  x264                     x264_macroblock_cache_mv_4_2
1         0.0018  x264                     x264_macroblock_slice_init
1         0.0018  x264                     x264_pixel_avg_4x8_neon
1         0.0018  x264                     x264_pixel_avg_8x4_neon
1         0.0018  x264                     x264_predict_16x16_v_neon

@@ Line 1: / Line 1: @@
-This page contains an incomplete list of things available in x264 for you to do.  It's organized into sections covering various parts of x264.
+This page contains an incomplete list of things available in x264 for you to do. It's organized into sections covering various parts of x264.
 Some useful resources: [http://www.x264.nl/developers/Dark_Shikari/?dir=./src Dark Shikari's pile of junk], [http://akuvian.org/src/x264/ Pengvado's pile of junk].
-If you're interested in doing any of this, drop by #x264dev on Freenode IRC.  There are no experience or educational requirements for doing any of this, though you are expected to know how to code.
+If you're interested in doing any of this, drop by #x264dev on Freenode IRC. There are no experience or educational requirements for doing any of this, though you are expected to know how to code.
-Bolded features may have companies willing to sponsor or provide bounties.  This is not complete either; just because it's not bolded doesn't mean there aren't resources out there.  If your company is interested in offering a bounty, drop by IRC.
+Bolded features may have companies willing to sponsor or provide bounties. This is not complete either; just because it's not bolded doesn't mean there aren't resources out there. If your company is interested in offering a bounty, drop by IRC.
-===Motion Estimation===
+=== Motion Estimation ===
-* Sequential elimination (SEA), used for exhaustive search, might be more generally applicable to algorithms like UMH, by letting us skip a lot of SADs.  The downside is we won't be able to use SAD_X4 anymore.
-* (T)ESA is currently wrong for motion searches done on weightp duplicates.  This effect is miniscule, but it still should be fixed.
-* Hierarchical motion estimation might be a useful way to catch very long motion vectors without the cost of UMH or ESA.  It might also help regularize motion.
-* Somehow take into account the effect of motion vector decision on future blocks.
-** Hierarchical motion estimation
-** Approximations from lookahead MVs
-** Iterative ME (as per Snow)
-** Trellis motion estimation
-* We don't need to check all 11 predictors all the time for 16x16 fullpel motion search.
-** But how do we know which ones we can afford to skip, and when?
-** Xvid and libtheora have algorithms for this, but the former's is almost surely 100% useless and the latter doesn't seem impressive either.
-* libtheora does fullpel motion estimation on the source pixels instead of decoded pixels.  Does this give a better starting point for the subpel search and discourage "weird" MVs?
-* With extremely fast encoding settings (subme 0), can we rip off lookahead MVs instead of doing a real search?
-* Try sub-8x8 partitions in B-frames.  Is it at all useful?
-* Try bidir motion estimation for fullpel.  That is, considering L1's MV when doing L0 (or vice versa).  Xvid does this.  How much does it help?
-* Fullpel chroma ME?
-** For TESA?
-===Intra Analysis===
+*Sequential elimination (SEA), used for exhaustive search, might be more generally applicable to algorithms like UMH, by letting us skip a lot of SADs. The downside is we won't be able to use SAD_X4 anymore.
-* Make the early terminations smarter.  Currently they're just hacks -- some statistical analysis might be useful.
+*(T)ESA is currently wrong for motion searches done on weightp duplicates. This effect is miniscule, but it still should be fixed.
-* SAD (subme 1) i8x8 vs i4x4 decision is a bit bad.  Can it be improved without significant speed loss?
+*Hierarchical motion estimation might be a useful way to catch very long motion vectors without the cost of UMH or ESA. It might also help regularize motion.
+*Somehow take into account the effect of motion vector decision on future blocks.
+**Hierarchical motion estimation
+**Approximations from lookahead MVs
+**Iterative ME (as per Snow)
+**Trellis motion estimation
+*We don't need to check all 11 predictors all the time for 16x16 fullpel motion search.
+**But how do we know which ones we can afford to skip, and when?
+**Xvid and libtheora have algorithms for this, but the former's is almost surely 100% useless and the latter doesn't seem impressive either.
+*libtheora does fullpel motion estimation on the source pixels instead of decoded pixels. Does this give a better starting point for the subpel search and discourage "weird" MVs?
+*With extremely fast encoding settings (subme 0), can we rip off lookahead MVs instead of doing a real search?
+*Try sub-8x8 partitions in B-frames. Is it at all useful?
+*Try bidir motion estimation for fullpel. That is, considering L1's MV when doing L0 (or vice versa). Xvid does this. How much does it help?
+*Fullpel chroma ME?
+**For TESA?
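To make the SEA item in the motion estimation list concrete, here is a minimal sketch of the pruning rule. It is not x264's code: the function names are made up for illustration, block sums are recomputed naively (a real implementation would use precomputed sums), and motion vector bit cost is ignored. The only fact it relies on is the triangle inequality, |sum(cur) - sum(ref)| <= SAD(cur, ref), so a candidate whose bound is already no better than the best SAD can be skipped without computing its SAD.

```python
def block_sum(frame, x, y, size):
    """Sum of the size x size block whose top-left corner is (x, y)."""
    return sum(sum(row[x:x + size]) for row in frame[y:y + size])


def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block of cur and a block of ref."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def sea_search(cur, ref, cx, cy, size, search_range):
    """Exhaustive search with SEA pruning.

    Returns (best_sad, (dx, dy), sads_computed).
    """
    height, width = len(ref), len(ref[0])
    cur_sum = block_sum(cur, cx, cy, size)
    best_sad, best_mv, sads_computed = None, None, 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate falls outside the reference frame
            # Lower bound on the SAD from the two block sums alone.
            bound = abs(cur_sum - block_sum(ref, rx, ry, size))
            if best_sad is not None and bound >= best_sad:
                continue  # cannot beat the current best, skip the SAD
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            sads_computed += 1
            if best_sad is None or cost < best_sad:
                best_sad, best_mv = cost, (dx, dy)
    return best_sad, best_mv, sads_computed
```

Because each candidate is accepted or rejected on its own, the SADs that survive are scattered, which is why a batched SAD_X4-style call stops being a good fit.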
-===Mode Decision===
+=== Intra Analysis ===
-* Can we find more ways to skip more motion searches in multiref?
-* On extremely fast encoding settings, fast skip is actually kind of slow.  But anything dumber (e.g. SAD) is completely useless.  Is there some better balance that can be achieved here?
-* See the TODOs for deblock-aware RD in common/deblock.c.
-* Is there a faster way than SA8D/SATD to do 8x8dct vs 4x4dct mode decision?  At very fast settings, the time this uses is nontrivial.
-* Is there a faster way than RD to do 8x8dct vs 4x4dct mode decision that's still better than SATD?  RD takes over an order of magnitude more time than SATD, so it might be useful to have something in the middle.
-* Is there some value to swapping the mode decision metric from SATD to SA8D if we think that the macroblock will use 8x8dct?  [http://akuvian.org/src/x264/x264_dct8_guess.diff This has been tried before], but only helped if our guess was extremely good (better than we could get in reality).
-* With trellis 2, can we skip most of CABAC and CAVLC bit cost calculation?
-* How about a "brute force" mode decision that takes no shortcuts (no early ref termination in p8x8, no SATD thresholds, etc)?
-===Psy===
+*Make the early terminations smarter. Currently they're just hacks -- some statistical analysis might be useful.
-* Psy-RD is a hack.  It works, but it's a hack.  If you apply QNS with Psy-RD as the metric, it goes way overboard and gives terrible results.  This means that Psy-RD only works because normal mode decision is limited in the way it can modify the image to better suit the metric.  Is there a way to make it better?
+*SAD (subme 1) i8x8 vs i4x4 decision is a bit bad. Can it be improved without significant speed loss?
-* Should RD be linear at all?  Perhaps we should weight more heavily against low quality blocks and also try to ignore minuscule distortion that viewers can't see.
-* Psy-trellis (and maybe psy-RD?) are too strong at very high QPs.
-* Psy-trellis should be merged with Psy-RD.  There are patches for this, but they probably won't be committed until psy-RD itself is fixed.
-* RD should take into account local variance.
-* Lambda should be varied on a per-DCT-block basis instead of a per-macroblock basis.
-* Lambda should be picked independent of quantizer (i.e. with greater precision).
-* Classic problem: a block is mostly high complexity but has a small area of low complexity.  How do we judge whether that area is important?  Good example: sharp text on background with film grain; grain gets blurred out because of the text.
-** If we think it's important all the time, we ruin the quality of many clips that rely on raising complexity on edges (Touhou).
-* Should motion estimation lambda be as high as it is at very high quantizers?  There's some value to capturing "true motion"...
-* Macroblock tree correlates pretty well with visual perception in that its concept of a "high complexity" matches well with the visual concept.  Except for local illumination changes.  Talk to Dark Shikari for a patch.
-===Lookahead===
+=== Mode Decision ===
-* '''Lookahead should be multithreaded, either by splitting the frame (sliced threads) or running multiple frame analysis calls at once.'''
-* Temporal MV predictors in lookahead?  There's a patch for these somewhere, but they biased heavily in favor of B-frames, likely by improving the motion search.
-* Should lookahead use variable lambda based on quantizer (esp. due to adaptive quant)?  If so, should it take into account estimated ratecontrol quantizer, too?  If so, how?
-===Quantization===
+*Can we find more ways to skip more motion searches in multiref?
-* CAVLC "trellis" is a hack.  It works, but it's a hack.  Make it better.  See the TODOs in encoder/rdo.c.
+*On extremely fast encoding settings, fast skip is actually kind of slow. But anything dumber (e.g. SAD) is completely useless. Is there some better balance that can be achieved here?
-* There's room for something between trellis and deadzone in terms of complexity.  libvpx has a good example -- it biases towards zero-runs in its "medium speed" quantizer.  This can't be SIMD'd easily, but is still vastly faster than trellis.  A nonlinear quantizer (be more likely to round up larger coefficients) might also be useful.
+*See the TODOs for deblock-aware RD in common/deblock.c.
-* Floyd-Steinberg for quantization?  Try pushing quantization error to nearby DCT coefficients.  Should this go from high to low or low to high?
+*Is there a faster way than SA8D/SATD to do 8x8dct vs 4x4dct mode decision? At very fast settings, the time this uses is nontrivial.
-* Energy-preserving quantizer -- maintain L1 (or maybe L2?  I'm not sure) energy.  Should we maintain it in the spatial domain (post-iDCT) or residual domain?  Probably the former.
+*Is there a faster way than RD to do 8x8dct vs 4x4dct mode decision that's still better than SATD? RD takes over an order of magnitude more time than SATD, so it might be useful to have something in the middle.
-* Decimation is currently just a ripoff of the JVT recommended algorithm.  Can we do this more optimally?  With RD?
+*Is there some value to swapping the mode decision metric from SATD to SA8D if we think that the macroblock will use 8x8dct? [http://akuvian.org/src/x264/x264_dct8_guess.diff This has been tried before], but only helped if our guess was extremely good (better than we could get in reality).
+*With trellis 2, can we skip most of CABAC and CAVLC bit cost calculation?
+*How about a "brute force" mode decision that takes no shortcuts (no early ref termination in p8x8, no SATD thresholds, etc)?
-===Transform===
+=== Psy ===
-* Analyze the error characteristics of the fDCT.  Is there any way to make it more accurate without much speed loss?  Particularly at extremely low quantizers, this might help.
-* Before forward transform, run a "blocking filter" that acts as the approximate inverse of the deblock filter.  See [http://akuvian.org/src/x264/Shwang_loopfilter_thesis.pdf this paper].
-===Interlacing===
+*Psy-RD is a hack. It works, but it's a hack. If you apply QNS with Psy-RD as the metric, it goes way overboard and gives terrible results. This means that Psy-RD only works because normal mode decision is limited in the way it can modify the image to better suit the metric. Is there a way to make it better?
-* Lookahead currently blend-deinterlaces to get the lowres.  Is this a good idea?  Is there something better that isn't much slower?
+*Should RD be linear at all? Perhaps we should weight more heavily against low quality blocks and also try to ignore minuscule distortion that viewers can't see.
-* Constrained intra + adaptive MBAFF.  Does anyone care about this?
+*Psy-trellis (and maybe psy-RD?) are too strong at very high QPs.
-* PAFF + MBAFF adaptive - PAFF performs better than Adaptive MBAFF on high motion scenes because it can predict from the previous field.
+*Psy-trellis should be merged with Psy-RD. There are patches for this, but they probably won't be committed until psy-RD itself is fixed.
+*RD should take into account local variance.
+*Lambda should be varied on a per-DCT-block basis instead of a per-macroblock basis.
+*Lambda should be picked independent of quantizer (i.e. with greater precision).
+*Classic problem: a block is mostly high complexity but has a small area of low complexity. How do we judge whether that area is important? Good example: sharp text on background with film grain; grain gets blurred out because of the text.
+**If we think it's important all the time, we ruin the quality of many clips that rely on raising complexity on edges (Touhou).
+*Should motion estimation lambda be as high as it is at very high quantizers? There's some value to capturing "true motion"...
+*Macroblock tree correlates pretty well with visual perception in that its concept of "high complexity" matches the visual concept well, except for local illumination changes. Talk to Dark Shikari for a patch.
-===Weighted Prediction===
+=== Lookahead ===
-* '''Make weightp work with interlacing.  Preferably abuse reference duplication to make it useful for MBAFF.'''
-* Finish K-means decision for weightp.  Talk to DylanZA about getting his current patch for this one.
-* Add explicit weighting for B-frames, too.  This helps in nonlinear fades, among other cases.
-* Improve weighted prediction analysis to do more searching based on an estimated offset vs scale gradient.
-===Ratecontrol===
+*'''Lookahead should be multithreaded, either by splitting the frame (sliced threads) or running multiple frame analysis calls at once.'''
-* VBV might be able to utilize the ability to re-encode a row of the frame for improved accuracy.
+*Temporal MV predictors in lookahead? There's a patch for these somewhere, but they biased heavily in favor of B-frames, likely by improving the motion search.
-** Maybe re-encode everything in case of an underflow that row-reencoding can't fix?  This might be better than underflowing.
+*Should lookahead use variable lambda based on quantizer (esp. due to adaptive quant)? If so, should it take into account estimated ratecontrol quantizer, too? If so, how?
-* Current per-frame VBV is a hack.  It only adapts per row and is O(N^2), where N is the number of rows.  An O(N) solution would be able to react more often and thus be more accurate.
-* Make the frame size and row size predictors better.  They currently are kind of crappy.
-* Ratecontrol code as a whole is a bit of a mess.  It could be improved.  There's a lot of cruft left over that is probably not needed now, like qblur.
-* 2-pass VBV is actually more likely to underflow than 1-pass because it doesn't adapt as aggressively and trusts first pass data a lot.  This trust is often misplaced if the first pass was a fast one.  This should be improved.
-* 2-pass macroblock-tree: if we added the ability to do macroblock-tree on real encoded data, we'd get better results (particularly with repeating patterns and multiref, such as an anime character's mouth moving).
-* Macroblock-tree: make it more psy-aware.  Maybe we should cap how much it lowers the quantizer on extremely static scenes?  This might tie into the "just-noticeable error" issue in RD.
-===GPU===
+=== Quantization ===
-* Motion estimation?
-** Methods
-*** Hierarchical?
-*** 2D Wave?
-*** Something else?
-** "Easy": lookahead motion estimation
-*** Extremely high parallelism, hundreds of frame searches (each with thousands of searches) at once.
-** "Hard": main motion estimation
-*** Difficult synchronization issues, not as heavily parallel in terms of number of macroblocks, but far more partition sizes and refs to search.
-*** But potentially more useful...
-* Other things?
-===x86 assembly===
+*CAVLC "trellis" is a hack. It works, but it's a hack. Make it better. See the TODOs in encoder/rdo.c.
-* Optimize more for the Phenom.
+*There's room for something between trellis and deadzone in terms of complexity. libvpx has a good example -- it biases towards zero-runs in its "medium speed" quantizer. This can't be SIMD'd easily, but is still vastly faster than trellis. A nonlinear quantizer (be more likely to round up larger coefficients) might also be useful.
-* Yell at holger to commit his local patches.
+*Floyd-Steinberg for quantization? Try pushing quantization error to nearby DCT coefficients. Should this go from high to low or low to high?
-* Make a merged SA8D/SATD for the 8x8dct mode decision, since the two share most of their calculation.  Hadamard_ac already does this, but slightly differently.
+*Energy-preserving quantizer -- maintain L1 (or maybe L2? I'm not sure) energy. Should we maintain it in the spatial domain (post-iDCT) or residual domain? Probably the former.
+*Decimation is currently just a ripoff of the JVT recommended algorithm. Can we do this more optimally? With RD?
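As a rough illustration of the "Floyd-Steinberg for quantization" idea from the quantization list, the sketch below contrasts a plain deadzone quantizer with one that carries part of each coefficient's quantization error into the next coefficient in scan order. Everything here is an assumption made for the example: the function names, the floating-point arithmetic, the `rounding` and `carry` parameters, and the low-to-high direction. It is not how x264 quantizes.

```python
def quant_deadzone(coefs, step, rounding=0.25):
    """Deadzone quantizer: a rounding offset below 0.5 biases levels toward zero."""
    levels = []
    for c in coefs:
        mag = int(abs(c) / step + rounding)
        levels.append(mag if c >= 0 else -mag)
    return levels


def quant_error_diffusion(coefs, step, rounding=0.25, carry=0.5):
    """Deadzone quantizer that pushes a fraction of each error to the next coefficient."""
    levels = []
    err = 0.0
    for c in coefs:
        target = c + carry * err          # add the error carried from the previous coefficient
        mag = int(abs(target) / step + rounding)
        level = mag if target >= 0 else -mag
        levels.append(level)
        err = target - level * step       # error left over after reconstruction
    return levels
```

With `step=10`, the input `[26, 7, -4, 0]` quantizes to `[2, 0, 0, 0]` with the plain deadzone and to `[2, 1, 0, 0]` with error carry. Whether the extra nonzero level is worth its bits is exactly the open question in the list.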
-===Other assembly===
+=== Transform ===
-* NEON assembly is nowhere near complete.
-** Chroma MC needs to be rewritten for NV12 support.
-* Altivec assembly is very lacking.
-* SPARC VIS assembly is only available when high bit-depth is disabled.
-===Other CPU optimizations===
+*Analyze the error characteristics of the fDCT. Is there any way to make it more accurate without much speed loss? Particularly at extremely low quantizers, this might help.
-* x264 needs more prefetching.  How many L1 and L2 cache misses (particularly L1) can we get rid of via smart prefetching in the right places?  Warning: this is often hard to benchmark.
+*Before forward transform, run a "blocking filter" that acts as the approximate inverse of the deblock filter. See [http://akuvian.org/src/x264/Shwang_loopfilter_thesis.pdf this paper].
-* Different CPUs take different relative times for some functions.  Is this enough (particularly across architectures) to justify different encoding settings for different CPUs?
-===Other features===
+=== Interlacing ===
-* MPEG-2 encoding support
-* VP8 encoding support
-* '''4:2:2 colorspace support'''
-* Support for SMPTE timecodes
-* Merge speedcontrol
-* Mixed lossless/lossy encoding.
-===x264CLI===
+*Lookahead currently blend-deinterlaces to get the lowres. Is this a good idea? Is there something better that isn't much slower?
-* Finish audio support.  Talk to Kovensky about this one.
+*Constrained intra + adaptive MBAFF. Does anyone care about this?
-* Make the filtering system aware of fullrange vs TV range.
+*Adaptive PAFF + MBAFF: PAFF performs better than adaptive MBAFF on high-motion scenes because it can predict from the previous field.
-* Make the filtering system aware of BT.601 vs BT.709.
-* Add more filters.
+=== Weighted Prediction ===
-** Deinterlacers (YADIF).
-** Denoisers (HQDN3D?).
+*'''Make weightp work with interlacing. Preferably abuse reference duplication to make it useful for MBAFF.'''
-** IVTC, decomb?
+*Finish K-means decision for weightp. Talk to DylanZA about getting his current patch for this one.
-* Merge L-SMASH mp4 muxer.
+*Add explicit weighting for B-frames, too. This helps in nonlinear fades, among other cases.
-* Add TS muxing support using HRD.  Talk to kierank about this one.
+*Improve weighted prediction analysis to do more searching based on an estimated offset vs scale gradient.
-* Add --device support.
-* Add automatic --level restriction support.
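For the weighted prediction items, it may help to see the per-sample operation written out. The sketch below applies the H.264 explicit weighting formula for a single reference to one 8-bit sample, and adds a deliberately naive scale estimate from the ratio of frame means. The function names are hypothetical, and the mean-ratio estimate is only a starting point of the kind an offset-versus-scale search might refine; it is not x264's analysis.

```python
def weight_sample(sample, scale, log2_denom, offset):
    """Apply explicit weighted prediction to one 8-bit sample (single reference)."""
    if log2_denom >= 1:
        value = ((sample * scale + (1 << (log2_denom - 1))) >> log2_denom) + offset
    else:
        value = sample * scale + offset
    return max(0, min(255, value))  # clip to the 8-bit range


def estimate_scale(cur, ref, log2_denom=5):
    """Naive scale estimate from the ratio of mean sample values (flat lists of samples)."""
    ref_mean = sum(ref) / len(ref)
    if ref_mean == 0:
        return 1 << log2_denom      # fall back to the identity weight
    cur_mean = sum(cur) / len(cur)
    scale = round(cur_mean / ref_mean * (1 << log2_denom))
    return max(-128, min(127, scale))  # weights are signed 8-bit values
```

For a fade where the current frame is half as bright as the reference, `estimate_scale` gives 16 with a denominator of 32, and `weight_sample(100, 16, 5, 0)` gives 50.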
+=== Ratecontrol ===
+*VBV might be able to utilize the ability to re-encode a row of the frame for improved accuracy.
+**Maybe re-encode everything in case of an underflow that row-reencoding can't fix? This might be better than underflowing.
+*Current per-frame VBV is a hack. It only adapts per row and is O(N^2), where N is the number of rows. An O(N) solution would be able to react more often and thus be more accurate.
+*Make the frame size and row size predictors better. They currently are kind of crappy.
+*Ratecontrol code as a whole is a bit of a mess. It could be improved. There's a lot of cruft left over that is probably not needed now, like qblur.
+*2-pass VBV is actually more likely to underflow than 1-pass because it doesn't adapt as aggressively and trusts first pass data a lot. This trust is often misplaced if the first pass was a fast one. This should be improved.
+*2-pass macroblock-tree: if we added the ability to do macroblock-tree on real encoded data, we'd get better results (particularly with repeating patterns and multiref, such as an anime character's mouth moving).
+*Macroblock-tree: make it more psy-aware. Maybe we should cap how much it lowers the quantizer on extremely static scenes? This might tie into the "just-noticeable error" issue in RD.
+=== GPU ===
+*Motion estimation?
+**Methods
+***Hierarchical?
+***2D Wave?
+***Something else?
+**"Easy": lookahead motion estimation
+***Extremely high parallelism, hundreds of frame searches (each with thousands of searches) at once.
+**"Hard": main motion estimation
+***Difficult synchronization issues, not as heavily parallel in terms of number of macroblocks, but far more partition sizes and refs to search.
+***But potentially more useful...
+*Other things?
+=== x86 assembly ===
+*Optimize more for the Phenom.
+*Yell at holger to commit his local patches.
+*Make a merged SA8D/SATD for the 8x8dct mode decision, since the two share most of their calculation. Hadamard_ac already does this, but slightly differently.
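The shared work between SA8D and SATD can be shown in scalar form. An 8x8 Hadamard transform is the four 4x4 Hadamard transforms of its quadrants followed by one extra butterfly stage across the quadrants, so both metrics can come out of the same four 4x4 transforms. This is a sketch with hypothetical function names, not x264's hadamard_ac or its assembly, and the final normalization (halving SATD, rounding SA8D by four) is an assumption, since conventions vary.

```python
def hadamard4(v):
    """4-point Hadamard butterfly."""
    a0, a1 = v[0] + v[1], v[0] - v[1]
    a2, a3 = v[2] + v[3], v[2] - v[3]
    return [a0 + a2, a1 + a3, a0 - a2, a1 - a3]


def hadamard_4x4(diff, x0, y0):
    """2D Hadamard transform of the 4x4 block at (x0, y0); result is transposed."""
    rows = [hadamard4(diff[y0 + j][x0:x0 + 4]) for j in range(4)]
    return [hadamard4([rows[j][i] for j in range(4)]) for i in range(4)]


def satd_and_sa8d_8x8(cur, ref):
    """Return (satd, sa8d) for an 8x8 block from one set of 4x4 transforms."""
    diff = [[cur[j][i] - ref[j][i] for i in range(8)] for j in range(8)]
    t00 = hadamard_4x4(diff, 0, 0)
    t01 = hadamard_4x4(diff, 4, 0)
    t10 = hadamard_4x4(diff, 0, 4)
    t11 = hadamard_4x4(diff, 4, 4)
    satd_sum, sa8d_sum = 0, 0
    for u in range(4):
        for v in range(4):
            a, b, c, d = t00[u][v], t01[u][v], t10[u][v], t11[u][v]
            # SATD: the four 4x4 transforms are scored independently.
            satd_sum += abs(a) + abs(b) + abs(c) + abs(d)
            # SA8D: one more butterfly stage across the four quadrants.
            sa8d_sum += (abs(a + b + c + d) + abs(a - b + c - d)
                         + abs(a + b - c - d) + abs(a - b - c + d))
    return satd_sum // 2, (sa8d_sum + 2) // 4
```

A block that differs from its reference by a constant 1 gives `(32, 16)`: the energy sits in the DC terms, which SA8D folds into a single coefficient.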
+=== Other assembly ===
+*NEON assembly is nowhere near complete.
+**Chroma MC needs to be rewritten for NV12 support.
+*Altivec assembly is very lacking.
+*SPARC VIS assembly is only available when high bit-depth is disabled.
+=== Other CPU optimizations ===
+*x264 needs more prefetching. How many L1 and L2 cache misses (particularly L1) can we get rid of via smart prefetching in the right places? Warning: this is often hard to benchmark.
+*Different CPUs take different relative times for some functions. Is this enough (particularly across architectures) to justify different encoding settings for different CPUs?
+=== Other features ===
+*MPEG-2 encoding support
+*VP8 encoding support
+*'''4:2:2 colorspace support'''
+*Support for SMPTE timecodes
+*Merge speedcontrol
+*Mixed lossless/lossy encoding.
+=== x264CLI ===
+*Finish audio support. Talk to Kovensky about this one.
+*Make the filtering system aware of fullrange vs TV range.
+*Make the filtering system aware of BT.601 vs BT.709.
+*Add more filters.
+**Deinterlacers (YADIF).
+**Denoisers (HQDN3D?).
+**IVTC, decomb?
+*Merge L-SMASH mp4 muxer.
+*Add TS muxing support using HRD. Talk to kierank about this one.
+*Add --device support.
+*Add automatic --level restriction support.
+=== SOCIS x264 Profile ===
+CPU: ARM V7 PMNC, speed 0 MHz (estimated)
+Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 100000
+<pre>samples  %        image name               symbol name
+     17.8387  x264                     mc_chroma
+      5.7221  x264                     x264_pixel_avg2_w16_neon
+      4.9438  x264                     x264_me_search_ref
+      4.9274  x264                     refine_subpel
+      4.5492  x264                     x264_quant_4x4_trellis
+      3.8166  x264                     x264_pixel_avg2_w8_neon
+      3.6795  x264                     x264_pixel_satd_8x4
+      3.5791  x264                     get_ref_neon
+      2.3915  x264                     x264_pixel_sad_16x16_neon
+      2.0554  x264                     x264_macroblock_encode
+      1.9896  x264                     x264_macroblock_analyse
+       1.4744  x264                     x264_satd_8x4v_8x8h_neon
+       1.4250  x264                     x264_rd_cost_mb
+       1.3227  x264                     x264_satd_8x8_neon
+       1.2405  x264                     x264_pixel_sad_x4_16x16_neon
+       1.2277  x264                     x264_pixel_sad_x4_8x8_neon
+       1.1656  x264                     x264_pixel_satd_4x4_neon
+       1.1583  x264                     x264_macroblock_cache_load_progressive
+       1.1565  x264                     x264_pixel_satd_4x4
+       1.0980  x264                     x264_pixel_sad_8x8_neon
+       1.0432  x264                     x264_quant_4x4_neon
+       0.9646  x264                     x264_slicetype_mb_cost
+       0.9135  x264                     x264_mb_predict_mv
+       0.8989  x264                     x264_pixel_satd_8x8_neon
+       0.8642  x264                     x264_mb_analyse_intra
+       0.8605  x264                     x264_mb_encode_8x8_chroma
+       0.8295  x264                     x264_pixel_sad_x3_16x16_neon
+       0.8258  x264                     x264_macroblock_tree_propagate
+       0.7381  x264                     x264_pixel_sad_x3_8x8_neon
+       0.7217  x264                     x264_satd_16x4_neon
+       0.7052  libc-2.9.so              /lib/libc-2.9.so
+       0.6705  x264                     x264_mb_predict_mv_ref16x16
+       0.6449  x264                     x264_sub8x4_dct_neon
+       0.5645  x264                     x264_macroblock_cache_save
+       0.5353  x264                     x264_hadamard_ac_8x8_neon
+       0.5134  x264                     x264_analyse_update_cache
+       0.5116  x264                     x264_mb_analyse_inter_b8x8_mixed_ref
+       0.5042  x264                     x264_slice_write
+       0.4933  x264                     x264_cabac_encode_decision_c
+       0.4878  x264                     x264_mb_analyse_inter_b16x16
+       0.4622  x264                     x264_cabac_mb_mvd
+       0.4293  x264                     block_residual_write_cabac
+       0.4074  x264                     x264_decimate_score16
+       0.3910  x264                     __aeabi_fdiv
+       0.3818  x264                     x264_pixel_var2_8x8_neon
+       0.3727  x264                     __aeabi_fadd
+       0.3672  x264                     deblock_strength_c
+       0.3599  x264                     x264_pixel_avg2_w20_neon
+       0.3599  x264                     x264_pixel_avg_w16_neon
+       0.3416  x264                     x264_mc_copy_w16_aligned_neon
+       0.3362  x264                     x264_cabac_mb_type
+       0.3307  x264                     load_deinterleave_8x8x2_fenc
+       0.3252  x264                     x264_macroblock_write_cabac
+       0.3234  x264                     x264_mb_mc_01xywh
+       0.3069  x264                     x264_mb_mc_0xywh
+       0.3051  x264                     mc_luma_neon
+       0.3015  x264                     x264_pixel_ssd_16x16_neon
+       0.2941  x264                     x264_quant_dc_trellis
+       0.2905  x264                     x264_pixel_avg_w8_neon
+       0.2850  x264                     x264_mc_copy_w8_neon
+       0.2850  x264                     x264_pixel_satd_16x16_neon
+       0.2832  x264                     x264_mb_analyse_inter_p16x16
+       0.2759  x264                     x264_pixel_ssd_8x8_neon
+       0.2740  x264                     memcpy_aligned_8_16_neon
+       0.2704  x264                     __aeabi_fmul
+       0.2613  x264                     x264_mb_encode_i4x4
+       0.2448  x264                     mbtree_propagate_cost
+       0.2265  x264                     x264_pixel_sad_8x16_neon
+       0.2046  x264                     x264_mc_weight_w16_offsetsub_neon
+       0.2028  x264                     x264_pixel_sad_x4_8x16_neon
+       0.2010  x264                     x264_mb_predict_mv_direct16x16
+       0.1973  x264                     x264_frame_init_lowres_core_neon
+        0.1790  x264                     x264_plane_copy_interleave_c
+        0.1699  x264                     block_residual_write_cabac
+        0.1663  x264                     __floatsisf
+        0.1608  x264                     x264_mb_mc
+        0.1589  x264                     x264_cabac_mb_ref
+        0.1589  x264                     x264_dequant_4x4_neon
+        0.1571  x264                     __aeabi_l2f
+        0.1571  x264                     x264_satd_4x8_8x4_end_neon
+        0.1553  x264                     x264_pixel_var_16x16_neon
+        0.1498  x264                     store_interleave_8x8x2
+        0.1498  x264                     x264_ratecontrol_mb_qp
+        0.1462  x264                     x264_pixel_sad_x4_16x8_neon
+        0.1443  x264                     x264_mb_predict_mv_16x16
+        0.1443  x264                     x264_pixel_sad_16x8_neon
+        0.1389  x264                     x264_mc_weight_w8_offsetsub_neon
+        0.1315  x264                     x264_prefetch_fenc_arm
+        0.1297  x264                     x264_mb_analyse_b_rd
+        0.1279  x264                     x264_add8x4_idct_neon
+        0.1279  x264                     x264_frame_deblock_row
+        0.1279  x264                     x264_pixel_sad_x3_8x16_neon
+        0.1242  x264                     x264_pixel_hadamard_ac_16x16_neon
+        0.1114  x264                     x264_coeff_last16_neon
+        0.1041  x264                     deblock_v_chroma_c
+        0.1023  x264                     x264_predict_16x16_h_c
+        0.1005  x264                     x264_predict_4x4_hd_c
+        0.0987  x264                     x264_predict_8x8_vr_c
+        0.0968  x264                     x264_mb_encode_i16x16
+        0.0968  x264                     x264_predict_4x4_vl_c
+        0.0950  x264                     x264_hpel_filter_c_neon
+        0.0950  x264                     x264_hpel_filter_v_neon
+        0.0950  x264                     x264_predict_4x4_vr_c
+        0.0950  x264                     x264_predict_8x8_filter_c
+        0.0913  x264                     x264_mb_analyse_intra_chroma
+        0.0913  x264                     x264_mb_analyse_p_rd
+        0.0913  x264                     x264_ratecontrol_mb
+        0.0895  x264                     x264_mb_mc_8x8
+        0.0895  x264                     x264_predict_8x8_hd_c
+        0.0859  x264                     memcpy_aligned_16_16_neon
+        0.0859  x264                     x264_mb_mc_1xywh
+        0.0840  x264                     x264_pixel_satd_16x8_neon
+        0.0822  x264                     x264_cabac_encode_terminal_c
+        0.0822  x264                     x264_cabac_mb_mvd
+        0.0822  x264                     x264_pixel_sad_x3_16x8_neon
+        0.0822  x264                     x264_zigzag_scan_4x4_frame_neon
+        0.0804  x264                     deblock_h_chroma_c
+        0.0804  x264                     x264_predict_8x8_vl_c
+        0.0804  x264                     x264_predict_8x8c_p_neon
+        0.0786  x264                     x264_pixel_satd_16x16
+        0.0786  x264                     x264_predict_8x8c_dc_c
+        0.0749  x264                     x264_macroblock_deblock_strength
+        0.0731  x264                     x264_copy_column8
+        0.0731  x264                     x264_memcpy_aligned_neon
+        0.0694  x264                     x264_predict_8x8_ddl_c
+        0.0694  x264                     x264_predict_8x8_ddr_c
+        0.0676  x264                     x264_mb_analyse_inter_b8x16
+        0.0676  x264                     x264_mc_copy_w16_neon
+        0.0658  x264                     x264_deblock_h_luma_neon
+        0.0658  x264                     x264_predict_16x16_v_c
+        0.0658  x264                     x264_predict_4x4_hu_c
+        0.0658  x264                     x264_sub4x4_dct_neon
+        0.0658  x264                     x264_sub8x8_dct_dc_neon
+        0.0639  x264                     x264_intra_satd_x3_4x4
+        0.0639  x264                     x264_mb_analyse_inter_b16x8
+        0.0639  x264                     x264_me_refine_bidir_satd
+        0.0639  x264                     x264_pixel_satd_4x8_neon
+        0.0621  x264                     x264_cabac_mb_type
+        0.0621  x264                     x264_predict_16x16_dc_c
+        0.0603  x264                     x264_hpel_filter_h_neon
+        0.0603  x264                     x264_pixel_satd_8x16_neon
+        0.0585  x264                     x264_frame_expand_border_lowres
+        0.0566  x264                     x264_predict_4x4_ddr_armv6
+        0.0530  x264                     x264_macroblock_probe_skip
+        0.0530  x264                     x264_mc_weight_w8_neon
+        0.0530  x264                     x264_predict_16x16_p_neon
+        0.0512  x264                     memcpy_aligned_8_8_neon
+        0.0512  x264                     x264_mb_predict_mv_pskip
+        0.0512  x264                     x264_predict_8x8_hu_c
+        0.0512  x264                     x264_sub16x16_dct_neon
+        0.0493  x264                     x264_me_refine_qpel_refdupe
+        0.0493  x264                     x264_pixel_avg_w4_neon
+        0.0475  x264                     __fixsfsi
+        0.0475  x264                     x264_add4x4_idct_neon
+        0.0475  x264                     x264_cabac_mb_ref
+        0.0475  x264                     x264_intra_satd_x3_8x8c
+        0.0438  x264                     x264_intra_satd_x3_16x16
+        0.0438  x264                     x264_pixel_satd_8x4_neon
+        0.0438  x264                     x264_quant_2x2_dc_neon
+        0.0420  x264                     x264_weight_cost_luma
+        0.0402  x264                     x264_predict_8x8c_h_c
+        0.0384  x264                     __aeabi_fcmpgt
+        0.0384  x264                     x264_predict_4x4_dc_c
+        0.0365  x264                     x264_ac_energy_mb
+        0.0365  x264                     x264_slicetype_frame_cost
+        0.0347  x264                     x264_cabac_encode_bypass_c
+        0.0347  x264                     x264_deblock_v_luma_neon
+        0.0329  x264                     memcpy_aligned_16_8_neon
+        0.0311  x264                     x264_decimate_score15
+        0.0311  x264                     x264_intra_rd
+        0.0292  x264                     x264_cabac_mb_skip
+        0.0292  x264                     x264_var_end
+        0.0274  x264                     __cmpsf2
+        0.0274  x264                     x264_predict_4x4_h_c
+        0.0256  x264                     x264_pixel_avg_8x8_neon
+        0.0256  x264                     x264_pixel_avg_weight_w16_add_add_neon
+        0.0238  x264                     deblock_v_luma_intra_c
+        0.0238  x264                     x264_frame_expand_border
+        0.0238  x264                     x264_mc_weight_w8_offsetadd_neon
+        0.0238  x264                     x264_predict_4x4_ddl_neon
+        0.0219  x264                     x264_frame_expand_border_filtered
+        0.0219  x264                     x264_memzero_aligned_neon
+        0.0219  x264                     x264_pixel_var_8x8_neon
+        0.0201  x264                     x264_mb_cache_mv_b16x8
+        0.0201  x264                     x264_predict_4x4_v_c
+        0.0183  x264                     x264_cabac_encode_ue_bypass
+        0.0183  x264                     x264_macroblock_cache_load_neighbours_deblock
+         0.0164  x264                     idct_dequant_2x2_dconly
+         0.0164  x264                     x264_mb_analyse_transform_rd
+         0.0164  x264                     x264_pixel_avg_weight_w8_add_add_neon
+         0.0164  x264                     x264_predict_4x4_dc_armv6
+         0.0146  x264                     x264_prefetch_ref_arm
+         0.0128  x264                     x264_coeff_last15_neon
+         0.0128  x264                     x264_prefetch_fenc
+         0.0110  x264                     x264_add8x8_idct_dc_neon
+         0.0110  x264                     x264_add8x8_idct_neon
+         0.0110  x264                     x264_pixel_avg_16x16_neon
+         0.0110  x264                     x264_predict_8x8c_dc_neon
+         0.0110  x264                     x264_weight_scale_plane
+         0.0091  x264                     __aeabi_cfrcmple
+         0.0091  x264                     x264_adaptive_quant_frame
+         0.0091  x264                     x264_macroblock_tree_finish
+         0.0091  x264                     x264_mb_cache_mv_b8x16
+         0.0091  x264                     x264_pixel_avg_4x4_neon
+         0.0091  x264                     x264_predict_8x8c_v_c
+         0.0073  x264                     __aeabi_ui2f
+         0.0073  x264                     deblock_h_chroma_intra_c
+         0.0073  x264                     deblock_h_luma_intra_c
+         0.0073  x264                     x264_fdec_filter_row
+         0.0073  x264                     x264_frame_init_lowres
+         0.0073  x264                     x264_predict_16x16_h_neon
+         0.0073  x264                     x264_predict_4x4_h_armv6
+         0.0055  x264                     __aeabi_cfcmple
+         0.0055  x264                     __divdf3
+         0.0055  x264                     deblock_v_chroma_intra_c
+         0.0055  x264                     x264_dequant_4x4_dc_neon
+         0.0055  x264                     x264_encoder_encode
+         0.0055  x264                     x264_frame_filter
+         0.0055  x264                     x264_nal_escape_c
+         0.0055  x264                     x264_pixel_avg_8x16_neon
+         0.0055  x264                     x264_predict_16x16_dc_neon
+         0.0055  x264                     x264_rc_analyse_slice
+         0.0037  libpthread-2.9.so        /lib/libpthread-2.9.so
+         0.0037  x264                     __subsf3
+         0.0037  x264                     x264_analyse_init_costs
+         0.0037  x264                     x264_coeff_last4_arm
+         0.0037  x264                     x264_encoder_frame_end
+         0.0037  x264                     x264_predict_16x16_dc_top_neon
+         0.0037  x264                     x264_quant_4x4_dc_neon
+         0.0037  x264                     x264_sub8x8_dct_neon
+         0.0037  x264                     x264_weight_cost_init_luma
+         0.0018  libm-2.9.so              /lib/libm-2.9.so
+         0.0018  x264                     __aeabi_d2f
+         0.0018  x264                     __aeabi_f2d
+         0.0018  x264                     __aeabi_fcmplt
+         0.0018  x264                     __aeabi_uidivmod
+         0.0018  x264                     __cmpdf2
+         0.0018  x264                     __divdi3
+         0.0018  x264                     __muldf3
+         0.0018  x264                     __udivdi3
+         0.0018  x264                     bs_write_ue_big
+         0.0018  x264                     hpel_filter_neon
+         0.0018  x264                     optimize_chroma_dc
+         0.0018  x264                     x264_add16x16_idct_dc_neon
+         0.0018  x264                     x264_dct4x4dc_neon
+         0.0018  x264                     x264_frame_copy_picture
+         0.0018  x264                     x264_frame_push_unused
+         0.0018  x264                     x264_free
+         0.0018  x264                     x264_macroblock_cache_mv_4_2
+         0.0018  x264                     x264_macroblock_slice_init
+         0.0018  x264                     x264_pixel_avg_4x8_neon
+         0.0018  x264                     x264_pixel_avg_8x4_neon
+         0.0018  x264                     x264_predict_16x16_v_neon</pre>