Difference between revisions of "SoC 2010 Ogg Demuxer"

From VideoLAN Wiki

Revision as of 21:45, 10 June 2010

Week 1:

26th May 2010

I started by porting and adapting the existing code from the LiVES ogg/theora decoder.

I made some updates to ogg.c and added the new files oggseek.c and oggseek.h. Some definitions from ogg.c were moved into the new file ogg.h since they are now common to both ogg.c and oggseek.c. I also updated Modules.am with the new files.

As a first test, I was able to correctly determine the total number of frames in various ogg/theora files:

- seek to the last ogg page for a given stream (i.e. the theora stream)
- get the granulepos
- convert the granulepos to a frame number
- set the total number of frames

As a result the demuxer can now return the file length in microseconds. This can be seen in the interface at the bottom. Previously this figure was an estimate, but now it is accurate.
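
The conversion steps above can be sketched roughly as follows. This is only an illustration, not the code from oggseek.c: the function names are made up for the example, and granule_shift is assumed to be the keyframe granule shift read from the theora identification header. Depending on the theora bitstream version the resulting count may be off by one frame, so treat the result as a sketch of the idea.

```c
#include <assert.h>
#include <stdint.h>

/* Split a theora granulepos into keyframe number and frame offset,
 * and return their sum. granule_shift is assumed to come from the
 * theora identification header (hypothetical parameter name). */
static int64_t granulepos_to_frame(int64_t granulepos, int granule_shift)
{
    assert(granulepos >= 0);
    assert(granule_shift >= 0 && granule_shift < 32);

    int64_t keyframe = granulepos >> granule_shift;               /* upper bits */
    int64_t offset   = granulepos - (keyframe << granule_shift);  /* lower bits */
    return keyframe + offset;
}

/* Convert a frame count to a length in microseconds for a frame rate
 * given as the fraction fps_num / fps_den. */
static int64_t frames_to_usec(int64_t frames, uint32_t fps_num, uint32_t fps_den)
{
    assert(fps_num > 0);
    return frames * INT64_C(1000000) * fps_den / fps_num;
}
```

For example, with a granule shift of 6, a granulepos built from keyframe 64 and offset 5 maps to frame 69, and 250 frames at 25 fps give a length of 10 seconds.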


As a second test, I was able to seek roughly to any given frame or time in a clip. Right now it is not frame accurate, so there are still some minor artifacts after seeking. I plan to fix this properly in the next couple of days.

One thing which was slightly unexpected - vlc may (?) start playback before all of the stream header data is read. This means that when getting the frame count, for very small files we may read back into the header data. I need to investigate if this is a problem or not - so far it has not caused any issues.


27th May 2010

Today I hit the first problem. In order to seek to an inter frame, we need to rewind to the previous intra frame and then pre-roll to the target. However, when vlc does a pre-roll it does so at the normal frame rate. This can be uncomfortable for the user because it could require a pre-roll of quite a few frames: you can see the time slider moving at the bottom, but there is no video or audio playing until the pre-roll catches up. What I would like to know is whether there is a way to tell the player "give me the next packet as quickly as possible". Then it would zip through the pre-roll quickly and the user would not notice anything. I have emailed my mentor to ask this (since the player code lives outside of the demuxer).

Implementing pre-roll for theora meant a minor change to the theora.c codec file. This change should not impact any of the other demuxers.


28th May 2010

I will be changing ISPs today, so there may be some unavoidable disruption to my schedule.

Resolved the problem with slow pre-roll (thanks Fenrir !).

For future reference, use es_out_Control(), and send ES_OUT_SET_NEXT_DISPLAY_TIME to set the pre-roll target time.

ogg/theora is pretty much done now. I still need a clip with subtitles to test. There seems to be a strange problem with seeking in Kate files: it's not possible to seek to the middle of titles. I will take a look at this next week, but I think it is not an important issue.


TODO:

This week:

- seek to the exact frame required (i.e. skip frames until we hit the target frame) - done (but see above)

- work out how to tell if we are dealing with a remote stream (which has no seek capabilities) - done

- check audio and subtitle sync after a seek - done

- test with more ogg/theora clips (including very small clips, see above) - postponed due to ISP change, OS upgrade and other issues - done

- clean up the code - done (but left some debug output in to help with the dirac seeking)

- see if we can pull/use more meta data from the stream (e.g. comments, codec version, aspect ratio, frame and picture size, x and y offsets) - done (no changes needed/possible)


Week 2:

- look briefly into Kate issue, see if it is trivial to fix
- check with very small file and with captions
- look at meta data


I plan to backport the updated code into LiVES, since I made a few corrections and improvements to it. I will also split the LiVES code into demuxer/decoder - this will be necessary for the next part.


31 May

I had a look at the metadata, it seems that comments and so on are already read and included by the theora decoder. So no changes are needed here. It would be nice to show the codec version, but this seems not to be possible.

Regarding the Kate issue, I have now discovered that Kate subtitles are not being read on my machine - vlc is complaining that it has no decoder for them ! I will look into this. I am testing with the Elephants Dream clip with subtitles - audio and video seek are working perfectly, so I would imagine that subtitles will be fine also.

The "kate issue" mentioned from last week actually seems to have nothing to do with Kate, in fact it seems to be peculiar to that one clip - vlc also has some strange timing issues with it.


1 Jun

OK, managed to test Kate (seems like I was missing the Kate libraries), and all seems OK. Got a bit sidetracked today by some urgent issues in LiVES, and an upgrade from Ubuntu Karmic to Lucid.


2 Jun

All issues have been resolved now, apart from one item which came up during my testing. For clips with a small number of frames and few changes between frames (e.g. 10 blank frames), we are unable to find the keyframe. I believe this is due to the minimum size of the search domain (8500 bytes). The solution may be to keep reducing the search size until we find the keyframe. I will test this as part of the backporting into LiVES. Apart from this everything is working nicely for ogg/theora. I will have a first patch ready this week.


3 Jun

Preparing LiVES codebase for backport of code from vlc.

Sent first set of patches to vlc-devel.

Today was a holiday in Brazil, so I worked only half a day.


4 Jun

Some minor problems with the first patch were fixed, and it was checked and resubmitted.

I have now backported most of the updated code back into LiVES - except for some things to do with metadata.




Week 3:

Still todo:

- grab extra metadata in LiVES.

- see if LiVES could use the decoder codecs from vlc

- look into the 1 remaining oggseek issue


Investigate how seeking works with dirac video streams (it is slightly different to theora in that there is a lower bound, but no upper bound). Test implementation of this in LiVES first.


7 Jun

Worked on some formatting issues to do with the patch I submitted last week.


8 Jun

Working on some more formatting issues to do with the patch.


9 Jun

I hope the patch is OK now, I think I covered all of the issues mentioned in feedback.


Here is a brief explanation of how the seeking works:


We create an index on the fly. This index is basically: offset in file -> maximum frame number

When asked to seek to a particular frame, we first check the index to see if we have approximate boundaries for the seek, otherwise we will seek in the whole file, from data_start until the end.

The area to be searched is divided into two halves. We first check the upper half, and get the highest and lowest granulepos. (The granulepos is basically keyframe * keyframe_offset + frame offset). If our target frame lies in this region we subdivide it into two halves, otherwise we check the lower half from earlier. We stop when we have found the keyframe for our target frame, or the search region is < minimum_page_size.

What we are aiming to find on this first pass, is the highest keyframe (sync point) which is <= target frame.
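
To illustrate the goal of this first pass, here is a simplified sketch of a bisection that finds the highest sync point <= the target frame. It is not the actual oggseek.c code: instead of reading ogg pages from the file, it searches a hypothetical in-memory array of sync points (sync_point_t and find_sync_point are names invented for this example), and it assumes the array is sorted by keyframe number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sync points found while scanning pages. */
typedef struct {
    int64_t offset;    /* byte offset of the page in the file */
    int64_t keyframe;  /* frame number of the keyframe on that page */
} sync_point_t;

/* Return the index of the highest keyframe <= target, or -1 if there is
 * none. points must be sorted by ascending keyframe number. */
static ptrdiff_t find_sync_point(const sync_point_t *points, size_t count,
                                 int64_t target)
{
    assert(points != NULL || count == 0);

    size_t lo = 0, hi = count;      /* search the half-open range [lo, hi) */
    ptrdiff_t best = -1;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (points[mid].keyframe <= target) {
            best = (ptrdiff_t)mid;  /* candidate; a later one may be closer */
            lo = mid + 1;           /* continue in the upper half */
        } else {
            hi = mid;               /* target must be in the lower half */
        }
    }
    return best;
}
```

With sync points at frames 0, 64, 128 and 192, a seek to frame 130 selects the entry for keyframe 128. In the real demuxer each step of the loop means reading pages from the file, which is why caching the results in an index pays off.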

Once we have found this, we need to rewind a bit further, because the ogg container only discloses where a frame ends, not where it begins. So we do a second pass and find the highest granulepos < target granulepos from the last step. We begin decoding from here, ignoring any frames which are output on this first page. We then start counting down until we reach the target frame.

As we discover keyframes (sync points) these are added to the index. Also, if we discover a higher frame number which is based on the same keyframe we update our index. Additionally, during normal playback the index is updated with keyframes as we play them.


If the codec/demuxer is installed and working properly you can see this in operation - the first seek takes a noticeable fraction of a second, subsequent seeks become increasingly faster as the keyframe -> highest frame index is built up.


I believe this is the most efficient way of seeking in ogg, at least for theora and probably for dirac. Dirac seems a little different, though: there is a lower bound but no upper bound, and according to the dirac spec the granulepos shows the first frame decoded on a page rather than the last frame.

There is currently one known issue: if the entire file is < min_page_size, we never find any keyframes. I am working on a fix for this. I believe the solution in this case is simply to divide the min_page_size by 2 until a keyframe is found.



Also, one issue which I suggested on the vlc-devel mailing list:


I would like to propose a new flag for the stream:

STREAM_CAN_EXACTSEEK

- the proposed meaning of this flag is that within the stream one can seek exactly to any given frame without artifacts in the frame. This flag must be settable by the demuxer plugin, and is not fixed - for example you could have a container with two video streams one after the other, the first could set this flag and the second (using a different codec) could be not seekable.

Rationale: I understand that you are creating libvlc with the intention of this being used for video editing applications. I know from my own experience with LiVES that such applications require demuxers which can deliver the exact frame requested, so generally one would need to look at STREAM_CAN_EXACTSEEK | STREAM_CAN_FASTSEEK to see if the stream is immediately usable or requires further processing (caching, indexing, etc).
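
As a rough sketch of how an editing application might use the two capabilities together: the code below models them as bit flags, which is an assumption made only for this example. STREAM_CAN_EXACTSEEK is the flag proposed above and does not exist yet, the bit values are invented, and stream_ready_for_editing is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for this sketch only. */
enum {
    STREAM_CAN_FASTSEEK  = 1 << 0,
    STREAM_CAN_EXACTSEEK = 1 << 1   /* the proposed new flag */
};

static_assert((STREAM_CAN_FASTSEEK & STREAM_CAN_EXACTSEEK) == 0,
              "capability flags must not overlap");

/* True if the stream can be used for editing as-is, i.e. it can seek
 * both quickly and to the exact frame. Otherwise the application would
 * need further processing (caching, indexing, etc). */
static bool stream_ready_for_editing(unsigned caps)
{
    const unsigned needed = STREAM_CAN_EXACTSEEK | STREAM_CAN_FASTSEEK;
    return (caps & needed) == needed;
}
```

Since the flag is settable per stream, the application would repeat this check whenever the active stream changes.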


Just for fun today I started creating an avformat demuxer for LiVES based on the vlc avformat demuxer.


10 Jun

I played around a bit more with the avformat demuxer in LiVES to see if it would be useful at all for the ogg filter, but I found it is totally unsuitable for serious video editing. Quite often it gives the wrong frame rate (1000 fps) and it is pretty much useless for seeking. The seek function doesn't even attempt to find a keyframe, and the display_picture_number is always 0. The only things it seems to get right every time are the frame size and pixel format, and the audio format.



Week 4:

Continue work from week 3, and begin porting dirac code from LiVES to vlc.



Week 5+:

TBD. (Maybe look at x and y offsets, maybe look at multiple clips within one ogg container, possibly add ogg skeleton support).

Unofficial codecs in ogg (ogm, e.g. PCM audio, divx, xvid) ?

Frei0r effects ?

Jack audio output ?

dv demuxer / decoder ?

Port other demuxers/decoders to LiVES ?


Note:

I will be mostly unavailable after 29th July (attending Piksel Summer Camp and then family commitments), but I should have all of the coding for the main project done well before that.