This page describes how AviSynth works internally (most of this information is deduced from the AviSynth source code, because documentation is sparse and the AviSynth developers are hard to reach) and the GSoC 2009 project "AviSynth plugin wrapper for GStreamer", codenamed "GstAVSynth".

AviSynth is a framework for processing video (and, to some extent, audio) and serving the result to an application frame by frame (sample by sample) as an AVI file (hence the name) via the Windows Multimedia AVIFile API.

To do that, AviSynth constructs a graph of AviSynth filters. Filters either process video/audio frames or produce them (source filters). The filter graph is constructed from an AviSynth script written by the user in the AviSynth scripting language. Filters can be implemented as dynamically loadable modules (plugins), as scripts, or as both: AviSynth exposes its scripting functionality to plugins, so a plugin may rely on other plugins just as a script would.

AviSynth internals

AviSynth exists as an instance of the ScriptEnvironment class, which implements the IScriptEnvironment interface. The IScriptEnvironment interface is defined in avisynth.h and is the only way for plugins and external applications to communicate with AviSynth. A pointer to the ScriptEnvironment object is passed to all plugins so that they can use AviSynth facilities. Plugins are forbidden from storing this pointer. AviSynth creates one ScriptEnvironment object when Windows attempts to use AviSynth to open a script via the AVIFile API.

When a ScriptEnvironment is created, it checks for CPU extensions (to report them to filters that are able to use them), sets up memory limits for itself and pre-scans all plugins.

Plugin loading

AviSynth finds all dynamically loadable modules in its plugin directory and looks for an initialization function in each of them. If such a function is found, AviSynth calls it, passing the ScriptEnvironment object to it.

The initialization function usually does just one thing: it calls the AddFunction method of the ScriptEnvironment (multiple times if the plugin implements more than one AviSynth filter).

When AviSynth pre-scans the plugins, AddFunction works in pre-scanning mode: it only stores the function name and the function parameter specification string in the function table, which holds information about each plugin and about all functions that the plugin implements.

When a function is invoked from a script or from another plugin, it is looked up first in the local function table (loaded functions), then in the pre-scanned plugin function table (functions that are not loaded yet; if the function is found in this table, the appropriate plugin is loaded and its functions are added to the local function table), then in the built-in function table (functions that are part of the AviSynth core and do not require loading).

Plugins are not prohibited from calling AddFunction with the same arguments more than once. The pre-scanned function table will then hold duplicate entries, but only the most recent entry will be found on a look-up. Plugins are allowed to add functions that override any of the built-in functions except LoadPlugin.
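
The look-up order and the "most recent entry wins" rule can be modelled roughly as follows. This is a Python sketch of the behaviour described above, not AviSynth code: the class FunctionTables, its method names, the plugin name and the parameter strings are all hypothetical, and rejecting a LoadPlugin override with an exception is an assumption about how the restriction is enforced.

```python
class FunctionTables:
    """Toy model of the three function tables (all names are hypothetical)."""

    def __init__(self, builtins):
        self.local = []        # functions of plugins that are already loaded
        self.prescanned = []   # (name, param_spec, plugin) found by pre-scanning
        self.builtins = dict(builtins)  # functions that are part of the core

    def add_function(self, name, param_spec, plugin, prescan):
        # Assumption: an attempt to override LoadPlugin is simply rejected.
        if name == "LoadPlugin":
            raise ValueError("LoadPlugin cannot be overridden")
        table = self.prescanned if prescan else self.local
        table.append((name, param_spec, plugin))  # duplicates are kept

    def lookup(self, name):
        # 1. Loaded functions; scan backwards so the newest entry wins.
        for entry in reversed(self.local):
            if entry[0] == name:
                return ("local",) + entry[1:]
        # 2. Pre-scanned functions; a hit "loads" the plugin by copying all
        #    of its functions into the local table.
        for entry in reversed(self.prescanned):
            if entry[0] == name:
                plugin = entry[2]
                self.local += [e for e in self.prescanned if e[2] == plugin]
                return ("prescanned",) + entry[1:]
        # 3. Built-in functions.
        if name in self.builtins:
            return ("builtin", self.builtins[name], None)
        return None


# Illustrative data only; the parameter strings are made up.
tables = FunctionTables({"Crop": "ciiii", "LoadPlugin": "s+"})
tables.add_function("Crop", "c[left]i", "mycrop.dll", prescan=True)
tables.add_function("Crop", "c[left]i[top]i", "mycrop.dll", prescan=True)

first = tables.lookup("Crop")    # served from the pre-scanned table
second = tables.lookup("Crop")   # served from the local table
```

In this model the plugin's Crop shadows the built-in one because the built-in table is consulted last.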

Filter graph building

An AviSynth script consists of a list of statements. Most statements are expressions with common operations (assignment, addition, subtraction etc.) and function calls. If the top-level expression of a statement is not an assignment, the result of that expression is implicitly assigned to the "last" variable. Also, if the first argument of a function is of type "clip" and it is not specified in the function call, its value is implicitly taken from the "last" variable. Thus a script

Func1()
Func2(foo=bar)

where Func1() returns a clip object and Func2 takes a clip object and an argument named "foo", is implicitly equivalent to

a = Func1()
b = Func2(a, foo=bar)
return b

A script must return a clip object, and most functions return clip objects rather than integers or strings. A clip object is an object of a class that implements the IClip interface. AviSynth builds a filter graph from these objects when it first parses a script. Then it starts calling the GetFrame() method of the last clip in the graph (the clip object that was implicitly assigned to the "last" variable in the last statement, or the clip object that was returned by a "return" statement) to get the next frame to serve to an application.

A clip object usually makes a few calls to the GetFrame() methods of its clip children (upstream filters in the filter graph) and processes the buffer(s) it gets (by rewriting parts of a buffer, or by creating a new buffer). Source filters don't call GetFrame(), since they have no children. They generate frame buffers by themselves (usually by reading from an external source).

AviSynth scripts have a few special statements that are re-evaluated for every frame; all other statements are only evaluated once, when the script is pre-parsed and the filter graph is created. AviSynth plugins, on the other hand, are free to do any per-frame calculations, since they work through GetFrame().

AviSynth plugins can call the Invoke method of the script environment to invoke other plugins (external or built-in ones). AviSynth allows such calls to be made inside the GetFrame() method, although the developers advise against it: Invoke builds a filter graph branch (which is the desired behaviour while the filter graph is being built), which is then used to get the necessary frame and is destroyed afterwards (because GetFrame(), unlike filter object constructors, does not preserve the branch). Obviously, this causes a performance drop.

AviSynth plugins are allowed to work with AviSynth script variables and can manipulate the scope. Together with the Invoke method, this gives AviSynth plugins all the possibilities that are available to AviSynth scripts.
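
The pull model described above can be illustrated with a toy graph. This is a Python sketch, not the IClip interface: the classes Source and TemporalAverage and the method name get_frame are hypothetical stand-ins, and a "frame" is reduced to a single number.

```python
class Source:
    """Source filter: no children, produces frames by itself."""

    def __init__(self, num_frames):
        self.num_frames = num_frames

    def get_frame(self, n):
        return float(n)  # stand-in for a decoded frame buffer


class TemporalAverage:
    """Filter with one child: averages frames n-1, n and n+1 of its child."""

    def __init__(self, child):
        self.child = child
        self.num_frames = child.num_frames

    def get_frame(self, n):
        # Clamp neighbour indices to the valid range at the clip boundaries.
        wanted = [min(max(i, 0), self.num_frames - 1) for i in (n - 1, n, n + 1)]
        frames = [self.child.get_frame(i) for i in wanted]
        return sum(frames) / len(frames)


# Graph: Source -> TemporalAverage -> TemporalAverage. The application only
# ever pulls frames from the last clip; requests propagate upstream.
last = TemporalAverage(TemporalAverage(Source(num_frames=10)))
frame = last.get_frame(5)
```

Note that a single request for frame 5 on the last clip makes the source produce frames 3 to 7, some of them more than once, which is the kind of repeated work the caches between filters are meant to absorb.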

Cache

When AviSynth builds a filter graph from a script, it inserts a cache at each link. An Invoke call, however, does not cause video frames to be cached, which is why AviSynth exposes cache-related functions via IScriptEnvironment.

GstAVSynth

GstAVSynth will implement some of the AviSynth interfaces (mostly IScriptEnvironment), which allows it to interact with AviSynth plugins.

GstAVSynth is aimed at video filters only, because AviSynth audio capabilities are unimpressive and audio processing is implemented separately in AviSynth.

ScriptEnvironment instantiation

While plugin initialization functions usually only call the AddFunction method of the ScriptEnvironment, they are not prohibited from calling other methods too. Still, pre-scanning of the plugins (the process of finding plugin libraries, querying them for function names and creating GStreamer element classes based on this data) can be done without a fully functional ScriptEnvironment object.

A GStreamer pipeline may have to use AviSynth plugins in more than one place, and these plugins will not have any AviSynth-specific interactions among them, only generic GStreamer linkage. This is roughly equivalent to an application opening two AviSynth scripts at once, each script with its own ScriptEnvironment.

While sharing one ScriptEnvironment among all GstAVSynth elements may look attractive, it may introduce serious threading-related issues, and it also requires the ScriptEnvironment to exist outside of GstAVSynth elements, probably as a singleton.

Plugin pre-scanning

GstAVSynth pre-scans the plugins when it is loaded (i.e. in the plugin_init() function). It creates a dummy ScriptEnvironment object that only supports the AddFunction() method and does nothing when other methods are called.

Filter graph building

GstAVSynth will not implement the Invoke() method, since a quick survey has shown that most plugins (at least the ones used in MeGUI) do not use Invoke to call anything other than trivial built-in functions (usually caching and colorspace conversion). Some plugins may be designed to work with other plugins (such as UnDot/Deen or Telecide/Decimate), but supporting that would require a full-featured AviSynth script engine, which is out of scope for the GstAVSynth project.

Thus, GstAVSynth will not build a filter graph, since each instance of GstAVSynth will wrap exactly one filter.

A GstAVSynth element will have two properties:

* Initial cache size
* Name of the function to call

All other properties will be interpreted as function parameters, except for clip argument(s). Each clip argument will be interpreted as a sink pad. The first pad will be an always-pad; the other pads will be sometimes-pads, because some filters that accept two or more clips may work fine with just one clip. Each filter will have one source pad.

Some filters return something other than an IClip-derived object, meaning that instead of building a chain of filters AviSynth should apply the filter for each frame. GstAVSynth won't support that: each filter must return an IClip-derived object (one that implements the GetFrame() method), otherwise the filter is considered unusable.

Some properties may conflict with generic element properties, such as "name". A good way of solving this is to interpret properties as AviSynth filter properties only when their names are prefixed with some string, like "avs_". This is yet to be resolved.
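
The mapping from a filter's parameters to pads and properties might be sketched like this. It is a Python illustration only, and it assumes the parameter specification string uses the common AviSynth convention (type letters c, i, f, s, b and "." with an optional "[name]" prefix and an optional "+" or "*" suffix). The function map_parameters, the pad names and the "argN" names for unnamed parameters are hypothetical; the repeat suffix is parsed but otherwise ignored.

```python
import re

# One parameter: optional "[name]", a type letter, optional "+" or "*".
PARAM = re.compile(r"(?:\[(\w+)\])?([cifsb.])([+*]?)")


def map_parameters(spec):
    """Split a parameter spec into sink pads (clips) and element properties."""
    pads, properties = [], []
    pos = 0
    while pos < len(spec):
        m = PARAM.match(spec, pos)
        if m is None:
            raise ValueError("bad parameter spec at offset %d" % pos)
        name, type_letter, _repeat = m.groups()
        pos = m.end()
        if type_letter == "c":
            # First clip becomes an always-pad, the rest are sometimes-pads.
            presence = "always" if not pads else "sometimes"
            pads.append(("sink%d" % len(pads), presence))
        else:
            # Unnamed parameters get a made-up positional name.
            prop = name or "arg%d" % len(properties)
            properties.append((prop, type_letter))
    return pads, properties


# Made-up spec: two clips, an integer "strength" and a string "mode".
pads, properties = map_parameters("cc[strength]i[mode]s")
```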

Cache

GstAVSynth will have an internal cache that stores N frames. It will look much like a ringbuffer, but that is only a matter of definition, because it will not be built on the same concurrent producer/consumer model as audioringbuffer, where the producer starts to overwrite buffer elements when the consumer can't read them fast enough.

GstAVSynth cache will provide random access to N frames that reached its sink. For GstAVSynth elements with more than one sink a separate cache will be maintained for each sink.

GstAVSynth will invoke the underlying AviSynth plugin right away and will keep getting (and caching) frames from upstream until the plugin gets all the frames it requires to produce one frame. This is true for both pull and push modes.

The cache size will grow beyond the initial size if the plugin requests a frame outside of the cached frame range.

The cache size will not grow beyond a memory limit, deduced from the amount of free memory available at GstAVSynth element creation time. When the memory limit is reached, GstAVSynth will raise a fatal error. That is the smartest behaviour at the moment; later it could be possible to implement out-of-cache-range-and-beyond-memory-limit frame requests via seeks, probably with a negative playback rate.

Each buffer placed in the cache will have a "touched" flag. Each time a buffer is requested by a filter, its "touched" flag will be set. After a filter has produced one frame, the cache will find all untouched frames and remove them from the cache. It will also reset the "touched" flags on all the frames still cached. This ensures that unnecessary frames do not stay in the cache. It is possible to use a counter instead of a flag, and to uncache a frame only when it remains untouched for C cycles (1 cycle = 1 call to the filter's GetFrame() method; C is a function of N, the cache size).
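
A minimal Python sketch of this eviction rule, using the counter variant so that max_idle=1 behaves exactly like the plain flag. The class FrameCache and its method names are hypothetical and say nothing about how the real cache is laid out.

```python
class FrameCache:
    """Toy cache keyed by source frame number (names are hypothetical)."""

    def __init__(self, max_idle=1):
        self.max_idle = max_idle   # C: cycles a frame may stay untouched
        self.frames = {}           # number -> [buffer, touched, idle cycles]

    def store(self, number, buffer):
        self.frames[number] = [buffer, False, 0]

    def get(self, number):
        entry = self.frames[number]
        entry[1] = True            # mark the frame as used in this cycle
        return entry[0]

    def end_cycle(self):
        """Call after the filter has produced one output frame."""
        dropped = []
        for number, entry in list(self.frames.items()):
            if entry[1]:
                entry[1], entry[2] = False, 0   # reset for the next cycle
            else:
                entry[2] += 1
                if entry[2] >= self.max_idle:
                    del self.frames[number]     # nobody asked for it
                    dropped.append(number)
        return dropped


cache = FrameCache()
for n in range(7):
    cache.store(n, "frame %d" % n)
for n in range(2, 7):              # the filter reads source frames 2..6
    cache.get(n)
dropped = cache.end_cycle()        # frames 0 and 1 were not touched
```

With max_idle=2 the same frames would survive one extra cycle, which trades memory for fewer re-fetches when a filter revisits frames irregularly.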

The touched-flag scheme also allows the cache to keep up with a filter that changes the framerate. Imagine a filter that takes 5 frames to produce 1 frame and that reduces the framerate by a factor of 2 (say, 24 fps -> 12 fps). For simplicity, the initial cache size is 0. Also for simplicity, we assume that the cache blocks the incoming data flow even if it has enough space for new buffers (which essentially reduces a 2-threaded parallel system to a 1-threaded system with context switching):

Cache: empty, cache size is 0

AVSynth asks filter for frame #0
Filter asks for source frame #0
Cache blocks until it gets source frame #0
Cache: 0
Cache size is 1
Cache returns source frame #0

Filter asks for source frame #1
Cache blocks until it gets source frame #1

...

Cache: 0 1 2 3 4
Cache size is 5
Cache returns source frame #4

Filter returns frame #0 (made from source frames #0-4)
AVSynth pushes it downstream.
Cache looks for untouched frames (finds none).

AVSynth calls the filter for frame #1
Filter asks for source frame #2 (because it skips frames, for frame #1 it requires source frames #2-6, not #1-5)
Cache: 0 1 2 3 4
Cache returns source frame #2

...

Cache returns source frame #4

Filter asks for source frame #5
Cache: 0 1 2 3 4
Cache blocks until it gets source frame #5
Cache: 0 1 2 3 4 5
Cache size is 6
Cache returns source frame #5

Filter asks for source frame #6
Cache: 0 1 2 3 4 5
Cache blocks until it gets source frame #6
Cache: 0 1 2 3 4 5 6
Cache size is 7
Cache returns source frame #6

Filter returns frame #1 (made from source frames #2-6)
AVSynth pushes it downstream
Cache looks for untouched frames, finds frames #0 and #1
Cache: 2 3 4 5 6 x x
Cache size is still 7

AVSynth calls the filter for frame #2 (requires source frames #4-8)
Filter asks for source frame #4
...
Filter asks for source frame #7
Cache: 2 3 4 5 6 x x
Cache blocks until it gets source frame #7
Cache: 2 3 4 5 6 7 x
Cache size is still 7
Cache returns source frame #7

Filter asks for source frame #8
Cache: 2 3 4 5 6 7 x
Cache blocks until it gets source frame #8
Cache: 2 3 4 5 6 7 8
Cache size is still 7
Cache returns source frame #8

Filter returns frame #2 (made from source frames #4-8)
AVSynth pushes it downstream
Cache looks for untouched frames, finds frames #2 and #3
Cache: 4 5 6 7 8 x x
Cache size is still 7

As long as the filter keeps skipping over 1 frame in each cycle and requesting 5 frames, a cache of size 7 will be enough. If the filter at this point asks for source frame #2 (for some reason):

Cache: 4 5 6 7 8 x x
Cache clears itself, resizes to 7 + (4 - 2) = 7 + 2 = 9, seeks back to source frame #2, blocks until it gets source frame #2
Cache: 2 x x x x x x x x
Cache size is now 9
Cache returns source frame #2

Filter asks for source frame #6 (falling back to familiar pattern, it needs source frames #6-10)
Cache: 2 x x x x x x x x
Cache blocks until it gets source frame #6
Cache: 2 3 4 5 6 x x x x
Cache size is still 9
Cache returns source frame #6

The cache will expand until its size is enough to sustain the filter's activity without seeking backwards.

In a real environment it is undesirable to keep the cache just large enough for the filter to produce a single frame. The cache should be large enough to refill itself during the time the filter spends producing a frame (after it has got all the frames it needs for the job), which is not possible in the model described above (the cache removes some buffers and frees 2 slots only after a frame is produced). Because the cache can't know in advance which frames to discard, it should grow larger than R + J (R is the number of frames required to produce one frame, in this case R = 5 (we won't count the abnormal jump back to frame #2); J is the frame jump per cycle, in this case J = 2). Possible solutions:
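
The walk-through above can be replayed with a small simulation. This is a Python sketch under the same simplifications (single-threaded, upstream delivers frames strictly in order); the class GrowingCache and its fields are hypothetical, and "blocking" is modelled by simply taking the next frame from an imaginary upstream.

```python
class GrowingCache:
    """Simulates cache growth and backward seeks (names are hypothetical)."""

    def __init__(self):
        self.size = 0            # number of slots
        self.frames = {}         # source frame number -> touched flag
        self.next_upstream = 0   # next frame upstream will deliver
        self.seeks = 0

    def get_frame(self, n):
        if self.frames and n < min(self.frames):
            # Request before the cached range: grow by the distance,
            # clear the cache and seek upstream back to frame n.
            self.size += min(self.frames) - n
            self.frames.clear()
            self.next_upstream = n
            self.seeks += 1
        while n not in self.frames:
            # "Block" until upstream delivers the next frame; grow if full.
            if len(self.frames) == self.size:
                self.size += 1
            self.frames[self.next_upstream] = False
            self.next_upstream += 1
        self.frames[n] = True
        return n

    def end_cycle(self):
        # Drop untouched frames, reset the flags of the remaining ones.
        self.frames = {n: False for n, t in self.frames.items() if t}


cache = GrowingCache()
sizes = []
for k in range(3):                       # output frames #0, #1, #2
    for n in range(2 * k, 2 * k + 5):    # each needs source frames 2k..2k+4
        cache.get_frame(n)
    cache.end_cycle()
    sizes.append(cache.size)             # sizes after each cycle

cache.get_frame(2)                       # the abnormal jump back
size_after_jump = cache.size
cache.get_frame(6)
cached_after_jump = sorted(cache.frames)
```

The recorded sizes follow the trace: 5 after the first output frame, then 7, then 7 again, and 9 after the jump back to source frame #2.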

#1 Wait until the cache size stabilizes (remains the same for L - X frames, where L is the number of frame that caused the last cache resize, and X is the number of frame that caused the last cache reset (and a seek backwards)) and then enlarge the cache by the factor of 2.

#2 Remember the number of frames (T) touched in the last cycle. When the cache becomes full, enlarge it right away, unless the cache size is larger than Y*T (Y is a positive integer constant, equal to 2 or 4 maybe). This is the solution that got implemented.
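
Solution #2 boils down to a small decision rule, sketched here in Python. The function name and parameters are hypothetical. Growing by one slot at a time and treating T = 0 (no completed cycle yet) as "no limit" are assumptions, since the text does not specify either.

```python
def next_cache_size(size, used, touched_last_cycle, y=2):
    """Return the cache size to use after a frame arrives (sketch of #2).

    size               -- current number of slots
    used               -- number of slots currently occupied
    touched_last_cycle -- T, frames touched during the previous cycle
    y                  -- Y, the positive integer constant (2 or 4 maybe)
    """
    cache_is_full = used >= size
    # Assumption: with T == 0 there is no basis for a limit yet.
    too_large = touched_last_cycle > 0 and size > y * touched_last_cycle
    if cache_is_full and not too_large:
        return size + 1   # assumption: grow one slot at a time
    return size


# With T = 5 and Y = 2 the cache stops growing once it exceeds 10 slots.
grown = next_cache_size(size=10, used=10, touched_last_cycle=5)
capped = next_cache_size(size=11, used=11, touched_last_cycle=5)
```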

Event handling

This section should describe the big picture of how GstAVSynth should behave when something happens.

Flush start/stop

A flush start event, regardless of where it comes from (upstream or downstream), makes elements discard any further data they get from upstream. A flush stop event makes elements discard any data they already hold, including segment information, and start accepting data from upstream again. A Newsegment event is to be expected after flush stop. Flushing usually occurs in response to a seek event or some other flow interruption.

When a flush arrives on one of the sinkpads, it should only affect the part of the pipeline connected to that pad. There is no reason to believe that data arriving on other sinks should be flushed as well. The flush event should not be passed downstream either. If a seek event coming from downstream caused one of the upstream pipelines to flush, the other pipelines are likely to flush too. Passing every flush on would flush the downstream pipeline more than once, which doesn't make any sense. Instead, GstAVSynth should emit flush events by itself when it thinks it is right to do so (when seeking occurs, for example). If GstAVSynth did a seek on an upstream pipeline by itself, it will get flush events and therefore should not pass them downstream.

A flush stop coming from upstream invalidates the segment information for the pad it comes onto. Pad will discard upstream buffers after flush start and until it gets newsegment event. Flush stop should also clear EOS state (see below) from a pad, so that it won't cause downstream EOS event, but will wait for more frames (or issue a seek) instead.

When a flush comes to the source pad, it affects the whole element and should be passed upstream through all sinkpads.

EOS

EOS only comes from upstream. GstAVSynth should collect EOS on a sinkpad without forwarding it downstream. Instead, GstAVSynth should emit an EOS event once it runs out of data on a sinkpad that has already received EOS (that is, when GetFrame() of its cache is asked for a frame that is not cached yet): it won't get the requested frame from upstream, so the underlying filter can no longer produce frames, hence the EOS sent downstream. GstAVSynth then stops pushing data downstream. From this point on, GstAVSynth should discard all buffers coming from upstream and return GST_FLOW_UNEXPECTED, telling upstream elements that it is EOS indeed (demuxers do that to sources; is it appropriate for a filter to do that?).
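
The EOS rule, together with the clearing of the EOS state on flush stop, can be modelled as a small state machine. This is a Python sketch, not GStreamer code: the classes SinkPad and Element, their methods and the string return values are hypothetical, with "unexpected" standing in for GST_FLOW_UNEXPECTED.

```python
class SinkPad:
    """Per-pad state (hypothetical model)."""

    def __init__(self):
        self.cached = set()    # frame numbers currently in this pad's cache
        self.got_eos = False   # EOS collected from upstream, not forwarded

    def on_eos(self):
        self.got_eos = True

    def on_flush_stop(self):
        self.got_eos = False   # more data (or a seek) may follow
        self.cached.clear()    # flush stop discards held data


class Element:
    """Element-wide state (hypothetical model)."""

    def __init__(self, pads):
        self.pads = pads
        self.eos_sent = False

    def request_frame(self, pad, n):
        """Called when the filter asks the pad's cache for source frame n."""
        if n in pad.cached:
            return "ok"
        if pad.got_eos:
            self.eos_sent = True   # the filter can never be satisfied
            return "eos"
        return "wait"              # block until upstream delivers the frame

    def on_buffer(self, pad, n):
        if self.eos_sent:
            return "unexpected"    # stand-in for GST_FLOW_UNEXPECTED
        pad.cached.add(n)
        return "ok"


pad = SinkPad()
element = Element([pad])
element.on_buffer(pad, 0)
pad.on_eos()
still_cached = element.request_frame(pad, 0)   # cached frames stay usable
missing = element.request_frame(pad, 1)        # triggers downstream EOS
late = element.on_buffer(pad, 2)               # rejected after EOS
```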

Newsegment

Newsegment is sent downstream and indicates new playback start/stop positions, playback rate and speed. Newsegment always precedes first buffer being sent after a seek (or first buffer sent after pipeline construction). Sinks should drop buffers that are not in current segment range. Newsegment without a flush event doesn't reset stream time to 0.

Each sinkpad of GstAVSynth stores a segment it gets from upstream to use its start/stop values to clip incoming data.
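
Clipping against the stored start/stop values amounts to an interval intersection. The Python function below is a simplified sketch with hypothetical names, and it treats buffers as half-open ranges; GStreamer's own gst_segment_clip() helper handles more cases than this.

```python
def clip_to_segment(buf_start, buf_stop, seg_start, seg_stop):
    """Return the part of [buf_start, buf_stop) inside the segment, or None.

    seg_stop may be None, meaning the segment has no known end.
    """
    if seg_stop is not None and buf_start >= seg_stop:
        return None                      # buffer lies after the segment
    if buf_stop <= seg_start:
        return None                      # buffer lies before the segment
    start = max(buf_start, seg_start)
    stop = buf_stop if seg_stop is None else min(buf_stop, seg_stop)
    return start, stop


inside = clip_to_segment(0, 10, 5, 20)      # partially inside
outside = clip_to_segment(25, 30, 5, 20)    # entirely after the segment
open_end = clip_to_segment(15, 30, 5, None)
```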

It is not obvious what GstAVSynth should do with rate/applied_rate information, if it is different for some of its sinkpads.

GstAVSynth should compare the segment duration with the duration it got from the last duration query, and if the duration is different, it should recreate the filter: it is assumed that most filters will not request the incoming frame's VideoInfo (which contains the number of frames, i.e. the duration) repeatedly, but do it once when they are created.

Other segment fields are of no interest. GstAVSynth maintains its own segment, which is calculated from the VideoInfo of the output frames, and should send Newsegment events downstream when this segment changes.

Tag

It is not obvious what to do with metadata.

QOS

GstAVSynth cannot adjust its quality. It is expected from GstAVSynth to perform full and precise processing of the source material in its complete and best form and it should never be expected to be fast enough to match realtime playback.

Seek

Seek event may come from downstream to request different playback position and rate in the stream. Seek may be flushing (to discard any data and jump to the position specified), precise (to take the time necessary to jump precisely) and/or segment (to make the pipeline post SEGMENT_DONE message instead of EOS event when a segment ends).

When a seek event comes from downstream, GstAVSynth will simply store the new segment structure and reply that seek has been successful. If it is a flushing seek, GstAVSynth will send flush events downstream. GstAVSynth should also tell each of its sinkpads that a seek has occurred and remember that itself too. All seeks are treated as precise. GstAVSynth should remember a segment seek and post SEGMENT_DONE message when it reaches EOS condition (see above).

Internal segment is always in GST_FORMAT_DEFAULT format. Its time field holds current frame number. When GstAVSynth updates it to new values (start = time = seek.start, stop = seek.stop), the next call to GetFrame() method of a filter will get the new frame number, which will be adjusted by filter and passed to the cache via its own GetFrame() method. The cache will notice that a seek has happened and will send a seek event upstream to seek for a requested frame. Once all upstream pipelines finish their seeks and a new frame is produced by a filter, GstAVSynth will remember that it had a seek event and will compare current segment with new segment it got from a seek event. If there is a difference, it will send Newsegment event downstream and update current segment with a new one. GstAVSynth will also clip all output frames to current segment (prevents it from pushing an old frame downstream if it was caught by a seek event in the middle of producing a frame).

This two-stage seek scheme is necessary because GstAVSynth does not know how to convert incoming seek start/stop values into values to be passed upstream. Even then, if GetFrame() of a cache gets a request for frame #N and seeks to that frame, it does not mean that the filter won't ask for frame #N-1 after that (i.e. a filter may not request frames in incrementing sequential order), which will cause yet another seek. Because of that, all seeks involving GstAVSynth will be somewhat slower.

It is possible to fix that by calculating the last requested frame range for each sinkpad after a frame has been produced, and using that range to seek each upstream pipeline (clearing each cache) before a cache has a chance to do so. At best that would complete the seek after one attempt. At worst it would only increase the number of seeks by 1.
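
The best-case and worst-case claims can be checked with a small counting model. This is a Python sketch with a hypothetical function name; it assumes the cache is empty after the downstream seek, that upstream can only move forward from its current position, and that any request below that position costs one more upstream seek.

```python
def count_seeks(requests, preseek_to=None):
    """Count upstream seeks needed to serve `requests` right after a seek.

    requests   -- source frame numbers in the order the filter asks for them
    preseek_to -- frame GstAVSynth seeks upstream to in advance, or None
    """
    seeks = 0
    position = None            # first frame upstream can deliver
    if preseek_to is not None:
        position, seeks = preseek_to, 1
    for n in requests:
        if position is None or n < position:
            position = n       # the cache has to seek upstream to frame n
            seeks += 1
    return seeks


# A filter that asks for its frames in decreasing order.
without_preseek = count_seeks([12, 11, 10])
good_guess = count_seeks([12, 11, 10], preseek_to=10)
# A well-behaved filter, but the predicted range was wrong.
bad_guess = count_seeks([10, 11, 12], preseek_to=20)
```

In this model a correct prediction brings three seeks down to one, while a wrong prediction costs a well-behaved filter two seeks instead of one.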

AviSynth (last edited 2009-07-06 02:11:43 by 188)