Technical Notes

NB: This document is a little outdated, but still describes the basics quite well. The actual implementation has changed though.

The DMA engine

(Some rough notes)

To prevent any possible deadlocks/races/etc, the whole system is designed as a synchronous state machine, with the clock being the video clock.

The possible inputs to the system are:
1) v4l2 file ops
2) interrupts (generated at the end of each odd/even field)

To simplify the problem, all v4l2 file ops are serialised by use of mutexes (so only one input can occur at any time, and we don't have to worry about out-of-order or interrupted operations).
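As a rough illustration of that serialisation, here is a user-space sketch using a pthread mutex in place of the kernel mutex. The struct and function names (dev_model, dev_set_preview) are made up for this note and are not the driver's; only the pv flag corresponds to something mentioned in these notes.

```c
#include <pthread.h>

/* Hypothetical stand-in for the per-device state. */
struct dev_model {
    pthread_mutex_t lock;   /* serialises all file ops */
    int pv;                 /* preview requested flag */
    int ops_handled;        /* counter, only for illustration */
};

/* Every file op takes the lock first, so file ops never overlap. */
static int dev_set_preview(struct dev_model *d, int on)
{
    pthread_mutex_lock(&d->lock);
    d->pv = on ? 1 : 0;
    d->ops_handled++;
    pthread_mutex_unlock(&d->lock);
    return 0;
}
```

Note that the lock only orders file ops against each other. The interrupt handler does not take it, which is why the flag/sync scheme described further down exists.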

There are 3 possible operations:
1) dummy - simply skip all input data
2) preview - dma directly to primary/overlay surface
3) capture - dma to capture buffers
3a) capture dummy - skip data with same geometry as capture

For all possible operations, two data structures are generated.
1) The geometry registers for the bt848
2) The risc code for the target (one for each buffer)
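One way to picture the operations and their two data structures is the sketch below. All names, the geometry fields and the buffer count (NBUF) are assumptions made for illustration; the real geometry register set of the bt848 is larger and is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* The operations listed above (names are hypothetical). */
enum dma_op { OP_DUMMY, OP_PREVIEW, OP_CAPTURE, OP_CAPTURE_DUMMY };

#define NBUF 2  /* assumed number of target buffers */

/* Placeholder for the geometry registers. */
struct geometry {
    uint16_t width;
    uint16_t height;
};

/* One risc program: a block of 32-bit instruction words. */
struct risc_prog {
    uint32_t *code;
    size_t    words;
};

/* Everything generated for one operation. */
struct op_setup {
    enum dma_op      op;
    struct geometry  geo;
    struct risc_prog risc[NBUF];   /* one program per buffer */
};

/*
 * Capture dummy: same geometry as the capture it shadows.
 * The risc programs are left empty in this sketch.
 */
static struct op_setup capture_dummy_from(const struct op_setup *cap)
{
    struct op_setup d = { .op = OP_CAPTURE_DUMMY, .geo = cap->geo };
    return d;
}
```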

When the device is opened, the risc engine is started using the dummy capture. When the user starts a preview, a flag (btv->pv) is set. The interrupt handler checks for this flag, and if it is set, sets up the same field of the next frame to use the preview risc engine and geometry. When the flag is cleared, the opposite operation is performed. This means that for the next two interrupts preview is still active.

A synchronisation mechanism is provided (btv->sync), which is set to 3 to request a sync. If the interrupt sees a nonzero value here, it will clear one bit on an even field and one on an odd field, then send a wakeup signal to the user control. When this process is complete, you can be assured that the preview structures are no longer in use, and can be manipulated.

A similar process is used for the capture operations, except a double sync is used at the end to clear the internal pipeline.
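The flag and sync handling can be modelled in a few lines. This is only a sketch of the scheme as described in these notes, not the driver code: the struct, the enum names, the choice of which sync bit belongs to which field, and the assumption that the wakeup is sent once both bits are clear are all guesses made for illustration. Only pv and sync mirror btv->pv and btv->sync.

```c
#include <stdbool.h>

enum field { FIELD_EVEN = 0, FIELD_ODD = 1 };
enum op    { OP_DUMMY, OP_PREVIEW };

/* Hypothetical model of the relevant device state. */
struct btv_model {
    bool     pv;        /* preview requested (btv->pv) */
    unsigned sync;      /* btv->sync: set to 3 to request a sync */
    enum op  next[2];   /* op programmed for each field of the next frame */
    unsigned wakeups;   /* wakeups sent to the waiting file op */
};

/*
 * Called at the end of each field. Reprograms the same field of the
 * next frame, then handles a pending sync request.
 * Returns true if a wakeup was sent.
 */
static bool field_irq(struct btv_model *b, enum field f)
{
    b->next[f] = b->pv ? OP_PREVIEW : OP_DUMMY;

    if (b->sync == 0)
        return false;

    /* Assumed mapping: bit 0 = even field, bit 1 = odd field. */
    b->sync &= ~(1u << f);
    if (b->sync != 0)
        return false;

    /* Both fields have been seen since the request: safe to wake. */
    b->wakeups++;
    return true;
}
```

In this model a file op that wants to tear down preview clears pv, sets sync to 3, and sleeps until the wakeup. By then one even and one odd interrupt have run, so neither field still points at the preview structures.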

In this way, all operations on the active risc code are made by the interrupt handler approx 1 field period before the code will be used. These operations are all done by single 32-bit aligned writes, which should be atomic on any 32-bit processor, so even if latency is high, it will still be safe.
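The single-write idea can be sketched in portable C11 as below. This is a user-space illustration only: the struct layout and names are hypothetical, and C11 atomics say nothing about how a DMA engine observes memory. The point is simply that retargeting is one aligned 32-bit store, so a reader sees either the old target or the new one, never a mix.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical two-word jump instruction in a risc program. */
struct risc_jump {
    uint32_t         opcode;   /* instruction word, never patched */
    _Atomic uint32_t target;   /* address to continue at, patched live */
};

/* Switch the program to another target with one 32-bit store. */
static void risc_retarget(struct risc_jump *j, uint32_t new_target)
{
    atomic_store_explicit(&j->target, new_target, memory_order_release);
}

/* Read back the current target. */
static uint32_t risc_target(struct risc_jump *j)
{
    return atomic_load_explicit(&j->target, memory_order_acquire);
}
```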