SE450:G8
Group Members
Jamie, Abigail, Sreeram, Randeep, Andrew Olsen, Surya, Sashank
Deliverables
- A 3-4 slide presentation lasting no more than 4 minutes on Tuesday 27th March. The presentation (labelled with the group number) must be emailed to John Hosking before 12 noon on the day.
- One page summary paper in IEEE format (INDIVIDUAL WORK) due at 5pm on Thursday 5th April, stating:
- The paper we reviewed and the DSVL it describes.
- Approaches you used to evaluate the DSVL described in the paper.
- Summary of your group's findings about the DSVL.
NOTE: IEEE format is a 2-column format of single-spaced text in 10 point Times Roman with 12 point interline spacing. All printed material, including illustrations, must be kept within a print area of 6-7/8 inches (17.5 cm) wide by 8-7/8 inches (22.5 cm) high.
Overview
The MosaicQuery paper presents a visual language designed specifically for the video domain. It lets the user specify a series of key points which can be used to query a database of videos. The videos are segmented into smaller frame intervals (shots), each representing a scene. The shots are processed using the mosaic algorithm to extract the feature descriptors contained within each shot, and these descriptors are saved into a database.
The language specifies 3 notations, one for each of the main components in the video domain: the camera frame, the object and the background. The notations can be used to specify a query about the movement of the camera frame, the movement of objects within the camera frames, and the overall movement (background/mosaic). Each notation has its own manipulation tools: for example, a camera frame can be zoomed in/out, tilted or panned, and an object can be made smaller or larger with respect to the camera frame. The query is then processed, using an XML-based back-end data structure, against the shots that have been collected in the database. The results of the query can be used to further refine the search for a particular type of shot, or a new query can be formed from scratch.
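As a rough illustration of how the three notations and the XML-based back-end representation might fit together, here is a minimal Python sketch. The class names, field names and XML element/attribute names are all hypothetical and chosen only for illustration; they are not the schema used in the paper.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class ObjectRef:
    # Hypothetical object notation: position and size relative to its camera frame.
    x: float
    y: float
    scale: float = 1.0


@dataclass
class CameraFrame:
    # Hypothetical camera frame notation: position on the background plus camera settings.
    x: float
    y: float
    zoom: float = 1.0
    tilt: float = 0.0
    objects: List[ObjectRef] = field(default_factory=list)


@dataclass
class Query:
    # Assumption: the background (mosaic) is identified by a simple label.
    background: str
    frames: List[CameraFrame] = field(default_factory=list)

    def to_xml(self) -> str:
        """Serialise the query to an XML string (element names are made up)."""
        root = ET.Element("query")
        ET.SubElement(root, "background", name=self.background)
        # The order attribute records the sequence of frames over time.
        for index, frame in enumerate(self.frames):
            frame_el = ET.SubElement(
                root, "frame",
                order=str(index), x=str(frame.x), y=str(frame.y),
                zoom=str(frame.zoom), tilt=str(frame.tilt),
            )
            for obj in frame.objects:
                ET.SubElement(
                    frame_el, "object",
                    x=str(obj.x), y=str(obj.y), scale=str(obj.scale),
                )
        return ET.tostring(root, encoding="unicode")


# Usage: a background-only query, then the same query extended with two frames.
query = Query(background="beach")
print(query.to_xml())
query.frames.append(CameraFrame(x=0, y=0))
query.frames.append(CameraFrame(x=40, y=0, zoom=2.0, objects=[ObjectRef(x=0.5, y=0.5)]))
print(query.to_xml())
```

The sketch keeps the visual model (the dataclasses) separate from the XML form, which mirrors the idea that the same query has both a visual and an XML representation.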
Strengths
- fast through the use of offline feature extraction.
- easy to learn with only 3 notations to get used to.
- simple.
- a query can range from very simple (background notation only) to complex, with multiple frames and multiple objects.
- a query does not have to be complete to get feedback: a partial query can be submitted, after which the user can build on one of the results, refine the original query, or start from scratch.
- clear flow of events.
- small fragments of shots mean an individual query should be able to fit within a single screen.
- 2 representations of information, visual and XML code; the XML code can be modified to obtain a more accurate search query.
Weaknesses
- expensive feature extraction process.
- the program is still a prototype (the language is finalised).
- usefulness of a query depends greatly on the mosaic algorithm used to segment the videos into individual shots.
Cognitive Dimensions
Abstraction gradient - Abigail
Definition: What are the minimum and maximum levels of abstraction? Can fragments be encapsulated?
This is an abstraction-tolerant system. It uses primitive components such as the camera frame, object reference and background, and operators such as concatenation/composition. The user can use predefined examples to specify queries, thus minimising abstraction. At the same time, the user can create queries from scratch. It has a low initial level of abstraction, so users can start using the program directly without much training.
Closeness of mapping - Abigail
Definition: What ‘programming games’ need to be learned?
The system maps closely to the real world. The camera frame looks much like a real visual frame in a camera, and the object reference is placed within the camera frame and moves with respect to the frame, as in a real camera. The arrows show movement with respect to time, and the position of the frame specifies which part of the real world scene is framed. The arrow shows the orientation of the camera, and the size of the frame is inversely proportional to the zoom factor, which closely resembles how our eyes perceive objects at a distance.
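The inverse relationship between frame size and zoom factor can be sketched as follows. This is a minimal Python illustration, assuming a simple model in which the drawn frame size is a base size divided by the zoom factor; the function name and the base dimensions are hypothetical and not taken from the paper.

```python
from typing import Tuple


def frame_size(base_width: float, base_height: float, zoom: float) -> Tuple[float, float]:
    """Return the (width, height) of a camera frame drawn at the given zoom factor.

    Assumed model: zooming in by a factor of 2 halves each side of the frame,
    because the camera then covers a smaller part of the background.
    """
    if zoom <= 0:
        raise ValueError("zoom factor must be positive")
    return (base_width / zoom, base_height / zoom)


print(frame_size(160.0, 120.0, 1.0))  # (160.0, 120.0)
print(frame_size(160.0, 120.0, 2.0))  # (80.0, 60.0)
```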
Consistency
Definition: When some of the language has been learnt, how much of the rest can be inferred?
I think the language is pretty consistent, as far as it is described in the paper, which is not in much detail.
Diffuseness - Abigail
Definition: How many symbols or graphic entities are required to express a meaning?
This refers to the verbosity of a language. This language is visually based and quite terse. Only primitive operators are used and, of the 3 basic components, we only need the background (mosaic) to create a query. We don't need to learn too many new words to start using the language.
Error-proneness - Abigail
Definition: Does the design of the notation induce ‘careless mistakes’?
Not as far as I can see. Visual cues are provided to support the user in evaluating the current configuration of each camera frame. There is also good support for refining searches, using both the original query and retrieved queries. There are contextual pop-up menus for each frame, which should prevent the mistake of performing the wrong action on the wrong frame.
Hard mental operations - Jamie
Definition: Are there places where the user needs to resort to fingers or penciled annotation to keep track of what’s happening?
- If the camera is making some complex movement (pan + zoom, pan + tilt, etc) this may be hard for the user to keep track of. Some people may find it difficult to visualise how this affects the camera frame, and what position the next frame should be in.
Hidden dependencies - Jamie
Definition: Is every dependency overtly indicated in both directions? Is the indication perceptual or only symbolic?
The dependencies are all one way. Each frame depends on the frame before it, and the order of the frames is indicated by an arrow from each frame to the next.
Premature commitment - Jamie
Definition: Do programmers have to make decisions before they have the information they need?
Maybe. Most people will have a rough idea of what they are searching for before they start. However, it is the relationship between earlier and later frames that makes up the query, so there may be some premature commitment.
Progressive evaluation - Sashank
Definition: Can a partially-complete program be executed to obtain feedback on “How am I doing?”
The Mosaic-Based Video Query Language supports progressive evaluation, as it allows a partially completed query to be submitted. In reply you then see a visual representation of the query. You can then progressively add to your query and resubmit it, getting visual feedback with every submission.
Role-expressiveness - Sashank
Definition: Can the reader see how each component of a program relates to the whole?
Since there are only three primitive components, it is extremely clear how the whole is built from its parts. The three components are positioned relative to each other, hence there is always a recognisable flow in a sentence or query.
Secondary notation - Sashank
Definition: Can programmers use layout, color, or other cues to convey extra meaning, above and beyond the ‘official’ semantics of the language?
To ensure simplicity, the Mosaic Query Language sacrifices some completeness. Thus there is no secondary notation available; the three primary components are the basis of the entire language.
Viscosity - Jamie
Definition: How much effort is required to perform a single change?
It depends on the program used to construct these queries. For instance, changing the radius of a tilt and pan motion would be difficult unless there were some tool support for setting up that kind of thing.
Visibility - Sashank
Definition: Is every part of the code simultaneously visible (assuming a large enough display), or is it at least possible to compare any two parts side-by-side at will? If the code is dispersed, is it at least possible to know in what order to read it?
Visibility is completely implementation specific, so the Mosaic-Based Video Query Language itself places no restriction on the visibility of complete queries or sentences. There is an implementation which we can refer to, described in the allocated paper. In this implementation, each shot is divided into smaller intervals, which encourages users to minimise the number of frames they use within an interval. This characteristic in turn allows a complete query or sentence to fit within a single view.