Composing a music sports video currently requires a tech-savvy professional with an artist’s touch, but software being developed by a group of scientists may one day let any amateur create a personalized video. With its fully automatic and semi-automatic modes, the system could turn a tedious, time-consuming, skilled task into a hobbyist’s evening activity.
Jinjun Wang and colleagues, from institutions in China and Singapore, have presented a novel approach to composing personalized music sports videos. Their system can automatically detect events, players, or teams, and then smoothly integrate the scenes with music, all while maintaining the artistic quality of professional videos. They predict that home users will be able to easily customize sports videos for themselves, greatly increasing the production of these videos and expanding their audience.
“We have introduced a real-time sports event detection and broadcasting system and tested it using FIFA World Cup 2006 games,” Wang told PhysOrg.com. “In addition, with our second-step system, we are able to utilize the detected events to provide value-added services. With the ability to generate music sports video, it is possible to have prelude/postlude, half-break commentary, and summary TV programs using the latest game scenes.”
In their paper, published in IEEE Transactions on Multimedia, the scientists explain how they designed and optimized the system’s features. For example, a user can request that specific events (such as goals in soccer or three-pointers in basketball) be included in a video.
To satisfy the request, the system uses “semantic content extraction,” searching text associated with the game for keywords that indicate the requested events. This text includes not only closed captions but also webcasts from sites such as the BBC and Yahoo, which often contain very detailed information. Rather than relying on simple word matching, the search software (dtSearch) offers advanced options, such as filtering out unwanted scenes (e.g., ignoring “goal kick” when searching for “goal”).
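To make the filtering idea concrete, here is a minimal Python sketch of keyword-based event spotting over timestamped webcast text. It is not dtSearch’s API or the authors’ implementation; the (minute, text) input format, the sample commentary, and the function name find_event_minutes are hypothetical. Excluded phrases are blanked out before the keyword is matched, so a “goal kick” line is never counted as a goal.

```python
import re
from typing import Iterable

# Hypothetical sample: (match minute, webcast commentary line).
SAMPLE_WEBCAST = [
    (12, "Goal kick taken by the keeper."),
    (37, "GOAL! Header from the corner makes it 1-0."),
    (64, "Shot goes wide, goal kick."),
    (81, "Goal! A low drive into the bottom corner."),
]


def find_event_minutes(
    webcast: Iterable[tuple[int, str]],
    keyword: str,
    exclude: Iterable[str] = (),
) -> list[int]:
    """Return the minutes whose text mentions `keyword` outside any excluded phrase."""
    keyword_re = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    exclude_res = [re.compile(rf"\b{re.escape(p)}\b", re.IGNORECASE) for p in exclude]
    minutes = []
    for minute, text in webcast:
        # Blank out excluded phrases first so "goal kick" cannot match "goal".
        for pattern in exclude_res:
            text = pattern.sub(" ", text)
        if keyword_re.search(text):
            minutes.append(minute)
    return minutes


if __name__ == "__main__":
    print(find_event_minutes(SAMPLE_WEBCAST, "goal", exclude=["goal kick"]))  # [37, 81]
```

Without the exclusion list, the same search would also return the goal kicks at minutes 12 and 64, which is exactly the kind of false match the filtering is meant to prevent.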
To align the sports scenes with music, the system can then automatically choose a song whose phrasing, beats, and lyrical structure match the dynamics of the scene shots.
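One ingredient of aligning scenes with music is placing shot transitions on the music’s beats. The sketch below assumes beat times (in seconds) have already been extracted by some beat tracker, which the article does not name; snap_to_beats and its max_shift tolerance are hypothetical illustrations, not the authors’ algorithm, which also takes phrasing and lyrics into account.

```python
from bisect import bisect_left


def snap_to_beats(
    cut_times: list[float],
    beat_times: list[float],
    max_shift: float = 0.25,
) -> list[float]:
    """Move each proposed cut (seconds) to the nearest beat if it is within max_shift."""
    beats = sorted(beat_times)
    snapped = []
    for t in cut_times:
        i = bisect_left(beats, t)
        # The nearest beat is either the first beat at/after t or the one just before it.
        candidates = [beats[j] for j in (i - 1, i) if 0 <= j < len(beats)]
        if not candidates:
            snapped.append(t)  # no beat information: keep the original cut
            continue
        nearest = min(candidates, key=lambda b: abs(b - t))
        # Only snap when the shift is small, so the cut stays near its planned moment.
        snapped.append(nearest if abs(nearest - t) <= max_shift else t)
    return snapped


if __name__ == "__main__":
    beats = [i * 0.5 for i in range(9)]  # 120 BPM: a beat every 0.5 s
    print(snap_to_beats([0.9, 2.2, 3.6], beats))
```

With a beat every 0.5 s, proposed cuts at 0.9 s, 2.2 s and 3.6 s move to 1.0 s, 2.0 s and 3.5 s. A cut farther than max_shift from any beat is left in place rather than forced onto a distant beat.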
“The more different types of events the user needs for ‘editing,’ the more processing time is required by our system,” Wang explained. “In an extreme case where all the types of events are required for detection, the system needs around 90 minutes to process a 90-minute soccer game, which is near real-time. The second step, the ‘editing’ step, is quite fast, usually less than one minute for typical pop music.”
If a user has editing preferences (for example, wanting certain shots to align with certain parts of a song), the system can also work in semi-automatic mode. A user’s stipulations can also be fairly complex and can override the system’s built-in rules.
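Semi-automatic mode can be pictured as user constraints taking precedence over an automatic ranking. In this hypothetical sketch, the segment labels, shot IDs, and the function assign_shots are illustrative assumptions rather than the authors’ design: the user pins some shots to specific song segments, and the remaining segments are filled from the system’s ranked candidates, skipping any shot the user has already placed.

```python
def assign_shots(
    segments: list[str],
    ranked_shots: list[str],
    pinned: dict[str, str],
) -> dict[str, str]:
    """Map music segments to shots; user-pinned choices override the automatic ranking."""
    used = set(pinned.values())
    # Automatic candidates, best first, minus any shot the user already placed.
    pool = iter([shot for shot in ranked_shots if shot not in used])
    assignment: dict[str, str] = {}
    for segment in segments:
        if segment in pinned:
            assignment[segment] = pinned[segment]  # the user's stipulation wins
        else:
            shot = next(pool, None)
            if shot is not None:  # leave the segment empty if candidates run out
                assignment[segment] = shot
    return assignment


if __name__ == "__main__":
    print(assign_shots(
        ["intro", "verse", "chorus", "outro"],
        ["goal_37", "save_52", "goal_81", "celebration_90"],
        pinned={"chorus": "goal_81"},
    ))
```

Here the user reserves the chorus for the minute-81 goal, so the automatic fill gives the intro, verse and outro to goal_37, save_52 and celebration_90 in ranked order.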
“A computer program won't create new things unless it's been taught to,” Wang explained. “In fact, computers are more suitable for tedious or computational tasks, such as selecting precise in/out frames and alignment work. The human must tell the computer what tasks to do.
“The difference with our system is that it is able to use some high-level, abstract rules,” he continued. “For example, a user may say ‘I want the music "Hero" and Beckham's shooting scene from EPL [English Premier League] 2004 to compose a music video,’ and our system can do the rest of the work, finding Beckham's shooting events and aligning them to the music to achieve smooth shot transitions and understandable video content. The contribution of our system is that, since it is able to execute certain high-level rules for video editing, people can personalize these rules to produce their customized music video.”
Even when the system ran in fully automatic mode, viewers were impressed by the artistic results, consistently rating it higher than similar systems in every scoring category. Besides having a sensible structure, the videos also showed a high degree of artistic quality, which may be somewhat surprising for a completely computerized system.
“We think that, given limited material, a good selection must follow some rules,” Wang said. “Since we are not artists, we have to do some statistical work to discover these rules. If a music video can satisfy most of the predefined rules, its artistic attribute won't be bad. But of course, it is usually necessary to conduct subjective evaluation to see how much the predefined rules are suitable.”
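Wang’s point about satisfying “most of the predefined rules” can be illustrated with a simple score: the fraction of rules a candidate video meets. The four rules below (cuts on beats, shot lengths between 1 and 8 seconds, no gaps between shots) and the Shot fields are hypothetical placeholders; the article does not list the rules the authors derived statistically.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Shot:
    start: float   # seconds into the music
    end: float     # seconds into the music
    on_beat: bool  # whether the cut into this shot lands on a beat


Rule = Callable[[list[Shot]], bool]

# Hypothetical rules for illustration only.
RULES: dict[str, Rule] = {
    "cuts_on_beats": lambda shots: all(s.on_beat for s in shots),
    "no_shot_under_1s": lambda shots: all(s.end - s.start >= 1.0 for s in shots),
    "no_shot_over_8s": lambda shots: all(s.end - s.start <= 8.0 for s in shots),
    # Each shot starts exactly where the previous one ends (no gaps or overlaps).
    "shots_contiguous": lambda shots: all(a.end == b.start for a, b in zip(shots, shots[1:])),
}


def rule_score(shots: list[Shot], rules: dict[str, Rule] = RULES) -> float:
    """Return the fraction of rules the shot sequence satisfies (1.0 = all)."""
    passed = sum(1 for rule in rules.values() if rule(shots))
    return passed / len(rules)


if __name__ == "__main__":
    video = [Shot(0.0, 2.0, True), Shot(2.0, 2.5, True), Shot(2.5, 6.0, False)]
    print(rule_score(video))  # 0.5
```

The sample video fails the beat rule and the minimum-length rule, so it scores 0.5. A high score only says the video obeys the chosen rules; as Wang notes, subjective evaluation is still needed to judge whether the rules themselves are suitable.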
Wang said that the system currently targets the broadcasting industry, but he hopes general users will benefit from it in the future.
“We are definitely doing investigations to support more application areas based on the technique,” he said. “A software program like ‘muvee’ [a currently available program] for the general public is surely one of the best options.”
Sample videos created with this software are temporarily available at: www.ntu.edu.sg/home5/Y020002/Research/MSV_demo/Introduction.htm.
Citation: Wang, Jinjun, Chng, Engsiong, Xu, Changsheng, Lu, Hanqing, and Tian, Qi. “Generation of Personalized Music Sports Video Using Multimodal Cues.” IEEE Transactions on Multimedia, Vol. 9, No. 3, April 2007.
Copyright 2007 PhysOrg.com.
All rights reserved. This material may not be published, broadcast, rewritten or redistributed in whole or part without the express written permission of PhysOrg.com.