Pose estimation applied to an amateur (me) on the top and the no. 1 in the world on the bottom. The sound corresponds to the top clip (unmute to hear).
In pose estimation (PE), a 2D video is given as input, and a skeleton (see video above) is returned as the output. The computer vision community has made incredible technical progress on PE algorithms over the past few years, turning a traditionally difficult problem into an essentially mature technology. PE has many exciting applications, including motion tracking, character animation, and posture correction for physiotherapy. However, relatively little attention has been given to the application of PE in racquet sports (specifically tennis, squash, and badminton), and in particular to the problems of AI-assisted strategy and coaching.
AI-assisted statistics in sport reporting
There has been a serious drive to incorporate PE techniques in areas other than racquet sports. For example, the NYT has used PE to better track and analyze gymnastics routines1, the MLB has used PE to replay iconic baseball moments from interactive angles2, and numerous apps and papers have attempted to use PE to enable virtual coaching in activities such as weight-lifting and yoga3, where motion is slower and more deliberate. (Not to mention, this is all in the last 3 years!)
In these applications, the emphasis is on better sport reporting, or on automated coaching of beginners in the sport. But what about using PE to gain a competitive edge in training and competing?
Pose estimation of gymnastics events from the New York Times.
Pose estimation in racquet sports
There are several key differences between racquet sports and other sports that make ML techniques more readily applicable:
- Free 3D calibration. In typical PE algorithms, only a 2D pose can be produced at interactive rates. On a tennis / badminton court, the court lines are clearly marked, and provide depth information. This allows for very accurate 3D pose estimation techniques4.
- Easily segmented actions. Rallies in racquet sports are clearly delimited by the impact sound of the ball. The frequency and amplitude of the racquet sounds identify the start and end of a rally as well as different types of shots (lob vs. smash).
- Videos typically occlusion-free. A tough issue to handle in typical PE applications is the presence of multiple bodies, or bodies occluded by other objects. In singles play for tennis / badminton, there is only one player on each side of the court, so occlusion is rarely an issue. Furthermore, badminton and squash are played indoors in well-lit areas, so weather effects are not an issue.
- Small motion variance. In tennis / badminton, top-level athletes typically have very little variance when executing the same shot twice. Not only is this helpful for shot identification applications, it can be useful for technical purposes as well. For example, typical 2D to 3D pose estimation techniques involve building a “dictionary” of possible 3D poses. Since there are relatively few “unique motions” in racquet sports, these poses can easily be collected.
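To make the "easily segmented actions" point concrete, here is a minimal sketch of how impact sounds might be used to split a recording into rallies. It assumes the audio is already available as a list of mono samples, and the function names, the energy ratio, and the pause threshold are all illustrative choices rather than tuned values. It only detects loud spikes and does not attempt to tell a lob from a smash.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_hits(samples, sample_rate, frame_len=256, ratio=8.0, min_gap_s=0.2):
    """Return times (seconds) of frames whose energy spikes above the median."""
    energies = frame_energies(samples, frame_len)
    # The median frame energy serves as a rough estimate of the noise floor.
    baseline = sorted(energies)[len(energies) // 2] + 1e-12
    hits = []
    for idx, energy in enumerate(energies):
        t = idx * frame_len / sample_rate
        # Ignore spikes that follow a hit too closely (same impact, next frame).
        if energy > ratio * baseline and (not hits or t - hits[-1] >= min_gap_s):
            hits.append(t)
    return hits

def group_rallies(hit_times, max_pause_s=3.0):
    """Split hits into rallies wherever the pause between hits is long."""
    rallies = []
    for t in hit_times:
        if rallies and t - rallies[-1][-1] <= max_pause_s:
            rallies[-1].append(t)
        else:
            rallies.append([t])
    return rallies

# Synthetic check: a faint background hum plus three loud bursts.
rate = 8000
signal = [0.01 * math.sin(0.3 * i) for i in range(10 * rate)]
for start_s in (1.0, 2.0, 7.0):
    start = int(start_s * rate)
    for i in range(start, start + 256):
        signal[i] = 1.0

hits = detect_hits(signal, rate)
rallies = group_rallies(hits)
```

On real footage the thresholds would need tuning against crowd noise and footwork sounds, so this is best read as a starting point for experimentation.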
What can we do with perfect pose estimation?
Given that PE has matured incredibly rapidly as a technology, our main research interest is in how we can apply the existing techniques to help players gain a competitive edge. There are several applications (along with possible research questions), all of which I am very interested in exploring:
Quantitative player comparisons
Many coaches use video replay to teach their students. Often, a developing player is told to emulate a professional player with similar style, height, and weight. By applying pose estimation, it is possible to accelerate this training process by informing the student exactly where they differ from the professional. In these cases, having a precise measurement of the differences (say, difference in arm angle or contact point) is incredibly helpful, as major differences in power output often depend on only a few degrees of arm or shoulder rotation.
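As a rough illustration of what "a precise measurement of the differences" could look like, the sketch below computes the elbow angle from three 2D keypoints and compares a student against a professional at the same moment of a stroke. The keypoint names (`right_shoulder`, `right_elbow`, `right_wrist`) and the coordinates are made up for the example. A real pipeline would take them from whatever PE model is in use, and would ideally work on 3D poses to avoid camera-angle effects.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the 2D keypoints a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints coincide; angle is undefined")
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

def elbow_angle(pose, side="right"):
    """Elbow angle for a pose given as a dict of named (x, y) keypoints."""
    return joint_angle(
        pose[f"{side}_shoulder"], pose[f"{side}_elbow"], pose[f"{side}_wrist"]
    )

# Hypothetical poses at contact: the professional's arm is fully extended,
# while the student's elbow is bent at a right angle.
pro = {"right_shoulder": (0, 0), "right_elbow": (1, 0), "right_wrist": (2, 0)}
student = {"right_shoulder": (0, 0), "right_elbow": (1, 0), "right_wrist": (1, 1)}

difference = elbow_angle(student) - elbow_angle(pro)
```

The same helper can be applied to any three connected joints (shoulder, hip, knee), which makes it easy to build a per-joint table of differences for a given shot.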
Simulation of player movements
In the 2018 World Tour Finals5 between Kento Momota (rank #1) and Shi Yuqi (rank #2), Shi seemed to win the match effortlessly, after losing 3 out of 3 times to Kento earlier that year. So what happened? As it turned out, Shi exploited a weakness in Kento’s coverage of the forehand front corner, and decisively won the match with better strategy. Is it possible to automate this strategy process?
One answer may lie in recent work by Zhang et al.6. In this work, the authors show that it is possible to generate video-realistic sprites of professional players from video footage alone. Although visually appealing clips are generated, the method itself does not constrain the players in a physically realistic way. An interesting question here would be whether it's possible to build a physically realistic simulator of a professional player. This could be used for a human-in-the-loop type of strategy analysis: users can input sequences of shots to feed to the simulation, and see which sequences are most effective. Alternatively, we can also imagine using this system to generate realistic videos of the player themselves in various game situations. This could be used by the player to explore and improve their own weak areas.
Computer-generated tennis game between Federer and Nadal.
Advanced movement statistics
In the Jordan era of basketball, Michael Jordan’s trainer famously counted each of MJ’s steps during his games, and trained each leg proportional to the number of steps taken7. The philosophy here was to train each muscle group for the load that was actually required of it in a game, in order to maximize training efficiency and minimize training time (on top of all the other training that Michael had to do).
With PE technology, we can easily take the same philosophy to another level. With relatively simple pose classifiers, we can build up comprehensive statistics of a player’s movements on court. These statistics can show us average load on certain muscle groups, the shot selection tendencies of a player, as well as potentially dangerous movements that can be corrected preventatively.
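Here is a minimal sketch of the bookkeeping side of this idea, assuming a pose classifier has already assigned one movement label to every video frame. The label names and the frame rate are invented for the example, and the classifier itself is out of scope here.

```python
from collections import Counter
from itertools import groupby

def movement_stats(frame_labels, fps):
    """Count how often each movement occurs and how long it lasts in total.

    Consecutive frames with the same label are treated as one occurrence.
    """
    counts = Counter()
    seconds = Counter()
    for label, run in groupby(frame_labels):
        n_frames = len(list(run))
        counts[label] += 1
        seconds[label] += n_frames / fps
    return counts, seconds

# Hypothetical per-frame labels from a 30 fps clip.
labels = (
    ["idle"] * 30
    + ["lunge_left"] * 15
    + ["idle"] * 30
    + ["lunge_right"] * 15
    + ["idle"] * 15
    + ["lunge_right"] * 30
)
counts, seconds = movement_stats(labels, fps=30)
```

In this toy clip the player lunges right twice as often as left, which is the kind of asymmetry a trainer could use when planning leg work.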
Prediction of future player shots
In any fast-paced sport, prediction is every bit as important as reaction. This is true especially for badminton, where the shuttle can travel up to speeds of 426 km/h (265 mi/h). In these cases, human reaction times are typically too slow to respond to a shot dynamically. Instead, one must predict and go to the most likely spot for a shot.
In recent work by Shimizu et al.8, a simple classifier was built to predict shots made by tennis players before the ball is hit. This classifier took in features from the players’ pose and past hits to predict future shots. Using this classifier, it was shown that shots can be predicted with roughly 70% accuracy 0.3s before they are made. Though this may not sound like a lot, it is an incredibly large amount of time, as players typically have only about a second to return the shot in high-stress situations. An interesting research problem is whether this system can be made real-time. If it can, it could be combined with a haptic device (e.g. a vibrating earring) to give players an extra “sixth sense” during training and competition.
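To give a feel for the general idea of predicting a shot from pose features, here is a toy nearest-centroid classifier. It is not the method used by Shimizu et al. The two features (a shoulder rotation in degrees and a lateral offset in metres), the shot labels, and all the numbers are invented for illustration.

```python
def fit_centroids(features, labels):
    """Average the feature vectors of each class to get one centroid per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared Euclidean)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, x))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))

# Hypothetical training data: [shoulder_rotation_deg, lateral_offset_m],
# measured a fixed lead time before contact.
train_x = [[40.0, -1.0], [44.0, -1.2], [10.0, 0.8], [14.0, 1.0]]
train_y = ["cross_court", "cross_court", "down_the_line", "down_the_line"]

centroids = fit_centroids(train_x, train_y)
prediction = predict(centroids, [38.0, -0.9])
```

A classifier this small runs in microseconds, so in a real-time setting the latency budget would be dominated by the pose estimation step rather than by the prediction itself.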
Concluding remarks
As technology gets better and better, athletes have the opportunity to train smarter and smarter. At the highest levels of competition, an athlete must pursue every single advantage possible, both mentally and physically. Our hope is for this work to be a stepping stone towards higher levels of human-possible play.
As Venus Williams once said: "[Sports] is mostly mental. You win or lose the match before you even go out there." It’s the preparation before that matters.
Acknowledgements
A special thanks to Jui-Hsien Wang, Rajiv Rai, and Toby Ng for reading through initial drafts of this proposal.
- “Estimating 3D Poses of Athletes at Live Sporting Events”. Dan Oved, Amelia Pisapia, Anna Gudnason. New York Times R&D. Source.
- “Ozuna’s ‘selfie’ from a whole new angle”. Mike Petriello. Source.
- “Soccer on Your Tabletop”. Konstantinos Rematas, Ira Kemelmacher-Shlizerman, Brian Curless, Steve Seitz. Source.
- “Vid2Player: Controllable Video Sprites that Behave and Appear like Professional Tennis Players”. Source.
- “How Michael Jordan’s Trainer Helped Him Become the Greatest of All Time”. Source.
- “Prediction of Future Shot Direction using Pose and Position of Tennis Player”. Tomohiro Shimizu, Ryo Hachiuma, Hideo Saito, Takashi Yoshikawa, Chonho Lee. Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports (MMSports ’19). Source.