Artificial intelligence (AI), deep learning, and natural language processing will be the next transformative technologies for streaming. They will affect every stage of production, from content creation to consumption. With AI proliferating across so many industries, there’s little doubt that it will be used heavily for live streaming on a wider scale in the near future.
Some of the companies and technologies that are making headway in this space include Google Cloud Video Intelligence, Conviva’s Video AI Architecture, Nvidia DLA, and IBM’s Watson technology. All of these technologies currently deploy AI in varying degrees—especially in the cloud—but we’ll soon see AI making inroads into other facets of streaming as well.
AI can help replace the production workforce behind the camera and even perform mundane and time-consuming tasks that involve labor-intensive content/data management. Currently, AI is being used in viewer metrics, network and technical troubleshooting, and ad serving, but there are other potential uses that remain virtually untapped.
Smart Camera Tracking and Video Frame Composition
Although several motion-tracking camera systems already allow automated tracking of moving subjects in front of the camera, they all require producers to place transmitters or sensors on the subject. AI will be able to track speakers, athletes, or entertainers without any additional hardware or sensors. Deep learning algorithms will analyze the video and follow people doing different activities, whether on a stage or in other environments, while keeping them well framed in the shot. Even now, this technology enables drones to follow athletes sprinting on a field and to track them with unrelenting precision.
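As a rough illustration of what "keeping the subject framed" involves, here is a minimal sketch that turns a subject's bounding box into a pan/tilt correction. It assumes some detector (not shown) already supplies the box in pixels. The `Box` class, the `gain` and `deadband` values, and the sign convention are all hypothetical choices made for this example, not part of any particular camera system.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of the tracked subject in pixels (hypothetical detector output)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def framing_error(box: Box, frame_w: int, frame_h: int):
    """Return the subject's offset from the frame center, normalized to [-1, 1]."""
    cx = box.x + box.w / 2
    cy = box.y + box.h / 2
    ex = (cx - frame_w / 2) / (frame_w / 2)
    ey = (cy - frame_h / 2) / (frame_h / 2)
    return ex, ey


def pan_tilt_step(box: Box, frame_w: int, frame_h: int,
                  gain: float = 0.5, deadband: float = 0.05):
    """Proportional correction; positive pan means right, positive tilt means down.

    Offsets smaller than the deadband are ignored so the camera does not jitter.
    """
    ex, ey = framing_error(box, frame_w, frame_h)
    pan = gain * ex if abs(ex) > deadband else 0.0
    tilt = gain * ey if abs(ey) > deadband else 0.0
    return pan, tilt
```

A real tracker would run this on every frame and smooth the result over time; the sketch only shows the geometry of a single step.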
Creative visual storytelling is also closely tied to mathematics. The key components of video imaging (frame rates, focal lengths, aperture, and composition) are based on ratios, and using them effectively requires at least a basic understanding of the math behind them.
The golden ratio (a proportion, prized for millennia by artists, architects, and scientists alike, in which the ratio of two quantities is the same as the ratio of their sum to the larger of the two) can be programmed into deep-learning-based visual perception algorithms. AI-enabled cameras can thus be optimized to capture the video images most aesthetically pleasing to the human eye, a task that has traditionally been performed by camera operators. AI will eventually replace the need for a camera operator in most cases. In addition, AI will be programmed to track subjects using the golden ratio and the principles of visual hierarchy as its foundation.
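To make the golden-ratio idea concrete, the sketch below computes the ratio and the "phi grid" lines it implies for a frame, then finds the grid intersection closest to a subject. This is only one simple way composition could be encoded. The function names and the idea of snapping a subject to the nearest intersection are assumptions made for illustration, not a description of any shipping camera.

```python
import math

# Golden ratio: for a > b, a / b == (a + b) / a. Approximately 1.618.
PHI = (1 + math.sqrt(5)) / 2


def phi_grid(width: int, height: int):
    """Return the x and y positions of the two vertical and two horizontal phi-grid lines."""
    xs = (width - width / PHI, width / PHI)
    ys = (height - height / PHI, height / PHI)
    return xs, ys


def nearest_power_point(cx: float, cy: float, width: int, height: int):
    """Return the phi-grid intersection closest to the subject center (cx, cy)."""
    xs, ys = phi_grid(width, height)
    return min(((x, y) for x in xs for y in ys),
               key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
```

For a 1920x1080 frame, the vertical lines fall at roughly 733 and 1187 pixels, slightly closer to the center than the rule-of-thirds lines at 640 and 1280.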
Real-Time Video Switching
Deep learning algorithms are automating the editing and video creation process, and will help bring AI to real-time video switching as well. Intelligent software will select the optimum camera shots or angles based on the content of the stream, using face, emotion, gesture, clothing, body, and color recognition along with other imaging data and cues. The program will determine what is in each frame of the stream, decide whether it is a wide, medium, or close-up angle, and identify the subject matter or person it includes. The software will analyze the audio, video, and other aspects of the stream and switch a full event or show by recognizing faces, speech, movements, or other events.
These auto-mixing features will be included in future video switchers to allow for a completely AI-switched production. Such software will eventually replace the role of a technical director for live events.
Computer-vision-based video switchers can work independently on embedded systems or devices on-premises using existing hardware. Cameras can even leverage a networked cloud server if needed.
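The shot classification and camera selection described in this section can be sketched in a few lines. The example assumes an upstream vision model (not shown) that reports, for each camera, how large the detected face is relative to the frame and whether that person is speaking. The `CameraFeed` fields, the thresholds, and the selection rule are illustrative guesses rather than an industry standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CameraFeed:
    name: str
    face_height_ratio: float  # detected face height / frame height; 0.0 if no face
    is_speaking: bool         # True if the framed person is the active speaker


def classify_shot(face_height_ratio: float) -> str:
    """Label a shot by how much of the frame height the face occupies."""
    # Thresholds are hypothetical and would need tuning per production.
    if face_height_ratio >= 0.30:
        return "close-up"
    if face_height_ratio >= 0.10:
        return "medium"
    return "wide"


def pick_camera(feeds: List[CameraFeed]) -> str:
    """Cut to the tightest shot of an active speaker, else to the widest shot."""
    speaking = [f for f in feeds if f.is_speaking]
    if speaking:
        return max(speaking, key=lambda f: f.face_height_ratio).name
    return min(feeds, key=lambda f: f.face_height_ratio).name
```

A production switcher would also need a minimum hold time between cuts so the program output does not flicker between cameras; that is left out here for brevity.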
Creating Automated Actions and Triggers for Real-Time Graphics, Animations, or CG Characters
A neural network can identify target objects or people with facial recognition, which can trigger production events such as generating a lower-third for a presenter at a conference. Facial recognition could also generate graphical statistics on a particular player on the field, or even allow control of a CG character to be inserted into a stream.
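As a sketch of how a recognition result might trigger a lower-third, the class below maps a recognized person to a graphics event and suppresses repeats for a cooldown period. It assumes a face-recognition model (not shown) that emits a person ID and a confidence score. The class name, the event dictionary format, the roster entry, and the default thresholds are all hypothetical.

```python
from typing import Dict, Optional


class LowerThirdTrigger:
    """Emit a lower-third event when a known person is recognized."""

    def __init__(self, roster: Dict[str, str], cooldown_s: float = 30.0,
                 min_conf: float = 0.8):
        self.roster = roster          # person ID -> caption text
        self.cooldown_s = cooldown_s  # seconds before the same caption may repeat
        self.min_conf = min_conf      # ignore recognitions below this confidence
        self._last_shown: Dict[str, float] = {}

    def on_face(self, person_id: str, confidence: float, now: float) -> Optional[dict]:
        """Return a graphics event, or None if nothing should be shown."""
        if confidence < self.min_conf or person_id not in self.roster:
            return None
        last = self._last_shown.get(person_id)
        if last is not None and now - last < self.cooldown_s:
            return None  # caption was shown recently; do not repeat it yet
        self._last_shown[person_id] = now
        return {"type": "lower_third", "text": self.roster[person_id]}
```

The same pattern would extend to other triggers mentioned above, such as showing player statistics, by swapping the roster for a different lookup and changing the event type.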
Cognitive technology will be prevalent everywhere: sports, eSports, corporate communications, education, and live events. Productions will integrate data-driven assets and visualizations that change according to specific actions, times, locations, or dynamic data in relation to the stream.
Natural Language Processing (NLP) allows for automated live transcription, translation, interpretation, captioning, and audio description for use in meetings, lectures, or events. This would be useful for multinational corporations that need live captioning for town halls, product launches, or general communications in multiple languages for a worldwide audience.
Video Analytics and Metadata Extraction for Data Management
As companies get much more involved with streaming, the sheer volume of data generated from video is increasing exponentially. The information derived from this data can be leveraged beyond what humans can extract manually.
AI will interpret streaming content and extract metadata by generating descriptive tags, categories, and summaries automatically. This will allow for more intelligent analytics, content insights, and better content management, paving the way for efficient methods of monetizing video through targeted ads.
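To show the shape of automatic tagging, here is a deliberately simple stand-in: it counts word frequencies in a transcript and returns the most common terms as tags. This is plain keyword counting, not the deep-learning approach the article anticipates, and the stopword list and function name are assumptions made for this example.

```python
import re
from collections import Counter
from typing import List

# A tiny, hand-picked stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "by",
             "is", "it", "what", "for", "with", "that", "this"}


def extract_tags(transcript: str, top_n: int = 3) -> List[str]:
    """Return the top_n most frequent non-stopword terms as descriptive tags."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]
```

Tags produced this way could be attached to a stream segment as metadata for search or ad targeting; a learned model would replace the counting step while keeping the same input and output.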