Overview for Voice-enabling Your App and Content on Fire TV
With the release of Fire TV Cube, and the ability to link Echo devices to Fire TV, customers can interact with their TVs hands-free. This interaction is referred to as "far-field" control. Using voice, customers can ask Alexa to play content, search for content, control playback, and change channels on their Fire TV. Without Fire TV Cube or a paired Echo, customers can press the microphone button on the Alexa Voice Remote and speak voice commands to Fire TV. This interaction is referred to as "near-field" control.
To support voice interactions on Fire TV, it's becoming increasingly important that you voice-enable your apps. There are three techniques for voice-enabling your app: Video Skills Kit, Media Session API, and In-App Voice Scrolling and Selection.
Key Terms
Integrating with Alexa introduces some terms that might be unfamiliar. The following glossary defines some of these terms.
Video Skills Kit (VSK)
You can integrate the Video Skills Kit (VSK) into your Fire TV app. With the VSK, customers can use natural language commands to search for your app's content, launch your app, control media playback, and change the channel. Implementing the VSK involves building an AWS Lambda function to handle the directives from Alexa, integrating the Alexa Client Library, and handling cloud-to-app communication through Amazon Device Messaging (ADM). Catalog integration is also required to implement the VSK for Fire TV. To get started, see Video Skills Kit for Fire TV Apps Overview.
Integrating the VSK for Fire TV gives customers the following capabilities:
- App launching: When a customer asks to play or search for specific content, Alexa automatically launches the correct Fire TV app. When customers say "Alexa, open <app name>," Alexa opens the app's homepage. The video skill automatically calls the Alexa Video Skill API to launch the app.
- Quick play: Customers can ask Alexa to play a video by saying "Alexa, play <show name>" or "Alexa, play <show name> on <app name>." Alexa routes the customer to the correct app with that content, and Fire TV begins playback automatically (rather than just going to the detail page).
- Search: Customers can ask Alexa to search for content by saying "Alexa, find <show name>." Searches like this, which don't limit the scope to an app, are known as "universal searches" because they look for the content across all catalog-integrated Fire TV apps. Searches that limit their scope to a specific app are known as "local searches." Customers perform local searches by saying "Alexa, find <show name> on <app name>" or "Alexa, find <genre> on <app name>."
- Transport controls: Customers can control playback by voice (for example, "Alexa, fast forward," "Alexa, next," or "Alexa, pause"). Other commands include rewind, resume, stop, and timed skips such as "fast forward 5 minutes."
- Channel change: For apps that offer live TV functionality, customers can switch between channels through utterances such as "Alexa, tune to <channel name>."
Integrating the Video Skills Kit (VSK) into your Fire TV app makes it easier for customers to discover and play your content.
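As a rough illustration of the Lambda side of a VSK integration, the following sketch routes an incoming directive to the customer-facing capability it serves. It is a minimal model, not the VSK implementation: the class, enum, and method names are hypothetical, and the namespace and directive names in the table are assumptions based on the Alexa Video Skill API interface names, so verify them against the API reference before relying on them.

```java
import java.util.Map;

/** Sketch: map an Alexa directive (namespace + name) to the capability it serves. */
final class DirectiveRouterSketch {

    /** Capabilities from the list above; UNSUPPORTED covers anything unrecognized. */
    enum Capability { APP_LAUNCH, QUICK_PLAY, SEARCH, TRANSPORT_CONTROL, CHANNEL_CHANGE, UNSUPPORTED }

    // Keys are "<namespace>.<name>". ASSUMPTION: these strings follow the
    // Alexa Video Skill API interface names; confirm them in the API reference.
    private static final Map<String, Capability> ROUTES = Map.of(
            "Alexa.Launcher.LaunchTarget", Capability.APP_LAUNCH,
            "Alexa.RemoteVideoPlayer.SearchAndPlay", Capability.QUICK_PLAY,
            "Alexa.RemoteVideoPlayer.SearchAndDisplayResults", Capability.SEARCH,
            "Alexa.PlaybackController.Play", Capability.TRANSPORT_CONTROL,
            "Alexa.PlaybackController.Pause", Capability.TRANSPORT_CONTROL,
            "Alexa.SeekController.AdjustSeekPosition", Capability.TRANSPORT_CONTROL,
            "Alexa.ChannelController.ChangeChannel", Capability.CHANNEL_CHANGE);

    /** Looks up the capability for a directive header; unknown directives are UNSUPPORTED. */
    static Capability route(String namespace, String name) {
        return ROUTES.getOrDefault(namespace + "." + name, Capability.UNSUPPORTED);
    }
}
```

In a real Lambda, the namespace and name would come from the directive header in the request payload, and each capability would translate into a message sent to the app through ADM.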
Media Session API
If you don't have the bandwidth or resources to implement the VSK, or if your planned implementation is some months away, you can voice-enable the media playback controls in your app using the Media Session API. Media Session is an Android API that lets streaming apps receive media commands. It's the recommended way to handle events from remote controls, Bluetooth, ADB, and the Fire TV companion app.
With Media Session integrated, customers can say commands such as "Play," "Pause," and "Rewind" during media playback. These commands work on both near-field and far-field devices.
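To make "receiving media commands" concrete, here is a small plain-Java model of how a spoken transport command could end up in a playback callback. It does not use the Android SDK: the TransportCallback interface only mirrors a few method names from android.media.session.MediaSession.Callback (onPlay, onPause, onRewind, onFastForward), and the DemoPlayer class, the dispatch helper, and the 10-second skip step are hypothetical names and values chosen for illustration.

```java
import java.util.Locale;

/** Sketch: deliver spoken transport commands to playback callbacks. */
final class TransportCommandSketch {

    /** Mirrors a subset of the method names on Android's MediaSession.Callback. */
    interface TransportCallback {
        void onPlay();
        void onPause();
        void onRewind();
        void onFastForward();
    }

    /** Toy player that only records playback state (hypothetical, for illustration). */
    static final class DemoPlayer implements TransportCallback {
        static final long STEP_MS = 10_000; // assumed skip step, not an Amazon default
        boolean playing = false;
        long positionMs = 60_000;

        public void onPlay() { playing = true; }
        public void onPause() { playing = false; }
        public void onRewind() { positionMs = Math.max(0, positionMs - STEP_MS); }
        public void onFastForward() { positionMs += STEP_MS; }
    }

    /** Invokes the callback matching the command; returns false for unknown commands. */
    static boolean dispatch(String command, TransportCallback callback) {
        switch (command.trim().toLowerCase(Locale.ROOT)) {
            case "play": callback.onPlay(); return true;
            case "pause": callback.onPause(); return true;
            case "rewind": callback.onRewind(); return true;
            case "fast forward": callback.onFastForward(); return true;
            default: return false;
        }
    }
}
```

In an actual app, the system delivers these calls to the MediaSession.Callback that the app registers on its media session, so the app implements only the callback methods and does not parse utterances itself.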
Media Session does not support the more advanced voice controls that the Video Skills Kit provides, where customers can launch apps and search for content through voice. Media Session integration voice-enables only the playback controls.
If you've already implemented Media Session in your app (most developers have), voice-enabling it takes little to no extra work: you add a special Alexa permission to your app manifest. For full details, see Voice-enabling Transport Controls with Media Session API.
In-App Voice Scrolling and Selection
With Fire TV Cube, customers can perform scrolling and selection using common Alexa phrases. In-app voice scrolling and selection works by mapping voice commands to D-pad navigation events. D-pad refers to the remote control's directional keypad, which scrolls right, left, up, and down. Alexa converts the customer's voice commands into D-pad navigation events and sends them to the app.
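The mapping described above can be sketched as a lookup from a spoken phrase to a D-pad key code. This is only an illustration of the idea, not how Alexa implements it: the phrase list and the VoiceDpadSketch class are hypothetical, and the integer values are copied from Android's android.view.KeyEvent.KEYCODE_DPAD_* constants so that the example runs without the Android SDK.

```java
import java.util.Locale;
import java.util.Map;
import java.util.OptionalInt;

/** Sketch: translate a spoken navigation phrase into a D-pad key code. */
final class VoiceDpadSketch {

    // Same values as the android.view.KeyEvent.KEYCODE_DPAD_* constants.
    static final int KEYCODE_DPAD_UP = 19;
    static final int KEYCODE_DPAD_DOWN = 20;
    static final int KEYCODE_DPAD_LEFT = 21;
    static final int KEYCODE_DPAD_RIGHT = 22;
    static final int KEYCODE_DPAD_CENTER = 23;

    // ASSUMPTION: illustrative phrases only; the phrases Alexa actually
    // recognizes are defined by Amazon, not by this table.
    private static final Map<String, Integer> PHRASE_TO_KEY = Map.of(
            "scroll up", KEYCODE_DPAD_UP,
            "scroll down", KEYCODE_DPAD_DOWN,
            "scroll left", KEYCODE_DPAD_LEFT,
            "scroll right", KEYCODE_DPAD_RIGHT,
            "select", KEYCODE_DPAD_CENTER);

    /** Returns the key code for a phrase, or empty if the phrase is not a navigation command. */
    static OptionalInt toKeyCode(String phrase) {
        Integer code = PHRASE_TO_KEY.get(phrase.trim().toLowerCase(Locale.ROOT));
        return code == null ? OptionalInt.empty() : OptionalInt.of(code);
    }
}
```

Because the app receives ordinary D-pad events, an app that already handles remote-control navigation correctly should need no voice-specific code for this feature.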
In-app scrolling and selection is a feature that Amazon manually activates on the back end for apps, after verifying that the app supports the commands. Amazon is gradually increasing the number of apps with scrolling and selection enabled. For more details, see In-App Voice Scrolling and Selection.
Last updated: May 29, 2026

