Alexa: Overview and Sample Use Case

Alexa (named after the ancient Library of Alexandria) is Amazon’s cloud-based voice-control system, available on millions of devices from Amazon and third-party device manufacturers. With Alexa, you can add your own customized skills to any Amazon Echo device. From checking the weather and storing contacts to planning trips, setting reminders, playing music and even running quiz games, Alexa can handle a wide range of tasks. The Alexa Voice Service Platform handles the speech-to-text and text-to-speech conversions that make these interactions possible.

Why Alexa?

One of the foremost reasons for using Alexa is its responsiveness. You do not need to press any button to activate it. You just say “Alexa”, “Echo”, “Computer” or “Amazon”, which are the trigger words for Alexa, followed by the activity you want performed. You only have to be careful to set Alexa up properly and to use the correct commands. The Amazon Echo speaker is currently in its 2nd generation and offers a broad set of features, from smart home control to digital-assistant abilities.

What devices does Alexa work on?

Amazon Alexa perhaps delivers its best experience on the Amazon Echo. However, Amazon has also brought the smart assistant to other home hubs, such as the Echo Dot and the Tap. Alexa is also available on Amazon’s Fire TV set-top box and Fire HD 8 tablet.
Amazon has also allowed some third parties to support Alexa. For example, the LG SmartThinQ hub and the Pebble Core wearable come with Alexa support.

Architectural Workflow

Consider a scenario in which Alexa looks up NSE stock prices.
This is what the communication flow looks like:

  • The user addresses an Echo device using one of the available trigger words, so that the Echo knows it is being addressed, and names the skill they wish to interact with. For example, for my skill called StockPrice, I say “Alexa, ask StockPrice for the stock price of TCS”. Here, “Alexa” is the trigger word that makes the Echo listen, and “StockPrice” identifies the skill the user wants to direct the enquiry to.
  • The Echo sends this request to the Alexa Voice Service Platform, which handles speech recognition, turning the user’s speech into tokens that identify the skill and the intent. It then breaks the request down into a structured representation and sends it to the Custom Alexa skill. In our example, the intent would be that the user wants to know a stock price, and the context would be the specific company they are interested in.
  • Intents and the possible parameter values for a skill are held by the Alexa Service Platform as configuration items for the skill. The intent and its slots, slot types and slot values for the user’s request are then sent as a JSON document to the server-side skill implementation for processing. The Alexa Service Platform knows where to send these requests because it maintains the endpoint, such as a Lambda ARN, configured for each Custom Skill.
  • The Custom Skill receives the JSON either via an HTTPS request to a web service or, if it is implemented as an AWS Lambda function, via invocation of the Lambda function at the configured ARN. The Lambda function and the Custom Skill are integrated using the “Alexa Skills Kit” trigger, which is added to the function and tied to the skill through its “Skill ID”.
  • The Custom Skill code parses the JSON, reads the intent and its contents, and then performs suitable processing to retrieve the appropriate data, for example by making API calls or querying a database. In our example, the code would call the Alpha Vantage API to get the stock price of a company.
  • A response in JSON format is then sent back to the Alexa Voice Service Platform, containing the text that Alexa should speak to the user and, if required, the visual content to display on a device with a screen such as the Echo Show.
  • The Alexa Service Platform receives the response and uses text-to-speech conversion to speak it to the user.
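
The request and response steps above can be sketched as a small Lambda handler in Python. This is only a minimal illustration, not the actual StockPrice skill: the intent name GetStockPriceIntent, the slot name company and the SAMPLE_PRICES table (which stands in for a real call to a stock-price API) are all assumptions made for the example. The handler reads the intent and slot from the incoming JSON and returns the JSON structure that Alexa converts back to speech, plus a simple card for devices with a screen.

```python
# Hypothetical price table standing in for a real stock-price API call.
SAMPLE_PRICES = {"TCS": 3500.0}


def build_response(speech_text, end_session=True):
    """Wrap plain text in the JSON structure returned to Alexa."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            # Optional card, shown in the Alexa app or on screen devices.
            "card": {"type": "Simple", "title": "StockPrice", "content": speech_text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point that AWS Lambda invokes with the Alexa request JSON."""
    request = event["request"]

    # The user opened the skill without asking for anything specific.
    if request["type"] == "LaunchRequest":
        return build_response(
            "Which company do you want the stock price for?", end_session=False
        )

    # GetStockPriceIntent and the "company" slot are assumed names.
    if (
        request["type"] == "IntentRequest"
        and request["intent"]["name"] == "GetStockPriceIntent"
    ):
        slots = request["intent"].get("slots", {})
        company = slots.get("company", {}).get("value")
        price = SAMPLE_PRICES.get(company.upper()) if company else None
        if price is None:
            return build_response("Sorry, I could not find that company.")
        return build_response(f"The stock price of {company} is {price} rupees.")

    return build_response("Sorry, I did not understand that.")
```

To try it locally, you can call lambda_handler with a hand-written event dictionary that has a "request" key of type "IntentRequest" and pass None as the context. In a real skill, the lookup in SAMPLE_PRICES would be replaced by the API call described above.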

Building a Basic Skill

  • To build a basic skill in Alexa, there are 2 main prerequisites:
    • An Amazon Developer account, to build your customised skill.
    • An AWS Management Console account, to create your Lambda function.
  • To build a customised skill you should know the following terms:
    • Skill: A skill is your application, which you intend to publish for Alexa devices.
    • Invocation: The name of the skill, which you say to start interacting with it.
    • Intents, slots and utterances: An intent is the action that fulfils a user’s request. Intents can have slots that represent the variable information within an intent. A sample utterance is a phrase the user can say to invoke an intent.
    • Slot types: Every slot has a type that determines how the user’s spoken data is handled. For example, AMAZON.NUMBER converts the spoken number “five” to “5”.
    • JSON Editor: Your whole interaction model will be represented in JSON format. You can create as well as edit your JSON data. You can also upload a JSON file of your own.
    • Interfaces: Interfaces provide additional directives and request types for specific additional features in your skill. For example, you can use the AudioPlayer interface for streaming music.
    • Endpoints: Specify the endpoint for your skill. Alexa sends requests to this endpoint when users invoke your skill. If you are hosting your service as an AWS Lambda function, select the AWS Lambda ARN option and enter the ARN for your function in the Default Region endpoint text box.
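
To make these terms concrete, here is a rough sketch of what an interaction model for the stock price example might look like, built as a Python dictionary and printed as JSON of the kind you would paste into the JSON Editor. The invocation name "stock price", the intent name GetStockPriceIntent, the slot name company and the custom slot type COMPANY_NAME are illustrative assumptions, not names taken from a published skill.

```python
import json

# Assumed interaction model for a hypothetical stock price skill.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            # Invocation: what the user says to open the skill.
            "invocationName": "stock price",
            "intents": [
                {
                    # Intent with one slot holding the company name.
                    "name": "GetStockPriceIntent",
                    "slots": [{"name": "company", "type": "COMPANY_NAME"}],
                    # Sample utterances; {company} marks the slot position.
                    "samples": [
                        "give the stock price for {company}",
                        "what is the stock price of {company}",
                    ],
                },
                # Built-in intents provided by Amazon.
                {"name": "AMAZON.HelpIntent", "samples": []},
                {"name": "AMAZON.StopIntent", "samples": []},
            ],
            # Custom slot type listing the values the slot can take.
            "types": [
                {
                    "name": "COMPANY_NAME",
                    "values": [
                        {"name": {"value": "TCS"}},
                        {"name": {"value": "Infosys"}},
                    ],
                }
            ],
        }
    }
}

model_json = json.dumps(interaction_model, indent=2)
print(model_json)
```

Keeping the model in a dictionary makes it easy to generate slot values from a list of companies before uploading the resulting JSON.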

Alexa Skill Types

  • Custom Skill: A skill that can handle just about any type of request (it is selected by default). For example: looking up information from a web service, integrating with a web service to order something (a car from Uber, a pizza from Domino’s Pizza), interactive games, or just about anything else you can think of.
  • Flash Briefing Skill: The Flash Briefing Skill API defines the words users say to invoke the flash briefing or news request (utterances) and the format of the content so that Alexa can provide it to the customer.
  • Smart Home Skill: A skill that lets a user control and query cloud-enabled smart home devices such as lights, door locks, cameras, thermostats and smart TVs. For example: turn off the lights, change the brightness of dimmable lights, change the volume, etc.
  • Video Skill: The Video Skill API defines the requests the skill can handle (device directives) and the words users say to invoke those requests (utterances). For example: play a movie, change the channel, etc.

You can build the Lambda function for your skill in languages such as Python, Java, Go and .NET.

Conclusion

“What’s good for developers is ultimately good for consumers,” said Rob Pulciani, Amazon’s general manager of Alexa skills. Alexa has thus proved to be a game changer in IT in recent times. Right now, Alexa has a leg up on Google Assistant and Apple’s Siri, and it remains a strong competitor in today’s age of machines.
