Building an IoT and Azure demo - Part 9: Azure Stream Analytics

(Jump to part 1)

Let's say that we have a few thousands devices sending data to our Azure Event Hub, this is likely to generate tons of data.  This is why Event Hub's retention period is by default just one day.  Yikes!  How to I produce that monthly report for my manager?  Well, you need to read the Event Hub data in real time and store it into a data store.  One way to do this is by using another Azure service: Stream Analytics.  If you heard of SQL Server's StreamInSight before, this is it as a service in the cloud.

Let's create a new Stream Analytics.  In the azure portal, you select New then Stream Analytics and the only option available is Quick Create.

I then need to add an input.  Click on the Inputs tab and on the Add Input button at the bottom of the page.

 An event hub is of type data stream.

In step 2, select Event Hub.

Then I need to specify an alias (name) and my event hub name.  Remember the event hub's receive policy I created for my console app?  Let's reuse it here.

My Stream Analytics is now connected to my Event Hub.  I now need to create a query that will aggregate the data.  I'll then save the query results so instead of saving an enormous amount of data, I'll only save the aggregate that I can use for that monthly report.

Let's click on the Query tab.  I can type the query directly in the portal page.  What I want is sum the button clicks for each device in each time period of one minute.

SELECT id,  System.TimeStamp AS OutTime,
    SUM(CAST(a AS BIGINT)) AS ButtonA,
    SUM(CAST(b AS BIGINT)) AS ButtonB
INTO
    someoutput
FROM IoTDemo1
GROUP BY id, TumblingWindow(minute,1)

Let's take a look at the query.  The SELECT section is standard T-SQL code that sums the buttons clicks.  The INTO section will sends the query results to an output that I haven't created yet.  In the FROM section, I need to specify the input I created earlier.  The GROUP BY section will group the results by device id and by a time window of one minute. 

Notice the TumblingWindow value?  That's not standard SQL.  Let's step back for a moment and try to understand how Stream Analytics works.  A standard database will query data stored in the database while Stream Analytics will query the live data stream over a period of time.  The analogy I like to use is that a database let you easily count the red cars that are parked in a parking lot while Stream Analytics let you easily count the numbers of red cars rolling on a highway right in front of you for each 5 minutes period.  That's not easy to do with a relational database and large amounts of data. 

 

So the query will sum the button clicks for each minute and this is what I want to save.  I now need to create an output to send the data to a data store by clicking on the Outputs tab, and the Add Output button at the bottom of the screen.

You can use a variety of data stores.  For this demo, I'll use Table storage.

I then need to create a new storage account of use an existing one.  Now what to use as the partition and row keys? For my demo, I'll use the device ID as the partition key and the OutTime value I created in the SELECT portion of the query as the row key.

There's one setting not showing in the above screenshot and that's the batch size.  The default is 100 but for the demo, I'll set that to 1 or 2 so people can see the results a little bit faster. 

One last thing.  I need to update the query so that it sends the results to that output.  Simple, just use the name of the output in the INTO section. 

SELECT id,  System.TimeStamp AS OutTime,
    SUM(CAST(a AS BIGINT)) AS ButtonA,
    SUM(CAST(b AS BIGINT)) AS ButtonB
INTO
    iotdemotablestorage
FROM IoTDemo1
GROUP BY id, TumblingWindow(minute,1)

Input: check!  Query: check!  Output: check!  It's now time to click on the Start button at the bottom of the page and tap on these devices buttons like crazy!

Let's now open the table using the Visual Studio Azure tools.  Woohoo!  My table has been created and we do have rows with aggregate data!

 

Let's now try this using 3 devices: the Raspberry Pi, a Samsung Galaxy Tab 3 7" tablet and a Moto G phone.

 

Awesome!  The next step is to create a second output and send the results to Power BI and show the results in some nice graph.  Stay tuned!

(Jump to part 10)

blog comments powered by Disqus

Page List

Month List