Azure Stream Analytics – window functions

Let's go over what Azure Stream Analytics is, and then look specifically at window functions, statistical functions, and scaling functions in Azure Stream Analytics.

  • Azure Stream Analytics can essentially
    • ingest millions of events per second at variable loads
    • perform real-time analytics on continuous streams of data
    • connect with Event Hubs for stream ingestion, and Azure Blob storage for historical data
    • send results to Power BI as an output

Basically, Azure Stream Analytics can take input from Event Hubs, IoT Hub, or Blob storage, process it with a SQL-based query, and then push the results to Azure SQL Database, Power BI, Data Lake Storage, Cosmos DB, Service Bus, Synapse, Azure Functions, etc.

Steps to configure

Set up a Stream Analytics job. This gives you a few options. The first is the hosting environment, which can be Cloud or Edge; choose Edge only if you are deploying the job to an on-premises IoT gateway edge device. The second is streaming units, an abstraction of the compute resources available to process the query. You can also select the option to secure all private data assets needed by the job in a storage account, which keeps the job's private data in a storage account that you own.

There is a section where you can write the SQL query. Azure Stream Analytics supports a subset of T-SQL.

Let's take a look at the window functions in Azure Stream Analytics.

Window functions are used in the GROUP BY clause of the SQL query. The simplest of these is the tumbling window. If you want the average of all temperature readings over each 10-second period, you define a window based on a fixed length of time. The window ends when that time is up and the next one begins immediately, so there is no overlap: an event can belong to only one tumbling window.
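To make the tumbling window concrete, here is a minimal Python sketch that simulates it outside of Stream Analytics. It is not Stream Analytics code: the function name `tumbling_average`, the sample readings, and the half-open window boundaries are all assumptions made for illustration. In the service itself, the same grouping is written with `TumblingWindow(second, 10)` in the GROUP BY clause.

```python
from collections import defaultdict


def tumbling_average(events, size):
    """Average the values in each non-overlapping window of `size` seconds.

    `events` is an iterable of (timestamp_seconds, value) pairs.
    Assumption: windows are half-open intervals [start, start + size);
    the real service's boundary handling may differ.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        # Integer division maps every event to exactly one window.
        window_start = (ts // size) * size
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical temperature readings: (seconds since start, degrees)
readings = [(1, 20.0), (4, 22.0), (9, 24.0), (12, 30.0), (18, 32.0)]
averages = tumbling_average(readings, 10)
print(averages)
```

The three readings in the first 10 seconds are averaged together, the two in the next 10 seconds are averaged together, and no reading is counted twice.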

If, however, you want a moving average, say the moving average of the price of a security over 10 seconds with a hop of 2 seconds, then the window moves forward 2 seconds at a time, and each new 10-second window consists of the last 8 seconds of the previous window plus the next 2 seconds. This is the hopping window. Because the windows overlap, an event can belong to more than one of them. The tumbling window is simply a hopping window whose hop size equals its window size.
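The overlap is easier to see in a small simulation. The Python sketch below is illustrative only: `hopping_average` and the sample prices are hypothetical, windows are assumed to start at second 0 and to be half-open, and windows with no events are skipped. In the Stream Analytics query language the equivalent grouping is `HoppingWindow(second, 10, 2)`.

```python
def hopping_average(events, size, hop):
    """Average per overlapping window of `size` seconds, starting every `hop` seconds.

    `events` is an iterable of (timestamp_seconds, value) pairs.
    Assumptions: the first window starts at second 0, each window is the
    half-open interval [start, start + size), and empty windows are omitted.
    """
    events = sorted(events)
    if not events:
        return {}
    last_ts = events[-1][0]
    results = {}
    start = 0
    while start <= last_ts:
        vals = [v for ts, v in events if start <= ts < start + size]
        if vals:
            results[start] = sum(vals) / len(vals)
        start += hop
    return results


# Hypothetical security prices: (seconds since start, price)
prices = [(1, 100.0), (3, 102.0), (9, 104.0), (11, 106.0)]
moving = hopping_average(prices, size=10, hop=2)
print(moving)

# With hop == size the windows no longer overlap: this is a tumbling window.
tumbling = hopping_average(prices, size=10, hop=10)
print(tumbling)
```

The price recorded at second 9 contributes to every window from the one starting at 0 to the one starting at 8, which is exactly the overlap a tumbling window avoids.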

The sliding window is trickier to understand. Let's say you have an event stream whose rate varies. Instead of hopping every 2 seconds like the hopping window, what if the window moved whenever an event happens? A sliding window produces output only when its content changes, that is, when an event enters or leaves the window. Take a look at this example:

https://www.oreilly.com/library/view/stream-analytics-with/9781788395908/87d7eea1-cf76-42a9-91ed-b68d9364febf.xhtml
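As a complement to the linked example, here is a simplified Python model of a sliding window that counts events. It is a sketch under assumptions, not the service's implementation: `sliding_counts` and the timestamps are hypothetical, and the window at each evaluation point `p` is assumed to cover the interval (p - size, p].

```python
def sliding_counts(timestamps, size):
    """Count events in a window of `size` seconds, evaluated only when
    the window content changes.

    Assumptions: the content changes when an event enters (at its own
    timestamp) or exits (`size` seconds later); the window at point p
    covers (p - size, p]; empty windows produce no output.
    """
    timestamps = sorted(timestamps)
    # Evaluation points: every arrival time and every expiry time.
    points = sorted(set(timestamps) | {t + size for t in timestamps})
    results = []
    for p in points:
        count = sum(1 for t in timestamps if p - size < t <= p)
        if count:
            results.append((p, count))
    return results


# Hypothetical event arrival times in seconds: a burst, then a long gap.
arrivals = [1, 2, 9, 25]
counts = sliding_counts(arrivals, size=10)
print(counts)
```

During the burst the count changes at nearly every second listed, while nothing at all is emitted during the quiet stretch between seconds 12 and 25, which is the main difference from a hopping window that fires on a fixed schedule.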

All of the windows we have seen so far have been fixed-length time windows.

A session window instead groups events together if they arrive within the specified timeout of one another. If the gap between events exceeds the timeout, the window is closed, and a new window opens with the next event. If the windows need to be grouped by key, the events are partitioned by key and the session window is applied to each group independently.
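The session logic can be sketched in a few lines of Python. This is an illustration under assumptions rather than the service's behavior in full: `session_windows` and the click data are hypothetical, a gap exactly equal to the timeout is assumed to keep the session open, and any maximum-duration limit on a session is ignored.

```python
from collections import defaultdict


def session_windows(events, timeout):
    """Group (timestamp_seconds, key) events into per-key sessions.

    A key's session stays open while each new event arrives within
    `timeout` seconds of that key's previous event; otherwise a new
    session starts. Assumption: no maximum session duration is enforced.
    Returns {key: [[ts, ...], ...]}.
    """
    sessions = defaultdict(list)
    for ts, key in sorted(events):
        key_sessions = sessions[key]
        if key_sessions and ts - key_sessions[-1][-1] <= timeout:
            key_sessions[-1].append(ts)  # still inside the open session
        else:
            key_sessions.append([ts])  # timeout exceeded: open a new session
    return dict(sessions)


# Hypothetical click events: (seconds since start, user id)
clicks = [(1, "a"), (3, "a"), (4, "b"), (12, "a"), (13, "b"), (14, "a")]
grouped = session_windows(clicks, timeout=5)
print(grouped)
```

Each user's sessions are computed independently, so a pause by user "a" does not close the session of user "b".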

On a final note, if you want to calculate the moving average every 10 seconds as well as every 30 seconds and every 60 seconds, you can use the Windows() function to apply multiple windows to the same stream. Each window definition passed to Windows() is given an ID, and the results can then be grouped by this ID.
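The idea of tagging each result with the ID of the window that produced it can be simulated as follows. This Python sketch is only an analogy for Windows(): `multi_window_average`, the window IDs, and the readings are hypothetical, and every window definition is assumed to be a tumbling window with half-open boundaries.

```python
from collections import defaultdict


def multi_window_average(events, windows):
    """Apply several window definitions to the same stream.

    `events` is an iterable of (timestamp_seconds, value) pairs and
    `windows` maps a window ID to a size in seconds.
    Assumption: every definition is a tumbling window [start, start + size).
    Returns {(window_id, window_start): average}.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        for window_id, size in windows.items():
            # The same event feeds one bucket per window definition.
            buckets[(window_id, (ts // size) * size)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


# Hypothetical readings: (seconds since start, value)
readings = [(5, 10.0), (15, 20.0), (35, 30.0), (65, 40.0)]
by_window = multi_window_average(readings, {"10s": 10, "30s": 30, "60s": 60})

for (window_id, start), avg in sorted(by_window.items()):
    print(window_id, start, avg)
```

Because each result carries its window ID, the 10-, 30-, and 60-second averages can be separated or compared afterwards without running three queries over the stream.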