
Scheduling Serverless Data Streams Using AWS Lambda And CloudWatch Events

It is easy to think of real-time data streams as running 24/7. In reality, you need to be much more precise with your data streams, turning them on when things are happening, such as when markets open on Monday morning, and turning them off when markets close on Friday afternoon. After streaming data from existing APIs for a while, you can begin to predict when data will flow, which varies with the type of data API you are working with. To help our consumers be more precise in how they deliver their data streams, we’ve begun developing a line of AWS Lambda functions that allow for the delivery and scaling of real-time data streams in a serverless environment.

AWS gives you robust ways to schedule your serverless functions using schedule expressions, written as either rate or cron, so you can define when your data streams turn on and how often they run. This lets you turn a constraint, that each Lambda function can only run for up to five minutes at the time of writing, into a feature: you can have a function start again every five minutes and so run indefinitely, or you can schedule it to turn on exactly when you know data streams will be at their fullest and business is being done via APIs. That opens up a wide range of options for orchestrating and scaling your data streams on exactly the schedule you define.
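To make the rate and cron options concrete, here is a minimal Python sketch that builds the two kinds of schedule expression. The helper names and the market-hours values are illustrative assumptions, not part of our published package. CloudWatch Events evaluates cron expressions in UTC, so the 13:30 UTC open used here assumes a US market during daylight saving time.

```python
def rate_expression(value: int, unit: str) -> str:
    """Build a rate() schedule expression such as 'rate(5 minutes)'."""
    if value < 1:
        raise ValueError("rate value must be a positive integer")
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour', or 'day'")
    # AWS expects the singular unit for a value of 1 and the plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


def cron_expression(hour_utc: int, minute: int = 0, days: str = "MON-FRI") -> str:
    """Build a cron() schedule expression that fires at a fixed UTC time.

    Field order: minutes hours day-of-month month day-of-week year.
    Day-of-month is '?' because day-of-week is the field being specified.
    """
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    return f"cron({minute} {hour_utc} ? * {days} *)"


# Hypothetical schedules, chosen only for illustration.
RESTART_EVERY_FIVE_MINUTES = rate_expression(5, "minute")
MARKET_OPEN_WEEKDAYS = cron_expression(13, 30)          # assumed 13:30 UTC open
WEEK_START_ONLY = cron_expression(13, 30, days="MON")   # Monday morning only
```

Either string can be supplied as the schedule expression of a CloudWatch Events rule that targets the Lambda function. The rate form suits the "start again every five minutes" pattern, while the cron form suits turning a stream on at a known time.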

Once you begin to look at real-time data streams as more than an always-on feature, and instead as an event-driven architectural component that can be turned on and off on demand, they become a more powerful tool. Depending on your business and the industry you operate in, data will flow at different times, and solutions like AWS Lambda and CloudWatch Events let you better control the flow of your information, turning it on when you need it most and off when you don't. If nobody is at work, why do you need to be streaming data to dashboards? Maybe training machine learning models at night, taking the data from the previous day and developing new models from it, is the most efficient use of your data streams. Once you begin to combine real-time data streams with event-driven approaches, your view of the landscape will change dramatically.
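As a rough illustration of the on-demand idea, the Python sketch below shows a Lambda handler that does nothing outside an active window and, inside it, keeps streaming only while enough execution time remains. The window, the safety margin, and the `fetch_and_forward_batch` placeholder are all assumptions made for this example. The only Lambda-specific call is `get_remaining_time_in_millis()`, which the Python runtime exposes on the context object.

```python
from datetime import datetime, time, timezone

# Hypothetical active window, expressed in UTC (Monday=0 ... Friday=4).
ACTIVE_DAYS = {0, 1, 2, 3, 4}
WINDOW_START = time(13, 30)
WINDOW_END = time(20, 0)

# Stop this far ahead of the execution limit so the function exits cleanly.
SAFETY_MARGIN_MS = 10_000


def is_active(now: datetime) -> bool:
    """Return True when a timezone-aware datetime falls inside the window."""
    now_utc = now.astimezone(timezone.utc)
    in_hours = WINDOW_START <= now_utc.time() < WINDOW_END
    return now_utc.weekday() in ACTIVE_DAYS and in_hours


def fetch_and_forward_batch() -> None:
    """Placeholder: a real function would read from the source API
    and push the updates on to a dashboard, queue, or data lake."""


def handler(event, context, now=None, process_batch=fetch_and_forward_batch):
    """Lambda entry point. `now` and `process_batch` exist only so the
    sketch can be exercised locally; Lambda passes just event and context."""
    now = now or datetime.now(timezone.utc)
    if not is_active(now):
        # Nobody needs the stream right now, so return immediately.
        return {"streamed": False, "batches": 0}

    batches = 0
    # Keep working until the remaining execution time gets too short.
    while context.get_remaining_time_in_millis() > SAFETY_MARGIN_MS:
        process_batch()
        batches += 1
    return {"streamed": True, "batches": batches}
```

Paired with a rule that invokes the function on a fixed rate, each invocation picks up where the last one stopped during the active window and costs almost nothing outside it.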

We have just published our first AWS Lambda streaming API prototype to GitHub. It uses our stock market API and provides a serverless deployment package for our Node.js SDK. You can download the AWS Lambda package directly from GitHub, and we will publish it to the AWS Serverless Application Repository in a couple of weeks. We will keep working on more precise serverless data streams built upon existing APIs, going well beyond market data and demonstrating what is possible when you turn existing APIs into real-time streams. This allows anyone to schedule data streams from common third-party or internal data sources, and to drive real-time dashboards, fill data lakes, and train machine learning models on data as it becomes available. It is about learning to be much more precise with our data streams, turning them on when they need to be on and off when they aren’t needed, which provides a much more on-demand, pay-as-you-go approach to delivering data in real time.
