
Generating Serverless Data Streams From An OpenAPI Definition


As we work to turn the Streamdata.io API Gallery from a catalog of APIs you can browse into an actionable directory of machine-readable data, content, and algorithmic resources, we are investing in a wide variety of OpenAPI-driven connectors. These connectors make each entry in the gallery easy to deploy on a variety of platforms. One platform we are focusing our resources on is Amazon Web Services (AWS). Its share of the cloud market makes it a viable target: the valuable third-party resources we've already defined can help us turn AWS customers into Streamdata.io customers.

To deploy, manage, and scale the growing number of AWS connectors we publish, the process has to be automated; otherwise we couldn't handle the volume of streaming data connectors we will need to provide. This is why we use the OpenAPI specification as the engine for the Streamdata.io API Gallery: it also acts as the machine-readable definition we use to generate and evolve each individual streaming API connector we deliver. A significant portion of our AWS API connectors will be delivered as serverless functions on AWS Lambda. This scalable platform lets us deploy an individual script for each API path we've published to the Streamdata.io API Gallery, providing a single unit of compute for each unit of resource we've made available.
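Because each API path becomes its own Lambda function, one early automation step is fanning a full definition out into per-path pieces. The sketch below is one minimal way to do that in Python, assuming Swagger 2.0-style definitions already loaded as dictionaries. The functions `split_by_path` and `lambda_name`, and the function-naming convention they encode, are hypothetical illustrations, not the gallery's actual tooling.

```python
import copy
import re


def split_by_path(definition):
    """Return {path: definition} with one self-contained OpenAPI document per path.

    Every top-level field except "paths" (host, basePath, shared schemas, ...)
    is copied into each piece; unused schemas are not pruned.
    """
    shared = {key: value for key, value in definition.items() if key != "paths"}
    pieces = {}
    for path, path_item in definition.get("paths", {}).items():
        piece = copy.deepcopy(shared)
        piece["paths"] = {path: copy.deepcopy(path_item)}
        pieces[path] = piece
    return pieces


def lambda_name(prefix, path):
    """Hypothetical naming convention: '/users/{id}' -> '<prefix>-users-id'.

    Lambda function names allow letters, digits, hyphens, and underscores,
    and are limited to 64 characters.
    """
    slug = re.sub(r"[^A-Za-z0-9]+", "-", path).strip("-") or "root"
    return f"{prefix}-{slug}"[:64]
```

Copying every shared field keeps each piece valid on its own at the cost of some duplication; a production pipeline might prune schemas that a given path never references.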

First, as part of the publishing process to the gallery, we break each API path out into its own OpenAPI definition. Each path needs to be well defined as an individual resource before we can get started, and its definition should contain everything we need to generate a serverless data stream on AWS Lambda. From that OpenAPI definition we can auto-generate a Node.js, Go, or Python script that defines the host, base URL, path, and any path, header, or query parameters the request needs. The response schema is defined as well, allowing each script to handle the initial stream response and every incremental update. The result is that any existing OpenAPI definition can be turned into a real-time stream by proxying the API through Streamdata.io, and the resulting data streams can be published anywhere a customer wants them within their AWS infrastructure.
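To illustrate what a generated script draws from a per-path definition, the following Python sketch assembles the upstream request URL and headers from the host, base path, path template, and declared parameters, then wraps the result in a proxy URL. It assumes Swagger 2.0 fields (`host`, `basePath`, `schemes`, and `parameters` whose `in` is `path`, `query`, or `header`) and a single GET operation per definition. The format used by `proxy_url` (target URL appended to a proxy base, plus a token query parameter whose name you pass in) is a hypothetical placeholder, not Streamdata.io's documented scheme.

```python
from urllib.parse import quote, urlencode


def build_get_request(definition, values):
    """Return (url, headers) for the single GET operation in a per-path definition."""
    # A per-path definition holds exactly one path; this unpacking fails otherwise.
    (path, item), = definition["paths"].items()
    operation = item["get"]
    # Path-level and operation-level parameters both apply to the request.
    declared = item.get("parameters", []) + operation.get("parameters", [])
    query, headers = {}, {}
    for param in declared:
        name, location = param["name"], param["in"]
        if name not in values:
            # Path parameters are always required in Swagger 2.0.
            if param.get("required") or location == "path":
                raise ValueError(f"missing required parameter: {name}")
            continue
        value = values[name]
        if location == "path":
            path = path.replace("{" + name + "}", quote(str(value), safe=""))
        elif location == "query":
            query[name] = value
        elif location == "header":
            headers[name] = str(value)
    scheme = definition.get("schemes", ["https"])[0]
    base_path = definition.get("basePath", "").rstrip("/")
    url = f"{scheme}://{definition['host']}{base_path}{path}"
    if query:
        url += "?" + urlencode(query)
    return url, headers


def proxy_url(target_url, proxy_base, token_param, token):
    """Hypothetical proxy scheme: target URL appended to the proxy base plus a token parameter."""
    separator = "&" if "?" in target_url else "?"
    return f"{proxy_base.rstrip('/')}/{target_url}{separator}{urlencode({token_param: token})}"
```

A generator would typically bake the definition into the emitted script and read `values` (API keys, identifiers) from the Lambda event or environment at run time.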

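Handling the initial stream response and each incremental update boils down to keeping a current snapshot and applying changes to it. Assuming the incremental updates arrive as JSON Patch (RFC 6902) operations, here is a minimal Python sketch of that state handling. It supports only the `add`, `remove`, and `replace` operations, and the event names `data` and `patch` in the hypothetical `handle_event` helper are assumptions about how the stream labels a full snapshot versus an update.

```python
import copy


def _parse_pointer(pointer):
    """Split a JSON Pointer (RFC 6901) into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    # Unescape "~1" before "~0", as RFC 6901 requires.
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def _resolve(doc, tokens):
    """Follow tokens through nested dicts and lists."""
    for token in tokens:
        doc = doc[int(token)] if isinstance(doc, list) else doc[token]
    return doc


def apply_patch(doc, operations):
    """Apply add/remove/replace operations and return a new document (input is not mutated)."""
    result = copy.deepcopy(doc)
    for op in operations:
        kind, tokens = op["op"], _parse_pointer(op["path"])
        if kind not in ("add", "remove", "replace"):
            raise ValueError(f"unsupported operation: {kind}")
        if not tokens:  # the pointer "" targets the whole document
            if kind == "remove":
                raise ValueError("cannot remove the document root")
            result = copy.deepcopy(op["value"])
            continue
        parent, key = _resolve(result, tokens[:-1]), tokens[-1]
        if isinstance(parent, list):
            if kind == "add":
                index = len(parent) if key == "-" else int(key)
                if not 0 <= index <= len(parent):
                    raise IndexError(f"array index out of range: {key}")
                parent.insert(index, copy.deepcopy(op["value"]))
            elif kind == "remove":
                del parent[int(key)]
            else:
                parent[int(key)] = copy.deepcopy(op["value"])
        elif kind == "remove":
            del parent[key]
        else:
            if kind == "replace" and key not in parent:
                raise KeyError(key)
            parent[key] = copy.deepcopy(op["value"])
    return result


def handle_event(snapshot, event_type, payload):
    """Hypothetical event names: 'data' carries a full snapshot, 'patch' a list of operations."""
    if event_type == "data":
        return copy.deepcopy(payload)
    if event_type == "patch":
        return apply_patch(snapshot, payload)
    return snapshot  # ignore anything else, such as keep-alive events
```

Returning a new document on every update keeps the previous snapshot intact, which makes it easy to publish both the patch and the resulting state downstream. A full implementation would also cover the `move`, `copy`, and `test` operations.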
OpenAPI provides the machine-readable definition we need to profile and discover each API resource in the gallery, and to generate the AWS Lambda script that works with the Streamdata.io proxy and delivers each data stream to where it is needed within a customer's existing AWS infrastructure. Most often this means publishing to an AWS S3 data lake, but we also regularly publish to AWS RDS and SimpleDB databases, and at other times publish streams to external APIs using webhooks. OpenAPI has become the bridge that lets us deliver streams from existing third-party JSON APIs, deploy and scale those streams within our clients' existing AWS infrastructure, and deliver the valuable data and content into whichever AWS systems they choose.
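For the most common destination, an S3 data lake, the last step in each function could look like the Python sketch below. It takes the S3 client as a parameter (in Lambda, typically `boto3.client("s3")`) and calls its `put_object` method with `Bucket`, `Key`, `Body`, and `ContentType`. The date-partitioned key layout and the `publish_snapshot` and `snapshot_key` helpers are hypothetical choices for illustration.

```python
import json
from datetime import datetime, timezone


def snapshot_key(prefix, stream_name, ts):
    """Hypothetical data-lake layout: <prefix>/<stream>/dt=YYYY-MM-DD/<epoch-millis>.json"""
    return f"{prefix}/{stream_name}/dt={ts:%Y-%m-%d}/{int(ts.timestamp() * 1000)}.json"


def publish_snapshot(s3_client, bucket, prefix, stream_name, snapshot, ts=None):
    """Write one JSON snapshot to S3 and return the object key.

    s3_client is any object with an S3-style put_object method, such as a
    boto3 S3 client; passing it in keeps the function testable without AWS.
    """
    if ts is None:
        ts = datetime.now(timezone.utc)
    key = snapshot_key(prefix, stream_name, ts)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(snapshot, separators=(",", ":")).encode("utf-8"),
        ContentType="application/json",
    )
    return key
```

Partitioning keys by date (`dt=YYYY-MM-DD`) is a common convention that lets query tools scan only the days they need; RDS or webhook destinations would swap in a different writer behind the same call.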


Original source: streamdata.io blog